Bleep Bloop

Claude Code, Gemini CLI, and Friends

September 16, 2025

Shipping a Side‑Project with Agentic Coding

Some of us are “one-or-two-projects-at-a-time” people. Others (hi 👋) have at least a dozen side projects simmering and a new idea every other day. Have I finished them all? Debatable. Is anything ever truly “done”? Also debatable.

So I decided to game the system. One of my side projects would be… A tool to manage side projects. 🤷‍♂️

Big Think

Most of us have used IDEs with autocomplete, or even have our own custom snippets. Back when I was nerding pretty hard on keyboards, I had an entire layer on my XD-75 dedicated to code snippets. What I had not done was go all-in on agentic coding. Determined to complete this side project and wanting to really dig into the current state of agentic coding, I figured I would kill two birds with one stone.

Goal: Build a simple Next.js + Supabase app (deployed on Vercel) using agentic coding tools to see how far I could push a real workflow.


TL;DR


My Environment & Editor Choices

I mainly work on Linux/macOS. My current IDE stack includes STM32CubeIDE, CLion, VS Code, Xcode, Arduino IDE, and (occasionally) Mu/Thonny. I’m not a Vim/Nvim person—tried it, but on Colemak it felt like overkill when VS Code is right there.

For this project I briefly tried Cursor for generation, then returned to VS Code (with Git + a plain terminal for agents). I keep Copilot installed but with autosuggestions disabled—the defaults feel too aggressive.
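
Concretely, that means something like this in settings.json (a sketch; key names may shift between Copilot versions):

```jsonc
// .vscode/settings.json (sketch): Copilot stays installed, but ghost-text
// completions stay off unless explicitly invoked.
{
  // Disable Copilot inline completions for every language
  "github.copilot.enable": { "*": false },
  // Belt-and-suspenders: disable inline suggestions editor-wide
  "editor.inlineSuggest.enabled": false
}
```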

I like tools that assist when asked, not ones that take the wheel unsolicited. I can see why people like Cursor, but it felt like a bit too much. And that 'upgrade to Pro' prompt was always in the back of my mind; I don't need another subscription in my life.


The Project

A small Next.js + Supabase app with Shadcn UI:


The Agentic Toolset


Workflow

1) Ideation

I’m faster brainstorming in ChatGPT than Anthropic’s web UI, so I used ChatGPT for PRD creation, initial feature list, and image prompts (then handed those to Midjourney). Although ChatGPT can generate images, it was often way slower and less awesome than Midjourney.

2) Boilerplate

I tried letting Cursor and then Claude Code generate the baseline, then scrapped both and did it manually.

This is the way

I went with manual for two reasons:

  1. Generation can be too much, too fast; it’s easy to miss a subtle hallucination in a config.
  2. Hand‑rolling the baseline sets the tone and keeps me oriented.
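
For reference, the hand-rolled baseline was essentially the standard scaffolding (a sketch; the project name is made up and exact flags vary by version):

```bash
# Next.js app with TypeScript, App Router, and Tailwind
npx create-next-app@latest sideproject-manager --typescript --app --tailwind
cd sideproject-manager

# Supabase client and Shadcn UI primitives
npm install @supabase/supabase-js
npx shadcn@latest init
npx shadcn@latest add button card dialog
```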

With a minimal skeleton in place, I ran /init in Claude Code, had it read the PRD, and used plan mode to break execution into phases. From there:

3) Infrastructure Early

I wanted CI/previews from day one—Vercel preview deployments made it easy to visualize pending features. With Supabase, I didn't plan on using more than the free tier and only wanted one project, which would correlate to prod (I have absolutely ZERO interest in setting up Test/UAT environments for side projects). That meant investing quite a bit of time in my local Supabase dev setup: I needed a clear understanding of the orchestration around local dev, environment setup for testing, and pushing migrations to the remote.
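
The core of that orchestration looked roughly like this (a sketch using the Supabase CLI; the project ref and migration name are placeholders):

```bash
# One-time: scaffold config and start the local stack (Postgres, Auth, Storage)
supabase init
supabase start

# Link the repo to the single remote project that doubles as prod
supabase link --project-ref <your-project-ref>

# Iterate locally: create a migration, then replay everything against local Postgres
supabase migration new add_projects_table
supabase db reset

# When the feature is solid, push pending migrations to the remote
supabase db push
```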

4) The Loop (Per Sub‑Issue)

I worked through the app one sub-issue at a time. I found this to be the sweet spot for the number of files touched and the amount of new code in each PR.

My standard prompt set to CC:

Once complete, I ran Gemini CLI as a critical reviewer (via gh), followed by a fresh CC review (after /clear). Finally, I asked CC (in plan mode) to discuss the diffs and the validity of the feedback from both reviews.
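
In practice, a review pass looked something like this (a sketch; the real prompts were longer, and the PR number is illustrative):

```bash
# Second opinion: pipe the PR diff to Gemini CLI in non-interactive mode
gh pr diff 42 | gemini -p "Act as a critical code reviewer. Flag bugs, security \
issues, and anything that contradicts the PRD. Be specific; cite file and line."

# Then, back in Claude Code: /clear for a fresh context, ask for an independent
# review of the same diff, and finally (in plan mode) weigh both sets of feedback.
```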

Approaching code generation and reviews this way used a lot of context && credits. It IS, however, what I would expect a normal person to do: make sure the issue/feature is still needed, push the fix, and get/give solid reviews.

Tip: As the repo grew, I disabled the GitHub MCP server and prompted context7 only for specific files to conserve context and credits.
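
If the server was registered through the CLI, toggling it off is quick (a sketch; assumes it was added under the name github):

```bash
# See what's configured, then drop the GitHub server to reclaim context
claude mcp list
claude mcp remove github
```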

5) Tests

Following the boilerplate phase, I had CC include any task-relevant unit and E2E tests as part of each sub-issue. Even when trying to be minimal, CC still generated hundreds of tests and thousands of lines of code. Tests were by far the biggest source of hallucinations, with issues primarily stemming from:

Is there a use case for agentic TDD? Probably, but I wouldn't recommend it for side projects.

So... I rolled back those commits and postponed comprehensive test scaffolding until the hardening phase.

You have been chopped

Instead, I wrote small, surgical unit tests for actual logic: Jest for unit tests, Playwright for E2E, and Puppeteer for UI checks.
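
“Surgical” here means tests like this: pure logic, no mocks, no DOM (a sketch; isOverdue and its module path are illustrative):

```typescript
// lib/deadlines.test.ts (sketch): one small pure function, two focused cases.
import { isOverdue } from "./deadlines";

describe("isOverdue", () => {
  const now = new Date("2025-09-16");

  it("flags projects whose deadline has passed", () => {
    expect(isOverdue(new Date("2025-09-01"), now)).toBe(true);
  });

  it("leaves future deadlines alone", () => {
    expect(isOverdue(new Date("2025-10-01"), now)).toBe(false);
  });
});
```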

E2E Auth Tips
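
One tip that survives any framework: authenticate once and reuse the session instead of logging in per test. A minimal sketch of Playwright’s storageState pattern (selectors, env vars, and paths are illustrative):

```typescript
// auth.setup.ts (sketch): log in once, persist cookies/localStorage to disk.
import { test as setup } from "@playwright/test";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_PASSWORD!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await page.waitForURL("/dashboard");
  // Save the authenticated state for other test projects to reuse via
  // use: { storageState: "playwright/.auth/user.json" } in playwright.config.ts
  await page.context().storageState({ path: "playwright/.auth/user.json" });
});
```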



Where Agents Shine (and Don’t)

Tool Comparisons

Shine

Struggle

Reality check: There were hallucinations and moments where I had to hit [Esc] and redirect, but far fewer than I expected once the process was dialed in.

A Simple Roadmap

My overall approach to the project:
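
A simplified sketch of that flow (reconstructed from the phases above, so treat the details as illustrative):

```mermaid
flowchart LR
    A[Ideation: PRD + features] --> B[Manual boilerplate]
    B --> C[Infrastructure: Vercel previews + Supabase]
    C --> D[Per-sub-issue loop:<br/>generate, review, merge]
    D -->|repeat| D
    D --> E[Tests + hardening]
```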

Why this order? You get a functioning vertical slice early, then round‑trip to deepen each layer, and only then go wide with tests and hardening.

“Who’s Driving?” (by Phase)

Knowing when to take the wheel is huge for saving time and effort.

Timeline

If this were a full-time, pre-AI effort, I’d story-point it at ~4 weeks (~160 hours) for an MVP, including design, documentation, tooling, tests, and deployment, with a single dev covering FE and BE. That would be a minimally useful app. In reality, as a side project, I shipped most of it in ~2.5 weeks of scattered evenings (~20 hours), roughly an 8x reduction in hands-on time. There’s a small backlog left, but the tool is already useful for me.


Takeaways

Using cars as an analogy: “no‑agent” coding is a hand‑finished Ferrari—gorgeous but slow to produce. Claude + Gemini can be a dependable Civic: not flashy, but ships. Fully “vibecoding” with no human oversight? That’s how you get a 90s beater—cheap now, expensive later.

Small Side-Notes