Claude Code, Gemini CLI, and Friends
Shipping a Side‑Project with Agentic Coding
Some of us are “one-or-two-projects-at-a-time” people. Others (hi 👋) have at least a dozen side projects simmering and a new idea every other day. Have I finished them all? Debatable. Is anything ever truly “done”? Also debatable.
So I decided to game the system. One of my side projects would be… A tool to manage side projects. 🤷‍♂️
Most of us have used IDEs with autocomplete, or even have our own custom snippets. Back when I was nerding pretty hard on keyboards, I had an entire layer on my XD-75 with only code snippets. What I had not done was go all-in on agentic coding. Determined to complete this side project and wanting to really dig into the current state of agentic coding, I figured I would kill two birds with one stone.
Goal: Build a simple Next.js + Supabase app (deployed on Vercel) using agentic coding tools to see how far I could push a real workflow.
TL;DR
- Claude Code is production‑capable when you drive it well.
- There’s a learning curve: you must invest in setup, prompting, and process.
- Usage limits are real: even with careful `/clear`, `/compact`, and context hygiene, I hit caps as the codebase grew.
- After shipping with agents, “no‑agent” coding felt Stone Age.
My Environment & Editor Choices
I mainly work in Linux/macOS. My current IDE stack includes STM32CubeIDE, CLion, VS Code, Xcode, Arduino IDE, and (occasionally) Mu/Thonny. I’m not a Vim/Nvim person—tried it, but on Colemak it felt like overkill when VS Code is right there.
For this project I briefly tried Cursor for generation, then returned to VS Code (with Git + a plain terminal for agents). I keep Copilot installed but with autosuggestions disabled—the defaults feel too aggressive.
I like tools that assist when asked, not ones that take the wheel unsolicited. I can see why people like Cursor, but it felt like a bit too much. And that 'upgrade to Pro' prompt was always in the back of my mind; I don't need another subscription in my life.
The Project
A small Next.js + Supabase app with Shadcn UI:
- Basic auth
- A landing page
- CRUD UI for managing side projects
- Deploy on Vercel
- Minimal CI from day one
The Agentic Toolset
- Claude Code (primary coding agent), using context7, GitHub, and Supabase MCP servers (selectively enabled as the repo grew)
- Gemini CLI (peer code critic that posts PR reviews via `gh`)
- ChatGPT for ideation, PRD drafting, and image prompts
- Midjourney for layout/visual exploration
- OpenAI Codex (web) — I tested it briefly, but CC + Gemini covered my needs
Workflow
1) Ideation
I’m faster brainstorming in ChatGPT than in Anthropic’s web UI, so I used ChatGPT for PRD creation, the initial feature list, and image prompts (which I then handed to Midjourney). Although ChatGPT can generate images, it was often way slower and less awesome than Midjourney.
2) Boilerplate
I tried letting Cursor and then Claude Code generate the baseline, then scrapped both and did it manually.
I went with manual for two reasons:
- Generation can be too much, too fast; it’s easy to miss a subtle hallucination in a config.
- Hand‑rolling the baseline sets the tone and keeps me oriented.
With a minimal skeleton in place, I ran `/init` in Claude Code, had it read the PRD, and used plan mode to break execution into phases. From there:
- Create a GitHub issue per phase.
- Before each phase, ask CC to re‑examine the repo and open sub‑issues for the concrete task list.
3) Infrastructure Early
I wanted CI/previews from day one—Vercel preview deployments made it easy to visualize pending features. With Supabase, I didn't plan on using more than the free tier and only wanted one project, which would correspond to prod (I have absolutely ZERO interest in setting up Test/UAT environments for side projects). This meant I needed to invest quite a bit of time setting up my local dev environment for Supabase: I needed a clear understanding of the orchestration surrounding local dev, environment setup for testing, and pushing migrations to the remote.
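A sketch of how the pieces fit, assuming the standard Next.js + Supabase env-var convention; the CLI commands in the comments are the stock Supabase ones, and everything else is illustrative:

```typescript
// lib/supabase.ts: one client; the environment decides where it points.
//
// Typical Supabase CLI loop for the local-dev-to-prod flow:
//   supabase start                 # boot the local stack (Postgres, auth, storage)
//   supabase migration new <name>  # capture schema changes as SQL migrations
//   supabase db reset              # replay all migrations locally from scratch
//   supabase db push               # apply pending migrations to the linked remote
import { createClient } from "@supabase/supabase-js";

// .env.local points at the local stack (http://127.0.0.1:54321 by default);
// Vercel's project env vars point at the single hosted project (i.e., prod).
export const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);
```

One client, two env files: the code never knows (or cares) which environment it's hitting, which is what makes the single-project setup livable.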
4) The Loop (Per Sub‑Issue)
I worked through the app one sub-issue at a time. I found this to be the sweet spot for the number of files touched and the amount of new code in each PR.
My standard prompt set to CC:
- “I want to work on GitHub issue #XX. Has any of this been implemented?”
- “Make a plan to implement the feature.”
- Add edits to the plan if needed.
- “Implement the task list, commit to a new branch, run prettier, lint, unit tests, and tsc. Open a PR when done.”
Once complete, I ran Gemini CLI as a critical reviewer (via `gh`), followed by a fresh CC review (after `/clear`). Finally, I asked CC (in plan mode) to discuss the diffs and the validity of the feedback from both reviews.
Approaching code generation and reviews in this way used a lot of context && credits. It IS, however, what I would expect a normal person to do: confirm the issue/feature is still needed, push the fix, and get/give solid reviews.
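The review round-trip is also scriptable. A hypothetical sketch, assuming `gh` and Gemini CLI are installed and authenticated; the prompt and file name are illustrative:

```typescript
// review.ts: pipe a PR's diff through Gemini CLI, post the critique back via gh.
// Run with: npx tsx review.ts <pr-number>
import { execSync } from "node:child_process";

const pr = process.argv[2];
if (!pr) throw new Error("usage: review.ts <pr-number>");

// Pull the diff for the PR.
const diff = execSync(`gh pr diff ${pr}`, {
  encoding: "utf8",
  maxBuffer: 64 * 1024 * 1024,
});

// Gemini CLI reads piped stdin alongside the -p prompt in non-interactive mode.
const review = execSync(
  `gemini -p "Act as a critical code reviewer. Review the diff on stdin for bugs, security issues, and style."`,
  { input: diff, encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }
);

// Post the critique as a PR comment ("-" reads the body from stdin).
execSync(`gh pr comment ${pr} --body-file -`, { input: review });
```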
Tip: As the repo grew, I disabled the GitHub MCP server and prompted context7 only for specific files to conserve context and credits.
5) Tests
Following the boilerplate phase of the project, I had CC include any task-relevant unit and E2E tests as part of each sub-issue. Even when trying to be minimal, CC still generated hundreds of tests and thousands of lines of code. Tests were by far the biggest source of hallucinations, and the issues primarily stemmed from:
- Using implementation patterns from older/deprecated versions of the libraries in use.
- Over-complicated implementations.
  - Many of these stemmed from CC trying to generalize an approach to allow easy implementation of a future/potential feature, regardless of whether I actually planned on implementing said feature.
- Forgetting to update affected tests after new features were pushed.
Is there a use case for agentic TDD? Probably, but I wouldn't recommend it for side projects.
So… I rolled back those commits and decided to postpone comprehensive test scaffolding until the hardening phase.
Instead, I wrote small, surgical unit tests for logic. I used Jest for unit tests, Playwright for e2e, and Puppeteer for UI checks.
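“Surgical” meaning: one pure function, a handful of assertions. A representative Jest sketch; the `slugify` helper is invented for illustration, not code from the app:

```typescript
// slug.ts: a small pure helper, the kind of logic worth a unit test.
export function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/(^-|-$)/g, "");
}

// slug.test.ts
import { slugify } from "./slug";

describe("slugify", () => {
  it("lowercases and hyphenates", () => {
    expect(slugify("My Side Project")).toBe("my-side-project");
  });

  it("strips leading/trailing separators", () => {
    expect(slugify("  --Fog!  ")).toBe("fog");
  });
});
```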
E2E Auth Tips
- Consider test‑only auth routes or seeded tokens behind a guard in preview envs.
- Keep auth state and cleanup explicit per test to avoid cross‑test bleed.
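Here's a sketch of both tips together, assuming a Next.js App Router route and Playwright's setup-project pattern; the route path, guard variable, and seeded user are all illustrative:

```typescript
// app/api/test-auth/route.ts: test-only sign-in, hard-gated out of production.
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  // Guard: the route 404s unless the preview/test env explicitly opts in.
  if (process.env.ENABLE_TEST_AUTH !== "true") {
    return NextResponse.json({ error: "Not found" }, { status: 404 });
  }
  const { email } = await req.json();
  // ...sign in the seeded test user here (e.g. via an admin client) and
  // attach the session cookie to the response...
  return NextResponse.json({ ok: true, email });
}
```

```typescript
// auth.setup.ts: authenticate once, persist the state, and have each spec
// start from a known session instead of bleeding auth across tests.
// Assumes `baseURL` is set in playwright.config.
import { test as setup } from "@playwright/test";

setup("authenticate", async ({ request }) => {
  const res = await request.post("/api/test-auth", {
    data: { email: "seed-user@example.com" },
  });
  if (!res.ok()) throw new Error("test-auth route rejected the seeded user");
  // Specs opt in to this saved state via `storageState` in their project config.
  await request.storageState({ path: "playwright/.auth/user.json" });
});
```

Explicit setup plus a throwaway state file keeps auth per-run and makes cleanup obvious.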
Where Agents Shine (and Don’t)
Tool Comparisons
- Claude Code “just works.” Simple setup, reliable file ops, solid plan/execute loop. Occasional hallucinations; manageable with tight prompts and `/clear` discipline.
- Gemini CLI wasn’t my builder; it was my code‑review peer. I felt more comfortable giving CC write access; Gemini critiqued and posted structured PR comments.
- Copilot/Cursor: I realized I prefer opt‑in assistance. Autosuggest firehoses slowed me down and added noise.
- Codex (web): usable, but slower in my flow than CC+Gemini.
Shine
- Repetitive scaffolding and “mechanical” coding
- Generating implementation drafts that you refine
- Hardening passes: surfacing edge cases, adding missing lint/tsconfig/CI steps
Struggle
- Authenticated e2e flows (stateful cross‑app interactions)
- Complex UI interactions that require nuanced, domain‑specific judgment
- Long‑lived projects where context discipline becomes make‑or‑break
Reality check: There were hallucinations and moments where I had to hit [Esc] and redirect, but far fewer than I expected once the process was dialed in.
A Simple Roadmap
My overall approach to the project:
1. PRD and ideation
2. Boilerplate skeleton plus CI/previews from day one
3. Feature work, one sub‑issue at a time (with agent and peer reviews)
4. Comprehensive tests and a hardening pass
Why this order? You get a functioning vertical slice early, then round‑trip to deepen each layer, and only then go wide with tests and hardening.
“Who’s Driving?” (by Phase)
Knowing when to take the wheel is huge for time savings.
Timeline
If this were a full‑time effort pre-AI, I’d story-point it at ~4 weeks (~160 hours) including design, documentation, tooling, tests, and deployment for an MVP, with a single dev covering FE & BE. That would be for a minimally useful app. In reality, as a side project I shipped most of it in ~2.5 weeks of scattered evenings (~20 hours). There’s a small backlog left, but the tool is already useful for me.
Takeaways
Using cars as an analogy: “no‑agent” coding is a hand‑finished Ferrari—gorgeous but slow to produce. Claude + Gemini can be a dependable Civic: not flashy, but ships. Fully “vibecoding” with no human oversight? That’s how you get a 90s beater—cheap now, expensive later.
- Agentic coding is a force multiplier, especially once you enforce a clean, phase‑based process.
- Context hygiene matters: use `/clear`, `/compact`, and only bring in the files you really need.
- Let agents draft; you decide. Your taste and judgment keep the project coherent.
- I didn't discuss front-end and back-end separately because the agents were just as capable of working in either domain (at least with Supabase).
- I don’t see myself going back to zero‑agent coding for greenfield web apps.
Small Side-Notes
- The project I used for this was fog.
- It seems like some studies and devs report less cognitive load and work stress when working with AI, but not much in the realm of time savings. It's true that when you're in the zone and code is flowing, raw speed isn't the issue; prod releases, though, are a lot more than just code. You have meetings, notes & action items, documentation, code reviews, bug fixes, etc., and today's agents can help with practically every role & phase of the SDLC. That's where the time savings come in: not in replacing an entire person's job, but in the aggregate of making everyone on the project a little more effective.
- Also, both ChatGPT and CC were excellent for scripting and for step-by-step (non-code) guidance on infrastructure, networking, and DevOps.