May 17, 2026

In search of the perfect AGENTS.md workflow

Remember when vibecoding was just simple prompts and hitting tab for autocomplete?

Recently, I've been spending a lot of time tuning how coding agents fit into my opencode config. The way we give our harnesses context has become the most important part of the process.

Before we jump into the setup, it's important to understand why this workflow is necessary. Way back when (a couple of years ago), we were trying to optimize our prompts. Now the real issue is "environmental ambiguity".

Agents typically fail because they are dropped into repositories designed exclusively for humans and are expected to infer conventions from scattered files. Historically, we suffered from "context silos" where instructions were locked into IDE-specific paths.

This meant tribal knowledge (the exact sequence of prompts needed to guide an agent through a complex module) got lost in chat histories and never reached version control. As a result, two engineers working on the same codebase would get entirely different outputs just because they used different AI tools.

To fix this, the repository itself must become "agent native." We need to standardize what the agent sees before optimizing how it thinks, creating an operational contract that every tool can read.

The progression from a simple hidden rules directory to domain-specific instruction files that load on demand is fascinating. There are many ways to configure your agents, so I figured I might as well write a simple, easy-to-understand breakdown of the perfect AGENTS.md workflow based on what's working right now.

A baseline AGENTS.md

To begin, we are going to need a foundational instruction layer. AGENTS.md is a README.md for agents: a dedicated, predictable place to provide context and instructions. This is where you write your agent's hard rules and coding standards. It becomes the single source of truth for your agent's operations.

This file sits at the root of your repo and tells the agent about your project architecture, coding standards, and how to run tests. Here's a simple example:

# Core Repository Directives

## Dev environment tips

- The repository utilizes `pnpm` exclusively. Never use `npm` or `yarn`.

- Use `pnpm dlx turbo run where <project_name>` to jump to a package location.

## Testing instructions

- Run `pnpm turbo run test --filter <project_name>` to execute every check defined for a localized package.

- The commit MUST pass all tests before merging is permitted.

This explicit configuration prevents the agent from guessing and hallucinating wrong commands.
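To make "no guessing" concrete, here's a rough Python sketch of how a tool could pull the documented commands out of the example above instead of inventing its own. The function name `commands_by_section` and the "multi-word inline code span means command" heuristic are my assumptions for illustration, not part of any AGENTS.md tooling.

```python
import re

# Sample text copied from the AGENTS.md example above.
AGENTS_MD = """\
# Core Repository Directives

## Dev environment tips
- The repository utilizes `pnpm` exclusively. Never use `npm` or `yarn`.
- Use `pnpm dlx turbo run where <project_name>` to jump to a package location.

## Testing instructions
- Run `pnpm turbo run test --filter <project_name>` to execute every check defined for a localized package.
- The commit MUST pass all tests before merging is permitted.
"""


def commands_by_section(markdown: str) -> dict[str, list[str]]:
    """Group multi-word inline code spans under their nearest `## ` heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            for span in re.findall(r"`([^`]+)`", line):
                # Heuristic (assumption): a span with a space is a full command,
                # while a bare word such as `pnpm` is just a tool name.
                if " " in span:
                    sections[current].append(span)
    return sections


if __name__ == "__main__":
    for section, commands in commands_by_section(AGENTS_MD).items():
        print(section, commands)
```

The point is that the test command is sitting in the file, ready to be read, so the agent has no reason to make one up.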

Add a virtual team with Garry Tan's gstack

Having a global ruleset is great, but massive AGENTS.md files can bloat the context window. This is where dynamic skill loading comes in. Y Combinator CEO Garry Tan open-sourced his personal Claude Code configuration as a toolkit called gstack, and I find it to be useful.

Instead of a single monolithic file, gstack packages 23 specialist skills that act as your virtual engineering team. You get a CEO reviewer, Engineering Manager, Lead Designer, and more, all accessible via slash commands. For instance, running /plan-ceo-review forces the AI to evaluate your request to find the "10-star product" before any code is written.
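If "dynamic skill loading" sounds abstract, here's a minimal sketch of the idea in Python: index the skill names up front, but only read a skill's instructions when its slash command is actually used. The `<skill-name>/SKILL.md` layout, the `SkillRegistry` class, the `example-skill` name, and the placeholder skill text are all assumptions for illustration. This is not how gstack or any particular harness is implemented.

```python
import tempfile
from pathlib import Path


class SkillRegistry:
    """Index skill names up front; read a skill's body only when it is invoked."""

    def __init__(self, root: Path) -> None:
        # Only file paths are stored here, so startup adds no instruction text.
        self._paths = {p.parent.name: p for p in root.glob("*/SKILL.md")}
        self.loaded: dict[str, str] = {}

    def names(self) -> list[str]:
        return sorted(self._paths)

    def invoke(self, prompt: str) -> str:
        """Return the skill text for a prompt starting with /<skill-name>, else ""."""
        if not prompt.startswith("/"):
            return ""
        parts = prompt[1:].split(maxsplit=1)
        if not parts or parts[0] not in self._paths:
            return ""
        name = parts[0]
        if name not in self.loaded:
            # The file is read on first use, then cached for later calls.
            self.loaded[name] = self._paths[name].read_text(encoding="utf-8")
        return self.loaded[name]


def demo() -> tuple[list[str], str, list[str]]:
    """Build a throwaway skills directory and invoke one skill from it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        skills = {
            "plan-ceo-review": "Placeholder instructions for the CEO review.",
            "example-skill": "Placeholder instructions for another skill.",
        }
        for name, body in skills.items():
            (root / name).mkdir()
            (root / name / "SKILL.md").write_text(body, encoding="utf-8")
        registry = SkillRegistry(root)
        text = registry.invoke("/plan-ceo-review add dark mode")
        return registry.names(), text, sorted(registry.loaded)


if __name__ == "__main__":
    print(demo())
```

Both skills are discoverable, but only the one that was invoked ever gets read into memory, which is the property that keeps the context window small.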

Enforce rigor with Matt Pocock's skills

If Garry Tan's setup is about product velocity, Matt Pocock's skills repository is about extreme engineering rigor. It's a rejection of loose vibecoding in favor of strict alignment that follows standard software engineering principles.

The skill I use most from his repo is /grill-me-with-docs. One of the biggest issues with vibecoding is that the agent and the developer often have different understandings of the problem, and different terminology leads to different assumptions. This skill forces you to confront that head on. When you invoke it, the agent acts as a skeptical architect and relentlessly interviews you about your plan until you reach a shared understanding.

/grill-me-with-docs I want to implement a social graph feature that
connects users based on shared interests and interactions. The
graph should be able to recommend new connections and content
based on user behavior

This dialectical approach forces you to build a shared language with the AI, documenting hard-to-explain decisions in a /docs directory before a single line of code is written.

Also included are defensive engineering tools like the git-guardrails-claude-code skill, which actively blocks dangerous git operations and demands human approval before an agent can rewrite shared branches or break your CI.
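To give a feel for what a guardrail like this does, here's a small Python sketch that flags git commands a human should approve first. The rule table and the `needs_approval` name are my own assumptions, not the actual contents of the git-guardrails-claude-code skill, and the parsing is deliberately naive (it ignores global options like `git -C <path>` and forced refspecs like `+main`).

```python
import shlex

# Assumed rule set for illustration only; the real skill's list may differ.
DANGEROUS_FLAGS = {
    "push": {"--force", "-f", "--force-with-lease"},
    "reset": {"--hard"},
    "clean": {"-f", "-fd", "-fdx"},
    "branch": {"-D"},
}


def needs_approval(command: str) -> bool:
    """Return True when a git command matches one of the assumed risky patterns."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    flags = DANGEROUS_FLAGS.get(tokens[1], set())
    # Strip any "=value" suffix so --force-with-lease=main still matches.
    return any(token.split("=")[0] in flags for token in tokens[2:])


if __name__ == "__main__":
    for cmd in ("git push origin main", "git push --force origin main"):
        print(cmd, "->", "ask a human" if needs_approval(cmd) else "ok")
```

A real guardrail would sit in the harness as a hook that runs before the command executes. The sketch only shows the classification step.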

Tie it together with PRDs

If you've ever taken a software engineering course (or worked in software engineering), you know how boring it is to write requirements. The final step to mastering the agent workflow is how you manage them. But don't worry! We are moving away from burying requirements in external ticketing systems like Jira and towards PRD-driven development.

Hopefully this isn't a shameless plug, but I built a collection of skills for writing high-quality PRDs and implementing them. You can see them here.

You'll find that the create-prd skill is a great way to finish a grill session. It synthesizes the conversation into a concise, implementation-ready PRD. implement-prd then takes that PRD and creates a 1:1 implementation. This is important because it lets you plan with a more capable model and hand that plan, without losing any context, to a smaller model that can execute it. That makes cross-agent workflows possible (ex: Codex to plan -> Opencode to implement).

/grill-me-with-docs I want to create this feature XYZ,
then at the end run /create-prd

/implement-prd @docs/my-prd-file.md

Because here's the thing: you do not need GPT 5.5 to write code. You need it to think and plan while keeping architecture, security, and scope in mind. The actual implementation can be done by a much smaller (and cheaper!) model. Personally, I plan with GPT through Codex and implement with Opencode, and I find that the results are just as good as (and much more cost-efficient than) using an expensive model for both. Used alongside grill-me-with-docs, these skills ensure high-quality code.
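One cheap way to protect that handoff is to check that the PRD is complete before the smaller model ever sees it. Here's a minimal Python sketch. The required section names and the `missing_sections` helper are assumptions I made up for illustration. They are not what create-prd actually emits, so swap in whatever headings your PRDs use.

```python
# Assumed section names for illustration; adjust to match your own PRD template.
REQUIRED_SECTIONS = ("Goal", "Scope", "Acceptance criteria")


def missing_sections(prd_markdown: str) -> list[str]:
    """Return the required sections that have no matching markdown heading."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in prd_markdown.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


if __name__ == "__main__":
    draft = "# PRD: feature XYZ\n\n## Goal\nShip XYZ.\n\n## Scope\nBackend only.\n"
    print(missing_sections(draft))
```

If the list comes back non-empty, the plan goes back to the planning model rather than forward to the implementer.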

Wrap up

Do these things and you'll have a highly advanced, context-aware AI agent workflow that runs on the cheap. Go build something.

-Caleb