Every API team I talk to has the same question right now: "We want Claude Code's speed, but how do we actually set it up so things go well instead of sideways?"
Fair question. The incident stories get all the attention. But quietly, thousands of teams are shipping faster, catching more bugs, and writing better tests with Claude Code every single day. The difference is not luck — it is patterns.
This post is the playbook. These are the patterns that consistently work for API development teams, including what we have seen work across teams building on platforms like ModelsLab. No theory. No hand-wringing. Just the setups that produce good outcomes.
Start With CLAUDE.md — Your Team's Single Source of Truth
The highest-impact thing you can do takes about 20 minutes: write a CLAUDE.md file for your project. This file lives in your repo root and Claude Code reads it at the start of every session. It is the difference between Claude understanding your codebase and Claude guessing at your codebase.
A good CLAUDE.md for an API team covers four areas:
A good CLAUDE.md for an API team covers four areas: a project overview, API conventions, testing rules, and safety rules. A skeleton:

```markdown
# CLAUDE.md

## Project Overview
<!-- What the service does, key directories, how to run it locally -->

## API Conventions
<!-- REST design rules, versioning scheme, error response format -->

## Testing Rules
<!-- Test framework, how to run the suite, what must be covered -->

## Safety Rules
- Use feature flags for risky changes
<!-- deny-listed commands, production access rules, etc. -->
```
The key insight: Claude Code treats CLAUDE.md instructions like a senior developer treats team standards. It follows them consistently, which means your guardrails work even at 2am when nobody is watching.
Keep It Focused
If your CLAUDE.md is 500 lines, Claude starts ignoring parts of it. Aim for 50-100 lines of high-signal rules. Move detailed domain knowledge into .claude/skills/ files instead:
```
.claude/skills/
  api-conventions.md    # REST design rules
  testing-patterns.md   # How we write tests here
  deployment-rules.md   # What can and cannot be deployed
```
Claude loads relevant skills automatically when the context matches. This keeps your main CLAUDE.md clean while still giving Claude deep domain knowledge.
Permission Scoping: The Allowlist Pattern
The default Claude Code permission model asks you to approve every action. That gets tedious fast, and tedium leads to rubber-stamping — which is worse than no permissions at all.
The pattern that works is an explicit allowlist. You define exactly which commands are safe, and everything else requires approval:
```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test*)",
      "Bash(npm run lint*)",
      "Bash(php artisan test*)",
      "Bash(composer run pint*)",
      "Bash(git status)",
      "Bash(git diff*)",
      "Bash(git log*)",
      "Read(*)",
      "Write(src/**)",
      "Write(tests/**)"
    ],
    "deny": [
      "Bash(rm -rf*)",
      "Bash(*terraform destroy*)",
      "Bash(*DROP TABLE*)",
      "Bash(*--force*)",
      "Write(.env*)",
      "Write(*credentials*)",
      "Write(*secret*)"
    ]
  }
}
```
This configuration lets Claude freely run tests, lint code, read any file, and write to source and test directories. But it blocks destructive commands and sensitive file access entirely. No prompt, no approval — just blocked.
For API teams specifically, add deny rules for production endpoints:
```json
"deny": [
  "Bash(*curl*api.production*)",
  "Bash(*curl*modelslab.com/api*)",
  "Bash(*--production*)"
]
```
Enterprise Teams: Managed Settings
If you are running Claude Code across a larger organization, use managed-settings.json to enforce policies that individual developers cannot override. Push it via MDM (Jamf, Kandji, Intune) and Claude Code reads it at startup:
```json
{
  "permissions": {
    "deny": [
      "Write(.env*)",
      "Bash(*terraform destroy*)",
      "Bash(*DROP DATABASE*)"
    ]
  },
  "env": {
    "CLAUDE_CODE_DISABLE_NETWORK": "1"
  }
}
```
No developer on the team can override these rules. That is the kind of guardrail that lets you sleep at night.
Sandbox Mode: OS-Level Isolation
Permissions tell Claude what it should not do. Sandboxing ensures it cannot do it — even if a prompt injection or confused instruction tries.
Claude Code's sandbox uses operating system-level enforcement:
- Filesystem isolation: Claude can only read and write to directories you specify
- Network isolation: Outbound connections route through a proxy that enforces domain allowlists
Enable it with /sandbox in your session or configure it in settings. In Anthropic's internal testing, sandboxing reduced permission prompts by 84% while maintaining the same security posture.
For API development, sandbox mode is particularly valuable when:
- Testing webhook handlers that make outbound calls
- Working with third-party API integrations
- Running code that processes user input
- Building features that interact with external services like ModelsLab's inference APIs
The practical workflow: run Claude Code in sandbox mode during development, only allow network access to your local dev server and test API endpoints, and block everything else.
Test-Driven Development With Claude Code
This is the pattern that consistently produces the best results. It is also the one most teams skip because it feels slow. It is not slow — it is fast in the ways that matter.
The Test-First Loop
Instead of telling Claude "build me a retry wrapper for API calls," tell it:
```
Write Pest tests for a retry wrapper that:
1. Retries failed API calls up to 3 times
2. Uses exponential backoff (1s, 2s, 4s)
3. Does not retry on 4xx client errors
4. Logs each retry attempt
5. Throws after max retries exhausted

Do NOT write the implementation yet. Only tests.
```
Claude writes the tests. You review them — this is where your domain knowledge matters. Then:
```
Run the tests. Confirm they all fail. Then implement the
RetryHandler class to make them pass.
```
Why this works:
- Tests become the specification. Claude has a clear, verifiable target instead of an ambiguous prompt.
- You review tests, not implementation. Tests are shorter, easier to reason about, and expose intent. If the tests are wrong, you catch it before any code is written.
- Claude self-corrects. When tests fail after implementation, Claude iterates automatically. The feedback loop is tight and deterministic.
- Regression protection is built in. You end every task with a passing test suite, not just code that "looks right."
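The behavior those tests pin down is compact enough to sketch. Here is an illustrative version in Python rather than Pest/PHP: the retry count and backoff schedule come from the prompt above, while the function names and the injectable `sleep` parameter are assumptions for the sake of the sketch.

```python
import logging
import time

logger = logging.getLogger("retry")


class ClientError(Exception):
    """A 4xx client error: retrying will not help."""


def retry_call(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure, retry up to max_retries times with
    exponential backoff (1s, 2s, 4s). 4xx errors are never retried."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ClientError:
            raise  # do not retry on 4xx client errors
        except Exception:
            if attempt == max_retries:
                raise  # max retries exhausted
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s
            logger.warning("attempt %d failed; retrying in %.0fs", attempt + 1, delay)
            sleep(delay)
```

Injecting `sleep` is what keeps the test suite fast: the tests pass a no-op instead of waiting out real backoff delays.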
PostToolUse Hooks for Continuous Verification
Set up hooks that automatically run your test suite after every file edit:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "command": "php artisan test --compact --filter=${FILENAME_BASE}",
        "timeout": 30000
      }
    ]
  }
}
```
Every time Claude writes or edits a file, the relevant tests run automatically. Claude sees the results and adjusts. This is not an optional nicety — it is the core mechanism that makes AI-assisted development reliable.
Teams building API integrations at scale — for example, teams integrating ModelsLab's 100,000+ model catalog into their products — find this pattern especially valuable because API contracts are precise. A test either matches the expected response schema or it does not. There is no ambiguity for Claude to get confused by.
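To make that concrete, a response-schema check reduces to key and type comparisons. A minimal Python sketch follows; the field names and expected types are hypothetical, not ModelsLab's actual contract.

```python
def validate_schema(response: dict, schema: dict) -> list:
    """Return a list of violations: missing keys or wrong value types.
    An empty list means the response matches the expected schema."""
    errors = []
    for field, expected_type in schema.items():
        if field not in response:
            errors.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(response[field]).__name__}")
    return errors


# Hypothetical contract for an async generation response
GENERATE_SCHEMA = {"request_id": str, "status": str, "eta_seconds": int}
```

A test asserting `validate_schema(response, GENERATE_SCHEMA) == []` either passes or it does not — exactly the kind of unambiguous target Claude iterates against well.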
Code Review Workflows That Actually Catch Issues
Claude Code can write pull requests. It can also review them. The pattern that works is using both capabilities together with human oversight at the decision points.
The Three-Stage Review
Stage 1: Claude writes the PR. Give Claude a task. It creates a branch, makes changes, writes tests, and opens a pull request with a description.
Stage 2: Claude reviews the PR. Run a second Claude session (or use Claude Code GitHub Actions) to review the PR from scratch. This second session does not share context with the first — it evaluates the code cold, the same way a human reviewer would.
```yaml
# .github/workflows/claude-review.yml
name: Claude Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          review_mode: true
          claude_md: |
            Review this PR for:
            - API contract changes (breaking vs non-breaking)
            - Missing error handling
            - Test coverage gaps
            - Security issues (exposed credentials, injection risks)
            - Performance regressions in hot paths
```
Stage 3: Human makes the call. A human engineer reviews both the code and Claude's review comments. The human focuses on architectural decisions, business logic correctness, and judgment calls that AI cannot reliably make.
This three-stage pattern catches significantly more issues than either pure human review or pure AI review alone.
PreToolUse Hooks for Safety Checks
Add hooks that run before Claude executes any command:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "command": "python3 scripts/check_safe_command.py \"${COMMAND}\"",
        "timeout": 5000,
        "onFailure": "block"
      }
    ]
  }
}
```
Your check_safe_command.py can scan for production database strings, destructive flags, or commands that should never run in development. If the check fails, Claude's action is blocked before it executes.
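A minimal version of such a script might look like this. The patterns below are illustrative starting points, not a complete deny list; tune them to your own stack.

```python
import re
import sys

# Illustrative deny patterns: destructive commands, schema drops,
# forced pushes, and production endpoints
BLOCKED_PATTERNS = [
    r"rm\s+-rf",
    r"\bDROP\s+(TABLE|DATABASE)\b",
    r"terraform\s+destroy",
    r"--force\b",
    r"api\.production",
]


def is_safe(command: str) -> bool:
    """True if the command matches none of the blocked patterns."""
    return not any(re.search(p, command, re.IGNORECASE) for p in BLOCKED_PATTERNS)


def main(argv: list) -> int:
    command = argv[1] if len(argv) > 1 else ""
    if not is_safe(command):
        print(f"blocked: {command}", file=sys.stderr)
        return 1  # non-zero exit makes the PreToolUse hook block the action
    return 0
```

Wire it up with `sys.exit(main(sys.argv))` under a `__main__` guard; the hook passes the candidate command as the first argument.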
Safe Deployment Patterns
The rule is simple: Claude Code should never deploy to production directly. Here is how to structure it.
Environment Separation
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Claude    │────>│   Staging   │────>│ Production  │
│    Code     │     │   (auto)    │     │  (manual)   │
└─────────────┘     └─────────────┘     └─────────────┘
    writes             deploys             requires
    code &             after CI            human
    tests              passes              approval
```
Claude Code can push to feature branches and open PRs. CI/CD deploys to staging automatically when tests pass. Production deployment requires explicit human approval — a button click, not a command Claude can run.
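On GitHub, that "button click" maps to an environment protection rule: mark the production job with an `environment` that has required reviewers configured in the repository settings. A sketch, assuming hypothetical job names and a deploy script:

```yaml
# Staging deploys automatically; production waits for human approval
deploy-staging:
  runs-on: ubuntu-latest
  steps:
    - run: ./scripts/deploy.sh staging      # hypothetical deploy script

deploy-production:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment: production   # required reviewers gate this job in repo settings
  steps:
    - run: ./scripts/deploy.sh production
```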
API-Specific Deployment Safety
For teams building on AI API platforms, add these safeguards:
- API key rotation is a manual process. Claude Code should never have access to production API keys. Store them in your secrets manager (AWS Secrets Manager, HashiCorp Vault, etc.) and reference them via environment variables that only exist in production.
- Rate limit testing happens in staging. If you are testing how your app handles ModelsLab API rate limits, do it against a staging endpoint or a mock server, never against production.
- Schema validation in CI. Add a CI step that validates your API response schemas against the provider's documented contract. This catches breaking changes before they reach production.
```yaml
# CI step for API contract validation
- name: Validate API schemas
  run: |
    php artisan test --filter=ApiContractTest
    npm run test:api-schemas
```
- Feature flags for new integrations. When Claude Code adds a new API integration (say, adding video generation to an app that only did image generation), wrap it in a feature flag. Deploy the code, then enable the feature separately.
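The flag check itself can be as simple as an environment-driven lookup. A Python sketch, where the `FEATURE_*` naming convention is an assumption:

```python
import os


def feature_enabled(flag: str, default: bool = False) -> bool:
    """Read a feature flag from the environment, e.g. FEATURE_VIDEO_GENERATION=1.
    Deploying the code and enabling the feature become two separate actions."""
    value = os.environ.get(f"FEATURE_{flag.upper()}")
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```

In the video-generation scenario above, the new endpoint would check `feature_enabled("video_generation")` and fall back to a 404 until the flag is flipped in production.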
The Pattern Checklist
Print this out. Pin it next to your monitor. Run through it every time you set up Claude Code on a new project.
- [ ] CLAUDE.md exists with project overview, API conventions, testing rules, and safety rules
- [ ] Skills directory (.claude/skills/) contains domain-specific knowledge files
- [ ] Permission allowlist is configured — only safe commands are auto-approved
- [ ] Deny rules block destructive commands, credential access, and production endpoints
- [ ] Sandbox mode is enabled for development sessions
- [ ] Test-first workflow is documented in CLAUDE.md — tests before implementation
- [ ] PostToolUse hooks run tests and linters automatically after every file edit
- [ ] PreToolUse hooks scan commands for dangerous patterns before execution
- [ ] Code review workflow uses three stages: Claude writes, Claude reviews, human decides
- [ ] CI/CD pipeline deploys to staging automatically but requires human approval for production
- [ ] Production API keys are stored in a secrets manager, never in code or .env files
- [ ] Managed settings (enterprise) enforce organization-wide policies developers cannot override
- [ ] Feature flags wrap new integrations so code deploys separately from feature activation
Real-World Workflow: Adding a New API Endpoint
Here is what this looks like end to end. Say your team needs to add a new endpoint that generates images via ModelsLab's text-to-image API.
Step 1: Create a feature branch.
```
Claude, create a branch called feature/text-to-image-endpoint
```
Step 2: Write tests first.
```
Write Pest feature tests for a POST /api/v1/images/generate endpoint that:
- Accepts a prompt, model_id, width, height, and num_outputs
- Validates all inputs (prompt required, dimensions within range)
- Returns a 202 with a request_id and status_url
- Returns 422 for invalid inputs
- Returns 429 when rate limited
- Queues the generation job instead of processing synchronously

Do NOT write the implementation. Only tests.
```
Step 3: Review the tests. You read them. Adjust any edge cases. Approve.
Step 4: Implement.
```
Run the tests. Confirm they fail. Then implement the endpoint,
form request, controller, and job to make all tests pass.
```
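The validation rules from the step-2 spec translate almost mechanically into implementation. Here is a Python sketch of the same checks; in the Laravel app this logic would live in the form request, and the dimension and output bounds are assumptions for illustration.

```python
def validate_generate_request(payload: dict) -> dict:
    """Return a mapping of field -> error message; empty means valid.
    Mirrors the test spec: prompt required, dimensions within range."""
    errors = {}
    if not payload.get("prompt"):
        errors["prompt"] = "required"
    for dim in ("width", "height"):
        value = payload.get(dim)
        # Assumed bounds: 64-2048 px, in line with common image APIs
        if not isinstance(value, int) or not (64 <= value <= 2048):
            errors[dim] = "must be an integer between 64 and 2048"
    num_outputs = payload.get("num_outputs", 1)
    if not isinstance(num_outputs, int) or not (1 <= num_outputs <= 4):
        errors["num_outputs"] = "must be an integer between 1 and 4"
    return errors  # non-empty -> respond 422 Unprocessable Entity
```

A non-empty result maps to the 422 response the tests expect; a clean result means the request can be accepted with a 202 and queued.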
Step 5: Claude opens a PR. It generates a description, lists what changed, and tags it for review.
Step 6: CI runs. Tests pass. Linter passes. API schema validation passes. Auto-deploys to staging.
Step 7: You verify in staging. Hit the endpoint. Check the response. Confirm the queue job processes correctly.
Step 8: Deploy to production. Click the button. Done.
The whole process takes a fraction of the time it would without Claude Code. And every step has a checkpoint that catches problems before they matter.
Frequently Asked Questions
Is Claude Code safe to use with production codebases?
Yes, with the right guardrails. The patterns in this post — permission allowlists, sandbox mode, deny rules, and environment separation — create multiple layers of protection. The key principle is that Claude Code should have full access to your development environment and zero access to production infrastructure. Teams that follow this separation report zero production incidents from Claude Code usage.
How do we prevent Claude Code from leaking API keys or credentials?
Three layers work together. First, deny rules in your permissions block writes to .env files and reads of credential directories. Second, sandbox mode enforces filesystem isolation at the OS level so Claude literally cannot access files outside its allowed directories. Third, use a secrets manager for production credentials and reference them via environment variables that only exist in deployed environments — never in your local development setup where Claude Code runs.
Should we use Claude Code for writing tests or writing implementation code?
Both, but in that order. The test-first pattern consistently produces better results than asking Claude to write implementation code directly. When Claude writes tests first, it has a clear specification to implement against. When it writes implementation first, it tends to write code that "looks right" but misses edge cases. Start with tests, review them, then let Claude implement. The feedback loop from failing tests guides Claude toward correct implementations much more reliably than prose instructions alone.
How does Claude Code fit into our existing code review process?
It adds a stage, it does not replace one. The most effective pattern is three-stage review: Claude writes the code, a separate Claude session reviews the PR (catching issues like missing error handling, test gaps, and API contract violations), and then a human engineer makes the final decision. This catches more issues than human-only review because Claude is thorough and tireless with mechanical checks, while humans focus on architecture and business logic — the areas where judgment matters most.
What is the minimum setup to use Claude Code safely on an API project?
At minimum, you need three things: a CLAUDE.md file with your project conventions and safety rules, a permission configuration with deny rules for destructive commands and credential access, and a test-first workflow documented in your CLAUDE.md. You can add sandbox mode, hooks, managed settings, and CI/CD integration over time, but those three elements are enough to start safely. Most teams are fully set up within an afternoon.
