Paperclip AI: An Autonomous Engineering Workforce

What if you could hire 12 engineers and they worked 24 hours a day, never called in sick, never needed a standup, and cost less per month than a single junior developer? That is not a thought experiment. We built it.

Paperclip AI is our autonomous engineering workforce: 12 specialized AI agents running on the Paperclip platform, managed through Telegram, processing real GitHub issues, and producing real code. This is how it works, what it costs, and where it breaks.

The 12-Agent Roster

Each agent has a defined role, a dedicated system prompt, and a preferred language model. The roster mirrors a startup team:

CEO — strategic planning, task prioritization, cross-team coordination
Dev Lead — architecture decisions, code review, technical standards
Dev-1 — complex engineering: algorithms, integrations, difficult debugging
Dev-2 — volume engineering: scaffolding, migrations, bulk implementation
Frontend Engineer — UI components, responsive design, accessibility
Marketing Lead — campaign planning, messaging strategy, positioning
SEO Manager — technical SEO, content optimization, keyword targeting
Research Lead — competitor analysis, market research, technology scouting
Content Writer — blog posts, landing copy, documentation
Grok Researcher — real-time information gathering, trend analysis
GPT Analyst — quantitative analysis, data modeling, pattern detection
Data Analyst — metrics, reporting, performance tracking

The agents are not interchangeable. Dev-1 runs on GPT-Codex for maximum reasoning depth on hard problems. The SEO Manager runs on Qwen 3.5-plus because SEO work is text-heavy and cost-sensitive. The right model for the right job.

The Adapter Layer: Provider-Agnostic by Design

Here is the engineering challenge: the Paperclip platform expects agents to use the Claude CLI. Our agents run on a mix of GPT-Codex, Qwen, and GLM-5. The solution is qwen-shim — a shell script that translates between the Claude CLI interface and whatever model is actually answering.

Every agent is configured with the claude_local adapter, which points to qwen-shim.sh. The shim does three things:

Adds workspace directories via --include-directories, giving agents access to the full codebase
Injects heartbeat procedures when PAPERCLIP_TASK_ID is set, so the platform knows the agent is alive
Routes the actual LLM call through a proxy on port 3101 that dispatches to OpenClaw or DashScope based on the agent's preferred model

The result: the platform thinks it is talking to Claude. The agent thinks it is a standalone LLM. The shim handles the translation. Swap a model by changing one line of configuration — no agent code changes, no platform modifications.

The LLM Proxy: One Port, Many Models

Port 3101 runs our LLM proxy, which routes requests based on the model name in the header:

openai-codex/* and gpt-* models route to OpenClaw on port 18800
qwen* and glm* models route to DashScope
Unknown models get a sensible default

The proxy also injects trading bot context into every request, so agents working on trading-related tasks have full awareness of the system architecture without needing to read documentation files.

Self-Healing: The Supervisor

Agents fail. Models go down. API rate limits hit. Network connections drop. In a human team, someone notices and fixes it. In Paperclip, the supervisor handles it automatically.

paperclip_supervisor.py runs a 60-second loop that checks every agent's health. If an agent logs three consecutive adapter failures — meaning the LLM proxy returned errors three times in a row — the supervisor automatically pauses that agent. No cascading failures. No infinite retry loops burning tokens.

On the model side, every agent has a fallback chain. If GPT-Codex is unavailable, the request falls back to Qwen 3.5-plus. If Qwen is down, it tries GLM-5. The agent does not know the difference. Work continues.

Real Work, Real Output

These agents are not running in a sandbox. They process actual GitHub issues assigned to the Tacavar AI company. When an issue is assigned to an agent, the Paperclip platform triggers a wakeup call via API. The agent reads the issue, plans the work, writes the code, and commits it.

A typical workflow:

A GitHub issue is created: "Add sitemap.xml to avoidtravelscam.com"
The CEO agent triages it and assigns it to the SEO Manager
The SEO Manager wakes up, reads the issue, generates the sitemap
The Frontend Engineer reviews the output for any HTML issues
The code is committed to the repository

Content creation follows the same pattern. Research tasks, competitive analyses, blog posts, technical documentation — all flow through the same issue-to-output pipeline.

Telegram: The Management Console

The entire workforce is managed through Telegram commands:

/paperclip status — see which agents are active, paused, or failing
/paperclip trigger — manually wake an agent to process its queue
/paperclip enable <agent> / disable <agent> — toggle individual agents
/paperclip enableall / disableall — fleet-wide controls

No SSH. No dashboards. No deployment pipelines. Open Telegram, type a command, the workforce responds. Managing 12 agents takes less effort than managing a single CI/CD pipeline.

The Cost Math

Here is where it gets interesting. Ten of the twelve agents run on GPT-Codex via an OpenClaw subscription at $0 per token. The remaining two (Dev-2 and SEO Manager) run on Qwen 3.5-plus via DashScope, which costs fractions of a cent per thousand tokens.

The total monthly cost for running 12 AI agents — including the OpenClaw subscription, DashScope usage, and server resources — is less than what most companies pay for a single junior developer's monthly salary. And these agents do not need health insurance, PTO, or a MacBook Pro.

The comparison is not entirely fair, of course. A junior developer brings creativity, judgment, and the ability to handle ambiguous requirements. But for well-defined, repeatable tasks — the kind that fill most engineering backlogs — the cost advantage is not 2x or 5x. It is closer to 100x.

Where It Breaks

Honesty requires acknowledging the failure modes:

Ambiguous requirements kill agents. A GitHub issue that says "make the homepage better" will produce garbage. Agents need specific, well-defined tasks. The CEO agent helps by breaking vague requests into concrete subtasks, but the input quality still matters enormously.

Novel problems are hard. Agents excel at tasks that resemble their training data. Ask Dev-1 to implement a standard REST API and it will produce clean code. Ask it to design a novel distributed consensus algorithm and it will produce something that looks right but has subtle bugs. Human review is still essential for anything genuinely new.

Coordination overhead is real. When agents need to collaborate on a single feature, the handoff between them can lose context. We mitigate this with the shared state graph, but it is not perfect. Sometimes Agent B misinterprets what Agent A produced, and the result needs human correction.

The Vision: Teams That Never Sleep

The endgame is not replacing human engineers. It is giving every company access to a baseline engineering and growth capability that runs continuously, handles the repetitive work, and frees human engineers to focus on the problems that actually require human judgment.

Imagine a startup founder who needs a landing page, SEO content, competitive research, and a basic API — all before their next investor meeting. Today, that is four hires or four freelancers and weeks of coordination. With Paperclip, it is a set of GitHub issues and a few hours of autonomous execution.

The agents never sleep. They never quit. They never have a bad day. They do not produce perfect work — nothing does — but they produce consistent work at a pace and cost that changes what is possible for small teams.

Paperclip AI is running in production today, processing real tasks for Tacavar. We will keep sharing what works, what fails, and what we learn from treating AI agents as permanent members of the team.