A modular runtime and orchestration system for AI agents.
Structured pipelines, gated phases, specialized agents. Works with Claude Code, OpenCode, Codex CLI, Cursor, and Kiro. 3,750 tests. Production-grade.
AI models write code.
That's not the hard part anymore.
The hard part is everything else. Picking what to work on. Managing branches. Reviewing output. Cleaning up AI artifacts. Handling CI. Addressing reviewer comments. Deploying. AgentSys automates all of it.
14 Commands. One Toolkit.
Each works standalone. Together, they automate everything.
/next-task
Task to production, fully automated
- 12-phase pipeline: discovery through deployment
- Multi-agent review loop (code, security, perf, tests)
- Persistent state -- resume from any phase
- GitHub Issues, GitLab, or local task files
$ /next-task # Start new workflow
$ /next-task --resume # Resume interrupted workflow
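The resume behavior above rests on a simple idea: checkpoint completed phases after each step, then skip them on restart. A minimal sketch of that pattern, where the phase names, the `.state.json` location, and the `execute` stub are illustrative assumptions rather than AgentSys internals:

```python
import json
from pathlib import Path

# Hypothetical phase list -- the real pipeline runs 12 phases.
PHASES = ["discovery", "plan", "implement", "review", "deploy"]
STATE_FILE = Path(".state.json")  # illustrative checkpoint location

def execute(phase):
    pass  # stand-in for the real per-phase work

def run_pipeline(resume=False):
    """Run phases in order, checkpointing after each one so an
    interrupted run can resume from the next unfinished phase."""
    done = []
    if resume and STATE_FILE.exists():
        done = json.loads(STATE_FILE.read_text())["completed"]
    for phase in PHASES:
        if phase in done:
            continue  # already completed in a previous run
        execute(phase)
        done.append(phase)
        STATE_FILE.write_text(json.dumps({"completed": done}))
    return done
```

Because the checkpoint is written after every phase, a crash mid-run loses at most the phase in flight.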
/agnix
Lint agent configs before they break
- 155 validation rules across 28 categories
- 10+ AI tools, including Claude Code, Cursor, Copilot, Codex, OpenCode, Gemini CLI
- 57 auto-fixable rules with --fix flag
- SARIF output for GitHub Code Scanning
$ /agnix # Validate current project
$ /agnix --fix # Auto-fix fixable issues
/ship
Branch to merged PR in one command
- Commits, pushes, creates PR, monitors CI
- Waits for auto-reviewers, addresses every comment
- Platform auto-detection (GitHub Actions, Railway, Vercel)
- Merges, deploys, and cleans up
$ /ship # Full workflow
$ /ship --dry-run # Preview without executing
/deslop
Kill AI slop before it ships
- 3-phase detection: regex, multi-pass analyzers, CLI tools
- Certainty-graded findings (HIGH / MEDIUM / LOW)
- JS/TS, Python, Rust, Go, Java
- Auto-fix HIGH certainty issues
$ /deslop # Report only (safe)
$ /deslop apply # Fix HIGH certainty issues
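The first detection phase, regex with certainty grades, can be sketched in a few lines. The patterns below are hypothetical examples; the real tool ships far more of them, plus multi-pass analyzers and CLI-tool integration:

```python
import re

# Illustrative slop patterns only -- not the shipped rule set.
PATTERNS = [
    (r"\bTODO: implement\b", "HIGH"),   # stub left behind
    (r"console\.log\(", "MEDIUM"),      # leftover debug output
    (r"\bAs an AI\b", "HIGH"),          # model self-reference in code/docs
]

def detect_slop(source):
    """Scan source line by line, returning certainty-graded findings
    as (line_no, certainty, offending_line) tuples."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern, certainty in PATTERNS:
            if re.search(pattern, line):
                findings.append((line_no, certainty, line.strip()))
    return findings
```

Grading each finding is what makes `apply` safe: only HIGH-certainty matches are candidates for auto-fix, the rest stay report-only.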
/perf
Evidence-backed performance investigation
- 10-phase methodology with baselines and profiling
- Hypothesis generation and controlled experiments
- Breaking point analysis via binary search
- Based on recorded real investigation sessions
$ /perf # Start new investigation
$ /perf --resume # Resume previous investigation
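Breaking-point analysis via binary search reduces to a classic search over load levels, assuming the system passes up to some threshold and fails beyond it. A sketch, where `passes_at` is a hypothetical stand-in for running a load test at a given level:

```python
def find_breaking_point(passes_at, lo, hi):
    """Binary-search the largest load level in [lo, hi] that still
    passes, assuming passes_at flips from True to False as load grows.
    Returns the last passing level, or lo - 1 if even lo fails."""
    best = lo - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if passes_at(mid):   # system still healthy at this load
            best = mid
            lo = mid + 1     # probe heavier load
        else:
            hi = mid - 1     # back off
    return best
```

This finds the threshold in O(log n) load-test runs instead of sweeping every level, which matters when each run is a real, slow experiment.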
/drift-detect
Find what's documented but not built
- AST-based plan vs code semantic analysis
- JavaScript collectors + single Opus call
- 77% token reduction vs multi-agent approaches
- Tested on 1,000+ repositories
$ /drift-detect # Full analysis
$ /drift-detect --depth quick # Quick scan
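At its core, drift detection compares the symbols a plan documents against the symbols the code actually defines. A minimal sketch of that comparison as a set difference (the symbol extraction itself, done via AST in the real tool, is assumed to have already happened):

```python
def find_drift(planned_symbols, code_symbols):
    """Compare symbols a plan mentions against symbols the code defines.
    Returns (documented_but_missing, built_but_undocumented)."""
    planned = set(planned_symbols)
    built = set(code_symbols)
    return sorted(planned - built), sorted(built - planned)
```

Both directions matter: the first list is unbuilt promises, the second is undocumented behavior.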
/audit-project
Multi-agent code review that iterates until clean
- Up to 10 specialized agents per project
- Security, performance, architecture, DB, API, frontend
- Iterates until zero open issues remain
- Auto-fixes all non-false-positive findings
$ /audit-project # Full review
$ /audit-project --domain security # Security only
/enhance
Analyze everything that shapes agent behavior
- 7 parallel analyzers for prompts, agents, plugins, docs
- Certainty-graded findings with auto-fix support
- Auto-learns false positives over time
- Hooks and skills analysis included
$ /enhance # Run all analyzers
$ /enhance --apply # Apply HIGH certainty fixes
/repo-map
AST symbol and import mapping
- Cached file-to-symbols map via ast-grep
- Exports, functions, classes, import graph
- Used by drift-detect and planners automatically
- Incremental updates for large repos
$ /repo-map init # First-time map generation
$ /repo-map update # Incremental update
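The file-to-symbols map is built with ast-grep across many languages; purely for illustration, here is the same idea for a single Python module using the stdlib `ast` parser (the dict shape is an assumption, not the tool's actual schema):

```python
import ast

def map_symbols(source):
    """Return top-level functions, classes, and imported names for one
    Python module -- a single entry of a file-to-symbols repo map."""
    tree = ast.parse(source)
    symbols = {"functions": [], "classes": [], "imports": []}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            symbols["classes"].append(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            symbols["imports"].extend(a.name for a in node.names)
    return symbols
```

Caching one such entry per file, keyed by content hash, is what makes incremental updates cheap: only changed files get re-parsed.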
/sync-docs
Keep docs in sync with code
- Finds outdated references and stale examples
- Detects missing CHANGELOG entries
- Version mismatch detection
- Auto-fixes safe issues like version numbers
$ /sync-docs # Check what needs updates
$ /sync-docs apply # Apply safe fixes
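Version-mismatch detection, the kind of check safe enough to auto-fix, can be sketched as scanning docs for semver strings that differ from the released version. A naive illustration (the regex and return shape are assumptions, and a real checker needs context to skip unrelated version numbers):

```python
import re

def find_version_mismatches(doc_text, current_version):
    """Flag semver strings in docs that differ from the released
    version. Returns (line_no, stale_version) pairs."""
    stale = []
    for line_no, line in enumerate(doc_text.splitlines(), start=1):
        for version in re.findall(r"\bv?(\d+\.\d+\.\d+)\b", line):
            if version != current_version:
                stale.append((line_no, version))
    return stale
```

Checks like this are mechanical and reversible, which is why version numbers fall on the "safe to auto-fix" side while stale prose examples stay report-only.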
/learn
Research any topic, build a learning guide
- Progressive discovery: broad to specific to deep
- Quality-scored sources (authority, recency, depth)
- Structured guide with examples and best practices
- RAG index for future agent lookups
$ /learn react hooks --depth=deep # Comprehensive
$ /learn kubernetes --depth=brief # Quick overview
/consult
Get a second opinion from another AI tool
- Cross-tool AI consultation via ACP transport
- 6 providers: Claude, Gemini, Codex, Copilot, Kiro, OpenCode
- Effort-mapped model selection per provider
- Session continuations and context injection
$ /consult "Is this the right approach?" --tool=gemini # Second opinion
$ /consult "Review for performance" --tool=codex # Codex review
/debate
Structured adversarial debate between AI tools
- Multi-round proposer/challenger format
- Evidence-backed arguments with mandatory counterpoints
- Any two AI tools as debaters (Claude, Gemini, Codex, Kiro, etc.)
- Final verdict from the orchestrator
$ /debate codex vs gemini about microservices vs monolith # Structured debate
$ /debate claude vs kiro about our auth implementation # Codebase debate
/web-ctl
Browser automation for AI agents
- Headless Playwright with encrypted session persistence
- Human-in-the-loop auth handoff with CAPTCHA detection
- Anti-bot measures and output sanitization
- Snapshot-based accessibility tree for element discovery
$ /web-ctl goto https://example.com # Navigate
$ /web-ctl auth github --url https://github.com/login # Auth handoff
How It Works
One approval. Fully autonomous execution.
Pick a task
Select from GitHub Issues, GitLab, or a local task file. The agent explores your codebase and designs a plan.
Approve the plan
Review the implementation plan. This is the last human interaction. Everything after is automated.
Watch it ship
Code, review, cleanup, documentation, PR, CI, merge. All handled. You review the result.
Built Different
Not another AI wrapper. Engineering-grade workflow automation.
Code does code work. AI does AI work.
Static analysis, regex, and AST for detection. LLMs only for synthesis and judgment. 77% fewer tokens than multi-agent approaches.
One agent, one job, done well
43 specialized agents, each with a narrow scope and clear success criteria. No agent tries to do everything.
Pipeline with gates
Each step must pass before the next begins. Can't push before review. Can't merge before CI. Hooks enforce it.
Validate plan and results
Approve the plan. See the results. The middle is automated. One approval unlocks autonomous execution.
43 Agents. 30 Skills.
Right model for the task. Opus reasons. Sonnet validates. Haiku executes.
Deep codebase analysis and context gathering
Step-by-step implementation design
Autonomous code writing and modification
Performance investigation coordination
Deep performance analysis and profiling
Web research and learning guide creation
Multi-source plan synthesis and merging
Agent configuration quality analysis
CLAUDE.md file optimization
Documentation quality improvement
Git hooks and automation analysis
Prompt engineering best practices
Skill definition quality analysis
Structured adversarial debate coordination
Task source scanning and prioritization
Pre-ship quality gate validation
CI failure diagnosis and auto-repair
Test coverage analysis and gap detection
AI slop pattern detection and cleanup
Cross-file semantic analysis
Plugin configuration validation
Hot code path identification
Performance investigation logging
Performance hypothesis generation
Controlled experiment execution
Documentation sync and update
Agent config linting orchestration
Cross-tool AI consultation orchestration
Browser automation and session management
Git worktree creation and cleanup
CI pipeline status polling
Mechanical code fixes and formatting
Repo map structural validation
Code style and quality patterns
Security vulnerability detection
Runtime performance optimization
Test quality and coverage review
System architecture analysis
Database schema and query review
API design and consistency
Frontend patterns and accessibility
Backend architecture and scaling
CI/CD and infrastructure review
30 Skills across 14 Plugins
Get Started in 30 Seconds
Recommended
$ /plugin marketplace add agent-sh/agentsys
$ /plugin install next-task@agentsys
$ /plugin install ship@agentsys
Interactive installer for Claude Code, OpenCode, and Codex CLI
$ npm install -g agentsys && agentsys
Clone and install from source
$ git clone https://github.com/agent-sh/agentsys.git
$ cd agentsys
$ npm install