Argus multi-model code review

Getting Started

View SKILL.md

Argus runs two ways: as the /argus Claude Code skill, or as standalone Python scripts from a shell. Both share the same config.yaml and environment variables. Pick the track that matches you.

Prerequisites
  • Python 3.10+ (the project targets 3.12).
  • git — Argus reviews diffs.
  • aichat — the universal LLM client Argus uses for API-routed reviewers.
  • At least one API key in your environment. For the easiest start, just an OPENROUTER_API_KEY.
  • Python packages: pyyaml and psutil.

Beginner track

Five steps. One key. You will be reviewing diffs in a couple of minutes.

1. Clone and enter the repo

git clone https://github.com/jimstratus/argus.git
cd argus

2. Install the Python dependencies

pip install pyyaml psutil

3. Point Argus at itself

ARGUS_HOME tells every script where the config and prompts live. Set it to the repo root:

export ARGUS_HOME=$PWD

4. Set one API key

The public default route is OpenRouter — a single key covers most reviewers. Keys live only in your environment; Argus never writes them to disk.

export OPENROUTER_API_KEY="sk-or-..."

5. Generate the aichat config, verify, and review

install_aichat.py writes ~/.config/aichat/config.yaml with the provider client definitions (no keys on disk — keys are forwarded from the environment at dispatch time).

python scripts/install_aichat.py --merge
python scripts/verify.py --all

Then run a review. Inside Claude Code:

/argus

That reviews your current diff with the default standard profile and prints one merged review.

💡
Try a dry run first

/argus --dry-run resolves the roster and shows the cost estimate without spending anything. Great for a sanity check.

Advanced track

For power users with their own provider subscriptions, large diffs, and benchmark needs.

Multiple provider keys + direct-API routing

Three default-roster reviewers are dual-route: glm-5.2, minimax-m3, and deepseek-v4-pro (the custom-only hermes-4.3 is dual-route too). With their direct keys set, you can prefer each provider’s own API first (cheaper, uses your subscriptions) and keep OpenRouter as the fallback.

export ZAI_API_KEY="..."          # GLM direct (z.ai Coding Plan)
export MINIMAX_API_KEY="..."      # MiniMax direct
export DEEPSEEK_API_KEY="..."     # DeepSeek direct (api.deepseek.com)

# Choose direct-first for this run:
python scripts/verify.py --all --prefer-direct
# or persist it:  set route_preference: direct in config.yaml
# or per-shell:   export ARGUS_ROUTE_PREF=direct

See Routing preference for the full precedence rules. CLI reviewers (Codex / Claude / OpenCode / Gemini) are never reordered — their subscription stays primary.

Profiles

Skip naming individual reviewers — pick a profile:

/argus --profile panel        # maximum coverage
/argus --profile security     # auth / crypto / input focus
/argus --profile deep         # long-context reviewers for large diffs
/argus --profile direct       # direct-API subs only (pair with route_preference: direct)

Manual shell pipeline

RUN_DIR="$ARGUS_HOME/runs/$(date +%Y%m%dT%H%M%S)-manual"; mkdir -p "$RUN_DIR"
git diff HEAD > "$RUN_DIR/diff.patch"
python scripts/estimate_cost.py --roster "glm-5.2,minimax-m3,gemini-or,codex" --diff "$RUN_DIR/diff.patch"
python scripts/dispatch.py --run-dir "$RUN_DIR" --roster "glm-5.2,minimax-m3,gemini-or,codex" --diff "$RUN_DIR/diff.patch"
python scripts/merge.py --run-dir "$RUN_DIR"

Benchmark seeding + parallel shells

Benchmark mode scores reviewers against labeled fixtures to build a leaderboard. Always dry-run a single fixture first to catch provider-config bugs in ~30 seconds:

python scripts/benchmark.py --runs 1 --fixtures sql-injection --profile panel   # smoke test
python scripts/benchmark.py --runs 3 --profile panel                            # full run

The recommended protocol is one shell per reviewer, max five concurrent, with a hard wall-cap per reviewer (--max-wall-sec) so a single stalled provider never blocks the run. Each reviewer writes its own incremental JSON for tailable progress.

Host-CLI awareness

Argus detects the CLI host it runs inside (claude / codex / gemini / opencode / unknown) and never asks a host CLI to review its own invocation. Inside Claude Code the claude reviewer is auto-skipped. For profile rosters, the claude reviewer is auto-added when the host is not Claude. See the host-CLI table.