Argus multi-model code review

Argus is built so the most common contributions are config-only — you rarely need to touch Python to add a reviewer or a profile. Thanks for helping the panel see more.

Reviewer & provider changes

Adding, removing, or re-versioning a reviewer is a config.yaml edit: add an entry under reviewers with its route(s), model ID(s), context window, tier, and cost. For a new provider route, add a client to the aichat generator (scripts/install_aichat.py) and forward its $<PROVIDER>_API_KEY. No keys go in the repo — ever.

  • Reuse the universal aichat adapter for any OpenAI-compatible endpoint.
  • Mark superseded models custom_only rather than deleting them.
  • Keep model IDs pinned; document upstream renames.
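
As a rough illustration of the fields listed above, a reviewer entry could be sanity-checked before it lands in config.yaml. This is only a sketch: the key names (routes, model_ids, context_window, tier, cost, custom_only) and the example values are assumptions made for illustration, so check the existing entries in config.yaml for the real spelling.

```python
# Hypothetical key names; the real config.yaml schema may differ.
REQUIRED_REVIEWER_KEYS = {"routes", "model_ids", "context_window", "tier", "cost"}


def missing_reviewer_keys(entry: dict) -> list[str]:
    """Return the required keys absent from a reviewer entry, sorted."""
    return sorted(REQUIRED_REVIEWER_KEYS - set(entry))


def has_inline_key(entry: dict) -> bool:
    """Flag entries that appear to embed a credential instead of using the environment."""
    return any("api_key" in str(name).lower() for name in entry)


# Placeholder entry: every value here is made up for the example.
example_entry = {
    "routes": ["aichat"],
    "model_ids": ["example-model-pinned-id"],
    "context_window": 128000,
    "tier": "standard",
    "cost": 0.0,
    "custom_only": False,  # set to True for a superseded model rather than deleting it
}

if __name__ == "__main__":
    print(missing_reviewer_keys(example_entry))
    print(has_inline_key(example_entry))
```

The second helper reflects the no-keys-in-the-repo rule: an entry should name a provider route, never carry the credential itself.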

Fixture contributions

The leaderboard is only as good as its fixtures, and the current four are too few. New fixtures are very welcome: auth bypass, XSS, null-deref, async races, TOCTOU, and more.

A fixture is a directory under fixtures/ containing:

  • diff.patch — the diff under review.
  • ground-truth.json — the labeled findings (file + line + severity) the panel should catch.

Add at least one clean variant where appropriate so false positives are penalized. Fixtures are diff-based — do not copy OMC’s code-file fixtures directly; their schema differs.
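
Before submitting a fixture, a quick local check along these lines can catch a missing file or an incomplete label. It is a minimal sketch, not part of Argus: check_fixture is a hypothetical helper, and the layout of ground-truth.json (a top-level list, or a list under a "findings" key) is an assumption. Only the file names and the file, line, and severity fields come from the description above.

```python
import json
from pathlib import Path

# Fields named in the fixture description above.
REQUIRED_FINDING_KEYS = {"file", "line", "severity"}


def check_fixture(fixture_dir: Path) -> list[str]:
    """Return a list of problems found in one fixture directory (empty if none)."""
    problems = []
    diff_path = fixture_dir / "diff.patch"
    truth_path = fixture_dir / "ground-truth.json"

    if not diff_path.is_file():
        problems.append("missing diff.patch")
    if not truth_path.is_file():
        problems.append("missing ground-truth.json")
        return problems

    try:
        data = json.loads(truth_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        problems.append(f"ground-truth.json is not valid JSON: {exc}")
        return problems

    # Assumption: findings are a top-level list, or sit under a "findings" key.
    findings = data.get("findings", []) if isinstance(data, dict) else data
    if not isinstance(findings, list):
        problems.append("findings should be a list")
        return problems

    for index, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding {index} is not an object")
            continue
        missing = REQUIRED_FINDING_KEYS - set(finding)
        if missing:
            problems.append(f"finding {index} lacks {sorted(missing)}")
    return problems
```

An empty findings list passes the check, which is presumably how a clean variant would be labeled.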

Tests

Run the suite before opening a PR:

python -m pytest

Add tests for new parsing paths (the JSON extractor is the riskiest surface) and for any merge / scoring change.
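
A parsing test might look roughly like the sketch below. The extract_json function here is a stand-in written for this example, not Argus's actual extractor; in a real test you would import the project's own function instead. The cases show the kind of input worth covering: JSON wrapped in prose, stray brackets before the payload, and replies with no JSON at all.

```python
import json


def extract_json(text: str):
    """Stand-in extractor: return the first JSON object or array in text, else None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, start)
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; keep scanning
            return value
    return None


def test_extracts_json_wrapped_in_prose():
    reply = 'Here are my findings:\n[{"file": "a.py", "line": 3}]\nHope that helps.'
    assert extract_json(reply) == [{"file": "a.py", "line": 3}]


def test_skips_stray_brackets_before_payload():
    reply = 'See [note] first. {"ok": true}'
    assert extract_json(reply) == {"ok": True}


def test_returns_none_when_no_json_present():
    assert extract_json("no structured output here") is None
```

Saved as a test_*.py file, these functions are collected by python -m pytest without extra configuration.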

Don’t-break rules

  • Don’t re-implement ±3 bucketing — anchor-based clustering already exists in merge.py. Extend it; don’t duplicate it.
  • Keep the finding schema strict: {file, line, severity, category, description, confidence}. Downstream merge and benchmark scoring depend on it.
  • Never write API keys to disk. Keys stay in the environment, forwarded at dispatch.
  • Preserve failure isolation — one broken reviewer must never kill the run.
  • CLI reviewers are never reordered by routing preference.
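
To make the strict-schema rule concrete, here is a small sketch of what a strictness check could look like. It assumes that "strict" means exactly the six listed keys, with no extras and none missing; schema_errors is a hypothetical helper, and value types are deliberately left unchecked because the source does not specify them.

```python
# The six field names come from the rule above; everything else is an assumption.
FINDING_KEYS = frozenset(
    {"file", "line", "severity", "category", "description", "confidence"}
)


def schema_errors(finding: dict) -> list[str]:
    """Return strict-schema violations: missing keys first, then unexpected extras."""
    keys = set(finding)
    errors = [f"missing: {name}" for name in sorted(FINDING_KEYS - keys)]
    errors += [f"unexpected: {name}" for name in sorted(keys - FINDING_KEYS)]
    return errors


if __name__ == "__main__":
    # Placeholder finding; the values are invented for the example.
    finding = {
        "file": "app/auth.py",
        "line": 42,
        "severity": "high",
        "category": "auth-bypass",
        "description": "Token check is skipped when the header is empty.",
        "confidence": 0.8,
    }
    print(schema_errors(finding))
```

Rejecting unexpected keys as well as missing ones keeps reviewer output from drifting in ways that merge and scoring would not notice.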

See CONTRIBUTING.md and DEVELOPMENT.md for the full developer guide.