Secret Scanning
Chub includes a built-in secret scanner — a drop-in replacement for gitleaks and betterleaks, with native awareness of AI agent transcripts.
Why another scanner?
Traditional secret scanners focus on source code and git history. But AI coding agents introduce new leak vectors:
- Developers paste API keys into agent prompts
- Agents include credentials in tool call outputs and chat responses
- Transcripts with secrets get committed or stored locally
- Agent-generated code hardcodes secrets copied from the environment
Chub's scanner catches all of these. The same 260-rule detection engine that automatically redacts secrets from stored session transcripts also powers the chub scan command — one engine, two applications.
Quick start
```bash
# Scan git history
chub scan secrets git

# Scan only staged files (pre-commit hook)
chub scan secrets git --staged

# Scan a directory
chub scan secrets dir ./src

# Scan from stdin (agent transcript, pipe, etc.)
cat transcript.log | chub scan secrets stdin --label "claude-session"
```
Detection rules
Chub ships with 260 built-in rules covering:
| Category | Examples |
|---|---|
| Cloud providers | AWS access key/secret, GCP API key, Azure AD client secret |
| AI/LLM services | OpenAI, Anthropic, DeepSeek, Mistral, xAI, Cerebras, TogetherAI, OpenRouter |
| Source control | GitHub PAT/fine-grained/OAuth, GitLab PAT/pipeline, Bitbucket |
| DevOps | Heroku, Netlify, Fly.io, Vercel, HashiCorp Terraform/Vault, Pulumi, Doppler |
| Payments | Stripe secret/publishable key |
| Observability | Grafana API/cloud/service, Sentry DSN, New Relic, Datadog, Databricks |
| SaaS | Slack (bot/app/webhook), Twilio, Shopify, Postman, Linear, Notion, Snyk |
| Package registries | npm, PyPI, RubyGems, Clojars |
| Auth/crypto | Private keys (PEM), JWT, age secret keys |
Each rule uses regex matching combined with:
- Keyword pre-filtering: the regex only runs on lines that contain one of the rule's keywords (fast path)
- Shannon entropy: measures the randomness of the captured secret to filter out low-entropy false positives
- Stopword filtering: ignores common placeholders like `your_api_key_here`, `${VARIABLE}`, and `changeme`
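The per-rule pipeline above can be sketched in Python as follows. This is a hypothetical illustration, not chub's Rust implementation. The `Rule` class mirrors the TOML keys documented on this page, and the two stopwords are examples taken from the list above. The page does not say whether stopwords are checked before or after entropy, so that order is an assumption.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical placeholder list; chub's real stopword list is not reproduced here.
STOPWORDS = ["your_api_key_here", "changeme"]


@dataclass
class Rule:
    # Field names mirror the TOML keys on this page; the class itself is hypothetical.
    id: str
    regex: str
    keywords: list[str] = field(default_factory=list)
    entropy: float = 0.0
    secret_group: int = 1


def shannon_entropy(s: str) -> float:
    """Bits per character, computed from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_line(rule: Rule, line: str) -> list[str]:
    """Return the secrets one rule finds on one line."""
    # 1. Keyword pre-filter (case-insensitive here): a cheap substring test first.
    lowered = line.lower()
    if rule.keywords and not any(k.lower() in lowered for k in rule.keywords):
        return []
    found = []
    # 2. Regex match, only reached when a keyword is present.
    for m in re.finditer(rule.regex, line):
        # Use the configured capture group if the regex has it, else the whole match.
        if m.re.groups >= rule.secret_group:
            secret = m.group(rule.secret_group)
        else:
            secret = m.group(0)
        # 3. Entropy filter: drop low-randomness matches.
        if shannon_entropy(secret) < rule.entropy:
            continue
        # 4. Stopword filter: drop obvious placeholders.
        if any(w in secret.lower() for w in STOPWORDS):
            continue
        found.append(secret)
    return found


rule = Rule(
    id="example-token",  # hypothetical rule for this demo
    regex=r'token\s*=\s*"([A-Za-z0-9]{16,})"',
    keywords=["token"],
    entropy=3.0,
)
```

With this demo rule, `token = "Xk93mPq27LzR8vTb"` is reported. A run of identical characters fails the entropy check, a `password = ...` line never reaches the regex, and a high-entropy value containing `changeme` is dropped by the stopword check.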
Output formats
JSON (default)
Gitleaks-compatible JSON array of findings:
```bash
chub scan secrets git -f json
chub scan secrets git -f json -r report.json  # write to file
```
SARIF
Static Analysis Results Interchange Format — for CI/CD integration (GitHub Code Scanning, Azure DevOps, etc.):
```bash
chub scan secrets git -f sarif -r report.sarif
```
CSV
Spreadsheet-friendly format:
```bash
chub scan secrets git -f csv -r report.csv
```
Redaction
Mask secrets in output:
```bash
chub scan secrets git --redact      # 100% redacted (default)
chub scan secrets git --redact 50   # 50% redacted
```
Pre-commit hook
Add to `.pre-commit-config.yaml`:
```yaml
repos:
  - repo: local
    hooks:
      - id: chub-scan
        name: chub secret scan
        entry: chub scan secrets git --staged --no-banner
        language: system
        pass_filenames: false
```
Or use a git hook directly in `.git/hooks/pre-commit`:
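```bash
#!/bin/sh
chub scan secrets git --staged --exit-code 1 --no-banner
```
Make the hook executable with `chmod +x .git/hooks/pre-commit`. Git ignores hooks that lack the execute bit.
Baseline management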
Filter out known, accepted findings:
```bash
# Generate a baseline from the current state
chub scan secrets git -r baseline.json

# Future scans ignore baseline findings (matched by fingerprint)
chub scan secrets git --baseline-path baseline.json
```
Configuration
Chub reads gitleaks-compatible TOML config files. It searches for config in this order:
- An explicit `--config` path
- Environment variables: `CHUB_SCAN_CONFIG`, `BETTERLEAKS_CONFIG`, `GITLEAKS_CONFIG`
- `.chub-scan.toml` in the target directory
- `.betterleaks.toml` in the target directory
- `.gitleaks.toml` in the target directory
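The lookup order above can be modelled as a short Python function. This is a hypothetical sketch, not chub's code. The page does not say what happens when no config is found, or whether a path taken from an environment variable is checked for existence. Here `resolve_config` returns `None` in the first case and trusts the variable in the second.

```python
import os
from pathlib import Path

# Precedence order as listed above.
ENV_VARS = ["CHUB_SCAN_CONFIG", "BETTERLEAKS_CONFIG", "GITLEAKS_CONFIG"]
CONFIG_FILES = [".chub-scan.toml", ".betterleaks.toml", ".gitleaks.toml"]


def resolve_config(target_dir, explicit=None, env=None):
    """Return the config path that would win, or None if nothing is found (assumed)."""
    env = os.environ if env is None else env
    # 1. An explicit --config path always wins.
    if explicit:
        return Path(explicit)
    # 2. Environment variables: the first one that is set wins.
    for var in ENV_VARS:
        value = env.get(var)
        if value:
            return Path(value)
    # 3. Well-known file names in the target directory, in order.
    for name in CONFIG_FILES:
        candidate = Path(target_dir) / name
        if candidate.is_file():
            return candidate
    return None
```

For example, a repository containing both `.chub-scan.toml` and `.gitleaks.toml` resolves to `.chub-scan.toml`, unless one of the environment variables or `--config` is set.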
Example config
```toml
title = "My project scan config"

[extend]
useDefault = true          # include built-in rules (default: true)
disabledRules = ["jwt"]    # disable specific built-in rules

[allowlist]
description = "Global allowlist"
paths = ["test/fixtures/.*", ".*_test\\.go"]
regexes = ["EXAMPLE", "PLACEHOLDER"]
stopwords = ["dummy", "fake"]

[[rules]]
id = "internal-service-token"
description = "Internal service API token"
regex = '''myco-svc-[a-zA-Z0-9]{40}'''
keywords = ["myco-svc-"]
tags = ["internal"]
entropy = 3.5

[[rules.allowlists]]
description = "Test fixtures"
condition = "OR"
paths = [".*test.*"]
regexes = ["test-token"]
```
Config fields
| Field | Description |
|---|---|
| `title` | Config title (informational) |
| `extend.useDefault` | Include built-in rules (default: `true`) |
| `extend.disabledRules` | Rule IDs to disable from the built-in set |
| `allowlist.paths` | Regex patterns for paths to skip |
| `allowlist.regexes` | Regex patterns to allowlist in content |
| `allowlist.stopwords` | Substrings that suppress a finding |
| `allowlist.commits` | Commit SHAs to skip |
| `rules[].id` | Rule identifier |
| `rules[].description` | Human-readable description |
| `rules[].regex` | Detection regex |
| `rules[].secretGroup` | Capture group index for the secret (default: 1; 0 = full match fallback) |
| `rules[].entropy` | Minimum Shannon entropy for the secret |
| `rules[].keywords` | Keywords for fast pre-filtering |
| `rules[].tags` | Optional tags for categorization |
| `rules[].path` | File path regex filter |
| `rules[].allowlists` | Per-rule allowlists; `condition = "OR"` suppresses a finding if any listed criterion matches |
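The sketch below shows how a per-rule allowlist with a `condition` might decide whether to suppress a finding. It is hypothetical. `OR` ("match any") follows this page. The `AND` behaviour (every configured criterion must match) and testing `regexes` against the secret rather than the whole line are assumptions modelled on gitleaks conventions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Finding:
    path: str
    secret: str
    commit: str = ""


@dataclass
class Allowlist:
    # Mirrors the [[rules.allowlists]] keys; AND semantics are an assumption.
    condition: str = "OR"
    paths: list[str] = field(default_factory=list)
    regexes: list[str] = field(default_factory=list)
    stopwords: list[str] = field(default_factory=list)
    commits: list[str] = field(default_factory=list)

    def allows(self, f: Finding) -> bool:
        """True if this allowlist suppresses the finding."""
        checks = []  # one boolean per criterion that is actually configured
        if self.commits:
            checks.append(f.commit in self.commits)
        if self.paths:
            checks.append(any(re.search(p, f.path) for p in self.paths))
        if self.regexes:
            checks.append(any(re.search(r, f.secret) for r in self.regexes))
        if self.stopwords:
            checks.append(any(w.lower() in f.secret.lower() for w in self.stopwords))
        if not checks:
            return False  # an empty allowlist suppresses nothing
        return all(checks) if self.condition.upper() == "AND" else any(checks)
```

With the "Test fixtures use fake tokens" allowlist used later on this page, `OR` suppresses every finding under a path containing `test`. Under the assumed `AND`, such a finding would also need to contain `fake-token` or `test-token`.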
CI/CD integration
GitHub Actions
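```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0            # full history; the default shallow clone has one commit
- name: Scan for secrets
  run: chub scan secrets git -f sarif -r results.sarif --no-banner
- name: Upload SARIF
  if: always()                # upload even when the scan step fails because it found secrets
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```
The upload step needs the `security-events: write` permission on the job.
Generic CI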
```bash
chub scan secrets git --exit-code 1 --no-banner
```
The scanner exits with code 1 (configurable via `--exit-code`) when secrets are found, and 0 when clean.
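Because the findings exit code is configurable, a CI wrapper can pass a distinctive value and treat any other non-zero code as a tool failure. The Python sketch below assumes chub reports its own errors with a code other than the one passed to `--exit-code`. That is not documented on this page, so verify it against your version. `FINDINGS_EXIT = 42` is an arbitrary, hypothetical choice.

```python
import subprocess

# Hypothetical value: any code your CI does not use for other failures.
FINDINGS_EXIT = 42


def classify(cmd: list[str]) -> str:
    """Run a scanner command and map its exit status to clean / secrets / error."""
    try:
        returncode = subprocess.run(cmd).returncode
    except FileNotFoundError:
        return "error"  # scanner binary not installed or not on PATH
    if returncode == 0:
        return "clean"
    if returncode == FINDINGS_EXIT:
        return "secrets"
    return "error"  # any other non-zero code is treated as a tool failure


CHUB_CMD = ["chub", "scan", "secrets", "git",
            "--exit-code", str(FINDINGS_EXIT), "--no-banner"]
```

In a pipeline you would call `classify(CHUB_CMD)`. You could then fail the build with a "secrets found" message only when the result is `"secrets"`.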
Writing custom rules
The 260 built-in rules cover common services, but you'll often need project-specific rules for internal tokens, service keys, or proprietary formats.
Rule anatomy
```toml
[[rules]]
id = "internal-service-token"               # Unique ID (used in allowlists and disabledRules)
description = "Internal service token"      # Human-readable description
regex = '''myco-svc-[a-zA-Z0-9]{40}'''      # Detection regex
keywords = ["myco-svc-"]                    # Fast pre-filter (must appear in line for regex to run)
tags = ["internal"]                         # Optional tags for categorization
entropy = 3.5                               # Optional: minimum Shannon entropy for the match
secretGroup = 1                             # Optional: regex capture group containing the secret (default: 1; betterleaks convention)
```
Step-by-step example: detecting an internal API key
Suppose your company issues API keys like `ACME-KEY-a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4`:
```toml
[[rules]]
id = "acme-api-key"
description = "ACME Corp internal API key"
regex = '''ACME-KEY-[a-f0-9]{32}'''
keywords = ["ACME-KEY-"]
```
How it works:
- Keyword pre-filter: only lines containing `ACME-KEY-` are tested against the regex (fast path; in a typical codebase almost no lines contain the prefix)
- Regex match: the full pattern validates the key format
- No entropy needed: the `ACME-KEY-` prefix is distinctive enough
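Before writing the rule into a config, you can check the pattern offline. The sketch below uses Python's `re` module. This pattern uses only literal text, a character class and a fixed repeat, so it should behave the same in Rust-style regex engines. Treating the keyword check as case-insensitive is an assumption.

```python
import re

# The acme-api-key rule from above, expressed with Python's re module.
ACME_KEY = re.compile(r"ACME-KEY-[a-f0-9]{32}")
KEYWORD = "ACME-KEY-"


def find_acme_keys(text: str) -> list[str]:
    """Apply the keyword pre-filter, then the regex, line by line."""
    hits = []
    for line in text.splitlines():
        # Keyword pre-filter (case-insensitive here): skip lines without the prefix.
        if KEYWORD.lower() not in line.lower():
            continue
        hits.extend(m.group(0) for m in ACME_KEY.finditer(line))
    return hits


SAMPLE = "ACME-KEY-a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
```

Two properties are worth knowing. Uppercase hex never matches, because `[a-f0-9]` is case-sensitive. The pattern also has no trailing boundary, so `SAMPLE + "ff"` still yields the first 32 hex characters as a finding.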
Using entropy filtering
Entropy measures randomness. A string like aaaaaa has low entropy; a8Kf2mQ9 has high entropy. Set an entropy threshold to filter out placeholder values that match the regex pattern but aren't real secrets:
```toml
[[rules]]
id = "generic-token-header"
description = "Bearer token in Authorization header"
regex = '''(?i)authorization:\s*bearer\s+([a-zA-Z0-9._\-]{20,})'''
keywords = ["authorization", "bearer"]
secretGroup = 1   # capture group 1 = the token value
entropy = 3.5     # reject low-entropy matches like "xxxxxxxxxxxxxxxxxxxx"
```
Entropy guidelines:
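- `3.0`: loose (catches most real secrets, some false positives)
- `3.5`: balanced (good default for token-like patterns)
- `4.0`: strict (may miss some real secrets with repetitive patterns)

Assuming the usual per-character definition (bits per character, as in gitleaks), the ceiling depends on the secret itself. A string of n characters can reach at most log2(n). A hex-only secret can never exceed 4.0 because it uses only 16 symbols. A `4.0` threshold therefore rejects almost every hex-encoded key, and tokens of 11 characters or fewer can never reach `3.5`.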
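To see where these numbers come from, the sketch below computes per-character Shannon entropy, −Σ p·log2(p) over character frequencies, for the example strings on this page. It then applies the bearer-token rule's `secretGroup = 1` and `entropy = 3.5`. This is an illustrative Python model, assuming the standard definition, not chub's actual code.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character: -sum(p * log2(p))."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


# The generic-token-header rule. With secretGroup = 1, entropy is measured on the
# captured token only, not on the "Authorization: Bearer" prefix.
BEARER = re.compile(r"(?i)authorization:\s*bearer\s+([a-zA-Z0-9._\-]{20,})")
THRESHOLD = 3.5


def bearer_secrets(line: str) -> list[str]:
    """Return tokens that match the regex and clear the entropy threshold."""
    out = []
    for m in BEARER.finditer(line):
        token = m.group(1)
        if shannon_entropy(token) >= THRESHOLD:
            out.append(token)
    return out
```

`aaaaaa` scores 0 and `a8Kf2mQ9` scores exactly 3.0, because eight distinct characters give log2(8). The sixteen hex digits `0123456789abcdef` score exactly 4.0. A 20-character token with no repeated characters scores about 4.32 and passes the 3.5 threshold.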
Per-rule allowlists
Suppress false positives for specific rules without affecting other rules:
```toml
[[rules]]
id = "internal-service-token"
description = "Internal service token"
regex = '''myco-svc-[a-zA-Z0-9]{40}'''
keywords = ["myco-svc-"]

[[rules.allowlists]]
description = "Test fixtures use fake tokens"
condition = "OR"                        # Match ANY of the following
paths = [".*test.*", ".*fixture.*"]
regexes = ["fake-token", "test-token"]
```
Testing custom rules
Validate your rules before committing by scanning a known file:
```bash
# Create a test file with a known secret
echo 'token = "ACME-KEY-a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"' > /tmp/test-secret.txt # gitleaks:allow

# Scan it with your config
chub scan secrets dir /tmp/test-secret.txt --config .chub-scan.toml

# Clean up
rm /tmp/test-secret.txt
```
If the rule works, the scan reports one `acme-api-key` finding and exits with code 1.
Disabling built-in rules
If a built-in rule produces false positives in your codebase, disable it rather than fighting it:
```toml
[extend]
useDefault = true
disabledRules = [
  "generic-api-key",  # too broad for our codebase
  "jwt",              # we store test JWTs in fixtures
]
```
Troubleshooting
Too many false positives
- Check which rule fires: the JSON output includes the `ruleID` field for each finding
- Add to the allowlist: add the pattern or path to `[allowlist]`
- Add stopwords: for placeholder patterns like `example_key_here`
- Raise entropy: increase the rule's `entropy` threshold
- Disable the rule: as a last resort, add it to `disabledRules`
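The last option interacts with `[extend]`. The sketch below is a hypothetical Python model of how `useDefault`, `disabledRules` and custom `[[rules]]` might combine into the active rule set. Per the config table, `disabledRules` applies only to built-in rules. The page does not say what happens when a custom rule reuses a built-in ID, so letting the custom rule win is an assumption. Only `jwt` and `generic-api-key` are rule IDs taken from this page; `example-builtin` is a placeholder.

```python
def active_rules(builtin, custom, use_default=True, disabled=()):
    """Return {rule_id: regex} for the rules a scan would run (hypothetical model)."""
    rules = {}
    if use_default:  # extend.useDefault
        skip = set(disabled)  # extend.disabledRules only affects built-in IDs
        rules.update({rid: rx for rid, rx in builtin.items() if rid not in skip})
    # Custom [[rules]] are always added; a same-id custom rule replacing a
    # built-in is an assumption of this sketch.
    rules.update(custom)
    return rules


BUILTIN = {  # stand-in patterns, not chub's real regexes
    "jwt": "<built-in jwt pattern>",
    "generic-api-key": "<built-in generic pattern>",
    "example-builtin": "<another built-in pattern>",
}
CUSTOM = {"acme-api-key": r"ACME-KEY-[a-f0-9]{32}"}
```

With `disabled=["jwt", "generic-api-key"]`, only `example-builtin` and `acme-api-key` remain. With `use_default=False`, only the custom rule runs.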
Scanning is slow
- Use `--staged` for pre-commit hooks (scans only the staged changes, not the full history)
- Add path allowlists: skip `vendor/`, `dist/`, `node_modules/`, and generated files
- Use keyword pre-filtering: rules with `keywords` skip lines that don't contain any of them
Scanner exits 0 but secrets exist
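- Check that the line contains one of the rule's `keywords`; otherwise the regex never runs
- Check that the secret actually matches the rule's regex
- Check whether entropy filtering is rejecting the match (try lowering or removing `entropy`)
- Check whether an allowlist or stopword is suppressing the finding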
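These checks can be run in order with a small debugging helper. The Python sketch below is hypothetical and not a chub feature. It reports the first stage that stops a line from producing a finding for a single rule. The order of the entropy and allowlist checks is an assumption.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def explain(line, regex, keywords=(), entropy=0.0, secret_group=1, allow_regexes=()):
    """Name the first check that stops `line` from producing a finding for one rule."""
    # Keywords gate the regex: if none appears, the regex is never evaluated.
    if keywords and not any(k.lower() in line.lower() for k in keywords):
        return "no keyword"
    m = re.search(regex, line)
    if m is None:
        return "regex did not match"
    # secretGroup handling: use the capture group if it exists, else the full match.
    secret = m.group(secret_group) if m.re.groups >= secret_group else m.group(0)
    score = shannon_entropy(secret)
    if score < entropy:
        return f"entropy {score:.2f} below threshold {entropy}"
    if any(re.search(a, secret) for a in allow_regexes):
        return "allowlisted"
    return "reported"
```

For example, `explain(key, r"ACME-KEY-[a-f0-9]{32}", keywords=["myco-svc-"])` returns `"no keyword"`. The ACME key never contains the keyword, so the regex is never tried.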
Relationship to transcript redaction
Chub's tracking system automatically redacts secrets from stored session transcripts using the same 260-rule engine. The scanner and the redactor share their rule definitions — when a new detection rule is added, both scanning and redaction benefit immediately.
| Feature | chub scan | Transcript redaction |
|---|---|---|
| Rule engine | Shared (260 rules) | Shared (260 rules) |
| Entropy filtering | Yes | Yes |
| Stopword filtering | Yes | Yes |
| Invocation | Explicit CLI command | Automatic during tracking |
| Output | JSON/SARIF/CSV findings | Redacted transcript text |
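The "one engine, two applications" design can be illustrated with one rule table that feeds both a scanner and a redactor. The Python sketch below is hypothetical. The two-rule table stands in for the shared rule set, the `[REDACTED]` mask is an assumed placeholder rather than chub's actual output, and entropy and stopword filtering are left out for brevity.

```python
import re

# Hypothetical two-rule table standing in for the shared rule set.
RULES = {
    "acme-api-key": re.compile(r"ACME-KEY-[a-f0-9]{32}"),
    "generic-token-header": re.compile(
        r"(?i)authorization:\s*bearer\s+([a-zA-Z0-9._\-]{20,})"
    ),
}


def secret_span(m: re.Match) -> tuple[int, int]:
    # Capture group 1 holds the secret when the rule has one; otherwise use the whole match.
    return m.span(1) if m.re.groups >= 1 else m.span(0)


def scan(text: str) -> list[tuple[str, str]]:
    """Scanner use of the rules: report (rule_id, secret) pairs."""
    findings = []
    for rule_id, rx in RULES.items():
        for m in rx.finditer(text):
            start, end = secret_span(m)
            findings.append((rule_id, text[start:end]))
    return findings


def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Redactor use of the same rules: replace each secret span with `mask`."""
    spans = sorted(secret_span(m) for rx in RULES.values() for m in rx.finditer(text))
    merged: list[list[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping secrets become one span
        else:
            merged.append([start, end])
    out, pos = [], 0
    for start, end in merged:
        out.append(text[pos:start])
        out.append(mask)
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

Both functions read the same `RULES` table, so adding an entry improves scanning and redaction at once. Only the secret span is masked, so the text `Authorization: Bearer` stays readable in the transcript.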
Performance
Benchmarked against gitleaks v8.30.1 and betterleaks on 10 public GitHub repositories, using the median of 3 runs for each. In the Speedup columns, values below 1.0 mean chub was slower.
Directory scan
| Repo | Files | Chub | Gitleaks | Betterleaks | Speedup vs Betterleaks |
|---|---|---|---|---|---|
| axios/axios | 361 | 124 ms | 410 ms | 468 ms | 3.8x |
| expressjs/express | 213 | 119 ms | 409 ms | 461 ms | 3.9x |
| tokio-rs/tokio | 843 | 132 ms | 414 ms | 466 ms | 3.5x |
| pallets/flask | 236 | 179 ms | 425 ms | 474 ms | 2.6x |
| openai/openai-python | 1,280 | 185 ms | 414 ms | 469 ms | 2.5x |
| tiangolo/fastapi | 2,981 | 263 ms | 421 ms | 486 ms | 1.8x |
| django/django | 7,027 | 445 ms | 435 ms | 488 ms | 1.1x |
| hashicorp/vault | 8,611 | 527 ms | 414 ms | 462 ms | 0.9x |
| denoland/deno | 11,618 | 711 ms | 414 ms | 462 ms | 0.6x |
| golang/go | 15,154 | 847 ms | 422 ms | 471 ms | 0.6x |
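Chub is roughly 2–4x faster than both tools on the five repositories with up to about 1,300 files. The gap narrows as repositories grow. Chub is still 1.6–1.8x faster on fastapi (2,981 files), roughly even on django (7,027 files), and slower than both tools from vault (8,611 files) upward. Chub's time grows steadily with file count, while gitleaks and betterleaks stay nearly flat at about 410–490 ms on every repository. This suggests chub has a lower fixed startup cost but a higher per-file cost than the other two tools.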
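The Speedup column equals betterleaks time divided by chub time. The snippet below recomputes a few directory-scan rows against both tools using the timings from the table above.

```python
# Directory-scan medians in ms, copied from the table above: (chub, gitleaks, betterleaks).
DIR_SCAN = {
    "axios/axios": (124, 410, 468),
    "tiangolo/fastapi": (263, 421, 486),
    "django/django": (445, 435, 488),
    "golang/go": (847, 422, 471),
}


def speedup(baseline_ms: float, chub_ms: float) -> float:
    """Values above 1.0 mean chub was faster than the baseline tool."""
    return round(baseline_ms / chub_ms, 1)


for repo, (chub, gitleaks, betterleaks) in DIR_SCAN.items():
    print(f"{repo:18} vs gitleaks {speedup(gitleaks, chub)}x  "
          f"vs betterleaks {speedup(betterleaks, chub)}x")
```

Against gitleaks the ratios come out slightly lower: 3.3x on axios, 1.6x on fastapi, 1.0x on django and 0.5x on golang/go.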
Git history scan
| Repo | Commits | Chub | Gitleaks | Betterleaks | Speedup vs Betterleaks |
|---|---|---|---|---|---|
| expressjs/express | 357 | 233 ms | 310 ms | 526 ms | 2.3x |
| pallets/flask | 688 | 272 ms | 320 ms | 490 ms | 1.8x |
| openai/openai-python | 200 | 290 ms | 306 ms | 480 ms | 1.7x |
| tokio-rs/tokio | 625 | 370 ms | 318 ms | 504 ms | 1.4x |
| axios/axios | 452 | 457 ms | 298 ms | 474 ms | 1.0x |
| tiangolo/fastapi | 200 | 478 ms | 322 ms | 529 ms | 1.1x |
| django/django | 200 | 980 ms | 315 ms | 489 ms | 0.5x |
| denoland/deno | 200 | 1,143 ms | 312 ms | 495 ms | 0.4x |
| hashicorp/vault | 477 | 1,946 ms | 318 ms | 493 ms | 0.3x |
| golang/go | 200 | 1,615 ms | 307 ms | 482 ms | 0.3x |
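Chub beats betterleaks on the six repositories where its own time stays under about 500 ms. It beats gitleaks only on express, flask, and openai-python. On larger or denser commit trees (django, deno, vault, golang/go), chub is roughly 3–6x slower than gitleaks. Gitleaks reads history by streaming `git log -p` output from the git CLI, which appears to outperform chub's `git2`-based walker on these repositories. Improvements to pack-file streaming are tracked in the roadmap.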
Run `bash scripts/benchmark-scan-repos.sh` to reproduce locally. The 10 repos must be cloned first; see the script header.