@addelong (Contributor) commented Nov 7, 2025

Summary

Adjusting risk score thresholds to match the thresholds defined by CVSS: https://nvd.nist.gov/vuln-metrics/cvss

@coderabbitai (bot, Contributor) commented Nov 7, 2025

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'ignore_draft_pr', 'focus_on_security_issues'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

This pull request refactors risk score thresholds to align with CVSS v3.x/v4.0 standards. The critical threshold is raised from 7.5 to 9.0, high threshold from 5.0 to 7.0, and medium threshold from 2.5 to 4.0. Updates are applied consistently across the codebase: the TypeScript scoring logic, user-facing documentation including SLA timelines and procedural guidance, and the changelog. No exported entity signatures were modified.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10-15 minutes

  • Verify that all threshold references throughout the codebase have been updated consistently (search for remaining instances of old thresholds: 7.5, 5.0, 2.5)
  • Cross-check that documentation score ranges and SLA mappings align with the implementation in scoreToLevel function
  • Confirm that any tests or test fixtures relying on the old thresholds have been updated accordingly
  • Review the new procedural steps (items 4-7) in the documentation for accuracy and completeness

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: updating risk scoring thresholds to align with CVSS v3.x/v4.0 standards, which is reflected across all modified files. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining that risk score thresholds are being adjusted to match CVSS standards with a relevant reference link. |

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
site/docs/red-team/risk-scoring.md (1)

109-120: Update Scenario 3 label to “Medium”

With the new thresholds, a score of 5.65 now falls into the medium band (4.0-6.9). Please adjust the bolded classification (and any surrounding text) so the example matches the updated severity ranges.
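
As a rough illustration of why the label changes, the sketch below classifies the Scenario 3 score under both sets of cutoffs quoted in the walkthrough (7.5/5.0/2.5 before, 9.0/7.0/4.0 after). The `Thresholds` type, the `classify` helper, and the constant names are hypothetical and are not taken from `src/redteam/riskScoring.ts`; treating every score below the medium cutoff as low is also an assumption.

```typescript
// Hypothetical sketch, not the project's actual implementation.
type Level = 'critical' | 'high' | 'medium' | 'low';

interface Thresholds {
  critical: number;
  high: number;
  medium: number;
}

// Cutoffs as described in the walkthrough for this PR.
const OLD_THRESHOLDS: Thresholds = { critical: 7.5, high: 5.0, medium: 2.5 };
const NEW_THRESHOLDS: Thresholds = { critical: 9.0, high: 7.0, medium: 4.0 };

// Map a 0-10 score to a severity level using inclusive lower bounds.
function classify(score: number, thresholds: Thresholds): Level {
  if (score >= thresholds.critical) {
    return 'critical';
  }
  if (score >= thresholds.high) {
    return 'high';
  }
  if (score >= thresholds.medium) {
    return 'medium';
  }
  // Assumption: anything below the medium cutoff is reported as low.
  return 'low';
}

console.log(classify(5.65, OLD_THRESHOLDS)); // high
console.log(classify(5.65, NEW_THRESHOLDS)); // medium
```

Under these assumptions, the same score moves from high to medium, which is consistent with the requested label change.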

🧹 Nitpick comments (2)
CHANGELOG.md (2)

7-12: Add PR number to comply with changelog rules.

The entry meets scope/prefix/category expectations, but it’s missing the PR reference required by our changelog rules. Append (#6132) at the end.

Apply this minimal diff:

- - refactor(redteam): update risk score thresholds to match CVSS v3.x/v4.0 standards (Critical: 9.0-10.0, High: 7.0-8.9, Medium: 4.0-6.9, Low: 0.1-3.9)
+ - refactor(redteam): update risk score thresholds to match CVSS v3.x/v4.0 standards (Critical: 9.0-10.0, High: 7.0-8.9, Medium: 4.0-6.9, Low: 0.1-3.9) (#6132)

As per coding guidelines.


7-12: Add a Documentation note if docs were updated.

If this PR also updated risk scoring docs/SLA mappings, add a single-line entry under “Documentation” with a conventional prefix and scope.

Example to add under Unreleased → Documentation:

+### Documentation
+
+- docs(redteam): align risk scoring docs and examples with CVSS v3.x/v4.0 severity ranges (#6132)

Based on learnings.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e485ca and 7d07c0c.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • site/docs/red-team/risk-scoring.md (2 hunks)
  • src/redteam/riskScoring.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
src/redteam/**/*.ts

📄 CodeRabbit inference engine (src/redteam/CLAUDE.md)

src/redteam/**/*.ts: Always sanitize when logging test prompts or model outputs by passing them via the structured metadata parameter (second argument) to the logger, not raw string interpolation
Use the standardized risk severity levels: critical, high, medium, low when reporting results

Files:

  • src/redteam/riskScoring.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; reuse existing interfaces where possible

**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
Follow consistent import order (Biome will sort imports)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object property shorthand when possible
Use async/await for asynchronous code instead of raw promises/callbacks
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts

Files:

  • src/redteam/riskScoring.ts
src/**

📄 CodeRabbit inference engine (AGENTS.md)

Place core application/library logic under src/

Files:

  • src/redteam/riskScoring.ts
site/docs/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/docusaurus.mdc)

site/docs/**/*.md: Prioritize minimal edits when updating existing documentation; avoid creating entirely new sections or rewriting substantial portions; focus edits on improving grammar, spelling, clarity, fixing typos, and structural improvements where needed; do not modify existing headings (h1, h2, h3, etc.) as they are often linked externally.
Structure content to reveal information progressively: begin with essential actions and information, then provide deeper context as necessary; organize information from most important to least important.
Use action-oriented language: clearly outline actionable steps users should take, use concise and direct language, prefer active voice over passive voice, and use imperative mood for instructions.
Use 'eval' instead of 'evaluation' in all documentation; when referring to command line usage, use 'npx promptfoo eval' rather than 'npx promptfoo evaluation'; maintain consistency with this terminology across all examples, code blocks, and explanations.
The project name can be written as either 'Promptfoo' (capitalized) or 'promptfoo' (lowercase) depending on context: use 'Promptfoo' at the beginning of sentences or in headings, and 'promptfoo' in code examples, terminal commands, or when referring to the package name; be consistent with the chosen capitalization within each document or section.
Each markdown documentation file must include required front matter fields: 'title' (the page title shown in search results and browser tabs) and 'description' (a concise summary of the page content, ideally 150-160 characters).
Only add a title attribute to code blocks that represent complete, runnable files; do not add titles to code fragments, partial examples, or snippets that aren't meant to be used as standalone files; this applies to all code blocks regardless of language.
Use special comment directives to highlight specific lines in code blocks: 'highlight-next-line' highlights the line immediately after the comment, 'highligh...

Files:

  • site/docs/red-team/risk-scoring.md
{site,examples}/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Documentation-only changes (touching only site/ or examples/) must use docs: prefix in PR title

Files:

  • site/docs/red-team/risk-scoring.md
site/**

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

For feature changes, update relevant documentation under site/

Files:

  • site/docs/red-team/risk-scoring.md
CHANGELOG.md

📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change

CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR
Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Each changelog entry must include PR number (#1234) or placeholder (#XXXX)
Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Changelog entries must include the PR number in format (#1234)
Use conventional commit prefixes in changelog entries: feat:,...

Files:

  • CHANGELOG.md
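
The entry-format rules listed above (Conventional Commit prefix, scope in parentheses, single line, trailing PR number or `(#XXXX)` placeholder) could be checked mechanically. The following is only a sketch under those stated rules: the `isValidChangelogEntry` helper and its regular expression are hypothetical, are not part of the repository, and assume entries are written as `- ` list items with a lowercase scope.

```typescript
// Hypothetical checker for a single CHANGELOG.md entry line.
// Assumes the form: "- prefix(scope): description (#1234)" or "(#XXXX)".
const CHANGELOG_ENTRY =
  /^- (feat|fix|chore|docs|test|refactor)\(([a-z]+)\): .+ \(#(\d+|XXXX)\)$/;

function isValidChangelogEntry(line: string): boolean {
  // "." does not match newlines, so multi-line entries are rejected.
  return CHANGELOG_ENTRY.test(line.trim());
}

const withoutPr =
  '- refactor(redteam): update risk score thresholds to match CVSS v3.x/v4.0 standards (Critical: 9.0-10.0, High: 7.0-8.9, Medium: 4.0-6.9, Low: 0.1-3.9)';
const withPr = `${withoutPr} (#6132)`;

console.log(isValidChangelogEntry(withoutPr)); // false
console.log(isValidChangelogEntry(withPr)); // true
```

The two sample strings are the before and after lines from the diff suggested in the nitpick comment, so the check flags the same omission the review did.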
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/**/*.ts : Use the standardized risk severity levels: critical, high, medium, low when reporting results
📚 Learning: 2025-10-05T16:59:20.507Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/**/*.ts : Use the standardized risk severity levels: critical, high, medium, low when reporting results

Applied to files:

  • src/redteam/riskScoring.ts
  • site/docs/red-team/risk-scoring.md
  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:44.088Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-10-24T22:41:44.088Z
Learning: Applies to CHANGELOG.md : Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Add new entries under the 'Unreleased' section

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to CHANGELOG.md : Use standardized scopes: providers, webui, cli, assertions, api, config, deps, docs, tests, examples, redteam, site

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Document all user-facing changes in CHANGELOG.md

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to CHANGELOG.md : Keep entries concise, one line, user-focused (what changed, not how) under the correct category

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features

Applied to files:

  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: webui tests
  • GitHub Check: Generate Assets
  • GitHub Check: Build Docs
  • GitHub Check: Share Test
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Style Check
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
src/redteam/riskScoring.ts (1)

181-188: Threshold update aligns with CVSS bands

The new cutoffs (≥9.0 critical, ≥7.0 high, ≥4.0 medium) match the qualitative ranges defined in CVSS v3.x/v4.0, so the mapping now reflects the standard terminology precisely. (first.org)

typpo previously approved these changes Nov 9, 2025
mldangelo previously approved these changes Nov 10, 2025
@mldangelo mldangelo dismissed stale reviews from typpo and themself via 1f57bf4 November 10, 2025 15:28
@use-tusk (bot, Contributor) commented Nov 10, 2025

⏩ No test execution environment matched (1f57bf4)

Check history

| Commit | Status | Created (UTC) |
|---|---|---|
| 1f57bf4 | ⏩ No test execution environment matched | Nov 10, 2025 3:28PM |

@addelong addelong enabled auto-merge (squash) November 10, 2025 16:50
@addelong addelong disabled auto-merge November 10, 2025 17:27
@addelong addelong merged commit a2b8fbf into main Nov 10, 2025
35 checks passed
@addelong addelong deleted the fix/risk-score-thresholds branch November 10, 2025 17:27

4 participants