@codeflash-ai codeflash-ai bot commented Nov 27, 2025

⚡️ This pull request contains optimizations for PR #945

If you approve this dependent PR, these changes will be merged into the original PR branch feat/feedback-loop-for-unmatched-test-results.

This PR will be automatically closed if the original PR is merged.


📄 16% (0.16x) speedup for parse_test_failures_from_stdout in codeflash/verification/parse_test_output.py

⏱️ Runtime: 2.76 milliseconds → 2.39 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a speedup of roughly 15 to 16% (2.76 ms down to 2.39 ms) through several targeted micro-optimizations that reduce computational overhead in the parsing loop:

Key Optimizations:

  1. Single-pass boundary search: Instead of checking both conditions (`start_line != -1 and end_line != -1`) on every iteration, the optimized version uses `None` values and breaks immediately when both markers are found, eliminating redundant condition checks.

  2. Fast-path string matching: Before calling the comparatively expensive `.startswith("_______")` method, it first checks whether `line[0] == "_"`, avoiding the method call for the many lines that do not start with an underscore.

  3. Method lookup optimization: Binds `current_failure_lines.append` to a local variable to avoid repeated attribute lookups in the hot loop where failure lines are processed.

  4. Memory-efficient list management: Uses `current_failure_lines.clear()` instead of creating new list objects (`current_failure_lines = []`), reducing object allocation pressure.
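The four techniques above can be sketched in one small parser. This is a minimal illustration, not the code merged in this PR: the function name `parse_failures_sketch`, its return type, and the exact marker handling are assumptions made for the example.

```python
from __future__ import annotations


def parse_failures_sketch(stdout: str) -> dict[str, str]:
    """Hypothetical stand-in for the real parser; shows the four techniques only."""
    lines = stdout.splitlines()

    # 1. Boundary search with None sentinels; stop once both markers are known.
    start = end = None
    for i, line in enumerate(lines):
        if start is None:
            if "FAILURES" in line:
                start = i
        elif "short test summary info" in line:
            end = i
            break
    if start is None or end is None:
        return {}

    failures: dict[str, str] = {}
    current_name = None
    current_lines: list[str] = []
    append = current_lines.append  # 3. bound method looked up once, outside the loop

    for line in lines[start + 1 : end]:
        # 2. Cheap first-character test before the startswith call.
        #    The `line and` guard keeps empty lines from raising IndexError.
        if line and line[0] == "_" and line.startswith("_______"):
            if current_name is not None:
                failures[current_name] = "\n".join(current_lines)
                # 4. Reuse the same list; join() above already copied the text.
                current_lines.clear()
            current_name = line.strip("_ ")
        elif current_name is not None:
            append(line)

    if current_name is not None:
        failures[current_name] = "\n".join(current_lines)
    return failures
```

Techniques 3 and 4 interact: the cached `append` stays valid only because `clear()` keeps the same list object. Rebinding `current_lines = []` would leave `append` pointing at the old list.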

Performance Impact:
The gains are largest in the large-scale test cases:

  • Large failure sets: 14.2% faster with 500 failures, 14.0% faster with 999 failures
  • Large output: 29.2% faster for a single failure with 1000 lines of output
  • Complex scenarios: 22.3% faster with 50 cases of 10 lines each

Small inputs change little, and a few of the generated tests run marginally slower (at most 4.44%).
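Figures like these can be sanity-checked locally with a micro-benchmark. The sketch below times only the first-character fast path against a plain `startswith` call on synthetic input; the helper names and the input data are made up for the example, and the timings it prints will vary by machine and Python version.

```python
import timeit

# Synthetic non-header lines, similar in shape to pytest failure output.
lines = [f"    assert {i} == -1" for i in range(1000)]


def always_startswith() -> int:
    return sum(1 for line in lines if line.startswith("_______"))


def first_char_then_startswith() -> int:
    return sum(
        1
        for line in lines
        if line and line[0] == "_" and line.startswith("_______")
    )


# Both variants must agree before their timings are worth comparing.
assert always_startswith() == first_char_then_startswith() == 0

for fn in (always_startswith, first_char_then_startswith):
    best = min(timeit.repeat(fn, number=100, repeat=3))
    print(f"{fn.__name__}: {best * 1e6 / 100:.1f} µs per pass")
```

Taking the minimum over several repeats mirrors the "best of N runs" convention used in the runtime figure above.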

Hot Path Context:
Based on the function reference, `parse_test_failures_from_stdout` is called from `parse_test_results`, which appears to be part of a test optimization pipeline. The function processes pytest stdout to extract failure information, making it performance-critical when dealing with large test suites or verbose test outputs. The 15% improvement becomes meaningful when processing hundreds of test failures in CI/CD environments or during iterative code optimization workflows.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 33 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from dataclasses import dataclass, field
from typing import Dict

# imports
from codeflash.verification.parse_test_output import parse_test_failures_from_stdout


# Minimal TestResults model for testing purposes
@dataclass
class TestResults:
    test_failures: Dict[str, str] = field(default_factory=dict)


# unit tests

# -------------------------
# Basic Test Cases
# -------------------------


def test_no_failures_section_returns_original_results():
    # No 'FAILURES' or 'short test summary info' in stdout
    stdout = "All tests passed!\n"
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 1.71μs -> 1.58μs (8.28% faster)


def test_empty_stdout_returns_original_results():
    # Completely empty stdout
    stdout = ""
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 1.13μs -> 1.06μs (6.59% faster)


def test_single_failure_parsed_correctly():
    # Typical pytest output with one failure
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_addition _______\n"
        "def test_addition():\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError: assert 2 == 3\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_addition - AssertionError: assert 2 == 3\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 6.27μs -> 6.21μs (0.950% faster)


def test_multiple_failures_parsed_correctly():
    # Output with two failures
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_addition _______\n"
        "def test_addition():\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError: assert 2 == 3\n"
        "_______ test_subtraction _______\n"
        "def test_subtraction():\n"
        "    assert 2 - 1 == 0\n"
        "AssertionError: assert 1 == 0\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_addition - AssertionError: assert 2 == 3\n"
        "FAILED test_math.py::test_subtraction - AssertionError: assert 1 == 0\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 8.04μs -> 7.80μs (2.96% faster)


def test_failure_with_multiline_traceback():
    # Output with a traceback spanning multiple lines
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_division _______\n"
        "def test_division():\n"
        "    assert 1 / 0 == 0\n"
        "ZeroDivisionError: division by zero\n"
        "Traceback (most recent call last):\n"
        '  File "test_math.py", line 10, in test_division\n'
        "    assert 1 / 0 == 0\n"
        "ZeroDivisionError: division by zero\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_division - ZeroDivisionError: division by zero\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 7.97μs -> 7.34μs (8.59% faster)


# -------------------------
# Edge Test Cases
# -------------------------


def test_failures_section_without_any_test_case():
    # 'FAILURES' section present but no test case starts with '_______'
    stdout = (
        "=========================== FAILURES ===========================\n"
        "No test cases failed!\n"
        "================= short test summary info ======================\n"
        "All tests passed\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 3.55μs -> 3.49μs (1.75% faster)


def test_missing_end_marker_returns_original_results():
    # 'FAILURES' present, but no 'short test summary info'
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_addition _______\n"
        "def test_addition():\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError: assert 2 == 3\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 2.60μs -> 2.56μs (1.96% faster)


def test_missing_start_marker_returns_original_results():
    # 'short test summary info' present, but no 'FAILURES'
    stdout = (
        "_______ test_addition _______\n"
        "def test_addition():\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError: assert 2 == 3\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_addition - AssertionError: assert 2 == 3\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 3.08μs -> 2.65μs (16.3% faster)


def test_case_names_with_spaces_and_special_chars():
    # Test case names with spaces and special characters
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_addition with space _______\n"
        "def test_addition():\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError: assert 2 == 3\n"
        "_______ test_subtraction!@# _______\n"
        "def test_subtraction():\n"
        "    assert 2 - 1 == 0\n"
        "AssertionError: assert 1 == 0\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_addition - AssertionError: assert 2 == 3\n"
        "FAILED test_math.py::test_subtraction - AssertionError: assert 1 == 0\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 8.11μs -> 7.78μs (4.12% faster)


def test_case_with_no_failure_lines():
    # Test case marker present, but no lines after marker
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_empty _______\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_empty - AssertionError: assert False\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 4.43μs -> 4.58μs (3.30% slower)


def test_existing_failures_are_overwritten():
    # Existing failures in TestResults are replaced
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_new_failure _______\n"
        "def test_new_failure():\n"
        "    assert False\n"
        "AssertionError: assert False\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_new_failure - AssertionError: assert False\n"
    )
    results = TestResults(test_failures={"old_test": "Old failure"})
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 6.07μs -> 5.93μs (2.36% faster)


def test_nonstandard_marker_lines():
    # Markers with extra underscores or spaces
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______  test_weird_marker  _______\n"
        "def test_weird_marker():\n"
        "    assert False\n"
        "AssertionError: assert False\n"
        "================= short test summary info ======================\n"
        "FAILED test_math.py::test_weird_marker - AssertionError: assert False\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 5.91μs -> 5.72μs (3.32% faster)


# -------------------------
# Large Scale Test Cases
# -------------------------


def test_many_failures_parsed_correctly():
    # Simulate 500 failures
    num_failures = 500
    stdout_lines = ["=========================== FAILURES ==========================="]
    for i in range(num_failures):
        stdout_lines.append(f"_______ test_case_{i} _______")
        stdout_lines.append(f"def test_case_{i}():")
        stdout_lines.append(f"    assert {i} == -1")
        stdout_lines.append(f"AssertionError: assert {i} == -1")
    stdout_lines.append("================= short test summary info ======================")
    for i in range(num_failures):
        stdout_lines.append(f"FAILED test_suite.py::test_case_{i} - AssertionError: assert {i} == -1")
    stdout = "\n".join(stdout_lines)
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 635μs -> 556μs (14.2% faster)
    for i in range(num_failures):
        key = f"test_case_{i}"


def test_large_failure_output_lines():
    # One test case with a very large failure output
    large_output = "\n".join([f"Line {i}" for i in range(1000)])
    stdout = (
        "=========================== FAILURES ===========================\n"
        "_______ test_large_output _______\n"
        f"{large_output}\n"
        "================= short test summary info ======================\n"
        "FAILED test_suite.py::test_large_output - AssertionError: large output\n"
    )
    results = TestResults()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 228μs -> 177μs (29.2% faster)
    # Check that all lines are present
    for i in range(1000):
        pass


def test_performance_with_many_failures():
    # This test is to ensure the function completes in reasonable time with 999 failures
    import time

    num_failures = 999
    stdout_lines = ["=========================== FAILURES ==========================="]
    for i in range(num_failures):
        stdout_lines.append(f"_______ test_case_{i} _______")
        stdout_lines.append(f"def test_case_{i}():")
        stdout_lines.append(f"    assert {i} == -1")
        stdout_lines.append(f"AssertionError: assert {i} == -1")
    stdout_lines.append("================= short test summary info ======================")
    for i in range(num_failures):
        stdout_lines.append(f"FAILED test_suite.py::test_case_{i} - AssertionError: assert {i} == -1")
    stdout = "\n".join(stdout_lines)
    results = TestResults()
    start = time.time()
    codeflash_output = parse_test_failures_from_stdout(results, stdout)
    returned = codeflash_output  # 1.32ms -> 1.16ms (14.0% faster)
    duration = time.time() - start


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from dataclasses import dataclass, field
from typing import Dict

# imports
from codeflash.verification.parse_test_output import parse_test_failures_from_stdout


# Minimal stub for TestResults to allow unit testing
@dataclass
class TestResults:
    test_failures: Dict[str, str] = field(default_factory=dict)


# unit tests

# 1. Basic Test Cases


def test_no_failures_section_returns_original():
    # stdout does not contain "FAILURES" or "short test summary info"
    tr = TestResults()
    stdout = "All tests passed!\n"
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 1.72μs -> 1.63μs (5.51% faster)


def test_failures_section_no_test_cases():
    # stdout contains "FAILURES" and "short test summary info" but no test cases
    tr = TestResults()
    stdout = "Some output\nFAILURES\nshort test summary info\n"
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 3.34μs -> 3.41μs (2.06% slower)


def test_single_failure_case():
    # stdout contains one failure case
    tr = TestResults()
    stdout = (
        "Random output\n"
        "FAILURES\n"
        "_______ test_addition _______\n"
        "Traceback (most recent call last):\n"
        "    assert 1 + 1 == 3\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 6.21μs -> 6.04μs (2.80% faster)


def test_multiple_failure_cases():
    # stdout contains two failure cases
    tr = TestResults()
    stdout = (
        "Intro\n"
        "FAILURES\n"
        "_______ test_subtraction _______\n"
        "Traceback (most recent call last):\n"
        "    assert 2 - 1 == 0\n"
        "AssertionError\n"
        "_______ test_multiplication _______\n"
        "Traceback (most recent call last):\n"
        "    assert 2 * 2 == 5\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 7.74μs -> 7.68μs (0.768% faster)


def test_failure_case_with_extra_underscores_and_spaces():
    # stdout contains a test case name with extra underscores and spaces
    tr = TestResults()
    stdout = (
        "FAILURES\n"
        "_______   test_weird_case   _______\n"
        "Traceback (most recent call last):\n"
        "    assert False\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 5.62μs -> 5.70μs (1.42% slower)


# 2. Edge Test Cases


def test_empty_stdout():
    # stdout is empty
    tr = TestResults()
    stdout = ""
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 1.13μs -> 1.06μs (6.59% faster)


def test_failures_section_at_start_and_end():
    # stdout contains multiple "FAILURES" and "short test summary info"
    tr = TestResults()
    stdout = (
        "FAILURES\n"
        "_______ test_first _______\n"
        "Traceback (most recent call last):\n"
        "    assert 1 == 0\n"
        "AssertionError\n"
        "short test summary info\n"
        "Some other text\n"
        "FAILURES\n"
        "_______ test_second _______\n"
        "Traceback (most recent call last):\n"
        "    assert 2 == 0\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 6.33μs -> 6.05μs (4.64% faster)


def test_failures_section_without_end_marker():
    # stdout contains "FAILURES" but not "short test summary info"
    tr = TestResults()
    stdout = (
        "Some output\n"
        "FAILURES\n"
        "_______ test_missing_end _______\n"
        "Traceback (most recent call last):\n"
        "    assert True == False\n"
        "AssertionError\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 2.96μs -> 2.67μs (10.9% faster)


def test_failures_section_without_start_marker():
    # stdout contains "short test summary info" but not "FAILURES"
    tr = TestResults()
    stdout = (
        "Intro\n"
        "_______ test_missing_start _______\n"
        "Traceback (most recent call last):\n"
        "    assert 1 == 2\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 2.85μs -> 2.41μs (17.9% faster)


def test_case_with_no_traceback_lines():
    # stdout contains a test case but no traceback lines
    tr = TestResults()
    stdout = "FAILURES\n_______ test_empty_failure _______\nshort test summary info\n"
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 4.33μs -> 4.53μs (4.44% slower)


def test_case_with_long_name_and_special_characters():
    # stdout contains a test case with a long name and special characters
    tr = TestResults()
    test_name = "test_special_123_!@#$_case"
    stdout = (
        "FAILURES\n"
        f"_______ {test_name} _______\n"
        "Traceback (most recent call last):\n"
        "    assert 1 == 2\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 5.70μs -> 5.74μs (0.697% slower)


def test_case_with_multiple_lines_and_indentation():
    # stdout contains a test case with multi-line and indented failure output
    tr = TestResults()
    stdout = (
        "FAILURES\n"
        "_______ test_multiline _______\n"
        "Traceback (most recent call last):\n"
        "    line 1\n"
        "        line 2\n"
        "            line 3\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 6.21μs -> 6.12μs (1.47% faster)


def test_case_with_similar_names():
    # stdout contains test cases with similar names
    tr = TestResults()
    stdout = (
        "FAILURES\n"
        "_______ test_func _______\n"
        "Traceback (most recent call last):\n"
        "    assert 1 == 2\n"
        "AssertionError\n"
        "_______ test_func_extra _______\n"
        "Traceback (most recent call last):\n"
        "    assert 3 == 4\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 7.28μs -> 7.30μs (0.260% slower)


def test_case_with_leading_and_trailing_whitespace():
    # stdout contains test case names with leading/trailing whitespace
    tr = TestResults()
    stdout = (
        "FAILURES\n"
        "_______   test_leading_trailing   _______\n"
        "Traceback (most recent call last):\n"
        "    assert False\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 5.70μs -> 5.64μs (1.06% faster)


# 3. Large Scale Test Cases


def test_large_number_of_failures():
    # stdout contains many failure cases (up to 100)
    tr = TestResults()
    num_cases = 100
    stdout = "FAILURES\n"
    for i in range(num_cases):
        stdout += f"_______ test_case_{i} _______\n"
        stdout += "Traceback (most recent call last):\n"
        stdout += f"    assert {i} == {i + 1}\n"
        stdout += "AssertionError\n"
    stdout += "short test summary info\n"
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 124μs -> 109μs (14.0% faster)
    for i in range(num_cases):
        key = f"test_case_{i}"


def test_large_failure_output_lines():
    # stdout contains a single failure case with many lines
    tr = TestResults()
    lines = [f"line {i}" for i in range(500)]
    stdout = (
        "FAILURES\n"
        "_______ test_big_output _______\n"
        "Traceback (most recent call last):\n" + "\n".join(lines) + "\n"
        "AssertionError\n"
        "short test summary info\n"
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 118μs -> 92.7μs (27.4% faster)
    for i in range(500):
        pass


def test_large_stdout_with_irrelevant_lines():
    # stdout contains a lot of irrelevant lines before and after the failures section
    tr = TestResults()
    irrelevant_before = "\n".join([f"pre_line_{i}" for i in range(200)])
    irrelevant_after = "\n".join([f"post_line_{i}" for i in range(200)])
    stdout = (
        irrelevant_before + "\n"
        "FAILURES\n"
        "_______ test_irrelevant _______\n"
        "Traceback (most recent call last):\n"
        "    assert True is False\n"
        "AssertionError\n"
        "short test summary info\n" + irrelevant_after
    )
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 29.9μs -> 28.2μs (6.19% faster)


def test_performance_large_number_of_cases_and_lines():
    # stdout contains 50 cases each with 10 lines of output
    tr = TestResults()
    num_cases = 50
    lines_per_case = 10
    stdout = "FAILURES\n"
    for i in range(num_cases):
        stdout += f"_______ test_perf_{i} _______\n"
        stdout += "Traceback (most recent call last):\n"
        for j in range(lines_per_case):
            stdout += f"    line {j} for case {i}\n"
        stdout += "AssertionError\n"
    stdout += "short test summary info\n"
    codeflash_output = parse_test_failures_from_stdout(tr, stdout)
    result = codeflash_output  # 175μs -> 143μs (22.3% faster)
    for i in range(num_cases):
        key = f"test_perf_{i}"
        for j in range(lines_per_case):
            pass


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
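
The comparison described in the comment above can be sketched as a small harness that runs two implementations on the same inputs and fails on the first mismatch. This is an assumption about the general idea, not Codeflash's actual verification code; `assert_same_output` and the two toy functions are hypothetical names introduced for the example.

```python
from typing import Callable, Iterable


def assert_same_output(
    original: Callable[[str], object],
    optimized: Callable[[str], object],
    inputs: Iterable[str],
) -> None:
    """Run both implementations on every input and fail on the first mismatch."""
    for text in inputs:
        expected = original(text)
        actual = optimized(text)
        assert actual == expected, f"outputs differ for input {text!r}"


# Two toy implementations that count failure-header lines (illustrative only).
def count_headers_original(stdout: str) -> int:
    return sum(1 for line in stdout.splitlines() if line.startswith("_______"))


def count_headers_optimized(stdout: str) -> int:
    return sum(
        1
        for line in stdout.splitlines()
        if line and line[0] == "_" and line.startswith("_______")
    )


samples = ["", "All tests passed!\n", "FAILURES\n_______ test_a _______\n\nAssertionError\n"]
assert_same_output(count_headers_original, count_headers_optimized, samples)
```

Including an input with an empty line matters here, since that is the case where an unguarded `line[0]` check would raise `IndexError`.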

To edit these changes, run `git checkout codeflash/optimize-pr945-2025-11-27T14.49.01` and push.

@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) on Nov 27, 2025
@mohammedahmed18 mohammedahmed18 merged commit 168118a into feat/feedback-loop-for-unmatched-test-results Nov 27, 2025
14 of 23 checks passed
@mohammedahmed18 mohammedahmed18 deleted the codeflash/optimize-pr945-2025-11-27T14.49.01 branch November 27, 2025 16:01