
Conversation

@aseembits93 (Contributor) commented Dec 5, 2025

PR Type

Enhancement, Tests


Description

  • Add AI-powered code repair workflow

  • Track candidate source across pipeline

  • Return detailed test comparison diffs

  • Parse pytest failures from stdout


Diagram Walkthrough

flowchart LR
  cand["OptimizedCandidate (source tagged)"]
  proc["CandidateProcessor"]
  run["Run candidate tests"]
  cmp["Compare results (match, diffs)"]
  repairReq["AIServiceCodeRepairRequest"]
  repairAPI["/code_repair (AiServiceClient)"]
  queue["Re-queue repaired candidate"]
  refine["/refine pipeline"]

  cand -- "enqueue" --> proc
  proc -- "dequeue" --> run
  run -- "compare" --> cmp
  cmp -- "mismatch <=50%" --> repairReq
  repairReq -- "submit" --> repairAPI
  repairAPI -- "candidate (REPAIR)" --> queue
  queue -- "enqueue" --> proc
  cmp -- "match" --> refine
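
In code, the loop in the diagram reduces to roughly the following shape. This is a minimal sketch: run_candidate_tests, send_to_refine_pipeline, submit_code_repair, and REPAIR_THRESHOLD are illustrative names, not the PR's actual identifiers; only compare_test_results and its (match, diffs) tuple return are taken from the diff itself.

from collections import deque

REPAIR_THRESHOLD = 0.5  # diagram edge: re-queue for repair only when mismatch <= 50%

def process_candidates(queue: deque, original_results) -> None:
    while queue:
        candidate = queue.popleft()
        candidate_results = run_candidate_tests(candidate)  # hypothetical helper
        match, diffs = compare_test_results(original_results, candidate_results)
        if match:
            send_to_refine_pipeline(candidate)  # "/refine" edge
        elif len(diffs) / max(len(original_results), 1) <= REPAIR_THRESHOLD:
            repaired = submit_code_repair(candidate, diffs)  # "/code_repair" edge; candidate comes back tagged REPAIR
            queue.append(repaired)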

File Walkthrough

Relevant files

Enhancement (5 files)

  • aiservice.py: Add code-repair API and candidate source tagging (+63/-15)

  • models.py: Add TestDiffs, repair request, and source enum (+76/-0)

  • function_optimizer.py: Integrate repair queue, dedupe helpers, and source-aware flow (+154/-39)

  • equivalence.py: Return (match, diffs) and capture pytest errors (+58/-29)

  • parse_test_output.py: Parse pytest failures per test from stdout (+60/-0)

Tests (5 files)

  • test_codeflash_capture.py: Adapt tests to new compare_test_results API (+305/-6)

  • test_comparator.py: Update comparator tests for tuple return (+14/-7)

  • test_instrument_all_and_run.py: Adjust assertions to match new API (+16/-8)

  • test_instrumentation_run_results_aiservice.py: Update assertions and add match unpacking (+6/-4)

  • test_pickle_patcher.py: Align pickle patcher tests with new comparison (+4/-4)

mohammedahmed18 and others added 30 commits November 27, 2025 16:18
The optimized code achieves a **15% speedup** through several targeted micro-optimizations that reduce computational overhead in the parsing loop:

**Key Optimizations:**

1. **Single-pass boundary search**: Instead of checking both conditions (`start_line != -1 and end_line != -1`) on every iteration, the optimized version uses `None` values and breaks immediately when both markers are found, eliminating redundant condition checks.

2. **Fast-path string matching**: Before calling the expensive `.startswith("_______")` method, it first checks if `line[0] == "_"`, avoiding the method call for most lines that don't start with underscores.

3. **Method lookup optimization**: Pulls `current_failure_lines.append` into a local variable to avoid repeated attribute lookups in the hot loop where failure lines are processed.

4. **Memory-efficient list management**: Uses `current_failure_lines.clear()` instead of creating new list objects (`current_failure_lines = []`), reducing object allocation pressure. (Points 2-4 are combined in the sketch below.)
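
Taken together, points 2-4 give the hot loop roughly the following shape (a simplified sketch; the real function also locates the FAILURES section boundaries described in point 1):

def collect_failure_lines(lines):
    failures = {}
    current_name = None
    current_failure_lines = []
    append_line = current_failure_lines.append  # local binding avoids repeated attribute lookups

    for line in lines:
        # Fast path: a cheap first-character check guards the startswith() call.
        if line and line[0] == "_" and line.startswith("_______"):
            if current_name is not None:
                failures[current_name] = "".join(current_failure_lines)
            current_name = line.strip("_ ")
            current_failure_lines.clear()  # reuse the same list instead of allocating a new one
        elif current_name is not None:
            append_line(line + "\n")

    if current_name is not None:
        failures[current_name] = "".join(current_failure_lines)
    return failures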

**Performance Impact:**
The optimizations show the most significant gains in large-scale scenarios:
- **Large failure sets**: 14.2% faster with 500 failures, 14.0% faster with 999 failures  
- **Large output**: 29.2% faster for single failures with 1000 lines of output
- **Complex scenarios**: 22.3% faster with 50 cases having 10 lines each

**Hot Path Context:**
Based on the function reference, `parse_test_failures_from_stdout` is called from `parse_test_results`, which appears to be part of a test optimization pipeline. The function processes pytest stdout to extract failure information, making it performance-critical when dealing with large test suites or verbose test outputs. The 15% improvement becomes meaningful when processing hundreds of test failures in CI/CD environments or during iterative code optimization workflows.
…25-11-27T14.49.01

⚡️ Speed up function `parse_test_failures_from_stdout` by 16% in PR #945 (`feat/feedback-loop-for-unmatched-test-results`)
…b.com:codeflash-ai/codeflash into feat/feedback-loop-for-unmatched-test-results
…b.com:codeflash-ai/codeflash into feat/feedback-loop-for-unmatched-test-results
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
…b.com:codeflash-ai/codeflash into feat/feedback-loop-for-unmatched-test-results
…b.com:codeflash-ai/codeflash into feat/feedback-loop-for-unmatched-test-results
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

github-actions bot commented Dec 5, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Robustness

The pytest failure parsing relies on regex and section heuristics; edge cases (different pytest verbosity/themes, locale, plugins altering output) may break parsing or misassociate failures. Consider guarding against None indices, normalizing headers, and adding unit tests for varied outputs.

def parse_test_failures_from_stdout(test_results: TestResults, stdout: str) -> TestResults:
    """Extract individual pytest test failures from stdout grouped by test case qualified name, and add them to the test results."""
    lines = stdout.splitlines()
    start = end = None

    for i, line in enumerate(lines):
        if FAILURES_HEADER_RE.search(line.strip()):
            start = i
            break

    if start is None:
        return test_results

    for j in range(start + 1, len(lines)):
        stripped = lines[j].strip()
        if "short test summary info" in stripped:
            end = j
            break
        # any new === section === block
        if stripped.startswith("=") and stripped.count("=") > 3:
            end = j
            break

    # If no clear "end", just grab the rest of the string
    if end is None:
        end = len(lines)

    failure_block = lines[start:end]

    failures: dict[str, str] = {}
    current_name = None
    current_lines: list[str] = []

    for line in failure_block:
        m = TEST_HEADER_RE.match(line.strip())
        if m:
            if current_name is not None:
                failures[current_name] = "".join(current_lines)

            current_name = m.group(1)
            current_lines = []
        elif current_name:
            current_lines.append(line + "\n")

    if current_name:
        failures[current_name] = "".join(current_lines)

    test_results.test_failures = failures
    return test_results
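
As an example of the unit tests suggested above, a hedged sketch (the bare TestResults() construction and the stdout fixture are assumptions, but the assertions follow from the parsing logic quoted here):

def test_parse_failures_without_summary_footer():
    stdout = (
        "=================================== FAILURES ===================================\n"
        "_______________________________ test_widget ________________________________\n"
        "    assert compute() == 2\n"
    )  # no "short test summary info" footer, so the end index must fall back to end-of-output
    results = parse_test_failures_from_stdout(TestResults(), stdout)
    assert "test_widget" in results.test_failures
    assert "assert compute() == 2" in results.test_failures["test_widget"]
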
None safety

In compare_test_results, accessing original_test_result fields before confirming both results are present can raise exceptions when either side is None. Validate both objects exist before reading attributes or add try/except around access.

    set(candidate_results.get_all_unique_invocation_loop_ids())
)
test_diffs: list[TestDiff] = []
did_all_timeout: bool = True
for test_id in test_ids_superset:
    original_test_result = original_results.get_by_unique_invocation_loop_id(test_id)
    cdd_test_result = candidate_results.get_by_unique_invocation_loop_id(test_id)
    candidate_test_failures = candidate_results.test_failures
    original_test_failures = original_results.test_failures
    cdd_pytest_error = (
        candidate_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
        if candidate_test_failures
        else ""
    )
    original_pytest_error = (
        original_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
        if original_test_failures
        else ""
    )

    if cdd_test_result is not None and original_test_result is None:
        continue
    # If helper function instance_state verification is not present, that's ok. continue
    if (
        original_test_result.verification_type
        and original_test_result.verification_type == VerificationType.INIT_STATE_HELPER
        and cdd_test_result is None
    ):
        continue
    if original_test_result is None or cdd_test_result is None:
        return False, []
    did_all_timeout = did_all_timeout and original_test_result.timed_out
Race/State

CandidateProcessor mutates shared lists of futures (refinement/repair) and internal state across threads. Ensure thread-safety when appending to self.future_all_code_repair from other threads and when clearing these lists after wait; consider locks or confining mutations to the processor's own thread.
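
One way to address this, sketched under the assumption that future_all_code_repair is a plain list (the lock and the swap-style drain below are illustrative, not the PR's code):

import threading
from concurrent.futures import Future

class CandidateProcessor:
    def __init__(self) -> None:
        self._futures_lock = threading.Lock()
        self.future_all_code_repair: list[Future] = []

    def add_repair_future(self, fut: Future) -> None:
        # Worker threads append under the lock so they cannot race with a drain.
        with self._futures_lock:
            self.future_all_code_repair.append(fut)

    def drain_repair_futures(self) -> list[Future]:
        # Atomically swap the list out rather than clearing it in place,
        # so futures appended mid-drain land in the fresh list.
        with self._futures_lock:
            drained, self.future_all_code_repair = self.future_all_code_repair, []
        return drained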

@github-actions

github-actions bot commented Dec 5, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Guard against None before access

Accessing original_test_result.id before ensuring original_test_result is not None
can raise an AttributeError. Reorder logic to guard against None before building
cdd_pytest_error/original_pytest_error. Also compute failure messages only after
both results are validated to exist.

codeflash/verification/equivalence.py [16-43]

 def compare_test_results(original_results: TestResults, candidate_results: TestResults) -> tuple[bool, list[TestDiff]]:
-    # This is meant to be only called with test results for the first loop index
     if len(original_results) == 0 or len(candidate_results) == 0:
-        return False, []  # empty test results are not equal
+        return False, []
     original_recursion_limit = sys.getrecursionlimit()
     if original_recursion_limit < INCREASED_RECURSION_LIMIT:
-        sys.setrecursionlimit(INCREASED_RECURSION_LIMIT)  # Increase recursion limit to avoid RecursionError
+        sys.setrecursionlimit(INCREASED_RECURSION_LIMIT)
     test_ids_superset = original_results.get_all_unique_invocation_loop_ids().union(
         set(candidate_results.get_all_unique_invocation_loop_ids())
     )
     test_diffs: list[TestDiff] = []
     did_all_timeout: bool = True
     for test_id in test_ids_superset:
         original_test_result = original_results.get_by_unique_invocation_loop_id(test_id)
         cdd_test_result = candidate_results.get_by_unique_invocation_loop_id(test_id)
-        candidate_test_failures = candidate_results.test_failures
-        original_test_failures = original_results.test_failures
-        cdd_pytest_error = (
-            candidate_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
-            if candidate_test_failures
-            else ""
-        )
-        original_pytest_error = (
-            original_test_failures.get(original_test_result.id.test_fn_qualified_name(), "")
-            if original_test_failures
-            else ""
-        )
 
         if cdd_test_result is not None and original_test_result is None:
             continue
-        # If helper function instance_state verification is not present, that's ok. continue
         if (
-            original_test_result.verification_type
+            original_test_result is not None
             and original_test_result.verification_type == VerificationType.INIT_STATE_HELPER
             and cdd_test_result is None
         ):
             continue
         if original_test_result is None or cdd_test_result is None:
             return False, []
+
+        candidate_test_failures = candidate_results.test_failures
+        original_test_failures = original_results.test_failures
+        test_fn_name = original_test_result.id.test_fn_qualified_name()
+        cdd_pytest_error = candidate_test_failures.get(test_fn_name, "") if candidate_test_failures else ""
+        original_pytest_error = original_test_failures.get(test_fn_name, "") if original_test_failures else ""
+
         did_all_timeout = did_all_timeout and original_test_result.timed_out
         if original_test_result.timed_out:
             continue
         superset_obj = False
Suggestion importance[1-10]: 8


Why: The code accesses original_test_result.id before verifying original_test_result is not None, which can raise an AttributeError; reordering the guards is accurate and prevents a real bug without changing behavior.

Impact: Medium
Fix Enum base class order

The Enum declaration order is invalid; in Python, StrEnum-like patterns must inherit
from str first, then Enum. Reverse the base class order to prevent runtime/type
errors when creating or comparing enum values.

codeflash/models/models.py [387-391]

-class OptimizedCandidateSource(enum.Enum, str):
+class OptimizedCandidateSource(str, enum.Enum):
     OPTIMIZE = "OPTIMIZE"
     OPTIMIZE_LP = "OPTIMIZE_LP"
     REFINE = "REFINE"
     REPAIR = "REPAIR"
Suggestion importance[1-10]: 7


Why: Inheriting from enum.Enum, str in that order is rejected by CPython's Enum machinery at class-creation time; switching to str, enum.Enum is a small but necessary fix and ensures members of OptimizedCandidateSource compare as expected against plain strings.

Impact: Medium
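
For reference, a quick self-contained demonstration of why the base-class order matters:

import enum

class OptimizedCandidateSource(str, enum.Enum):
    OPTIMIZE = "OPTIMIZE"
    REPAIR = "REPAIR"

# The str mixin makes members compare equal to plain strings:
assert OptimizedCandidateSource.REPAIR == "REPAIR"

# With the bases reversed (class Bad(enum.Enum, str)), CPython rejects the
# class definition itself with a TypeError: "new enumerations should be
# created as `EnumName([mixin_type, ...] [data_type,] enum_type)`".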
General
Harden pytest failure parsing

The parser assumes pytest "FAILURES" section and underscores headers always exist
and match once; this can misparse or raise on alternative outputs. Add basic guards
and normalize header matching to avoid incorrect slicing and ensure test_failures
remains a dict even on unexpected formats.

codeflash/verification/parse_test_output.py [519-567]

-FAILURES_HEADER_RE = re.compile(r"=+ FAILURES =+")
+FAILURES_HEADER_RE = re.compile(r"=+\s*FAILURES\s*=+")
 TEST_HEADER_RE = re.compile(r"_{3,}\s*(.*?)\s*_{3,}$")
 
 
 def parse_test_failures_from_stdout(test_results: TestResults, stdout: str) -> TestResults:
-    """Extract individual pytest test failures from stdout grouped by test case qualified name, and add them to the test results."""
+    failures: dict[str, str] = {}
+    if not stdout:
+        test_results.test_failures = failures
+        return test_results
+
     lines = stdout.splitlines()
-    start = end = None
-
+    start = None
     for i, line in enumerate(lines):
         if FAILURES_HEADER_RE.search(line.strip()):
             start = i
             break
-
     if start is None:
+        test_results.test_failures = failures
         return test_results
 
+    end = None
     for j in range(start + 1, len(lines)):
         stripped = lines[j].strip()
         if "short test summary info" in stripped:
             end = j
             break
-        # any new === section === block
         if stripped.startswith("=") and stripped.count("=") > 3:
             end = j
             break
-
-    # If no clear "end", just grap the rest of the string
     if end is None:
         end = len(lines)
 
     failure_block = lines[start:end]
-
-    failures: dict[str, str] = {}
     current_name = None
     current_lines: list[str] = []
-
     for line in failure_block:
         m = TEST_HEADER_RE.match(line.strip())
         if m:
-            if current_name is not None:
+            if current_name is not None and current_lines:
                 failures[current_name] = "".join(current_lines)
-
             current_name = m.group(1)
             current_lines = []
-        elif current_name:
+        elif current_name is not None:
             current_lines.append(line + "\n")
-
-    if current_name:
+    if current_name is not None and current_lines:
         failures[current_name] = "".join(current_lines)
 
     test_results.test_failures = failures
     return test_results
Suggestion importance[1-10]: 6


Why: The proposed normalization and guards make the failure parsing more robust to format variations; while not critical, it reduces parsing errors and keeps test_failures consistent.

Impact: Low
