⚡️ Speed up function `compare_test_results` by 655% in PR #945 (`feat/feedback-loop-for-unmatched-test-results`) #947

codeflash-ai · 2025-11-27T18:26:58Z

⚡️ This pull request contains optimizations for PR #945

If you approve this dependent PR, these changes will be merged into the original PR branch feat/feedback-loop-for-unmatched-test-results.

This PR will be automatically closed if the original PR is merged.

📄 655% (6.55x) speedup for `compare_test_results` in `codeflash/verification/equivalence.py`

⏱️ Runtime : 90.0 milliseconds → 11.9 milliseconds (best of 5 runs)
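As a sanity check on the headline number, the sketch below recomputes the speedup from the two reported runtimes. It assumes the percentage is defined as time saved relative to the optimized runtime; that definition is inferred from the figures in this report rather than stated in it, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Speedup as time saved relative to the optimized runtime (assumed convention)."""
    return (original - optimized) / optimized * 100.0


# Headline runtimes from this report, in milliseconds.
headline = speedup_percent(90.0, 11.9)

# One row of the per-test table, in microseconds (663μs -> 350μs).
table_row = speedup_percent(663.0, 350.0)

print(f"headline: {headline:.1f}%  table row: {table_row:.1f}%")
```

With the rounded inputs this gives roughly 656% and 89.4%, which agrees with the reported 655% and 89.1% up to rounding of the displayed runtimes. Under this convention the raw runtime ratio is about 7.56x (90.0 / 11.9), so "6.55x" counts the time saved, not the ratio.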

📝 Explanation and details

The optimizations deliver a 655% speedup by addressing several key bottlenecks identified in the line profiler results:

Key Optimizations:

1. **Caching parsed CST modules** - Added `@lru_cache` to `InvocationId._parse_module_by_path()` so that each test file is parsed once. The original code spent 68% of its time in `cst.parse_module()`; repeated calls for the same path now return the cached module.
2. **Single-pass AST traversal** - Combined the class search and the function search into one loop with early returns, eliminating redundant iterations through `module_node.body`.
3. **Optimized dictionary lookups** - In `TestResults.get_by_unique_invocation_loop_id()`, replaced the try/except pattern with direct `.get()` calls to avoid exception overhead on misses.
4. **Reordered type checks in comparator** - Moved cheap, common types (`str`, `int`, `bool`) to the front of the `isinstance` checks, allowing ~75% of comparisons to exit early without reaching the checks for expensive types such as numpy arrays.
5. **Eliminated generator allocation** - Replaced `all()` comprehensions with direct for-loops that return on the first mismatch. If the original passed a generator expression, `all()` already short-circuited, so the gain is mainly the avoided generator allocation; if it passed a list comprehension, the loop also avoids evaluating the remaining elements.
6. **Cached function references** - In the hot loop of `compare_test_results()`, bound methods such as `get_by_unique_invocation_loop_id` to local names once, before the loop, to avoid repeated attribute resolution.
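The first two optimizations can be illustrated with a minimal sketch. It is not the code from this PR: it swaps libcst for the standard-library `ast` module so that it runs without dependencies, and the names `parse_module_by_path` and `find_test_function` are hypothetical stand-ins.

```python
import ast
import tempfile
from functools import lru_cache
from pathlib import Path
from typing import Optional


@lru_cache(maxsize=128)
def parse_module_by_path(path: Path) -> ast.Module:
    """Parse a file once per distinct path; later calls return the cached tree."""
    return ast.parse(path.read_text(encoding="utf-8"))


def find_test_function(
    module: ast.Module, function_name: str, class_name: Optional[str] = None
) -> Optional[ast.FunctionDef]:
    """Walk module.body once, returning as soon as the target is found."""
    for node in module.body:
        if class_name is None:
            # Looking for a top-level function.
            if isinstance(node, ast.FunctionDef) and node.name == function_name:
                return node
        elif isinstance(node, ast.ClassDef) and node.name == class_name:
            # Looking for a method: search only the matching class, then stop.
            for child in node.body:
                if isinstance(child, ast.FunctionDef) and child.name == function_name:
                    return child
            return None
    return None


# Small demonstration on a throwaway file (contents are made up for the example).
with tempfile.TemporaryDirectory() as tmp:
    test_file = Path(tmp) / "test_sample.py"
    test_file.write_text(
        "class TestA:\n    def test_x(self):\n        pass\n\n\ndef test_y():\n    pass\n",
        encoding="utf-8",
    )
    first = parse_module_by_path(test_file)   # cache miss: parses the file
    second = parse_module_by_path(test_file)  # cache hit: same object returned
    method = find_test_function(first, "test_x", "TestA")
    function = find_test_function(first, "test_y")
    cache_stats = parse_module_by_path.cache_info()
```

One trade-off of caching by path is that an edited file keeps returning the stale tree until `parse_module_by_path.cache_clear()` is called, so this approach fits best when test files do not change during a run.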

Impact on Hot Paths:

Based on the function references, this code is called in the critical path of run_optimized_candidate(), which executes during performance testing of optimization candidates. The speedup means:

- Faster validation of test result equivalence between original and optimized code
- Reduced overhead when processing many test results with repeated file parsing
- More efficient comparison of complex data structures in test outputs

The optimizations are particularly effective for workloads with many test invocations on the same files and complex return value comparisons, which matches the typical usage pattern shown in the function references.
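For the return-value comparison side, the following is a rough sketch of the ideas described above: cheap type checks first, explicit loops that return on the first mismatch, and a `.get()` lookup bound to a local name before the hot loop. All names here (`compare`, `all_results_match`, `_MISSING`) are hypothetical, the strict same-type rule is an assumption, and the real comparator handles many more types.

```python
from typing import Any

_CHEAP_TYPES = (str, int, bool)  # common, inexpensive cases checked first
_MISSING = object()              # sentinel so that a stored None is not mistaken for "absent"


def compare(orig: Any, new: Any) -> bool:
    """Recursive equality check that exits early on cheap, common types."""
    if type(orig) is not type(new):
        return False  # assumption: values of different types never match
    if isinstance(orig, _CHEAP_TYPES):
        return orig == new  # fast path: most comparisons end here
    if isinstance(orig, (list, tuple)):
        if len(orig) != len(new):
            return False
        for a, b in zip(orig, new):
            if not compare(a, b):
                return False  # stop at the first mismatch
        return True
    if isinstance(orig, dict):
        if orig.keys() != new.keys():
            return False
        for key, value in orig.items():
            if not compare(value, new[key]):
                return False
        return True
    return orig == new  # fallback for everything else


def all_results_match(original: dict, candidate: dict) -> bool:
    """Compare results keyed by invocation id, using .get() instead of try/except."""
    lookup = candidate.get  # bind the method once, outside the loop
    for invocation_id, expected in original.items():
        actual = lookup(invocation_id, _MISSING)
        if actual is _MISSING or not compare(expected, actual):
            return False
    return True
```

Placing the cheap checks first only pays off when simple values dominate, which is what the ~75% early-exit figure above suggests for this workload.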

✅ Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | ✅ 26 Passed |
| 🌀 Generated Regression Tests | 🔘 None Found |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 79.1% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|--------------------------|-------------|--------------|---------|
| `test_codeflash_capture.py::test_codeflash_capture_basic` | 2.68ms | 284μs | 840%✅ |
| `test_codeflash_capture.py::test_codeflash_capture_multiple_helpers` | 1.77ms | 313μs | 464%✅ |
| `test_codeflash_capture.py::test_codeflash_capture_recursive` | 2.27ms | 217μs | 943%✅ |
| `test_codeflash_capture.py::test_codeflash_capture_super_init` | 2.69ms | 288μs | 833%✅ |
| `test_codeflash_capture.py::test_instrument_codeflash_capture_and_run_tests` | 663μs | 350μs | 89.1%✅ |
| `test_codeflash_capture.py::test_instrument_codeflash_capture_and_run_tests_2` | 3.89ms | 47.6μs | 8080%✅ |
| `test_comparator.py::test_compare_results_fn` | 120μs | 81.6μs | 47.5%✅ |
| `test_instrument_all_and_run.py::test_bubble_sort_behavior_results` | 11.3ms | 1.68ms | 569%✅ |
| `test_instrument_all_and_run.py::test_classmethod_full_instrumentation` | 11.6ms | 1.79ms | 546%✅ |
| `test_instrument_all_and_run.py::test_method_full_instrumentation` | 22.8ms | 3.48ms | 556%✅ |
| `test_instrument_all_and_run.py::test_staticmethod_full_instrumentation` | 11.4ms | 1.72ms | 564%✅ |
| `test_instrumentation_run_results_aiservice.py::test_class_method_full_instrumentation` | 6.22ms | 426μs | 1358%✅ |
| `test_instrumentation_run_results_aiservice.py::test_class_method_test_instrumentation_only` | 6.31ms | 411μs | 1436%✅ |
| `test_pickle_patcher.py::test_run_and_parse_picklepatch` | 6.24ms | 814μs | 667%✅ |

To edit these changes, run `git checkout codeflash/optimize-pr945-2025-11-27T18.26.50` and push.


codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels on Nov 27, 2025

codeflash-ai bot mentioned this pull request Nov 27, 2025

[FEAT] Feedback loop for unmatched test results #945

Draft
