@aseembits93
Contributor

@aseembits93 aseembits93 commented Dec 3, 2025

PR Type

Enhancement, Tests


Description

  • Expose behavioral tests as LLM tool

  • Add tool schema and execution registry

  • Provide lazy imports to avoid cycles

  • Add comprehensive tests for tool API


Diagram Walkthrough

flowchart LR
  LLM["LLM"] -- "calls tool" --> Exec["execute_tool()"]
  Exec -- "dispatch" --> Tool["run_behavioral_tests_tool()"]
  Tool -- "invoke" --> Runner["run_behavioral_tests()"]
  Runner -- "produce JUnit XML" --> Parser["parse_test_xml()"]
  Parser -- "results" --> Tool
  Tool -- "structured dict" --> LLM
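
The dispatch step in the flowchart can be sketched as a name-to-callable registry. This is a minimal sketch, not the PR's implementation: `TOOL_REGISTRY`, the stub body of `run_behavioral_tests_tool`, and the choice to raise `ValueError` for unknown tools are all assumptions made for illustration.

```python
from typing import Any, Callable


# Hypothetical stand-in for the real wrapper; it only echoes a result shape.
def run_behavioral_tests_tool(test_files: list[dict[str, Any]], project_root: str) -> dict[str, Any]:
    return {"success": True, "total_tests": len(test_files), "error": None}


# Assumed registry: maps the tool name the LLM sees to the callable behind it.
TOOL_REGISTRY: dict[str, Callable[..., dict[str, Any]]] = {
    "run_behavioral_tests": run_behavioral_tests_tool,
}


def execute_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Look up a tool by name and call it with the LLM-supplied keyword arguments."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        # Assumption: unknown names raise; the real code may return an error dict instead.
        raise ValueError(f"Unknown tool: {name!r}. Available: {sorted(TOOL_REGISTRY)}")
    return tool(**arguments)
```

Keeping the registry as plain data means a new tool only needs a new entry, and `execute_tool` itself stays unchanged.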

File Walkthrough

Relevant files
Enhancement
__init__.py
Lazy export of verification LLM tools                                       

codeflash/verification/__init__.py

  • Add lazy attribute loader for LLM tools
  • Re-export tool APIs via __all__
+31/-0   
llm_tools.py
LLM tool schema and behavior test wrapper                               

codeflash/verification/llm_tools.py

  • Define JSON schema for run_behavioral_tests
  • Implement run_behavioral_tests_tool wrapper
  • Add tool registry and execution helpers
  • Map string test types to enum
+321/-0 
Tests
test_llm_tools.py
Tests for verification LLM tools interface                             

tests/test_llm_tools.py

  • Add tests for tool schema and registry
  • Validate execute_tool dispatch and errors
  • Run real pytest samples through tool
  • Test handling of failing and invalid paths
+193/-0 
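
The "lazy attribute loader" in `__init__.py` is presumably a module-level `__getattr__` hook (PEP 562); that is an assumption, since the file itself is not shown here. The sketch below builds a throwaway module object so it runs on its own. In a real package, `__getattr__` and `__all__` would sit at the top level of `__init__.py`. The name `make_lazy_module` and the standard-library targets are illustrative only.

```python
import importlib
import types


def make_lazy_module(name: str, exports: dict[str, str]) -> types.ModuleType:
    """Build a module whose attributes are imported on first access.

    `exports` maps an attribute name to the module that actually defines it.
    """
    module = types.ModuleType(name)
    module.__all__ = sorted(exports)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr in exports:
            value = getattr(importlib.import_module(exports[attr]), attr)
            setattr(module, attr, value)  # cache so later lookups skip this hook
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    return module


# Stand-in targets from the standard library; the PR would point at llm_tools instead.
lazy = make_lazy_module("lazy_demo", {"sqrt": "math", "dumps": "json"})
```

Because nothing is imported until an attribute is first touched, importing the package does not pull in the tool module, which is how this pattern avoids import cycles.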

@aseembits93 aseembits93 marked this pull request as draft December 3, 2025 04:12
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Codeflash Bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

@github-actions

github-actions bot commented Dec 3, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The mapping in _test_type_from_string accepts concolic_test as an alias alongside concolic_coverage_test; both map to TestType.CONCOLIC_COVERAGE_TEST. If callers pass concolic_test per earlier conventions, that is fine, but if schemas or other parts of the code expect only concolic_coverage_test, the extra alias may mask typos. Conversely, if concolic_test was not intended as an alias, this could be a bug. Validate the intended string values and align them with TestType.

def _test_type_from_string(test_type_str: str) -> TestType:
    """Convert a string test type to TestType enum."""
    mapping = {
        "existing_unit_test": TestType.EXISTING_UNIT_TEST,
        "generated_regression": TestType.GENERATED_REGRESSION,
        "replay_test": TestType.REPLAY_TEST,
        "concolic_test": TestType.CONCOLIC_COVERAGE_TEST,
        "concolic_coverage_test": TestType.CONCOLIC_COVERAGE_TEST,
    }
    return mapping.get(test_type_str.lower(), TestType.EXISTING_UNIT_TEST)
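
Note that the snippet above falls back to `EXISTING_UNIT_TEST` for any unrecognized string, so a typo is silently accepted. One possible stricter variant is sketched below. `StubTestType` is a hypothetical stand-in for the real `TestType` enum (its values are invented), and `parse_test_type` is an illustrative name, not a function from the PR.

```python
from enum import Enum


class StubTestType(Enum):
    """Hypothetical stand-in for codeflash's TestType; values are made up."""

    EXISTING_UNIT_TEST = "existing_unit_test"
    GENERATED_REGRESSION = "generated_regression"
    REPLAY_TEST = "replay_test"
    CONCOLIC_COVERAGE_TEST = "concolic_coverage_test"


# Every accepted spelling is listed explicitly, including the legacy alias.
_ALIASES = {
    "existing_unit_test": StubTestType.EXISTING_UNIT_TEST,
    "generated_regression": StubTestType.GENERATED_REGRESSION,
    "replay_test": StubTestType.REPLAY_TEST,
    "concolic_test": StubTestType.CONCOLIC_COVERAGE_TEST,  # alias
    "concolic_coverage_test": StubTestType.CONCOLIC_COVERAGE_TEST,
}


def parse_test_type(value: str) -> StubTestType:
    """Convert a string to the enum, raising instead of defaulting on unknown input."""
    key = value.strip().lower()
    try:
        return _ALIASES[key]
    except KeyError:
        allowed = ", ".join(sorted(_ALIASES))
        raise ValueError(f"Unknown test type {value!r}; expected one of: {allowed}") from None
```

Whether raising is preferable to a default depends on how forgiving the tool should be toward LLM-generated arguments; this sketch only shows the stricter option.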
Robustness

When constructing PYTHONPATH, the code appends os.pathsep + project_root_path without normalizing the path or checking for duplicates. Also, if PYTHONPATH is set but empty, the result starts with a leading separator. Consider normalizing the path and avoiding duplicate entries.

# Ensure PYTHONPATH includes project root
if "PYTHONPATH" not in test_env:
    test_env["PYTHONPATH"] = str(project_root_path)
else:
    test_env["PYTHONPATH"] += os.pathsep + str(project_root_path)
Error Handling

The broad except Exception swallows all errors and returns success: False with minimal context. Consider logging or including more structured error info (e.g., traceback) to aid debugging, and ensure sensitive paths are handled appropriately.

except Exception as e:
    return {
        "success": False,
        "total_tests": 0,
        "passed_tests": 0,
        "failed_tests": 0,
        "results": [],
        "stdout": "",
        "stderr": "",
        "error": str(e),
    }
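
One way to carry more context in the failure result is sketched below, assuming the same result shape as the snippet above. The `error_type` and `traceback` keys, and the helper names `error_result` and `run_safely`, are hypothetical additions and not part of the PR. Tracebacks contain filesystem paths, so the concern about sensitive paths applies before sending them to an LLM.

```python
import traceback
from typing import Any, Callable


def error_result(exc: BaseException) -> dict[str, Any]:
    """Build the failure dict, adding the exception type and a formatted traceback."""
    return {
        "success": False,
        "total_tests": 0,
        "passed_tests": 0,
        "failed_tests": 0,
        "results": [],
        "stdout": "",
        "stderr": "",
        "error": str(exc),
        "error_type": type(exc).__name__,  # hypothetical extra field
        "traceback": "".join(  # hypothetical extra field
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


def run_safely(fn: Callable[..., dict[str, Any]], *args: Any, **kwargs: Any) -> dict[str, Any]:
    """Call fn and convert any exception into a structured failure result."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        return error_result(exc)
```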

@github-actions

github-actions bot commented Dec 3, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category    Suggestion    Impact
Possible issue
Align schema enum with mapper

The accepted enum in the tool schema does not include 'concolic_test', yet the
mapping supports it. This mismatch can cause user inputs to be rejected before
reaching your mapper. Align the schema enum with supported inputs to prevent
validation errors.

codeflash/verification/llm_tools.py [123-131]

 mapping = {
     "existing_unit_test": TestType.EXISTING_UNIT_TEST,
     "generated_regression": TestType.GENERATED_REGRESSION,
     "replay_test": TestType.REPLAY_TEST,
     "concolic_test": TestType.CONCOLIC_COVERAGE_TEST,
     "concolic_coverage_test": TestType.CONCOLIC_COVERAGE_TEST,
 }
+...
+RUN_BEHAVIORAL_TESTS_TOOL_SCHEMA = {
+    "type": "function",
+    "function": {
+        "name": "run_behavioral_tests",
+        "description": (
+            "Run behavioral tests to verify code correctness. "
+            "This executes test files using pytest or unittest and returns detailed results "
+            "including pass/fail status, runtime information, and any errors encountered."
+        ),
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "test_files": {
+                    "type": "array",
+                    "description": "List of test files to run",
+                    "items": {
+                        "type": "object",
+                        "properties": {
+                            "test_file_path": {
+                                "type": "string",
+                                "description": "Absolute path to the test file to run",
+                            },
+                            "test_type": {
+                                "type": "string",
+                                "enum": [
+                                    "existing_unit_test",
+                                    "generated_regression",
+                                    "replay_test",
+                                    "concolic_test",
+                                    "concolic_coverage_test",
+                                ],
+                                "default": "existing_unit_test",
+                                "description": "Type of test being run",
+                            },
+                        },
+                        "required": ["test_file_path"],
+                    },
+                },
+                ...
+            },
+            "required": ["test_files", "project_root"],
+        },
+    },
+}
Suggestion importance[1-10]: 7

Why: Correctly identifies a mismatch: _test_type_from_string accepts concolic_test but the tool schema enum omits it, causing premature validation failures. The proposed change is accurate and improves usability, though not critical to core functionality.

Medium
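
An alternative to hand-editing both lists is to derive the schema enum from the mapping keys so the two cannot drift apart. This is a sketch under assumptions: `build_test_type_property` is a hypothetical helper, and plain strings stand in for the `TestType` members so the block is self-contained.

```python
from typing import Any

# Strings stand in for TestType members; only the keys matter for the schema.
TEST_TYPE_MAPPING = {
    "existing_unit_test": "EXISTING_UNIT_TEST",
    "generated_regression": "GENERATED_REGRESSION",
    "replay_test": "REPLAY_TEST",
    "concolic_test": "CONCOLIC_COVERAGE_TEST",
    "concolic_coverage_test": "CONCOLIC_COVERAGE_TEST",
}


def build_test_type_property(mapping: dict[str, Any], default: str) -> dict[str, Any]:
    """Build the JSON-schema fragment for test_type from the mapper's own keys."""
    if default not in mapping:
        raise ValueError(f"default {default!r} is not an accepted test type")
    return {
        "type": "string",
        "enum": list(mapping),  # single source of truth: the mapping keys
        "default": default,
        "description": "Type of test being run",
    }


TEST_TYPE_PROPERTY = build_test_type_property(TEST_TYPE_MAPPING, "existing_unit_test")
```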
General
Prevent PYTHONPATH duplication, ensure precedence

Appending to an existing PYTHONPATH without checking for duplicates can grow
unbounded across calls and may break module resolution order. Prepend the project
root if not already present to ensure it takes precedence and avoid duplication.

codeflash/verification/llm_tools.py [191-196]

-# Ensure PYTHONPATH includes project root
-if "PYTHONPATH" not in test_env:
-    test_env["PYTHONPATH"] = str(project_root_path)
-else:
-    test_env["PYTHONPATH"] += os.pathsep + str(project_root_path)
+# Ensure PYTHONPATH includes project root once, and with precedence
+current_pp = test_env.get("PYTHONPATH", "")
+pp_parts = current_pp.split(os.pathsep) if current_pp else []
+project_root_str = str(project_root_path)
+if project_root_str not in pp_parts:
+    test_env["PYTHONPATH"] = os.pathsep.join([project_root_str] + pp_parts) if pp_parts else project_root_str
Suggestion importance[1-10]: 6

Why: Sensible enhancement to avoid unbounded PYTHONPATH growth and ensure project root precedence. It’s a maintainability/robustness improvement; impact is moderate and the code change aligns with the existing snippet.

Low
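
The same idea can be packaged as a small pure function that is easy to unit test. This is a sketch, not the PR's code: `prepend_to_pythonpath` is a hypothetical helper. Unlike the diff above, it also drops empty entries and moves an existing project-root entry to the front instead of leaving it where it was.

```python
import os


def prepend_to_pythonpath(env: dict[str, str], project_root: str) -> dict[str, str]:
    """Return a copy of env whose PYTHONPATH starts with project_root exactly once."""
    root = os.path.normpath(project_root)
    # Split the current value and drop empty entries (e.g. from PYTHONPATH="").
    existing = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    # Remove earlier occurrences of the root so it is not listed twice.
    remaining = [p for p in existing if os.path.normpath(p) != root]
    new_env = dict(env)
    new_env["PYTHONPATH"] = os.pathsep.join([root, *remaining])
    return new_env
```

Returning a copy keeps the caller's environment untouched, and calling the function repeatedly yields the same result, so the value cannot grow across calls.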
Normalize subprocess output to text

Accessing 'process.stdout' and 'process.stderr' assumes they are strings; they may
be bytes or None depending on how the subprocess was run. Normalize to string to
avoid type issues for JSON serialization and downstream consumers.

codeflash/verification/llm_tools.py [248-258]

+def _to_text(s: Any) -> str:
+    if s is None:
+        return ""
+    return s.decode("utf-8", errors="replace") if isinstance(s, (bytes, bytearray)) else str(s)
+
 return {
     "success": True,
     "total_tests": len(test_results),
     "passed_tests": passed_count,
     "failed_tests": failed_count,
     "results": results_list,
-    "stdout": process.stdout if process.stdout else "",
-    "stderr": process.stderr if process.stderr else "",
+    "stdout": _to_text(process.stdout),
+    "stderr": _to_text(process.stderr),
     "error": None,
 }
Suggestion importance[1-10]: 6

Why: Normalizing process.stdout/stderr to strings improves robustness for serialization and consumer consistency. While likely already strings, handling bytes/None is a reasonable defensive improvement without altering behavior.

Low
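
A different technique from the `_to_text` helper above is to ask `subprocess.run` for decoded text at the source, so `stdout` and `stderr` are already `str`. Whether `run_behavioral_tests` launches its subprocess this way is not shown here, so treat this as an illustrative sketch only; the child command is a placeholder.

```python
import subprocess
import sys

# Placeholder child process: run the current interpreter and print one line.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,  # collect stdout and stderr instead of inheriting them
    text=True,            # decode output to str rather than returning bytes
    errors="replace",     # undecodable bytes become U+FFFD instead of raising
    check=False,          # do not raise on a non-zero exit code
)

stdout_text = completed.stdout
stderr_text = completed.stderr
```

With `capture_output=True` and `text=True`, both attributes are strings even when the child writes nothing, so no `None` or `bytes` handling is needed downstream.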

@codeflash-ai
Contributor

codeflash-ai bot commented Dec 3, 2025

⚡️ Codeflash found optimizations for this PR

📄 128% (1.28x) speedup for _test_type_from_string in codeflash/verification/llm_tools.py

⏱️ Runtime: 3.07 milliseconds → 1.34 milliseconds (best of 128 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch feat/behavior-test-as-tool).
