⚡️ Speed up function extract_packages_special_cases by 30%
#593
📄 30% (0.30x) speedup for extract_packages_special_cases in marimo/_runtime/packages/import_error_extractors.py
⏱️ Runtime: 138 microseconds → 106 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a roughly 30% speedup (138 → 106 microseconds) through two changes, one to the lookup data structure and one to the control flow:
1. Dictionary → Tuple Conversion:
The original code creates a dictionary and iterates over it with .items(), which has overhead from hash table operations and dynamic key-value pair creation. The optimized version uses a tuple of tuples, eliminating dictionary overhead and enabling direct iteration over a constant, immutable structure.
2. Early Return Pattern:
Instead of collecting matches in a list and evaluating packages if packages else None at the end, the optimized code returns immediately upon finding the first match. This eliminates:
- the list initialization (packages = [])
- the list extension (packages.extend(package_names))
Performance Impact by Test Case:
The per-test timing annotations in the generated regression tests below range from about 11% faster (large non-matching messages and invalid-type inputs) to about 52% faster (short messages).
Why This Works:
Python tuples have lower memory overhead and faster iteration than dictionaries. For this single-item lookup table, the tuple structure is more cache-friendly and eliminates hash computation. The early return pattern is particularly effective since this function typically finds at most one match, making list accumulation wasteful.
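The two changes described above can be sketched as a before/after pair. This is a minimal illustration under stated assumptions, not the code from the PR: the names extract_original and extract_optimized are hypothetical, the substring is copied from the test messages in this PR, and the package list ["pyarrow"] is an assumed placeholder.

```python
from __future__ import annotations

# Substring copied from the test messages in this PR.
SPECIAL_SUBSTR = (
    "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
)


def extract_original(message: str) -> list[str] | None:
    """Dict-based sketch: accumulate every match, then return list or None."""
    # Assumed table contents: the ["pyarrow"] value is a placeholder.
    special_cases = {SPECIAL_SUBSTR: ["pyarrow"]}
    packages: list[str] = []
    for substr, package_names in special_cases.items():
        if substr in message:
            packages.extend(package_names)
    return packages if packages else None


def extract_optimized(message: str) -> list[str] | None:
    """Tuple-based sketch: return as soon as the first entry matches."""
    # Same assumed table, expressed as a tuple of (substring, packages) pairs.
    special_cases = ((SPECIAL_SUBSTR, ["pyarrow"]),)
    for substr, package_names in special_cases:
        if substr in message:
            return package_names
    return None
```

One caveat of the early return: it agrees with the accumulate-then-return variant only while at most one table entry can match a given message, which holds for a single-entry table. If more entries were added and two of them matched the same message, the dict-based variant would return the packages of both, while the early-return variant would return only the first.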
The optimization maintains identical behavior - returning the same package list for matches and None for non-matches - while being more efficient for the common single-match use case.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
_runtime/packages/test_import_error_extractors.py::test_extract_packages_special_cases_pandas_parquet
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from marimo._runtime.packages.import_error_extractors import extract_packages_special_cases

# unit tests

# --------------------
# Basic Test Cases
# --------------------

def test_exact_special_case_match():
    # Test exact match of the special case substring
    msg = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    codeflash_output = extract_packages_special_cases(msg)  # 1.25μs -> 826ns (51.1% faster)

def test_special_case_with_prefix_suffix():
    # Test message containing the special case substring with extra text before and after
    msg = "Error: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. Please install one."
    codeflash_output = extract_packages_special_cases(msg)  # 1.21μs -> 855ns (40.9% faster)

def test_no_special_case_match():
    # Test a message with no special case substring
    msg = "ModuleNotFoundError: No module named 'numpy'"
    codeflash_output = extract_packages_special_cases(msg)  # 984ns -> 696ns (41.4% faster)

def test_empty_message():
    # Test an empty string as message
    msg = ""
    codeflash_output = extract_packages_special_cases(msg)  # 996ns -> 653ns (52.5% faster)

# --------------------
# Edge Test Cases
# --------------------

def test_partial_special_case_match():
    # Test message that contains only part of the special case substring
    msg = "Unable to find a usable engine; tried using: 'pyarrow'"
    codeflash_output = extract_packages_special_cases(msg)  # 994ns -> 714ns (39.2% faster)

def test_case_sensitivity():
    # Test that the function is case-sensitive and does not match with different case
    msg = "unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    codeflash_output = extract_packages_special_cases(msg)  # 1.14μs -> 832ns (36.8% faster)

def test_multiple_special_case_substrings():
    # Test message containing the special case substring more than once
    msg = (
        "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. "
        "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    )
    # Should only return one instance per the function's logic (no deduplication)
    codeflash_output = extract_packages_special_cases(msg)  # 1.25μs -> 900ns (39.3% faster)

def test_message_is_none():
    # Test passing None as message (should raise TypeError)
    with pytest.raises(TypeError):
        extract_packages_special_cases(None)  # 2.03μs -> 1.71μs (18.6% faster)

def test_message_is_not_string():
    # Test passing a non-string type as message (should raise TypeError)
    with pytest.raises(TypeError):
        extract_packages_special_cases(12345)  # 1.93μs -> 1.71μs (12.8% faster)

def test_message_with_similar_text():
    # Test message that is similar but not exactly matching the special case substring
    msg = "Unable to find a usable engine; tried using: 'pyarrow'."
    codeflash_output = extract_packages_special_cases(msg)  # 1.04μs -> 747ns (39.9% faster)

# --------------------
# Large Scale Test Cases
# --------------------

def test_large_message_with_no_special_case():
    # Test a very large message that does not contain the special case substring
    msg = "Error: " + ("no engine found. " * 500)
    codeflash_output = extract_packages_special_cases(msg)  # 2.36μs -> 2.01μs (17.0% faster)

def test_large_message_with_special_case_embedded():
    # Test a very large message with the special case substring embedded somewhere
    msg = (
        "Error: " * 250
        + "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
        + " End of error." * 250
    )
    codeflash_output = extract_packages_special_cases(msg)  # 1.58μs -> 1.22μs (29.5% faster)

def test_large_message_with_multiple_special_case_occurrences():
    # Test a large message with multiple occurrences of the special case substring
    repeated = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. "
    msg = repeated * 20  # 20 occurrences
    codeflash_output = extract_packages_special_cases(msg)  # 1.29μs -> 904ns (42.6% faster)

def test_many_non_matching_messages():
    # Test many different messages that do not match the special case substring
    for i in range(100):
        msg = f"Random error message {i}: nothing to see here."
        codeflash_output = extract_packages_special_cases(msg)  # 28.4μs -> 21.2μs (34.4% faster)

# --------------------
# Additional Robustness Tests
# --------------------

def test_special_case_substring_at_start():
    # Test with the special case substring at the very start of the message
    msg = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. Something else went wrong."
    codeflash_output = extract_packages_special_cases(msg)  # 1.30μs -> 893ns (45.9% faster)

def test_special_case_substring_at_end():
    # Test with the special case substring at the very end of the message
    msg = "Some error occurred. Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    codeflash_output = extract_packages_special_cases(msg)  # 1.27μs -> 928ns (36.5% faster)

def test_special_case_substring_with_newlines():
    # Test with newlines around the special case substring
    msg = "\nUnable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.\n"
    codeflash_output = extract_packages_special_cases(msg)  # 1.28μs -> 871ns (47.4% faster)

def test_special_case_substring_with_tabs_and_spaces():
    # Test with tabs and spaces around the special case substring
    msg = "\t Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. \t"
    codeflash_output = extract_packages_special_cases(msg)  # 1.18μs -> 871ns (35.0% faster)

def test_special_case_substring_with_unicode():
    # Test with unicode characters before and after the special case substring
    msg = "⚠️ Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. 🚨"
    codeflash_output = extract_packages_special_cases(msg)  # 1.71μs -> 1.28μs (33.6% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
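The kind of check described in the sentence above can be sketched as a small harness that runs both versions on the same inputs and compares either the return value or the type of exception raised. Every name here (run, assert_same_behavior, original, optimized, NEEDLE) is hypothetical and for illustration only; this is not how Codeflash itself is implemented, and the two stand-in functions are simplified substitutes for the real ones.

```python
from typing import Any, Callable, Iterable, Tuple


def run(fn: Callable[[Any], Any], arg: Any) -> Tuple[str, Any]:
    """Return ("ok", value) on success or ("raised", exception type) on failure."""
    try:
        return ("ok", fn(arg))
    except Exception as exc:
        return ("raised", type(exc))


def assert_same_behavior(
    original: Callable[[Any], Any],
    optimized: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> None:
    """Fail if the two callables disagree on any input."""
    for arg in inputs:
        assert run(original, arg) == run(optimized, arg), repr(arg)


# Stand-in implementations (assumed, simplified).
NEEDLE = "tried using: 'pyarrow', 'fastparquet'."


def original(message):
    found = []
    if NEEDLE in message:
        found.append("pyarrow")
    return found if found else None


def optimized(message):
    return ["pyarrow"] if NEEDLE in message else None


# Matching, non-matching, empty, and invalid-type inputs, as in the tests above.
assert_same_behavior(original, optimized, ["x " + NEEDLE, "no match", "", None, 12345])
```

Comparing exception types as well as return values matters here because some of the generated tests pass None or an integer and expect a TypeError from both versions.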
#------------------------------------------------
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from marimo._runtime.packages.import_error_extractors import extract_packages_special_cases

# unit tests

# --------------------- Basic Test Cases ---------------------

def test_exact_special_case_match():
    # Test that the exact special case substring returns the correct package
    msg = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.39μs -> 951ns (46.4% faster)

def test_special_case_with_extra_text():
    # Test that the function matches the special case even if extra text is present
    msg = "Error! Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. Please install."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.38μs -> 904ns (52.2% faster)

def test_no_special_case_match():
    # Test that a message not containing the special case returns None
    msg = "Some unrelated error message."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.01μs -> 704ns (43.8% faster)

def test_partial_special_case_no_match():
    # Test that a partial substring does not trigger extraction
    msg = "Unable to find a usable engine"
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 989ns -> 698ns (41.7% faster)

# --------------------- Edge Test Cases ---------------------

def test_empty_message():
    # Test that an empty string returns None
    msg = ""
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 971ns -> 677ns (43.4% faster)

def test_none_message():
    # Test that passing None raises a TypeError (since str expected)
    with pytest.raises(TypeError):
        extract_packages_special_cases(None)  # 1.97μs -> 1.77μs (11.3% faster)

def test_message_with_special_case_as_substring_only():
    # Test that the special case substring as part of another word does not match
    msg = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet' (partial)."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.14μs -> 876ns (30.1% faster)

def test_case_sensitivity():
    # Test that the function is case-sensitive
    msg = "unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.09μs -> 868ns (25.6% faster)

def test_message_with_multiple_special_case_substrings():
    # Test that multiple occurrences of the substring do not duplicate packages
    msg = ("Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. "
           "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.")
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.29μs -> 916ns (40.8% faster)

def test_message_with_similar_but_not_exact_substring():
    # Test a message that is very similar but not an exact match
    msg = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet', or 'other'."
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 1.10μs -> 825ns (33.5% faster)

# --------------------- Large Scale Test Cases ---------------------

def test_large_message_with_no_special_case():
    # Test a large message with no special case substring
    msg = "Error: " + ("foo bar baz " * 200)
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 2.74μs -> 2.47μs (11.0% faster)

def test_large_message_with_special_case_in_middle():
    # Test a large message where the special case substring appears in the middle
    prefix = "foo bar baz " * 300
    suffix = " lorem ipsum " * 300
    msg = prefix + "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'." + suffix
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 4.15μs -> 3.50μs (18.6% faster)

def test_large_message_with_multiple_special_case_occurrences():
    # Test a large message with the special case substring repeated many times
    special = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    msg = ("foo bar baz " * 100) + (special * 10) + (" lorem ipsum " * 100)
    codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 2.23μs -> 1.86μs (19.9% faster)

def test_large_number_of_non_matching_messages():
    # Test many different large messages that do not match
    for i in range(100):
        msg = f"Error {i}: " + ("not the right substring " * 10)
        codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 36.8μs -> 29.3μs (25.4% faster)

def test_large_number_of_matching_messages():
    # Test many different large messages that do match
    special = "Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'."
    for i in range(50):
        msg = f"Message {i}: " + ("foo bar " * 5) + special + (" baz qux " * 5)
        codeflash_output = extract_packages_special_cases(msg); result = codeflash_output  # 23.2μs -> 17.7μs (31.1% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
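The timing annotations on the tests above (for example 984ns -> 696ns (41.4% faster)) appear consistent with computing the speedup as original time divided by optimized time, minus one. The sketch below shows that arithmetic together with a best-of-N timer built on the standard library's timeit. The helper names best_of and speedup_percent are hypothetical, and reading "best of 250 runs" as the minimum over 250 repetitions is an assumption, not something the PR states.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 250, number: int = 1000) -> float:
    """Return the smallest per-call time in seconds over `repeat` timing runs."""
    # timeit.repeat returns one total time per run; each run calls fn `number` times.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number


def speedup_percent(original: float, optimized: float) -> float:
    """Percentage speedup, measured relative to the optimized time."""
    return (original / optimized - 1.0) * 100.0


# Figures taken from this PR (both arguments must use the same unit).
print(round(speedup_percent(984, 696), 1))  # 984ns -> 696ns, prints 41.4
print(round(speedup_percent(138, 106), 1))  # 138 -> 106 microseconds, prints 30.2
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks at this scale, since slower runs mostly reflect interference from other processes rather than the code under test.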
#------------------------------------------------
from marimo._runtime.packages.import_error_extractors import extract_packages_special_cases

def test_extract_packages_special_cases():
    extract_packages_special_cases("Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.")

def test_extract_packages_special_cases_2():
    extract_packages_special_cases('')

🔎 Concolic Coverage Tests and Runtime
codeflash_concolic_bps3n5s8/tmpi5_exh7h/test_concolic_coverage.py::test_extract_packages_special_cases
codeflash_concolic_bps3n5s8/tmpi5_exh7h/test_concolic_coverage.py::test_extract_packages_special_cases_2
To edit these changes, git checkout codeflash/optimize-extract_packages_special_cases-mhv4a6nc and push.