
codeflash-ai bot commented Nov 5, 2025

📄 22% (1.22x) speedup for sanitize_relationship_for_cypher in mem0/memory/utils.py

⏱️ Runtime: 2.59 milliseconds → 2.11 milliseconds (best of 250 runs)
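The headline figure can be reproduced from the two runtimes. This assumes the percentage is computed as (original - optimized) / optimized, which matches the per-test percentages further down up to rounding (for example, 3.49 μs to 980 ns gives about 256%). The summary appears to truncate rather than round.

```python
# Runtimes from the summary above, in milliseconds.
original_ms = 2.59
optimized_ms = 2.11

# Assumed convention: the gain is measured relative to the optimized runtime.
relative_gain = (original_ms - optimized_ms) / optimized_ms  # about 0.227
speed_ratio = original_ms / optimized_ms                     # about 1.227

print(f"{relative_gain:.1%} faster ({speed_ratio:.2f}x)")  # 22.7% faster (1.23x)
```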

📝 Explanation and details

The optimized code achieves a 22% speedup by moving per-call setup to module level and replacing most of the per-key str.replace() calls with a single str.translate() pass. Here are the key optimizations:

1. Moved expensive setup outside the function: The original code rebuilt the 37-key char_map dictionary on every call (31.4% of the original runtime). The optimized version defines it once at module level, so that cost is paid only once, at import time.

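2. Pre-compiled regex pattern: The original code passed the pattern string r"_+" to re.sub() on every call (24.2% of the original runtime). Python's re module caches compiled patterns, so this was a cache lookup plus call overhead rather than a full recompilation, but it was still avoidable work. The optimized version compiles the pattern once at module level as _re_sub_underscores.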

3. Optimized character replacement strategy:

  • Multi-character keys (like "...") are handled first with str.replace(), because str.translate() can only map single characters
  • Single-character replacements use str.translate() with a pre-built translation table, which is significantly faster than iterating through individual str.replace() calls

4. Reduced iteration overhead: The original code performed 37 individual str.replace() operations (23% of runtime). The optimized version does just 2 multi-character replacements plus one efficient translate() call.
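Taken together, the four changes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: CHAR_MAP is a hypothetical eight-entry subset of the real mapping, and sanitize_baseline, sanitize_optimized, _MULTI_CHAR, _TABLE and _UNDERSCORE_RUNS are made-up names. For brevity the baseline reads a module-level map, whereas the original rebuilt it on every call. The randomized loop at the end checks that both versions agree on every input.

```python
import random
import re

# Hypothetical subset of the mapping, for illustration only.
CHAR_MAP = {
    "...": "_ellipsis_",
    "…": "_ellipsis_",
    "!": "_bang_",
    "?": "_question_",
    "/": "_slash_",
    "&": "_ampersand_",
    "(": "_lparen_",
    ")": "_rparen_",
}


def sanitize_baseline(text: str) -> str:
    """Loop-based shape: one str.replace() per key, then re.sub()."""
    for old, new in CHAR_MAP.items():
        text = text.replace(old, new)
    return re.sub(r"_+", "_", text).strip("_")


# Everything below is computed once, at import time.
_MULTI_CHAR = {k: v for k, v in CHAR_MAP.items() if len(k) > 1}
_TABLE = str.maketrans({k: v for k, v in CHAR_MAP.items() if len(k) == 1})
_UNDERSCORE_RUNS = re.compile(r"_+")


def sanitize_optimized(text: str) -> str:
    """Translate-based shape: multi-char keys first, then one translate() pass."""
    for old, new in _MULTI_CHAR.items():
        text = text.replace(old, new)
    text = text.translate(_TABLE)
    return _UNDERSCORE_RUNS.sub("_", text).strip("_")


# Randomized differential check: both versions must agree on every input.
rng = random.Random(0)
alphabet = "ab_ .…!?/&()"
for _ in range(2000):
    sample = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
    assert sanitize_baseline(sample) == sanitize_optimized(sample), sample

print(sanitize_optimized("A!!!B"))           # A_bang_bang_bang_B
print(sanitize_optimized("parent...child"))  # parent_ellipsis_child
```

The equivalence holds because no replacement string contains a character that is itself a key, so applying all single-character keys in one translate() pass gives the same result as applying them one at a time. If a lone "." were ever added as a key, "..." would have to be replaced first, which this layout already does.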

Performance characteristics by test type:

  • Small strings with few special chars: roughly 60-425% faster, mostly due to the eliminated setup overhead
  • Large strings with no special chars: 88-117% faster, benefiting from reduced function overhead
  • Large strings with many special chars: mixed results (roughly 3-46% faster in some cases, 1-37% slower in others), so the translate-based approach is not a uniform win on long, punctuation-dense input

The optimization is most beneficial for short strings with moderate special-character density, presumably the common case for relationship names: every basic and edge-case test ran faster, by roughly 1.6-5.3x.
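To see how the two replacement strategies behave on short versus long inputs on a given machine, a rough micro-benchmark like the following can help. It isolates only the replacement step and uses a made-up punctuation-to-token map; MAP, TABLE, with_replace_loop, with_translate and bench are all hypothetical names. Timings vary by machine and Python version, so it is not expected to reproduce the figures above or, by itself, explain the slowdowns on long strings.

```python
import string
import timeit

# Hypothetical map: every ASCII punctuation character except "_" gets a token.
# Tokens contain only letters, digits and "_", so no replacement creates a new key.
MAP = {ch: f"_p{ord(ch)}_" for ch in string.punctuation if ch != "_"}
TABLE = str.maketrans(MAP)


def with_replace_loop(text: str) -> str:
    # One scan of the string per key.
    for old, new in MAP.items():
        text = text.replace(old, new)
    return text


def with_translate(text: str) -> str:
    # A single scan of the string, looking each character up in TABLE.
    return text.translate(TABLE)


def bench(text: str, number: int = 1000) -> tuple[float, float]:
    """Return total seconds for `number` calls of each approach on `text`."""
    loop_s = timeit.timeit(lambda: with_replace_loop(text), number=number)
    translate_s = timeit.timeit(lambda: with_translate(text), number=number)
    return loop_s, translate_s


for label, text in [("short", "parent-child"), ("long, dense", "A!B?" * 250)]:
    loop_s, translate_s = bench(text)
    print(f"{label:12} replace loop: {loop_s:.4f}s  translate: {translate_s:.4f}s")
```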

Correctness verification report:

⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 128 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 100.0%
🌀 Generated Regression Tests and Runtime
import re
import string  # used for large scale test cases

# imports
import pytest  # used for our unit tests
from mem0.memory.utils import sanitize_relationship_for_cypher

# unit tests

# 1. BASIC TEST CASES

def test_basic_ascii_no_special_chars():
    # Should return input unchanged if no special chars
    codeflash_output = sanitize_relationship_for_cypher("FRIEND") # 6.71μs -> 2.23μs (202% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent_child") # 4.57μs -> 2.04μs (124% faster)
    codeflash_output = sanitize_relationship_for_cypher("worksWith") # 3.58μs -> 1.13μs (217% faster)

def test_basic_single_special_char():
    # Should replace single special char
    codeflash_output = sanitize_relationship_for_cypher("parent-child") # 6.23μs -> 2.38μs (162% faster)
    codeflash_output = sanitize_relationship_for_cypher("friend!") # 4.86μs -> 2.03μs (139% faster)
    codeflash_output = sanitize_relationship_for_cypher("loves?") # 3.94μs -> 1.29μs (204% faster)
    codeflash_output = sanitize_relationship_for_cypher("has#tag") # 4.13μs -> 1.67μs (147% faster)

def test_basic_multiple_special_chars():
    # Should replace all mapped special chars
    codeflash_output = sanitize_relationship_for_cypher("A&B") # 6.91μs -> 2.75μs (151% faster)
    codeflash_output = sanitize_relationship_for_cypher("A/B/C") # 4.91μs -> 1.97μs (150% faster)
    codeflash_output = sanitize_relationship_for_cypher("A|B|C") # 4.10μs -> 1.39μs (195% faster)
    codeflash_output = sanitize_relationship_for_cypher("A+B") # 3.58μs -> 1.04μs (244% faster)
    codeflash_output = sanitize_relationship_for_cypher("A=B") # 3.49μs -> 980ns (256% faster)

def test_basic_unicode_punctuation():
    # Should replace unicode punctuation
    codeflash_output = sanitize_relationship_for_cypher("朋友。") # 9.02μs -> 4.93μs (82.9% faster)
    codeflash_output = sanitize_relationship_for_cypher("你好,世界!") # 6.24μs -> 2.85μs (119% faster)
    codeflash_output = sanitize_relationship_for_cypher("《关系》") # 5.34μs -> 2.25μs (137% faster)
    codeflash_output = sanitize_relationship_for_cypher("(测试)") # 4.53μs -> 1.45μs (212% faster)

def test_basic_ellipsis():
    # Should replace both ASCII and unicode ellipsis
    codeflash_output = sanitize_relationship_for_cypher("wait...") # 6.84μs -> 3.53μs (93.8% faster)
    codeflash_output = sanitize_relationship_for_cypher("继续…") # 5.83μs -> 2.58μs (126% faster)

def test_basic_quotes_and_apostrophe():
    # Should replace quotes and apostrophe
    codeflash_output = sanitize_relationship_for_cypher("John's friend") # 7.07μs -> 3.62μs (95.5% faster)
    codeflash_output = sanitize_relationship_for_cypher('"quoted"') # 4.96μs -> 2.43μs (104% faster)

def test_basic_brackets_and_parentheses():
    # Should replace all bracket types
    codeflash_output = sanitize_relationship_for_cypher("[A]") # 7.22μs -> 3.20μs (125% faster)
    codeflash_output = sanitize_relationship_for_cypher("{A}") # 4.69μs -> 1.79μs (161% faster)
    codeflash_output = sanitize_relationship_for_cypher("(A)") # 3.87μs -> 1.15μs (238% faster)
    codeflash_output = sanitize_relationship_for_cypher("【关系】") # 6.38μs -> 3.46μs (84.1% faster)

def test_basic_strip_leading_trailing_underscores():
    codeflash_output = sanitize_relationship_for_cypher("!A!") # 6.73μs -> 2.92μs (131% faster)

def test_basic_multiple_consecutive_special_chars():
    # Should collapse multiple underscores
    codeflash_output = sanitize_relationship_for_cypher("A!!!B") # 7.23μs -> 3.26μs (122% faster)
    codeflash_output = sanitize_relationship_for_cypher("A&&&B") # 4.99μs -> 2.15μs (132% faster)
    codeflash_output = sanitize_relationship_for_cypher("A...B") # 4.01μs -> 2.14μs (87.4% faster)
    codeflash_output = sanitize_relationship_for_cypher("A……B") # 4.95μs -> 1.87μs (164% faster)

# 2. EDGE TEST CASES

def test_edge_empty_string():
    # Should return empty string
    codeflash_output = sanitize_relationship_for_cypher("") # 5.51μs -> 1.32μs (318% faster)

def test_edge_only_special_chars():
    # Should replace all and collapse underscores
    codeflash_output = sanitize_relationship_for_cypher("!!!") # 7.34μs -> 3.11μs (136% faster)
    codeflash_output = sanitize_relationship_for_cypher("???") # 4.93μs -> 1.85μs (166% faster)
    codeflash_output = sanitize_relationship_for_cypher("...") # 3.77μs -> 1.77μs (112% faster)
    codeflash_output = sanitize_relationship_for_cypher("()") # 4.98μs -> 1.30μs (282% faster)
    codeflash_output = sanitize_relationship_for_cypher("【】") # 4.39μs -> 1.09μs (303% faster)
    codeflash_output = sanitize_relationship_for_cypher("《》") # 3.98μs -> 906ns (339% faster)
    codeflash_output = sanitize_relationship_for_cypher("''") # 3.91μs -> 1.07μs (266% faster)
    codeflash_output = sanitize_relationship_for_cypher('""') # 3.64μs -> 965ns (277% faster)
    codeflash_output = sanitize_relationship_for_cypher("&&&&") # 4.62μs -> 1.83μs (152% faster)
    codeflash_output = sanitize_relationship_for_cypher("////") # 4.24μs -> 1.49μs (184% faster)

def test_edge_special_chars_only_with_spaces():
    # Should preserve spaces
    codeflash_output = sanitize_relationship_for_cypher("! !") # 7.00μs -> 2.75μs (155% faster)
    codeflash_output = sanitize_relationship_for_cypher("!  !") # 4.86μs -> 1.68μs (189% faster)
    codeflash_output = sanitize_relationship_for_cypher("… …") # 5.19μs -> 1.98μs (162% faster)

def test_edge_unmapped_special_characters():
    # Should leave unmapped chars unchanged
    codeflash_output = sanitize_relationship_for_cypher("A-B") # 5.89μs -> 1.98μs (197% faster)
    codeflash_output = sanitize_relationship_for_cypher("A_B") # 4.13μs -> 1.33μs (211% faster)
    codeflash_output = sanitize_relationship_for_cypher("A~B") # 3.30μs -> 748ns (341% faster)
    codeflash_output = sanitize_relationship_for_cypher("A`B") # 3.23μs -> 615ns (425% faster)

def test_edge_mixed_mapped_and_unmapped():
    # Should only replace mapped chars
    codeflash_output = sanitize_relationship_for_cypher("A-B!") # 6.47μs -> 2.44μs (165% faster)
    codeflash_output = sanitize_relationship_for_cypher("A_B?") # 4.58μs -> 1.68μs (173% faster)
    codeflash_output = sanitize_relationship_for_cypher("A~B!") # 3.79μs -> 1.06μs (256% faster)
    codeflash_output = sanitize_relationship_for_cypher("A`B!") # 3.46μs -> 804ns (331% faster)

def test_edge_collapse_multiple_underscores():
    # Should collapse multiple underscores from adjacent replacements
    codeflash_output = sanitize_relationship_for_cypher("A!!B!!C") # 7.56μs -> 3.68μs (105% faster)
    codeflash_output = sanitize_relationship_for_cypher("A!!!B!!!C") # 5.69μs -> 2.90μs (96.0% faster)
    codeflash_output = sanitize_relationship_for_cypher("A__B") # 3.58μs -> 1.04μs (245% faster)

def test_edge_leading_trailing_special_chars():
    # Should strip leading/trailing underscores after replacement
    codeflash_output = sanitize_relationship_for_cypher("!A") # 6.55μs -> 2.54μs (158% faster)
    codeflash_output = sanitize_relationship_for_cypher("A!") # 4.36μs -> 1.30μs (237% faster)
    codeflash_output = sanitize_relationship_for_cypher("!!!A!!!") # 5.03μs -> 2.58μs (94.9% faster)

def test_edge_non_string_input():
    # Should raise AttributeError if input is not a str (the function calls str methods on it)
    with pytest.raises(AttributeError):
        sanitize_relationship_for_cypher(None) # 3.63μs -> 1.46μs (149% faster)
    with pytest.raises(AttributeError):
        sanitize_relationship_for_cypher(123) # 2.53μs -> 912ns (177% faster)
    with pytest.raises(AttributeError):
        sanitize_relationship_for_cypher(["A", "B"]) # 2.05μs -> 694ns (195% faster)

def test_edge_unicode_mixed_with_ascii():
    # Should handle mixed unicode and ascii
    codeflash_output = sanitize_relationship_for_cypher("关系!") # 8.96μs -> 4.80μs (86.6% faster)
    codeflash_output = sanitize_relationship_for_cypher("关系(A)!") # 6.61μs -> 3.06μs (116% faster)
    codeflash_output = sanitize_relationship_for_cypher("关系…A...") # 5.22μs -> 3.10μs (68.5% faster)

def test_edge_repeated_replacements():
    # Should not double-replace already replaced substrings
    codeflash_output = sanitize_relationship_for_cypher("A!!!...!!!B") # 8.93μs -> 5.31μs (68.1% faster)

def test_edge_strip_underscores_only_output():
    # Leading/trailing underscores produced by the replacements should be stripped
    codeflash_output = sanitize_relationship_for_cypher("!!!") # 7.16μs -> 3.04μs (135% faster)
    codeflash_output = sanitize_relationship_for_cypher("...") # 4.41μs -> 1.98μs (123% faster)

# 3. LARGE SCALE TEST CASES

def test_large_long_string_many_special_chars():
    # Should handle long string with many special chars
    input_str = "A!" * 500  # 500 'A!' pairs
    expected = "_".join(["A_bang"] * 500)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 112μs -> 136μs (17.1% slower)

def test_large_long_string_no_special_chars():
    # Should handle long string with no special chars
    input_str = "A" * 1000
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 18.7μs -> 8.64μs (117% faster)

def test_large_all_special_chars():
    # Should handle string of all mapped special chars
    all_special = "".join([
        "...", "…", "。", ",", ";", ":", "!", "?", "(", ")", "【", "】", "《", "》",
        "'", '"', "\\", "/", "|", "&", "=", "+", "*", "^", "%", "$", "#", "@", "!", "?", "(", ")", "[", "]", "{", "}", "<", ">"
    ])
    expected = "_".join([
        "ellipsis", "ellipsis", "period", "comma", "semicolon", "colon", "exclamation", "question",
        "lparen", "rparen", "lbracket", "rbracket", "langle", "rangle",
        "apostrophe", "quote", "backslash", "slash", "pipe", "ampersand", "equals", "plus", "asterisk", "caret", "percent", "dollar", "hash", "at", "bang", "question", "lparen", "rparen", "lbracket", "rbracket", "lbrace", "rbrace", "langle", "rangle"
    ])
    codeflash_output = sanitize_relationship_for_cypher(all_special) # 21.4μs -> 12.9μs (65.5% faster)

def test_large_mixed_ascii_and_special():
    # Should handle large string with mixed ascii and mapped special chars
    input_str = ("A!B?" * 250)  # 1000 chars
    expected = "_".join(["A_bang_B_question"] * 250)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 119μs -> 120μs (1.35% slower)

def test_large_randomized_special_chars():
    # Should handle a string of random mapped special chars
    chars = ["!", "?", "&", "$", "#", "@", "/", "|", "+", "=", "*", "^", "%"]
    input_str = "".join(chars * 75)  # 975 chars
    expected = "_".join([
        "bang", "question", "ampersand", "dollar", "hash", "at", "slash", "pipe", "plus", "equals", "asterisk", "caret", "percent"
    ] * 75)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 165μs -> 159μs (3.38% faster)

def test_large_unicode_and_ascii_mixed():
    # Should handle large string with mixed unicode and ascii mapped chars
    chars = ["。", ",", ";", ":", "!", "?", "(", ")", "【", "】", "《", "》"]
    input_str = "".join(chars * 80)  # 960 chars
    expected = "_".join([
        "period", "comma", "semicolon", "colon", "exclamation", "question",
        "lparen", "rparen", "lbracket", "rbracket", "langle", "rangle"
    ] * 80)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 213μs -> 155μs (37.4% faster)

def test_large_string_with_spaces_and_special_chars():
    # Should preserve spaces in large string
    input_str = ("A! B? " * 100)  # 600 chars
    expected = "A_bang_ B_question_ " * 100  # trailing space survives: strip("_") only removes underscores
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 56.3μs -> 64.2μs (12.3% slower)

def test_large_string_with_unmapped_special_chars():
    # Should leave unmapped chars unchanged in large string
    input_str = ("A-B_" * 200)  # 1000 chars
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 29.6μs -> 21.7μs (36.2% faster)

def test_large_string_all_printable_ascii():
    # Should handle all printable ascii characters
    input_str = string.printable[:1000]  # up to 1000 chars
    # Only mapped chars replaced, others unchanged
    codeflash_output = sanitize_relationship_for_cypher(input_str); output = codeflash_output # 16.0μs -> 12.6μs (27.2% faster)
    for old, new in {
        "...": "_ellipsis_",
        "'": "_apostrophe_",
        '"': "_quote_",
        "\\": "_backslash_",
        "/": "_slash_",
        "|": "_pipe_",
        "&": "_ampersand_",
        "=": "_equals_",
        "+": "_plus_",
        "*": "_asterisk_",
        "^": "_caret_",
        "%": "_percent_",
        "$": "_dollar_",
        "#": "_hash_",
        "@": "_at_",
        "!": "_bang_",
        "?": "_question_",
        "(": "_lparen_",
        ")": "_rparen_",
        "[": "_lbracket_",
        "]": "_rbracket_",
        "{": "_lbrace_",
        "}": "_rbrace_",
        "<": "_langle_",
        ">": "_rangle_",
    }.items():
        pass

def test_large_string_repeated_unicode_ellipsis():
    # Should replace repeated unicode ellipsis
    input_str = "…" * 500
    expected = "_".join(["ellipsis"] * 500)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 106μs -> 80.7μs (31.5% faster)

def test_large_string_repeated_ascii_ellipsis():
    # Should replace repeated ascii ellipsis
    input_str = "..." * 333 + "..."  # 334 times
    expected = "_".join(["ellipsis"] * 334)
    codeflash_output = sanitize_relationship_for_cypher(input_str) # 71.7μs -> 49.2μs (45.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re

# imports
import pytest  # used for our unit tests
from mem0.memory.utils import sanitize_relationship_for_cypher

# unit tests

# 1. Basic Test Cases

def test_basic_ascii_no_special():
    # Simple ASCII string, no special chars
    codeflash_output = sanitize_relationship_for_cypher("parent") # 7.41μs -> 2.60μs (185% faster)

def test_basic_ascii_with_space():
    # Spaces should be preserved
    codeflash_output = sanitize_relationship_for_cypher("parent child") # 6.95μs -> 2.62μs (165% faster)

def test_basic_single_special_char():
    # Single special character replacement
    codeflash_output = sanitize_relationship_for_cypher("parent-child") # 6.39μs -> 2.52μs (154% faster)

def test_basic_multiple_special_chars():
    # Multiple special characters, should all be replaced
    codeflash_output = sanitize_relationship_for_cypher("parent/child") # 7.73μs -> 3.66μs (111% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent&child") # 4.93μs -> 2.08μs (137% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent+child") # 4.04μs -> 1.65μs (145% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent?child") # 3.80μs -> 1.45μs (162% faster)

def test_basic_ellipsis_variants():
    # Ellipsis variants
    codeflash_output = sanitize_relationship_for_cypher("parent...child") # 7.18μs -> 3.68μs (95.4% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent…child") # 5.71μs -> 2.60μs (120% faster)

def test_basic_chinese_punctuation():
    # Chinese punctuation should be replaced
    codeflash_output = sanitize_relationship_for_cypher("父母,孩子。") # 9.78μs -> 5.58μs (75.3% faster)
    codeflash_output = sanitize_relationship_for_cypher("关系:朋友;敌人") # 6.12μs -> 2.70μs (127% faster)

def test_basic_brackets_and_quotes():
    # Brackets and quotes
    codeflash_output = sanitize_relationship_for_cypher("parent(child)") # 7.61μs -> 3.77μs (102% faster)
    codeflash_output = sanitize_relationship_for_cypher('parent"child"') # 4.97μs -> 2.05μs (142% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent'child'") # 4.56μs -> 2.00μs (128% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent[child]") # 4.35μs -> 1.82μs (139% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent{child}") # 4.01μs -> 1.64μs (144% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent<child>") # 3.81μs -> 1.52μs (150% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent【child】") # 5.00μs -> 2.08μs (141% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent《child》") # 4.51μs -> 1.56μs (188% faster)

def test_basic_mixed_special_chars():
    # Mixed special characters
    codeflash_output = sanitize_relationship_for_cypher("parent/child&friend@school") # 8.46μs -> 4.71μs (79.7% faster)

def test_basic_strip_leading_trailing_underscores():
    # Should strip leading/trailing underscores after replacement
    codeflash_output = sanitize_relationship_for_cypher("/parent/") # 7.32μs -> 3.48μs (111% faster)

# 2. Edge Test Cases

def test_edge_empty_string():
    # Empty string should return empty string
    codeflash_output = sanitize_relationship_for_cypher("") # 5.62μs -> 1.28μs (340% faster)

def test_edge_only_special_chars():
    # Only special characters
    codeflash_output = sanitize_relationship_for_cypher("!!!") # 7.46μs -> 3.13μs (138% faster)
    codeflash_output = sanitize_relationship_for_cypher("...") # 4.59μs -> 2.15μs (113% faster)
    codeflash_output = sanitize_relationship_for_cypher("()") # 5.15μs -> 1.55μs (232% faster)
    codeflash_output = sanitize_relationship_for_cypher("【】") # 4.55μs -> 1.13μs (302% faster)

def test_edge_repeated_special_chars():
    # Repeated special characters should be replaced and collapsed if needed
    codeflash_output = sanitize_relationship_for_cypher("parent&&child") # 7.31μs -> 3.94μs (85.3% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent////child") # 5.42μs -> 2.90μs (87.0% faster)

def test_edge_adjacent_special_chars():
    # Adjacent special characters
    codeflash_output = sanitize_relationship_for_cypher("parent?!child") # 7.23μs -> 3.40μs (112% faster)

def test_edge_leading_trailing_special_chars():
    # Leading and trailing special characters
    codeflash_output = sanitize_relationship_for_cypher("!parent?") # 7.21μs -> 3.17μs (127% faster)

def test_edge_underscore_handling():
    # A single underscore in the input is kept (runs of underscores, including ones already in the input, are collapsed)
    codeflash_output = sanitize_relationship_for_cypher("parent_child") # 6.36μs -> 2.87μs (122% faster)
    # But multiple underscores from replacements should be collapsed to one
    codeflash_output = sanitize_relationship_for_cypher("parent...child") # 5.08μs -> 2.62μs (93.6% faster)

def test_edge_unicode_and_non_ascii():
    # Unicode and non-ASCII characters should be preserved if not in char_map
    codeflash_output = sanitize_relationship_for_cypher("父母👨‍👩‍👧‍👦孩子") # 7.78μs -> 4.08μs (90.7% faster)
    codeflash_output = sanitize_relationship_for_cypher("parent😀child") # 4.56μs -> 1.81μs (152% faster)

def test_edge_backslash_and_pipe():
    # Backslash and pipe
    codeflash_output = sanitize_relationship_for_cypher("parent\\child|friend") # 7.78μs -> 4.29μs (81.3% faster)

def test_edge_hash_and_dollar():
    # Hash and dollar
    codeflash_output = sanitize_relationship_for_cypher("parent#child$friend") # 7.45μs -> 3.90μs (91.4% faster)

def test_edge_nested_brackets():
    # Nested brackets
    codeflash_output = sanitize_relationship_for_cypher("parent(child[friend{enemy}])") # 9.14μs -> 5.61μs (62.8% faster)

def test_edge_multiple_different_special_chars():
    # Multiple different special chars in a row
    codeflash_output = sanitize_relationship_for_cypher("parent?!@child") # 7.99μs -> 3.84μs (108% faster)

def test_edge_multiple_ellipsis_and_period():
    # Multiple ellipsis and period
    codeflash_output = sanitize_relationship_for_cypher("parent...。child") # 8.88μs -> 5.12μs (73.4% faster)

def test_edge_strip_multiple_underscores():
    # Should collapse multiple underscores
    codeflash_output = sanitize_relationship_for_cypher("parent.../child") # 7.71μs -> 4.11μs (87.7% faster)

def test_edge_apostrophe_and_quote():
    # Apostrophe and quote
    codeflash_output = sanitize_relationship_for_cypher("parent's \"friend\"") # 8.19μs -> 4.17μs (96.5% faster)

def test_edge_percent_and_caret():
    # Percent and caret
    codeflash_output = sanitize_relationship_for_cypher("parent%child^friend") # 7.83μs -> 4.07μs (92.6% faster)

def test_edge_brackets_with_spaces():
    # Brackets with spaces inside
    codeflash_output = sanitize_relationship_for_cypher("parent ( child )") # 7.62μs -> 3.80μs (100% faster)

def test_edge_long_replacement_chain():
    # Chain of special characters
    codeflash_output = sanitize_relationship_for_cypher("parent/child?friend!enemy&ally") # 8.83μs -> 5.28μs (67.3% faster)

# 3. Large Scale Test Cases

def test_large_scale_long_string():
    # Very long string of repeated pattern
    s = "parent/child?" * 100
    expected = "_".join(["parent_slash_child_question"] * 100)
    codeflash_output = sanitize_relationship_for_cypher(s) # 77.9μs -> 93.0μs (16.2% slower)

def test_large_scale_many_special_chars():
    # String with many different special characters
    chars = "/|&=+*^%$#@!?()[]{}<>…。,;:!?”"
    s = "parent" + chars * 50 + "child"
    # Build expected manually
    expected = "parent" + (
        "_slash__pipe__ampersand__equals__plus__asterisk__caret__percent__dollar__hash__at__bang__question__lparen__rparen__lbracket__rbracket__lbrace__rbrace__langle__rangle__ellipsis__period__comma__semicolon__colon__exclamation__question__quote_"
        * 50
    ) + "child"
    codeflash_output = sanitize_relationship_for_cypher(s) # 335μs -> 291μs (14.9% faster)

def test_large_scale_no_special_chars():
    # Large string, no special chars
    s = "parentchild" * 1000
    codeflash_output = sanitize_relationship_for_cypher(s) # 137μs -> 72.9μs (88.9% faster)

def test_large_scale_only_special_chars():
    # Large string, only special chars
    s = "!@#$%^&*()_+" * 100
    expected = "_".join(["bang_at_hash_dollar_percent_caret_ampersand_asterisk_lparen_rparen_plus"] * 100)
    codeflash_output = sanitize_relationship_for_cypher(s) # 172μs -> 162μs (5.87% faster)

def test_large_scale_mixed_ascii_and_unicode():
    # Large string, mixed ascii and unicode
    s = ("parent👨‍👩‍👧‍👦/child…friend。") * 100
    expected = ("parent👨‍👩‍👧‍👦_slash_child_ellipsis_friend_period_" * 100).strip("_")
    codeflash_output = sanitize_relationship_for_cypher(s) # 164μs -> 196μs (16.6% slower)

def test_large_scale_strip_leading_trailing_underscores():
    # Large string with leading/trailing special chars
    s = "/" + ("parent/child?" * 100) + "?"
    expected = "_".join(["slash"] + ["parent_slash_child_question"] * 100 + ["question"])
    codeflash_output = sanitize_relationship_for_cypher(s) # 69.1μs -> 88.7μs (22.1% slower)

def test_large_scale_repeated_underscores():
    # Large string with many underscores produced by replacements
    s = ("parent...child/") * 100
    expected = "_".join(["parent_ellipsis_child_slash"] * 100)
    codeflash_output = sanitize_relationship_for_cypher(s) # 71.7μs -> 113μs (36.7% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-sanitize_relationship_for_cypher-mhlpkz7w and push.

codeflash-ai bot requested a review from mashraf-222 on November 5, 2025, 08:01
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Nov 5, 2025