⚡️ Speed up function `create_sql_error_metadata` by 27% #584

codeflash-ai · 2025-11-11T04:43:32Z

📄 27% speedup (1.27x) for `create_sql_error_metadata` in `marimo/_sql/error_utils.py`

⏱️ Runtime: 831 microseconds → 653 microseconds (best of 250 runs)
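The percentages in this report appear to be computed as (old - new) / new, which equals old/new - 1. This is inferred from the reported numbers, not taken from Codeflash documentation. The following minimal sketch reproduces the headline figure and also shows the share of runtime removed:

```python
def speedup(old: float, new: float) -> float:
    """Relative speedup as (old - new) / new; negative means the new code is slower."""
    return (old - new) / new


def time_saved(old: float, new: float) -> float:
    """Fraction of the original runtime that was eliminated."""
    return (old - new) / old


if __name__ == "__main__":
    # Headline numbers from this report, in microseconds
    print(f"speedup:    {speedup(831, 653):.1%}")     # about 27.3%
    print(f"time saved: {time_saved(831, 653):.1%}")  # about 21.4%
```

The same formula applied to the per-test timings gives about -0.8% for 7.68 μs → 7.74 μs, close to the reported "0.763% slower". Small differences like this presumably come from the displayed timings being rounded.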

📝 Explanation and details

The optimized code achieves a 27% speedup through two optimizations that reduce per-call regex overhead and string processing costs:

Key Optimizations:

1. Precompiled Regex Patterns

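The original code calls `re.search()` with raw pattern strings on every call. Python's `re` module caches compiled patterns internally, so the patterns are not recompiled each time. However, every call still pays for the cache lookup and argument handling inside `re.search()`. The optimization precompiles the three patterns once at module level: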

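```python
import re

# Compiled once at import time and reused on every call
_LINE_COL_RE = re.compile(r"Line (\d+), Col: (\d+)")               # e.g. "Line 2, Col: 5"
_LINE_ONLY_RE = re.compile(r"LINE (\d+):")                         # e.g. "LINE 3:"
_SQLGLOT_RE = re.compile(r"line (\d+), col (\d+)", re.IGNORECASE)  # e.g. "line 5, col 10"
```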
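The following minimal sketch shows how the three precompiled patterns might be applied to an exception message. The function name `extract_position`, the order in which the patterns are tried, and the `(line, col)` return shape are assumptions for illustration. The PR does not show the body of `_extract_sql_position`.

```python
import re
from typing import Optional, Tuple

# Patterns as listed in the PR description
_LINE_COL_RE = re.compile(r"Line (\d+), Col: (\d+)")
_LINE_ONLY_RE = re.compile(r"LINE (\d+):")
_SQLGLOT_RE = re.compile(r"line (\d+), col (\d+)", re.IGNORECASE)


def extract_position(msg: str) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical stand-in for _extract_sql_position: return (line, col)."""
    m = _LINE_COL_RE.search(msg)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = _LINE_ONLY_RE.search(msg)
    if m:
        # This format carries only a line number
        return int(m.group(1)), None
    m = _SQLGLOT_RE.search(msg)
    if m:
        return int(m.group(1)), int(m.group(2))
    # No recognizable position information
    return None, None
```

For example, `extract_position("Syntax error: LINE 3: near 'SELECT'")` returns `(3, None)`. A message with no position information still goes through all three searches, which is where per-call overhead adds up.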

Performance Impact: The line profiler shows time in `_extract_sql_position` dropping from 1.98 ms to 0.64 ms, which is about 68% less time, or roughly 3x faster. Per-call regex overhead had accounted for about 60% of the function's runtime.
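You can check the per-call overhead on your own machine with a small `timeit` comparison. This sketch measures only the `_SQLGLOT_RE` pattern, and absolute timings will vary by Python version and hardware.

```python
import re
import timeit
from typing import Optional, Tuple

PATTERN = r"line (\d+), col (\d+)"
_SQLGLOT_RE = re.compile(PATTERN, re.IGNORECASE)
MSG = "error: line 5, col 10: invalid syntax"


def search_with_string(msg: str) -> Optional[re.Match]:
    # re.search() looks the pattern up in re's internal cache on every call
    return re.search(PATTERN, msg, re.IGNORECASE)


def search_precompiled(msg: str) -> Optional[re.Match]:
    # The compiled pattern object is used directly, skipping that lookup
    return _SQLGLOT_RE.search(msg)


def compare(number: int = 100_000) -> Tuple[float, float]:
    """Return total seconds for (string pattern, precompiled pattern)."""
    t_string = timeit.timeit(lambda: search_with_string(MSG), number=number)
    t_compiled = timeit.timeit(lambda: search_precompiled(MSG), number=number)
    return t_string, t_compiled


if __name__ == "__main__":
    t_string, t_compiled = compare()
    print(f"re.search(str): {t_string:.3f}s  compiled.search: {t_compiled:.3f}s")
```

Both functions return identical matches. Only the path taken to reach the compiled pattern differs.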

2. Optimized String Processing for Hints

The original code splits the entire exception message with `exception_msg.split("\n")` and processes every line, even when there are no hints. The optimized version uses `str.find()` to locate the first newline and splits only the remainder, if there is one:

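```python
nl = exception_msg.find("\n")
if nl == -1:
    # No newline: the message is a single line, so there are no hint lines
    hint_lines = []
else:
    rest = exception_msg[nl + 1:]
    if rest:
        hint_lines = [line.strip() for line in rest.split("\n")]
    else:
        # The message ended with its first newline: nothing left to treat as hints
        hint_lines = []
```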

Performance Impact: This avoids unnecessary string operations, which matters most when the message contains no hints (176 of the 241 generated test cases had none).
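The following sketch compares the two approaches side by side. It assumes, for illustration only, that the final hint drops blank lines and joins the rest with newlines. The PR does not show how `hint_lines` is turned into the stored hint. Under that assumption both versions agree, including on a message that ends with its first newline. In that case the `split()` version yields `['']` and the `find()` version yields `[]`, and both collapse to no hint.

```python
from typing import List, Optional


def _join_hint(lines: List[str]) -> Optional[str]:
    # Assumption: blank lines are dropped and the remaining lines joined
    kept = [line for line in lines if line]
    return "\n".join(kept) if kept else None


def extract_hint_split(msg: str) -> Optional[str]:
    # Split-everything approach: first line is the message, the rest is hint text
    return _join_hint([line.strip() for line in msg.split("\n")[1:]])


def extract_hint_find(msg: str) -> Optional[str]:
    # find()-based approach: return early when there is no newline
    nl = msg.find("\n")
    if nl == -1:
        return None
    rest = msg[nl + 1:]
    if not rest:
        return None
    return _join_hint([line.strip() for line in rest.split("\n")])
```

For example, `extract_hint_find("Error\n Hint: Check this \n Details: Info ")` returns `"Hint: Check this\nDetails: Info"`, while a single-line message returns `None` without any splitting.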

Test Case Performance Analysis:

- Basic cases: 18-45% faster, mostly from avoiding per-call regex lookups
- Edge cases with no position info: 47-64% faster; when nothing matches, several patterns are tried, so the per-call overhead is paid several times
- Large-scale cases: 8-45% faster; the string processing changes matter more as inputs grow

The gains add up mainly in workloads that call this function many times with the same patterns, such as SQL linting or error-handling paths that process many SQL statements.

✅ Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 241 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

```python
import ast
import re
from typing import Optional

# imports

import pytest  # used for our unit tests
from marimo._sql.error_utils import create_sql_error_metadata

# function to test

# Copyright 2025 Marimo. All rights reserved.

class SQLErrorMetadata:
    """Simple dataclass-like structure for SQL error metadata."""

    def __init__(
        self,
        lint_rule: str,
        error_type: str,
        clean_message: str,
        hint: Optional[str],
        node_lineno: int,
        node_col_offset: int,
        sql_statement: str,
        sql_line: Optional[int],
        sql_col: Optional[int],
        context: str,
    ):
        self.lint_rule = lint_rule
        self.error_type = error_type
        self.clean_message = clean_message
        self.hint = hint
        self.node_lineno = node_lineno
        self.node_col_offset = node_col_offset
        self.sql_statement = sql_statement
        self.sql_line = sql_line
        self.sql_col = sql_col
        self.context = context

    def __eq__(self, other):
        if not isinstance(other, SQLErrorMetadata):
            return False
        return (
            self.lint_rule == other.lint_rule and
            self.error_type == other.error_type and
            self.clean_message == other.clean_message and
            self.hint == other.hint and
            self.node_lineno == other.node_lineno and
            self.node_col_offset == other.node_col_offset and
            self.sql_statement == other.sql_statement and
            self.sql_line == other.sql_line and
            self.sql_col == other.sql_col and
            self.context == other.context
        )

    def __repr__(self):
        return (
            f"SQLErrorMetadata(lint_rule={self.lint_rule!r}, error_type={self.error_type!r}, "
            f"clean_message={self.clean_message!r}, hint={self.hint!r}, "
            f"node_lineno={self.node_lineno!r}, node_col_offset={self.node_col_offset!r}, "
            f"sql_statement={self.sql_statement!r}, sql_line={self.sql_line!r}, "
            f"sql_col={self.sql_col!r}, context={self.context!r})"
        )

from marimo._sql.error_utils import create_sql_error_metadata

# unit tests

# --- Basic Test Cases ---

def test_basic_sqlglot_line_col_extraction():
    """Test extraction from SqlGlot format: 'Line 1, Col: 15'"""
    exc = Exception("Parse error: Line 2, Col: 5")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E001",
        sql_content="SELECT * FROM table;",
        context="test_context"
    ); meta = codeflash_output  # 6.27μs -> 5.29μs (18.7% faster)

def test_basic_duckdb_line_only_extraction():
    """Test extraction from DuckDB format: 'LINE 4:'"""
    exc = Exception("Syntax error: LINE 3: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E002",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output  # 6.20μs -> 5.03μs (23.3% faster)

def test_basic_sqlglot_lowercase_format():
    """Test extraction from SQLGlot format: 'line 5, col 10' (lowercase)"""
    exc = Exception("error: line 5, col 10: invalid syntax")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E003",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output  # 8.24μs -> 5.66μs (45.4% faster)

def test_basic_multiline_hint():
    """Test extraction of hints from multiline exception messages."""
    exc = Exception("Error: Something went wrong\nHint: Try using a valid column\nDetails: More info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E004",
        sql_content="SELECT col FROM table;",
    ); meta = codeflash_output  # 8.13μs -> 6.38μs (27.3% faster)

def test_basic_node_info():
    """Test passing an AST node and extracting its line/col info."""
    node = ast.parse("x = 1 + 2").body[0].value  # ast.BinOp
    exc = Exception("Error: LINE 1: near '+'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E005",
        node=node,
        sql_content="SELECT 1 + 2;",
    ); meta = codeflash_output  # 6.49μs -> 5.20μs (24.9% faster)

def test_basic_empty_sql_content():
    """Test when sql_content is empty."""
    exc = Exception("Error: LINE 1: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E006",
        sql_content="",
    ); meta = codeflash_output  # 6.02μs -> 4.76μs (26.4% faster)

def test_basic_no_context():
    """Test when context is not provided."""
    exc = Exception("Parse error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E007",
        sql_content="SELECT * FROM t;",
    ); meta = codeflash_output  # 5.79μs -> 4.74μs (22.1% faster)

# --- Edge Test Cases ---

def test_edge_no_line_col_in_exception():
    """Test when exception message has no line/col info."""
    exc = Exception("General SQL error: something failed")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E008",
        sql_content="SELECT * FROM t;",
    ); meta = codeflash_output  # 7.17μs -> 4.86μs (47.6% faster)

def test_edge_empty_exception_message():
    """Test when exception message is empty."""
    exc = Exception("")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E009",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 5.88μs -> 3.58μs (64.2% faster)

def test_edge_long_sql_content_truncation():
    """Test that long SQL content is truncated to 200 chars."""
    long_sql = "SELECT " + ", ".join(f"col{i}" for i in range(250))
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E010",
        sql_content=long_sql,
    ); meta = codeflash_output  # 6.36μs -> 5.40μs (17.6% faster)

def test_edge_hint_with_leading_trailing_spaces():
    """Test that hint lines are stripped of leading/trailing spaces."""
    exc = Exception("Error\n Hint: Check this \n Details: Info ")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E011",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 8.01μs -> 6.26μs (27.9% faster)

def test_edge_node_is_none():
    """Test when node is None, node_lineno and node_col_offset are 0."""
    exc = Exception("Error: LINE 1: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E012",
        node=None,
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 6.31μs -> 5.00μs (26.2% faster)

def test_edge_non_exception_type():
    """Test when input is a subclass of BaseException."""
    class CustomError(BaseException):
        def __str__(self):
            return "Custom error: Line 5, Col: 7"
    exc = CustomError()
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E013",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 6.30μs -> 5.15μs (22.2% faster)

def test_edge_multiline_exception_message_with_blank_lines():
    """Test when exception message has blank lines."""
    exc = Exception("Error occurred\n\nHint: Try again\n\nDetails: Info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E014",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 8.16μs -> 6.23μs (31.1% faster)

def test_edge_sql_content_exactly_200_chars():
    """Test when sql_content is exactly 200 chars."""
    sql = "SELECT " + "a," * 198 + "b"
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E015",
        sql_content=sql,
    ); meta = codeflash_output  # 6.31μs -> 5.31μs (18.8% faster)

def test_edge_sql_content_just_over_200_chars():
    """Test when sql_content is 201 chars."""
    sql = "SELECT " + "a," * 199 + "b"
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E016",
        sql_content=sql,
    ); meta = codeflash_output  # 5.95μs -> 5.18μs (15.0% faster)

def test_edge_exception_with_only_newlines():
    """Test when exception message is only newlines."""
    exc = Exception("\n\n\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E017",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 6.82μs -> 5.16μs (32.1% faster)

def test_edge_exception_with_non_ascii_characters():
    """Test when exception message contains non-ASCII characters."""
    exc = Exception("Ошибка: Line 2, Col: 3\nПодсказка: Проверьте синтаксис")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E018",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 7.68μs -> 7.74μs (0.763% slower)

# --- Large Scale Test Cases ---

def test_large_scale_long_multiline_hint():
    """Test that a long multiline hint is preserved."""
    exc_msg = "Error on query\n" + "\n".join(f"Hint line {i}: info" for i in range(100))
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E019",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output  # 30.6μs -> 27.4μs (12.0% faster)

def test_large_scale_sql_content_truncation():
    """Test truncation of very large SQL content (999 columns)."""
    large_sql = "SELECT " + ", ".join(f"col{i}" for i in range(999))
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E020",
        sql_content=large_sql,
    ); meta = codeflash_output  # 6.35μs -> 5.73μs (10.9% faster)

def test_large_scale_many_exception_variants():
    """Test many different exception messages for robustness."""
    for i in range(1, 50):
        exc_msg = f"Error: Line {i}, Col: {i+1}\nHint: Info {i}"
        exc = Exception(exc_msg)
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E021",
            sql_content="SELECT 1;",
        ); meta = codeflash_output  # 96.9μs -> 89.7μs (8.03% faster)

def test_large_scale_ast_node_variety():
    """Test with a variety of AST nodes for line/col info."""
    code = "\n".join([f"x{i} = {i}" for i in range(50)])
    tree = ast.parse(code)
    for i, node in enumerate(tree.body):
        exc = Exception(f"Error: LINE {i+1}: near '='")
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E022",
            node=node.value,
            sql_content="SELECT 1;",
        ); meta = codeflash_output  # 104μs -> 73.8μs (41.0% faster)

def test_large_scale_context_variety():
    """Test with many different context strings."""
    for i in range(100):
        exc = Exception("Error: LINE 1: near 'SELECT'")
        context = f"context_{i}"
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E023",
            sql_content="SELECT 1;",
            context=context,
        ); meta = codeflash_output  # 193μs -> 132μs (45.2% faster)

def test_large_scale_hint_blank_lines():
    """Test hint extraction with many blank lines."""
    exc_msg = "Error\n" + "\n".join("" for _ in range(200)) + "Final hint"
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E024",
        sql_content="SELECT 1;",
    ); meta = codeflash_output  # 20.0μs -> 15.8μs (26.7% faster)

def test_large_scale_exception_with_varying_types():
    """Test with many different exception types."""
    class CustomErrorA(Exception): pass
    class CustomErrorB(Exception): pass
    class CustomErrorC(BaseException): pass

    exc_a = CustomErrorA("Error: Line 3, Col: 4")
    exc_b = CustomErrorB("Error: LINE 2:")
    exc_c = CustomErrorC("Error: line 10, col 20")

    codeflash_output = create_sql_error_metadata(
        exc_a,
        rule_code="E025",
        sql_content="SELECT 1;",
    ); meta_a = codeflash_output  # 5.92μs -> 4.78μs (23.9% faster)
    codeflash_output = create_sql_error_metadata(
        exc_b,
        rule_code="E025",
        sql_content="SELECT 1;",
    ); meta_b = codeflash_output  # 3.13μs -> 2.39μs (31.1% faster)
    codeflash_output = create_sql_error_metadata(
        exc_c,
        rule_code="E025",
        sql_content="SELECT 1;",
    ); meta_c = codeflash_output  # 4.65μs -> 2.68μs (73.3% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import ast
import re
from typing import Optional

# imports

import pytest  # used for our unit tests
from marimo._sql.error_utils import create_sql_error_metadata

# function to test

# Copyright 2025 Marimo. All rights reserved.

class SQLErrorMetadata:
    """
    Data class to hold SQL error metadata.
    """

    def __init__(
        self,
        lint_rule: str,
        error_type: str,
        clean_message: str,
        hint: Optional[str],
        node_lineno: int,
        node_col_offset: int,
        sql_statement: str,
        sql_line: Optional[int],
        sql_col: Optional[int],
        context: str,
    ):
        self.lint_rule = lint_rule
        self.error_type = error_type
        self.clean_message = clean_message
        self.hint = hint
        self.node_lineno = node_lineno
        self.node_col_offset = node_col_offset
        self.sql_statement = sql_statement
        self.sql_line = sql_line
        self.sql_col = sql_col
        self.context = context

    def __eq__(self, other):
        if not isinstance(other, SQLErrorMetadata):
            return False
        return (
            self.lint_rule == other.lint_rule and
            self.error_type == other.error_type and
            self.clean_message == other.clean_message and
            self.hint == other.hint and
            self.node_lineno == other.node_lineno and
            self.node_col_offset == other.node_col_offset and
            self.sql_statement == other.sql_statement and
            self.sql_line == other.sql_line and
            self.sql_col == other.sql_col and
            self.context == other.context
        )

    def __repr__(self):
        return (
            f"SQLErrorMetadata(lint_rule={self.lint_rule!r}, error_type={self.error_type!r}, "
            f"clean_message={self.clean_message!r}, hint={self.hint!r}, node_lineno={self.node_lineno!r}, "
            f"node_col_offset={self.node_col_offset!r}, sql_statement={self.sql_statement!r}, "
            f"sql_line={self.sql_line!r}, sql_col={self.sql_col!r}, context={self.context!r})"
        )

from marimo._sql.error_utils import create_sql_error_metadata

# unit tests

# -------- BASIC TEST CASES --------

def test_basic_sqlglot_format():
    # Test for "Line 1, Col: 15" format
    exc = Exception("Syntax error near 'SELECT'. Line 2, Col: 20")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E001",
        sql_content="SELECT * FROM table;",
        context="Parsing SQL"
    ); metadata = codeflash_output  # 6.29μs -> 5.38μs (16.9% faster)

def test_basic_duckdb_format():
    # Test for "LINE 4:" format
    exc = Exception('Error: syntax error at or near "FROM"\nLINE 4:')
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E002",
        sql_content="SELECT *\nFROM table;",
        context="DuckDB"
    ); metadata = codeflash_output  # 6.58μs -> 6.03μs (9.15% faster)

def test_basic_sqlglot_lowercase_format():
    # Test for "line 5, col 12" format (case insensitive)
    exc = Exception("Unexpected token. line 5, col 12")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E003",
        sql_content="SELECT * FROM users;",
        context="SQLGlot"
    ); metadata = codeflash_output  # 8.24μs -> 5.59μs (47.4% faster)

def test_basic_with_node():
    # Test with an ast node provided
    node = ast.parse("1+2").body[0].value  # ast.BinOp
    exc = Exception("Syntax error near '+'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E004",
        node=node,
        sql_content="SELECT 1 + 2;",
        context="AST context"
    ); metadata = codeflash_output  # 6.65μs -> 4.56μs (45.9% faster)

def test_basic_multiline_hint():
    # Test extraction of multiline hints
    exc = Exception("Error: something went wrong\nHint: check your SQL\nHint: use valid syntax")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E005",
        sql_content="SELECT * FROM x;",
        context="Hint context"
    ); metadata = codeflash_output  # 7.90μs -> 6.46μs (22.2% faster)

# -------- EDGE TEST CASES --------

def test_edge_no_position_info():
    # Test when exception message has no position info
    exc = Exception("General error, no position info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E006",
        sql_content="SELECT *;",
        context="No position"
    ); metadata = codeflash_output  # 6.71μs -> 4.42μs (51.9% faster)

def test_edge_empty_sql_content():
    # Test with empty SQL content
    exc = Exception("Error: missing SQL")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E007",
        sql_content="",
        context="Empty SQL"
    ); metadata = codeflash_output  # 6.33μs -> 4.11μs (53.9% faster)

def test_edge_long_sql_content_truncation():
    # Test SQL content longer than 200 chars gets truncated
    long_sql = "SELECT " + ",".join([f"col{i}" for i in range(250)]) + " FROM table;"
    exc = Exception("Error: too many columns")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E008",
        sql_content=long_sql,
        context="Truncation"
    ); metadata = codeflash_output  # 6.86μs -> 4.59μs (49.3% faster)

def test_edge_node_none():
    # Test when node is None, node_lineno and node_col_offset should be 0
    exc = Exception("Error: node is None")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E009",
        node=None,
        sql_content="SELECT 1;",
        context="Node None"
    ); metadata = codeflash_output  # 6.50μs -> 4.20μs (54.8% faster)

def test_edge_exception_type():
    # Test with a different exception type
    class MyCustomException(ValueError):
        pass
    exc = MyCustomException("Custom error\nHint: custom hint")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E010",
        sql_content="SELECT 1;",
        context="Custom Exception"
    ); metadata = codeflash_output  # 7.03μs -> 5.59μs (25.8% faster)

def test_edge_hint_with_empty_lines():
    # Test hint extraction with empty lines
    exc = Exception("Error: something\n\nHint: one\n\nHint: two\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E011",
        sql_content="SELECT 1;",
        context="Empty lines in hint"
    ); metadata = codeflash_output  # 7.88μs -> 6.30μs (25.1% faster)

def test_edge_only_newline_message():
    # Test message with only newlines
    exc = Exception("\n\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E012",
        sql_content="SELECT 1;",
        context="Only newline"
    ); metadata = codeflash_output  # 6.54μs -> 5.02μs (30.3% faster)

def test_large_scale_many_columns():
    # SQL with many columns, triggers truncation
    cols = [f"col{i}" for i in range(999)]
    sql = "SELECT " + ",".join(cols) + " FROM table;"
    exc = Exception("Error: large SQL\nLine 1, Col: 100")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E014",
        sql_content=sql,
        context="Large SQL"
    ); metadata = codeflash_output  # 8.66μs -> 8.19μs (5.74% faster)

def test_large_scale_long_multiline_hint():
    # Exception message with many hint lines
    hint_lines = [f"Hint: {i}" for i in range(999)]
    exc_msg = "Error: lots of hints\n" + "\n".join(hint_lines)
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E015",
        sql_content="SELECT 1;",
        context="Many hints"
    ); metadata = codeflash_output  # 124μs -> 109μs (14.5% faster)

def test_large_scale_long_context():
    # Very long context string
    context = "A" * 999
    exc = Exception("Error: large context\nLINE 10:")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E016",
        sql_content="SELECT 1;",
        context=context
    ); metadata = codeflash_output  # 7.06μs -> 6.47μs (9.15% faster)

def test_large_scale_ast_node():
    # Node with large lineno/col_offset
    node = ast.parse("1+2").body[0].value
    node.lineno = 999
    node.col_offset = 888
    exc = Exception("Error: node info\nLine 100, Col: 200")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E017",
        node=node,
        sql_content="SELECT 1;",
        context="Large node"
    ); metadata = codeflash_output  # 7.06μs -> 6.44μs (9.63% faster)

def test_large_scale_sql_content_exact_200():
    # SQL content exactly 200 chars, should not truncate
    sql_content = "A" * 200
    exc = Exception("Error: test")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E018",
        sql_content=sql_content,
        context="Exact 200"
    ); metadata = codeflash_output  # 6.36μs -> 4.00μs (59.2% faster)

def test_large_scale_sql_content_just_over_200():
    # SQL content just over 200 chars, should truncate
    sql_content = "B" * 201
    exc = Exception("Error: test")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E019",
        sql_content=sql_content,
        context="Just over 200"
    ); metadata = codeflash_output  # 6.50μs -> 4.16μs (56.1% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-create_sql_error_metadata-mhu35r1e` and push.


codeflash-ai bot requested a review from mashraf-222 November 11, 2025 04:43

codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Nov 11, 2025
