⚡️ Speed up function create_sql_error_metadata by 27%
#584
📄 27% (0.27x) speedup for create_sql_error_metadata in marimo/_sql/error_utils.py
⏱️ Runtime: 831 microseconds → 653 microseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves a 27% speedup through two key optimizations that reduce Python's regex compilation overhead and string processing costs:
Key Optimizations:
1. Precompiled Regex Patterns
The original code calls re.search() with raw pattern strings each time, so Python has to resolve the pattern again on every function call (a lookup in the re module's pattern cache, or a full recompile on a cache miss). The optimization precompiles three regex patterns at module level.
Performance Impact: Line profiler shows _extract_sql_position time drops from 1.98ms to 0.64ms (a 67% reduction) - the per-call pattern handling overhead was consuming ~60% of the function's runtime.
2. Optimized String Processing for Hints
The original code splits the entire exception message into lines with exception_msg.split("\n"), then processes all lines even when no hints exist. The optimization uses str.find() to locate the first newline, then processes only the remaining content when needed.
Performance Impact: This reduces unnecessary string operations, which is particularly beneficial when no hints exist (176/241 test cases had no hints).
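The hint-processing change can be illustrated with a minimal sketch. The function names and the exact hint rules below (first line is the message, remaining non-blank lines are stripped and joined as the hint) are assumptions made for illustration; they are not taken from marimo/_sql/error_utils.py.

```python
from typing import Optional, Tuple


def split_message_and_hint_original(msg: str) -> Tuple[str, Optional[str]]:
    """Hypothetical 'before' version: always splits the whole message."""
    lines = msg.split("\n")
    clean = lines[0].strip()
    hint_lines = [ln.strip() for ln in lines[1:] if ln.strip()]
    return clean, ("\n".join(hint_lines) if hint_lines else None)


def split_message_and_hint_fast(msg: str) -> Tuple[str, Optional[str]]:
    """Hypothetical 'after' version: find() first, split only when needed."""
    newline = msg.find("\n")
    if newline == -1:
        # Single-line message: no list of lines is ever built.
        return msg.strip(), None
    clean = msg[:newline].strip()
    rest = msg[newline + 1:]
    hint_lines = [ln.strip() for ln in rest.split("\n") if ln.strip()]
    return clean, ("\n".join(hint_lines) if hint_lines else None)
```

Under these assumptions both functions return the same result for any input; the second one simply skips the split and the list comprehension when the message has no newline, which is the common case reported above.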
Test Case Performance Analysis:
The optimizations are particularly effective for SQL error parsing workloads where the same regex patterns are applied repeatedly to analyze exception messages, making this ideal for SQL linting or error handling systems that process many SQL statements.
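The precompiled-pattern optimization described above can be sketched as follows. The pattern strings, constant names, and the extract_position helper are hypothetical, inferred only from the message formats exercised in the tests ('Line 1, Col: 15', 'LINE 4:', 'line 5, col 10'); the real _extract_sql_position may differ, for example in how it converts or offsets the numbers.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, compiled once at import time instead of being
# passed as raw strings to re.search() on every call.
_LINE_COL_RE = re.compile(r"Line (\d+), Col: (\d+)")
_DUCKDB_LINE_RE = re.compile(r"LINE (\d+):")
_LINE_COL_LOWER_RE = re.compile(r"line (\d+), col (\d+)", re.IGNORECASE)


def extract_position(message: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (line, col) as written in the message, or None where absent."""
    match = _LINE_COL_RE.search(message)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = _DUCKDB_LINE_RE.search(message)
    if match:
        return int(match.group(1)), None
    match = _LINE_COL_LOWER_RE.search(message)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None, None
```

Calling the search method on a compiled pattern object avoids the per-call pattern lookup that the module-level re.search() performs, which matters most when the function is invoked many times on short messages.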
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import ast
import re
from typing import Optional
# imports
import pytest # used for our unit tests
from marimo._sql.error_utils import create_sql_error_metadata
# function to test
# Copyright 2025 Marimo. All rights reserved.
class SQLErrorMetadata:
    """Simple dataclass-like structure for SQL error metadata."""
    def __init__(
        self,
        lint_rule: str,
        error_type: str,
        clean_message: str,
        hint: Optional[str],
        node_lineno: int,
        node_col_offset: int,
        sql_statement: str,
        sql_line: Optional[int],
        sql_col: Optional[int],
        context: str,
    ):
        self.lint_rule = lint_rule
        self.error_type = error_type
        self.clean_message = clean_message
        self.hint = hint
        self.node_lineno = node_lineno
        self.node_col_offset = node_col_offset
        self.sql_statement = sql_statement
        self.sql_line = sql_line
        self.sql_col = sql_col
        self.context = context
from marimo._sql.error_utils import create_sql_error_metadata
# unit tests
# --- Basic Test Cases ---
def test_basic_sqlglot_line_col_extraction():
    """Test extraction from SqlGlot format: 'Line 1, Col: 15'"""
    exc = Exception("Parse error: Line 2, Col: 5")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E001",
        sql_content="SELECT * FROM table;",
        context="test_context"
    ); meta = codeflash_output # 6.27μs -> 5.29μs (18.7% faster)

def test_basic_duckdb_line_only_extraction():
    """Test extraction from DuckDB format: 'LINE 4:'"""
    exc = Exception("Syntax error: LINE 3: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E002",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output # 6.20μs -> 5.03μs (23.3% faster)

def test_basic_sqlglot_lowercase_format():
    """Test extraction from SQLGlot format: 'line 5, col 10' (lowercase)"""
    exc = Exception("error: line 5, col 10: invalid syntax")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E003",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output # 8.24μs -> 5.66μs (45.4% faster)

def test_basic_multiline_hint():
    """Test extraction of hints from multiline exception messages."""
    exc = Exception("Error: Something went wrong\nHint: Try using a valid column\nDetails: More info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E004",
        sql_content="SELECT col FROM table;",
    ); meta = codeflash_output # 8.13μs -> 6.38μs (27.3% faster)

def test_basic_node_info():
    """Test passing an AST node and extracting its line/col info."""
    node = ast.parse("x = 1 + 2").body[0].value # ast.BinOp
    exc = Exception("Error: LINE 1: near '+'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E005",
        node=node,
        sql_content="SELECT 1 + 2;",
    ); meta = codeflash_output # 6.49μs -> 5.20μs (24.9% faster)

def test_basic_empty_sql_content():
    """Test when sql_content is empty."""
    exc = Exception("Error: LINE 1: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E006",
        sql_content="",
    ); meta = codeflash_output # 6.02μs -> 4.76μs (26.4% faster)

def test_basic_no_context():
    """Test when context is not provided."""
    exc = Exception("Parse error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E007",
        sql_content="SELECT * FROM t;",
    ); meta = codeflash_output # 5.79μs -> 4.74μs (22.1% faster)
# --- Edge Test Cases ---
def test_edge_no_line_col_in_exception():
    """Test when exception message has no line/col info."""
    exc = Exception("General SQL error: something failed")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E008",
        sql_content="SELECT * FROM t;",
    ); meta = codeflash_output # 7.17μs -> 4.86μs (47.6% faster)

def test_edge_empty_exception_message():
    """Test when exception message is empty."""
    exc = Exception("")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E009",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 5.88μs -> 3.58μs (64.2% faster)

def test_edge_long_sql_content_truncation():
    """Test that long SQL content is truncated to 200 chars."""
    long_sql = "SELECT " + ", ".join(f"col{i}" for i in range(250))
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E010",
        sql_content=long_sql,
    ); meta = codeflash_output # 6.36μs -> 5.40μs (17.6% faster)

def test_edge_hint_with_leading_trailing_spaces():
    """Test that hint lines are stripped of leading/trailing spaces."""
    exc = Exception("Error\n Hint: Check this \n Details: Info ")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E011",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 8.01μs -> 6.26μs (27.9% faster)

def test_edge_node_is_none():
    """Test when node is None, node_lineno and node_col_offset are 0."""
    exc = Exception("Error: LINE 1: near 'SELECT'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E012",
        node=None,
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 6.31μs -> 5.00μs (26.2% faster)

def test_edge_non_exception_type():
    """Test when input is a subclass of BaseException."""
    class CustomError(BaseException):
        def __str__(self):
            return "Custom error: Line 5, Col: 7"
    exc = CustomError()
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E013",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 6.30μs -> 5.15μs (22.2% faster)

def test_edge_multiline_exception_message_with_blank_lines():
    """Test when exception message has blank lines."""
    exc = Exception("Error occurred\n\nHint: Try again\n\nDetails: Info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E014",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 8.16μs -> 6.23μs (31.1% faster)

def test_edge_sql_content_exactly_200_chars():
    """Test the 200-char boundary (note: as built, this string is 404 chars, not 200)."""
    sql = "SELECT " + "a," * 198 + "b"
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E015",
        sql_content=sql,
    ); meta = codeflash_output # 6.31μs -> 5.31μs (18.8% faster)

def test_edge_sql_content_just_over_200_chars():
    """Test just over the 200-char boundary (note: as built, this string is 406 chars, not 201)."""
    sql = "SELECT " + "a," * 199 + "b"
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E016",
        sql_content=sql,
    ); meta = codeflash_output # 5.95μs -> 5.18μs (15.0% faster)

def test_edge_exception_with_only_newlines():
    """Test when exception message is only newlines."""
    exc = Exception("\n\n\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E017",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 6.82μs -> 5.16μs (32.1% faster)

def test_edge_exception_with_non_ascii_characters():
    """Test when exception message contains non-ASCII characters."""
    exc = Exception("Ошибка: Line 2, Col: 3\nПодсказка: Проверьте синтаксис")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E018",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 7.68μs -> 7.74μs (0.763% slower)
# --- Large Scale Test Cases ---
def test_large_scale_long_multiline_hint():
    """Test that a long multiline hint is preserved."""
    exc_msg = "Error on query\n" + "\n".join(f"Hint line {i}: info" for i in range(100))
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E019",
        sql_content="SELECT * FROM table;",
    ); meta = codeflash_output # 30.6μs -> 27.4μs (12.0% faster)

def test_large_scale_sql_content_truncation():
    """Test truncation of very large SQL content (999 columns)."""
    large_sql = "SELECT " + ", ".join(f"col{i}" for i in range(999))
    exc = Exception("Error: Line 1, Col: 1")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E020",
        sql_content=large_sql,
    ); meta = codeflash_output # 6.35μs -> 5.73μs (10.9% faster)

def test_large_scale_many_exception_variants():
    """Test many different exception messages for robustness."""
    for i in range(1, 50):
        exc_msg = f"Error: Line {i}, Col: {i+1}\nHint: Info {i}"
        exc = Exception(exc_msg)
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E021",
            sql_content="SELECT 1;",
        ); meta = codeflash_output # 96.9μs -> 89.7μs (8.03% faster)

def test_large_scale_ast_node_variety():
    """Test with a variety of AST nodes for line/col info."""
    code = "\n".join([f"x{i} = {i}" for i in range(50)])
    tree = ast.parse(code)
    for i, node in enumerate(tree.body):
        exc = Exception(f"Error: LINE {i+1}: near '='")
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E022",
            node=node.value,
            sql_content="SELECT 1;",
        ); meta = codeflash_output # 104μs -> 73.8μs (41.0% faster)

def test_large_scale_context_variety():
    """Test with many different context strings."""
    for i in range(100):
        exc = Exception("Error: LINE 1: near 'SELECT'")
        context = f"context_{i}"
        codeflash_output = create_sql_error_metadata(
            exc,
            rule_code="E023",
            sql_content="SELECT 1;",
            context=context,
        ); meta = codeflash_output # 193μs -> 132μs (45.2% faster)

def test_large_scale_hint_blank_lines():
    """Test hint extraction with many blank lines."""
    exc_msg = "Error\n" + "\n".join("" for _ in range(200)) + "Final hint"
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E024",
        sql_content="SELECT 1;",
    ); meta = codeflash_output # 20.0μs -> 15.8μs (26.7% faster)

def test_large_scale_exception_with_varying_types():
    """Test with many different exception types."""
    class CustomErrorA(Exception): pass
    class CustomErrorB(Exception): pass
    class CustomErrorC(BaseException): pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import ast
import re
from typing import Optional
# imports
import pytest # used for our unit tests
from marimo._sql.error_utils import create_sql_error_metadata
# function to test
# Copyright 2025 Marimo. All rights reserved.
class SQLErrorMetadata:
    """
    Data class to hold SQL error metadata.
    """
    def __init__(
        self,
        lint_rule: str,
        error_type: str,
        clean_message: str,
        hint: Optional[str],
        node_lineno: int,
        node_col_offset: int,
        sql_statement: str,
        sql_line: Optional[int],
        sql_col: Optional[int],
        context: str,
    ):
        self.lint_rule = lint_rule
        self.error_type = error_type
        self.clean_message = clean_message
        self.hint = hint
        self.node_lineno = node_lineno
        self.node_col_offset = node_col_offset
        self.sql_statement = sql_statement
        self.sql_line = sql_line
        self.sql_col = sql_col
        self.context = context
from marimo._sql.error_utils import create_sql_error_metadata
# unit tests
# -------- BASIC TEST CASES --------
def test_basic_sqlglot_format():
    # Test for "Line 1, Col: 15" format
    exc = Exception("Syntax error near 'SELECT'. Line 2, Col: 20")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E001",
        sql_content="SELECT * FROM table;",
        context="Parsing SQL"
    ); metadata = codeflash_output # 6.29μs -> 5.38μs (16.9% faster)

def test_basic_duckdb_format():
    # Test for "LINE 4:" format
    exc = Exception('Error: syntax error at or near "FROM"\nLINE 4:')
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E002",
        sql_content="SELECT *\nFROM table;",
        context="DuckDB"
    ); metadata = codeflash_output # 6.58μs -> 6.03μs (9.15% faster)

def test_basic_sqlglot_lowercase_format():
    # Test for "line 5, col 12" format (case insensitive)
    exc = Exception("Unexpected token. line 5, col 12")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E003",
        sql_content="SELECT * FROM users;",
        context="SQLGlot"
    ); metadata = codeflash_output # 8.24μs -> 5.59μs (47.4% faster)

def test_basic_with_node():
    # Test with an ast node provided
    node = ast.parse("1+2").body[0].value # ast.BinOp
    exc = Exception("Syntax error near '+'")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E004",
        node=node,
        sql_content="SELECT 1 + 2;",
        context="AST context"
    ); metadata = codeflash_output # 6.65μs -> 4.56μs (45.9% faster)

def test_basic_multiline_hint():
    # Test extraction of multiline hints
    exc = Exception("Error: something went wrong\nHint: check your SQL\nHint: use valid syntax")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E005",
        sql_content="SELECT * FROM x;",
        context="Hint context"
    ); metadata = codeflash_output # 7.90μs -> 6.46μs (22.2% faster)
# -------- EDGE TEST CASES --------
def test_edge_no_position_info():
    # Test when exception message has no position info
    exc = Exception("General error, no position info")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E006",
        sql_content="SELECT *;",
        context="No position"
    ); metadata = codeflash_output # 6.71μs -> 4.42μs (51.9% faster)

def test_edge_empty_sql_content():
    # Test with empty SQL content
    exc = Exception("Error: missing SQL")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E007",
        sql_content="",
        context="Empty SQL"
    ); metadata = codeflash_output # 6.33μs -> 4.11μs (53.9% faster)

def test_edge_long_sql_content_truncation():
    # Test SQL content longer than 200 chars gets truncated
    long_sql = "SELECT " + ",".join([f"col{i}" for i in range(250)]) + " FROM table;"
    exc = Exception("Error: too many columns")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E008",
        sql_content=long_sql,
        context="Truncation"
    ); metadata = codeflash_output # 6.86μs -> 4.59μs (49.3% faster)

def test_edge_node_none():
    # Test when node is None, node_lineno and node_col_offset should be 0
    exc = Exception("Error: node is None")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E009",
        node=None,
        sql_content="SELECT 1;",
        context="Node None"
    ); metadata = codeflash_output # 6.50μs -> 4.20μs (54.8% faster)

def test_edge_exception_type():
    # Test with a different exception type
    class MyCustomException(ValueError):
        pass
    exc = MyCustomException("Custom error\nHint: custom hint")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E010",
        sql_content="SELECT 1;",
        context="Custom Exception"
    ); metadata = codeflash_output # 7.03μs -> 5.59μs (25.8% faster)

def test_edge_hint_with_empty_lines():
    # Test hint extraction with empty lines
    exc = Exception("Error: something\n\nHint: one\n\nHint: two\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E011",
        sql_content="SELECT 1;",
        context="Empty lines in hint"
    ); metadata = codeflash_output # 7.88μs -> 6.30μs (25.1% faster)

def test_edge_only_newline_message():
    # Test message with only newlines
    exc = Exception("\n\n")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E012",
        sql_content="SELECT 1;",
        context="Only newline"
    ); metadata = codeflash_output # 6.54μs -> 5.02μs (30.3% faster)
def test_large_scale_many_columns():
    # SQL with many columns, triggers truncation
    cols = [f"col{i}" for i in range(999)]
    sql = "SELECT " + ",".join(cols) + " FROM table;"
    exc = Exception("Error: large SQL\nLine 1, Col: 100")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E014",
        sql_content=sql,
        context="Large SQL"
    ); metadata = codeflash_output # 8.66μs -> 8.19μs (5.74% faster)

def test_large_scale_long_multiline_hint():
    # Exception message with many hint lines
    hint_lines = [f"Hint: {i}" for i in range(999)]
    exc_msg = "Error: lots of hints\n" + "\n".join(hint_lines)
    exc = Exception(exc_msg)
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E015",
        sql_content="SELECT 1;",
        context="Many hints"
    ); metadata = codeflash_output # 124μs -> 109μs (14.5% faster)

def test_large_scale_long_context():
    # Very long context string
    context = "A" * 999
    exc = Exception("Error: large context\nLINE 10:")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E016",
        sql_content="SELECT 1;",
        context=context
    ); metadata = codeflash_output # 7.06μs -> 6.47μs (9.15% faster)

def test_large_scale_ast_node():
    # Node with large lineno/col_offset
    node = ast.parse("1+2").body[0].value
    node.lineno = 999
    node.col_offset = 888
    exc = Exception("Error: node info\nLine 100, Col: 200")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E017",
        node=node,
        sql_content="SELECT 1;",
        context="Large node"
    ); metadata = codeflash_output # 7.06μs -> 6.44μs (9.63% faster)

def test_large_scale_sql_content_exact_200():
    # SQL content exactly 200 chars, should not truncate
    sql_content = "A" * 200
    exc = Exception("Error: test")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E018",
        sql_content=sql_content,
        context="Exact 200"
    ); metadata = codeflash_output # 6.36μs -> 4.00μs (59.2% faster)

def test_large_scale_sql_content_just_over_200():
    # SQL content just over 200 chars, should truncate
    sql_content = "B" * 201
    exc = Exception("Error: test")
    codeflash_output = create_sql_error_metadata(
        exc,
        rule_code="E019",
        sql_content=sql_content,
        context="Just over 200"
    ); metadata = codeflash_output # 6.50μs -> 4.16μs (56.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes, git checkout codeflash/optimize-create_sql_error_metadata-mhu35r1e and push.