⚡️ Speed up method BreaklessListsPreprocessor.run by 40%
#595
📄 40% (0.40x) speedup for BreaklessListsPreprocessor.run in marimo/_output/md_extensions/breakless_lists.py
⏱️ Runtime: 2.80 milliseconds → 2.00 milliseconds (best of 203 runs)
📝 Explanation and details
The optimized code achieves a 40% speedup through three key micro-optimizations that reduce Python's overhead in tight loops:
What optimizations were applied:
- Localized method lookups: `append = result_lines.append` and `match_list_start = self.LIST_START_PATTERN.match` avoid repeated attribute lookups during iteration.
- Replaced the while loop with a for loop: `while i < len(lines)` became `for i in range(length)`, which eliminates redundant `len()` calls and manual index incrementing.
- Removed a redundant condition: the `current_line.strip()` check that was duplicated in the blank line insertion logic.
Why these optimizations provide speedup:
- Each attribute lookup (`self.LIST_START_PATTERN.match`, `result_lines.append`) has overhead. Localizing these to variables turns them into local variable lookups, which are significantly faster in loops.
- `for` loops over `range()` are more efficient than `while` loops with manual indexing because, in CPython, the iteration and index increment are handled inside the interpreter rather than by Python-level bytecode.
- Eliminating repeated `len(lines)` calls and redundant `.strip()` checks reduces CPU cycles per iteration.
Impact on workloads:
The optimizations are most beneficial for large-scale document processing. In the test results below, large documents (roughly 750-1250 lines) see 41-50% speedups, while small documents (1-5 lines) range from roughly unchanged to about 50% slower, because the fixed cost of the extra variable assignments is not amortized over many iterations. The change is therefore worthwhile when the function mostly processes substantial markdown documents, where total document size matters more than per-call overhead.
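The three optimizations described above can be sketched as a before/after pair. This is a minimal illustration, not the marimo source: the `LIST_START` regex, the function names `run_baseline` and `run_optimized`, and the exact blank-line condition are assumptions made so the example is self-contained.

```python
import re

# Hypothetical pattern; the real LIST_START_PATTERN is not shown in this PR.
LIST_START = re.compile(r"^[ \t]*(?:[*+-]|\d+\.)[ \t]+")


def run_baseline(lines):
    """While-loop version: len() and attribute lookups are repeated each pass."""
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if (
            i > 0
            and LIST_START.match(line)
            and lines[i - 1].strip()
            and not LIST_START.match(lines[i - 1])
        ):
            # A list starts right after a non-blank, non-list line.
            result.append("")
        result.append(line)
        i += 1
    return result


def run_optimized(lines):
    """Same logic with bound methods cached in locals and a range() loop."""
    result = []
    append = result.append      # one attribute lookup instead of one per line
    match = LIST_START.match    # likewise for the regex method
    for i in range(len(lines)):  # len() evaluated once
        line = lines[i]
        if i > 0 and match(line):
            prev = lines[i - 1]
            if prev.strip() and not match(prev):
                append("")
        append(line)
    return result
```

Both functions return the same list for the same input; only the per-iteration overhead differs, which is why the benefit grows with document length.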
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import re
# imports
import pytest  # used for our unit tests
from marimo._output.md_extensions.breakless_lists import BreaklessListsPreprocessor
from markdown import Markdown, preprocessors
# unit tests
class DummyMarkdown(Markdown):
    """Dummy Markdown class for instantiating the preprocessor."""
    pass
@pytest.fixture
def preprocessor():
    # Provide a fresh preprocessor for each test
    return BreaklessListsPreprocessor(DummyMarkdown())
# --------------------- Basic Test Cases ---------------------
def test_empty_input(preprocessor):
# Should return empty list when input is empty
codeflash_output = preprocessor.run([]) # 414ns -> 462ns (10.4% slower)
def test_single_line_no_list(preprocessor):
# Single line that is not a list
codeflash_output = preprocessor.run(["Hello world"]) # 1.07μs -> 2.19μs (51.3% slower)
def test_single_list_line(preprocessor):
# Single list item; should not add blank lines
codeflash_output = preprocessor.run(["- item"]) # 1.03μs -> 1.93μs (46.5% slower)
def test_paragraph_followed_by_list(preprocessor):
# Paragraph directly followed by a list; should insert a blank line
lines = [
"This is a paragraph.",
"- item 1",
"- item 2"
]
expected = [
"This is a paragraph.",
"",
"- item 1",
"- item 2"
]
codeflash_output = preprocessor.run(lines) # 4.42μs -> 4.54μs (2.53% slower)
def test_paragraph_list_with_existing_blank_line(preprocessor):
# Paragraph, blank line, then list; should not add an extra blank line
lines = [
"This is a paragraph.",
"",
"- item 1"
]
expected = [
"This is a paragraph.",
"",
"- item 1"
]
codeflash_output = preprocessor.run(lines) # 2.50μs -> 2.94μs (15.0% slower)
def test_multiple_lists_interrupted_by_paragraphs(preprocessor):
# Paragraph, list, paragraph, list, all needing blank lines
lines = [
"Para1.",
"- item 1",
"Para2.",
"1. item 2"
]
expected = [
"Para1.",
"",
"- item 1",
"Para2.",
"",
"1. item 2"
]
codeflash_output = preprocessor.run(lines) # 4.83μs -> 5.06μs (4.70% slower)
def test_multiple_paragraphs_and_lists(preprocessor):
# Paragraph, list, paragraph, list, with blank lines already present
lines = [
"Para1.",
"",
"- item 1",
"",
"Para2.",
"",
"1. item 2"
]
expected = [
"Para1.",
"",
"- item 1",
"",
"Para2.",
"",
"1. item 2"
]
codeflash_output = preprocessor.run(lines) # 3.28μs -> 3.50μs (6.34% slower)
def test_list_after_code_block(preprocessor):
# List after code block (code block lines start with 4 spaces)
lines = [
"    code block",
"- item"
]
expected = [
"    code block",
"",
"- item"
]
codeflash_output = preprocessor.run(lines) # 3.37μs -> 3.77μs (10.8% slower)
def test_list_with_indentation(preprocessor):
# List with indentation should be detected
lines = [
"Paragraph.",
" - indented item"
]
expected = [
"Paragraph.",
"",
" - indented item"
]
codeflash_output = preprocessor.run(lines) # 3.12μs -> 3.40μs (8.10% slower)
def test_ordered_list_followed_by_unordered_list(preprocessor):
# Ordered list followed by unordered list, both after paragraphs
lines = [
"Paragraph.",
"1. item one",
"Another paragraph.",
"* item two"
]
expected = [
"Paragraph.",
"",
"1. item one",
"Another paragraph.",
"",
"* item two"
]
codeflash_output = preprocessor.run(lines) # 4.59μs -> 4.87μs (5.63% slower)
def test_unordered_list_symbols(preprocessor):
# Test all unordered list symbols (*, -, +)
lines = [
"Para.",
"* star item",
"Para.",
"- dash item",
"Para.",
"+ plus item"
]
expected = [
"Para.",
"",
"* star item",
"Para.",
"",
"- dash item",
"Para.",
"",
"+ plus item"
]
codeflash_output = preprocessor.run(lines) # 4.96μs -> 4.96μs (0.101% slower)
# --------------------- Edge Test Cases ---------------------
def test_list_at_start_of_document(preprocessor):
# List at start should not have blank line inserted
lines = [
"- item 1",
"- item 2"
]
expected = [
"- item 1",
"- item 2"
]
codeflash_output = preprocessor.run(lines) # 2.92μs -> 3.38μs (13.5% slower)
def test_list_after_blank_line(preprocessor):
# List after blank line should not get extra blank line
lines = [
"",
"- item 1"
]
expected = [
"",
"- item 1"
]
codeflash_output = preprocessor.run(lines) # 1.50μs -> 2.17μs (30.8% slower)
def test_blank_line_between_paragraph_and_list(preprocessor):
# Paragraph, blank line, list; should not add extra blank line
lines = [
"Paragraph.",
"",
"- item"
]
expected = [
"Paragraph.",
"",
"- item"
]
codeflash_output = preprocessor.run(lines) # 2.46μs -> 3.01μs (18.1% slower)
def test_multiple_consecutive_lists(preprocessor):
# Two lists directly after each other; only the first after a paragraph gets a blank line
lines = [
"Paragraph.",
"- item 1",
"- item 2",
"1. item 3"
]
expected = [
"Paragraph.",
"",
"- item 1",
"- item 2",
"1. item 3"
]
codeflash_output = preprocessor.run(lines) # 4.80μs -> 5.03μs (4.51% slower)
def test_multiple_blank_lines_between_paragraph_and_list(preprocessor):
# Paragraph, multiple blank lines, list; should not add extra blank line
lines = [
"Paragraph.",
"",
"",
"- item"
]
expected = [
"Paragraph.",
"",
"",
"- item"
]
codeflash_output = preprocessor.run(lines) # 2.56μs -> 2.99μs (14.4% slower)
def test_list_with_multiple_spaces(preprocessor):
# List item with multiple spaces between marker and content
lines = [
"Para.",
"- spaced item"
]
expected = [
"Para.",
"",
"- spaced item"
]
codeflash_output = preprocessor.run(lines) # 3.15μs -> 3.79μs (16.9% slower)
def test_list_marker_in_paragraph(preprocessor):
# List marker in the middle of a paragraph should not be treated as a list
lines = [
"This is not a list: - just text",
"1. This is a list"
]
expected = [
"This is not a list: - just text",
"",
"1. This is a list"
]
codeflash_output = preprocessor.run(lines) # 3.32μs -> 3.72μs (10.8% slower)
def test_paragraph_with_whitespace(preprocessor):
# Paragraph with trailing whitespace before list
lines = [
"Paragraph. ",
"- item"
]
expected = [
"Paragraph. ",
"",
"- item"
]
codeflash_output = preprocessor.run(lines) # 3.40μs -> 3.77μs (9.99% slower)
def test_list_with_large_number_marker(preprocessor):
# Ordered list with large number marker
lines = [
"Para.",
"1234567890. big number"
]
expected = [
"Para.",
"",
"1234567890. big number"
]
codeflash_output = preprocessor.run(lines) # 3.37μs -> 3.64μs (7.34% slower)
def test_list_with_leading_tabs(preprocessor):
# List with leading tabs
lines = [
"Para.",
"\t- tabbed item"
]
expected = [
"Para.",
"",
"\t- tabbed item"
]
codeflash_output = preprocessor.run(lines) # 3.12μs -> 3.66μs (14.7% slower)
def test_list_with_nonstandard_whitespace(preprocessor):
# List with mixed whitespace (tabs and spaces)
lines = [
"Para.",
" \t - mixed whitespace"
]
expected = [
"Para.",
"",
" \t - mixed whitespace"
]
codeflash_output = preprocessor.run(lines) # 3.14μs -> 3.65μs (14.0% slower)
def test_list_marker_with_no_space(preprocessor):
# List marker with no space after marker is NOT a list
lines = [
"Para.",
"-item not a list"
]
expected = [
"Para.",
"-item not a list"
]
codeflash_output = preprocessor.run(lines) # 2.82μs -> 3.46μs (18.7% slower)
def test_list_marker_with_multiple_spaces_before_marker(preprocessor):
# List marker with multiple spaces before marker
lines = [
"Para.",
" - indented item"
]
expected = [
"Para.",
"",
" - indented item"
]
codeflash_output = preprocessor.run(lines) # 3.21μs -> 3.74μs (14.4% slower)
def test_list_marker_with_unicode_whitespace(preprocessor):
# List marker with unicode whitespace (should not match)
lines = [
"Para.",
"\u2003- unicode space item"
]
expected = [
"Para.",
"\u2003- unicode space item"
]
codeflash_output = preprocessor.run(lines) # 3.70μs -> 4.28μs (13.5% slower)
def test_list_marker_with_leading_blank_line(preprocessor):
# List preceded by a blank line, after paragraph
lines = [
"Para.",
"",
"1. item"
]
expected = [
"Para.",
"",
"1. item"
]
codeflash_output = preprocessor.run(lines) # 2.43μs -> 2.91μs (16.4% slower)
def test_list_marker_with_trailing_blank_line(preprocessor):
# List followed by blank line
lines = [
"Para.",
"- item",
""
]
expected = [
"Para.",
"",
"- item",
""
]
codeflash_output = preprocessor.run(lines) # 3.70μs -> 4.25μs (12.8% slower)
def test_list_marker_with_only_spaces(preprocessor):
# List marker with only spaces (should not match as a list)
lines = [
"Para.",
" "
]
expected = [
"Para.",
" "
]
codeflash_output = preprocessor.run(lines) # 3.21μs -> 3.61μs (11.1% slower)
# --------------------- Large Scale Test Cases ---------------------
def test_large_document_with_many_paragraphs_and_lists(preprocessor):
# Large document alternating paragraphs and lists
lines = []
expected = []
for i in range(500):
lines.append(f"Paragraph {i}")
lines.append(f"- item {i}")
expected.append(f"Paragraph {i}")
expected.append("")
expected.append(f"- item {i}")
codeflash_output = preprocessor.run(lines) # 308μs -> 206μs (49.5% faster)
def test_large_document_with_no_lists(preprocessor):
# Large document with no lists; should remain unchanged
lines = [f"Paragraph {i}" for i in range(1000)]
codeflash_output = preprocessor.run(lines) # 235μs -> 165μs (42.6% faster)
def test_large_document_with_all_lists(preprocessor):
# Large document with only lists; should remain unchanged
lines = [f"- item {i}" for i in range(1000)]
codeflash_output = preprocessor.run(lines) # 306μs -> 209μs (46.1% faster)
def test_large_document_with_lists_after_paragraphs_and_blank_lines(preprocessor):
# Large document with paragraphs, blank lines, and lists
lines = []
expected = []
for i in range(333):
lines.append(f"Paragraph {i}")
lines.append("")
lines.append(f"- item {i}")
expected.append(f"Paragraph {i}")
expected.append("")
expected.append(f"- item {i}")
codeflash_output = preprocessor.run(lines) # 195μs -> 131μs (48.6% faster)
def test_large_document_with_mixed_content(preprocessor):
# Large document with paragraphs, code blocks, lists, and blank lines
lines = []
expected = []
for i in range(250):
lines.append(f"Paragraph {i}")
lines.append(" code block")
lines.append(f"- item {i}")
expected.append(f"Paragraph {i}")
expected.append("")
expected.append(" code block")
expected.append("")
expected.append(f"- item {i}")
codeflash_output = preprocessor.run(lines) # 252μs -> 177μs (42.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
# imports
import pytest
from marimo._output.md_extensions.breakless_lists import BreaklessListsPreprocessor
from markdown import Markdown, preprocessors
# unit tests
class DummyMarkdown:
    """A dummy Markdown object for testing purposes."""
    pass
@pytest.fixture
def preprocessor():
    # Provides a fresh preprocessor for each test
    return BreaklessListsPreprocessor(DummyMarkdown())
# ------------------- Basic Test Cases -------------------
def test_empty_input_returns_empty(preprocessor):
# Should return an empty list when given an empty list
codeflash_output = preprocessor.run([]) # 399ns -> 393ns (1.53% faster)
def test_single_line_no_list(preprocessor):
# Single non-list line should be unchanged
codeflash_output = preprocessor.run(["Hello world"]) # 1.03μs -> 2.13μs (51.9% slower)
def test_single_list_item(preprocessor):
# Single list item line should be unchanged
codeflash_output = preprocessor.run(["- item"]) # 1.02μs -> 1.87μs (45.5% slower)
def test_paragraph_followed_by_list(preprocessor):
# Should insert a blank line between paragraph and list
lines = [
"This is a paragraph.",
"- list item 1",
"- list item 2"
]
expected = [
"This is a paragraph.",
"",
"- list item 1",
"- list item 2"
]
codeflash_output = preprocessor.run(lines) # 3.98μs -> 4.26μs (6.39% slower)
def test_paragraph_followed_by_ordered_list(preprocessor):
# Should insert a blank line before ordered list
lines = [
"Paragraph.",
"1. first",
"2. second"
]
expected = [
"Paragraph.",
"",
"1. first",
"2. second"
]
codeflash_output = preprocessor.run(lines) # 3.58μs -> 4.00μs (10.6% slower)
def test_multiple_paragraphs_and_lists(preprocessor):
# Should insert blank lines before each list that follows a paragraph
lines = [
"First paragraph.",
"- list1",
"",
"Second paragraph.",
"1. list2",
"",
"Third paragraph.",
"+ list3"
]
expected = [
"First paragraph.",
"",
"- list1",
"",
"Second paragraph.",
"",
"1. list2",
"",
"Third paragraph.",
"",
"+ list3"
]
codeflash_output = preprocessor.run(lines) # 5.00μs -> 5.17μs (3.21% slower)
def test_blank_line_before_list_is_preserved(preprocessor):
# Should not add extra blank line if one already exists
lines = [
"Paragraph.",
"",
"* item"
]
expected = [
"Paragraph.",
"",
"* item"
]
codeflash_output = preprocessor.run(lines) # 2.17μs -> 2.76μs (21.3% slower)
def test_list_followed_by_paragraph(preprocessor):
# Should not insert blank line after list if not followed by another list
lines = [
"- item",
"Paragraph."
]
expected = [
"- item",
"Paragraph."
]
codeflash_output = preprocessor.run(lines) # 2.42μs -> 3.00μs (19.2% slower)
def test_list_followed_by_list(preprocessor):
# Should not insert blank line between consecutive list items
lines = [
"- item1",
"- item2"
]
expected = [
"- item1",
"- item2"
]
codeflash_output = preprocessor.run(lines) # 2.65μs -> 3.39μs (22.0% slower)
def test_multiple_lists_with_intervening_paragraphs(preprocessor):
# Should insert blank lines only before lists that follow paragraphs
lines = [
"Para 1.",
"+ list1",
"",
"Para 2.",
"* list2"
]
expected = [
"Para 1.",
"",
"+ list1",
"",
"Para 2.",
"",
"* list2"
]
codeflash_output = preprocessor.run(lines) # 3.85μs -> 4.16μs (7.29% slower)
# ------------------- Edge Test Cases -------------------
def test_leading_and_trailing_whitespace(preprocessor):
# Should handle lines with leading/trailing whitespace correctly
lines = [
" Paragraph with spaces. ",
" - list with indent",
"Next para",
" 1. ordered list"
]
expected = [
" Paragraph with spaces. ",
"",
" - list with indent",
"Next para",
"",
" 1. ordered list"
]
codeflash_output = preprocessor.run(lines) # 4.62μs -> 4.84μs (4.52% slower)
def test_list_marker_with_many_spaces(preprocessor):
# Should match list markers with multiple spaces after marker
lines = [
"Para.",
"- item with many spaces"
]
expected = [
"Para.",
"",
"- item with many spaces"
]
codeflash_output = preprocessor.run(lines) # 2.67μs -> 3.21μs (17.1% slower)
def test_list_marker_with_leading_spaces(preprocessor):
# Should match list markers with leading spaces before marker
lines = [
"Para.",
" * indented list"
]
expected = [
"Para.",
"",
" * indented list"
]
codeflash_output = preprocessor.run(lines) # 2.67μs -> 3.30μs (19.0% slower)
def test_list_marker_with_nonstandard_bullet(preprocessor):
# Should not match nonstandard list markers (e.g., not *, -, +, or number.)
lines = [
"Para.",
"# not a list"
]
expected = [
"Para.",
"# not a list"
]
codeflash_output = preprocessor.run(lines) # 2.48μs -> 2.97μs (16.7% slower)
def test_list_marker_with_dot_but_not_number(preprocessor):
# Should not match a dot not preceded by a number
lines = [
"Para.",
". not a list"
]
expected = [
"Para.",
". not a list"
]
codeflash_output = preprocessor.run(lines) # 2.42μs -> 3.02μs (19.9% slower)
def test_list_marker_with_number_and_no_dot(preprocessor):
# Should not match a number with no dot
lines = [
"Para.",
"1 not a list"
]
expected = [
"Para.",
"1 not a list"
]
codeflash_output = preprocessor.run(lines) # 2.42μs -> 3.05μs (20.8% slower)
def test_list_marker_with_number_and_dot_and_no_space(preprocessor):
# Should not match a number and dot with no space after
lines = [
"Para.",
"1.not a list"
]
expected = [
"Para.",
"1.not a list"
]
codeflash_output = preprocessor.run(lines) # 2.42μs -> 3.35μs (27.6% slower)
def test_only_blank_lines(preprocessor):
# Should return only blank lines unchanged
lines = ["", "", ""]
expected = ["", "", ""]
codeflash_output = preprocessor.run(lines) # 1.70μs -> 2.31μs (26.2% slower)
def test_list_at_start_of_document(preprocessor):
# Should not add blank line before list at start
lines = [
"* item1",
"* item2"
]
expected = [
"* item1",
"* item2"
]
codeflash_output = preprocessor.run(lines) # 2.79μs -> 3.45μs (19.3% slower)
def test_list_marker_in_middle_of_line(preprocessor):
# Should not match list marker in the middle of a line
lines = [
"This is a * not a list",
"Another line"
]
expected = [
"This is a * not a list",
"Another line"
]
codeflash_output = preprocessor.run(lines) # 2.38μs -> 2.93μs (18.6% slower)
def test_line_with_only_spaces(preprocessor):
# Should treat lines with only spaces as blank
lines = [
"Paragraph.",
" ",
"- list"
]
expected = [
"Paragraph.",
" ",
"- list"
]
codeflash_output = preprocessor.run(lines) # 3.13μs -> 3.75μs (16.4% slower)
def test_multiple_lists_no_paragraphs(preprocessor):
# Should not add blank lines between lists when no paragraph present
lines = [
"- item1",
"",
"+ item2",
"",
"1. item3"
]
expected = [
"- item1",
"",
"+ item2",
"",
"1. item3"
]
codeflash_output = preprocessor.run(lines) # 2.74μs -> 3.18μs (13.8% slower)
def test_blank_line_between_paragraph_and_list(preprocessor):
# Should not add a blank line if one already exists between paragraph and list
lines = [
"Paragraph.",
"",
"- item"
]
expected = [
"Paragraph.",
"",
"- item"
]
codeflash_output = preprocessor.run(lines) # 2.11μs -> 2.70μs (21.6% slower)
def test_paragraph_followed_by_non_list(preprocessor):
# Should not add blank line before non-list line
lines = [
"Paragraph.",
"Not a list"
]
expected = [
"Paragraph.",
"Not a list"
]
codeflash_output = preprocessor.run(lines) # 2.42μs -> 3.09μs (21.6% slower)
def test_list_marker_with_tab(preprocessor):
# Should match list marker with tab after marker
lines = [
"Paragraph.",
"-\titem"
]
expected = [
"Paragraph.",
"",
"-\titem"
]
codeflash_output = preprocessor.run(lines) # 2.85μs -> 3.44μs (17.3% slower)
# ------------------- Large Scale Test Cases -------------------
def test_large_document_with_many_paragraphs_and_lists(preprocessor):
# Should efficiently process a large document with many paragraphs and lists
lines = []
expected = []
for i in range(500):
lines.append(f"Paragraph {i}.")
lines.append(f"- list item {i}")
# The expected output should have a blank line inserted before each list item
for i in range(500):
expected.append(f"Paragraph {i}.")
expected.append("")
expected.append(f"- list item {i}")
codeflash_output = preprocessor.run(lines) # 287μs -> 198μs (44.7% faster)
def test_large_document_with_no_lists(preprocessor):
# Should efficiently process a large document with no lists
lines = [f"Paragraph {i}." for i in range(1000)]
expected = lines.copy()
codeflash_output = preprocessor.run(lines) # 233μs -> 165μs (41.3% faster)
def test_large_document_with_no_paragraphs(preprocessor):
# Should efficiently process a large document with only lists
lines = [f"- item {i}" for i in range(1000)]
expected = lines.copy()
codeflash_output = preprocessor.run(lines) # 311μs -> 218μs (42.4% faster)
def test_large_document_with_alternating_blank_and_list(preprocessor):
# Should not add extra blank lines when alternating blank and list
lines = []
expected = []
for i in range(500):
lines.append("")
lines.append(f"+ item {i}")
expected.append("")
expected.append(f"+ item {i}")
codeflash_output = preprocessor.run(lines) # 169μs -> 113μs (49.2% faster)
def test_large_document_with_complex_patterns(preprocessor):
# Mix paragraphs, blank lines, and lists in a complex pattern
lines = []
expected = []
for i in range(250):
lines.append(f"Paragraph {i}")
lines.append(f" - list {i}")
lines.append("")
lines.append(f"Another paragraph {i}")
lines.append(f"1. first ordered {i}")
expected.append(f"Paragraph {i}")
expected.append("")
expected.append(f" - list {i}")
expected.append("")
expected.append(f"Another paragraph {i}")
expected.append("")
expected.append(f"1. first ordered {i}")
codeflash_output = preprocessor.run(lines) # 350μs -> 232μs (50.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
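The tests above capture `codeflash_output` so that the original and optimized implementations can be compared on identical inputs. A minimal sketch of that kind of equivalence check follows. The helper `find_mismatches` and the two stand-in functions are hypothetical and are not part of Codeflash or marimo; they only illustrate the idea of running both versions and comparing results.

```python
from typing import Callable, Iterable, List

Transform = Callable[[List[str]], List[str]]


def find_mismatches(
    original: Transform,
    optimized: Transform,
    cases: Iterable[List[str]],
) -> List[List[str]]:
    """Return the inputs for which the two implementations disagree."""
    mismatches = []
    for lines in cases:
        # Pass copies so neither implementation can mutate the other's input.
        if original(list(lines)) != optimized(list(lines)):
            mismatches.append(lines)
    return mismatches


# Stand-in implementations (hypothetical, not the marimo code).
def strip_loop(lines):
    out = []
    for line in lines:
        out.append(line.rstrip())
    return out


def strip_comprehension(lines):
    return [line.rstrip() for line in lines]
```

An empty result from `find_mismatches` means the optimized version reproduced the original output for every supplied case; it says nothing about inputs that were not tested.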
To edit these changes, git checkout codeflash/optimize-BreaklessListsPreprocessor.run-mhv5yi72 and push.