@codeflash-ai codeflash-ai bot commented Nov 11, 2025

📄 34% (0.34x) speedup for poloniex.parse_ohlcv in python/ccxt/poloniex.py

⏱️ Runtime: 6.44 milliseconds → 4.82 milliseconds (best of 54 runs)

📝 Explanation and details

The optimized code achieves a roughly 34% speedup (6.44 ms to 4.82 ms) by eliminating overhead in the hot-path methods `safe_integer` and `safe_number`, which are used extensively in OHLCV parsing.

Key optimizations:

  1. Direct dictionary/list access in `safe_integer`: Replaced the expensive `Exchange.key_exists()` call (which took 66% of execution time) with direct `isinstance()` checks and key/index access. This removes the utility-function overhead while preserving the same error handling through a single try/except block.

  2. Inlined value retrieval in `safe_number`: Bypassed the intermediate `safe_string()` call (which consumed 62.9% of execution time) by accessing dictionary/list values directly and converting them to strings inline. This removes an entire function-call layer from the critical path.

  3. Consolidated exception handling: Merged the handling of multiple exception types (`KeyError`, `IndexError`, `ValueError`, `TypeError`) into single `except` clauses, reducing branching overhead.
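The three optimizations above can be illustrated with a minimal sketch. This is not the code from the PR or from ccxt: `safe_integer_sketch` and `safe_number_sketch` are hypothetical stand-alone functions, and their exact handling of `None`, empty strings, and non-container inputs is an assumption made for illustration.

```python
from typing import Any, Optional, Union

Key = Union[str, int]

# One tuple of exception types, handled by a single except clause.
_LOOKUP_ERRORS = (KeyError, IndexError, ValueError, TypeError, OverflowError)


def safe_integer_sketch(obj: Any, key: Key, default: Optional[int] = None) -> Optional[int]:
    """Hypothetical safe_integer-style helper using direct key/index access."""
    if not isinstance(obj, (dict, list)):
        return default
    try:
        value = obj[key]  # direct access instead of a separate existence check
        if value is None or value == "":
            return default
        return int(float(value))  # float() first so "100.0" parses as 100
    except _LOOKUP_ERRORS:
        return default


def safe_number_sketch(obj: Any, key: Key, default: Optional[float] = None) -> Optional[float]:
    """Hypothetical safe_number-style helper that converts to string inline."""
    if not isinstance(obj, (dict, list)):
        return default
    try:
        value = obj[key]
        if value is None:
            return default
        return float(str(value))  # inline string conversion, no extra helper call
    except _LOOKUP_ERRORS:
        return default
```

A missing key, an out-of-range index, or an unparsable value all fall through to the same `except` clause and return the default, so the caller sees a single failure mode.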

Why this works: The original code used generic utility functions (`key_exists`, `safe_string`) that added significant overhead through extra function calls and redundant type checks. Since `parse_ohlcv` calls these methods 11 times per OHLCV record, even small per-call savings add up.
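To get a feel for the per-call overhead described here, the following sketch times a layered lookup against a direct one with `timeit`. All function names are hypothetical stand-ins, not the ccxt implementations, and the timings depend entirely on the machine and Python version, so no expected numbers are given.

```python
import timeit


def key_exists_sketch(obj, key):
    """Hypothetical generic presence check, standing in for a utility helper."""
    if obj is None or key is None:
        return False
    if isinstance(obj, list):
        return isinstance(key, int) and 0 <= key < len(obj)
    if isinstance(obj, dict):
        return key in obj and obj[key] is not None and obj[key] != ""
    return False


def layered_integer(obj, key, default=None):
    # Extra function call plus repeated type checks before the actual lookup.
    if not key_exists_sketch(obj, key):
        return default
    try:
        return int(float(obj[key]))
    except (ValueError, TypeError):
        return default


def direct_integer(obj, key, default=None):
    # Single lookup; every failure is handled by one except clause.
    try:
        value = obj[key]
        if value is None or value == "":
            return default
        return int(float(value))
    except (KeyError, IndexError, ValueError, TypeError):
        return default


# Sample row taken from the spot test template used in the generated tests.
row = ["100", "110", "90", "105", "1000", "500", "700", "300",
       20, 0, "105", "MINUTE_1", 1234567890, 1234567999]


def run(fn):
    return [fn(row, i) for i in range(len(row))]


if __name__ == "__main__":
    for fn in (layered_integer, direct_integer):
        seconds = timeit.timeit(lambda: run(fn), number=20_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

One caveat of this sketch: a negative list index returns the default in `layered_integer` but indexes from the end in `direct_integer`, so a real refactor of this kind needs tests covering such inputs.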

Test results show consistent improvements: 20-35% faster for typical numeric data, and up to 73% faster for edge cases with None values. Large batch operations (1000 records) see 33-34% improvements, making this optimization particularly valuable for high-throughput financial data processing where OHLCV parsing is in the critical path.
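The percentages in this report appear to follow the ratio of old to new runtime, that is, `(old / new - 1) * 100`. That formula is inferred from the reported figures rather than stated anywhere in the PR, and `percent_faster` is a hypothetical helper name.

```python
def percent_faster(old_runtime: float, new_runtime: float) -> float:
    """Return how much faster the new runtime is, as a percentage of the new runtime."""
    if new_runtime <= 0:
        raise ValueError("new_runtime must be positive")
    return (old_runtime / new_runtime - 1.0) * 100.0


# Figures taken from this report (units cancel out, so ms and μs both work).
overall = percent_faster(6.44, 4.82)      # total runtime, milliseconds
spot_basic = percent_faster(7.47, 5.63)   # test_parse_ohlcv_spot_basic, microseconds
all_none = percent_faster(4.65, 2.68)     # test_parse_ohlcv_spot_all_none, microseconds
```

Rounded, these reproduce the headline 34%, the 32.7% on the basic spot test, and the 73.5% on the all-`None` spot test.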

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 2085 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import decimal

# imports

import pytest
from ccxt.poloniex import poloniex

# --- Unit tests for poloniex.parse_ohlcv ---

@pytest.fixture
def poloniex_instance():
    return poloniex()

# ------------------ Basic Test Cases ------------------

def test_parse_ohlcv_spot_basic(poloniex_instance):
    # Standard spot OHLCV array
    ohlcv = [
        "22814.01",  # open
        "22937.42",  # high
        "22832.57",  # low
        "22937.42",  # close
        "3916.58764051",  # base volume
        "0.171199",  # quote volume
        "2982.64647063",  # buy base volume
        "0.130295",  # buy quote volume
        33,  # trade count
        0,  # taker buy base volume
        "22877.449915304470460711",  # weighted average price
        "MINUTE_5",  # granularity
        1659664800000,  # start time (ms)
        1659665099999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.47μs -> 5.63μs (32.7% faster)

def test_parse_ohlcv_contract_basic(poloniex_instance):
    # Standard contract OHLCV array
    ohlcv = [
        "84207.02",  # open
        "84320.85",  # high
        "84207.02",  # low
        "84253.83",  # close
        "3707.5395",  # volume
        "44",  # count
        "14",  # ??? (unknown)
        "1740770040000",  # start time (ms)
        "1740770099999"  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.82μs -> 5.35μs (27.4% faster)

def test_parse_ohlcv_spot_with_ints_and_floats(poloniex_instance):
    # Spot OHLCV with mixed int/float types
    ohlcv = [
        100.0,  # open
        110,  # high
        90,  # low
        105,  # close
        1000,  # base volume
        500,  # quote volume
        700,  # buy base volume
        300,  # buy quote volume
        20,  # trade count
        0,  # taker buy base volume
        105.0,  # weighted avg price
        "MINUTE_1",  # granularity
        1234567890,  # start time (ms)
        1234567999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 8.57μs -> 7.06μs (21.3% faster)

def test_parse_ohlcv_contract_with_ints_and_floats(poloniex_instance):
    # Contract OHLCV with ints/floats
    ohlcv = [
        200,  # open
        210,  # high
        190,  # low
        205,  # close
        10000,  # volume
        42,  # count
        7,  # ??? (unknown)
        9876543210,  # start time (ms)
        9876543999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.49μs -> 5.84μs (28.3% faster)

# ------------------ Edge Test Cases ------------------

def test_parse_ohlcv_spot_missing_fields(poloniex_instance):
    # Spot OHLCV with missing fields (shorter than expected)
    ohlcv = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890
        # Only 13 elements, missing end time (should still parse)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.20μs -> 5.54μs (29.9% faster)

def test_parse_ohlcv_spot_extra_fields(poloniex_instance):
    # Spot OHLCV with extra fields (should ignore extras)
    ohlcv = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890, 1234567999, "extra", "fields"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.30μs -> 5.87μs (24.5% faster)

def test_parse_ohlcv_contract_extra_fields(poloniex_instance):
    # Contract OHLCV with extra fields
    ohlcv = [
        "200", "210", "190", "205", "10000", "42", "7", "9876543210", "9876543999", "extra", "fields"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.94μs -> 5.56μs (24.8% faster)

def test_parse_ohlcv_spot_non_numeric_strings(poloniex_instance):
    # Spot OHLCV with non-numeric strings in numeric fields
    ohlcv = [
        "open", "high", "low", "close", "base", "quote", "buybase", "buyquote", "count", "taker", "wap", "gran", "timestamp", "end"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 10.1μs -> 8.94μs (13.3% faster)

def test_parse_ohlcv_contract_non_numeric_strings(poloniex_instance):
    # Contract OHLCV with non-numeric strings
    ohlcv = [
        "open", "high", "low", "close", "vol", "count", "unk", "timestamp", "end"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.89μs -> 8.69μs (13.8% faster)

def test_parse_ohlcv_spot_all_none(poloniex_instance):
    # Spot OHLCV with all None
    ohlcv = [None] * 14
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.65μs -> 2.68μs (73.5% faster)

def test_parse_ohlcv_contract_all_none(poloniex_instance):
    # Contract OHLCV with all None
    ohlcv = [None] * 9
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.46μs -> 2.69μs (65.9% faster)

def test_parse_ohlcv_spot_large_batch(poloniex_instance):
    # Test parsing a large batch of spot OHLCV data (1000 elements)
    ohlcv_template = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890, 1234567999
    ]
    batch = [list(ohlcv_template) for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in batch]
    for result in results:
        pass

def test_parse_ohlcv_contract_large_batch(poloniex_instance):
    # Test parsing a large batch of contract OHLCV data (1000 elements)
    ohlcv_template = [
        "200", "210", "190", "205", "10000", "42", "7", "9876543210", "9876543999"
    ]
    batch = [list(ohlcv_template) for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in batch]
    for result in results:
        pass

def test_parse_ohlcv_spot_large_randomized(poloniex_instance):
    # Large batch with randomized numeric values
    import random
    batch = []
    for i in range(1000):
        base = random.uniform(100, 200)
        high = base + random.uniform(0, 10)
        low = base - random.uniform(0, 10)
        close = base + random.uniform(-5, 5)
        quote = random.uniform(1, 1000)
        ts = 1234567890 + i
        ohlcv = [
            str(base), str(high), str(low), str(close), "1000", str(quote), "700", "300", 20, 0, "105", "MINUTE_1", ts, ts+9
        ]
        batch.append(ohlcv)
    for ohlcv in batch:
        codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 3.19ms -> 2.39ms (33.3% faster)

def test_parse_ohlcv_contract_large_randomized(poloniex_instance):
    # Large batch with randomized numeric values for contract
    import random
    batch = []
    for i in range(1000):
        base = random.uniform(100, 200)
        high = base + random.uniform(0, 10)
        low = base - random.uniform(0, 10)
        close = base + random.uniform(-5, 5)
        count = random.randint(1, 100)
        ts = 9876543210 + i
        ohlcv = [
            str(base), str(high), str(low), str(close), "10000", str(count), "7", str(ts), str(ts+9)
        ]
        batch.append(ohlcv)
    for ohlcv in batch:
        codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 3.06ms -> 2.28ms (34.5% faster)

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

#------------------------------------------------
import pytest
from ccxt.poloniex import poloniex

# unit tests

@pytest.fixture
def poloniex_instance():
    return poloniex()

# -----------------------------
# Basic Test Cases
# -----------------------------

def test_spot_ohlcv_basic(poloniex_instance):
    # Typical spot OHLCV input
    ohlcv = [
        "22814.01", "22937.42", "22832.57", "22937.42", "3916.58764051", "0.171199",
        "2982.64647063", "0.130295", 33, 0, "22877.449915304470460711", "MINUTE_5",
        1659664800000, 1659665099999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.08μs -> 5.79μs (22.4% faster)

def test_contract_ohlcv_basic(poloniex_instance):
    # Typical contract OHLCV input
    ohlcv = [
        "84207.02", "84320.85", "84207.02", "84253.83", "3707.5395",
        "44", "14", "1740770040000", "1740770099999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.02μs -> 5.28μs (33.0% faster)

def test_spot_ohlcv_with_integers(poloniex_instance):
    # Spot OHLCV with integer values
    ohlcv = [
        10000, 11000, 10500, 10900, 100, 5,
        50, 2, 10, 0, 10500, "MINUTE_5", 1600000000000, 1600000002999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.45μs -> 6.10μs (22.1% faster)

def test_contract_ohlcv_with_integers(poloniex_instance):
    # Contract OHLCV with integer values
    ohlcv = [
        10000, 11000, 10500, 10900, 100, 5, 50, 1600000000000, 1600000002999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.62μs -> 5.96μs (27.9% faster)

def test_spot_ohlcv_with_float_strings(poloniex_instance):
    # Spot OHLCV with float strings
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.69μs -> 5.21μs (28.4% faster)

# -----------------------------
# Edge Test Cases
# -----------------------------

def test_spot_ohlcv_missing_fields(poloniex_instance):
    # Spot OHLCV missing last field (timestamp)
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", None
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.20μs -> 4.81μs (28.8% faster)

def test_contract_ohlcv_missing_fields(poloniex_instance):
    # Contract OHLCV missing timestamp field
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3", "2.2", None, "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.13μs -> 4.87μs (26.0% faster)

def test_spot_ohlcv_extra_fields(poloniex_instance):
    # Spot OHLCV with extra fields
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999", "EXTRA"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.78μs -> 5.08μs (33.4% faster)

def test_contract_ohlcv_extra_fields(poloniex_instance):
    # Contract OHLCV with extra fields
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3", "2.2", "1600000000000", "1600000002999", "EXTRA"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.63μs -> 5.48μs (20.9% faster)

def test_spot_ohlcv_non_numeric(poloniex_instance):
    # Spot OHLCV with non-numeric values
    ohlcv = [
        "foo", "bar", "baz", "qux", "quux", "corge",
        "grault", "garply", "waldo", "fred", "plugh", "xyzzy", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.83μs -> 8.40μs (17.0% faster)

def test_contract_ohlcv_non_numeric(poloniex_instance):
    # Contract OHLCV with non-numeric values
    ohlcv = [
        "foo", "bar", "baz", "qux", "quux", "corge", "grault", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.69μs -> 8.07μs (20.0% faster)

def test_spot_ohlcv_short_length(poloniex_instance):
    # Spot OHLCV with less than 13 elements
    ohlcv = ["22814.01", "22937.42", "22832.57", "22937.42", "3916.58764051", "0.171199"]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.74μs -> 5.38μs (25.2% faster)

def test_contract_ohlcv_short_length(poloniex_instance):
    # Contract OHLCV with less than 9 elements
    ohlcv = ["84207.02", "84320.85", "84207.02", "84253.83", "3707.5395", "44", "14"]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.76μs -> 5.46μs (24.0% faster)

def test_spot_ohlcv_all_none(poloniex_instance):
    # Spot OHLCV all None
    ohlcv = [None] * 14
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.66μs -> 2.78μs (67.6% faster)

def test_contract_ohlcv_all_none(poloniex_instance):
    # Contract OHLCV all None
    ohlcv = [None] * 9
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.68μs -> 2.75μs (70.4% faster)

def test_spot_ohlcv_mixed_types(poloniex_instance):
    # Spot OHLCV with mixed types
    ohlcv = [
        "10000", 11000.7, 10500, "10900.8", "100.1", 5,
        "50.1", 2, 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.59μs -> 8.21μs (16.8% faster)

# -----------------------------
# Large Scale Test Cases
# -----------------------------

def test_large_scale_spot_ohlcv(poloniex_instance):
    # Large scale: 1000 spot OHLCV entries with incremental values
    base = [
        "10000", "11000", "10500", "10900", "100", "5",
        "50", "2", 10, 0, "10500", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    # Each entry's timestamp increases by 60000 (1 minute)
    ohlcvs = []
    for i in range(1000):
        entry = base[:]
        entry[2] = str(10500 + i)  # open
        entry[1] = str(11000 + i)  # high
        entry[0] = str(10000 + i)  # low
        entry[3] = str(10900 + i)  # close
        entry[5] = str(5 + i)  # volume
        entry[12] = str(1600000000000 + i * 60000)
        ohlcvs.append(entry)
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_contract_ohlcv(poloniex_instance):
    # Large scale: 1000 contract OHLCV entries with incremental values
    base = [
        "10000", "11000", "10500", "10900", "100", "5", "50", "1600000000000", "1600000002999"
    ]
    ohlcvs = []
    for i in range(1000):
        entry = base[:]
        entry[2] = str(10500 + i)  # open
        entry[1] = str(11000 + i)  # high
        entry[0] = str(10000 + i)  # low
        entry[3] = str(10900 + i)  # close
        entry[5] = str(5 + i)  # volume
        entry[7] = str(1600000000000 + i * 60000)
        ohlcvs.append(entry)
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_spot_ohlcv_all_none(poloniex_instance):
    # Large scale: 1000 spot OHLCV entries, all None
    ohlcvs = [[None] * 14 for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_contract_ohlcv_all_none(poloniex_instance):
    # Large scale: 1000 contract OHLCV entries, all None
    ohlcvs = [[None] * 9 for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-poloniex.parse_ohlcv-mhufvefl` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 11, 2025 10:39
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Nov 11, 2025