⚡️ Speed up method poloniex.parse_ohlcv by 34%
#43
📄 34% (0.34x) speedup for poloniex.parse_ohlcv in python/ccxt/poloniex.py
⏱️ Runtime: 6.44 milliseconds → 4.82 milliseconds (best of 54 runs)
📝 Explanation and details
The optimized code achieves a speedup of about 34% by eliminating overhead in the hot-path methods safe_integer and safe_number, which are used extensively in OHLCV parsing.

Key optimizations:
- Direct dictionary/list access in safe_integer: Replaced the expensive Exchange.key_exists() call (which took 66% of execution time) with direct isinstance() checks and key/index access. This eliminates the overhead of the utility function while preserving the same error handling through a unified try/except block.
- Inlined value retrieval in safe_number: Bypassed the intermediate safe_string() call (which consumed 62.9% of execution time) by directly accessing dictionary/list values and converting to string inline. This removes an entire function call layer in the critical path.
- Optimized exception handling: Consolidated multiple exception types (KeyError, IndexError, ValueError, TypeError) into single except clauses, reducing branching overhead.

Why this works: The original code used generic utility functions (key_exists, safe_string) that added significant overhead through multiple function calls and redundant type checks. Since parse_ohlcv calls these methods 11 times per OHLCV record, even small per-call savings compound dramatically.

Test results show consistent improvements: 20-35% faster for typical numeric data, and up to 73% faster for edge cases with None values. Large batch operations (1000 records) see 33-34% improvements, making this optimization particularly valuable for high-throughput financial data processing where OHLCV parsing is in the critical path.
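To make the first optimization concrete, here is a minimal sketch of the pattern described in the explanation above. It is not the actual ccxt implementation: the functions key_exists, safe_integer_generic, and safe_integer_direct are hypothetical stand-ins written for illustration, and the assumption that None and empty strings count as missing is part of the sketch. The point is only that the direct version does one lookup inside one try/except instead of calling a helper and then looking the value up again.

```python
from typing import Any, Optional


def key_exists(container: Any, key: Any) -> bool:
    """Hypothetical generic helper: type check, lookup, and emptiness check."""
    if not isinstance(container, (dict, list)):
        return False
    try:
        value = container[key]
    except (KeyError, IndexError, TypeError):
        return False
    return value is not None and value != ""


def safe_integer_generic(container: Any, key: Any,
                         default: Optional[int] = None) -> Optional[int]:
    """Baseline shape: an extra function call, then a second lookup."""
    if not key_exists(container, key):
        return default
    try:
        return int(float(container[key]))
    except (ValueError, TypeError, OverflowError):
        return default


def safe_integer_direct(container: Any, key: Any,
                        default: Optional[int] = None) -> Optional[int]:
    """Direct shape: one isinstance check, one lookup, one try/except."""
    try:
        if not isinstance(container, (dict, list)):
            return default
        value = container[key]
        # Assumption of this sketch: None and "" are treated as missing.
        if value is None or value == "":
            return default
        return int(float(value))
    except (KeyError, IndexError, ValueError, TypeError, OverflowError):
        return default
```

Both functions are meant to return the same result for list and dict inputs; the direct version simply avoids the helper call on every field. A safe_number variant would follow the same shape with float(value) as the final conversion.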
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
import decimal

# imports
import pytest
from ccxt.poloniex import poloniex


# --- Unit tests for poloniex.parse_ohlcv ---
@pytest.fixture
def poloniex_instance():
    return poloniex()

# ------------------ Basic Test Cases ------------------
def test_parse_ohlcv_spot_basic(poloniex_instance):
    # Standard spot OHLCV array
    ohlcv = [
        "22814.01",  # open
        "22937.42",  # high
        "22832.57",  # low
        "22937.42",  # close
        "3916.58764051",  # base volume
        "0.171199",  # quote volume
        "2982.64647063",  # buy base volume
        "0.130295",  # buy quote volume
        33,  # trade count
        0,  # taker buy base volume
        "22877.449915304470460711",  # weighted average price
        "MINUTE_5",  # granularity
        1659664800000,  # start time (ms)
        1659665099999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.47μs -> 5.63μs (32.7% faster)

def test_parse_ohlcv_contract_basic(poloniex_instance):
    # Standard contract OHLCV array
    ohlcv = [
        "84207.02",  # open
        "84320.85",  # high
        "84207.02",  # low
        "84253.83",  # close
        "3707.5395",  # volume
        "44",  # count
        "14",  # ??? (unknown)
        "1740770040000",  # start time (ms)
        "1740770099999"  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.82μs -> 5.35μs (27.4% faster)

def test_parse_ohlcv_spot_with_ints_and_floats(poloniex_instance):
    # Spot OHLCV with mixed int/float types
    ohlcv = [
        100.0,  # open
        110,  # high
        90,  # low
        105,  # close
        1000,  # base volume
        500,  # quote volume
        700,  # buy base volume
        300,  # buy quote volume
        20,  # trade count
        0,  # taker buy base volume
        105.0,  # weighted avg price
        "MINUTE_1",  # granularity
        1234567890,  # start time (ms)
        1234567999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 8.57μs -> 7.06μs (21.3% faster)

def test_parse_ohlcv_contract_with_ints_and_floats(poloniex_instance):
    # Contract OHLCV with ints/floats
    ohlcv = [
        200,  # open
        210,  # high
        190,  # low
        205,  # close
        10000,  # volume
        42,  # count
        7,  # ??? (unknown)
        9876543210,  # start time (ms)
        9876543999  # end time (ms)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.49μs -> 5.84μs (28.3% faster)

# ------------------ Edge Test Cases ------------------
def test_parse_ohlcv_spot_missing_fields(poloniex_instance):
    # Spot OHLCV with missing fields (shorter than expected)
    ohlcv = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890
        # Only 13 elements, missing end time (should still parse)
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.20μs -> 5.54μs (29.9% faster)

def test_parse_ohlcv_spot_extra_fields(poloniex_instance):
    # Spot OHLCV with extra fields (should ignore extras)
    ohlcv = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890, 1234567999, "extra", "fields"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.30μs -> 5.87μs (24.5% faster)

def test_parse_ohlcv_contract_extra_fields(poloniex_instance):
    # Contract OHLCV with extra fields
    ohlcv = [
        "200", "210", "190", "205", "10000", "42", "7", "9876543210", "9876543999", "extra", "fields"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.94μs -> 5.56μs (24.8% faster)

def test_parse_ohlcv_spot_non_numeric_strings(poloniex_instance):
    # Spot OHLCV with non-numeric strings in numeric fields
    ohlcv = [
        "open", "high", "low", "close", "base", "quote", "buybase", "buyquote", "count", "taker", "wap", "gran", "timestamp", "end"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 10.1μs -> 8.94μs (13.3% faster)

def test_parse_ohlcv_contract_non_numeric_strings(poloniex_instance):
    # Contract OHLCV with non-numeric strings
    ohlcv = [
        "open", "high", "low", "close", "vol", "count", "unk", "timestamp", "end"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.89μs -> 8.69μs (13.8% faster)

def test_parse_ohlcv_spot_all_none(poloniex_instance):
    # Spot OHLCV with all None
    ohlcv = [None] * 14
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.65μs -> 2.68μs (73.5% faster)

def test_parse_ohlcv_contract_all_none(poloniex_instance):
    # Contract OHLCV with all None
    ohlcv = [None] * 9
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.46μs -> 2.69μs (65.9% faster)

def test_parse_ohlcv_spot_large_batch(poloniex_instance):
    # Test parsing a large batch of spot OHLCV data (1000 elements)
    ohlcv_template = [
        "100", "110", "90", "105", "1000", "500", "700", "300", 20, 0, "105", "MINUTE_1", 1234567890, 1234567999
    ]
    batch = [list(ohlcv_template) for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in batch]
    for result in results:
        pass

def test_parse_ohlcv_contract_large_batch(poloniex_instance):
    # Test parsing a large batch of contract OHLCV data (1000 elements)
    ohlcv_template = [
        "200", "210", "190", "205", "10000", "42", "7", "9876543210", "9876543999"
    ]
    batch = [list(ohlcv_template) for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in batch]
    for result in results:
        pass

def test_parse_ohlcv_spot_large_randomized(poloniex_instance):
    # Large batch with randomized numeric values
    import random
    batch = []
    for i in range(1000):
        base = random.uniform(100, 200)
        high = base + random.uniform(0, 10)
        low = base - random.uniform(0, 10)
        close = base + random.uniform(-5, 5)
        quote = random.uniform(1, 1000)
        ts = 1234567890 + i
        ohlcv = [
            str(base), str(high), str(low), str(close), "1000", str(quote), "700", "300", 20, 0, "105", "MINUTE_1", ts, ts+9
        ]
        batch.append(ohlcv)
    for ohlcv in batch:
        codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 3.19ms -> 2.39ms (33.3% faster)

def test_parse_ohlcv_contract_large_randomized(poloniex_instance):
    # Large batch with randomized numeric values for contract
    import random
    batch = []
    for i in range(1000):
        base = random.uniform(100, 200)
        high = base + random.uniform(0, 10)
        low = base - random.uniform(0, 10)
        close = base + random.uniform(-5, 5)
        count = random.randint(1, 100)
        ts = 9876543210 + i
        ohlcv = [
            str(base), str(high), str(low), str(close), "10000", str(count), "7", str(ts), str(ts+9)
        ]
        batch.append(ohlcv)
    for ohlcv in batch:
        codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 3.06ms -> 2.28ms (34.5% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest
from ccxt.poloniex import poloniex


# unit tests
@pytest.fixture
def poloniex_instance():
    return poloniex()

# -----------------------------
# Basic Test Cases
# -----------------------------
def test_spot_ohlcv_basic(poloniex_instance):
    # Typical spot OHLCV input
    ohlcv = [
        "22814.01", "22937.42", "22832.57", "22937.42", "3916.58764051", "0.171199",
        "2982.64647063", "0.130295", 33, 0, "22877.449915304470460711", "MINUTE_5",
        1659664800000, 1659665099999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.08μs -> 5.79μs (22.4% faster)

def test_contract_ohlcv_basic(poloniex_instance):
    # Typical contract OHLCV input
    ohlcv = [
        "84207.02", "84320.85", "84207.02", "84253.83", "3707.5395",
        "44", "14", "1740770040000", "1740770099999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.02μs -> 5.28μs (33.0% faster)

def test_spot_ohlcv_with_integers(poloniex_instance):
    # Spot OHLCV with integer values
    ohlcv = [
        10000, 11000, 10500, 10900, 100, 5,
        50, 2, 10, 0, 10500, "MINUTE_5", 1600000000000, 1600000002999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.45μs -> 6.10μs (22.1% faster)

def test_contract_ohlcv_with_integers(poloniex_instance):
    # Contract OHLCV with integer values
    ohlcv = [
        10000, 11000, 10500, 10900, 100, 5, 50, 1600000000000, 1600000002999
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 7.62μs -> 5.96μs (27.9% faster)

def test_spot_ohlcv_with_float_strings(poloniex_instance):
    # Spot OHLCV with float strings
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.69μs -> 5.21μs (28.4% faster)

# -----------------------------
# Edge Test Cases
# -----------------------------
def test_spot_ohlcv_missing_fields(poloniex_instance):
    # Spot OHLCV missing last field (timestamp)
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", None
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.20μs -> 4.81μs (28.8% faster)

def test_contract_ohlcv_missing_fields(poloniex_instance):
    # Contract OHLCV missing timestamp field
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3", "2.2", None, "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.13μs -> 4.87μs (26.0% faster)

def test_spot_ohlcv_extra_fields(poloniex_instance):
    # Spot OHLCV with extra fields
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3",
        "50.1", "2.2", 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999", "EXTRA"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.78μs -> 5.08μs (33.4% faster)

def test_contract_ohlcv_extra_fields(poloniex_instance):
    # Contract OHLCV with extra fields
    ohlcv = [
        "10000.5", "11000.7", "10500.2", "10900.8", "100.1", "5.3", "2.2", "1600000000000", "1600000002999", "EXTRA"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.63μs -> 5.48μs (20.9% faster)

def test_spot_ohlcv_non_numeric(poloniex_instance):
    # Spot OHLCV with non-numeric values
    ohlcv = [
        "foo", "bar", "baz", "qux", "quux", "corge",
        "grault", "garply", "waldo", "fred", "plugh", "xyzzy", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.83μs -> 8.40μs (17.0% faster)

def test_contract_ohlcv_non_numeric(poloniex_instance):
    # Contract OHLCV with non-numeric values
    ohlcv = [
        "foo", "bar", "baz", "qux", "quux", "corge", "grault", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.69μs -> 8.07μs (20.0% faster)

def test_spot_ohlcv_short_length(poloniex_instance):
    # Spot OHLCV with less than 13 elements
    ohlcv = ["22814.01", "22937.42", "22832.57", "22937.42", "3916.58764051", "0.171199"]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.74μs -> 5.38μs (25.2% faster)

def test_contract_ohlcv_short_length(poloniex_instance):
    # Contract OHLCV with less than 9 elements
    ohlcv = ["84207.02", "84320.85", "84207.02", "84253.83", "3707.5395", "44", "14"]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 6.76μs -> 5.46μs (24.0% faster)

def test_spot_ohlcv_all_none(poloniex_instance):
    # Spot OHLCV all None
    ohlcv = [None] * 14
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.66μs -> 2.78μs (67.6% faster)

def test_contract_ohlcv_all_none(poloniex_instance):
    # Contract OHLCV all None
    ohlcv = [None] * 9
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 4.68μs -> 2.75μs (70.4% faster)

def test_spot_ohlcv_mixed_types(poloniex_instance):
    # Spot OHLCV with mixed types
    ohlcv = [
        "10000", 11000.7, 10500, "10900.8", "100.1", 5,
        "50.1", 2, 10, 0, "10500.2", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    codeflash_output = poloniex_instance.parse_ohlcv(ohlcv); result = codeflash_output  # 9.59μs -> 8.21μs (16.8% faster)

# -----------------------------
# Large Scale Test Cases
# -----------------------------
def test_large_scale_spot_ohlcv(poloniex_instance):
    # Large scale: 1000 spot OHLCV entries with incremental values
    base = [
        "10000", "11000", "10500", "10900", "100", "5",
        "50", "2", 10, 0, "10500", "MINUTE_5", "1600000000000", "1600000002999"
    ]
    # Each entry's timestamp increases by 60000 (1 minute)
    ohlcvs = []
    for i in range(1000):
        entry = base[:]
        entry[2] = str(10500 + i)  # open
        entry[1] = str(11000 + i)  # high
        entry[0] = str(10000 + i)  # low
        entry[3] = str(10900 + i)  # close
        entry[5] = str(5 + i)  # volume
        entry[12] = str(1600000000000 + i * 60000)
        ohlcvs.append(entry)
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_contract_ohlcv(poloniex_instance):
    # Large scale: 1000 contract OHLCV entries with incremental values
    base = [
        "10000", "11000", "10500", "10900", "100", "5", "50", "1600000000000", "1600000002999"
    ]
    ohlcvs = []
    for i in range(1000):
        entry = base[:]
        entry[2] = str(10500 + i)  # open
        entry[1] = str(11000 + i)  # high
        entry[0] = str(10000 + i)  # low
        entry[3] = str(10900 + i)  # close
        entry[5] = str(5 + i)  # volume
        entry[7] = str(1600000000000 + i * 60000)
        ohlcvs.append(entry)
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_spot_ohlcv_all_none(poloniex_instance):
    # Large scale: 1000 spot OHLCV entries, all None
    ohlcvs = [[None] * 14 for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

def test_large_scale_contract_ohlcv_all_none(poloniex_instance):
    # Large scale: 1000 contract OHLCV entries, all None
    ohlcvs = [[None] * 9 for _ in range(1000)]
    results = [poloniex_instance.parse_ohlcv(ohlcv) for ohlcv in ohlcvs]

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes, run git checkout codeflash/optimize-poloniex.parse_ohlcv-mhufvefl and push.