
Commit 7eeaea4

docs: binary formats specs
1 parent 318e9cf commit 7eeaea4


3 files changed (+135 -0 lines changed)


.vitepress/config.mts

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,7 @@ export default defineConfig({
 title: "Stochastix",
 description: "High-Performance Quantitative Backtesting Engine",
 head: [['link', { rel: 'icon', href: '/favicon.ico' }]],
+lastUpdated: true,
 themeConfig: {
 // https://vitepress.dev/reference/default-theme-config
 nav: [
@@ -47,6 +48,8 @@ export default defineConfig({
 items: [
 { text: 'Downloading Market Data', link: '/data-downloading' },
 { text: 'Inspecting & Validating Data', link: '/data-validation' },
+{ text: 'Format Specification: OHLCV Data', link: '/spec-stchx' },
+{ text: 'Format Specification: Time-Series Data', link: '/spec-timeseries' },
 ]
 },
 {

spec-stchx.md

Lines changed: 62 additions & 0 deletions
# Format Specification: OHLCV Data <Badge type="tip" text=".stchx" />

## 1. Overview and Purpose

This document specifies Version 1 of a binary file format ("STCHXBF1") for storing OHLCV (Open, High, Low, Close, Volume) time-series data. The format is used within a custom financial backtesting framework and is designed for compact storage and fast data retrieval, especially of time-based ranges. Because every record has the same fixed size, any record can be located by offset arithmetic without scanning the file.

All multi-byte numerical values in this format are stored in **Big-Endian** (network byte order).

## 2. File Structure

The binary file is composed of two main sections:

1. **Header Section:** A 64-byte block at the beginning of the file containing metadata about the data series.
2. **Data Records Section:** A sequence of fixed-size (48-byte) data records, each representing one OHLCV data point.

```
+------------------------+
| Header Section         | (64 bytes)
+------------------------+
| Data Record 1          | (48 bytes)
+------------------------+
| Data Record 2          | (48 bytes)
+------------------------+
| ...                    |
+------------------------+
| Data Record N          | (48 bytes)
+------------------------+
```

## 3. Header Section Definition (Version 1)

The header is a fixed-size block of 64 bytes.

| Offset (Bytes) | Length (Bytes) | Field Name | Data Type | Description / Value for Version 1 |
|:---------------|:---------------|:-----------------------|:-------------|:-----------------------------------------------------------------------------|
| 0 | 8 | Magic Number | ASCII String | "STCHXBF1" (identifies the file as STCHX Binary Format v1) |
| 8 | 2 | Format Version | `uint16_t` | `1` |
| 10 | 2 | Header Length | `uint16_t` | `64` (total size of this header in bytes for v1) |
| 12 | 2 | Record Length | `uint16_t` | `48` (size of one OHLCV data record in bytes for v1) |
| 14 | 1 | Timestamp Format Code | `uint8_t` | `1` (8-byte Unix timestamp in seconds, Big-Endian `uint64_t`) |
| 15 | 1 | OHLCV Data Format Code | `uint8_t` | `1` (8-byte IEEE 754 double precision, Big-Endian, for O, H, L, C, V) |
| 16 | 8 | Number of Data Records | `uint64_t` | Total count of OHLCV records in the file |
| 24 | 16 | Symbol / Instrument | ASCII String | e.g., "EURUSDT\0\0\0\0\0\0\0\0\0" (7 characters plus 9 null bytes; null-padded if shorter than 16 bytes) |
| 40 | 4 | Timeframe | ASCII String | e.g., "M1\0\0" (null-padded if shorter than 4 bytes) |
| 44 | 20 | Reserved | Bytes | Set to null bytes (`\0`). Reserved for future use. |
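
The header can be read with a handful of `DataView` calls. The following is a minimal TypeScript sketch, not the Stochastix implementation. The names `parseStchxHeader`, `readU64`, and `readAscii` are hypothetical, and the sketch assumes the file (or at least its first 64 bytes) has already been loaded into an `ArrayBuffer`. `DataView` getters read big-endian values unless their `littleEndian` argument is `true`, which matches this format.

```typescript
// Sketch only: hypothetical reader for the Version 1 .stchx header.
interface StchxHeader {
  version: number;
  headerLength: number;
  recordLength: number;
  timestampFormat: number;
  ohlcvFormat: number;
  recordCount: number;
  symbol: string;
  timeframe: string;
}

// Read a big-endian uint64 as a JS number; exact only up to 2^53 - 1.
function readU64(view: DataView, offset: number): number {
  const high = view.getUint32(offset); // big-endian by default
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new RangeError(`uint64 at offset ${offset} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return high * 0x100000000 + low;
}

// Decode a null-padded ASCII field, stopping at the first null byte.
function readAscii(view: DataView, offset: number, length: number): string {
  let text = "";
  for (let i = 0; i < length; i++) {
    const byte = view.getUint8(offset + i);
    if (byte === 0) break;
    text += String.fromCharCode(byte);
  }
  return text;
}

function parseStchxHeader(buffer: ArrayBuffer): StchxHeader {
  if (buffer.byteLength < 64) {
    throw new Error("File is too short to contain a 64-byte header");
  }
  const view = new DataView(buffer);
  const magic = readAscii(view, 0, 8);
  if (magic !== "STCHXBF1") {
    throw new Error(`Unexpected magic number ${JSON.stringify(magic)}`);
  }
  const header: StchxHeader = {
    version: view.getUint16(8),
    headerLength: view.getUint16(10),
    recordLength: view.getUint16(12),
    timestampFormat: view.getUint8(14),
    ohlcvFormat: view.getUint8(15),
    recordCount: readU64(view, 16),
    symbol: readAscii(view, 24, 16),
    timeframe: readAscii(view, 40, 4),
  };
  // Accept only the Version 1 layout and encodings from the table.
  if (header.version !== 1 || header.headerLength !== 64 || header.recordLength !== 48) {
    throw new Error("Unsupported format version or header/record length");
  }
  if (header.timestampFormat !== 1 || header.ohlcvFormat !== 1) {
    throw new Error("Unsupported timestamp or OHLCV format code");
  }
  return header;
}
```

Reading each `uint64_t` as two 32-bit halves keeps the values as ordinary numbers. This is exact below 2^53, which is ample for record counts and timestamps in seconds. `DataView.prototype.getBigUint64` is the alternative if the full 64-bit range is needed. When starting from a Node.js `Buffer`, pass `buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength)` so that a pooled buffer's unrelated bytes are excluded.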
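
## 4. Data Records Section Definition (Version 1)

This section starts immediately after the 64-byte header and contains `Number of Data Records` entries, each 48 bytes long. The record at zero-based index *i* therefore begins at byte offset 64 + 48 × *i*, and the header plus all records occupy 64 + 48 × `Number of Data Records` bytes.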
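
Because the header and every record have fixed sizes, record positions and the expected file length follow from simple arithmetic. Below is a minimal TypeScript sketch. The helpers `recordOffset`, `expectedFileSize`, and `checkFileSize` are hypothetical names. The strict size check assumes that no bytes follow the last record. This specification does not say so explicitly, so a more lenient reader could accept any length at least as large as the expected size.

```typescript
// Sketch only: hypothetical helpers, fixed Version 1 sizes.
const HEADER_LENGTH = 64;
const RECORD_LENGTH = 48;

// Byte offset at which the record with zero-based index `i` begins.
function recordOffset(i: number): number {
  return HEADER_LENGTH + RECORD_LENGTH * i;
}

// Total byte length implied by a header declaring `recordCount` records.
function expectedFileSize(recordCount: number): number {
  return recordOffset(recordCount);
}

// Throws if the actual length disagrees with the header's record count,
// which typically indicates a truncated or partially written file.
function checkFileSize(byteLength: number, recordCount: number): void {
  const expected = expectedFileSize(recordCount);
  if (byteLength !== expected) {
    throw new Error(
      `Expected ${expected} bytes for ${recordCount} records, found ${byteLength}`,
    );
  }
}

console.log(recordOffset(0)); // 64
console.log(recordOffset(10)); // 544
console.log(expectedFileSize(1000)); // 48064
```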
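
**Crucial Assumption:** Data records **MUST** be sorted chronologically by the `Timestamp` field in ascending order. Together with the fixed record size, this ordering lets a reader locate a time range by binary search instead of scanning the whole file.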
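
Sorted, fixed-size records allow a time range to be found with a standard lower-bound binary search. Below is a minimal TypeScript sketch, with these assumptions:

* The whole file is loaded into a `DataView`.
* The timestamp is the first 8-byte field of each record, per the record table in this section.
* `lowerBound`, `indexRange`, and `timestampAt` are hypothetical names.

```typescript
// Sketch only: hypothetical time-range lookup over a .stchx file.
const HEADER_LENGTH = 64;
const RECORD_LENGTH = 48;

// Read a big-endian uint64 as a JS number; exact only up to 2^53 - 1.
function readU64(view: DataView, offset: number): number {
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new RangeError(`uint64 at offset ${offset} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return high * 0x100000000 + low;
}

// The timestamp is the first field of each record.
function timestampAt(view: DataView, index: number): number {
  return readU64(view, HEADER_LENGTH + RECORD_LENGTH * index);
}

// Index of the first record whose timestamp is >= target, or recordCount
// if every record is earlier. Only correct for ascending timestamps.
function lowerBound(view: DataView, recordCount: number, target: number): number {
  let lo = 0;
  let hi = recordCount;
  while (lo < hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (timestampAt(view, mid) < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Half-open index range [start, end) covering from <= timestamp < to.
function indexRange(
  view: DataView,
  recordCount: number,
  from: number,
  to: number,
): [number, number] {
  return [lowerBound(view, recordCount, from), lowerBound(view, recordCount, to)];
}
```

For example, with four records at timestamps 100, 200, 300, and 400, `indexRange(view, 4, 150, 350)` returns `[1, 3]`, which selects the records at 200 and 300. Each lookup reads only about log2(N) timestamps.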

**Structure of a Single Data Record (48 bytes):**

| Offset within Record (Bytes) | Length (Bytes) | Field Name | Data Type | Description |
|:-----------------------------|:---------------|:------------|:-----------|:------------------------------------------------|
| 0 | 8 | Timestamp | `uint64_t` | Unix timestamp in seconds (UTC) since epoch. |
| 8 | 8 | Open Price | `double` | Opening price. |
| 16 | 8 | High Price | `double` | Highest price during the period. |
| 24 | 8 | Low Price | `double` | Lowest price during the period. |
| 32 | 8 | Close Price | `double` | Closing price. |
| 40 | 8 | Volume | `double` | Traded volume (using `double` for flexibility). |
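
Each record can be decoded directly from its offset. Below is a minimal TypeScript sketch. It assumes a `DataView` over the whole file, and `readRecord` and `readRecords` are hypothetical names. A half-open range such as one returned by a binary search over timestamps can be passed straight to `readRecords`.

```typescript
// Sketch only: hypothetical decoder for Version 1 .stchx records.
interface OhlcvRecord {
  timestamp: number; // Unix seconds (UTC)
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Read a big-endian uint64 as a JS number; exact only up to 2^53 - 1.
function readU64(view: DataView, offset: number): number {
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new RangeError(`uint64 at offset ${offset} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return high * 0x100000000 + low;
}

// Decode the record with zero-based index `index` from a whole-file view.
function readRecord(view: DataView, index: number): OhlcvRecord {
  const base = 64 + 48 * index;
  if (base + 48 > view.byteLength) {
    throw new RangeError(`Record ${index} extends past the end of the file`);
  }
  return {
    timestamp: readU64(view, base),
    open: view.getFloat64(base + 8), // big-endian IEEE 754 doubles
    high: view.getFloat64(base + 16),
    low: view.getFloat64(base + 24),
    close: view.getFloat64(base + 32),
    volume: view.getFloat64(base + 40),
  };
}

// Decode records in the half-open index range [start, end).
function readRecords(view: DataView, start: number, end: number): OhlcvRecord[] {
  const records: OhlcvRecord[] = [];
  for (let i = start; i < end; i++) {
    records.push(readRecord(view, i));
  }
  return records;
}
```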

spec-timeseries.md

Lines changed: 70 additions & 0 deletions
# Format Specification: Time-Series Data <Badge type="tip" text=".stchxi" /> <Badge type="tip" text=".stchxm" />

## 1. Overview and Purpose

After a backtest run, Stochastix saves time-aligned data series for indicators and performance metrics (such as the equity curve) into specialized binary files. These files provide compact, high-performance storage for the potentially large volume of data needed for plotting and analysis, and they keep that data separate from the main JSON summary results.

* **`.stchxi`**: Stores indicator data series (e.g., EMA, MACD values).
* **`.stchxm`**: Stores performance metric series (e.g., Equity, Drawdown).

Both formats share the same underlying structure. They differ only in their magic number and in whether the Primary Key of each Series Directory entry names an indicator or a metric; both differences are noted below. All multi-byte numerical values are stored in **Big-Endian** (network byte order).

## 2. File Structure

The binary file is composed of four main sections:

1. **Header Section:** A fixed-size block containing metadata about the file's contents.
2. **Timestamp Block:** A single, contiguous block of all timestamps for which values were calculated. This serves as the master time index for all series in the file.
3. **Series Directory:** A block of records defining each individual data series contained in the file.
4. **Data Blocks:** A final section containing the actual values, with one contiguous block of data per series defined in the directory.

```
+---------------------------+
| Header Section            | (64 bytes)
+---------------------------+
| Timestamp Block           | (Timestamp Count * 8 bytes)
+---------------------------+
| Series Directory          | (Series Count * 64 bytes)
+---------------------------+
| Data Block for Series 1   | (Timestamp Count * 8 bytes)
+---------------------------+
| ...                       |
+---------------------------+
| Data Block for Series N   | (Timestamp Count * 8 bytes)
+---------------------------+
```
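
Every section's size is derived from the header's two counts, so all section offsets can be computed up front. Below is a minimal TypeScript sketch of that arithmetic. `computeLayout` and `seriesBlockOffset` are hypothetical names, not part of Stochastix.

```typescript
// Sketch only: hypothetical section layout for .stchxi / .stchxm (Version 1).
interface SeriesFileLayout {
  timestampBlockOffset: number;
  directoryOffset: number;
  dataOffset: number;
  totalSize: number;
}

const HEADER_SIZE = 64; // fixed header
const VALUE_SIZE = 8; // one uint64 timestamp or one double value
const DIRECTORY_ENTRY_SIZE = 64; // one Series Directory entry

function computeLayout(seriesCount: number, timestampCount: number): SeriesFileLayout {
  const timestampBlockOffset = HEADER_SIZE;
  const directoryOffset = timestampBlockOffset + timestampCount * VALUE_SIZE;
  const dataOffset = directoryOffset + seriesCount * DIRECTORY_ENTRY_SIZE;
  const totalSize = dataOffset + seriesCount * timestampCount * VALUE_SIZE;
  return { timestampBlockOffset, directoryOffset, dataOffset, totalSize };
}

// Offset of the data block belonging to the k-th directory entry (zero-based).
function seriesBlockOffset(layout: SeriesFileLayout, k: number, timestampCount: number): number {
  return layout.dataOffset + k * timestampCount * VALUE_SIZE;
}

const layout = computeLayout(3, 1000);
console.log(layout.directoryOffset); // 8064
console.log(layout.dataOffset); // 8256
console.log(layout.totalSize); // 32256
console.log(seriesBlockOffset(layout, 2, 1000)); // 24256
```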

## 3. Header Section Definition (Version 1)

The header is a fixed-size block of 64 bytes.

| Offset (Bytes) | Length (Bytes) | Field Name | Data Type | Description / Value for Version 1 |
|:---------------|:---------------|:------------------|:-------------|:--------------------------------------------------------------|
| 0 | 8 | Magic Number | ASCII String | `"STCHXI01"` (Indicator files) or `"STCHXM01"` (Metric files) |
| 8 | 2 | Format Version | `uint16_t` | `1` |
| 10 | 1 | Value Format Code | `uint8_t` | `1` (8-byte IEEE 754 double precision, Big-Endian) |
| 11 | 1 | Padding Byte | `uint8_t` | Null byte; keeps the following `uint32_t` field 4-byte aligned. |
| 12 | 4 | Series Count | `uint32_t` | Total number of unique data series in the file. |
| 16 | 8 | Timestamp Count | `uint64_t` | Number of timestamps in the Timestamp Block, and therefore the number of values in each series. |
| 24 | 40 | Reserved | Bytes | Set to null bytes (`\0`). Reserved for future use. |
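
The magic number tells a reader which kind of file it is holding, so one parser can serve both extensions. Below is a minimal TypeScript sketch. `parseSeriesHeader`, `SeriesFileHeader`, and `readU64` are hypothetical names, and the sketch assumes the file has been loaded into an `ArrayBuffer`. `DataView` getters default to big-endian.

```typescript
// Sketch only: hypothetical header parser for .stchxi / .stchxm files.
type SeriesFileKind = "indicator" | "metric";

interface SeriesFileHeader {
  kind: SeriesFileKind;
  version: number;
  valueFormat: number;
  seriesCount: number;
  timestampCount: number;
}

// Read a big-endian uint64 as a JS number; exact only up to 2^53 - 1.
function readU64(view: DataView, offset: number): number {
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new RangeError(`uint64 at offset ${offset} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return high * 0x100000000 + low;
}

function parseSeriesHeader(buffer: ArrayBuffer): SeriesFileHeader {
  if (buffer.byteLength < 64) {
    throw new Error("File is too short to contain a 64-byte header");
  }
  const view = new DataView(buffer);
  let magic = "";
  for (let i = 0; i < 8; i++) magic += String.fromCharCode(view.getUint8(i));
  let kind: SeriesFileKind;
  if (magic === "STCHXI01") kind = "indicator";
  else if (magic === "STCHXM01") kind = "metric";
  else throw new Error(`Unexpected magic number ${JSON.stringify(magic)}`);
  const header: SeriesFileHeader = {
    kind,
    version: view.getUint16(8),
    valueFormat: view.getUint8(10),
    // Byte 11 is padding and is ignored.
    seriesCount: view.getUint32(12),
    timestampCount: readU64(view, 16),
  };
  if (header.version !== 1 || header.valueFormat !== 1) {
    throw new Error("Unsupported format version or value format code");
  }
  return header;
}
```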

## 4. Timestamp Block Definition

This section starts immediately after the 64-byte header. It contains a single, unbroken sequence of `Timestamp Count` entries, each an 8-byte `uint64_t` Unix timestamp in seconds (UTC). This block is the shared time axis for all data series that follow: the *i*-th value of every series belongs to the *i*-th timestamp.
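
Reading the time axis is a simple loop over fixed 8-byte slots starting at offset 64. Below is a minimal TypeScript sketch. `readTimestamps` and `toDate` are hypothetical names, and a `DataView` over the whole file is assumed.

```typescript
// Sketch only: hypothetical Timestamp Block reader.

// Read a big-endian uint64 as a JS number; exact only up to 2^53 - 1.
function readU64(view: DataView, offset: number): number {
  const high = view.getUint32(offset);
  const low = view.getUint32(offset + 4);
  if (high > 0x1fffff) {
    throw new RangeError(`uint64 at offset ${offset} exceeds Number.MAX_SAFE_INTEGER`);
  }
  return high * 0x100000000 + low;
}

// The Timestamp Block starts right after the 64-byte header.
function readTimestamps(view: DataView, timestampCount: number): number[] {
  const timestamps: number[] = [];
  for (let i = 0; i < timestampCount; i++) {
    timestamps.push(readU64(view, 64 + 8 * i));
  }
  return timestamps;
}

// Timestamps are in seconds; JavaScript Dates use milliseconds.
function toDate(seconds: number): Date {
  return new Date(seconds * 1000);
}
```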

## 5. Series Directory Definition

This section follows the Timestamp Block, starting at byte offset 64 + 8 × `Timestamp Count`. It contains `Series Count` entries, each 64 bytes long, that define the data series stored in the file.

**Structure of a Single Series Directory Entry (64 bytes):**

| Offset within Record (Bytes) | Length (Bytes) | Field Name | Data Type | Description |
|:-----------------------------|:---------------|:--------------|:-------------|:---------------------------------------------------------------------------------------|
| 0 | 32 | Primary Key | ASCII String | The main key: the **Indicator Key** for `.stchxi` files (e.g., "ema_short") or the **Metric Key** for `.stchxm` files (e.g., "equity"). Null-padded. |
| 32 | 32 | Series Key | ASCII String | The name of the individual series produced by that indicator or metric (e.g., "value", "macd", "signal"). Null-padded. |
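
Each directory entry is two null-padded 32-byte strings, so the directory decodes into a list of key pairs. Below is a minimal TypeScript sketch. `readDirectory`, `readAscii`, and `SeriesDirectoryEntry` are hypothetical names, and a `DataView` over the whole file is assumed.

```typescript
// Sketch only: hypothetical Series Directory reader.
interface SeriesDirectoryEntry {
  primaryKey: string; // indicator key (.stchxi) or metric key (.stchxm)
  seriesKey: string; // e.g. "value", "macd", "signal"
}

// Decode a null-padded ASCII field, stopping at the first null byte.
function readAscii(view: DataView, offset: number, length: number): string {
  let text = "";
  for (let i = 0; i < length; i++) {
    const byte = view.getUint8(offset + i);
    if (byte === 0) break;
    text += String.fromCharCode(byte);
  }
  return text;
}

function readDirectory(
  view: DataView,
  seriesCount: number,
  timestampCount: number,
): SeriesDirectoryEntry[] {
  const start = 64 + 8 * timestampCount; // directory follows the Timestamp Block
  const entries: SeriesDirectoryEntry[] = [];
  for (let k = 0; k < seriesCount; k++) {
    const base = start + 64 * k;
    entries.push({
      primaryKey: readAscii(view, base, 32),
      seriesKey: readAscii(view, base + 32, 32),
    });
  }
  return entries;
}
```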

## 6. Data Blocks Definition

This section starts immediately after the Series Directory. It contains `Series Count` contiguous blocks of data, one per series, each holding `Timestamp Count` values. The order of the data blocks must match the order of the entries in the Series Directory, so the block for the *k*-th entry (zero-based) begins *k* × `Timestamp Count` × 8 bytes into this section.

For Version 1, each value is an 8-byte, Big-Endian `double`. A value of `NaN` (Not a Number) represents a null or missing data point at the corresponding timestamp.
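
Locating one series requires finding its directory index, which is also its data block index, and then reading `Timestamp Count` doubles. Below is a minimal TypeScript sketch. `readSeries` and `readAscii` are hypothetical names, and a `DataView` over the whole file is assumed. The sketch maps `NaN` to `null`.

```typescript
// Sketch only: hypothetical lookup of one series' values.

// Decode a null-padded ASCII field, stopping at the first null byte.
function readAscii(view: DataView, offset: number, length: number): string {
  let text = "";
  for (let i = 0; i < length; i++) {
    const byte = view.getUint8(offset + i);
    if (byte === 0) break;
    text += String.fromCharCode(byte);
  }
  return text;
}

// Values of the series (primaryKey, seriesKey) with NaN mapped to null,
// or undefined if no directory entry matches.
function readSeries(
  view: DataView,
  seriesCount: number,
  timestampCount: number,
  primaryKey: string,
  seriesKey: string,
): (number | null)[] | undefined {
  const directoryStart = 64 + 8 * timestampCount;
  const dataStart = directoryStart + 64 * seriesCount;
  for (let k = 0; k < seriesCount; k++) {
    const entry = directoryStart + 64 * k;
    if (readAscii(view, entry, 32) !== primaryKey) continue;
    if (readAscii(view, entry + 32, 32) !== seriesKey) continue;
    // Data blocks follow directory order, so entry k owns block k.
    const blockStart = dataStart + k * timestampCount * 8;
    const values: (number | null)[] = [];
    for (let i = 0; i < timestampCount; i++) {
      const value = view.getFloat64(blockStart + 8 * i); // big-endian double
      values.push(Number.isNaN(value) ? null : value);
    }
    return values;
  }
  return undefined;
}
```

Mapping `NaN` to `null` is a presentation choice, not part of the format. It suits charting code that treats `null` as a gap, while numeric code may prefer to keep `NaN`.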
