Commit 8b5233a

more images for large pages
1 parent f9f0e16 commit 8b5233a

2 files changed: +53 -56 lines changed

docs/guides/best-practices/sparse-primary-indexes.md

Lines changed: 39 additions & 43 deletions
@@ -428,67 +428,57 @@ On the test machine the path is `/Users/tomschreiber/Clickhouse/user_files/`
428428

429429
`cp /Users/tomschreiber/Clickhouse/store/85f/85f4ee68-6e28-4f08-98b1-7d8affa1d88c/all_1_9_4/primary.idx /Users/tomschreiber/Clickhouse/user_files/primary-hits_UserID_URL.idx`
430430

431-
<br/>
432-
433431
</ul>
434432

433+
<br/>
435434
Now we can inspect the content of the primary index via SQL:
436435
<ul>
437436
<li>Get amount of entries</li>
438437
`
439438
SELECT count( )<br/>FROM file('primary-hits_UserID_URL.idx', 'RowBinary', 'UserID UInt32, URL String');
440439
`
441-
442-
<br/>
443-
<br/>
444440
returns `1083`
445-
<br/>
446-
<br/>
441+
447442
<li>Get first two index marks</li>
448443
`
449444
SELECT UserID, URL<br/>FROM file('primary-hits_UserID_URL.idx', 'RowBinary', 'UserID UInt32, URL String')<br/>LIMIT 0, 2;
450445
`
451-
<br/>
452-
<br/>
446+
453447
returns
454-
<br/>
448+
455449
`
456450
240923, http://showtopics.html%3...<br/>
457451
4073710, http://mk.ru&pos=3_0
458452
`
459-
<br/>
460-
<br/>
453+
461454
<li>Get last index mark</li>
462455
`
463-
SELECT UserID, URL<br/>FROM file('primary-hits_UserID_URL.idx', 'RowBinary', 'UserID UInt32, URL String')<br/>LIMIT 1082, 1;
456+
SELECT UserID, URL FROM file('primary-hits_UserID_URL.idx', 'RowBinary', 'UserID UInt32, URL String')<br/>LIMIT 1082, 1;
464457
`
465-
<br/>
466-
<br/>
467458
returns
468-
<br/>
469459
`
470460
4292714039 │ http://sosyal-mansetleri...
471461
`
472-
473-
474-
475462
</ul>
476-
463+
<br/>
477464
This matches exactly our diagram of the primary index content for our example table:
478-
<img src={sparsePrimaryIndexes03b} class="image"/>
465+
466+
479467
</p>
480468
</details>
481469

482470

483471

484472
The primary key entries are called index marks because each index entry is marking the start of a specific data range. Specifically for the example table:
485-
- UserID index marks:<br/>
473+
- UserID index marks:
474+
486475
The stored `UserID` values in the primary index are sorted in ascending order.<br/>
487476
‘mark 1’ in the diagram above thus indicates that the `UserID` values of all table rows in granule 1, and in all following granules, are guaranteed to be greater than or equal to 4.073.710.
488477

489478
[As we will see later](#the-primary-index-is-used-for-selecting-granules), this global order enables ClickHouse to <a href="https://github.com/ClickHouse/ClickHouse/blob/22.3/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp#L1452" target="_blank">use a binary search algorithm</a> over the index marks for the first key column when a query is filtering on the first column of the primary key.
490479

491-
- URL index marks:<br/>
480+
- URL index marks:
481+
492482
The quite similar cardinality of the primary key columns `UserID` and `URL`
493483
means that the index marks for all key columns after the first column in general only indicate a data range as long as the predecessor key column value stays the same for all table rows within at least the current granule.<br/>
494484
For example, because the UserID values of mark 0 and mark 1 are different in the diagram above, ClickHouse can't assume that all URL values of all table rows in granule 0 are greater than or equal to `'http://showtopics.html%3...'`. However, if the UserID values of mark 0 and mark 1 were the same in the diagram above (meaning that the UserID value stays the same for all table rows within granule 0), then ClickHouse could assume that all URL values of all table rows in granule 0 are greater than or equal to `'http://showtopics.html%3...'`.
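
The binary search over the first key column's index marks can be sketched roughly as follows. This is a minimal illustration in Python, not ClickHouse's actual implementation: the function name `candidate_granules` and the sample mark values are hypothetical, and it only assumes that `marks[i]` holds the first-column key value of the first row of granule `i`, sorted in ascending order.

```python
from bisect import bisect_left, bisect_right


def candidate_granules(marks, value):
    """Return the range of granule numbers that may contain rows whose
    first key column equals `value`.

    marks[i] is assumed to be the key value of the first row of granule i,
    and marks is assumed to be sorted in ascending order.
    """
    # The last granule whose first row is <= value is the last candidate.
    last = bisect_right(marks, value) - 1
    if last < 0:
        # value is smaller than the very first mark: no granule can match.
        return range(0)
    # The granule just before the first mark >= value may still end with
    # rows equal to value, so it is a candidate too.
    first = max(bisect_left(marks, value) - 1, 0)
    return range(first, last + 1)


# Hypothetical marks, loosely modeled on the UserID values shown above.
sample_marks = [240923, 4073710, 5000000, 5000000, 9000000]
granules_for_lookup = list(candidate_granules(sample_marks, 4500000))
```

Because the marks are sorted, the lookup needs only a logarithmic number of comparisons instead of inspecting every mark.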
@@ -625,7 +615,7 @@ We discuss that second stage in more detail in the following section.
625615

626616
The following diagram illustrates a part of the primary index file for our table.
627617

628-
<img src={sparsePrimaryIndexes04} class="image"/>
618+
<Image img={sparsePrimaryIndexes04} size="lg" alt="Sparse Primary Indices 04" background="white"/>
629619

630620
As discussed above, via a binary search over the index’s 1083 UserID marks, mark 176 was identified. Its corresponding granule 176 can therefore possibly contain rows with a UserID column value of 749.927.693.
631621

@@ -646,7 +636,8 @@ To achieve this, ClickHouse needs to know the physical location of granule 176.
646636
In ClickHouse the physical locations of all granules for our table are stored in mark files. Similar to data files, there is one mark file per table column.
647637

648638
The following diagram shows the three mark files `UserID.mrk`, `URL.mrk`, and `EventTime.mrk` that store the physical locations of the granules for the table’s `UserID`, `URL`, and `EventTime` columns.
649-
<img src={sparsePrimaryIndexes05} class="image"/>
639+
640+
<Image img={sparsePrimaryIndexes05} size="lg" alt="Sparse Primary Indices 05" background="white"/>
650641

651642
We have discussed how the primary index is a flat uncompressed array file (primary.idx), containing index marks that are numbered starting at 0.
652643

@@ -697,7 +688,7 @@ The indirection provided by mark files avoids storing, directly within the prima
697688

698689
The following diagram and the text below illustrate how for our example query ClickHouse locates granule 176 in the UserID.bin data file.
699690

700-
<img src={sparsePrimaryIndexes06} class="image"/>
691+
<Image img={sparsePrimaryIndexes06} size="lg" alt="Sparse Primary Indices 06" background="white"/>
701692

702693
We discussed earlier in this guide that ClickHouse selected the primary index mark 176 and therefore granule 176 as possibly containing matching rows for our query.
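
As a rough sketch of why a mark file makes this lookup cheap, the Python snippet below models a mark file as a flat array with one fixed-size entry per mark, so that the entry for mark 176 can be read directly by position. The entry layout used here (two little-endian unsigned 64-bit offsets named `block_offset` and `granule_offset`), the helper names, and the offset values are assumptions for illustration only, not a specification of the on-disk format.

```python
import struct

# Assumed toy layout: one entry per mark, each entry holding two
# little-endian unsigned 64-bit integers (block_offset, granule_offset).
MARK_ENTRY = struct.Struct("<QQ")


def build_mark_file(entries):
    """Serialize (block_offset, granule_offset) pairs into a flat byte array."""
    return b"".join(MARK_ENTRY.pack(block, granule) for block, granule in entries)


def locate_granule(mark_file, mark_number):
    """Read the entry for mark_number directly by its position in the array."""
    return MARK_ENTRY.unpack_from(mark_file, mark_number * MARK_ENTRY.size)


# Hypothetical offsets for three granules.
toy_mark_file = build_mark_file([(0, 0), (0, 32768), (4096, 0)])
block_offset, granule_offset = locate_granule(toy_mark_file, 2)
```

The point of the sketch is the direct positional access: the mark number identified via the primary index is all that is needed to find the corresponding entry.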
703694

@@ -810,7 +801,8 @@ We have marked the key column values for the first table rows for each granule i
810801
**Predecessor key column has low(er) cardinality**<a name="generic-exclusion-search-fast"></a>
811802

812803
Suppose UserID had low cardinality. In this case it would be likely that the same UserID value is spread over multiple table rows and granules and therefore index marks. For index marks with the same UserID, the URL values for the index marks are sorted in ascending order (because the table rows are ordered first by UserID and then by URL). This allows efficient filtering as described below:
813-
<img src={sparsePrimaryIndexes07} class="image"/>
804+
805+
<Image img={sparsePrimaryIndexes07} size="lg" alt="Sparse Primary Indices 07" background="white"/>
814806

815807
There are three different scenarios for the granule selection process for our abstract sample data in the diagram above:
816808

@@ -824,7 +816,7 @@ There are three different scenarios for the granule selection process for our ab
824816

825817
When the UserID has high cardinality then it is unlikely that the same UserID value is spread over multiple table rows and granules. This means the URL values for the index marks are not monotonically increasing:
826818

827-
<img src={sparsePrimaryIndexes08} class="image"/>
819+
<Image img={sparsePrimaryIndexes08} size="lg" alt="Sparse Primary Indices 08" background="white"/>
828820

829821
As we can see in the diagram above, all shown marks whose URL values are smaller than W3 are selected for streaming their associated granules' rows into the ClickHouse engine.
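
The effect of the predecessor column's cardinality on granule selection can be illustrated with a simplified model in Python. This is a sketch under stated assumptions, not ClickHouse's actual algorithm: `select_granules` is a hypothetical helper, each mark is modeled as a `(user_id, url)` pair taken from the first row of a granule, and a granule is excluded only when the directly succeeding mark has the same UserID and the searched URL lies outside the range spanned by the two marks.

```python
def select_granules(marks, url):
    """Return the granule numbers that cannot be excluded when filtering
    on the second key column (url).

    marks[i] is assumed to be the (user_id, url) pair of the first row
    of granule i.
    """
    selected = []
    for i, (user_id, low_url) in enumerate(marks):
        if i + 1 < len(marks):
            next_user_id, high_url = marks[i + 1]
            # Exclusion is only safe when the UserID stays the same for the
            # whole granule, because only then are its URLs sorted.
            if user_id == next_user_id and not (low_url <= url <= high_url):
                continue
        selected.append(i)
    return selected


# Low cardinality: the same UserID spans several marks, so marks can be excluded.
low_cardinality_marks = [(1, "W1"), (1, "W2"), (1, "W4"), (1, "W5"), (2, "W1")]
# High cardinality: every mark has a different UserID, so nothing can be excluded.
high_cardinality_marks = [(1, "W1"), (2, "W2"), (3, "W1")]

low_selected = select_granules(low_cardinality_marks, "W3")
high_selected = select_granules(high_cardinality_marks, "W3")
```

In the low-cardinality sample only some granules survive, while in the high-cardinality sample every granule has to be read.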
830822

@@ -858,7 +850,7 @@ ALTER TABLE hits_UserID_URL MATERIALIZE INDEX url_skipping_index;
858850
```
859851
ClickHouse now created an additional index that is storing - per group of 4 consecutive [granules](#data-is-organized-into-granules-for-parallel-data-processing) (note the `GRANULARITY 4` clause in the `ALTER TABLE` statement above) - the minimum and maximum URL value:
860852

861-
<img src={sparsePrimaryIndexes13a} class="image"/>
853+
<Image img={sparsePrimaryIndexes13a} size="lg" alt="Sparse Primary Indices 13a" background="white"/>
862854

863855
The first index entry (‘mark 0’ in the diagram above) is storing the minimum and maximum URL values for the [rows belonging to the first 4 granules of our table](#data-is-organized-into-granules-for-parallel-data-processing).
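
A minimal Python sketch of the idea behind such a minmax index: store one `(min, max)` pair per group of 4 consecutive granules, and read a group only if the searched value falls within that pair. The helper names and the sample data are hypothetical, and the sketch assumes every granule contains at least one row.

```python
def build_minmax_index(granules, granularity=4):
    """Build one (min, max) entry per group of `granularity` consecutive granules.

    granules is assumed to be a list of non-empty lists of column values.
    """
    index = []
    for start in range(0, len(granules), granularity):
        values = [v for granule in granules[start:start + granularity] for v in granule]
        index.append((min(values), max(values)))
    return index


def groups_to_read(index, value):
    """Return the group numbers whose [min, max] range can contain value."""
    return [i for i, (low, high) in enumerate(index) if low <= value <= high]


# Hypothetical data: 8 granules with two URL-like values each.
sample_granules = [
    ["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"],  # group 0 covers a..e
    ["m", "n"], ["n", "o"], ["o", "p"], ["p", "q"],  # group 1 covers m..q
]
sample_index = build_minmax_index(sample_granules, granularity=4)
```

The sketch also hints at the limitation: if similar values are scattered across the whole table, most `(min, max)` ranges overlap the searched value and few groups can be skipped.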
864856

@@ -897,15 +889,16 @@ All three options will effectively duplicate our sample data into a additional t
897889
However, the three options differ in how transparent that additional table is to the user with respect to the routing of queries and insert statements.
898890

899891
When creating a **second table** with a different primary key, queries must be explicitly sent to the table version best suited for the query, and new data must be inserted explicitly into both tables in order to keep the tables in sync:
900-
<img src={sparsePrimaryIndexes09a} class="image"/>
901892

893+
<Image img={sparsePrimaryIndexes09a} size="md" alt="Sparse Primary Indices 09a" background="white"/>
902894

903895
With a **materialized view** the additional table is implicitly created and data is automatically kept in sync between both tables:
904-
<img src={sparsePrimaryIndexes09b} class="image"/>
905896

897+
<Image img={sparsePrimaryIndexes09b} size="md" alt="Sparse Primary Indices 09b" background="white"/>
906898

907899
And the **projection** is the most transparent option because, in addition to automatically keeping the implicitly created (and hidden) additional table in sync with data changes, ClickHouse will automatically choose the most effective table version for queries:
908-
<img src={sparsePrimaryIndexes09c} class="image"/>
900+
901+
<Image img={sparsePrimaryIndexes09c} size="md" alt="Sparse Primary Indices 09c" background="white"/>
909902

910903
In the following we discuss these three options for creating and using multiple primary indexes in more detail and with real examples.
911904

@@ -952,11 +945,11 @@ OPTIMIZE TABLE hits_URL_UserID FINAL;
952945

953946
Because we switched the order of the columns in the primary key, the inserted rows are now stored on disk in a different lexicographical order (compared to our [original table](#a-table-with-a-primary-key)) and therefore also the 1083 granules of that table are containing different values than before:
954947

955-
<img src={sparsePrimaryIndexes10} class="image"/>
948+
<Image img={sparsePrimaryIndexes10} size="lg" alt="Sparse Primary Indices 10" background="white"/>
956949

957950
This is the resulting primary key:
958951

959-
<img src={sparsePrimaryIndexes11} class="image"/>
952+
<Image img={sparsePrimaryIndexes11} size="lg" alt="Sparse Primary Indices 11" background="white"/>
960953

961954
That can now be used to significantly speed up the execution of our example query filtering on the URL column in order to calculate the top 10 users that most frequently clicked on the URL "http://public_search":
962955
```sql
@@ -1074,7 +1067,6 @@ Server Log:
10741067

10751068
We now have two tables, optimized for speeding up queries filtering on `UserIDs` and queries filtering on URLs, respectively:
10761069

1077-
<img src={sparsePrimaryIndexes12a} class="image"/>
10781070

10791071
### Option 2: Materialized Views {#option-2-materialized-views}
10801072

@@ -1105,11 +1097,11 @@ Ok.
11051097
- if new rows are inserted into the source table hits_UserID_URL, then those rows are automatically also inserted into the implicitly created table
11061098
- Effectively the implicitly created table has the same row order and primary index as the [secondary table that we created explicitly](/guides/best-practices/sparse-primary-indexes#option-1-secondary-tables):
11071099

1108-
<img src={sparsePrimaryIndexes12b1} class="image"/>
1100+
<Image img={sparsePrimaryIndexes12b1} size="lg" alt="Sparse Primary Indices 12b1" background="white"/>
11091101

11101102
ClickHouse is storing the [column data files](#data-is-stored-on-disk-ordered-by-primary-key-columns) (*.bin), the [mark files](#mark-files-are-used-for-locating-granules) (*.mrk2) and the [primary index](#the-primary-index-has-one-entry-per-granule) (primary.idx) of the implicitly created table in a special folder within the ClickHouse server's data directory:
11111103

1112-
<img src={sparsePrimaryIndexes12b2} class="image"/>
1104+
<Image img={sparsePrimaryIndexes12b2} size="md" alt="Sparse Primary Indices 12b2" background="white"/>
11131105

11141106
:::
11151107

@@ -1189,11 +1181,12 @@ ALTER TABLE hits_UserID_URL
11891181
- please note that projections do not make queries that use ORDER BY more efficient, even if the ORDER BY matches the projection's ORDER BY statement (see https://github.com/ClickHouse/ClickHouse/issues/47333)
11901182
- Effectively the implicitly created hidden table has the same row order and primary index as the [secondary table that we created explicitly](/guides/best-practices/sparse-primary-indexes#option-1-secondary-tables):
11911183

1192-
<img src={sparsePrimaryIndexes12c1} class="image"/>
1184+
<Image img={sparsePrimaryIndexes12c1} size="lg" alt="Sparse Primary Indices 12c1" background="white"/>
11931185

11941186
ClickHouse is storing the [column data files](#data-is-stored-on-disk-ordered-by-primary-key-columns) (*.bin), the [mark files](#mark-files-are-used-for-locating-granules) (*.mrk2) and the [primary index](#the-primary-index-has-one-entry-per-granule) (primary.idx) of the hidden table in a special folder (marked in orange in the screenshot below) next to the source table's data files, mark files, and primary index files:
11951187

1196-
<img src={sparsePrimaryIndexes12c2} class="image"/>
1188+
<Image img={sparsePrimaryIndexes12c2} size="sm" alt="Sparse Primary Indices 12c2" background="white"/>
1189+
11971190
:::
11981191

11991192

@@ -1455,7 +1448,8 @@ Having a good compression ratio for the data of a table's column on disk not onl
14551448
In the following we illustrate why it's beneficial for the compression ratio of a table's columns to order the primary key columns by cardinality in ascending order.
14561449

14571450
The diagram below sketches the on-disk order of rows for a primary key where the key columns are ordered by cardinality in ascending order:
1458-
<img src={sparsePrimaryIndexes14a} class="image"/>
1451+
1452+
<Image img={sparsePrimaryIndexes14a} size="lg" alt="Sparse Primary Indices 14a" background="white"/>
14591453

14601454
We discussed that [the table's row data is stored on disk ordered by primary key columns](#data-is-stored-on-disk-ordered-by-primary-key-columns).
14611455

@@ -1466,7 +1460,8 @@ In general, a compression algorithm benefits from the run length of data (the mo
14661460
and locality (the more similar the data is, the better the compression ratio is).
14671461

14681462
In contrast to the diagram above, the diagram below sketches the on-disk order of rows for a primary key where the key columns are ordered by cardinality in descending order:
1469-
<img src={sparsePrimaryIndexes14b} class="image"/>
1463+
1464+
<Image img={sparsePrimaryIndexes14b} size="lg" alt="Sparse Primary Indices 14b" background="white"/>
14701465

14711466
Now the table's rows are first ordered by their `ch` value, and rows that have the same `ch` value are ordered by their `cl` value.
14721467
But because the first key column `ch` has high cardinality, it is unlikely that there are rows with the same `ch` value. And because of that it is also unlikely that `cl` values are ordered (locally - for rows with the same `ch` value).
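
The effect of key column order on compression can be reproduced with a small experiment. The Python sketch below is an illustration under assumptions, not a measurement of ClickHouse itself: it uses `zlib` as a stand-in codec, synthetic random data, and a hypothetical helper `compressed_size_of_cl`. It compresses the low-cardinality `cl` column once with rows sorted by `(cl, ch)` and once with rows sorted by `(ch, cl)`.

```python
import random
import zlib


def compressed_size_of_cl(rows):
    """Compress only the cl column, in the given row order, and return its size."""
    cl_column = bytes(cl for cl, _ch in rows)
    return len(zlib.compress(cl_column))


rng = random.Random(0)  # fixed seed so the experiment is repeatable
row_count = 10_000
# cl has low cardinality (4 distinct values), ch has high cardinality (all unique).
rows = [(rng.randrange(4), ch) for ch in rng.sample(range(1_000_000), row_count)]

# Key columns ordered by cardinality ascending: (cl, ch).
ascending = sorted(rows, key=lambda row: (row[0], row[1]))
# Key columns ordered by cardinality descending: (ch, cl).
descending = sorted(rows, key=lambda row: (row[1], row[0]))

size_ascending = compressed_size_of_cl(ascending)
size_descending = compressed_size_of_cl(descending)
```

With the `(cl, ch)` order the `cl` column consists of a few long runs of identical values, while with the `(ch, cl)` order its values appear in effectively random order, so `size_ascending` comes out much smaller than `size_descending`.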
@@ -1508,7 +1503,8 @@ And one way to identify and retrieve (a specific version of) the pasted content
15081503
The following diagram shows
15091504
- the insert order of rows when the content changes (for example because of keystrokes typing the text into the text-area) and
15101505
- the on-disk order of the data from the inserted rows when the `PRIMARY KEY (hash)` is used:
1511-
<img src={sparsePrimaryIndexes15a} class="image"/>
1506+
1507+
<Image img={sparsePrimaryIndexes15a} size="lg" alt="Sparse Primary Indices 15a" background="white"/>
15121508

15131509
Because the `hash` column is used as the primary key column
15141510
- specific rows can be retrieved [very quickly](#the-primary-index-is-used-for-selecting-granules), but
@@ -1523,7 +1519,7 @@ The following diagram shows
15231519
- the insert order of rows when the content changes (for example because of keystrokes typing the text into the text-area) and
15241520
- the on-disk order of the data from the inserted rows when the compound `PRIMARY KEY (fingerprint, hash)` is used:
15251521

1526-
<img src={sparsePrimaryIndexes15b} class="image"/>
1522+
<Image img={sparsePrimaryIndexes15b} size="lg" alt="Sparse Primary Indices 15b" background="white"/>
15271523

15281524
Now the rows on disk are first ordered by `fingerprint`, and for rows with the same fingerprint value, their `hash` value determines the final order.
15291525
