docs: add SSTS_MANIFEST and SSTS_STORAGE system tables #2198
Merged
7 commits:

- 61b3445 Initial plan (Copilot)
- fbcee26 Add documentation for SSTS_MANIFEST and SSTS_STORAGE tables (Copilot)
- 1c69242 Fix Chinese translation to use consistent terminology for 'manifest' (Copilot)
- d198e42 Add SSTS_MANIFEST and SSTS_STORAGE to sidebar navigation (Copilot)
- 96e0b0f Fix Markdown style to match SSTS_INDEX_META reference (Copilot)
- f97e7f8 Align Chinese terminology with SSTS_INDEX_META reference (Copilot)
- 5281403 Apply suggestions from code review (WenyXu)

---
keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides access to SST (Sorted String Table) file information from the manifest, including file paths, sizes, time ranges, and row counts.
---

# SSTS_MANIFEST

The `SSTS_MANIFEST` table provides access to SST (Sorted String Table) file information collected from the manifest. This table surfaces detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column           | Type                | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir        | String              |     | NO   |         | FIELD         |
| region_id        | UInt64              |     | NO   |         | FIELD         |
| table_id         | UInt32              |     | NO   |         | FIELD         |
| region_number    | UInt32              |     | NO   |         | FIELD         |
| region_group     | UInt8               |     | NO   |         | FIELD         |
| region_sequence  | UInt32              |     | NO   |         | FIELD         |
| file_id          | String              |     | NO   |         | FIELD         |
| level            | UInt8               |     | NO   |         | FIELD         |
| file_path        | String              |     | NO   |         | FIELD         |
| file_size        | UInt64              |     | NO   |         | FIELD         |
| index_file_path  | String              |     | YES  |         | FIELD         |
| index_file_size  | UInt64              |     | YES  |         | FIELD         |
| num_rows         | UInt64              |     | NO   |         | FIELD         |
| num_row_groups   | UInt64              |     | NO   |         | FIELD         |
| min_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| max_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| sequence         | UInt64              |     | YES  |         | FIELD         |
| origin_region_id | UInt64              |     | NO   |         | FIELD         |
| node_id          | UInt64              |     | YES  |         | FIELD         |
| visible          | Boolean             |     | NO   |         | FIELD         |
+------------------+---------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the region that references the file.
- `table_id`: The ID of the table.
- `region_number`: The region number within the table.
- `region_group`: The group identifier of the region.
- `region_sequence`: The sequence number of the region.
- `file_id`: The unique identifier (UUID) of the SST file.
- `level`: The level of the SST file in the LSM tree (0 for files that have not been compacted, 1 for compacted files).
- `file_path`: The full path to the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path to the index file in object storage, if the SST file has one.
- `index_file_size`: The size of the index file in bytes, if the SST file has one.
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with the file.
- `origin_region_id`: The ID of the region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether the file is visible in the current version.
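
The field list distinguishes the region that references a file (`region_id`) from the region that created it (`origin_region_id`). As a minimal sketch that relies only on the documented columns (it has not been run against a live cluster), the query below lists the files whose referencing region differs from their creating region:

```sql
-- Sketch: SST files referenced by a region other than the one that created them.
-- Uses only columns documented in the field list above.
SELECT table_id, region_id, origin_region_id, file_id, level, visible
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE region_id <> origin_region_id
ORDER BY table_id, region_id;
```

An empty result means that, for every row, the referencing region is also the creating region.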

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query the SST files of a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```

Query SST files with their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```

Calculate the number of SST files and their total size per table:

```sql
SELECT table_id, COUNT(*) AS sst_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```
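
The per-table aggregate can be broken down further. Because `level` separates uncompacted (0) from compacted (1) files and `visible` marks files in the current version, a sketch like the one below reports counts and sizes per region and level. This is an illustrative query, not an official monitoring recipe. Reading a high level-0 count as a sign of pending compaction is an interpretation, not something the table reports directly.

```sql
-- Sketch: per-region, per-level file counts and sizes for visible files.
-- SUM skips NULLs, so files without an index add nothing to index_bytes.
SELECT
  table_id,
  region_id,
  level,
  COUNT(*)             AS sst_count,
  SUM(num_rows)        AS total_rows,
  SUM(file_size)       AS data_bytes,
  SUM(index_file_size) AS index_bytes
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE visible = true
GROUP BY table_id, region_id, level
ORDER BY table_id, region_id, level;
```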

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```

---
keywords: [SST storage, SST files, file listing, storage layer, object storage]
description: Provides access to SST (Sorted String Table) file information from the storage layer, including file paths, sizes, and last modified timestamps.
---

# SSTS_STORAGE

The `SSTS_STORAGE` table provides access to SST (Sorted String Table) file information listed directly from the storage layer. It shows raw file metadata from object storage, which may include files that are not yet recorded in the manifest as well as orphaned files that the manifest no longer references.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_STORAGE;
```

The output is as follows:

```sql
+------------------+----------------------+-----+------+---------+---------------+
| Column           | Type                 | Key | Null | Default | Semantic Type |
+------------------+----------------------+-----+------+---------+---------------+
| file_path        | String               |     | NO   |         | FIELD         |
| file_size        | UInt64               |     | YES  |         | FIELD         |
| last_modified_ms | TimestampMillisecond |     | YES  |         | FIELD         |
| node_id          | UInt64               |     | YES  |         | FIELD         |
+------------------+----------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_STORAGE` table are described as follows:

- `file_path`: The full path to the file in object storage.
- `file_size`: The size of the file in bytes, if reported by the storage backend.
- `last_modified_ms`: The time the file was last modified, with millisecond precision, if reported by the storage backend.
- `node_id`: The ID of the datanode where the file is located.

## Use Cases

The `SSTS_STORAGE` table is useful for:

- **Storage verification**: Compare the files in storage against the manifest to detect orphaned files or inconsistencies.
- **Storage debugging**: Identify files that exist in storage but are not tracked in the manifest.
- **Cleanup operations**: Find orphaned SST files that are no longer referenced, as candidates for removal.
- **Storage auditing**: Get a complete view of all SST files in the storage layer.

## Examples

Query all SST files in storage:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE;
```

Find files in storage that are not in the manifest (potential orphaned files):

```sql
SELECT s.file_path, s.file_size, s.last_modified_ms
FROM INFORMATION_SCHEMA.SSTS_STORAGE s
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m ON s.file_path = m.file_path
WHERE m.file_path IS NULL;
```
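
The anti-join on `file_path` treats every storage file missing from `SSTS_MANIFEST.file_path` as a potential orphan. Two refinements may reduce false positives, and both rest on assumptions. First, `SSTS_STORAGE` might also list index files, which the manifest records in `index_file_path` rather than `file_path`. Second, newly written files may not have reached the manifest yet, so a grace period helps. The one-hour interval below is an arbitrary placeholder. Depending on your version, the timestamp comparison may also need an explicit cast.

```sql
-- Sketch: orphan candidates that match neither a data file nor an index file
-- in the manifest and have not been modified in the last hour.
-- Assumptions: storage may list index files; one hour is enough time for
-- in-flight files to be recorded in the manifest.
SELECT s.file_path, s.file_size, s.last_modified_ms
FROM INFORMATION_SCHEMA.SSTS_STORAGE s
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m
  ON s.file_path = m.file_path
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST mi
  ON s.file_path = mi.index_file_path
WHERE m.file_path IS NULL
  AND mi.index_file_path IS NULL
  -- Rows with a NULL last_modified_ms are excluded: the comparison yields NULL.
  AND s.last_modified_ms < now() - INTERVAL '1 hour';
```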

Find the largest SST files in storage:

```sql
SELECT file_path, file_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL
ORDER BY file_size DESC
LIMIT 10;
```

Calculate total storage usage by SST files:

```sql
SELECT COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL;
```
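
Each row also carries a `node_id`, so the total can be split by datanode. Here is a minimal sketch using only the documented columns:

```sql
-- Sketch: storage usage per datanode.
-- Files with an unknown size are excluded, as in the total above.
SELECT node_id, COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL
GROUP BY node_id
ORDER BY total_size DESC;
```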

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE LIMIT 1\G
*************************** 1. row ***************************
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
last_modified_ms: 2025-01-01 00:00:00.000
node_id: 0
1 row in set (0.02 sec)
```

## Differences from SSTS_MANIFEST

| Aspect | SSTS_MANIFEST | SSTS_STORAGE |
|--------|---------------|--------------|
| **Data Source** | Manifest metadata | Storage layer directly |
| **Information** | Detailed SST metadata (row counts, time ranges, etc.) | Basic file metadata only |
| **File Coverage** | Only files tracked in the manifest | All files in storage |
| **Use Case** | Query SST metadata for analysis | Verify storage, find orphaned files |
| **Performance** | Fast (reads from the manifest) | Slower (scans storage) |
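
The coverage difference in the table can be estimated by counting distinct data-file paths in each source. This sketch assumes that SST data files use the `.parquet` extension, as in the sample outputs above. That is an inference from the samples, not a documented guarantee, so adjust the filter if your storage holds other file types. A gap between the two counts may point to orphaned files or to files not yet recorded in the manifest.

```sql
-- Sketch: number of distinct data files known to each source.
-- Assumption: data files end in ".parquet" (as in the sample output);
-- the filter keeps other files, such as index files, out of the storage count.
SELECT 'manifest' AS source, COUNT(DISTINCT file_path) AS data_files
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
UNION ALL
SELECT 'storage' AS source, COUNT(DISTINCT file_path) AS data_files
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_path LIKE '%.parquet';
```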

`...s-plugin-content-docs/current/reference/sql/information-schema/ssts-manifest.md`: 139 additions, 0 deletions

---
keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides SST (Sorted String Table) file information obtained from the manifest, including file paths, sizes, time ranges, and row counts.
---

# SSTS_MANIFEST

The `SSTS_MANIFEST` table provides SST (Sorted String Table) file information collected from the manifest. This table shows detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column           | Type                | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir        | String              |     | NO   |         | FIELD         |
| region_id        | UInt64              |     | NO   |         | FIELD         |
| table_id         | UInt32              |     | NO   |         | FIELD         |
| region_number    | UInt32              |     | NO   |         | FIELD         |
| region_group     | UInt8               |     | NO   |         | FIELD         |
| region_sequence  | UInt32              |     | NO   |         | FIELD         |
| file_id          | String              |     | NO   |         | FIELD         |
| level            | UInt8               |     | NO   |         | FIELD         |
| file_path        | String              |     | NO   |         | FIELD         |
| file_size        | UInt64              |     | NO   |         | FIELD         |
| index_file_path  | String              |     | YES  |         | FIELD         |
| index_file_size  | UInt64              |     | YES  |         | FIELD         |
| num_rows         | UInt64              |     | NO   |         | FIELD         |
| num_row_groups   | UInt64              |     | NO   |         | FIELD         |
| min_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| max_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| sequence         | UInt64              |     | YES  |         | FIELD         |
| origin_region_id | UInt64              |     | NO   |         | FIELD         |
| node_id          | UInt64              |     | YES  |         | FIELD         |
| visible          | Boolean             |     | NO   |         | FIELD         |
+------------------+---------------------+-----+------+---------+---------------+
```
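
Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the region that references the file.
- `table_id`: The ID of the table.
- `region_number`: The region number within the table.
- `region_group`: The group identifier of the region.
- `region_sequence`: The sequence number of the region.
- `file_id`: The unique identifier (UUID) of the SST file.
- `level`: The SST level in the LSM tree (0 means not compacted, 1 means compacted).
- `file_path`: The full path of the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path of the index file in object storage, if one exists.
- `index_file_size`: The size of the index file in bytes, if one exists.
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with the file.
- `origin_region_id`: The ID of the region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether the file is visible in the current version.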

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query the SST files of a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```

Query SST files and their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```

Calculate the total SST file size for each table:

```sql
SELECT table_id, COUNT(*) AS sst_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```