Commit 58a6084

Copilot and WenyXu authored
docs: add SSTS_MANIFEST and SSTS_STORAGE system tables (#2198)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: WenyXu <32535939+WenyXu@users.noreply.github.com> Co-authored-by: Weny Xu <wenymedia@gmail.com>
1 parent 5b3763d commit 58a6084

7 files changed

+492 -0 lines changed


docs/reference/sql/information-schema/overview.md

Lines changed: 2 additions & 0 deletions
@@ -64,3 +64,5 @@ There is still lots of work to do for `INFORMATION_SCHEMA`. The tracking [issue]
 | [`PROCEDURE_INFO`](./procedure-info.md) | Procedure information.|
 | [`PROCESS_LIST`](./process-list.md) | Running queries information.|
 | [`SSTS_INDEX_META`](./ssts-index-meta.md) | Provides SST index metadata including inverted indexes, fulltext indexes, and bloom filters.|
+| [`SSTS_MANIFEST`](./ssts-manifest.md) | Provides SST file information from the manifest including file paths, sizes, time ranges, and row counts.|
+| [`SSTS_STORAGE`](./ssts-storage.md) | Provides SST file information from the storage layer for verification and debugging.|

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
---
keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides access to SST (Sorted String Table) file information from the manifest, including file paths, sizes, time ranges, and row counts.
---

# SSTS_MANIFEST

The `SSTS_MANIFEST` table provides access to SST (Sorted String Table) file information collected from the manifest. This table surfaces detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column           | Type                | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir        | String              |     | NO   |         | FIELD         |
| region_id        | UInt64              |     | NO   |         | FIELD         |
| table_id         | UInt32              |     | NO   |         | FIELD         |
| region_number    | UInt32              |     | NO   |         | FIELD         |
| region_group     | UInt8               |     | NO   |         | FIELD         |
| region_sequence  | UInt32              |     | NO   |         | FIELD         |
| file_id          | String              |     | NO   |         | FIELD         |
| level            | UInt8               |     | NO   |         | FIELD         |
| file_path        | String              |     | NO   |         | FIELD         |
| file_size        | UInt64              |     | NO   |         | FIELD         |
| index_file_path  | String              |     | YES  |         | FIELD         |
| index_file_size  | UInt64              |     | YES  |         | FIELD         |
| num_rows         | UInt64              |     | NO   |         | FIELD         |
| num_row_groups   | UInt64              |     | NO   |         | FIELD         |
| min_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| max_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| sequence         | UInt64              |     | YES  |         | FIELD         |
| origin_region_id | UInt64              |     | NO   |         | FIELD         |
| node_id          | UInt64              |     | YES  |         | FIELD         |
| visible          | Boolean             |     | NO   |         | FIELD         |
+------------------+---------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the region that refers to the file.
- `table_id`: The ID of the table.
- `region_number`: The region number within the table.
- `region_group`: The group identifier for the region.
- `region_sequence`: The sequence number of the region.
- `file_id`: The unique identifier of the SST file (UUID).
- `level`: The SST level in the LSM tree (0 for uncompacted, 1 for compacted).
- `file_path`: The full path to the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path to the index file in object storage (if one exists).
- `index_file_size`: The size of the index file in bytes (if one exists).
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with this file.
- `origin_region_id`: The ID of the region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether this file is visible in the current version.
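
The ID columns above appear to be related. In the sample row shown on this page, `region_id` is 4398046511104, `table_id` is 1024, and `region_number` is 0, which is consistent with `region_id = table_id * 2^32 + region_number`. The following is a minimal sketch for checking that relationship and for spotting files referenced by a region other than the one that created them. The encoding is an assumption inferred from the sample row, not something this page states, and `derived_region_id` is an illustrative alias.

```sql
-- Assumption: region_id = table_id * 2^32 + region_number (inferred from the sample row).
SELECT region_id,
       table_id,
       region_number,
       CAST(table_id AS BIGINT) * 4294967296 + CAST(region_number AS BIGINT) AS derived_region_id
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
LIMIT 10;

-- Files whose referencing region differs from the region that created them.
SELECT file_id, region_id, origin_region_id
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE region_id <> origin_region_id;
```

If `derived_region_id` does not match `region_id` on your deployment, treat the assumed encoding as wrong rather than the data.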

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query SST files for a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```
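
To see how data is split between uncompacted and compacted files, the same `level` column can be used for grouping instead of filtering. A small sketch, using only columns from the schema above (the aliases are illustrative):

```sql
-- Summarize file count, bytes, and rows per SST level.
SELECT level,
       COUNT(*) AS sst_count,
       SUM(file_size) AS total_size,
       SUM(num_rows) AS total_rows
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY level
ORDER BY level;
```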

Query SST files with their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```
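
The time range columns can also narrow the listing to files that may contain data for a given window. The sketch below selects files whose `[min_ts, max_ts]` range overlaps a one-hour window. The window bounds are placeholder values, and the query assumes that casting a string literal to `TIMESTAMP` is comparable with the `TimestampNanosecond` columns.

```sql
-- Files whose time range overlaps [2025-01-01 00:00:00, 2025-01-01 01:00:00].
-- Rows with NULL min_ts or max_ts are excluded by the comparisons.
SELECT table_id, file_path, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE min_ts <= CAST('2025-01-01 01:00:00' AS TIMESTAMP)
  AND max_ts >= CAST('2025-01-01 00:00:00' AS TIMESTAMP)
ORDER BY min_ts;
```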

Calculate total SST file size per table:

```sql
SELECT table_id, COUNT(*) AS sst_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```
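
A variation of the per-table total reports table names and separates data size from index size. This is a sketch that reuses the `TABLES` join shown earlier. It assumes `TABLES` exposes a `table_schema` column, and it treats a missing index file as size 0.

```sql
-- Per-table SST data size and index size, largest tables first.
SELECT t.table_schema,
       t.table_name,
       COUNT(*) AS sst_count,
       SUM(s.file_size) AS data_size,
       SUM(COALESCE(s.index_file_size, 0)) AS index_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
GROUP BY t.table_schema, t.table_name
ORDER BY data_size DESC;
```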

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
---
keywords: [SST storage, SST files, file listing, storage layer, object storage]
description: Provides access to SST (Sorted String Table) file information from the storage layer, including file paths, sizes, and last modified timestamps.
---

# SSTS_STORAGE

The `SSTS_STORAGE` table provides access to SST (Sorted String Table) file information listed directly from the storage layer. This table shows raw file metadata from object storage, which may include files that are not yet reflected in the manifest or files that have been orphaned.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_STORAGE;
```

The output is as follows:

```sql
+------------------+----------------------+-----+------+---------+---------------+
| Column           | Type                 | Key | Null | Default | Semantic Type |
+------------------+----------------------+-----+------+---------+---------------+
| file_path        | String               |     | NO   |         | FIELD         |
| file_size        | UInt64               |     | YES  |         | FIELD         |
| last_modified_ms | TimestampMillisecond |     | YES  |         | FIELD         |
| node_id          | UInt64               |     | YES  |         | FIELD         |
+------------------+----------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_STORAGE` table are described as follows:

- `file_path`: The full path to the file in object storage.
- `file_size`: The size of the file in bytes (if available from storage).
- `last_modified_ms`: The time the file was last modified, with millisecond precision (if available from storage).
- `node_id`: The ID of the datanode where the file is located.

## Use Cases

The `SSTS_STORAGE` table is useful for:

- **Storage verification**: Compare files in storage against the manifest to detect orphaned files or inconsistencies.
- **Storage debugging**: Identify files that exist in storage but may not be properly tracked in the manifest.
- **Cleanup operations**: Find orphaned SST files that are no longer referenced so that they can be removed.
- **Storage auditing**: Get a complete view of all SST files in the storage layer.

## Examples

Query all SST files in storage:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE;
```

Find files in storage that are not in the manifest (potential orphaned files):

```sql
SELECT s.file_path, s.file_size, s.last_modified_ms
FROM INFORMATION_SCHEMA.SSTS_STORAGE s
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m ON s.file_path = m.file_path
WHERE m.file_path IS NULL;
```
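
The same anti-join can be run in the opposite direction to look for the other kind of inconsistency: files that the manifest references but that the storage listing does not return. A sketch, assuming both tables report `file_path` in the same form (as the sample outputs on this page suggest):

```sql
-- Files tracked in the manifest that are absent from the storage listing.
SELECT m.file_path, m.file_size, m.region_id
FROM INFORMATION_SCHEMA.SSTS_MANIFEST m
LEFT JOIN INFORMATION_SCHEMA.SSTS_STORAGE s ON m.file_path = s.file_path
WHERE s.file_path IS NULL;
```

A non-empty result is worth investigating but is not proof of data loss. The two tables are read from different sources and may not be captured at exactly the same moment.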

Find the largest SST files in storage:

```sql
SELECT file_path, file_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL
ORDER BY file_size DESC
LIMIT 10;
```

Calculate total storage usage by SST files:

```sql
SELECT COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL;
```
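
Because each row carries a `node_id`, the total can also be broken down per datanode. A small sketch using only the columns described above (`SUM` skips rows where `file_size` is NULL):

```sql
-- Storage usage per datanode, largest first.
SELECT node_id,
       COUNT(*) AS file_count,
       SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
GROUP BY node_id
ORDER BY total_size DESC;
```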

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE LIMIT 1\G
*************************** 1. row ***************************
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
last_modified_ms: 2025-01-01 00:00:00.000
node_id: 0
1 row in set (0.02 sec)
```

## Differences from SSTS_MANIFEST

| Aspect | SSTS_MANIFEST | SSTS_STORAGE |
|--------|---------------|--------------|
| **Data Source** | Manifest metadata | Storage layer directly |
| **Information** | Detailed SST metadata (rows, time ranges, etc.) | Basic file metadata only |
| **File Coverage** | Only files tracked in manifest | All files in storage |
| **Use Case** | Query SST metadata for analysis | Verify storage, find orphaned files |
| **Performance** | Fast (reads from manifest) | Slower (scans storage) |
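
As a quick first check of the coverage difference in the table above, the two sources can be summarized side by side. This is a rough sketch. The `source` label is illustrative, and the totals may differ for legitimate reasons, for example files not yet reflected in the manifest or any non-SST files the storage listing happens to include.

```sql
-- Compare file counts and total bytes reported by the manifest and by storage.
SELECT 'manifest' AS source, COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
UNION ALL
SELECT 'storage' AS source, COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE;
```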

i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/overview.md

Lines changed: 2 additions & 0 deletions
@@ -62,3 +62,5 @@ description: INFORMATION_SCHEMA provides access to system metadata, such as data
 | [`PROCEDURE_INFO`](./procedure-info.md) | Provides Procedure-related information.|
 | [`PROCESS_LIST`](./process-list.md) | Provides information about queries currently running in the cluster|
 | [`SSTS_INDEX_META`](./ssts-index-meta.md) | Provides SST index metadata, including inverted indexes, fulltext indexes, and bloom filters.|
+| [`SSTS_MANIFEST`](./ssts-manifest.md) | Provides SST file information obtained from the manifest, including file paths, sizes, time ranges, and row counts.|
+| [`SSTS_STORAGE`](./ssts-storage.md) | Provides SST file information obtained from the storage layer, for verification and debugging.|

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
---
keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides SST (Sorted String Table) file information obtained from the manifest, including file paths, sizes, time ranges, and row counts.
---

# SSTS_MANIFEST

The `SSTS_MANIFEST` table provides SST (Sorted String Table) file information collected from the manifest. This table shows detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column           | Type                | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir        | String              |     | NO   |         | FIELD         |
| region_id        | UInt64              |     | NO   |         | FIELD         |
| table_id         | UInt32              |     | NO   |         | FIELD         |
| region_number    | UInt32              |     | NO   |         | FIELD         |
| region_group     | UInt8               |     | NO   |         | FIELD         |
| region_sequence  | UInt32              |     | NO   |         | FIELD         |
| file_id          | String              |     | NO   |         | FIELD         |
| level            | UInt8               |     | NO   |         | FIELD         |
| file_path        | String              |     | NO   |         | FIELD         |
| file_size        | UInt64              |     | NO   |         | FIELD         |
| index_file_path  | String              |     | YES  |         | FIELD         |
| index_file_size  | UInt64              |     | YES  |         | FIELD         |
| num_rows         | UInt64              |     | NO   |         | FIELD         |
| num_row_groups   | UInt64              |     | NO   |         | FIELD         |
| min_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| max_ts           | TimestampNanosecond |     | YES  |         | FIELD         |
| sequence         | UInt64              |     | YES  |         | FIELD         |
| origin_region_id | UInt64              |     | NO   |         | FIELD         |
| node_id          | UInt64              |     | YES  |         | FIELD         |
| visible          | Boolean             |     | NO   |         | FIELD         |
+------------------+---------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the Region that references the file.
- `table_id`: The ID of the table.
- `region_number`: The Region number within the table.
- `region_group`: The group identifier of the Region.
- `region_sequence`: The sequence number of the Region.
- `file_id`: The unique identifier of the SST file (UUID).
- `level`: The SST level in the LSM tree (0 for uncompacted, 1 for compacted).
- `file_path`: The full path to the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path to the index file in object storage (if one exists).
- `index_file_size`: The size of the index file in bytes (if one exists).
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with this file.
- `origin_region_id`: The ID of the Region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether this file is visible in the current version.

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query SST files for a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```

Query SST files with their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```

Calculate total SST file size per table:

```sql
SELECT table_id, COUNT(*) AS sst_count, SUM(file_size) AS total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```

Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```
