diff --git a/docs/reference/sql/information-schema/overview.md b/docs/reference/sql/information-schema/overview.md index 9275e63fe..664fd8712 100644 --- a/docs/reference/sql/information-schema/overview.md +++ b/docs/reference/sql/information-schema/overview.md @@ -64,3 +64,5 @@ There is still lots of work to do for `INFORMATION_SCHEMA`. The tracking [issue] | [`PROCEDURE_INFO`](./procedure-info.md) | Procedure information.| | [`PROCESS_LIST`](./process-list.md) | Running queries information.| | [`SSTS_INDEX_META`](./ssts-index-meta.md) | Provides SST index metadata including inverted indexes, fulltext indexes, and bloom filters.| +| [`SSTS_MANIFEST`](./ssts-manifest.md) | Provides SST file information from the manifest including file paths, sizes, time ranges, and row counts.| +| [`SSTS_STORAGE`](./ssts-storage.md) | Provides SST file information from the storage layer for verification and debugging.| diff --git a/docs/reference/sql/information-schema/ssts-manifest.md b/docs/reference/sql/information-schema/ssts-manifest.md new file mode 100644 index 000000000..7fdad6a83 --- /dev/null +++ b/docs/reference/sql/information-schema/ssts-manifest.md @@ -0,0 +1,139 @@ +--- +keywords: [SST manifest, SST files, region files, file metadata, table data files] +description: Provides access to SST (Sorted String Table) file information from the manifest, including file paths, sizes, time ranges, and row counts. +--- + +# SSTS_MANIFEST + +The `SSTS_MANIFEST` table provides access to SST (Sorted String Table) file information collected from the manifest. This table surfaces detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts. + +:::tip NOTE +This table is not available on [GreptimeCloud](https://greptime.cloud/). 
+::: + +```sql +USE INFORMATION_SCHEMA; +DESC SSTS_MANIFEST; +``` + +The output is as follows: + +```sql ++------------------+---------------------+-----+------+---------+---------------+ +| Column | Type | Key | Null | Default | Semantic Type | ++------------------+---------------------+-----+------+---------+---------------+ +| table_dir | String | | NO | | FIELD | +| region_id | UInt64 | | NO | | FIELD | +| table_id | UInt32 | | NO | | FIELD | +| region_number | UInt32 | | NO | | FIELD | +| region_group | UInt8 | | NO | | FIELD | +| region_sequence | UInt32 | | NO | | FIELD | +| file_id | String | | NO | | FIELD | +| level | UInt8 | | NO | | FIELD | +| file_path | String | | NO | | FIELD | +| file_size | UInt64 | | NO | | FIELD | +| index_file_path | String | | YES | | FIELD | +| index_file_size | UInt64 | | YES | | FIELD | +| num_rows | UInt64 | | NO | | FIELD | +| num_row_groups | UInt64 | | NO | | FIELD | +| min_ts | TimestampNanosecond | | YES | | FIELD | +| max_ts | TimestampNanosecond | | YES | | FIELD | +| sequence | UInt64 | | YES | | FIELD | +| origin_region_id | UInt64 | | NO | | FIELD | +| node_id | UInt64 | | YES | | FIELD | +| visible | Boolean | | NO | | FIELD | ++------------------+---------------------+-----+------+---------+---------------+ +``` + +Fields in the `SSTS_MANIFEST` table are described as follows: + +- `table_dir`: The directory path of the table. +- `region_id`: The ID of the region that refers to the file. +- `table_id`: The ID of the table. +- `region_number`: The region number within the table. +- `region_group`: The group identifier for the region. +- `region_sequence`: The sequence number of the region. +- `file_id`: The unique identifier of the SST file (UUID). +- `level`: The SST level in the LSM tree (0 for uncompacted, 1 for compacted). +- `file_path`: The full path to the SST file in object storage. +- `file_size`: The size of the SST file in bytes. 
+- `index_file_path`: The full path to the index file in object storage (if it exists). +- `index_file_size`: The size of the index file in bytes (if it exists). +- `num_rows`: The number of rows in the SST file. +- `num_row_groups`: The number of row groups in the SST file. +- `min_ts`: The minimum timestamp in the SST file. +- `max_ts`: The maximum timestamp in the SST file. +- `sequence`: The sequence number associated with this file. +- `origin_region_id`: The ID of the region that created the file. +- `node_id`: The ID of the datanode where the file is located. +- `visible`: Whether this file is visible in the current version. + +## Examples + +Query all SST files in the manifest: + +```sql +SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST; +``` + +Query SST files for a specific table by joining with the `TABLES` table: + +```sql +SELECT s.* +FROM INFORMATION_SCHEMA.SSTS_MANIFEST s +JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id +WHERE t.table_name = 'my_table'; +``` + +Query only compacted SST files (level 1): + +```sql +SELECT file_path, file_size, num_rows, level +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +WHERE level = 1; +``` + +Query SST files with their time ranges: + +```sql +SELECT table_id, file_path, num_rows, min_ts, max_ts +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +ORDER BY table_id, min_ts; +``` + +Calculate total SST file size per table: + +```sql +SELECT table_id, COUNT(*) as sst_count, SUM(file_size) as total_size +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +GROUP BY table_id; +``` + + +Output example: + +```sql +mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G +*************************** 1. 
row *************************** + table_dir: data/greptime/public/1024/ + region_id: 4398046511104 + table_id: 1024 + region_number: 0 + region_group: 0 + region_sequence: 0 + file_id: 01234567-89ab-cdef-0123-456789abcdef + level: 0 + file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet + file_size: 1234 + index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin + index_file_size: 256 + num_rows: 100 + num_row_groups: 1 + min_ts: 2025-01-01 00:00:00.000000000 + max_ts: 2025-01-01 00:01:00.000000000 + sequence: 1 +origin_region_id: 4398046511104 + node_id: 0 + visible: true +1 row in set (0.02 sec) +``` diff --git a/docs/reference/sql/information-schema/ssts-storage.md b/docs/reference/sql/information-schema/ssts-storage.md new file mode 100644 index 000000000..0f5ce2d33 --- /dev/null +++ b/docs/reference/sql/information-schema/ssts-storage.md @@ -0,0 +1,104 @@ +--- +keywords: [SST storage, SST files, file listing, storage layer, object storage] +description: Provides access to SST (Sorted String Table) file information from the storage layer, including file paths, sizes, and last modified timestamps. +--- + +# SSTS_STORAGE + +The `SSTS_STORAGE` table provides access to SST (Sorted String Table) file information listed directly from the storage layer. This table shows raw file metadata from object storage, which may include files that are not yet reflected in the manifest or files that have been orphaned. + +:::tip NOTE +This table is not available on [GreptimeCloud](https://greptime.cloud/). 
+::: + +```sql +USE INFORMATION_SCHEMA; +DESC SSTS_STORAGE; +``` + +The output is as follows: + +```sql ++------------------+----------------------+-----+------+---------+---------------+ +| Column | Type | Key | Null | Default | Semantic Type | ++------------------+----------------------+-----+------+---------+---------------+ +| file_path | String | | NO | | FIELD | +| file_size | UInt64 | | YES | | FIELD | +| last_modified_ms | TimestampMillisecond | | YES | | FIELD | +| node_id | UInt64 | | YES | | FIELD | ++------------------+----------------------+-----+------+---------+---------------+ +``` + +Fields in the `SSTS_STORAGE` table are described as follows: + +- `file_path`: The full path to the file in object storage. +- `file_size`: The size of the file in bytes (if available from storage). +- `last_modified_ms`: The last modified time in milliseconds (if available from storage). +- `node_id`: The ID of the datanode where the file is located. + +## Use Cases + +The `SSTS_STORAGE` table is useful for: + +- **Storage verification**: Compare files in storage against the manifest to detect orphaned files or inconsistencies. +- **Storage debugging**: Identify files that exist in storage but may not be properly tracked in the manifest. +- **Cleanup operations**: Find and remove orphaned SST files that are no longer referenced. +- **Storage auditing**: Get a complete view of all SST files in the storage layer. 
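The storage-verification use case above can also be checked in the opposite direction: manifest entries whose SST file does not appear in the storage listing. The query below is a minimal sketch that uses only columns documented for the two tables. It assumes that `file_path` is formatted identically in `SSTS_MANIFEST` and `SSTS_STORAGE`, and that both tables are read from the same set of datanodes; neither assumption is confirmed here.

```sql
-- Sketch: manifest entries with no matching file in the storage listing.
-- Assumption: file_path values are directly comparable across both tables.
SELECT m.file_path, m.region_id, m.file_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST m
LEFT JOIN INFORMATION_SCHEMA.SSTS_STORAGE s ON m.file_path = s.file_path
WHERE s.file_path IS NULL;
```

The two tables are read at slightly different moments, so a non-empty result is a starting point for investigation rather than proof that a file is missing.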
+ +## Examples + +Query all SST files in storage: + +```sql +SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE; +``` + +Find files in storage that are not in the manifest (potentially orphaned files): + +```sql +SELECT s.file_path, s.file_size, s.last_modified_ms +FROM INFORMATION_SCHEMA.SSTS_STORAGE s +LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m ON s.file_path = m.file_path +WHERE m.file_path IS NULL; +``` + +Find the largest SST files in storage: + +```sql +SELECT file_path, file_size +FROM INFORMATION_SCHEMA.SSTS_STORAGE +WHERE file_size IS NOT NULL +ORDER BY file_size DESC +LIMIT 10; +``` + +Calculate total storage usage by SST files: + +```sql +SELECT COUNT(*) as file_count, SUM(file_size) as total_size +FROM INFORMATION_SCHEMA.SSTS_STORAGE +WHERE file_size IS NOT NULL; +``` + + +Output example: + +```sql +mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE LIMIT 1\G +*************************** 1. row *************************** + file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet + file_size: 1234 +last_modified_ms: 2025-01-01 00:00:00.000 + node_id: 0 +1 row in set (0.02 sec) +``` + +## Differences from SSTS_MANIFEST + +| Aspect | SSTS_MANIFEST | SSTS_STORAGE | +|--------|---------------|--------------| +| **Data Source** | Manifest metadata | Storage layer directly | +| **Information** | Detailed SST metadata (rows, time ranges, etc.) 
| Basic file metadata only | +| **File Coverage** | Only files tracked in manifest | All files in storage | +| **Use Case** | Query SST metadata for analysis | Verify storage, find orphaned files | +| **Performance** | Fast (reads from manifest) | Slower (scans storage) | diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/overview.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/overview.md index 5e2cbae3d..1adf0395b 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/overview.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/overview.md @@ -62,3 +62,5 @@ description: INFORMATION_SCHEMA 提供对系统元数据的访问,例如数据 | [`PROCEDURE_INFO`](./procedure-info.md) | 提供 Procedure 相关信息。| | [`PROCESS_LIST`](./process-list.md) | 提供集群内正在执行的查询信息| | [`SSTS_INDEX_META`](./ssts-index-meta.md) | 提供 SST 索引元数据,包括倒排索引、全文索引和布隆过滤器。| +| [`SSTS_MANIFEST`](./ssts-manifest.md) | 提供从 manifest 获取的 SST 文件信息,包括文件路径、大小、时间范围和行数。| +| [`SSTS_STORAGE`](./ssts-storage.md) | 提供从存储层获取的 SST 文件信息,用于验证和调试。| diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-manifest.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-manifest.md new file mode 100644 index 000000000..53dadebad --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-manifest.md @@ -0,0 +1,139 @@ +--- +keywords: [SST manifest, SST 文件, region 文件, 文件元数据, 表数据文件] +description: 提供从 manifest 中获取的 SST(排序字符串表)文件信息,包括文件路径、大小、时间范围和行数。 +--- + +# SSTS_MANIFEST + +`SSTS_MANIFEST` 表提供从清单中收集的 SST(排序字符串表)文件信息。此表显示每个 SST 文件的详细信息,包括文件路径、大小、级别、时间范围和行数。 + +:::tip 注意 +此表在 [GreptimeCloud](https://greptime.cloud/) 上不可用。 +::: + +```sql +USE INFORMATION_SCHEMA; +DESC SSTS_MANIFEST; +``` + +输出如下: + +```sql ++------------------+---------------------+-----+------+---------+---------------+ +| 
Column | Type | Key | Null | Default | Semantic Type | ++------------------+---------------------+-----+------+---------+---------------+ +| table_dir | String | | NO | | FIELD | +| region_id | UInt64 | | NO | | FIELD | +| table_id | UInt32 | | NO | | FIELD | +| region_number | UInt32 | | NO | | FIELD | +| region_group | UInt8 | | NO | | FIELD | +| region_sequence | UInt32 | | NO | | FIELD | +| file_id | String | | NO | | FIELD | +| level | UInt8 | | NO | | FIELD | +| file_path | String | | NO | | FIELD | +| file_size | UInt64 | | NO | | FIELD | +| index_file_path | String | | YES | | FIELD | +| index_file_size | UInt64 | | YES | | FIELD | +| num_rows | UInt64 | | NO | | FIELD | +| num_row_groups | UInt64 | | NO | | FIELD | +| min_ts | TimestampNanosecond | | YES | | FIELD | +| max_ts | TimestampNanosecond | | YES | | FIELD | +| sequence | UInt64 | | YES | | FIELD | +| origin_region_id | UInt64 | | NO | | FIELD | +| node_id | UInt64 | | YES | | FIELD | +| visible | Boolean | | NO | | FIELD | ++------------------+---------------------+-----+------+---------+---------------+ +``` + +`SSTS_MANIFEST` 表中的字段描述如下: + +- `table_dir`:表的目录路径。 +- `region_id`:引用该文件的 Region ID。 +- `table_id`:表的 ID。 +- `region_number`:表中的 Region 编号。 +- `region_group`:Region 的组标识符。 +- `region_sequence`:Region 的序列号。 +- `file_id`:SST 文件的唯一标识符(UUID)。 +- `level`:LSM 树中的 SST 级别(0 表示未压缩,1 表示已压缩)。 +- `file_path`:对象存储中 SST 文件的完整路径。 +- `file_size`:SST 文件的大小(字节)。 +- `index_file_path`:对象存储中索引文件的完整路径(如果存在)。 +- `index_file_size`:索引文件的大小(字节,如果存在)。 +- `num_rows`:SST 文件中的行数。 +- `num_row_groups`:SST 文件中的行组数。 +- `min_ts`:SST 文件中的最小时间戳。 +- `max_ts`:SST 文件中的最大时间戳。 +- `sequence`:与此文件关联的序列号。 +- `origin_region_id`:创建该文件的 Region ID。 +- `node_id`:文件所在的数据节点 ID。 +- `visible`:该文件在当前版本中是否可见。 + +## 示例 + +查询清单中的所有 SST 文件: + +```sql +SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST; +``` + +通过与 `TABLES` 表连接查询特定表的 SST 文件: + +```sql +SELECT s.* +FROM INFORMATION_SCHEMA.SSTS_MANIFEST s +JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id 
= t.table_id +WHERE t.table_name = 'my_table'; +``` + +仅查询已压缩的 SST 文件(级别 1): + +```sql +SELECT file_path, file_size, num_rows, level +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +WHERE level = 1; +``` + +查询 SST 文件及其时间范围: + +```sql +SELECT table_id, file_path, num_rows, min_ts, max_ts +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +ORDER BY table_id, min_ts; +``` + +计算每个表的 SST 文件总大小: + +```sql +SELECT table_id, COUNT(*) as sst_count, SUM(file_size) as total_size +FROM INFORMATION_SCHEMA.SSTS_MANIFEST +GROUP BY table_id; +``` + + +输出样例: + +```sql +mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G; +*************************** 1. row *************************** + table_dir: data/greptime/public/1024/ + region_id: 4398046511104 + table_id: 1024 + region_number: 0 + region_group: 0 + region_sequence: 0 + file_id: 01234567-89ab-cdef-0123-456789abcdef + level: 0 + file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet + file_size: 1234 + index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin + index_file_size: 256 + num_rows: 100 + num_row_groups: 1 + min_ts: 2025-01-01 00:00:00.000000000 + max_ts: 2025-01-01 00:01:00.000000000 + sequence: 1 +origin_region_id: 4398046511104 + node_id: 0 + visible: true +1 row in set (0.02 sec) +``` diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-storage.md b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-storage.md new file mode 100644 index 000000000..2fe29286f --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/reference/sql/information-schema/ssts-storage.md @@ -0,0 +1,104 @@ +--- +keywords: [SST storage, SST 文件, 文件列表, 存储层, 对象存储] +description: 提供从存储层直接获取的 SST(排序字符串表)文件信息,包括文件路径、大小和最后修改时间戳。 +--- + +# SSTS_STORAGE + +`SSTS_STORAGE` 表提供直接从存储层列出的 SST(排序字符串表)文件信息。此表显示来自对象存储的原始文件元数据,可能包括尚未反映在清单中的文件或已孤立的文件。 + +:::tip 注意 +此表在 
[GreptimeCloud](https://greptime.cloud/) 上不可用。 +::: + +```sql +USE INFORMATION_SCHEMA; +DESC SSTS_STORAGE; +``` + +输出如下: + +```sql ++------------------+----------------------+-----+------+---------+---------------+ +| Column | Type | Key | Null | Default | Semantic Type | ++------------------+----------------------+-----+------+---------+---------------+ +| file_path | String | | NO | | FIELD | +| file_size | UInt64 | | YES | | FIELD | +| last_modified_ms | TimestampMillisecond | | YES | | FIELD | +| node_id | UInt64 | | YES | | FIELD | ++------------------+----------------------+-----+------+---------+---------------+ +``` + +`SSTS_STORAGE` 表中的字段描述如下: + +- `file_path`:对象存储中文件的完整路径。 +- `file_size`:文件的大小(字节,如果存储中可用)。 +- `last_modified_ms`:最后修改时间(毫秒,如果存储中可用)。 +- `node_id`:文件所在的数据节点 ID。 + +## 使用场景 + +`SSTS_STORAGE` 表适用于: + +- **存储验证**:将存储中的文件与清单进行比较,以检测孤立文件或不一致性。 +- **存储调试**:识别存在于存储中但可能未在清单中正确跟踪的文件。 +- **清理操作**:查找并删除不再被引用的孤立 SST 文件。 +- **存储审计**:获取存储层中所有 SST 文件的完整视图。 + +## 示例 + +查询存储中的所有 SST 文件: + +```sql +SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE; +``` + +查找存储中但不在清单中的文件(潜在的孤立文件): + +```sql +SELECT s.file_path, s.file_size, s.last_modified_ms +FROM INFORMATION_SCHEMA.SSTS_STORAGE s +LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m ON s.file_path = m.file_path +WHERE m.file_path IS NULL; +``` + +查找存储中最大的 SST 文件: + +```sql +SELECT file_path, file_size +FROM INFORMATION_SCHEMA.SSTS_STORAGE +WHERE file_size IS NOT NULL +ORDER BY file_size DESC +LIMIT 10; +``` + +计算 SST 文件的总存储使用量: + +```sql +SELECT COUNT(*) as file_count, SUM(file_size) as total_size +FROM INFORMATION_SCHEMA.SSTS_STORAGE +WHERE file_size IS NOT NULL; +``` + + +输出样例: + +```sql +mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE LIMIT 1\G; +*************************** 1. 
row *************************** + file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet + file_size: 1234 +last_modified_ms: 2025-01-01 00:00:00.000 + node_id: 0 +1 row in set (0.02 sec) +``` + +## 与 SSTS_MANIFEST 的区别 + +| 方面 | SSTS_MANIFEST | SSTS_STORAGE | +|------|---------------|--------------| +| **数据源** | 清单元数据 | 直接从存储层 | +| **信息** | 详细的 SST 元数据(行数、时间范围等) | 仅基本文件元数据 | +| **文件覆盖** | 仅清单中跟踪的文件 | 存储中的所有文件 | +| **使用场景** | 查询 SST 元数据进行分析 | 验证存储、查找孤立文件 | +| **性能** | 快速(从清单读取) | 较慢(扫描存储) | diff --git a/sidebars.ts b/sidebars.ts index 4c84533e1..2474554d0 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -720,6 +720,8 @@ const sidebars: SidebarsConfig = { 'reference/sql/information-schema/cluster-info', 'reference/sql/information-schema/process-list', 'reference/sql/information-schema/ssts-index-meta', + 'reference/sql/information-schema/ssts-manifest', + 'reference/sql/information-schema/ssts-storage', ], }, ],