Merged
2 changes: 2 additions & 0 deletions docs/reference/sql/information-schema/overview.md
Original file line number Diff line number Diff line change
@@ -64,3 +64,5 @@ There is still lots of work to do for `INFORMATION_SCHEMA`. The tracking [issue]
| [`PROCEDURE_INFO`](./procedure-info.md) | Procedure information.|
| [`PROCESS_LIST`](./process-list.md) | Running queries information.|
| [`SSTS_INDEX_META`](./ssts-index-meta.md) | Provides SST index metadata including inverted indexes, fulltext indexes, and bloom filters.|
| [`SSTS_MANIFEST`](./ssts-manifest.md) | Provides SST file information from the manifest including file paths, sizes, time ranges, and row counts.|
| [`SSTS_STORAGE`](./ssts-storage.md) | Provides SST file information from the storage layer for verification and debugging.|
139 changes: 139 additions & 0 deletions docs/reference/sql/information-schema/ssts-manifest.md
@@ -0,0 +1,139 @@
---
keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides access to SST (Sorted String Table) file information from the manifest, including file paths, sizes, time ranges, and row counts.
---

# SSTS_MANIFEST

The `SSTS_MANIFEST` table provides access to SST (Sorted String Table) file information collected from the manifest. This table surfaces detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir | String | | NO | | FIELD |
| region_id | UInt64 | | NO | | FIELD |
| table_id | UInt32 | | NO | | FIELD |
| region_number | UInt32 | | NO | | FIELD |
| region_group | UInt8 | | NO | | FIELD |
| region_sequence | UInt32 | | NO | | FIELD |
| file_id | String | | NO | | FIELD |
| level | UInt8 | | NO | | FIELD |
| file_path | String | | NO | | FIELD |
| file_size | UInt64 | | NO | | FIELD |
| index_file_path | String | | YES | | FIELD |
| index_file_size | UInt64 | | YES | | FIELD |
| num_rows | UInt64 | | NO | | FIELD |
| num_row_groups | UInt64 | | NO | | FIELD |
| min_ts | TimestampNanosecond | | YES | | FIELD |
| max_ts | TimestampNanosecond | | YES | | FIELD |
| sequence | UInt64 | | YES | | FIELD |
| origin_region_id | UInt64 | | NO | | FIELD |
| node_id | UInt64 | | YES | | FIELD |
| visible | Boolean | | NO | | FIELD |
+------------------+---------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the region that refers to the file.
- `table_id`: The ID of the table.
- `region_number`: The region number within the table.
- `region_group`: The group identifier for the region.
- `region_sequence`: The sequence number of the region.
- `file_id`: The unique identifier of the SST file (UUID).
- `level`: The SST level in the LSM tree (0 for uncompacted, 1 for compacted).
- `file_path`: The full path to the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path to the index file in object storage (if one exists).
- `index_file_size`: The size of the index file in bytes (if one exists).
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with this file.
- `origin_region_id`: The ID of the region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether this file is visible in the current version.
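
The identifier columns appear to be related: in the sample row shown in this document, `region_id` is `4398046511104`, which equals `1024 * 2^32`, while `table_id` is `1024` and `region_number` is `0`. The following is a minimal sketch that cross-checks this, assuming that `region_id` packs `table_id` into its upper 32 bits and `region_number` into its lower 32 bits, and that the query engine accepts the `>>` and `&` operators on this column. The `derived_*` aliases are illustrative names only.

```sql
-- Assumption: region_id = (table_id << 32) | region_number.
-- This is inferred from the sample row, not stated by the table definition.
SELECT
  region_id,
  table_id,
  region_number,
  region_id >> 32 AS derived_table_id,               -- upper 32 bits
  region_id & 4294967295 AS derived_region_number    -- lower 32 bits (2^32 - 1)
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
LIMIT 10;
```

If the assumption holds, `derived_table_id` matches `table_id` and `derived_region_number` matches `region_number` on every row.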

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query SST files for a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```

Query SST files with their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```
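
Because `min_ts` and `max_ts` bound the data in each file, they can also be used to find the files that overlap a given time window. The sketch below assumes the string literals are coerced to timestamps by the query engine; the window boundaries are placeholder values.

```sql
-- A file overlaps the window [start, end] when it starts before the
-- window ends and ends after the window starts.
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE min_ts <= '2025-01-01 00:01:00'   -- window end (placeholder)
  AND max_ts >= '2025-01-01 00:00:00'   -- window start (placeholder)
ORDER BY table_id, min_ts;
```

Rows where `min_ts` or `max_ts` is `NULL` are excluded by these comparisons.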

Calculate total SST file size per table:

```sql
SELECT table_id, COUNT(*) as sst_count, SUM(file_size) as total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```
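
The per-table totals are easier to read with table names and a breakdown by level. The following sketch reuses the join on `table_id` from the earlier example and groups by `level`, which shows how much data sits in uncompacted (level 0) versus compacted (level 1) files. The output aliases are illustrative.

```sql
-- Per-table, per-level SST statistics, largest tables first.
SELECT
  t.table_schema,
  t.table_name,
  s.level,
  COUNT(*) AS sst_count,
  SUM(s.file_size) AS total_size,
  SUM(s.num_rows) AS total_rows
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
GROUP BY t.table_schema, t.table_name, s.level
ORDER BY total_size DESC;
```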


Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```
104 changes: 104 additions & 0 deletions docs/reference/sql/information-schema/ssts-storage.md
@@ -0,0 +1,104 @@
---
keywords: [SST storage, SST files, file listing, storage layer, object storage]
description: Provides access to SST (Sorted String Table) file information from the storage layer, including file paths, sizes, and last modified timestamps.
---

# SSTS_STORAGE

The `SSTS_STORAGE` table provides access to SST (Sorted String Table) file information listed directly from the storage layer. This table shows raw file metadata from object storage, which may include files that are not yet reflected in the manifest or files that have been orphaned.

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_STORAGE;
```

The output is as follows:

```sql
+------------------+----------------------+-----+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+------------------+----------------------+-----+------+---------+---------------+
| file_path | String | | NO | | FIELD |
| file_size | UInt64 | | YES | | FIELD |
| last_modified_ms | TimestampMillisecond | | YES | | FIELD |
| node_id | UInt64 | | YES | | FIELD |
+------------------+----------------------+-----+------+---------+---------------+
```

Fields in the `SSTS_STORAGE` table are described as follows:

- `file_path`: The full path to the file in object storage.
- `file_size`: The size of the file in bytes (if available from storage).
- `last_modified_ms`: The time the file was last modified, with millisecond precision (if available from storage).
- `node_id`: The ID of the datanode where the file is located.

## Use Cases

The `SSTS_STORAGE` table is useful for:

- **Storage verification**: Compare files in storage against the manifest to detect orphaned files or inconsistencies.
- **Storage debugging**: Identify files that exist in storage but may not be properly tracked in the manifest.
- **Cleanup operations**: Find orphaned SST files that are no longer referenced so that they can be removed.
- **Storage auditing**: Get a complete view of all SST files in the storage layer.

## Examples

Query all SST files in storage:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE;
```

Find files in storage that are not in the manifest (potential orphaned files):

```sql
SELECT s.file_path, s.file_size, s.last_modified_ms
FROM INFORMATION_SCHEMA.SSTS_STORAGE s
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST m ON s.file_path = m.file_path
WHERE m.file_path IS NULL;
```
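
The query above matches only on `SSTS_MANIFEST.file_path`. If the storage listing also contains index files, which is an assumption here and not something this page confirms, those files would be reported as orphaned even though the manifest references them through `index_file_path`. A hedged variant that checks both columns could look like this:

```sql
-- Assumption: index files may also appear in SSTS_STORAGE.
-- A storage file is reported only if no manifest row references it
-- as a data file (d) or as an index file (i).
SELECT s.file_path, s.file_size, s.last_modified_ms
FROM INFORMATION_SCHEMA.SSTS_STORAGE s
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST d ON s.file_path = d.file_path
LEFT JOIN INFORMATION_SCHEMA.SSTS_MANIFEST i ON s.file_path = i.index_file_path
WHERE d.file_path IS NULL
  AND i.index_file_path IS NULL;
```

Since storage may include files that are not yet reflected in the manifest, a file returned by this query is a candidate for investigation, not proof that the file is safe to delete.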

Find the largest SST files in storage:

```sql
SELECT file_path, file_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL
ORDER BY file_size DESC
LIMIT 10;
```

Calculate total storage usage by SST files:

```sql
SELECT COUNT(*) as file_count, SUM(file_size) as total_size
FROM INFORMATION_SCHEMA.SSTS_STORAGE
WHERE file_size IS NOT NULL;
```
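
Storage usage can also be split by file type. The sketch below classifies files by extension, assuming data files end in `.parquet` and index files end in `.puffin` as in the `SSTS_MANIFEST` sample output; the `file_kind` labels are illustrative names.

```sql
-- Classify files by extension in a subquery, then aggregate per class.
SELECT file_kind, COUNT(*) AS file_count, SUM(file_size) AS total_size
FROM (
  SELECT
    CASE
      WHEN file_path LIKE '%.parquet' THEN 'data'
      WHEN file_path LIKE '%.puffin' THEN 'index'
      ELSE 'other'
    END AS file_kind,
    file_size
  FROM INFORMATION_SCHEMA.SSTS_STORAGE
) AS classified
GROUP BY file_kind;
```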


Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_STORAGE LIMIT 1\G
*************************** 1. row ***************************
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
last_modified_ms: 2025-01-01 00:00:00.000
node_id: 0
1 row in set (0.02 sec)
```

## Differences from SSTS_MANIFEST

| Aspect | SSTS_MANIFEST | SSTS_STORAGE |
|--------|---------------|--------------|
| **Data Source** | Manifest metadata | Storage layer directly |
| **Information** | Detailed SST metadata (rows, time ranges, etc.) | Basic file metadata only |
| **File Coverage** | Only files tracked in manifest | All files in storage |
| **Use Case** | Query SST metadata for analysis | Verify storage, find orphaned files |
| **Performance** | Fast (reads from manifest) | Slower (scans storage) |
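
The difference in file coverage can be checked in the opposite direction as well. As a minimal sketch, the following query lists files that the manifest tracks but that the storage listing does not return, which would point to missing data files:

```sql
-- Files referenced by the manifest that are absent from the storage listing.
SELECT m.table_id, m.region_id, m.file_path, m.file_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST m
LEFT JOIN INFORMATION_SCHEMA.SSTS_STORAGE s ON m.file_path = s.file_path
WHERE s.file_path IS NULL;
```

An empty result means every file tracked in the manifest was found in storage at query time.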
Original file line number Diff line number Diff line change
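@@ -62,3 +62,5 @@ description: INFORMATION_SCHEMA provides access to system metadata, such as data
| [`PROCEDURE_INFO`](./procedure-info.md) | Provides procedure information.|
| [`PROCESS_LIST`](./process-list.md) | Provides information about queries running in the cluster.|
| [`SSTS_INDEX_META`](./ssts-index-meta.md) | Provides SST index metadata, including inverted indexes, fulltext indexes, and bloom filters.|
| [`SSTS_MANIFEST`](./ssts-manifest.md) | Provides SST file information from the manifest, including file paths, sizes, time ranges, and row counts.|
| [`SSTS_STORAGE`](./ssts-storage.md) | Provides SST file information from the storage layer for verification and debugging.|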
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
---
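keywords: [SST manifest, SST files, region files, file metadata, table data files]
description: Provides SST (Sorted String Table) file information from the manifest, including file paths, sizes, time ranges, and row counts.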
---

# SSTS_MANIFEST

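The `SSTS_MANIFEST` table provides SST (Sorted String Table) file information collected from the manifest. This table shows detailed information about each SST file, including file paths, sizes, levels, time ranges, and row counts.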

:::tip NOTE
This table is not available on [GreptimeCloud](https://greptime.cloud/).
:::

```sql
USE INFORMATION_SCHEMA;
DESC SSTS_MANIFEST;
```

The output is as follows:

```sql
+------------------+---------------------+-----+------+---------+---------------+
| Column | Type | Key | Null | Default | Semantic Type |
+------------------+---------------------+-----+------+---------+---------------+
| table_dir | String | | NO | | FIELD |
| region_id | UInt64 | | NO | | FIELD |
| table_id | UInt32 | | NO | | FIELD |
| region_number | UInt32 | | NO | | FIELD |
| region_group | UInt8 | | NO | | FIELD |
| region_sequence | UInt32 | | NO | | FIELD |
| file_id | String | | NO | | FIELD |
| level | UInt8 | | NO | | FIELD |
| file_path | String | | NO | | FIELD |
| file_size | UInt64 | | NO | | FIELD |
| index_file_path | String | | YES | | FIELD |
| index_file_size | UInt64 | | YES | | FIELD |
| num_rows | UInt64 | | NO | | FIELD |
| num_row_groups | UInt64 | | NO | | FIELD |
| min_ts | TimestampNanosecond | | YES | | FIELD |
| max_ts | TimestampNanosecond | | YES | | FIELD |
| sequence | UInt64 | | YES | | FIELD |
| origin_region_id | UInt64 | | NO | | FIELD |
| node_id | UInt64 | | YES | | FIELD |
| visible | Boolean | | NO | | FIELD |
+------------------+---------------------+-----+------+---------+---------------+
```

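Fields in the `SSTS_MANIFEST` table are described as follows:

- `table_dir`: The directory path of the table.
- `region_id`: The ID of the region that refers to the file.
- `table_id`: The ID of the table.
- `region_number`: The region number within the table.
- `region_group`: The group identifier for the region.
- `region_sequence`: The sequence number of the region.
- `file_id`: The unique identifier of the SST file (UUID).
- `level`: The SST level in the LSM tree (0 for uncompacted, 1 for compacted).
- `file_path`: The full path to the SST file in object storage.
- `file_size`: The size of the SST file in bytes.
- `index_file_path`: The full path to the index file in object storage (if one exists).
- `index_file_size`: The size of the index file in bytes (if one exists).
- `num_rows`: The number of rows in the SST file.
- `num_row_groups`: The number of row groups in the SST file.
- `min_ts`: The minimum timestamp in the SST file.
- `max_ts`: The maximum timestamp in the SST file.
- `sequence`: The sequence number associated with this file.
- `origin_region_id`: The ID of the region that created the file.
- `node_id`: The ID of the datanode where the file is located.
- `visible`: Whether this file is visible in the current version.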

## Examples

Query all SST files in the manifest:

```sql
SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST;
```

Query SST files for a specific table by joining with the `TABLES` table:

```sql
SELECT s.*
FROM INFORMATION_SCHEMA.SSTS_MANIFEST s
JOIN INFORMATION_SCHEMA.TABLES t ON s.table_id = t.table_id
WHERE t.table_name = 'my_table';
```

Query only compacted SST files (level 1):

```sql
SELECT file_path, file_size, num_rows, level
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
WHERE level = 1;
```

Query SST files with their time ranges:

```sql
SELECT table_id, file_path, num_rows, min_ts, max_ts
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
ORDER BY table_id, min_ts;
```

Calculate total SST file size per table:

```sql
SELECT table_id, COUNT(*) as sst_count, SUM(file_size) as total_size
FROM INFORMATION_SCHEMA.SSTS_MANIFEST
GROUP BY table_id;
```


Output example:

```sql
mysql> SELECT * FROM INFORMATION_SCHEMA.SSTS_MANIFEST LIMIT 1\G
*************************** 1. row ***************************
table_dir: data/greptime/public/1024/
region_id: 4398046511104
table_id: 1024
region_number: 0
region_group: 0
region_sequence: 0
file_id: 01234567-89ab-cdef-0123-456789abcdef
level: 0
file_path: data/greptime/public/1024/4398046511104_0/01234567-89ab-cdef-0123-456789abcdef.parquet
file_size: 1234
index_file_path: data/greptime/public/1024/4398046511104_0/index/01234567-89ab-cdef-0123-456789abcdef.puffin
index_file_size: 256
num_rows: 100
num_row_groups: 1
min_ts: 2025-01-01 00:00:00.000000000
max_ts: 2025-01-01 00:01:00.000000000
sequence: 1
origin_region_id: 4398046511104
node_id: 0
visible: true
1 row in set (0.02 sec)
```