Commit 633bbfe

Merge pull request #3345 from ClickHouse/thomoc-patch-2
update s3 guide with describe
2 parents e8d9a23 + 4f6fc1b commit 633bbfe

1 file changed: +60 -1 lines changed

docs/integrations/data-ingestion/s3/index.md

Lines changed: 60 additions & 1 deletion
@@ -29,8 +29,67 @@ Using wildcards in the path expression allow multiple files to be referenced and

 ### Preparation {#preparation}

-To interact with our S3-based dataset, we prepare a standard `MergeTree` table as our destination. The statement below creates a table named `trips` in the default database:
+Prior to creating the table in ClickHouse, you may want to first take a closer look at the data in the S3 bucket. You can do this directly from ClickHouse using the `DESCRIBE` statement:

+```sql
+DESCRIBE TABLE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames');
+```
+
+The output of the `DESCRIBE TABLE` statement should show you how ClickHouse would automatically infer this data, as viewed in the S3 bucket. Notice that it also automatically recognizes and decompresses the gzip compression format:
+
+```sql
+DESCRIBE TABLE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames') SETTINGS describe_compact_output=1
+
+┌─name──────────────────┬─type───────────────┐
+│ trip_id               │ Nullable(Int64)    │
+│ vendor_id             │ Nullable(Int64)    │
+│ pickup_date           │ Nullable(Date)     │
+│ pickup_datetime       │ Nullable(DateTime) │
+│ dropoff_date          │ Nullable(Date)     │
+│ dropoff_datetime      │ Nullable(DateTime) │
+│ store_and_fwd_flag    │ Nullable(Int64)    │
+│ rate_code_id          │ Nullable(Int64)    │
+│ pickup_longitude      │ Nullable(Float64)  │
+│ pickup_latitude       │ Nullable(Float64)  │
+│ dropoff_longitude     │ Nullable(Float64)  │
+│ dropoff_latitude      │ Nullable(Float64)  │
+│ passenger_count       │ Nullable(Int64)    │
+│ trip_distance         │ Nullable(String)   │
+│ fare_amount           │ Nullable(String)   │
+│ extra                 │ Nullable(String)   │
+│ mta_tax               │ Nullable(String)   │
+│ tip_amount            │ Nullable(String)   │
+│ tolls_amount          │ Nullable(Float64)  │
+│ ehail_fee             │ Nullable(Int64)    │
+│ improvement_surcharge │ Nullable(String)   │
+│ total_amount          │ Nullable(String)   │
+│ payment_type          │ Nullable(String)   │
+│ trip_type             │ Nullable(Int64)    │
+│ pickup                │ Nullable(String)   │
+│ dropoff               │ Nullable(String)   │
+│ cab_type              │ Nullable(String)   │
+│ pickup_nyct2010_gid   │ Nullable(Int64)    │
+│ pickup_ctlabel        │ Nullable(Float64)  │
+│ pickup_borocode       │ Nullable(Int64)    │
+│ pickup_ct2010         │ Nullable(String)   │
+│ pickup_boroct2010     │ Nullable(String)   │
+│ pickup_cdeligibil     │ Nullable(String)   │
+│ pickup_ntacode        │ Nullable(String)   │
+│ pickup_ntaname        │ Nullable(String)   │
+│ pickup_puma           │ Nullable(Int64)    │
+│ dropoff_nyct2010_gid  │ Nullable(Int64)    │
+│ dropoff_ctlabel       │ Nullable(Float64)  │
+│ dropoff_borocode      │ Nullable(Int64)    │
+│ dropoff_ct2010        │ Nullable(String)   │
+│ dropoff_boroct2010    │ Nullable(String)   │
+│ dropoff_cdeligibil    │ Nullable(String)   │
+│ dropoff_ntacode       │ Nullable(String)   │
+│ dropoff_ntaname       │ Nullable(String)   │
+│ dropoff_puma          │ Nullable(Int64)    │
+└───────────────────────┴────────────────────┘
+```
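
The inferred schema above is only ClickHouse's best guess, and it can be steered before committing to a table definition. A minimal sketch, assuming the schema inference settings `schema_inference_make_columns_nullable` and `schema_inference_hints` are available in your ClickHouse version; the two hinted columns and their types are illustrative assumptions, not the types chosen by the guide:

```sql
-- Sketch only: preview the schema with two inference settings applied.
-- The hinted columns and types are illustrative assumptions and are not
-- taken from the guide's CREATE TABLE statement.
DESCRIBE TABLE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
SETTINGS
    describe_compact_output = 1,
    -- ask inference not to wrap inferred types in Nullable()
    schema_inference_make_columns_nullable = 0,
    -- override the inferred type for specific columns
    schema_inference_hints = 'trip_distance Float64, fare_amount Float32';
```

This query reports the types only; whether the data actually parses under a hinted type is best confirmed by selecting a few rows with the same settings.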
+
+To interact with our S3-based dataset, we prepare a standard `MergeTree` table as our destination. The statement below creates a table named `trips` in the default database. Note that we have chosen to modify some of those data types as inferred above, particularly to not use the [`Nullable()`](https://clickhouse.com/docs/en/sql-reference/data-types/nullable) data type modifier, which could cause some unnecessary additional stored data and some additional performance overhead:

 ```sql
 CREATE TABLE trips

0 commit comments
