### Create the table {#create-the-table}
Based on the inferred schema, we cleaned up the data types and added a primary key.
Define the following table:
```sql
CREATE TABLE youtube
(
    -- ... column definitions ...
)
ENGINE = MergeTree
ORDER BY (uploader, upload_date)
```
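
To see where that starting point comes from, you can ask ClickHouse for the schema it infers directly from the raw files. A quick sketch (the S3 URL is a placeholder for the dataset's actual location):

```sql
-- Sketch: inspect the schema ClickHouse infers from the raw files before any cleanup.
-- Replace <s3-url-of-the-raw-files> with the dataset's S3 path.
DESCRIBE s3('<s3-url-of-the-raw-files>', 'JSONLines');
```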
### Insert data {#insert-data}
The following command streams the records from the S3 files into the `youtube` table.
:::important
This inserts a lot of data - 4.65 billion rows. If you do not want the entire dataset, simply add a `LIMIT` clause with the desired number of rows.
:::
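
In outline, the command is an `INSERT INTO ... SELECT` that reads the raw files from S3 (for example with the `s3` table function) and converts each value to the table's types. A rough sketch of that shape - the S3 path and the small column subset below are placeholders, not the real statement:

```sql
-- Sketch only: the real command selects every column of the youtube table.
-- <s3-url-of-the-raw-files> and the column subset shown are placeholders.
INSERT INTO youtube (id, uploader, upload_date_str, upload_date, view_count, like_count, dislike_count)
SELECT
    id,
    ifNull(uploader, '') AS uploader,                                             -- NULLs become empty strings
    upload_date AS upload_date_str,                                               -- keep the raw string as-is
    toDate(parseDateTimeBestEffortUSOrZero(upload_date::String)) AS upload_date,  -- falls back to 0 if parsing fails
    view_count,
    like_count,
    dislike_count
FROM s3('<s3-url-of-the-raw-files>', 'JSONLines');
```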

Some comments about our `INSERT` command:

- The `upload_date` column contains valid dates, but it also contains strings like "4 hours ago" - which is certainly not a valid date. We decided to store the original value in `upload_date_str` and attempt to parse it with `toDate(parseDateTimeBestEffortUSOrZero(upload_date::String))`. If the parsing fails we just get `0`
- We used `ifNull` to avoid getting `NULL` values in our table. If an incoming value is `NULL`, the `ifNull` function sets the value to an empty string (see the short example below)
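
You can see both behaviors in isolation with a throwaway query (the input values here are just examples):

```sql
-- Quick check of the two helper functions used in the INSERT:
SELECT
    toDate(parseDateTimeBestEffortUSOrZero('2019-10-29')) AS parsed_ok,   -- a real date parses normally
    toDate(parseDateTimeBestEffortUSOrZero('4 hours ago')) AS parsed_bad, -- unparseable input becomes the zero date (1970-01-01)
    ifNull(NULL, '') AS null_replaced;                                    -- NULL becomes an empty string
```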
### Count the number of rows {#count-row-numbers}
Open a new tab in the SQL Console of ClickHouse Cloud (or a new `clickhouse-client` window) and watch the count increase.
It will take a while to insert 4.56B rows, depending on your server resources. (Without any tweaking of settings, it takes about 4.5 hours.)
```sql
SELECT formatReadableQuantity(count())
FROM youtube

...
└─────────────────────────────────┘
```
### Explore the data {#explore-the-data}
Once the data is inserted, go ahead and count the number of dislikes of your favorite videos or channels. Let's see how many videos were uploaded by ClickHouse:
```sql
SELECT count()
FROM youtube
WHERE uploader = 'ClickHouse';
```

:::note
The query above runs so quickly because we chose `uploader` as the first column of the primary key - so it only had to process 237k rows.
:::
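
To see that for yourself, you can ask ClickHouse how it uses the primary key index for this query; a quick check along these lines (the output details vary by version):

```sql
-- Show which primary-key ranges the query reads instead of scanning the whole table.
EXPLAIN indexes = 1
SELECT count()
FROM youtube
WHERE uploader = 'ClickHouse';
```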
Let's look at likes and dislikes of ClickHouse videos:
```sql
SELECT
    ...
```

The response looks like:

```
...
84 rows in set. Elapsed: 0.013 sec. Processed 155.65 thousand rows, 16.94 MB (11.96 million rows/s., 1.30 GB/s.)
```
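
A query along these lines yields that kind of per-video breakdown (the exact columns and ordering used above may differ):

```sql
-- Illustrative per-video breakdown of likes and dislikes for ClickHouse uploads.
SELECT
    title,
    like_count,
    dislike_count
FROM youtube
WHERE uploader = 'ClickHouse'
ORDER BY dislike_count DESC;
```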
Here is a search for videos with **ClickHouse** in the `title` or `description` fields:
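
One way to write such a search is with case-insensitive pattern matching on both columns; a sketch (the selected columns and the ordering are illustrative):

```sql
-- Illustrative search: match "ClickHouse" in either field, most-liked first.
SELECT
    title,
    like_count,
    dislike_count
FROM youtube
WHERE (title ILIKE '%ClickHouse%') OR (description ILIKE '%ClickHouse%')
ORDER BY like_count DESC;
```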
### If someone disables comments does it lower the chance someone will actually click like or dislike? {#if-someone-disables-comments-does-it-lower-the-chance-someone-will-actually-click-like-or-dislike}