
Commit 11b23c2

Adding docs for post aggs (#33)
1 parent d27773c commit 11b23c2

5 files changed, +176 -16 lines changed

docs/about/contributing.md

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ Bullet is hosted under the [Bullet Github Organization](https://github.com/bulle
 ## Future plans
 
 Our issues are tracked through GitHub on the appropriate repo issues pages. You are welcome to take a look and contribute. Also, feel free to [contact us](contact.md) with any ideas/suggestions/PRs for features mentioned here or anything else you think about!
+
+You can get a list of all the [open issues here](https://github.com/issues?page=1&q=is%3Aopen+is%3Aissue+user%3Abullet-db).
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Replace me with the real documentation.

docs/index.md

Lines changed: 14 additions & 2 deletions
@@ -45,7 +45,7 @@ See [Quick Start](quick-start/spark.md) to set up Bullet locally using Spark Str
 To set up Bullet on a real data stream, you need:
 
 1. To setup the Bullet Backend on a stream processing framework. Currently, we support [Bullet on Storm](backend/storm-setup.md) and [Bullet on Spark](backend/spark-setup.md).
-    1. Plug in your source of data. See [Getting your data into Bullet](backend/ingestion.md) for details
+    1. Plug in your source of data. See [Data Ingestion](backend/ingestion.md) or the [DSL](backend/dsl.md) for details
     2. Consume your data stream
 2. The [Web Service](ws/setup.md) set up to convey queries and return results back from the backend
 3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) and a [REST PubSub](pubsub/rest.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend.
@@ -111,6 +111,17 @@ Currently we support ```GROUP``` aggregations with the following operations:
 | MAX | Returns the maximum of the non-null values in the provided field for all the elements in the group |
 | AVG | Computes the average of the non-null values in the provided field for all the elements in the group |
 
+If you ```GROUP``` with no operation, you are performing a ```DISTINCT``` on the field(s). If you ```GROUP``` with no field(s), you are performing the operation(s) across all your data.
+
+## Post Aggregations
+
+Post Aggregations let you perform some final operations on your results before they are returned to you. They are applied every time a result is returned to you (see below). The operations currently supported are:
+
+| Post Aggregation | Meaning |
+| ---------------- | ------- |
+| ORDER BY | Orders your result by your specified fields in ascending or descending order |
+| COMPUTATION | Evaluates an [expression](ws/api-json.md#expressions) (which can be nested) to do math with, or cast, fields in your result |
+
 ## Windows
 
 Windows in a Bullet query allow you to specify how often you'd like Bullet to return results.
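The rule added in this hunk (a ```GROUP``` with no operations is a ```DISTINCT``` on the grouped fields) can be illustrated with a small local sketch. It is plain JavaScript over in-memory records, not Bullet's implementation; the `groupDistinct` helper and the sample records are hypothetical.

```javascript
// Local sketch, not Bullet's implementation: a GROUP on some fields with no
// operations keeps one record per distinct combination of those fields.
// groupDistinct and the sample records are hypothetical.
function groupDistinct(records, fields) {
  const groups = new Map(); // a Map keeps first-seen order
  for (const record of records) {
    // Keep only the grouped fields; a missing field is treated as null.
    const values = fields.map((field) => record[field] ?? null);
    const key = JSON.stringify(values);
    if (!groups.has(key)) {
      groups.set(key, Object.fromEntries(fields.map((field, i) => [field, values[i]])));
    }
  }
  return [...groups.values()];
}

const records = [
  { browser: "firefox", os: "linux", latency: 12 },
  { browser: "chrome", os: "mac", latency: 30 },
  { browser: "firefox", os: "linux", latency: 7 },
];

// Logs the two distinct (browser, os) pairs; the latency field is dropped.
console.log(groupDistinct(records, ["browser", "os"]));
```

Keying groups on the serialized field values keeps the sketch short; any stable encoding of the grouped values would work the same way.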
@@ -162,10 +173,11 @@ Implementations of [Bullet on Storm](backend/storm-architecture.md) and [Bullet
 ## PubSub
 
 The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using with the API.
-We currently support three different PubSub implementations:
+We currently support four different PubSub implementations:
 
 * [Kafka](pubsub/kafka.md)
 * [REST](pubsub/rest.md)
+* [Pulsar](pubsub/pulsar.md)
 * [Storm DRPC](pubsub/storm-drpc.md) (only for non-windowed queries)
 
 You can also very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide.

docs/releases.md

Lines changed: 18 additions & 0 deletions
@@ -233,6 +233,24 @@ A PubSub implementation using Kafka as the backing PubSub. Can be used with any
 | 2017-09-27 | [**0.1.2**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.1.2) | Fixes a bug with config loading | |
 | 2017-09-22 | [**0.1.1**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.1.1) | First release using the PubSub interfaces | |
 
+## Bullet Pulsar
+
+A PubSub implementation using Pulsar as the backing PubSub. Can be used with any Bullet Backend.
+
+| | |
+| -------------------------- | --------------- |
+| **Repository** | [https://github.com/bullet-db/bullet-pulsar](https://github.com/bullet-db/bullet-pulsar) |
+| **Issues** | [https://github.com/bullet-db/bullet-pulsar/issues](https://github.com/bullet-db/bullet-pulsar/issues) |
+| **Last Tag** | [![Latest tag](https://img.shields.io/github/release/bullet-db/bullet-pulsar/all.svg)](https://github.com/bullet-db/bullet-pulsar/tags) |
+| **Latest Artifact** | [![Download](https://api.bintray.com/packages/yahoo/maven/bullet-pulsar/images/download.svg)](https://bintray.com/yahoo/maven/bullet-pulsar/_latestVersion) |
+| **Package Manager Setup** | [Setup for Maven, Gradle etc](https://bintray.com/bintray/jcenter?filterByPkgName=bullet-pulsar) |
+
+### Releases
+
+| Date | Release | Highlights | APIDocs |
+| ------------ | ---------------------------------------------------------------------------------------- | ---------- | ------- |
+| 2018-12-10 | [**0.1.0**](https://github.com/bullet-db/bullet-pulsar/releases/tag/bullet-pulsar-0.1.0) | First release using the PubSub interfaces | [JavaDocs](apidocs/bullet-pulsar/0.1.0/index.html) |
+
 ## Bullet BQL
 
 A library facilitating the conversion from Bullet BQL queries to Bullet JSON queries

docs/ws/api-json.md

Lines changed: 141 additions & 14 deletions
@@ -2,39 +2,48 @@
 
 This section gives a comprehensive overview of the Web Service API for launching Bullet JSON queries.
 
-The JSON API is the actual Query format that is expected by the backend. [The BQL API](api-bql.md) is a more
-user-friendly API which can also be used - the Web Service will automatically detect the BQL query and convert the
-query to this JSON format before submitting it to the backend.
+The JSON API is the actual query format that is expected by the backend. [The BQL API](api-bql.md) is a more user-friendly API which can also be used: the Web Service will automatically detect the BQL query and convert the query to this JSON format before submitting it to the backend. With the addition of Post Aggregations and Expressions,
+it is a lot easier to use BQL than to construct the JSON. The Bullet Web Service also provides [an API](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.4.2) to convert BQL to JSON if you so desire.
 
 * For info on how to use the UI, see the [UI Usage section](../ui/usage.md)
 * For examples of specific queries see the [Examples](examples.md) section
 
+## Constituents of a Bullet Query
+
 The main constituents of a Bullet JSON query are:
 
 * __filters__, which determine which records will be consumed by your query
 * __projection__, which determines which fields will be projected in the resulting output from Bullet
-* __aggregation__, which allows users to aggregate data and perform aggregation operations
+* __aggregation__, which allows you to aggregate data and perform aggregation operations
+* __postAggregations__, which allow you to perform post aggregations before the result is returned
 * __window__, which can be used to return incremental results on "windowed" data
 * __duration__, which determines the maximum duration of the query in milliseconds
 
-Fields inside maps can be accessed using the '.' notation in queries. For example,
-
-`myMap.key`
-
-will access the "key" field inside the "myMap" map. There is no support for accessing fields inside Lists yet. Only the entire object can be operated on for now.
 
 The main constituents of a Bullet query listed above create the top level fields of the Bullet query:
 ```javascript
 {
     "filters": [{}, {}, ...],
     "projection": {},
-    "aggregation": {}.
+    "aggregation": {},
+    "postAggregations": [{}, {}, ...],
     "window": {},
     "duration": 20000
 }
 ```
 
-We will describe how to specify each of these top-level fields below:
+### Accessing Complex Fields
+
+Fields inside maps and lists can be accessed using the '.' notation in queries.
+
+| Complex Field Type | Example |
+| ------------------------ | ---------------------- |
+| Map of Primitive | `myMap.key` |
+| Map of Map of Primitive | `myMap.myInnerMap.key` |
+| List of Map/Primitive | `myList.0` |
+| List of Map of Primitive | `myListOfMaps.4.key` |
+
+We will now describe how to specify each of these top-level fields below:
 
 ## Filters
 
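A minimal sketch of how the '.' paths in the complex-field table added above could resolve against a record, assuming numeric segments index into lists and all other segments look up map keys. `resolveField` and the sample record are hypothetical and not part of Bullet; Bullet's own handling of edge cases (such as an out-of-range index) may differ.

```javascript
// Hypothetical helper illustrating the '.' notation; it is not part of Bullet.
// Numeric segments index into arrays (lists); other segments look up map keys.
function resolveField(record, path) {
  let current = record;
  for (const segment of path.split(".")) {
    if (current === null || current === undefined) {
      return undefined;
    }
    if (Array.isArray(current)) {
      // List access such as myList.0: the segment must be a non-negative integer.
      if (!/^\d+$/.test(segment)) {
        return undefined;
      }
      current = current[Number(segment)];
    } else if (typeof current === "object") {
      current = current[segment];
    } else {
      // A primitive has no fields to descend into.
      return undefined;
    }
  }
  return current;
}

const record = {
  myMap: { key: "a", myInnerMap: { key: "b" } },
  myList: [10, 20, 30],
  myListOfMaps: [{}, {}, {}, {}, { key: "c" }],
};

console.log(resolveField(record, "myMap.key"));            // a
console.log(resolveField(record, "myMap.myInnerMap.key")); // b
console.log(resolveField(record, "myList.0"));             // 10
console.log(resolveField(record, "myListOfMaps.4.key"));   // c
```

Returning `undefined` for any path that cannot be followed keeps lookups total, so a query over heterogeneous records does not fail on a record that lacks the nested field.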
@@ -102,13 +111,15 @@ The format for a Relational filter is:
     "operation": "== | != | <= | >= | < | > | RLIKE | SIZEIS | CONTAINSKEY | CONTAINSVALUE",
     "field": "record_field_name | map_field.subfield",
     "values": [
-        { "kind": "VALUE", "value": "foo"},
-        { "kind": "FIELD", "value": "another_record_field_name"}
+        { "kind": "VALUE", "type": "BOOLEAN | INTEGER | LONG | FLOAT | DOUBLE | STRING | MAP | LIST", "value": "foo"},
+        { "kind": "FIELD", "type": "BOOLEAN | INTEGER | LONG | FLOAT | DOUBLE | STRING | MAP | LIST", "value": "another_record_field_name"}
     ]
 }
 ```
 
-Note that you may specify ```VALUE``` or ```KIND``` currently for the ```kind``` key in the entries in the ```values``` field above, denoting the type of value this is. As a shortcut, you can also specify the following format for ```VALUE``` kind.
+Note that you may currently specify ```VALUE``` or ```FIELD``` for the ```kind``` key in the entries of the ```values``` field above, denoting whether the entry is a constant or another field in the record. The ```type``` field is *optional* and casts the provided value or field to the given type. If you do not provide a type, the value or field provided here will be *cast* to the type of the field (the LHS of the filter).
+
+As a shortcut, you can also specify the following format for ```VALUE``` kind.
 
 ```javascript
 {
@@ -303,6 +314,122 @@ The following attributes are supported for ```TOP K```:
 
 Note that the ```K``` in ```TOP K``` is specified using the ```size``` field in the ```aggregation``` object.
 
+## Post Aggregations
+
+Post Aggregations allow you to perform some final operations on the aggregated data before it is returned, as the name suggests. The ```postAggregations``` field is **optional**, and post aggregations are applied to the result of each window. For example, you can cast a field in your result to another type or perform some math with it.
+
+| Post Aggregation | Meaning |
+| ---------------- | ------- |
+| ORDER BY | Orders your result by your specified fields in ascending or descending order |
+| COMPUTATION | Evaluates an expression (which can be nested) to do math with, or cast, fields in your result |
+
+The ```"postAggregations"``` field takes a list of these Post Aggregation entries. The __order__ of the post aggregations in this list determines the order in which they are evaluated, so a post aggregation can refer to fields produced by earlier post aggregations in the list to chain them.
+
+### ORDER BY
+
+This orders result records by the given fields, in ascending order by default. To sort the records in descending order, use the ```DESC``` ```direction```. You can specify any field in the result records, including fields produced by previous post aggregations. The ordering is fully typed, so values are compared according to the types of the fields. If multiple fields are specified, ties are broken using the fields from left to right.
+
+```javascript
+{
+    "type": "ORDERBY",
+    "fields": ["A", "B"],
+    "direction": "DESC"
+}
+```
+
+### COMPUTATION
+
+This lets you perform arithmetic on fields in the result, using arbitrarily nested expressions. We currently support ```+```, ```-```, ```*``` and ```/``` as operations. The format for this is:
+
+```javascript
+{
+    "type": "COMPUTATION",
+    "expression": {}
+}
+```
+
+#### Expressions
+
+For future extensibility, the ```expression``` in the post aggregation is free form. Currently, we support binary arithmetic operations that can be nested, with nesting playing the role of parentheses. This forms a tree of expressions whose leaves resolve to atomic values such as fields or constants. So, there are two kinds of expressions.
+
+##### Binary Expressions
+
+```javascript
+{
+    "operation": "+",
+    "left": {},
+    "right": {},
+    "type": "INTEGER | FLOAT | BOOLEAN | DOUBLE | LONG | STRING"
+}
+```
+where ```left``` and ```right``` are themselves expressions and ```type``` is used to force cast the result to the given type.
+
+##### Unary Expressions
+
+```javascript
+{
+    "value": {
+        "kind": "FIELD | VALUE",
+        "value": "foo.bar",
+        "type": "INTEGER | FLOAT | BOOLEAN | DOUBLE | LONG | STRING"
+    }
+}
+```
+
+The ```value``` object here has the same definition as the one used for filtering above, and can be used to extract a field from the record, or to use a constant, as your chosen type.
+
+If casting __fails__ in any of the expressions, the expression is ignored.
+
+Putting all these together, here is a complete post aggregation example. It first computes a new field C as ```CAST(foo.bar, LONG) + CAST(CAST(1.2, DOUBLE) / CAST(1, INTEGER), FLOAT)```, or simply C = foo.bar + (1.2 / 1), for each record in the result window, and then orders the result by foo.baz first and then by the new field C.
+
+##### Post Aggregation Example
+
+```javascript
+{
+    "postAggregations": [
+        {
+            "type": "COMPUTATION",
+            "expression": {
+                "operation": "+",
+                "left": {
+                    "value": {
+                        "kind": "FIELD",
+                        "value": "foo.bar",
+                        "type": "LONG"
+                    }
+                },
+                "right": {
+                    "operation": "/",
+                    "left": {
+                        "value": {
+                            "kind": "VALUE",
+                            "value": "1.2",
+                            "type": "DOUBLE"
+                        }
+                    },
+                    "right": {
+                        "value": {
+                            "kind": "VALUE",
+                            "value": "1",
+                            "type": "INTEGER"
+                        }
+                    },
+                    "type": "FLOAT"
+                },
+                "newName": "C"
+            }
+        },
+        {
+            "type": "ORDERBY",
+            "fields": [
+                "foo.baz", "C"
+            ],
+            "direction": "ASC"
+        }
+    ]
+}
+```
+
 ## Window
 
 The "window" field is **optional** and allows you to instruct Bullet to return incremental results. For example you might want to return the COUNT of a field and return that count every 2 seconds.
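A local sketch of the post aggregation semantics described in this file, reusing the post aggregation example as a JavaScript object: it computes C for each record and then applies the ORDER BY. It is not Bullet's implementation; the helper names, null-on-failure casting, nulls-first ordering, and the sample window are assumptions made for illustration. Because JavaScript has a single number type, FLOAT and DOUBLE behave alike here.

```javascript
// Local sketch, not Bullet's implementation: applies post aggregations to the
// result records of one window. All names and sample data are hypothetical.

// Navigate nested maps with the '.' notation, e.g. "foo.bar".
function getField(record, path) {
  return path.split(".").reduce((cur, key) => (cur == null ? undefined : cur[key]), record);
}

// Cast to a Bullet type name, returning null when the cast fails. JavaScript has
// one number type, so FLOAT/DOUBLE are alike and INTEGER/LONG simply truncate.
function castTo(value, type) {
  if (type === undefined) return value; // no type: keep the value as-is
  if (value === null || value === undefined) return null;
  if (type === "STRING") return String(value);
  if (type === "BOOLEAN") return String(value).toLowerCase() === "true";
  const n = Number(value);
  if (String(value).trim() === "" || !Number.isFinite(n)) return null;
  if (type === "INTEGER" || type === "LONG") return Math.trunc(n);
  return type === "FLOAT" || type === "DOUBLE" ? n : null;
}

const OPERATIONS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// Leaves carry a "value" (a FIELD or a constant VALUE); inner nodes an "operation".
function evaluate(expression, record) {
  if (!expression) return null;
  if (expression.value !== undefined) {
    const { kind, value, type } = expression.value;
    return castTo(kind === "FIELD" ? getField(record, value) : value, type);
  }
  const left = evaluate(expression.left, record);
  const right = evaluate(expression.right, record);
  const op = OPERATIONS[expression.operation];
  if (!op || typeof left !== "number" || typeof right !== "number") return null;
  return castTo(op(left, right), expression.type);
}

// null/undefined sort first; numbers and strings compare with < and >.
function compare(a, b) {
  if (a == null || b == null) return (a == null ? 0 : 1) - (b == null ? 0 : 1);
  return a < b ? -1 : a > b ? 1 : 0;
}

function applyPostAggregations(records, postAggregations) {
  let result = records.map((r) => ({ ...r }));
  for (const postAgg of postAggregations) {
    if (postAgg.type === "COMPUTATION") {
      // The example above nests newName in the expression; accept either place.
      const name = postAgg.newName ?? postAgg.expression.newName;
      for (const record of result) {
        const value = evaluate(postAgg.expression, record);
        if (value !== null) record[name] = value; // failed cast: expression ignored
      }
    } else if (postAgg.type === "ORDERBY") {
      const sign = postAgg.direction === "DESC" ? -1 : 1;
      result = [...result].sort((x, y) => {
        for (const field of postAgg.fields) { // ties broken left to right
          const c = compare(getField(x, field), getField(y, field));
          if (c !== 0) return sign * c;
        }
        return 0;
      });
    }
  }
  return result;
}

// The post aggregation example above, as a JavaScript object.
const postAggregations = [
  {
    type: "COMPUTATION",
    expression: {
      operation: "+",
      left: { value: { kind: "FIELD", value: "foo.bar", type: "LONG" } },
      right: {
        operation: "/",
        left: { value: { kind: "VALUE", value: "1.2", type: "DOUBLE" } },
        right: { value: { kind: "VALUE", value: "1", type: "INTEGER" } },
        type: "FLOAT",
      },
      newName: "C",
    },
  },
  { type: "ORDERBY", fields: ["foo.baz", "C"], direction: "ASC" },
];

const windowResult = [
  { foo: { bar: "5", baz: "b" } },
  { foo: { bar: "3", baz: "a" } },
  { foo: { bar: "1", baz: "b" } },
];

for (const record of applyPostAggregations(windowResult, postAggregations)) {
  console.log(record.foo.baz, record.C); // a 4.2, then b 2.2, then b 6.2
}
```

Folding over the post aggregations in list order is what lets the ORDER BY refer to C, the field produced by the COMPUTATION before it.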
