| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing and other features need to be implemented | In Progress |
| Bullet on Spark | BE | Implement Bullet on Spark Streaming. Compared with SQL on Spark Streaming, which stores data in memory, Bullet will be lightweight | In Progress |
| Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. Ideally, without a database | Planning |
| In-Memory PubSub | PubSub | For users who don't want a PubSub like Kafka, we could add a REST-based in-memory PubSub layer that runs in the WS. The Backend would then communicate directly with the WS | Planning |
| LocalForage | UI | Migrate the UI to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) |
| Bullet on X | BE | With the PubSub feature, Bullet can be implemented on other Stream Processors like Flink, Kafka Streams, Samza etc. | Open |
| Bullet on Beam | BE | Bullet can be implemented on [Apache Beam](https://beam.apache.org) as an alternative to implementing it on various Stream Processors | Open |
| SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | In Progress |
| Packaging | UI, BE, WS | GitHub releases and building from source are the only two options for the UI. Docker images or the like for quick setup and to mix and match various pluggable components would be really useful | Open |
| Spring Boot Reactor | WS | Migrate the Web Service to use Spring Boot Reactor instead of servlet containers | Open |

---

**docs/backend/ingestion.md**
@@ -8,7 +8,13 @@ Bullet operates on a generic data container that it understands. In order to get
## Bullet Record
The Bullet backend processes data that must be stored in a [Bullet Record](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/BulletRecord.java), an abstract Java class whose implementations can be optimized for different Backends or use-cases.

There are currently two concrete implementations of BulletRecord (a short usage sketch follows the list):
1. [SimpleBulletRecord](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/SimpleBulletRecord.java), which is based on a simple Java HashMap
2. [AvroBulletRecord](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/AvroBulletRecord.java), which uses [Avro](http://avro.apache.org) for serialization
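
As promised above, here is a minimal sketch of putting typed data into a record, assuming the typed setters on `BulletRecord`; the field names are made up for illustration, so check the linked classes for the exact API:

```java
import com.yahoo.bullet.record.BulletRecord;
import com.yahoo.bullet.record.SimpleBulletRecord;

public class RecordSketch {
    public static void main(String[] args) {
        // A HashMap-backed record; an AvroBulletRecord would be created the same way.
        BulletRecord record = new SimpleBulletRecord();

        // Typed setters; the field names here are hypothetical.
        record.setString("demographics.country", "US");
        record.setLong("timestamp", System.currentTimeMillis());
        record.setBoolean("is_logged_in", true);

        // Reads come back as Objects; a missing field is null.
        System.out.println(record.get("demographics.country"));
    }
}
```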
## Types
@@ -17,9 +23,11 @@ Data placed into a Bullet Record is strongly typed. We support these types curre
### Primitives
1. Boolean
2. Integer
3. Long
4. Float
5. Double
6. String
### Complex
@@ -31,7 +39,7 @@ With these types, it is unlikely you would have data that cannot be represented
## Installing the Record directly
Generally, you depend on the Bullet Core artifact for your Stream Processor when you plug in the piece that gets your data into the Stream processor. The Bullet Core artifact already brings in the Bullet Record containers as well. See the usage for the [Storm](storm-setup.md#installation) for an example.

However, if you need it, the artifacts are available through JCenter, so you can depend on them directly in code. You will need to add the repository. Below is a Maven example:
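
A minimal sketch of such a Maven block, where the artifact coordinates and version are illustrative assumptions to verify against the latest bullet-record release:

```xml
<repositories>
    <repository>
        <id>jcenter</id>
        <url>http://jcenter.bintray.com</url>
    </repository>
</repositories>

<dependencies>
    <!-- Illustrative coordinates; check the latest bullet-record release. -->
    <dependency>
        <groupId>com.yahoo.bullet</groupId>
        <artifactId>bullet-record</artifactId>
        <version>0.2.0</version>
    </dependency>
</dependencies>
```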

---

**docs/index.md**
@@ -20,7 +20,7 @@
* Big-data scale-tested - used in production at Yahoo and tested running 500+ queries simultaneously on up to 2,000,000 rps
# How is Bullet useful
How Bullet is used is largely determined by the data source it consumes. Depending on what kind of data you put Bullet on, the types of queries you run on it and your use-cases will change. As a look-forward query system with no persistence, you will not be able to repeat your queries on the same data: the next time you run your query, it will operate on whatever data arrives after that submission. If this usage pattern is what you need and you are looking for a light-weight system that can tap into your streaming data, then Bullet is for you!
@@ -40,15 +40,15 @@ This instance of Bullet also powers other use-cases such as letting analysts val
See [Quick Start](quick-start/bullet-on-spark.md) to set up Bullet locally using Spark Streaming. You will generate some synthetic streaming data that you can then query with Bullet.

# Set up Bullet on your streaming data
To set up Bullet on a real data stream, you need:
1. To set up the Bullet Backend on a stream processing framework. Currently, we support [Bullet on Storm](backend/storm-setup.md) and [Bullet on Spark](backend/spark-setup.md).
    1. Plug in your source of data. See [Getting your data into Bullet](backend/ingestion.md) for details
    2. Consume your data stream
50
50
2. The [Web Service](ws/setup.md) set up to convey queries and return results back from the backend
51
-
3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend.
51
+
3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md)and a [REST PubSub](pubsub/rest.md)on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend.
4. The optional [UI](ui/setup.md) set up to talk to your Web Service. You can skip the UI if all your access is programmatic
!!! note "Schema in the UI"
@@ -59,9 +59,9 @@ To set up Bullet on a real data stream, you need:
# Querying in Bullet
Bullet queries allow you to filter, project and aggregate data. You can also specify a window to get incremental results. Bullet lets you fetch raw (the individual data records) as well as aggregated data.
* See the [UI Usage section](ui/usage.md) for using the UI to build Bullet queries. This is the same UI you will build in the [Quick Start](quick-start/bullet-on-spark.md)
* See the [API section](ws/api.md) for building Bullet API queries
@@ -111,6 +111,16 @@ Currently we support ```GROUP``` aggregations with the following operations:
| MAX | Returns the maximum of the non-null values in the provided field for all the elements in the group |
| AVG | Computes the average of the non-null values in the provided field for all the elements in the group |
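
For illustration, here is a minimal sketch of a ```GROUP``` query in the JSON query language, grouping by a hypothetical field and applying two of the operations above; verify the exact request shape against [the Web Service API](ws/api.md):

```json
{
  "aggregation": {
    "type": "GROUP",
    "fields": { "demographics.country": "country" },
    "attributes": {
      "operations": [
        { "type": "COUNT", "newName": "count" },
        { "type": "AVG", "field": "latency_ms", "newName": "avg_latency" }
      ]
    }
  },
  "duration": 30000
}
```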
## Windows
Windows in a Bullet query allow you to specify how often you'd like Bullet to return results.

For example, you could launch a query for 2 minutes, and have Bullet return a COUNT DISTINCT on a particular field every 3 seconds:
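
A hedged sketch of such a query, where the field name is hypothetical and the exact window keys should be verified against the API documentation:

```json
{
  "aggregation": {
    "type": "COUNT DISTINCT",
    "fields": { "user_id": "user_id" },
    "attributes": { "newName": "unique_users" }
  },
  "window": {
    "emit": { "type": "TIME", "every": 3000 }
  },
  "duration": 120000
}
```
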
See documentation on [the Web Service API](ws/api.md) for more info.
# Results
The Bullet Web Service returns your query result as well as associated metadata information in a structured JSON format. The UI can display the results in different formats.
@@ -145,17 +155,19 @@ The Bullet Backend can be split into three main conceptual sub-systems:
2. Data Processor - reads data from an input stream, converts it to a unified data format and matches it against queries
3. Combiner - combines results for different queries, performs final aggregations and returns results

The core of Bullet querying is not tied to the Backend and lives in a core library. This allows you to implement the flow shown above in any stream processor you like.

Implementations of [Bullet on Storm](backend/storm-architecture.md) and [Bullet on Spark](backend/spark-architecture.md) are currently supported.

## PubSub

The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using from the API.

We currently support two different PubSub implementations:

* [Kafka](pubsub/kafka.md)
* [REST](pubsub/rest.md)

You can also very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide; the sketch below gives a feel for the two roles involved.
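
A minimal sketch, assuming only the `PubSubMessage` container from bullet-core; the method names approximate the `com.yahoo.bullet.pubsub` interfaces rather than reproduce them, so treat this as a shape, not the real contract:

```java
import com.yahoo.bullet.pubsub.PubSubMessage;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// One side plays the Publisher role (the Web Service sending queries, or the
// Backend sending results); the other side plays the Subscriber role. Method
// names here approximate the bullet-core interfaces; consult the
// com.yahoo.bullet.pubsub package for the exact contracts to implement.
public class InMemoryPubSubSketch {
    private final Queue<PubSubMessage> messages = new ConcurrentLinkedQueue<>();

    // Publisher role: hand a query or a result to the transport.
    public void send(PubSubMessage message) {
        messages.offer(message);
    }

    // Subscriber role: pull the next message, or null if none is waiting.
    public PubSubMessage receive() {
        return messages.poll();
    }
}
```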

---

**docs/pubsub/architecture.md**
@@ -4,11 +4,11 @@ This section describes how the Publish-Subscribe or [PubSub layer](../index.md#p
## Why a PubSub?
When we initially created Bullet, it was built on [Apache Storm](https://storm.apache.org) and leveraged a feature in it called Storm DRPC to deliver queries to and extract results from the Bullet Backend. Storm DRPC is supported by a set of servers that are physically part of the Storm cluster and is a shared resource for the cluster. While many other stream processors support some form of RPC and we could support multiple versions of the Web Service for those, it quickly became clear that abstracting the transport layer between the Web Service and the Backend was needed. This was particularly highlighted when we wanted to switch Bullet queries from operating in a request-response model (one response at the end of the query) to a streaming model. Streaming responses back to the user for a query through DRPC would be cumbersome and require a lot of logic to handle. A PubSub system was a natural solution to this. Since DRPC was a shared resource per cluster, we were also [tying the Backend's scalability](../backend/storm-performance.md#test-4-improving-the-maximum-number-of-simultaneous-raw-queries) to a resource that we didn't control.

However, we didn't want to pick a particular PubSub like Kafka and restrict a user's choice. So, we added a PubSub layer that was generic and entirely pluggable into both the Backend and the Web Service. We would support a select few like [Kafka](https://github.com/yahoo/bullet-kafka) or [Storm DRPC](https://github.com/yahoo/bullet-storm). See [below](#implementing-your-own-pubsub) for how to create your own.

With the transport mechanism abstracted out, many possibilities open up, such as implementing Bullet on other stream processors; this is what allowed the development of [Bullet on Spark](../backend/spark-architecture.md), and other implementations may follow.
## What does it do?
@@ -28,7 +28,8 @@ The PubSub layer does not deal with queries and results and just works on instan
If you want to use an implementation already built, we currently support:
1. [Kafka](kafka.md#setup) for any Backend
2. [REST](rest.md#setup) for any Backend
3. [Storm DRPC](storm-drpc.md#setup) if you're using Bullet on Storm as your Backend

---

**docs/pubsub/storm-drpc.md**
@@ -1,5 +1,8 @@
# Storm DRPC PubSub

!!! note "NOTE: This PubSub only works with old versions of the Storm Backend!"

    Since DRPC is part of Storm and can only support a single query/response model, this PubSub implementation can only be used with the Storm Backend, and cannot support windowed queries (bullet-storm 0.8.0 and later).

Bullet on [Storm](https://storm.apache.org/) can use [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as a PubSub layer. DRPC, or Distributed Remote Procedure Call, is built into Storm and consists of a set of servers that are part of the Storm cluster.

---

**docs/quick-start/bullet-on-spark.md**
@@ -13,8 +13,6 @@ At the end of this section, you will have:
* You will need to be on a Unix-based system (Mac OS X, Ubuntu ...) with ```curl``` installed
* You will need [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) installed
### Setup Kafka
For this instance of Bullet, we will use the Kafka PubSub implementation found in [bullet-spark](https://github.com/bullet-db/bullet-spark). So we will first download and run Kafka, and set up a couple of Kafka topics, as sketched below.
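
A sketch of that setup, assuming an unpacked local Kafka distribution; the topic names are illustrative and must match whatever your PubSub configuration expects:

```bash
# Start Zookeeper and a single Kafka broker (paths assume an unpacked Kafka download).
./bin/zookeeper-server-start.sh config/zookeeper.properties &
./bin/kafka-server-start.sh config/server.properties &

# One topic carries queries to the Backend; the other carries results back.
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
                      --partitions 1 --topic bullet.requests
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 \
                      --partitions 1 --topic bullet.responses
```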
@@ -180,7 +178,7 @@ Visit [http://localhost:8800](http://localhost:8800) to query your topology with
If you access the UI from a machine other than the one where the UI is actually running, you will need to edit ```config/env-settings.json```. Since the UI is a client-side app, the machine that your browser is running on will fetch the UI and attempt to use these settings to talk to the Web Service. Since they point to localhost by default, your browser will attempt to connect there and fail. An easy fix is to change ```localhost``` in your env-settings.json to the host name where you will be hosting the UI. This will be the same as the UI host you use in the browser. You can also do a local port forward on the machine accessing the UI by running:
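
A typical form of such a port forward, assuming the Web Service listens on port 9999 on the remote host (substitute your actual host and port):

```bash
# Forward local port 9999 to the Web Service machine so the localhost settings keep working.
ssh -N -L 9999:localhost:9999 <hostname-of-the-web-service-machine>
```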