
Commit 26da4d2

Adding Kafka Security and touching up docs
1 parent a387649 commit 26da4d2

3 files changed: +43 −6 lines changed


docs/about/contributing.md

Lines changed: 7 additions & 5 deletions
@@ -10,17 +10,19 @@ Read the [human-readable summary](https://yahoocla.herokuapp.com/) of the CLA.
 
 ## Future plans
 
-Here is some selected list of features we are currently considering/working on. Feel free to [contact us](contact.md) with any ideas/suggestions/PRs for features mentioned here or anything else you think about!
+Here is a selected list of features we are currently considering/working on. Feel free to [contact us](contact.md) with any ideas/suggestions/PRs for features mentioned here or anything else you think about!
 
 This list is neither comprehensive nor in any particular order.
 
 | Feature | Components | Description | Status |
 |-------------------- | ----------- | ------------------------- | ------------- |
-| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing and other features come into play | In Progress |
+| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing and other features need to be implemented | In Progress |
 | Bullet on Spark | BE | Implement Bullet on Spark Streaming. Compared with SQL on Spark Streaming which stores data in memory, Bullet will be light-weight | In Progress |
-| Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. | Planning |
+| Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. Ideally, without a database | Planning |
+| In-Memory PubSub | PubSub | For users who don't want a PubSub like Kafka, we could add a REST-based in-memory PubSub layer that runs in the WS. The Backend will then communicate directly with the WS | Planning |
+| LocalForage | UI | Migrate the UI to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) |
 | Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Flink, Kafka Streaming, Samza etc | Open |
+| Bullet on Beam | BE | Bullet can be implemented on [Apache Beam](https://beam.apache.org) as an alternative to implementing it on various Stream Processors | Open |
 | SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | Open |
-| LocalForage | UI | Migration to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) |
+| Packaging | UI, BE, WS | Github releases and building from source are the only two options for the UI. Docker images or the like for quick setup and to mix and match various pluggable components would be really useful | Open |
 | Spring Boot Reactor | WS | Migrate the Web Service to use Spring Boot reactor instead of servlet containers | Open |
-| UI Packaging | UI | Github releases and building from source are the only two options. Docker or something similar may be more apt | Open |

docs/pubsub/kafka.md

Lines changed: 32 additions & 1 deletion
@@ -14,7 +14,9 @@ You do not need to have two topics. You can have one but you should use multiple
 
 ## Setup
 
-Before setting up, you will obviously need a Kafka cluster setup with your topic(s) created. This cluster need only be a couple of machines. However, this depends on your query and result volumes. Generally, these are at most a few hundred or thousands of messages per second and a small Kafka cluster will suffice.
+Before setting up, you will need a Kafka cluster set up with your topic(s) created. This cluster need only be a couple of machines if it's devoted to Bullet. However, this depends on your query and result volumes. Generally, these are at most a few hundred or a few thousand messages per second, and a small Kafka cluster will suffice.
+
+To set up Kafka, follow the [instructions here](https://kafka.apache.org/quickstart).
 
 ### Plug into the Backend
 

@@ -67,3 +69,32 @@ You may choose to partition your topics for a couple of reasons:
 3. You may use two topics and partition one or both for sharding across multiple Web Service instances (and multiple instances in your Backend)
 
 You can accomplish all this with partition maps. You can configure what partitions your Publishers (Web Service or Backend) will write to using ```bullet.pubsub.kafka.request.partitions``` and what partitions your Subscribers will read from using ```bullet.pubsub.kafka.response.partitions```. Providing these to an instance of the Web Service or the Backend in the YAML file ensures that the Publishers in that instance only write to these request partitions and Subscribers only read from the response partitions. The Publishers will randomly add one of the response partitions to the messages sent to ensure that the responses only arrive at one of the partitions this instance's Subscribers are waiting on. For more details, see the [configuration file](https://github.com/yahoo/bullet-kafka/blob/master/src/main/resources/bullet_kafka_defaults.yaml).
+
+## Security
+
+If you're using secure Kafka, you will need to do the necessary metadata setup to make sure your principals have access to your topic(s) for reading and writing. If you're using SSL to secure your Kafka cluster, you will need to add the necessary SSL certificates to the keystore for your JVM before launching the Web Service or the Backend.
+
+### Storm
+
+We have tested Kafka with [Bullet Storm](../releases.md#bullet-storm) using ```Kerberos``` from the Storm cluster and SSL from the Web Service. For Kerberos, you may need to add a ```JAAS``` [config file](https://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html) to the [Storm BlobStore](http://storm.apache.org/releases/1.1.0/distcache-blobstore.html) and add it to your worker JVMs. To do this, you will need a JAAS configuration entry. For example, if your Kerberos KDC is shared with your Storm cluster's KDC, you might add a jaas_file.conf with:
+
+```
+KafkaClient {
+    org.apache.storm.security.auth.kerberos.AutoTGTKrb5LoginModule required
+    serviceName="kafka";
+};
+```
+
+Put this file into Storm's BlobStore using:
+
+```
+storm blobstore create --file jaas_file.conf --acl o::rwa,u:$USER:rwa --repl-fctr 3 jaas_file.conf
+```
+
+Then, while launching your topology, provide the following arguments to the ```storm jar``` command:
+```
+-c topology.blobstore.map='{"jaas_file.conf": {} }' \
+-c topology.worker.childopts="-Djava.security.auth.login.config=./jaas_file.conf" \
+```
+
+This will add the JAAS file to all your worker JVMs. You can refresh Kerberos credentials periodically and push credentials to Storm as [mentioned here](storm-drpc.md#security).
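
The partition map paragraph above names two configuration keys. As a rough illustration only, here is how they might look in one instance's PubSub YAML file, assuming the values are lists of partition numbers; the numbers are placeholders, and the linked ```bullet_kafka_defaults.yaml``` has the authoritative keys and format:

```
# Hypothetical excerpt of one Web Service instance's PubSub configuration.
# Publishers in this instance write requests only to these partitions:
bullet.pubsub.kafka.request.partitions: [0, 1]
# Subscribers in this instance read responses only from these partitions,
# and Publishers tag each outgoing request with one of them for the reply:
bullet.pubsub.kafka.response.partitions: [2, 3]
```

The new Security section also says to add SSL certificates to your JVM's keystore before launching the Web Service or the Backend. A minimal sketch using the standard JDK ```keytool```; the certificate file, alias, truststore path, and password are placeholders, and how you pass JVM options depends on how you launch the service:

```
# Import the certificate used by your Kafka brokers into a truststore (placeholder paths).
keytool -importcert -alias kafka-ca -file kafka-ca.pem \
    -keystore /path/to/bullet-truststore.jks -storepass changeit

# Then pass these options to the Web Service or Backend JVM when launching it:
#   -Djavax.net.ssl.trustStore=/path/to/bullet-truststore.jks
#   -Djavax.net.ssl.trustStorePassword=changeit
```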

docs/pubsub/storm-drpc.md

Lines changed: 4 additions & 0 deletions
@@ -30,6 +30,10 @@ bullet.pubsub.class.name: "com.yahoo.bullet.storm.drpc.DRPCPubSub"
 bullet.pubsub.storm.drpc.function: "custom-name"
 ```
 
+#### Security
+
+If your Storm cluster is secured with ```Kerberos``` (a standard for Big Data platforms), you will need to periodically refresh your Kerberos TGT and push the credentials to your Storm topology. This is generally done with ```kinit``` for your topology user, followed by a ```storm upload-credentials <TOPOLOGY_NAME>```. You would probably run this as a ```cron``` task.
+
 ### Plug into the Web Service
 
 When you're plugging in the DRPC PubSub layer into your Web Service, you will need the Bullet Storm JAR with dependencies that you can download from [JCenter](../releases.md#bullet-storm). The classifier for this JAR is ```fat``` if you are depending on it through Maven. You can also download the JAR for the 0.6.2 version directly through [JCenter here](http://jcenter.bintray.com/com/yahoo/bullet/bullet-storm/0.6.2/).
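
The Security note above boils down to a periodic ```kinit``` followed by ```storm upload-credentials <TOPOLOGY_NAME>```, typically run from ```cron```. A minimal sketch of such a crontab entry, assuming a keytab-based login; the schedule, keytab path, principal, and topology name are all placeholders:

```
# Refresh the Kerberos TGT and push fresh credentials to the topology every 6 hours.
0 */6 * * * kinit -kt /etc/security/keytabs/bullet.keytab bullet@EXAMPLE.COM && storm upload-credentials bullet-topology
```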
