Commit 3f39554

Author: Johannes Hötter
Merge pull request #11 from code-kern-ai/dev
adds basic rasa adapter
2 parents 288ce77 + 4ac63b5 commit 3f39554

5 files changed

+340
-15
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
1+
data/
12
.vscode/
23
secrets.json
34

README.md

Lines changed: 132 additions & 15 deletions
@@ -4,27 +4,31 @@
44

55
# Kern AI API for Python
66

7-
This is the official Python SDK for Kern AI, your IDE for programmatic data enrichment and management.
7+
This is the official Python SDK for [*refinery*](https://github.com/code-kern-ai/refinery), your **open-source** data-centric IDE for NLP.
88

99
## Installation
1010

11-
You can set up this library via either running `$ pip install kern-sdk`, or via cloning this repository and running `$ pip install -r requirements.txt` in this repository.
11+
You can set up this SDK either via running `$ pip install kern-sdk`, or by cloning this repository and running `$ pip install -r requirements.txt`.
1212

1313
## Usage
14-
Once you installed the package, you can access the application from any Python terminal as follows:
14+
15+
### Creating a `Client` object
16+
Once you have installed the package, you can create a `Client` object from any Python terminal as follows:
1517

1618
```python
1719
from kern import Client
1820

19-
username = "your-username"
21+
user_name = "your-username"
2022
password = "your-password"
2123
project_id = "your-project-id" # can be found in the URL of the web application
2224

23-
client = Client(username, password, project_id)
25+
client = Client(user_name, password, project_id)
2426
# if you run the application locally, please use the following instead:
2527
# client = Client(user_name, password, project_id, uri="http://localhost:4455")
2628
```
2729

30+
The `project_id` can be found in your browser, e.g. if you run the app on your localhost: `http://localhost:4455/app/projects/{project_id}/overview`
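If you would rather not copy the ID by hand, it can be read out of the URL programmatically. The following is a minimal sketch that assumes the URL layout shown above (`.../projects/{project_id}/...`); `project_id_from_url` is a hypothetical helper and not part of the SDK.

```python
from urllib.parse import urlparse


def project_id_from_url(url: str) -> str:
    """Return the path segment that follows 'projects' in the given URL."""
    # Split the path into non-empty segments, e.g. ['app', 'projects', '<id>', 'overview'].
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    try:
        return segments[segments.index("projects") + 1]
    except (ValueError, IndexError):
        # 'projects' is missing, or nothing follows it.
        raise ValueError(f"no project id found in {url!r}") from None


url = "http://localhost:4455/app/projects/your-project-id/overview"
project_id = project_id_from_url(url)
print(project_id)
```

The resulting string can then be passed as the `project_id` argument when creating the `Client`.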
31+
2832
Alternatively, you can provide a `secrets.json` file in your directory where you want to run the SDK, looking as follows:
2933
```json
3034
{
@@ -33,26 +37,140 @@ Alternatively, you can provide a `secrets.json` file in your directory where you
3337
"project_id": "your-project-id"
3438
}
3539
```
36-
Again, if you run on your local machine, you should also provide `"uri": "http://localhost:4455"`. Afterwards, you can access the client like this:
40+
41+
Again, if you run on your localhost, you should also provide `"uri": "http://localhost:4455"`. Afterwards, you can access the client like this:
42+
3743
```python
3844
client = Client.from_secrets_file("secrets.json")
3945
```
4046

47+
With the `Client`, you can easily integrate your data into any kind of system, be it a custom implementation, an AutoML system, or a plain data analytics framework 🚀
48+
49+
### Fetching labeled data
50+
4151
Now, you can easily fetch the data from your project:
4252
```python
43-
df = client.get_record_export()
53+
df = client.get_record_export(tokenize=False)
54+
# if you set tokenize=True (default), the project-specific
55+
# spaCy tokenizer will process your textual data
4456
```
4557

4658
Alternatively, you can also just run `kern pull` in your CLI given that you have provided the `secrets.json` file in the same directory.
4759

48-
The `df` contains data of the following scheme:
49-
- all your record attributes are stored as columns, e.g. `headline` or `running_id` if you uploaded records like `{"headline": "some text", "running_id": 1234}`
50-
- per labeling task three columns:
51-
- `<attribute_name|None>__<labeling_task_name>__MANUAL`: those are the manually set labels of your records
52-
- `<attribute_name|None>__<labeling_task_name>__WEAK SUPERVISION`: those are the weakly supervised labels of your records
53-
- `<attribute_name|None>__<labeling_task_name>__WEAK SUPERVISION_confidence`: those are the probabilities or your weakly supervised labels
60+
The `df` contains both your originally uploaded data (e.g. `headline` and `running_id` if you uploaded records like `{"headline": "some text", "running_id": 1234}`) and a triplet of columns for each labeling task you create. This triplet consists of the manual labels, the weakly supervised labels, and the confidence of the weakly supervised labels. For extraction tasks, this data is at token level.
61+
62+
An example export file looks like this:
63+
```json
64+
[
65+
{
66+
"running_id": "0",
67+
"Headline": "T. Rowe Price (TROW) Dips More Than Broader Markets",
68+
"Date": "Jun-30-22 06:00PM\u00a0\u00a0",
69+
"Headline__Sentiment Label__MANUAL": null,
70+
"Headline__Sentiment Label__WEAK_SUPERVISION": "Negative",
71+
"Headline__Sentiment Label__WEAK_SUPERVISION__confidence": "0.6220"
72+
}
73+
]
74+
```
75+
76+
In this example, there is no manual label, but a weakly supervised label `"Negative"` has been set with 62.2% confidence.
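To illustrate how such a triplet might be consumed downstream, here is a small sketch that works on an abridged copy of the record above. It assumes the column naming visible in the example (`<attribute>__<task>__MANUAL`, `...__WEAK_SUPERVISION`, `...__WEAK_SUPERVISION__confidence`) and that the confidence is exported as a string. `resolve_label` is a hypothetical helper, and preferring the manual label over the weakly supervised one is a design choice of this sketch, not SDK behavior.

```python
import json

# Abridged copy of the example export shown above.
export_json = """
[
  {
    "running_id": "0",
    "Headline": "T. Rowe Price (TROW) Dips More Than Broader Markets",
    "Headline__Sentiment Label__MANUAL": null,
    "Headline__Sentiment Label__WEAK_SUPERVISION": "Negative",
    "Headline__Sentiment Label__WEAK_SUPERVISION__confidence": "0.6220"
  }
]
"""


def resolve_label(record, attribute, task):
    """Return (label, confidence, source), preferring the manual label."""
    prefix = f"{attribute}__{task}"
    manual = record.get(f"{prefix}__MANUAL")
    if manual is not None:
        # Manual labels carry no confidence column in the export.
        return manual, None, "MANUAL"
    weak = record.get(f"{prefix}__WEAK_SUPERVISION")
    if weak is None:
        return None, None, None
    # The confidence is a string in the example, so convert it to a float.
    confidence = float(record[f"{prefix}__WEAK_SUPERVISION__confidence"])
    return weak, confidence, "WEAK_SUPERVISION"


records = json.loads(export_json)
label, confidence, source = resolve_label(records[0], "Headline", "Sentiment Label")
print(label, confidence, source)
```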
77+
78+
### Fetch lookup lists
79+
- [ ] Todo
80+
81+
### Upload files
82+
- [ ] Todo
83+
84+
### Adapters
85+
86+
#### Rasa
87+
*refinery* is well suited for building chatbots with [Rasa](https://github.com/RasaHQ/rasa). We've built an adapter with which you can easily create the required Rasa training data directly from *refinery*.
88+
89+
To do so, do the following:
90+
91+
```python
92+
from kern.adapter import rasa
93+
94+
rasa.build_intent_yaml(
95+
client,
96+
"text",
97+
"__intent__WEAK_SUPERVISION"
98+
)
99+
```
100+
101+
This will create a `.yml` file looking as follows:
102+
103+
```yml
104+
nlu:
105+
- intent: check_balance
106+
  examples: |
107+
    - how much do I have on my savings account
108+
    - how much money is in my checking account
109+
    - What's the balance on my credit card account
110+
```
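To make the mapping from labeled records to this file more concrete, here is a rough sketch of the grouping step: collect texts per intent, then render them in the NLU layout shown above. This is not the adapter's actual implementation. `build_nlu_yaml` is a hypothetical function that takes plain dictionaries instead of a `Client`, and skipping unlabeled records is an assumption of this sketch.

```python
from collections import defaultdict


def build_nlu_yaml(records, text_key, intent_key):
    """Group texts by intent and render them in Rasa's NLU YAML layout."""
    examples_by_intent = defaultdict(list)
    for record in records:
        intent = record.get(intent_key)
        if intent is None:
            # Assumption: records without an intent label are left out.
            continue
        examples_by_intent[intent].append(record[text_key])

    lines = ["nlu:"]
    for intent, texts in examples_by_intent.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {text}" for text in texts)
    return "\n".join(lines) + "\n"


intent_column = "__intent__WEAK_SUPERVISION"
records = [
    {"text": "how much do I have on my savings account", intent_column: "check_balance"},
    {"text": "how much money is in my checking account", intent_column: "check_balance"},
    {"text": "What's the balance on my credit card account", intent_column: "check_balance"},
    {"text": "a record that has not been labeled yet", intent_column: None},
]
nlu_yaml = build_nlu_yaml(records, "text", intent_column)
print(nlu_yaml)
```

The sketch builds the text by hand to stay dependency-free; a real implementation would more likely use a YAML library so that special characters in the examples are escaped correctly.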
111+
112+
If you want to provide a metadata-level label (such as sentiment), you can provide the optional argument `metadata_label_task`:
113+
114+
```python
115+
from kern.adapter import rasa
116+
117+
rasa.build_intent_yaml(
118+
client,
119+
"text",
120+
"__intent__WEAK_SUPERVISION",
121+
metadata_label_task="__sentiment__WEAK_SUPERVISION"
122+
)
123+
```
124+
125+
This will create a file like this:
126+
```yml
127+
nlu:
128+
- intent: check_balance
129+
  metadata:
130+
    sentiment: neutral
131+
  examples: |
132+
    - how much do I have on my savings account
133+
    - how much money is in my checking account
134+
    - What's the balance on my credit card account
135+
```
136+
137+
And if you have entities in your texts which you'd like to recognize, simply add the `tokenized_label_task` argument:
138+
139+
```python
140+
from kern.adapter import rasa
141+
142+
rasa.build_intent_yaml(
143+
client,
144+
"text",
145+
"__intent__WEAK_SUPERVISION",
146+
metadata_label_task="__sentiment__WEAK_SUPERVISION",
147+
tokenized_label_task="text__entities__WEAK_SUPERVISION"
148+
)
149+
```
150+
151+
This not only injects the label names at token level but also creates lookup lists for your chatbot:
152+
153+
```yml
154+
nlu:
155+
- intent: check_balance
156+
  metadata:
157+
    sentiment: neutral
158+
  examples: |
159+
    - how much do I have on my [savings](account) account
160+
    - how much money is in my [checking](account) account
161+
    - What's the balance on my [credit card account](account)
162+
- lookup: account
163+
  examples: |
164+
    - savings
165+
    - checking
166+
    - credit card account
167+
```
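The relationship between the inline `[value](entity)` annotations and the lookup lists can be sketched as follows. This is only an illustration of the output format above, not how the adapter derives its lookup lists internally. `collect_lookup_values` and the regular expression are assumptions made for this example.

```python
import re
from collections import defaultdict

# Matches Rasa-style inline annotations such as "[savings](account)".
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def collect_lookup_values(examples):
    """Map each entity name to the distinct annotated values, in order of appearance."""
    lookups = defaultdict(list)
    for example in examples:
        for value, entity in ENTITY_PATTERN.findall(example):
            if value not in lookups[entity]:
                lookups[entity].append(value)
    return dict(lookups)


examples = [
    "how much do I have on my [savings](account) account",
    "how much money is in my [checking](account) account",
    "What's the balance on my [credit card account](account)",
]
lookups = collect_lookup_values(examples)
print(lookups)
```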
168+
169+
Please make sure to also create the further necessary files (`domain.yml`, `data/stories.yml` and `data/rules.yml`) if you want to train your Rasa chatbot. For further reference, see their [documentation](https://rasa.com/docs/rasa).
170+
171+
### What's missing?
172+
Let us know what open-source/closed-source NLP framework you are using, for which you'd like to have an adapter implemented in the SDK. To do so, simply create an issue in this repository with the tag "enhancement".
54173

55-
With the `client`, you easily integrate your data into any kind of system; may it be a custom implementation, an AutoML system or a plain data analytics framework 🚀
56174

57175
## Roadmap
58176
- [ ] Register heuristics via wrappers
@@ -66,7 +184,6 @@ If you want to have something added, feel free to open an [issue](https://github
66184
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
67185

68186
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
69-
Don't forget to give the project a star! Thanks again!
70187

71188
1. Fork the Project
72189
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)

kern/adapter/__init__.py

Whitespace-only changes.
