This is the official Python SDK for [*refinery*](https://github.com/code-kern-ai/refinery), the **open-source** data-centric IDE for NLP.
@@ -12,6 +12,8 @@ This is the official Python SDK for [*refinery*](https://github.com/code-kern-ai
- [Fetching lookup lists](#fetching-lookup-lists)
- [Upload files](#upload-files)
- [Adapters](#adapters)
  - [HuggingFace](#hugging-face)
  - [Sklearn](#sklearn)
  - [Rasa](#rasa)
- [What's missing?](#whats-missing)
- [Roadmap](#roadmap)
@@ -120,6 +122,77 @@ Alternatively, you can `rsdk push <path-to-your-file>` via CLI, given that you h
### Adapters
#### 🤗 Hugging Face
Transformers are great, but you often want to finetune them for your downstream task. With *refinery*, you can do so easily: the SDK builds a dataset for you that serves as a plug-and-play base for your training:

From here, you can follow the [finetuning example](https://huggingface.co/docs/transformers/training) provided in the official Hugging Face documentation. A next step could look as follows:

#### Sklearn
You can use *refinery* to pull data directly in a format suited to building [sklearn](https://github.com/scikit-learn/scikit-learn) models. This can look as follows:
```python
from refinery.adapter.embedders import build_classification_dataset
from sklearn.tree import DecisionTreeClassifier

# `client` is the refinery Client object created earlier.
# Arguments: input column, record-level label ("__<label>"), embedder config string.
data = build_classification_dataset(client, "headline", "__clickbait", "distilbert-base-uncased")
```
By the way, we highly recommend combining this with [Truss](https://github.com/basetenlabs/truss) for easy model serving!
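
To make the shape of the returned data more concrete, here is a minimal sketch of fitting and scoring a classifier on a train/test dictionary. It assumes the inner dictionaries use the keys `"inputs"` and `"labels"`, which is not confirmed by the text above, and it uses a hand-written stand-in for the data plus a hypothetical `MajorityClassifier` so that it runs without a refinery project. Any estimator exposing `fit` and `predict`, such as `DecisionTreeClassifier`, should slot in the same way.

```python
from collections import Counter
from typing import Any, Dict, List


class MajorityClassifier:
    """Hypothetical stand-in with the same fit/predict shape as an sklearn estimator."""

    def fit(self, inputs: List[List[float]], labels: List[str]) -> "MajorityClassifier":
        # Remember the most frequent training label.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, inputs: List[List[float]]) -> List[str]:
        # Predict the remembered label for every input row.
        return [self.majority_ for _ in inputs]


def evaluate(data: Dict[str, Dict[str, Any]], clf: Any) -> float:
    """Fit on the train split and return accuracy on the test split."""
    # Assumption: "inputs" and "labels" are the key names of each split.
    clf.fit(data["train"]["inputs"], data["train"]["labels"])
    predictions = clf.predict(data["test"]["inputs"])
    gold = data["test"]["labels"]
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hand-written stand-in for the dictionary the adapter returns.
data = {
    "train": {"inputs": [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]], "labels": ["yes", "yes", "no"]},
    "test": {"inputs": [[0.15, 0.85], [0.8, 0.2]], "labels": ["yes", "no"]},
}
accuracy = evaluate(data, MajorityClassifier())
```

Keeping the evaluation in one small function makes it easy to swap estimators while comparing them on the same split.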
#### Rasa
*refinery* is well suited to building chatbots with [Rasa](https://github.com/RasaHQ/rasa). We've built an adapter that lets you create the required Rasa training data directly from *refinery*.
Builds a classification dataset from a refinery client and a config string.

Args:
    client (Client): Refinery client
    sentence_input (str): Name of the column containing the sentence input.
    classification_label (str): Name of the label; if this is a task on the full record, enter the string as "__<label>". Else, input it as "<attribute>__<label>".
    config_string (Optional[str], optional): Config string for the TransformerSentenceEmbedder. Defaults to None; if None is provided, the text will not be embedded.

Returns:
    Dict[str, Dict[str, Any]]: Containing the train and test datasets, with embedded inputs.
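
The naming convention for `classification_label` described above can be illustrated with a small helper. This is a sketch only: `build_label_name` is a hypothetical function, not part of the SDK, and it simply assembles the string in the two forms the docstring mentions.

```python
from typing import Optional


def build_label_name(label: str, attribute: Optional[str] = None) -> str:
    """Hypothetical helper: format a label name for the adapter.

    Record-level tasks use "__<label>"; attribute-level tasks use
    "<attribute>__<label>".
    """
    prefix = attribute if attribute is not None else ""
    return f"{prefix}__{label}"


# Task on the full record, as in the README example.
record_level = build_label_name("clickbait")

# Task tied to one attribute; "topic" is an illustrative label name.
attribute_level = build_label_name("topic", attribute="headline")
```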
"""Build a classification dataset from a refinery client and a config string usable for HuggingFace finetuning.

Args:
    client (Client): Refinery client
    sentence_input (str): Name of the column containing the sentence input.
    classification_label (str): Name of the label; if this is a task on the full record, enter the string as "__<label>". Else, input it as "<attribute>__<label>".