Commit d622542 (1 parent c57abf9): Add docs for guardrails.

Guardrails
**********

ADS provides LangChain-compatible guardrails that let you use open-source models, or your own models, for LLM content moderation.

HuggingFace Measurement as Guardrail
====================================

The ``HuggingFaceEvaluation`` class is designed to take any HuggingFace compatible evaluation and use it as a guardrail metric.
For example, to use the `toxicity measurement <https://huggingface.co/spaces/evaluate-measurement/toxicity>`_ and block any content that has a toxicity score over 0.2:

.. code-block:: python3

    from ads.llm.guardrails.huggingface import HuggingFaceEvaluation

    # Only allow content with a toxicity score less than 0.2
    toxicity = HuggingFaceEvaluation(path="toxicity", threshold=0.2)

By default, it uses the `facebook/roberta-hate-speech-dynabench-r4-target <https://huggingface.co/facebook/roberta-hate-speech-dynabench-r4-target>`_ model. You may use a custom model by specifying the ``load_args`` and ``compute_args``. For example, to use the ``DaNLP/da-electra-hatespeech-detection`` model:

.. code-block:: python3

    toxicity = HuggingFaceEvaluation(
        path="toxicity",
        load_args={"config_name": "DaNLP/da-electra-hatespeech-detection"},
        compute_args={"toxic_label": "offensive"},
        threshold=0.2
    )

To check text with the guardrail, simply call the ``invoke()`` method:

.. code-block:: python3

    toxicity.invoke("<text to be evaluated>")

By default, an exception is raised if the metric (toxicity in this case) exceeds the threshold (``0.2`` in this case). Otherwise, the same text is returned.

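As an illustration, the following is a minimal sketch of handling a blocked input; the exact exception class raised by the guardrail is not shown here, so a broad ``except`` is used:

.. code-block:: python3

    try:
        # Raises if the toxicity score of the text exceeds the 0.2 threshold.
        safe_text = toxicity.invoke("<text to be evaluated>")
    except Exception as exc:
        # The input was blocked by the guardrail.
        print(f"Blocked by guardrail: {exc}")
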
You can customize this behavior by setting a custom message to be returned instead of raising an exception:

.. code-block:: python3

    toxicity = HuggingFaceEvaluation(
        path="toxicity",
        threshold=0.2,
        raise_exception=False,
        custom_msg="Sorry, but let's discuss something else."
    )

Now, whenever the input is blocked when calling ``invoke()``, the custom message is returned instead.

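For example, assuming ``"<some toxic text>"`` stands in for an input that scores above the threshold (a placeholder, not actual data):

.. code-block:: python3

    # The input is blocked, so invoke() returns the configured custom message
    # instead of raising an exception.
    toxicity.invoke("<some toxic text>")
    # "Sorry, but let's discuss something else."
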
If you would like to get the value of the metric, you can set the ``return_metrics`` argument to ``True``:

.. code-block:: python3

    toxicity = HuggingFaceEvaluation(path="toxicity", threshold=0.2, return_metrics=True)

In this case, a dictionary containing the metrics is returned when calling ``invoke()``. For example:

.. code-block:: python3

    toxicity.invoke("Oracle is great.")

gives the following output:

.. code-block::

    {
        'output': 'Oracle is great.',
        'metrics': {
            'toxicity': [0.00014583684969693422],
            'passed': [True]
        }
    }

Using Guardrail with LangChain
==============================

The ADS guardrail is compatible with the LangChain Expression Language (LCEL).
You can use the guardrail with other LangChain components.
This section shows how you can use a guardrail with a translation application.
The following is a `chain` to translate English into French:

.. code-block:: python3

    from langchain.prompts import PromptTemplate
    from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
    from ads.llm import GenerativeAI

    # Template for the input text.
    template = PromptTemplate.from_template("Translate the text into French.\nText:{text}\nFrench translation: ")
    llm = GenerativeAI(compartment_id="<compartment_ocid>")
    # Put the output into a dictionary.
    map_output = RunnableParallel(translation=RunnablePassthrough())

    # Build the app as a chain.
    translation_chain = template | llm | map_output

    # Now you have a translation app.
    translation_chain.invoke({"text": "How are you?"})
    # {'translation': 'Comment ça va?'}

We can add the toxicity guardrail to moderate the user input:

.. code-block:: python3

    from ads.llm.guardrails import HuggingFaceEvaluation

    # Take the text from the input payload for toxicity evaluation.
    text = PromptTemplate.from_template("{text}")
    # Evaluate the toxicity and block toxic text.
    toxicity = HuggingFaceEvaluation(path="toxicity", threshold=0.2)
    # Map the text back to a dictionary for the translation prompt template.
    map_text = RunnableParallel(text=RunnablePassthrough())

    guarded_chain = text | toxicity | map_text | template | llm | map_output

The ``guarded_chain`` will only translate inputs that are non-toxic.
An exception is raised if the toxicity of the input is higher than the threshold.

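For instance, a benign input is translated as before (the sample input below is a placeholder; a toxic input would instead raise an exception at the guardrail step, before reaching the LLM):

.. code-block:: python3

    # A non-toxic input passes the guardrail and is translated.
    guarded_chain.invoke({"text": "Have a nice day."})
    # A toxic input would raise at the toxicity step, so the LLM is never called.
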
Guardrail Sequence
==================

The ``GuardrailSequence`` class allows you to do more with guardrails and LangChain. You can convert any LangChain ``RunnableSequence`` into a ``GuardrailSequence`` using the ``from_sequence()`` method. For example, with the ``guarded_chain``:

.. code-block:: python3

    from ads.llm.chain import GuardrailSequence

    guarded_sequence = GuardrailSequence.from_sequence(guarded_chain)

We can invoke the ``GuardrailSequence`` in the same way. The output of invoking the ``GuardrailSequence`` not only includes the output of the chain, but also information about the run, such as parameters and metrics.

.. code-block:: python3

    output = guarded_sequence.invoke({"text": "Hello"})
    # Access the text output from the chain
    print(output.data)
    # {'translation': 'Bonjour'}

The ``info`` property of the ``output`` contains a list of run info corresponding to each component in the chain.
For example, to access the toxicity metrics (which come from the second component in the chain):

.. code-block:: python3

    # Access the metrics of the second component
    output.info[1].metrics
    # {'toxicity': [0.00020703606423921883], 'passed': [True]}

The ``GuardrailSequence`` also stops running the chain once the content is blocked by the guardrail. By default, the custom message from the guardrail is returned as the output of the sequence.

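For illustration, a minimal sketch that reuses the components defined earlier (the ``polite_toxicity`` name and the sample input are hypothetical; the blocked-input behavior is as described above, not verified here):

.. code-block:: python3

    # A guardrail that returns a custom message instead of raising.
    polite_toxicity = HuggingFaceEvaluation(
        path="toxicity",
        threshold=0.2,
        raise_exception=False,
        custom_msg="Sorry, but let's discuss something else.",
    )

    guarded_seq = GuardrailSequence.from_sequence(
        text | polite_toxicity | map_text | template | llm | map_output
    )

    # If the input is blocked, the sequence stops at the guardrail and the
    # custom message becomes the output of the sequence.
    output = guarded_seq.invoke({"text": "<some toxic text>"})
    print(output.data)
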
An LLM may generate a wide range of content, especially when the temperature is set to a higher value. With ``GuardrailSequence``, you can specify a maximum number of retries in case the content generated by the LLM is blocked by the guardrail. For example, the following ``detoxified_chain`` will re-run the sequence up to 10 times, until the output of the LLM has a toxicity score lower than the threshold:

.. code-block:: python3

    detoxified_chain = GuardrailSequence.from_sequence(llm | toxicity, max_retry=10)
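
As a usage sketch (the prompt is a placeholder), invoking the sequence re-runs the LLM until the generated text passes the guardrail or the retry limit is reached:

.. code-block:: python3

    # Re-generates up to 10 times if the LLM output is blocked by the guardrail.
    output = detoxified_chain.invoke("Tell me about Oracle.")
    print(output.data)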
