|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "da22a6e7", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Coherence Evaluator\n", |
| 9 | + "\n", |
| 10 | + "### Getting Started\n", |
| 11 | + "This sample demonstrates how to use the Coherence evaluator to assess the quality and readability of AI-generated responses. The evaluator measures how well the language model generates text that reads naturally, flows smoothly, and resembles human language.\n", |
| 12 | + "\n", |
| 13 | + "Before you begin:\n", |
| 14 | + "```bash\n", |
| 15 | + "pip install azure-ai-evaluation\n", |
| 16 | + "```\n", |
| 17 | + "Set these environment variables with your own values:\n", |
| 18 | + "1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", |
| 19 | + "2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to use for evaluation.\n", |
| 20 | + "3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI API key to use for evaluation.\n", |
| 21 | + "4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to use for evaluation." |
| 22 | + ] |
| 23 | + }, |
| 24 | + { |
| 25 | + "cell_type": "markdown", |
| 26 | + "id": "48f8cb64", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + "The Coherence evaluator assesses whether the model's responses read naturally, flow smoothly, and resemble human language. In effect, it measures how readable and user-friendly the generated responses are.\n", |
| 30 | + "\n", |
| 31 | + "Coherence scores range from 1 to 5:\n", |
| 32 | + "\n", |
| 33 | + "<pre>\n", |
| 34 | + "Score 1: Very Poor - The response is completely incoherent with no logical flow or connection between ideas.\n", |
| 35 | + "Score 2: Poor - The response has significant coherence issues with disjointed ideas and unclear transitions.\n", |
| 36 | + "Score 3: Fair - The response is somewhat coherent but has noticeable gaps in logical flow or organization.\n", |
| 37 | + "Score 4: Good - The response is mostly coherent with good flow and logical connections between ideas.\n", |
| 38 | + "Score 5: Excellent - The response is perfectly coherent with smooth, natural flow and clear logical progression.\n", |
| 39 | + "</pre>\n", |
| 40 | + "\n", |
| 41 | + "The evaluation requires one of the following input patterns:\n", |
| 42 | + "\n", |
| 43 | + "**Pattern 1: Query-Response Evaluation**\n", |
| 44 | + "- Query: The user's question or prompt. (string)\n", |
| 45 | + "- Response: The AI-generated response to be evaluated for coherence. (string)\n", |
| 46 | + "\n", |
| 47 | + "**Pattern 2: Conversation Evaluation**\n", |
| 48 | + "- Conversation: A multi-turn conversation where each turn is evaluated for coherence. (dict with \"messages\" key)" |
| 49 | + ] |
| 50 | + }, |
| 51 | + { |
| 52 | + "cell_type": "markdown", |
| 53 | + "id": "542c1340", |
| 54 | + "metadata": {}, |
| 55 | + "source": [ |
| 56 | + "### Initialize Coherence Evaluator" |
| 57 | + ] |
| 58 | + }, |
| 59 | + { |
| 60 | + "cell_type": "code", |
| 61 | + "execution_count": null, |
| 62 | + "id": "35799119", |
| 63 | + "metadata": {}, |
| 64 | + "outputs": [], |
| 65 | + "source": [ |
| 66 | + "import os\n", |
| 67 | + "from azure.ai.evaluation import CoherenceEvaluator, AzureOpenAIModelConfiguration\n", |
| 68 | + "from pprint import pprint\n", |
| 69 | + "\n", |
| 70 | + "model_config = AzureOpenAIModelConfiguration(\n", |
| 71 | + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", |
| 72 | + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", |
| 73 | + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", |
| 74 | + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", |
| 75 | + ")\n", |
| 76 | + "\n", |
| 77 | + "coherence_evaluator = CoherenceEvaluator(model_config=model_config)" |
| 78 | + ] |
| 79 | + }, |
| 80 | + { |
| 81 | + "cell_type": "markdown", |
| 82 | + "id": "b10aff00", |
| 83 | + "metadata": {}, |
| 84 | + "source": [ |
| 85 | + "### Samples" |
| 86 | + ] |
| 87 | + }, |
| 88 | + { |
| 89 | + "cell_type": "markdown", |
| 90 | + "id": "724adfc8", |
| 91 | + "metadata": {}, |
| 92 | + "source": [ |
| 93 | + "#### Query and Response as Strings (str)" |
| 94 | + ] |
| 95 | + }, |
| 96 | + { |
| 97 | + "cell_type": "code", |
| 98 | + "execution_count": null, |
| 99 | + "id": "82a1be93", |
| 100 | + "metadata": {}, |
| 101 | + "outputs": [], |
| 102 | + "source": [ |
| 103 | + "query = \"Can you explain how machine learning works?\"\n", |
| 104 | + "response = \"Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It works by analyzing data patterns to make predictions or decisions. First, the system is trained on a dataset, which allows it to identify patterns and relationships. Then, it uses these learned patterns to make predictions on new, unseen data. The accuracy improves over time as the model processes more data and refines its understanding.\"\n", |
| 105 | + "\n", |
| 106 | + "result = coherence_evaluator(query=query, response=response)\n", |
| 107 | + "pprint(result)" |
| 108 | + ] |
| 109 | + }, |
| 110 | + { |
| 111 | + "cell_type": "markdown", |
| 112 | + "id": "bc250127", |
| 113 | + "metadata": {}, |
| 114 | + "source": [ |
| 115 | + "#### Example of Poor Coherence" |
| 116 | + ] |
| 117 | + }, |
| 118 | + { |
| 119 | + "cell_type": "code", |
| 120 | + "execution_count": null, |
| 121 | + "id": "62ad5478", |
| 122 | + "metadata": {}, |
| 123 | + "outputs": [], |
| 124 | + "source": [ |
| 125 | + "# Poor coherence example\n", |
| 126 | + "query = \"What is blockchain technology?\"\n", |
| 127 | + "response = \"Blockchain is decentralized. Bitcoin uses it. Blocks contain data. Cryptography is involved. Banks might use blockchain. It's secure because of the chain. Mining is how new blocks are added. Digital currencies exist. The ledger is distributed.\"\n", |
| 128 | + "\n", |
| 129 | + "result = coherence_evaluator(query=query, response=response)\n", |
| 130 | + "pprint(result)" |
| 131 | + ] |
| 132 | + }, |
| 133 | + { |
| 134 | + "cell_type": "markdown", |
| 135 | + "id": "1d22e3d6", |
| 136 | + "metadata": {}, |
| 137 | + "source": [ |
| 138 | + "#### Conversation as Dict with Messages" |
| 139 | + ] |
| 140 | + }, |
| 141 | + { |
| 142 | + "cell_type": "code", |
| 143 | + "execution_count": null, |
| 144 | + "id": "556c197b", |
| 145 | + "metadata": {}, |
| 146 | + "outputs": [], |
| 147 | + "source": [ |
| 148 | + "# Multi-turn conversation\n", |
| 149 | + "conversation = {\n", |
| 150 | + " \"messages\": [\n", |
| 151 | + " {\n", |
| 152 | + " \"role\": \"user\",\n", |
| 153 | + " \"content\": \"I'm planning a trip to Japan. What should I know?\"\n", |
| 154 | + " },\n", |
| 155 | + " {\n", |
| 156 | + " \"role\": \"assistant\",\n", |
| 157 | + " \"content\": \"Japan is a fascinating destination with a rich cultural heritage. First, you should know that the best times to visit are spring (March-May) for cherry blossoms and fall (September-November) for autumn colors. The country has an excellent public transportation system, particularly the Shinkansen bullet train network. Additionally, learning a few basic Japanese phrases will be appreciated by locals, though many signs in major cities have English translations.\"\n", |
| 158 | + " },\n", |
| 159 | + " {\n", |
| 160 | + " \"role\": \"user\",\n", |
| 161 | + " \"content\": \"What about the food?\"\n", |
| 162 | + " },\n", |
| 163 | + " {\n", |
| 164 | + " \"role\": \"assistant\",\n", |
| 165 | + " \"content\": \"Japanese cuisine is incredibly diverse and delicious. Beyond the well-known sushi and ramen, you should try regional specialties like okonomiyaki in Osaka or motsunabe in Fukuoka. Most restaurants display plastic food models in their windows, which makes ordering easier even with a language barrier. It's also important to note that tipping is not customary in Japan and can sometimes be considered rude.\"\n", |
| 166 | + " },\n", |
| 167 | + " {\n", |
| 168 | + " \"role\": \"user\",\n", |
| 169 | + " \"content\": \"Are there any cultural customs I should be aware of?\"\n", |
| 170 | + " },\n", |
| 171 | + " {\n", |
| 172 | + " \"role\": \"assistant\",\n", |
| 173 | + " \"content\": \"Yes, there are several important customs to observe. Always remove your shoes when entering homes, temples, and some traditional restaurants. Bowing is a common greeting, though a slight nod is acceptable for foreigners. When using chopsticks, never stick them upright in rice as this resembles a funeral ritual. Finally, be mindful of noise levels on public transportation, as talking on phones or having loud conversations is considered impolite.\"\n", |
| 174 | + " }\n", |
| 175 | + " ]\n", |
| 176 | + "}\n", |
| 177 | + "\n", |
| 178 | + "result = coherence_evaluator(conversation=conversation)\n", |
| 179 | + "pprint(result)" |
| 180 | + ] |
| 181 | + } |
| 182 | + ], |
| 183 | + "metadata": { |
| 184 | + "kernelspec": { |
| 185 | + "display_name": "test_agent_evaluator_prp", |
| 186 | + "language": "python", |
| 187 | + "name": "python3" |
| 188 | + }, |
| 189 | + "language_info": { |
| 190 | + "codemirror_mode": { |
| 191 | + "name": "ipython", |
| 192 | + "version": 3 |
| 193 | + }, |
| 194 | + "file_extension": ".py", |
| 195 | + "mimetype": "text/x-python", |
| 196 | + "name": "python", |
| 197 | + "nbconvert_exporter": "python", |
| 198 | + "pygments_lexer": "ipython3", |
| 199 | + "version": "3.12.9" |
| 200 | + } |
| 201 | + }, |
| 202 | + "nbformat": 4, |
| 203 | + "nbformat_minor": 5 |
| 204 | +} |
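
The initialization cell reads its four settings with `os.environ[...]`, which raises a `KeyError` on the first unset variable. A minimal pre-flight sketch that reports every missing variable at once could look like the following. The helper name `missing_env_vars` and the `REQUIRED_ENV_VARS` tuple are hypothetical and not part of `azure-ai-evaluation`; only the variable names come from the notebook's "Before you begin" list.

```python
import os

# Variable names taken from the notebook's "Before you begin" list.
REQUIRED_ENV_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(environ=None):
    """Return the required names that are unset or blank (hypothetical helper)."""
    # Accept any mapping so the check can be exercised without touching os.environ.
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Set these variables before running the notebook: " + ", ".join(missing))
```

Running this before the cell that builds `AzureOpenAIModelConfiguration` surfaces all configuration gaps in one pass rather than one `KeyError` at a time.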