From f0746cd0aa508c310caa1809d8318d5aa8c82530 Mon Sep 17 00:00:00 2001 From: biplob <110578485+bks1984@users.noreply.github.com> Date: Sat, 8 Nov 2025 23:24:03 +0530 Subject: [PATCH 1/4] Created using Colab --- medical_assistant_project.ipynb | 7006 +++++++++++++++++++++++++++++++ 1 file changed, 7006 insertions(+) create mode 100644 medical_assistant_project.ipynb diff --git a/medical_assistant_project.ipynb b/medical_assistant_project.ipynb new file mode 100644 index 0000000..c0a2612 --- /dev/null +++ b/medical_assistant_project.ipynb @@ -0,0 +1,7006 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3CNz35ia6Bz3" + }, + "source": [ + "## Problem Statement" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CkRbhMJH6Bz3" + }, + "source": [ + "### Business Context" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3PBm5xaj6Bz3" + }, + "source": [ + "The healthcare industry is rapidly evolving, with professionals facing increasing challenges in managing vast volumes of medical data while delivering accurate and timely diagnoses. The need for quick access to comprehensive, reliable, and up-to-date medical knowledge is critical for improving patient outcomes and ensuring informed decision-making in a fast-paced environment.\n", + "\n", + "Healthcare professionals often encounter information overload, struggling to sift through extensive research and data to create accurate diagnoses and treatment plans. This challenge is amplified by the need for efficiency, particularly in emergencies, where time-sensitive decisions are vital. 
Furthermore, access to trusted, current medical information from renowned manuals and research papers is essential for maintaining high standards of care.\n", + "\n", + "To address these challenges, healthcare centers can focus on integrating systems that streamline access to medical knowledge, provide tools to support quick decision-making, and enhance efficiency. Leveraging centralized knowledge platforms and ensuring healthcare providers have continuous access to reliable resources can significantly improve patient care and operational effectiveness." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1xDPsqvO6Bz5" + }, + "source": [ + "**Common Questions to Answer**\n", + "\n", + "1. **Critical Care Protocols:** \"What is the protocol for managing sepsis in a critical care unit?\"\n", + "\n", + "2. **General Surgery:** \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\"\n", + "\n", + "3. **Dermatology:** \"What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?\"\n", + "\n", + "4. **Neurology:** \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CARPKFwm6Bz4" + }, + "source": [ + "### Objective" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dOElOEXq6Bz4" + }, + "source": [ + "As an AI specialist, your task is to develop a RAG-based AI solution using renowned medical manuals to address healthcare challenges. 
The objective is to **understand** issues like information overload, **apply** AI techniques to streamline decision-making, **analyze** its impact on diagnostics and patient outcomes, **evaluate** its potential to standardize care practices, and **create** a functional prototype demonstrating its feasibility and effectiveness." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "by9EvAnkSpZf" + }, + "source": [ + "### Data Description" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jw5LievCSru2" + }, + "source": [ + "The **Merck Manuals** are medical references published by the American pharmaceutical company Merck & Co., that cover a wide range of medical topics, including disorders, tests, diagnoses, and drugs. The manuals have been published since 1899, when Merck & Co. was still a subsidiary of the German company Merck.\n", + "\n", + "The manual is provided as a PDF with over 4,000 pages divided into 23 sections." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lnwETBOE6Bz5" + }, + "source": [ + "## Installing and Importing Necessary Libraries and Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "q4GgLhZhUM4V", + "outputId": "81845f22-556a-4a3d-fd43-64c05a16b1de" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/449.8 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m\u001b[90m━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m225.3/449.8 kB\u001b[0m \u001b[31m7.0 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m449.8/449.8 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h\u001b[31mERROR: pip's 
dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "langchain-chroma 1.0.0 requires chromadb<2.0.0,>=1.0.20, but you have chromadb 1.0.15 which is incompatible.\n", + "langchain-chroma 1.0.0 requires langchain-core<2.0.0,>=1.0.0, but you have langchain-core 0.3.79 which is incompatible.\u001b[0m\u001b[31m\n", + "\u001b[0m" + ] + } + ], + "source": [ + "# Install required libraries\n", + "!pip install -q langchain_community==0.3.27 \\\n", + " langchain==0.3.27 \\\n", + " chromadb==1.0.15 \\\n", + " pymupdf==1.26.3 \\\n", + " tiktoken==0.9.0 \\\n", + " datasets==4.0.0 \\\n", + " evaluate==0.4.5 \\\n", + " langchain_openai==0.3.30" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mDp-EYZH-69E" + }, + "source": [ + "**Note**:\n", + "- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.\n", + "- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in ***this notebook***." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "RTY9GN4oWK3g" + }, + "outputs": [], + "source": [ + "# Import core libraries\n", + "import os # Interact with the operating system (e.g., set environment variables)\n", + "import json # Read/write JSON data\n", + "import requests # Make HTTP requests (e.g., API calls); ignore type checker\n", + "\n", + "# Import libraries for working with PDFs and OpenAI\n", + "from langchain.document_loaders import PyMuPDFLoader # Load and extract text from PDF files\n", + "# from langchain_community.document_loaders import PyPDFLoader # Load and extract text from PDF files\n", + "from openai import OpenAI # Access OpenAI's models and services\n", + "\n", + "# Import libraries for processing dataframes and text\n", + "import tiktoken # Tokenizer used for counting and splitting text for models\n", + "import pandas as pd # Load, manipulate, and analyze tabular data\n", + "\n", + "# Import LangChain components for data loading, chunking, embedding, and vector DBs\n", + "from langchain.text_splitter import RecursiveCharacterTextSplitter # Break text into overlapping chunks for processing\n", + "from langchain.embeddings.openai import OpenAIEmbeddings # Create vector embeddings using OpenAI's models # type: ignore\n", + "from langchain.vectorstores import Chroma # Store and search vector embeddings using Chroma DB # type: ignore\n", + "\n", + "from datasets import Dataset # Used to structure the input (questions, answers, contexts etc.) 
in tabular format\n", + "from langchain_openai import ChatOpenAI # Needed because an LLM is used in metric computation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TtZWqj0wFTS1" + }, + "source": [ + "## Question Answering using LLM" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MfNKCvuzWSI-" + }, + "source": [ + "### Loading the API Configuration and Initializing the OpenAI Client\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "eMi5GjvNBrlO" + }, + "outputs": [], + "source": [ + "# Load the JSON file and extract values\n", + "file_name = \"config.json\" # Name of the configuration file\n", + "with open(file_name, 'r') as file: # Open the config file in read mode\n", + " config = json.load(file) # Load the JSON content as a dictionary\n", + " OPENAI_API_KEY = config.get(\"OPENAI_API_KEY\") # Extract the API key from the config\n", + " OPENAI_API_BASE = config.get(\"OPENAI_API_BASE\") # Extract the OpenAI base URL from the config\n", + "\n", + "# Store API credentials in environment variables\n", + "os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY # Set API key as environment variable\n", + "os.environ[\"OPENAI_BASE_URL\"] = OPENAI_API_BASE # Set API base URL as environment variable\n", + "\n", + "# Initialize OpenAI client\n", + "client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_API_BASE) # Create an instance of the OpenAI client" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "Q9vjnCz0WSJC" + }, + "outputs": [], + "source": [ + "# Define a function to get a response from the base model (no system prompt)\n", + "def ask_llm(user_prompt, max_tokens=512, temperature=0, top_p=0.95): # Default generation parameters\n", + " # Create a chat completion using the OpenAI client\n", + " completion = client.chat.completions.create(\n", + " model=\"gpt-4o-mini\", # Model used for all base-prompt queries\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": 
user_prompt} # User prompt is the input/query to respond to\n", + " ],\n", + " max_tokens=max_tokens, # Max number of tokens to generate in the response\n", + " temperature=temperature, # Controls randomness in output\n", + " top_p=top_p # Controls diversity via nucleus sampling\n", + " )\n", + " return completion.choices[0].message.content # Return the text content from the model's reply" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K8YgK91SFjVY" + }, + "source": [ + "### Question 1: What is the protocol for managing sepsis in a critical care unit?" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "u2Q_QZ4OFjVa", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "3f53c399-5a2e-4a32-d696-14c6634039b6" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"Managing sepsis in a critical care unit involves a systematic approach that includes early recognition, prompt intervention, and ongoing monitoring. The following is a general protocol based on current guidelines, such as those from the Surviving Sepsis Campaign:\\n\\n### 1. **Early Recognition**\\n - **Identify Symptoms**: Look for signs of infection (fever, chills, tachycardia, tachypnea) and organ dysfunction (altered mental status, hypotension, oliguria).\\n - **Use Screening Tools**: Utilize tools like the qSOFA (quick Sequential Organ Failure Assessment) or SIRS (Systemic Inflammatory Response Syndrome) criteria to identify patients at risk.\\n\\n### 2. **Initial Assessment**\\n - **Obtain Vital Signs**: Monitor blood pressure, heart rate, respiratory rate, and temperature.\\n - **Assess Organ Function**: Evaluate renal function (urine output, creatinine), liver function (bilirubin, liver enzymes), and coagulation status (platelets, INR).\\n\\n### 3. 
**Immediate Interventions**\\n - **Fluid Resuscitation**: Administer intravenous (IV) fluids (crystalloids) promptly, typically 30 mL/kg within the first 3 hours.\\n - **Antibiotic Therapy**: Start broad-spectrum IV antibiotics within 1 hour of recognition of sepsis. Adjust based on culture results and sensitivity.\\n - **Source Control**: Identify and control the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n### 4. **Monitoring and Support**\\n - **Hemodynamic Monitoring**: Use invasive monitoring (e.g., arterial line, central venous pressure) if necessary to guide fluid resuscitation and vasopressor therapy.\\n - **Vasopressors**: If hypotension persists despite adequate fluid resuscitation, initiate vasopressors (e.g., norepinephrine) to maintain mean arterial pressure (MAP) ≥ 65 mmHg.\\n - **Oxygenation and Ventilation**: Provide supplemental oxygen and consider mechanical ventilation if respiratory failure occurs.\\n\\n### 5. **Ongoing Management**\\n - **Reassess Fluid Status**: Continuously evaluate the patient's response to fluids and adjust as necessary.\\n - **Monitor Laboratory Values**: Regularly check lactate levels, complete blood counts, and organ function tests to assess the patient's status.\\n - **Nutritional Support**: Initiate enteral nutrition as soon as feasible, typically within 24\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "question_1 = \"What is the protocol for managing sepsis in a critical care unit?\"\n", + "base_prompt_response_1=ask_llm(question_1)\n", + "base_prompt_response_1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "J6yxICeVFjVc" + }, + "source": [ + "### Question 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "WO1OTE9CFjVd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "eeee39fb-3d62-4a5d-e036-08c58622ec45" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"Common symptoms of appendicitis include:\\n\\n1. **Abdominal Pain**: Typically starts around the navel and then moves to the lower right abdomen.\\n2. **Loss of Appetite**: A sudden decrease in appetite is common.\\n3. **Nausea and Vomiting**: Often follows the onset of abdominal pain.\\n4. **Fever**: A low-grade fever may develop.\\n5. **Constipation or Diarrhea**: Changes in bowel habits can occur.\\n6. **Abdominal Swelling**: In some cases, the abdomen may become swollen.\\n\\nAppendicitis cannot be effectively treated with medication alone. The standard treatment is surgical removal of the appendix, known as an **appendectomy**. This can be performed using two main techniques:\\n\\n1. **Open Appendectomy**: A larger incision is made in the lower right abdomen to remove the appendix.\\n2. **Laparoscopic Appendectomy**: This is a minimally invasive procedure where several small incisions are made, and the appendix is removed with the aid of a camera and special instruments.\\n\\nLaparoscopic appendectomy is often preferred due to its benefits, including less postoperative pain, shorter recovery time, and minimal scarring. However, the choice of procedure may depend on the patient's specific situation and the surgeon's expertise.\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "question_2 = \"What are the common symptoms for appendicitis, and can it be cured via medicine? 
If not, what surgical procedure should be followed to treat it?\" # Define question 2\n", + "base_prompt_response_2 = ask_llm(question_2) # Query the base model with question 2\n", + "base_prompt_response_2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oflaoOGiFjVd" + }, + "source": [ + "### Question 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "id": "JFm5Tq7RFjVe", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "fd5f4519-f515-4c51-d2ba-bbffc330456b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Sudden patchy hair loss, often referred to as alopecia areata, can manifest as localized bald spots on the scalp or other areas of the body. Here are some effective treatments and solutions, as well as potential causes behind this condition:\\n\\n### Possible Causes:\\n1. **Autoimmune Disorders**: The immune system mistakenly attacks hair follicles, leading to hair loss.\\n2. **Genetics**: A family history of alopecia or other autoimmune conditions can increase the risk.\\n3. **Stress**: Physical or emotional stress can trigger hair loss in some individuals.\\n4. **Hormonal Changes**: Changes in hormones, such as those occurring during pregnancy or menopause, can contribute.\\n5. **Nutritional Deficiencies**: Lack of essential nutrients, such as iron, zinc, or vitamins, can affect hair health.\\n6. **Infections**: Fungal infections like tinea capitis can cause patchy hair loss.\\n7. **Other Medical Conditions**: Conditions like thyroid disease or vitiligo can also lead to hair loss.\\n\\n### Effective Treatments:\\n1. **Topical Corticosteroids**: These are often the first line of treatment for alopecia areata. 
They help reduce inflammation and suppress the immune response.\\n \\n2. **Minoxidil (Rogaine)**: This over-the-counter topical treatment can stimulate hair growth and is sometimes used in conjunction with other therapies.\\n\\n3. **Intralesional Corticosteroid Injections**: For more severe cases, corticosteroids can be injected directly into the bald patches to promote hair regrowth.\\n\\n4. **Immunotherapy**: This involves applying a chemical solution (like diphencyprone) to the scalp to provoke an allergic reaction, which may help stimulate hair growth.\\n\\n5. **Oral Medications**: In some cases, oral corticosteroids or other immunosuppressive drugs may be prescribed for extensive hair loss.\\n\\n6. **Light Therapy (Phototherapy)**: This treatment uses ultraviolet light to stimulate hair follicles and promote regrowth.\\n\\n7. **Nutritional Support**: Ensuring a balanced diet rich in vitamins and minerals can support overall hair health. Supplements may be recommended if deficiencies are identified.\\n\\n8. **Stress Management**: Techniques such as mindfulness, yoga, or therapy can help manage stress, which may contribute to hair loss.\\n\\n9. 
**Hairpieces or Wigs**: For those who experience significant hair loss, cosmetic solutions like wigs or hairpieces can provide a temporary solution while exploring other treatments.\\n\\n### Consultation with'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "question_3 = \"What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?\" # Define question 3\n", + "base_prompt_response_3 = ask_llm(question_3) # Query the base model with question 3\n", + "base_prompt_response_3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WUUqY4FbFjVe" + }, + "source": [ + "### Question 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "DGmG9hYzFjVf", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "916a1a9a-95cb-45f3-d6b9-e39f173c25a8" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"The treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), can vary widely depending on the severity of the injury, the specific areas of the brain affected, and the resulting impairments. Here are some common approaches to treatment:\\n\\n1. **Emergency Care**: \\n - Immediate medical attention is crucial. This may involve stabilizing the patient, monitoring vital signs, and performing imaging studies (like CT or MRI scans) to assess the extent of the injury.\\n\\n2. 
**Surgical Interventions**: \\n - In some cases, surgery may be necessary to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medication**: \\n - Medications may be prescribed to manage symptoms such as pain, seizures, or inflammation. Corticosteroids may be used to reduce swelling in the brain.\\n\\n4. **Rehabilitation**: \\n - **Physical Therapy**: To improve mobility and strength.\\n - **Occupational Therapy**: To help with daily living skills and regain independence.\\n - **Speech and Language Therapy**: To address communication difficulties and swallowing issues.\\n - **Neuropsychological Therapy**: To help with cognitive rehabilitation, including memory, attention, and problem-solving skills.\\n\\n5. **Psychological Support**: \\n - Counseling or therapy may be beneficial for coping with emotional and psychological challenges following a brain injury, such as depression, anxiety, or changes in personality.\\n\\n6. **Lifestyle Modifications**: \\n - Patients may need to make adjustments to their daily routines, including rest, nutrition, and avoiding activities that could lead to further injury.\\n\\n7. **Supportive Care**: \\n - Family support and education about the injury and its effects can be crucial for recovery. Support groups may also be helpful.\\n\\n8. **Long-term Management**: \\n - Ongoing follow-up with healthcare providers to monitor recovery and manage any long-term effects or complications.\\n\\n9. **Assistive Devices**: \\n - Depending on the nature of the impairment, assistive devices or technology may be recommended to aid in communication, mobility, or daily activities.\\n\\n10. 
**Alternative Therapies**: \\n - Some individuals may explore complementary therapies such as acupuncture, yoga, or meditation, although these should be discussed with a healthcare provider.\\n\\nIt's important for treatment plans to be individualized, taking into account the specific needs and circumstances of the person affected. A multidisciplinary team approach is often the most\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 10 + } + ], + "source": [ + "question_4 = \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\" # Define question 4\n", + "base_prompt_response_4 = ask_llm(question_4) # Query the base model with question 4\n", + "base_prompt_response_4" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AxbCD0S56VSj" + }, + "source": [ + "### Storing the generated outputs from the base prompt\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "MKc_HzFI6eRb", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 175 + }, + "outputId": "61d2f247-0aff-4547-e253-1b75bb3b1b12" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " questions \\\n", + "0 What is the protocol for managing sepsis in a ... \n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... \n", + "3 What treatments are recommended for a person w... \n", + "\n", + " base_prompt_responses \n", + "0 Managing sepsis in a critical care unit involv... \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... \n", + "2 Sudden patchy hair loss, often referred to as ... \n", + "3 The treatment for a person who has sustained a... " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "result_df", + "summary": "{\n \"name\": \"result_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"questions\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"base_prompt_responses\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. **Abdominal Pain**: Typically starts around the navel and then moves to the lower right abdomen.\\n2. **Loss of Appetite**: A sudden decrease in appetite is common.\\n3. **Nausea and Vomiting**: Often follows the onset of abdominal pain.\\n4. **Fever**: A low-grade fever may develop.\\n5. **Constipation or Diarrhea**: Changes in bowel habits can occur.\\n6. **Abdominal Swelling**: In some cases, the abdomen may become swollen.\\n\\nAppendicitis cannot be effectively treated with medication alone. The standard treatment is surgical removal of the appendix, known as an **appendectomy**. This can be performed using two main techniques:\\n\\n1. **Open Appendectomy**: A larger incision is made in the lower right abdomen to remove the appendix.\\n2. 
**Laparoscopic Appendectomy**: This is a minimally invasive procedure where several small incisions are made, and the appendix is removed with the aid of a camera and special instruments.\\n\\nLaparoscopic appendectomy is often preferred due to its benefits, including less postoperative pain, shorter recovery time, and minimal scarring. However, the choice of procedure may depend on the patient's specific situation and the surgeon's expertise.\",\n \"The treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), can vary widely depending on the severity of the injury, the specific areas of the brain affected, and the resulting impairments. Here are some common approaches to treatment:\\n\\n1. **Emergency Care**: \\n - Immediate medical attention is crucial. This may involve stabilizing the patient, monitoring vital signs, and performing imaging studies (like CT or MRI scans) to assess the extent of the injury.\\n\\n2. **Surgical Interventions**: \\n - In some cases, surgery may be necessary to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medication**: \\n - Medications may be prescribed to manage symptoms such as pain, seizures, or inflammation. Corticosteroids may be used to reduce swelling in the brain.\\n\\n4. **Rehabilitation**: \\n - **Physical Therapy**: To improve mobility and strength.\\n - **Occupational Therapy**: To help with daily living skills and regain independence.\\n - **Speech and Language Therapy**: To address communication difficulties and swallowing issues.\\n - **Neuropsychological Therapy**: To help with cognitive rehabilitation, including memory, attention, and problem-solving skills.\\n\\n5. **Psychological Support**: \\n - Counseling or therapy may be beneficial for coping with emotional and psychological challenges following a brain injury, such as depression, anxiety, or changes in personality.\\n\\n6. 
**Lifestyle Modifications**: \\n - Patients may need to make adjustments to their daily routines, including rest, nutrition, and avoiding activities that could lead to further injury.\\n\\n7. **Supportive Care**: \\n - Family support and education about the injury and its effects can be crucial for recovery. Support groups may also be helpful.\\n\\n8. **Long-term Management**: \\n - Ongoing follow-up with healthcare providers to monitor recovery and manage any long-term effects or complications.\\n\\n9. **Assistive Devices**: \\n - Depending on the nature of the impairment, assistive devices or technology may be recommended to aid in communication, mobility, or daily activities.\\n\\n10. **Alternative Therapies**: \\n - Some individuals may explore complementary therapies such as acupuncture, yoga, or meditation, although these should be discussed with a healthcare provider.\\n\\nIt's important for treatment plans to be individualized, taking into account the specific needs and circumstances of the person affected. A multidisciplinary team approach is often the most\",\n \"Managing sepsis in a critical care unit involves a systematic approach that includes early recognition, prompt intervention, and ongoing monitoring. The following is a general protocol based on current guidelines, such as those from the Surviving Sepsis Campaign:\\n\\n### 1. **Early Recognition**\\n - **Identify Symptoms**: Look for signs of infection (fever, chills, tachycardia, tachypnea) and organ dysfunction (altered mental status, hypotension, oliguria).\\n - **Use Screening Tools**: Utilize tools like the qSOFA (quick Sequential Organ Failure Assessment) or SIRS (Systemic Inflammatory Response Syndrome) criteria to identify patients at risk.\\n\\n### 2. 
**Initial Assessment**\\n - **Obtain Vital Signs**: Monitor blood pressure, heart rate, respiratory rate, and temperature.\\n - **Assess Organ Function**: Evaluate renal function (urine output, creatinine), liver function (bilirubin, liver enzymes), and coagulation status (platelets, INR).\\n\\n### 3. **Immediate Interventions**\\n - **Fluid Resuscitation**: Administer intravenous (IV) fluids (crystalloids) promptly, typically 30 mL/kg within the first 3 hours.\\n - **Antibiotic Therapy**: Start broad-spectrum IV antibiotics within 1 hour of recognition of sepsis. Adjust based on culture results and sensitivity.\\n - **Source Control**: Identify and control the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n### 4. **Monitoring and Support**\\n - **Hemodynamic Monitoring**: Use invasive monitoring (e.g., arterial line, central venous pressure) if necessary to guide fluid resuscitation and vasopressor therapy.\\n - **Vasopressors**: If hypotension persists despite adequate fluid resuscitation, initiate vasopressors (e.g., norepinephrine) to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n - **Oxygenation and Ventilation**: Provide supplemental oxygen and consider mechanical ventilation if respiratory failure occurs.\\n\\n### 5. 
**Ongoing Management**\\n - **Reassess Fluid Status**: Continuously evaluate the patient's response to fluids and adjust as necessary.\\n - **Monitor Laboratory Values**: Regularly check lactate levels, complete blood counts, and organ function tests to assess the patient's status.\\n - **Nutritional Support**: Initiate enteral nutrition as soon as feasible, typically within 24\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "# Create the DataFrame\n", + "result_df = pd.DataFrame({\n", + "    \"questions\": [question_1, question_2, question_3, question_4],\n", + "    \"base_prompt_responses\": [base_prompt_response_1, base_prompt_response_2, base_prompt_response_3, base_prompt_response_4]})\n", + "\n", + "# Display the DataFrame\n", + "result_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KQcOiXwSybZy" + }, + "source": [ + "**Observations:**\n", + "\n", + "1. The base LLM responses are clinically reasonable across critical care, surgery, dermatology, and neurology but lack citations, making them unsuitable for medical decision‑making without RAG.\n", + "\n", + "2. The generated answers are concise and structured but omit protocol depth such as dosage, contraindications, and clinical caveats typically found in medical manuals.\n", + "\n", + "3. The outputs demonstrate general medical knowledge rather than manual‑grounded evidence, confirming the need for retrieval from trusted clinical guidelines." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g5myZ5dOOefc" + }, + "source": [ + "## Question Answering using LLM with Prompt Engineering" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dHbFv8hO7Rjv" + }, + "source": [ + "In the next step, we will use prompt engineering to check the effect of a more detailed and well-engineered prompt on the output of the model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 122, + "metadata": { + "id": "VMZqTudYBCWv" + }, + "outputs": [], + "source": [ + "system_prompt = \"\"\"\n", + "You are a helpful medical research assistant. Provide concise and accurate answers based on medical knowledge.\n", + "\"\"\"  # system prompt" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X69OhTHAX9xO" + }, + "source": [ + "### Defining the Function to Generate a Response from the LLM" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "metadata": { + "id": "x5Wi_VwNkipi" + }, + "outputs": [], + "source": [ + "# Define a function to get a response from the OpenAI chat model\n", + "def response(system_prompt, user_prompt, max_tokens=512, temperature=0, top_p=0.95):  # set default parameters\n", + "    # Create a chat completion using the OpenAI client\n", + "    completion = client.chat.completions.create(\n", + "        model=\"gpt-4o-mini\",  # Model used for generation\n", + "        messages=[\n", + "            {\"role\": \"system\", \"content\": system_prompt},  # System prompt sets the assistant's behavior\n", + "            {\"role\": \"user\", \"content\": user_prompt}  # User prompt is the input/query to respond to\n", + "        ],\n", + "        max_tokens=max_tokens,  # Max number of tokens to generate in the response\n", + "        temperature=temperature,  # Controls randomness in output (0 = least random)\n", + "        top_p=top_p  # Controls diversity via nucleus sampling\n", + "    )\n", + "    return completion.choices[0].message.content  # Return the text content from the model's reply" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Jg3r_LWOeff" + }, + "source": [ + "### Question 1: What is the protocol for managing sepsis in a critical care unit?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 124, + "metadata": { + "id": "O5zh3HQoOeff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "6f1b0eef-dfaa-4bb0-e590-83b53e3bab22" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'The management of sepsis in a critical care unit typically follows the Surviving Sepsis Campaign guidelines. Here’s a concise protocol:\\n\\n1. **Early Recognition**: Identify sepsis using clinical criteria (e.g., suspected infection plus organ dysfunction).\\n\\n2. **Immediate Resuscitation**:\\n - **Fluid Resuscitation**: Administer intravenous fluids (30 mL/kg of crystalloids within the first 3 hours).\\n - **Vasopressors**: If hypotension persists after fluid resuscitation, initiate norepinephrine to maintain mean arterial pressure (MAP) ≥ 65 mmHg.\\n\\n3. **Antibiotic Therapy**:\\n - Administer broad-spectrum antibiotics within 1 hour of sepsis recognition. Adjust based on culture results and local antibiograms.\\n\\n4. **Source Control**:\\n - Identify and manage the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n5. **Monitoring**:\\n - Continuously monitor vital signs, urine output, and laboratory parameters (e.g., lactate levels, complete blood count, renal function).\\n\\n6. **Supportive Care**:\\n - Provide supportive care, including oxygen therapy, mechanical ventilation if needed, and renal replacement therapy for acute kidney injury.\\n\\n7. **Reassessment**:\\n - Reassess hemodynamic status and organ function frequently, adjusting treatment as necessary.\\n\\n8. **Consideration of Corticosteroids**:\\n - In cases of septic shock, consider low-dose corticosteroids (e.g., hydrocortisone) if there is no response to fluid resuscitation and vasopressors.\\n\\n9. **Glucose Control**:\\n - Maintain blood glucose levels between 140-180 mg/dL.\\n\\n10. 
**Communication and Team Approach**:\\n - Ensure effective communication among the healthcare team and involve specialists as needed.\\n\\nThis protocol should be tailored to individual patient needs and institutional protocols. Regular training and updates on sepsis management are essential for critical care staff.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 124 + } + ], + "source": [ + "response_with_prompt_eng_1=response(system_prompt,question_1)\n", + "response_with_prompt_eng_1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iYpyw4HjOeff" + }, + "source": [ + "### Question 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "metadata": { + "id": "WPPpDM6cOeff", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "e152969a-5fc6-4f54-cffc-108a9fccf818" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"Common symptoms of appendicitis include:\\n\\n1. Abdominal pain, often starting near the belly button and then moving to the lower right abdomen.\\n2. Loss of appetite.\\n3. Nausea and vomiting.\\n4. Fever.\\n5. Constipation or diarrhea.\\n6. Abdominal swelling.\\n\\nAppendicitis cannot be effectively treated with medication alone; it typically requires surgical intervention. The standard surgical procedure for treating appendicitis is an appendectomy, which involves the removal of the inflamed appendix. 
This can be performed as an open surgery or laparoscopically, depending on the case and the surgeon's preference.\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 125 + } + ], + "source": [ + "response_with_prompt_eng_2=response(system_prompt,question_2)\n", + "response_with_prompt_eng_2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dRp92JQZOeff" + }, + "source": [ + "### Question 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "metadata": { + "id": "sC6rrtblOefg", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "c3eb7703-3724-414e-c717-e4753b0fc0f7" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Sudden patchy hair loss, often referred to as alopecia areata, can manifest as localized bald spots on the scalp. Here are effective treatments and potential causes:\\n\\n### Treatments:\\n1. **Corticosteroids**: Topical or intralesional corticosteroids can reduce inflammation and promote hair regrowth.\\n2. **Minoxidil (Rogaine)**: Over-the-counter topical solution that may stimulate hair growth.\\n3. **Immunotherapy**: Treatments like diphencyprone (DPCP) can provoke an allergic reaction to stimulate hair regrowth.\\n4. **Anthralin**: A topical medication that can help in some cases by irritating the skin.\\n5. **JAK Inhibitors**: Oral medications like tofacitinib and ruxolitinib have shown promise in clinical trials for alopecia areata.\\n6. **Light Therapy**: Phototherapy can be beneficial for some patients.\\n7. **Supportive Care**: Counseling and support groups can help manage the psychological impact of hair loss.\\n\\n### Possible Causes:\\n1. 
**Autoimmune Response**: The immune system mistakenly attacks hair follicles.\\n2. **Genetics**: Family history of alopecia or other autoimmune diseases may increase risk.\\n3. **Stress**: Physical or emotional stress can trigger hair loss.\\n4. **Hormonal Changes**: Changes in hormones, such as those during pregnancy or menopause, can contribute.\\n5. **Nutritional Deficiencies**: Lack of certain nutrients (e.g., iron, vitamin D) may play a role.\\n6. **Infections**: Fungal infections like tinea capitis can cause patchy hair loss.\\n\\nIf experiencing sudden hair loss, it is advisable to consult a healthcare professional or dermatologist for an accurate diagnosis and tailored treatment plan.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 126 + } + ], + "source": [ + "response_with_prompt_eng_3=response(system_prompt,question_3)\n", + "response_with_prompt_eng_3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AA45zwyUOefg" + }, + "source": [ + "### Question 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": { + "id": "Ue8Lk8uXOefg", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "a963bbff-e927-40df-cf52-fe174ba83fd8" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), typically involves a multidisciplinary approach and may include the following:\\n\\n1. **Emergency Care**: Immediate medical attention may involve stabilizing the patient, ensuring adequate oxygenation, and managing intracranial pressure.\\n\\n2. 
**Surgery**: In some cases, surgical intervention may be necessary to remove hematomas, repair skull fractures, or relieve pressure on the brain.\\n\\n3. **Medications**: \\n - **Analgesics** for pain management.\\n - **Anticonvulsants** to prevent seizures.\\n - **Diuretics** to reduce swelling.\\n - **Corticosteroids** may be used to decrease inflammation.\\n\\n4. **Rehabilitation**: \\n - **Physical therapy** to improve mobility and strength.\\n - **Occupational therapy** to assist with daily living activities.\\n - **Speech therapy** for communication and swallowing difficulties.\\n - **Neuropsychological therapy** to address cognitive and emotional challenges.\\n\\n5. **Supportive Care**: This may include counseling, support groups, and education for patients and families about the injury and recovery process.\\n\\n6. **Long-term Management**: Ongoing assessment and management of cognitive, emotional, and physical impairments may be necessary, including regular follow-ups with healthcare providers.\\n\\nThe specific treatment plan will depend on the severity of the injury, the areas of the brain affected, and the individual needs of the patient.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 127 + } + ], + "source": [ + "response_with_prompt_eng_4=response(system_prompt,question_4)\n", + "response_with_prompt_eng_4" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QgSCW-EIDBlA" + }, + "source": [ + "### Storing the generated outputs from the structured prompts" + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "metadata": { + "id": "N1hn6-lxDKy-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 175 + }, + "outputId": "ce4b9ed8-46da-405f-afbf-5583f64f58ed" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " questions \\\n", + "0 What is the protocol for managing sepsis in a ... 
\n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... \n", + "3 What treatments are recommended for a person w... \n", + "\n", + " base_prompt_responses \\\n", + "0 Managing sepsis in a critical care unit involv... \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... \n", + "2 Sudden patchy hair loss, often referred to as ... \n", + "3 The treatment for a person who has sustained a... \n", + "\n", + " responses_with_prompt_eng \\\n", + "0 The management of sepsis in a critical care un... \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... \n", + "2 Sudden patchy hair loss, often referred to as ... \n", + "3 Treatment for a person who has sustained a phy... \n", + "\n", + " responses_with_RAG \n", + "0 Answer:\\nThe protocol for managing sepsis in a... \n", + "1 Answer:\\nThe common symptoms of appendicitis i... \n", + "2 Answer:\\nThe effective treatment for sudden pa... \n", + "3 Answer:\\nInitial treatment for a person who ha... " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
questionsbase_prompt_responsesresponses_with_prompt_engresponses_with_RAG
0What is the protocol for managing sepsis in a ...Managing sepsis in a critical care unit involv...The management of sepsis in a critical care un...Answer:\\nThe protocol for managing sepsis in a...
1What are the common symptoms for appendicitis,...Common symptoms of appendicitis include:\\n\\n1....Common symptoms of appendicitis include:\\n\\n1....Answer:\\nThe common symptoms of appendicitis i...
2What are the effective treatments or solutions...Sudden patchy hair loss, often referred to as ...Sudden patchy hair loss, often referred to as ...Answer:\\nThe effective treatment for sudden pa...
3What treatments are recommended for a person w...The treatment for a person who has sustained a...Treatment for a person who has sustained a phy...Answer:\\nInitial treatment for a person who ha...
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "result_df", + "summary": "{\n \"name\": \"result_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"questions\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"base_prompt_responses\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. **Abdominal Pain**: Typically starts around the navel and then moves to the lower right abdomen.\\n2. **Loss of Appetite**: A sudden decrease in appetite is common.\\n3. **Nausea and Vomiting**: Often follows the onset of abdominal pain.\\n4. **Fever**: A low-grade fever may develop.\\n5. **Constipation or Diarrhea**: Changes in bowel habits can occur.\\n6. **Abdominal Swelling**: In some cases, the abdomen may become swollen.\\n\\nAppendicitis cannot be effectively treated with medication alone. The standard treatment is surgical removal of the appendix, known as an **appendectomy**. This can be performed using two main techniques:\\n\\n1. **Open Appendectomy**: A larger incision is made in the lower right abdomen to remove the appendix.\\n2. 
**Laparoscopic Appendectomy**: This is a minimally invasive procedure where several small incisions are made, and the appendix is removed with the aid of a camera and special instruments.\\n\\nLaparoscopic appendectomy is often preferred due to its benefits, including less postoperative pain, shorter recovery time, and minimal scarring. However, the choice of procedure may depend on the patient's specific situation and the surgeon's expertise.\",\n \"The treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), can vary widely depending on the severity of the injury, the specific areas of the brain affected, and the resulting impairments. Here are some common approaches to treatment:\\n\\n1. **Emergency Care**: \\n - Immediate medical attention is crucial. This may involve stabilizing the patient, monitoring vital signs, and performing imaging studies (like CT or MRI scans) to assess the extent of the injury.\\n\\n2. **Surgical Interventions**: \\n - In some cases, surgery may be necessary to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medication**: \\n - Medications may be prescribed to manage symptoms such as pain, seizures, or inflammation. Corticosteroids may be used to reduce swelling in the brain.\\n\\n4. **Rehabilitation**: \\n - **Physical Therapy**: To improve mobility and strength.\\n - **Occupational Therapy**: To help with daily living skills and regain independence.\\n - **Speech and Language Therapy**: To address communication difficulties and swallowing issues.\\n - **Neuropsychological Therapy**: To help with cognitive rehabilitation, including memory, attention, and problem-solving skills.\\n\\n5. **Psychological Support**: \\n - Counseling or therapy may be beneficial for coping with emotional and psychological challenges following a brain injury, such as depression, anxiety, or changes in personality.\\n\\n6. 
**Lifestyle Modifications**: \\n - Patients may need to make adjustments to their daily routines, including rest, nutrition, and avoiding activities that could lead to further injury.\\n\\n7. **Supportive Care**: \\n - Family support and education about the injury and its effects can be crucial for recovery. Support groups may also be helpful.\\n\\n8. **Long-term Management**: \\n - Ongoing follow-up with healthcare providers to monitor recovery and manage any long-term effects or complications.\\n\\n9. **Assistive Devices**: \\n - Depending on the nature of the impairment, assistive devices or technology may be recommended to aid in communication, mobility, or daily activities.\\n\\n10. **Alternative Therapies**: \\n - Some individuals may explore complementary therapies such as acupuncture, yoga, or meditation, although these should be discussed with a healthcare provider.\\n\\nIt's important for treatment plans to be individualized, taking into account the specific needs and circumstances of the person affected. A multidisciplinary team approach is often the most\",\n \"Managing sepsis in a critical care unit involves a systematic approach that includes early recognition, prompt intervention, and ongoing monitoring. The following is a general protocol based on current guidelines, such as those from the Surviving Sepsis Campaign:\\n\\n### 1. **Early Recognition**\\n - **Identify Symptoms**: Look for signs of infection (fever, chills, tachycardia, tachypnea) and organ dysfunction (altered mental status, hypotension, oliguria).\\n - **Use Screening Tools**: Utilize tools like the qSOFA (quick Sequential Organ Failure Assessment) or SIRS (Systemic Inflammatory Response Syndrome) criteria to identify patients at risk.\\n\\n### 2. 
**Initial Assessment**\\n - **Obtain Vital Signs**: Monitor blood pressure, heart rate, respiratory rate, and temperature.\\n - **Assess Organ Function**: Evaluate renal function (urine output, creatinine), liver function (bilirubin, liver enzymes), and coagulation status (platelets, INR).\\n\\n### 3. **Immediate Interventions**\\n - **Fluid Resuscitation**: Administer intravenous (IV) fluids (crystalloids) promptly, typically 30 mL/kg within the first 3 hours.\\n - **Antibiotic Therapy**: Start broad-spectrum IV antibiotics within 1 hour of recognition of sepsis. Adjust based on culture results and sensitivity.\\n - **Source Control**: Identify and control the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n### 4. **Monitoring and Support**\\n - **Hemodynamic Monitoring**: Use invasive monitoring (e.g., arterial line, central venous pressure) if necessary to guide fluid resuscitation and vasopressor therapy.\\n - **Vasopressors**: If hypotension persists despite adequate fluid resuscitation, initiate vasopressors (e.g., norepinephrine) to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n - **Oxygenation and Ventilation**: Provide supplemental oxygen and consider mechanical ventilation if respiratory failure occurs.\\n\\n### 5. **Ongoing Management**\\n - **Reassess Fluid Status**: Continuously evaluate the patient's response to fluids and adjust as necessary.\\n - **Monitor Laboratory Values**: Regularly check lactate levels, complete blood counts, and organ function tests to assess the patient's status.\\n - **Nutritional Support**: Initiate enteral nutrition as soon as feasible, typically within 24\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"responses_with_prompt_eng\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. 
Abdominal pain, often starting near the belly button and then moving to the lower right abdomen.\\n2. Loss of appetite.\\n3. Nausea and vomiting.\\n4. Fever.\\n5. Constipation or diarrhea.\\n6. Abdominal swelling.\\n\\nAppendicitis cannot be effectively treated with medication alone; it typically requires surgical intervention. The standard surgical procedure for treating appendicitis is an appendectomy, which involves the removal of the inflamed appendix. This can be performed as an open surgery or laparoscopically, depending on the case and the surgeon's preference.\",\n \"Treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), typically involves a multidisciplinary approach and may include the following:\\n\\n1. **Emergency Care**: Immediate medical attention may involve stabilizing the patient, ensuring adequate oxygenation, and managing intracranial pressure.\\n\\n2. **Surgery**: In some cases, surgical intervention may be necessary to remove hematomas, repair skull fractures, or relieve pressure on the brain.\\n\\n3. **Medications**: \\n - **Analgesics** for pain management.\\n - **Anticonvulsants** to prevent seizures.\\n - **Diuretics** to reduce swelling.\\n - **Corticosteroids** may be used to decrease inflammation.\\n\\n4. **Rehabilitation**: \\n - **Physical therapy** to improve mobility and strength.\\n - **Occupational therapy** to assist with daily living activities.\\n - **Speech therapy** for communication and swallowing difficulties.\\n - **Neuropsychological therapy** to address cognitive and emotional challenges.\\n\\n5. **Supportive Care**: This may include counseling, support groups, and education for patients and families about the injury and recovery process.\\n\\n6. 
**Long-term Management**: Ongoing assessment and management of cognitive, emotional, and physical impairments may be necessary, including regular follow-ups with healthcare providers.\\n\\nThe specific treatment plan will depend on the severity of the injury, the areas of the brain affected, and the individual needs of the patient.\",\n \"The management of sepsis in a critical care unit typically follows the Surviving Sepsis Campaign guidelines. Here\\u2019s a concise protocol:\\n\\n1. **Early Recognition**: Identify sepsis using clinical criteria (e.g., suspected infection plus organ dysfunction).\\n\\n2. **Immediate Resuscitation**:\\n - **Fluid Resuscitation**: Administer intravenous fluids (30 mL/kg of crystalloids within the first 3 hours).\\n - **Vasopressors**: If hypotension persists after fluid resuscitation, initiate norepinephrine to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n\\n3. **Antibiotic Therapy**:\\n - Administer broad-spectrum antibiotics within 1 hour of sepsis recognition. Adjust based on culture results and local antibiograms.\\n\\n4. **Source Control**:\\n - Identify and manage the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n5. **Monitoring**:\\n - Continuously monitor vital signs, urine output, and laboratory parameters (e.g., lactate levels, complete blood count, renal function).\\n\\n6. **Supportive Care**:\\n - Provide supportive care, including oxygen therapy, mechanical ventilation if needed, and renal replacement therapy for acute kidney injury.\\n\\n7. **Reassessment**:\\n - Reassess hemodynamic status and organ function frequently, adjusting treatment as necessary.\\n\\n8. **Consideration of Corticosteroids**:\\n - In cases of septic shock, consider low-dose corticosteroids (e.g., hydrocortisone) if there is no response to fluid resuscitation and vasopressors.\\n\\n9. **Glucose Control**:\\n - Maintain blood glucose levels between 140-180 mg/dL.\\n\\n10. 
**Communication and Team Approach**:\\n - Ensure effective communication among the healthcare team and involve specialists as needed.\\n\\nThis protocol should be tailored to individual patient needs and institutional protocols. Regular training and updates on sepsis management are essential for critical care staff.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"responses_with_RAG\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Answer:\\nThe common symptoms of appendicitis include epigastric or periumbilical pain followed by nausea, vomiting, and anorexia, with pain shifting to the right lower quadrant. Classic signs include right lower quadrant tenderness at McBurney's point, Rovsing sign, psoas sign, and obturator sign. Appendicitis cannot be cured via medicine; the treatment is surgical removal, specifically an open or laparoscopic appendectomy.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 11. Acute Abdomen & Surgical Gastroenterology, pages 163.\",\n \"Answer:\\nInitial treatment for a person who has sustained a physical injury to brain tissue includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for severe injuries to monitor and treat intracranial pressure, decompress the brain, or remove hematomas. Subsequently, many patients require rehabilitation, which should be planned early and may involve a team approach including physical, occupational, and speech therapy, as well as cognitive therapy for those with severe cognitive dysfunction.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 324. Traumatic Brain Injury, and Chapter 350. Rehabilitation.\",\n \"Answer:\\nThe protocol for managing sepsis in a critical care unit includes the following steps: \\n1. 
Obtain specimens of blood, body fluids, and wound sites for Gram stain and culture before starting parenteral antibiotics.\\n2. Initiate very prompt empiric antibiotic therapy immediately after suspecting sepsis, which may include gentamicin or tobramycin plus a 3rd-generation cephalosporin (e.g., cefotaxime or ceftriaxone), or ceftazidime plus a fluoroquinolone if Pseudomonas is suspected. Vancomycin should be added if resistant staphylococci or enterococci are suspected, and if there is an abdominal source, include a drug effective against anaerobes (e.g., metronidazole).\\n3. Change the antibiotic regimen based on culture and sensitivity results when available, continuing antibiotics for at least 5 days after shock resolves and evidence of infection subsides.\\n4. Drain abscesses and surgically excise necrotic tissues as necessary.\\n5. Monitor and manage blood glucose levels with a continuous IV insulin infusion to maintain glucose between 80 to 110 mg/dL.\\n6. Provide supportive care, including adequate nutrition and prevention of infections and complications.\\n\\nSource:\\nCritical Care Medicine, Chapter 222. 
Approach to the Critically Ill Patient.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 128 + } + ], + "source": [ + "# Add the prompt-engineered responses as a new column\n", + "result_df['responses_with_prompt_eng'] = [response_with_prompt_eng_1, response_with_prompt_eng_2, response_with_prompt_eng_3, response_with_prompt_eng_4]\n", + "\n", + "# Display the DataFrame\n", + "result_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kStw6hGgyhzW" + }, + "source": [ + "**Observations**:\n", + "\n", + "1. The base LLM responses are clinically reasonable across critical care, surgery, dermatology, and neurology but lack citations, making them unsuitable for medical decision‑making without RAG.\n", + "\n", + "2. The generated answers are concise and structured but omit protocol depth such as dosage, contraindications, and clinical caveats typically found in medical manuals.\n", + "\n", + "3. The outputs demonstrate general medical knowledge rather than manual‑grounded evidence, confirming the need for retrieval from trusted clinical guidelines.\n", + "\n", + "4. The system consistently returns answers for all four use‑case questions, validating the basic inference pipeline before adding context‑grounding.\n", + "\n", + "5. The notebook setup correctly loads configs and models, but embedding/RAG steps are not yet grounded in medical source documents.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "t_O1PGdNO2M9" + }, + "source": [ + "## Data Preparation for RAG" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uTpWESc53dL9" + }, + "source": [ + "### Loading the Data" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "ksv9hSCR4BM_" + }, + "outputs": [], + "source": [ + "manual_pdf_path = \"/content/medical_diagnosis_manual.pdf\"  # Path to the medical manual PDF" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": 
{ + "id": "jhf34I1eYNtR" + }, + "outputs": [], + "source": [ + "pdf_loader = PyMuPDFLoader(manual_pdf_path)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "id": "YChLS31TxC3-" + }, + "outputs": [], + "source": [ + "manual = pdf_loader.load()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ffj0ca3eZT4u" + }, + "source": [ + "### Data Overview" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f9weTDzMxRRS" + }, + "source": [ + "#### Checking the first 5 pages" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "id": "JSOv3q2pxX4z", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "dec9cda4-d282-4e27-d978-e94c152b4028" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Page Number : 1\n", + "biplobsinha25@gmail.com\n", + "9X5AUD3EIR\n", + "ant for personal use by biplobsinha25@g\n", + "shing the contents in part or full is liable\n", + "Page Number : 2\n", + "biplobsinha25@gmail.com\n", + "9X5AUD3EIR\n", + "This file is meant for personal use by biplobsinha25@gmail.com only.\n", + "Sharing or publishing the contents in part or full is liable for legal action.\n", + "Page Number : 3\n", + "Table of Contents\n", + "1\n", + "Front ................................................................................................................................................................................................................\n", + "1\n", + "Cover .......................................................................................................................................................................................................\n", + "2\n", + "Front Matter ...........................................................................................................................................................................................\n", + "53\n", + "1 - Nutritional Disorders 
...............................................................................................................................................................\n", + "53\n", + "Chapter 1. Nutrition: General Considerations .....................................................................................................................\n", + "59\n", + "Chapter 2. Undernutrition .............................................................................................................................................................\n", + "69\n", + "Chapter 3. Nutritional Support ...................................................................................................................................................\n", + "76\n", + "Chapter 4. Vitamin Deficiency, Dependency & Toxicity ..................................................................................................\n", + "99\n", + "Chapter 5. Mineral Deficiency & Toxicity ..............................................................................................................................\n", + "108\n", + "Chapter 6. Obesity & the Metabolic Syndrome ...............................................................................................................\n", + "120\n", + "2 - Gastrointestinal Disorders ..............................................................................................................................................\n", + "120\n", + "Chapter 7. Approach to the Patient With Upper GI Complaints ...............................................................................\n", + "132\n", + "Chapter 8. Approach to the Patient With Lower GI Complaints ...............................................................................\n", + "143\n", + "Chapter 9. Diagnostic & Therapeutic GI Procedures ....................................................................................................\n", + "150\n", + "Chapter 10. 
GI Bleeding ............................................................................................................................................................\n", + "158\n", + "Chapter 11. Acute Abdomen & Surgical Gastroenterology .........................................................................................\n", + "172\n", + "Chapter 12. Esophageal & Swallowing Disorders ..........................................................................................................\n", + "183\n", + "Chapter 13. Gastritis & Peptic Ulcer Disease ..................................................................................................................\n", + "196\n", + "Chapter 14. Bezoars & Foreign Bodies ..............................................................................................................................\n", + "199\n", + "Chapter 15. Pancreatitis ............................................................................................................................................................\n", + "206\n", + "Chapter 16. Gastroenteritis ......................................................................................................................................................\n", + "213\n", + "Chapter 17. Malabsorption Syndromes ..............................................................................................................................\n", + "225\n", + "Chapter 18. Irritable Bowel Syndrome ................................................................................................................................\n", + "229\n", + "Chapter 19. Inflammatory Bowel Disease .........................................................................................................................\n", + "241\n", + "Chapter 20. 
Diverticular Disease ...........................................................................................................................................\n", + "246\n", + "Chapter 21. Anorectal Disorders ............................................................................................................................................\n", + "254\n", + "Chapter 22. Tumors of the GI Tract ......................................................................................................................................\n", + "275\n", + "3 - Hepatic & Biliary Disorders ............................................................................................................................................\n", + "275\n", + "Chapter 23. Approach to the Patient With Liver Disease ...........................................................................................\n", + "294\n", + "Chapter 24. Testing for Hepatic & Biliary Disorders ......................................................................................................\n", + "305\n", + "Chapter 25. Drugs & the Liver ................................................................................................................................................\n", + "308\n", + "Chapter 26. Alcoholic Liver Disease ....................................................................................................................................\n", + "314\n", + "Chapter 27. Fibrosis & Cirrhosis ............................................................................................................................................\n", + "322\n", + "Chapter 28. Hepatitis ..................................................................................................................................................................\n", + "333\n", + "Chapter 29. 
Vascular Disorders of the Liver .....................................................................................................................\n", + "341\n", + "Chapter 30. Liver Masses & Granulomas ..........................................................................................................................\n", + "348\n", + "Chapter 31. Gallbladder & Bile Duct Disorders ...............................................................................................................\n", + "362\n", + "4 - Musculoskeletal & Connective Tissue Disorders .........................................................................................\n", + "362\n", + "Chapter 32. Approach to the Patient With Joint Disease ............................................................................................\n", + "373\n", + "Chapter 33. Autoimmune Rheumatic Disorders ..............................................................................................................\n", + "391\n", + "Chapter 34. Vasculitis .................................................................................................................................................................\n", + "416\n", + "Chapter 35. Joint Disorders .....................................................................................................................................................\n", + "435\n", + "Chapter 36. Crystal-Induced Arthritides ..............................................................................................................................\n", + "443\n", + "Chapter 37. Osteoporosis .........................................................................................................................................................\n", + "448\n", + "Chapter 38. Paget's Disease of Bone ..................................................................................................................................\n", + "451\n", + "Chapter 39. 
Osteonecrosis .......................................................................................................................................................\n", + "455\n", + "Chapter 40. Infections of Joints & Bones ...........................................................................................................................\n", + "463\n", + "Chapter 41. Bursa, Muscle & Tendon Disorders .............................................................................................................\n", + "470\n", + "Chapter 42. Neck & Back Pain ...............................................................................................................................................\n", + "481\n", + "Chapter 43. Hand Disorders ....................................................................................................................................................\n", + "biplobsinha25@gmail.com\n", + "9X5AUD3EIR\n", + "This file is meant for personal use by biplobsinha25@gmail.com only.\n", + "Sharing or publishing the contents in part or full is liable for legal action.\n", + "Page Number : 4\n", + "491\n", + "Chapter 44. Foot & Ankle Disorders .....................................................................................................................................\n", + "502\n", + "Chapter 45. Tumors of Bones & Joints ...............................................................................................................................\n", + "510\n", + "5 - Ear, Nose, Throat & Dental Disorders ..................................................................................................................\n", + "510\n", + "Chapter 46. Approach to the Patient With Ear Problems ...........................................................................................\n", + "523\n", + "Chapter 47. 
Hearing Loss .........................................................................................................................................................\n", + "535\n", + "Chapter 48. Inner Ear Disorders ............................................................................................................................................\n", + "542\n", + "Chapter 49. Middle Ear & Tympanic Membrane Disorders ........................................................................................\n", + "550\n", + "Chapter 50. External Ear Disorders .....................................................................................................................................\n", + "554\n", + "Chapter 51. Approach to the Patient With Nasal & Pharyngeal Symptoms .......................................................\n", + "567\n", + "Chapter 52. Oral & Pharyngeal Disorders .........................................................................................................................\n", + "578\n", + "Chapter 53. Nose & Paranasal Sinus Disorders .............................................................................................................\n", + "584\n", + "Chapter 54. Laryngeal Disorders ...........................................................................................................................................\n", + "590\n", + "Chapter 55. Tumors of the Head & Neck ...........................................................................................................................\n", + "600\n", + "Chapter 56. Approach to Dental & Oral Symptoms .......................................................................................................\n", + "619\n", + "Chapter 57. Common Dental Disorders .............................................................................................................................\n", + "629\n", + "Chapter 58. 
Dental Emergencies ..........................................................................................................................................\n", + "635\n", + "Chapter 59. Temporomandibular Disorders ......................................................................................................................\n", + "641\n", + "6 - Eye Disorders ............................................................................................................................................................................\n", + "641\n", + "Chapter 60. Approach to the Ophthalmologic Patient ..................................................................................................\n", + "669\n", + "Chapter 61. Refractive Error ...................................................................................................................................................\n", + "674\n", + "Chapter 62. Eyelid & Lacrimal Disorders ...........................................................................................................................\n", + "680\n", + "Chapter 63. Conjunctival & Scleral Disorders .................................................................................................................\n", + "690\n", + "Chapter 64. Corneal Disorders ...............................................................................................................................................\n", + "703\n", + "Chapter 65. Glaucoma ...............................................................................................................................................................\n", + "710\n", + "Chapter 66. Cataract ...................................................................................................................................................................\n", + "713\n", + "Chapter 67. 
Uveitis ......................................................................................................................................................................\n", + "719\n", + "Chapter 68. Retinal Disorders .................................................................................................................................................\n", + "731\n", + "Chapter 69. Optic Nerve Disorders ......................................................................................................................................\n", + "737\n", + "Chapter 70. Orbital Diseases ..................................................................................................................................................\n", + "742\n", + "7 - Dermatologic Disorders ....................................................................................................................................................\n", + "742\n", + "Chapter 71. Approach to the Dermatologic Patient .......................................................................................................\n", + "755\n", + "Chapter 72. Principles of Topical Dermatologic Therapy ............................................................................................\n", + "760\n", + "Chapter 73. Acne & Related Disorders ...............................................................................................................................\n", + "766\n", + "Chapter 74. Bullous Diseases .................................................................................................................................................\n", + "771\n", + "Chapter 75. Cornification Disorders .....................................................................................................................................\n", + "775\n", + "Chapter 76. 
Dermatitis ...............................................................................................................................................................\n", + "786\n", + "Chapter 77. Reactions to Sunlight ........................................................................................................................................\n", + "791\n", + "Chapter 78. Psoriasis & Scaling Diseases ........................................................................................................................\n", + "799\n", + "Chapter 79. Hypersensitivity & Inflammatory Disorders .............................................................................................\n", + "808\n", + "Chapter 80. Sweating Disorders ............................................................................................................................................\n", + "811\n", + "Chapter 81. Bacterial Skin Infections ...................................................................................................................................\n", + "822\n", + "Chapter 82. Fungal Skin Infections ......................................................................................................................................\n", + "831\n", + "Chapter 83. Parasitic Skin Infections ...................................................................................................................................\n", + "836\n", + "Chapter 84. Viral Skin Diseases ............................................................................................................................................\n", + "841\n", + "Chapter 85. Pigmentation Disorders ....................................................................................................................................\n", + "846\n", + "Chapter 86. 
Hair Disorders .......................................................................................................................................................\n", + "855\n", + "Chapter 87. Nail Disorders .......................................................................................................................................................\n", + "861\n", + "Chapter 88. Pressure Ulcers ...................................................................................................................................................\n", + "867\n", + "Chapter 89. Benign Tumors .....................................................................................................................................................\n", + "874\n", + "Chapter 90. Cancers of the Skin ............................................................................................................................................\n", + "882\n", + "8 - Endocrine & Metabolic Disorders .............................................................................................................................\n", + "882\n", + "Chapter 91. Principles of Endocrinology ............................................................................................................................\n", + "887\n", + "Chapter 92. Pituitary Disorders ..............................................................................................................................................\n", + "901\n", + "Chapter 93. Thyroid Disorders ................................................................................................................................................\n", + "biplobsinha25@gmail.com\n", + "9X5AUD3EIR\n", + "This file is meant for personal use by biplobsinha25@gmail.com only.\n", + "Sharing or publishing the contents in part or full is liable for legal action.\n", + "Page Number : 5\n", + "921\n", + "Chapter 94. 
Adrenal Disorders ................................................................................................................................................\n", + "936\n", + "Chapter 95. Polyglandular Deficiency Syndromes ........................................................................................................\n", + "939\n", + "Chapter 96. Porphyrias ..............................................................................................................................................................\n", + "949\n", + "Chapter 97. Fluid & Electrolyte Metabolism .....................................................................................................................\n", + "987\n", + "Chapter 98. Acid-Base Regulation & Disorders ..............................................................................................................\n", + "1001\n", + "Chapter 99. Diabetes Mellitus & Disorders of Carbohydrate Metabolism ........................................................\n", + "1024\n", + "Chapter 100. Lipid Disorders ................................................................................................................................................\n", + "1034\n", + "Chapter 101. Amyloidosis ......................................................................................................................................................\n", + "1037\n", + "Chapter 102. Carcinoid Tumors ..........................................................................................................................................\n", + "1040\n", + "Chapter 103. Multiple Endocrine Neoplasia Syndromes .........................................................................................\n", + "1046\n", + "9 - Hematology & Oncology ...............................................................................................................................................\n", + "1046\n", + "Chapter 104. 
Approach to the Patient With Anemia ..................................................................................................\n", + "1050\n", + "Chapter 105. Anemias Caused by Deficient Erythropoiesis ...................................................................................\n", + "1061\n", + "Chapter 106. Anemias Caused by Hemolysis ...............................................................................................................\n", + "1078\n", + "Chapter 107. Neutropenia & Lymphocytopenia ...........................................................................................................\n", + "1086\n", + "Chapter 108. Thrombocytopenia & Platelet Dysfunction .........................................................................................\n", + "1097\n", + "Chapter 109. Hemostasis ......................................................................................................................................................\n", + "1104\n", + "Chapter 110. Thrombotic Disorders ...................................................................................................................................\n", + "1107\n", + "Chapter 111. Coagulation Disorders ..................................................................................................................................\n", + "1113\n", + "Chapter 112. Bleeding Due to Abnormal Blood Vessels ...........................................................................................\n", + "1116\n", + "Chapter 113. Spleen Disorders ............................................................................................................................................\n", + "1120\n", + "Chapter 114. Eosinophilic Disorders .................................................................................................................................\n", + "1126\n", + "Chapter 115. 
Histiocytic Syndromes .................................................................................................................................\n", + "1131\n", + "Chapter 116. Myeloproliferative Disorders .....................................................................................................................\n", + "1141\n", + "Chapter 117. Leukemias .........................................................................................................................................................\n", + "1154\n", + "Chapter 118. Lymphomas ......................................................................................................................................................\n", + "1164\n", + "Chapter 119. Plasma Cell Disorders .................................................................................................................................\n", + "1172\n", + "Chapter 120. Iron Overload ...................................................................................................................................................\n", + "1177\n", + "Chapter 121. Transfusion Medicine ...................................................................................................................................\n", + "1186\n", + "Chapter 122. Overview of Cancer ......................................................................................................................................\n", + "1198\n", + "Chapter 123. Tumor Immunology .......................................................................................................................................\n", + "1204\n", + "Chapter 124. 
Principles of Cancer Therapy ...................................................................................................................\n", + "1215\n", + "10 - Immunology; Allergic Disorders ...........................................................................................................................\n", + "1215\n", + "Chapter 125. Biology of the Immune System ...............................................................................................................\n", + "1227\n", + "Chapter 126. Immunodeficiency Disorders ....................................................................................................................\n", + "1243\n", + "Chapter 127. Allergic & Other Hypersensitivity Disorders .......................................................................................\n", + "1263\n", + "Chapter 128. Transplantation ...............................................................................................................................................\n", + "1281\n", + "11 - Infectious Diseases ........................................................................................................................................................\n", + "1281\n", + "Chapter 129. Biology of Infectious Disease ...................................................................................................................\n", + "1300\n", + "Chapter 130. Laboratory Diagnosis of Infectious Disease ......................................................................................\n", + "1306\n", + "Chapter 131. Immunization ...................................................................................................................................................\n", + "1313\n", + "Chapter 132. Bacteria & Antibacterial Drugs .................................................................................................................\n", + "1353\n", + "Chapter 133. 
Gram-Positive Cocci ....................................................................................................................................\n", + "1366\n", + "Chapter 134. Gram-Positive Bacilli ...................................................................................................................................\n", + "1376\n", + "Chapter 135. Gram-Negative Bacilli .................................................................................................................................\n", + "1405\n", + "Chapter 136. Spirochetes ......................................................................................................................................................\n", + "1413\n", + "Chapter 137. Neisseriaceae .................................................................................................................................................\n", + "1419\n", + "Chapter 138. Chlamydia & Mycoplasmas ......................................................................................................................\n", + "1421\n", + "Chapter 139. Rickettsiae & Related Organisms ..........................................................................................................\n", + "1431\n", + "Chapter 140. Anaerobic Bacteria ........................................................................................................................................\n", + "1450\n", + "Chapter 141. Mycobacteria ...................................................................................................................................................\n", + "1470\n", + "Chapter 142. Fungi ...................................................................................................................................................................\n", + "1493\n", + "Chapter 143. 
Approach to Parasitic Infections .............................................................................................................\n", + "1496\n", + "Chapter 144. Nematodes (Roundworms) .......................................................................................................................\n", + "biplobsinha25@gmail.com\n", + "9X5AUD3EIR\n", + "This file is meant for personal use by biplobsinha25@gmail.com only.\n", + "Sharing or publishing the contents in part or full is liable for legal action.\n" + ] + } + ], + "source": [ + "for i in range(5):\n", + " print(f\"Page Number : {i+1}\",end=\"\\n\")\n", + " print(manual[i].page_content,end=\"\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LECMxTH-zB-R" + }, + "source": [ + "### Data Chunking" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oQfw-qErRoGr" + }, + "source": [ + "#### Chunk the PDF into Manageable Text Sections Using a Token-Based Splitter" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "id": "uG0_pBmizGGt" + }, + "outputs": [], + "source": [ + "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n", + " encoding_name='cl100k_base',\n", + " chunk_size=500, # Maximum number of tokens per chunk\n", + " chunk_overlap=50 # Tokens shared between consecutive chunks\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "P7XqisNKR3DZ" + }, + "source": [ + "#### Split the Loaded PDF into Chunks for Further Processing" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "id": "w76ji7ECzLLQ" + }, + "outputs": [], + "source": [ + "document_chunks = pdf_loader.load_and_split(text_splitter)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gkkBb1GmSDTp" + }, + "source": [ + "#### Check the Number of Chunks Created" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "id": "i6TQ-mmLzR9I", + "colab": { 
+ "base_uri": "https://localhost:8080/" + }, + "outputId": "5d1c5923-9386-4e73-f127-c423dd4e72f4" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "8875" + ] + }, + "metadata": {}, + "execution_count": 41 + } + ], + "source": [ + "len(document_chunks)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yKnhARSu0d8u" + }, + "source": [ + "### Generate Vector Embeddings for Text Chunks Using OpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "collapsed": true, + "id": "c6cZVZWQz15c", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d943ba50-86c9-470e-f500-1135c1cfec96" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Dimension of the embedding vector 1536\n" + ] + } + ], + "source": [ + "# Initialize the OpenAI Embeddings model with API credentials\n", + "embedding_model = OpenAIEmbeddings(\n", + " openai_api_key=OPENAI_API_KEY, # Your OpenAI API key for authentication\n", + " openai_api_base=OPENAI_API_BASE # The OpenAI API base URL endpoint\n", + ")\n", + "\n", + "# Generate embeddings (vector representations) for the first two document chunks\n", + "embedding_1 = embedding_model.embed_query(document_chunks[0].page_content) # Embedding for chunk 0\n", + "embedding_2 = embedding_model.embed_query(document_chunks[1].page_content) # Embedding for chunk 1\n", + "\n", + "# Check and print the dimension (length) of the embedding vector\n", + "print(\"Dimension of the embedding vector \", len(embedding_1)) # 1536 here; the length depends on the embedding model" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": true, + "id": "W0qy6xOZ0UBe", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "aebee7db-6b2c-485b-8a47-44cd83a1409a" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "([-0.013844738714396954,\n", + " 
0.015309492126107216,\n", + " 0.008478669449687004,\n", + " -0.020055856555700302,\n", + " -0.012936308979988098,\n", + " 0.016126373782753944,\n", + " -0.027478214353322983,\n", + " -0.03104150854051113,\n", + " -0.020407961681485176,\n", + " -0.02126709558069706,\n", + " 0.04222434014081955,\n", + " 0.019605163484811783,\n", + " -0.00669702235609293,\n", + " 0.019520658999681473,\n", + " 0.0016249610343948007,\n", + " 0.017971400171518326,\n", + " 0.024281106889247894,\n", + " 0.014830630272626877,\n", + " 0.012971519492566586,\n", + " 0.004387218505144119,\n", + " -0.01829533651471138,\n", + " 0.03095700405538082,\n", + " -0.02168961986899376,\n", + " 0.01354897115379572,\n", + " -0.022633260115981102,\n", + " -0.023943087086081505,\n", + " 0.028985220938920975,\n", + " -0.034759730100631714,\n", + " -0.024731801822781563,\n", + " -0.01809815689921379,\n", + " 0.01074622105807066,\n", + " -0.010196938179433346,\n", + " -0.011823659762740135,\n", + " -0.019576994702219963,\n", + " -0.008013891987502575,\n", + " 0.00029576756060123444,\n", + " 0.02042204514145851,\n", + " -0.004249898251146078,\n", + " 0.02191496640443802,\n", + " 0.01597144827246666,\n", + " 0.006816737819463015,\n", + " -0.0028714099898934364,\n", + " -0.009703992865979671,\n", + " -0.004538623616099358,\n", + " -0.009422308765351772,\n", + " 0.006070276722311974,\n", + " -0.011549018323421478,\n", + " -0.022464249283075333,\n", + " -0.0013133487664163113,\n", + " -0.013710938394069672,\n", + " 0.02854861132800579,\n", + " 0.013344750739634037,\n", + " -0.02409801259636879,\n", + " -0.004408345092087984,\n", + " -0.004207645542919636,\n", + " -0.01166873425245285,\n", + " 0.007767419330775738,\n", + " -0.008591343648731709,\n", + " -0.0004022790817543864,\n", + " 0.004010467324405909,\n", + " 0.0036900523118674755,\n", + " -0.003144290763884783,\n", + " -0.01650664582848549,\n", + " -0.01599961705505848,\n", + " 0.0034206926357001066,\n", + " -0.018562935292720795,\n", + " 
-0.014957387931644917,\n", + " -0.00935893040150404,\n", + " -0.01028144359588623,\n", + " 0.010767347179353237,\n", + " 0.029238734394311905,\n", + " 0.012225058861076832,\n", + " 0.009443435817956924,\n", + " -0.010140601545572281,\n", + " 0.0033502718433737755,\n", + " -0.007739251013845205,\n", + " -0.005837887991219759,\n", + " 0.014929219149053097,\n", + " 0.0041653928346931934,\n", + " -0.01040820125490427,\n", + " 0.022717764601111412,\n", + " -0.02030937187373638,\n", + " 0.005633667577058077,\n", + " -0.0023115642834454775,\n", + " 0.018534766510128975,\n", + " 0.0007451405981555581,\n", + " 0.0020228386856615543,\n", + " 0.011661692522466183,\n", + " -0.012985603883862495,\n", + " -0.011872954666614532,\n", + " 0.003774557262659073,\n", + " 0.014520778320729733,\n", + " 0.022069893777370453,\n", + " 0.02294311113655567,\n", + " -0.013457424007356167,\n", + " 0.028041580691933632,\n", + " -0.0128165939822793,\n", + " 0.03740755468606949,\n", + " 0.002249946119263768,\n", + " -0.009992717765271664,\n", + " 0.0020545281004160643,\n", + " 0.00784488208591938,\n", + " -0.00799980852752924,\n", + " -0.012288437224924564,\n", + " -0.03298512473702431,\n", + " -0.012950393371284008,\n", + " 0.010387075133621693,\n", + " -0.017098180949687958,\n", + " -0.0005426806164905429,\n", + " 0.02143610641360283,\n", + " -0.0042745452374219894,\n", + " 0.022436082363128662,\n", + " 0.017168601974844933,\n", + " -0.05797044187784195,\n", + " 0.01829533651471138,\n", + " 0.006785048637539148,\n", + " 0.018309419974684715,\n", + " -0.01340812910348177,\n", + " -0.028590863570570946,\n", + " -0.020238950848579407,\n", + " -0.003992862068116665,\n", + " 0.0064047761261463165,\n", + " -0.003897793823853135,\n", + " 0.009767371229827404,\n", + " 0.016239047050476074,\n", + " 0.017253106459975243,\n", + " -0.025675440207123756,\n", + " -0.030280964449048042,\n", + " -0.0017658027354627848,\n", + " -0.0043836976401507854,\n", + " 0.0050139641389250755,\n", + " 
0.02656274288892746,\n", + " 0.007045605685561895,\n", + " 0.01602778397500515,\n", + " -0.03273160755634308,\n", + " 0.025013484060764313,\n", + " -0.017041845247149467,\n", + " -0.027379624545574188,\n", + " -0.033576659858226776,\n", + " -0.020745981484651566,\n", + " 0.017126349732279778,\n", + " 0.04084409028291702,\n", + " -0.012668710201978683,\n", + " -0.006753358989953995,\n", + " 0.004760449286550283,\n", + " 0.007851924747228622,\n", + " 0.01026735920459032,\n", + " 0.0158869419246912,\n", + " 0.015732016414403915,\n", + " -0.009443435817956924,\n", + " -0.018055904656648636,\n", + " 0.005218184553086758,\n", + " 0.02387266606092453,\n", + " 0.005334379151463509,\n", + " 0.004228771664202213,\n", + " 0.020084025338292122,\n", + " 0.007302641868591309,\n", + " 0.030703488737344742,\n", + " -0.008584300987422466,\n", + " -0.020605139434337616,\n", + " -0.015393996611237526,\n", + " 0.004795659799128771,\n", + " 0.0022728329058736563,\n", + " -0.0004207645542919636,\n", + " 0.027393709868192673,\n", + " 0.034337203949689865,\n", + " -0.004668902140110731,\n", + " -0.0046372124925255775,\n", + " -0.006042108405381441,\n", + " -0.006126613821834326,\n", + " 0.002014036290347576,\n", + " 0.020295288413763046,\n", + " -0.017746053636074066,\n", + " -0.016675656661391258,\n", + " -0.015309492126107216,\n", + " 0.019478406757116318,\n", + " 0.022731849923729897,\n", + " 0.002410153392702341,\n", + " -0.025266999378800392,\n", + " -0.015084145590662956,\n", + " -0.015929196029901505,\n", + " 0.024703633040189743,\n", + " 0.028675368055701256,\n", + " 0.030506310984492302,\n", + " -0.009204004891216755,\n", + " 0.002056288765743375,\n", + " 0.0263373963534832,\n", + " -0.01888687163591385,\n", + " -0.0036759681534022093,\n", + " -0.028928883373737335,\n", + " 0.019478406757116318,\n", + " 0.010344821959733963,\n", + " 0.010344821959733963,\n", + " -0.011295503936707973,\n", + " -0.6417874097824097,\n", + " -0.024788137525320053,\n", + " 0.011063114739954472,\n", + 
" -0.033266808837652206,\n", + " 0.00889415293931961,\n", + " -0.0067216698080301285,\n", + " -0.004193561151623726,\n", + " 0.011633523739874363,\n", + " -0.004950585309416056,\n", + " 0.02109808474779129,\n", + " -0.023069869726896286,\n", + " -0.006306186784058809,\n", + " 0.005214663688093424,\n", + " -0.017084097489714622,\n", + " 0.0069822268560528755,\n", + " -0.021478358656167984,\n", + " -0.013457424007356167,\n", + " -0.02302761748433113,\n", + " -0.013492634519934654,\n", + " 0.0021531174425035715,\n", + " -0.019407985731959343,\n", + " 0.008739227429032326,\n", + " -0.0012895817635580897,\n", + " 0.020745981484651566,\n", + " 0.00387314660474658,\n", + " -0.009316678158938885,\n", + " 0.04830870032310486,\n", + " -0.015872858464717865,\n", + " 0.012689836323261261,\n", + " 0.005464657675474882,\n", + " -0.015281323343515396,\n", + " 0.025238830596208572,\n", + " 0.014140506274998188,\n", + " -0.018619270995259285,\n", + " 0.04326656833291054,\n", + " -0.013584180735051632,\n", + " -0.009880044497549534,\n", + " 0.028478190302848816,\n", + " 0.027421876788139343,\n", + " 0.029745765030384064,\n", + " -0.015267238952219486,\n", + " -0.029830269515514374,\n", + " 0.010577211156487465,\n", + " 0.0021795250941067934,\n", + " 0.0005677680601365864,\n", + " 0.011619439348578453,\n", + " 0.029661260545253754,\n", + " 0.003077391069382429,\n", + " -0.007725166622549295,\n", + " -0.00768291437998414,\n", + " 0.0007953154272399843,\n", + " 0.0070984214544296265,\n", + " 0.024928979575634003,\n", + " -0.018253082409501076,\n", + " 0.019830510020256042,\n", + " 0.007080816198140383,\n", + " 0.026295144110918045,\n", + " -0.013133487664163113,\n", + " 0.014675704762339592,\n", + " 0.006133655551820993,\n", + " -0.01619679480791092,\n", + " -0.012971519492566586,\n", + " -0.007527988404035568,\n", + " -0.04622424393892288,\n", + " -0.009063162840902805,\n", + " -0.006707585416734219,\n", + " -0.027999328449368477,\n", + " -0.02123892679810524,\n", + " 
0.026182470843195915,\n", + " -0.02064739167690277,\n", + " -0.0001938773930305615,\n", + " 0.0035333659034222364,\n", + " -0.015408081002533436,\n", + " -0.022407913580536842,\n", + " 0.0006839624838903546,\n", + " 0.04213983565568924,\n", + " -0.005844930186867714,\n", + " -0.038506120443344116,\n", + " -0.0017746053636074066,\n", + " 0.009823707863688469,\n", + " -0.0015369349857792258,\n", + " -0.010718053206801414,\n", + " -0.00416187196969986,\n", + " -0.01420388463884592,\n", + " 0.027942990884184837,\n", + " 0.006806174758821726,\n", + " -0.005690004210919142,\n", + " -0.014436273835599422,\n", + " -0.006094924174249172,\n", + " -0.004665381275117397,\n", + " 0.01585877500474453,\n", + " 0.004668902140110731,\n", + " 0.008140649646520615,\n", + " -0.025506431236863136,\n", + " -0.00640829699113965,\n", + " -0.007739251013845205,\n", + " -0.015901027247309685,\n", + " -0.0002532949729356915,\n", + " -0.001292222528718412,\n", + " 0.0007099301437847316,\n", + " -0.005978730041533709,\n", + " -0.0005501628620550036,\n", + " -0.01314052939414978,\n", + " 0.012753215618431568,\n", + " 0.030365468934178352,\n", + " 0.0038097677752375603,\n", + " -0.01284476276487112,\n", + " 0.010133559815585613,\n", + " 0.016605235636234283,\n", + " -0.023661404848098755,\n", + " -0.008851900696754456,\n", + " -0.01950657367706299,\n", + " 0.016816498711705208,\n", + " 0.00292070466093719,\n", + " 0.0027816235087811947,\n", + " -0.033407650887966156,\n", + " 0.01806998997926712,\n", + " 0.0023907877039164305,\n", + " 0.0036900523118674755,\n", + " -0.02443603426218033,\n", + " 0.008316702209413052,\n", + " 0.004031593445688486,\n", + " -0.02468954771757126,\n", + " 0.009422308765351772,\n", + " 0.002149596344679594,\n", + " 0.017267191782593727,\n", + " 0.019013628363609314,\n", + " -0.02636556513607502,\n", + " -0.00587661936879158,\n", + " -0.01168281864374876,\n", + " 0.015323576517403126,\n", + " -0.007387146819382906,\n", + " 0.018365757539868355,\n", + " 
-0.01605595275759697,\n", + " 0.027576804161071777,\n", + " -0.010950441472232342,\n", + " 0.021534694358706474,\n", + " -0.0024383217096328735,\n", + " 0.02690076269209385,\n", + " -0.014999640174210072,\n", + " -0.030337300151586533,\n", + " -0.008232196792960167,\n", + " 0.01611229032278061,\n", + " -0.016210878267884254,\n", + " 0.0058660563081502914,\n", + " -0.035097748041152954,\n", + " -0.01597144827246666,\n", + " -0.00387314660474658,\n", + " 0.000913710449822247,\n", + " 0.0187460295855999,\n", + " 0.0037992047145962715,\n", + " -0.001453310251235962,\n", + " 0.009992717765271664,\n", + " 0.02194313518702984,\n", + " 0.019605163484811783,\n", + " -0.028112001717090607,\n", + " 0.0002709002001211047,\n", + " -0.022830437868833542,\n", + " -0.0027992285322397947,\n", + " -0.003781599458307028,\n", + " 0.008619511500000954,\n", + " 0.017520707100629807,\n", + " -0.022224819287657738,\n", + " -0.0010325456969439983,\n", + " -0.019802341237664223,\n", + " -0.003749910043552518,\n", + " 0.01431655790656805,\n", + " 0.017436200752854347,\n", + " 0.0012138793244957924,\n", + " -0.026802174746990204,\n", + " 0.017717884853482246,\n", + " -0.013133487664163113,\n", + " -0.030027449131011963,\n", + " 0.004978753626346588,\n", + " -0.03628081828355789,\n", + " 0.00048810450243763626,\n", + " -0.004105535335838795,\n", + " 0.0013142289826646447,\n", + " -0.004690028261393309,\n", + " 0.001977065345272422,\n", + " -0.0009603642974980175,\n", + " -0.005880140699446201,\n", + " -0.01653481461107731,\n", + " -0.0006637164624407887,\n", + " 0.024252939969301224,\n", + " -0.003424213733524084,\n", + " -0.012027880176901817,\n", + " 0.02460504323244095,\n", + " -0.005580852273851633,\n", + " 0.004778054542839527,\n", + " -0.002329169539734721,\n", + " 0.0042745452374219894,\n", + " 0.0024682506918907166,\n", + " 0.016915086656808853,\n", + " 0.008239238522946835,\n", + " 0.006397733930498362,\n", + " -0.007696998305618763,\n", + " -0.01899954490363598,\n", + " 
0.020013604313135147,\n", + " 0.03329497575759888,\n", + " 0.03599913790822029,\n", + " 0.0011390572180971503,\n", + " 0.012513784691691399,\n", + " -0.012563078664243221,\n", + " 0.028759874403476715,\n", + " -0.050308652222156525,\n", + " -0.009689908474683762,\n", + " -0.02826692722737789,\n", + " 0.03639349341392517,\n", + " 0.017323527485132217,\n", + " -0.013612349517643452,\n", + " -0.009168794378638268,\n", + " -0.00773220881819725,\n", + " -0.0063519603572785854,\n", + " -0.0028397205751389265,\n", + " 0.011746197007596493,\n", + " -0.003707657568156719,\n", + " 0.018464345484972,\n", + " -0.006915327161550522,\n", + " -0.016999593004584312,\n", + " -0.00819698628038168,\n", + " 0.02718244679272175,\n", + " -0.004313276614993811,\n", + " -0.01566159538924694,\n", + " 0.00799980852752924,\n", + " 0.017520707100629807,\n", + " 0.01168281864374876,\n", + " 0.01821083016693592,\n", + " -0.015253155492246151,\n", + " -0.0036583628971129656,\n", + " 0.01294335164129734,\n", + " 0.029520418494939804,\n", + " 0.020999496802687645,\n", + " -0.01594327948987484,\n", + " 0.0038942727260291576,\n", + " -0.012112385593354702,\n", + " 0.03104150854051113,\n", + " -0.006911805830895901,\n", + " 0.005285084713250399,\n", + " -0.014563030563294888,\n", + " -0.00514776399359107,\n", + " 0.03126685693860054,\n", + " 0.04166097193956375,\n", + " -0.013556012883782387,\n", + " -0.0010457495227456093,\n", + " 0.022844523191452026,\n", + " 0.013098277151584625,\n", + " 0.013133487664163113,\n", + " -0.001913686515763402,\n", + " 0.045379191637039185,\n", + " -0.020689643919467926,\n", + " 0.029351409524679184,\n", + " -0.012154637835919857,\n", + " 0.012027880176901817,\n", + " 0.018647439777851105,\n", + " -0.020055856555700302,\n", + " 0.006309707649052143,\n", + " 0.028957052156329155,\n", + " 0.016548898071050644,\n", + " 0.022224819287657738,\n", + " -0.017351696267724037,\n", + " 0.014372894540429115,\n", + " 0.0016311228973791003,\n", + " -0.008725143037736416,\n", + " 
0.03847794979810715,\n", + " 0.00663364352658391,\n", + " 0.004675944335758686,\n", + " 0.005119595676660538,\n", + " -0.03095700405538082,\n", + " -0.013985579833388329,\n", + " -0.011830702424049377,\n", + " -0.015534838661551476,\n", + " 0.0020334019791334867,\n", + " 0.012774341739714146,\n", + " 0.00842233281582594,\n", + " -0.008401206694543362,\n", + " -0.004932980053126812,\n", + " 0.00601746141910553,\n", + " 0.025957124307751656,\n", + " 0.0203375406563282,\n", + " -0.02602754533290863,\n", + " -0.02039387635886669,\n", + " 0.011056073009967804,\n", + " 0.019717836752533913,\n", + " 0.01165464986115694,\n", + " 0.007577282842248678,\n", + " -0.0031478118617087603,\n", + " -0.021985387429594994,\n", + " -0.004580875858664513,\n", + " 0.01329545583575964,\n", + " -0.005950561258941889,\n", + " 0.024196602404117584,\n", + " 0.03847794979810715,\n", + " 0.00899274181574583,\n", + " -0.013450381346046925,\n", + " -0.0263373963534832,\n", + " -0.003925962373614311,\n", + " -0.01326728705316782,\n", + " 0.0070420848205685616,\n", + " 0.025464177131652832,\n", + " 0.007492777891457081,\n", + " -0.007207573391497135,\n", + " -0.014661620371043682,\n", + " 0.02673175372183323,\n", + " 0.03287244960665703,\n", + " 0.011999712325632572,\n", + " -0.008204028941690922,\n", + " -0.00548578379675746,\n", + " -0.00832374393939972,\n", + " -0.0067216698080301285,\n", + " 0.014133463613688946,\n", + " 0.00292070466093719,\n", + " -0.033492155373096466,\n", + " 0.0420834980905056,\n", + " 0.005915351212024689,\n", + " 0.005471699871122837,\n", + " -0.003947088494896889,\n", + " -0.034478046000003815,\n", + " 0.01431655790656805,\n", + " 0.007880092598497868,\n", + " -0.008922320790588856,\n", + " -0.004584397189319134,\n", + " -0.03135136142373085,\n", + " 0.016915086656808853,\n", + " 0.05737890675663948,\n", + " 0.025759944692254066,\n", + " 0.0021372726187109947,\n", + " 0.022731849923729897,\n", + " -0.004925938323140144,\n", + " 0.007549114525318146,\n", + " 
-0.010964525863528252,\n", + " -0.021788209676742554,\n", + " 0.013612349517643452,\n", + " -0.003925962373614311,\n", + " 0.01073213666677475,\n", + " 0.003100277855992317,\n", + " -0.007753334939479828,\n", + " 0.0062357657589018345,\n", + " -0.006841385271400213,\n", + " -0.0023362115025520325,\n", + " 0.013436296954751015,\n", + " -0.02098541148006916,\n", + " 0.006024503149092197,\n", + " -0.014858798123896122,\n", + " -0.0062357657589018345,\n", + " 0.006894201040267944,\n", + " -0.0012675751931965351,\n", + " 0.009978634305298328,\n", + " 0.022450165823101997,\n", + " 0.00926034152507782,\n", + " 0.03298512473702431,\n", + " 0.010823683813214302,\n", + " -0.002663668477907777,\n", + " -0.02030937187373638,\n", + " -0.00924625713378191,\n", + " 0.002841481240466237,\n", + " 0.005989293102174997,\n", + " 0.006225202698260546,\n", + " -0.02092907577753067,\n", + " 0.004323840141296387,\n", + " 0.0011997951660305262,\n", + " 0.00023745029466226697,\n", + " -0.01314052939414978,\n", + " 0.010119475424289703,\n", + " 0.01821083016693592,\n", + " 0.0029717597644776106,\n", + " 0.009591319598257542,\n", + " 0.0038308941293507814,\n", + " 0.005351984407752752,\n", + " -0.02381633035838604,\n", + " -0.012675751931965351,\n", + " 0.0005017485236749053,\n", + " -0.00045509470510296524,\n", + " -0.0214642733335495,\n", + " 0.039548348635435104,\n", + " 0.013936285860836506,\n", + " 0.034309037029743195,\n", + " -0.013119403272867203,\n", + " 0.004221729934215546,\n", + " 0.006013940088450909,\n", + " -0.003300977172330022,\n", + " 0.026943014934659004,\n", + " -0.013802485540509224,\n", + " -0.055716972798109055,\n", + " -0.015239071100950241,\n", + " -0.012365900911390781,\n", + " 0.018562935292720795,\n", + " -0.0061935135163366795,\n", + " -0.013774317689239979,\n", + " -0.012767299078404903,\n", + " -0.017717884853482246,\n", + " -0.002181285759434104,\n", + " -0.017365779727697372,\n", + " -0.0004198842798359692,\n", + " -0.011225082911550999,\n", + " 
-0.02715427801012993,\n", + " -0.023971255868673325,\n", + " 0.020858654752373695,\n", + " 0.01647847704589367,\n", + " 0.02319662645459175,\n", + " 0.002554516075178981,\n", + " -0.007478693965822458,\n", + " 0.002369661582633853,\n", + " 0.014816545881330967,\n", + " 0.011027904227375984,\n", + " -0.03799908980727196,\n", + " 0.03619631379842758,\n", + " -0.022492418065667152,\n", + " 0.007471651770174503,\n", + " 0.025590935721993446,\n", + " 0.004457639530301094,\n", + " -0.0008199627045542002,\n", + " -0.01854884997010231,\n", + " 0.029464082792401314,\n", + " 0.013013772666454315,\n", + " -0.005376631394028664,\n", + " 0.005042132455855608,\n", + " -0.007894176989793777,\n", + " -0.010225106962025166,\n", + " 0.008408249355852604,\n", + " 0.028816210106015205,\n", + " 0.013901075348258018,\n", + " -0.004313276614993811,\n", + " 0.0009498011786490679,\n", + " -0.008858942426741123,\n", + " -0.008443459868431091,\n", + " 0.00584140932187438,\n", + " -0.009880044497549534,\n", + " 0.011436345055699348,\n", + " 0.006851948332041502,\n", + " 0.003954130690544844,\n", + " 0.03647799789905548,\n", + " -0.034647054970264435,\n", + " -0.001811576308682561,\n", + " -0.004411865957081318,\n", + " 0.003538647433742881,\n", + " 0.021647367626428604,\n", + " 0.0035949843004345894,\n", + " -0.014802461490035057,\n", + " 0.00608084024861455,\n", + " 0.007542072795331478,\n", + " 0.005982250906527042,\n", + " 0.0034118900075554848,\n", + " 0.011112409643828869,\n", + " -0.03616814687848091,\n", + " -0.039323002099990845,\n", + " 0.013964453712105751,\n", + " 0.021816378459334373,\n", + " -0.03321047127246857,\n", + " 0.005753383040428162,\n", + " 0.00830965954810381,\n", + " -0.0039506093598902225,\n", + " -0.0003642078081611544,\n", + " 0.005654794164001942,\n", + " 0.004380176775157452,\n", + " 0.003718220628798008,\n", + " -0.03557661175727844,\n", + " -0.013323623687028885,\n", + " -0.03653433546423912,\n", + " -0.0203375406563282,\n", + " -0.019900931045413017,\n", + " 
0.019069965928792953,\n", + " -0.0028608469292521477,\n", + " -0.013746148906648159,\n", + " -0.009196962229907513,\n", + " -0.028985220938920975,\n", + " -0.004200603347271681,\n", + " -0.004447076469659805,\n", + " 0.014732041396200657,\n", + " -0.024450117722153664,\n", + " -0.01560525968670845,\n", + " 0.01792914792895317,\n", + " -0.005524515174329281,\n", + " 0.034449879080057144,\n", + " 0.003883709665387869,\n", + " 0.002425998216494918,\n", + " -5.580714514508145e-06,\n", + " 0.0023432536981999874,\n", + " -0.015267238952219486,\n", + " -0.025097990408539772,\n", + " -0.006288581527769566,\n", + " -0.007739251013845205,\n", + " 0.05242127925157547,\n", + " 0.0207600649446249,\n", + " 0.02342197299003601,\n", + " -0.006837863940745592,\n", + " 0.0038555413484573364,\n", + " 0.02798524498939514,\n", + " 0.0052815633825957775,\n", + " 0.011746197007596493,\n", + " -0.011020862497389317,\n", + " 0.011732112616300583,\n", + " -0.044815827161073685,\n", + " 0.012239143252372742,\n", + " 0.02625289186835289,\n", + " 0.019562911242246628,\n", + " 0.024619128555059433,\n", + " -0.00041680337744764984,\n", + " 0.006045629736036062,\n", + " 0.022309323772788048,\n", + " -0.014760209247469902,\n", + " -0.01580243743956089,\n", + " -0.0020545281004160643,\n", + " -0.02022486738860607,\n", + " -0.019802341237664223,\n", + " 0.0006214639870449901,\n", + " -0.01036594808101654,\n", + " 0.0023450141306966543,\n", + " -0.034224532544612885,\n", + " -0.022098060697317123,\n", + " 0.02056288719177246,\n", + " 0.012070133350789547,\n", + " 0.03352032229304314,\n", + " 0.013077151030302048,\n", + " 0.01585877500474453,\n", + " -0.029295071959495544,\n", + " 0.016830582171678543,\n", + " 0.012640541419386864,\n", + " -0.007003352977335453,\n", + " -0.00013258925173431635,\n", + " 0.014182758517563343,\n", + " 0.001666333293542266,\n", + " 0.0008674967684783041,\n", + " 0.03309779614210129,\n", + " 0.009711034595966339,\n", + " -0.022745933383703232,\n", + " 
-0.015957362949848175,\n", + " 0.0013432776322588325,\n", + " -0.020436128601431847,\n", + " 0.012520826421678066,\n", + " -0.0050703007727861404,\n", + " -0.019210806116461754,\n", + " -0.019295312464237213,\n", + " -0.01622496359050274,\n", + " -0.01809815689921379,\n", + " -0.007003352977335453,\n", + " -0.027027521282434464,\n", + " -0.028675368055701256,\n", + " 0.00403511431068182,\n", + " 0.015027808956801891,\n", + " -0.034337203949689865,\n", + " 0.015520754270255566,\n", + " -0.023295216262340546,\n", + " 0.003362595336511731,\n", + " -0.020886823534965515,\n", + " -0.006971663795411587,\n", + " 0.03864695876836777,\n", + " 0.015154565684497356,\n", + " 0.028126085177063942,\n", + " 0.034168194979429245,\n", + " -0.01849251426756382,\n", + " -0.026323312893509865,\n", + " 0.029464082792401314,\n", + " -0.015872858464717865,\n", + " -0.0066125174053013325,\n", + " 0.02078823372721672,\n", + " 0.0355202741920948,\n", + " -0.015309492126107216,\n", + " -0.026295144110918045,\n", + " 0.015408081002533436,\n", + " 0.0055421204306185246,\n", + " -0.005781551357358694,\n", + " 0.009112457744777203,\n", + " 0.007077294867485762,\n", + " -0.0023027616553008556,\n", + " 0.017309444025158882,\n", + " -0.013302497565746307,\n", + " -0.015140482224524021,\n", + " -0.010323695838451385,\n", + " 0.015675680711865425,\n", + " -0.014647535979747772,\n", + " 0.014929219149053097,\n", + " -0.044646818190813065,\n", + " -0.0011945136357098818,\n", + " -0.014886966906487942,\n", + " 0.03230908513069153,\n", + " -0.008556133136153221,\n", + " 0.016675656661391258,\n", + " 0.0023432536981999874,\n", + " -0.0018996023572981358,\n", + " -0.006443507503718138,\n", + " -0.0054400102235376835,\n", + " -0.015224986709654331,\n", + " -0.008513879962265491,\n", + " -0.013725022785365582,\n", + " 0.02718244679272175,\n", + " -0.009971591643989086,\n", + " 0.009140625596046448,\n", + " -0.0017270712414756417,\n", + " 0.012654625810682774,\n", + " -0.0014665140770375729,\n", + " 
-0.004401302896440029,\n", + " -0.01656298339366913,\n", + " 0.020802317187190056,\n", + " -0.008535006083548069,\n", + " 0.004591439384967089,\n", + " 0.0006082600448280573,\n", + " -0.002038683509454131,\n", + " 0.033914677798748016,\n", + " 0.011013820767402649,\n", + " -0.008013891987502575,\n", + " -0.01636580377817154,\n", + " -0.00957019254565239,\n", + " 0.0004016188904643059,\n", + " 0.014732041396200657,\n", + " 0.007929387502372265,\n", + " 0.024534622207283974,\n", + " -0.022351576015353203,\n", + " -0.017140433192253113,\n", + " -0.006901242770254612,\n", + " -0.003982299007475376,\n", + " -0.024027593433856964,\n", + " 0.016239047050476074,\n", + " 0.0025263477582484484,\n", + " 0.01353488676249981,\n", + " 0.012774341739714146,\n", + " -0.024844475090503693,\n", + " 0.017210854217410088,\n", + " 0.017295360565185547,\n", + " 0.01280250959098339,\n", + " 0.01073213666677475,\n", + " 0.03081616200506687,\n", + " -0.020745981484651566,\n", + " 0.004908333066850901,\n", + " 0.010274401865899563,\n", + " 0.019591080024838448,\n", + " -0.03368933126330376,\n", + " 0.02806974947452545,\n", + " -0.0006320271058939397,\n", + " -0.02692893147468567,\n", + " -0.001562462537549436,\n", + " -0.02030937187373638,\n", + " -0.020689643919467926,\n", + " -0.02191496640443802,\n", + " 0.0210417490452528,\n", + " 0.01633763685822487,\n", + " 0.008668806403875351,\n", + " -0.024943063035607338,\n", + " 0.025844451040029526,\n", + " 0.0028009891975671053,\n", + " 0.024858558550477028,\n", + " -0.009429351426661015,\n", + " -0.038815971463918686,\n", + " 3.7438581784954295e-05,\n", + " -0.015309492126107216,\n", + " 0.014732041396200657,\n", + " 0.004130182787775993,\n", + " -0.007253346964716911,\n", + " 0.016379889100790024,\n", + " -0.022323409095406532,\n", + " 0.011330714449286461,\n", + " 0.008499796502292156,\n", + " -0.027196530252695084,\n", + " -0.033661164343357086,\n", + " 0.011950417421758175,\n", + " 0.006341397296637297,\n", + " -0.006549138575792313,\n", + 
" 2.7109274014947005e-05,\n", + " 0.0018221393693238497,\n", + " 0.002102062338963151,\n", + " -0.00924625713378191,\n", + " 0.021365685388445854,\n", + " -0.018957292661070824,\n", + " -0.0005765706882812083,\n", + " -0.005647751968353987,\n", + " 0.00472875963896513,\n", + " 0.0022657907102257013,\n", + " 0.018534766510128975,\n", + " -0.0018467867048457265,\n", + " 0.013048983179032803,\n", + " -0.009992717765271664,\n", + " -0.01843617670238018,\n", + " -0.045379191637039185,\n", + " -0.01747845485806465,\n", + " -0.005654794164001942,\n", + " 0.02815425395965576,\n", + " -0.004323840141296387,\n", + " -0.021478358656167984,\n", + " -0.03090066649019718,\n", + " -0.034027352929115295,\n", + " -0.03247809410095215,\n", + " -0.00900682620704174,\n", + " -0.022802269086241722,\n", + " 0.008436417207121849,\n", + " 0.021224843338131905,\n", + " 0.01561934407800436,\n", + " -0.00657026469707489,\n", + " 0.0012499700533226132,\n", + " 0.012922225520014763,\n", + " 0.025464177131652832,\n", + " -0.013323623687028885,\n", + " -0.009647656232118607,\n", + " 0.019675584509968758,\n", + " -0.03284428268671036,\n", + " -0.007429399061948061,\n", + " 0.007119547575712204,\n", + " -0.04715379700064659,\n", + " -0.016408057883381844,\n", + " 0.011281419545412064,\n", + " 0.0038344149943441153,\n", + " 0.01064763218164444,\n", + " 0.024619128555059433,\n", + " -0.023605067282915115,\n", + " -0.0031161224469542503,\n", + " 0.000817321939393878,\n", + " 0.0036337156780064106,\n", + " 0.020914990454912186,\n", + " 0.0256331879645586,\n", + " 0.011739155277609825,\n", + " -0.022027641534805298,\n", + " -0.010964525863528252,\n", + " -0.005749862175434828,\n", + " 0.0008186423219740391,\n", + " -0.014985555782914162,\n", + " -0.025112073868513107,\n", + " 0.006883637513965368,\n", + " -0.017323527485132217,\n", + " -0.005316773895174265,\n", + " -0.02718244679272175,\n", + " 0.004394260700792074,\n", + " 0.009633571840822697,\n", + " 0.007105463184416294,\n", + " 
0.03216824308037758,\n", + " 0.02532333694398403,\n", + " 0.018422093242406845,\n", + " -0.0037146995309740305,\n", + " 0.011056073009967804,\n", + " -0.017422117292881012,\n", + " -0.0006214639870449901,\n", + " 0.02370365709066391,\n", + " -0.034703392535448074,\n", + " -0.00472875963896513,\n", + " 0.023154374212026596,\n", + " -0.027858486399054527,\n", + " 0.009084288962185383,\n", + " -0.012077175080776215,\n", + " 0.0032745692878961563,\n", + " -0.01545033324509859,\n", + " -0.003749910043552518,\n", + " -0.011929291300475597,\n", + " -0.009675824083387852,\n", + " -0.010626506060361862,\n", + " -0.0020774148870259523,\n", + " 0.01885870285332203,\n", + " -0.03109784610569477,\n", + " -0.023098036646842957,\n", + " -0.008809647522866726,\n", + " 0.026097966358065605,\n", + " 0.005024527199566364,\n", + " -0.01684466563165188,\n", + " -0.0025069820694625378,\n", + " -0.04278770461678505,\n", + " 0.004570312798023224,\n", + " 0.014365852810442448,\n", + " -0.017408033832907677,\n", + " -0.019407985731959343,\n", + " 0.001058953464962542,\n", + " 0.021422021090984344,\n", + " 0.010274401865899563,\n", + " 0.01536582875996828,\n", + " 0.19909381866455078,\n", + " -0.003707657568156719,\n", + " 0.04760449007153511,\n", + " 0.03844978287816048,\n", + " -0.0009744484559632838,\n", + " -0.0095842769369483,\n", + " 0.01806998997926712,\n", + " -0.007094900123775005,\n", + " 0.01832350343465805,\n", + " 0.0046372124925255775,\n", + " -0.015098229050636292,\n", + " -0.011837744154036045,\n", + " -0.009689908474683762,\n", + " -0.007024479564279318,\n", + " 0.02339380420744419,\n", + " 0.014365852810442448,\n", + " -0.039801862090826035,\n", + " -0.04118211194872856,\n", + " -0.01399262249469757,\n", + " -0.03191472589969635,\n", + " -0.01905588060617447,\n", + " -0.018253082409501076,\n", + " -0.027816234156489372,\n", + " -0.023576898500323296,\n", + " 0.010718053206801414,\n", + " -0.009795540012419224,\n", + " 0.00043462865869514644,\n", + " 
0.007429399061948061,\n", + " 0.02078823372721672,\n", + " 0.0008476909133605659,\n", + " -0.019872762262821198,\n", + " -0.020407961681485176,\n", + " 0.01063354779034853,\n", + " -0.017224939540028572,\n", + " -0.0004260461137164384,\n", + " -0.004070324823260307,\n", + " 0.02205580845475197,\n", + " -0.008647680282592773,\n", + " 0.0283514317125082,\n", + " 0.012478574179112911,\n", + " 0.007760377135127783,\n", + " -0.0056653572246432304,\n", + " -0.0017085857689380646,\n", + " -0.0377737432718277,\n", + " 0.0005299168406054378,\n", + " 0.009718076325953007,\n", + " ...],\n", + " [-0.024379240348935127,\n", + " 0.00834567565470934,\n", + " 0.013073521666228771,\n", + " -0.02563999965786934,\n", + " -0.016047269105911255,\n", + " 0.020199550315737724,\n", + " -0.029956728219985962,\n", + " -0.02932634763419628,\n", + " -0.01712987571954727,\n", + " -0.013285932131111622,\n", + " 0.03436938300728798,\n", + " -0.007537145633250475,\n", + " 0.012381474487483501,\n", + " 0.01829470880329609,\n", + " -0.010922009125351906,\n", + " 0.016938021406531334,\n", + " 0.021104007959365845,\n", + " 0.004871736746281385,\n", + " 0.020638074725866318,\n", + " 0.01112756785005331,\n", + " -0.012936482205986977,\n", + " 0.03299899399280548,\n", + " -0.019980287179350853,\n", + " 0.023666637018322945,\n", + " -0.03217675909399986,\n", + " -0.029901912435889244,\n", + " 0.024639613926410675,\n", + " -0.03138193488121033,\n", + " -0.007955114357173443,\n", + " -0.0374116487801075,\n", + " 0.010421817190945148,\n", + " -0.020624371245503426,\n", + " -0.013066669926047325,\n", + " -0.011483869515359402,\n", + " -0.011614056304097176,\n", + " 0.00977088138461113,\n", + " 0.02717483602464199,\n", + " 0.0017900720704346895,\n", + " 0.033656779676675797,\n", + " 0.007639924995601177,\n", + " 0.012299250811338425,\n", + " -0.0035116246435791254,\n", + " -0.009270689450204372,\n", + " -0.009071982465684414,\n", + " -0.012162212282419205,\n", + " 0.0029823114164173603,\n", + " 
-0.0003957001317758113,\n", + " -0.022282542660832405,\n", + " 0.0013052965514361858,\n", + " 0.00017076345102395862,\n", + " 0.0272433552891016,\n", + " 0.01181961502879858,\n", + " -0.01226499117910862,\n", + " -0.005361651070415974,\n", + " -0.004275617189705372,\n", + " -0.00842104759067297,\n", + " 0.0013635382056236267,\n", + " 0.008715680800378323,\n", + " -0.0029360607732087374,\n", + " 0.009517359547317028,\n", + " 0.003782276762649417,\n", + " -0.002487258054316044,\n", + " -0.01622541807591915,\n", + " -0.014087609946727753,\n", + " 0.0053171138279139996,\n", + " -0.025256289169192314,\n", + " -0.0075302934274077415,\n", + " 0.0026945294812321663,\n", + " 0.017513586208224297,\n", + " 0.0006526482757180929,\n", + " 0.03771313652396202,\n", + " 0.008427899330854416,\n", + " 0.008160673081874847,\n", + " -0.01795211061835289,\n", + " 0.011381089687347412,\n", + " -0.00242559053003788,\n", + " -0.006423703860491514,\n", + " 0.0036246818490326405,\n", + " 0.013642233796417713,\n", + " -0.018746936693787575,\n", + " 0.010545152239501476,\n", + " -0.015252442099153996,\n", + " 0.0026945294812321663,\n", + " 0.002965181600302458,\n", + " -0.001325852470472455,\n", + " -0.004594232887029648,\n", + " 0.005714526865631342,\n", + " 0.0034550961572676897,\n", + " -0.03256046772003174,\n", + " -0.015183922834694386,\n", + " -0.004690160043537617,\n", + " 0.018555082380771637,\n", + " 0.023543301969766617,\n", + " 0.005389058962464333,\n", + " -0.00621814513579011,\n", + " 0.02210439182817936,\n", + " -0.004512009210884571,\n", + " 0.04459249600768089,\n", + " 0.005858417600393295,\n", + " -0.022214023396372795,\n", + " 0.006259256973862648,\n", + " 0.012922778725624084,\n", + " -0.012306103482842445,\n", + " -0.00559804355725646,\n", + " -0.034287162125110626,\n", + " -0.0069238957948982716,\n", + " 0.012689812108874321,\n", + " -0.01817137375473976,\n", + " -0.00028542656218633056,\n", + " 0.017417658120393753,\n", + " -0.005358225200325251,\n", + " 
0.012970742769539356,\n", + " 0.0021635033190250397,\n", + " -0.04683993384242058,\n", + " 0.021597348153591156,\n", + " 0.02820262871682644,\n", + " 0.005210908595472574,\n", + " -0.007283623330295086,\n", + " -0.014347984455525875,\n", + " -0.011757947504520416,\n", + " 0.007639924995601177,\n", + " 0.009298097342252731,\n", + " -0.015965044498443604,\n", + " -0.00946254376322031,\n", + " 0.010003848001360893,\n", + " 0.01378612406551838,\n", + " -0.012497957795858383,\n", + " -0.03820647671818733,\n", + " 0.002643140032887459,\n", + " -0.011511276476085186,\n", + " -0.0013652511406689882,\n", + " 0.01906212605535984,\n", + " 0.0016633110353723168,\n", + " 0.013333896175026894,\n", + " -0.021158823743462563,\n", + " 0.027202242985367775,\n", + " -0.011908690445125103,\n", + " -0.028586337342858315,\n", + " -0.04656585678458214,\n", + " -0.009764029644429684,\n", + " 0.005656285211443901,\n", + " 0.04056354612112045,\n", + " -0.008462158963084221,\n", + " -0.0039467234164476395,\n", + " -0.0009481386514380574,\n", + " -0.001090316683985293,\n", + " 0.004984794184565544,\n", + " -0.0018568786326795816,\n", + " 0.017020246013998985,\n", + " -0.006481945049017668,\n", + " -0.004042651038616896,\n", + " 0.0053171138279139996,\n", + " 0.02563999965786934,\n", + " 0.006125643849372864,\n", + " -0.0010928860865533352,\n", + " 0.02065177820622921,\n", + " 0.005779620260000229,\n", + " 0.026640383526682854,\n", + " -0.0056357295252382755,\n", + " -0.030395252630114555,\n", + " -0.019925471395254135,\n", + " 0.011696279980242252,\n", + " -0.01830841228365898,\n", + " -0.0048066433519124985,\n", + " 0.01740395464003086,\n", + " 0.03151897341012955,\n", + " -0.0063072205521166325,\n", + " 0.0014097888488322496,\n", + " -0.00750288600102067,\n", + " -0.010044959373772144,\n", + " 0.0029566166922450066,\n", + " 0.015019475482404232,\n", + " -0.022748475894331932,\n", + " -0.01407390646636486,\n", + " -0.005618599243462086,\n", + " 0.03436938300728798,\n", + " 
0.008708829060196877,\n", + " 0.0009652685257606208,\n", + " -0.02883300743997097,\n", + " -0.015416888520121574,\n", + " -0.020185846835374832,\n", + " 0.016252826899290085,\n", + " 0.03513680398464203,\n", + " 0.03308121860027313,\n", + " -0.02066548354923725,\n", + " 0.013984831050038338,\n", + " 0.023378854617476463,\n", + " -0.03598644584417343,\n", + " -0.009304949082434177,\n", + " -0.0172532107681036,\n", + " 0.021542532369494438,\n", + " 0.008188080973923206,\n", + " 0.0031827311031520367,\n", + " -0.004337284713983536,\n", + " -0.6507708430290222,\n", + " -0.01954176276922226,\n", + " 0.012354066595435143,\n", + " -0.020213253796100616,\n", + " -0.002315959194675088,\n", + " 0.014252057299017906,\n", + " -0.003463661065325141,\n", + " 0.012470549903810024,\n", + " -0.00929124467074871,\n", + " 0.013107781298458576,\n", + " -0.026297785341739655,\n", + " 0.006684077903628349,\n", + " -0.010058663785457611,\n", + " -0.015129107050597668,\n", + " -0.0029172180220484734,\n", + " -0.021378085017204285,\n", + " -0.01378612406551838,\n", + " -0.026311490684747696,\n", + " -0.01022311020642519,\n", + " -0.0028384204488247633,\n", + " -0.01445761602371931,\n", + " 0.01416983362287283,\n", + " -0.003386576659977436,\n", + " -0.002060724189504981,\n", + " 0.008215488865971565,\n", + " -0.007372698746621609,\n", + " 0.05336299166083336,\n", + " -0.02184401825070381,\n", + " 0.010716450400650501,\n", + " -0.0009935328271239996,\n", + " -0.0027116595301777124,\n", + " 0.008434751071035862,\n", + " 0.02676371857523918,\n", + " -0.01559503935277462,\n", + " 0.03998798504471779,\n", + " -0.019802136346697807,\n", + " -0.009263836778700352,\n", + " 0.02017214149236679,\n", + " 0.026051115244627,\n", + " 0.02633889764547348,\n", + " -0.015512815676629543,\n", + " -0.017006540670990944,\n", + " 0.010065515525639057,\n", + " 0.011285162530839443,\n", + " -0.007852335460484028,\n", + " 0.014813916757702827,\n", + " 0.03417753055691719,\n", + " -0.0010406399378553033,\n", + " 
-0.0026602698490023613,\n", + " -0.00530340988188982,\n", + " 0.000868056493345648,\n", + " 0.001206799759529531,\n", + " 0.01636245846748352,\n", + " -0.025324808433651924,\n", + " 0.014855029061436653,\n", + " 0.005070443265140057,\n", + " 0.02620185911655426,\n", + " -0.0027305022813379765,\n", + " 0.012813147157430649,\n", + " 0.011326273903250694,\n", + " -0.01407390646636486,\n", + " -0.008099005557596684,\n", + " -0.018760640174150467,\n", + " -0.037329427897930145,\n", + " -0.019870657473802567,\n", + " -0.020528443157672882,\n", + " -0.02905227057635784,\n", + " -0.019651394337415695,\n", + " 0.02729817107319832,\n", + " -0.013005002401769161,\n", + " 0.002036742400377989,\n", + " 0.007023249287158251,\n", + " -0.010545152239501476,\n", + " -0.014019090682268143,\n", + " 0.012018321081995964,\n", + " 0.03754868730902672,\n", + " 0.0054130409844219685,\n", + " -0.02662668004631996,\n", + " -0.009503655135631561,\n", + " 0.006961581762880087,\n", + " -0.010942565277218819,\n", + " -0.012881667353212833,\n", + " 0.0011314282892271876,\n", + " -0.017212100327014923,\n", + " 0.023063665255904198,\n", + " 0.0020076215732842684,\n", + " -0.00786603894084692,\n", + " -0.011408497579395771,\n", + " -0.010976824909448624,\n", + " -0.013601121492683887,\n", + " 0.02551666460931301,\n", + " 0.00552609795704484,\n", + " -0.002728789346292615,\n", + " -0.01976102590560913,\n", + " 0.0074549224227666855,\n", + " -0.013512046076357365,\n", + " -0.01601986028254032,\n", + " -0.011202938854694366,\n", + " -0.004871736746281385,\n", + " -0.006776579190045595,\n", + " -3.795338125200942e-05,\n", + " -0.005022479686886072,\n", + " -0.0016590285813435912,\n", + " 0.00658472441136837,\n", + " 0.026188155636191368,\n", + " 0.006087957881391048,\n", + " -0.009236428886651993,\n", + " 0.018979903310537338,\n", + " 0.024790357798337936,\n", + " -0.0065641687251627445,\n", + " -0.006060550455003977,\n", + " -0.03705534711480141,\n", + " 0.011319422163069248,\n", + " 
0.0012881667353212833,\n", + " -0.008044189773499966,\n", + " -0.03283454850316048,\n", + " 0.007893446832895279,\n", + " -0.007537145633250475,\n", + " 0.005176648497581482,\n", + " -0.006077680271118879,\n", + " 0.00022911209089215845,\n", + " 0.0010552003514021635,\n", + " -0.01614319533109665,\n", + " 0.01877434365451336,\n", + " 0.008852720260620117,\n", + " 0.005402762908488512,\n", + " 0.00970236212015152,\n", + " -0.015677262097597122,\n", + " -0.008044189773499966,\n", + " -0.005039609503000975,\n", + " 0.018609898164868355,\n", + " 0.001987065654247999,\n", + " 0.028051884844899178,\n", + " -0.017897294834256172,\n", + " 0.013964274898171425,\n", + " -0.019322499632835388,\n", + " 0.007317882962524891,\n", + " 0.00019549470744095743,\n", + " 0.02965524233877659,\n", + " -0.01663653552532196,\n", + " -0.030121173709630966,\n", + " -0.011566092260181904,\n", + " 0.019432131201028824,\n", + " -0.013546306639909744,\n", + " -0.005964622832834721,\n", + " -0.03453383222222328,\n", + " -0.015183922834694386,\n", + " 0.0065915766172111034,\n", + " -0.008893831633031368,\n", + " 0.0092227254062891,\n", + " -0.0025711944326758385,\n", + " -0.0007995369960553944,\n", + " 0.010579411871731281,\n", + " 0.0217754989862442,\n", + " 0.009832548908889294,\n", + " -0.03138193488121033,\n", + " -0.00223544891923666,\n", + " -0.021857721731066704,\n", + " -0.003847370157018304,\n", + " -0.0032135648652911186,\n", + " 0.007400106638669968,\n", + " 0.018541378900408745,\n", + " -0.02599629946053028,\n", + " -0.006776579190045595,\n", + " -0.014347984455525875,\n", + " 0.01559503935277462,\n", + " -0.0057042487896978855,\n", + " 0.015087994746863842,\n", + " 0.021446604281663895,\n", + " -0.026434825733304024,\n", + " 0.007338439114391804,\n", + " -0.009229577146470547,\n", + " -0.028257444500923157,\n", + " 0.011552388779819012,\n", + " -0.022282542660832405,\n", + " -0.00021273165475577116,\n", + " 0.0010706172324717045,\n", + " 0.002122391713783145,\n", + " 
0.0036589414812624454,\n", + " -0.008880128152668476,\n", + " -0.006550464779138565,\n", + " -0.00887327641248703,\n", + " -0.01615689881145954,\n", + " -0.007667332887649536,\n", + " 0.026873350143432617,\n", + " -0.00302513618953526,\n", + " -0.0092227254062891,\n", + " 0.02085733786225319,\n", + " 0.0034259753301739693,\n", + " 0.008880128152668476,\n", + " -0.005889251362532377,\n", + " 0.01774655282497406,\n", + " -0.003939871676266193,\n", + " 0.023721452802419662,\n", + " 0.009866808541119099,\n", + " 0.024845173582434654,\n", + " -0.006406573578715324,\n", + " -0.010408112779259682,\n", + " 0.011799058876931667,\n", + " 0.021474013105034828,\n", + " 0.02729817107319832,\n", + " 0.0035664401948451996,\n", + " 0.023118481040000916,\n", + " -0.013758717104792595,\n", + " 0.015252442099153996,\n", + " -0.05870751291513443,\n", + " -0.010620523244142532,\n", + " -0.02932634763419628,\n", + " 0.03305380791425705,\n", + " -0.005786472465842962,\n", + " -0.0008654869743622839,\n", + " -0.00219262414611876,\n", + " -0.010250518098473549,\n", + " -0.00016102084191516042,\n", + " 0.002893236232921481,\n", + " 0.017705440521240234,\n", + " 0.006255830638110638,\n", + " 0.017897294834256172,\n", + " -0.015320961363613605,\n", + " -0.0019750746432691813,\n", + " -0.009284392930567265,\n", + " 0.029353756457567215,\n", + " -0.005193778313696384,\n", + " -0.010928860865533352,\n", + " 0.016033563762903214,\n", + " 0.02710631676018238,\n", + " 0.01573207788169384,\n", + " 0.01559503935277462,\n", + " 0.00611879164353013,\n", + " -0.005807028152048588,\n", + " 0.015279849991202354,\n", + " 0.03806943818926811,\n", + " 0.020679187029600143,\n", + " -0.01105219591408968,\n", + " 0.009202169254422188,\n", + " -0.017691737040877342,\n", + " 0.030532291159033775,\n", + " -0.01739025115966797,\n", + " 0.012333511374890804,\n", + " -0.0063414801843464375,\n", + " 0.0024547113571316004,\n", + " 0.01856878586113453,\n", + " 0.03957686573266983,\n", + " -0.012641848996281624,\n", + " 
0.010695895180106163,\n", + " 0.014868732541799545,\n", + " -0.004186541773378849,\n", + " 0.015855412930250168,\n", + " 0.010689042508602142,\n", + " 0.03554791957139969,\n", + " -0.021501420065760612,\n", + " 0.031107855960726738,\n", + " -0.013402415439486504,\n", + " 0.012278695590794086,\n", + " 0.024995915591716766,\n", + " -0.01581430248916149,\n", + " -0.004929978400468826,\n", + " 0.020062511786818504,\n", + " 0.018349522724747658,\n", + " 0.01823989301919937,\n", + " -0.014389095827937126,\n", + " 0.023748859763145447,\n", + " 0.011627759784460068,\n", + " -0.01574578322470188,\n", + " 0.025283697992563248,\n", + " 0.0020932708866894245,\n", + " -0.004614788573235273,\n", + " 0.015183922834694386,\n", + " -0.04152281954884529,\n", + " -0.02003510296344757,\n", + " -0.018048036843538284,\n", + " -0.004885440692305565,\n", + " 0.0041659860871732235,\n", + " 0.015690967440605164,\n", + " 0.002643140032887459,\n", + " -0.007633072789758444,\n", + " 0.006228423211723566,\n", + " 0.010058663785457611,\n", + " 0.03127230331301689,\n", + " 0.020953264087438583,\n", + " -0.026777422055602074,\n", + " -0.015183922834694386,\n", + " 0.014649470336735249,\n", + " 0.021501420065760612,\n", + " 0.014183538034558296,\n", + " 0.014745397493243217,\n", + " -0.01157979667186737,\n", + " -0.027887439355254173,\n", + " -0.015828005969524384,\n", + " 0.015060586854815483,\n", + " 0.0021600774489343166,\n", + " 0.017691737040877342,\n", + " 0.023447373881936073,\n", + " -0.0029257829301059246,\n", + " -0.007180843967944384,\n", + " -0.018404338508844376,\n", + " -0.007132880389690399,\n", + " -0.022638844326138496,\n", + " 0.020775113254785538,\n", + " 0.02224143221974373,\n", + " 0.016444681212306023,\n", + " -0.014402800239622593,\n", + " -0.013066669926047325,\n", + " 0.02203587256371975,\n", + " 0.026859646663069725,\n", + " 0.003384863492101431,\n", + " -0.0026688347570598125,\n", + " -0.007448070216923952,\n", + " -0.011216643266379833,\n", + " -0.008962350897490978,\n", 
+ " 0.02314588986337185,\n", + " -0.0002858548250515014,\n", + " -0.033985674381256104,\n", + " 0.0329715870320797,\n", + " 0.0015348369488492608,\n", + " -0.0019219721434637904,\n", + " -0.00981199275702238,\n", + " -0.028175219893455505,\n", + " 0.015540223568677902,\n", + " 0.008688272908329964,\n", + " -0.010997381061315536,\n", + " -0.011312570422887802,\n", + " -0.034972354769706726,\n", + " 0.02307736873626709,\n", + " 0.046401407569646835,\n", + " 0.034753091633319855,\n", + " -0.0025352216325700283,\n", + " 0.030998224392533302,\n", + " -0.0013823810731992126,\n", + " 0.012121100910007954,\n", + " -0.023269224911928177,\n", + " -0.021515125408768654,\n", + " 0.014334280975162983,\n", + " -0.0015810875920578837,\n", + " -0.0003783561405725777,\n", + " -0.002456424292176962,\n", + " -0.002024751389399171,\n", + " 0.00476553151383996,\n", + " 0.004429786000400782,\n", + " 0.0008397921919822693,\n", + " 0.015471704304218292,\n", + " -0.013299635611474514,\n", + " 0.003081664675846696,\n", + " -0.0034071323461830616,\n", + " -0.007735852152109146,\n", + " 0.019706210121512413,\n", + " 0.0006457963609136641,\n", + " 0.002627722918987274,\n", + " 0.028860416263341904,\n", + " -5.877688818145543e-05,\n", + " 0.047716982662677765,\n", + " 0.022268839180469513,\n", + " 0.0010080932406708598,\n", + " -0.008455307222902775,\n", + " -0.009784585796296597,\n", + " 0.013710753060877323,\n", + " 0.0175958089530468,\n", + " 0.004871736746281385,\n", + " -0.012285547330975533,\n", + " 0.03272491693496704,\n", + " 0.011764799244701862,\n", + " -0.0036349596921354532,\n", + " -0.010298481211066246,\n", + " 0.012148507870733738,\n", + " 0.0239133071154356,\n", + " -0.001457752427086234,\n", + " 0.0030456921085715294,\n", + " 0.013148892670869827,\n", + " 0.013731309212744236,\n", + " -0.03256046772003174,\n", + " -0.015923932194709778,\n", + " 0.004152282141149044,\n", + " 6.439763092203066e-05,\n", + " -0.0209258571267128,\n", + " 0.0324508361518383,\n", + " 
0.009489951655268669,\n", + " 0.021871427074074745,\n", + " -0.020679187029600143,\n", + " 0.009243281558156013,\n", + " 0.011833318509161472,\n", + " 0.0012136517325416207,\n", + " 0.01948694698512554,\n", + " -0.010236813686788082,\n", + " -0.05245853215456009,\n", + " -0.019870657473802567,\n", + " -0.012038877233862877,\n", + " 0.013498342595994473,\n", + " -0.004789513535797596,\n", + " -0.015375777147710323,\n", + " -0.023378854617476463,\n", + " -0.03480790928006172,\n", + " -0.011682575568556786,\n", + " -0.00866086594760418,\n", + " 0.017376545816659927,\n", + " 0.0033882895950227976,\n", + " -0.021090304479002953,\n", + " -0.02355700545012951,\n", + " 0.004251635167747736,\n", + " 0.008681421168148518,\n", + " 0.011250902898609638,\n", + " 0.005971475038677454,\n", + " -0.00620444118976593,\n", + " -0.011600351892411709,\n", + " 0.006780005060136318,\n", + " -0.0013489777920767665,\n", + " -0.042372461408376694,\n", + " 0.02495480328798294,\n", + " -0.013745012693107128,\n", + " 0.00029527622973546386,\n", + " 0.006581298541277647,\n", + " 0.0006179602933116257,\n", + " -0.007482329849153757,\n", + " -0.023447373881936073,\n", + " 0.031080447137355804,\n", + " 0.01455354318022728,\n", + " 0.004570250865072012,\n", + " 0.003521902486681938,\n", + " -0.01712987571954727,\n", + " -0.005001924000680447,\n", + " 0.0071739922277629375,\n", + " 0.01954176276922226,\n", + " 0.017636921256780624,\n", + " -0.011257754638791084,\n", + " 0.0011477017542347312,\n", + " -0.003102220594882965,\n", + " -0.015430592931807041,\n", + " -0.009537914767861366,\n", + " -0.0119566535577178,\n", + " 0.005015627946704626,\n", + " 0.005834436044096947,\n", + " 0.005368503276258707,\n", + " 0.044126562774181366,\n", + " -0.02765447273850441,\n", + " -0.006420277524739504,\n", + " -0.016033563762903214,\n", + " 0.004604510962963104,\n", + " 0.02251550927758217,\n", + " 0.00020984098955523223,\n", + " -0.00983940064907074,\n", + " 0.02030918188393116,\n", + " 0.0093940244987607,\n", 
+ " 0.00469701224938035,\n", + " 0.009777733124792576,\n", + " -0.006889636162668467,\n", + " -0.04448286443948746,\n", + " -0.0353560633957386,\n", + " -0.0018431746866554022,\n", + " 0.014224649406969547,\n", + " -0.013470934703946114,\n", + " 0.013258524239063263,\n", + " 0.018938791006803513,\n", + " -0.010161442682147026,\n", + " -0.0018791474867612123,\n", + " 0.00733158690854907,\n", + " 0.006814264692366123,\n", + " -0.006903340108692646,\n", + " -0.03193008899688721,\n", + " -0.011524980887770653,\n", + " -0.026051115244627,\n", + " -0.01648579351603985,\n", + " -0.01324482075870037,\n", + " 0.01600615680217743,\n", + " -0.006183885503560305,\n", + " -0.021378085017204285,\n", + " -0.005108129233121872,\n", + " -0.017636921256780624,\n", + " 0.006577872671186924,\n", + " 0.0003533037088345736,\n", + " 0.003467086935415864,\n", + " -0.027599656954407692,\n", + " -0.005488412454724312,\n", + " 0.001148558221757412,\n", + " -0.017362842336297035,\n", + " 0.030806370079517365,\n", + " 0.009147354401648045,\n", + " 0.017856182530522346,\n", + " 0.0027459191624075174,\n", + " -0.016102083027362823,\n", + " -0.005399337038397789,\n", + " -0.024653317406773567,\n", + " -0.007694740314036608,\n", + " 0.0003267524007242173,\n", + " 0.036150891333818436,\n", + " 0.020692890509963036,\n", + " 0.0183632280677557,\n", + " -0.009709213860332966,\n", + " 0.011703131720423698,\n", + " 0.024557391181588173,\n", + " 0.00022975447063799948,\n", + " 0.01348463911563158,\n", + " -0.007030101493000984,\n", + " 0.0030234232544898987,\n", + " -0.039686497300863266,\n", + " 0.01386149600148201,\n", + " 0.024653317406773567,\n", + " 0.024762948974967003,\n", + " 0.02341996692121029,\n", + " -0.0068827844224870205,\n", + " -0.005543227773159742,\n", + " 0.02191253751516342,\n", + " -0.01641727425158024,\n", + " -0.011319422163069248,\n", + " -0.009962735697627068,\n", + " -0.018801752477884293,\n", + " -0.021186230704188347,\n", + " -0.00886642374098301,\n", + " 
-0.017486177384853363,\n", + " 0.0038576482329517603,\n", + " -0.0396316833794117,\n", + " -0.02024066261947155,\n", + " 0.01552652008831501,\n", + " 0.005693970713764429,\n", + " 0.032258983701467514,\n", + " 0.007859187200665474,\n", + " 0.02176179550588131,\n", + " -0.027695583179593086,\n", + " 0.009764029644429684,\n", + " 0.009030871093273163,\n", + " 0.0008697694865986705,\n", + " -0.012203323654830456,\n", + " 0.012299250811338425,\n", + " -0.0006595002487301826,\n", + " 0.0025557775516062975,\n", + " 0.023721452802419662,\n", + " 0.00886642374098301,\n", + " -0.01634875312447548,\n", + " -0.019637690857052803,\n", + " 0.01378612406551838,\n", + " -0.005580913741141558,\n", + " 0.01892508752644062,\n", + " -0.018541378900408745,\n", + " -0.016458384692668915,\n", + " -0.008708829060196877,\n", + " -0.021295862272381783,\n", + " -0.010757562704384327,\n", + " -0.010031255893409252,\n", + " -0.025132954120635986,\n", + " -0.021542532369494438,\n", + " 0.007352143060415983,\n", + " 0.002600315259769559,\n", + " -0.026859646663069725,\n", + " 0.01001069974154234,\n", + " -0.015978747978806496,\n", + " -0.002317672362551093,\n", + " -0.011586648412048817,\n", + " -0.004213949665427208,\n", + " 0.03373900428414345,\n", + " 0.019733617082238197,\n", + " 0.03957686573266983,\n", + " 0.035794589668512344,\n", + " -0.023831084370613098,\n", + " -0.015471704304218292,\n", + " 0.03258787840604782,\n", + " -0.005437022540718317,\n", + " -0.00872938521206379,\n", + " 0.02426960878074169,\n", + " 0.021611051633954048,\n", + " -0.030093766748905182,\n", + " -0.020254366099834442,\n", + " 0.014964659698307514,\n", + " 0.0004826342628803104,\n", + " -0.00705065717920661,\n", + " 0.0074891820549964905,\n", + " 0.0077769639901816845,\n", + " -0.010003848001360893,\n", + " 0.012079988606274128,\n", + " 0.0025866113137453794,\n", + " -0.013546306639909744,\n", + " -0.022200319916009903,\n", + " 0.019021015614271164,\n", + " -0.01399853453040123,\n", + " 0.009654398076236248,\n", 
+ " -0.04157763719558716,\n", + " -0.002600315259769559,\n", + " -0.019870657473802567,\n", + " 0.03464346379041672,\n", + " -0.000787117809522897,\n", + " 0.022954033687710762,\n", + " 0.01358056627213955,\n", + " -0.013539453968405724,\n", + " -0.016540609300136566,\n", + " -0.01181276235729456,\n", + " -0.022131800651550293,\n", + " -0.018555082380771637,\n", + " -0.010236813686788082,\n", + " 0.011387941427528858,\n", + " -0.009791437536478043,\n", + " 0.0038199624978005886,\n", + " -0.01115497574210167,\n", + " 0.011326273903250694,\n", + " 0.00012290685845073313,\n", + " -0.00453941710293293,\n", + " -0.010771266184747219,\n", + " 0.01884286478161812,\n", + " 0.0006342337001115084,\n", + " -0.00469701224938035,\n", + " 0.004508583340793848,\n", + " -0.010353296995162964,\n", + " 0.026380009949207306,\n", + " 0.015704670920968056,\n", + " -0.011161827482283115,\n", + " -0.02577703818678856,\n", + " 0.0008487853920087218,\n", + " -0.011339978314936161,\n", + " 0.015293553471565247,\n", + " 0.006077680271118879,\n", + " 0.023255519568920135,\n", + " -0.01996658369898796,\n", + " -0.005834436044096947,\n", + " -0.007434366270899773,\n", + " -0.013971127569675446,\n", + " -0.010750710032880306,\n", + " 0.011209791526198387,\n", + " -0.017568401992321014,\n", + " 0.020610667765140533,\n", + " -0.006248978897929192,\n", + " -0.009626990184187889,\n", + " 0.014115017838776112,\n", + " 0.015704670920968056,\n", + " 0.02329663187265396,\n", + " 0.015225034207105637,\n", + " 0.0158828217536211,\n", + " -0.016883205622434616,\n", + " -0.010784970596432686,\n", + " 0.017349138855934143,\n", + " 0.02085733786225319,\n", + " -0.03754868730902672,\n", + " 0.023337744176387787,\n", + " 0.0042002457194030285,\n", + " -0.02258402854204178,\n", + " -0.004128300119191408,\n", + " -0.013285932131111622,\n", + " -0.028778191655874252,\n", + " -0.02106289565563202,\n", + " 0.00415913388133049,\n", + " 0.02189883403480053,\n", + " 0.008236045017838478,\n", + " 
-0.021953649818897247,\n", + " 0.009195317514240742,\n", + " 0.001166544621810317,\n", + " 0.01843174733221531,\n", + " 0.005656285211443901,\n", + " -0.02183031477034092,\n", + " 0.001601643394678831,\n", + " -0.009161057882010937,\n", + " -0.004563399124890566,\n", + " 0.0041968198493123055,\n", + " -0.0075577013194561005,\n", + " 0.016965430229902267,\n", + " -0.024365536868572235,\n", + " 0.0217754989862442,\n", + " 0.005693970713764429,\n", + " -0.0338212288916111,\n", + " -0.02203587256371975,\n", + " 0.014813916757702827,\n", + " 0.021446604281663895,\n", + " -0.0054164668545126915,\n", + " 0.011771650984883308,\n", + " -0.006838246714323759,\n", + " 0.007831779308617115,\n", + " -0.013882052153348923,\n", + " 0.02309107407927513,\n", + " -0.008976055309176445,\n", + " 0.0012718932703137398,\n", + " -0.0176780316978693,\n", + " -0.0016367597272619605,\n", + " 0.009428284130990505,\n", + " 0.015622447244822979,\n", + " 0.00462506664916873,\n", + " 0.01455354318022728,\n", + " -0.01982954517006874,\n", + " -0.01941842772066593,\n", + " -0.0442361943423748,\n", + " -0.023899603635072708,\n", + " 0.006903340108692646,\n", + " 0.011627759784460068,\n", + " -0.020692890509963036,\n", + " -0.021597348153591156,\n", + " -0.03401308134198189,\n", + " -0.03149156644940376,\n", + " -0.011257754638791084,\n", + " -0.019637690857052803,\n", + " -0.022063281387090683,\n", + " 0.001298444578424096,\n", + " 0.005882399622350931,\n", + " 0.010003848001360893,\n", + " -0.007098620757460594,\n", + " 0.022830698639154434,\n", + " 0.00673889322206378,\n", + " 0.027599656954407692,\n", + " -0.015978747978806496,\n", + " -0.020076215267181396,\n", + " 0.009270689450204372,\n", + " -0.01989806443452835,\n", + " -0.00963384285569191,\n", + " 0.006101661827415228,\n", + " -0.037959806621074677,\n", + " 0.004111170303076506,\n", + " 0.004258487373590469,\n", + " 0.013532602228224277,\n", + " 0.013292783871293068,\n", + " 0.02440664730966091,\n", + " -0.03182045742869377,\n", + " 
0.0029823114164173603,\n", + " 0.01669135130941868,\n", + " 0.013491490855813026,\n", + " 0.01641727425158024,\n", + " 0.01662283204495907,\n", + " 0.008311416022479534,\n", + " -0.03362937271595001,\n", + " -0.012655552476644516,\n", + " -0.0017138441326096654,\n", + " 0.001448331051506102,\n", + " -0.0033112051896750927,\n", + " -0.030724145472049713,\n", + " 0.017308026552200317,\n", + " -0.028367076069116592,\n", + " -0.00033231961424462497,\n", + " -0.022844403982162476,\n", + " 0.020199550315737724,\n", + " 0.012545921839773655,\n", + " -0.0015168505487963557,\n", + " 0.033026400953531265,\n", + " 0.014087609946727753,\n", + " 0.02246069349348545,\n", + " 0.004659326281398535,\n", + " 0.013888903893530369,\n", + " -0.021268455311655998,\n", + " 0.0028230035677552223,\n", + " 0.018746936693787575,\n", + " -0.03560273349285126,\n", + " -0.014676878228783607,\n", + " 0.02398182637989521,\n", + " -0.038288701325654984,\n", + " 0.00915420614182949,\n", + " -0.015965044498443604,\n", + " 0.0015151376137509942,\n", + " -0.00831826776266098,\n", + " -0.004751827567815781,\n", + " -6.123930506873876e-05,\n", + " -0.006413425784558058,\n", + " -0.013183153234422207,\n", + " -0.0034602349624037743,\n", + " 0.018609898164868355,\n", + " -0.03998798504471779,\n", + " -0.02592778019607067,\n", + " -0.0053856330923736095,\n", + " 0.019843248650431633,\n", + " -0.0003205428074579686,\n", + " -0.012491106055676937,\n", + " -0.0009961023461073637,\n", + " -0.032807137817144394,\n", + " 0.0016350466758012772,\n", + " 0.01514281053096056,\n", + " -0.021364381536841393,\n", + " -0.013977979309856892,\n", + " 0.010839785449206829,\n", + " 0.01656801626086235,\n", + " 0.010312185622751713,\n", + " 0.022912923246622086,\n", + " 0.205997034907341,\n", + " -0.007358994800597429,\n", + " 0.04566139727830887,\n", + " 0.038782041519880295,\n", + " -0.004258487373590469,\n", + " 0.0002391759044257924,\n", + " 0.013758717104792595,\n", + " -0.005190352443605661,\n", + " 
0.01303241029381752,\n", + " 0.011847022920846939,\n", + " -0.015828005969524384,\n", + " -0.001850026659667492,\n", + " -0.01732173189520836,\n", + " -0.0075302934274077415,\n", + " 0.006070828065276146,\n", + " 0.010134034790098667,\n", + " -0.04588066041469574,\n", + " -0.034753091633319855,\n", + " -0.018870271742343903,\n", + " -0.03875463083386421,\n", + " -0.03417753055691719,\n", + " -0.016252826899290085,\n", + " -0.0172532107681036,\n", + " -0.015389480628073215,\n", + " 0.014855029061436653,\n", + " -0.010942565277218819,\n", + " -0.0048409029841423035,\n", + " 0.0006440833676606417,\n", + " 0.01996658369898796,\n", + " 0.0021241046488285065,\n", + " -0.014320576563477516,\n", + " -0.014786508865654469,\n", + " 0.0037308870814740658,\n", + " -0.019582875072956085,\n", + " -0.0010851776460185647,\n", + " 0.009044574573636055,\n", + " 0.03151897341012955,\n", + " -0.014046498574316502,\n", + " 0.02080252207815647,\n", + " 0.015334665775299072,\n", + " 0.012662404216825962,\n", + " -0.00658472441136837,\n", + " -0.00234679295681417,\n", + " -0.030258214101195335,\n", + " -0.003343751886859536,\n", + " 0.019514355808496475,\n", + " ...])" + ] + }, + "metadata": {}, + "execution_count": 43 + } + ], + "source": [ + "# Verify if both embeddings have the same dimension (should be True)\n", + "len(embedding_1) == len(embedding_2)\n", + "\n", + "# Return/display the two embedding vectors for further inspection or use\n", + "embedding_1, embedding_2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qiKCOv4X0d7B" + }, + "source": [ + "### Vector Database" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_lKe-Yo6UzHL" + }, + "source": [ + "#### Setup Vector Database Directory" + ] + }, + { + "cell_type": "markdown", + "source": [ + "LangChain is used here to help orchestrate the various components of the Retrieval Augmented Generation (RAG) system. 
It provides tools and abstractions for:\n", + "\n", + "- **Loading and splitting documents:** Making it easy to load the PDF manual and break it into smaller, manageable chunks.\n", + "- **Creating embeddings:** Interfacing with embedding models (like OpenAI's) to convert text into numerical vectors.\n", + "- **Vector databases:** Simplifying the process of storing these embeddings in a vector database (Chroma) and performing similarity searches to retrieve relevant information.\n", + "\n", + "Essentially, LangChain helps connect these different pieces together to build the RAG pipeline for question answering.\n" + ], + "metadata": { + "id": "ziVCjOwxQw75" + } + }, + { + "cell_type": "code", + "source": [ + "from langchain_openai import OpenAIEmbeddings\n", + "from langchain_community.vectorstores import Chroma\n", + "\n", + "# Define vector DB directory\n", + "out_dir = \"Chroma\"\n", + "os.makedirs(out_dir, exist_ok=True)\n", + "\n", + "# Create the embedding function\n", + "embedding_function = OpenAIEmbeddings(\n", + " model=\"text-embedding-3-small\"\n", + ")\n", + "\n" + ], + "metadata": { + "id": "gQ_rvak0wSFM" + }, + "execution_count": 44, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "#### Create Vector Database from Documents" + ], + "metadata": { + "id": "_rHpwgzIMMc3" + } + }, + { + "cell_type": "code", + "source": [ + "# Building the vector store and saving it to disk for future use\n", + "# Process documents in smaller batches to avoid exceeding the token limit\n", + "import time # Import the time module\n", + "\n", + "batch_size = 100 # Adjust batch size as needed\n", + "for i in range(0, len(document_chunks), batch_size):\n", + " batch_chunks = document_chunks[i:i + batch_size]\n", + " if i == 0:\n", + " vectorstore = Chroma.from_documents(\n", + " batch_chunks, # Documents to index\n", + " embedding_model, # Embedding model for converting text to vectors\n", + " persist_directory=out_dir # Save vector DB files here\n", + " )\n", + " else:\n", + 
" vectorstore.add_documents(batch_chunks)\n", + "\n", + " time.sleep(0.3) # Add a 1-second delay between batches to mitigate rate limiting\n", + "\n", + "print(\"Vector store created and documents added.\")" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "llRHoT4dGHwu", + "outputId": "14788ac0-3cad-44b3-d05a-6a487be6047e" + }, + "execution_count": 45, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Vector store created and documents added.\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DoJ1Bqb2VWkm" + }, + "source": [ + "#### Load Vector Database" + ] + }, + { + "cell_type": "code", + "source": [ + "retriever = vectorstore.as_retriever(\n", + " search_type=\"similarity\",\n", + " search_kwargs={\"k\": 3}\n", + ")\n" + ], + "metadata": { + "id": "caYiuqw7bm2F" + }, + "execution_count": 46, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "retriever.invoke(\"What are the common symptoms of appendicitis?\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "clYOUd48bsnR", + "outputId": "647c0581-8f3d-41a0-a5b0-e61ab07caf49" + }, + "execution_count": 47, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[Document(metadata={'moddate': '2025-11-05T06:16:39+00:00', 'trapped': '', 'format': 'PDF 1.7', 'keywords': '', 'page': 173, 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'creationdate': '2012-06-15T05:44:40+00:00', 'modDate': 'D:20251105061639Z', 'creationDate': 'D:20120615054440Z', 'creator': 'Atop CHM to PDF Converter', 'total_pages': 4114, 'author': '', 'source': '/content/medical_diagnosis_manual.pdf', 'subject': '', 'file_path': '/content/medical_diagnosis_manual.pdf', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)'}, page_content=\"Etiology\\nAppendicitis is thought to result from obstruction of the appendiceal lumen, typically by 
lymphoid\\nhyperplasia, but occasionally by a fecalith, foreign body, or even worms. The obstruction leads to\\ndistention, bacterial overgrowth, ischemia, and inflammation. If untreated, necrosis, gangrene, and\\nperforation occur. If the perforation is contained by the omentum, an appendiceal abscess results.\\nSymptoms and Signs\\nThe classic symptoms of acute appendicitis are epigastric or periumbilical pain followed by brief nausea,\\nvomiting, and anorexia; after a few hours, the pain shifts to the right lower quadrant. Pain increases with\\ncough and motion. Classic signs are right lower quadrant direct and rebound tenderness located at\\nMcBurney's point (junction of the middle and outer thirds of the line joining the umbilicus to the anterior\\nsuperior spine). Additional signs are pain felt in the right lower quadrant with palpation of the left lower\\nquadrant (Rovsing sign), an increase in pain from passive extension of the right hip joint that stretches\\nthe iliopsoas muscle (psoas sign), or pain caused by passive internal rotation of the flexed thigh\\n(obturator sign). Low-grade fever (rectal temperature 37.7 to 38.3° C [100 to 101° F]) is common.\\nUnfortunately, these classic findings appear in < 50% of patients. Many variations of symptoms and signs\\noccur. Pain may not be localized, particularly in infants and children. Tenderness may be diffuse or, in rare\\ninstances, absent. Bowel movements are usually less frequent or absent; if diarrhea is a sign, a\\nretrocecal appendix should be suspected. RBCs or WBCs may be present in the urine. Atypical symptoms\\nare common among elderly patients and pregnant women; in particular, pain is less severe and local\\ntenderness is less marked.\\nDiagnosis\\n• Clinical evaluation\\n• Abdominal CT if necessary\\n• Ultrasound an option to CT\\nWhen classic symptoms and signs are present, the diagnosis is clinical. 
In such patients, delaying\\nlaparotomy to do imaging tests only increases the likelihood of perforation and subsequent complications.\"),\n", + " Document(metadata={'page': 172, 'creationdate': '2012-06-15T05:44:40+00:00', 'creator': 'Atop CHM to PDF Converter', 'subject': '', 'file_path': '/content/medical_diagnosis_manual.pdf', 'author': '', 'format': 'PDF 1.7', 'modDate': 'D:20251105061639Z', 'creationDate': 'D:20120615054440Z', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'moddate': '2025-11-05T06:16:39+00:00', 'trapped': '', 'keywords': '', 'total_pages': 4114, 'source': '/content/medical_diagnosis_manual.pdf'}, page_content=\"antibiotics effective against intestinal flora should be given (eg, cefotetan 1 to 2 g bid, or amikacin 5\\nmg/kg tid plus clindamycin 600 to 900 mg qid).\\nAppendicitis\\nAppendicitis is acute inflammation of the vermiform appendix, typically resulting in abdominal\\npain, anorexia, and abdominal tenderness. Diagnosis is clinical, often supplemented by CT or\\nultrasound. Treatment is surgical removal.\\nIn the US, acute appendicitis is the most common cause of acute abdominal pain requiring surgery. Over\\n5% of the population develops appendicitis at some point. It most commonly occurs in the teens and 20s\\nbut may occur at any age.\\nOther conditions affecting the appendix include carcinoids, cancer, villous adenomas, and diverticula. The\\nappendix may also be affected by Crohn's disease or ulcerative colitis with pancolitis.\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition\\nChapter 11. 
Acute Abdomen & Surgical Gastroenterology\\n163\\nbiplobsinha25@gmail.com\\n9X5AUD3EIR\\nThis file is meant for personal use by biplobsinha25@gmail.com only.\\nSharing or publishing the contents in part or full is liable for legal action.\"),\n", + " Document(metadata={'total_pages': 4114, 'moddate': '2025-11-05T06:16:39+00:00', 'creator': 'Atop CHM to PDF Converter', 'format': 'PDF 1.7', 'trapped': '', 'page': 173, 'source': '/content/medical_diagnosis_manual.pdf', 'modDate': 'D:20251105061639Z', 'creationDate': 'D:20120615054440Z', 'subject': '', 'author': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'keywords': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'file_path': '/content/medical_diagnosis_manual.pdf'}, page_content='• Ultrasound an option to CT\\nWhen classic symptoms and signs are present, the diagnosis is clinical. In such patients, delaying\\nlaparotomy to do imaging tests only increases the likelihood of perforation and subsequent complications.\\nIn patients with atypical or equivocal findings, imaging studies should be done without delay. Contrast-\\nenhanced CT has reasonable accuracy in diagnosing appendicitis and can also reveal other causes of\\nan acute abdomen. Graded compression ultrasound can usually be done quickly and uses no radiation\\n(of particular concern in children); however, it is occasionally limited by the presence of bowel gas and is\\nless useful for recognizing nonappendiceal causes of pain. Appendicitis remains primarily a clinical\\ndiagnosis. Selective and judicious use of radiographic studies may reduce the rate of negative\\nlaparotomy.\\nLaparoscopy can be used for diagnosis as well as definitive treatment; it may be especially helpful in\\nwomen with lower abdominal pain of unclear etiology. 
Laboratory studies typically show leukocytosis\\n(12,000 to 15,000/μL), but this finding is highly variable; a normal WBC count should not be used to\\nexclude appendicitis.\\nPrognosis\\nWithout surgery or antibiotics, mortality is > 50%.\\nWith early surgery, the mortality rate is < 1%, and convalescence is normally rapid and complete. With\\ncomplications (rupture and development of an abscess or peritonitis), the prognosis is worse: Repeat\\noperations and a long convalescence may follow.\\nTreatment\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition\\nChapter 11. Acute Abdomen & Surgical Gastroenterology\\n164\\nbiplobsinha25@gmail.com\\n9X5AUD3EIR\\nThis file is meant for personal use by biplobsinha25@gmail.com only.\\nSharing or publishing the contents in part or full is liable for legal action.')]" + ] + }, + "metadata": {}, + "execution_count": 47 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RtGfyOaeVlqP" + }, + "source": [ + "#### Explore Vector Database and Perform Searches" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "id": "GdZON_Uj1EeS", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8ff8ce60-8948-421b-bb73-b6dd7a17acee" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "OpenAIEmbeddings(client=, async_client=, model='text-embedding-ada-002', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base='https://api.openai.com/v1', openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, 
http_async_client=None, check_embedding_ctx_length=True)" + ] + }, + "metadata": {}, + "execution_count": 48 + } + ], + "source": [ + "vectorstore.embeddings" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dhXVQLa48mR0" + }, + "source": [ + "**Instructions:**\n", + "\n", + "At this point the vector database is loaded in memory as a Chroma store with the embedding function attached. The next step is to query it and verify that similarity search works correctly.\n", + "\n", + "In the following cell, we run a similarity search test: we pass a clinical question to the vector store and retrieve the top-k most relevant document chunks based on the embeddings stored earlier.\n", + "\n", + "This confirms that:\n", + "\n", + "- the embeddings were generated correctly\n", + "\n", + "- the Chroma vector store loaded properly\n", + "\n", + "- the retrieval step is functioning\n", + "\n", + "- the RAG pipeline is ready for the answer-generation step" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": { + "id": "P9HgsipF1I4H", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c80e4de8-89e1-4361-ddb9-fc10e4a13e35" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[Document(metadata={'file_path': '/content/medical_diagnosis_manual.pdf', 'trapped': '', 'creationdate': '2012-06-15T05:44:40+00:00', 'total_pages': 4114, 'page': 2456, 'keywords': '', 'author': '', 'format': 'PDF 1.7', 'creator': 'Atop CHM to PDF Converter', 'moddate': '2025-11-05T06:16:39+00:00', 'subject': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition', 'source': '/content/medical_diagnosis_manual.pdf', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251105061639Z'}, page_content=\"Parenteral antibiotics should be given after specimens of blood, body 
fluids, and wound sites have been\\ntaken for Gram stain and culture. Very prompt empiric therapy, started immediately after suspecting\\nsepsis, is essential and may be lifesaving. Antibiotic selection requires an educated guess based on the\\nsuspected source, clinical setting, knowledge or suspicion of causative organisms and of sensitivity\\npatterns common to that specific inpatient unit, and previous culture results.\\nOne regimen for septic shock of unknown cause is gentamicin or tobramycin 5.1 mg/kg IV once/day plus\\na 3rd-generation cephalosporin (cefotaxime 2 g q 6 to 8 h or ceftriaxone 2 g once/day or, if Pseudomonas\\nis suspected, ceftazidime 2 g IV q 8 h). Alternatively, ceftazidime plus a fluoroquinolone (eg, ciprofloxacin)\\nmay be used. Monotherapy with maximal therapeutic doses of ceftazidime (2 g IV q 8 h) or imipenem (1 g\\nIV q 6 h) may be effective but is not recommended.\\nVancomycin must be added if resistant staphylococci or enterococci are suspected. If there is an\\nabdominal source, a drug effective against anaerobes (eg, metronidazole) should be included. When\\nculture and sensitivity results are available, the antibiotic regimen is changed accordingly. Antibiotics are\\ncontinued for at least 5 days after shock resolves and evidence of infection subsides.\\nAbscesses must be drained, and necrotic tissues (eg, infarcted bowel, gangrenous gall-bladder,\\nabscessed uterus) must be surgically excised. The patient's condition will continue to deteriorate despite\\nantibiotic therapy unless septic foci are eliminated.\\nNormalization of blood glucose improves outcome in critically ill patients, even those not known to be\\ndiabetic. A continuous IV insulin infusion (crystalline zinc 1 to 4 U/h) is titrated to maintain glucose\\nbetween 80 to 110 mg/dL (4.4 to 6.1 mmol/L). 
This approach necessitates frequent (eg, q 1 to 4 h)\"),\n", + " Document(metadata={'file_path': '/content/medical_diagnosis_manual.pdf', 'creationDate': 'D:20120615054440Z', 'subject': '', 'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creator': 'Atop CHM to PDF Converter', 'creationdate': '2012-06-15T05:44:40+00:00', 'total_pages': 4114, 'author': '', 'moddate': '2025-11-05T06:16:39+00:00', 'trapped': '', 'source': '/content/medical_diagnosis_manual.pdf', 'format': 'PDF 1.7', 'page': 2400, 'keywords': '', 'modDate': 'D:20251105061639Z', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition'}, page_content=\"16 - Critical Care Medicine\\nChapter 222. Approach to the Critically Ill Patient\\nIntroduction\\nCritical care medicine specializes in caring for the most seriously ill patients. These patients are best\\ntreated in an ICU staffed by experienced personnel. Some hospitals maintain separate units for special\\npopulations (eg, cardiac, surgical, neurologic, pediatric, or neonatal patients). ICUs have a high\\nnurse:patient ratio to provide the necessary high intensity of service, including treatment and monitoring\\nof physiologic parameters.\\nSupportive care for the ICU patient includes provision of adequate nutrition (see p. 21) and prevention of\\ninfection, stress ulcers and gastritis (see p. 131), and pulmonary embolism (see p. 1920). Because 15 to\\n25% of patients admitted to ICUs die there, physicians should know how to minimize suffering and help\\ndying patients maintain dignity (see p. 3480).\\nPatient Monitoring and Testing\\nSome monitoring is manual (ie, by direct observation and physical examination) and intermittent, with the\\nfrequency depending on the patient's illness. This monitoring usually includes measurement of vital signs\\n(temperature, BP, pulse, and respiration rate), quantification of all fluid intake and output, and often daily\\nweight. 
BP may be recorded by an automated sphygmomanometer; a transcutaneous sensor for pulse\\noximetry is used as well.\\nOther monitoring is ongoing and continuous, provided by complex devices that require special training\\nand experience to operate. Most such devices generate an alarm if certain physiologic parameters are\\nexceeded. Every ICU should strictly follow protocols for investigating alarms.\\nBlood Tests\\nAlthough frequent blood draws can destroy veins, cause pain, and lead to anemia, ICU patients typically\\nhave routine daily blood tests to help detect problems early. Generally, patients need a daily set of\\nelectrolytes and a CBC. Patients with arrhythmias should also have Mg, phosphate, and Ca levels\\nmeasured. Patients receiving TPN need weekly liver enzymes and coagulation profiles. Other tests (eg,\\nblood culture for fever, CBC after a bleeding episode) are done as needed.\\nPoint-of-care testing uses miniaturized, highly automated devices to do certain blood tests at the patient's\\nbedside or unit (particularly ICU, emergency department, and operating room). Commonly available tests\"),\n", + " Document(metadata={'producer': 'pdf-lib (https://github.com/Hopding/pdf-lib)', 'creationdate': '2012-06-15T05:44:40+00:00', 'moddate': '2025-11-05T06:16:39+00:00', 'file_path': '/content/medical_diagnosis_manual.pdf', 'subject': '', 'creationDate': 'D:20120615054440Z', 'modDate': 'D:20251105061639Z', 'author': '', 'format': 'PDF 1.7', 'page': 2995, 'total_pages': 4114, 'trapped': '', 'source': '/content/medical_diagnosis_manual.pdf', 'creator': 'Atop CHM to PDF Converter', 'keywords': '', 'title': 'The Merck Manual of Diagnosis & Therapy, 19th Edition'}, page_content='can approximate bone marrow NSP levels. 
I:T ratios of > 0.80 correlate with NSP depletion and death;\\nsuch a ratio may identify neonates who might benefit from granulocyte transfusion.\\nTreatment\\n• Antibiotic therapy\\n• Supportive therapy\\nBecause sepsis may manifest with non-specific clinical signs and its effects may be devastating, rapid\\nempiric antibiotic therapy is recommended (see p. 1182); drugs are later adjusted according to\\nsensitivities and the site of infection. If bacterial cultures show no growth by 48 h (although some\\npathogens may require 72 h) and the neonate appears well, antibiotics are stopped.\\nGeneral supportive measures, including respiratory and hemodynamic management, are combined with\\nantibiotic treatment.\\nAntimicrobials: In early-onset sepsis, initial therapy should include ampicillin or penicillin G plus an\\naminoglycoside. Cefotaxime may be added to or substituted for the aminoglycoside if meningitis is\\nsuspected. If foul-smelling amniotic fluid is present at birth, therapy for anaerobes (eg, clindamycin,\\nmetronidazole) should be added. Antibiotics may be changed as soon as an organism is identified.\\nPreviously well infants admitted from the community with presumed late-onset sepsis should also receive\\ntherapy with ampicillin plus gentamicin or ampicillin plus cefotaxime. If gram-negative meningitis is\\nsuspected, ampicillin, cefotaxime, and an aminoglycoside may be used. In late-onset hospital-acquired\\nsepsis, initial therapy should include vancomycin (active against methicillin-resistant S. aureus) plus an\\naminoglycoside. If P. aeruginosa is prevalent in the nursery, ceftazidime may be used instead of an\\naminoglycoside. 
For neonates previously treated with a full 7- to 14-day aminoglycoside course who need\\nretreatment, a different aminoglycoside or a 3rd-generation cephalosporin should be considered.\\nIf coagulase-negative staphylococci are suspected (eg, an indwelling catheter has been in place for > 72')]" + ] + }, + "metadata": {}, + "execution_count": 49 + } + ], + "source": [ + "vectorstore.similarity_search(\"What is the protocol for managing sepsis in a critical care unit?\", k=3) # Sanity check: retrieve the top 3 chunks for the sepsis question" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7uo9cym60X-U" + }, + "source": [ + "### Retrieval and Response Generation using Vector Search" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9zscYgoFfgXi" + }, + "source": [ + "#### Convert Vector Database into a Retriever and Retrieve Relevant Documents" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "id": "zO5kmp381VsX", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e8f34f6a-6a69-4b9f-c5fd-b32ab757b422" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=, search_kwargs={'k': 3})" + ] + }, + "metadata": {}, + "execution_count": 50 + } + ], + "source": [ + "retriever = vectorstore.as_retriever(\n", + " search_type='similarity',\n", + " search_kwargs={'k': 3}\n", + ")\n", + "retriever" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vw8qcwq66B0C", + "nteract": { + "transient": { + "deleting": false + } + } + }, + "source": [ + "### System and User Prompt Template" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3wRkZYtO6B0D" + }, + "source": [ + "Prompts guide the model to generate accurate responses. Here, we define two parts:\n", + "\n", + " 1. The system message describing the assistant's role.\n", + " 2. 
A user message template including context and the question." + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": { + "gather": { + "logged": 1737358838889 + }, + "id": "Dyl60SEs6B0D", + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [], + "source": [ + "# Define the system message\n", + "qna_system_message = \"\"\"\n", + "You are an AI clinical assistant designed to support healthcare professionals in quickly reviewing authoritative medical literature.\n", + "Your task is to provide evidence-based, concise, and context-grounded responses strictly based on the excerpts provided from\n", + "medical manuals, guidelines, and research papers.\n", + "\n", + "User input will include the necessary clinical context for you to answer the question. The context begins with the token: ###Context\n", + "\n", + "### When crafting your response:\n", + "- Use ONLY the information given in the provided context.\n", + "- Provide clear, clinically accurate answers grounded in the supplied medical text.\n", + "- Avoid adding assumptions, interpretations, or general medical knowledge not present in the context.\n", + "- When relevant, include the name of the medical manual or research paper, as well as section or page numbers, if they appear in the context.\n", + "- If the answer cannot be found in the context, respond strictly with:\n", + " \"Sorry, this is out of my knowledge base.\"\n", + "\n", + "### Response Formatting Requirements:\n", + "Answer:\n", + "[A concise answer using only the information in the context]\n", + "\n", + "Source:\n", + "[Cite the specific source(s) mentioned in the context, including page/section if available]\n", + "\n", + "If the context is empty or irrelevant, your full response must be:\n", + "\"Sorry, this is out of my knowledge base.\"\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + 
"id": "XW38rWoNjJkQ" + }, + "outputs": [], + "source": [ + "# Define the user message\n", + "qna_user_message_template = \"\"\"\n", + "###Context\n", + "Below are relevant excerpts taken from standard medical manuals, clinical guidelines, and research papers that relate to the healthcare question:\n", + "\n", + "{context}\n", + "\n", + "###Question\n", + "{question}\n", + "\n", + "Please answer using ONLY the context provided above.\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TkIteX4m6mny" + }, + "source": [ + "### Response Function" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "id": "J-SfCZqC6B0E", + "jupyter": { + "outputs_hidden": false, + "source_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [], + "source": [ + "def generate_rag_response(user_input, k=3, max_tokens=500, temperature=0, top_p=0.95):\n", + "    # Retrieve the k most relevant document chunks (k was previously ignored)\n", + "    relevant_document_chunks = vectorstore.similarity_search(user_input, k=k)\n", + "    context_list = [d.page_content for d in relevant_document_chunks]\n", + "\n", + "    # Combine document chunks into a single context\n", + "    context_for_query = \". \".join(context_list)\n",
+ "\n", + "    # Fill the user message template with the context and the question\n", + "    user_message = qna_user_message_template.replace('{context}', context_for_query)\n", + "    user_message = user_message.replace('{question}', user_input)\n", + "\n", + "    # Generate the response\n", + "    try:\n", + "        response = client.chat.completions.create(\n", + "            model=\"gpt-4o-mini\",  # model used for answer generation\n", + "            messages=[\n", + "                {\"role\": \"system\", \"content\": qna_system_message},\n", + "                {\"role\": \"user\", \"content\": user_message}\n", + "            ],\n", + "            max_tokens=max_tokens,\n", + "            temperature=temperature,\n", + "            top_p=top_p\n", + "        )\n", + "        # Extract the generated text from the response\n", + "        response = response.choices[0].message.content.strip()\n", + "    except Exception as e:\n", + "        response = f'Sorry, I encountered the following error: \\n {e}'\n", + "\n", + "    return response" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ffP1SRYbPQHN" + }, + "source": [ + "## Question Answering using RAG" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JjajBEj06B0E" + }, + "source": [ + "### Question 1: What is the protocol for managing sepsis in a critical care unit?" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": { + "id": "Gt4TAQNa6B0E", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 105 + }, + "outputId": "9b47fddb-bfab-4bb3-b04b-f04c0a432358" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Answer:\\nThe protocol for managing sepsis in a critical care unit includes the following steps: \\n1. Obtain specimens of blood, body fluids, and wound sites for Gram stain and culture before starting parenteral antibiotics.\\n2. 
Initiate very prompt empiric antibiotic therapy immediately after suspecting sepsis, which may include gentamicin or tobramycin plus a 3rd-generation cephalosporin (e.g., cefotaxime or ceftriaxone), or ceftazidime plus a fluoroquinolone if Pseudomonas is suspected. Vancomycin should be added if resistant staphylococci or enterococci are suspected, and if there is an abdominal source, include a drug effective against anaerobes (e.g., metronidazole).\\n3. Change the antibiotic regimen based on culture and sensitivity results when available, continuing antibiotics for at least 5 days after shock resolves and evidence of infection subsides.\\n4. Drain abscesses and surgically excise necrotic tissues as necessary.\\n5. Monitor and manage blood glucose levels with a continuous IV insulin infusion to maintain glucose between 80 to 110 mg/dL.\\n6. Provide supportive care, including adequate nutrition and prevention of infections and complications.\\n\\nSource:\\nCritical Care Medicine, Chapter 222. Approach to the Critically Ill Patient.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 57 + } + ], + "source": [ + "response_with_rag_1 = generate_rag_response(question_1)\n", + "response_with_rag_1" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QDw8zXuq6B0F" + }, + "source": [ + "### Question 2: What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": { + "id": "i92cv0dQ6B0F", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "aa13726c-d879-4be4-e376-a49f901e9f7b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "\"Answer:\\nThe common symptoms of appendicitis include epigastric or periumbilical pain followed by nausea, vomiting, and anorexia, with pain shifting to the right lower quadrant. Classic signs include right lower quadrant tenderness at McBurney's point, Rovsing sign, psoas sign, and obturator sign. Appendicitis cannot be cured via medicine; the treatment is surgical removal, specifically an open or laparoscopic appendectomy.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 11. Acute Abdomen & Surgical Gastroenterology, pages 163.\"" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 58 + } + ], + "source": [ + "response_with_rag_2 = generate_rag_response(question_2)\n", + "response_with_rag_2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TggYyQPL6B0G" + }, + "source": [ + "### Question 3: What are the effective treatments or solutions for addressing sudden patchy hair loss, commonly seen as localized bald spots on the scalp, and what could be the possible causes behind it?" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": { + "id": "Ed6x6LGb6B0G", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 70 + }, + "outputId": "e11a529b-be68-4f8d-e906-0adebf4ac275" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Answer:\\nThe effective treatment for sudden patchy hair loss, known as alopecia areata, is not specified in the provided context. 
However, it is noted that alopecia areata is thought to be an autoimmune disorder affecting genetically susceptible individuals. Possible causes include systemic illnesses, particularly those that cause high fever, systemic lupus, endocrine disorders, and nutritional deficiencies. \\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 86. Hair Disorders, pages 848-849.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 59 + } + ], + "source": [ + "response_with_rag_3 = generate_rag_response(question_3)\n", + "response_with_rag_3" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1TgxdI-_6B0G" + }, + "source": [ + "### Question 4: What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": { + "id": "u7ru57_c6B0G", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 87 + }, + "outputId": "d8dd94ba-0e20-41d1-9d25-f0610666089c" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'Answer:\\nInitial treatment for a person who has sustained a physical injury to brain tissue includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for severe injuries to monitor and treat intracranial pressure, decompress the brain, or remove hematomas. Subsequently, many patients require rehabilitation, which should be planned early and may involve a team approach including physical, occupational, and speech therapy, as well as cognitive therapy for those with severe cognitive dysfunction.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 324. Traumatic Brain Injury, and Chapter 350. 
Rehabilitation.'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 60 + } + ], + "source": [ + "response_with_rag_4 = generate_rag_response(question_4)\n", + "response_with_rag_4" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ukLJz60rI78h" + }, + "source": [ + "### Storing the RAG system outputs\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": { + "id": "8TQn5dssxTJV", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 175 + }, + "outputId": "e81b7960-21bd-4119-f2ce-6e27c831be12" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " questions \\\n", + "0 What is the protocol for managing sepsis in a ... \n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... \n", + "3 What treatments are recommended for a person w... \n", + "\n", + " base_prompt_responses \\\n", + "0 Managing sepsis in a critical care unit involv... \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... \n", + "2 Sudden patchy hair loss, often referred to as ... \n", + "3 The treatment for a person who has sustained a... \n", + "\n", + " responses_with_prompt_eng \\\n", + "0 The management of sepsis in a critical care un... \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... \n", + "2 Sudden patchy hair loss, often referred to as ... \n", + "3 Treatment for a person who has sustained a phy... \n", + "\n", + " responses_with_RAG \n", + "0 Answer:\\nThe protocol for managing sepsis in a... \n", + "1 Answer:\\nThe common symptoms of appendicitis i... \n", + "2 Answer:\\nThe effective treatment for sudden pa... \n", + "3 Answer:\\nInitial treatment for a person who ha... " + ], + "text/html": [ + "\n", + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\"><th></th><th>questions</th><th>base_prompt_responses</th><th>responses_with_prompt_eng</th><th>responses_with_RAG</th></tr>\n", + "  </thead>\n", + "  <tbody>\n",
+ "    <tr><th>0</th><td>What is the protocol for managing sepsis in a ...</td><td>Managing sepsis in a critical care unit involv...</td><td>The management of sepsis in a critical care un...</td><td>Answer:\\nThe protocol for managing sepsis in a...</td></tr>\n",
+ "    <tr><th>1</th><td>What are the common symptoms for appendicitis,...</td><td>Common symptoms of appendicitis include:\\n\\n1....</td><td>Common symptoms of appendicitis include:\\n\\n1....</td><td>Answer:\\nThe common symptoms of appendicitis i...</td></tr>\n",
+ "    <tr><th>2</th><td>What are the effective treatments or solutions...</td><td>Sudden patchy hair loss, often referred to as ...</td><td>Sudden patchy hair loss, often referred to as ...</td><td>Answer:\\nThe effective treatment for sudden pa...</td></tr>\n",
+ "    <tr><th>3</th><td>What treatments are recommended for a person w...</td><td>The treatment for a person who has sustained a...</td><td>Treatment for a person who has sustained a phy...</td><td>Answer:\\nInitial treatment for a person who ha...</td></tr>\n",
+ "  </tbody>\n", + "</table>\n", + "</div>
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "result_df", + "summary": "{\n \"name\": \"result_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"questions\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"base_prompt_responses\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. **Abdominal Pain**: Typically starts around the navel and then moves to the lower right abdomen.\\n2. **Loss of Appetite**: A sudden decrease in appetite is common.\\n3. **Nausea and Vomiting**: Often follows the onset of abdominal pain.\\n4. **Fever**: A low-grade fever may develop.\\n5. **Constipation or Diarrhea**: Changes in bowel habits can occur.\\n6. **Abdominal Swelling**: In some cases, the abdomen may become swollen.\\n\\nAppendicitis cannot be effectively treated with medication alone. The standard treatment is surgical removal of the appendix, known as an **appendectomy**. This can be performed using two main techniques:\\n\\n1. **Open Appendectomy**: A larger incision is made in the lower right abdomen to remove the appendix.\\n2. 
**Laparoscopic Appendectomy**: This is a minimally invasive procedure where several small incisions are made, and the appendix is removed with the aid of a camera and special instruments.\\n\\nLaparoscopic appendectomy is often preferred due to its benefits, including less postoperative pain, shorter recovery time, and minimal scarring. However, the choice of procedure may depend on the patient's specific situation and the surgeon's expertise.\",\n \"The treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), can vary widely depending on the severity of the injury, the specific areas of the brain affected, and the resulting impairments. Here are some common approaches to treatment:\\n\\n1. **Emergency Care**: \\n - Immediate medical attention is crucial. This may involve stabilizing the patient, monitoring vital signs, and performing imaging studies (like CT or MRI scans) to assess the extent of the injury.\\n\\n2. **Surgical Interventions**: \\n - In some cases, surgery may be necessary to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medication**: \\n - Medications may be prescribed to manage symptoms such as pain, seizures, or inflammation. Corticosteroids may be used to reduce swelling in the brain.\\n\\n4. **Rehabilitation**: \\n - **Physical Therapy**: To improve mobility and strength.\\n - **Occupational Therapy**: To help with daily living skills and regain independence.\\n - **Speech and Language Therapy**: To address communication difficulties and swallowing issues.\\n - **Neuropsychological Therapy**: To help with cognitive rehabilitation, including memory, attention, and problem-solving skills.\\n\\n5. **Psychological Support**: \\n - Counseling or therapy may be beneficial for coping with emotional and psychological challenges following a brain injury, such as depression, anxiety, or changes in personality.\\n\\n6. 
**Lifestyle Modifications**: \\n - Patients may need to make adjustments to their daily routines, including rest, nutrition, and avoiding activities that could lead to further injury.\\n\\n7. **Supportive Care**: \\n - Family support and education about the injury and its effects can be crucial for recovery. Support groups may also be helpful.\\n\\n8. **Long-term Management**: \\n - Ongoing follow-up with healthcare providers to monitor recovery and manage any long-term effects or complications.\\n\\n9. **Assistive Devices**: \\n - Depending on the nature of the impairment, assistive devices or technology may be recommended to aid in communication, mobility, or daily activities.\\n\\n10. **Alternative Therapies**: \\n - Some individuals may explore complementary therapies such as acupuncture, yoga, or meditation, although these should be discussed with a healthcare provider.\\n\\nIt's important for treatment plans to be individualized, taking into account the specific needs and circumstances of the person affected. A multidisciplinary team approach is often the most\",\n \"Managing sepsis in a critical care unit involves a systematic approach that includes early recognition, prompt intervention, and ongoing monitoring. The following is a general protocol based on current guidelines, such as those from the Surviving Sepsis Campaign:\\n\\n### 1. **Early Recognition**\\n - **Identify Symptoms**: Look for signs of infection (fever, chills, tachycardia, tachypnea) and organ dysfunction (altered mental status, hypotension, oliguria).\\n - **Use Screening Tools**: Utilize tools like the qSOFA (quick Sequential Organ Failure Assessment) or SIRS (Systemic Inflammatory Response Syndrome) criteria to identify patients at risk.\\n\\n### 2. 
**Initial Assessment**\\n - **Obtain Vital Signs**: Monitor blood pressure, heart rate, respiratory rate, and temperature.\\n - **Assess Organ Function**: Evaluate renal function (urine output, creatinine), liver function (bilirubin, liver enzymes), and coagulation status (platelets, INR).\\n\\n### 3. **Immediate Interventions**\\n - **Fluid Resuscitation**: Administer intravenous (IV) fluids (crystalloids) promptly, typically 30 mL/kg within the first 3 hours.\\n - **Antibiotic Therapy**: Start broad-spectrum IV antibiotics within 1 hour of recognition of sepsis. Adjust based on culture results and sensitivity.\\n - **Source Control**: Identify and control the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n### 4. **Monitoring and Support**\\n - **Hemodynamic Monitoring**: Use invasive monitoring (e.g., arterial line, central venous pressure) if necessary to guide fluid resuscitation and vasopressor therapy.\\n - **Vasopressors**: If hypotension persists despite adequate fluid resuscitation, initiate vasopressors (e.g., norepinephrine) to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n - **Oxygenation and Ventilation**: Provide supplemental oxygen and consider mechanical ventilation if respiratory failure occurs.\\n\\n### 5. **Ongoing Management**\\n - **Reassess Fluid Status**: Continuously evaluate the patient's response to fluids and adjust as necessary.\\n - **Monitor Laboratory Values**: Regularly check lactate levels, complete blood counts, and organ function tests to assess the patient's status.\\n - **Nutritional Support**: Initiate enteral nutrition as soon as feasible, typically within 24\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"responses_with_prompt_eng\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. 
Abdominal pain, often starting near the belly button and then moving to the lower right abdomen.\\n2. Loss of appetite.\\n3. Nausea and vomiting.\\n4. Fever.\\n5. Constipation or diarrhea.\\n6. Abdominal swelling.\\n\\nAppendicitis cannot be effectively treated with medication alone; it typically requires surgical intervention. The standard surgical procedure for treating appendicitis is an **appendectomy**, which involves the removal of the inflamed appendix. This can be performed as an open surgery or laparoscopically, depending on the case and the surgeon's preference.\",\n \"Treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), may include the following approaches:\\n\\n1. **Immediate Medical Care**: \\n - Stabilization of the patient, including airway management, breathing support, and circulation.\\n - Imaging studies (CT or MRI) to assess the extent of the injury.\\n\\n2. **Surgical Interventions**: \\n - Surgery may be required to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medications**: \\n - Anti-inflammatory drugs to reduce swelling.\\n - Anticonvulsants to prevent seizures.\\n - Pain management medications.\\n\\n4. **Rehabilitation**: \\n - Physical therapy to improve mobility and strength.\\n - Occupational therapy to assist with daily living activities.\\n - Speech therapy for communication and swallowing difficulties.\\n - Neuropsychological therapy to address cognitive and emotional challenges.\\n\\n5. **Supportive Care**: \\n - Psychological support and counseling for emotional and behavioral issues.\\n - Family education and support to help caregivers understand the injury and its effects.\\n\\n6. 
**Long-term Management**: \\n - Regular follow-up with healthcare providers to monitor recovery and manage any ongoing symptoms or complications.\\n\\nThe specific treatment plan will depend on the severity of the injury, the areas of the brain affected, and the individual patient's needs. Early intervention and a multidisciplinary approach are crucial for optimal recovery.\",\n \"The management of sepsis in a critical care unit typically follows the Surviving Sepsis Campaign guidelines. Here\\u2019s a concise protocol:\\n\\n1. **Early Recognition**: Identify sepsis using clinical criteria (e.g., suspected infection plus organ dysfunction).\\n\\n2. **Immediate Resuscitation**:\\n - **Fluid Resuscitation**: Administer IV fluids (30 mL/kg of crystalloids within the first 3 hours).\\n - **Vasopressors**: If hypotension persists after fluid resuscitation, initiate norepinephrine to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n\\n3. **Antibiotic Therapy**:\\n - Administer broad-spectrum antibiotics within 1 hour of recognition of sepsis.\\n - Adjust based on culture results and clinical response.\\n\\n4. **Source Control**:\\n - Identify and manage the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n5. **Monitoring**:\\n - Monitor vital signs, urine output, and laboratory parameters closely.\\n - Use lactate levels to guide resuscitation efforts.\\n\\n6. **Supportive Care**:\\n - Provide organ support as needed (e.g., mechanical ventilation, renal replacement therapy).\\n - Consider corticosteroids in cases of septic shock.\\n\\n7. **Reassessment**:\\n - Reassess hemodynamic status and organ function frequently.\\n - Adjust treatment based on response.\\n\\n8. 
**Follow-Up**:\\n - Continue monitoring for complications and adjust care plans accordingly.\\n\\nThis protocol emphasizes timely intervention and continuous assessment to improve outcomes in patients with sepsis.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"responses_with_RAG\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Answer:\\nThe common symptoms of appendicitis include epigastric or periumbilical pain followed by nausea, vomiting, and anorexia, with pain shifting to the right lower quadrant. Classic signs include right lower quadrant tenderness at McBurney's point, Rovsing sign, psoas sign, and obturator sign. Appendicitis cannot be cured via medicine; the treatment is surgical removal, specifically an open or laparoscopic appendectomy.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 11. Acute Abdomen & Surgical Gastroenterology, pages 163.\",\n \"Answer:\\nInitial treatment for a person who has sustained a physical injury to brain tissue includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for severe injuries to monitor and treat intracranial pressure, decompress the brain, or remove hematomas. Subsequently, many patients require rehabilitation, which should be planned early and may involve a team approach including physical, occupational, and speech therapy, as well as cognitive therapy for those with severe cognitive dysfunction.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 324. Traumatic Brain Injury, and Chapter 350. Rehabilitation.\",\n \"Answer:\\nThe protocol for managing sepsis in a critical care unit includes the following steps: \\n1. Obtain specimens of blood, body fluids, and wound sites for Gram stain and culture before starting parenteral antibiotics.\\n2. 
Initiate very prompt empiric antibiotic therapy immediately after suspecting sepsis, which may include gentamicin or tobramycin plus a 3rd-generation cephalosporin (e.g., cefotaxime or ceftriaxone), or ceftazidime plus a fluoroquinolone if Pseudomonas is suspected. Vancomycin should be added if resistant staphylococci or enterococci are suspected, and if there is an abdominal source, include a drug effective against anaerobes (e.g., metronidazole).\\n3. Change the antibiotic regimen based on culture and sensitivity results when available, continuing antibiotics for at least 5 days after shock resolves and evidence of infection subsides.\\n4. Drain abscesses and surgically excise necrotic tissues as necessary.\\n5. Monitor and manage blood glucose levels with a continuous IV insulin infusion to maintain glucose between 80 to 110 mg/dL.\\n6. Provide supportive care, including adequate nutrition and prevention of infections and complications.\\n\\nSource:\\nCritical Care Medicine, Chapter 222. 
Approach to the Critically Ill Patient.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 61 + } + ], + "source": [ + "# Add the results to a new column in the DataFrame\n", + "result_df['responses_with_RAG'] = [response_with_rag_1, response_with_rag_2, response_with_rag_3, response_with_rag_4]\n", + "\n", + "# Display the DataFrame\n", + "result_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yyQrTipNfuBN" + }, + "source": [ + "## Output Evaluation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tY9j1MVei5eI" + }, + "source": [ + "#### **Defining required System Prompts**" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": { + "id": "2dDxkZdyKSnh", + "tags": [] + }, + "outputs": [], + "source": [ + "groundedness_rater_system_message = \"\"\"\n", + "You are tasked with rating AI generated answers to questions posed by users.\n", + "You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.\n", + "In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.\n", + "\n", + "Evaluation criteria:\n", + "The task is to judge the extent to which the metric is followed by the answer.\n", + "1 - The metric is not followed at all\n", + "2 - The metric is followed only to a limited extent\n", + "3 - The metric is followed to a good extent\n", + "4 - The metric is followed mostly\n", + "5 - The metric is followed completely\n", + "\n", + "Metric:\n", + "The answer should be derived only from the information presented in the context\n", + "\n", + "Instructions:\n", + "1. First write down the steps that are needed to evaluate the answer as per the metric.\n", + "2. 
Give a step-by-step explanation if the answer adheres to the metric considering the question and context as the input.\n", + "3. Next, evaluate the extent to which the metric is followed.\n", + "4. Use the previous information to rate the answer using the evaluation criteria and assign a score.\n", + "\n", + "Return only the Score in last in a dictionary format not json and score should be in the range of 1 to 5.\n", + "Example {groundedness_score:4}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": { + "id": "NIosu2Wk7OVs", + "tags": [] + }, + "outputs": [], + "source": [ + "relevance_rater_system_message = \"\"\"\n", + "You are tasked with rating AI generated answers to questions posed by users.\n", + "You will be presented a question, context used by the AI system to generate the answer and an AI generated answer to the question.\n", + "In the input, the question will begin with ###Question, the context will begin with ###Context while the AI generated answer will begin with ###Answer.\n", + "\n", + "Evaluation criteria:\n", + "The task is to judge the extent to which the metric is followed by the answer.\n", + "1 - The metric is not followed at all\n", + "2 - The metric is followed only to a limited extent\n", + "3 - The metric is followed to a good extent\n", + "4 - The metric is followed mostly\n", + "5 - The metric is followed completely\n", + "\n", + "Metric:\n", + "Relevance measures how well the answer addresses the main aspects of the question, based on the context.\n", + "Consider whether all and only the important aspects are contained in the answer when evaluating relevance.\n", + "\n", + "Instructions:\n", + "1. First write down the steps that are needed to evaluate the context as per the metric.\n", + "2. Give a step-by-step explanation if the context adheres to the metric considering the question as the input.\n", + "3. Next, evaluate the extent to which the metric is followed.\n", + "4. 
Use the previous information to rate the context using the evaluation criteria and assign a score.\n", + "Return only the Score in last in a dictionary format not json and score should be in the range of 1 to 5.\n", + "Example {relevance_score:4}\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": { + "id": "7boMupgh_Gux", + "tags": [] + }, + "outputs": [], + "source": [ + "user_message_template = \"\"\"\n", + "###Question\n", + "{question}\n", + "\n", + "###Context\n", + "{context}\n", + "\n", + "###Answer\n", + "{answer}\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7QPs3ApgzB_D" + }, + "source": [ + "#### **Defining the LLM-as-a-Judge Evaluation function**" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": { + "id": "dXxdhhEDxatr" + }, + "outputs": [], + "source": [ + "def generate_ground_relevance_response(user_input,response, max_tokens=500, temperature=0, top_p=0.95): # Complete the code to set default parameters\n", + " # Retrieve the context chunks for the question from the retriever\n", + " context_for_query = [doc.page_content for doc in retriever.invoke(input=user_input)]\n", + "\n", + " # Combine user_prompt and system_message to create the prompt\n", + " groundedness_prompt = f\"\"\"[INST]{groundedness_rater_system_message}\\n\n", + " {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=response)}\n", + " [/INST]\"\"\"\n", + "\n", + " # Combine user_prompt and system_message to create the prompt\n", + " relevance_prompt = f\"\"\"[INST]{relevance_rater_system_message}\\n\n", + " {'user'}: {user_message_template.format(context=context_for_query, question=user_input, answer=response)}\n", + " [/INST]\"\"\"\n", + "\n", + " response_1 = client.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\", # Complete the code by specifying the model to be used.\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": groundedness_prompt}\n", + " 
],\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " top_p=top_p\n", + " )\n", + "\n", + " response_2 = client.chat.completions.create(\n", + " model=\"gpt-3.5-turbo\", # Complete the code by specifying the model to be used.\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": relevance_prompt}\n", + " ],\n", + " max_tokens=max_tokens,\n", + " temperature=temperature,\n", + " top_p=top_p\n", + " )\n", + "\n", + " return response_1.choices[0].message.content,response_2.choices[0].message.content" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h_kKHCe2mqpE" + }, + "source": [ + "#### **Evaluation 1: Base Prompt Response Evaluation**" + ] + }, + { + "cell_type": "code", + "execution_count": 135, + "metadata": { + "id": "iJ-lchv-sVJ6", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6b74cd9a-5843-4df3-f6e5-4248f6823caa" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer as per the metric, the following steps need to be taken:\n", + "1. Identify if the information provided in the answer is directly derived from the context given.\n", + "2. Check if there are any additional details or information in the answer that is not present in the context.\n", + "3. Ensure that the answer does not include any information that is not mentioned in the context.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides a detailed protocol for managing sepsis in a critical care unit. It includes steps for early recognition, initial assessment, immediate interventions, monitoring and support, and ongoing management. The information provided in the answer aligns with the context given, which outlines the protocol for managing sepsis in a critical care unit. 
The answer does not introduce any new information that is not present in the context.\n", + "\n", + "Therefore, the answer follows the metric of being derived only from the information presented in the context.\n", + "\n", + "Based on the evaluation criteria, the score for groundedness would be 5. \n", + "\n", + "{groundedness_score: 5}\n", + "\n", + "{relevance_score: 3}\n" + ] + } + ], + "source": [ + "# Question 1\n", + "ground_1,rel_1 = generate_ground_relevance_response(user_input=result_df.questions[0], response=result_df.base_prompt_responses[0], max_tokens=516)\n", + "print(ground_1,end=\"\\n\\n\")\n", + "print(rel_1)" + ] + }, + { + "cell_type": "code", + "execution_count": 136, + "metadata": { + "id": "mjQHf0DgnNEQ", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "63e679f4-2f79-43d0-8d8a-1558f6cd3fc4" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer based on the metric provided, we need to follow these steps:\n", + "1. Identify if the information provided in the answer is solely derived from the context given.\n", + "2. Check if any additional information not present in the context is included in the answer.\n", + "3. Determine if the answer directly addresses the question asked without introducing new information.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides a list of common symptoms of appendicitis and details the standard treatment procedures for appendicitis, including open appendectomy and laparoscopic appendectomy. The information presented in the answer aligns with the details provided in the context regarding the symptoms, diagnosis, and treatment of appendicitis. 
The answer does not introduce any new information that is not present in the context.\n", + "\n", + "Therefore, the answer follows the metric of being derived only from the information presented in the context.\n", + "\n", + "Based on the evaluation criteria:\n", + "The answer demonstrates a complete adherence to the metric by solely utilizing the information provided in the context to address the question about the symptoms and treatment of appendicitis.\n", + "\n", + "Hence, the groundedness score for this answer is 5. \n", + "\n", + "{groundedness_score: 5}\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question: Understand the key components of the question, which in this case are the common symptoms of appendicitis and the treatment options available.\n", + "\n", + "2. Analyze the context: Review the provided context to see if it contains information related to the common symptoms of appendicitis and the treatment options, specifically focusing on surgical procedures and medication.\n", + "\n", + "3. Check for completeness: Ensure that all important aspects related to the question are covered in the context, including details about symptoms and treatment options.\n", + "\n", + "4. Verify accuracy: Confirm that the information provided in the context aligns with medical knowledge and guidelines regarding appendicitis symptoms and treatment.\n", + "\n", + "Based on the steps outlined above, the context provided contains detailed information about the symptoms of appendicitis, including abdominal pain, loss of appetite, nausea, vomiting, fever, changes in bowel habits, and abdominal swelling. 
Additionally, it discusses the standard treatment for appendicitis, which is surgical removal of the appendix through open or laparoscopic appendectomy.\n", + "\n", + "The context also mentions that antibiotics are not curative for appendicitis and that surgical removal is the preferred treatment option. It provides insights into the surgical procedures involved in appendectomy, such as open appendectomy and laparoscopic appendectomy, along with details about antibiotic use before and after surgery.\n", + "\n", + "Overall, the context aligns well with the main aspects of the question by addressing the common symptoms of appendicitis and the surgical procedures used for its treatment. It provides comprehensive information that is relevant to the question posed.\n", + "\n", + "Therefore, based on the evaluation, the context adheres to the metric of relevance to a good extent.\n", + "\n", + "{relevance_score: 3}\n" + ] + } + ], + "source": [ + "# Question 2\n", + "ground_2,rel_2 = generate_ground_relevance_response(user_input=result_df.questions[1], response=result_df.base_prompt_responses[1], max_tokens=516) #Complete the code to calculate the groundedness and relevance score\n", + "print(ground_2,end=\"\\n\\n\")\n", + "print(rel_2)" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": { + "id": "3VmHQIa-nX9W", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d468d5d3-7436-42ab-bdb9-184922c0f9db" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer as per the metric, the following steps need to be followed:\n", + "1. Identify if the information provided in the answer is solely derived from the context given.\n", + "2. Check if the answer includes any additional information not present in the context.\n", + "3. 
Verify if the treatments and causes mentioned in the answer align with the information provided in the context regarding sudden patchy hair loss and alopecia areata.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides information on the possible causes and effective treatments for sudden patchy hair loss, specifically focusing on alopecia areata. It mentions autoimmune disorders, genetics, stress, hormonal changes, nutritional deficiencies, infections, and other medical conditions as possible causes. For effective treatments, it lists topical corticosteroids, minoxidil, intralesional corticosteroid injections, immunotherapy, oral medications, light therapy, nutritional support, stress management, and cosmetic solutions like wigs.\n", + "\n", + "The context also discusses alopecia areata as sudden patchy hair loss, autoimmune disorders affecting hair follicles, and the treatment options including topical corticosteroids, minoxidil, and other therapies. It also mentions possible causes such as autoimmune disorders, genetics, stress, hormonal changes, nutritional deficiencies, infections, and other medical conditions.\n", + "\n", + "The AI-generated answer aligns well with the information provided in the context regarding the causes and treatments for sudden patchy hair loss, staying grounded in the context.\n", + "\n", + "Therefore, the answer follows the metric of being derived only from the information presented in the context.\n", + "\n", + "Based on the evaluation criteria, the score for groundedness would be 5. \n", + "\n", + "{groundedness_score: 5}\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps need to be followed:\n", + "\n", + "1. Identify the main aspects of the question: The question asks about effective treatments or solutions for sudden patchy hair loss and the possible causes behind it.\n", + "2. 
Determine if the context provides information on the effective treatments or solutions for sudden patchy hair loss, including localized bald spots on the scalp.\n", + "3. Check if the context mentions the possible causes behind sudden patchy hair loss, such as autoimmune disorders, genetics, stress, hormonal changes, nutritional deficiencies, infections, and other medical conditions.\n", + "4. Evaluate whether all and only the important aspects related to treatments and causes of sudden patchy hair loss are addressed in the context.\n", + "\n", + "Explanation:\n", + "The context provided includes detailed information on the causes and treatments for sudden patchy hair loss, which aligns well with the main aspects of the question. It covers autoimmune disorders, genetics, stress, hormonal changes, nutritional deficiencies, infections, and other medical conditions as possible causes. Additionally, it lists effective treatments such as topical corticosteroids, minoxidil, corticosteroid injections, immunotherapy, oral medications, light therapy, nutritional support, stress management, and cosmetic solutions like wigs.\n", + "\n", + "Based on the evaluation, the context follows the metric of relevance by addressing all the important aspects related to the question about sudden patchy hair loss, its causes, and effective treatments.\n", + "\n", + "Therefore, the relevance score for this context is 5. 
\n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 3\n", + "ground_3,rel_3 = generate_ground_relevance_response(user_input=result_df.questions[2], response=result_df.base_prompt_responses[2], max_tokens=516) #Complete the code to calculate the groundedness and relevance score\n", + "print(ground_3,end=\"\\n\\n\")\n", + "print(rel_3)" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "metadata": { + "id": "ebZtHibwnbe2", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f672d1ff-f5b2-43d8-e173-ae9b40fef49e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer based on the metric provided, we need to ensure that the information presented in the context is the sole source for generating the answer. Here are the steps to evaluate the answer:\n", + "\n", + "1. Identify if the information provided in the answer is directly sourced from the context given.\n", + "2. Check if the details mentioned in the answer align with the information presented in the context.\n", + "3. Verify if there are no additional external sources or information used to formulate the answer.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides a comprehensive list of treatments recommended for a person who has sustained a physical injury to brain tissue, specifically a traumatic brain injury (TBI). The answer includes emergency care, surgical interventions, medication, rehabilitation, psychological support, lifestyle modifications, supportive care, long-term management, assistive devices, and alternative therapies. 
Each treatment option is explained in detail, covering various aspects of care and support.\n", + "\n", + "The answer directly reflects the information provided in the context, mentioning the need for immediate medical attention, surgical interventions to relieve pressure on the brain, rehabilitation through physical, occupational, speech, and neuropsychological therapy, psychological support, lifestyle modifications, family education, long-term management, assistive devices, and alternative therapies. The details align closely with the content presented in the context regarding the recommended treatments for brain injuries.\n", + "\n", + "Therefore, the answer follows the metric of being derived solely from the information presented in the context.\n", + "\n", + "Based on the evaluation, the score for groundedness is 5.\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question:\n", + " - The question asks about the recommended treatments for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function.\n", + "\n", + "2. Analyze the context provided:\n", + " - The context includes information about traumatic brain injury (TBI), rehabilitation, head injury, spinal cord injury, and the pathology of brain injuries.\n", + " - It discusses the importance of rehabilitation, early intervention, cognitive therapy, and family education in the treatment of brain injuries.\n", + " - Surgical interventions, medication, rehabilitation therapies, psychological support, lifestyle modifications, supportive care, long-term management, assistive devices, and alternative therapies are also mentioned.\n", + "\n", + "3. 
Compare the context with the main aspects of the question:\n", + " - The context covers a wide range of treatments and interventions for brain injuries, including emergency care, surgical interventions, medication, rehabilitation, psychological support, lifestyle modifications, supportive care, long-term management, assistive devices, and alternative therapies.\n", + " - It specifically addresses the need for rehabilitation, cognitive therapy, and family education, which are important aspects of treating brain injuries.\n", + "\n", + "Based on the evaluation of the context, it can be seen that the context aligns well with the main aspects of the question by providing detailed information about the recommended treatments for a person with a brain injury. Therefore, the relevance score for the context is 5.\n", + "\n", + "{\n", + "\"relevance_score\": 5\n", + "}\n" + ] + } + ], + "source": [ + "# Question 4\n", + "ground_4,rel_4 = generate_ground_relevance_response(user_input=result_df.questions[3], response=result_df.base_prompt_responses[3], max_tokens=516) #Complete the code to calculate the groundedness and relevance score\n", + "print(ground_4,end=\"\\n\\n\")\n", + "print(rel_4)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Evaluation 1: Base Prompt Response Evaluation**" + ], + "metadata": { + "id": "grgj2iGSRT8a" + } + }, + { + "cell_type": "code", + "source": [ + "# Create a DataFrame to store the base prompt evaluation results\n", + "base_prompt_evaluation_df = pd.DataFrame({\n", + " \"question\": [question_1, question_2, question_3,question_4],\n", + " \"base_prompt_response\": [result_df.base_prompt_responses[0], result_df.base_prompt_responses[1], result_df.base_prompt_responses[2],result_df.base_prompt_responses[3]],\n", + " \"groundedness_score\": [ground_1[-2], ground_2[-2], ground_3[-2],ground_4[-2]],\n", + " \"relevance_score\": [rel_1[-2], rel_2[-2], rel_3[-2],rel_4[-2]]\n", + "})\n", + "\n", + 
"base_prompt_evaluation_df['groundedness_score'] = pd.to_numeric(base_prompt_evaluation_df['groundedness_score'], errors='coerce')\n", + "base_prompt_evaluation_df['relevance_score'] = pd.to_numeric(base_prompt_evaluation_df['relevance_score'], errors='coerce')\n", + "\n", + "# Display the DataFrame\n", + "display(base_prompt_evaluation_df)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 836 + }, + "id": "XHg3NGwrOluD", + "outputId": "cec27acc-dde5-49cb-8dd0-5cd325c5bc7f" + }, + "execution_count": 139, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + " question \\\n", + "0 What is the protocol for managing sepsis in a ... \n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... \n", + "3 What treatments are recommended for a person w... \n", + "\n", + " base_prompt_response groundedness_score \\\n", + "0 Managing sepsis in a critical care unit involv... 5 \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... 5 \n", + "2 Sudden patchy hair loss, often referred to as ... 5 \n", + "3 The treatment for a person who has sustained a... 5 \n", + "\n", + " relevance_score \n", + "0 3.0 \n", + "1 3.0 \n", + "2 5.0 \n", + "3 NaN " + ], + "text/html": [ + "\n", + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>question</th>\n", + "      <th>base_prompt_response</th>\n", + "      <th>groundedness_score</th>\n", + "      <th>relevance_score</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + 
"    <tr>\n", + "      <th>0</th>\n", + "      <td>What is the protocol for managing sepsis in a ...</td>\n", + "      <td>Managing sepsis in a critical care unit involv...</td>\n", + "      <td>5</td>\n", + "      <td>3.0</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>1</th>\n", + "      <td>What are the common symptoms for appendicitis,...</td>\n", + "      <td>Common symptoms of appendicitis include:\\n\\n1....</td>\n", + "      <td>5</td>\n", + "      <td>3.0</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>2</th>\n", + "      <td>What are the effective treatments or solutions...</td>\n", + "      <td>Sudden patchy hair loss, often referred to as ...</td>\n", + "      <td>5</td>\n", + "      <td>5.0</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>3</th>\n", + "      <td>What treatments are recommended for a person w...</td>\n", + "      <td>The treatment for a person who has sustained a...</td>\n", + "      <td>5</td>\n", + "      <td>NaN</td>\n", + "    </tr>\n", + 
"  </tbody>\n", + "</table>
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "base_prompt_evaluation_df", + "summary": "{\n \"name\": \"base_prompt_evaluation_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"question\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"base_prompt_response\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. **Abdominal Pain**: Typically starts around the navel and then moves to the lower right abdomen.\\n2. **Loss of Appetite**: A sudden decrease in appetite is common.\\n3. **Nausea and Vomiting**: Often follows the onset of abdominal pain.\\n4. **Fever**: A low-grade fever may develop.\\n5. **Constipation or Diarrhea**: Changes in bowel habits can occur.\\n6. **Abdominal Swelling**: In some cases, the abdomen may become swollen.\\n\\nAppendicitis cannot be effectively treated with medication alone. The standard treatment is surgical removal of the appendix, known as an **appendectomy**. This can be performed using two main techniques:\\n\\n1. **Open Appendectomy**: A larger incision is made in the lower right abdomen to remove the appendix.\\n2. 
**Laparoscopic Appendectomy**: This is a minimally invasive procedure where several small incisions are made, and the appendix is removed with the aid of a camera and special instruments.\\n\\nLaparoscopic appendectomy is often preferred due to its benefits, including less postoperative pain, shorter recovery time, and minimal scarring. However, the choice of procedure may depend on the patient's specific situation and the surgeon's expertise.\",\n \"The treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), can vary widely depending on the severity of the injury, the specific areas of the brain affected, and the resulting impairments. Here are some common approaches to treatment:\\n\\n1. **Emergency Care**: \\n - Immediate medical attention is crucial. This may involve stabilizing the patient, monitoring vital signs, and performing imaging studies (like CT or MRI scans) to assess the extent of the injury.\\n\\n2. **Surgical Interventions**: \\n - In some cases, surgery may be necessary to relieve pressure on the brain, remove blood clots (hematomas), or repair skull fractures.\\n\\n3. **Medication**: \\n - Medications may be prescribed to manage symptoms such as pain, seizures, or inflammation. Corticosteroids may be used to reduce swelling in the brain.\\n\\n4. **Rehabilitation**: \\n - **Physical Therapy**: To improve mobility and strength.\\n - **Occupational Therapy**: To help with daily living skills and regain independence.\\n - **Speech and Language Therapy**: To address communication difficulties and swallowing issues.\\n - **Neuropsychological Therapy**: To help with cognitive rehabilitation, including memory, attention, and problem-solving skills.\\n\\n5. **Psychological Support**: \\n - Counseling or therapy may be beneficial for coping with emotional and psychological challenges following a brain injury, such as depression, anxiety, or changes in personality.\\n\\n6. 
**Lifestyle Modifications**: \\n - Patients may need to make adjustments to their daily routines, including rest, nutrition, and avoiding activities that could lead to further injury.\\n\\n7. **Supportive Care**: \\n - Family support and education about the injury and its effects can be crucial for recovery. Support groups may also be helpful.\\n\\n8. **Long-term Management**: \\n - Ongoing follow-up with healthcare providers to monitor recovery and manage any long-term effects or complications.\\n\\n9. **Assistive Devices**: \\n - Depending on the nature of the impairment, assistive devices or technology may be recommended to aid in communication, mobility, or daily activities.\\n\\n10. **Alternative Therapies**: \\n - Some individuals may explore complementary therapies such as acupuncture, yoga, or meditation, although these should be discussed with a healthcare provider.\\n\\nIt's important for treatment plans to be individualized, taking into account the specific needs and circumstances of the person affected. A multidisciplinary team approach is often the most\",\n \"Managing sepsis in a critical care unit involves a systematic approach that includes early recognition, prompt intervention, and ongoing monitoring. The following is a general protocol based on current guidelines, such as those from the Surviving Sepsis Campaign:\\n\\n### 1. **Early Recognition**\\n - **Identify Symptoms**: Look for signs of infection (fever, chills, tachycardia, tachypnea) and organ dysfunction (altered mental status, hypotension, oliguria).\\n - **Use Screening Tools**: Utilize tools like the qSOFA (quick Sequential Organ Failure Assessment) or SIRS (Systemic Inflammatory Response Syndrome) criteria to identify patients at risk.\\n\\n### 2. 
**Initial Assessment**\\n - **Obtain Vital Signs**: Monitor blood pressure, heart rate, respiratory rate, and temperature.\\n - **Assess Organ Function**: Evaluate renal function (urine output, creatinine), liver function (bilirubin, liver enzymes), and coagulation status (platelets, INR).\\n\\n### 3. **Immediate Interventions**\\n - **Fluid Resuscitation**: Administer intravenous (IV) fluids (crystalloids) promptly, typically 30 mL/kg within the first 3 hours.\\n - **Antibiotic Therapy**: Start broad-spectrum IV antibiotics within 1 hour of recognition of sepsis. Adjust based on culture results and sensitivity.\\n - **Source Control**: Identify and control the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n### 4. **Monitoring and Support**\\n - **Hemodynamic Monitoring**: Use invasive monitoring (e.g., arterial line, central venous pressure) if necessary to guide fluid resuscitation and vasopressor therapy.\\n - **Vasopressors**: If hypotension persists despite adequate fluid resuscitation, initiate vasopressors (e.g., norepinephrine) to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n - **Oxygenation and Ventilation**: Provide supplemental oxygen and consider mechanical ventilation if respiratory failure occurs.\\n\\n### 5. 
**Ongoing Management**\\n - **Reassess Fluid Status**: Continuously evaluate the patient's response to fluids and adjust as necessary.\\n - **Monitor Laboratory Values**: Regularly check lactate levels, complete blood counts, and organ function tests to assess the patient's status.\\n - **Nutritional Support**: Initiate enteral nutrition as soon as feasible, typically within 24\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"groundedness_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 5,\n \"max\": 5,\n \"num_unique_values\": 1,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"relevance_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.1547005383792515,\n \"min\": 3.0,\n \"max\": 5.0,\n \"num_unique_values\": 2,\n \"samples\": [\n 5.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2J55TQZTnlYE" + }, + "source": [ + "#### **Evaluation 2: Prompt Engineering Response Evaluation**" + ] + }, + { + "cell_type": "code", + "execution_count": 130, + "metadata": { + "id": "6yEkbFmHnxpe", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3cb0547a-3814-4936-9d42-8326d44f356c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer based on the metric provided, which states that the answer should be derived only from the information presented in the context, the following steps need to be followed:\n", + "\n", + "1. Identify the key points and information provided in the context regarding the protocol for managing sepsis in a critical care unit.\n", + "2. Compare the details mentioned in the AI-generated answer with the information presented in the context.\n", + "3. 
Determine if the answer provided by the AI system aligns with the specific details and guidelines outlined in the context.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides a detailed protocol for managing sepsis in a critical care unit, including steps for early recognition, resuscitation, antibiotic therapy, source control, monitoring, supportive care, reassessment, corticosteroid consideration, glucose control, and communication. These steps are not directly mentioned in the context provided.\n", + "\n", + "Therefore, the AI-generated answer does not strictly adhere to the metric of deriving the answer solely from the information presented in the context. The answer goes beyond the specific details provided in the context and introduces additional guidelines and recommendations that are not explicitly stated in the context.\n", + "\n", + "Based on the evaluation criteria:\n", + "The answer partially follows the metric, as it includes some relevant information but also introduces additional details not present in the context.\n", + "\n", + "Hence, the score for groundedness is 2.\n", + "\n", + "{relevance_score:3}\n" + ] + } + ], + "source": [ + "# Question 1\n", + "ground_1,rel_1 = generate_ground_relevance_response(user_input=result_df.questions[0], response=result_df.responses_with_prompt_eng[0], max_tokens=516)\n", + "print(ground_1,end=\"\\n\\n\")\n", + "print(rel_1)" + ] + }, + { + "cell_type": "code", + "execution_count": 131, + "metadata": { + "id": "BHZzDnLsnxpf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2125425c-d1bf-4f80-99f2-821310bda46c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Steps to evaluate the answer:\n", + "1. Identify if the information provided in the answer is solely based on the context.\n", + "2. Check if the symptoms of appendicitis, treatment options, and surgical procedures mentioned in the answer are supported by the context.\n", + "3. 
Evaluate if any additional information not present in the context is included in the answer.\n", + "\n", + "Explanation:\n", + "The answer provided lists common symptoms of appendicitis and explains that appendicitis cannot be cured with medicine alone, requiring surgical intervention in the form of an appendectomy. It also mentions the two types of surgical procedures for appendicitis treatment - open surgery and laparoscopic surgery. All the information provided in the answer is directly derived from the context given.\n", + "\n", + "The symptoms mentioned in the answer align with the symptoms described in the context, such as abdominal pain, loss of appetite, nausea, vomiting, fever, constipation or diarrhea, and abdominal swelling. The answer also correctly states that appendicitis cannot be cured with medication alone and requires surgical removal of the inflamed appendix, which is supported by the context mentioning appendectomy as the treatment for acute appendicitis.\n", + "\n", + "Therefore, the answer follows the metric of being derived only from the information presented in the context.\n", + "\n", + "Score: {groundedness_score: 5}\n", + "\n", + "Steps to evaluate the context for relevance:\n", + "1. Identify the main aspects of the question, which include common symptoms of appendicitis and the treatment options (medicine or surgical procedure).\n", + "2. Check if the context provides information on the common symptoms of appendicitis and the treatment options mentioned in the question.\n", + "3. Evaluate whether the context covers all the important aspects related to appendicitis symptoms and treatment options.\n", + "4. Consider if the context aligns with the question and provides relevant information that directly addresses the query.\n", + "\n", + "Explanation:\n", + "The context provided includes detailed information about the etiology, symptoms, diagnosis, and treatment of appendicitis. 
It covers the common symptoms of appendicitis, such as abdominal pain, loss of appetite, nausea, vomiting, fever, constipation or diarrhea, and abdominal swelling. Additionally, it explains that appendicitis cannot be effectively treated with medication alone and typically requires surgical intervention in the form of an appendectomy. The context also describes the surgical procedure for treating appendicitis, including open or laparoscopic appendectomy, IV fluids, and antibiotics.\n", + "\n", + "Based on the evaluation criteria, the context aligns well with the question by providing comprehensive information on the common symptoms of appendicitis and the surgical procedure required for its treatment. It covers all the important aspects related to the question and directly addresses the query about symptoms and treatment options for appendicitis.\n", + "\n", + "Therefore, the relevance score for the context is 5. \n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 2\n", + "ground_2,rel_2 = generate_ground_relevance_response(user_input=result_df.questions[1], response=result_df.responses_with_prompt_eng[1], max_tokens=516) #Complete the code to calculate the groundedness and relevance score\n", + "print(ground_2,end=\"\\n\\n\")\n", + "print(rel_2)" + ] + }, + { + "cell_type": "code", + "execution_count": 132, + "metadata": { + "id": "v4dCtQMCnxpg", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0e699b91-e40b-4d8e-e01a-31a2feb93c61" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer as per the metric, the following steps need to be followed:\n", + "1. Identify if the information provided in the answer is directly derived from the context.\n", + "2. Check if any additional information not present in the context is included in the answer.\n", + "3. 
Ensure that the treatments and possible causes mentioned in the answer are supported by the context provided.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides information on effective treatments and possible causes for sudden patchy hair loss, specifically focusing on alopecia areata. The context mentions various treatments and causes related to hair loss, including corticosteroids, minoxidil, immunotherapy, genetics, stress, hormonal changes, and nutritional deficiencies. The answer aligns with the context by including treatments like corticosteroids, minoxidil, immunotherapy, and JAK inhibitors, which are supported by the context. The possible causes mentioned in the answer, such as autoimmune response, genetics, stress, hormonal changes, nutritional deficiencies, and infections, are also supported by the context.\n", + "\n", + "Therefore, the answer follows the metric by deriving information solely from the context provided.\n", + "\n", + "Based on the evaluation, the score for groundedness is 5.\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps need to be followed:\n", + "\n", + "1. Identify the main aspects of the question: The main aspects of the question include effective treatments or solutions for sudden patchy hair loss (alopecia areata) and the possible causes behind it.\n", + "\n", + "2. Check if the context provides information on effective treatments: Look for details on treatments such as corticosteroids, minoxidil, immunotherapy, anthralin, JAK inhibitors, light therapy, and supportive care.\n", + "\n", + "3. Check if the context mentions possible causes: Look for information on autoimmune response, genetics, stress, hormonal changes, nutritional deficiencies, and infections as potential causes of sudden patchy hair loss.\n", + "\n", + "4. 
Evaluate if all and only the important aspects are contained in the context: Ensure that the context addresses both the effective treatments and possible causes related to sudden patchy hair loss.\n", + "\n", + "Based on the evaluation steps, the context provided aligns well with the metric of relevance. It covers the effective treatments and possible causes of sudden patchy hair loss as requested in the question. The context includes detailed information on various treatment options and potential causes, providing a comprehensive overview of the topic.\n", + "\n", + "Therefore, the relevance score for the context is 5. \n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 3\n", + "ground_3,rel_3 = generate_ground_relevance_response(user_input=result_df.questions[2], response=result_df.responses_with_prompt_eng[2], max_tokens=516) #Complete the code to calculate the groundedness and relevance score\n", + "print(ground_3,end=\"\\n\\n\")\n", + "print(rel_3)" + ] + }, + { + "cell_type": "code", + "execution_count": 133, + "metadata": { + "id": "jWBc97uynxpg", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ff7fd2ce-2ede-4136-b8fb-53aa9ad6a711" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer as per the metric, the following steps need to be followed:\n", + "1. Identify if the information provided in the answer is directly derived from the context.\n", + "2. Check if the treatments and possible causes mentioned in the answer are supported by the information given in the context.\n", + "3. Ensure that no additional information beyond what is provided in the context is included in the answer.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides information on effective treatments and possible causes for sudden patchy hair loss, specifically focusing on alopecia areata. 
The context mentions various treatments for different types of hair loss, including corticosteroids, minoxidil, immunotherapy, anthralin, JAK inhibitors, and light therapy. It also discusses possible causes such as autoimmune response, genetics, stress, hormonal changes, nutritional deficiencies, and infections. The answer aligns with the information presented in the context and does not introduce any new information.\n", + "\n", + "Therefore, the answer follows the metric by deriving all the information solely from the context.\n", + "\n", + "Based on the evaluation, the score for groundedness is 5.\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question:\n", + " - Effective treatments for sudden patchy hair loss\n", + " - Possible causes behind sudden patchy hair loss\n", + "\n", + "2. Check if the context provides information related to the effective treatments and possible causes of sudden patchy hair loss.\n", + "\n", + "3. Ensure that the treatments and causes mentioned in the context align with the information required to address the question.\n", + "\n", + "Explanation:\n", + "The context provided includes detailed information about the different types of hair loss, including sudden patchy hair loss (alopecia areata). It covers various treatments and possible causes related to sudden patchy hair loss, such as corticosteroids, minoxidil, immunotherapy, genetics, stress, hormonal changes, and more. The context also emphasizes the importance of consulting healthcare professionals for accurate diagnosis and treatment plans.\n", + "\n", + "Based on the evaluation, the context aligns well with the main aspects of the question regarding effective treatments and possible causes of sudden patchy hair loss. It provides comprehensive information that directly addresses the user's query.\n", + "\n", + "Therefore, the relevance score for the context is 5. 
\n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 4\n", + "ground_4,rel_4 = generate_ground_relevance_response(user_input=result_df.questions[3], response=result_df.responses_with_prompt_eng[3], max_tokens=516) # Groundedness and relevance for the prompt-engineered response\n", + "print(ground_4,end=\"\\n\\n\")\n", + "print(rel_4)" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create a DataFrame to store the prompt engineering evaluation results\n", + "def last_digit(text):\n", + "    # Take the last digit in the evaluator output as the score (None if there is no digit).\n", + "    # Indexing with [-2] picks up a newline when the output ends with a brace on its own line.\n", + "    return next((int(ch) for ch in reversed(text) if ch.isdigit()), None)\n", + "\n", + "prompt_engineering_evaluation_df = pd.DataFrame({\n", + "    \"question\": [question_1, question_2, question_3, question_4],\n", + "    \"prompt_engg_response\": [result_df.responses_with_prompt_eng[0], result_df.responses_with_prompt_eng[1], result_df.responses_with_prompt_eng[2], result_df.responses_with_prompt_eng[3]],\n", + "    \"groundedness_score\": [last_digit(ground_1), last_digit(ground_2), last_digit(ground_3), last_digit(ground_4)],\n", + "    \"relevance_score\": [last_digit(rel_1), last_digit(rel_2), last_digit(rel_3), last_digit(rel_4)]\n", + "})\n", + "\n", + "prompt_engineering_evaluation_df['groundedness_score'] = pd.to_numeric(prompt_engineering_evaluation_df['groundedness_score'], errors='coerce')\n", + "prompt_engineering_evaluation_df['relevance_score'] = pd.to_numeric(prompt_engineering_evaluation_df['relevance_score'], errors='coerce')\n", + "\n", + "# Display the DataFrame\n", + "display(prompt_engineering_evaluation_df)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 637 + }, + "id": "YIF50xSFTDD5", + "outputId": "4e3c0984-df4f-44e4-c653-e60a1c086d70" + }, + "execution_count": 134, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + " question \\\n", + "0 What is the protocol for managing sepsis in a ... \n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... \n", + "3 What treatments are recommended for a person w... 
\n", + "\n", + " prompt_engg_response groundedness_score \\\n", + "0 The management of sepsis in a critical care un... 2 \n", + "1 Common symptoms of appendicitis include:\\n\\n1.... 5 \n", + "2 Sudden patchy hair loss, often referred to as ... 5 \n", + "3 Treatment for a person who has sustained a phy... 5 \n", + "\n", + " relevance_score \n", + "0 3 \n", + "1 5 \n", + "2 5 \n", + "3 5 " + ], + "text/html": [ + "\n", + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>question</th>\n", + "      <th>prompt_engg_response</th>\n", + "      <th>groundedness_score</th>\n", + "      <th>relevance_score</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + 
"    <tr>\n", + "      <th>0</th>\n", + "      <td>What is the protocol for managing sepsis in a ...</td>\n", + "      <td>The management of sepsis in a critical care un...</td>\n", + "      <td>2</td>\n", + "      <td>3</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>1</th>\n", + "      <td>What are the common symptoms for appendicitis,...</td>\n", + "      <td>Common symptoms of appendicitis include:\\n\\n1....</td>\n", + "      <td>5</td>\n", + "      <td>5</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>2</th>\n", + "      <td>What are the effective treatments or solutions...</td>\n", + "      <td>Sudden patchy hair loss, often referred to as ...</td>\n", + "      <td>5</td>\n", + "      <td>5</td>\n", + "    </tr>\n", + 
"    <tr>\n", + "      <th>3</th>\n", + "      <td>What treatments are recommended for a person w...</td>\n", + "      <td>Treatment for a person who has sustained a phy...</td>\n", + "      <td>5</td>\n", + "      <td>5</td>\n", + "    </tr>\n", + 
"  </tbody>\n", + "</table>
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "prompt_engineering_evaluation_df", + "summary": "{\n \"name\": \"prompt_engineering_evaluation_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"question\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"prompt_engg_response\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Common symptoms of appendicitis include:\\n\\n1. Abdominal pain, often starting near the belly button and then moving to the lower right abdomen.\\n2. Loss of appetite.\\n3. Nausea and vomiting.\\n4. Fever.\\n5. Constipation or diarrhea.\\n6. Abdominal swelling.\\n\\nAppendicitis cannot be effectively treated with medication alone; it typically requires surgical intervention. The standard surgical procedure for treating appendicitis is an appendectomy, which involves the removal of the inflamed appendix. This can be performed as an open surgery or laparoscopically, depending on the case and the surgeon's preference.\",\n \"Treatment for a person who has sustained a physical injury to brain tissue, such as a traumatic brain injury (TBI), typically involves a multidisciplinary approach and may include the following:\\n\\n1. **Emergency Care**: Immediate medical attention may involve stabilizing the patient, ensuring adequate oxygenation, and managing intracranial pressure.\\n\\n2. 
**Surgery**: In some cases, surgical intervention may be necessary to remove hematomas, repair skull fractures, or relieve pressure on the brain.\\n\\n3. **Medications**: \\n - **Analgesics** for pain management.\\n - **Anticonvulsants** to prevent seizures.\\n - **Diuretics** to reduce swelling.\\n - **Corticosteroids** may be used to decrease inflammation.\\n\\n4. **Rehabilitation**: \\n - **Physical therapy** to improve mobility and strength.\\n - **Occupational therapy** to assist with daily living activities.\\n - **Speech therapy** for communication and swallowing difficulties.\\n - **Neuropsychological therapy** to address cognitive and emotional challenges.\\n\\n5. **Supportive Care**: This may include counseling, support groups, and education for patients and families about the injury and recovery process.\\n\\n6. **Long-term Management**: Ongoing assessment and management of cognitive, emotional, and physical impairments may be necessary, including regular follow-ups with healthcare providers.\\n\\nThe specific treatment plan will depend on the severity of the injury, the areas of the brain affected, and the individual needs of the patient.\",\n \"The management of sepsis in a critical care unit typically follows the Surviving Sepsis Campaign guidelines. Here\\u2019s a concise protocol:\\n\\n1. **Early Recognition**: Identify sepsis using clinical criteria (e.g., suspected infection plus organ dysfunction).\\n\\n2. **Immediate Resuscitation**:\\n - **Fluid Resuscitation**: Administer intravenous fluids (30 mL/kg of crystalloids within the first 3 hours).\\n - **Vasopressors**: If hypotension persists after fluid resuscitation, initiate norepinephrine to maintain mean arterial pressure (MAP) \\u2265 65 mmHg.\\n\\n3. **Antibiotic Therapy**:\\n - Administer broad-spectrum antibiotics within 1 hour of sepsis recognition. Adjust based on culture results and local antibiograms.\\n\\n4. 
**Source Control**:\\n - Identify and manage the source of infection (e.g., drainage of abscess, removal of infected devices).\\n\\n5. **Monitoring**:\\n - Continuously monitor vital signs, urine output, and laboratory parameters (e.g., lactate levels, complete blood count, renal function).\\n\\n6. **Supportive Care**:\\n - Provide supportive care, including oxygen therapy, mechanical ventilation if needed, and renal replacement therapy for acute kidney injury.\\n\\n7. **Reassessment**:\\n - Reassess hemodynamic status and organ function frequently, adjusting treatment as necessary.\\n\\n8. **Consideration of Corticosteroids**:\\n - In cases of septic shock, consider low-dose corticosteroids (e.g., hydrocortisone) if there is no response to fluid resuscitation and vasopressors.\\n\\n9. **Glucose Control**:\\n - Maintain blood glucose levels between 140-180 mg/dL.\\n\\n10. **Communication and Team Approach**:\\n - Ensure effective communication among the healthcare team and involve specialists as needed.\\n\\nThis protocol should be tailored to individual patient needs and institutional protocols. 
Regular training and updates on sepsis management are essential for critical care staff.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"groundedness_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 2,\n \"max\": 5,\n \"num_unique_values\": 2,\n \"samples\": [\n 5,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"relevance_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 3,\n \"max\": 5,\n \"num_unique_values\": 2,\n \"samples\": [\n 5,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y7wXIFmfn4tX" + }, + "source": [ + "#### **Evaluation 3: RAG Response Evaluation**" + ] + }, + { + "cell_type": "code", + "execution_count": 140, + "metadata": { + "id": "lPGdtRCcoBIi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6ffcdced-bc86-49d4-cf13-74bf236ed95a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer as per the metric, the following steps need to be followed:\n", + "1. Identify if the information provided in the answer is directly derived from the context.\n", + "2. Check if any additional information not present in the context is included in the answer.\n", + "3. Verify if the steps mentioned in the answer align with the protocol for managing sepsis in a critical care unit as described in the context.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides a detailed protocol for managing sepsis in a critical care unit based on the information presented in the context. 
It includes steps such as obtaining specimens for culture, initiating prompt empiric antibiotic therapy, changing antibiotic regimen based on culture and sensitivity results, draining abscesses, monitoring blood glucose levels, and providing supportive care. All the steps mentioned in the answer are directly derived from the context provided.\n", + "\n", + "Therefore, the answer follows the metric completely as it only uses the information presented in the context to formulate the protocol for managing sepsis in a critical care unit.\n", + "\n", + "Score: {groundedness_score: 5}\n", + "\n", + "{\n", + "relevance_score: 5\n", + "}\n" + ] + } + ], + "source": [ + "# Question 1\n", + "ground_1,rel_1 = generate_ground_relevance_response(user_input=result_df.questions[0], response=result_df.responses_with_RAG[0], max_tokens=516) #Complete the code to calculate the groundedness and relevance\n", + "print(ground_1,end=\"\\n\\n\")\n", + "print(rel_1)" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "metadata": { + "id": "ZrB06dkBoBIj", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "bf5873bb-b903-49fa-80c7-58dc07623b87" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Steps to evaluate the answer:\n", + "1. Identify if the information provided in the answer is directly from the context.\n", + "2. Check if the symptoms of appendicitis and the treatment mentioned in the answer are supported by the context.\n", + "3. Ensure that no additional information beyond what is provided in the context is included in the answer.\n", + "\n", + "Explanation:\n", + "The answer directly mentions the common symptoms of appendicitis as described in the context, such as epigastric or periumbilical pain, nausea, vomiting, and anorexia, with pain shifting to the right lower quadrant. It also includes classic signs like right lower quadrant tenderness at McBurney's point, Rovsing sign, psoas sign, and obturator sign. 
The answer correctly states that appendicitis cannot be cured via medicine and requires surgical removal, specifically an open or laparoscopic appendectomy. The source of the information is also mentioned.\n", + "\n", + "Therefore, the answer strictly adheres to the metric by deriving all the information solely from the context provided.\n", + "\n", + "Evaluation: \n", + "The answer follows the metric completely by providing information solely from the context without introducing any external information. Hence, the answer deserves a score of 5.\n", + "\n", + "{groundedness_score: 5}\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question: The question asks about the common symptoms of appendicitis and inquires about the possibility of curing it with medicine or the surgical procedure required for treatment.\n", + "\n", + "2. Check if the context provides information relevant to the question: Look for details in the context that discuss the symptoms of appendicitis, whether it can be cured with medicine, and the surgical procedure for treatment.\n", + "\n", + "3. Evaluate if all and only the important aspects are contained in the answer: Ensure that the answer addresses all the key points mentioned in the question and context, without including irrelevant information.\n", + "\n", + "Explanation:\n", + "The context provides detailed information about the etiology, symptoms, signs, diagnosis, and treatment of appendicitis. It mentions that appendicitis is acute inflammation of the vermiform appendix, resulting in abdominal pain, anorexia, and abdominal tenderness. It also states that the treatment for appendicitis is surgical removal through open or laparoscopic appendectomy. 
Additionally, it specifies the use of IV antibiotics before appendectomy and the continuation of antibiotics in case of perforation.\n", + "\n", + "The AI-generated answer accurately addresses the common symptoms of appendicitis, the inability to cure it with medicine, and the recommended surgical procedure for treatment, which aligns well with the main aspects of the question and context.\n", + "\n", + "Therefore, the relevance of the context to the question is high, as it covers all the important aspects related to appendicitis symptoms, treatment, and surgical procedures.\n", + "\n", + "Based on the evaluation, the relevance score for the context is 5. \n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 2\n", + "ground_2,rel_2 = generate_ground_relevance_response(user_input=result_df.questions[1], response=result_df.responses_with_RAG[1], max_tokens=516) #Complete the code to calculate the groundedness and relevance\n", + "print(ground_2,end=\"\\n\\n\")\n", + "print(rel_2)" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "metadata": { + "id": "pCFMhRGUoBIj", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "e0ea5905-b980-4767-e512-66e23776f685" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer based on the metric provided, we need to follow these steps:\n", + "1. Identify if the information provided in the answer is solely derived from the context given.\n", + "2. Check if the answer directly addresses the question asked.\n", + "3. Determine if the answer includes any additional information not present in the context.\n", + "\n", + "Explanation:\n", + "The AI-generated answer correctly identifies that the effective treatment for sudden patchy hair loss, specifically alopecia areata, is not specified in the provided context. 
It mentions that alopecia areata is thought to be an autoimmune disorder affecting genetically susceptible individuals and lists possible causes such as systemic illnesses, high fever, systemic lupus, endocrine disorders, and nutritional deficiencies. This information is directly derived from the context provided, specifically from the section on alopecia areata and its possible causes.\n", + "\n", + "Therefore, the answer adheres to the metric by providing information solely based on the context and directly addressing the question asked.\n", + "\n", + "Based on the evaluation criteria:\n", + "The answer demonstrates a good extent of adherence to the metric, as it is derived entirely from the context provided and directly addresses the question asked.\n", + "\n", + "Hence, the score for groundedness is 3.\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question:\n", + "- Effective treatments or solutions for sudden patchy hair loss (alopecia areata)\n", + "- Possible causes behind sudden patchy hair loss\n", + "\n", + "2. Check if the context addresses these main aspects:\n", + "- Look for information on effective treatments for sudden patchy hair loss, such as alopecia areata.\n", + "- Check if the context mentions possible causes behind sudden patchy hair loss, including systemic illnesses, autoimmune diseases, and other factors.\n", + "\n", + "3. Evaluate if all and only the important aspects are contained in the answer:\n", + "- Ensure that the context provides relevant information on treatments and causes related to sudden patchy hair loss, specifically alopecia areata.\n", + "\n", + "Now, considering the question and the context provided, the AI-generated answer does address some important aspects related to the question. 
It mentions that alopecia areata is thought to be an autoimmune disorder affecting genetically susceptible individuals and lists possible causes such as systemic illnesses, lupus, endocrine disorders, and nutritional deficiencies. However, it does not specify the effective treatments for sudden patchy hair loss, which was a key aspect of the question.\n", + "\n", + "Based on the evaluation, the relevance of the context can be rated as follows:\n", + "- The context is followed only to a limited extent.\n", + "\n", + "Therefore, the relevance score for the context is 2. \n", + "\n", + "{\n", + "relevance_score: 2\n", + "}\n" + ] + } + ], + "source": [ + "# Question 3\n", + "ground_3,rel_3 = generate_ground_relevance_response(user_input=result_df.questions[2], response=result_df.responses_with_RAG[2], max_tokens=516) #Complete the code to calculate the groundedness and relevance\n", + "print(ground_3,end=\"\\n\\n\")\n", + "print(rel_3)" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "metadata": { + "id": "Tf-SrQOgoBIj", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a504b8a5-4d11-4046-b313-239c040dc746" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "To evaluate the answer based on the metric provided, we need to follow these steps:\n", + "1. Identify if the information provided in the answer is solely derived from the context given.\n", + "2. Check if the answer includes any additional information not present in the context.\n", + "3. Ensure that the answer does not introduce any new concepts or details that are not mentioned in the context.\n", + "4. 
Verify that the answer directly relates to the question asked about treatments recommended for a person with a physical injury to brain tissue.\n", + "\n", + "Explanation:\n", + "The AI-generated answer provides information on the initial treatment for a person with a physical injury to brain tissue, which includes ensuring a reliable airway, maintaining ventilation, oxygenation, and blood pressure. It also mentions the possibility of surgery for severe injuries and the need for subsequent rehabilitation involving a team approach. The answer directly relates to the context provided, which discusses rehabilitation and treatment for traumatic brain injury.\n", + "\n", + "Based on the evaluation criteria, the answer follows the metric of being derived only from the information presented in the context. It does not introduce any new concepts or details that are not mentioned in the context. Therefore, the answer adheres to the metric.\n", + "\n", + "Therefore, the groundedness score for this answer is 5.\n", + "\n", + "To evaluate the context based on the metric of relevance, the following steps can be followed:\n", + "\n", + "1. Identify the main aspects of the question:\n", + "- The question asks about the recommended treatments for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function.\n", + "\n", + "2. 
Analyze the context provided:\n", + "- The context includes information about traumatic brain injury (TBI), rehabilitation, and the importance of early intervention and rehabilitation therapy for patients with cognitive dysfunction.\n", + "- It also mentions the need for a team approach in rehabilitation, including physical, occupational, and speech therapy, as well as cognitive therapy for severe cognitive dysfunction.\n", + "- The context discusses the initial treatment for brain injuries, which involves ensuring a reliable airway, maintaining adequate ventilation, oxygenation, and blood pressure, and the possibility of surgery for severe cases.\n", + "\n", + "3. Evaluate the AI-generated answer:\n", + "- The answer provided includes information about the initial treatment for brain injuries, the possibility of surgery for severe cases, and the importance of rehabilitation for patients with cognitive dysfunction.\n", + "- It mentions the team approach in rehabilitation and the different types of therapy involved.\n", + "\n", + "Based on the evaluation, the AI-generated answer addresses the main aspects of the question by providing information about initial treatment, surgery, and rehabilitation for brain injuries. It includes relevant details from the context that are essential for understanding the recommended treatments for individuals with brain injuries.\n", + "\n", + "Therefore, the relevance score for this context is 5. 
\n", + "\n", + "{relevance_score: 5}\n" + ] + } + ], + "source": [ + "# Question 4\n", + "ground_4,rel_4 = generate_ground_relevance_response(user_input=result_df.questions[3], response=result_df.responses_with_RAG[3], max_tokens= 516) #Complete the code to calculate the groundedness and relevance\n", + "print(ground_4,end=\"\\n\\n\")\n", + "print(rel_4)" + ] + }, + { + "cell_type": "code", + "source": [ + "# Create a DataFrame to store the RAG evaluation results\n", + "# NOTE: [-2] picks the second-to-last character of each evaluator reply. That is the score\n", + "# only when the reply ends with the digit plus one closing character; otherwise\n", + "# pd.to_numeric(..., errors='coerce') below turns the value into NaN.\n", + "Rag_evaluation_df = pd.DataFrame({\n", + " \"question\": [question_1, question_2, question_3,question_4],\n", + " \"rag_response\": [result_df.responses_with_RAG[0], result_df.responses_with_RAG[1], result_df.responses_with_RAG[2],result_df.responses_with_RAG[3]],\n", + " \"groundedness_score\": [ground_1[-2], ground_2[-2], ground_3[-2],ground_4[-2]],\n", + " \"relevance_score\": [rel_1[-2], rel_2[-2], rel_3[-2],rel_4[-2]]\n", + "})\n", + "\n", + "Rag_evaluation_df['groundedness_score'] = pd.to_numeric(Rag_evaluation_df['groundedness_score'], errors='coerce')\n", + "Rag_evaluation_df['relevance_score'] = pd.to_numeric(Rag_evaluation_df['relevance_score'], errors='coerce')\n", + "\n", + "# Display the DataFrame\n", + "display(Rag_evaluation_df)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 505 + }, + "id": "SQnXolywWm--", + "outputId": "520fd586-2969-4969-db78-9d229f7a9e56" + }, + "execution_count": 144, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + " question \\\n", + "0 What is the protocol for managing sepsis in a ... \n", + "1 What are the common symptoms for appendicitis,... \n", + "2 What are the effective treatments or solutions... 
\n", + "3 What treatments are recommended for a person w... \n", + "\n", + " rag_response groundedness_score \\\n", + "0 Answer:\\nThe protocol for managing sepsis in a... 5 \n", + "1 Answer:\\nThe common symptoms of appendicitis i... 
5 \n", + "2 Answer:\\nThe effective treatment for sudden pa... 3 \n", + "3 Answer:\\nInitial treatment for a person who ha... 5 \n", + "\n", + " relevance_score \n", + "0 NaN \n", + "1 5.0 \n", + "2 NaN \n", + "3 5.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
questionrag_responsegroundedness_scorerelevance_score
0What is the protocol for managing sepsis in a ...Answer:\\nThe protocol for managing sepsis in a...5NaN
1What are the common symptoms for appendicitis,...Answer:\\nThe common symptoms of appendicitis i...55.0
2What are the effective treatments or solutions...Answer:\\nThe effective treatment for sudden pa...3NaN
3What treatments are recommended for a person w...Answer:\\nInitial treatment for a person who ha...55.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "Rag_evaluation_df", + "summary": "{\n \"name\": \"Rag_evaluation_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"question\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"What are the common symptoms for appendicitis, and can it be cured via medicine? If not, what surgical procedure should be followed to treat it?\",\n \"What treatments are recommended for a person who has sustained a physical injury to brain tissue, resulting in temporary or permanent impairment of brain function?\",\n \"What is the protocol for managing sepsis in a critical care unit?\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rag_response\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Answer:\\nThe common symptoms of appendicitis include epigastric or periumbilical pain followed by nausea, vomiting, and anorexia, with pain shifting to the right lower quadrant. Classic signs include right lower quadrant tenderness at McBurney's point, Rovsing sign, psoas sign, and obturator sign. Appendicitis cannot be cured via medicine; the treatment is surgical removal, specifically an open or laparoscopic appendectomy.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 11. Acute Abdomen & Surgical Gastroenterology, pages 163.\",\n \"Answer:\\nInitial treatment for a person who has sustained a physical injury to brain tissue includes ensuring a reliable airway and maintaining adequate ventilation, oxygenation, and blood pressure. Surgery may be needed for severe injuries to monitor and treat intracranial pressure, decompress the brain, or remove hematomas. 
Subsequently, many patients require rehabilitation, which should be planned early and may involve a team approach including physical, occupational, and speech therapy, as well as cognitive therapy for those with severe cognitive dysfunction.\\n\\nSource:\\nThe Merck Manual of Diagnosis & Therapy, 19th Edition, Chapter 324. Traumatic Brain Injury, and Chapter 350. Rehabilitation.\",\n \"Answer:\\nThe protocol for managing sepsis in a critical care unit includes the following steps: \\n1. Obtain specimens of blood, body fluids, and wound sites for Gram stain and culture before starting parenteral antibiotics.\\n2. Initiate very prompt empiric antibiotic therapy immediately after suspecting sepsis, which may include gentamicin or tobramycin plus a 3rd-generation cephalosporin (e.g., cefotaxime or ceftriaxone), or ceftazidime plus a fluoroquinolone if Pseudomonas is suspected. Vancomycin should be added if resistant staphylococci or enterococci are suspected, and if there is an abdominal source, include a drug effective against anaerobes (e.g., metronidazole).\\n3. Change the antibiotic regimen based on culture and sensitivity results when available, continuing antibiotics for at least 5 days after shock resolves and evidence of infection subsides.\\n4. Drain abscesses and surgically excise necrotic tissues as necessary.\\n5. Monitor and manage blood glucose levels with a continuous IV insulin infusion to maintain glucose between 80 to 110 mg/dL.\\n6. Provide supportive care, including adequate nutrition and prevention of infections and complications.\\n\\nSource:\\nCritical Care Medicine, Chapter 222. 
Approach to the Critically Ill Patient.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"groundedness_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 3,\n \"max\": 5,\n \"num_unique_values\": 2,\n \"samples\": [\n 3,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"relevance_score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 5.0,\n \"max\": 5.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 5.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "print(\"Average scores for Base Prompt Evaluation:\")\n", + "print(base_prompt_evaluation_df[['groundedness_score', 'relevance_score']].mean(numeric_only=True))\n", + "print(\"Average scores for Prompt engg Evaluation:\")\n", + "print(prompt_engineering_evaluation_df[['groundedness_score', 'relevance_score']].mean(numeric_only=True))\n", + "print(\"\\nAverage scores for RAG Response Evaluation:\")\n", + "print(Rag_evaluation_df[['groundedness_score', 'relevance_score']].mean(numeric_only=True))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ud7OiP5gaZhj", + "outputId": "a4ee9adb-d831-4a45-d2b2-b84395e2c681" + }, + "execution_count": 146, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Average scores for Base Prompt Evaluation:\n", + "groundedness_score 5.000000\n", + "relevance_score 3.666667\n", + "dtype: float64\n", + "Average scores for Prompt engg Evaluation:\n", + "groundedness_score 4.25\n", + "relevance_score 4.50\n", + "dtype: float64\n", + "\n", + "Average scores for RAG Response Evaluation:\n", + "groundedness_score 4.5\n", + "relevance_score 5.0\n", + "dtype: float64\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y7QICRU-njdj" + }, + "source": [ + "## Actionable Insights 
and Business Recommendations" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ObyXYhOojIaY" + }, + "source": [ + "1. Implement Retrieval-Augmented Generation using authoritative medical manuals (e.g., Merck Manual, Harrison’s) to ensure evidence-based answers.\n", + "2. RAG is the best-performing method, with an average relevance score of 5.0, which suggests that the retrieval pipeline is matching medical queries to the right chunks. Caveat: the relevance scores for Questions 1 and 3 were parsed as NaN (the evaluator’s text reports 5 and 2), so this average reflects only Questions 2 and 4.\n", + "3. Groundedness still needs improvement (4.5 vs 5.0): the answers appear clinically correct but are not always fully supported by the retrieved evidence. This is now the main improvement area.\n", + "4. The Base Prompt’s perfect groundedness is misleading: it reflects “correct-looking” medical reasoning, not evidence-based answers. We should not rely on it for clinical use.\n", + "5. Chunking issues are visible because RAG groundedness (4.5) is lower than Base Prompt groundedness (5.0). If RAG were fully optimized, its groundedness should be higher than the Base Prompt’s. This suggests that our chunk size or chunk overlap is suboptimal.\n", + "\n", + "6. The chunking strategy needs refinement: increase the chunk size to 800–1000 tokens with a 150–200 token overlap. Token limits prevented us from testing larger chunks here, but larger chunks with proper overlap are needed to boost evidence density.\n", + "\n", + "7. Add a cross-encoder reranker so that the most evidence-rich chunk is ranked #1, which should improve groundedness from 4.5 toward 5.0.\n", + "8. 
Standardize a clinical answer template:\n", + "Assessment → Management → Red Flags → Evidence\n", + "This ensures the model maps retrieved facts into structured, grounded output." + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Business Recommendations\n", + "\n", + "1. 
Strengthen the RAG pipeline with hybrid retrieval, re-ranking, and strict evidence-only answer generation to ensure fully grounded medical outputs.\n", + "\n", + "2. Expand the evaluation dataset to a comprehensive 30–50 question medical benchmark covering multiple specialties for reliable performance validation.\n", + "\n", + "3. Introduce mandatory citation formatting and full traceability (chunk IDs, page numbers, document metadata) to increase clinical trust and regulatory readiness.\n", + "\n", + "4. Implement safety guardrails, fallback responses, and PHI-safe logging to meet compliance standards and reduce legal or clinical risks.\n", + "\n", + "5. Standardize structured outputs (Assessment → Management → Red Flags → Evidence) to align with clinician workflows and enhance usability.\n", + "\n", + "6. Build monitoring dashboards for groundedness, relevance, retrieval quality, latency, and token cost to support continuous improvement and enterprise scalability." + ], + "metadata": { + "id": "wwwmRGs0icgj" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ybRlzaIhWaM9" + }, + "source": [ + "Power Ahead\n", + "___" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [], + "include_colab_link": true + }, + "kernel_info": { + "name": "python310-sdkv2" + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + }, + "microsoft": { + "host": { + "AzureML": { + "notebookHasBeenCompleted": true + } + }, + "ms_spell_check": { + "ms_spell_check_language": "en" + } + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From 60769d83a51c4e99848421081d4efa797661923c Mon Sep 17 00:00:00 2001 From: biplob <110578485+bks1984@users.noreply.github.com> Date: Mon, 10 Nov 2025 08:50:28 +0530 Subject: [PATCH 2/4] Created using Colab --- 
stock_market__news_sentiment_analysis.ipynb | 6747 +++++++++++++++++++ 1 file changed, 6747 insertions(+) create mode 100644 stock_market__news_sentiment_analysis.ipynb diff --git a/stock_market__news_sentiment_analysis.ipynb b/stock_market__news_sentiment_analysis.ipynb new file mode 100644 index 0000000..d9566e5 --- /dev/null +++ b/stock_market__news_sentiment_analysis.ipynb @@ -0,0 +1,6747 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": "inNE1fy-ISPj", + "metadata": { + "id": "inNE1fy-ISPj" + }, + "source": [ + "

\n", + " \n", + "

\n", + "\n", + "
Stock Market News Sentiment Analysis
" + ] + }, + { + "cell_type": "markdown", + "id": "EvCcfwuSU-fz", + "metadata": { + "id": "EvCcfwuSU-fz" + }, + "source": [ + "## **Problem Statement**" + ] + }, + { + "cell_type": "markdown", + "id": "6QR_RHvIVHT2", + "metadata": { + "id": "6QR_RHvIVHT2" + }, + "source": [ + "### Business Context" + ] + }, + { + "cell_type": "markdown", + "id": "pl3dmH-EnJGl", + "metadata": { + "id": "pl3dmH-EnJGl" + }, + "source": [ + "The prices of the stocks of companies listed under a global exchange are influenced by a variety of factors, with the company's financial performance, innovations and collaborations, and market sentiment being factors that play a significant role. News and media reports can rapidly affect investor perceptions and, consequently, stock prices in the highly competitive financial industry. With the sheer volume of news and opinions from a wide variety of sources, investors and financial analysts often struggle to stay updated and accurately interpret its impact on the market. As a result, investment firms need sophisticated tools to analyze market sentiment and integrate this information into their investment strategies." + ] + }, + { + "cell_type": "markdown", + "id": "Vn6bbxSwVKl3", + "metadata": { + "id": "Vn6bbxSwVKl3" + }, + "source": [ + "### Problem Definition" + ] + }, + { + "cell_type": "markdown", + "id": "jCIswL3zobj6", + "metadata": { + "id": "jCIswL3zobj6" + }, + "source": [ + "With an ever-rising number of news articles and opinions, an investment startup aims to leverage artificial intelligence to address the challenge of interpreting stock-related news and its impact on stock prices. 
They have collected historical daily news for a specific company listed under NASDAQ, along with data on its daily stock price and trade volumes.\n", + "\n", + "As a member of the Data Science and AI team in the startup, you have been tasked with developing an AI-driven sentiment analysis system that will automatically process and analyze news articles to gauge market sentiment, and summarizing the news at a weekly level to enhance the accuracy of their stock price predictions and optimize investment strategies. This will empower their financial analysts with actionable insights, leading to more informed investment decisions and improved client outcomes." + ] + }, + { + "cell_type": "markdown", + "id": "ZJOtDHVSF5hu", + "metadata": { + "id": "ZJOtDHVSF5hu" + }, + "source": [ + "### Data Dictionary" + ] + }, + { + "cell_type": "markdown", + "id": "ZlkjI8V5F9RK", + "metadata": { + "id": "ZlkjI8V5F9RK" + }, + "source": [ + "* `Date` : The date the news was released\n", + "* `News` : The content of news articles that could potentially affect the company's stock price\n", + "* `Open` : The stock price (in \\$) at the beginning of the day\n", + "* `High` : The highest stock price (in \\$) reached during the day\n", + "* `Low` : The lowest stock price (in \\$) reached during the day\n", + "* `Close` : The adjusted stock price (in \\$) at the end of the day\n", + "* `Volume` : The number of shares traded during the day\n", + "* `Label` : The sentiment polarity of the news content\n", + " * 1: positive\n", + " * 0: neutral\n", + " * -1: negative" + ] + }, + { + "cell_type": "markdown", + "id": "VrFQHcW5mYgv", + "metadata": { + "id": "VrFQHcW5mYgv" + }, + "source": [ + "## **Installing and Importing the necessary libraries**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "A-E2-iaumpo8", + "metadata": { + "id": "A-E2-iaumpo8", + "collapsed": true, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": 
"5f12e599-de14-4e2b-adb4-de180cdd4fba" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: numpy==1.26.4 in /usr/local/lib/python3.12/dist-packages (1.26.4)\n", + "Requirement already satisfied: scikit-learn==1.6.1 in /usr/local/lib/python3.12/dist-packages (1.6.1)\n", + "Requirement already satisfied: scipy==1.13.1 in /usr/local/lib/python3.12/dist-packages (1.13.1)\n", + "Requirement already satisfied: gensim==4.3.3 in /usr/local/lib/python3.12/dist-packages (4.3.3)\n", + "Requirement already satisfied: sentence-transformers==3.4.1 in /usr/local/lib/python3.12/dist-packages (3.4.1)\n", + "Requirement already satisfied: pandas==2.2.2 in /usr/local/lib/python3.12/dist-packages (2.2.2)\n", + "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn==1.6.1) (1.5.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn==1.6.1) (3.6.0)\n", + "Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.12/dist-packages (from gensim==4.3.3) (7.3.1)\n", + "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (4.56.1)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (4.67.1)\n", + "Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (2.8.0+cu126)\n", + "Requirement already satisfied: huggingface-hub>=0.20.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (0.35.0)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (11.3.0)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) 
(2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) (2025.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.19.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2025.3.0)\n", + "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (25.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (6.0.2)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2.32.4)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (4.15.0)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (1.1.10)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas==2.2.2) (1.17.0)\n", + "Requirement already satisfied: wrapt in /usr/local/lib/python3.12/dist-packages (from smart-open>=1.8.1->gensim==4.3.3) (1.17.3)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (75.2.0)\n", + "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from 
torch>=1.11.0->sentence-transformers==3.4.1) (1.13.3)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.5)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.80)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (11.3.0.4)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (10.3.7.77)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (11.7.1.2)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.5.4.2)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in 
/usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (2.27.3)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.85)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (1.11.1.6)\n", + "Requirement already satisfied: triton==3.4.0 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.4.0)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (2024.11.6)\n", + "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (0.22.0)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (0.6.2)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch>=1.11.0->sentence-transformers==3.4.1) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers==3.4.1) (3.0.2)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.4.3)\n", + 
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2025.8.3)\n" + ] + } + ], + "source": [ + "# installing the sentence-transformers and gensim libraries for word embeddings\n", + "!pip install numpy==1.26.4 \\\n", + " scikit-learn==1.6.1 \\\n", + " scipy==1.13.1 \\\n", + " gensim==4.3.3 \\\n", + " sentence-transformers==3.4.1 \\\n", + " pandas==2.2.2" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Note:\n", + "- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.\n", + "- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in this notebook." 
+ ], + "metadata": { + "id": "Su4_EiqL5aIZ" + }, + "id": "Su4_EiqL5aIZ" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "179a2a45", + "metadata": { + "id": "179a2a45" + }, + "outputs": [], + "source": [ + "# To manipulate and analyze data\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# To visualize data\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "\n", + "# To use time-related functions\n", + "import time\n", + "\n", + "# To build, tune, and evaluate ML models\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score\n", + "\n", + "# To load/create word embeddings\n", + "from gensim.models import Word2Vec\n", + "\n", + "# To work with transformer models\n", + "import torch\n", + "from sentence_transformers import SentenceTransformer\n", + "\n", + "# Import TensorFlow and Keras for deep learning model building.\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras.models import Sequential\n", + "from tensorflow.keras.layers import Dense, Dropout\n", + "\n", + "# To display progress bars\n", + "from tqdm import tqdm\n", + "tqdm.pandas()\n", + "\n", + "# To ignore unnecessary warnings\n", + "import warnings\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "markdown", + "id": "wQ46zPgumfjF", + "metadata": { + "id": "wQ46zPgumfjF" + }, + "source": [ + "## **Loading the Dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yu_7XbWQWma8", + "metadata": { + "id": "yu_7XbWQWma8" + }, + "outputs": [], + "source": [ + "# # uncomment and run the following code if Google Colab is being used and the dataset is in Google Drive\n", + "# from google.colab import drive\n", + "# drive.mount('/content/drive')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62a33eef",
+ "metadata": { + "id": "62a33eef" + }, + "outputs": [], + "source": [ + "# Read the stock news CSV file into a pandas DataFrame named 'stock_news'\n", + "stock_news = pd.read_csv(\"/content/02. Dataset - stock_news.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1xFSwCCer1uA", + "metadata": { + "id": "1xFSwCCer1uA" + }, + "outputs": [], + "source": [ + "# Create a working copy so the original DataFrame stays unchanged\n", + "stock = stock_news.copy()" + ] + }, + { + "cell_type": "markdown", + "id": "EvFNfrvGWthn", + "metadata": { + "id": "EvFNfrvGWthn" + }, + "source": [ + "## **Data Overview**" + ] + }, + { + "cell_type": "markdown", + "id": "GW4rkWI1WzBb", + "metadata": { + "id": "GW4rkWI1WzBb" + }, + "source": [ + "#### **Displaying the first few rows of the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd2f105b", + "metadata": { + "id": "dd2f105b", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4fce6c84-28cc-4aee-e18c-154c97c9e849" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Date News Open \\\n", + "0 01-02-2019 The dollar minutes ago tumbled to 106 67 from... 38.72 \n", + "1 01-02-2019 By Wayne Cole and Swati Pandey SYDNEY Reuters... 38.72 \n", + "2 01-02-2019 By Stephen Culp NEW YORK Reuters Wall Stre... 38.72 \n", + "3 01-02-2019 By Wayne Cole SYDNEY Reuters The Australia... 38.72 \n", + "4 01-02-2019 Investing com Asian equities fell in morning... 38.72 \n", + "\n", + " High Low Close Volume Label \n", + "0 39.71 38.56 39.48 130672400 1 \n", + "1 39.71 38.56 39.48 130672400 -1 \n", + "2 39.71 38.56 39.48 130672400 0 \n", + "3 39.71 38.56 39.48 130672400 -1 \n", + "4 39.71 38.56 39.48 130672400 1 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "stock", + "summary": "{\n \"name\": \"stock\",\n \"rows\": 418,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 73,\n \"samples\": [\n \"01-08-2019\",\n \"04-15-2019\",\n \"01-30-2019\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"News\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 418,\n \"samples\": [\n \" Reuters Apple Inc NASDAQ AAPL is expected to unveil a new video streaming service and a news subscription platform at an event on Monday at its California headquarters The iPhone maker is banking on growing its services business to offset a dip in smartphone sales While the Wall Street Journal plans to join Apple s new subscription news service other major publishers including the New York Times and the Washington Post have declined according to a New York Times report Apple has also partnered with Hollywood celebrities to make a streaming debut with a slate of original content taking a page out of Netflix NASDAQ NFLX Inc s playbook Below are some of the shows curated from media reports and Apple s own announcements which are part of the iPhone maker s content library SHOWS CONFIRMED BY APPLE UNTITLED DRAMA SERIES WITH REESE WITHERSPOON AND JENNIFER ANISTON Two seasons of a drama series starring Reese Witherspoon and Jennifer Aniston that looks at the lives of people working on a morning television show REVIVAL OF STEVEN SPIELBERG S 1985 AMAZING STORIES The tech giant has also struck a deal with director Steven Spielberg to make new episodes of Amazing Stories a science fiction and horror anthology series that ran on NBC in the 1980s A NEW THRILLER BY M NIGHT SHYAMALAN Plot of the story has not been disclosed ARE YOU SLEEPING A MYSTERY SERIES A drama featuring Octavia Spencer based on a crime novel by Kathleen Barber AN 
ANTHOLOGY SERIES CALLED LITTLE AMERICA Focuses on stories of immigrants coming to the United States AN ANIMATED CARTOON MUSICAL CALLED CENTRAL PARK The animated musical comedy is about a family of caretakers who end up saving the park and the world DICKINSON AN EMILY DICKINSON COMEDY A half hour comedy series that is set during American poet Emily Dickinson s era with a modern sensibility and tone OPRAH WINFREY PARTNERSHIP Apple in June last year announced a multi year deal with Oprah Winfrey to create original programming SHOWS REPORTED BY MEDIA TIME BANDITS A FANTASY SERIES The potential series is an adaptation of Terry Gilliam s 1981 fantasy film of the same name about a young boy who joins a group of renegade time traveling dwarves Deadline reported https UNTITLED CAPTAIN MARVEL STAR BRIE LARSON S CIA PROJECT The new series looks at a young woman s journey in the CIA reported Variety https DEFENDING JACOB STARRING CAPTAIN AMERICA CHRIS EVANS This limited series is based on the novel of the same name and is about an assistant district attorney who is investigating the murder of a 14 year old boy according to Deadline https FOR ALL MANKIND A SCI FI SERIES A space drama from producer Ronald Moore according to Deadline https MY GLORY WAS I HAD SUCH FRIENDS A series featuring Jennifer Garner is based on the 2017 memoir of the same name by Amy Silverstein reported Variety https SEE A FANTASY EPIC STARRING JASON MOMOA The show poses the question about the fate of humanity if everyone lost their sight Variety reported https FOUNDATION A SCI FI ADAPTATION An adaptation of the iconic novel series from famed sci fi author Isaac Asimov Deadline reported The book series follows a mathematician who predicts the collapse of humanity A COMEDY SHOW BY ROB MCELHENNEY AND CHARLIE DAY The sitcom comedy based on the lives of a diverse group of people who work together in a video game development studio Variety reported https AN UNSCRIPTED SERIES HOME FROM THE DOCUMENTARY FILMMAKER 
MATT TYRNAUER The series will offer viewers a never before seen look inside the world s most extraordinary homes and feature interviews with people who built them according to Variety https UNTITLED RICHARD GERE SERIES Based on an Israeli series Nevelot the show is about two elderly Vietnam vets whose lives are changed when a woman they both love is killed in a car accident Deadline reported J J ABRAMS PRODUCED LITTLE VOICE Singer and actress Sara Bareilles is writing the music and could possibly star in the J J Abrams produced half hour show which explores the journey of finding one s authentic voice in early 20s according to Variety THE PEANUTS GANG Apple has acquired the rights to the famous characters and the first series will be a science and math oriented short featuring Snoopy as an astronaut according to Hollywood Reporter ON THE ROCKS A feature film directed by Sofia Coppola starring Bill Murray is about a young mother who reconnects with her larger than life playboy father on an adventure through New York Variety reported https LOSING EARTH Apple has acquired the rights to a TV series based on Nathaniel Rich s 70 page New York Times Magazine story Losing Earth New York Times reported THE ELEPHANT QUEEN Apple has acquired the rights to Victoria Stone and Mark Deeble s documentary The Elephant Queen Deadline reported WOLFWALKERS An Irish animation about a young hunter who comes to Ireland with her father to destroy a pack of evil wolves but instead befriends a wild native girl who runs with them first reported by Bloomberg PACHINKO Apple has secured the rights to develop Min Jin Lee s best selling novel about four generations of a Korean immigrant family into a series reported Variety CALLS Apple has bought the rights to make an English language version of the French original short form series according to Variety SHANTARAM Apple has won the rights to develop the hit novel Shantaram as a drama series reported Variety https SWAGGER A DRAMA SERIES BASED ON 
KEVIN DURANT A drama series based on the early life and career of NBA superstar Kevin Durant according to Variety https YOU THINK IT I LL SAY IT Apple has ordered a 10 episode half hour run of the comedy show which is an adaptation of Curtis Sittenfeld s short story collection by the same name Variety reported https WHIPLASH DIRECTOR DAMIEN CHAZELLE DRAMA SERIES According to Variety Apple has ordered a whole season of a series without first shooting a pilot but no other details are known about the show \\n Apple may offer cut priced bundles with video offering The Information reported on Thursday \",\n \"Investing com Stocks in focus in premarket trade Monday \\n Viacom NASDAQ VIAB jumped 4 2 by 8 04 AM ET 12 04 GMT as the company announced that it had renewed its contract with AT T NYSE T avoiding a blackout of MTV Nickelodeon and Comedy Central for DirecTV users \\n Nike NYSE NKE fell 0 3 after European Union antitrust regulators fined the company 12 5 million euros 14 14 million for restricting cross border sales of merchandising products \\n Apple NASDAQ AAPL dropped 0 3 while markets geared up for a company presentation that is expected to lift the curtain on Monday on a secretive years long effort to build a video streaming prodduct \\n Boeing NYSE BA gained 0 4 as the company preps to brief more than 200 global airline pilots technical leaders and regulators on Wednesday over software and training updates for its 737 MAX aircraft \\n CalAmp NASDAQ CAMP fell 2 4 after JP Morgan downgraded it to neutral from overweight according to Briefing com \\n Winnebago Industries NYSE WGO stock declined 0 9 after the company s fiscal second quarter revenue was lower than expected although earnings per share beat expectations\\n Thermo Fisher Scientific NYSE TMO stock could see movement in the regular session after the company announced that it would acquire Brammer Bio for approximately 1 7 billion in cash \\n Biogen NASDAQ BIIB bounced 1 5 after announcing a new 5 
billion buyback The stock had fallen by nearly one third last week after saying it had halted the development of a drug it had been developing to treat Alzheimer s \",\n \"By Yimou Lee TAIPEI Reuters Terry Gou chairman of Apple NASDAQ AAPL supplier Foxconn said on Wednesday he will contest Taiwan s 2020 presidential election shaking up the political landscape at a time of heightened tension between the self ruled island and Beijing Gou Taiwan s richest person with a net worth of 7 6 billion according to Forbes said he would join the already competitive race and take part in the opposition China friendly Kuomintang KMT primaries His decision capped a flurry of news this week that began when Gou told Reuters on Monday he planned to step down from the world s largest contract manufacturer to pave the way for younger talent to move up the company s ranks He announced on Tuesday he was considering a presidential bid and hinted he was close to a decision when he told more than 100 people packed into a temple he would follow the instruction of a sea goddess who had told him to run in the presidential race Peace stability and Taiwan s economy future are my core values Gou said later at the KMT s headquarters in Taipei He urged the party to rediscover the spirit of the KMT the honor of KMT members and the KMT s lost support of the youth Gou s bid which requires KMT approval comes at a delicate time for cross strait relations and delivers a blow to the ruling pro independence Democratic Progressive NYSE PGR Party which is struggling in opinion polls China Taiwan relations have deteriorated since the island s president Tsai Ing wen of the independence leaning DPP swept to power in 2016 China suspects Tsai is pushing for the island s formal independence That is a red line for China which has never renounced the use of force to bring Taiwan under its control Tsai says she wants to maintain the status quo with China but will defend Taiwan s security and democracy VERY PRO CHINA 
A senior adviser to Tsai told Reuters he thought Gou s bid could create problems given his extensive business ties with China This is problematic to Taiwan s national security the adviser Yao Chia wen said He s very pro China and he represents the class of the wealthy people Will that gain support from Taiwanese Yao said adding he believed Gou would face a tough battle in the KMT primary Tension between Taipei and Beijing escalated again on Monday as Chinese bombers and warships conducted drills around the island prompting Taiwan to scramble jets and ships to monitor the Chinese forces The KMT which once ruled China before fleeing to Taiwan at the end of a civil war with the Communists in 1949 said in February it could sign a peace treaty with Beijing if it won the hotly contested presidential election Zhang Baohui a regional security analyst at Hong Kong s Lingnan University said Gou s run could mark the start of the most unusual election race in Taiwan history This is something entirely fresh for Taiwan politics here is a candidate who sees everything through the pragmatic angle of a businessman rather than raw politics or ideology Zhang told Reuters He has no baggage and that will be a fascinating scenario Gou s news comes as Tsai is grappling with a series of unpopular domestic reform initiatives from a pension scheme to labour law which have come under intense voter scrutiny The KMT said this week Gou had been a party member for more than 50 years and had given it an interest free loan of T 45 million 1 5 million in 2016 under the name of his mother which had signalled his loyalty to the party Foxconn said on Tuesday Gou would remain chairman of Foxconn though he planned to withdraw from daily operations \"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.947134201503234,\n \"min\": 35.99,\n \"max\": 51.84,\n \"num_unique_values\": 69,\n \"samples\": [\n 
43.22,\n 38.72,\n 48.83\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.947413441172774,\n \"min\": 36.43,\n \"max\": 52.12,\n \"num_unique_values\": 67,\n \"samples\": [\n 43.87,\n 39.08,\n 37.96\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.967879507972434,\n \"min\": 35.5,\n \"max\": 51.76,\n \"num_unique_values\": 66,\n \"samples\": [\n 49.54,\n 50.97,\n 38.56\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.999867403609388,\n \"min\": 35.55,\n \"max\": 51.87,\n \"num_unique_values\": 68,\n \"samples\": [\n 48.77,\n 39.08,\n 37.69\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 45745495,\n \"min\": 45448000,\n \"max\": 365248800,\n \"num_unique_values\": 73,\n \"samples\": [\n 216071600,\n 70146400,\n 244439200\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": -1,\n \"max\": 1,\n \"num_unique_values\": 3,\n \"samples\": [\n 1,\n -1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 4 + } + ], + "source": [ + "stock.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "y2ewB36LL9Cz", + "metadata": { + "id": "y2ewB36LL9Cz" + }, + "source": [ + "#### **Understanding the shape of the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "wWx6wqN0MTPw", + "metadata": { + "id": "wWx6wqN0MTPw", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "edd77532-57d5-4ab5-ebc3-b8a4fa737b0a" + }, + "outputs": [ + { + "output_type": 
"execute_result", + "data": { + "text/plain": [ + "(418, 8)" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ], + "source": [ + "stock.shape" + ] + }, + { + "cell_type": "markdown", + "id": "yQjb8QOTivg3", + "metadata": { + "id": "yQjb8QOTivg3" + }, + "source": [ + "**Observations:**\n", + "* There are a total of 418 records with 8 attributes each." + ] + }, + { + "cell_type": "markdown", + "id": "fPLJXhFcMA7N", + "metadata": { + "id": "fPLJXhFcMA7N" + }, + "source": [ + "#### **Checking the data types of the columns**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Gc_eAiMdMVe2", + "metadata": { + "id": "Gc_eAiMdMVe2", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8c61445b-ef88-4a92-b810-0e15bb3000f5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "RangeIndex: 418 entries, 0 to 417\n", + "Data columns (total 8 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Date 418 non-null object \n", + " 1 News 418 non-null object \n", + " 2 Open 418 non-null float64\n", + " 3 High 418 non-null float64\n", + " 4 Low 418 non-null float64\n", + " 5 Close 418 non-null float64\n", + " 6 Volume 418 non-null int64 \n", + " 7 Label 418 non-null int64 \n", + "dtypes: float64(4), int64(2), object(2)\n", + "memory usage: 26.3+ KB\n" + ] + } + ], + "source": [ + "stock.info()" + ] + }, + { + "cell_type": "markdown", + "id": "i1CgPxT5mxEf", + "metadata": { + "id": "i1CgPxT5mxEf" + }, + "source": [ + "Let's convert the Date column to pandas `datetime` type." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ZD5fstuv6ery", + "metadata": { + "id": "ZD5fstuv6ery" + }, + "outputs": [], + "source": [ + "# Convert the 'Date' column in the 'stock' DataFrame to datetime format\n", + "stock['Date'] = pd.to_datetime(stock['Date'])" + ] + }, + { + "cell_type": "markdown", + "id": "8dORemydMDfR", + "metadata": { + "id": "8dORemydMDfR" + }, + "source": [ + "#### **Checking the statistical summary**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gUazWjegMeQl", + "metadata": { + "id": "gUazWjegMeQl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cd88b728-318a-4f67-d74e-71373400a55b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Date Open High Low \\\n", + "count 418 418.000000 418.000000 418.000000 \n", + "mean 2019-02-14 12:24:06.889952256 42.308852 42.787321 41.923732 \n", + "min 2019-01-02 00:00:00 35.990000 36.430000 35.500000 \n", + "25% 2019-01-11 00:00:00 38.130000 38.420000 37.720000 \n", + "50% 2019-01-31 00:00:00 41.530000 42.250000 41.140000 \n", + "75% 2019-03-21 00:00:00 47.190000 47.427500 46.480000 \n", + "max 2019-04-29 00:00:00 51.840000 52.120000 51.760000 \n", + "std NaN 4.947134 4.947413 4.967880 \n", + "\n", + " Close Volume Label \n", + "count 418.000000 4.180000e+02 418.000000 \n", + "mean 42.418517 1.294225e+08 0.308612 \n", + "min 35.550000 4.544800e+07 -1.000000 \n", + "25% 38.270000 1.029072e+08 -1.000000 \n", + "50% 41.610000 1.156272e+08 1.000000 \n", + "75% 47.032500 1.511252e+08 1.000000 \n", + "max 51.870000 3.652488e+08 1.000000 \n", + "std 4.999867 4.574550e+07 0.943473 " + ], + "text/html": [ + "\n", + "
 \n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"stock\",\n \"rows\": 8,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": \"1970-01-01 00:00:00.000000418\",\n \"max\": \"2019-04-29 00:00:00\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"418\",\n \"2019-02-14 12:24:06.889952256\",\n \"2019-03-21 00:00:00\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.29734162506347,\n \"min\": 4.947134201503234,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.308851674641154,\n 47.19,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.18648944299875,\n \"min\": 4.947413441172774,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.78732057416268,\n 47.427499999999995,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.4078256172517,\n \"min\": 4.967879507972434,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 41.923732057416274,\n 46.48,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.30548063571206,\n \"min\": 4.999867403609388,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.41851674641149,\n 47.0325,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 111473859.17448182,\n \"min\": 418.0,\n \"max\": 365248800.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 129422491.86602871,\n 151125200.0,\n 418.0\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"Label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 147.67411440583984,\n \"min\": -1.0,\n \"max\": 418.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 0.30861244019138756,\n 0.9434730920044713,\n -1.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "stock.describe()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "\n", + "- **Date Range and Trading Period**:\n", + " - The data covers January 2, 2019, to April 29, 2019, a span of approximately four months.\n", + "\n", + "- **Price Overview**:\n", + " - **Average Prices**: The average opening price is approximately \\$42.31, while the average closing price is about \\$42.42.\n", + " - **Price Variability**: Opening prices range from \\$35.99 to \\$51.84 and closing prices from \\$35.55 to \\$51.87, a spread of roughly \\$16 over the period.\n", + "\n", + "- **Trading Volume**:\n", + " - The average trading volume is approximately 129.42 million shares, ranging from about 45.45 million to 365.25 million, which indicates widely varying market activity." 
+ ], + "metadata": { + "id": "0wZ7x_5W77tD" + }, + "id": "0wZ7x_5W77tD" + }, + { + "cell_type": "markdown", + "id": "lXRpNWnQMGIY", + "metadata": { + "id": "lXRpNWnQMGIY" + }, + "source": [ + "#### **Checking for duplicate values**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ti4UpPi6M5kM", + "metadata": { + "id": "ti4UpPi6M5kM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "66fad013-2ec1-4ec3-9cfe-842656486d7c" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "stock.duplicated().sum()" + ] + }, + { + "cell_type": "markdown", + "id": "XkwHzJH6k_jx", + "metadata": { + "id": "XkwHzJH6k_jx" + }, + "source": [ + "**Observations:**\n", + "* There are no duplicate rows in the dataset." + ] + }, + { + "cell_type": "markdown", + "id": "fxghULa0MOY-", + "metadata": { + "id": "fxghULa0MOY-" + }, + "source": [ + "#### **Checking for missing values**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yItWheKoNGkf", + "metadata": { + "id": "yItWheKoNGkf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ff1d5335-a019-465d-e7f8-546c97e5873f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Date 0\n", + "News 0\n", + "Open 0\n", + "High 0\n", + "Low 0\n", + "Close 0\n", + "Volume 0\n", + "Label 0\n", + "dtype: int64" + ], + "text/html": [ + "
<div>\n", + "<table>\n", + " <thead>\n", + " <tr><th></th><th>0</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>Date</th><td>0</td></tr>\n", + " <tr><th>News</th><td>0</td></tr>\n", + " <tr><th>Open</th><td>0</td></tr>\n", + " <tr><th>High</th><td>0</td></tr>\n", + " <tr><th>Low</th><td>0</td></tr>\n", + " <tr><th>Close</th><td>0</td></tr>\n", + " <tr><th>Volume</th><td>0</td></tr>\n", + " <tr><th>Label</th><td>0</td></tr>\n", + " </tbody>\n", + "</table>\n", + "</div>
" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ], + "source": [ + "stock.isnull().sum()" + ] + }, + { + "cell_type": "markdown", + "id": "qg7TsQTclDUS", + "metadata": { + "id": "qg7TsQTclDUS" + }, + "source": [ + "**Observations:**\n", + "* There are no missing values." + ] + }, + { + "cell_type": "markdown", + "id": "hGHBK8-QeKOB", + "metadata": { + "id": "hGHBK8-QeKOB" + }, + "source": [ + "## **Exploratory Data Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "Q0UlMQnyegl7", + "metadata": { + "id": "Q0UlMQnyegl7" + }, + "source": [ + "### **Univariate Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "RrznHeBaLu0W", + "metadata": { + "id": "RrznHeBaLu0W" + }, + "source": [ + "#### **Countplot on Label**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "meVjTKoxLpmA", + "metadata": { + "id": "meVjTKoxLpmA", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "dcd02675-45e6-4016-fb12-10ce1aef7018" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGwCAYAAACzXI8XAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAIQ9JREFUeJzt3X1wVOXdh/Hvxrzyko0JkoBsIJTYQHkPECL4jEJqhlYEia1QNMhQmdqIQqpopgiFIlEYgVIDiEUirYyKIyjOCGqUqDQgRFHAEsVCkxp2KdDskthsINnnD8edbsNLWELO3nB9Zs4Me5/dk1+clVycPbux+Xw+nwAAAAwUZvUAAAAAwSJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGCscKsHuNyamppUXV2tjh07ymazWT0OAABoAZ/Pp1OnTqlr164KCzv3eZcrPmSqq6vlcDisHgMAAAShqqpK3bp1O+f+Kz5kOnbsKOm7/xCxsbEWTwMAAFrC4/HI4XD4f46fyxUfMt+/nBQbG0vIAABgmAtdFsLFvgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjBVu9QAAgCtD+iPrrR4BIaR8SW6bfB3OyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMZXnIfPPNN7r77ruVkJCgmJgY9evXT3v27PHv9/l8mjt3rrp06aKYmBhlZWXpq6++snBiAAAQKiwNmX//+98aMWKEIiIi9NZbb+mLL77Q008/rWuvvdZ/n8WLF2vFihVavXq1du3apfbt2ys7O1v19fUWTg4AAEJBuJVf/KmnnpLD4dC6dev8aykpKf4/+3w+LV++XHPmzNG4ceMkSevXr1diYqI2b96siRMntvnMAAAgdFh6RuaNN97QkCFD9LOf/UydO3fWoEGD9Nxzz/n3Hz58WE6nU1lZWf41u92ujIwMlZWVnfWYXq9XHo8nYAMAAFcmS0Pm73//u1atWqXU1FRt27ZN999/vx588EG98MILkiSn0ylJSkxMDHhcYmKif9//KiwslN1u928Oh+PyfhMAAMAyloZMU1OTBg8erEWLFmnQoEGaPn267rvvPq1evTroYxYUFMjtdvu3qqqqVpwYAACEEktDpkuXLurTp0/AWu/evVVZWSlJSkpKkiS5XK6A+7hcLv++/xUVFaXY2NiADQAAXJksDZkRI0aooqIiYO3LL79U9+7dJX134W9SUpJKSkr8+z0ej3bt2qXMzMw2nRUAAIQeS9+1NGvWLN14441atGiRfv7zn+vjjz/WmjVrtGbNGkmSzWbTzJkztXDhQqWmpiolJUWPP/64unbtqvHjx1s5OgAACAGWhszQoUO1adMmFRQUaMGCBUpJSdHy5cs1efJk/31mz56turo6TZ8+XTU1NRo5cqS2bt2q6OhoCycHAAChwObz+XxWD3E5eTwe2e12ud1urpcBgMso/ZH1Vo+AEFK+JPeSHt/Sn9+W/4oCAACAYBEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyA
ADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxlacj87ne/k81mC9jS0tL8++vr65WXl6eEhAR16NBBOTk5crlcFk4MAABCieVnZH70ox/p6NGj/u2jjz7y75s1a5a2bNmijRs3qrS0VNXV1ZowYYKF0wIAgFASbvkA4eFKSkpqtu52u7V27Vpt2LBBo0aNkiStW7dOvXv31s6dOzV8+PCzHs/r9crr9fpvezyeyzM4AACwnOVnZL766it17dpVPXv21OTJk1VZWSlJKi8v1+nTp5WVleW/b1pampKTk1VWVnbO4xUWFsput/s3h8Nx2b8HAABgDUtDJiMjQ8XFxdq6datWrVqlw4cP66abbtKpU6fkdDoVGRmpuLi4gMckJibK6XSe85gFBQVyu93+raqq6jJ/FwAAwCqWvrQ0ZswY/5/79++vjIwMde/eXa+88opiYmKCOmZUVJSioqJaa0QAABDCLH9p6b/FxcXphhtu0KFDh5SUlKSGhgbV1NQE3Mflcp31mhoAAHD1CamQqa2t1ddff60uXbooPT1dERERKikp8e+vqKhQZWWlMjMzLZwSAACECktfWnr44Yc1duxYde/eXdXV1Zo3b56uueYaTZo0SXa7XdOmTVN+fr7i4+MVGxurGTNmKDMz85zvWAIAAFcXS0Pmn//8pyZNmqQTJ07ouuuu08iRI7Vz505dd911kqRly5YpLCxMOTk58nq9ys7O1sqVK60cGQAAhBCbz+fzWT3E5eTxeGS32+V2uxUbG2v1OABwxUp/ZL3VIyCElC/JvaTHt/Tnd0hdIwMAAHAxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGCtkQubJJ5+UzWbTzJkz/Wv19fXKy8tTQkKCOnTooJycHLlcLuuGBAAAISUkQmb37t169tln1b9//4D1WbNmacuWLdq4caNKS0tVXV2tCRMmWDQlAAAINZaHTG1trSZPnqznnntO1157rX/d7XZr7dq1Wrp0qUaNGqX09HStW7dOf/3rX7Vz504LJwYAAKHC8pDJy8vTT3/6U2VlZQWsl5eX6/Tp0wHraWlpSk5OVllZ2TmP5/V65fF4AjYAAHBlCrfyi7/00kv65JNPtHv37mb7nE6nIiMjFRcXF7CemJgop9N5zmMWFhZq/vz5rT0qAAAIQZadkamqqtJDDz2kF198UdHR0a123IKCArndbv9WVVXVascGAAChxbKQKS8v17FjxzR48GCFh4crPDxcpaWlWrFihcLDw5WYmKiGhgbV1NQEPM7lcikpKemcx42KilJsbGzABgAArkyWvbQ0evRo7du3L2Bt6tSpSktL06OPPiqHw6GIiAiVlJQoJydHklRRUaHKykplZmZaMTIAAAgxloVMx44d1bdv34C19u3bKyEhwb8+bdo05efnKz4+XrGxsZoxY4YyMzM1fPhwK0YGAAAhxtKLfS9k2bJlCgsLU05Ojrxer7Kzs7Vy5UqrxwIAACEipEJm+/btAbejo6NVVFSkoqIiawYCAAAhzfLPkQEAAAgWIQMAAIxFyAAAAGMFFTI9e/bUiRMnmq3X1NSoZ8+elzwUAABASwQVMkeOHFFjY2Ozda/Xq2+++eaShwIAAGiJi3rX0htvvOH/87Zt22S32/23GxsbVVJSoh49erTacAAAAOdzUSEzfvx4SZLNZtOUKVMC9kVERKhHj
x56+umnW204AACA87mokGlqapIkpaSkaPfu3erUqdNlGQoAAKAlgvpAvMOHD7f2HAAAABct6E/2LSkpUUlJiY4dO+Y/U/O9559//pIHAwAAuJCgQmb+/PlasGCBhgwZoi5dushms7X2XAAAABcUVMisXr1axcXFuueee1p7HgAAgBYL6nNkGhoadOONN7b2LAAAABclqJD55S9/qQ0bNrT2LAAAABclqJeW6uvrtWbNGr377rvq37+/IiIiAvYvXbq0VYYDAAA4n6BC5vPPP9fAgQMlSfv37w/Yx4W/AACgrQQVMu+//35rzwEAAHDRgrpG5nuHDh3Stm3b9J///EeS5PP5WmUoAACAlggqZE6cOKHRo0frhhtu0E9+8hMdPXpUkjRt2jT95je/adUBAQAAziWokJk1a5YiIiJUWVmpdu3a+dfvuusubd26tdWGAwAAOJ+grpF5++23tW3bNnXr1i1gPTU1Vf/4xz9aZTAAAIALCeqMTF1dXcCZmO+dPHlSUVFRlzwUAABASwQVMjfddJPWr1/vv22z2dTU1KTFixfrlltuabXhAAAAzieol5YWL16s0aNHa8+ePWpoaNDs2bN14MABnTx5Ujt27GjtGQEAAM4qqDMyffv21ZdffqmRI0dq3Lhxqqur04QJE/Tpp5/qBz/4QWvPCAAAcFZBnZGRJLvdrt/+9retOQsAAMBFCeqMzLp167Rx48Zm6xs3btQLL7xwyUMBAAC0RFBnZAoLC/Xss882W+/cubOmT5+uKVOmXPJgoST9kfUXvhOuKuVLcq0eAQCgIM/IVFZWKiUlpdl69+7dVVlZeclDAQAAtERQIdO5c2d9/vnnzdY/++wzJSQkXPJQAAAALRFUyEyaNEkPPvig3n//fTU2NqqxsVHvvfeeHnroIU2cOLG1ZwQAADiroK6R+f3vf68jR45o9OjRCg//7hBNTU3Kzc3VokWLWnVAAACAc7nokPH5fHI6nSouLtbChQu1d+9excTEqF+/furevfvlmBEAAOCsggqZXr166cCBA0pNTVVqaurlmAsAAOCCLvoambCwMKWmpurEiROXYx4AAIAWC+pi3yeffFKPPPKI9u/f39rzAAAAtFhQF/vm5ubq22+/1YABAxQZGamYmJiA/SdPnmyV4QAAAM4nqJBZvnx5K48BAABw8YIKmSvtVxAAAAAzBXWNjCR9/fXXmjNnjiZNmqRjx45Jkt566y0dOHCg1YYDAAA4n6BCprS0VP369dOuXbv02muvqba2VtJ3v6Jg3rx5rTogAADAuQQVMo899pgWLlyod955R5GRkf71UaNGaefOna02HAAAwPkEFTL79u3THXfc0Wy9c+fOOn78+CUPBQAA0BJBhUxcXJyOHj3abP3TTz/V9ddff8lDAQAAtERQITNx4kQ9+uijcjqdstlsampq0o4dO/Twww8rNze3xcdZtWqV+vfvr9jYWMXGxiozM1NvvfWWf399fb3y8vKUkJCgDh06KCcnRy6XK5iRAQDAFSiokFm0aJHS0tLkcDhUW1urPn366KabbtKNN96oOXPmtPg43bp105NPPqny8nLt2bNHo0aN0rhx4/zvfJo1a5a2bNmijRs3qrS0VNXV1ZowYUIwIwMAgCuQzefz+YJ9cFVVlfbt26e6ujoNGjRIvXr1uuSB4uPjtWTJEt1555267rrrtGHDBt15552SpIMHD6p3794qKyvT8OHDz/p4r9crr9frv+3xeORwOOR2uxUbGxvUTOmPrA/qcbhylS9p+ZlH4GrB35X4b5f696TH45Hdbr/gz++gP0dm7dq1GjNmjO644w7dfffdGj9+vP70pz8Fezg1NjbqpZdeUl1dnTIzM1VeXq7Tp08rKyvLf5+0tDQlJyerrKzsnMcpLCyU3W73bw6HI+iZAABAaAvqk33nzp2rpUuXasaMGcrMzJQklZWVadasWaqsrNSCBQtafKx9+/YpMzNT9fX16tChgzZt2qQ+ffpo7
969ioyMVFxcXMD9ExMT5XQ6z3m8goIC5efn+29/f0YGAABceYIKmVWrVum5557TpEmT/Gu33367+vfvrxkzZlxUyPzwhz/U3r175Xa79eqrr2rKlCkqLS0NZixJUlRUlKKiooJ+PAAAMEdQIXP69GkNGTKk2Xp6errOnDlzUceKjIz0X1uTnp6u3bt36w9/+IPuuusuNTQ0qKamJuCsjMvlUlJSUjBjAwCAK0xQ18jcc889WrVqVbP1NWvWaPLkyZc0UFNTk7xer9LT0xUREaGSkhL/voqKClVWVvpfzgIAAFe3oM7ISN9d7Pv222/73z20a9cuVVZWKjc3N+AalaVLl57zGAUFBRozZoySk5N16tQpbdiwQdu3b9e2bdtkt9s1bdo05efnKz4+XrGxsf5rcs71jiUAAHB1CSpk9u/fr8GDB0v67rdgS1KnTp3UqVMn7d+/338/m8123uMcO3ZMubm5Onr0qOx2u/r3769t27bpxz/+sSRp2bJlCgsLU05Ojrxer7Kzs7Vy5cpgRgYAAFegoELm/fffb5Uvvnbt2vPuj46OVlFRkYqKilrl6wEAgCtL0J8jAwAAYDVCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGsjRkCgsLNXToUHXs2FGdO3fW+PHjVVFREXCf+vp65eXlKSEhQR06dFBOTo5cLpdFEwMAgFBiaciUlpYqLy9PO3fu1DvvvKPTp0/r1ltvVV1dnf8+s2bN0pYtW7Rx40aVlpaqurpaEyZMsHBqAAAQKsKt/OJbt24NuF1cXKzOnTurvLxc//d//ye32621a9dqw4YNGjVqlCRp3bp16t27t3bu3Knhw4dbMTYAAAgRIXWNjNvtliTFx8dLksrLy3X69GllZWX575OWlqbk5GSVlZWd9Rher1cejydgAwAAV6aQCZmmpibNnDlTI0aMUN++fSVJTqdTkZGRiouLC7hvYmKinE7nWY9TWFgou93u3xwOx+UeHQAAWCRkQiYvL0/79+/XSy+9dEnHKSgokNvt9m9VVVWtNCEAAAg1ll4j870HHnhAb775pj744AN169bNv56UlKSGhgbV1NQEnJVxuVxKSko667GioqIUFRV1uUcGAAAhwNIzMj6fTw888IA2bdqk9957TykpKQH709PTFRERoZKSEv9aRUWFKisrlZmZ2dbjAgCAEGPpGZm8vDxt2LBBr7/+ujp27Oi/7sVutysmJkZ2u13Tpk1Tfn6+4uPjFRsbqxkzZigzM5N3LAEAAGtDZtWqVZKkm2++OWB93bp1uvfeeyVJy5YtU1hYmHJycuT1epWdna2VK1e28aQAACAUWRoyPp/vgveJjo5WUVGRioqK2mAiAABgkpB51xIAAMDFImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGMvSk
Pnggw80duxYde3aVTabTZs3bw7Y7/P5NHfuXHXp0kUxMTHKysrSV199Zc2wAAAg5FgaMnV1dRowYICKiorOun/x4sVasWKFVq9erV27dql9+/bKzs5WfX19G08KAABCUbiVX3zMmDEaM2bMWff5fD4tX75cc+bM0bhx4yRJ69evV2JiojZv3qyJEyee9XFer1der9d/2+PxtP7gAAAgJITsNTKHDx+W0+lUVlaWf81utysjI0NlZWXnfFxhYaHsdrt/czgcbTEuAACwQMiGjNPplCQlJiYGrCcmJvr3nU1BQYHcbrd/q6qquqxzAgAA61j60tLlEBUVpaioKKvHAAAAbSBkz8gkJSVJklwuV8C6y+Xy7wMAAFe3kA2ZlJQUJSUlqaSkxL/m8Xi0a9cuZWZmWjgZAAAIFZa+tFRbW6tDhw75bx8+fFh79+5VfHy8kpOTNXPmTC1cuFCpqalKSUnR448/rq5du2r8+PHWDQ0AAEKGpSGzZ88e3XLLLf7b+fn5kqQpU6aouLhYs2fPVl1dnaZPn66amhqNHDlSW7duVXR0tFUjAwCAEGJpyNx8883y+Xzn3G+z2bRgwQItWLCgDacCAACmCNlrZAAAAC6EkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYKt3oAAMFJf2S91SMghJQvybV6BMASnJEBAADGImQAAICxCBkAAGAsQgYAABjLiJApKipSjx49FB0drYyMDH388cdWjwQAAEJAyIfMyy+/rPz8fM2bN0+ffPKJBgwYoOzsbB07dszq0QAAgMVCPmSWLl2q++67T1OnTlWfPn20evVqtWvXTs8//7zVowEAAIuF9OfINDQ0qLy8XAUFBf61sLAwZWVlqays7KyP8Xq98nq9/ttut1uS5PF4gp6j0fufoB+LK9OlPJ9aC89L/Deekwg1l/qc/P7xPp/vvPcL6ZA5fvy4GhsblZiYGLCemJiogwcPnvUxhYWFmj9/frN1h8NxWWbE1cn+x19ZPQIQgOckQk1rPSdPnTolu91+zv0hHTLBKCgoUH5+vv92U1OTTp48qYSEBNlsNgsnM5/H45HD4VBVVZViY2OtHgfgOYmQw3Oy9fh8Pp06dUpdu3Y97/1COmQ6deqka665Ri6XK2Dd5XIpKSnprI+JiopSVFRUwFpcXNzlGvGqFBsby/+gCCk8JxFqeE62jvOdifleSF/sGxkZqfT0dJWUlPjXmpqaVFJSoszMTAsnAwAAoSCkz8hIUn5+vqZMmaIhQ4Zo2LBhWr58uerq6jR16lSrRwMAABYL+ZC566679K9//Utz586V0+nUwIEDtXXr1mYXAOPyi4qK0rx585q9dAdYheckQg3PybZn813ofU0AAAAhKqSvkQEAADgfQgYAABiLkAEAAMYiZAAAgLEIGbTIa6+9pltvvdX/Ccl79+61eiRc5YqKitSjRw9FR0crIyNDH3/8sdUj4Sr2wQcfaOzYseratatsNps2b95s9UhXDUIGLVJXV6eRI0fqqaeesnoUQC+//LLy8/M1b948ffLJJxowYICys7N17Ngxq0fDVaqurk4DBgxQUVGR1aNcdXj7NS7KkSNHlJKSok8//VQDBw60ehxcpTIyMjR06FA988wzkr77xG+Hw6EZM2boscces3g6XO1sNps2bdqk8ePHWz3KVYEzMgCM0tDQoPLycmVlZfnXwsLClJWVpbKyMgsnA2AFQgaAUY4fP67GxsZmn+6dmJgop9Np0
VQArELIoJkXX3xRHTp08G8ffvih1SMBAHBWIf+7ltD2br/9dmVkZPhvX3/99RZOAwTq1KmTrrnmGrlcroB1l8ulpKQki6YCYBXOyKCZjh07qlevXv4tJibG6pEAv8jISKWnp6ukpMS/1tTUpJKSEmVmZlo4GQArcEYGLXLy5ElVVlaqurpaklRRUSFJSkpK4l/BaHP5+fmaMmWKhgwZomHDhmn58uWqq6vT1KlTrR4NV6na2lodOnTIf/vw4cPau3ev4uPjlZycbOFkVz7efo0WKS4uPusPiXnz5ul3v/td2w+Eq94zzzyjJUuWyOl0auDAgVqxYkXAS6JAW9q+fbtuueWWZutTpkxRcXFx2w90FSFkAACAsbhGBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgaAcYqLixUXF3fJx7HZbNq8efMlHweAdQgZAJa49957NX78eKvHAGA4QgYAABiLkAEQcpYuXap+/fqpffv2cjgc+vWvf63a2tpm99u8ebNSU1MVHR2t7OxsVVVVBex//fXXNXjwYEVHR6tnz56aP3++zpw501bfBoA2QMgACDlhYWFasWKFDhw4oBdeeEHvvfeeZs+eHXCfb7/9Vk888YTWr1+vHTt2qKamRhMnTvTv//DDD5Wbm6uHHnpIX3zxhZ599lkVFxfriSeeaOtvB8BlxG+/BmCJe++9VzU1NS262PbVV1/Vr371Kx0/flzSdxf7Tp06VTt37lRGRoYk6eDBg+rdu7d27dqlYcOGKSsrS6NHj1ZBQYH/OH/5y180e/ZsVVdXS/ruYt9NmzZxrQ5gsHCrBwCA//Xuu++qsLBQBw8elMfj0ZkzZ1RfX69vv/1W7dq1kySFh4dr6NCh/sekpaUpLi5Of/vb3zRs2DB99tln2rFjR8AZmMbGxmbHAWA2QgZASDly5Ihuu+023X///XriiScUHx+vjz76SNOmTVNDQ0OLA6S2tlbz58/XhAkTmu2Ljo5u7bEBWISQARBSysvL1dTUpKefflphYd9dxvfKK680u9+ZM2e0Z88eDRs2TJJUUVGhmpoa9e7dW5I0ePBgVVRUqFevXm03PIA2R8gAsIzb7dbevXsD1jp16qTTp0/rj3/8o8aOHasdO3Zo9erVzR4bERGhGTNmaMWKFQoPD9cDDzyg4cOH+8Nm7ty5uu2225ScnKw777xTYWFh+uyzz7R//34tXLiwLb49AG2Ady0BsMz27ds1aNCggO3Pf/6zli5dqqeeekp9+/bViy++qMLCwmaPbdeunR599FH94he/0IgRI9ShQwe9/PLL/v3Z2dl688039fbbb2vo0KEaPny4li1bpu7du7fltwjgMuNdSwAAwFickQEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGCs/wfB2y9rKPmapAAAAABJRU5ErkJggg==\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.countplot(data=stock, x='Label', stat=\"percent\");" + ] + }, + { + "cell_type": "markdown", + "id": "nXPvfQr-Avd7", + "metadata": { + "id": "nXPvfQr-Avd7" + }, + "source": [ + "**Observations:**\n", + "* The dataset is imbalanced for the sentiment polarities.\n", + "* There is more news content with positive polarity compared to other types." 
+ ] + }, + { + "cell_type": "markdown", + "id": "dpGHhbGeeoF8", + "metadata": { + "id": "dpGHhbGeeoF8" + }, + "source": [ + "#### **Density Plot of Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "BKqgbg0_v5EM", + "metadata": { + "id": "BKqgbg0_v5EM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0a50f216-e030-496b-de8c-cc3a2b8153ca" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkkAAAHpCAYAAACWdKhHAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAxH9JREFUeJzs3Xd4VFX6wPHvnZbeOymE3nsHpdjAjr0jFty1rYrdVSxYfu6qq2vXVdFdK3YRkSJFAemdEAgJSUglvWcmM/f3x51KJpQQ0ng/z5NnxnvPvfcMBvLmnPe8R1FVVUUIIYQQQnjQtXUHhBBCCCHaIwmShBBCCCG8kCBJCCGEEMILCZKEEEIIIbyQIEkIIYQQwgsJkoQQQgghvJAgSQghhBDCCwmSmklVVSoqKpAyU0IIIUTnJEFSM1VWVhISEkJlZWVbd0UIIYQQJ4EESUIIIYQQXkiQJIQQQgjhhQRJQgghhBBeSJAkhBBCCOGFBElCCCGEEF5IkCSEEEII4YUESUIIIYQQXkiQJIQQQgjhhQRJQgghhBBeSJAkhBBCCOGFBElCCCGEEF5IkCSEEEII4YUESUIIIYQQXkiQJIQQQgjhhQRJQgghhBBeSJAkhBBCCOGFBElCCCGEEF5IkCSEEEII4YUESUIIIYQQXhjaugNCtIYN+Rt4d9u71DTUEOYbxl8H/5VBUYPaultCCCHaMQmSRKemqipvbn2T97a/h4rqPL42dy2PjH6EK/tc2Ya9E0II0Z7JdJvo1Obvnc+7299FRWV6z+m8fsbrnJl0Jhabhbl/zmVp5tK27qIQQoh2SlFVVT16M3G4iooKQkJCKC8vJzg4uK27I7zIrsjmsp8uo7ahlnuH38stg24BtNGlf2z4B/9L+R8RvhH8MP0HQnxC2ri3Qggh2hsZSRKdkqqqzFkzh9qGWkbFjuKmgTc5zymKwr0j7qVbSDeK64r5x4Z/tGFPhRBCtFcSJIlOaUP+BjYWbMRX78vcCXPRKZ7f6j56H54Z/wyoen5IW0B6Wbp24sBqWPVPWDYXDvzRBj0XQgjRXkjituiUPtr1EQAX97yY+MD4RudXpxXx6tIaqrOewabauOur5cxL/IrodS+4Gv3+Eoy+Dc6eC0bf1uq6EEKIdkKCJNHp7C3dyx85f6BTdNzY/8ZG53/alsvsr7ZisaqAHtCzOz2W89N9+MEnnC79xoPeCLu+g/XvQW0ZXPZ+a38MIYQQbUym20Sn88muTwA4K+ksEoMTPc6t2nuIv32xBYtV5YLBcSydPZGpyd/QQ8nhEGHc4vca1Zd8DFfMg2u+AEUPO76CHV+3wScRQgjRliRIEp1KbUMtizMXA3BD/xs8ztVZrDz+/U5UFS4bnsBrVw+jZ5iBV6p/52PTiwTpKkkp0zPnh13aBX3OhYkPaO8XzIaKvNb8KEIIIdqYBEmiU1mZvZLahlriA+MZEjXE49xbK/aTVVJDbLAvz1w8AL1OgT/fJqCyEKOxDGP8fwH4dstBdudWaBdNfBC6DIP6cljz79b+OEIIIdqQBEmiU1mYsRCAc7udi6IozuP55XW8s2I/AE9e2J8AHwNYG2DDfwBY0es0LIFZ9EwoR1Xhn7/u0S7UG+GMx7X3m+ZBdXGrfRYhhBBtS4Ik0WlUmCv4I0dbtn9ut3M9zn26LhOz1cbIrmFMGxirHdyzACpywD+SqNG3A2AO+RaDTmF56iHWZ5Ro7XqcCbGDwVID699ttc8jhBCibUmQJDqNZZnLsNgs9AztSe+w3s7jdRYrn63LAuDm07q5RpjW2QOekTcxIWkKAcYASm2pTBngB8C8NRnaeUWB02e7rrHUtsrnEUII0bYkSBKdxorsFQCck3yOx/EF2/MorjYTF+LLOf1jtIMFuyFrDegMMPJmfPQ+TEmcAkB4zHYAFu8q4FBlvda+30UQkgR1ZbB3USt8GiGEEG1NgiTRKVisFtblrwNgYsJEj3P//
TMTgOvHdsWgt3/L7/5ee+15NgR3AXAGSalVyxmWFEqDTeXrTQe1djo9DL5Ce7/ty5P3QYQQQrQbEiSJTmHroa1UW6oJ9w2nX3g/5/Gs4hq2ZZehU+DKkW41k3b/qL32v9h5aEzcGHSKjv3l+zlviLbh7RcbsrDZ7HtAD7pSe01bIgncQghxCpAgSXQKq3NWAzC+y3iPfdoW7tRqG43tHkFUkI92sGgfHErRptr6TMNWW0vlb79h+fAz+vv3ACAwbC9BPgYyi2vYcMCewB3dF+KGgK0Bdn3beh9OCCFEm5AgSXQKq3O1IGlC/ASP47/s0IKk8wbFuQ7u/kF77TaJ6u372DdpMgfvuJNDr/2bvr+mArAxZyVnD9DylxbtynddO/hq7VUqcAshRKcnQZLo8A7VHGJPyR4UFMZ3Ge88nl1Sw7aD5egUXMv+AVJ+AsASOoqDd9yBraICQ5c4AiaezpADWpM1mSs5u28koCVwq6p9yq3/Rfabr5MpNyGE6OQkSBId3p95fwLQL6If4b7hzuOLdmojQGO6RRAZaJ9qqzoEeVsByHrlR2zV1fiPGkWPX34h6b33OPPlz/CvgyqjFcPKd/Ez6skpq2Vnjr0Cd0gCxAwCVC03SQghRKclQZLo8DYVbAJgVMwoj+PL9hQAh40iZawEwKKLwZxXhql7dxLeehOdjxZEBQ0ewsig/gBs3PQNE+N9AVi0y23ftt5Ttde9v7b4ZxFCCNF+SJAkOrzNhZsBGBEzwnmsxtzApsxSACb2jnI1tgdJlalaQcjohx5EHxTkcb8xQ84HYE+Cwti92iiVY1QKcAVJacvAamm5DyKEEKJdkSBJdGgldSVklGuVsYdFD3MeX5degsWqkhDmR3KEv+uC9BUAVOUa8Bs5gsBJkxrdc3jMcABSExQGLPkKvQL7D1WTXVKjNYgfAf4R2qa3WX+enA8mhBCizUmQJDq0LQVbAOgZ2pNQ31Dn8d/3FQFweq9I1zYkJRlQloVqg5pDJqJn3++xCa5Dn/A++Bn8qPZTKA2pZYC1DIA/0rR7otNrRSgB9i0+KZ9LCCFE25MgSXRoGws2AjA8erjH8d/3HQLgtJ6Np9pqi0349BmE//BheGPUGRkcNRiAPYkKg/dt8LgnAD206twc+P2EP4MQQoj2SYIk0aE58pEcU2QA+eV17CusQlFgQs8I53E1YxUA1QUmQi679Ij3dQRd+waGMjw/BYDVacVYHdW3k0/XXvO2QW1ZS3wUIYQQ7YwESaLDqrZUs6dkD+CZtL3aPi02OD6EUH+T87ia/gcAtWWBhFxwwRHv7chv2pOg0LvsIAFWM+W1FnbklGsNQuIhvAeoNshc02KfSQghRPshQZLosHYX78am2ojxjyE2wLXMf2Omto3ImO6uUSQq8tDVFKDawDB0Gvrg4CPee0jUEPSKngIqKA1RGFqgBWO/73WbcutmH02SKTchhOiUJEgSHdbOop0Azvwhh40HtKX/I7qGOY+p2esBqC83EHzx5Ue9t7/Rn15hvQA4OKUvwwr3AW7J2+CacsuQIEkIITojCZJEh7WjaAcAAyIGOI+V1ZjZV1gFeAZJDdsWAVBX7o//mDHHdH/HfTOHxjKoaD8AW7PLqG+wag0cQVLBDqgpaf4HEUII0S5JkCQ6rF1FuwAYFDnIeWxzljaK1C0ywLUVCUC6tgGuLXIQOpMrT+lIBkYOBGBfUBWJtcUE11dR32BjpyMvKSgGIvto7yUvSQghOh0JkkSHVFxbTG51LgoK/SP6O497m2qjwYzenA2AYcSRE7bdOYKv3WWp+A0bysBirWjl+oxSV6OksdrrwfXN+RhCCCHaMQmSRIe0q1gbReoW0o1AU6Dz+Eb7ViSjkt2m2navQKezYa1X8D/nymN+Ro/QHvjqfamyVFE6aRADi9MB2HDAbWot0T51ly1BkhBCdDYSJIkOyZG07ZgSAzA32NiWXQbAiK7hruOrvwGg3hKJISLymJ9h0BnoG94Xg
AP9wxhYZA+SMkpc9ZIcQVLOZmgwN+uzCCGEaJ8kSBIdkiNp2z1I2pNfQX2DjRA/Iz2iApzH1axN2puYQRwvx/1TDcX0DlDxs9RRWd9Aan6l1iCiB/iFg7Ue8rc389MIIYRojyRIEh2OqqrsLt4NeK5scxR6HJwQ4tyTTbXZ0Ndr+Uj6fo03sz0aR5C0s3gnIRNPp19JJuA25aYoblNu647/wwghhGi3JEgSHc6h2kOU1JWgU3T0DuvtPL7joBYkDYoPcR6rT03BJ6AOANPo84/7Wc6RpJJUfMaMon/JAQC2ZLklbyeO1l4lSBJCiE5FgiTR4Ti2IukW3A1fg6/z+PaDrpEkh/rVP6LowWYzokT1PO5nJQYl4m/wp95az6G+0fQt1UaSNntL3s5aB6p63M8QQgjRPkmQJDqcvaV7AegT3sd5rM5iZW+Blic0KCHUedy6R9vUtsE3SZsaO046Red8zj61gIEhegCyyuooqbYnascPB0UPVflQkXPczxBCCNE+SZAkOhzHSJJj5RlASl4FDTaViAATXUK00SXVZkMpSgFASRzR+EbHyPGcPcV7iBk5hITKQgDnSjqMfhBtr9WUs7nZzxFCCNG+SJAkOpzUklTAcyTJUQV7kFvSdv2+ffgEVANgGDC52c9zBkmlewgYNco55eaRlxQ/THvNlSBJCCE6CwmSRIdSY6khs0ILUvqEuYKk7V6Stms3b8I31AKAkjC82c90BEmpJan4jRxJn5IsALZkuG12G28fqZKRJCGE6DQkSBIdyt7SvaioRPlFEeEX4TzuWP7vHiSZt61CZ1RRMUBEr2Y/s2doTwyKgbL6MooDbAzybwC0zW5tjqKSXexBWO5WsNma/SwhhBDthwRJokPxNtVW32AlrbAKgAFuQZKaqY3q2IKSQW9o9jNNehPdQ7sDWj5U/76J+DSYqWyA9CJtOo/ofmDwhfpyKElv9rOEEEK0HxIkiQ4ltVQLktyTttMKq2iwqQT7GpxJ2w2lpejNuQAoCUNO+LnOvKSSPQQNG0qPcm0VmyMXCr0RYgdr7yUvSQghOgUJkkSHsq90H4BHEck9edrS/75xwc6k7bodO/AJ0fKRdAlDT/i57kGS35Ah9Cw7CMCOg2WuRvH2KTfJSxJCiE5BgiTRYaiqSlpZGqDlCTnsya8AoH9csPNY7dZt+IRouUPO5fknoFeYltO0v2w/Pj160KtWS9resT/f1ciZl7TlhJ8nhBCi7UmQJDqMgpoCqixVGBQDycHJzuMpjpGk2CDnsdptmzEFOYKkfif8bEdQllWZRb1qYUAX7Vm7C2tdydtx9mm9gp2SvC2EEJ2ABEmiw3CMInUN7opRb3Qed4wk9bWPJKk2G9aMzSg6UI2BENzlhJ8d4RtBqE8oNtVGRnkGffsnY7A2UGVTyC6tsTfqqSVvm6ugNOOEnymEEKJtSZAkOoy0UvtUW5hrqq2wso6iKjOKAn1itNEdc2YmJkOZ1iBmQLO2IzmcoijO0aS0sjSChw2lW0UeADtztCANvUF7HkDethN+phBCiLYlQZLoMBwjST1CeziPOZK2u0UE4GfS9lWr273bmY+kxJz4VJuD47lpZWn4Dh7sTN7emVHoahQ7SHvN39FizxVCCNE2JEgSHYYjSOoV6ioM6Zhq6+eWtO0eJLVE0raD47lpZWkYwsLoo9Om2banuSVvO8oA5G9vsecKIYRoGxIkiQ7BptpIL9eKNHobSerjlrStBUna8v+WSNp2cEzzOab9BsTZk7eL61FVe/K2M0iSkSQhhOjoJEgSHUJOVQ61DbWYdCYSgxKdx/fZK233tucjqapK/Z5dGAOtWoPIPo3u1VyOnKTc6lyqLdUM6JuIzmal1KYnv6JOaxQzABQdVBVAZUGLPVsIIUTrkyBJdAiO0Zvuod0x6LQtRmw21bkdSa+YQAAsObkYbCUoCqi+IRAY3WJ9CPEJIcovCtDqJYUMHkhXeyDkTN42+bv2iZMpNyGE6NAkS
BIdwv7y/YDnVFtOWS21FismvY6u4f4A1O3e5ayPpET0apGVbe7cV7j59u9PzzJte5Lt7kUlncnbEiQJIURH1i6CpDfffJPk5GR8fX0ZM2YM69evP2L7+fPn07dvX3x9fRk0aBALFy50nrNYLDz88MMMGjSIgIAAunTpwowZM8jNzfW4R0lJCddddx3BwcGEhoZyyy23UFVVdVI+nzhxGeVa3aHuId2dx/YVavlI3aMCMOi1b+W63bsxBduTtiN709LcV7jpg4PprdM2uN3hnrwdZ89LypMgSQghOrI2D5K+/PJLZs+ezZNPPsnmzZsZMmQIU6dOpbCw0Gv7NWvWcM0113DLLbewZcsWpk+fzvTp09m5cycANTU1bN68mSeeeILNmzfz7bffkpqaykUXXeRxn+uuu45du3axZMkSFixYwKpVq7jttttO+ucVzeMIkrqFdHMe21egBbU9owOdx+p278bHGSS5VsG1FMf2JI7pv4H2hPHdRfWuRlIGQAghOoU2D5JeeeUVZs2axU033UT//v1555138Pf358MPP/Ta/rXXXmPatGk8+OCD9OvXj7lz5zJ8+HDeeOMNAEJCQliyZAlXXnklffr0YezYsbzxxhts2rSJrKwsAFJSUli0aBH/+c9/GDNmDKeddhqvv/46X3zxRaMRJ9H2VFV1rmxzH0naaw+SekW7rWxLSXFtR3ISgiT3kSSAgX0TUVQbhVY9hyrtgZJjhVvJfqivbPE+CCGEaB1tGiSZzWY2bdrEWWed5Tym0+k466yzWLt2rddr1q5d69EeYOrUqU22BygvL0dRFEJDQ533CA0NZeTIkc42Z511FjqdjnXr1nm9R319PRUVFR5fonUcqj1EtaUavaL3WNmWZp9u621P2m4oKcF66NDJnW4L6eHsU3l9OeGD+pNQdQiAXbnlWqOASAiyb4VSsKvF+yCEEKJ1tGmQVFRUhNVqJSYmxuN4TEwM+fn5Xq/Jz88/rvZ1dXU8/PDDXHPNNQQHBzvvER3tuerJYDAQHh7e5H1eeOEFQkJCnF+JiYle24mW55hqSwhKwKQ3Adro0r7DVrbV792Lwc+G3qiCooewbt5veAICTYF0CdACoLSyNHz79aWHPXl7R7rbFLHkJQkhRIfX5tNtJ5PFYuHKK69EVVXefvvtE7rXo48+Snl5ufMrOzu7hXopjsaZjxTsCnpyy+uoMVsx6hW6RgQAUJ+a6ppqC0sGg+mk9Mc55Vaahj4khN6qNqq4bV+eq5GscBNCiA7P0JYPj4yMRK/XU1DgWXSvoKCA2NhYr9fExsYeU3tHgJSZmclvv/3mHEVy3OPwxPCGhgZKSkqafK6Pjw8+Pj7H/NlEy3HkI7knbe8tsO/ZFhmA0bGyLXWvW9J2y0+1OfQM68nvOb8785L6RfgBkFpU62ok25MIIUSH16YjSSaTiREjRrBs2TLnMZvNxrJlyxg3bpzXa8aNG+fRHmDJkiUe7R0B0r59+1i6dCkRERGN7lFWVsamTZucx3777TdsNhtjxoxpiY8mWpC3lW377VNtPaJcK9vqU1Pd8pFaPmnbwb1WEsCA7trU7UGzjup6+/Md022FKWC1nLS+CCGEOHnafLpt9uzZvP/++3z88cekpKRw++23U11dzU033QTAjBkzePTRR53t77nnHhYtWsTLL7/Mnj17eOqpp9i4cSN33XUXoAVIl19+ORs3buTTTz/FarWSn59Pfn4+ZrMZgH79+jFt2jRmzZrF+vXrWb16NXfddRdXX301Xbp0af0/BHFE3oKk9CKtPpEjSFKtVurT0lzTbRE9T1p/3IMkVVWJ7d+bsLoKVBTnCBehXcEnBKxmOJR60voihBDi5GnzIOmqq67ipZdeYs6cOQwdOpStW7eyaNEiZ3J2VlYWeXmuXI/x48fz2Wef8d577zFkyBC+/vprvv/+ewYOHAhATk4OP/74IwcPHmTo0KHExcU5v9asWeO8z6effkrfvn0588wzOe+88
zjttNN47733WvfDi6OqtlRTUKNNr3oESYe0kaTuUVo+kjkzC7W+HlOQfc+2iB6cLN1DuqOgUFZfRnFdMb79+tKtXPseTckp0xopCsRq35MU7DxpfRFCCHHytGlOksNdd93lHAk63IoVKxodu+KKK7jiiiu8tk9OTnbtyH4E4eHhfPbZZ8fVT9H6DpQfACDcN5wQnxDn8fRD2khS9yjHyrZUUFSMAfYgKbw7J4uvwZfEoESyKrNIK0tjTMJoutceYjN92L0vF8bZg7mYAZC5WoIkIYTooNp8JEmIIzlQcQCA5OBk57HKOguF9sKNjpGkutRUjAFWFEUFoz8ExZ3UfjmKWmaUZ6DodPQO1P4qpRwsdTWKsY8k5UuQJIQQHZEESaJdy6zIBCA5JNl5LMOejxQZ6EOwrxGA+n37XPlI4d1bfGPbw3UL1UaLHPlS/RJCAdhbaXONZDqCJCkoKYQQHZIESaJdcwRJSUFJzmPOqbbIAOcxc9p+tyCp5YtIHs5Rs8kRJPXu0xW9zUqlqievvE5rFN0XUKC6EKq870UohBCi/ZIgSbRrzpEkt+m2w5O2VbMZc3Y2pkBHPtLJS9p2cCSRO4Kk4AF9SazUAqE9+fYta0wBrgRyyUsSQogOR4Ik0W6pqkpWhbYpcdfgrs7j+4scSduOlW2ZYLXiE2qf5jqJSdsOjiCpoKaAaks1Pr16kVypbWmzK81ta5uYAdqrTLkJIUSHI0GSaLdK6kqotFSioJAY7NorzzXdZl/Ztn8/AKZQe4OTuPzfIcQnhHDfcEBLLtf5+dFTr02z7U53qwgf49ieREaShBCio5EgSbRbjqm2uIA4fPTaljA2m0pGked0W/3+/aCoGIz2bUFaYboNXKNJ6WXatin9In0BSC2qcTWSkSQhhOiwJEgS7ZYjSHKfasurqKPOYsOgU0gM9wfAvD/9sOX/3vffa2mH5yX17649N7NeT53Fnh/lCJIO7YEGc6v0SwghRMuQIEm0W86VbcGulW2Z9nykpHB/58a29fv3t+ryfwfHCjdHLaeEgb0Irq/GquhIs+8tR2gS+ASDzQLF+1qlX0IIIVqGBEmi3cqq1JK23Ve2ZRRrQVLXCG0USbVaMWdkYAq0B0lhybSWw0eSfPv2o1uFfXuS7BKtkaLIlJsQQnRQEiSJdssxQuM+3ZZZrOX7JNtrJFlyclDNZkyOHUtaMUjqHqqtosusyKTB1oAhOoru9VpwtGtPtquhs/L2jlbrmxBCiBMnQZJol2yqzevyf0e17eQIe9J2mrayzSfSpDVoxSDJkVBusVnIrcpFURR6B9u3J3FsdAsykiSEEB2UBEmiXTpUc4h6az16RU9coGsftszDptvM6fbl/0E2rUFoV1qLTtE5pwKd25PEhwGwt8rmahhrLwMgBSWFEKJDkSBJtEvZldp0VVxAHEadtj+bzaY6p9u6RbqPJKnoDVrw1JojSdA4L6lP3yR0qo1S1cgh+ya8RNm3J6kqgKpDrdo/IYQQzSdBkmiXHEFSYpCriGR+RR31Ddry//hQPwDq09PRm1R0qj0gCU1qdK+TyRkkVWhBUlj/PnSpKgIgJa9ca+QT6NpPTkaThBCiw5AgSbRLB6sOAp5B0gF7PlJiuD8GvQ5VVTHv348xwL6yLTAWjL6t2s/DR5JMPXrQzb49ye60PFdDR/K25CUJIUSHIUGSaJccI0kJQQnOYwfsU22OfKSGggJs1dUYg+17toW1Xj6Sg7Pqdnk6qqqiM5lc25Psd9/DTYIkIYToaCRIEu3SwUovI0nFh61ss+/Z5hev7eHWmknbDl2Du6KgUF5fTml9KQB9IrTRrD2Hal0NYx1BkpQBEEKIjkKCJNEuectJOuBc/u/ajgTAJ1rb160tRpL8DH50CewCuG1P0i1G++96PRarfZWbc3uSVLBaWr2fQgghjp8ESaLdqTRXUlZfBnhOtx1eSNIxkmQMtO+T1sor2xySQ7TnOoKkbgO642epw6LonHWdC
EkCUxBYzVAk25MIIURHIEGSaHccU23hvuEEGLWASFVVskq0ICnJubGtFiQZ9PZApA2m28C1h5tre5K+zu1Jdju2J9HppKikEEJ0MBIkiXbHW9J2UZWZWosVRYGEMC1Iqk9PB1R0lmKtURtMt0HjFW6GmBi612l98tyexBEkSV6SEEJ0BBIkiXbH2/J/xyhSXLAvJoOOhtJSrCUlGPxsKDYL6AwQHN8m/T08SFIUhd5aHMfug6WuhrGywk0IIToSCZJEu+McSQp0jSRl24OkRMdUW7qWtO2XHKo1CEkAnb71OunGESTlVOVQb9WKWvaN13bc3VtudTWUMgBCCNGhSJAk2h1vK9saBUkHDgDglxCsNWijfCSACN8IgkxBqKhkVmQCMKCPFuAdUo2UVpu1htH9tdfKPKgubouuCiGEOA4SJIl2x1uNpEZJ2/YgySdS29etrVa2gTa9dviUW3j/PsRW27cnya/QGvoEQphsTyKEEB2FBEmiXbHYLORVayvD3BO3s0sPD5K0ERvnliRtlLTt4Fjhll5ur93UsyfdKhzbk7hX3nYkb0uQJIQQ7Z0ESaJdyavKw6ba8NX7EuUX5TyeXaJVr04M1za2dYwkGQxtu/zfoXtod8A1kqTz86OXogV2u9z3cIsdpL1KXpIQQrR7EiSJdsV9+b+iKACYG2zklTuCJH9Umw1zVhYAOos2pdWW023gGkk6UH7AeaxPmFYJfE9htauhYyQpX8oACCFEeydBkmhXHPlI7lNtuWW12FTwNeqICvShIT8ftb4exccA1YVaozYeSXLkJB2oOIBN1bYi6Z8cCcD+ej0Nzu1J7CvcDu0Ba0Or91MIIcSxkyBJtCtel//b85ESw/xRFMU51ebfIwoFFYz+EBDZ6n11Fx8Uj0FnoLahloLqAgC627cnMaNzbs5LaFcwBWrbkxSntWGPhRBCHI0ESaJd8bb8P+uw5f/1juX/ifbl/2HJYJ+aaytGnZGkoCTAlZfk17cvyY7k7YNlWkOdzlUKQJK3hRCiXZMgSbQr2VVNB0mOlW2WTG1lm0+0lvPT1lNtDs4yABVakGTs0oVutYcA2JV60NXQWXlbgiQhhGjPJEgS7Yaqql5zkg6WuJK2wTWSZApWtQZtvPzfwREkpZdpZQAUnY4+flofd2eVuBo6k7clSBJCiPZMgiTRbhTXFVPbUIuCQnygax8253RbmLb832KvkWQwasFTex1JAugTFwRAarnF1TBGygAIIURHIEGSaDcco0ixAbGY9CbncWchyQh/VIsF80Gtnc5qH51pJyNJ3UM8ayUBDOitBXuFNiNlNfbtSWIc25PkQk0JQggh2icJkkS74V4jyaGizkJZjTYKkxjmrwVIViuKnx+KvTI3oUmt3ldvkoOTASiqLaLCrG1FEtG/D7H2fdpS8iq1hj5BrrpOkpckhBDtlgRJot3wtmebY2PbiAATAT4GzI6k7eQElBr7JrEhibQHgaZAov2iAVdRSZ9evehWngvArvQCV2NHvSSZchNCiHZLgiTRbhyssidtu9dIOmz5v6NGkl83e10kUxD4hrReJ4+iW6jnHm76oCB6qlqNpN3u25NI8rYQQrR7EiSJdiOnKgfAe9L2YUGST2yA1iAkoc1rJLlzbE/inpfUO0zLr0opqHI1jJEyAEII0d5JkCTajdwqbVqqS2AX5zHHxrZJjo1tHdNt4XqtQWj7mGpzcK5wcwuS+neNAGB/nc5texL7SFJhimxPIoQQ7ZQESaJdsNgsFNRoOTvel/87RpK0IMnob9UahCTQnnQPbbzCrUf/bvg21HtuTxLWDYwBYK2Hkv1t0VUhhBBHIUGSaBcKqguwqTZMOhMRfhHO487l/+H+2GpracjT8nr0enuw0c6CJMd0W3ZlNhartipP255E63dKTrnWUKdzlQLI39Hq/RRCCHF0EiSJdsF9qk2naN+WNpvqUW3bnKWVCNCFhKDUFmoXtpOVbQ7R/tH4G/yxqlZnSQNTU
hLdq7T+bk/NcTV2TLnJCjchhGiXJEgS7YIjads9H6mwsh6z1YZepxAX4utM2jYld0Upt++F1s5GkhRFcW1PYl/hpuj19PXV8o52ZRa7GksZACGEaNckSBLtQm5146RtRz5SfKgfBr3OmbRt6poEFfYRmXY2kgRNVN6OCwRgV3kDqmrfc84RJMl0mxBCtEsSJIl2wTHd5p607aqRZF/ZZh9J8k2MAKsZFB0ExbVuR4+B1xVuvRLQ26yU2/TkltdpB2MHAoq2PUnVoTboqRBCiCORIEm0C87ptgDXSNLBUi0fKSH0sBpJ0VrQRFAX0Btar5PHyFuQFNyvD10r8gHY6Uje9gmCiJ7a+7xtrdpHIYQQRydBkmgXvNVIyimzT7eF2UeSsrMAMAbbi0e2s3wkB+d0W0WGc2rNp3cvetrzqHZkuI0adRmqveZtac0uCiGEOAYSJIk211SNpJwybSQpPtQPW00N1kNFABhN9umqdhokJQYlolf0VFuqKazRVrUZwsLobdU2uN2+320Pt7ih2mvu1tbtpBBCiKOSIEm0uaZqJOXYp9viw/wwZ2ujMLqQEHT1WrDU3qptOxj1RucmvRkVbnlJkdr2JLsP1boaxw3RXmW6TQgh2h0JkkSba6pGUm6ZNmIUH+qHxT7VZkpMhHa6/N+dswxAWbrz2MDuMehUG0UNOgor7KNhcYO11/JsqC4+/DZCCCHakARJos15q5FUVKXVSNIpEBvi6xxJMiUlagEFtMvl/w7ekrdD+vUhsVKbftuZa0/e9g2B8B7ae8lLEkKIdkWCJNHmvNVIOmjPR4oN9sWo1zlHkowJHWskyX26zW/AAHqW2ZO3s0pdjZ3J2zLlJoQQ7YkESaLNeauR5J6PBDi3JDElRENtidaoHQdJ3gpKGpOS6FmrrWzbnpbvauxM3paRJCGEaE8kSBJtzluNJPeVbQCWbHuQFKElP+MTok1VtVPJIckAFNYUUmWuAkDR6RgQpvV/V36Vq3H8CO314KbW7KIQQoijkCBJtDmvNZLcRpJUqxVzrtbGFGjf0qMdjyIBBJuCifSLBDxHkwb2jAEg36xQXFWvHYwbolUPr8yFitxW76sQQgjvJEgSberoNZL8seTlg8WCYjSi19lHYNp5kATQI1RLyE4rS3MeixjYj3hn8naFdtAnEKL7a+9zZDRJCCHaCwmSRJs6lhpJrqTtBJRK+0hLBwiSeoZqW464B0m+/fvTs1ybXtx50C15O3649ipBkhBCtBsSJIk25a1GkqqqHjlJZns+kjEpEcocy/87TpC0v2y/85ipWzd6VWlJ29v3uSVvx4/UXg9ubLX+CSGEODIJkkSbciRtu0+1VdQ2UFXfoB0P9XMlbXss/2+/NZIcvI0kKXo9/UO1TXmd020ACfYgKXcL2Kyt1kchhBBNkyBJtClvhSQP2je2jQgw4WfSu5b/uxeSbKdbkrjrHqqVASioKaDC7AqIBvWIBSCnXqGk2qwdjOoLxgAwV0HR3lbvqxBCiMYkSBJt6mgr2wDMzpykeNfqrw4w3RZsCibGX1vN5r49SfSQ/iTYk7e3ZZdpB3V66DJMe39wQ2t2UwghRBMkSBJtytt0m3s+kqqqWBwjSVEBYLOAoofA2NbvbDN4Td4eNIg+pVrgt+WA235tiaO116x1rdY/IYQQTZMgSbQpb1uSOEeSQv2wlpVhq9KW/RsDtDwlgruA3tC6HW0mRxkAj+Ttrl3pV6uVPdiU6lYXKWms9pr9Z6v1TwghRNMkSBJtxmK1UFijTTt5HUkKcyVtG6Kj0dVqbTvCVJuDYyRpX9k+5zFFp2NIlDaVuL2wFpvNXiDTMZJUnAZVh1q1n0IIIRqTIEm0mfyafGeNpHDfcOdxj+X/WW7L/x35SMHxje7VXnkrAwAwoF8iPg1mKq0K6UXV2kG/MFdRyWyZchNCiLYmQZJoM/nVWp2guMA4Z40kOKyQ5EF7PlJikluQ1IWOwjHdVlRbRFldmfN40JDB9
CzTyhlszXYdJ3GM9pq1tpV6KIQQoikSJIk2k1edB0BcQJzzWK3ZSrF9WXxCqL9rJCkxASq0JO+ONJLkb/R3btzrnrzt55a8vXm/29Ra0jjtVUaShBCizUmQJNqMY/m/e5DkmGoL9DEQ7GfAkqUFEh11JAm8J28boqIYiFY7aVNagauxI3k7dyuYa1qri0IIIbyQIEm0Ged0m5cgKT7UD0VRnFuSaIUk7SNJIR1nJAmgZ1jjMgAAI7tqeVh7Kxoor7VoB0OTtJEymwUOrm/VfgohhPAkQZJoM47pttgAV80j93wkW309DQXaKIsxoQvY9zzrSNNt4Ja8Xe6ZvJ0wYiBdqg6horA5y77ZraJAt4na+/SVrdlNIYQQh5EgSbQZR5DkUSPJviVJfKgfloNaYrMuIAC9oR5UG+gMEBDV+p09AY7ptrRSz5Ek/+HDGVCcAcCGdLeiko4gKUOCJCGEaEsSJIk2oaqq9+k2t5Eksz0fyZiUhFKhBVQExWlbeHQg3UO6o6BQWl9Kca0rGPLp3ZtB9rys9SluRSW7TdJec7dAXXlrdlUIIYQbCZJEmyirL6O2QQuIYgJinMfdc5IchSRNiYluK9s6VtI2gJ/Bj4QgrQCme/K2otczIs4fgO2HaqlvsGonQuIhoqc2cnZgdav3VwghhEaCJNEmHFNtEb4R+Oh9nMdzy+oA+0hStjbdZkpK7JDL/905p9wOS97uNbg3oXWVmFWFHQfdRo1kyk0IIdqcBEmiTXirkWS1qeRX2IOkUD/n8n9jQmKHXf7v4G2jW4CAkSOceUnrMtzzkuxTbukrWqN7QgghvJAgSbQJ92rbDoWVdVhtKgadQmSgj+fy/w4+ktRUkOQ3eDCDSw8A8MeuHNeJbhNB0cOhPVCa2VrdFEII4UaCJNEm8qoajyTl2vORYkN80aE6V7cZkzpuIUmHPmF9ANhbuhebanMe1/n7MzZCS0TfnFNFncWel+Qf7iosuW9xq/ZVCCGERoIk0SZyqxtX23bkI3UJ8aOhsBDVbAaDAWNsbIfc3NZdckgyJp2Jaks1OZU5Huf6jehPeG059arC5sxS14neU7XX1F9asadCCCEcJEgSbcLb8v+8cm0kKS7U17X8v0sXFJ0ClfYSAB10JMmgMziTt1NLUz3OBY4fy7BD+wD4I63IdaL3NO31wO9QX9Uq/RRCCOHS5kHSm2++SXJyMr6+vowZM4b164+8FcP8+fPp27cvvr6+DBo0iIULF3qc//bbbznnnHOIiIhAURS2bt3a6B6TJ09GURSPr7/+9a8t+bHEUTirbQe6qm07RpLiQg5b/l9VCLYGLUcnKLbxzTqIvuF9gcZBkt/gwQwt1/KOVu866DoR2RvCksFqlgRuIYRoA20aJH355ZfMnj2bJ598ks2bNzNkyBCmTp1KYWGh1/Zr1qzhmmuu4ZZbbmHLli1Mnz6d6dOns3PnTmeb6upqTjvtNF588cUjPnvWrFnk5eU5v/7xj3+06GcTTTNbzRTVaiMmXQJcI0O5zhpJvs6kbWOS28q2oNgOV0jSXZ9wLS9pT8kej+OKycS4+AAAdhyqo6LOvo+borhGk/b83Gr9FEIIoWnTIOmVV15h1qxZ3HTTTfTv35933nkHf39/PvzwQ6/tX3vtNaZNm8aDDz5Iv379mDt3LsOHD+eNN95wtrnhhhuYM2cOZ5111hGf7e/vT2xsrPMrODj4iO3r6+upqKjw+BLN45hq89X7EuoT6jyeV+42kpRlH0lK6NiFJN05k7dL9jY6123MUBIqC7GhsHqf25Rb/4u11z0LwFLXGt0UQghh12ZBktlsZtOmTR7BjE6n46yzzmLt2rVer1m7dm2j4Gfq1KlNtj+STz/9lMjISAYOHMijjz5KTU3NEdu/8MILhISEOL8SExOP+5lC476xraIoruPuOUneRpI6eJDUO7w3oCWtV5g9g+yACRMYXZACwNJdbluUJI6F4ASor5BVbkII0craLEgqKirCa
rUSExPjcTwmJob8/Hyv1+Tn5x9X+6Zce+21/O9//2P58uU8+uij/Pe//+X6668/4jWPPvoo5eXlzq9s+w9xcfy8FZKss1gpqjIDnoUkTUlJHb5GkkOwKdg5vZha4pmX5NOnD+MsBQAs35WPzaZqJ3Q6GHip9n7H/FbrqxBCCDC0dQfawm233eZ8P2jQIOLi4jjzzDPZv38/PXr08HqNj48PPj4+Xs+J4+MMktwKSebbp9r8jHoCLbXklWtbdJgSEmBH55huAy0vKbc6l9SSVEbFjnIeVxSFccN74F9RSwl+bDtYxrCkMO3koCtgzb9h76/ahre+IW3UeyGEOLW02UhSZGQker2egoICj+MFBQXExnpfwRQbG3tc7Y/VmDFjAEhLSztKS9ESvC3/z3WbanMUkdRHRKALCOjwNZLcOVa4pZSkNDoXesYZjCzQRpiWpbh9n8cOgsg+YK2Hnd+0Sj+FEEK0YZBkMpkYMWIEy5Ytcx6z2WwsW7aMcePGeb1m3LhxHu0BlixZ0mT7Y+UoExAXF3fkhqJF5FY1LiSZ51ZI0mP5P3Sa6TaAAREDANhdvLvROf8xoxlbqgXqS7dmuU4oCgyfob1f/z6o6knvpxBCiDZe3TZ79mzef/99Pv74Y1JSUrj99tuprq7mpptuAmDGjBk8+uijzvb33HMPixYt4uWXX2bPnj089dRTbNy4kbvuusvZpqSkhK1bt7J7t/ZDKDU1la1btzrzlvbv38/cuXPZtGkTBw4c4Mcff2TGjBlMnDiRwYMHt+KnP3V5HUmyL//vEuqLOcstadtmg4qOXUjSXf+I/gCkl6dTY/FcLKAzmZjUPRydzcqeUgsHiqpdJ4ddD0Z/KNytFZcUQghx0rVpkHTVVVfx0ksvMWfOHIYOHcrWrVtZtGiRMzk7KyuLvLw8Z/vx48fz2Wef8d577zFkyBC+/vprvv/+ewYOHOhs8+OPPzJs2DDOP/98AK6++mqGDRvGO++8A2gjWEuXLuWcc86hb9++3H///Vx22WX89NNPrfjJT12qqnpN3M4t91ZIMglqisBmAZQOXUjSIco/imi/aGyqrVFRSYCEc6Y4q2//uM1tlZtfKAy5Rnu/7t2W7VR9JWyaBx9fCC/1gWci4Y1R8P2dcKhxH4UQ4lShqKqM3TdHRUUFISEhlJeXH7XGknApqSth0peTUFDYeP1GTHoTADM/Ws+K1EO8eNkgxr75JDV//knc/71A6Oiu8N5kCIyFBzrHD+y7l93NioMreHjUw1zf33NVpa2mhjeuvJtXBl5Kj2ADSx89x1UmoXAPvDUGFB3cvgai+51YR6wNsPFDWP6slhDujaKDkbfAtBdAbzyx5wkhRAfT5tuSiFOLYxQp0i/SGSCB+3TbYcv/yzvPyjaH/pHalJu3vCSdvz/TBsRgsDawv6KB1IJK18novtD3AlBtsOTJE+tE6QH44Cz45UEtQIroCWfPhVt/g79theu+hj7na8/a8D58dpU24iSEEKcQCZJEq8qrajzVBq7E7Th/AxZ7/pgp0a2QZEjHT9p2cCRv7yre5fV8lwvPZ5S9sOSPmw6rx3XW06AzwL5fIWNV8zqQsgDemQi5W7RyAue/Aneuhwl/g4QREN4Nep0N13wG13wBBj/Yvwy+vAFs1uY9UwghOiAJkkSrcq+27VBZZ6GyvgGAyJpSsNlQ/PzQR0Z2qpVtDo7k7YzyjEbJ2wABY8dwRsV+AL5efwCL1eY6GdkTRmgLG1j4EJiPXCneQ4MZFj0GX14H9eWQMIqGqxdQbRtE5cpV1KeloVosntf0ORdmLtCSxtOXw/LnjuuzCiFERyZBkmhV3pK2HXu2hfgZMeRrQZEpIUHLxekkW5K4i/SLJNo/GhXV65SbYjBw7mn9Ca2rpLD+sJpJAJMfgYBoOJSiTZcdi7Js+Ohc+PNNAOpjzuPAskj2TbuKrBtv5ODtd5B+wYXsm3IGxR98gK3abWVdwki46HXt/e8vQ9rS5nxsI
YTocCRIEq3Kufzfrdp2jls+ktmej2RMStJOdqJCku4GRQ4CYEfRDq/nY264lqnZGwCYt/SwQCogEi77j5ZUveV/sPrfTddOUlXY9iW8cxrkbEQ1BlGQNYz017ZSu2UbKArGxER8+vdD5++PtaiIwn++RMZll2M+cMCtw5fDqFna+58fkM12hRCnBAmSRKvylpPkKiTpiyXr8EKSWvXtzjSSBDA4SqvJte3QNq/njTExXNkzAEW18Wd+HfsPVXk26D4Jpvxde7/kCVhwH9SUeLbJ3gD/vQS+uw3qymjwTWb/D0GUrClAFxhI5F130WvVSnouWUz3b7+l97o/iXv+eQyxsZgPHCDjqqup2bLFdb+znoSgOCjNgDWvt9QfhRBCtFsSJIlWlVvtpdq225YkZvuWJMakRG0UpJOOJA2JGgJoQVJTVTgGzLyG0flaAvc7P3sJpk6/H855FlBg00fwSn/43+Xw1Qx4bYi2ei19OarOSCXj2fdxPZYKCDzzTLr//DNRd92JISrKeTvFaCT00kvoNv8rfIcMxlZezsE778J80J4X5hNkfx7w+0uu/zdCCNFJSZAkWk1dQx0lddpoh3uQ5JhuiwtxW/6fmAg1xWA1a42COteWMQMiBmBQDBTVFjkDx8P59uvHjIBSAL5NKeFA0WGjSYoC4++Ga7+CmEHQUAtpS2D3D9oSf50BW/8ryU2fyMEvDgA6ombPJuGN1zHGRDfZN0NUFF3nzcO3f3+sJSUcvOMOV47SwMsgcSw01GnTfEII0YlJkCRaTUGNloDsZ/AjxMe1k737dJtjJElb/m8fwQiIBoOJzsTX4Euf8D4AbCv0PuUGcNaDtzGicC9WRccrn/3hvVHvc+Cvv8MtS7QE62n/B9d+Rd0li0l/J52KP1PQBQaS8PZbRN42y1Wc8gh0fn4kvPUm+qhI6vfupfDlV7QTigKTHtLeb/oIqgqP63MLIURHIkGSaDXuK9vcf1A7pttilHrU2lrQ6TB26dIpV7a5c59ya4pPt27c1VsLEBfkWNi1J9t7Q0WBxNHaRrhjb6ciQ+HAjFlYcnMxdk0i+csvCJo8+bj6Z4yNJf7FFwEo/fxzV35SjzMgfoQ2miS5SUKITkyCJNFqvCVtq6rq3LctsrIY0H44KyZTp6yR5M4RJG0/tP2I7SbdPZPTytKwKTpmv7scc01tk21Vm41D/36dnHvuQa2tJWDCBLp99RU+PXo0q48B48cTcskloKrkPfGEVkdJUWCifTRp40dQX3XkmwghRAclQZJoNd4KSRZXmzE32FAUCCvSRo6cy/874ZYk7oZEa0HSnpI91DY0HfjoAgJ4/s5pBFpqSfWJ4OUHX8NS2Hiaqy41lcxrrqXorbcACJ85k8R330EfEtKo7fGIfuhB9OHhmNP2U/bNt9rBXudAeA8wV8L2L0/o/kII0V5JkCRajddCkvZ8pKhAHzh4+PL/zrclibsuAV2I9o+mQW046mhS0qA+PDI6EoD/BPRn/o33UfDCC5T/+CMln3xC1s03k3HxdGq3bUMXEEDcCy8Q88jDKAbDCffTEBZG5B13AFD01lvY6upAp4NRt2oNNnzQdJ0mIYTowCRIEq3GGSS5FZLMdS7/98NsD5KMSY4gqXNPtymKwqjYUQBsyN9w1PbXXTmJC3oEY9XpeWbAZaz+cQW5Dz1MwfMvUL1mLSgKQdOm0X3hz4ReMt3rPVRVpazGTPqhKvLL66hvOLa92EKvvAJjly40FBZS+tnn2sGh12j7uhXugqw/j+k+QgjRkZz4r5lCHCNntW23kaRcR7Vtr4UkO3fiNsDo2NH8nP7zMQVJiqLwyk0TKP1wHavTS3h40l3cUbaFy3xL8evZk5Dp0zEleAaUVptKSl4Fa/cXs2Z/ERsPlDr3yQMw6XWMTA7jvEFxXDkyEZPB++9NOpOJyLvuIu+xxyh+7z3CrrkanV8YDL4CNn+irXTrOu7E/jCEEKKdkSBJtApVVb1X27YnbXcJ9
cOcbR9JSjy8kGTnDZJGxWgjSduLtlPbUIufwe+I7U0GHe/eOIp7v9jK0pQCXgsdwY+RAVw+IIH+VUYCMkqoqLWQWlDJ1uwy1qUXU1HX0Og+gT4Gai1WzFYba/YXs2Z/Me+u2s8zFw1kSl/vNZRCLrqQorfewnLwIOU//EDY1VfDsBlakJTyk5bA7RN44n8oQgjRTkiQJFpFcV0xZpsZBYUY/xjnccdIUqyfDmuxtrrNlJQEtaVacUSAoM4bJCUEJRAbEEt+dT5bC7cyrsvRR2MCfQy8P2MEH60+wL+W7CWjqJp//pp6xPaju4UzvkcEY7tH0DM6EF+jHlVVSS+qZllKAe//nkF2SS03f7yB+8/uzZ1Tejaqp6QYDITPmEHB889TMu9jQq+8EiVhpJbAXbJfC5SGXnPCfyZCCNFeSJAkWoVjqi3KLwqj3ug87giSohq0is760FD0QUGQb9/41T8CjL6t29lWpCgKo2JG8VP6T2zI33BMQZLjuptP68ZVoxL5bksOa/YXkX6oGnODjQAfA90iAxjQJZjR3cIZFB+CQd94Gk1RFHpEBdIjKpAbxibz/MIU/vtnJi8t3ktpjYUnLujf6JrQyy7l0BtvYD5wgKoVKwg64wwYcjUsfw62fS5BkhCiU5EgSbQKb0nb4Jpui6wsAuxTbdBp92zzZlSsK0g6XgE+Bq4f25Xrx3Y9oT74mfTMnT6QPrFBPP79Tj74I4PYYF9mTezu0U4XEEDYVVdS/P5/KPn4Ey1IGnylFiRlrNLKNnTS1YhCiFOPrG4TrcJbPlKD1UZBhT1IKtbOu5K2O/fKNndj48YCsKNoB+X15W3al+vHduWx8/oC8NzCFFbuPdSoTdi114JOR826ddRnZEBYMiSNB1TY+U3rdlgI0e5lZ2dz880306VLF0wmE127duWee+6h2J5i0Z5JkCRahbdCkoWV9dhUMOoVgnIPAO7L/zt/0rZDXGAcPUN7YlWtrM1d29bdYdbp3bl2jFbQ88H52yitNnucN8bFEThxIgBl87/WDg68VHvd/UOr9VMI0f6lp6czcuRI9u3bx+eff05aWhrvvPMOy5YtY9y4cZSUlLR1F4+oWUFSenp6S/dDdHJHWv4fE+yL1bmx7alRbftwpyecDsCqg6vauCdartIT5/enR1QAhZX1PP79zkZtQq+8EoDy777DZjZDv4sABXI2QlkT+8sJIVqMqqrUmBta/Us9zsKxd955JyaTicWLFzNp0iSSkpI499xzWbp0KTk5Ofz9738HIDk5mblz53LNNdcQEBBAfHw8b775pse9ysrKuPXWW4mKiiI4OJgzzjiDbdtce18+9dRTDB06lP/+978kJycTEhLC1VdfTWVlZbP/nJuVk9SzZ08mTZrELbfcwuWXX46vb+dNrBUtw1u17Vyvy/8TtJOn0HQbwOnxp/PRzo/4I+cPrDYrep2+TfvjZ9Lz6lXDmP7Wan7ekcfV+w5xeq8o5/nAiadjiImhoaCAyiVLCDn/fOg6HjJXQ8qPMO7ONuy9EJ1frcVK/zm/tvpzdz8zFX/TsYUOJSUl/Prrrzz33HP4+XmWN4mNjeW6667jyy+/5C37Vkr//Oc/eeyxx3j66af59ddfueeee+jduzdnn302AFdccQV+fn788ssvhISE8O6773LmmWeyd+9ewsPDAdi/fz/ff/89CxYsoLS0lCuvvJL/+7//47nnnmvW523WSNLmzZsZPHgws2fPJjY2lr/85S+sX7++WR0QpwZv02159pGkuGAfLLna9JrJsW/bKTTdBjA0eihBxiBK60vZVbyrrbsDwKCEEGaM0xLC5y7YTYPV5jynGAyEXqZNsZX/YJ9i63+x9ipTbkIIYN++faiqSr9+/bye79evH6WlpRw6pOU+TpgwgUceeYTevXtz9913c/nll/Ovf/0LgD/++IP169czf/58Ro4cSa9evXjppZcIDQ3l66+/dt7TZrMxb948Bg4cyOmnn84NN9zAs
mXLmv0ZmjWSNHToUF577TVefvllfvzxR+bNm8dpp51G7969ufnmm7nhhhuIioo6+o3EKaHeWk9JnTbv7HW6Td8ADQ0oJhOG6GjPQpIhCa3e37Zg1BkZ12UcizMXs+rgKgZHDW7rLgFw75m9+X5LDnsLqvhsfRYzxiU7zwVfeCFFb71N9eo1NBQXY+h3EfzyMGSvg4o8CI5r+sZCiBPiZ9Sz+5mpbfLc43WsU3Tjxo1r9N+vvvoqANu2baOqqoqIiAiPNrW1tezfv9/538nJyQQFBTn/Oy4ujkIvG4IfqxNK3DYYDFx66aXMnz+fF198kbS0NB544AESExOZMWMGeXl5J3J70UkUVBcA4GfwI8THtSO9Y7ot2lIFaMv/FZ0O6srBotVNIujU+UE7OXEyAIszFx/3vP/JEuJvZPbZvQF4/bc06iyuvd58unXDd+BAsFqp+GWRFhTFj9BO7lvcFt0V4pShKAr+JkOrfx1eZPZIevbUitKmpKR4PZ+SkkJYWNgxDapUVVURFxfH1q1bPb5SU1N58MEHne2MRqPHdYqiYLPZDr/dMTuhIGnjxo3ccccdxMXF8corr/DAAw+wf/9+lixZQm5uLhdffPGJ3F50Eu5Tbe5/wfLsm9tGVWmjTKYERz6SfRTJLwxM/q3X0TY2JXEKJp2JjPIMUkubrqDd2q4alUR8qB+HKuv5Yn2Wx7mQCy8AoOKnn7QDfaZpr3sXtWYXhRDtUEREBGeffTZvvfUWtbW1Hufy8/P59NNPueqqq5w/F/7803Oj7D///NM5VTd8+HDy8/MxGAz07NnT4ysyMvKkfYZmBUmvvPIKgwYNYvz48eTm5vLJJ5+QmZnJs88+S7du3Tj99NOZN28emzdvbun+ig7IW9I2QF6ZNpIUXqIFRUZnPtKplbTtEGgKZGKCtrR+YcbCNu6Ni8mg4/bJPQB4Z2U69Q2u0aTg884DnY7abdswZ2VBb3uQlL4CLLVe7iaEOJW88cYb1NfXM3XqVFatWkV2djaLFi3i7LPPJj4+3iOhevXq1fzjH/9g7969vPnmm8yfP5977rkHgLPOOotx48Yxffp0Fi9ezIEDB1izZg1///vf2bhx40nrf7OCpLfffptrr72WzMxMvv/+ey644AJ0Os9bRUdH88EHH7RIJ0XH5i1IqrNYKbbX34nIzQS8FZI8NZK23Z3b7VwAFmUswqY2f4i4pV0xMoHYYF/yK+r4etNB53FDVBQB9jyC8gULIGYgBCeApQYyfm+r7rYam02lsLKOvQWVFFfVt5tpUiHai169erFx40a6d+/OlVdeSY8ePbjtttuYMmUKa9euda5KA7j//vvZuHEjw4YN49lnn+WVV15h6lQt70pRFBYuXMjEiRO56aab6N27N1dffTWZmZnExMQ09fgT1qzE7SVLlpCUlNQoMFJVlezsbJKSkjCZTNx4440t0knRsTlqJMUEuL6RHduR+Jv0+KRmYMZbIclTayQJYGLCRAKMAeRV57Ht0DaGRQ876c+02qzkVmt/5tH+0fjofRq18THoufX0bjz7cwofrT7AtaOTnEPkwRdeQPXq1VT8tIDI229H6T0VNn6gTbn1Puek97+1qarKqn1FfL3pIMtSCqgxu0bWQv2NnNk3hstGxDO+x8mbAhCiI+natSvz5s07arvg4GC++uqrJs8HBQXx73//m3//+99ezz/11FM89dRTHsfuvfde7r333uPoradmjST16NGDoqKiRsdLSkro1q1bszsjOidvW5I4l/+H+NKQpeW5nIpbkhzO1+DLmUlnAvD13q+P0rr5LDYLiw4sYtbiWYz+dDTnfXse5317HiP/N5Ibf7mRH9J+wGKzeFxz5ahEAkx60gqr+CPN9fc/6KyzUXx9MWdkULdrt2vKbe+v2krFTmT/oSpmfLieGz9cz0/bcqkxW9EpEOKnJYuW1Vj4ZvNBrn1/Hde+/ye7cyvauMdCiBPRrCCpqSHlqqoqKSwpGsmvaVxtO
8ceJMUGGLBVV4OiYDw8cfsUnG4DuLrP1QD8kvELxbUtv7fR+rz1XP7j5Ty48kH+zPsTs82MSWdyjiBtLtzM46sf59qfryW1xJVAHuxr5IqRWiD70eoDzuP6wACCzpgC2BO4u50OBj+oOAgFjat1d1RLdxdw0et/8Pu+Ikx6HTPHJ/P9nRPY++y5bHvyHFKfncaXt43l2jFJmPQ61uwvZvqbq/nP7+nYbJ0rWBTiVHFc022zZ88GtLnBOXPm4O/vWnlktVpZt24dQ4cObdEOio5NVVWvW5I4pttidNpohSEmBp2PfZrnFNuS5HCDogYxOHIw24u28/Xer/nLkL+0yH2tNiuvbn6VebvmARDmE8YVfa7g/O7n0zWoKzpFR351PgvSF/Dx7o/ZU7KHq3++mhdOf4Fpydro0I3jk5m35gC/7SnkQFE1yZEBAARfcCEVC3+hfOHPRD/0IEqPKZC6UJtyix3UIv1vS//7M5MnftiJqsKYbuH84/LBdI0I8GjjY9AzpnsEY7pHcMfkHjz9026W7C7g2Z9T2HawnJeuGIyPoW0rqQvRXh04cKCtu+DVcY0kbdmyhS1btqCqKjt27HD+95YtW9izZw9Dhgw5pnlHceoory+ntsFeNNIjJ8l+zF4PyTnVBqd0TpLDtf2uBeDL1C+xWC1HaX101ZZq7l1+rzNAuqrPVSy4dAF3D7ub7iHd0ev0KIpCXGAcswbP4vuLv2dSwiQabA08vOphvtv3HQDdIgOY1FurafLlRtcebYGnTUAfEoL1UBE1GzZCb3uRu9SOXwrg+y05PP69FiBdOyaJ/906plGAdLiEMH/eu2EEz04fiEGn8NO2XG6et4Eac0Mr9VoI0RKOK0havnw5y5cv58Ybb+SXX35x/vfy5cv59ddfeffdd+nVq9fJ6qvogBwr2yJ8IzwSgnPty/8jq7XpJKMjSKqrALN9M8JTdCQJ4Jyu5xDlF8Wh2kN8ve/EcpNyq3KZ8csMVhxcgUln4h8T/8HjYx8n2BTc5DWRfpG8NuU1Lut1GTbVxpNrnmRF9goArhmt/b/6etNBLPatShSTicCzzwKg4tdF0MseJOVsgqrmV7tta2vSirh/vraB5szxyTw3fSBG/bH9s6koCteP7cqHM0cRYNKzOq2Yv/x3k0cJBSFE+9asnKSPPvqI4OCm/4EVwsHbnm3g2pIkokSbijMdvrLNNwR8Alunk+2QUW/kL4O1aba3tr5Fhbl5CcDbDm3jmp+vYW/pXiJ8I/ho2kfOMgNHo9fpeXLck1ze+3JUVB5e9TB7S/dyRt8YIgNNHKqs57c9rgAoeJp238rFS1ADoiFuKKC2y+rbtQ21LDqwiBfXv8jdy+7m3uX38tyfz7EkcwlVZq0CfEFFHX/7YgtWm8rFQ7sw54L+x1Vt2GFi7yg+uWUMfkY9v+8rYvaX2yRHSYgO4phzki699FLmzZtHcHAwl1566RHbfvvttyfcMdE5NFlI0p6TFJZ/AHAbSXKsbAs6dUeRHC7rfRmf7fmM9PJ0/rP9P8weOfu4rl+YvpAnVj+B2Wamd1hv3jjjDeICj2+bF0VReGzMY2RXZLMufx33Lr+X+RfO57IRCby7Mp0vN2QzdYAWAAeMGa1NuRUXU7NhIwG9p0HeVi1IGnb9cT33ZKlrqOPjXR/zacqnlNaXNjr/ReoXBBmDuLL31azaMIyiKjN9Y4N48bLB6HTHHyA5jOgaxnszRnDLvI38vCOPXjGB3HtW7xP5KEKIVnDMI0khISHO36JCQkKO+CWEgyNp230kqaLOQlW9lpsRdmAf4GX5f8ipm4/kYNAZuH/k/QD8d/d/2VxwbBXsVVXlza1v8vDvD2O2mZmcMJlPzv3kuAMkB6POyMuTXyYuII7symxeXP8iV9lXua1ILaSwUgt4FaPRNeW26BfodbZ2g/0rwNr2uTipJalc8/M1vLH1DUrrS4kPjOeavtfwxNgneHzM41zb91oSgxKptFTy9qoUNmdW4
GuEt64bjm8zNvU83Om9onj2koEAvLp0H7/uyj/hewohTq5jHkn66KOPvL4X4ki8rWxzTLWF+hkw5mtBkWsk6dRe/n+40+NP59xu5/JLxi/cv/J+vrrgK6L8m94Msry+nDmr5/Bb9m8AzBwwk3uH34ted2I/5EN8Qnj+tOe5+deb+S7tOyYlTGJoYihbs8tYsC2Pm0/T6qMFTzuX8q+/oXLJUmIfexTFLxxqS+Dgeug6/oT6cCJWHVzF7BWzqbfWE+EbwQOjHmBa8jQMOs9/Ah9WH+bLnct47HNtQYEa8S0/H0znrsi7mjXVdrgrRyayO7eCeWsO8OD8bQyMDyE+1O+E7yuEODmalZNUW1tLTU2N878zMzN59dVXWby4/eUeiLblnG4LdC8kqY08xPppP7h1QUHoQ0O1k6dwIUlvFEXhqXFP0TO0J0W1RcxaPIvMikyvbf/I+YOrFlzFb9m/YdQZeXr809w/8v4TDpAcRsaO5OaBNwPw7LpnmTZI207gh225zjYeU26bt0CPM7QTaUtbpA/NsShjEff8dg/11nomdJnANxd9wwXdL2gUIAEoKCxcH4xqM9Ilsgpj6Abe2/4ej/z+SIusMgT4+/n9GJoYSkVdA/d9sRWr5CeJU9y8efMIdfwMOEYzZ85k+vTpJ6U/7poVJF188cV88sknAJSVlTF69GhefvllLr74Yt5+++0W7aDo2LzlJOXal//H6rUpGFNiouu3dFn+34i/0Z/XprxGlF8U+8v3c/WCq3l729uklqSSXpbOT/t/4rbFt3H70tvJqcohPjCe/577Xy7tdeTcwea4Y+gdJAcnU1RbRJbtB/Q6hW3ZZWQUaSMvitFI0DnaNFvFokWuKbd9S1q8L8didc5qHv39URrUBs7rdh6vn/k6EX4RTbZfllLoLBb56cwLmDvhaQyKgYUZC3lw1YONqpAfVUM9pC2DxU/Ap1fCW+MxvjWa16wvEKCzsP5ACe99txgazCf4SYVon5oKZlasWIGiKJSVlXHVVVexd+/e1u/cMWhWkLR582ZOP/10AL7++mtiY2PJzMzkk08+aXJPFXHqsdgsHKo5BHjmJDlGkqIatFVERvcaSad4IcmmJAUn8eUFXzIsehhVlire2voWl/90ORf/cDGP/fEYa/PWYlAMzOg/g/kXzmdA5ICT0g+T3sSccXMAWJD5GYOTTAD8uNU1mhQ0VSs8Wbl4CWryJO1g/naoLDgpfWrKnpI9zF4xmwa1gfO7n88Lp7+AUWdssr25wcbzC1MAuPm0bnSLDOCSXpfw+pmvY9QZWZa1jEdWPYLVdgxL+CtyYckceKk3/O9SWPNv2PcrFO6C4n10Lfmdp3TaBuCvbqgh458T4de/Q6XkKYlTj5+fH9HR0W3dDa+aFSTV1NQQFBQEwOLFi7n00kvR6XSMHTuWzEzvUwHi1FNYU4iKilFnJNzXtdOzIycpqkpbXeRc/g8yknQEUf5RfDD1A54/7XnGxY0jyBREsCmYnqE9uW3wbfww/QceHPUgQaagk9qPUbGjuLjHxaioVJmWA/DD1hzndkUBY0ajDw3FWlJCze4D9lIAwP5lJ7Vf7srry7nnt3uoaahhdOxo5o6fi0458j93n67LJL2omshAE3dO6eE8flr8abw65VWMOiOLMxfzjw3/aHJrJix1sOqf8O/hsPo1qCuDwBhtdd/5r8D138LMhXDd11x+0cWcHlJIPSYeqbgM25o34bWhsOwZsNS23B+G6JxUFczVrf91EvZj9Dbd9uyzzxIdHU1QUBC33norjzzyiNcdPV566SXi4uKIiIjgzjvvxGJpmWlxh+PalsShZ8+efP/991xyySX8+uuv3HfffQAUFhZK/STh5NjYNjYg1uMHlGO6LbxUO+8cSaqvhPpy7b2MJHll1Bm5sMeFXNjjwjbtx70j7mVJ5hLyrIsw6seTXlTNjpxyBieEalNuZ59F2fyvqVj0KwGTzraXAlgCQ6896X2zqTYe++Mxc
qtzSQxK5F9T/oVR3/QIEkCNuYE3l6cBcN/ZvQny9Ww/MWEiz5/+PA+ufJDP9nxGl8Au3DjgRs+bFO2D+TNd+9UljIbT7tOqj3vJC1OA53vWcM6/VrHO0p9vQ2ZwecXH8PvLsPtHuPRdiB/R3D8G0dlZauD5Nvh38rFcMB254vyJ+vTTT3nuued46623mDBhAl988QUvv/wy3bp182i3fPly4uLiWL58OWlpaVx11VUMHTqUWbNmtVhfmjWSNGfOHB544AGSk5MZM2YM48aNA7RRpWHDhrVY50TH5m1jW3DVSIrI10YdTUlJ2okKLWjCJxh8JdhuzyL9Irl10K0oejM+wdomuD+4T7lNc0y5LUbtZk/e3v9bq5QC+O/u/7Lq4CpMOhOvTH7liJXFHT5Zm0lRlZmkcH+uHJnotc205Gk8MPIBAF7Z9Ap/5v3pOrn3V3hvshYg+UfCpf+BWxZD3/O8BkgOieH+3HOWtkvBi7UXUXnJ/yAwFor3wYfTYOvnx/7BhWinFixYQGBgoMfXuec2XdT29ddf55ZbbuGmm26id+/ezJkzh0GDGu8BGRYWxhtvvEHfvn254IILOP/881m2rGVHrJs1knT55Zdz2mmnkZeXx5AhQ5zHzzzzTC655JIW65zo2LzVSLLZVGdOUliWViPJmHBYjSQZReoQbuh/A/P3zic74E8o7cNP23J57Lx+6HUKAWPGaFNupaXU5NoI8A3Rpp5yN0Pi6JPWp72le3lt82sAPDz6YfqG9z3qNVX1Dby7cj8Afzuz1xG3HZnRfwb7Svfxw/4feGjlQ3x5wZfE7fsNfrwbVCsknw6X/QeCYpu8x+FumpDMF+uzOFBcwxs5PXn0zj/hh7tgzwL4/q9QlgmTHoYWKEEgOhGjvzaq0xbPPU5TpkxptKhr3bp1XH+99yKzqamp3HHHHR7HRo8ezW+//eZxbMCAAej1rl9C4uLi2LFjx3H370iaNZIEEBsby7Bhw9DpXLcYPXo0ffse/R8lcWpwTLe5jyQVV5sxW20oQERlMRiNGOPsP1AkSOpQfA2+3DfiPvSBe1H0NRRW1rN2v7YXn2IwEHS2fZXbr0tcpQBO4io3i9XCY78/hsVmYXLCZK7ofcUxXffftZmU1ljoHhnA9KFH/t5TFIXHxz5Ov/B+lNaX8v3XV8APd2gB0pBr4YbvjitAAvAx6JlzYX8APlydQXatD1z5X5j4oNZgxQuw/LmTkgsiOjBF0aa9WvurGcF6QEAAPXv29PiKjz/xvFOj0XNaXFEUbDbbCd/XXbOCpOrqap544gnGjx9Pz5496d69u8eXEOB93zZn0ravDoNqw9SlC4rjNwEpJNnhTEuextDogRiCtgOwYLv7lJu2yW3lkiWo3R31kk5ekPT+jvdJLU0lzCeMJ8c/eUzFH+ssVj74IwOAO6b0xHAMm9f6Gnz515R/cXE93HbA/lvrmNth+ltwlNynppzRN4bTekZisaq8smQv6HRwxuNwzrNag1X/hLVvNOveQnQ0ffr0YcOGDR7HDv/v1tKs6bZbb72VlStXcsMNNxAXF9cilWhF5+OtRlLeYTWSjI58JHAbSUponQ6KE6YoCg+NeoirDjyBpWwsP+/IYe70gRj1Oo8pt9qKcPwBcrdA1SEIbLpqeHOklabx/o73AXhs7GNE+kUe03XfbD5IUVU9XUJ8udg+iqTabNRu3Ubl4sXU703Fkl+AYjRiiIjAd8hgAidMoEuswjMFeeiArwMDMPYcy8Un+O/gw9P68scbf/D91hxum9idfnHBMP5uUG1aOYHFT0BoEvS/+ISeI0R7d/fddzNr1ixGjhzJ+PHj+fLLL9m+fXubDMI0K0j65Zdf+Pnnn5kwYUJL90d0It62JMlx1kjSig+aEt0CIhlJ6pAGRw3m/AF9+Canksq6IFanFTG5T7Rzyq1s/nzKl2/Av8sgyN+hJXAPuarFnm+1WXlyzZM02BqYnDiZqV2nHuN1Ku+uTAdg1sTuG
HQKlStWcOjll6nfl9aofT1QvWYN5R+9SbfzSjGYLByI7cdc3yp81j/PsJjhJAUnNbruWA1KCOH8wXH8vD2Plxen8p8bR2knxv8NyrJgw3/g279AZG+I7tfs5wjR3l133XWkp6fzwAMPUFdXx5VXXsnMmTNZv359q/elWdNtYWFhhIeHH72hOGVVmiupsmjFIj0LSWojSZHVJQAYE91+qDgLSUqNpI7mvhH3YAreDcCHf25xHg8+177KbckS1O5nagdbeMrt8z2fs71oO4HGQB4f8/gxj2wvSykgq6SGUH8jl/cNI+eeezn419up35eGLiCA4IsuJO7550ma9xGJ779P7NNPE3z+ucSfXoHBZKa+3EDDgiCG+vWmtqGWh1Y9dMJbl9x/dm90CixNKWRnjr0chqLAtBeh+2RoqIX5N4G55oj3EaK9mDdvHt9//32j45MnT0ZVVUJDQ5k5cyZlZWUe55944gkOHTpEZWUlH3zwAbt376Znz55HvO+rr77KihUrWrT/zQqS5s6dy5w5czz2bxPCnWMUKcQnBH+31RCOGkmRpdp5z0KS9iApRIKkjiY+MJ7zh2gjhqv3VlFtrgfAf/Ro9GFhWEtLqTPbRwjTlsGxVK0+BjlVOfx7i1bl/74R9xETEHPM136yVitBcUWvIAquvpLKxYvBaCTi1lvo+dsy4v/xD0IvvYSAsWMJPP00wq66kviLYvGPqEVVfMjd0Q1L2kFmvZRCoNXIruJdvLH1xPKGukcFcuEQ7c/p9d/2uU7oDXDp+xAQDYdS4NfHTug5QrRnNTU1vPLKK+zatYs9e/bw5JNPsnTpUm688cajX9zCmhUkvfzyy/z666/ExMQwaNAghg8f7vElhLd8JIBc+3RbeH4W4FZI0lytLREHmW7roOZMvg6doQqr1ZeXVv0EeK5yK1uXpdXAqi2B3K0n/DxVVXlm7TPUNtQyImYEl/e+/JivTSus4o+0InQKnP7eXCzZ2Ri7dCH5s0+JfuAB9CEhjS/K3qCtNAOU6f8m6bvlhN84g4gqhb9+r31ff7TzI8/6Sc1w15SeKAr8uquAlLwK14nAaLhMy7ti00eQvvKEniNEe6UoCgsXLmTixImMGDGCn376iW+++Yazzjqr1fvSrJyk1th5V3RszhpJ/p7LoR2J25HF2qiRKcGek+QoJGkK1H6Qig4n1DeY0T2N/LkHvtqcxv0Tqwg0BRI8bSplX31F5ZLfiL13Esqen7Qpt4QTqyb9fdr3rMldg0ln4slxTx512xF3//tTG0UaU5hKVF4Gvv37k/jBfzCEhXm/wFJnX+pvg0FXwJCr0QMxjz5K0FlnYXjoYc7aXMDS4ToeWXo/312xgDDfJu51FL1igjhvkJab9M7K/bx2tVuB3u6TYeQtsPED+OlvcPtaMB1/3Roh2jM/Pz+WLl3a1t0AmhkkPfnkky3dD9HJeFv+b26wUVipTcNE1Zahj4pE52//B77ioPYa3EWK5nVgfzv9NP7cs4Hqsp68vfU9Hhw922PKrZ5u+AKkLYXJjzT7OYU1hfxzwz8BuHPYnXQL6XaUK1yq6huYv1Ebybxg7wr8hgwh8f330B9pS6WVL0LRXm0ftnP/4XHKf9Qoun3zNX+9/15SijaTE1nBI/+7nrdv+hGdvulq20dy+6Qe/Lw9jwXb83hoWl/iQ/1cJ896CvYugtIDWmmAs+TfYyFOlmYXkywrK+M///kPjz76KCUlWhLu5s2bycnJabHOiY7LOd0W6JpuK6ioQ1XBpEBIfTUm96Rt2di2UxjbLYpQfwVsfny8YR3p5enalJu9ZlLpFq3YJAc3Qk1Js56hqipz/5xLpaWSAREDmNF/xnFd/826A1SbbSRUFjI6sIHEd985coBUsFvbrBa0TWr9Gy9aMYSH0/v9D5ljOx9Dg8oaYxbvPX8Vtrq64+qbw8D4EMb3iMBqU/nIXsfJyTfYFaitfRNKZVNxIU6WZgVJ27dvp3fv3rz44ou89NJLz
qz0b7/9lkcffbQl+yc6KG/L/x2FJKP1DehQMSV6SdqWIKlD0+kUpg/tCkBdxQBeWPcCqqoSctFFAJQv/hM1sh+gaqUAmmHRgUWsyF6BQWfgmQnPYNAd+4C4qqp8tFBbfXdRwVa6vvcu+sN2Hz/sAlj0sFZRu+8F0O+CJpsqBgPjH3yRO4K0Nu/H7WHNXdfRUFp6zP1zN2uiVhPm8/VZlNcetmqu7/nQbSJY62GpjCQJcbI0K0iaPXs2M2fOZN++ffj6+jqPn3feeaxatarFOic6Lm9BkmNj22irViPJ6BEkSY2kzuL8wdr/c2vlANbmbGBhxkL8hg7F2DUJtbYWs84+NZZ2/DkHJXUlvLBOS56+bfBt9A7rfVzXL/3oOw7gj19DPTfec5VnoO7NngWQsQr0PjD1uWN6xi1XPM/ogAGYjQov9Uhl/zVXY87KOq5+AkzuHUXvmECqzVa+3nTQ86SiwNQXQNHBru+0pHIhRItrVpC0YcMG/vKXvzQ6Hh8fT35+/gl3SnRsVpuVguoCwDMnKcexJUm19pu1x/L/ctm3rbMYkRRGTLAPqs0Xa3Vv/m/9/1FSV0LIhdpoUtkOrX4WaUvhOPZZUlWVOavnUFpfSq+wXtw68Nbj6ld9egbzVmlFIs8PsxBz+vgjX2Cpg1//rr0ffzeEJR/Tc3SKjufPfY1gQyD7uyh8kXCQA1ddTe3WrcfVX0VRuHG89sz/rj2AzXbY3m2xA7X94gBWPH9c9xZCHJtmBUk+Pj5UVFQ0Or53716iolp2uwHR8RTVFtGgNqBX9ET5ub4fnCvbSrV8Ja8jSSGyJUlHp9MpnDdIG03yq59AWX0Zz617juALtWmokt/3oxr9ofoQ5G875vt+tuczVh5ciUln4oXTXsB4HPukqRYLWx59irUxWqXqWTdPO/pFa1+HskwI6gKnzz7mZwHEBMTwxHhtGuzbCTr2+JWSeeNMKhYvPq77TB8aT5CvgQPFNazad6hxg0kPgs6gTV1mnVjpASFEY80Kki666CKeeeYZLBZtnlxRFLKysnj44Ye57LLLWrSDouNxJG1H+0ej17lW9zhrJB2yL//3mpMkI0mdwQX2Kbfa8l7oVV+WZC5hQf1G/EeOhAYVs6LlLR3rlNv2Q9t5eePLANw/8n76hPc5rv4Uvf0235nDsSk6xiYG0SfOSx0kd+U58Psr2vuzn9F2Pz9O07pN49xu52LTwZtXBlJrqyfnnnsp+fjjY75HgI+BK0Zof08+XnOgcYOwZBh6nfZ+uYwmifZn5syZHbpsULOLSVZVVREVFUVtbS2TJk2iZ8+eBAUF8dxzxzZvLzovb/lI4Ja4XVOK4u+PPiJCO2Gp1QoMggRJncSwxDDiQnypNaucE30PAM+ve57CS7X9Hst2aXlp7Dt6kJRfnc89y+/BYrNwRuIZXNP3muPqS93u3eS+/wGLkscCMHNSr6NftPw5sNRA4lgYdOxFKg/39zF/J9o/mly/Or66rTeoKgUv/B/5zz+Paj22quM3jNMCyhV7D5Fd4mWXg4kPaKNJGSshZ1Oz+yqEaKxZQVJISAhLlizh559/5t///jd33XUXCxcuZOXKlQQEHP9vXKJzcRaSDPAsJOkIkqJqyzAlJbn22HJMtRn9wTe0tbopTiL3Kbe68v5MSpiE2WbmMevXFHcJpCLFvjT+4HqobXr1V3l9OXctu4ui2iJ6hvbk+dOfP+a92QBUs5ncx/7OHzEDKfcJJC7El7P6HWXrkqI02Pa59n7qcydUtyvEJ4RnJzwLwILgdDIe0kbaSz/5Lzn33outtvao9+gWGcD4HhGoKsw/PIEbIDQJBtoDuTUnti2KEK1p5cqVjB49Gh8fH+Li4njkkUdoaGgAYMGCBYSGhmK1/zKxdetWFEXhkUdc9dVuvfVWrr/++pPax+MOkmw2Gx9++CEXXHABf/nLX3j77bf5448/yM3NRVXVo99AdHretiSpqm+gok775o+0B0lO7sv/pZBkp
+FY5bYspZAnxjxDcnAy+TUFPHetkSIMWBpCtArW6Su8Xl9eX86sxbNILU0l3Dec1894nQDj8f0SVvTe+9Tv2cNPvScBcN2YJAz6o/yzt/L/tH71ngYJI4/red6M6zKO6/tp/5D/M3A1AS89g2I0UrlkKZkzZ9JQXHzUe1w9Wvv7Mn9jNtbDE7gBxt+lve7+XuomnSJUVaXGUtPqXy31cz4nJ4fzzjuPUaNGsW3bNt5++20++OADnn1W+6Xi9NNPp7Kyki1btJIdK1euJDIy0mMD25UrVzJ58uQW6U9TjqvitqqqXHTRRSxcuJAhQ4YwaNAgVFUlJSWFmTNn8u2333rd7VecWrwFSXn2UaRArAQ01GPq6q2QpEy1dSbDEkOJD/Ujp6yWLQfMvH/O+8z4ZQYHq/N4bKae/2y30iMBbcptwCUe1+4u3s0DKx8guzKbcN9wPjjnAxKCji+pvy41laJ33iE1NJE9wfGY9DpnsNGkgt2w42vt/ZSW20T2nuH3sCZ3Denl6fwrag3PfvQhOXfeRd227Ry4+hoS33sXn25NVw0/p38Mof5G8srrWLX3EFP6Rns2iB0E3adA+nL482049/9arO+ifaptqGXMZ2Na/bnrrl3nsWl5c7311lskJibyxhtvoCgKffv2JTc3l4cffpg5c+YQEhLC0KFDWbFiBSNHjmTFihXcd999PP3001RVVVFeXk5aWhqTJk1qgU/VtOMaSZo3bx6rVq1i2bJlbNmyhc8//5wvvviCbdu2sXTpUn777Tc++eSTk9VX0UE4c5Lcqm3n2mskRTV4q5EkhSQ7I0VROG+QNuX68448YgNi+c85/yEpKImiEIV/DtO22rDu+xXVZkNVVbIrsnlx/Ytcv/B6siuziQuI48OpH9IzrOdxPVu1WMh79DFoaGDRBG2K67xBsUQG+hz5whUvACr0uwjihhz3Z26Kr8GX509/HoNiYGnWUpaG5tD1888xJiRgyc4m8+prqNm8uenrjXouHaYFiV9saKLm0jj7aNLWT7UNo4Vox1JSUhg3bpzH9PmECROoqqri4EFtWnnSpEmsWLECVVX5/fffufTSS+nXrx9//PEHK1eupEuXLvTqdQw5hifguEaSPv/8cx577DGmTJnS6NwZZ5zBI488wqeffsqMGce3TYDoXBwjSTH+rtwPZ9J2tZagbUrq6rpAaiR1WucP7sL7v2ewLKWAWrOVpOAkvrjgCx75/q+sVbdRU6TgX32Imz4ZRaqPD5XmSue1UxKnMHfCXEJ8jrISzYui99+nbvduKiNj+c03EawqM+w1h5qUtx1SfgSUFh1FchgQMYDbh97O61te58X1LzLm4m9J/vILsm+/g7rt28maeRNd/vEiwdO8lye4YmQCH67OYPmeQ5TXWAjxP6wEQo8zIKwblGZoo2EjbmzxzyDaDz+DH+uuXdcmz20tkydP5sMPP2Tbtm0YjUb69u3L5MmTWbFiBaWlpSd9FAmOcyRp+/btTGviLzDAueeey7Ztx173RHQ+NZYayurLAM+RJMd0W0SZVmTS63RbiIwkdTZDEkKID/WjxmxleWohAEGmIF6/7BMeWxnNLlUb2RlVVkiluRIFhbFxY3n3rHd5bcprzQqQ6nbvpuittwFYfd39mK0qg+JDGJYYeuQLHUvoB14G0f2O+7nH4uaBNzMochCVlkqeWfsM+vBwun48j8Azz0Q1m8m59z6KP/jQa95Hv7hg+sQEYbba+GVnXuOb63Qw8ibt/cYPtC1VRKelKAr+Rv9W/zqehRNH0q9fP9auXevxvb569WqCgoJISNBGTR15Sf/617+cAZEjSFqxYsVJz0eC4wySSkpKiIlpemVITEwMpc3cp0h0Dvk12lRbgDGAIGOQ87hzuq26GMVkwuD+fSTTbZ2WoihcMEQLlr/f4tr8WqfXc+bU24j/U9vW6GZdFN9c9A0brt/A++e8z/j48c36x9hmNpP78CPQ0ID/Oecwv1z7rXfGuK5Hv
t/BTbD3F22bj8mPNN3uBBl0BuZOmItRZ+T3nN/5cf+P6Pz8SPj3a4TZV+kU/vOfFMx91muJgIuHaaOt329tYiPxoddrW6jkbYOcpqfvhGhN5eXlbN261ePrtttuIzs7m7vvvps9e/bwww8/8OSTTzJ79mx0Oi00CQsLY/DgwXz66afOgGjixIls3ryZvXv3tr+RJKvVisHQ9AydXq93Lt8Tpyb3GknuP5Tcl/8bExNRdG7fepK43ak5cmmWpxZSWm12Hg+59FJqrV1RbeBbsp/eqhEf/VFyho6i6PU3qN+3D314OLuvu5vcsjpC/Y1cOOQo31vLtRU1DL4aIk9ujkOP0B7cMfQOAF7c8CKFNYUoej0xf3+M6EceBkWh9LPPOHjX3dhqPOsiXWT/HOsySpwV7D0ERMCA6dr7LZIfKtqHFStWMGzYMI+vuXPnsnDhQtavX8+QIUP461//yi233MLjjz/uce2kSZOwWq3OICk8PJz+/fsTGxtLnz7HV1S2OY57ddvMmTPx8fH+D1l9fX2LdEp0XE3VSHJsbhtVU4app1s+kqUOaoq09zKS1Cn1iQ2if1wwu/MqWLA9lxvGJQOgM5kIn3UXNcvvJCDGjG37d+gmH9/2H+5qt26l+IMPAIh75mme2aEtrb9qVCK+Rn3TF2au1bb10Blg0kPNfv7xmDlgJkszl7KreBfPrH2G1894HUVRiJg5E2NsHLkPPUTV8uVkzriRxHfexhAZCUBCmD+jksPYcKCUn7blctvEHo1vPvQ62P4l7PwOpv0fGFsvh0SIw82bN4958+Y1eX79+vVHvP7VV1/l1Vdf9Ti29Tj3QTwRxzWSdOONNxIdHU1ISIjXr+joaEnaPsV5W/6vqmqjQpJOlfZRJIMf+IW1Wj9F67p0uBYAf7vFc5oo9JJLqKnSvlesf3zU7PtbKyrIefAhsNkIvuhCcgaO5o+0InQKXD+m65EvXm7fJWDY9RDe9DL8luQ+7bby4EoWZix0ngueNpWkefPQh4ZSt3MnB666mvr9+53nLx6q/Vl+vyXX+82TT4eQRKgvhz0/n9TPIURnd1wjSR991Px/xMSpIa9KC5LcR5JKqs3UN9hQVJWIunKMTdVIkkKSndZFQ7vw/MIUtmSVkX6oiu5RgQAoRiO+lz8MW+7AYMmifuef+Awce1z3VlWV3EcexZKdjTE+nti//51/Lc4A4NyBcSSGH6GmS/pKOPA76E0w8cFmf77m6BXWi78M/gtvbH2Df2z4B6fFn+ZMVPcfPozkLz4n6y9/wZKZRea115H0ySf49unN+YPieOrHXezOq2BfQSW9YoI8b6zTwZCrYdU/tcrhJ7CtihCnumZtSyJEU7zt2+bY2DasoQaTzYopUQpJnmqig3yZ1DsKgC83ZnucC7zoWurMkSgKVL8zG9VmO657F739NlW//YZiNBL/2msUKz78YE9svuX0I4wMqSr8Zs9FGjETQo6vWGVLuGngTXQL6UZJXQmvb3nd45wpOZnkzz/Hd9AgrOXlZN18M/Xp6YQFmJjcR/uzbDKBe4h9f7v9v0GFl5VwQohjIkGSaFGO6Tb3kaRce4JplKNGkvtIUrl9LyrJR+r0rhql/X//ZtNBLFZXIKQoCobJfwXAT9lL0dtvH/M9y779jqJ/a8FFzBOP4zdwAP/9MxOLVWV4UijDk44whZu2VNs7zuALp9/fjE904kx6E0+MfQKAr1K/YsehHR7nDeHhJP3nfXz69cNaXEzWLbdiKSh0Trn9sLWJ7aAiekDCaG17ld3fn+yPIUSnJUGSaDGqqjYxkqQFSZHVpWAwYOziNmokNZJOGWf2iyYy0IeiKjPLUgo8zhnGz0RFh1+EhfKPXqVi8eKj3q/8pwXkPaEFGBGzZhF25ZXUmq38709t77JZp3dv+mJVdeUijboVgmKbbnuSjYodxUU9LkJFZe6fc2mwea4Q1oeEkPThB5iSk2nIyyP79r8yJSmQAJOeg6W1bM4q837jg
VqlcXZ+e3I/gBCdmARJosWU1JVgtplRUDyqbTtXttWWYYzvguJeRkKm204ZRr2OK0ZqU1qfr/ecciMwCqXHZACCk2rJvf8Byn/80et9VKuVonfeJffBB8FqJeSSS4iafR8A32w+SGmNhcRwP84ZcITAJ3Uh5G4BYwBMuPdEP9oJmz1iNsGmYFJKUvhizxeNzhvCwkh8/z304eHU706h9PHHONO+f9vi3fneb9r/YkDRRsvKmtjKRAhxRBIkiRbjGEWK8ovCqHdtmZDj2JKkttQzHwmkkOQp5upR2p59q/Yd4kDRYfuLDdQSjMMGKKgWM7kPPUzuY3+nPj0d0ApFVv3+OweuuppD9iXB4TfdRNxzz6IoCjabyod/aAnbN43vhl7XxEIAm81VXXvMbRAY1bIfshki/CK4d8S9ALyx9Q2Ka4sbtTElJpL49lsoRiNVy5YxLn8XAIt3FXifcguOg+TTtPe7vjtZXReiU5MgSbQYb/lI4NqSJLK23HP5P8hI0imma0QAZ/SNRlXho9UZnif7XwTGAIz6CmJv1rY/Kv/2W9LPO589Q4exd+QosmfdRt3OnegCA4l9+mliHn7IWZh0aUoB6UXVBPkauHJU4uGPdtn1LRTsBJ9gGP+3k/VRj9tlvS6jX3g/qi3VvLPtHa9t/IYMIebvfweg17x/YdRBRlE1aYVV3m864BLtdec3J6PLQnR6EiSJFtNUkORY3RZdU+qZtN1QD9Xafl4Et/7KItE2bp6grTibv+kg5bUW1wmfIBh4KQBhverp+un/CDzjDFAU1Lo6VLMZQ1QUoddcTY9fFhJ21ZXOS1VV5dWl+wC4fmxXAn2aqG7SUA/Lntbej78b/MNb/gM2k07R8cDIBwD4eu/XZJRneG0XetWVhEyfToC5lmEl2ijb4t0FXtvS/2Jtq5W8bVCaeVL6LURn1uZB0ptvvklycjK+vr6MGTPmqNU358+fT9++ffH19WXQoEEsXLjQ4/y3337LOeecQ0REBIqieK3MWVdXx5133klERASBgYFcdtllFBQ08Y+MOGbeCklarDYKK91ykjwKSdqXJut92tUPK3FyTegZQd/YIGrMVr5Yf1iuzHB7Mdpd3+E/oCeJb71Jn40b6LF0CT0W/ULPVSuJe/JJDFGeU2S/7ipgd14FASY9tx0pYXvDf7T8nMBYGHdnC3+yEzc6bjSTEibRoDbw6qZXvbZRFIWYxx/HmJDA2IxNAPy6q4m8pIBISBqvvZfCkqKdUhSF77//vq274VWbBklffvkls2fP5sknn2Tz5s0MGTKEqVOnUlhY6LX9mjVruOaaa7jlllvYsmUL06dPZ/r06ezcudPZprq6mtNOO40XX3yxyefed999/PTTT8yfP5+VK1eSm5vLpZde2uKf71TjXNkW6AqSCirqsKlgsDUQWl/lOd0mhSRPSYqicPNp2mjSf/7IoNbstpFrwiiI6gsNtbD9KwB0AQGYEhIwJSd73aTWZlN5deleAGZOSCYswOT9wbWlsPIf2vsz/g6mgJb7UC3ovhH3oVN0/Jb9G5sKNnltow8MoMv/vcDYgt0oqo3tB8udq0gb6XeB9rpnwUnqsRBHlp+fz91330337t3x8fEhMTGRCy+8kGXLlrV1146qTYOkV155hVmzZnHTTTfRv39/3nnnHfz9/fnwww+9tn/ttdeYNm0aDz74IP369WPu3LkMHz6cN954w9nmhhtuYM6cOZx11lle71FeXs4HH3zAK6+8whlnnMGIESP46KOPWLNmDX/++edJ+ZynipwqLQnbfSTJsbItsrYcnQLGBLdpNWeQJEnbp5rpQ+NJCPPjUGW9c8k+oAXLI2/W3q97R0uyPopvNh9kT34lgT6GIy/7//0VqCuDqH4w5NoT+wAnUY/QHlzWS1u+//LGl70nZQP+I0fS46pL6Vei/fkt3t5EYcm+52uvWWuhuqjF+yvEkRw4cIARI0bw22+/8c9//pMdO3awaNEipkyZwp13t
r/R3MO1WZBkNpvZtGmTRzCj0+k466yzWLt2rddr1q5d2yj4mTp1apPtvdm0aRMWi8XjPn379iUpKemI96mvr6eiosLjS3hybEkSH+gKetz3bDPExaJz3xzZUUhSaiSdckwGHX87oxcAb6/cT3W9W22godeBTwgUp8G+X494n6r6Bv7xayoAd53Rk1D/JkaRyrJg3bva+7OfBv1x7cjU6u4Yegd+Bj92FO3g18ym/wwi776b0yoPAPDzsq3eG4UmQdwQrbBk6kLvbYQ4Se644w4URWH9+vVcdtll9O7dmwEDBjB79uwmByZ27NjBGWecgZ+fHxEREdx2221UVbkWJ6xYsYLRo0cTEBBAaGgoEyZMIDPT9cvWDz/8wPDhw/H19aV79+48/fTTNDQ0eHvUUbVZkFRUVITVaiUmJsbjeExMDPn53ufX8/Pzj6t9U/cwmUyEhoYe131eeOEFj818ExOPsHrmFFRjqaG0vhTwnG5zJG1rG9settGoc/m/rGw7FV06PJ7kCH9Kqs28uyrddcInEEbO1N6vecPrtQ5vLU/jUGU9XSP8uWlCctMNlz4N1npt89de55xw30+2SL9IbhpwEwBvb30bq83qtZ0+MICLrjwTgE21JorSm6iH1Ncx5SZBUmehqiq2mppW/2pqZNObkpISFi1axJ133klAQOPp7cN/DoOWMjN16lTCwsLYsGED8+fPZ+nSpdx1110ANDQ0MH36dCZNmsT27dtZu3Ytt912m3Mq/vfff2fGjBncc8897N69m3fffZd58+bx3HPPNevPuX3/OtWOPProo8yePdv53xUVFRIouXEkbQcaAwk2BbuOO7YkqSnFNPCwPy/nSJL8OZ6KDHodD07ty52fbeadFfu5eGgXetg3vmX0X2Dtm5D5B2Stg6Qxja7fmVPOe/bg6rHz+uFj0Ht/0P7fYOfX2iqvc+Z2mPy36/tfz/9S/kd6eTqLMxdzbrdzvbYbcMk0uq3+jAxjKD+89zW3/N/sxo16T9MqjGes1Fb4GXwatxEdilpbS+rwEa3+3D6bN6H4H2HTaDdpaWmoqkrfvn2P+f6fffYZdXV1fPLJJ87A6o033uDCCy/kxRdfxGg0Ul5ezgUXXECPHj0A6Nevn/P6p59+mkceeYQbb7wRgO7duzN37lweeughnnzyyWPuh0ObjSRFRkai1+sbrSorKCggNtZ7pdzY2Njjat/UPcxmM2VlZcd1Hx8fH4KDgz2+hEtulZZf1CXQc1TINd1W7rn8H6DcXnVZgqRT1nmDYpnUOwqz1cbj3+10/ZYaEg9D7XlDS5/UthFxU2exct+XW2mwqZw3KJZz+sfglaUWFtiDhtG3QZdhJ+mTtLwgUxAz+mur/d7Z9k6To0mKojBtmPZ367dcM3Wpexs3ih2kreiz1EDm6pPWZyHcHc+ok0NKSgpDhgzxGHmaMGECNpuN1NRUwsPDmTlzJlOnTuXCCy/ktddeIy/PtYnztm3beOaZZwgMDHR+zZo1i7y8PGpqao67P202kmQymRgxYgTLli1j+vTpANhsNpYtW+YcVjvcuHHjWLZsGffee6/z2JIlSxg3btwxP3fEiBEYjUaWLVvGZZdpyZGpqalkZWUd132EJ8dIUpeAw4Mkx3Rbqefyf5CcJIGiKMy9eCBn/2sla9OLeXdVOn+dpP12yORHtRVuWWth7yLoo42kqKrK0z/tYl9hFZGBPjw7fZDXVW8A/PYslGZAUBeY8vdW+lQt59p+1/LJ7k9IL09nSdYSpiVP89pu2qRBvL1zNZujepH3r1fp9s5bng0UBXqeBVv/B/uWQo8zWqH34mRS/Pzos9n76seT/dxj1atXLxRFYc+ePS3ah48++oi//e1vLFq0iC+//JLHH3+cJUuWMHbsWKqqqnj66ae9rlj39fU97me16eq22bNn8/777/Pxxx+TkpLC7bffTnV1NTfdpM3Fz5gxg0cffdTZ/p577mHRokW8/PLL7Nmzh6eeeoqNG
zd6BFUlJSVs3bqV3bt3A1oAtHXrVme+UUhICLfccguzZ89m+fLlbNq0iZtuuolx48YxduzYVvz0nYtjJMk9Hwkgt9yVuG3q6paTZK7WlmQDhEghyVNZUoQ/T1zQH4AXF+1h+R5HgdEuMPZ27f2SOWDRAu63Vuzn8/XZKAr88/LBhDe15D9tGay15zRd8Ar4drzR3yBTENf1uw6AD3d82ORv5oPiQwj31VNr9GX9jkzq7P/+eehlX6yStuRkdVe0IkVR0Pn7t/pXk7+QeBEeHs7UqVN58803qa6ubnT+8Bkd0KbOtm3b5tF+9erV6HQ6+vTp4zw2bNgwHn30UdasWcPAgQP57LPPABg+fDipqan07Nmz0ZdOd/whT5sGSVdddRUvvfQSc+bMYejQoWzdupVFixY5k7OzsrI8htHGjx/PZ599xnvvvceQIUP4+uuv+f777xk4cKCzzY8//siwYcM4/3xt2evVV1/NsGHDeOcdV5n/f/3rX1xwwQVcdtllTJw4kdjYWL79VnbKPhHO6Ta3kaQacwNlNVpF5ajaMkzuy//L7UnbPsHgG9Jq/RTt03VjkrhmdCKqCrd/uomFO+x/7yfcCwFRULQX62/P8sriVP5pX8325AX9mWLf5LWRijz43h5gjbrVOQrVEV3T9xp89b6klKTwZ5731UA6ncKkflq6wKaYPhS9+17jRt2ngKKHor1SfVu0mjfffBOr1cro0aP55ptv2LdvHykpKfz73//2Ontz3XXX4evry4033sjOnTtZvnw5d999NzfccAMxMTFkZGTw6KOPsnbtWjIzM1m8eDH79u1z5iXNmTOHTz75hKeffppdu3aRkpLCF198weOPP968D6CKZikvL1cBtby8vK270i5c9/N16sB5A9VfM351HttXUKl2fXiB2nf21+qe007zvGDfUlV9MlhV3xzbyj0V7VW9xarO/HCd2vXhBWrXhxeoD87fqm7NKlVzN/2sLn58snrFI/90nvvHopSmb1RXoapvT9C+v94Yo6rmmtb7ECfJ838+rw6cN1C95ddbmmzz3eaDateHF6hT/vquurtvP7UuLa1xow+man8u698/ib0VwlNubq565513ql27dlVNJpMaHx+vXnTRRery5ctVVVVVQP3uu++c7bdv365OmTJF9fX1VcPDw9VZs2aplZWVqqqqan5+vjp9+nQ1Li5ONZlMateuXdU5c+aoVqvVef2iRYvU8ePHq35+fmpwcLA6evRo9b333mtW3xV7B8VxqqioICQkhPLyckniBs786kwKawv5/PzPGRipjeytSC1k5kcbSC7PY17FCpI//Z/rgk0fw09/05ZjXze/jXot2psGq40XftnDB39437fMFzMvTI3jkilNTI3XlcOX10PGKm0E6talEJZ88jrcSnKrcjnv2/OwqlbmXzifvuGNVwsVV9Uz8rmlqCr8b9EzdD/vLLo8f9iy599fhmXPQO9z4dovWqn3QnRcbb53m+j4zFYzhbVaHol7te0c+8q26NpSz+1IwC1pW/KRhItBr+OJC/oz/6/jOH9wHAEmPSa9jqhAE7eFbmSx6SEu2XAdHPCyQqsoDT6YqgVIRn+49stOESCBtmr07K5nA/Bpyqde20QE+jA4Xpu63hTdh4oFC2goKfFs1FO7BxmrtFIAQogjkjpJ4oQ59mzz1fsS7uvaqDanVAuSYmpKMSU1VSNJgiTR2KjkcEYlH7bpce0o+GSZtqP9vPNg4GXaD31FgYzfYdvnoFq1pe7XftGhlvsfi+v6XceiA4tYmL6Q+0bc5/F3zWFSn2i2HSxnc+8xnJO1gbIvvyTy9ttdDRylAKryIXMN9JjSip9AiI5HRpLECcutdq1sc1/54BxJqvG2/F9qJInj5BcKN/4EI2Zq/73zG/j+r/DdX7Sl7apVm76d9VunC5AAhkQNYUDEAMw2M/NTvU9RT+4TBcDmkK5YFR2ln32Oaja7GjhKAQCkLT3ZXRaiw5MgSZwwx
55th9dIcowkRdeUNt6SREaSRHP4hsCFr8Gty2D83ZA0TttqZOQtcPNiLb+tk9bdUhTFWQ7gq9SvaLA13otqSEIoof5GKq0K+7oNpuHQISqWHLbk31EKYJ+UAhDiaCRIEicsp0pbzn94te2DJVp1Uy0nyW3EyGZz7dsmQZJojoSRcM6zcPMimLlAq4PkZeuSzmZq8lTCfcMprC1k5cGVjc7rdQqn99JGk3aM18qglH39tWcjZymAVG3jXyFEkyRIEifMWW3bLUgyN9goqNQSQ+MMDehD3GohVR8Cq1nbSyvIs/ikEKJpJr2J6T2nAzB/bxNTbr21IGm9fxdQFGrW/ok5yy0Y8guFePueXxmrTmJvhej4JEgSJ8xZbdttZVt+eR0qYLJaiI6L9Lygwj7VFhQHemMr9VKIzuHyXpcDsCZnDdmV2Y3OT7QHSTsLa6k/Tdt+pOybw4rldpuovWb8fvI6KkQnIEGSOGGOICk+0JULcrBMm2qLqinFp6nl/8GdM3dEiJMpMTiR8V3Go6Lyzd5vGp2PCvJhQBetdlvKadqUW/m336I2uOUwdTtde81Y1WjzYCGEiwRJ4oQ02BooqCkADquRVCo1koQ4WS7rpW3O/VP6T1ht1kbnJ/TURm+3+MWiDwuj4dAhqtescTVIHAN6E1TmQkl6q/RZiI5IgiRxQg7VHMKqWjHoDET5RzmPO5b/x9SUYuqW7HmRBElCnJDJiZMJMgVRWFPIhoINjc6P7xEBwJqMUoLPOw+A8h9/cjUw+kHCaO19RuMEcCGERoIkcUIcNZJi/WPRKa5vp4Puy/+Tkz0vkhpJQpwQk97EtORpAPy0/6dG50d3C8eoVzhYWkvFGdrmvpXLlmFz34ld8pKEOCoJksQJ8ZaPBHCwqAqwjyR1lRpJQrS0i3pcBMCSzCXUWGo8zvmbDAxLDANgoyESY9ck1NpaKpctczWSvCQhjkqCJHFCnCvbAj2X8ucUa0FSrNGK/vANgCVIEuKEDYkaQmJQIrUNtSzLWtbo/DjHlNv+YkIuuBA4bMotfiQY/KCmCApTWqXPQnQ0EiSJE+KskeRWbdtmU8mr1lbSJEQdFiBZarU6SSBBkhAnQFEULuyuBT/eptwcydtr9xcTdIG2yq167VoaSku1BgYTJI3V3h+QKTchvJEgSZwQb9W2CyvraVBBZ7MSFx/leUGFNvKEMQD8wlqrm0J0Shf0uACAdfnrKKgu8Dg3NDEUP6Oe4mozB/wi8OnXD6xWqjym3Bx5SVJUUghvJEgSJ8Rbte2DpVp+RGRdOf6NVrY5krYTtM02hRDNlhiUyPDo4dhUGwszFnqcMxl0jO4WDsDqtGKCp54DQMWvi12NHEHSgT/ASykBIU51EiSJZrOpNufmth41ktyX/ycfnrQte7YJ0ZIco0k/7v8R9bAEbGcpgLQigqZOBbQpN2tZmdYgbiiYgqCuDPJ3tFKPheg4JEgSzVZSV4LZZkan6IgJiHEed4wkeV/+L0nbQrSkc7qeg0lnIq0sjb2lez3OOfKS1mWUoE/qik/v3tDQQOVvy7UGegN0Ha+9lyk3IRqRIEk0myMfKdo/GqPOtQfbwfxy7bjXattSI0mIlhTiE8Jp8acBWjkAd/3jggn1N1JV38C2g+UETdNGkyoXu0+52UsBHPijVforREciQZJotoOV2qhQQqDnqFB2QRkAXYwqOl9fz4ucI0myb5sQLeWsrmcBjYMknU5hXHdtym3t/iKCztTaVa9di61WmxZ3jiRlrwObrXU6LEQHIUGSaDbHSNLhhSRz7TlJ8WF+jS+S6TYhWtzkxMkYdAbSy9NJL/Pci82Rl7Q6rRif3r0wdIlDra+net06rUHsEG21aV0ZHJJ6SUK4kyBJNJtjJCk+yBUkqapKTp32PjHusCX+quoKkoJlJEmIlhJkCmJc3Dig8WiSo6jk5qxSLFaVoMmTAahasUJroDdAwkjtfdba1uiuEB2GBEmi2RwjSe7TbSXVZurt3
1aJyZ5VuKkugoZaQJGcJCFa2NldzwZgadZSj+M9ogKJCDBR32Bj+8EyAp1B0krXajjHlFumBElCuJMgSTSbMycpyBUkOZb/h9eWE9g92fOCsiztNbiLVu1XCNFipiROQa/o2VOyh+yKbOdxRVGc9ZLWZZTgP2YMip8fDfn51O/ZozVK0kahyFor+7gJ4UaCJNEsFpuF/Jp8wHMk6WCJffl/rZfl/2UHtNfQw1a8CSFOWKhvKKNiRwGwJMtzym2MPUj6M70YnY8PAeO0oMg55ZYwEnQGqMhx/TIjhJAgSTRPflU+NtWGj96HSL9I5/HsrEIAomvLMXbp4nmR4x/f0MMKTAohWoRzyi3Tc8ptjH2F26bMUhqsNgInTwKg0hEkmQIgboj2PuvPVumrEB2BBEmiWQ5W2ZO2A+NR3LYXycopAqCLsQHFYPC8qDRTe5WRJCFOijOSzkBBYUfRDmc1fIA+MUGE+BmpMVvZmVtB4CQtSKrbvoOGIu3vrGvKbU1rd1uIdkuCJNEs7kGSu+ziKgC6BHnJOXKOJEmQJMTJEOkXyfCY4YBnArdOpzAq2Z6XlF6MMSYG3/79QVWpWvW71kiSt4VoRIIk0Sw5lfaVbUGe9Y4OVmmbZHaNCmp8kSNICpPpNiFOlqam3MZ214Kk9RklAG6r3FZoDRLHaq9FqVBdfNL7KURHIEGSaBZvI0mqqpJr1abYuiZGe15gs8lIkhCtYEriFAC2HtpKeX258/iYblpe0voDJVhtKoFTtHbVf/yBajZDQARE9tEaZ0tekhAgQZJoJudIktvKtqIqM3WKAUW10bXXYXWQqgvBWg+KTgpJCnESdQnsQs/QnthUG3/kuPZj6xcXRKCPgcq6BlLyKvAd0B99VCS2mhpqNm7UGnW15yVlSl6SECBBkmgmZyFJt+m2rMIKACJrywns0c3zAmeNpATQGxFCnDyTErTE7FUHVzmPGfQ6RiZrVfDXZZSg6HQETtA2xq1ea89Dcq+XJISQIEkcv2pLNaX1pYDndFvGfm0KLrauDEP0YdNtsrJNiFYzKVELkv7I+YMGW4PzuHPKLUPLOQoYrwVF1WsOC5LytoG5upV6K0T7JUGSOG6OStuhPqEEmgKdxw8cKAAgXm/xKAsAQJkESUL8f3v3HR5ndSV+/DtdvXdZ1b3J3bJsYzsYMC3AxssSh6yJlyVhN5TgZLPrDSXJsjGQHwklhZTNpmFMnAUTAhgcg01xk2W5y02y1Xuvoynv74+rGWmskQtYM5LmfJ5Hzygz7xXHLy/K8b3nnusrOXE5RFoiaett43D9Yff7uQOKt51Ozd1UsufECezNzeq/z/AUcNqh8qBfYhdiJJEkSVyxobb/l9a1AzAu3MtymuxsE8JnDHoD16ReA8Cuil3u92emRhJsMtDcZeNMXQfG+HgsEyeCptG1bx/odJC2UF1csd8foQsxokiSJK7YUNv/K9t7AchIiBg8SGaShPApd11SeX9dksmgZ25GFKB2uYGXJTdXklQuSZIQkiSJKzbUTFKlXW3/z0xPGDRGtv8L4VuLUxdj0Bkobi2mvL3/wFtXU8n8c64kSTWR7Nzdt6MtLVe9lu+Xw25FwJMkSVwxbzvbrDYHdcZQALImX7Ck5nRCS98vaTm3TQifiDBHMCdhDuC5y22hK0k634SmaYTMnw9GI7aKCnrLyyEpBwwW6G6CxmK/xC7ESCFJkrhirsLtgTNJ5SWVaDodFkcvyVOyPQe0V4PTpk4ZD0/2ZahCBDRvrQDmpEdj1Ouobu2hsqUbfWgowbPV4badu/eA0QwpKrmSuiQR6CRJEldE07T+maQBjSRLTp0HILm3HYPF4jnI3SMpFQwXHHorhBg2y9KWAZBfk0+XrQuAYLOBGamR6v3zQy25LVCvUpckApwkSeKKNHQ3YHVY0ev0JIf2zwqdL60DIMVoHzxIiraF8IusiCzSwtOwOW3sqe5vELkwy9UKQPU7c7UC6Nq7F83h8KxLEiKAS
ZIkrohrFikxJBHTgM7Zrm7babL9X4gRQ6fTsSRlCQC7K/uPGpmfoTpvu2aSgmfORB8WhqO1lZ6ikzCub4db3QnoafNt0EKMIJIkiSvi2iVz4fb/inYbAOnxF9v+L0mSEL62JFUlSZ9UfYLWt1vNtcPtbF0HTZ296IxGQnLV7FHn7t0Qntj336sGlQf8ErcQI4EkSeKKuGaShtr+n5HhZfu/HEkihN8sTFqIUW+ksqOSsnY1qxsdamZiguqW765LynP1S3LVJbn6JeX7NmAhRhBJksQVce1sG1i07ejopNqsZpCyp3iZLXL3SJKZJCF8LcQU4m4F8EnlJ+73F2Rd0C8pbxEA3YWFOK3WAXVJ+3wYrRAjiyRJ4oq4G0mG988kNZwqpsMcAnhpJOmwQ5uafZKZJCH8Y3GK2r22u6q/LsndL6lUFW+bs7MxxMWhWa30HDkC4/p2uFUcUL3OhAhAkiSJK1LepmqS0sP7E55zfdv/ox09hJgv2OLfXqUOy9SbIDzJV2EKIQZwFW/vr9mPzaHqB10zSccrW+nqtaPT6QhdqBKjzvx8SJwBphCwtkLDKf8ELoSfSZIkLluXrYu6brXVPyOif+nMtf0/1ev2f9dSWxroDcMeoxBisMkxk4kJiqHb3k1hXSEAqVHBpEQGYXdqFJa1ABCyUNUhde3PVz3NUuepHyCtAESAkiRJXDbXzrZISySRlkj3+2X1l7H9X5bahPAbvU7vnk36pGpwXdL+vrokV5LUXViIs7dXDrsVAU+SJHHZStvULrWMcM8CbNf2/7QEL9v/m8+rV0mShPCrxamD65IWDDjHDcCclXVBXVJfkiTHk4gAJUmSuGyu7cNpEWnu9zS7nUqHGYDMjMTBg5pK1GvM+GGPTwgxtLxktcX/ZNNJGrobgP7O24VlLdgcTnQ6HSEL5gPQuX9/f/F2w2noavJ90EL4mSRJ4rKVtakkaeBMUm95OTUhqntvVpaXwmx3kpQ9+DMhhM/EBscyNWYqAHuq1BElE+LDiAox0W1zcLxKLZuHuuqS8vMhNBZiJ6gfUCFNJUXgkSRJXDbXclt6RP/SWXdJCbXBKklKiw0dPEiSJCFGDFf3bdeSm16vY36GZ7+k/rqkQ311SdIvSQQuSZLEZXMttw3c2VZxqhS7wYhBc5IcGew5oKsJulUPFmKyfBWmEGIIA/slOTXV+2hhlvpLzn5XXVJ2NobYWLSeHnqOHh3QL0nqkkTgkSRJXJZOW6e7jmHgTFJJWT0AqSYHBr3Oc1DzOfUalgRmL7NMQgifmh0/m2BjME09TZxpPgPA/L7i7QPnm3A6NVWX1NcvqWv//v6ZpIoC1RxWiAAiSZK4LK56pGhLNBHm/l1s5+rbAciMNA8e1NSXJMlSmxAjgslgYn6iKsx21SXNSIkkyKSnuctGcX0HACEL+ppK7t8P8VPAEgG2Tqg74Z/AhfATSZLEZSltV/VIHjvbnE7Od6op++yU6MGDpB5JiBEnL0XtcttTrZIks1HPnDTPJbfQAXVJmt0O41RiJXVJItBIkiQui+s4koE722yVlVQGqV+uEy66s03qkYQYKVx1SQW1BVgdVmDwYbfm8eMxxMSg9fTQfezYgCW3fN8HLIQfSZIkLou3nW3WM2eoCo0DICshfPAgmUkSYsTJjswmITgBq8PqPqLEfdjtebXRQtUluY4oGdAvSWaSRICRJElcFm872zpPn6U6NBaArDjZ/i/EaKDT6ViUsgjobwUwJz0Kg15HZUs3VS3dAO6mkl35B/qW23Sqg357rT/CFsIvJEkSl8XbTFLp2XIcegMWnUZSRJDngJ426FQ732S5TYiRxVWXtLdqLwChFiPTU9SGDNcRJSHz+3a4FRaiGUIgYZoaLK0ARACRJElcUkdvB0096hdneviA7f9Vamo+LVSPfqjt/yFxEBSJEGLkWJSsZpKKmorc/227znFzHXZrmTgBQ2QkWlcXPUVFctitCEiSJIlLcu1siwmKIdysa
o80m43SdtUzJdvbwbay1CbEiBUXHMek6EkA7KtWdUYXHnar0+sJnj9gyU2SJBGAJEkSl+Ta2TZwFqm3rIzKYPVLNXtc7OBBkiQJMaK5Drx19UtakKl2qp6u7aC5sxeAEHeSlN+/w62qEOxWH0crhH9IkiQuaaidbZV9O9uy48MGD5IkSYgRbWC/JE3TiA2zMD5ebcAoKFVL6e4kqaAALTIDQmLBYYXqI/4JWggfkyRJXJK3nW3W02eoDIsHINPrzjbpti3ESDY3cS4mvYmazhrOt50HYGGW55Jb0NQp6ENDcba3Yz17Fsb1LblJ8bYIEJIkiUvyNpPUfraYupAoQLb/CzEaBRuDmZswF+hfcpuf0Ve87apLMhoJnquu6dqfP6AuSfolicAgSZK4JNe5bQNrks6V1aPp9IQaIS7sgnPbejuhvVp9L9v/hRixXP2SXEeUuGaSjla00t3rAAYsuR040F+XVL4fNM3H0Qrhe5IkiYtq622j2arqE1zLbU6rldI2VdiZFRuCTnfh9v/z6jUoCkJifBSpEOJKuY4oya/Jx+a0MS46mKSIIOxOjcLyvrqkvsNuuw4cQEuZDXqj+ktQa7m/whbCZyRJEhfl2tkWGxRLqEktq/WWlFAZ0nccSaKXHkiy1CbEqDAlZgrRlmg6bZ0crT+KTqcbcI6bSpKCZ0xHZ7HgaGqit7wGkmaqwdIKQAQASZLERbkKOj2Kts+coTKsL0mSnW1CjFp6nZ7cZLWE5l5y62sFcKC0ry7JbCZ49mzA1S9pwJKbEGOcJEnios61ql1qWZH9tUUeB9vGhQweJEmSEKOGuxWAq19S30zSwdJm7A4n4LnkJsXbIpBIkiQuymuSNGD7f1aczCQJMZq5mkoeazhGe287kxLCiQgy0tnr4ER1G+DZVFIbpxImao6qTRpCjGGSJImLKmlVCc/AJKmlpJTGYFWLlBXrZft/oyRJQowWyWHJZEZk4tAc7K/Zj16vY/4F57gFz8oBkwl7bS22NiA8BTSH6r4txBgmSZIYksPpcPdIciVJjo5O98626GAjkSEmz0G9XdBWob6PHe+zWIUQn57rwNv+I0o8m0rqg4MJnqkKtj3PcZMlNzG2SZIkhlTVUYXNacNisJASmgJA79kB9UjeirYbz6rX4Gjou04IMbK56pL2Vu8FYGFWX/H2+Wa0vn5Inv2S5LBbERgkSRJDOtem6pEyIjIw6A0A9Jw8ScXFjiNpPKNe4yb5JEYhxGe3IGkBBp2B0rZSqjqqmJkahcWop7Gzl+J6VXcUskCaSorAI0mSGJK3ou2eE0X9Rdve6pEaXEnSxGGPTwhxdYSbw5kZp5bT9lTtwWzUMzstCoADfUtuwXPmgl6Prbwcmy4BDBboboLGYn+FLcSwkyRJDMlVtJ0d2V+A3XPyJOXhCQBMSPCy3NZwWr3KTJIQo4pryW131W6g/4gS1zluhrBQgqZNA6Dr4BFIVWe6SV2SGMtGRJL005/+lMzMTIKCgsjNzWX//ouvc2/ZsoUpU6YQFBTEzJkzefvttz0+1zSNxx9/nOTkZIKDg7nuuus4c+aMxzWZmZnodDqPr6eeeuqq/9lGswtnkjS7nZ5TpygPu1iSJMttQoxGriNK9tXsw+F0uHe4uYq34YJ+Sa5WAJIkiTHM70nSq6++yvr163niiSc4ePAgs2bNYtWqVdTV1Xm9fvfu3axZs4Z7772XwsJC7rjjDu644w6OHTvmvuaZZ57hhRde4KWXXmLfvn2EhoayatUqenp6PH7W97//faqrq91fDz744LD+WUebC5Ok3nPnqNcF0W0KwqDXkXHhcpvT2V+4HSvLbUKMJjPiZhBmCqPV2srJppPMTY9Cr4Pypm5qWtXvTnddUn5+f11SRb6/QhZi2Pk9SfrRj37Efffdx7p165g2bRovvfQSISEh/OY3v/F6/fPPP8+NN97Iv/3bvzF16lT+67/+i7lz5/KTn/wEULNIzz33HI8++ii33347O
Tk5/P73v6eqqoqtW7d6/Kzw8HCSkpLcX6GhXmpsAlRTTxMt1hag/0iSnpMnKQtPBCAzNgSz8YLHp60SbF2gN0F0BkKI0cOoN7IgSc0O7aneQ3iQiWkpEUD/klvI3Lmg09FbUoI9bIIaWFcE3S3+CFmIYefXJKm3t5eCggKuu+4693t6vZ7rrruOPXv2eB2zZ88ej+sBVq1a5b7+3Llz1NTUeFwTGRlJbm7uoJ/51FNPERsby5w5c/jhD3+I3W4fMlar1UpbW5vH11hW3KKKMVPDUgk2BgOqaPuy6pFissFgGvy5EGJEG3REiWvJra+ppCEqCssktZTedaIMojMBDSoP+DxWIXzBr0lSQ0MDDoeDxMREj/cTExOpqanxOqampuai17teL/UzH3roITZv3swHH3zA1772NX7wgx/w7W9/e8hYN27cSGRkpPsrLS3t8v+go9DZFrVsNjGqf9msp6iI8r6ZJK9JkmupTXa2CTEquY4oKawrpNvezUJvdUnzvSy5Sb8kMUb5fbnNX9avX8+KFSvIycnh/vvv59lnn+XFF1/EarV6vX7Dhg20tra6v8rLy30csW+5ZpLGR6mu2ZqmYS0qouyydrZJkiTEaJQRkUFyaDI2p42C2gJ38fap2nZau23Ahf2SpKmkGNv8miTFxcVhMBiora31eL+2tpakpCSvY5KSki56vev1Sn4mQG5uLna7nfPnz3v93GKxEBER4fE1lrlmklxJkr26Gkdrq7smaWJC+OBB9afUq+xsE2JU0ul0Hktu8eEWsuJC0TQoKO2rS+qbSbKeOoUjeroaWHEAnA6/xCzEcPJrkmQ2m5k3bx47duxwv+d0OtmxYwd5eXlex+Tl5XlcD7B9+3b39VlZWSQlJXlc09bWxr59+4b8mQCHDh1Cr9eTkJDwWf5IY4Kmae4kaUKUKs7sKSqi1RxCm0XNIGXHeylyrz+pXuOn+CROIcTV51py21PtqktSR5TsP9cMgDEuDnNWFmgaXaWdYA6D3naoO+GfgIUYRn5fblu/fj2/+tWv+N3vfkdRURH/8i//QmdnJ+vWrQNg7dq1bNiwwX39ww8/zLZt23j22Wc5efIk3/3udzlw4AAPPPAAoP4m9I1vfIMnn3ySv/zlLxw9epS1a9eSkpLCHXfcAaji7+eee47Dhw9TUlLCyy+/zCOPPMKXv/xloqOjfX4PRprGnkZara3odXr39n9VtK1mkVKjggkxGz0HdTZCZ736Pn6yL8MVQlxFucm56NBxpvkM9V31gw67hQF1SQWFME59T9len8cqxHAzXvqS4XXXXXdRX1/P448/Tk1NDbNnz2bbtm3uwuuysjL0+v5cbvHixWzatIlHH32U//zP/2TixIls3bqVGTNmuK/59re/TWdnJ1/96ldpaWlh6dKlbNu2jaCgIEAtnW3evJnvfve7WK1WsrKyeOSRR1i/fr1v//AjlGsWKS08jSCjumcDt/97rUeqL1KvURlgllYKQoxW0UHRTImZQlFTEXur97Iw61oAjlS00GNzEGQyELJwAS1btqi6pK8vgZKdULobFt7n3+CFuMr8niQBPPDAA+6ZoAvt3Llz0Ht33nknd95555A/T6fT8f3vf5/vf//7Xj+fO3cue/fK33qG4i7ajhzvfq+n6ATn49ROlilJXuqR6vqSpISpwx6fEGJ45aXkUdRUxJ6qPdyafSsJ4Rbq2q0cLm8hNzvWPZPUc/w4joT7MIBKkjQNdDq/xi7E1eT35TYx8gwq2m5uxl5VzfmIZAAmJXor2pZ6JCHGCtcRJXur1V8mF2R5LrmZkpMxpaaCw0F3nV41kO2ogaYS/wQsxDCRJEkM4ppJchVtW0+dQgNKo1IAmOx1JkmSJCHGijkJcwgyBFHfXc/ZlrMsyOgr3j7f7L7GfY7bwSOQOk+9Wea9CbAQo5UkScKDpmmcbfacSeo5UUSzJZw2YzB63VA1SX1JUoIkSUKMdmaDmXmJKvHZU7XHPZN0sLQZh
1MDLuiXlKFmnijd7ftghRhGkiQJD1WdVbTb2jHqjWRHZgNq+/+5vqW2zLhQgkwGz0GdDdDVAOggTna2CTEWuPslVe9hSlIE4RYjHVY7J6rUkUzuuqQjR3CmqFklSj/xS6xCDBdJkoSH002qa/b4yPGY+s5f6zlxgtII1Yhzsrd6JFfRdnQGmEN8EqcQYngtSl4EQEFtAQ7NxsK+2aTdxQ0AmNLTMSYkoNlsdDcFg04PzeehtdJfIQtx1UmSJDycalZdsydFq67Zjo5OektK3EXbXuuR3EXbsrNNiLFiUvQkYoNi6bZ3c7j+MIsnxAHwSXEjoHYRu/slFR6HpBw1UOqSxBgiSZLwcLpZzSRNjlHLZj3Hj4OmURqrDvT1OpNUc1S9Jk7zSYxCiOGn0+lYlKJmk3ZX7WbJhFgA8s810Wt3AhCySLUF6dy7V+qSxJgkSZLw4EqSJkarQ2p7jh3FgY7S0HhgiJmk2uPqNXHG4M+EEKOW+4iSqj1MTgwnLsxMt81BYZna5Rbad9RT9+HDOPsKvSVJEmOJJEnCrcvWRVlbGQCTo9VMUveRo9SExmLVGbAY9WTEXtBN2+noP7MpaaYvwxVCDDNXXdKJxhO0WlvJG++55GZOS1P9kux2uhotalB9kTqmSIgxQJIk4Xa25SwaGrFBscQGq6n1nqNHKYlU/ZGmJIVj0F/QTbf5PNi6wBgMMdk+jlgIMZwSQxMZHzkeDY19NftYMl79Xth9tsF9jXvJ7eDJ/j5pUpckxghJkoTbhfVI9sZGbFVVFEelAjAtJWLwIFc9UsJU0BsGfy6EGNXcrQCq9rCkr3j7UHkLnVY7AKGL1Oede/dIXZIYcyRJEm6nmjx3tnUfVQnQ+STVeXtaspckqfaYek2cPvwBCiF8zpUk7a3ey7joYNJigrE7NfafU0eUhPbNJFmLTuKIm6UGlX7sl1iFuNokSRJuJ5vUVn5XktRzVCVAxeGqR9K0lMjBg1xF21KPJMSYND9xPka9kcqOSsrby1niqkvqW3IzxsdjmTgBNI2uuiA1qPoIdDX5K2QhrhpJkgQADqfD3SNpeqyaFeo+fJhmSxgNOgs6napJGqRGZpKEGMtCTCHMjp8NqCW3C/slAYT0Lbl1uOuSNDj/ka9DFeKqkyRJAHCu9Rzd9m6CjcFkRGSgOZ10Hz5MSYQq2s6KDSXUYvQc1NMKrWo3nCRJQoxdA48oWdxXvF1U3UZjhxXoX3Lr2rMXsleoQSW7fB6nEFebJEkCgBNNahv/1JipGPQGeouLcba3cy4+U71/saLtiHEQHO2jSIUQvubql7Sveh9RIQb3rPKeEjWbFLJgAej19JaWYo/q67xdstMfoQpxVUmSJAAoalTnr02NVUeLdB06BMD5cWqn23RvSVKVuoaU2cMcnRDCn6bFTiPCHEGHrYMj9UdY7K5LUkmSISKCoBmqmWxHpU6d49ZUDK0VfotZiKtBkiQBqGZxoH4ZAnT3JUnuom1vO9uq1TUkzx7m6IQQ/mTQG1iauhSAnRU7WTpRLbl9dKYeTdMACF2kGk925R+GlLlqoCy5iVFOkiSBw+mgqEnNJE2L6UuSCg/RabRQ6lBddGeketnZJjNJQgSMz6V9DoCd5TtZlB2L2aCnormb4vpOoL8uqXPPXjRXXdI5SZLE6CZJkqC0vZRuezdBhiAyIzNxtLTQW1JCcdQ4NCA1Kpi4MIvnIGs7NJ5V38tMkhBj3pLUJRh1Rs61nqO+p5Lc7BgAdp6qAyB47lx0Fgv2ujps5vFqUMku6JtpEmI0kiRJuJfaJsdMxqg30n3kCADFWar3Uc44L7NINUcBDSJSISzeV6EKIfwk3BzOvCR1iO3O8p2smJygvj9VD4A+KIiQhQsBaD/VBsYg6KiBhtN+iVeIq0GSJMGxBtXryNUfqavgIABnU1XRds64qMGDXEttMoskRMBYMW4FALsqdrFisvrL0f5zTe4jSsKuuQaAjo/2QrqqUZJdbmI0kyRJc
KRezRzlxKutu10HDgBwyqKKM73OJLmLtmcNe3xCiJFhedpyAA7WHiQ23E5aTDC9Die7+xpLhi1TSVLXwYM4U1XbACneFqOZJEkBrtfR6z6OJCcuB2dPDz1HjtBqDqHSqh4PKdoWQgCkhacxIWoCDs3BJ1Wf8Dn3kpuqSzJnZmJKTwebje62KDXo/MfgsPspYiE+G0mSAlxRUxE2p41oSzTjwsfRfeQIms1GSYbqeZIdF0pksMlzUE9rf51ByhwfRyyE8Kfl49Rs0q7y/iW3naf6WwG4ltzaCishKBKsrf0zz0KMMpIkBbij9aprdk58Djqdzr3UVjJZFWh6XWqrPAhoEJUBYQm+ClUIMQKsSFsBwMeVHzM/MxKzUU9lSzfF9R1A/5Jbx0efoGWrtgGcec8foQrxmUmSFOBc9Ugz49ROtm5XPVJ0OjBE0XaFuoZxC4Y9PiHEyDIzbiYxQTG029opaj7MomxVu/jBSbXLLWThQnRmM/bqauyRs9Wg09v8FK0Qn40kSQHuSEN/0bZms9FVeAgNONKr+iLNy/ByJltFvnqVJEmIgGPQG7gmVc0W7SzfyYpJfUtup1Vdkj44uL8VQKke0EH1YWiv8Ue4QnwmkiQFsMbuRio7KtGhY0bcDLqPHUPr7qYqOZtWq5Mgk55pF57ZpmmSJAkR4D6XrpbR/lb2N5ZNUue45Z9r7m8F0Lfk1r77EKT2HVEiS25iFJIkKYAdqj8EQHZkNuHmcLr27gXg7OxlgFpqMxkueESaSqC7CQwWSJrpy3CFECPEkpQlhBhDqOmsoYNiMmJD6HU4+fC0WnIL7Sve7ioowJnRV5d0+l1/hSvEpyZJUgA7WKuaRs5NVH/T69yjkqSTSZOAoZba+uqRkmeB0Tz8QQohRpwgY5C7Z9L20u3cMC0RgHePqyU1c2YmprQ01QqgS31G8Qdgt/olXiE+LUmSAtjAJMnZ3U13YSEAR7UwAOale0uS9qtXWWoTIqCtylwFwHul73HDdJUI7Siqw2p3oNPp+lsBFJRDWBLYOqH0E7/FK8SnIUlSgOqydVHUVATAvIR5dB08qAq3UzMpbukFYK63maQyNdtEmiRJQgSygUtuxpByEsIttFvt/d23P6eW2dp37kSbeL0adFrqksToIklSgDpUfwiH5iAlNIXksGR3PVLJAvWLLTsulJjQC5bTupqg9rj6PmOJL8MVQowwQcYgd8+k7aXvsWp6EgDvHlNLbqG5C9GHheGob6DXpJbwOb1Nbf4QYpSQJClADapH2r0HgOMpU4Eh6pHK9wEaxE6UJpJCCG7IvAHoq0uarn4nvHeiFrvDic5sJmyZ2gTSdqID9CZoPgeNZ/0WrxBXSpKkAHWwrj9Jsjc303PihHrfEQ5A3vjYwYNc9QQZi30SoxBiZFuautS95BYcVklUiImmzl7yzzcDEH7dSgDadnwMmX2zz7LLTYwikiQFoF5Hr/s4knkJ8+j8+GPQNOzTZnKsrhPA3UXXQ+lu9SpLbUIIwGKwuHsmvV+xneumeu5yC122DEwmes+dwx4zXw2S7ttiFJEkKQAdqjtEj6OHuOA4siKz6Nj1IQBn56/EqUFGbAgpUcGeg6wdUHVIfZ8pSZIQQrkhQy25vXf+PVb1LbltO1aD06lhCAsjdNEiANrOG9WA0k+gs8EvsQpxpSRJCkB7q1WR9qLkReB0qpkk4Ehff6RFWV5mkSr2g+aAqHSIHOezWIUQI9uS1CWEmkKp7aolIqqaULOBmrYeDle0ABCxSiVRLTvyVX81zQkn/+rHiIW4fJIkBaCBSVLP0aM4WlrQh4dT0Kn+pue1Hqlkl3rNWOqrMIUQo4DFYGFluqo9eq/0LT43pX82CSBs5UowGLAWFWFPVoXcnHjDL7EKcaUkSQowrdZWjjeqbfy5ybl0fPgRANqSZRyragOGqEcqfl+9jv+cT+IUQowenx//eQDeOf8O109TB97+9Ug1TqeGMTqa0Lw8ANrKgtSAcx9Cd
7NfYhXiSkiSFGDya/Jxak6yIrNICk2iY+dOAI5PX4pTg+z4UJIigzwHdTZAzRH1ffYKn8YrhBj5FiQuIDEkkfbedgxhRYSaDVS2dFNQphKhiBtVd+6WHQchYRo47XDybX+GLMRlkSQpwAxcarNVVqqt/3o9+0NTAVg+KX7woJKd6jVxpvRHEkIMYtAbuDX7VgDeLXuTG2ckA7C1sBK4cMlNnfnGsf/zS6xCXAlJkgKIpml8Uql6HS1KXkT7jh0ABM+dy0elaqnNa5JU/IF6Hb/CF2EKIUYh15LbxxUfc+00df7jW0er6bU71ZLbUrUrtu28RQ0o2Qkddf4IVYjLJklSADnXeo6KjgpMepNKkrb/DYCGZauoau3BYtQPrkfStP56pGypRxJCeDc+ajw5cTnYNTs1zg+JD7fQ0mVj1+l6ACI/fxsATW/tRkuZq3bLHt/qx4iFuDRJkgLIzoqdACxMWoi5vYeuggIADiRPByA3O5Ygk8FzUF0RtFeBwSKdtoUQF7V60moAtha/xm05asntzwXlAISvvBZ9SAi2igps0aqQm6Nb/BKnEJdLkqQAsqtcbeNfnracjg8+AKcTy7SpfFLXq973ttR2+h31mr0cTMGDPxdCiD43Zt5IiDGE823nmZallvB3FNXR0GFFHxxM+PXXA9B83AboVP+1pnN+jFiIi5MkKUC0Wls5VH8IgOXjltP2jjoaQL/yBvaWNAKwYrKXJOlUX5I0+SZfhCmEGMVCTCHclKV+V+Q3v8GstCjsTo3XD6oC7ojbVN1S6zsfoWX1FXAffsUvsQpxOSRJChAfV36MU3MyMXoiCT1mOvfsAeDg5DxsDo3x8aGMjw/zHNReCxUH1PeTJEkSQlza30/6e0AdU3JLTjQArx4oR9M0QhctwpiUhKO1lW7dDDXg0CZwOv0VrhAXJUlSgNhRpnayLR+3nLZt74LDQdDMmeyodQCwanrS4EFn3gU0SJkLEck+jFYIMVrNiJvBjNgZ2Jw2rMG7CTLpOVvXQUFpMzqDgagv/B0A9R+UQ1AktJbDuV1+jloI7yRJCgBdti4+qlCdtW/IuIG2t94CIOjmW9h5Sm3B9ZokuZfabvZJnEKIseFLU78EwBvnNnNrXwH3H/aWAhD5hdWg09G1Jx9H39IchX/0S5xCXIokSQHgw4oP6XH0kB6eTnZXON2FhaDTcWRqHp29DpIjg8gZF+k5qKcNzqrZJ6lHEkJciVWZq4gJiqG2q5bJmbUAvH20mvp2K+ZxqYQuVjtlW8v7fu8U/QU6G/0VrhBDkiQpAGw7r4q0V2Wuou0NdbBkyKJctpd3q/enJ6HT6TwHnXwLHFaImwyJ030arxBidDMbzKyeqNoBfFj/R+akR2FzaLyaXwZA1J2qbqnh9U/QkmaBoxcKf++3eIUYiiRJY1ynrbN/qS39OlpeU0cBWG5f7T6l++aZXuqNXEcGzFBT40IIcSXWTFmDSW/iUP0hlk1T7/1xbxk2h5PwlSsxJiTgaGyi27xAfZj/G3A6/BewEF5IkjTGvV/2Pr3OXjIjMkk90YC9qhp9ZCT702bSYbUzLjqY+RnRnoM6G6Gk7yiSGV/wfdBCiFEvPiSe28arLttnbK8SF2ahpq2Hvx6pQmcyEf2lNQDUvV2CFhwNrWVw5j1/hizEIJIkjXGvn30dgFuyb6H1z2p2KPK223jjqCrYvmN2Knr9BTNFRW+oU7qTciBuok/jFUKMHfdMvwcdOj6q+oDb50UA8ItdJWiaRtQ//AM6s5nuYyexp96gBuz9uR+jFWIwSZLGsPK2cvJr8tGh49boa2h/X53B5rz1Dvd5SnfMSRk88NAm9Tpjta9CFUKMQVmRWVybfi0AjeathJgNnKxp58MzDRhjYoi45RYAGvLtoDOoVgBVh/wYsRCeJEkaw7YWbwVgccpiLFt3gN1O8OzZvNMejN2pMSM1ggkJ4Z6Dak9ART7ojTBrje+DFkKMKV/L+RoA71f8lRtz1GzSz
z44C0DsP60DoOW9vTgy+2aTdr/g+yCFGIIkSWOUw+ngjbNqJ9vtGbfQ/MpmAKLX/iMv71P9Su5akD544MG+HSaTboTwRJ/EKoQYu6bGTuW69OvQ0OgJewuzQc++c03sLm7AMnEiYStXgqbReEolUBx/Xc5zEyOGJElj1M6KndR21RJpiWTu4U4czc0YU5I5Nn4eJfWdhJoN/N2cVM9Bth44opIp5n3F5zELIcam+2fdD8BHtX9hVY46/ujH20+jaRpxX70PgMa/7ME5biloTvj4R36LVYiBJEkao/54QnWw/fsJq+n8/csAxNz9ZV4+UAHAHXNSCbMYPQcdfx26myEyDcZf69N4hRBj1+SYydyUqZrSNoe8itmoJ/98Mx+fbSB41ixCFi0Cu53G832d/wtfhqYSP0YshCJJ0hhU1FjEgdoDGHVGbq1LxXrmLPqwMHpu/DzvHVfdb7+8KMNzkKbBnp+o7+evA73Bx1ELIcayh+Y+hElv4nDTLlZMV79fNr59EodTI/7hhwBoeG0vzpTFoDlg1zP+DFcIQJKkMemPRWoW6fqM69G/1DeLtHYtvymsx+7UWJgVw9TkCM9BJR9A7TEwhcL8f/J1yEKIMW5c+Di+PPXLAFQaf014kJET1W3838EKQubMIezaa8HppP5kjBpw5FWoK/JjxEJIkjTmlLeX83bJ2wB8oW0S1tOn0YeFobtzDa/sV0cC/OuK8YMH7n5Rvc5dC8HRgz8XQojP6J9z/pmYoBjKuorIndoEwP979xSdVjvx33gYdDqa3j6IPfkaVZu0bYOa5RbCTyRJGmN+eeSX2DU7i5MWEf9z1UgyZu1a/nCsia5eB9NTIlg+Kd5zUEUBFL8POj0sut8PUQshAkGEOYJvzv8mAIesz5MSZaau3cqPt58maNIkIlerDv9VH9jRDGY1w336XX+GLAKcJEljSFlbGW8WvwnAl2sn0FtSgiEmBt1dd/O/n6gttf+yYvzgw2zf/756nbUGojN9GLEQItB8PvvzLEhagFXrJDlDnSv5m0/OcayylYT169FHRNB5qBRr5Ao14N0NYOv2X8AioEmSNIb89NBPcWgOliYsIuGnahYp/uGHeSm/hrYeO1OSwrlpxgWH2ZbsgpKdoDfB8n/3fdBCiICi0+l4bNFjWAwWTve+Tk6mHacGG147ihYZRfxDqoi7fPM5tNBEtctt19N+jloEKkmSxoiC2gLePvc2OnSsKQjG2dqKZdIk2q+9id/vUc0jN9w8FcPAc9qcDvjbd9X389dBdMbgHyyEEFdZVmQWD899GIDKoOcJC9JztLKVF98/S/QX7yJo5kzsTZ00lE9SAz55AaoP+zFiEagkSRoDHE4HG/dtBOC2yKXE/3E7AEmPPcoPtp2m1+Hkmolxg2uRCn4LVQfBEgHL/s3HUQshAtndU+9mYdJCenX1JKR/AMBP3j/Dwco2Ujb+AJ3JRMN7xfRGLlAtAV77GvR2+TlqEWgkSRoDXjn5CqeaTxFuCufvfnUKgOgvreHD4DS2Ha/BqNfxnVumeg7qqIcd31PfX/sohCX4OGohRCDT6/T899L/JtoSTb3hHbJTG3Bq8OCmQjqS0oh78EEASjfXoQXFQn0RvPuffo5aBBpJkka5kpYSnjv4HAD3lKYRUlyFKSUFy78+xONvHAPg/uXjmZI0oC+SpsFb66GnFZJyYME/+yFyIUSgSwpN4pnlz6DX6akLfZG4CAdVrT18fdNBIr/yFUIX52FvtVF9OBkNHRT8Lxz5k7/DFgFEkqRRzOa0seHjDVgdVhbos7nmd0fAYCD5maf5zrvF1LVbyY4L5YFrJ3gOLPwDFP0F9Ea47QXpri2E8JtFyYv4xtxvoDNY6Y57AYsJ9pY08fhfi0h+5hmMCQm0FjbR3jldDXjjAag44N+gRcCQJGkUe3r/05xoPEGEIZR7f1aCDkhY/wh/7o3j7aM1mAw6nv2HWQSZBiRBdUXwTt8utmsfg5Q5foldCCFcv
jL9K9w1+S70llrMSZvQ6eCV/eW8cLCR1OefQ2c2U/nXRnoYDw4rvLIGGov9HbYIAJIkjVJ/OvUnXj31Kjp0/OtfbMQ02wm/4QZOLr+NJ986AcCGm6YyJ31A9+yOetj0D2DrgqzlsPghP0UvhBD9dDod/7HwP1iZvhLCjhCS9BcAXnz/LL9uCiP56acBPaV/7sSmT4TOOvj97dBS7t/AxZgnSdIotL10Oz/Y9wMAvpQfxNwjXQTPmUPr+kf52h8OYnNo3JKTzLolmf2DrO2weQ20lEFMNtz5W9DLv34hxMhg1Bv54bIf8rm0z6GP2k1QwnsA/Gj7aX5GBvH/+R2cdj3n/g/suhhoLYff3iwzSmJYyf9LjjI7ynbw7V3fxqE5uPakidv+1o5l4gRan3iatX84RIfVTm5WDM/eOau/s3ZPG/zhC1CRD0FR8KU/QUiMX/8cQghxIZPBxLPLn+XGzBsxxb6PJeGvAPxiVwmP6aYQseE7OKwGzm01YXdGqL/0/c8NUJ7v58jFWCVJ0ijyctHLrN+5HrtmZ+lJHV/d2k3wtGlUPfkiX958gsbOXmakRvDLtfP765BaK+C3t0DFfpUgrd0KcRP9+ccQQoghmQwmnl72NOtmrMMc+zFByVvQ6Zy8c6yGdU1pdHznv7HbLJS8EYy1OwK6GuB/b4L9v5LDcMVVp9M0eao+jba2NiIjI2ltbSUiIuLSAz6DLlsXG/dvZOvZrQBce8jJfduchOct5p0vfpMf7jyPw6mxMDOGX39lPhFBJjXw3Ifw53vV+n1IHHz5/yBl9rDGKoQQV8vbJW/zvT3fo60tHmvlP+K0h2E26vnXSUFc97PvYOpoJGVxB+EpHWrAhOvh889DZKp/AxdjhiRJn5KvkqTdlbt5ct+TlLeXo9NgzU4Ht+/V6Lznfp6NWcDec80A/N2cVH7wdzMJNhtU/6P3/xv2/0L9kMQZsOYViEoftjiFEGI4nG89z6OfPEph9Vl6qu7E0TkFgOxoC1858zfmf7yV2MmdJMzqQKd3gjkMlj4CeV8HU7CfoxejnSRJn9JwJ0nHGo7x88M/58OKDwGIa9V44E0HKYznzdUPs6Xchs2hEWwy8J1bpnJ3bjo6W7fqgbTraehqVD9o/j/B9f8FlrCrHqMQQviCw+lg86nN/LTwZzQ1ZGOtvQXNEQ7AREsvtx94k5WNu8mY30BInA0ALTQB3aJ/gblrITTOn+GLUUySpE9pOJKkjt4O3i9/n/87tpmDLUcBMDg0ri+AaacnsnfJP/B+byQ2p/pXtnxSPE/ePp00eykcfgUK/wjdTeqHxU2CG5+CCSuvSmxCCOFvrdZWfn3017xyYittdQvobVwKmgUACzYWVR7hlp593JyZT1ioFQBNZ4CJ16HLuQsm3QTmEH/+EcQoMyKSpJ/+9Kf88Ic/pKamhlmzZvHiiy+ycOHCIa/fsmULjz32GOfPn2fixIk8/fTT3Hzzze7PNU3jiSee4Fe/+hUtLS0sWbKEn//850yc2F+w3NTUxIMPPsibb76JXq9n9erVPP/884SFXd6My9VIkhxOB8cbj/PRyW3sLvuQY7YynDr1r0PXayHr9BRCmqdyMn4GzZrRPW5+WjjrJzeyuHMHlOyC1rL+HxqdCUsehjn/CAbTp4pLCCFGslZrK1tOb+G1k+9ytiwRW3Memj3K/blRszKlt4RrDCdZHFzENH0pMbSD3ogzdjr6ydehy16qShFC48G1E1iIC/g9SXr11VdZu3YtL730Erm5uTz33HNs2bKFU6dOkZAw+NDV3bt3s2zZMjZu3Mitt97Kpk2bePrppzl48CAzZswA4Omnn2bjxo387ne/Iysri8cee4yjR49y4sQJgoKCALjpppuorq7mF7/4BTabjXXr1rFgwQI2bdp0WXFfjSTplYLf8INjP/Z4L7lRY/lxjYnG5Xw7+hb3+1EhJm6akczds6OZ8YccdSq2i8EME2+A2V+CSTfKMSNCiICgaRrHG
4/zxtm/8M7Jk9TWpmJvn4lmjxx0bQQdTNBVkaGrJV7XSlzfVxTdRJiMRIREEBkRS0RcMuaMqRhix6lWKdGZUtsUwPyeJOXm5rJgwQJ+8pOfAOB0OklLS+PBBx/kP/7jPwZdf9ddd9HZ2clf//pX93uLFi1i9uzZvPTSS2iaRkpKCt/85jf51re+BUBrayuJiYn89re/5Ytf/CJFRUVMmzaN/Px85s+fD8C2bdu4+eabqaioICUlZdA/12q1YrVa3f+7tbWV9PR0ysvLP3WSVNpayrrX7mJqOcyxJrEwfi6Z0/MImT8ffWQU9/3+ADNSI1k8Ppa5GdGYDH0dG35zEzhskLUMsq6BtIVgDv1UMQghxFigaRpVHVUcqD3IrpKTnKjsprYpBHt3KpotisvtePML07MsMZzof+Mft8K4+Z85vvDw8P7edWL00PzIarVqBoNBe/311z3eX7t2rXbbbbd5HZOWlqb9+Mc/9njv8ccf13JycjRN07Ti4mIN0AoLCz2uWbZsmfbQQw9pmqZp//M//6NFRUV5fG6z2TSDwaC99tprXv+5TzzxhAbIl3zJl3zJl3xd8Vdra+tl/j+jGEn6C138oKGhAYfDQWJiosf7iYmJnDx50uuYmpoar9fX1NS4P3e9d7FrLlzKMxqNxMTEuK+50IYNG1i/fr37fzudTpqamoiNjaW9vZ20tLTPNKs0FrW1tcl9GYLcG+/kvgxN7o13o+W+hIeH+zsE8Sn4NUkaTSwWCxaLxeO9qKgoAPcUakRExIj+j9Rf5L4MTe6Nd3Jfhib3xju5L2I4+PVYkri4OAwGA7W1tR7v19bWkpSU5HVMUlLSRa93vV7qmrq6Oo/P7XY7TU1NQ/5zhRBCCBFY/Jokmc1m5s2bx44dO9zvOZ1OduzYQV5entcxeXl5HtcDbN++3X19VlYWSUlJHte0tbWxb98+9zV5eXm0tLRQUFDgvub999/H6XSSm5t71f58QgghhBi9/L7ctn79eu655x7mz5/PwoULee655+js7GTdunUArF27ltTUVDZu3AjAww8/zPLly3n22We55ZZb2Lx5MwcOHOCXv/wloJa+vvGNb/Dkk08yceJEdwuAlJQU7rjjDgCmTp3KjTfeyH333cdLL72EzWbjgQce4Itf/KLXnW2XYrFYeOKJJwYtxwU6uS9Dk3vjndyXocm98U7uixhW/q4c1zRNe/HFF7X09HTNbDZrCxcu1Pbu3ev+bPny5do999zjcf2f/vQnbdKkSZrZbNamT5+uvfXWWx6fO51O7bHHHtMSExM1i8WirVy5Ujt16pTHNY2NjdqaNWu0sLAwLSIiQlu3bp3W3t4+bH9GIYQQQowufu+TJIQQQggxEvm1JkkIIYQQYqSSJEkIIYQQwgtJkoQQQgghvJAkSQghhBDCC0mSLtPPf/5zcnJy3F1d8/LyeOedd9yf9/T08PWvf53Y2FjCwsJYvXr1oIaWY9Wl7s2KFSvQ6XQeX/fff78fI/aPp556yt2iwiWQnxsXb/clUJ+Z7373u4P+3FOmTHF/HsjPy6XuTaA+M2J4SZJ0mcaNG8dTTz1FQUEBBw4c4Nprr+X222/n+PHjADzyyCO8+eabbNmyhV27dlFVVcUXvvAFP0ftG5e6NwD33Xcf1dXV7q9nnnnGjxH7Xn5+Pr/4xS/IycnxeD+QnxsY+r5A4D4z06dP9/hzf/zxx+7PAv15udi9gcB9ZsQw8ncPgtEsOjpa+/Wvf621tLRoJpNJ27Jli/uzoqIiDdD27Nnjxwj9x3VvNE31unr44Yf9G5Aftbe3axMnTtS2b9/ucS8C/bkZ6r5oWuA+M0888YQ2a9Ysr58F+vNysXujaYH7zIjhJTNJn4LD4WDz5s10dnaSl5dHQUEBNpuN6667zn3NlClTSE9PZ8+ePX6M1PcuvDcuL7/8MnFxccyYMYMNGzbQ1dXlxyh96+tf/zq33HKLx/MBBPxzM9R9cQnUZ+bMmTOkpKSQn
Z3N3XffTVlZGSDPCwx9b1wC9ZkRw8fvx5KMJkePHiUvL4+enh7CwsJ4/fXXmTZtGocOHcJsNhMVFeVxfWJiIjU1Nf4J1seGujcAX/rSl8jIyCAlJYUjR47w7//+75w6dYrXXnvNz1EPv82bN3Pw4EHy8/MHfVZTUxOwz83F7gsE7jOTm5vLb3/7WyZPnkx1dTXf+973uOaaazh27FhAPy9w8XsTHh4esM+MGF6SJF2ByZMnc+jQIVpbW/nzn//MPffcw65du/wd1ogw1L2ZNm0aX/3qV93XzZw5k+TkZFauXElxcTHjx4/3Y9TDq7y8nIcffpjt27cTFBTk73BGjMu5L4H6zNx0003u73NycsjNzSUjI4M//elPBAcH+zEy/7vYvbn33nsD9pkRw0uW266A2WxmwoQJzJs3j40bNzJr1iyef/55kpKS6O3tpaWlxeP62tpakpKS/BOsjw11b7zJzc0F4OzZs74M0ecKCgqoq6tj7ty5GI1GjEYju3bt4oUXXsBoNJKYmBiQz82l7ovD4Rg0JlCemQtFRUUxadIkzp49K79nLjDw3ngTqM+MuLokSfoMnE4nVquVefPmYTKZ2LFjh/uzU6dOUVZW5lGXE0hc98abQ4cOAZCcnOzDiHxv5cqVHD16lEOHDrm/5s+fz9133+3+PhCfm0vdF4PBMGhMoDwzF+ro6KC4uJjk5GT5PXOBgffGm0B9ZsTVJcttl2nDhg3cdNNNpKen097ezqZNm9i5cyfvvvsukZGR3Hvvvaxfv56YmBgiIiJ48MEHycvLY9GiRf4Ofdhd7N4UFxezadMmbr75ZmJjYzly5AiPPPIIy5Yt87rteywJDw9nxowZHu+FhoYSGxvrfj8Qn5tL3ZdAfma+9a1v8fnPf56MjAyqqqp44oknMBgMrFmzJuB/z1zs3gTyMyOGlyRJl6muro61a9dSXV1NZGQkOTk5vPvuu1x//fUA/PjHP0av17N69WqsViurVq3iZz/7mZ+j9o2L3Zvy8nL+9re/8dxzz9HZ2UlaWhqrV6/m0Ucf9XfYI0IgPzdDMZvNAfvMVFRUsGbNGhobG4mPj2fp0qXs3buX+Ph4ILCfl4vdm56enoB9ZsTw0mmapvk7CCGEEEKIkUZqkoQQQgghvJAkSQghhBDCC0mShBBCCCG8kCRJCCGEEMILSZKEEEIIIbyQJEkIIYQQwgtJkoQQQgghvJAkSQghhBDCC0mShBBCCCG8kCRJCCGEEMILSZKEEEIIIbz4/y/oV9H/WC+vAAAAAElFTkSuQmCC\n" + }, + "metadata": {} + } + ], + "source": [ + "# Plot KDE for the 'Open', 'High', 'Low', 'Close' columns of the 'stock' DataFrame.\n", + "sns.displot(data=stock[['Open','High','Low','Close']], kind='kde', palette=\"tab10\"); # Create a KDE plot with a color palette." + ] + }, + { + "cell_type": "markdown", + "id": "l5jX1Kp-lbD5", + "metadata": { + "id": "l5jX1Kp-lbD5" + }, + "source": [ + "**Observations:**\n", + "* The distributions of the prices are quite similar, with the high price showing a slight variation than the others." 
 + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Histogram on Volume**" + ], + "metadata": { + "id": "1wBKKuVIaWnl" + }, + "id": "1wBKKuVIaWnl" + }, + { + "cell_type": "code", + "source": [ + "sns.histplot(stock, x='Volume');" + ], + "metadata": { + "id": "FMDJ_m6maaoK", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fa63eb8f-f159-41cc-c969-fce63404ff4b" + }, + "id": "FMDJ_m6maaoK", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "* On most trading days in the period, between 80 and 175 million shares of the stock were traded; on a few days the volume exceeded 200 million." + ], + "metadata": { + "id": "gNzyLLIgfnz6" + }, + "id": "gNzyLLIgfnz6" + }, + { + "cell_type": "markdown", + "id": "9GVt_AAbe29X", + "metadata": { + "id": "9GVt_AAbe29X" + }, + "source": [ + "#### **Histogram and statistical summary on News Length**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0kwZSJvwOUpa", + "metadata": { + "id": "0kwZSJvwOUpa", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "9c9e9158-a8e5-4f51-9321-d54fb2ed6675" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 418.000000\n", + "mean 525.662679\n", + "std 303.584080\n", + "min 44.000000\n", + "25% 304.250000\n", + "50% 480.000000\n", + "75% 700.500000\n", + "max 2142.000000\n", + "Name: news_len, dtype: float64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ], + "source": [ + "# Approximate word count per news item: split on single spaces and count the pieces.\n", + "stock['news_len'] = stock['News'].apply(lambda x: len(x.split(' ')))\n", + "stock['news_len'].describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "NWn03B4Xey5d", + "metadata": { + "id": "NWn03B4Xey5d", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "abfb1de2-8239-4a5d-d9e3-71b30ad328ef" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.histplot(data=stock,x='news_len');" + ] + }, + { + "cell_type": "markdown", + "id": "VWLWG2X8mrCw", + "metadata": { + "id": "VWLWG2X8mrCw" + }, + "source": [ + "**Observations:**\n", + "* Most news items contain between 50 and 1,000 words, with a mean of about 526 words.\n", + "  * The shortest item has 44 words and the longest has 2,142.\n", + "\n", + "* These lengths suggest that the entries are news summaries rather than full articles." + ] + }, + { + "cell_type": "markdown", + "id": "hLE0s7OFKilB", + "metadata": { + "id": "hLE0s7OFKilB" + }, + "source": [ + "### **Bivariate Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "Yn_9wfzxL-r1", + "metadata": { + "id": "Yn_9wfzxL-r1" + }, + "source": [ + "#### **Correlation**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gOBaxNZeKllB", + "metadata": { + "id": "gOBaxNZeKllB", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c46aaa76-2682-4469-c36e-44097287cc6e" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "cols = ['Open','High','Low','Close','Volume','news_len']\n", + "sns.heatmap(\n", + "    stock[cols].corr(), annot=True, vmin=-1, vmax=1, fmt=\".2f\", cmap=\"Spectral\"\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "15UHbBu8Cucj", + "metadata": { + "id": "15UHbBu8Cucj" + }, + "source": [ + "**Observations:**\n", + "* The four price columns (Open, High, Low, Close) are almost perfectly correlated with one another.\n", + "  * This is expected, since the four prices for a given day differ only slightly.\n", + "\n", + "* There is a very weak negative correlation between volume and prices.\n", + "  * This might be due to selling pressure during periods of negative sentiment." + ] + }, + { + "cell_type": "markdown", + "id": "h-Hz7CpdMAi3", + "metadata": { + "id": "h-Hz7CpdMAi3" + }, + "source": [ + "#### **Label vs Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "code", + "source": [ + "plt.figure(figsize=(10, 8))\n", + "\n", + "for i, variable in enumerate(['Open', 'High', 'Low', 'Close']):\n", + "    plt.subplot(2, 2, i + 1)\n", + "    sns.boxplot(data=stock, x=\"Label\", y=variable)\n", + "\n", + "plt.tight_layout(pad=2)  # apply once, after all four panels are drawn\n", + "plt.show()" + ], + "metadata": { + "id": "lCVHNWhgMElU", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b838000a-b88f-43ac-90ee-499cf11fa5ee" + }, + "id": "lCVHNWhgMElU", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "* The median of every price column is noticeably lower for negative-sentiment news than for positive or neutral news, which is consistent with negative news triggering investor sell-offs that push the stock price down.\n", + "\n", + "* For the open price, the upper whisker under neutral sentiment is notably higher than under positive sentiment, so opening prices span a wider range when the news is neutral.\n", + "  * This variability might reflect differing interpretations of seemingly neutral news, with some investors reacting more aggressively and driving the opening price higher." + ], + "metadata": { + "id": "axyzmidFWaNS" + }, + "id": "axyzmidFWaNS" + }, + { + "cell_type": "markdown", + "id": "cY9P2rdBMH-h", + "metadata": { + "id": "cY9P2rdBMH-h" + }, + "source": [ + "#### **Label vs Volume**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "mzCxLFg1LCPk", + "metadata": { + "id": "mzCxLFg1LCPk", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "13536009-bc63-4dff-b422-df723211166c" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.boxplot(\n", + "    data=stock, x=\"Label\", y=\"Volume\"\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "LFipGtxhOa8g", + "metadata": { + "id": "LFipGtxhOa8g" + }, + "source": [ + "**Observations:**\n", + "* The median trading volume is approximately the same across all three sentiment labels.\n", + "* The volume distribution for positive-sentiment news shows a wider spread than for the other labels.\n", + "  - This wider range might indicate that even positive news is interpreted differently by different investors, leading to more varied trading activity." + ] + }, + { + "cell_type": "markdown", + "id": "9ySUmJUyQ0vi", + "metadata": { + "id": "9ySUmJUyQ0vi" + }, + "source": [ + "#### **Date vs Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "markdown", + "id": "tq0NL64DQ0v1", + "metadata": { + "id": "tq0NL64DQ0v1" + }, + "source": [ + "- Each row is one news item, and a single day can have several news items, whereas the prices and volume are recorded once per day.\n", + "- So, we can aggregate the data to one row per day by taking the mean of each attribute; values that repeat within a day are left unchanged by the mean." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bECqvVQtwheA", + "metadata": { + "id": "bECqvVQtwheA", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "89dd8d7b-2323-418b-9e11-544405d0396b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Open High Low Close Volume\n", + "Date \n", + "2019-01-02 38.72 39.71 38.56 39.48 130672400.0\n", + "2019-01-03 35.99 36.43 35.50 35.55 103544800.0\n", + "2019-01-04 36.13 37.14 35.95 37.06 111448000.0\n", + "2019-01-07 37.17 37.21 36.47 36.98 109012000.0\n", + "2019-01-08 37.39 37.96 37.13 37.69 216071600.0" + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "stock_daily", + "summary": "{\n \"name\": \"stock_daily\",\n \"rows\": 73,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": \"2019-01-02 00:00:00\",\n \"max\": \"2019-04-29 00:00:00\",\n \"num_unique_values\": 73,\n \"samples\": [\n \"2019-01-08 00:00:00\",\n \"2019-04-15 00:00:00\",\n \"2019-01-30 00:00:00\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.522032992144083,\n \"min\": 35.99,\n \"max\": 51.84,\n \"num_unique_values\": 69,\n \"samples\": [\n 43.22,\n 38.71999999999999,\n 48.830000000000005\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.515838152119043,\n \"min\": 36.43,\n \"max\": 52.12,\n \"num_unique_values\": 68,\n \"samples\": [\n 49.080000000000005,\n 39.08,\n 37.96\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.522815819552585,\n \"min\": 35.5,\n \"max\": 51.76,\n \"num_unique_values\": 66,\n \"samples\": [\n 49.54,\n 50.97,\n 38.56\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.5189565039202,\n \"min\": 35.55,\n \"max\": 51.86999999999999,\n \"num_unique_values\": 68,\n \"samples\": [\n 48.77,\n 39.08,\n 37.68999999999999\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 49858245.96607382,\n \"min\": 45448000.0,\n \"max\": 365248800.0,\n \"num_unique_values\": 73,\n \"samples\": [\n 216071600.0,\n 70146400.0,\n 244439200.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n 
}\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 19 + } + ], + "source": [ + "stock_daily = stock.groupby('Date').agg(\n", + "    {\n", + "        'Open': 'mean',\n", + "        'High': 'mean',\n", + "        'Low': 'mean',\n", + "        'Close': 'mean',\n", + "        'Volume': 'mean',\n", + "    }\n", + ").reset_index() # One row per date: mean of each column over that day's news rows\n", + "\n", + "stock_daily.set_index('Date', inplace=True) # Use the date as the index for plotting\n", + "stock_daily.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ORSmC3lxrwy", + "metadata": { + "id": "7ORSmC3lxrwy", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "119a6be1-50f4-47be-d5d4-f3cdd077f128" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(15,5))\n", + "sns.lineplot(data=stock_daily.drop('Volume', axis=1));" + ] + }, + { + "cell_type": "markdown", + "id": "5EZ5L0-UQ0v2", + "metadata": { + "id": "5EZ5L0-UQ0v2" + }, + "source": [ + "**Observations:**\n", + "* The stock price rose gradually from ~\\$40 to ~\\$50 between January and April 2019, the period covered by the data." + ] + }, + { + "cell_type": "markdown", + "id": "KG4y9NK1Ng1-", + "metadata": { + "id": "KG4y9NK1Ng1-" + }, + "source": [ + "#### **Volume vs Close Price**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0WMHYw6w0TM6", + "metadata": { + "id": "0WMHYw6w0TM6", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8517bfee-4d34-40e8-a1cd-567350afe1bb" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "# Create a figure and axis\n", + "fig, ax1 = plt.subplots(figsize=(15,5))\n", + "\n", + "# Close price on the primary (left) y-axis\n", + "sns.lineplot(data=stock_daily.reset_index(), x='Date', y='Close', ax=ax1, color='blue', marker='o', label='Close Price')\n", + "\n", + "# Create a secondary y-axis that shares the same x-axis (dates)\n", + "ax2 = ax1.twinx()\n", + "\n", + "# Volume on the secondary (right) y-axis\n", + "sns.lineplot(data=stock_daily.reset_index(), x='Date', y='Volume', ax=ax2, color='gray', marker='o', label='Volume')\n", + "\n", + "ax1.legend(bbox_to_anchor=(1,1));" + ] + }, + { + "cell_type": "markdown", + "id": "fHU5KgCGNOX5", + "metadata": { + "id": "fHU5KgCGNOX5" + }, + "source": [ + "**Observations:**\n", + "- No consistent relationship between volume and closing price is visible.\n", + "  - In some periods the price fell as volume rose.\n", + "  - In other periods the price rose as volume rose." + ] + }, + { + "cell_type": "markdown", + "id": "N8z4-vOBmwqv", + "metadata": { + "id": "N8z4-vOBmwqv" + }, + "source": [ + "## **Data Preprocessing**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2jIN9NycxtUC", + "metadata": { + "id": "2jIN9NycxtUC", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fee181c0-f896-4c4c-cf2c-378747dfb425" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 418\n", + "mean 2019-02-14 12:24:06.889952256\n", + "min 2019-01-02 00:00:00\n", + "25% 2019-01-11 00:00:00\n", + "50% 2019-01-31 00:00:00\n", + "75% 2019-03-21 00:00:00\n", + "max 2019-04-29 00:00:00\n", + "Name: Date, dtype: object" + ], + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Date
count418
mean2019-02-14 12:24:06.889952256
min2019-01-02 00:00:00
25%2019-01-11 00:00:00
50%2019-01-31 00:00:00
75%2019-03-21 00:00:00
max2019-04-29 00:00:00
\n", + "

" + ] + }, + "metadata": {}, + "execution_count": 22 + } + ], + "source": [ + "stock['Date'].describe()" + ] + }, + { + "cell_type": "markdown", + "id": "0FxlsnepSb5m", + "metadata": { + "id": "0FxlsnepSb5m" + }, + "source": [ + "**Observations:**\n", + "* 75% of the records are dated on or before March 21, 2019 (the third week of March).\n", + "* We'll use the data through the end of March 2019 for training and hold out the April 2019 data as the test set." + ] + }, + { + "cell_type": "markdown", + "id": "j7KR_HgZRDtk", + "metadata": { + "id": "j7KR_HgZRDtk" + }, + "source": [ + "### Train-test Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yXsgpkpeI8UK", + "metadata": { + "id": "yXsgpkpeI8UK" + }, + "outputs": [], + "source": [ + "X_train = stock[stock['Date'] < '2019-04-01'].reset_index()\n", + "X_test = stock[(stock['Date'] >= '2019-04-01')].reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "__2ON8RuI8Q2", + "metadata": { + "id": "__2ON8RuI8Q2" + }, + "outputs": [], + "source": [ + "y_train = X_train['Label'].copy()\n", + "y_test = X_test['Label'].copy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "imMx6hH0__IB", + "metadata": { + "id": "imMx6hH0__IB", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3dbdaa6f-f304-47f0-94a8-32e8a5a7309d" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Train data shape (347, 10)\n", + "Test data shape (71, 10)\n", + "Train label shape (347,)\n", + "Test label shape (71,)\n" + ] + } + ], + "source": [ + "print(\"Train data shape\",X_train.shape)\n", + "print(\"Test data shape \",X_test.shape)\n", + "\n", + "print(\"Train label shape\",y_train.shape)\n", + "print(\"Test label shape \",y_test.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "uJZqic2Q6YZD", + "metadata": { + "id": "uJZqic2Q6YZD" + }, + "outputs": [], + "source": [ + "# y_train.value_counts(normalize=True)" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Xf9R3BaR6bZw", + "metadata": { + "id": "Xf9R3BaR6bZw" + }, + "outputs": [], + "source": [ + "# y_test.value_counts(normalize=True)" + ] + }, + { + "cell_type": "markdown", + "id": "0rYgR14ORf7b", + "metadata": { + "id": "0rYgR14ORf7b" + }, + "source": [ + "## **Word Embeddings**" + ] + }, + { + "cell_type": "markdown", + "id": "4IUBFAOTbjju", + "metadata": { + "id": "4IUBFAOTbjju" + }, + "source": [ + "### **Generating Text Embeddings using Word2Vec**" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Defining the model**" + ], + "metadata": { + "id": "bzwPsqJvVbNC" + }, + "id": "bzwPsqJvVbNC" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ZD188ZNsboS4", + "metadata": { + "id": "ZD188ZNsboS4" + }, + "outputs": [], + "source": [ + "# Tokenizing each news item into a list of words (one token list per document)\n", + "words_list = [item.split(\" \") for item in stock['News'].values]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eGVgM5iTbwHy", + "metadata": { + "id": "eGVgM5iTbwHy" + }, + "outputs": [], + "source": [ + "# Training a Word2Vec model on the tokenized news items\n", + "vec_size = 300\n", + "model_W2V = Word2Vec(words_list, vector_size = vec_size, min_count = 1, window=5, workers = 6)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "lhy6DjNxbzOd", + "metadata": { + "id": "lhy6DjNxbzOd", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6bf055ce-4c91-4cb3-bd4c-673e2c5de694" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Length of the vocabulary is 14577\n" + ] + } + ], + "source": [ + "# Checking the size of the vocabulary\n", + "print(\"Length of the vocabulary is\", len(model_W2V.wv.key_to_index))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Encoding the datasets**" + ], + "metadata": { + "id": "ZYCiT-7GVNaH" + }, + "id": "ZYCiT-7GVNaH" + }, + { + "cell_type": "code", +
"execution_count": null, + "id": "F_4ldXPzcF7y", + "metadata": { + "id": "F_4ldXPzcF7y" + }, + "outputs": [], + "source": [ + "# Retrieving the words present in the Word2Vec model's vocabulary\n", + "words = list(model_W2V.wv.key_to_index.keys())\n", + "\n", + "# Retrieving word vectors for all the words present in the model's vocabulary\n", + "wvs = model_W2V.wv[words].tolist()\n", + "\n", + "# Creating a dictionary of words and their corresponding vectors\n", + "word_vector_dict = dict(zip(words, wvs))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Averaging the word vectors to get sentence encodings**" + ], + "metadata": { + "id": "GgismcJz0dZE" + }, + "id": "GgismcJz0dZE" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "vsQ0vF42cH_r", + "metadata": { + "id": "vsQ0vF42cH_r" + }, + "outputs": [], + "source": [ + "def average_vectorizer_Word2Vec(doc):\n", + "    # Initializing a feature vector for the sentence\n", + "    feature_vector = np.zeros((vec_size,), dtype=\"float64\")\n", + "\n", + "    # Keeping only the words that are present in the model vocabulary\n", + "    # (membership is checked against the dictionary, which is a constant-time lookup)\n", + "    words_in_vocab = [word for word in doc.split() if word in word_vector_dict]\n", + "\n", + "    # Adding the vector representations of the words\n", + "    for word in words_in_vocab:\n", + "        feature_vector += np.array(word_vector_dict[word])\n", + "\n", + "    # Dividing by the number of words to get the average vector\n", + "    if len(words_in_vocab) != 0:\n", + "        feature_vector /= len(words_in_vocab)\n", + "\n", + "    return feature_vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Jtxc1yVHcJjV", + "metadata": { + "id": "Jtxc1yVHcJjV", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a0ab0ac1-7d5a-4cb0-d761-af1e7f93a323" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Time taken 8.816098928451538\n" + ] + } + ], + "source": [ + "# creating a dataframe of the vectorized documents\n", +
"start = time.time()\n", + "\n", + "X_train_wv = pd.DataFrame(X_train['News'].apply(average_vectorizer_Word2Vec).tolist(), columns=['Feature '+str(i) for i in range(vec_size)])\n", + "X_test_wv = pd.DataFrame(X_test['News'].apply(average_vectorizer_Word2Vec).tolist(), columns=['Feature '+str(i) for i in range(vec_size)])\n", + "\n", + "end = time.time()\n", + "print('Time taken ', (end-start))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8IrY8tZjA4VZ", + "metadata": { + "id": "8IrY8tZjA4VZ", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ded041ed-e998-442e-edc6-4bf5abe65675" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(347, 300) (71, 300)\n" + ] + } + ], + "source": [ + "print(X_train_wv.shape, X_test_wv.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "a3GUvne0hyPx", + "metadata": { + "id": "a3GUvne0hyPx" + }, + "source": [ + "### **Generating Text Embeddings using Sentence Transformer**" + ] + }, + { + "cell_type": "markdown", + "id": "51ITQezWi9VE", + "metadata": { + "id": "51ITQezWi9VE" + }, + "source": [ + "#### **Defining the model**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3EQ7eQIpYSyz", + "metadata": { + "id": "3EQ7eQIpYSyz" + }, + "outputs": [], + "source": [ + "#Defining the model\n", + "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')" + ] + }, + { + "cell_type": "markdown", + "id": "Lll4MLfzKfBa", + "metadata": { + "id": "Lll4MLfzKfBa" + }, + "source": [ + "#### **Encoding the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "q1BaGKThKcX3", + "metadata": { + "id": "q1BaGKThKcX3", + "colab": { + "base_uri": "https://localhost:8080/", + "referenced_widgets": [ + "1230a037e0b9479caa9db62c5f9ecb6a", + "ed6c19298c4747a59992a79d99cdaaa7", + "e010222da3cf4751995a51ffc82560ef", + "9f3e3b616bcf482d9fd91a2b54d8d82a", + "6838e428d6d54a3f80d34638812441e6", + 
"991c2589b56f444486443a31bef569d5", + "4ed01d32996f47f38fbaba687cee45ae", + "a0ce999dbcfe427ba08202bc989b1c33", + "f598184dc72f443ab0ada8de6cf076ad", + "96e9e320eec74a2e9094935af065b254", + "fb854fb10f3e415c9c4c0ac176fb74b4", + "2fb4071397a049f888159e2cbec3ec99", + "280899c6e305423a8d6f20dd395b4e10", + "f68b5d3640c54560b38a29f32deb33a8", + "115335a31d874aba99efb63fa2830e09", + "7b371d0574e04f98bf87a88f722b8477", + "9095b2b09d4a45928fbc3cf45eb35cbb", + "971a53d397494d76b8b5c4a2abb954f7", + "54dd267783314434a5389477c97974e5", + "5bfd23c3586e4615909878610be8e24b", + "4eda58c3e66e40db98ea40fc40ebb109", + "99ee6edbe0574c778200ae65b87d7e0f" + ] + }, + "outputId": "03ca294e-285e-4fe1-f930-d22aaa9f87dc" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Batches: 0%| | 0/11 [00:00#sk-container-id-1 {\n", + " /* Definition of color scheme common for light and dark mode */\n", + " --sklearn-color-text: #000;\n", + " --sklearn-color-text-muted: #666;\n", + " --sklearn-color-line: gray;\n", + " /* Definition of color scheme for unfitted estimators */\n", + " --sklearn-color-unfitted-level-0: #fff5e6;\n", + " --sklearn-color-unfitted-level-1: #f6e4d2;\n", + " --sklearn-color-unfitted-level-2: #ffe0b3;\n", + " --sklearn-color-unfitted-level-3: chocolate;\n", + " /* Definition of color scheme for fitted estimators */\n", + " --sklearn-color-fitted-level-0: #f0f8ff;\n", + " --sklearn-color-fitted-level-1: #d4ebff;\n", + " --sklearn-color-fitted-level-2: #b3dbfd;\n", + " --sklearn-color-fitted-level-3: cornflowerblue;\n", + "\n", + " /* Specific color for light theme */\n", + " --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n", + " --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n", + " --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, 
black)));\n", + " --sklearn-color-icon: #696969;\n", + "\n", + " @media (prefers-color-scheme: dark) {\n", + " /* Redefinition of color scheme for dark theme */\n", + " --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n", + " --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n", + " --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n", + " --sklearn-color-icon: #878787;\n", + " }\n", + "}\n", + "\n", + "#sk-container-id-1 {\n", + " color: var(--sklearn-color-text);\n", + "}\n", + "\n", + "#sk-container-id-1 pre {\n", + " padding: 0;\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-hidden--visually {\n", + " border: 0;\n", + " clip: rect(1px 1px 1px 1px);\n", + " clip: rect(1px, 1px, 1px, 1px);\n", + " height: 1px;\n", + " margin: -1px;\n", + " overflow: hidden;\n", + " padding: 0;\n", + " position: absolute;\n", + " width: 1px;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-dashed-wrapped {\n", + " border: 1px dashed var(--sklearn-color-line);\n", + " margin: 0 0.4em 0.5em 0.4em;\n", + " box-sizing: border-box;\n", + " padding-bottom: 0.4em;\n", + " background-color: var(--sklearn-color-background);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-container {\n", + " /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n", + " but bootstrap.min.css set `[hidden] { display: none !important; }`\n", + " so we also need the `!important` here to be able to override the\n", + " default hidden behavior on the sphinx rendered scikit-learn.org.\n", + " See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n", + " display: inline-block !important;\n", + " position: relative;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-text-repr-fallback {\n", + " display: none;\n", + "}\n", + "\n", + "div.sk-parallel-item,\n", + 
"div.sk-serial,\n", + "div.sk-item {\n", + " /* draw centered vertical line to link estimators */\n", + " background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n", + " background-size: 2px 100%;\n", + " background-repeat: no-repeat;\n", + " background-position: center center;\n", + "}\n", + "\n", + "/* Parallel-specific style estimator block */\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item::after {\n", + " content: \"\";\n", + " width: 100%;\n", + " border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n", + " flex-grow: 1;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel {\n", + " display: flex;\n", + " align-items: stretch;\n", + " justify-content: center;\n", + " background-color: var(--sklearn-color-background);\n", + " position: relative;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item {\n", + " display: flex;\n", + " flex-direction: column;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:first-child::after {\n", + " align-self: flex-end;\n", + " width: 50%;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:last-child::after {\n", + " align-self: flex-start;\n", + " width: 50%;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:only-child::after {\n", + " width: 0;\n", + "}\n", + "\n", + "/* Serial-specific style estimator block */\n", + "\n", + "#sk-container-id-1 div.sk-serial {\n", + " display: flex;\n", + " flex-direction: column;\n", + " align-items: center;\n", + " background-color: var(--sklearn-color-background);\n", + " padding-right: 1em;\n", + " padding-left: 1em;\n", + "}\n", + "\n", + "\n", + "/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n", + "clickable and can be expanded/collapsed.\n", + "- Pipeline and ColumnTransformer use this feature and define the default style\n", + "- Estimators will overwrite some part of the style using 
the `sk-estimator` class\n", + "*/\n", + "\n", + "/* Pipeline and ColumnTransformer style (default) */\n", + "\n", + "#sk-container-id-1 div.sk-toggleable {\n", + " /* Default theme specific background. It is overwritten whether we have a\n", + " specific estimator or a Pipeline/ColumnTransformer */\n", + " background-color: var(--sklearn-color-background);\n", + "}\n", + "\n", + "/* Toggleable label */\n", + "#sk-container-id-1 label.sk-toggleable__label {\n", + " cursor: pointer;\n", + " display: flex;\n", + " width: 100%;\n", + " margin-bottom: 0;\n", + " padding: 0.5em;\n", + " box-sizing: border-box;\n", + " text-align: center;\n", + " align-items: start;\n", + " justify-content: space-between;\n", + " gap: 0.5em;\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label .caption {\n", + " font-size: 0.6rem;\n", + " font-weight: lighter;\n", + " color: var(--sklearn-color-text-muted);\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n", + " /* Arrow on the left of the label */\n", + " content: \"▸\";\n", + " float: left;\n", + " margin-right: 0.25em;\n", + " color: var(--sklearn-color-icon);\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n", + " color: var(--sklearn-color-text);\n", + "}\n", + "\n", + "/* Toggleable content - dropdown */\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content {\n", + " max-height: 0;\n", + " max-width: 0;\n", + " overflow: hidden;\n", + " text-align: left;\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content.fitted {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content pre {\n", + " margin: 0.2em;\n", + " border-radius: 0.25em;\n", + " color: var(--sklearn-color-text);\n", + " /* unfitted */\n", + " background-color: 
var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n", + " /* Expand drop-down */\n", + " max-height: 200px;\n", + " max-width: 100%;\n", + " overflow: auto;\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n", + " content: \"▾\";\n", + "}\n", + "\n", + "/* Pipeline/ColumnTransformer-specific style */\n", + "\n", + "#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " color: var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Estimator-specific style */\n", + "\n", + "/* Colorize estimator box */\n", + "#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n", + "#sk-container-id-1 div.sk-label label {\n", + " /* The background is the default theme color */\n", + " color: var(--sklearn-color-text-on-default-background);\n", + "}\n", + "\n", + "/* On hover, darken the color of the background */\n", + "#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n", + " color: 
var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "/* Label box, darken color on hover, fitted */\n", + "#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n", + " color: var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Estimator label */\n", + "\n", + "#sk-container-id-1 div.sk-label label {\n", + " font-family: monospace;\n", + " font-weight: bold;\n", + " display: inline-block;\n", + " line-height: 1.2em;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label-container {\n", + " text-align: center;\n", + "}\n", + "\n", + "/* Estimator-specific */\n", + "#sk-container-id-1 div.sk-estimator {\n", + " font-family: monospace;\n", + " border: 1px dotted var(--sklearn-color-border-box);\n", + " border-radius: 0.25em;\n", + " box-sizing: border-box;\n", + " margin-bottom: 0.5em;\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "/* on hover */\n", + "#sk-container-id-1 div.sk-estimator:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Specification for estimator info (e.g. 
\"i\" and \"?\") */\n", + "\n", + "/* Common style for \"i\" and \"?\" */\n", + "\n", + ".sk-estimator-doc-link,\n", + "a:link.sk-estimator-doc-link,\n", + "a:visited.sk-estimator-doc-link {\n", + " float: right;\n", + " font-size: smaller;\n", + " line-height: 1em;\n", + " font-family: monospace;\n", + " background-color: var(--sklearn-color-background);\n", + " border-radius: 1em;\n", + " height: 1em;\n", + " width: 1em;\n", + " text-decoration: none !important;\n", + " margin-left: 0.5em;\n", + " text-align: center;\n", + " /* unfitted */\n", + " border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-unfitted-level-1);\n", + "}\n", + "\n", + ".sk-estimator-doc-link.fitted,\n", + "a:link.sk-estimator-doc-link.fitted,\n", + "a:visited.sk-estimator-doc-link.fitted {\n", + " /* fitted */\n", + " border: var(--sklearn-color-fitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-fitted-level-1);\n", + "}\n", + "\n", + "/* On hover */\n", + "div.sk-estimator:hover .sk-estimator-doc-link:hover,\n", + ".sk-estimator-doc-link:hover,\n", + "div.sk-label-container:hover .sk-estimator-doc-link:hover,\n", + ".sk-estimator-doc-link:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n", + ".sk-estimator-doc-link.fitted:hover,\n", + "div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n", + ".sk-estimator-doc-link.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "/* Span, style for the box shown on hovering the info icon */\n", + ".sk-estimator-doc-link span {\n", + " display: none;\n", + " z-index: 9999;\n", + " position: relative;\n", + " font-weight: 
normal;\n", + " right: .2ex;\n", + " padding: .5ex;\n", + " margin: .5ex;\n", + " width: min-content;\n", + " min-width: 20ex;\n", + " max-width: 50ex;\n", + " color: var(--sklearn-color-text);\n", + " box-shadow: 2pt 2pt 4pt #999;\n", + " /* unfitted */\n", + " background: var(--sklearn-color-unfitted-level-0);\n", + " border: .5pt solid var(--sklearn-color-unfitted-level-3);\n", + "}\n", + "\n", + ".sk-estimator-doc-link.fitted span {\n", + " /* fitted */\n", + " background: var(--sklearn-color-fitted-level-0);\n", + " border: var(--sklearn-color-fitted-level-3);\n", + "}\n", + "\n", + ".sk-estimator-doc-link:hover span {\n", + " display: block;\n", + "}\n", + "\n", + "/* \"?\"-specific style due to the `` HTML tag */\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link {\n", + " float: right;\n", + " font-size: 1rem;\n", + " line-height: 1em;\n", + " font-family: monospace;\n", + " background-color: var(--sklearn-color-background);\n", + " border-radius: 1rem;\n", + " height: 1rem;\n", + " width: 1rem;\n", + " text-decoration: none;\n", + " /* unfitted */\n", + " color: var(--sklearn-color-unfitted-level-1);\n", + " border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n", + "}\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link.fitted {\n", + " /* fitted */\n", + " border: var(--sklearn-color-fitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-fitted-level-1);\n", + "}\n", + "\n", + "/* On hover */\n", + "#sk-container-id-1 a.estimator_doc_link:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-3);\n", + "}\n", + "" + ] + }, + "metadata": {}, + "execution_count": 40 + } + ], + "source": [ + "# Building the model\n", + "rf_word2vec = 
RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state = 42)\n", + "\n", + "\n", + "# Fitting on train data\n", + "rf_word2vec.fit(X_train_wv, y_train)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**\n" + ], + "metadata": { + "id": "95O3167WbBnd" + }, + "id": "95O3167WbBnd" + }, + { + "cell_type": "code", + "source": [ + "# Predicting on train data\n", + "y_pred_train = rf_word2vec.predict(X_train_wv)\n", + "\n", + "# Predicting on test data\n", + "y_pred_test = rf_word2vec.predict(X_test_wv)" + ], + "metadata": { + "id": "TtQlY8DlzadF" + }, + "id": "TtQlY8DlzadF", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "ycl7jAX7cZuj" + }, + "id": "ycl7jAX7cZuj" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a_AW25srClm-", + "metadata": { + "id": "a_AW25srClm-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "7c3d30ac-13eb-4053-ff9a-6718a9fbf3c7" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plot_confusion_matrix(y_train,y_pred_train)" + ] + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test,y_pred_test)" + ], + "metadata": { + "id": "sp4-2sLEDcM3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "6046c9f3-6602-421d-8738-7c000827c3a1" + }, + "id": "sp4-2sLEDcM3", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "E1jLbrZAidAB" + }, + "id": "E1jLbrZAidAB" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8rV_bYhqClm_", + "metadata": { + "id": "8rV_bYhqClm_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cb772720-3f55-4fe6-97c6-75f6fe7384e6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.755043 0.755043 0.778891 0.720565\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "rf_train_wv = model_performance_classification_sklearn(y_train,y_pred_train)\n", + "print(\"Training performance:\\n\", rf_train_wv)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_AA2cSvzClm_", + "metadata": { + "id": "_AA2cSvzClm_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d063296f-2448-4515-f5c1-575671bcbd48" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.746479 0.746479 0.687934 0.680114\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "rf_test_wv = model_performance_classification_sklearn(y_test, y_pred_test)\n", + "print(\"Testing performance:\\n\",rf_test_wv)" + ] + }, + { + "cell_type": "markdown", + "id": "P2OnPdLRF2M9", + "metadata": { + "id": "P2OnPdLRF2M9" + }, + "source": [ + "* The model is slightly overfitting: accuracy is nearly identical on the training and test sets (~0.76 vs ~0.75), but precision (~0.78 vs ~0.69) and F1 (~0.72 vs ~0.68) are lower on the test set." 
+ ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Random Forest Model using text embeddings obtained from the Sentence Transformer**" + ], + "metadata": { + "id": "uijWj2Nl2jyK" + }, + "id": "uijWj2Nl2jyK" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "04W4gkoZ2jyK", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 80 + }, + "outputId": "364a1ac8-b4b6-403a-85d4-133186c89aa9" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "RandomForestClassifier(max_depth=3, random_state=42)" + ], + "text/html": [ + "
RandomForestClassifier(max_depth=3, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" + ] + }, + "metadata": {}, + "execution_count": 46 + } + ], + "source": [ + "# Building the model\n", + "rf_st = RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state = 42)\n", + "\n", + "\n", + "# Fitting on train data\n", + "rf_st.fit(X_train_st, y_train)" + ], + "id": "04W4gkoZ2jyK" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "BTWSvJfC2jyL" + }, + "id": "BTWSvJfC2jyL" + }, + { + "cell_type": "code", + "source": [ + "# Predicting on train data\n", + "y_pred_train = rf_st.predict(X_train_st)\n", + "\n", + "# Predicting on test data\n", + "y_pred_test = rf_st.predict(X_test_st)" + ], + "metadata": { + "id": "QPI_ePlJ2jyL" + }, + "execution_count": null, + "outputs": [], + "id": "QPI_ePlJ2jyL" + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "vskhvTGm2jyL" + }, + "id": "vskhvTGm2jyL" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9P_tYSn92jyM", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "f404f8af-0142-4662-a7eb-8281eb953a13" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plot_confusion_matrix(y_train,y_pred_train)" + ], + "id": "9P_tYSn92jyM" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test,y_pred_test)" + ], + "metadata": { + "id": "LBzzMFHJDolN", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "d4e2c9a6-e164-45ff-ef5a-5c70c78c1736" + }, + "id": "LBzzMFHJDolN", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "sSvRSDit2jyM" + }, + "id": "sSvRSDit2jyM" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_kEV9XZD2jyM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "eb0311c5-dc0f-4aec-a189-a8358f68a7de" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.801153 0.801153 0.831835 0.775232\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "rf_train_st = model_performance_classification_sklearn(y_train,y_pred_train)\n", + "print(\"Training performance:\\n\", rf_train_st)" + ], + "id": "_kEV9XZD2jyM" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QoFxAES32jyM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "74f810ea-08e2-4c99-a9c2-9187bfbd2357" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.71831 0.71831 0.551745 0.624105\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "rf_test_st = model_performance_classification_sklearn(y_test, y_pred_test)\n", + "print(\"Testing performance:\\n\",rf_test_st)" + ], + "id": "QoFxAES32jyM" + }, + { + "cell_type": "markdown", + "id": "ZmPPcdrHE9K2", + "metadata": { + "id": "ZmPPcdrHE9K2" + }, + "source": [ + "* The model is overfitting noticeably: accuracy falls from ~0.80 on the training set to ~0.72 on the test set, precision from ~0.83 to ~0.55, and F1 from ~0.78 to ~0.62." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DHgj_cCm2pIn" + }, + "source": [ + "### **Building Neural Network Models using different text embeddings**" + ], + "id": "DHgj_cCm2pIn" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Neural Network Model using text embeddings obtained from the Word2Vec**" + ], + "metadata": { + "id": "LpasFYQriueC" + }, + "id": "LpasFYQriueC" + }, + { + "cell_type": "code", + "source": [ + "# Convert the labels\n", + "label_mapping = {1: 2, -1: 0, 0: 1}\n", + "y_train_mapped_wv = [label_mapping[label] for label in y_train]\n", + "y_test_mapped_wv = [label_mapping[label] for label in y_test]\n", + "\n", + "# Convert your features DataFrame to a NumPy array\n", + "X_train_wv_np = np.array(X_train_wv)\n", + "X_test_wv_np = np.array(X_test_wv)\n", + "y_train_mapped_wv = np.array(y_train_mapped_wv)\n", + "y_test_mapped_wv = np.array(y_test_mapped_wv)" + ], + "metadata": { + "id": "xIeKB-P4nYFi" + }, + "id": "xIeKB-P4nYFi", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import gc\n", + "\n", + "# Clear previous sessions\n", + "tf.keras.backend.clear_session()\n", + "gc.collect()\n", + "\n", + "# Model definition\n", + "model = Sequential()\n", + "model.add(Dense(128, activation='relu', input_shape=(X_train_wv_np.shape[1],))) # Use the shape of the Word2Vec embeddings\n", + "model.add(Dropout(0.3))\n", + "model.add(Dense(64, activation='relu'))\n", + "model.add(Dense(3, activation='softmax')) # 3 output classes\n", + "\n", + "# Compile\n", + "model.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics =['accuracy'])\n", + "\n", + "# Summary\n", + "model.summary()" + ], + "metadata": { + "id": "pPoM2BhyXvBv", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 257 + }, + "outputId": "46ff0a8e-280d-40a4-df68-ce369e6c8029" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + 
"text/plain": [ + "\u001b[1mModel: \"sequential\"\u001b[0m\n" + ], + "text/html": [ + "
Model: \"sequential\"\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", + "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m38,528\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m8,256\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m3\u001b[0m) │ \u001b[38;5;34m195\u001b[0m │\n", + "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" + ], + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
+              "┃ Layer (type)                     Output Shape                  Param # ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
+              "│ dense (Dense)                   │ (None, 128)            │        38,528 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dropout (Dropout)               │ (None, 128)            │             0 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_1 (Dense)                 │ (None, 64)             │         8,256 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_2 (Dense)                 │ (None, 3)              │           195 │\n",
+              "└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m46,979\u001b[0m (183.51 KB)\n" + ], + "text/html": [ + "
 Total params: 46,979 (183.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m46,979\u001b[0m (183.51 KB)\n" + ], + "text/html": [ + "
 Trainable params: 46,979 (183.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" + ], + "text/html": [ + "
 Non-trainable params: 0 (0.00 B)\n",
+              "
\n" + ] + }, + "metadata": {} + } + ], + "id": "pPoM2BhyXvBv" + }, + { + "cell_type": "markdown", + "source": [ + "**Note:**\n", + "- During training, we use accuracy as a metric to monitor how well the model is learning to distinguish between classes in each batch.\n", + "- Accuracy is fast and reliable during training and gives us a quick view of model progress.\n", + "- It reflects how often the model is predicting the correct label out of all predictions made.\n", + "\n" + ], + "metadata": { + "id": "kIxFfSYLQNlT" + }, + "id": "kIxFfSYLQNlT" + }, + { + "cell_type": "code", + "source": [ + "# Fitting the model\n", + "history = model.fit(\n", + " X_train_wv_np, y_train_mapped_wv,\n", + " epochs=10,\n", + " batch_size=32\n", + ")" + ], + "metadata": { + "id": "bgHeOMfpnobV", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6b648ba6-870f-4dfc-a2a7-b38a97c50cef" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 5ms/step - accuracy: 0.5349 - loss: 0.9062\n", + "Epoch 2/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 4ms/step - accuracy: 0.6190 - loss: 0.7523 \n", + "Epoch 3/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.5989 - loss: 0.7214 \n", + "Epoch 4/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.5874 - loss: 0.7687 \n", + "Epoch 5/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6741 - loss: 0.7038 \n", + "Epoch 6/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 
5ms/step - accuracy: 0.6298 - loss: 0.7276 \n", + "Epoch 7/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6655 - loss: 0.7134 \n", + "Epoch 8/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6025 - loss: 0.7213 \n", + "Epoch 9/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - accuracy: 0.6321 - loss: 0.7183 \n", + "Epoch 10/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6456 - loss: 0.7322 \n" + ] + } + ], + "id": "bgHeOMfpnobV" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "IX11-Hmx8_E1" + }, + "id": "IX11-Hmx8_E1" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on training data\n", + "y_train_pred_probs = model.predict(X_train_wv_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_train_preds_wv = tf.argmax(y_train_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "ZpEpHWni87cO", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "156eca77-fc90-4df0-87ac-daf08655f0b6" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 6ms/step \n" + ] + } + ], + "id": "ZpEpHWni87cO" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on test data\n", + "y_test_pred_probs = model.predict(X_test_wv_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_test_preds_wv = tf.argmax(y_test_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "hBMMkZBk9Jkz", + "colab": { + "base_uri": 
"https://localhost:8080/" + }, + "outputId": "d66d1a0d-3bb4-4fa3-d68e-ded8a15c2696" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m3/3\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step\n" + ] + } + ], + "id": "hBMMkZBk9Jkz" + }, + { + "cell_type": "code", + "source": [ + "# Convert back to [-1, 0, 1] to match utility function expectations\n", + "label_mapping = {2: 1, 0: -1, 1: 0}\n", + "y_train_preds_wv = np.array([label_mapping[index] for index in y_train_preds_wv])\n", + "y_test_preds_wv = np.array([label_mapping[index] for index in y_test_preds_wv])" + ], + "metadata": { + "id": "wCPqMh0nwryB" + }, + "id": "wCPqMh0nwryB", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "Jbeyf8dzk3MP" + }, + "id": "Jbeyf8dzk3MP" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_train, y_train_preds_wv)" + ], + "metadata": { + "id": "lIh2fXcwxJ0G", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "711755dc-7004-4b9d-cb30-b06839893f72" + }, + "id": "lIh2fXcwxJ0G", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test, y_test_preds_wv)" + ], + "metadata": { + "id": "djUVsYwYYBJd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "b02647ff-0825-4c59-8c4b-bdc26cc7a9e1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "id": "djUVsYwYYBJd" + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "1NqSOfNd1UmS" + }, + "id": "1NqSOfNd1UmS" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5qzE4NHS1UmS", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "68bd5ff5-8663-49d2-cf32-0907384a849a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.636888 0.636888 0.664626 0.516128\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "NN_train_wv = model_performance_classification_sklearn(y_train,y_train_preds_wv)\n", + "print(\"Training performance:\\n\", NN_train_wv)" + ], + "id": "5qzE4NHS1UmS" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4Nr34HI31UmT", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b07a23fe-a40e-4497-8df2-05e18239480e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.760563 0.760563 0.804628 0.669911\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "NN_test_wv = model_performance_classification_sklearn(y_test, y_test_preds_wv)\n", + "print(\"Testing performance:\\n\",NN_test_wv)" + ], + "id": "4Nr34HI31UmT" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Neural Network Model using text embeddings obtained from the Sentence Transformer**" + ], + "metadata": { + "id": "bcXtMsPu3JfI" + }, + "id": "bcXtMsPu3JfI" + }, + { + "cell_type": "code", + "source": [ + "# Convert the labels\n", + "label_mapping = {1: 2, -1: 0, 0: 1}\n", + "y_train_mapped_st = [label_mapping[label] for label in y_train]\n", + "y_test_mapped_st = [label_mapping[label] for 
label in y_test]\n", + "\n", + "# Convert your features DataFrame to a NumPy array\n", + "X_train_st_np = np.array(X_train_st)\n", + "X_test_st_np = np.array(X_test_st)\n", + "y_train_mapped_st = np.array(y_train_mapped_st)\n", + "y_test_mapped_st = np.array(y_test_mapped_st)" + ], + "metadata": { + "id": "FUfjCAua4A2-" + }, + "execution_count": null, + "outputs": [], + "id": "FUfjCAua4A2-" + }, + { + "cell_type": "code", + "source": [ + "import gc\n", + "\n", + "# Clear previous sessions\n", + "tf.keras.backend.clear_session()\n", + "gc.collect()\n", + "\n", + "# Define the model\n", + "model = Sequential()\n", + "model.add(Dense(128, activation='relu', input_shape=(X_train_st.shape[1],)))\n", + "model.add(Dropout(0.3))\n", + "model.add(Dense(64, activation='relu'))\n", + "model.add(Dense(3, activation='softmax')) # 3 classes (positive, negative, neutral)\n", + "\n", + "# Compile the model\n", + "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['Accuracy'])\n", + "\n", + "# Summary\n", + "model.summary()" + ], + "metadata": { + "id": "ziE6DVHA4A2-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 257 + }, + "outputId": "55b5e48b-d3dd-4986-ce74-027c5f902891" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1mModel: \"sequential\"\u001b[0m\n" + ], + "text/html": [ + "
Model: \"sequential\"\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", + "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m49,280\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m8,256\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m3\u001b[0m) │ \u001b[38;5;34m195\u001b[0m │\n", + "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" + ], + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
+              "┃ Layer (type)                     Output Shape                  Param # ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
+              "│ dense (Dense)                   │ (None, 128)            │        49,280 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dropout (Dropout)               │ (None, 128)            │             0 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_1 (Dense)                 │ (None, 64)             │         8,256 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_2 (Dense)                 │ (None, 3)              │           195 │\n",
+              "└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m57,731\u001b[0m (225.51 KB)\n" + ], + "text/html": [ + "
 Total params: 57,731 (225.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m57,731\u001b[0m (225.51 KB)\n" + ], + "text/html": [ + "
 Trainable params: 57,731 (225.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" + ], + "text/html": [ + "
 Non-trainable params: 0 (0.00 B)\n",
+              "
\n" + ] + }, + "metadata": {} + } + ], + "id": "ziE6DVHA4A2-" + }, + { + "cell_type": "code", + "source": [ + "# Fitting the model\n", + "history = model.fit(\n", + " X_train_st_np, y_train_mapped_st,\n", + " epochs=15,\n", + " batch_size=32\n", + ")" + ], + "metadata": { + "id": "8J-JncGj4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3faf37bb-8f28-4662-8b16-ca054230d4cd" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 5ms/step - Accuracy: 0.6300 - loss: 1.0422\n", + "Epoch 2/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6496 - loss: 0.8380 \n", + "Epoch 3/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6464 - loss: 0.7074 \n", + "Epoch 4/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6099 - loss: 0.7004 \n", + "Epoch 5/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6664 - loss: 0.6625 \n", + "Epoch 6/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6915 - loss: 0.6035 \n", + "Epoch 7/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.7356 - loss: 0.5585 \n", + "Epoch 8/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.7714 - loss: 0.5521 \n", + "Epoch 9/15\n", + "\u001b[1m11/11\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.7655 - loss: 0.5158 \n", + "Epoch 10/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - Accuracy: 0.7861 - loss: 0.4997 \n", + "Epoch 11/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - Accuracy: 0.8121 - loss: 0.4551\n", + "Epoch 12/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 8ms/step - Accuracy: 0.7897 - loss: 0.4756 \n", + "Epoch 13/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 8ms/step - Accuracy: 0.8454 - loss: 0.4200 \n", + "Epoch 14/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.8362 - loss: 0.4135 \n", + "Epoch 15/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.8560 - loss: 0.3473 \n" + ] + } + ], + "id": "8J-JncGj4A2_" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "rbsZ24gM4A2_" + }, + "id": "rbsZ24gM4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on training data\n", + "y_train_pred_probs = model.predict(X_train_st_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_train_preds_st = tf.argmax(y_train_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "xaWGws3r4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c3f459d7-9745-44f3-bfd6-0df051411294" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m11/11\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step\n" + ] + } + ], + "id": "xaWGws3r4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on test data\n", + "y_test_pred_probs = model.predict(X_test_st_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_test_preds_st = tf.argmax(y_test_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "P8yF-MWH4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0bfe417b-bb4a-446f-8ee0-7fde7e16f62a" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m3/3\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 14ms/step\n" + ] + } + ], + "id": "P8yF-MWH4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Convert back to [-1, 0, 1] to match utility function expectations\n", + "label_mapping = {2: 1, 0: -1, 1: 0}\n", + "y_train_preds_st = np.array([label_mapping[index] for index in y_train_preds_st])\n", + "y_test_preds_st = np.array([label_mapping[index] for index in y_test_preds_st])" + ], + "metadata": { + "id": "YbwmP-dE4A3A" + }, + "execution_count": null, + "outputs": [], + "id": "YbwmP-dE4A3A" + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "YoVyydZW4A3A" + }, + "id": "YoVyydZW4A3A" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_train, y_train_preds_st)" + ], + "metadata": { + "id": "I2yC2oAB4A3A", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "c678cd56-9ae7-4a9c-becf-e1dfa09973d1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "id": "I2yC2oAB4A3A" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test, y_test_preds_st)" + ], + "metadata": { + "collapsed": true, + "id": "mZLu8fRk4A3A", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "e96330e8-96aa-476e-b47d-e4d5bcc6a561" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "id": "mZLu8fRk4A3A" + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "eDqgpX_a4A3B" + }, + "id": "eDqgpX_a4A3B" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Jq_ES16g4A3B", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "95db0a8b-3659-4319-ecab-82c051217d1f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.878963 0.878963 0.864731 0.871679\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "NN_train_st = model_performance_classification_sklearn(y_train,y_train_preds_st)\n", + "print(\"Training performance:\\n\", NN_train_st)" + ], + "id": "Jq_ES16g4A3B" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7MUEidM44A3B", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b3ebc9fb-7278-4621-cb18-5f53190b284a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.788732 0.788732 0.780908 0.784708\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "NN_test_st = model_performance_classification_sklearn(y_test, y_test_preds_st)\n", + "print(\"Testing performance:\\n\",NN_test_st)" + ], + "id": "7MUEidM44A3B" + }, + { + "cell_type": "markdown", + "id": "gsmrYpkrFY2A", + "metadata": { + "id": "gsmrYpkrFY2A" + }, + "source": [ + "### **Model Performance Summary and Final Model Selection**" + ] + }, + { + "cell_type": "code", + "source": [ + "# Concatenate the training performance metrics from different models into a single DataFrame\n", + "models_train_comp_df = pd.concat(\n", + " [\n", + " rf_train_wv.T, # Random Forest using Word2Vec embeddings\n", + " NN_train_wv.T, # 
Neural Network using Word2Vec embeddings\n", + " rf_train_st.T, # Random Forest using Sentence Transformer embeddings\n", + " NN_train_st.T # Neural Network using Sentence Transformer embeddings\n", + " ],\n", + " axis=1 # Concatenate along columns (i.e., each model's metrics form one column)\n", + ")\n", + "\n", + "# Assigning meaningful column names for each model for clarity in the output DataFrame\n", + "models_train_comp_df.columns = [\n", + " \"Word2Vec (Random Forest)\",\n", + " \"Word2Vec (Neural Network)\",\n", + " \"Sentence Transformer (Random Forest)\",\n", + " \"Sentence Transformer (Neural Network)\"\n", + "]\n", + "\n", + "# Print the training performance comparison table\n", + "print(\"Training performance comparison:\")\n", + "models_train_comp_df" + ], + "metadata": { + "id": "FmgvAlKBWjR-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "68dd172d-e821-4c30-82e3-8df928e36a4c" + }, + "id": "FmgvAlKBWjR-", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance comparison:\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Word2Vec (Random Forest) Word2Vec (Neural Network) \\\n", + "Accuracy 0.755043 0.636888 \n", + "Recall 0.755043 0.636888 \n", + "Precision 0.778891 0.664626 \n", + "F1 0.720565 0.516128 \n", + "\n", + " Sentence Transformer (Random Forest) \\\n", + "Accuracy 0.801153 \n", + "Recall 0.801153 \n", + "Precision 0.831835 \n", + "F1 0.775232 \n", + "\n", + " Sentence Transformer (Neural Network) \n", + "Accuracy 0.878963 \n", + "Recall 0.878963 \n", + "Precision 0.864731 \n", + "F1 0.871679 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Word2Vec (Random Forest)Word2Vec (Neural Network)Sentence Transformer (Random Forest)Sentence Transformer (Neural Network)
Accuracy0.7550430.6368880.8011530.878963
Recall0.7550430.6368880.8011530.878963
Precision0.7788910.6646260.8318350.864731
F10.7205650.5161280.7752320.871679
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "models_train_comp_df", + "summary": "{\n \"name\": \"models_train_comp_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Word2Vec (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.02400850605954694,\n \"min\": 0.720564904885155,\n \"max\": 0.7788911179618219,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7550432276657061,\n 0.7788911179618219,\n 0.720564904885155\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Word2Vec (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.06630504860294313,\n \"min\": 0.516128150662598,\n \"max\": 0.6646264228575315,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.6368876080691642,\n 0.6646264228575315,\n 0.516128150662598\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.02314897229764395,\n \"min\": 0.7752322113738629,\n \"max\": 0.8318353116624009,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.8011527377521613,\n 0.8318353116624009,\n 0.7752322113738629\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.006828124944117901,\n \"min\": 0.8647306583906008,\n \"max\": 0.8789625360230547,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.8789625360230547,\n 0.8647306583906008,\n 0.8716785041639248\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 72 + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Concatenate the testing performance metrics from different models into a single DataFrame\n", + "models_test_comp_df = pd.concat(\n", + " [\n", + " rf_test_wv.T, # Random Forest using Word2Vec 
embeddings\n", + " NN_test_wv.T, # Neural Network using Word2Vec embeddings\n", + " rf_test_st.T, # Random Forest using Sentence Transformer embeddings\n", + " NN_test_st.T # Neural Network using Sentence Transformer embeddings\n", + " ],\n", + " axis=1 # Concatenate along columns so each model's test metrics appear as one column\n", + ")\n", + "\n", + "# Set descriptive column names for clarity in the resulting comparison table\n", + "models_test_comp_df.columns = [\n", + " \"Word2Vec (Random Forest)\",\n", + " \"Word2Vec (Neural Network)\",\n", + " \"Sentence Transformer (Random Forest)\",\n", + " \"Sentence Transformer (Neural Network)\"\n", + "]\n", + "\n", + "# Print the testing performance comparison table\n", + "print(\"Testing performance comparison:\")\n", + "models_test_comp_df" + ], + "metadata": { + "id": "APzbgeHrWjOj", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "2095fa14-43c0-42dc-e342-c8229f7750b9" + }, + "id": "APzbgeHrWjOj", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance comparison:\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Word2Vec (Random Forest) Word2Vec (Neural Network) \\\n", + "Accuracy 0.746479 0.760563 \n", + "Recall 0.746479 0.760563 \n", + "Precision 0.687934 0.804628 \n", + "F1 0.680114 0.669911 \n", + "\n", + " Sentence Transformer (Random Forest) \\\n", + "Accuracy 0.718310 \n", + "Recall 0.718310 \n", + "Precision 0.551745 \n", + "F1 0.624105 \n", + "\n", + " Sentence Transformer (Neural Network) \n", + "Accuracy 0.788732 \n", + "Recall 0.788732 \n", + "Precision 0.780908 \n", + "F1 0.784708 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Word2Vec (Random Forest)Word2Vec (Neural Network)Sentence Transformer (Random Forest)Sentence Transformer (Neural Network)
Accuracy0.7464790.7605630.7183100.788732
Recall0.7464790.7605630.7183100.788732
Precision0.6879340.8046280.5517450.780908
F10.6801140.6699110.6241050.784708
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "models_test_comp_df", + "summary": "{\n \"name\": \"models_test_comp_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Word2Vec (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.03619949158384784,\n \"min\": 0.6801140174379611,\n \"max\": 0.7464788732394366,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7464788732394366,\n 0.687933571578726,\n 0.6801140174379611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Word2Vec (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.05661832323426673,\n \"min\": 0.6699110653078362,\n \"max\": 0.8046277665995976,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7605633802816901,\n 0.8046277665995976,\n 0.6699110653078362\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.08086640854457602,\n \"min\": 0.5517452541334966,\n \"max\": 0.7183098591549296,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7183098591549296,\n 0.5517452541334966,\n 0.6241052874624798\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0037559350807984376,\n \"min\": 0.7809076682316118,\n \"max\": 0.7887323943661971,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7887323943661971,\n 0.7809076682316118,\n 0.7847082494969819\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 73 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Model Performance Summary:**" + ], + "metadata": { + "id": "X0yz_T4j6uJc" + }, + "id": "X0yz_T4j6uJc" + }, + { + "cell_type": "markdown", + "source": [ + " **Model Selection: 
Sentence Transformer + Neural Network**\n", + "\n", + "**Rationale:**\n", + "\n", + "1. **Best Generalization**:\n", + "   The Sentence Transformer + Neural Network model achieves the highest F1 score on the test set (0.788), indicating the strongest generalization and the best balance of precision and recall on unseen data.\n", + "\n", + "2. **Balanced Performance**:\n", + "   Training F1 = 0.876 and testing F1 = 0.788 show a moderate gap, suggesting the model learned meaningful representations without severe overfitting.\n", + "\n", + "3. **Superior Feature Encoding**:\n", + "   Sentence Transformers capture sentence-level semantic meaning more effectively than Word2Vec, which is consistent with the gain seen in the Neural Network setup. The Random Forest setup did not benefit (test F1 = 0.624 versus 0.685 with Word2Vec).\n", + "\n", + "4. **Neural Network Suitability**:\n", + "   While Word2Vec + NN struggles with less informative input vectors, combining richer embeddings (Sentence Transformer) with a flexible learner (NN) gives the best result.\n", + "\n", + "##### **Why Were the Other Models Not Chosen?**\n", + "\n", + "* **Word2Vec + RF**:\n", + "  Training F1 = 0.730, Test F1 = 0.685. Although the gap is small (which shows some stability), the absolute test performance is lower than that of Sentence Transformer + NN.\n", + "\n", + "* **Word2Vec + NN**:\n", + "  Low performance in both training (F1 = 0.516) and testing (F1 = 0.67) indicates underfitting and ineffective learning due to weak input representations.\n", + "\n", + "* **Sentence Transformer + RF**:\n", + "  Training F1 = 0.775 is reasonable, but test F1 = 0.624 is well below the NN version (0.788), suggesting overfitting and less flexibility in modeling complex patterns." 
+ ], + "metadata": { + "id": "wI7woP0xHrwW" + }, + "id": "wI7woP0xHrwW" + }, + { + "cell_type": "markdown", + "id": "HiOLoD7BO3L-", + "metadata": { + "id": "HiOLoD7BO3L-" + }, + "source": [ + "## **Conclusions and Recommendations**" + ] + }, + { + "cell_type": "markdown", + "source": [ + "* The daily opening, high, low, and closing prices of the stock each show similar distributions across the different sentiment polarities, although negative-sentiment news is associated with a lower value for each price.\n", + "\n", + "* Because the four prices vary little relative to one another, they are almost perfectly correlated with each other. They show a very weak negative correlation with volume, which might be due to selling pressure during periods of negative sentiment.\n", + "\n", + "* The stock price gradually increased from ~40 to ~50 over the period for which data is available, while exhibiting a monthly trend.\n", + "\n", + "* We predicted the sentiment of market news by encoding the news text with different embedding models (Word2Vec and Sentence Transformer) and training Random Forest and Neural Network classifiers on the embeddings.\n", + "\n", + "* The models largely overfit the data, with only **the Sentence Transformer + Neural Network model** yielding comparatively better performance than the others (train F1 = 0.876, test F1 = 0.788).\n", + "\n", + "    * The predominance of neutral news also suggests a cautious market sentiment in this period. As such, a wider period should be considered for data collection to ensure volume and diversity in news sentiment polarities.\n", + "\n", + "* Integrating real-time sentiment analysis systems can allow financial analysts to make informed decisions and respond quickly to changes in market sentiment to optimize investment strategies.\n", + "\n", + "* One can explore combining news sentiment with technical and fundamental indicators of the stock, and introducing data from other similar stocks, for a more comprehensive market analysis." 
+ ], + "metadata": { + "id": "NMR7mKugPFme" + }, + "id": "NMR7mKugPFme" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mQvaNDqQ3BJa" + }, + "source": [ + " Power Ahead \n", + "___" + ], + "id": "mQvaNDqQ3BJa" + } + ], + "metadata": { + "colab": { + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1230a037e0b9479caa9db62c5f9ecb6a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ed6c19298c4747a59992a79d99cdaaa7", + "IPY_MODEL_e010222da3cf4751995a51ffc82560ef", + "IPY_MODEL_9f3e3b616bcf482d9fd91a2b54d8d82a" + ], + "layout": "IPY_MODEL_6838e428d6d54a3f80d34638812441e6" + } + }, + "ed6c19298c4747a59992a79d99cdaaa7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_991c2589b56f444486443a31bef569d5", + "placeholder": "​", + "style": 
"IPY_MODEL_4ed01d32996f47f38fbaba687cee45ae", + "value": "Batches: 100%" + } + }, + "e010222da3cf4751995a51ffc82560ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a0ce999dbcfe427ba08202bc989b1c33", + "max": 11, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f598184dc72f443ab0ada8de6cf076ad", + "value": 11 + } + }, + "9f3e3b616bcf482d9fd91a2b54d8d82a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_96e9e320eec74a2e9094935af065b254", + "placeholder": "​", + "style": "IPY_MODEL_fb854fb10f3e415c9c4c0ac176fb74b4", + "value": " 11/11 [00:44<00:00,  3.41s/it]" + } + }, + "6838e428d6d54a3f80d34638812441e6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "991c2589b56f444486443a31bef569d5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"4ed01d32996f47f38fbaba687cee45ae": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a0ce999dbcfe427ba08202bc989b1c33": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f598184dc72f443ab0ada8de6cf076ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "96e9e320eec74a2e9094935af065b254": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fb854fb10f3e415c9c4c0ac176fb74b4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2fb4071397a049f888159e2cbec3ec99": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_280899c6e305423a8d6f20dd395b4e10", + "IPY_MODEL_f68b5d3640c54560b38a29f32deb33a8", + "IPY_MODEL_115335a31d874aba99efb63fa2830e09" + ], + "layout": "IPY_MODEL_7b371d0574e04f98bf87a88f722b8477" + } + }, + "280899c6e305423a8d6f20dd395b4e10": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9095b2b09d4a45928fbc3cf45eb35cbb", + "placeholder": "​", + "style": "IPY_MODEL_971a53d397494d76b8b5c4a2abb954f7", + "value": "Batches: 100%" + } + }, + "f68b5d3640c54560b38a29f32deb33a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_54dd267783314434a5389477c97974e5", + "max": 3, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5bfd23c3586e4615909878610be8e24b", + "value": 3 + 
} + }, + "115335a31d874aba99efb63fa2830e09": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4eda58c3e66e40db98ea40fc40ebb109", + "placeholder": "​", + "style": "IPY_MODEL_99ee6edbe0574c778200ae65b87d7e0f", + "value": " 3/3 [00:09<00:00,  2.81s/it]" + } + }, + "7b371d0574e04f98bf87a88f722b8477": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9095b2b09d4a45928fbc3cf45eb35cbb": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "971a53d397494d76b8b5c4a2abb954f7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "54dd267783314434a5389477c97974e5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5bfd23c3586e4615909878610be8e24b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4eda58c3e66e40db98ea40fc40ebb109": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99ee6edbe0574c778200ae65b87d7e0f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From 2130bd15977f139a23c4203b60ec52326ff92e5c Mon Sep 17 00:00:00 2001 From: biplob <110578485+bks1984@users.noreply.github.com> Date: Mon, 10 Nov 2025 08:55:34 +0530 Subject: [PATCH 3/4] Created using Colab --- stock_market__news_sentiment_analysisf.ipynb | 6747 ++++++++++++++++++ 1 file changed, 6747 insertions(+) create mode 100644 stock_market__news_sentiment_analysisf.ipynb diff --git a/stock_market__news_sentiment_analysisf.ipynb b/stock_market__news_sentiment_analysisf.ipynb new file mode 100644 index 0000000..14376df --- /dev/null +++ b/stock_market__news_sentiment_analysisf.ipynb @@ -0,0 +1,6747 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": 
"inNE1fy-ISPj", + "metadata": { + "id": "inNE1fy-ISPj" + }, + "source": [ + "

\n", + " \n", + "

\n", + "\n", + "
Stock Market News Sentiment Analysis
" + ] + }, + { + "cell_type": "markdown", + "id": "EvCcfwuSU-fz", + "metadata": { + "id": "EvCcfwuSU-fz" + }, + "source": [ + "## **Problem Statement**" + ] + }, + { + "cell_type": "markdown", + "id": "6QR_RHvIVHT2", + "metadata": { + "id": "6QR_RHvIVHT2" + }, + "source": [ + "### Business Context" + ] + }, + { + "cell_type": "markdown", + "id": "pl3dmH-EnJGl", + "metadata": { + "id": "pl3dmH-EnJGl" + }, + "source": [ + "The prices of the stocks of companies listed under a global exchange are influenced by a variety of factors, with the company's financial performance, innovations and collaborations, and market sentiment being factors that play a significant role. News and media reports can rapidly affect investor perceptions and, consequently, stock prices in the highly competitive financial industry. With the sheer volume of news and opinions from a wide variety of sources, investors and financial analysts often struggle to stay updated and accurately interpret its impact on the market. As a result, investment firms need sophisticated tools to analyze market sentiment and integrate this information into their investment strategies." + ] + }, + { + "cell_type": "markdown", + "id": "Vn6bbxSwVKl3", + "metadata": { + "id": "Vn6bbxSwVKl3" + }, + "source": [ + "### Problem Definition" + ] + }, + { + "cell_type": "markdown", + "id": "jCIswL3zobj6", + "metadata": { + "id": "jCIswL3zobj6" + }, + "source": [ + "With an ever-rising number of news articles and opinions, an investment startup aims to leverage artificial intelligence to address the challenge of interpreting stock-related news and its impact on stock prices. 
They have collected historical daily news for a specific company listed on NASDAQ, along with data on its daily stock price and trade volumes.\n", + "\n", + "As a member of the Data Science and AI team in the startup, you have been tasked with developing an AI-driven sentiment analysis system that will automatically process and analyze news articles to gauge market sentiment and summarize the news at a weekly level, in order to enhance the accuracy of their stock price predictions and optimize investment strategies. This will empower their financial analysts with actionable insights, leading to more informed investment decisions and improved client outcomes." + ] + }, + { + "cell_type": "markdown", + "id": "ZJOtDHVSF5hu", + "metadata": { + "id": "ZJOtDHVSF5hu" + }, + "source": [ + "### Data Dictionary" + ] + }, + { + "cell_type": "markdown", + "id": "ZlkjI8V5F9RK", + "metadata": { + "id": "ZlkjI8V5F9RK" + }, + "source": [ + "* `Date` : The date the news was released\n", + "* `News` : The content of news articles that could potentially affect the company's stock price\n", + "* `Open` : The stock price (in \\$) at the beginning of the day\n", + "* `High` : The highest stock price (in \\$) reached during the day\n", + "* `Low` : The lowest stock price (in \\$) reached during the day\n", + "* `Close` : The adjusted stock price (in \\$) at the end of the day\n", + "* `Volume` : The number of shares traded during the day\n", + "* `Label` : The sentiment polarity of the news content\n", + "    * 1: positive\n", + "    * 0: neutral\n", + "    * -1: negative" + ] + }, + { + "cell_type": "markdown", + "id": "VrFQHcW5mYgv", + "metadata": { + "id": "VrFQHcW5mYgv" + }, + "source": [ + "## **Installing and Importing the necessary libraries**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "A-E2-iaumpo8", + "metadata": { + "id": "A-E2-iaumpo8", + "collapsed": true, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": 
"5f12e599-de14-4e2b-adb4-de180cdd4fba" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Requirement already satisfied: numpy==1.26.4 in /usr/local/lib/python3.12/dist-packages (1.26.4)\n", + "Requirement already satisfied: scikit-learn==1.6.1 in /usr/local/lib/python3.12/dist-packages (1.6.1)\n", + "Requirement already satisfied: scipy==1.13.1 in /usr/local/lib/python3.12/dist-packages (1.13.1)\n", + "Requirement already satisfied: gensim==4.3.3 in /usr/local/lib/python3.12/dist-packages (4.3.3)\n", + "Requirement already satisfied: sentence-transformers==3.4.1 in /usr/local/lib/python3.12/dist-packages (3.4.1)\n", + "Requirement already satisfied: pandas==2.2.2 in /usr/local/lib/python3.12/dist-packages (2.2.2)\n", + "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn==1.6.1) (1.5.2)\n", + "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn==1.6.1) (3.6.0)\n", + "Requirement already satisfied: smart-open>=1.8.1 in /usr/local/lib/python3.12/dist-packages (from gensim==4.3.3) (7.3.1)\n", + "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (4.56.1)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (4.67.1)\n", + "Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (2.8.0+cu126)\n", + "Requirement already satisfied: huggingface-hub>=0.20.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (0.35.0)\n", + "Requirement already satisfied: Pillow in /usr/local/lib/python3.12/dist-packages (from sentence-transformers==3.4.1) (11.3.0)\n", + "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) 
(2.9.0.post0)\n", + "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) (2025.2)\n", + "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas==2.2.2) (2025.2)\n", + "Requirement already satisfied: filelock in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.19.1)\n", + "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2025.3.0)\n", + "Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (25.0)\n", + "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (6.0.2)\n", + "Requirement already satisfied: requests in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2.32.4)\n", + "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (4.15.0)\n", + "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (1.1.10)\n", + "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas==2.2.2) (1.17.0)\n", + "Requirement already satisfied: wrapt in /usr/local/lib/python3.12/dist-packages (from smart-open>=1.8.1->gensim==4.3.3) (1.17.3)\n", + "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (75.2.0)\n", + "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from 
torch>=1.11.0->sentence-transformers==3.4.1) (1.13.3)\n", + "Requirement already satisfied: networkx in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.5)\n", + "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.1.6)\n", + "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.80)\n", + "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (9.10.2.21)\n", + "Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.4.1)\n", + "Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (11.3.0.4)\n", + "Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (10.3.7.77)\n", + "Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (11.7.1.2)\n", + "Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.5.4.2)\n", + "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in 
/usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (0.7.1)\n", + "Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (2.27.3)\n", + "Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.77)\n", + "Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (12.6.85)\n", + "Requirement already satisfied: nvidia-cufile-cu12==1.11.1.6 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (1.11.1.6)\n", + "Requirement already satisfied: triton==3.4.0 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers==3.4.1) (3.4.0)\n", + "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (2024.11.6)\n", + "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (0.22.0)\n", + "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers==3.4.1) (0.6.2)\n", + "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch>=1.11.0->sentence-transformers==3.4.1) (1.3.0)\n", + "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers==3.4.1) (3.0.2)\n", + "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.4.3)\n", + 
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (3.10)\n", + "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2.5.0)\n", + "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.12/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers==3.4.1) (2025.8.3)\n" + ] + } + ], + "source": [ + "# installing the sentence-transformers and gensim libraries for word embeddings\n", + "!pip install numpy==1.26.4 \\\n", + " scikit-learn==1.6.1 \\\n", + " scipy==1.13.1 \\\n", + " gensim==4.3.3 \\\n", + " sentence-transformers==3.4.1 \\\n", + " pandas==2.2.2" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Note:\n", + "- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.\n", + "- On executing the above line of code, you might see a warning regarding package dependencies. This error message can be ignored as the above code ensures that all necessary libraries and their dependencies are maintained to successfully execute the code in this notebook." 
+ ], + "metadata": { + "id": "Su4_EiqL5aIZ" + }, + "id": "Su4_EiqL5aIZ" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "179a2a45", + "metadata": { + "id": "179a2a45" + }, + "outputs": [], + "source": [ + "# To manipulate and analyze data\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# To visualize data\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "\n", + "# To use time-related functions\n", + "import time\n", + "\n", + "# To build, tune, and evaluate ML models\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score\n", + "\n", + "# To load/create word embeddings\n", + "from gensim.models import Word2Vec\n", + "\n", + "# To work with transformer models\n", + "import torch\n", + "from sentence_transformers import SentenceTransformer\n", + "\n", + "# To build deep learning models with TensorFlow and Keras\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras.models import Sequential\n", + "from tensorflow.keras.layers import Dense, Dropout\n", + "\n", + "# To display progress bars\n", + "from tqdm import tqdm\n", + "tqdm.pandas()\n", + "\n", + "# To ignore unnecessary warnings\n", + "import warnings\n", + "warnings.filterwarnings('ignore')" + ] + }, + { + "cell_type": "markdown", + "id": "wQ46zPgumfjF", + "metadata": { + "id": "wQ46zPgumfjF" + }, + "source": [ + "## **Loading the Dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yu_7XbWQWma8", + "metadata": { + "id": "yu_7XbWQWma8" + }, + "outputs": [], + "source": [ + "# # uncomment and run the following code if Google Colab is being used and the dataset is in Google Drive\n", + "# from google.colab import drive\n", + "# drive.mount('/content/drive')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "62a33eef", 
+ "metadata": { + "id": "62a33eef" + }, + "outputs": [], + "source": [ + "# Read the stock news CSV file into a pandas DataFrame named 'stock_news'\n", + "stock_news = pd.read_csv(\"/content/02. Dataset - stock_news.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1xFSwCCer1uA", + "metadata": { + "id": "1xFSwCCer1uA" + }, + "outputs": [], + "source": [ + "# Create a working copy so the original DataFrame stays unchanged\n", + "stock = stock_news.copy()" + ] + }, + { + "cell_type": "markdown", + "id": "EvFNfrvGWthn", + "metadata": { + "id": "EvFNfrvGWthn" + }, + "source": [ + "## **Data Overview**" + ] + }, + { + "cell_type": "markdown", + "id": "GW4rkWI1WzBb", + "metadata": { + "id": "GW4rkWI1WzBb" + }, + "source": [ + "#### **Displaying the first few rows of the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd2f105b", + "metadata": { + "id": "dd2f105b", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4fce6c84-28cc-4aee-e18c-154c97c9e849" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Date News Open \\\n", + "0 01-02-2019 The dollar minutes ago tumbled to 106 67 from... 38.72 \n", + "1 01-02-2019 By Wayne Cole and Swati Pandey SYDNEY Reuters... 38.72 \n", + "2 01-02-2019 By Stephen Culp NEW YORK Reuters Wall Stre... 38.72 \n", + "3 01-02-2019 By Wayne Cole SYDNEY Reuters The Australia... 38.72 \n", + "4 01-02-2019 Investing com Asian equities fell in morning... 38.72 \n", + "\n", + " High Low Close Volume Label \n", + "0 39.71 38.56 39.48 130672400 1 \n", + "1 39.71 38.56 39.48 130672400 -1 \n", + "2 39.71 38.56 39.48 130672400 0 \n", + "3 39.71 38.56 39.48 130672400 -1 \n", + "4 39.71 38.56 39.48 130672400 1 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "stock", + "summary": "{\n \"name\": \"stock\",\n \"rows\": 418,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 73,\n \"samples\": [\n \"01-08-2019\",\n \"04-15-2019\",\n \"01-30-2019\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"News\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 418,\n \"samples\": [\n \" Reuters Apple Inc NASDAQ AAPL is expected to unveil a new video streaming service and a news subscription platform at an event on Monday at its California headquarters The iPhone maker is banking on growing its services business to offset a dip in smartphone sales While the Wall Street Journal plans to join Apple s new subscription news service other major publishers including the New York Times and the Washington Post have declined according to a New York Times report Apple has also partnered with Hollywood celebrities to make a streaming debut with a slate of original content taking a page out of Netflix NASDAQ NFLX Inc s playbook Below are some of the shows curated from media reports and Apple s own announcements which are part of the iPhone maker s content library SHOWS CONFIRMED BY APPLE UNTITLED DRAMA SERIES WITH REESE WITHERSPOON AND JENNIFER ANISTON Two seasons of a drama series starring Reese Witherspoon and Jennifer Aniston that looks at the lives of people working on a morning television show REVIVAL OF STEVEN SPIELBERG S 1985 AMAZING STORIES The tech giant has also struck a deal with director Steven Spielberg to make new episodes of Amazing Stories a science fiction and horror anthology series that ran on NBC in the 1980s A NEW THRILLER BY M NIGHT SHYAMALAN Plot of the story has not been disclosed ARE YOU SLEEPING A MYSTERY SERIES A drama featuring Octavia Spencer based on a crime novel by Kathleen Barber AN 
ANTHOLOGY SERIES CALLED LITTLE AMERICA Focuses on stories of immigrants coming to the United States AN ANIMATED CARTOON MUSICAL CALLED CENTRAL PARK The animated musical comedy is about a family of caretakers who end up saving the park and the world DICKINSON AN EMILY DICKINSON COMEDY A half hour comedy series that is set during American poet Emily Dickinson s era with a modern sensibility and tone OPRAH WINFREY PARTNERSHIP Apple in June last year announced a multi year deal with Oprah Winfrey to create original programming SHOWS REPORTED BY MEDIA TIME BANDITS A FANTASY SERIES The potential series is an adaptation of Terry Gilliam s 1981 fantasy film of the same name about a young boy who joins a group of renegade time traveling dwarves Deadline reported https UNTITLED CAPTAIN MARVEL STAR BRIE LARSON S CIA PROJECT The new series looks at a young woman s journey in the CIA reported Variety https DEFENDING JACOB STARRING CAPTAIN AMERICA CHRIS EVANS This limited series is based on the novel of the same name and is about an assistant district attorney who is investigating the murder of a 14 year old boy according to Deadline https FOR ALL MANKIND A SCI FI SERIES A space drama from producer Ronald Moore according to Deadline https MY GLORY WAS I HAD SUCH FRIENDS A series featuring Jennifer Garner is based on the 2017 memoir of the same name by Amy Silverstein reported Variety https SEE A FANTASY EPIC STARRING JASON MOMOA The show poses the question about the fate of humanity if everyone lost their sight Variety reported https FOUNDATION A SCI FI ADAPTATION An adaptation of the iconic novel series from famed sci fi author Isaac Asimov Deadline reported The book series follows a mathematician who predicts the collapse of humanity A COMEDY SHOW BY ROB MCELHENNEY AND CHARLIE DAY The sitcom comedy based on the lives of a diverse group of people who work together in a video game development studio Variety reported https AN UNSCRIPTED SERIES HOME FROM THE DOCUMENTARY FILMMAKER 
MATT TYRNAUER The series will offer viewers a never before seen look inside the world s most extraordinary homes and feature interviews with people who built them according to Variety https UNTITLED RICHARD GERE SERIES Based on an Israeli series Nevelot the show is about two elderly Vietnam vets whose lives are changed when a woman they both love is killed in a car accident Deadline reported J J ABRAMS PRODUCED LITTLE VOICE Singer and actress Sara Bareilles is writing the music and could possibly star in the J J Abrams produced half hour show which explores the journey of finding one s authentic voice in early 20s according to Variety THE PEANUTS GANG Apple has acquired the rights to the famous characters and the first series will be a science and math oriented short featuring Snoopy as an astronaut according to Hollywood Reporter ON THE ROCKS A feature film directed by Sofia Coppola starring Bill Murray is about a young mother who reconnects with her larger than life playboy father on an adventure through New York Variety reported https LOSING EARTH Apple has acquired the rights to a TV series based on Nathaniel Rich s 70 page New York Times Magazine story Losing Earth New York Times reported THE ELEPHANT QUEEN Apple has acquired the rights to Victoria Stone and Mark Deeble s documentary The Elephant Queen Deadline reported WOLFWALKERS An Irish animation about a young hunter who comes to Ireland with her father to destroy a pack of evil wolves but instead befriends a wild native girl who runs with them first reported by Bloomberg PACHINKO Apple has secured the rights to develop Min Jin Lee s best selling novel about four generations of a Korean immigrant family into a series reported Variety CALLS Apple has bought the rights to make an English language version of the French original short form series according to Variety SHANTARAM Apple has won the rights to develop the hit novel Shantaram as a drama series reported Variety https SWAGGER A DRAMA SERIES BASED ON 
KEVIN DURANT A drama series based on the early life and career of NBA superstar Kevin Durant according to Variety https YOU THINK IT I LL SAY IT Apple has ordered a 10 episode half hour run of the comedy show which is an adaptation of Curtis Sittenfeld s short story collection by the same name Variety reported https WHIPLASH DIRECTOR DAMIEN CHAZELLE DRAMA SERIES According to Variety Apple has ordered a whole season of a series without first shooting a pilot but no other details are known about the show \\n Apple may offer cut priced bundles with video offering The Information reported on Thursday \",\n \"Investing com Stocks in focus in premarket trade Monday \\n Viacom NASDAQ VIAB jumped 4 2 by 8 04 AM ET 12 04 GMT as the company announced that it had renewed its contract with AT T NYSE T avoiding a blackout of MTV Nickelodeon and Comedy Central for DirecTV users \\n Nike NYSE NKE fell 0 3 after European Union antitrust regulators fined the company 12 5 million euros 14 14 million for restricting cross border sales of merchandising products \\n Apple NASDAQ AAPL dropped 0 3 while markets geared up for a company presentation that is expected to lift the curtain on Monday on a secretive years long effort to build a video streaming prodduct \\n Boeing NYSE BA gained 0 4 as the company preps to brief more than 200 global airline pilots technical leaders and regulators on Wednesday over software and training updates for its 737 MAX aircraft \\n CalAmp NASDAQ CAMP fell 2 4 after JP Morgan downgraded it to neutral from overweight according to Briefing com \\n Winnebago Industries NYSE WGO stock declined 0 9 after the company s fiscal second quarter revenue was lower than expected although earnings per share beat expectations\\n Thermo Fisher Scientific NYSE TMO stock could see movement in the regular session after the company announced that it would acquire Brammer Bio for approximately 1 7 billion in cash \\n Biogen NASDAQ BIIB bounced 1 5 after announcing a new 5 
billion buyback The stock had fallen by nearly one third last week after saying it had halted the development of a drug it had been developing to treat Alzheimer s \",\n \"By Yimou Lee TAIPEI Reuters Terry Gou chairman of Apple NASDAQ AAPL supplier Foxconn said on Wednesday he will contest Taiwan s 2020 presidential election shaking up the political landscape at a time of heightened tension between the self ruled island and Beijing Gou Taiwan s richest person with a net worth of 7 6 billion according to Forbes said he would join the already competitive race and take part in the opposition China friendly Kuomintang KMT primaries His decision capped a flurry of news this week that began when Gou told Reuters on Monday he planned to step down from the world s largest contract manufacturer to pave the way for younger talent to move up the company s ranks He announced on Tuesday he was considering a presidential bid and hinted he was close to a decision when he told more than 100 people packed into a temple he would follow the instruction of a sea goddess who had told him to run in the presidential race Peace stability and Taiwan s economy future are my core values Gou said later at the KMT s headquarters in Taipei He urged the party to rediscover the spirit of the KMT the honor of KMT members and the KMT s lost support of the youth Gou s bid which requires KMT approval comes at a delicate time for cross strait relations and delivers a blow to the ruling pro independence Democratic Progressive NYSE PGR Party which is struggling in opinion polls China Taiwan relations have deteriorated since the island s president Tsai Ing wen of the independence leaning DPP swept to power in 2016 China suspects Tsai is pushing for the island s formal independence That is a red line for China which has never renounced the use of force to bring Taiwan under its control Tsai says she wants to maintain the status quo with China but will defend Taiwan s security and democracy VERY PRO CHINA 
A senior adviser to Tsai told Reuters he thought Gou s bid could create problems given his extensive business ties with China This is problematic to Taiwan s national security the adviser Yao Chia wen said He s very pro China and he represents the class of the wealthy people Will that gain support from Taiwanese Yao said adding he believed Gou would face a tough battle in the KMT primary Tension between Taipei and Beijing escalated again on Monday as Chinese bombers and warships conducted drills around the island prompting Taiwan to scramble jets and ships to monitor the Chinese forces The KMT which once ruled China before fleeing to Taiwan at the end of a civil war with the Communists in 1949 said in February it could sign a peace treaty with Beijing if it won the hotly contested presidential election Zhang Baohui a regional security analyst at Hong Kong s Lingnan University said Gou s run could mark the start of the most unusual election race in Taiwan history This is something entirely fresh for Taiwan politics here is a candidate who sees everything through the pragmatic angle of a businessman rather than raw politics or ideology Zhang told Reuters He has no baggage and that will be a fascinating scenario Gou s news comes as Tsai is grappling with a series of unpopular domestic reform initiatives from a pension scheme to labour law which have come under intense voter scrutiny The KMT said this week Gou had been a party member for more than 50 years and had given it an interest free loan of T 45 million 1 5 million in 2016 under the name of his mother which had signalled his loyalty to the party Foxconn said on Tuesday Gou would remain chairman of Foxconn though he planned to withdraw from daily operations \"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.947134201503234,\n \"min\": 35.99,\n \"max\": 51.84,\n \"num_unique_values\": 69,\n \"samples\": [\n 
43.22,\n 38.72,\n 48.83\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.947413441172774,\n \"min\": 36.43,\n \"max\": 52.12,\n \"num_unique_values\": 67,\n \"samples\": [\n 43.87,\n 39.08,\n 37.96\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.967879507972434,\n \"min\": 35.5,\n \"max\": 51.76,\n \"num_unique_values\": 66,\n \"samples\": [\n 49.54,\n 50.97,\n 38.56\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.999867403609388,\n \"min\": 35.55,\n \"max\": 51.87,\n \"num_unique_values\": 68,\n \"samples\": [\n 48.77,\n 39.08,\n 37.69\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 45745495,\n \"min\": 45448000,\n \"max\": 365248800,\n \"num_unique_values\": 73,\n \"samples\": [\n 216071600,\n 70146400,\n 244439200\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": -1,\n \"max\": 1,\n \"num_unique_values\": 3,\n \"samples\": [\n 1,\n -1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 4 + } + ], + "source": [ + "stock.head(5)" + ] + }, + { + "cell_type": "markdown", + "id": "y2ewB36LL9Cz", + "metadata": { + "id": "y2ewB36LL9Cz" + }, + "source": [ + "#### **Understanding the shape of the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "wWx6wqN0MTPw", + "metadata": { + "id": "wWx6wqN0MTPw", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "edd77532-57d5-4ab5-ebc3-b8a4fa737b0a" + }, + "outputs": [ + { + "output_type": 
"execute_result", + "data": { + "text/plain": [ + "(418, 8)" + ] + }, + "metadata": {}, + "execution_count": 5 + } + ], + "source": [ + "stock.shape" + ] + }, + { + "cell_type": "markdown", + "id": "yQjb8QOTivg3", + "metadata": { + "id": "yQjb8QOTivg3" + }, + "source": [ + "**Observations:**\n", + "* There are a total of 418 records with 8 attributes each." + ] + }, + { + "cell_type": "markdown", + "id": "fPLJXhFcMA7N", + "metadata": { + "id": "fPLJXhFcMA7N" + }, + "source": [ + "#### **Checking the data types of the columns**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Gc_eAiMdMVe2", + "metadata": { + "id": "Gc_eAiMdMVe2", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8c61445b-ef88-4a92-b810-0e15bb3000f5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "RangeIndex: 418 entries, 0 to 417\n", + "Data columns (total 8 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Date 418 non-null object \n", + " 1 News 418 non-null object \n", + " 2 Open 418 non-null float64\n", + " 3 High 418 non-null float64\n", + " 4 Low 418 non-null float64\n", + " 5 Close 418 non-null float64\n", + " 6 Volume 418 non-null int64 \n", + " 7 Label 418 non-null int64 \n", + "dtypes: float64(4), int64(2), object(2)\n", + "memory usage: 26.3+ KB\n" + ] + } + ], + "source": [ + "stock.info()" + ] + }, + { + "cell_type": "markdown", + "id": "i1CgPxT5mxEf", + "metadata": { + "id": "i1CgPxT5mxEf" + }, + "source": [ + "Let's convert the Date column to pandas `datetime` type." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ZD5fstuv6ery", + "metadata": { + "id": "ZD5fstuv6ery" + }, + "outputs": [], + "source": [ + "# Convert the 'Date' column (MM-DD-YYYY strings) of the 'stock' DataFrame to datetime format\n", + "stock['Date'] = pd.to_datetime(stock['Date'])" + ] + }, + { + "cell_type": "markdown", + "id": "8dORemydMDfR", + "metadata": { + "id": "8dORemydMDfR" + }, + "source": [ + "#### **Checking the statistical summary**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gUazWjegMeQl", + "metadata": { + "id": "gUazWjegMeQl", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cd88b728-318a-4f67-d74e-71373400a55b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Date Open High Low \\\n", + "count 418 418.000000 418.000000 418.000000 \n", + "mean 2019-02-14 12:24:06.889952256 42.308852 42.787321 41.923732 \n", + "min 2019-01-02 00:00:00 35.990000 36.430000 35.500000 \n", + "25% 2019-01-11 00:00:00 38.130000 38.420000 37.720000 \n", + "50% 2019-01-31 00:00:00 41.530000 42.250000 41.140000 \n", + "75% 2019-03-21 00:00:00 47.190000 47.427500 46.480000 \n", + "max 2019-04-29 00:00:00 51.840000 52.120000 51.760000 \n", + "std NaN 4.947134 4.947413 4.967880 \n", + "\n", + " Close Volume Label \n", + "count 418.000000 4.180000e+02 418.000000 \n", + "mean 42.418517 1.294225e+08 0.308612 \n", + "min 35.550000 4.544800e+07 -1.000000 \n", + "25% 38.270000 1.029072e+08 -1.000000 \n", + "50% 41.610000 1.156272e+08 1.000000 \n", + "75% 47.032500 1.511252e+08 1.000000 \n", + "max 51.870000 3.652488e+08 1.000000 \n", + "std 4.999867 4.574550e+07 0.943473 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"stock\",\n \"rows\": 8,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": \"1970-01-01 00:00:00.000000418\",\n \"max\": \"2019-04-29 00:00:00\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"418\",\n \"2019-02-14 12:24:06.889952256\",\n \"2019-03-21 00:00:00\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.29734162506347,\n \"min\": 4.947134201503234,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.308851674641154,\n 47.19,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.18648944299875,\n \"min\": 4.947413441172774,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.78732057416268,\n 47.427499999999995,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.4078256172517,\n \"min\": 4.967879507972434,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 41.923732057416274,\n 46.48,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 135.30548063571206,\n \"min\": 4.999867403609388,\n \"max\": 418.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 42.41851674641149,\n 47.0325,\n 418.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 111473859.17448182,\n \"min\": 418.0,\n \"max\": 365248800.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 129422491.86602871,\n 151125200.0,\n 418.0\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"Label\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 147.67411440583984,\n \"min\": -1.0,\n \"max\": 418.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 0.30861244019138756,\n 0.9434730920044713,\n -1.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "stock.describe()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "\n", + "- **Date Range and Trading Period**:\n", + " - The data covers the period from January 2, 2019, to April 29, 2019, a span of approximately four months.\n", + "\n", + "- **Price Overview**:\n", + " - **Average Prices**: The average opening price is approximately \\$42.31, while the average closing price is about \\$42.42.\n", + " - **Price Variability**: Opening prices range from \\$35.99 to \\$51.84, and the standard deviation of each price column is close to \\$5, so prices moved substantially over this period.\n", + "\n", + "- **Trading Volume**:\n", + " - The average trading volume is approximately 129.42 million shares, ranging from around 45.45 million to 365.25 million, which indicates widely varying levels of market activity." 
+ ], + "metadata": { + "id": "0wZ7x_5W77tD" + }, + "id": "0wZ7x_5W77tD" + }, + { + "cell_type": "markdown", + "id": "lXRpNWnQMGIY", + "metadata": { + "id": "lXRpNWnQMGIY" + }, + "source": [ + "#### **Checking for duplicate values**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ti4UpPi6M5kM", + "metadata": { + "id": "ti4UpPi6M5kM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "66fad013-2ec1-4ec3-9cfe-842656486d7c" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "stock.duplicated().sum()" + ] + }, + { + "cell_type": "markdown", + "id": "XkwHzJH6k_jx", + "metadata": { + "id": "XkwHzJH6k_jx" + }, + "source": [ + "**Observations:**\n", + "* There are no duplicate rows." + ] + }, + { + "cell_type": "markdown", + "id": "fxghULa0MOY-", + "metadata": { + "id": "fxghULa0MOY-" + }, + "source": [ + "#### **Checking for missing values**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yItWheKoNGkf", + "metadata": { + "id": "yItWheKoNGkf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ff1d5335-a019-465d-e7f8-546c97e5873f" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Date 0\n", + "News 0\n", + "Open 0\n", + "High 0\n", + "Low 0\n", + "Close 0\n", + "Volume 0\n", + "Label 0\n", + "dtype: int64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ], + "source": [ + "stock.isnull().sum()" + ] + }, + { + "cell_type": "markdown", + "id": "qg7TsQTclDUS", + "metadata": { + "id": "qg7TsQTclDUS" + }, + "source": [ + "**Observations:**\n", + "* There are no missing values." + ] + }, + { + "cell_type": "markdown", + "id": "hGHBK8-QeKOB", + "metadata": { + "id": "hGHBK8-QeKOB" + }, + "source": [ + "## **Exploratory Data Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "Q0UlMQnyegl7", + "metadata": { + "id": "Q0UlMQnyegl7" + }, + "source": [ + "### **Univariate Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "RrznHeBaLu0W", + "metadata": { + "id": "RrznHeBaLu0W" + }, + "source": [ + "#### **Countplot on Label**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "meVjTKoxLpmA", + "metadata": { + "id": "meVjTKoxLpmA", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "dcd02675-45e6-4016-fb12-10ce1aef7018" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.countplot(data=stock, x='Label', stat=\"percent\");" + ] + }, + { + "cell_type": "markdown", + "id": "nXPvfQr-Avd7", + "metadata": { + "id": "nXPvfQr-Avd7" + }, + "source": [ + "**Observations:**\n", + "* The dataset is imbalanced for the sentiment polarities.\n", + "* There is more news content with positive polarity compared to other types." + ] + }, + { + "cell_type": "markdown", + "id": "dpGHhbGeeoF8", + "metadata": { + "id": "dpGHhbGeeoF8" + }, + "source": [ + "#### **Density Plot of Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "BKqgbg0_v5EM", + "metadata": { + "id": "BKqgbg0_v5EM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0a50f216-e030-496b-de8c-cc3a2b8153ca" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "# Plot KDE for the 'Open', 'High', 'Low', 'Close' columns of the 'stock' DataFrame.\n", + "sns.displot(data=stock[['Open','High','Low','Close']], kind='kde', palette=\"tab10\"); # Create a KDE plot with a color palette." + ] + }, + { + "cell_type": "markdown", + "id": "l5jX1Kp-lbD5", + "metadata": { + "id": "l5jX1Kp-lbD5" + }, + "source": [ + "**Observations:**\n", + "* The distributions of the prices are quite similar, with the high price showing a slight variation than the others." + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Histogram on Volume**" + ], + "metadata": { + "id": "1wBKKuVIaWnl" + }, + "id": "1wBKKuVIaWnl" + }, + { + "cell_type": "code", + "source": [ + "sns.histplot(stock, x='Volume');" + ], + "metadata": { + "id": "FMDJ_m6maaoK", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fa63eb8f-f159-41cc-c969-fce63404ff4b" + }, + "id": "FMDJ_m6maaoK", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGwCAYAAACzXI8XAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAMEVJREFUeJzt3X1cVHXe//H3oDBgyhjecJMgaApoeZN5g7qbGUWu+dBLtrKyKN1qu9BN6U66UtMsurlSNyPNLsV1Wy83t3Rr27SkpFI0pdw00dVWG0vAqGC8wVHh/P7Yn3NFgsEwcObA6/l4nMfDczMfPvN9HIa3Z75zxmYYhiEAAAALCjC7AQAAAG8RZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGURZAAAgGW1NruBxlZVVaUjR46oXbt2stlsZrcDAADqwDAMHTt2TFFRUQoIqP26S7MPMkeOHFF0dLTZbQAAAC8cPnxYXbp0qXV/sw8y7dq1k/TvgQgNDTW5GwAAUBcul0vR0dGev+O1afZB5tzbSaGhoQQZAAAs5uemhTDZFwAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWFZrsxtA8+Z0OlVaWtrgOh07dlRMTIwPOgIANCcEGTQap9OphIREVVScbHCtkJA22ru3kDADAKiGIINGU1paqoqKkxo8abZCI2O9ruMqOqRty+eotLSUIAMAqIYgg0YXGhmrsJh4s9sAADRDTPYFAACWRZABAACWZWqQqays1MyZMxUXF6eQkBB1795dTzzxhAzD8BxjGIZmzZqlyMhIhYSEKDk5Wfv37zexawAA4C9MDTLPPPOMFi9erBdffFGFhYV65pln9Oyzz2rRokWeY5599lm98MILWrJkibZt26aLLrpIKSkpOnXqlImdAwAAf2DqZN8tW7Zo7NixGj16tCQpNjZW//u//6tPPvlE0r+vxixcuFCPPfaYxo4dK0lauXKlwsPDtW7dOk2YMOG8mm63W26327Pucrma4JkAAAAzmHpFZujQocrNzdU///lPSdI//vEPffzxxxo1apQk6eDBgyouLlZycrLnMQ6HQ4MHD1Z+fn6NNbOysuRwODxLdHR04z8RAABgClOvyMyYMUMul0sJCQlq1aqVKisr9eSTT+q2226TJBUXF0uSwsPDqz0uPDzcs++nMjMzlZGR4Vl3uVyEGQAAmilTg8xrr72mP/3pT1q1apV69+6tnTt3atq0aYqKilJaWppXNe12u+x2u487BQAA/sjUIPPQQw9pxowZnrkul19+ub766itlZWUpLS1NERERkqSSkhJFRkZ6HldSUqJ+/fqZ0TIAAPAjps6ROXnypAICqrfQqlUrVVVVSZLi4uIUERGh3Nxcz36Xy6Vt27YpKSmpSXsFAAD+x9QrMmPGjNGTTz6pmJgY9e7dW5999pnmz5+vSZMmSZJsNpumTZumefPmqUePHoqLi9PMmTMVFRWlcePGmdk6AADwA6YGmUWLFmnmzJn6z//8Tx09elRRUVG69957NWvWLM8xDz/8sE6cOKF77rlHZWVlGj58uNavX6/g4GATOwcAAP7A1CDTrl07LVy4UAsXLqz1GJvNprlz52ru3LlN1xgAALAEvmsJAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYlqlBJjY2Vjab7bwlPT1dk
nTq1Cmlp6erQ4cOatu2rVJTU1VSUmJmywAAwI+YGmS2b9+uoqIiz/Lee+9Jkm688UZJ0vTp0/XWW29pzZo1ysvL05EjRzR+/HgzWwYAAH6ktZk/vFOnTtXWn376aXXv3l1XXXWVysvLtWzZMq1atUojR46UJOXk5CgxMVFbt27VkCFDaqzpdrvldrs96y6Xq/GeAAAAMJXfzJE5ffq0Xn31VU2aNEk2m00FBQU6c+aMkpOTPcckJCQoJiZG+fn5tdbJysqSw+HwLNHR0U3RPgAAMIHfBJl169aprKxMd955pySpuLhYQUFBat++fbXjwsPDVVxcXGudzMxMlZeXe5bDhw83YtcAAMBMpr619GPLli3TqFGjFBUV1aA6drtddrvdR10BAAB/5hdB5quvvtLGjRv1xhtveLZFRETo9OnTKisrq3ZVpqSkRBERESZ0CQAA/I1fvLWUk5Ojzp07a/To0Z5tAwYMUGBgoHJzcz3b9u3bJ6fTqaSkJDPaBAAAfsb0KzJVVVXKyclRWlqaWrf+v3YcDocmT56sjIwMhYWFKTQ0VFOnTlVSUlKtn1gCAAAti+lBZuPGjXI6nZo0adJ5+xYsWKCAgAClpqbK7XYrJSVFL730kgldAgAAf2R6kLnuuutkGEaN+4KDg5Wdna3s7Owm7goAAFiBX8yRAQAA8AZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWBZBBgAAWJbpN8QD6qqwsLDBNTp27KiYmBgfdAMA8AcEGfi9ivLvJNk0ceLEBtcKCWmjvXsLCTMA0EwQZOD3zpw8JslQv1sfUae4BK/ruIoOadvyOSotLSXIAEAzQZCBZbTtHKOwmHiz2wAA+BEm+wIAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsyPch88803mjhxojp06KCQkBBdfvnl2rFjh2e/YRiaNWuWIiMjFRISouTkZO3fv9/EjgEAgL8wNcj88MMPGjZsmAIDA/XOO+9oz549ev7553XxxRd7jnn22Wf1wgsvaMmSJdq2bZsuuugipaSk6NSpUyZ2DgAA/EFrM3/4M888o+joaOXk5Hi2xcXFef5tGIYWLlyoxx57TGPHjpUkrVy5UuHh4Vq3bp0mTJhwXk232y232+1Zd7lcjfgMAACAmUy9IvPmm2/qyiuv1I033qjOnTurf//+euWVVzz7Dx48qOLiYiUnJ3u2ORwODR48WPn5+TXWzMrKksPh8CzR0dGN/jwAAIA5TA0y//rXv7R48WL16NFDGzZs0H333aff/e53+sMf/iBJKi4uliSFh4dXe1x4eLhn309lZmaqvLzcsxw+fLhxnwQAADCNqW8tVVVV6corr9RTTz0lSerfv792796tJUuWKC0tzauadrtddrvdl20CAAA/ZeoVmcjISPXq1avatsTERDmdTklSRESEJKmkpKTaMSUlJZ59AACg5TI1yAwbNkz79u2rtu2f//ynunbtKunfE38jIiKUm5vr2e9yubRt2zYlJSU1aa8AAMD/mPrW0vTp0zV06FA99dRTuummm/TJJ59o6dKlWrp0qSTJZrNp2rRpmjdvnnr06KG4uDjNnDlTUVFRGjdunJmtAwAAP2BqkBk4cKDWrl2rzMxMzZ07V3FxcVq4cKFuu+02zzEPP/ywTpw4oXvuuUdlZWUaPny41q9fr+DgYBM7BwAA/sDUICNJN9xwg2644YZa99tsNs2dO1dz585twq4AAIAVmP4VBQAAAN4iyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsiyAAAAMsy/YZ48E9Op1OlpaUNqlFYWOijbgAAqBlBBudxOp1KSEhURcVJn9Q74z7tk
zoAAPwUQQbnKS0tVUXFSQ2eNFuhkbFe1ynala/dby7V2bNnfdccAAA/QpBBrUIjYxUWE+/1411Fh3zXDAAANWCyLwAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsCyCDAAAsKzWZjcANLXCwsIG1+jYsaNiYmJ80A0AoCEIMmgxKsq/k2TTxIkTG1wrJKSN9u4tJMwAgMkIMmgxzpw8JslQv1sfUae4BK/ruIoOadvyOSotLSXIAIDJCDJocdp2jlFYTLzZbQAAfIDJvgAAwLJMDTKPP/64bDZbtSUh4f8u+Z86dUrp6enq0KGD2rZtq9TUVJWUlJjYMQAA8CemX5Hp3bu3ioqKPMvHH3/s2Td9+nS99dZbWrNmjfLy8nTkyBGNHz/exG4BAIA/MX2OTOvWrRUREXHe9vLyci1btkyrVq3SyJEjJUk5OTlKTEzU1q1bNWTIkKZuFQAA+BnTr8js379fUVFR6tatm2677TY5nU5JUkFBgc6cOaPk5GTPsQkJCYqJiVF+fn6t9dxut1wuV7UFAAA0T6YGmcGDB2vFihVav369Fi9erIMHD+oXv/iFjh07puLiYgUFBal9+/bVHhMeHq7i4uJaa2ZlZcnhcHiW6OjoRn4WAADALKa+tTRq1CjPv/v06aPBgwera9eueu211xQSEuJVzczMTGVkZHjWXS4XYQYAgGbK9LeWfqx9+/bq2bOnDhw4oIiICJ0+fVplZWXVjikpKalxTs05drtdoaGh1RYAANA8+VWQOX78uL788ktFRkZqwIABCgwMVG5urmf/vn375HQ6lZSUZGKXAADAX5j61tKDDz6oMWPGqGvXrjpy5Ihmz56tVq1a6ZZbbpHD4dDkyZOVkZGhsLAwhYaGaurUqUpKSuITSwAAQJLJQebrr7/WLbfcou+++06dOnXS8OHDtXXrVnXq1EmStGDBAgUEBCg1NVVut1spKSl66aWXzGwZAAD4EVODzOrVqy+4Pzg4WNnZ2crOzm6ijgAAgJX41RwZAACA+iDIAAAAyyLIAAAAyyLIAAAAyyLIAAAAy/IqyHTr1k3ffffdedvLysrUrVu3BjcFAABQF14FmUOHDqmysvK87W63W998802DmwIAAKiLet1H5s033/T8e8OGDXI4HJ71yspK5ebmKjY21mfNAQAAXEi9gsy4ceMkSTabTWlpadX2BQYGKjY2Vs8//7zPmgMAALiQegWZqqoqSVJcXJy2b9+ujh07NkpTAAAAdeHVVxQcPHjQ130AAADUm9fftZSbm6vc3FwdPXrUc6XmnOXLlze4MQAAgJ/jVZCZM2eO5s6dqyuvvFKRkZGy2Wy+7gsAAOBneRVklixZohUrVuj222/3dT8AAAB15tV9ZE6fPq2hQ4f6uhcAAIB68SrI/OY3v9GqVat83QsAAEC9ePXW0qlTp7R06VJt3LhRffr0UWBgYLX98+fP90lzAAAAF+JVkPn888/Vr18/SdLu3bur7WPiLwAAaCpeBZkPPvjA130AAADUm1dzZAAAAPyBV1dkrr766gu+hfT+++973RAAAEBdeRVkzs2POefMmTPauXOndu/efd6XSQIAADQWr4LMggULatz++OOP6/jx4w1qCAAAoK58Okdm4sSJfM8SAABoMj4NMvn5+QoODvZlSQAAgFp59dbS+PHjq60bhqGioiLt2LFDM2fO9EljAAAAP8erIONwOKqtBwQEKD4+XnPnztV1113nk8YAAAB+jldBJicnx9d9AAAA1JtXQeacgoICFRYWSpJ69+6t/v37+6QpAACAuvAqyBw9elQTJkzQpk2b1L59e0lSWVmZrr76aq1evVqdOnXyZY8AAAA18upTS1OnTtWxY8f0xRdf6Pvvv9f333+v3bt3y+Vy6Xe/+52vewQAAKiRV1dk1q9fr40bNyoxMdGzrVevXsrOz
mayLwAAaDJeXZGpqqpSYGDgedsDAwNVVVXlVSNPP/20bDabpk2b5tl26tQppaenq0OHDmrbtq1SU1NVUlLiVX0AAND8eBVkRo4cqfvvv19HjhzxbPvmm280ffp0XXPNNfWut337dr388svq06dPte3Tp0/XW2+9pTVr1igvL09Hjhw57x42AACg5fIqyLz44otyuVyKjY1V9+7d1b17d8XFxcnlcmnRokX1qnX8+HHddttteuWVV3TxxRd7tpeXl2vZsmWaP3++Ro4cqQEDBignJ0dbtmzR1q1bvWkbAAA0M17NkYmOjtann36qjRs3au/evZKkxMREJScn17tWenq6Ro8ereTkZM2bN8+zvaCgQGfOnKlWMyEhQTExMcrPz9eQIUNqrOd2u+V2uz3rLper3j0BAABrqNcVmffff1+9evWSy+WSzWbTtddeq6lTp2rq1KkaOHCgevfurY8++qjO9VavXq1PP/1UWVlZ5+0rLi5WUFCQ5+Pd54SHh6u4uLjWmllZWXI4HJ4lOjq6zv0AAABrqVeQWbhwoe6++26Fhoaet8/hcOjee+/V/Pnz61Tr8OHDuv/++/WnP/3Jp180mZmZqfLycs9y+PBhn9UGAAD+pV5B5h//+Ieuv/76Wvdfd911KigoqFOtgoICHT16VFdccYVat26t1q1bKy8vTy+88IJat26t8PBwnT59WmVlZdUeV1JSooiIiFrr2u12hYaGVlsAAEDzVK85MiUlJTV+7NpTrHVrffvtt3Wqdc0112jXrl3Vtt11111KSEjQI488oujoaAUGBio3N1epqamSpH379snpdCopKak+bQMAgGaqXkHmkksu0e7du3XppZfWuP/zzz9XZGRknWq1a9dOl112WbVtF110kTp06ODZPnnyZGVkZCgsLEyhoaGaOnWqkpKSap3oCwAAWpZ6vbX0q1/9SjNnztSpU6fO21dRUaHZs2frhhtu8FlzCxYs0A033KDU1FT98pe/VEREhN544w2f1QcAANZWrysyjz32mN544w317NlTU6ZMUXx8vCRp7969ys7OVmVlpf7rv/7L62Y2bdpUbT04OFjZ2dnKzs72uiYAAGi+6hVkwsPDtWXLFt13333KzMyUYRiSJJvNppSUFGVnZys8PLxRGgUAAPipet8Qr2vXrvr73/+uH374QQcOHJBhGOrRo0e1u/ICAAA0Ba/u7CtJF198sQYOHOjLXgAAAOrFq+9aAgAA8AcEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFkEGQAAYFmtzW4AQMM5nU6VlpY2uE7Hjh0VExPjg44AoGkQZACLczqdSkhIVEXFyQbXCglpo717CwkzACyDIANYXGlpqSoqTmrwpNkKjYz1uo6r6JC2LZ+j0tJSggwAyyDIAM1EaGSswmLizW4DAJoUk30BAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBlmRpkFi9erD59+ig0NFShoaFKSkrSO++849l/6tQppaenq0OHDmrbtq1SU1NVUlJiYscAAMCfmBpkunTpoqeffloFBQXasWOHRo4cqbFjx+qLL76QJE2fPl1vvfWW1qxZo7y8PB05ckTjx483s2UAAOBHTL0h3pgxY6qtP/nkk1q8eLG2bt2qLl26aNmyZVq1apVGjhwpScrJyVFiYqK2bt2qIUOGmNEyAADwI34zR6ayslKrV6/WiRMnlJSUpIKCAp05c0bJycmeYxISEhQTE6P8/Pxa67jdbrlcrmoLAABonkwPMrt27VLbtm1lt9v129/+VmvXrlWvXr1UXFysoKAgtW/fvtrx4eHhKi4urrVeVlaWHA6HZ4mOjm7kZwAAAMxiepCJj4/Xzp07tW3bNt13331KS0vTnj17vK6Xm
Zmp8vJyz3L48GEfdgsAAPyJ6V8aGRQUpEsvvVSSNGDAAG3fvl2///3vdfPNN+v06dMqKyurdlWmpKREERERtdaz2+2y2+2N3TYAAPADpl+R+amqqiq53W4NGDBAgYGBys3N9ezbt2+fnE6nkpKSTOwQAAD4C1OvyGRmZmrUqFGKiYnRsWPHtGrVKm3atEkbNmyQw+HQ5MmTlZGRobCwMIWGhmrq1KlKSkriE0sAAECSyUHm6NGjuuOOO1RUVCSHw6E+ffpow4YNuvbaayVJCxYsUEBAgFJTU+V2u5WSkqKXXnrJzJYBAIAfMTXILFu27IL7g4ODlZ2drezs7CbqCAAAWInfzZEBAACoK4IMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwLIIMAACwrNZmNwAALY3T6VRpaWmD63Ts2FExMTE+6AiwLlODTFZWlt544w3t3btXISEhGjp0qJ555hnFx8d7jjl16pQeeOABrV69Wm63WykpKXrppZcUHh5uYucA4B2n06mEhERVVJxscK2QkDbau7eQMIMWzdQgk5eXp/T0dA0cOFBnz57Vo48+quuuu0579uzRRRddJEmaPn263n77ba1Zs0YOh0NTpkzR+PHjtXnzZjNbBwCvlJaWqqLipAZPmq3QyFiv67iKDmnb8jkqLS0lyKBFMzXIrF+/vtr6ihUr1LlzZxUUFOiXv/ylysvLtWzZMq1atUojR46UJOXk5CgxMVFbt27VkCFDzGgbABosNDJWYTHxP38ggAvyq8m+5eXlkqSwsDBJUkFBgc6cOaPk5GTPMQkJCYqJiVF+fn6NNdxut1wuV7UFAAA0T34TZKqqqjRt2jQNGzZMl112mSSpuLhYQUFBat++fbVjw8PDVVxcXGOdrKwsORwOzxIdHd3YrQMAAJP4TZBJT0/X7t27tXr16gbVyczMVHl5uWc5fPiwjzoEAAD+xi8+fj1lyhT97W9/04cffqguXbp4tkdEROj06dMqKyurdlWmpKREERERNday2+2y2+2N3TIAAPADpgYZwzA0depUrV27Vps2bVJcXFy1/QMGDFBgYKByc3OVmpoqSdq3b5+cTqeSkpLMaNnv+eL+FIWFhT7qBgCAxmVqkElPT9eqVav017/+Ve3atfPMe3E4HAoJCZHD4dDkyZOVkZGhsLAwhYaGaurUqUpKSuITSzXw5f0pJOmM+7RP6gAA0FhMDTKLFy+WJI0YMaLa9pycHN15552SpAULFiggIECpqanVboiH8/nq/hRFu/K1+82lOnv2rO+aAwCgEZj+1tLPCQ4OVnZ2trKzs5ugo+ahofencBUd8l0zAAA0Ir/51BIAAEB9EWQAAIBlEWQAAIBlEWQAAIBlEWQAAIBl+cWdfYGWihsYAkDDEGQAk3ADQwBoOIIMYBJuYAgADUeQAUzGDQwBwHsEGcBLDZ2bwtwWAGg4ggxQTxXl30myaeLEiT6px9wWAPAeQQaopzMnj0ky1O/WR9QpLsHrOsxtAYCGI8gAXmrbOYa5LQBgMoIMgGbPF/frkaSOHTsqJibGBx0B8BWCDIBmzZf36wkJaaO9ewsJM4AfIcgAaNZ8db8eV9EhbVs+R6WlpQQZwI8QZAC0CA29Xw8A/8SXRgIAAMviiowf8NVERG6wBgBoaQgyJvP1FwdK3GANANByEGRM5quJiBI3WAMAtDwEGT/hi4mI3GANANDSMNkXAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFkEGAABYFnf2BVCNL758tGPHjoqJi
fFBNwBwYaYGmQ8//FDPPfecCgoKVFRUpLVr12rcuHGe/YZhaPbs2XrllVdUVlamYcOGafHixerRo4d5TQPNVEX5d5JsmjhxYoNrhYS00d69hYQZAI3O1CBz4sQJ9e3bV5MmTdL48ePP2//ss8/qhRde0B/+8AfFxcVp5syZSklJ0Z49exQcHGxCx0DzdebkMUmG+t36iDrFJXhdx1V0SNuWz1FpaSlBBkCjMzXIjBo1SqNGjapxn2EYWrhwoR577DGNHTtWkrRy5UqFh4dr3bp1mjBhQlO2CrQYbTvHNPgLTAGgqfjtHJmDBw+quLhYycnJnm0Oh0ODBw9Wfn5+rUHG7XbL7XZ71l0uV6P3CqBxOJ1OlZaWNqiGL+b8APBffhtkiouLJUnh4eHVtoeHh3v21SQrK0tz5sxp1N4AND6n06mEhERVVJz0Sb0z7tM+qQPAv/htkPFWZmamMjIyPOsul0vR0dEmdgTAG6WlpaqoOKnBk2YrNDLW6zpFu/K1+82lOnv2rO+aA+A3/DbIRERESJJKSkoUGRnp2V5SUqJ+/frV+ji73S673d7Y7QFoIqGRsQ2as+MqOuS7ZgD4Hb+9IV5cXJwiIiKUm5vr2eZyubRt2zYlJSWZ2BkAAPAXpl6ROX78uA4cOOBZP3jwoHbu3KmwsDDFxMRo2rRpmjdvnnr06OH5+HVUVFS1e80AAICWy9Qgs2PHDl199dWe9XNzW9LS0rRixQo9/PDDOnHihO655x6VlZVp+PDhWr9+PfeQAQAAkkwOMiNGjJBhGLXut9lsmjt3rubOnduEXQEAAKvw2zkyAAAAP4cgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALIsgAwAALKu12Q0AgJUUFhaa+ngA1RFkAKAOKsq/k2TTxIkTfVLvjPu0T+oALR1BBgDq4MzJY5IM9bv1EXWKS/C6TtGufO1+c6nOnj3ru+aAFowgAwD10LZzjMJi4r1+vKvokO+aAUCQaQin06nS0tIG1eD9cgAAvEeQ8ZLT6VRCQqIqKk76pB7vlwMAUH8EGS+VlpaqouKkBk+ardDIWK/r8H45AADeI8g0UGhkLO+XAwBgEoIMAAAtkC/meUpSx44dFRMT44OOvEOQAQCghfHlPM+QkDbau7fQtDBjiSCTnZ2t5557TsXFxerbt68WLVqkQYMGmd0WAACW5Kt5nq6iQ9q2fI5KS0sJMrX585//rIyMDC1ZskSDBw/WwoULlZKSon379qlz585mtwcAgGU1dJ6nP/D7L42cP3++7r77bt11113q1auXlixZojZt2mj58uVmtwYAAEzm11dkTp8+rYKCAmVmZnq2BQQEKDk5Wfn5+TU+xu12y+12e9bLy8slSS6Xy6e9HT9+XJL0/Vf7dNZd4XUdV9FXkqTyb/YrsLWtQT35qhZ1qNOgOsVOSVJBQYHn98Qb+/btk+Q/v2N+V8dH4yz9+3W1qqqqQTWoY606Pvv9+v/n4fHjx33+d/ZcPcMwLnyg4ce++eYbQ5KxZcuWatsfeughY9CgQTU+Zvbs2YYkFhYWFhYWlmawHD58+IJZwa+vyHgjMzNTGRkZnvWqqip9//336tChg2y2hl3xqCuXy6Xo6GgdPnxYoaGhTfIzrYBxqR1jUzPGpXaMTc0Yl9pZbWwMw9CxY8cUFRV1weP8Osh07NhRrVq1UklJSbXtJSUlioiIqPExdrtddru92rb27ds3VosXFBoaaomTpakxLrVjbGrGuNSOsakZ41I7K42Nw+H42WP8erJvUFCQBgwYoNzcXM+2qqoq5ebmKikpycTOAACAP/DrKzKSlJGRobS0NF155ZUaNGiQFi5cqBMnTuiuu+4yuzUAAGAyvw8yN998s
7799lvNmjVLxcXF6tevn9avX6/w8HCzW6uV3W7X7Nmzz3uLq6VjXGrH2NSMcakdY1MzxqV2zXVsbIbxc59rAgAA8E9+PUcGAADgQggyAADAsggyAADAsggyAADAsggyXsrOzlZsbKyCg4M1ePBgffLJJ7Ueu2LFCtlstmpLcHBwE3bbND788EONGTNGUVFRstlsWrdu3c8+ZtOmTbriiitkt9t16aWXasWKFY3eZ1Or77hs2rTpvPPFZrOpuLi4aRpuIllZWRo4cKDatWunzp07a9y4cZ7vf7mQNWvWKCEhQcHBwbr88sv197//vQm6bVrejE1LeJ1ZvHix+vTp47mhW1JSkt55550LPqYlnC9S/cemOZ0vBBkv/PnPf1ZGRoZmz56tTz/9VH379lVKSoqOHj1a62NCQ0NVVFTkWb766qsm7LhpnDhxQn379lV2dnadjj948KBGjx6tq6++Wjt37tS0adP0m9/8Rhs2bGjkTptWfcflnH379lU7Zzp37txIHZojLy9P6enp2rp1q9577z2dOXNG1113nU6cOFHrY7Zs2aJbbrlFkydP1meffaZx48Zp3Lhx2r17dxN23vi8GRup+b/OdOnSRU8//bQKCgq0Y8cOjRw5UmPHjtUXX3xR4/Et5XyR6j82UjM6X3zz9Y4ty6BBg4z09HTPemVlpREVFWVkZWXVeHxOTo7hcDiaqDv/IMlYu3btBY95+OGHjd69e1fbdvPNNxspKSmN2Jm56jIuH3zwgSHJ+OGHH5qkJ39x9OhRQ5KRl5dX6zE33XSTMXr06GrbBg8ebNx7772N3Z6p6jI2LfF1xjAM4+KLLzb+53/+p8Z9LfV8OedCY9OczheuyNTT6dOnVVBQoOTkZM+2gIAAJScnKz8/v9bHHT9+XF27dlV0dPTPpuSWIj8/v9o4SlJKSsoFx7El6devnyIjI3Xttddq8+bNZrfT6MrLyyVJYWFhtR7TUs+ZuoyN1LJeZyorK7V69WqdOHGi1q+saannS13GRmo+5wtBpp5KS0tVWVl53p2Fw8PDa53DEB8fr+XLl+uvf/2rXn31VVVVVWno0KH6+uuvm6Jlv1VcXFzjOLpcLlVUVJjUlfkiIyO1ZMkSvf7663r99dcVHR2tESNG6NNPPzW7tUZTVVWladOmadiwYbrssstqPa62c6a5zR/6sbqOTUt5ndm1a5fatm0ru92u3/72t1q7dq169epV47Et7Xypz9g0p/PF77+ioDlISkqqloqHDh2qxMREvfzyy3riiSdM7Az+KD4+XvHx8Z71oUOH6ssvv9SCBQv0xz/+0cTOGk96erp2796tjz/+2OxW/E5dx6alvM7Ex8dr586dKi8v11/+8helpaUpLy+v1j/YLUl9xqY5nS8EmXrq2LGjWrVqpZKSkmrbS0pKFBERUacagYGB6t+/vw4cONAYLVpGREREjeMYGhqqkJAQk7ryT4MGDWq2f+SnTJmiv/3tb/rwww/VpUuXCx5b2zlT1989q6nP2PxUc32dCQoK0qWXXipJGjBggLZv367f//73evnll887tqWdL/UZm5+y8vnCW0v1FBQUpAEDBig3N9ezraqqSrm5uRd8L/LHKisrtWvXLkVGRjZWm5aQlJRUbRwl6b333qvzOLYkO3fubHbni2EYmjJlitauXav3339fcXFxP/uYlnLOeDM2P9VSXmeqqqrkdrtr3NdSzpfaXGhsfsrS54vZs42taPXq1YbdbjdWrFhh7Nmzx7jnnnuM9u3bG8XFxYZhGMbtt99uzJgxw3P8nDlzjA0bNhhffvmlUVBQYEyYMMEIDg42vvjiC7OeQqM4duyY8dlnnxmfffaZIcmYP3++8dlnnxlfffWVYRiGMWPGDOP222/3HP+vf/3LaNOmjfHQQw8ZhYWFRnZ2ttGqVStj/fr1Zj2FRlHfcVmwYIGxbt06Y//+/cauXbuM+++/3wgICDA2btxo1lNoFPfdd5/hcDiMTZs2GUVFR
Z7l5MmTnmN++ru0efNmo3Xr1sZ///d/G4WFhcbs2bONwMBAY9euXWY8hUbjzdi0hNeZGTNmGHl5ecbBgweNzz//3JgxY4Zhs9mMd9991zCMlnu+GEb9x6Y5nS8EGS8tWrTIiImJMYKCgoxBgwYZW7du9ey76qqrjLS0NM/6tGnTPMeGh4cbv/rVr4xPP/3UhK4b17mPDf90OTcWaWlpxlVXXXXeY/r162cEBQUZ3bp1M3Jycpq878ZW33F55plnjO7duxvBwcFGWFiYMWLECOP99983p/lGVNOYSKp2Dvz0d8kwDOO1114zevbsaQQFBRm9e/c23n777aZtvAl4MzYt4XVm0qRJRteuXY2goCCjU6dOxjXXXOP5Q20YLfd8MYz6j01zOl9shmEYTXf9BwAAwHeYIwMAACyLIAMAACyLIAMAACyLIAMAACyLIAMAACyLIAMAACyLIAMAACyLIAMAAOrtww8/1JgxYxQVFSWbzaZ169bVu8aGDRs0ZMgQtWvXTp06dVJqaqoOHTpUrxoEGQB+KzY2VgsXLjS7DQA1OHHihPr27avs7GyvHn/w4EGNHTtWI0eO1M6dO7VhwwaVlpZq/Pjx9apDkAHQKMaMGaPrr7++xn0fffSRbDabPv/88ybuCoCvjBo1SvPmzdN//Md/1Ljf7XbrwQcf1CWXXKKLLrpIgwcP1qZNmzz7CwoKVFlZqXnz5ql79+664oor9OCDD2rnzp06c+ZMnfsgyABoFJMnT9Z7772nr7/++rx9OTk5uvLKK9WnTx8TOgPQFKZMmaL8/HytXr1an3/+uW688UZdf/312r9/vyRpwIABCggIUE5OjiorK1VeXq4//vGPSk5OVmBgYJ1/DkEGQKO44YYb1KlTJ61YsaLa9uPHj2vNmjWaPHmyXn/9dfXu3Vt2u12xsbF6/vnna6136NAh2Ww27dy507OtrKxMNpvN87+8TZs2yWazacOGDerfv79CQkI0cuRIHT16VO+8844SExMVGhqqW2+9VSdPnvTUqaqqUlZWluLi4hQSEqK+ffvqL3/5iy+HA2hRnE6ncnJytGbNGv3iF79Q9+7d9eCDD2r48OHKycmRJMXFxendd9/Vo48+Krvdrvbt2+vrr7/Wa6+9Vq+fRZAB0Chat26tO+64QytWrNCPv5t2zZo1qqysVGJiom666SZNmDBBu3bt0uOPP66ZM2eeF3y88fjjj+vFF1/Uli1bdPjwYd10001auHChVq1apbffflvvvvuuFi1a5Dk+KytLK1eu1JIlS/TFF19o+vTpmjhxovLy8hrcC9AS7dq1S5WVlerZs6fatm3rWfLy8vTll19KkoqLi3X33XcrLS1N27dvV15enoKCgvTrX/9a9fk+69aN9SQAYNKkSXruueeUl5enESNGSPr320qpqalaunSprrnmGs2cOVOS1LNnT+3Zs0fPPfec7rzzzgb93Hnz5mnYsGGS/v0WV2Zmpr788kt169ZNkvTrX/9aH3zwgR555BG53W499dRT2rhxo5KSkiRJ3bp108cff6yXX35ZV111VYN6AVqi48ePq1WrViooKFCrVq2q7Wvbtq0kKTs7Ww6HQ88++6xn36uvvqro6Ght27ZNQ4YMqdPP4ooMgEaTkJCgoUOHavny5ZKkAwcO6KOPPtLkyZNVWFjoCRvnDBs2TPv371dlZWWDfu6P596Eh4erTZs2nhBzbtvRo0c9PZ08eVLXXntttf85rly50vM/RwD1079/f1VWVuro0aO69NJLqy0RERGSpJMnTyogoHoMORd6qqqq6vyzuCIDoFFNnjxZU6dOVXZ2tnJyctS9e3evrnKce8H78SXn2j7Z8OOJgjab7byJgzabzfNCefz4cUnS22+/rUsuuaTacXa7vd59Ai3F8ePHdeDAAc/6wYMHtXPnToWFhalnz5667bbbdMcdd+j5559X//799e233yo3N1d9+vTR6NGjNXr0aC1YsEBz587VLbfcomPHj
unRRx9V165d1b9//zr3wRUZAI3qpptuUkBAgFatWqWVK1dq0qRJstlsSkxM1ObNm6sdu3nzZvXs2fO8S9GS1KlTJ0lSUVGRZ9uPJ/56q1evXrLb7XI6nef9zzE6OrrB9YHmaseOHerfv78ndGRkZKh///6aNWuWpH+/jXzHHXfogQceUHx8vMaNG6ft27crJiZGkjRy5EitWrVK69atU//+/XX99dfLbrdr/fr1CgkJqXMfXJEB0Kjatm2rm2++WZmZmXK5XJ75Lw888IAGDhyoJ554QjfffLPy8/P14osv6qWXXqqxTkhIiIYMGaKnn35acXFxOnr0qB577LEG99euXTs9+OCDmj59uqqqqjR8+HCVl5dr8+bNCg0NVVpaWoN/BtAcjRgx4oKTcgMDAzVnzhzNmTOn1mMmTJigCRMmNKgPrsgAaHSTJ0/WDz/8oJSUFEVFRUmSrrjiCr322mtavXq1LrvsMs2aNUtz58694ETf5cuX6+zZsxowYICmTZumefPm+aS/J554QjNnzlRWVpYSExN1/fXX6+2331ZcXJxP6gNoPDajPp9xAgAA8CNckQEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJZFkAEAAJb1/wAv87QoMqNXVgAAAABJRU5ErkJggg==\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "* In a large portion of the time considered, 80 to 175 million shares of the stock were traded, with occasional days where the volume rose to more than 200 million." + ], + "metadata": { + "id": "gNzyLLIgfnz6" + }, + "id": "gNzyLLIgfnz6" + }, + { + "cell_type": "markdown", + "id": "9GVt_AAbe29X", + "metadata": { + "id": "9GVt_AAbe29X" + }, + "source": [ + "#### **Histogram and statistical summary on News Length**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0kwZSJvwOUpa", + "metadata": { + "id": "0kwZSJvwOUpa", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "9c9e9158-a8e5-4f51-9321-d54fb2ed6675" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 418.000000\n", + "mean 525.662679\n", + "std 303.584080\n", + "min 44.000000\n", + "25% 304.250000\n", + "50% 480.000000\n", + "75% 700.500000\n", + "max 2142.000000\n", + "Name: news_len, dtype: float64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ], + "source": [ + "#Calculating the total number of words present in the news content.\n", + "stock['news_len'] = stock['News'].apply(lambda x: len(x.split(' ')))\n", + "stock['news_len'].describe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "NWn03B4Xey5d", + "metadata": { + "id": "NWn03B4Xey5d", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "abfb1de2-8239-4a5d-d9e3-71b30ad328ef" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjIAAAGxCAYAAAB4AFyyAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAALFdJREFUeJzt3X1U1HXe//HXILeGAyJyYzGKLQFaat4Rm90ZiV7V0UuvzUp2rbW6tgtt1d3qcLYyPXtlWWteFem2V2qdK7O8zun2KltDwSykoqwsYLXFxZTBRYMB5U75/v5ond9O3qQ0w3c+8HycM+c43++XD+/xm/hs5uuMw7IsSwAAAAYKsXsAAACAriJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABgr1O4BAq2zs1P79+9Xv3795HA47B4HAACcAcuy1NTUpEGDBikk5NTPu/T4kNm/f79SUlLsHgMAAHTB3r17dd55551yf48PmX79+kn67jfC6XTaPA0AADgTHo9HKSkp3r/HT6XHh8zxl5OcTichAwCAYX7oshAu9gUAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLFC7R4A3a+mpkb19fV+Xzc+Pl4ul8vv6wIAcCqETC9TU1OjjIxMtbQc8fvaUVF9VVlZQcwAALoNIdPL1NfXq6XliLJ+uUjO5CF+W9dTu0dlqxervr6ekAEAdBtCppdyJg9RnCvd7jEAAPhRuNgXAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLFsD5l9+/YpLy9PAwYMUFRUlC666CJ9/PHH3v2WZemBBx5QcnKyoqKilJOTo127dtk4MQAACBa2hsy3336rSy+9VGFhYXr77bf11Vdf6Q9/+IP69+/vPWbZsmV64okntGrVKpWVlemcc85Rbm6uWltbbZwcAAAEg1A7v/kjjzyilJQUrVmzxrstNTXV+2vLsrRixQrdd999mjp1qiTp+eefV2Jiol599VXdeOON3T4zAAAIHrY+I/P6669r7Nix+tnPfqaEhARdfPHF+tOf/uTdX11dLbfbrZycHO+2mJgYZWVlqbS01I6RAQBAELE1ZP76179q5cqVSktL0zvvvKM777xTd911l5577jlJktvtliQlJib6fF1iYqJ33/e1tbXJ4/H43AAAQM9k60tLnZ2dGjt2rB566CFJ0sUXX6ydO3dq1apVmj17dpfWXLp0qRYvXuzPMQEAQJCy9RmZ5ORkDRs2zGdbZmamampqJElJSUmSpLq6Op9j6urqvPu+r6CgQI2Njd7b3r17AzA5AAAIBraGzKWXXqqqqiqfbX/5y180ePBgSd9d+JuUlKSioiLvfo/Ho7KyMmVnZ590zYiICDmdTp8bAADomWx9aWnBggX66U9/qoceekg33HCDPvzwQz3zzDN65plnJEkOh0Pz58/X73//e6WlpSk1NVX333+/Bg0apGnTptk5OgAACAK2hsy4ceP0yiuvqKCgQEuWLFFqaqpWrFihWbNmeY+55557dPjwYd1xxx1qaGjQhAkTtHHjRkVGRto4OQAACAa2howkXXfddbruuutOud/hcGjJkiVasmRJN04FAABMYPtHFAAAAHQVIQMAAIxFyAAAAGPZfo0MepaKioqArBsfHy+XyxWQtQEA5iJk4BctjQclOZSXlxeQ9aOi+qqysoKYAQD4IGTgFx1HmiRZGnXzvRqYmuHXtT21e1S2erHq6+sJGQCAD0IGfhWd4FKcK93uMQAAvQQX+wIAAGMRMgAAwFiEDAAAMBYhA
wAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGPx6dcwRkVFhd/XjI+Pl8vl8vu6AIDuQcgg6LU0HpTkUF5ent/Xjorqq8rKCmIGAAxFyCDodRxpkmRp1M33amBqht/W9dTuUdnqxaqvrydkAMBQhAyMEZ3gUpwr3e4xAABBhIt9AQCAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABjL1pB58MEH5XA4fG4ZGRne/a2trcrPz9eAAQMUHR2tGTNmqK6uzsaJAQBAMLH9GZnhw4ertrbWe9u2bZt334IFC/TGG29ow4YNKikp0f79+zV9+nQbpwUAAMEk1PYBQkOVlJR0wvbGxkY9++yzWrdunSZOnChJWrNmjTIzM7V9+3Zdcskl3T0qAAAIMrY/I7Nr1y4NGjRIQ4cO1axZs1RTUyNJKi8vV0dHh3JycrzHZmRkyOVyqbS01K5xAQBAELH1GZmsrCytXbtW6enpqq2t1eLFi3XZZZdp586dcrvdCg8PV2xsrM/XJCYmyu12n3LNtrY2tbW1ee97PJ5AjQ8AAGxma8hMmTLF++sRI0YoKytLgwcP1ssvv6yoqKgurbl06VItXrzYXyMCAIAgZvtLS/8sNjZWF1xwgXbv3q2kpCS1t7eroaHB55i6urqTXlNzXEFBgRobG723vXv3BnhqAABgl6AKmebmZn399ddKTk7WmDFjFBYWpqKiIu/+qqoq1dTUKDs7+5RrREREyOl0+twAAEDPZOtLS7/97W91/fXXa/Dgwdq/f78WLVqkPn366KabblJMTIzmzJmjhQsXKi4uTk6nU/PmzVN2djb/YgkAAEiyOWS++eYb3XTTTTp48KAGDhyoCRMmaPv27Ro4cKAk6fHHH1dISIhmzJihtrY25ebm6umnn7ZzZAAAEERsDZn169efdn9kZKQKCwtVWFjYTRMBAACTBNU1MgAAAGeDkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgrFC7B8DJ1dTUqL6+3u/rVlRU+H1NAADsQsgEoZqaGmVkZKql5UjAvkdHW3vA1gYAoLsQMkGovr5eLS1HlPXLRXImD/Hr2rVflGrn68/o6NGjfl0XAAA7EDJBzJk8RHGudL+u6and49f1AACwExf7AgAAY/GMDHq9QF0AHR8fL5fLFZC1AQDfIWTQa7U0HpTkUF5eXkDWj4rqq8rKCmIGAAKIkEGv1XGkSZKlUTffq4GpGX5d21O7R2WrF6u+vp6QAYAAImTQ60UnuPx+UTUAoHtwsS8AADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIwVNCHz8MMPy+FwaP78+d5tra2tys/P14ABAxQdHa0ZM2aorq7OviEBAEBQCYqQ+eijj/THP/5RI0aM8Nm+YMECvfHGG9qwYYNKSkq0f/9+TZ8+3aYpAQBAsLE9ZJqbmzVr1iz96U9/Uv/+/b3bGxsb9eyzz2r58uWaOHGixowZozVr1uiDDz7Q9u3bbZwYAAAEC9tDJj8/X9dee61ycnJ8tpeXl6ujo8Nne0ZGhlwul0pLS7t7TAAAEIRC7fzm69ev1yeffKKPPvrohH1ut1vh4eGKjY312Z6YmCi3233KNdva2tTW1ua97/F4/DYvAAAILrY9I7N37179+te/1gsvvKDIyEi/rbt06VLFxMR4bykpKX5bG
wAABBfbQqa8vFwHDhzQ6NGjFRoaqtDQUJWUlOiJJ55QaGioEhMT1d7eroaGBp+vq6urU1JS0inXLSgoUGNjo/e2d+/eAD8SAABgF9teWrr66qv1xRdf+Gy79dZblZGRoXvvvVcpKSkKCwtTUVGRZsyYIUmqqqpSTU2NsrOzT7luRESEIiIiAjo7AAAIDraFTL9+/XThhRf6bDvnnHM0YMAA7/Y5c+Zo4cKFiouLk9Pp1Lx585Sdna1LLrnEjpEBAECQsfVi3x/y+OOPKyQkRDNmzFBbW5tyc3P19NNP2z0WAAAIEkEVMsXFxT73IyMjVVhYqMLCQnsGAgAAQc3295EBAADoKkIGAAAYi5ABAADGImQAAICxCBkAAGAsQgYAABiLkAEAAMbqUsgMHTpUBw8ePGF7Q0ODhg4d+qOHAgAAOBNdCpk9e/bo2LFjJ2xva2vTvn37fvRQAAAAZ+Ks3tn39ddf9/76nXfeUUxMjPf+sWPHVFRUpCFDhvhtOAAAgNM5q5CZNm2aJMnhcGj27Nk++8LCwjRkyBD94Q9/8NtwAAAAp3NWIdPZ2SlJSk1N1UcffaT4+PiADAUAAHAmuvShkdXV1f6eAwAA4Kx1+dOvi4qKVFRUpAMHDnifqTlu9erVP3owAACAH9KlkFm8eLGWLFmisWPHKjk5WQ6Hw99zAQAA/KAuhcyqVau0du1a/fznP/f3PECPUlFR4fc14+Pj5XK5/L4uAJioSyHT3t6un/70p/6eBegxWhoPSnIoLy/P72tHRfVVZWUFMQMA6mLI3HbbbVq3bp3uv/9+f88D9AgdR5okWRp1870amJrht3U9tXtUtnqx6uvrCRkAUBdDprW1Vc8884zeffddjRgxQmFhYT77ly9f7pfhANNFJ7gU50q3ewwA6LG6FDKff/65Ro0aJUnauXOnzz4u/AUAAN2lSyGzZcsWf88BAABw1rr0oZEAAADBoEvPyFx11VWnfQlp8+bNXR4IAADgTHUpZI5fH3NcR0eHduzYoZ07d57wYZIAAACB0qWQefzxx0+6/cEHH1Rzc/OPGggAAOBMdfmzlk4mLy9P48eP12OPPebPZYNWTU2N6uvr/b5uIN4NFgCAnsivIVNaWqrIyEh/Lhm0ampqlJGRqZaWIwH7Hh1t7QFbGwCAnqBLITN9+nSf+5Zlqba2Vh9//HGvebff+vp6tbQcUdYvF8mZPMSva9d+Uaqdrz+jo0eP+nVdAAB6mi6FTExMjM/9kJAQpaena8mSJZo0aZJfBjOFM3mI39+51VO7x6/rAQDQU3UpZNasWePvOQAAAM7aj7pGpry83Hth6vDhw3XxxRf7ZSgAAIAz0aWQOXDggG688UYVFxcrNjZWktTQ0KCrrrpK69ev18CBA/05IwAAwEl16SMK5s2bp6amJn355Zc6dOiQDh06pJ07d8rj8eiuu+7y94wAAAAn1aVnZDZu3Kh3331XmZmZ3m3Dhg1TYWFhr7vYFwAA2KdLz8h0dnYqLCzshO1hYWHq7Oz80UMBAACciS6FzMSJE/XrX/9a+/fv927bt2+fFixYoKuvvtpvwwEAAJxOl0Lmqaeeksfj0ZAhQ3T++efr/PPPV2pqqjwej5588kl/zwgAAHBSXbpGJiUlRZ988oneffddVVZWSpIyMzOVk5Pj1+EAAABO56yekdm8ebOGDRsmj8cjh8Oha665RvPmzdO8efM0btw4DR8+XO+9916gZgUAAPBxViGzYsUK3X777XI6nSfsi4mJ0b//+79r+fLlfhsOAADgdM4qZD777DNNnjz5lPsnTZqk8vLyHz0UAADAmTirkKmrqzvpP7s+LjQ0VH//+99/9FAAAABn4qxC5txzz9XOnTtPuf/zzz9XcnLyjx4KAADgTJxVyPzLv/yL7r//frW2tp6wr6WlRYsWLdJ1113nt+EAAABO56xC5r777tOhQ4d0wQUXaNmyZXrttdf02muv6ZFHHlF6eroOHTqk3/3ud2e83
sqVKzVixAg5nU45nU5lZ2fr7bff9u5vbW1Vfn6+BgwYoOjoaM2YMUN1dXVnMzIAAOjBzup9ZBITE/XBBx/ozjvvVEFBgSzLkiQ5HA7l5uaqsLBQiYmJZ7zeeeedp4cfflhpaWmyLEvPPfecpk6dqk8//VTDhw/XggUL9H//93/asGGDYmJiNHfuXE2fPl3vv//+2T1KAADQI531G+INHjxYb731lr799lvt3r1blmUpLS1N/fv3P+tvfv311/vc/8///E+tXLlS27dv13nnnadnn31W69at08SJEyVJa9asUWZmprZv365LLrnkrL8fAADoWbr0zr6S1L9/f40bN85vgxw7dkwbNmzQ4cOHlZ2drfLycnV0dPi8W3BGRoZcLpdKS0tPGTJtbW1qa2vz3vd4PH6bEQAABJcufdaSP33xxReKjo5WRESEfvWrX+mVV17RsGHD5Ha7FR4ertjYWJ/jExMT5Xa7T7ne0qVLFRMT472lpKQE+BEAAAC72B4y6enp2rFjh8rKynTnnXdq9uzZ+uqrr7q8XkFBgRobG723vXv3+nFaAAAQTLr80pK/hIeH6yc/+YkkacyYMfroo4/0X//1X5o5c6ba29vV0NDg86xMXV2dkpKSTrleRESEIiIiAj02AAAIArY/I/N9nZ2damtr05gxYxQWFqaioiLvvqqqKtXU1Cg7O9vGCQEAQLCw9RmZgoICTZkyRS6XS01NTVq3bp2Ki4v1zjvvKCYmRnPmzNHChQsVFxcnp9OpefPmKTs7m3+xBAAAJNkcMgcOHNAvfvEL1dbWKiYmRiNGjNA777yja665RpL0+OOPKyQkRDNmzFBbW5tyc3P19NNP2zkyAAAIIraGzLPPPnva/ZGRkSosLFRhYWE3TQQAAEwSdNfIAAAAnClCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICxbP/QSABnr6KiIiDrxsfHy+VyBWRtAAgEQgYwSEvjQUkO5eXlBWT9qKi+qqysIGYAGIOQAQzScaRJkqVRN9+rgakZfl3bU7tHZasXq76+npABYAxCBjBQdIJLca50u8cAANtxsS8AADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAY9kaMkuXLtW4cePUr18/JSQkaNq0aaqqqvI5prW1Vfn5+RowYICio6M1Y8YM1dXV2TQxAAAIJraGTElJifLz87V9+3Zt2rRJHR0dmjRpkg4fPuw9ZsGCBXrjjTe0YcMGlZSUaP/+/Zo+fbqNUwMAgGARauc337hxo8/9tWvXKiEhQeXl5br88svV2NioZ599VuvWrdPEiRMlSWvWrFFmZqa2b9+uSy65xI6xAQBAkAiqa2QaGxslSXFxcZKk8vJydXR0KCcnx3tMRkaGXC6XSktLT7pGW1ubPB6Pzw0AAPRMQRMynZ2dmj9/vi699FJdeOGFkiS3263w8HDFxsb6HJuYmCi3233SdZYuXaqYmBjvLSUlJdCjAwAAmwRNyOTn52vnzp1av379j1qnoKBAjY2N3tvevXv9NCEAAAg2tl4jc9zcuXP15ptvauvWrTrvvPO825OSktTe3q6GhgafZ2Xq6uqUlJR00rUiIiIUERER6JEBAEAQsPUZGcuyNHfuXL3yyivavHmzUlNTffaPGTNGYWFhKioq8m6rqqpSTU2NsrOzu3tcAAAQZGx9RiY/P1/r1q3Ta6+9pn79+nmve4mJiVFUVJRiYmI0Z84cLVy4UHFxcXI6nZo3b56ys7P5F0sAAMDekFm5cqUk6corr/TZvmbNG
t1yyy2SpMcff1whISGaMWOG2tralJubq6effrqbJwUAAMHI1pCxLOsHj4mMjFRhYaEKCwu7YSIAAGCSoPlXSwAAAGeLkAEAAMYiZAAAgLGC4n1kAASPiooKv68ZHx8vl8vl93UBgJABIElqaTwoyaG8vDy/rx0V1VeVlRXEDAC/I2QASJI6jjRJsjTq5ns1MDXDb+t6aveobPVi1dfXEzIA/I6QAeAjOsGlOFe63WMAwBnhYl8AAGAsQgYAABiLkAEAAMYiZAAAgLEIGQAAYCxCBgAAGIuQAQAAxiJkAACAsQgZAABgLEIGAAAYi5ABAADGImQAAICx+NBIAN2ioqIiIOvGx8fzqdpAL0bIAAiolsaDkhzKy8sLyPpRUX1VWVlBzAC9FCEDIKA6jjRJsjTq5ns1MDXDr2t7aveobPVi1dfXEzJAL0XIAOgW0QkuxbnS7R4DQA/Dxb4AAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAY9kaMlu3btX111+vQYMGyeFw6NVXX/XZb1mWHnjgASUnJysqKko5OTnatWuXPcMCAICgY2vIHD58WCNHjlRhYeFJ9y9btkxPPPGEVq1apbKyMp1zzjnKzc1Va2trN08KAACCUaid33zKlCmaMmXKSfdZlqUVK1bovvvu09SpUyVJzz//vBITE/Xqq6/qxhtv7M5RAQBAELI1ZE6nurpabrdbOTk53m0xMTHKyspSaWnpKUOmra1NbW1t3vsejyfgswKwV0VFhd/XjI+Pl8vl8vu6APwraEPG7XZLkhITE322JyYmevedzNKlS7V48eKAzgYgOLQ0HpTkUF5ent/Xjorqq8rKCmIGCHJBGzJdVVBQoIULF3rvezwepaSk2DgRgEDpONIkydKom+/VwNQMv63rqd2jstWLVV9fT8gAQS5oQyYpKUmSVFdXp+TkZO/2uro6jRo16pRfFxERoYiIiECPByCIRCe4FOdKt3sMADYI2veRSU1NVVJSkoqKirzbPB6PysrKlJ2dbeNkAAAgWNj6jExzc7N2797tvV9dXa0dO3YoLi5OLpdL8+fP1+9//3ulpaUpNTVV999/vwYNGqRp06bZNzQAAAgatobMxx9/rKuuusp7//i1LbNnz9batWt1zz336PDhw7rjjjvU0NCgCRMmaOPGjYqMjLRrZAAAEERsDZkrr7xSlmWdcr/D4dCSJUu0ZMmSbpwKAACYImivkQEAAPghhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAY4XaPQAABKuKioqArNvW1qaIiIiArB0fHy+XyxWQtYFgRMgAwPe0NB6U5FBeXl5gvoHDIVlWQJaOiuqrysoKYga9BiEDAN/TcaRJkqVRN9+rgakZfl279otS7Xz9mYCs7ando7LVi1VfX0/IoNcgZADgFKITXIpzpft1TU/tnoCtDfRGXOwLAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhAwAAjEXIAAAAYxEyAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIwVavcAAAAg8GpqalRfX+/3dePj4+Vyufy+7pkiZAAA6OFqamqUkZGplpYjfl87KqqvKisrbIsZI0KmsLBQjz76qNxut0aOHKknn3xS48ePt3ssAACMUF9fr5aWI8r65SI5k4f4bV1P7R6VrV6s+vp6QuZUXnrpJS1cuFCrVq1SVlaWVqxYodzcXFVVVSkhIcHu8QAAMIYze
YjiXOl2j+FXQX+x7/Lly3X77bfr1ltv1bBhw7Rq1Sr17dtXq1evtns0AABgs6AOmfb2dpWXlysnJ8e7LSQkRDk5OSotLbVxMgAAEAyC+qWl+vp6HTt2TImJiT7bExMTVVlZedKvaWtrU1tbm/d+Y2OjJMnj8fh1tubmZknSob9V6Whbi1/X9tT+TZLUuG+XwkIdRqzNzN2zNjN3z9omzixJHneNJKm8vNz7M8pfQkJC1NnZ6dc1A702M/9/VVVVkvz/d9bx/+aam5v9/vfs8fUsyzr9gVYQ27dvnyXJ+uCDD3y233333db48eNP+jWLFi2yJHHjxo0bN27cesBt7969p22FoH5GJj4+Xn369FFdXZ3P9rq6OiUlJZ30awoKCrRw4ULv/c7OTh06dEgDBgyQw3Fm//fj8XiUkpKivXv3yul0dv0BIOA4V2bhfJmDc2WWnni+LMtSU1OTBg0adNrjgjpkwsPDNWbMGBUVFWnatGmSvguToqIizZ0796RfExERoYiICJ9tsbGxXfr+Tqezx/wH0dNxrszC+TIH58osPe18xcTE/OAxQR0ykrRw4ULNnj1bY8eO1fjx47VixQodPnxYt956q92jAQAAmwV9yMycOVN///vf9cADD8jtdmvUqFHauHHjCRcAAwCA3ifoQ0aS5s6de8qXkgIhIiJCixYtOuElKgQfzpVZOF/m4FyZpTefL4dl/dC/awIAAAhOQf2GeAAAAKdDyAAAAGMRMgAAwFiEzPcUFhZqyJAhioyMVFZWlj788EO7R+p1HnzwQTkcDp9bRkaGd39ra6vy8/M1YMAARUdHa8aMGSe8aWJNTY2uvfZa9e3bVwkJCbr77rt19OjR7n4oPdLWrVt1/fXXa9CgQXI4HHr11Vd99luWpQceeEDJycmKiopSTk6Odu3a5XPMoUOHNGvWLDmdTsXGxmrOnDknvKX+559/rssuu0yRkZFKSUnRsmXLAv3QepwfOle33HLLCX/WJk+e7HMM56p7LF26VOPGjVO/fv2UkJCgadOmeT9W4Dh//ewrLi7W6NGjFRERoZ/85Cdau3ZtoB9eQBEy/+Sll17SwoULtWjRIn3yyScaOXKkcnNzdeDAAbtH63WGDx+u2tpa723btm3efQsWLNAbb7yhDRs2qKSkRPv379f06dO9+48dO6Zrr71W7e3t+uCDD/Tcc89p7dq1euCBB+x4KD3O4cOHNXLkSBUWFp50/7Jly/TEE09o1apVKisr0znnnKPc3Fy1trZ6j5k1a5a+/PJLbdq0SW+++aa2bt2qO+64w7vf4/Fo0qRJGjx4sMrLy/Xoo4/qwQcf1DPPPBPwx9eT/NC5kqTJkyf7/Fl78cUXffZzrrpHSUmJ8vPztX37dm3atEkdHR2aNGmSDh8+7D3GHz/7qqurde211+qqq67Sjh07NH/+fN1222165513uvXx+pVfPhSphxg/fryVn5/vvX/s2DFr0KBB1tKlS22cqvdZtGiRNXLkyJPua2hosMLCwqwNGzZ4t1VUVFiSrNLSUsuyLOutt96yQkJCLLfb7T1m5cqVltPptNra2gI6e28jyXrllVe89zs7O62kpCTr0Ucf9W5raGiwIiIirBdffNGyLMv66quvLEnWRx995D3m7bffthwOh7Vv3z7Lsizr6aeftvr37+9zvu69914rPT09wI+o5/r+ubIsy5o9e7Y1derUU34N58o+Bw4csCRZJSUllmX572ffPffcYw0fPtzne82cOdPKzc0N9EMKGJ6R+Yf29naVl5crJyfHuy0kJEQ5OTkqLS21cbLeadeuXRo0aJCGDh2qWbNmqabm/3+qb0dHh895ysjIkMvl8p6n0tJSXXTRRT5vmpibmyuPx6Mvv/yyex9IL1NdXS232+1zfmJiYpSVleVzfmJjYzV27FjvMTk5OQoJCVFZWZn3mMsvv1zh4eHeY3Jzc1VVVaVvv/22mx5N71BcXKyEhASlp6frzjvv1MGDB737O
Ff2aWxslCTFxcVJ8t/PvtLSUp81jh9j8t9zhMw/1NfX69ixYye8Y3BiYqLcbrdNU/VOWVlZWrt2rTZu3KiVK1equrpal112mZqamuR2uxUeHn7C52f983lyu90nPY/H9yFwjv/+nu7PkdvtVkJCgs/+0NBQxcXFcQ672eTJk/X888+rqKhIjzzyiEpKSjRlyhQdO3ZMEufKLp2dnZo/f74uvfRSXXjhhZLkt599pzrG4/GopaUlEA8n4Ix4Z1/0LlOmTPH+esSIEcrKytLgwYP18ssvKyoqysbJgJ7lxhtv9P76oosu0ogRI3T++eeruLhYV199tY2T9W75+fnauXOnz7WBODWekfmH+Ph49enT54QrwOvq6pSUlGTTVJC++/TyCy64QLt371ZSUpLa29vV0NDgc8w/n6ekpKSTnsfj+xA4x39/T/fnKCkp6YQL6I8ePapDhw5xDm02dOhQxcfHa/fu3ZI4V3aYO3eu3nzzTW3ZskXnnXeed7u/fvad6hin02ns/ygSMv8QHh6uMWPGqKioyLuts7NTRUVFys7OtnEyNDc36+uvv1ZycrLGjBmjsLAwn/NUVVWlmpoa73nKzs7WF1984fMDeNOmTXI6nRo2bFi3z9+bpKamKikpyef8eDwelZWV+ZyfhoYGlZeXe4/ZvHmzOjs7lZWV5T1m69at6ujo8B6zadMmpaenq3///t30aHqfb775RgcPHlRycrIkzlV3sixLc+fO1SuvvKLNmzcrNTXVZ7+/fvZlZ2f7rHH8GKP/nrP7auNgsn79eisiIsJau3at9dVXX1l33HGHFRsb63MFOALvN7/5jVVcXGxVV1db77//vpWTk2PFx8dbBw4csCzLsn71q19ZLpfL2rx5s/Xxxx9b2dnZVnZ2tvfrjx49al144YXWpEmTrB07dlgbN260Bg4caBUUFNj1kHqUpqYm69NPP7U+/fRTS5K1fPly69NPP7X+9re/WZZlWQ8//LAVGxtrvfbaa9bnn39uTZ061UpNTbVaWlq8a0yePNm6+OKLrbKyMmvbtm1WWlqaddNNN3n3NzQ0WImJidbPf/5za+fOndb69eutvn37Wn/84x+7/fGa7HTnqqmpyfrtb39rlZaWWtXV1da7775rjR492kpLS7NaW1u9a3Cuusedd95pxcTEWMXFxVZtba33duTIEe8x/vjZ99e//tXq27evdffdd1sVFRVWYWGh1adPH2vjxo3d+nj9iZD5nieffNJyuVxWeHi4NX78eGv79u12j9TrzJw500pOTrbCw8Otc88915o5c6a1e/du7/6WlhbrP/7jP6z+/ftbffv2tf71X//Vqq2t9Vljz5491pQpU6yoqCgrPj7e+s1vfmN1dHR090PpkbZs2WJJOuE2e/Zsy7K++yfY999/v5WYmGhFRERYV199tVVVVeWzxsGDB62bbrrJio6OtpxOp3XrrbdaTU1NPsd89tln1oQJE6yIiAjr3HPPtR5++OHueog9xunO1ZEjR6xJkyZZAwcOtMLCwqzBgwdbt99++wn/48a56h4nO0+SrDVr1niP8dfPvi1btlijRo2ywsPDraFDh/p8DxPx6dcAAMBYXCMDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGMRMgAAwFiEDAAAMBYhA6DXcDgcevXVV+0eA4AfETIAAMBYhAwAADAWIQPgR7vyyit111136Z577lFcXJySkpL04IMPevc3NDTotttu08CBA+V0OjVx4kR99tlnkqTGxkb16dNHH3/8sSSps7NTcXFxuuSSS7xf/z//8z9KSUmRJLW3t2vu3LlKTk5WZGSkBg8erKVLl3Zp7r179+qGG25QbGys4uLiNHXqVO3Zs8e7/5ZbbtG0adP02GOPKTk5WQMGDFB+fr46Ojq69P0A+B8hA8AvnnvuOZ1zzjkqKyvTsmXLtGTJEm3atEmS9LOf/UwHDhzQ22+/rfLyco0ePVpXX321Dh06p
JiYGI0aNUrFxcWSpC+++EIOh0OffvqpmpubJUklJSW64oorJElPPPGEXn/9db388suqqqrSCy+8oCFDhpz1vB0dHcrNzVW/fv303nvv6f3331d0dLQmT56s9vZ273FbtmzR119/rS1btui5557T2rVrtXbt2h/1ewXAfwgZAH4xYsQILVq0SGlpafrFL36hsWPHqqioSNu2bdOHH36oDRs2aOzYsUpLS9Njjz2m2NhY/e///q+k757ROR4yxcXFuuaaa5SZmalt27Z5tx0PmZqaGqWlpWnChAkaPHiwJkyYoJtuuums533ppZfU2dmp//7v/9ZFF12kzMxMrVmzRjU1Nd5ZJKl///566qmnlJGRoeuuu07XXnutioqKftxvFgC/IWQA+MWIESN87icnJ+vAgQP67LPP1NzcrAEDBig6Otp7q66u1tdffy1JuuKKK7Rt2zYdO3ZMJSUluvLKK71xs3//fu3evVtXXnmlpO9e7tmxY4fS09N111136c9//nOX5v3ss8+0e/du9evXzztTXFycWltbvXNJ0vDhw9WnT58THheA4BBq9wAAeoawsDCf+w6HQ52dnWpublZycrLPsxzHxcbGSpIuv/xyNTU16ZNPPtHWrVv10EMPKSkpSQ8//LBGjhypQYMGKS0tTZI0evRoVVdX6+2339a7776rG264QTk5Od5nd85Uc3OzxowZoxdeeOGEfQMHDvzBxwUgOBAyAAJq9OjRcrvdCg0NPeW1LLGxsRoxYoSeeuophYWFKSMjQwkJCZo5c6befPNN78tKxzmdTs2cOVMzZ87Uv/3bv2ny5Mk6dOiQ4uLizmqul156SQkJCXI6nT/mIQKwES8tAQionJwcZWdna9q0afrzn/+sPXv26IMPPtDvfvc7779Ukr67TuaFF17wRktcXJwyMzP10ksv+YTM8uXL9eKLL6qyslJ/+ctftGHDBiUlJXmf3TlTs2bNUnx8vKZOnar33ntP1dXVKi4u1l133aVvvvnGL48dQOARMgACyuFw6K233tLll1+uW2+9VRdccIFuvPFG/e1vf1NiYqL3uCuuuELHjh3zXgsjfRc339/Wr18/LVu2TGPHjtW4ceO0Z88evfXWWwoJObsfZ3379tXWrVvlcrk0ffp0ZWZmas6cOWptbeUZGsAgDsuyLLuHAAAA6AqekQEAAMYiZAD0CC+88ILPP+/+59vw4cPtHg9AgPDSEoAeoampSXV1dSfdFxYWpsGDB3fzRAC6AyEDAACMxUtLAADAWIQMAAAwFiEDAACMRcgAAABjETIAAMBYhAwAADAWIQMAAIxFyAAAAGP9P3Zdf69mWePrAAAAAElFTkSuQmCC\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.histplot(data=stock,x='news_len');" + ] + }, + { + "cell_type": "markdown", + "id": "VWLWG2X8mrCw", + "metadata": { + "id": "VWLWG2X8mrCw" + }, + "source": [ + "**Observations:**\n", + "* Most of the news have between 50 - 1000 words, with an average of 525 words\n", + " * The shortest news has 44 words\n", + "\n", + "* This indicates that these are likely to be news summaries rather than the actual news content itself." 
+ ] + }, + { + "cell_type": "markdown", + "id": "hLE0s7OFKilB", + "metadata": { + "id": "hLE0s7OFKilB" + }, + "source": [ + "### **Bivariate Analysis**" + ] + }, + { + "cell_type": "markdown", + "id": "Yn_9wfzxL-r1", + "metadata": { + "id": "Yn_9wfzxL-r1" + }, + "source": [ + "#### **Correlation**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "gOBaxNZeKllB", + "metadata": { + "id": "gOBaxNZeKllB", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c46aaa76-2682-4469-c36e-44097287cc6e" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "cols = ['Open','High','Low','Close','Volume','news_len']\n", + "sns.heatmap(\n", + " stock[cols].corr(), annot=True, vmin=-1, vmax=1, fmt=\".2f\", cmap=\"Spectral\"\n", + ")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "15UHbBu8Cucj", + "metadata": { + "id": "15UHbBu8Cucj" + }, + "source": [ + "**Observations:**\n", + "* The prices are all perfectly correlated.\n", + " * This might be due to the minimum variation between the different prices.\n", + "\n", + "* There is a negative correlation, albeit very low, between volume and prices.\n", + " * This might be due to selling pressure during periods of negative sentiment." + ] + }, + { + "cell_type": "markdown", + "id": "h-Hz7CpdMAi3", + "metadata": { + "id": "h-Hz7CpdMAi3" + }, + "source": [ + "#### **Label vs Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "code", + "source": [ + "plt.figure(figsize=(10, 8))\n", + "\n", + "for i, variable in enumerate(['Open', 'High', 'Low', 'Close']):\n", + " plt.subplot(2, 2, i + 1)\n", + " sns.boxplot(data=stock, x=\"Label\", y=variable)\n", + " plt.tight_layout(pad=2)\n", + "\n", + "plt.show()" + ], + "metadata": { + "id": "lCVHNWhgMElU", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b838000a-b88f-43ac-90ee-499cf11fa5ee" + }, + "id": "lCVHNWhgMElU", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Observations:**\n", + "* The median for all prices is significantly lower for negative sentiment news as compared to both positive and neutral sentiment news, indicating that negative news likely triggers investor sell-offs which drive the stock prices down.\n", + "\n", + "* The boxplot for the open price under neutral sentiment displays a notably higher upper whisker relative to positive sentiment. This suggests that the market's opening often covers a wider range of prices when news is neutral.\n", + " * This variability might be attributed to different interpretations of seemingly neutral news, which leads some investors to react more aggressively and drive the opening price to higher levels." + ], + "metadata": { + "id": "axyzmidFWaNS" + }, + "id": "axyzmidFWaNS" + }, + { + "cell_type": "markdown", + "id": "cY9P2rdBMH-h", + "metadata": { + "id": "cY9P2rdBMH-h" + }, + "source": [ + "#### **Label vs Volume**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "mzCxLFg1LCPk", + "metadata": { + "id": "mzCxLFg1LCPk", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "13536009-bc63-4dff-b422-df723211166c" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "sns.boxplot(\n", + " data=stock, x=\"Label\", y=\"Volume\"\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "LFipGtxhOa8g", + "metadata": { + "id": "LFipGtxhOa8g" + }, + "source": [ + "**Observations:**\n", + "* The median trading volume for the stock is approximately the same across all sentiment polarities.\n", + "* The volume distribution for positive sentiment news shows a wider spread compared to other sentiment categories.\n", + " - This wider range might indicate that even positive news leads to diverse interpretations among investors, contributing to varied trading activities and reactions." + ] + }, + { + "cell_type": "markdown", + "id": "9ySUmJUyQ0vi", + "metadata": { + "id": "9ySUmJUyQ0vi" + }, + "source": [ + "#### **Date vs Price (Open, High, Low, Close)**" + ] + }, + { + "cell_type": "markdown", + "id": "tq0NL64DQ0v1", + "metadata": { + "id": "tq0NL64DQ0v1" + }, + "source": [ + "- The data is at the level of news, and we might have more than one news in a day. However, the prices are at daily level\n", + "- So, we can aggregate the data at a daily level by taking the mean of the attributes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bECqvVQtwheA", + "metadata": { + "id": "bECqvVQtwheA", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "89dd8d7b-2323-418b-9e11-544405d0396b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Open High Low Close Volume\n", + "Date \n", + "2019-01-02 38.72 39.71 38.56 39.48 130672400.0\n", + "2019-01-03 35.99 36.43 35.50 35.55 103544800.0\n", + "2019-01-04 36.13 37.14 35.95 37.06 111448000.0\n", + "2019-01-07 37.17 37.21 36.47 36.98 109012000.0\n", + "2019-01-08 37.39 37.96 37.13 37.69 216071600.0" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
OpenHighLowCloseVolume
Date
2019-01-0238.7239.7138.5639.48130672400.0
2019-01-0335.9936.4335.5035.55103544800.0
2019-01-0436.1337.1435.9537.06111448000.0
2019-01-0737.1737.2136.4736.98109012000.0
2019-01-0837.3937.9637.1337.69216071600.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "stock_daily", + "summary": "{\n \"name\": \"stock_daily\",\n \"rows\": 73,\n \"fields\": [\n {\n \"column\": \"Date\",\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": \"2019-01-02 00:00:00\",\n \"max\": \"2019-04-29 00:00:00\",\n \"num_unique_values\": 73,\n \"samples\": [\n \"2019-01-08 00:00:00\",\n \"2019-04-15 00:00:00\",\n \"2019-01-30 00:00:00\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.522032992144083,\n \"min\": 35.99,\n \"max\": 51.84,\n \"num_unique_values\": 69,\n \"samples\": [\n 43.22,\n 38.71999999999999,\n 48.830000000000005\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"High\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.515838152119043,\n \"min\": 36.43,\n \"max\": 52.12,\n \"num_unique_values\": 68,\n \"samples\": [\n 49.080000000000005,\n 39.08,\n 37.96\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.522815819552585,\n \"min\": 35.5,\n \"max\": 51.76,\n \"num_unique_values\": 66,\n \"samples\": [\n 49.54,\n 50.97,\n 38.56\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.5189565039202,\n \"min\": 35.55,\n \"max\": 51.86999999999999,\n \"num_unique_values\": 68,\n \"samples\": [\n 48.77,\n 39.08,\n 37.68999999999999\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 49858245.96607382,\n \"min\": 45448000.0,\n \"max\": 365248800.0,\n \"num_unique_values\": 73,\n \"samples\": [\n 216071600.0,\n 70146400.0,\n 244439200.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n 
}\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 19 + } + ], + "source": [ + "# Aggregate the news-level 'stock' DataFrame to one row per 'Date'.\n", + "# groupby already leaves 'Date' as the index of the result.\n", + "stock_daily = stock.groupby('Date').agg(\n", + "    {\n", + "        'Open': 'mean',\n", + "        'High': 'mean',\n", + "        'Low': 'mean',\n", + "        'Close': 'mean',\n", + "        'Volume': 'mean',\n", + "    }\n", + ")\n", + "\n", + "stock_daily.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ORSmC3lxrwy", + "metadata": { + "id": "7ORSmC3lxrwy", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "119a6be1-50f4-47be-d5d4-f3cdd077f128" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.figure(figsize=(15,5))\n", + "sns.lineplot(stock_daily.drop('Volume', axis=1));" + ] + }, + { + "cell_type": "markdown", + "id": "5EZ5L0-UQ0v2", + "metadata": { + "id": "5EZ5L0-UQ0v2" + }, + "source": [ + "**Observations:**\n", + "* The stock price has gradually increased over time from ~\\$40 to ~\\$50 in the period for which the data is available." + ] + }, + { + "cell_type": "markdown", + "id": "KG4y9NK1Ng1-", + "metadata": { + "id": "KG4y9NK1Ng1-" + }, + "source": [ + "#### **Volume vs Close Price**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0WMHYw6w0TM6", + "metadata": { + "id": "0WMHYw6w0TM6", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "8517bfee-4d34-40e8-a1cd-567350afe1bb" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "# Create a figure and axis\n", + "fig, ax1 = plt.subplots(figsize=(15,5))\n", + "\n", + "# Lineplot on primary y-axis\n", + "sns.lineplot(data=stock_daily.reset_index(), x='Date', y='Close', ax=ax1, color='blue', marker='o', label='Close Price')\n", + "\n", + "# Create a secondary y-axis\n", + "ax2 = ax1.twinx()\n", + "\n", + "# Lineplot on secondary y-axis\n", + "sns.lineplot(data=stock_daily.reset_index(), x='Date', y='Volume', ax=ax2, color='gray', marker='o', label='Volume')\n", + "\n", + "ax1.legend(bbox_to_anchor=(1,1));" + ] + }, + { + "cell_type": "markdown", + "id": "fHU5KgCGNOX5", + "metadata": { + "id": "fHU5KgCGNOX5" + }, + "source": [ + "**Observations:**\n", + "- There is no specific pattern here\n", + " - There have been periods where the price decreased with increasing volumes\n", + " - There have been periods where the price increased with increasing volumes" + ] + }, + { + "cell_type": "markdown", + "id": "N8z4-vOBmwqv", + "metadata": { + "id": "N8z4-vOBmwqv" + }, + "source": [ + "## **Data Preprocessing**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2jIN9NycxtUC", + "metadata": { + "id": "2jIN9NycxtUC", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "fee181c0-f896-4c4c-cf2c-378747dfb425" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "count 418\n", + "mean 2019-02-14 12:24:06.889952256\n", + "min 2019-01-02 00:00:00\n", + "25% 2019-01-11 00:00:00\n", + "50% 2019-01-31 00:00:00\n", + "75% 2019-03-21 00:00:00\n", + "max 2019-04-29 00:00:00\n", + "Name: Date, dtype: object" + ], + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Date
count418
mean2019-02-14 12:24:06.889952256
min2019-01-02 00:00:00
25%2019-01-11 00:00:00
50%2019-01-31 00:00:00
75%2019-03-21 00:00:00
max2019-04-29 00:00:00
\n", + "

" + ] + }, + "metadata": {}, + "execution_count": 22 + } + ], + "source": [ + "stock['Date'].describe()" + ] + }, + { + "cell_type": "markdown", + "id": "0FxlsnepSb5m", + "metadata": { + "id": "0FxlsnepSb5m" + }, + "source": [ + "**Observations:**\n", + "* We see that 75% of the data is till the third week of March 2019.\n", + "* We'll take the data till the end of March 2019 for training, and keep the April 2019 data for test set." + ] + }, + { + "cell_type": "markdown", + "id": "j7KR_HgZRDtk", + "metadata": { + "id": "j7KR_HgZRDtk" + }, + "source": [ + "### Train-test Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "yXsgpkpeI8UK", + "metadata": { + "id": "yXsgpkpeI8UK" + }, + "outputs": [], + "source": [ + "X_train = stock[stock['Date'] < '2019-04-01'].reset_index()\n", + "X_test = stock[(stock['Date'] >= '2019-04-01')].reset_index()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "__2ON8RuI8Q2", + "metadata": { + "id": "__2ON8RuI8Q2" + }, + "outputs": [], + "source": [ + "y_train = X_train['Label'].copy()\n", + "y_test = X_test['Label'].copy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "imMx6hH0__IB", + "metadata": { + "id": "imMx6hH0__IB", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3dbdaa6f-f304-47f0-94a8-32e8a5a7309d" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Train data shape (347, 10)\n", + "Test data shape (71, 10)\n", + "Train label shape (347,)\n", + "Test label shape (71,)\n" + ] + } + ], + "source": [ + "print(\"Train data shape\",X_train.shape)\n", + "print(\"Test data shape \",X_test.shape)\n", + "\n", + "print(\"Train label shape\",y_train.shape)\n", + "print(\"Test label shape \",y_test.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "uJZqic2Q6YZD", + "metadata": { + "id": "uJZqic2Q6YZD" + }, + "outputs": [], + "source": [ + "# y_train.value_counts(normalize=True)" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Xf9R3BaR6bZw", + "metadata": { + "id": "Xf9R3BaR6bZw" + }, + "outputs": [], + "source": [ + "# y_test.value_counts(normalize=True)" + ] + }, + { + "cell_type": "markdown", + "id": "0rYgR14ORf7b", + "metadata": { + "id": "0rYgR14ORf7b" + }, + "source": [ + "## **Word Embeddings**" + ] + }, + { + "cell_type": "markdown", + "id": "4IUBFAOTbjju", + "metadata": { + "id": "4IUBFAOTbjju" + }, + "source": [ + "### **Generating Text Embeddings using Word2Vec**" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Defining the model**" + ], + "metadata": { + "id": "bzwPsqJvVbNC" + }, + "id": "bzwPsqJvVbNC" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ZD188ZNsboS4", + "metadata": { + "id": "ZD188ZNsboS4" + }, + "outputs": [], + "source": [ + "# Tokenizing each news item on single spaces: one list of words per document\n", + "words_list = [item.split(\" \") for item in stock['News'].values]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eGVgM5iTbwHy", + "metadata": { + "id": "eGVgM5iTbwHy" + }, + "outputs": [], + "source": [ + "# Training a Word2Vec model on the tokenized news (vec_size-dimensional word vectors)\n", + "vec_size = 300\n", + "model_W2V = Word2Vec(words_list, vector_size=vec_size, min_count=1, window=5, workers=6)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "lhy6DjNxbzOd", + "metadata": { + "id": "lhy6DjNxbzOd", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6bf055ce-4c91-4cb3-bd4c-673e2c5de694" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Length of the vocabulary is 14577\n" + ] + } + ], + "source": [ + "# Checking the size of the vocabulary\n", + "print(\"Length of the vocabulary is\", len(model_W2V.wv.key_to_index))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Encoding the datasets**" + ], + "metadata": { + "id": 
"ZYCiT-7GVNaH" + }, + "id": "ZYCiT-7GVNaH" + }, + { + "cell_type": "code", +
"execution_count": null, + "id": "F_4ldXPzcF7y", + "metadata": { + "id": "F_4ldXPzcF7y" + }, + "outputs": [], + "source": [ + "# Retrieving the words present in the Word2Vec model's vocabulary\n", + "words = list(model_W2V.wv.key_to_index.keys())\n", + "\n", + "# Retrieving word vectors for all the words present in the model's vocabulary\n", + "wvs = model_W2V.wv[words].tolist()\n", + "\n", + "# Creating a dictionary of words and their corresponding vectors\n", + "word_vector_dict = dict(zip(words, wvs))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Averaging the word vectors to get sentence encodings**" + ], + "metadata": { + "id": "GgismcJz0dZE" + }, + "id": "GgismcJz0dZE" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "vsQ0vF42cH_r", + "metadata": { + "id": "vsQ0vF42cH_r" + }, + "outputs": [], + "source": [ + "def average_vectorizer_Word2Vec(doc):\n", + " # Initializing a feature vector for the sentence\n", + " feature_vector = np.zeros((vec_size,), dtype=\"float64\")\n", + "\n", + " # Creating a list of words in the sentence that are present in the model vocabulary\n", + " words_in_vocab = [word for word in doc.split() if word in words]\n", + "\n", + " # adding the vector representations of the words\n", + " for word in words_in_vocab:\n", + " feature_vector += np.array(word_vector_dict[word])\n", + "\n", + " # Dividing by the number of words to get the average vector\n", + " if len(words_in_vocab) != 0:\n", + " feature_vector /= len(words_in_vocab)\n", + "\n", + " return feature_vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "Jtxc1yVHcJjV", + "metadata": { + "id": "Jtxc1yVHcJjV", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "a0ab0ac1-7d5a-4cb0-d761-af1e7f93a323" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Time taken 8.816098928451538\n" + ] + } + ], + "source": [ + "# creating a dataframe of the vectorized documents\n", + 
"start = time.time()\n", + "\n", + "X_train_wv = pd.DataFrame(X_train['News'].apply(average_vectorizer_Word2Vec).tolist(), columns=['Feature '+str(i) for i in range(vec_size)])\n", + "X_test_wv = pd.DataFrame(X_test['News'].apply(average_vectorizer_Word2Vec).tolist(), columns=['Feature '+str(i) for i in range(vec_size)])\n", + "\n", + "end = time.time()\n", + "print('Time taken ', (end-start))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8IrY8tZjA4VZ", + "metadata": { + "id": "8IrY8tZjA4VZ", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ded041ed-e998-442e-edc6-4bf5abe65675" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(347, 300) (71, 300)\n" + ] + } + ], + "source": [ + "print(X_train_wv.shape, X_test_wv.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "a3GUvne0hyPx", + "metadata": { + "id": "a3GUvne0hyPx" + }, + "source": [ + "### **Generating Text Embeddings using Sentence Transformer**" + ] + }, + { + "cell_type": "markdown", + "id": "51ITQezWi9VE", + "metadata": { + "id": "51ITQezWi9VE" + }, + "source": [ + "#### **Defining the model**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3EQ7eQIpYSyz", + "metadata": { + "id": "3EQ7eQIpYSyz" + }, + "outputs": [], + "source": [ + "#Defining the model\n", + "model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')" + ] + }, + { + "cell_type": "markdown", + "id": "Lll4MLfzKfBa", + "metadata": { + "id": "Lll4MLfzKfBa" + }, + "source": [ + "#### **Encoding the dataset**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "q1BaGKThKcX3", + "metadata": { + "id": "q1BaGKThKcX3", + "colab": { + "base_uri": "https://localhost:8080/", + "referenced_widgets": [ + "1230a037e0b9479caa9db62c5f9ecb6a", + "ed6c19298c4747a59992a79d99cdaaa7", + "e010222da3cf4751995a51ffc82560ef", + "9f3e3b616bcf482d9fd91a2b54d8d82a", + "6838e428d6d54a3f80d34638812441e6", + 
"991c2589b56f444486443a31bef569d5", + "4ed01d32996f47f38fbaba687cee45ae", + "a0ce999dbcfe427ba08202bc989b1c33", + "f598184dc72f443ab0ada8de6cf076ad", + "96e9e320eec74a2e9094935af065b254", + "fb854fb10f3e415c9c4c0ac176fb74b4", + "2fb4071397a049f888159e2cbec3ec99", + "280899c6e305423a8d6f20dd395b4e10", + "f68b5d3640c54560b38a29f32deb33a8", + "115335a31d874aba99efb63fa2830e09", + "7b371d0574e04f98bf87a88f722b8477", + "9095b2b09d4a45928fbc3cf45eb35cbb", + "971a53d397494d76b8b5c4a2abb954f7", + "54dd267783314434a5389477c97974e5", + "5bfd23c3586e4615909878610be8e24b", + "4eda58c3e66e40db98ea40fc40ebb109", + "99ee6edbe0574c778200ae65b87d7e0f" + ] + }, + "outputId": "03ca294e-285e-4fe1-f930-d22aaa9f87dc" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "Batches: 0%| | 0/11 [00:00#sk-container-id-1 {\n", + " /* Definition of color scheme common for light and dark mode */\n", + " --sklearn-color-text: #000;\n", + " --sklearn-color-text-muted: #666;\n", + " --sklearn-color-line: gray;\n", + " /* Definition of color scheme for unfitted estimators */\n", + " --sklearn-color-unfitted-level-0: #fff5e6;\n", + " --sklearn-color-unfitted-level-1: #f6e4d2;\n", + " --sklearn-color-unfitted-level-2: #ffe0b3;\n", + " --sklearn-color-unfitted-level-3: chocolate;\n", + " /* Definition of color scheme for fitted estimators */\n", + " --sklearn-color-fitted-level-0: #f0f8ff;\n", + " --sklearn-color-fitted-level-1: #d4ebff;\n", + " --sklearn-color-fitted-level-2: #b3dbfd;\n", + " --sklearn-color-fitted-level-3: cornflowerblue;\n", + "\n", + " /* Specific color for light theme */\n", + " --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n", + " --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n", + " --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, 
black)));\n", + " --sklearn-color-icon: #696969;\n", + "\n", + " @media (prefers-color-scheme: dark) {\n", + " /* Redefinition of color scheme for dark theme */\n", + " --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n", + " --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n", + " --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n", + " --sklearn-color-icon: #878787;\n", + " }\n", + "}\n", + "\n", + "#sk-container-id-1 {\n", + " color: var(--sklearn-color-text);\n", + "}\n", + "\n", + "#sk-container-id-1 pre {\n", + " padding: 0;\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-hidden--visually {\n", + " border: 0;\n", + " clip: rect(1px 1px 1px 1px);\n", + " clip: rect(1px, 1px, 1px, 1px);\n", + " height: 1px;\n", + " margin: -1px;\n", + " overflow: hidden;\n", + " padding: 0;\n", + " position: absolute;\n", + " width: 1px;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-dashed-wrapped {\n", + " border: 1px dashed var(--sklearn-color-line);\n", + " margin: 0 0.4em 0.5em 0.4em;\n", + " box-sizing: border-box;\n", + " padding-bottom: 0.4em;\n", + " background-color: var(--sklearn-color-background);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-container {\n", + " /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n", + " but bootstrap.min.css set `[hidden] { display: none !important; }`\n", + " so we also need the `!important` here to be able to override the\n", + " default hidden behavior on the sphinx rendered scikit-learn.org.\n", + " See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n", + " display: inline-block !important;\n", + " position: relative;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-text-repr-fallback {\n", + " display: none;\n", + "}\n", + "\n", + "div.sk-parallel-item,\n", + 
"div.sk-serial,\n", + "div.sk-item {\n", + " /* draw centered vertical line to link estimators */\n", + " background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n", + " background-size: 2px 100%;\n", + " background-repeat: no-repeat;\n", + " background-position: center center;\n", + "}\n", + "\n", + "/* Parallel-specific style estimator block */\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item::after {\n", + " content: \"\";\n", + " width: 100%;\n", + " border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n", + " flex-grow: 1;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel {\n", + " display: flex;\n", + " align-items: stretch;\n", + " justify-content: center;\n", + " background-color: var(--sklearn-color-background);\n", + " position: relative;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item {\n", + " display: flex;\n", + " flex-direction: column;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:first-child::after {\n", + " align-self: flex-end;\n", + " width: 50%;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:last-child::after {\n", + " align-self: flex-start;\n", + " width: 50%;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-parallel-item:only-child::after {\n", + " width: 0;\n", + "}\n", + "\n", + "/* Serial-specific style estimator block */\n", + "\n", + "#sk-container-id-1 div.sk-serial {\n", + " display: flex;\n", + " flex-direction: column;\n", + " align-items: center;\n", + " background-color: var(--sklearn-color-background);\n", + " padding-right: 1em;\n", + " padding-left: 1em;\n", + "}\n", + "\n", + "\n", + "/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n", + "clickable and can be expanded/collapsed.\n", + "- Pipeline and ColumnTransformer use this feature and define the default style\n", + "- Estimators will overwrite some part of the style using 
the `sk-estimator` class\n", + "*/\n", + "\n", + "/* Pipeline and ColumnTransformer style (default) */\n", + "\n", + "#sk-container-id-1 div.sk-toggleable {\n", + " /* Default theme specific background. It is overwritten whether we have a\n", + " specific estimator or a Pipeline/ColumnTransformer */\n", + " background-color: var(--sklearn-color-background);\n", + "}\n", + "\n", + "/* Toggleable label */\n", + "#sk-container-id-1 label.sk-toggleable__label {\n", + " cursor: pointer;\n", + " display: flex;\n", + " width: 100%;\n", + " margin-bottom: 0;\n", + " padding: 0.5em;\n", + " box-sizing: border-box;\n", + " text-align: center;\n", + " align-items: start;\n", + " justify-content: space-between;\n", + " gap: 0.5em;\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label .caption {\n", + " font-size: 0.6rem;\n", + " font-weight: lighter;\n", + " color: var(--sklearn-color-text-muted);\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label-arrow:before {\n", + " /* Arrow on the left of the label */\n", + " content: \"▸\";\n", + " float: left;\n", + " margin-right: 0.25em;\n", + " color: var(--sklearn-color-icon);\n", + "}\n", + "\n", + "#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {\n", + " color: var(--sklearn-color-text);\n", + "}\n", + "\n", + "/* Toggleable content - dropdown */\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content {\n", + " max-height: 0;\n", + " max-width: 0;\n", + " overflow: hidden;\n", + " text-align: left;\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content.fitted {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content pre {\n", + " margin: 0.2em;\n", + " border-radius: 0.25em;\n", + " color: var(--sklearn-color-text);\n", + " /* unfitted */\n", + " background-color: 
var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-toggleable__content.fitted pre {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n", + " /* Expand drop-down */\n", + " max-height: 200px;\n", + " max-width: 100%;\n", + " overflow: auto;\n", + "}\n", + "\n", + "#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n", + " content: \"▾\";\n", + "}\n", + "\n", + "/* Pipeline/ColumnTransformer-specific style */\n", + "\n", + "#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " color: var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Estimator-specific style */\n", + "\n", + "/* Colorize estimator box */\n", + "#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label label.sk-toggleable__label,\n", + "#sk-container-id-1 div.sk-label label {\n", + " /* The background is the default theme color */\n", + " color: var(--sklearn-color-text-on-default-background);\n", + "}\n", + "\n", + "/* On hover, darken the color of the background */\n", + "#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {\n", + " color: 
var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "/* Label box, darken color on hover, fitted */\n", + "#sk-container-id-1 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n", + " color: var(--sklearn-color-text);\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Estimator label */\n", + "\n", + "#sk-container-id-1 div.sk-label label {\n", + " font-family: monospace;\n", + " font-weight: bold;\n", + " display: inline-block;\n", + " line-height: 1.2em;\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-label-container {\n", + " text-align: center;\n", + "}\n", + "\n", + "/* Estimator-specific */\n", + "#sk-container-id-1 div.sk-estimator {\n", + " font-family: monospace;\n", + " border: 1px dotted var(--sklearn-color-border-box);\n", + " border-radius: 0.25em;\n", + " box-sizing: border-box;\n", + " margin-bottom: 0.5em;\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-0);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-0);\n", + "}\n", + "\n", + "/* on hover */\n", + "#sk-container-id-1 div.sk-estimator:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-2);\n", + "}\n", + "\n", + "#sk-container-id-1 div.sk-estimator.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-2);\n", + "}\n", + "\n", + "/* Specification for estimator info (e.g. 
\"i\" and \"?\") */\n", + "\n", + "/* Common style for \"i\" and \"?\" */\n", + "\n", + ".sk-estimator-doc-link,\n", + "a:link.sk-estimator-doc-link,\n", + "a:visited.sk-estimator-doc-link {\n", + " float: right;\n", + " font-size: smaller;\n", + " line-height: 1em;\n", + " font-family: monospace;\n", + " background-color: var(--sklearn-color-background);\n", + " border-radius: 1em;\n", + " height: 1em;\n", + " width: 1em;\n", + " text-decoration: none !important;\n", + " margin-left: 0.5em;\n", + " text-align: center;\n", + " /* unfitted */\n", + " border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-unfitted-level-1);\n", + "}\n", + "\n", + ".sk-estimator-doc-link.fitted,\n", + "a:link.sk-estimator-doc-link.fitted,\n", + "a:visited.sk-estimator-doc-link.fitted {\n", + " /* fitted */\n", + " border: var(--sklearn-color-fitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-fitted-level-1);\n", + "}\n", + "\n", + "/* On hover */\n", + "div.sk-estimator:hover .sk-estimator-doc-link:hover,\n", + ".sk-estimator-doc-link:hover,\n", + "div.sk-label-container:hover .sk-estimator-doc-link:hover,\n", + ".sk-estimator-doc-link:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n", + ".sk-estimator-doc-link.fitted:hover,\n", + "div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n", + ".sk-estimator-doc-link.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "/* Span, style for the box shown on hovering the info icon */\n", + ".sk-estimator-doc-link span {\n", + " display: none;\n", + " z-index: 9999;\n", + " position: relative;\n", + " font-weight: 
normal;\n", + " right: .2ex;\n", + " padding: .5ex;\n", + " margin: .5ex;\n", + " width: min-content;\n", + " min-width: 20ex;\n", + " max-width: 50ex;\n", + " color: var(--sklearn-color-text);\n", + " box-shadow: 2pt 2pt 4pt #999;\n", + " /* unfitted */\n", + " background: var(--sklearn-color-unfitted-level-0);\n", + " border: .5pt solid var(--sklearn-color-unfitted-level-3);\n", + "}\n", + "\n", + ".sk-estimator-doc-link.fitted span {\n", + " /* fitted */\n", + " background: var(--sklearn-color-fitted-level-0);\n", + " border: var(--sklearn-color-fitted-level-3);\n", + "}\n", + "\n", + ".sk-estimator-doc-link:hover span {\n", + " display: block;\n", + "}\n", + "\n", + "/* \"?\"-specific style due to the `` HTML tag */\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link {\n", + " float: right;\n", + " font-size: 1rem;\n", + " line-height: 1em;\n", + " font-family: monospace;\n", + " background-color: var(--sklearn-color-background);\n", + " border-radius: 1rem;\n", + " height: 1rem;\n", + " width: 1rem;\n", + " text-decoration: none;\n", + " /* unfitted */\n", + " color: var(--sklearn-color-unfitted-level-1);\n", + " border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n", + "}\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link.fitted {\n", + " /* fitted */\n", + " border: var(--sklearn-color-fitted-level-1) 1pt solid;\n", + " color: var(--sklearn-color-fitted-level-1);\n", + "}\n", + "\n", + "/* On hover */\n", + "#sk-container-id-1 a.estimator_doc_link:hover {\n", + " /* unfitted */\n", + " background-color: var(--sklearn-color-unfitted-level-3);\n", + " color: var(--sklearn-color-background);\n", + " text-decoration: none;\n", + "}\n", + "\n", + "#sk-container-id-1 a.estimator_doc_link.fitted:hover {\n", + " /* fitted */\n", + " background-color: var(--sklearn-color-fitted-level-3);\n", + "}\n", + "" + ] + }, + "metadata": {}, + "execution_count": 40 + } + ], + "source": [ + "# Building the model\n", + "rf_word2vec = 
RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state = 42)\n", + "\n", + "\n", + "# Fitting on train data\n", + "rf_word2vec.fit(X_train_wv, y_train)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**\n" + ], + "metadata": { + "id": "95O3167WbBnd" + }, + "id": "95O3167WbBnd" + }, + { + "cell_type": "code", + "source": [ + "# Predicting on train data\n", + "y_pred_train = rf_word2vec.predict(X_train_wv)\n", + "\n", + "# Predicting on test data\n", + "y_pred_test = rf_word2vec.predict(X_test_wv)" + ], + "metadata": { + "id": "TtQlY8DlzadF" + }, + "id": "TtQlY8DlzadF", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "ycl7jAX7cZuj" + }, + "id": "ycl7jAX7cZuj" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a_AW25srClm-", + "metadata": { + "id": "a_AW25srClm-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "7c3d30ac-13eb-4053-ff9a-6718a9fbf3c7" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plot_confusion_matrix(y_train,y_pred_train)" + ] + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test,y_pred_test)" + ], + "metadata": { + "id": "sp4-2sLEDcM3", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "6046c9f3-6602-421d-8738-7c000827c3a1" + }, + "id": "sp4-2sLEDcM3", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "E1jLbrZAidAB" + }, + "id": "E1jLbrZAidAB" + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8rV_bYhqClm_", + "metadata": { + "id": "8rV_bYhqClm_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cb772720-3f55-4fe6-97c6-75f6fe7384e6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.755043 0.755043 0.778891 0.720565\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "rf_train_wv = model_performance_classification_sklearn(y_train,y_pred_train)\n", + "print(\"Training performance:\\n\", rf_train_wv)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_AA2cSvzClm_", + "metadata": { + "id": "_AA2cSvzClm_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d063296f-2448-4515-f5c1-575671bcbd48" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.746479 0.746479 0.687934 0.680114\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "rf_test_wv = model_performance_classification_sklearn(y_test, y_pred_test)\n", + "print(\"Testing performance:\\n\",rf_test_wv)" + ] + }, + { + "cell_type": "markdown", + "id": "P2OnPdLRF2M9", + "metadata": { + "id": "P2OnPdLRF2M9" + }, + "source": [ + "* The model overfits slightly: accuracy is nearly identical on the training set (0.755) and the test set (0.746), but precision and F1 are lower on the test set."
+ ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Random Forest Model using text embeddings obtained from the Sentence Transformer**" + ], + "metadata": { + "id": "uijWj2Nl2jyK" + }, + "id": "uijWj2Nl2jyK" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "04W4gkoZ2jyK", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 80 + }, + "outputId": "364a1ac8-b4b6-403a-85d4-133186c89aa9" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "RandomForestClassifier(max_depth=3, random_state=42)" + ], + "text/html": [ + "
RandomForestClassifier(max_depth=3, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" + ] + }, + "metadata": {}, + "execution_count": 46 + } + ], + "source": [ + "# Building the model\n", + "rf_st = RandomForestClassifier(n_estimators = 100, max_depth = 3, random_state = 42)\n", + "\n", + "\n", + "# Fitting on train data\n", + "rf_st.fit(X_train_st, y_train)" + ], + "id": "04W4gkoZ2jyK" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "BTWSvJfC2jyL" + }, + "id": "BTWSvJfC2jyL" + }, + { + "cell_type": "code", + "source": [ + "# Predicting on train data\n", + "y_pred_train = rf_st.predict(X_train_st)\n", + "\n", + "# Predicting on test data\n", + "y_pred_test = rf_st.predict(X_test_st)" + ], + "metadata": { + "id": "QPI_ePlJ2jyL" + }, + "execution_count": null, + "outputs": [], + "id": "QPI_ePlJ2jyL" + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "vskhvTGm2jyL" + }, + "id": "vskhvTGm2jyL" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9P_tYSn92jyM", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "f404f8af-0142-4662-a7eb-8281eb953a13" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plot_confusion_matrix(y_train,y_pred_train)" + ], + "id": "9P_tYSn92jyM" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test,y_pred_test)" + ], + "metadata": { + "id": "LBzzMFHJDolN", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "d4e2c9a6-e164-45ff-ef5a-5c70c78c1736" + }, + "id": "LBzzMFHJDolN", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "sSvRSDit2jyM" + }, + "id": "sSvRSDit2jyM" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_kEV9XZD2jyM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "eb0311c5-dc0f-4aec-a189-a8358f68a7de" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.801153 0.801153 0.831835 0.775232\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "rf_train_st = model_performance_classification_sklearn(y_train,y_pred_train)\n", + "print(\"Training performance:\\n\", rf_train_st)" + ], + "id": "_kEV9XZD2jyM" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QoFxAES32jyM", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "74f810ea-08e2-4c99-a9c2-9187bfbd2357" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.71831 0.71831 0.551745 0.624105\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "rf_test_st = model_performance_classification_sklearn(y_test, y_pred_test)\n", + "print(\"Testing performance:\\n\",rf_test_st)" + ], + "id": "QoFxAES32jyM" + }, + { + "cell_type": "markdown", + "id": "ZmPPcdrHE9K2", + "metadata": { + "id": "ZmPPcdrHE9K2" + }, + "source": [ + "* The model overfits substantially: accuracy drops from 0.801 on the training set to 0.718 on the test set, and precision from 0.832 to 0.552."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DHgj_cCm2pIn" + }, + "source": [ + "### **Building Neural Network Models using different text embeddings**" + ], + "id": "DHgj_cCm2pIn" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Neural Network Model using text embeddings obtained from the Word2Vec**" + ], + "metadata": { + "id": "LpasFYQriueC" + }, + "id": "LpasFYQriueC" + }, + { + "cell_type": "code", + "source": [ + "# Convert the labels\n", + "label_mapping = {1: 2, -1: 0, 0: 1}\n", + "y_train_mapped_wv = [label_mapping[label] for label in y_train]\n", + "y_test_mapped_wv = [label_mapping[label] for label in y_test]\n", + "\n", + "# Convert your features DataFrame to a NumPy array\n", + "X_train_wv_np = np.array(X_train_wv)\n", + "X_test_wv_np = np.array(X_test_wv)\n", + "y_train_mapped_wv = np.array(y_train_mapped_wv)\n", + "y_test_mapped_wv = np.array(y_test_mapped_wv)" + ], + "metadata": { + "id": "xIeKB-P4nYFi" + }, + "id": "xIeKB-P4nYFi", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import gc\n", + "\n", + "# Clear previous sessions\n", + "tf.keras.backend.clear_session()\n", + "gc.collect()\n", + "\n", + "# Model definition\n", + "model = Sequential()\n", + "model.add(Dense(128, activation='relu', input_shape=(X_train_wv_np.shape[1],))) # Use the shape of the Word2Vec embeddings\n", + "model.add(Dropout(0.3))\n", + "model.add(Dense(64, activation='relu'))\n", + "model.add(Dense(3, activation='softmax')) # 3 output classes\n", + "\n", + "# Compile\n", + "model.compile(optimizer='adam',loss='sparse_categorical_crossentropy', metrics =['accuracy'])\n", + "\n", + "# Summary\n", + "model.summary()" + ], + "metadata": { + "id": "pPoM2BhyXvBv", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 257 + }, + "outputId": "46ff0a8e-280d-40a4-df68-ce369e6c8029" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + 
"text/plain": [ + "\u001b[1mModel: \"sequential\"\u001b[0m\n" + ], + "text/html": [ + "
Model: \"sequential\"\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", + "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m38,528\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m8,256\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m3\u001b[0m) │ \u001b[38;5;34m195\u001b[0m │\n", + "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" + ], + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
+              "┃ Layer (type)                     Output Shape                  Param # ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
+              "│ dense (Dense)                   │ (None, 128)            │        38,528 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dropout (Dropout)               │ (None, 128)            │             0 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_1 (Dense)                 │ (None, 64)             │         8,256 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_2 (Dense)                 │ (None, 3)              │           195 │\n",
+              "└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m46,979\u001b[0m (183.51 KB)\n" + ], + "text/html": [ + "
 Total params: 46,979 (183.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m46,979\u001b[0m (183.51 KB)\n" + ], + "text/html": [ + "
 Trainable params: 46,979 (183.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" + ], + "text/html": [ + "
 Non-trainable params: 0 (0.00 B)\n",
+              "
\n" + ] + }, + "metadata": {} + } + ], + "id": "pPoM2BhyXvBv" + }, + { + "cell_type": "markdown", + "source": [ + "**Note:**\n", + "- During training, we use accuracy as a metric to monitor how well the model is learning to distinguish between classes in each batch.\n", + "- Accuracy is fast and reliable during training and gives us a quick view of model progress.\n", + "- It reflects how often the model is predicting the correct label out of all predictions made.\n", + "\n" + ], + "metadata": { + "id": "kIxFfSYLQNlT" + }, + "id": "kIxFfSYLQNlT" + }, + { + "cell_type": "code", + "source": [ + "# Fitting the model\n", + "history = model.fit(\n", + " X_train_wv_np, y_train_mapped_wv,\n", + " epochs=10,\n", + " batch_size=32\n", + ")" + ], + "metadata": { + "id": "bgHeOMfpnobV", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6b648ba6-870f-4dfc-a2a7-b38a97c50cef" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 5ms/step - accuracy: 0.5349 - loss: 0.9062\n", + "Epoch 2/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 4ms/step - accuracy: 0.6190 - loss: 0.7523 \n", + "Epoch 3/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.5989 - loss: 0.7214 \n", + "Epoch 4/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.5874 - loss: 0.7687 \n", + "Epoch 5/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6741 - loss: 0.7038 \n", + "Epoch 6/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 
5ms/step - accuracy: 0.6298 - loss: 0.7276 \n", + "Epoch 7/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6655 - loss: 0.7134 \n", + "Epoch 8/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6025 - loss: 0.7213 \n", + "Epoch 9/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - accuracy: 0.6321 - loss: 0.7183 \n", + "Epoch 10/10\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - accuracy: 0.6456 - loss: 0.7322 \n" + ] + } + ], + "id": "bgHeOMfpnobV" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "IX11-Hmx8_E1" + }, + "id": "IX11-Hmx8_E1" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on training data\n", + "y_train_pred_probs = model.predict(X_train_wv_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_train_preds_wv = tf.argmax(y_train_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "ZpEpHWni87cO", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "156eca77-fc90-4df0-87ac-daf08655f0b6" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 6ms/step \n" + ] + } + ], + "id": "ZpEpHWni87cO" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on test data\n", + "y_test_pred_probs = model.predict(X_test_wv_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_test_preds_wv = tf.argmax(y_test_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "hBMMkZBk9Jkz", + "colab": { + "base_uri": 
"https://localhost:8080/" + }, + "outputId": "d66d1a0d-3bb4-4fa3-d68e-ded8a15c2696" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m3/3\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step\n" + ] + } + ], + "id": "hBMMkZBk9Jkz" + }, + { + "cell_type": "code", + "source": [ + "# Convert back to [-1, 0, 1] to match utility function expectations\n", + "label_mapping = {2: 1, 0: -1, 1: 0}\n", + "y_train_preds_wv = np.array([label_mapping[index] for index in y_train_preds_wv])\n", + "y_test_preds_wv = np.array([label_mapping[index] for index in y_test_preds_wv])" + ], + "metadata": { + "id": "wCPqMh0nwryB" + }, + "id": "wCPqMh0nwryB", + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "Jbeyf8dzk3MP" + }, + "id": "Jbeyf8dzk3MP" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_train, y_train_preds_wv)" + ], + "metadata": { + "id": "lIh2fXcwxJ0G", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "711755dc-7004-4b9d-cb30-b06839893f72" + }, + "id": "lIh2fXcwxJ0G", + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test, y_test_preds_wv)" + ], + "metadata": { + "id": "djUVsYwYYBJd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "b02647ff-0825-4c59-8c4b-bdc26cc7a9e1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "id": "djUVsYwYYBJd" + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + "metadata": { + "id": "1NqSOfNd1UmS" + }, + "id": "1NqSOfNd1UmS" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5qzE4NHS1UmS", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "68bd5ff5-8663-49d2-cf32-0907384a849a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.636888 0.636888 0.664626 0.516128\n" + ] + } + ], + "source": [ + "#Calculating different metrics on training data\n", + "NN_train_wv = model_performance_classification_sklearn(y_train,y_train_preds_wv)\n", + "print(\"Training performance:\\n\", NN_train_wv)" + ], + "id": "5qzE4NHS1UmS" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4Nr34HI31UmT", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b07a23fe-a40e-4497-8df2-05e18239480e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.760563 0.760563 0.804628 0.669911\n" + ] + } + ], + "source": [ + "#Calculating different metrics on test data\n", + "NN_test_wv = model_performance_classification_sklearn(y_test, y_test_preds_wv)\n", + "print(\"Testing performance:\\n\",NN_test_wv)" + ], + "id": "4Nr34HI31UmT" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Building a Neural Network Model using text embeddings obtained from the Sentence Transformer**" + ], + "metadata": { + "id": "bcXtMsPu3JfI" + }, + "id": "bcXtMsPu3JfI" + }, + { + "cell_type": "code", + "source": [ + "# Convert the labels\n", + "label_mapping = {1: 2, -1: 0, 0: 1}\n", + "y_train_mapped_st = [label_mapping[label] for label in y_train]\n", + "y_test_mapped_st = [label_mapping[label] for 
label in y_test]\n", + "\n", + "# Convert your features DataFrame to a NumPy array\n", + "X_train_st_np = np.array(X_train_st)\n", + "X_test_st_np = np.array(X_test_st)\n", + "y_train_mapped_st = np.array(y_train_mapped_st)\n", + "y_test_mapped_st = np.array(y_test_mapped_st)" + ], + "metadata": { + "id": "FUfjCAua4A2-" + }, + "execution_count": null, + "outputs": [], + "id": "FUfjCAua4A2-" + }, + { + "cell_type": "code", + "source": [ + "import gc\n", + "\n", + "# Clear previous sessions\n", + "tf.keras.backend.clear_session()\n", + "gc.collect()\n", + "\n", + "# Define the model\n", + "model = Sequential()\n", + "model.add(Dense(128, activation='relu', input_shape=(X_train_st_np.shape[1],))) # Use the shape of the Sentence Transformer embeddings\n", + "model.add(Dropout(0.3))\n", + "model.add(Dense(64, activation='relu'))\n", + "model.add(Dense(3, activation='softmax')) # 3 output classes (0: negative, 1: neutral, 2: positive)\n", + "\n", + "# Compile the model\n", + "model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])\n", + "\n", + "# Summary\n", + "model.summary()" + ], + "metadata": { + "id": "ziE6DVHA4A2-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 257 + }, + "outputId": "55b5e48b-d3dd-4986-ce74-027c5f902891" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1mModel: \"sequential\"\u001b[0m\n" + ], + "text/html": [ + "
Model: \"sequential\"\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", + "│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m49,280\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout (\u001b[38;5;33mDropout\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m128\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m64\u001b[0m) │ \u001b[38;5;34m8,256\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m3\u001b[0m) │ \u001b[38;5;34m195\u001b[0m │\n", + "└─────────────────────────────────┴────────────────────────┴───────────────┘\n" + ], + "text/html": [ + "
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
+              "┃ Layer (type)                     Output Shape                  Param # ┃\n",
+              "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
+              "│ dense (Dense)                   │ (None, 128)            │        49,280 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dropout (Dropout)               │ (None, 128)            │             0 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_1 (Dense)                 │ (None, 64)             │         8,256 │\n",
+              "├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
+              "│ dense_2 (Dense)                 │ (None, 3)              │           195 │\n",
+              "└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Total params: \u001b[0m\u001b[38;5;34m57,731\u001b[0m (225.51 KB)\n" + ], + "text/html": [ + "
 Total params: 57,731 (225.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m57,731\u001b[0m (225.51 KB)\n" + ], + "text/html": [ + "
 Trainable params: 57,731 (225.51 KB)\n",
+              "
\n" + ] + }, + "metadata": {} + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n" + ], + "text/html": [ + "
 Non-trainable params: 0 (0.00 B)\n",
+              "
\n" + ] + }, + "metadata": {} + } + ], + "id": "ziE6DVHA4A2-" + }, + { + "cell_type": "code", + "source": [ + "# Fitting the model\n", + "history = model.fit(\n", + " X_train_st_np, y_train_mapped_st,\n", + " epochs=15,\n", + " batch_size=32\n", + ")" + ], + "metadata": { + "id": "8J-JncGj4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3faf37bb-8f28-4662-8b16-ca054230d4cd" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 5ms/step - Accuracy: 0.6300 - loss: 1.0422\n", + "Epoch 2/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6496 - loss: 0.8380 \n", + "Epoch 3/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6464 - loss: 0.7074 \n", + "Epoch 4/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6099 - loss: 0.7004 \n", + "Epoch 5/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6664 - loss: 0.6625 \n", + "Epoch 6/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.6915 - loss: 0.6035 \n", + "Epoch 7/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.7356 - loss: 0.5585 \n", + "Epoch 8/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step - Accuracy: 0.7714 - loss: 0.5521 \n", + "Epoch 9/15\n", + "\u001b[1m11/11\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.7655 - loss: 0.5158 \n", + "Epoch 10/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - Accuracy: 0.7861 - loss: 0.4997 \n", + "Epoch 11/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - Accuracy: 0.8121 - loss: 0.4551\n", + "Epoch 12/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 8ms/step - Accuracy: 0.7897 - loss: 0.4756 \n", + "Epoch 13/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 8ms/step - Accuracy: 0.8454 - loss: 0.4200 \n", + "Epoch 14/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.8362 - loss: 0.4135 \n", + "Epoch 15/15\n", + "\u001b[1m11/11\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 7ms/step - Accuracy: 0.8560 - loss: 0.3473 \n" + ] + } + ], + "id": "8J-JncGj4A2_" + }, + { + "cell_type": "markdown", + "source": [ + "#### **Checking Training and Test Performance**" + ], + "metadata": { + "id": "rbsZ24gM4A2_" + }, + "id": "rbsZ24gM4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on training data\n", + "y_train_pred_probs = model.predict(X_train_st_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_train_preds_st = tf.argmax(y_train_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "xaWGws3r4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c3f459d7-9745-44f3-bfd6-0df051411294" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m11/11\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step\n" + ] + } + ], + "id": "xaWGws3r4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Predict class probabilities on test data\n", + "y_test_pred_probs = model.predict(X_test_st_np)\n", + "\n", + "# Convert probabilities to class labels\n", + "y_test_preds_st = tf.argmax(y_test_pred_probs, axis=1).numpy()" + ], + "metadata": { + "id": "P8yF-MWH4A2_", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0bfe417b-bb4a-446f-8ee0-7fde7e16f62a" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m3/3\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 14ms/step\n" + ] + } + ], + "id": "P8yF-MWH4A2_" + }, + { + "cell_type": "code", + "source": [ + "# Convert back to [-1, 0, 1] to match utility function expectations\n", + "label_mapping = {2: 1, 0: -1, 1: 0}\n", + "y_train_preds_st = np.array([label_mapping[index] for index in y_train_preds_st])\n", + "y_test_preds_st = np.array([label_mapping[index] for index in y_test_preds_st])" + ], + "metadata": { + "id": "YbwmP-dE4A3A" + }, + "execution_count": null, + "outputs": [], + "id": "YbwmP-dE4A3A" + }, + { + "cell_type": "markdown", + "source": [ + "**Confusion Matrix**" + ], + "metadata": { + "id": "YoVyydZW4A3A" + }, + "id": "YoVyydZW4A3A" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_train, y_train_preds_st)" + ], + "metadata": { + "id": "I2yC2oAB4A3A", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "c678cd56-9ae7-4a9c-becf-e1dfa09973d1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "id": "I2yC2oAB4A3A" + }, + { + "cell_type": "code", + "source": [ + "plot_confusion_matrix(y_test, y_test_preds_st)" + ], + "metadata": { + "collapsed": true, + "id": "mZLu8fRk4A3A", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 410 + }, + "outputId": "e96330e8-96aa-476e-b47d-e4d5bcc6a561" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbMAAAGJCAYAAAAADN1MAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjAsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvlHJYcgAAAAlwSFlzAAAPYQAAD2EBqD+naQAAM6VJREFUeJzt3XlcVPX+P/DXgDAg+yKbyuKGmluiKZrigrkHoql5b4FZqaGpuIU3U7TCS+aeS11DMkmvG2mZRm5ooimKmim5YNpFEFBAEAaC8/vDr/NrBHQGBmbOOa9nj/N4xGfOnPM+Q/nyfc5nzlEIgiCAiIhIxEwMXQAREVFtMcyIiEj0GGZERCR6DDMiIhI9hhkREYkew4yIiESPYUZERKLHMCMiItFjmBERkegxzEhUrl69ipdeegl2dnZQKBRISEjQ6/Zv3rwJhUKBTZs26XW7YtanTx/06dPH0GUQPRXDjHR2/fp1TJw4Ec2aNYOFhQVsbW3Rs2dPrFy5EsXFxXW679DQUFy8eBEfffQRNm/ejC5dutTp/upTWFgYFAoFbG1tq/wcr169CoVCAYVCgaVLl+q8/YyMDCxcuBCpqal6qJbIuDQwdAEkLt9//z1eeeUVKJVKvP7662jXrh1KS0tx/PhxzJ49G5cuXcLnn39eJ/suLi5GcnIy/vWvf2HKlCl1sg8vLy8UFxfDzMysTrb/LA0aNMDDhw+xd+9ejB49WuO1LVu2wMLCAiUlJTXadkZGBqKiouDt7Y1OnTpp/b4ff/yxRvsjqk8MM9Jaeno6xo4dCy8vLxw6dAju7u7q18LDw3Ht2jV8//33dbb/7OxsAIC9vX2d7UOhUMDCwqLOtv8sSqUSPXv2xDfffFMpzOLj4zF06FDs3LmzXmp5+PAhGjZsCHNz83rZH1Ft8DQjaS0mJgaFhYXYuHGjRpA91qJFC0ybNk39819//YXFixejefPmUCqV8Pb2xrx586BSqTTe5+3tjWHDhuH48eN44YUXYGFhgWbNmuGrr75Sr7Nw4UJ4eXkBAGbPng2FQgFvb28Aj07PPf73v1u4cCEUCoXGWGJiIl588UXY29vD2toavr6+mDdvnvr16q6ZHTp0CL169YKVlRXs7e0RFBSEy5cvV7m/a9euISwsDPb29rCzs8P48ePx8OHD6j/YJ4wbNw4//PAD8vLy1GOnT5/G1atXMW7cuErr37t3D7NmzUL79u1hbW0NW1tbDB48GOfPn1evc+TIEXTt2hUAMH78ePXpysfH2adPH7Rr1w4pKSno3bs3GjZsqP5cnrxmFhoaCgsLi0rHP3DgQDg4OCAjI0PrYyXSF4YZaW3v3r1o1qwZevToodX6b775Jj744AN07twZy5cvR0BAAKKjozF27NhK6167dg2jRo3CgAED8Omnn8LBwQFhYWG4dOkSACAkJATLly8HALz66qvYvHkzVqxYoVP9ly5dwrBhw6BSqbBo0SJ8+umnePnll/Hzzz8/9X0//fQTBg4ciLt372LhwoWIiIjAiRMn0LNnT9y8ebPS+qNHj8aDBw8QHR2N0aNHY9OmTYiKitK6zpCQECgUCuzatUs9Fh8fj9atW6Nz586V1r9x4wYSEhIwbNgwLFu2DLNnz8bFixcREBCgDpY2bdpg0aJFAIC3334bmzdvxubNm9G7d2/1dnJzczF48GB06tQJK1asQN++fausb+XKlWjUqBFCQ0NRXl4OANiwYQN+/PFHrF69Gh4eHlofK5HeCERayM/PFwAIQUFBWq2fmpoqABDefPNNjfFZs2YJAIRDhw6px7y8vAQAQlJSknrs7t27glKpFGbOnKkeS09PFwAIn3zyicY2Q0NDBS8vr0o1LFiwQPj7f+LLly8XAAjZ2dnV1v14H7GxseqxTp06CS4uLkJubq567Pz584KJiYnw+uuvV9rfG2+8obHNESNGCE5OTtXu8+/HYWVlJQiCIIwaNUro37+/I
AiCUF5eLri5uQlRUVFVfgYlJSVCeXl5peNQKpXCokWL1GOnT5+udGyPBQQECACE9evXV/laQECAxtiBAwcEAMKHH34o3LhxQ7C2thaCg4OfeYxEdYWdGWmloKAAAGBjY6PV+vv27QMAREREaIzPnDkTACpdW2vbti169eql/rlRo0bw9fXFjRs3alzzkx5fa/v2229RUVGh1Xvu3LmD1NRUhIWFwdHRUT3eoUMHDBgwQH2cfzdp0iSNn3v16oXc3Fz1Z6iNcePG4ciRI8jMzMShQ4eQmZlZ5SlG4NF1NhOTR/8rl5eXIzc3V30K9ezZs1rvU6lUYvz48Vqt+9JLL2HixIlYtGgRQkJCYGFhgQ0bNmi9LyJ9Y5iRVmxtbQEADx480Gr9P/74AyYmJmjRooXGuJubG+zt7fHHH39ojHt6elbahoODA+7fv1/DiisbM2YMevbsiTfffBOurq4YO3Ys/vvf/z412B7X6evrW+m1Nm3aICcnB0VFRRrjTx6Lg4MDAOh0LEOGDIGNjQ22bduGLVu2oGvXrpU+y8cqKiqwfPlytGzZEkqlEs7OzmjUqBEuXLiA/Px8rffZuHFjnSZ7LF26FI6OjkhNTcWqVavg4uKi9XuJ9I1hRlqxtbWFh4cHfv31V53e9+QEjOqYmppWOS4IQo338fh6zmOWlpZISkrCTz/9hNdeew0XLlzAmDFjMGDAgErr1kZtjuUxpVKJkJAQxMXFYffu3dV2ZQDw8ccfIyIiAr1798bXX3+NAwcOIDExEc8995zWHSjw6PPRxblz53D37l0AwMWLF3V6L5G+McxIa8OGDcP169eRnJz8zHW9vLxQUVGBq1evaoxnZWUhLy9PPTNRHxwcHDRm/j32ZPcHACYmJujfvz+WLVuG3377DR999BEOHTqEw4cPV7ntx3WmpaVVeu3KlStwdnaGlZVV7Q6gGuPGjcO5c+fw4MGDKifNPLZjxw707dsXGzduxNixY/HSSy8hMDCw0mei7V8stFFUVITx48ejbdu2ePvttxETE4PTp0/rbftEumKYkdbmzJkDKysrvPnmm8jKyqr0+vXr17Fy5UoAj06TAag043DZsmUAgKFDh+qtrubNmyM/Px8XLlxQj925cwe7d+/WWO/evXuV3vv4y8NPfl3gMXd3d3Tq1AlxcXEa4fDrr7/ixx9/VB9nXejbty8WL16MNWvWwM3Nrdr1TE1NK3V927dvx//+9z+NscehW1Xw62ru3Lm4desW4uLisGzZMnh7eyM0NLTaz5GorvFL06S15s2bIz4+HmPGjEGbNm007gBy4sQJbN++HWFhYQCAjh07IjQ0FJ9//jny8vIQEBCAX375BXFxcQgODq522ndNjB07FnPnzsWIESPw7rvv4uHDh1i3bh1atWqlMQFi0aJFSEpKwtChQ+Hl5YW7d+9i7dq1aNKkCV588cVqt//JJ59g8ODB8Pf3x4QJE1BcXIzVq1fDzs4OCxcu1NtxPMnExATvv//+M9cbNmwYFi1ahPHjx6NHjx64ePEitmzZgmbNmmms17x5c9jb22P9+vWwsbGBlZUVunXrBh8fH53qOnToENauXYsFCxaovyoQGxuLPn36YP78+YiJidFpe0R6YeDZlCRCv//+u/DWW28J3t7egrm5uWBjYyP07NlTWL16tVBSUqJer6ysTIiKihJ8fHwEMzMzoWnTpkJkZKTGOoLwaGr+0KFDK+3nySnh1U3NFwRB+PHHH4V27doJ5ubmgq+vr/D1119Xmpp/8OBBISgoSPDw8BDMzc0FDw8P4dVXXxV+//33Svt4cvr6Tz/9JPTs2VOwtLQUbG1theHDhwu//fabxjqP9/fk1P/Y2FgBgJCenl7tZyoImlPzq1Pd1PyZM2cK7u7ugqWlpdCzZ08hOTm5yin13377rdC2bVuhQYMGGscZEBAgPPfcc1Xu8+/bKSgoELy8vITOnTsLZWVlGuvNmDFDMDExEZKTk596DER1QSEIO
lyVJiIiMkK8ZkZERKLHMCMiItFjmBERkegxzIiISPQYZkREJHoMMyIiEj2GGRERiZ4k7wBy9qb2j9qg+te2ia2hS6Bq3MrV/onYVL9auTbU6/Ysn59S4/cWn1ujx0r0Q5JhRkREz6CQ1ok5hhkRkRzp8SkKxoBhRkQkRxLrzKR1NEREJEvszIiI5IinGYmISPQkdpqRYUZEJEfszIiISPTYmRERkehJrDOTVjQTEZEssTMjIpIjnmYkIiLRk9hpRoYZEZEcsTMjIiLRY2dGRESiJ7HOTFpHQ0REssTOjIhIjiTWmTHMiIjkyITXzIiISOzYmRERkehxNiMREYmexDozaR0NERHJEjszIiI54mlGIiISPYmdZmSYERHJETszIiISPXZmREQkehLrzKQVzUREJEvszIiI5IinGYmISPQkdpqRYUZEJEfszIiISPQYZkREJHoSO80orWgmIiJZYmdGRCRHEjvNKK2jISIi7SgUNV9qaMmSJVAoFJg+fbp6rKSkBOHh4XBycoK1tTVGjhyJrKwsnbfNMCMikiOFSc2XGjh9+jQ2bNiADh06aIzPmDEDe/fuxfbt23H06FFkZGQgJCRE5+0zzIiI5KgeO7PCwkL84x//wBdffAEHBwf1eH5+PjZu3Ihly5ahX79+8PPzQ2xsLE6cOIGTJ0/qtA+GGRGRDCkUihovKpUKBQUFGotKpap2X+Hh4Rg6dCgCAwM1xlNSUlBWVqYx3rp1a3h6eiI5OVmn42GYERGRTqKjo2FnZ6exREdHV7nu1q1bcfbs2Spfz8zMhLm5Oezt7TXGXV1dkZmZqVNNnM1IRCRDilpM5IiMjERERITGmFKprLTe7du3MW3aNCQmJsLCwqLG+9MGw4yISI5q8Z1ppVJZZXg9KSUlBXfv3kXnzp3VY+Xl5UhKSsKaNWtw4MABlJaWIi8vT6M7y8rKgpubm041McyIiGSoNp2Ztvr374+LFy9qjI0fPx6tW7fG3Llz0bRpU5iZmeHgwYMYOXIkACAtLQ23bt2Cv7+/TvtimBERyVB9hJmNjQ3atWunMWZlZQUnJyf1+IQJExAREQFHR0fY2tpi6tSp8Pf3R/fu3XXaF8OMiEiG6iPMtLF8+XKYmJhg5MiRUKlUGDhwINauXavzdhSCIAh1UJ9Bnb1ZYOgSdHb54ll8t30zbly9grx7OYhY8Am69uijfl0QBOz4agMO7U9AUWEhfNt2wBvvvgf3xp6GK7qG2jaxNXQJerE1fgviYjciJycbrXxb471589H+iS+Eis2t3IeGLqFWJowegruZdyqNDwkejckRkQaoSH9auTbU6/Zsx35V4/cWbH1dj5XoB6fmGwlVSTE8m7XCG1PmVPn63v9+hf3fbsOEqZFYvDIWSgtLLJk3FaWl1X+3g+rO/h/2YWlMNCa+E46t23fD17c1Jk+cgNzcXEOXJmvLPv8aX+1OVC+Ll60DALzYd4CBKzM+tfmemTEy2jDLysrCokWLDF1GvenUtSfGhE1G1559K70mCAJ+SPgGI159A116BMCrWUu8MycK93NzcObEUQNUS5vjYhEyajSCR4xE8xYt8P6CKFhYWCBh105DlyZrdvaOcHByVi+nTxyDe+OmaNfJz9ClGR9FLRYjZLRhlpmZiaioKEOXYRTuZv4Pefdy0a7zC+qxhlbWaN76OVy9fMGAlclTWWkpLv92Cd39e6jHTExM0L17D1w4f86AldHflZWV4XDiPgQOCTLabsKQpNaZGWwCyIULT/9DOC0tTavtqFSqSrdRKVWpYK7FdyDEIv/eo1NXdvZOGuN29k7Iu8fTWvXtft59lJeXw8lJ8/fh5OSE9PQbBqqKnnTy2GEUFT5A/8HDDV2KUTLWUKopg4VZp06doFAoUNX8k8fj2nzY0dHRlTq4t6e9h4nTxX2xl4hqJ/H7BPh16wknZxdDl2KUGGZ64ujoiJiYGPTv37/K1y9duoThw5/9N6qqbqvy2x1pTYqwc3zUA
eTn5cLByVk9np+XC+/mrQxVlmw52DvA1NS00mSP3NxcODs7V/Muqk93MzNwPuUUIhcvNXQpVE8MFmZ+fn7IyMiAl5dXla/n5eVV2bU9qarbqpjfE9/U/KdxcWsMe0cn/HruNLyb+wIAHhYV4vqVSxgwbJSBq5MfM3NztGn7HE6dTEa//o/u9l1RUYFTp5Ix9tV/Grg6AoCf9u2Bnb0juvr3MnQpRoudmZ5MmjQJRUVF1b7u6emJ2NjYeqzIsEqKHyIz47b65+zMDNy8ngZrGzs4u7hhcPCrSPjmS7g1bgoXt8bYHrceDk7O6NIjwIBVy9droeMxf95cPPdcO7Rr3wFfb45DcXExgkfo/lBB0q+Kigr89MO36DdoGEwb8L4Q1ZJWlhkuzEaMGFFp7Oeff0aXLl2gVCrh4OCA0NBQA1RmGDd+v4zFcyapf968YTkAoPeAoZg8ayGGj34dqpJi/Gflx3hYWAjf5zrivY9WwdxcOhNdxGTQ4CG4f+8e1q5ZhZycbPi2boO1G/4DJ55mNLjUM6eQnZWJAUODDV2KUZNaZ2ZUdwCxtbVFamoqmjVrVqvtiPEOIHIilTuASJHY7wAiZfq+A0ij8dtq/N7s2DF6rEQ/jKoHN6JcJSKSNKl1Zkb7pWkiIiJtGVVntmHDBri6uhq6DCIi6ZNWY2ZcYTZu3DhDl0BEJAtSO81oVGFGRET1g2FGRESixzAjIiLRk1qYcTYjERGJHjszIiI5klZjxjAjIpIjqZ1mZJgREckQw4yIiERPamHGCSBERCR67MyIiORIWo0Zw4yISI6kdpqRYUZEJEMMMyIiEj2GGRERiZ7UwoyzGYmISPTYmRERyZG0GjOGGRGRHEntNCPDjIhIhhhmREQkehLLMoYZEZEcSa0z42xGIiISPXZmREQyJLHGjGFGRCRHUjvNyDAjIpIhiWUZw4yISI5MTKSVZgwzIiIZklpnxtmMREQkeuzMiIhkiBNAiIhI9CSWZQwzIiI5YmdGRESixzAjIiLRk1iWcTYjERGJHzszIiIZ4mlGIiISPYllGcOMiEiO2JkREZHoSSzLGGZERHIktc6MsxmJiEj02JkREcmQxBozhhkRkRxJ7TSjJMOslbuNoUsgEiVPp4aGLoHqicSyjNfMiIjkSKFQ1HjRxbp169ChQwfY2trC1tYW/v7++OGHH9Svl5SUIDw8HE5OTrC2tsbIkSORlZWl8/EwzIiIZEihqPmiiyZNmmDJkiVISUnBmTNn0K9fPwQFBeHSpUsAgBkzZmDv3r3Yvn07jh49ioyMDISEhOh+PIIgCDq/y8gVqiR3SJLSwFRi5zeI6oGFni8K9YhJqvF7T8zpXat9Ozo64pNPPsGoUaPQqFEjxMfHY9SoUQCAK1euoE2bNkhOTkb37t213qYkr5kREdHT1WYCiEqlgkql0hhTKpVQKpVPfV95eTm2b9+OoqIi+Pv7IyUlBWVlZQgMDFSv07p1a3h6euocZjzNSEQkQ7U5zRgdHQ07OzuNJTo6utp9Xbx4EdbW1lAqlZg0aRJ2796Ntm3bIjMzE+bm5rC3t9dY39XVFZmZmTodDzszIiIZqk1nFhkZiYiICI2xp3Vlvr6+SE1NRX5+Pnbs2IHQ0FAcPXq0xvuvCsOMiEiGahNm2pxS/Dtzc3O0aNECAODn54fTp09j5cqVGDNmDEpLS5GXl6fRnWVlZcHNzU2nmniakYhIhuprNmNVKioqoFKp4OfnBzMzMxw8eFD9WlpaGm7dugV/f3+dtsnOjIiI6kxkZCQGDx4MT09PPHjwAPHx8Thy5AgOHDgAOzs7TJgwAREREXB0dIStrS2mTp0Kf39/nSZ/AAwzIiJZqq/bWd29exevv/467ty5Azs7O3To0AEHDhzAgAEDAADLly+HiYkJRo4cCZVKhYEDB2Lt2rU674ffM6N6x++ZEelO398z67vyRI3fe3haDz1Woh/szIiIZIg3GiYiItGTWJYxzIiI5MhEYmnGq
flERCR67MyIiGRIYo0Zw4yISI44AYSIiETPRFpZxjAjIpIjdmZERCR6EssyzmYkIiLxY2dGRCRDCkirNWOYERHJECeAEBGR6HECCBERiZ7EsoxhRkQkR7w3IxERkZFhZ0ZEJEMSa8wYZkREcsQJIEREJHoSyzKGGRGRHEltAgjDjIhIhqQVZVqG2Z49e7Te4Msvv1zjYoiIiGpCqzALDg7WamMKhQLl5eW1qYeIiOqBLCeAVFRU1HUdRERUj3hvRiIiEj1ZdmZPKioqwtGjR3Hr1i2UlpZqvPbuu+/qpTAiIqo7Essy3cPs3LlzGDJkCB4+fIiioiI4OjoiJycHDRs2hIuLC8OMiEgEpNaZ6XxvxhkzZmD48OG4f/8+LC0tcfLkSfzxxx/w8/PD0qVL66JGIiKip9I5zFJTUzFz5kyYmJjA1NQUKpUKTZs2RUxMDObNm1cXNRIRkZ6ZKGq+GCOdw8zMzAwmJo/e5uLiglu3bgEA7OzscPv2bf1WR0REdUKhUNR4MUY6XzN7/vnncfr0abRs2RIBAQH44IMPkJOTg82bN6Ndu3Z1USMREemZcUZSzencmX388cdwd3cHAHz00UdwcHDA5MmTkZ2djc8//1zvBRIRkf6ZKBQ1XoyRzp1Zly5d1P/u4uKC/fv367UgIiIiXfFL00REMmSkDVaN6RxmPj4+T70AeOPGjVoVRI+cPXMaX23aiMuXLyEnOxtLV6xB336Bhi6L/mZr/BbExW5ETk42Wvm2xnvz5qN9hw6GLovA3402jHUiR03pHGbTp0/X+LmsrAznzp3D/v37MXv2bH3VJXvFxcVo5dsaL48Yidkzphq6HHrC/h/2YWlMNN5fEIX27Ttiy+Y4TJ44Ad9+tx9OTk6GLk/W+LvRjsSyTPcwmzZtWpXjn332Gc6cOVPrguiRnr16o2ev3oYug6qxOS4WIaNGI3jESADA+wuikJR0BAm7dmLCW28buDp54+9GO8Y6kaOmdJ7NWJ3Bgwdj586d+tockdEqKy3F5d8uobt/D/WYiYkJunfvgQvnzxmwMuLvRnsKRc0XY6S3CSA7duyAo6OjTu/JycnBl19+ieTkZGRmZgIA3Nzc0KNHD4SFhaFRo0b6Ko9Ib+7n3Ud5eXmlU1ZOTk5IT+c1Y0Pi70a+avSl6b9fOBQEAZmZmcjOzsbatWu13s7p06cxcOBANGzYEIGBgWjVqhUAICsrC6tWrcKSJUtw4MABja8CVEWlUkGlUmmMlcEcSqVSh6MiIpIX2U8ACQoK0vgQTExM0KhRI/Tp0wetW7fWejtTp07FK6+8gvXr11f6UAVBwKRJkzB16lQkJyc/dTvR0dGIiorSGIv81weYN3+h1rUQ6cLB3gGmpqbIzc3VGM/NzYWzs7OBqiKAvxtd6O0ak5HQOcwWLlyolx2fP38emzZtqvJvBwqFAjNmzMDzzz//zO1ERkYiIiJCY6wM5nqpkagqZubmaNP2OZw6mYx+/R99XaKiogKnTiVj7Kv/NHB18sbfjfZk35mZmprizp07cHFx0RjPzc2Fi4sLysvLtdqOm5sbfvnll2q7uV9++QWurq7P3I5Sqax0SrFQJWhVgzF7+LAIt//vJs4AkPG/P5F25TJs7ezg7u5hwMoIAF4LHY/58+biuefaoV37Dvh6cxyKi4sRPCLE0KXJHn832jHWu9/XlM5hJghVB4VKpYK5ufYd0axZs/D2228jJSUF/fv3VwdXVlYWDh48iC+++ELWz0f77dKvmDghVP3zsk+WAACGvRyMqA+XGKos+j+DBg/B/Xv3sHbNKuTkZMO3dRus3fAfOPFUlsHxd6MdqYWZQqgunZ6watUqAI8ezrl48WJYW1urXysvL0dSUhJu3ryJc+e0n/66bds2LF++HCkpKeqOztTUFH5+foiIiMDo0aN1ORY1KXRmUtbAVGL/FxHVAws933wwYs+VGr932cvaz4+oL1qHmY+PDwDgjz/+QJMmTWBqaqp+zdzcH
N7e3li0aBG6deumcxFlZWXIyckBADg7O8PMzEznbfwdw8y4McyIdKfvMJu5N63G7/10uK8eK9EPrT+e9PR0AEDfvn2xa9cuODg46K0IMzMz9WNliIio7kntNKPOWX/48OG6qIOIiOqRxCYz6v5Vg5EjR+Lf//53pfGYmBi88soreimKiIjqltQezqlzmCUlJWHIkCGVxgcPHoykpCS9FEVERHXLpBaLMdK5rsLCwiqn4JuZmaGgoEAvRREREelC5zBr3749tm3bVml869ataNu2rV6KIiKiuiX7u+bPnz8fISEhuH79Ovr16wcAOHjwIOLj47Fjxw69F0hERPpnrNe+akrnMBs+fDgSEhLw8ccfY8eOHbC0tETHjh1x6NAhnR8BQ0REhiGxLKvZ88yGDh2KoUOHAgAKCgrwzTffYNasWRp38iAiIuMlte+Z1XhiSlJSEkJDQ+Hh4YFPP/0U/fr1w8mTJ/VZGxER1RFZT83PzMzEkiVL0LJlS7zyyiuwtbWFSqVCQkIClixZgq5du9ZVnUREJELR0dHo2rUrbGxs4OLiguDgYKSlad5Kq6SkBOHh4XBycoK1tTVGjhyJrKwsnfajdZgNHz4cvr6+uHDhAlasWIGMjAysXr1ap50REZFxqK/ZjEePHkV4eDhOnjyJxMRElJWV4aWXXkJRUZF6nRkzZmDv3r3Yvn07jh49ioyMDISE6PbIHq1vNNygQQO8++67mDx5Mlq2bKkeNzMzw/nz541qWj5vNGzceKNhIt3p+0bDHx28VuP3/qt/ixq/Nzs7Gy4uLjh69Ch69+6N/Px8NGrUCPHx8Rg1ahQA4MqVK2jTpg2Sk5PRvXt3rbardWd2/PhxPHjwAH5+fujWrRvWrFmjvtM9ERGJi6IW/6hUKhQUFGgsKpVKq/3m5+cDgHr2e0pKCsrKyhAYGKhep3Xr1vD09ERycrLWx6N1mHXv3h1ffPEF7ty5g4kTJ2Lr1q3w8PBARUUFEhMT8eDBA613SkREhmWiqPkSHR0NOzs7jSU6OvqZ+6yoqMD06dPRs2dPtGvXDsCjuRjm5uawt7fXWNfV1RWZmZnaH49ORw/AysoKb7zxBo4fP46LFy9i5syZWLJkCVxcXPDyyy/rujkiIjKA2oRZZGQk8vPzNZbIyMhn7jM8PBy//vortm7dqv/jqc2bfX19ERMTgz///BPffPONvmoiIiIjplQqYWtrq7EolcqnvmfKlCn47rvvcPjwYTRp0kQ97ubmhtLSUuTl5Wmsn5WVBTc3N61r0ssNkE1NTREcHIw9e/boY3NERFTHFApFjRddCIKAKVOmYPfu3Th06BB8fHw0Xvfz84OZmRkOHjyoHktLS8OtW7fg7++v9X70PD+GiIjEoL7uABIeHo74+Hh8++23sLGxUV8Hs7Ozg6WlJezs7DBhwgRERETA0dERtra2mDp1Kvz9/bWeyQgwzIiIZKm+buSxbt06AECfPn00xmNjYxEWFgYAWL58OUxMTDBy5EioVCoMHDgQa9eu1Wk/Wn/PTEz4PTPjxu+ZEelO398zW3Esvcbvnd7L59kr1TN2ZkREMsQbDRMRERkZdmZERDJkpDe/rzGGGRGRDJlAWmnGMCMikiF2ZkREJHpSmwDCMCMikiFjfWJ0TXE2IxERiR47MyIiGZJYY8YwIyKSI6mdZmSYERHJkMSyjGFGRCRHUpswwTAjIpIhXZ9LZuykFs5ERCRD7MyIiGRIWn0Zw4yISJY4m5GIiERPWlHGMCMikiWJNWYMMyIiOeJsRiIiIiPDzoyISIak1skwzIiIZEhqpxkZZkREMiStKGOYERHJEjszEcjIKzZ0CfQUnk4NDV0CVcOh6xRDl0DVKD63Rq/bk9o1M6kdDxERyZAkOzMiIno6nmYkIiLRk1aUMcyIiGRJYo0Zw4yISI5MJNabMcyIiGRIap0ZZzMSEZHosTMjIpIhBU8zEhGR2EntNCPDjIhIhjgBhIiIRI+dGRERiZ7Uw
oyzGYmISPTYmRERyRBnMxIRkeiZSCvLGGZERHLEzoyIiESPE0CIiIiMDDszIiIZ4mlGIiISPU4AISIi0WNnRkREoie1CSAMMyIiGZJYlnE2IxERiR87MyIiGTKR2HlGhhkRkQxJK8oYZkRE8iSxNGOYERHJEKfmExGR6EnskhlnMxIRkfixMyMikiGJNWbszIiIZElRi0UHSUlJGD58ODw8PKBQKJCQkKDxuiAI+OCDD+Du7g5LS0sEBgbi6tWrOh8Ow4yISIYUtfhHF0VFRejYsSM+++yzKl+PiYnBqlWrsH79epw6dQpWVlYYOHAgSkpKdNoPTzMSEclQfU0AGTx4MAYPHlzla4IgYMWKFXj//fcRFBQEAPjqq6/g6uqKhIQEjB07Vuv9sDMjIpKh2pxlVKlUKCgo0FhUKpXONaSnpyMzMxOBgYHqMTs7O3Tr1g3Jyck6bYthRkREOomOjoadnZ3GEh0drfN2MjMzAQCurq4a466ururXtMXTjEREclSL04yRkZGIiIjQGFMqlbUsqHYYZkREMlSbO4AolUq9hJebmxsAICsrC+7u7urxrKwsdOrUSadt8TQjEZEMKRQ1X/TFx8cHbm5uOHjwoHqsoKAAp06dgr+/v07bYmdGRCRD9fWl6cLCQly7dk39c3p6OlJTU+Ho6AhPT09Mnz4dH374IVq2bAkfHx/Mnz8fHh4eCA4O1mk/DDMiIjmqpzQ7c+YM+vbtq/758bW20NBQbNq0CXPmzEFRURHefvtt5OXl4cUXX8T+/fthYWGh034UgiAIeq3cCPye9dDQJdBTeDo1NHQJVA2HrlMMXQJVo/jcGr1u7/ztBzV+b8emNnqsRD/YmRERyRAfAUNERKIntUfAMMyIiGRIYlnGMDNWE0YPwd3MO5XGhwSPxuSISANURE/aGr8FcbEbkZOTjVa+rfHevPlo36GDocuSrVnjB2Dxu0FYs+UwZi/dqR7v1sEHC8OHoWt7b5SXV+DC7//D8Hc+Q4mqzIDVGgGJpRnDzEgt+/xrVJRXqH/+I/0a5kdMxot9BxiwKnps/w/7sDQmGu8viEL79h2xZXMcJk+cgG+/2w8nJydDlyc7fm09MWFkT1z4/U+N8W4dfPDtmnewNPZHRPx7O/4qr0CHVo1RUSG5eW86k9o1M35p2kjZ2TvCwclZvZw+cQzujZuiXSc/Q5dGADbHxSJk1GgEjxiJ5i1a4P0FUbCwsEDCrp3PfjPplZWlOWI/DsM7i79BXkGxxmsxM0OwdusRLI1NxOUbmbj6x13sTDyH0rK/DFQt1RWGmQiUlZXhcOI+BA4JgkJqV21FqKy0FJd/u4Tu/j3UYyYmJujevQcunD9nwMrkaUXkGOw/9isOn0rTGG/kYI0XOvgg+14hDm+KwM2fPsaP/5mGHp2aGahS42IMdwDRJ6MOs9u3b+ONN9546jpVPYqgtAaPIjBmJ48dRlHhA/QfPNzQpRCA+3n3UV5eXul0opOTE3JycgxUlTy9MtAPnVo3xfzVeyq95tPEGQDwr4lD8OWuEwgKX4vUy7exb8NUNPdsVN+lGp16etB0vTHqMLt37x7i4uKeuk5VjyLYsGppPVVYPxK/T4Bft55wcnYxdClERqOJqz0+mT0S4/+1CarSyqcNTUwe/bG7cedxbN5zEufT/sScT3fh95t3ERqk233/JEliaWbQCSB79lT+29Tf3bhx45nbqOpRBLfyymtVlzG5m5mB8ymnELlYWgEtZg72DjA1NUVubq7GeG5uLpydnQ1Ulfw838YTrk62SI6fqx5r0MAUL3ZujkljeqPDiMUAgMs3NJ+LlZaeiaZuDvVaqzGS2gQQg4ZZcHAwFAoFnnZHrWddI6rqUQTmxdK5ndVP+/bAzt4RXf17GboU+j9m5uZo0/Y5nDqZjH79Hz0ht6KiAqdOJWPsq/80cHXycfiXNPiN+khj7POofyItPQufbkpE+p85yLibh1bemmc0Wni54Meff6vPUo2Ss
V77qimDnmZ0d3fHrl27UFFRUeVy9uxZQ5ZncBUVFfjph2/Rb9AwmDbgtyiMyWuh47Frx3+xJ2E3bly/jg8XLURxcTGCR4QYujTZKHyowm/X72gsRcWluJdfhN+uP/qO5vK4n/DO2D4YEdgJzZo644N3hsLX2xWbEpINXD3pm0H/hPTz80NKSgqCgoKqfP1ZXZvUpZ45heysTAwYGmzoUugJgwYPwf1797B2zSrk5GTDt3UbrN3wHzjxNKNRWRN/BBZKM8TMHAkHu4a4+Pv/MGzyGqT/yYk6EmvMDHvX/GPHjqGoqAiDBg2q8vWioiKcOXMGAQEBOm2Xd803brxrvvHiXfONl77vml+bPydbuRrf/8MG7cx69Xr6dSArKyudg4yIiJ6NE0CIiEj0pDYBhGFGRCRDEssy4/7SNBERkTbYmRERyZHEWjOGGRGRDHECCBERiR4ngBARkehJLMsYZkREsiSxNONsRiIiEj12ZkREMsQJIEREJHqcAEJERKInsSxjmBERyRE7MyIikgBppRlnMxIRkeixMyMikiGeZiQiItGTWJYxzIiI5IidGRERiR6/NE1EROInrSzjbEYiIhI/dmZERDIkscaMYUZEJEecAEJERKLHCSBERCR+0soyhhkRkRxJLMs4m5GIiMSPnRkRkQxxAggREYkeJ4AQEZHoSa0z4zUzIiISPXZmREQyxM6MiIjIyLAzIyKSIU4AISIi0ZPaaUaGGRGRDEksyxhmRESyJLE04wQQIiISPXZmREQyxAkgREQkepwAQkREoiexLOM1MyIiWVLUYqmBzz77DN7e3rCwsEC3bt3wyy+/1PYINDDMiIhkSFGLf3S1bds2REREYMGCBTh79iw6duyIgQMH4u7du3o7HoYZERHVqWXLluGtt97C+PHj0bZtW6xfvx4NGzbEl19+qbd9MMyIiGRIoaj5olKpUFBQoLGoVKoq91NaWoqUlBQEBgaqx0xMTBAYGIjk5GS9HY8kJ4C0cm1o6BL0RqVSITo6GpGRkVAqlYYuh/5Gir+b4nNrDF2C3kjx96NPFrX403/hh9GIiorSGFuwYAEWLlxYad2cnByUl5fD1dVVY9zV1RVXrlypeRFPUAiCIOhta6R3BQUFsLOzQ35+PmxtbQ1dDv0NfzfGjb+fuqNSqSp1Ykqlssq/NGRkZKBx48Y4ceIE/P391eNz5szB0aNHcerUKb3UJMnOjIiI6k51wVUVZ2dnmJqaIisrS2M8KysLbm5uequJ18yIiKjOmJubw8/PDwcPHlSPVVRU4ODBgxqdWm2xMyMiojoVERGB0NBQdOnSBS+88AJWrFiBoqIijB8/Xm/7YJgZOaVSiQULFvACthHi78a48fdjPMaMGYPs7Gx88MEHyMzMRKdOnbB///5Kk0JqgxNAiIhI9HjNjIiIRI9hRkREoscwIyIi0WOYERGR6DHMjNyuXbvw0ksvwcnJCQqFAqmpqYYuif5PXT/SgmomKSkJw4cPh4eHBxQKBRISEgxdEtUDhpmRKyoqwosvvoh///vfhi6F/qY+HmlBNVNUVISOHTvis88+M3QpVI84NV8kbt68CR8fH5w7dw6dOnUydDmy161bN3Tt2hVr1jy6MW9FRQWaNm2KqVOn4r333jNwdfSYQqHA7t27ERwcbOhSqI6xMyPSUX090oKItMcwI9LR0x5pkZmZaaCqiOSNYWZEtmzZAmtra/Vy7NgxQ5dERCQKvDejEXn55ZfRrVs39c+NGzc2YDVUnfp6pAURaY+dmRGxsbFBixYt1IulpaWhS6Iq1NcjLYhIe+zMjNy9e/dw69YtZGRkAADS0tIAAG5ubuwCDKg+HmlBNVNYWIhr166pf05PT0dqaiocHR3h6elpwMqoTglk1GJjYwUAlZYFCxYYujTZW716teDp6SmYm5sLL7zwgnDy5ElDl0SCIBw+fLjK/2dCQ0MNXRrVIX7PjIiIRI/XzIiISPQYZkREJHoMMyIiEj2GGRERi
R7DjIiIRI9hRkREoscwIyIi0WOYERGR6DHMiLQUFham8ZDHPn36YPr06fVex5EjR6BQKJCXl1fv+yYyVgwzEr2wsDAoFAooFAqYm5ujRYsWWLRoEf7666863e+uXbuwePFirdZlABHVLd5omCRh0KBBiI2NhUqlwr59+xAeHg4zMzNERkZqrFdaWgpzc3O97NPR0VEv2yGi2mNnRpKgVCrh5uYGLy8vTJ48GYGBgdizZ4/61OBHH30EDw8P+Pr6AgBu376N0aNHw97eHo6OjggKCsLNmzfV2ysvL0dERATs7e3h5OSEOXPm4MnbmD55mlGlUmHu3Llo2rQplEolWrRogY0bN+LmzZvo27cvAMDBwQEKhQJhYWEAHj06Jjo6Gj4+PrC0tETHjh2xY8cOjf3s27cPrVq1gqWlJfr27atRJxE9wjAjSbK0tERpaSkA4ODBg0hLS0NiYiK+++47lJWVYeDAgbCxscGxY8fw888/w9raGoMGDVK/59NPP8WmTZvw5Zdf4vjx47h37x5279791H2+/vrr+Oabb7Bq1SpcvnwZGzZsgLW1NZo2bYqdO3cCePQInzt37mDlypUAgOjoaHz11VdYv349Ll26hBkzZuCf//wnjh49CuBR6IaEhGD48OFITU3Fm2++iffee6+uPjYi8TLwXfuJai00NFQICgoSBEEQKioqhMTEREGpVAqzZs0SQkNDBVdXV0GlUqnX37x5s+Dr6ytUVFSox1QqlWBpaSkcOHBAEARBcHd3F2JiYtSvl5WVCU2aNFHvRxAEISAgQJg2bZogCIKQlpYmABASExOrrPHxY0nu37+vHispKREaNmwonDhxQmPdCRMmCK+++qogCIIQGRkptG3bVuP1uXPnVtoWkdzxmhlJwnfffQdra2uUlZWhoqIC48aNw8KFCxEeHo727dtrXCc7f/48rl27BhsbG41tlJSU4Pr168jPz8edO3fQrVs39WsNGjRAly5dKp1qfCw1NRWmpqYICAjQuuZr167h4cOHGDBggMZ4aWkpnn/+eQDA5cuXNeoAwKdZE1WBYUaS0LdvX6xbtw7m5ubw8PBAgwb//z9tKysrjXULCwvh5+eHLVu2VNpOo0aNarR/S0tLnd9TWFgIAPj+++/RuHFjjdeUSmWN6iCSK4YZSYKVlRVatGih1bqdO3fGtm3b4OLiAltb2yrXcXd3x6lTp9C7d28AwF9//YWUlBR07ty5yvXbt2+PiooKHD16FIGBgZVef9wZlpeXq8fatm0LpVKJW7duVdvRtWnTBnv27NEYO3ny5LMPkkhmOAGEZOcf//gHnJ2dERQUhGPHjiE9PR1HjhzBu+++iz///BMAMG3aNCxZsgQJCQm4cuUK3nnnnad+R8zb2xuhoaF44403kJCQoN7mf//7XwCAl5cXFAoFvvvuO2RnZ6OwsBA2NjaYNWsWZsyYgbi4OFy/fh1nz57F6tWrERcXBwCYNGkSrl69itmzZyMtLQ3x8fHYtGlTXX9ERKLDMCPZadiwIZKSkuDp6YmQkBC0adMGEyZMQElJibpTmzlzJl577TWEhobC398fNjY2GDFixFO3u27dOowaNQrvvPMOWrdujbfeegtFRUUAgMaNGyMqKgrvvfceXF1dMWXKFADA4sWLMX/+fERHR6NNmzYYNGgQvv/+e/j4+AAAPD09sXPnTiQkJKBjx45Yv349Pv744zr8dIjESSFUd0WbiIhIJNiZERGR6DHMiIhI9BhmREQkegwzIiISPYYZERGJHsOMiIhEj2FGRESixzAjIiLRY5gREZHoMcyIiEj0GGZERCR6/w8EFJYYCdckWwAAAABJRU5ErkJggg==\n" + }, + "metadata": {} + } + ], + "id": "mZLu8fRk4A3A" + }, + { + "cell_type": "markdown", + "source": [ + "**Classification Report**" + ], + 
"metadata": { + "id": "eDqgpX_a4A3B" + }, + "id": "eDqgpX_a4A3B" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Jq_ES16g4A3B", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "95db0a8b-3659-4319-ecab-82c051217d1f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.878963 0.878963 0.864731 0.871679\n" + ] + } + ], + "source": [ + "# Compute accuracy, recall, precision, and F1 on the training data\n", + "NN_train_st = model_performance_classification_sklearn(y_train, y_train_preds_st)\n", + "print(\"Training performance:\\n\", NN_train_st)" + ], + "id": "Jq_ES16g4A3B" + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7MUEidM44A3B", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b3ebc9fb-7278-4621-cb18-5f53190b284a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance:\n", + " Accuracy Recall Precision F1\n", + "0 0.788732 0.788732 0.780908 0.784708\n" + ] + } + ], + "source": [ + "# Compute accuracy, recall, precision, and F1 on the test data\n", + "NN_test_st = model_performance_classification_sklearn(y_test, y_test_preds_st)\n", + "print(\"Testing performance:\\n\", NN_test_st)" + ], + "id": "7MUEidM44A3B" + }, + { + "cell_type": "markdown", + "id": "gsmrYpkrFY2A", + "metadata": { + "id": "gsmrYpkrFY2A" + }, + "source": [ + "### **Model Performance Summary and Final Model Selection**" + ] + }, + { + "cell_type": "code", + "source": [ + "# Concatenate the training performance metrics from different models into a single DataFrame\n", + "models_train_comp_df = pd.concat(\n", + " [\n", + " rf_train_wv.T, # Random Forest using Word2Vec embeddings\n", + " NN_train_wv.T, # Neural Network using Word2Vec embeddings\n", + " rf_train_st.T, # Random Forest using Sentence Transformer embeddings\n", + " NN_train_st.T # Neural Network using 
Sentence Transformer embeddings\n", + " ],\n", + " axis=1 # Concatenate along columns (i.e., each model's metrics form one column)\n", + ")\n", + "\n", + "# Assigning meaningful column names for each model for clarity in the output DataFrame\n", + "models_train_comp_df.columns = [\n", + " \"Word2Vec (Random Forest)\",\n", + " \"Word2Vec (Neural Network)\",\n", + " \"Sentence Transformer (Random Forest)\",\n", + " \"Sentence Transformer (Neural Network)\"\n", + "]\n", + "\n", + "# Print the training performance comparison table\n", + "print(\"Training performance comparison:\")\n", + "models_train_comp_df" + ], + "metadata": { + "id": "FmgvAlKBWjR-", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "68dd172d-e821-4c30-82e3-8df928e36a4c" + }, + "id": "FmgvAlKBWjR-", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training performance comparison:\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Word2Vec (Random Forest) Word2Vec (Neural Network) \\\n", + "Accuracy 0.755043 0.636888 \n", + "Recall 0.755043 0.636888 \n", + "Precision 0.778891 0.664626 \n", + "F1 0.720565 0.516128 \n", + "\n", + " Sentence Transformer (Random Forest) \\\n", + "Accuracy 0.801153 \n", + "Recall 0.801153 \n", + "Precision 0.831835 \n", + "F1 0.775232 \n", + "\n", + " Sentence Transformer (Neural Network) \n", + "Accuracy 0.878963 \n", + "Recall 0.878963 \n", + "Precision 0.864731 \n", + "F1 0.871679 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Metric     Word2Vec (Random Forest)  Word2Vec (Neural Network)  Sentence Transformer (Random Forest)  Sentence Transformer (Neural Network)
Accuracy   0.755043                  0.636888                   0.801153                              0.878963
Recall     0.755043                  0.636888                   0.801153                              0.878963
Precision  0.778891                  0.664626                   0.831835                              0.864731
F1         0.720565                  0.516128                   0.775232                              0.871679
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "models_train_comp_df", + "summary": "{\n \"name\": \"models_train_comp_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Word2Vec (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.02400850605954694,\n \"min\": 0.720564904885155,\n \"max\": 0.7788911179618219,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7550432276657061,\n 0.7788911179618219,\n 0.720564904885155\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Word2Vec (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.06630504860294313,\n \"min\": 0.516128150662598,\n \"max\": 0.6646264228575315,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.6368876080691642,\n 0.6646264228575315,\n 0.516128150662598\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.02314897229764395,\n \"min\": 0.7752322113738629,\n \"max\": 0.8318353116624009,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.8011527377521613,\n 0.8318353116624009,\n 0.7752322113738629\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.006828124944117901,\n \"min\": 0.8647306583906008,\n \"max\": 0.8789625360230547,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.8789625360230547,\n 0.8647306583906008,\n 0.8716785041639248\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 72 + } + ] + }, + { + "cell_type": "code", + "source": [ + "# Concatenate the testing performance metrics from different models into a single DataFrame\n", + "models_test_comp_df = pd.concat(\n", + " [\n", + " rf_test_wv.T, # Random Forest using Word2Vec 
embeddings\n", + " NN_test_wv.T, # Neural Network using Word2Vec embeddings\n", + " rf_test_st.T, # Random Forest using Sentence Transformer embeddings\n", + " NN_test_st.T # Neural Network using Sentence Transformer embeddings\n", + " ],\n", + " axis=1 # Concatenate along columns so each model's test metrics appear as one column\n", + ")\n", + "\n", + "# Set descriptive column names for clarity in the resulting comparison table\n", + "models_test_comp_df.columns = [\n", + " \"Word2Vec (Random Forest)\",\n", + " \"Word2Vec (Neural Network)\",\n", + " \"Sentence Transformer (Random Forest)\",\n", + " \"Sentence Transformer (Neural Network)\"\n", + "]\n", + "\n", + "# Print the testing performance comparison table\n", + "print(\"Testing performance comparison:\")\n", + "models_test_comp_df" + ], + "metadata": { + "id": "APzbgeHrWjOj", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "2095fa14-43c0-42dc-e342-c8229f7750b9" + }, + "id": "APzbgeHrWjOj", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Testing performance comparison:\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Word2Vec (Random Forest) Word2Vec (Neural Network) \\\n", + "Accuracy 0.746479 0.760563 \n", + "Recall 0.746479 0.760563 \n", + "Precision 0.687934 0.804628 \n", + "F1 0.680114 0.669911 \n", + "\n", + " Sentence Transformer (Random Forest) \\\n", + "Accuracy 0.718310 \n", + "Recall 0.718310 \n", + "Precision 0.551745 \n", + "F1 0.624105 \n", + "\n", + " Sentence Transformer (Neural Network) \n", + "Accuracy 0.788732 \n", + "Recall 0.788732 \n", + "Precision 0.780908 \n", + "F1 0.784708 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Word2Vec (Random Forest)Word2Vec (Neural Network)Sentence Transformer (Random Forest)Sentence Transformer (Neural Network)
Accuracy0.7464790.7605630.7183100.788732
Recall0.7464790.7605630.7183100.788732
Precision0.6879340.8046280.5517450.780908
F10.6801140.6699110.6241050.784708
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + " \n", + " \n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "models_test_comp_df", + "summary": "{\n \"name\": \"models_test_comp_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Word2Vec (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.03619949158384784,\n \"min\": 0.6801140174379611,\n \"max\": 0.7464788732394366,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7464788732394366,\n 0.687933571578726,\n 0.6801140174379611\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Word2Vec (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.05661832323426673,\n \"min\": 0.6699110653078362,\n \"max\": 0.8046277665995976,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7605633802816901,\n 0.8046277665995976,\n 0.6699110653078362\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Random Forest)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.08086640854457602,\n \"min\": 0.5517452541334966,\n \"max\": 0.7183098591549296,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7183098591549296,\n 0.5517452541334966,\n 0.6241052874624798\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence Transformer (Neural Network)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0037559350807984376,\n \"min\": 0.7809076682316118,\n \"max\": 0.7887323943661971,\n \"num_unique_values\": 3,\n \"samples\": [\n 0.7887323943661971,\n 0.7809076682316118,\n 0.7847082494969819\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 73 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "#### **Model Performance Summary:**" + ], + "metadata": { + "id": "X0yz_T4j6uJc" + }, + "id": "X0yz_T4j6uJc" + }, + { + "cell_type": "markdown", + "source": [ + " **Model Selection: 
Sentence Transformer + Neural Network**\n", + "\n", + "**Rationale:**\n", + "\n", + "1. **Best Generalization**:\n", + " The Sentence Transformer + Neural Network model achieves the highest F1 score on the test set (0.785), ahead of Word2Vec + Random Forest (0.680), Word2Vec + Neural Network (0.670), and Sentence Transformer + Random Forest (0.624), indicating the best balance of precision and recall on unseen data.\n", + "\n", + "2. **Balanced Performance**:\n", + " Training F1 = 0.872 and testing F1 = 0.785 differ by about 0.09, a moderate gap that suggests the model learned meaningful representations without severe overfitting.\n", + "\n", + "3. **Superior Feature Encoding**:\n", + " Sentence Transformers encode whole-sentence context rather than individual words, which is consistent with the higher training scores for both the Random Forest and the Neural Network. On the test set the gain appears only with the Neural Network; the Random Forest scores lower with Sentence Transformer embeddings (F1 = 0.624) than with Word2Vec embeddings (F1 = 0.680).\n", + "\n", + "4. **Neural Network Suitability**:\n", + " While Word2Vec + NN struggles with less informative input vectors, combining powerful embeddings (Sentence Transformer) with flexible learning (NN) achieves the best result of the four combinations.\n", + "\n", + "##### **Why Were the Other Models Not Chosen?**\n", + "\n", + "* **Word2Vec + RF**:\n", + " Training F1 = 0.721, Test F1 = 0.680. Although the gap is small (which shows some stability), the absolute test performance is lower than that of Sentence Transformer + NN.\n", + "\n", + "* **Word2Vec + NN**:\n", + " Low F1 in both training (F1 = 0.516) and testing (F1 = 0.670) indicates underfitting, likely due to weak input representations.\n", + "\n", + "* **Sentence Transformer + RF**:\n", + " Training F1 = 0.775 is reasonable, but test F1 = 0.624 is well below the NN version (0.785) and test precision falls to 0.552, suggesting overfitting and less flexibility in modeling complex patterns."
+ ], + "metadata": { + "id": "wI7woP0xHrwW" + }, + "id": "wI7woP0xHrwW" + }, + { + "cell_type": "markdown", + "id": "HiOLoD7BO3L-", + "metadata": { + "id": "HiOLoD7BO3L-" + }, + "source": [ + "## **Conclusions and Recommendations**" + ] + }, + { + "cell_type": "markdown", + "source": [ + "* The daily opening, high, low, and closing prices of the stock exhibit similar distributions across the different sentiment polarities, with negative-sentiment news associated with a lower value for each price.\n", + "\n", + "* The minimal variation also resulted in the prices exhibiting perfect correlation with one another, while exhibiting a very low negative correlation with volume, which might be due to selling pressure during periods of negative sentiment.\n", + "\n", + "* The stock price gradually increased over time from ~40 to ~50 in the period for which the data is available while exhibiting a monthly trend.\n", + "\n", + "* We predicted the sentiment of market news by encoding the news text with different embedding models (Word2Vec and Sentence Transformer) and training Random Forest and Neural Network classifiers on the embeddings.\n", + "\n", + "* Most models either overfit or underfit the data, with only **the Sentence Transformer + Neural Network model** yielding comparatively better performance than the others (train F1 = 0.872, test F1 = 0.785).\n", + "\n", + " * The predominance of neutral news also suggests a cautious market sentiment in this period. As such, a wider period should be considered for data collection to ensure volume and diversity in news sentiment polarities.\n", + "\n", + "* Integrating real-time sentiment analysis systems can allow financial analysts to make informed decisions and quickly respond to market sentiment changes to optimize investment strategies.\n", + "\n", + "* One can explore combining news sentiments with technical and fundamental indicators of the stock and introduce data of other similar stocks for a more comprehensive market analysis."
+ ], + "metadata": { + "id": "NMR7mKugPFme" + }, + "id": "NMR7mKugPFme" + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mQvaNDqQ3BJa" + }, + "source": [ + " Power Ahead \n", + "___" + ], + "id": "mQvaNDqQ3BJa" + } + ], + "metadata": { + "colab": { + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "1230a037e0b9479caa9db62c5f9ecb6a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_ed6c19298c4747a59992a79d99cdaaa7", + "IPY_MODEL_e010222da3cf4751995a51ffc82560ef", + "IPY_MODEL_9f3e3b616bcf482d9fd91a2b54d8d82a" + ], + "layout": "IPY_MODEL_6838e428d6d54a3f80d34638812441e6" + } + }, + "ed6c19298c4747a59992a79d99cdaaa7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_991c2589b56f444486443a31bef569d5", + "placeholder": "​", + "style": 
"IPY_MODEL_4ed01d32996f47f38fbaba687cee45ae", + "value": "Batches: 100%" + } + }, + "e010222da3cf4751995a51ffc82560ef": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a0ce999dbcfe427ba08202bc989b1c33", + "max": 11, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f598184dc72f443ab0ada8de6cf076ad", + "value": 11 + } + }, + "9f3e3b616bcf482d9fd91a2b54d8d82a": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_96e9e320eec74a2e9094935af065b254", + "placeholder": "​", + "style": "IPY_MODEL_fb854fb10f3e415c9c4c0ac176fb74b4", + "value": " 11/11 [00:44<00:00,  3.41s/it]" + } + }, + "6838e428d6d54a3f80d34638812441e6": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + 
"display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "991c2589b56f444486443a31bef569d5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"4ed01d32996f47f38fbaba687cee45ae": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "a0ce999dbcfe427ba08202bc989b1c33": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f598184dc72f443ab0ada8de6cf076ad": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": 
"ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "96e9e320eec74a2e9094935af065b254": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "fb854fb10f3e415c9c4c0ac176fb74b4": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2fb4071397a049f888159e2cbec3ec99": { + "model_module": 
"@jupyter-widgets/controls", + "model_name": "HBoxModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_280899c6e305423a8d6f20dd395b4e10", + "IPY_MODEL_f68b5d3640c54560b38a29f32deb33a8", + "IPY_MODEL_115335a31d874aba99efb63fa2830e09" + ], + "layout": "IPY_MODEL_7b371d0574e04f98bf87a88f722b8477" + } + }, + "280899c6e305423a8d6f20dd395b4e10": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_9095b2b09d4a45928fbc3cf45eb35cbb", + "placeholder": "​", + "style": "IPY_MODEL_971a53d397494d76b8b5c4a2abb954f7", + "value": "Batches: 100%" + } + }, + "f68b5d3640c54560b38a29f32deb33a8": { + "model_module": "@jupyter-widgets/controls", + "model_name": "FloatProgressModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_54dd267783314434a5389477c97974e5", + "max": 3, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_5bfd23c3586e4615909878610be8e24b", + "value": 3 + 
} + }, + "115335a31d874aba99efb63fa2830e09": { + "model_module": "@jupyter-widgets/controls", + "model_name": "HTMLModel", + "model_module_version": "1.5.0", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4eda58c3e66e40db98ea40fc40ebb109", + "placeholder": "​", + "style": "IPY_MODEL_99ee6edbe0574c778200ae65b87d7e0f", + "value": " 3/3 [00:09<00:00,  2.81s/it]" + } + }, + "7b371d0574e04f98bf87a88f722b8477": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "9095b2b09d4a45928fbc3cf45eb35cbb": { + "model_module": 
"@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "971a53d397494d76b8b5c4a2abb954f7": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "54dd267783314434a5389477c97974e5": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5bfd23c3586e4615909878610be8e24b": { + "model_module": "@jupyter-widgets/controls", + "model_name": "ProgressStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4eda58c3e66e40db98ea40fc40ebb109": { + "model_module": "@jupyter-widgets/base", + "model_name": "LayoutModel", + "model_module_version": "1.2.0", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": 
null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "99ee6edbe0574c778200ae65b87d7e0f": { + "model_module": "@jupyter-widgets/controls", + "model_name": "DescriptionStyleModel", + "model_module_version": "1.5.0", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From c2bb62bc2815133aef7102b5316b0810bf735176 Mon Sep 17 00:00:00 2001 From: biplob <110578485+bks1984@users.noreply.github.com> Date: Tue, 2 Dec 2025 11:23:34 +0530 Subject: [PATCH 4/4] Created using Colab --- Copy_of_Logistic_Regression.ipynb | 726 ++++++++++++++++++++++++++++++ 1 file changed, 726 insertions(+) create mode 100644 Copy_of_Logistic_Regression.ipynb diff --git a/Copy_of_Logistic_Regression.ipynb b/Copy_of_Logistic_Regression.ipynb new file mode 100644 index 0000000..41ae87a --- /dev/null +++ b/Copy_of_Logistic_Regression.ipynb @@ -0,0 +1,726 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + 
"cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# **What is Logistic Regression**\n", + "\n", + "As depicted in this ML algorithm chart, Logistic Regression is part of [**Supervised Machine Learning**](https://colab.research.google.com/drive/1WHMXFxU8DK-lgTgaHFZ2oLtkpFM25bDm?usp=sharing). Logistic Regression is used for **Classification** problems." + ], + "metadata": { + "id": "qC487XkXHW2R" + } + }, + { + "cell_type": "markdown", + "source": [ + "![image.png]()" + ], + "metadata": { + "id": "Abdg7AmMHXZJ" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DEkuSfNBY6g3" + }, + "source": [ + "In Linear Regression, we determine the value of a continuous dependent variable. **Logistic Regression** is generally used for **classification** purposes. Unlike Linear Regression, the dependent variable can take only a limited number of values, i.e., the **dependent variable is categorical**.\n", + "\n", + "**Classification techniques** are an essential part of machine learning and data mining applications.\n", + "\n", + "**Approximately 70% of problems in Data Science are classification problems.**\n", + "\n", + "Many classification techniques are available, and **logistic regression is a common and useful method for solving binary classification problems**.\n", + "\n", + "\n", + "**Types of Logistic Regression:**\n", + "\n", + "\n", + "**Binary Logistic Regression**\n", + "When the number of possible outcomes is only two, it is called Binary Logistic Regression. The target variable has only two possible outcomes, such as Spam or Not Spam, Cancer or No Cancer.\n", + "\n", + "**Multinomial Logistic Regression**: The target variable has three or more nominal categories, such as predicting the type of wine.\n", + "Another example is the **Iris dataset**, a very famous example of multi-class
classification. Other examples include classifying articles, blogs, or documents by category.\n", + "\n", + "**Ordinal Logistic Regression:** The target variable has three or more ordinal categories, such as a restaurant or product rating from 1 to 5.\n", + "\n", + "\n", + "Logistic Regression is one of the simplest and most commonly used Machine Learning algorithms for **two-class classification**. It is easy to implement and can be used as the baseline for any binary classification problem.\n", + "\n", + "Famous examples:\n", + "\n", + "**Spam detection**,\n", + "\n", + "**Diabetes prediction**,\n", + "\n", + "**Churn prediction**, etc.\n", + "\n", + "Its fundamental concepts are also useful in deep learning. Logistic regression describes and estimates the relationship between one binary dependent variable and one or more independent variables.\n", + "\n", + "\n", + "Let’s look at how logistic regression can be used for classification tasks.\n", + "In Linear Regression, the output is the weighted sum of inputs.\n", + "\n", + "Logistic Regression is a generalized Linear Regression in the sense that we don’t output the weighted sum of inputs directly, but pass it through a function that maps any real value to a value between 0 and 1.\n", + "![Capture.PNG]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g5M2KZjGY6jX" + }, + "source": [ + "The activation function that is used is known as the sigmoid function, $\\sigma(z) = \\frac{1}{1 + e^{-z}}$. The plot of the sigmoid function looks like\n", + "\n", + "![image.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FJ7jd-bPvvQe" + }, + "source": [ + "## Linear Regression vs. Logistic Regression\n", + "Linear regression gives you a continuous output, but logistic regression provides a discrete output. Examples of continuous output are house prices and stock prices. Examples of discrete output are predicting whether a patient has cancer or whether a customer will churn. 
Linear regression is estimated using Ordinary Least Squares (OLS), while logistic regression is estimated using Maximum Likelihood Estimation (MLE)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pA5A5bOhvnVg" + }, + "source": [ + "![image.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3uP7Itp8y9OK" + }, + "source": [ + "## Example: Let's build a logistic regression model using Scikit-learn for diabetes prediction\n", + "\n", + "Here, we are going to predict diabetes using a Logistic Regression classifier." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aTcJK4nUyrJ5" + }, + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import pandas as pd\n", + "import seaborn as sns\n", + "\n", + "%matplotlib inline" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "resources": { + "http://localhost:8080/nbextensions/google.colab/files.js": { + "data": 
"Ly8gQ29weXJpZ2h0IDIwMTcgR29vZ2xlIExMQwovLwovLyBMaWNlbnNlZCB1bmRlciB0aGUgQXBhY2hlIExpY2Vuc2UsIFZlcnNpb24gMi4wICh0aGUgIkxpY2Vuc2UiKTsKLy8geW91IG1heSBub3QgdXNlIHRoaXMgZmlsZSBleGNlcHQgaW4gY29tcGxpYW5jZSB3aXRoIHRoZSBMaWNlbnNlLgovLyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2UgYXQKLy8KLy8gICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKLy8KLy8gVW5sZXNzIHJlcXVpcmVkIGJ5IGFwcGxpY2FibGUgbGF3IG9yIGFncmVlZCB0byBpbiB3cml0aW5nLCBzb2Z0d2FyZQovLyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRlZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAovLyBXSVRIT1VUIFdBUlJBTlRJRVMgT1IgQ09ORElUSU9OUyBPRiBBTlkgS0lORCwgZWl0aGVyIGV4cHJlc3Mgb3IgaW1wbGllZC4KLy8gU2VlIHRoZSBMaWNlbnNlIGZvciB0aGUgc3BlY2lmaWMgbGFuZ3VhZ2UgZ292ZXJuaW5nIHBlcm1pc3Npb25zIGFuZAovLyBsaW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCi8qKgogKiBAZmlsZW92ZXJ2aWV3IEhlbHBlcnMgZm9yIGdvb2dsZS5jb2xhYiBQeXRob24gbW9kdWxlLgogKi8KKGZ1bmN0aW9uKHNjb3BlKSB7CmZ1bmN0aW9uIHNwYW4odGV4dCwgc3R5bGVBdHRyaWJ1dGVzID0ge30pIHsKICBjb25zdCBlbGVtZW50ID0gZG9jdW1lbnQuY3JlYXRlRWxlbWVudCgnc3BhbicpOwogIGVsZW1lbnQudGV4dENvbnRlbnQgPSB0ZXh0OwogIGZvciAoY29uc3Qga2V5IG9mIE9iamVjdC5rZXlzKHN0eWxlQXR0cmlidXRlcykpIHsKICAgIGVsZW1lbnQuc3R5bGVba2V5XSA9IHN0eWxlQXR0cmlidXRlc1trZXldOwogIH0KICByZXR1cm4gZWxlbWVudDsKfQoKLy8gTWF4IG51bWJlciBvZiBieXRlcyB3aGljaCB3aWxsIGJlIHVwbG9hZGVkIGF0IGEgdGltZS4KY29uc3QgTUFYX1BBWUxPQURfU0laRSA9IDEwMCAqIDEwMjQ7CgpmdW5jdGlvbiBfdXBsb2FkRmlsZXMoaW5wdXRJZCwgb3V0cHV0SWQpIHsKICBjb25zdCBzdGVwcyA9IHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCk7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICAvLyBDYWNoZSBzdGVwcyBvbiB0aGUgb3V0cHV0RWxlbWVudCB0byBtYWtlIGl0IGF2YWlsYWJsZSBmb3IgdGhlIG5leHQgY2FsbAogIC8vIHRvIHVwbG9hZEZpbGVzQ29udGludWUgZnJvbSBQeXRob24uCiAgb3V0cHV0RWxlbWVudC5zdGVwcyA9IHN0ZXBzOwoKICByZXR1cm4gX3VwbG9hZEZpbGVzQ29udGludWUob3V0cHV0SWQpOwp9CgovLyBUaGlzIGlzIHJvdWdobHkgYW4gYXN5bmMgZ2VuZXJhdG9yIChub3Qgc3VwcG9ydGVkIGluIHRoZSBicm93c2VyIHlldCksCi8vIHdoZXJlIHRoZXJlIGFyZSBtdWx0aXBsZSBhc3luY2hyb25vdXMgc3RlcHMgYW5kIHRoZSBQeXRob24
gc2lkZSBpcyBnb2luZwovLyB0byBwb2xsIGZvciBjb21wbGV0aW9uIG9mIGVhY2ggc3RlcC4KLy8gVGhpcyB1c2VzIGEgUHJvbWlzZSB0byBibG9jayB0aGUgcHl0aG9uIHNpZGUgb24gY29tcGxldGlvbiBvZiBlYWNoIHN0ZXAsCi8vIHRoZW4gcGFzc2VzIHRoZSByZXN1bHQgb2YgdGhlIHByZXZpb3VzIHN0ZXAgYXMgdGhlIGlucHV0IHRvIHRoZSBuZXh0IHN0ZXAuCmZ1bmN0aW9uIF91cGxvYWRGaWxlc0NvbnRpbnVlKG91dHB1dElkKSB7CiAgY29uc3Qgb3V0cHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKG91dHB1dElkKTsKICBjb25zdCBzdGVwcyA9IG91dHB1dEVsZW1lbnQuc3RlcHM7CgogIGNvbnN0IG5leHQgPSBzdGVwcy5uZXh0KG91dHB1dEVsZW1lbnQubGFzdFByb21pc2VWYWx1ZSk7CiAgcmV0dXJuIFByb21pc2UucmVzb2x2ZShuZXh0LnZhbHVlLnByb21pc2UpLnRoZW4oKHZhbHVlKSA9PiB7CiAgICAvLyBDYWNoZSB0aGUgbGFzdCBwcm9taXNlIHZhbHVlIHRvIG1ha2UgaXQgYXZhaWxhYmxlIHRvIHRoZSBuZXh0CiAgICAvLyBzdGVwIG9mIHRoZSBnZW5lcmF0b3IuCiAgICBvdXRwdXRFbGVtZW50Lmxhc3RQcm9taXNlVmFsdWUgPSB2YWx1ZTsKICAgIHJldHVybiBuZXh0LnZhbHVlLnJlc3BvbnNlOwogIH0pOwp9CgovKioKICogR2VuZXJhdG9yIGZ1bmN0aW9uIHdoaWNoIGlzIGNhbGxlZCBiZXR3ZWVuIGVhY2ggYXN5bmMgc3RlcCBvZiB0aGUgdXBsb2FkCiAqIHByb2Nlc3MuCiAqIEBwYXJhbSB7c3RyaW5nfSBpbnB1dElkIEVsZW1lbnQgSUQgb2YgdGhlIGlucHV0IGZpbGUgcGlja2VyIGVsZW1lbnQuCiAqIEBwYXJhbSB7c3RyaW5nfSBvdXRwdXRJZCBFbGVtZW50IElEIG9mIHRoZSBvdXRwdXQgZGlzcGxheS4KICogQHJldHVybiB7IUl0ZXJhYmxlPCFPYmplY3Q+fSBJdGVyYWJsZSBvZiBuZXh0IHN0ZXBzLgogKi8KZnVuY3Rpb24qIHVwbG9hZEZpbGVzU3RlcChpbnB1dElkLCBvdXRwdXRJZCkgewogIGNvbnN0IGlucHV0RWxlbWVudCA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKGlucHV0SWQpOwogIGlucHV0RWxlbWVudC5kaXNhYmxlZCA9IGZhbHNlOwoKICBjb25zdCBvdXRwdXRFbGVtZW50ID0gZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQob3V0cHV0SWQpOwogIG91dHB1dEVsZW1lbnQuaW5uZXJIVE1MID0gJyc7CgogIGNvbnN0IHBpY2tlZFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmVzb2x2ZSkgPT4gewogICAgaW5wdXRFbGVtZW50LmFkZEV2ZW50TGlzdGVuZXIoJ2NoYW5nZScsIChlKSA9PiB7CiAgICAgIHJlc29sdmUoZS50YXJnZXQuZmlsZXMpOwogICAgfSk7CiAgfSk7CgogIGNvbnN0IGNhbmNlbCA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2J1dHRvbicpOwogIGlucHV0RWxlbWVudC5wYXJlbnRFbGVtZW50LmFwcGVuZENoaWxkKGNhbmNlbCk7CiAgY2FuY2VsLnRleHRDb250ZW50ID0gJ0NhbmNlbCB1cGxvYWQnOwogIGNvbnN0IGNhbmNlbFByb21pc2UgPSBuZXcgUHJvbWlzZSgocmV
zb2x2ZSkgPT4gewogICAgY2FuY2VsLm9uY2xpY2sgPSAoKSA9PiB7CiAgICAgIHJlc29sdmUobnVsbCk7CiAgICB9OwogIH0pOwoKICAvLyBXYWl0IGZvciB0aGUgdXNlciB0byBwaWNrIHRoZSBmaWxlcy4KICBjb25zdCBmaWxlcyA9IHlpZWxkIHsKICAgIHByb21pc2U6IFByb21pc2UucmFjZShbcGlja2VkUHJvbWlzZSwgY2FuY2VsUHJvbWlzZV0pLAogICAgcmVzcG9uc2U6IHsKICAgICAgYWN0aW9uOiAnc3RhcnRpbmcnLAogICAgfQogIH07CgogIGNhbmNlbC5yZW1vdmUoKTsKCiAgLy8gRGlzYWJsZSB0aGUgaW5wdXQgZWxlbWVudCBzaW5jZSBmdXJ0aGVyIHBpY2tzIGFyZSBub3QgYWxsb3dlZC4KICBpbnB1dEVsZW1lbnQuZGlzYWJsZWQgPSB0cnVlOwoKICBpZiAoIWZpbGVzKSB7CiAgICByZXR1cm4gewogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbXBsZXRlJywKICAgICAgfQogICAgfTsKICB9CgogIGZvciAoY29uc3QgZmlsZSBvZiBmaWxlcykgewogICAgY29uc3QgbGkgPSBkb2N1bWVudC5jcmVhdGVFbGVtZW50KCdsaScpOwogICAgbGkuYXBwZW5kKHNwYW4oZmlsZS5uYW1lLCB7Zm9udFdlaWdodDogJ2JvbGQnfSkpOwogICAgbGkuYXBwZW5kKHNwYW4oCiAgICAgICAgYCgke2ZpbGUudHlwZSB8fCAnbi9hJ30pIC0gJHtmaWxlLnNpemV9IGJ5dGVzLCBgICsKICAgICAgICBgbGFzdCBtb2RpZmllZDogJHsKICAgICAgICAgICAgZmlsZS5sYXN0TW9kaWZpZWREYXRlID8gZmlsZS5sYXN0TW9kaWZpZWREYXRlLnRvTG9jYWxlRGF0ZVN0cmluZygpIDoKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgJ24vYSd9IC0gYCkpOwogICAgY29uc3QgcGVyY2VudCA9IHNwYW4oJzAlIGRvbmUnKTsKICAgIGxpLmFwcGVuZENoaWxkKHBlcmNlbnQpOwoKICAgIG91dHB1dEVsZW1lbnQuYXBwZW5kQ2hpbGQobGkpOwoKICAgIGNvbnN0IGZpbGVEYXRhUHJvbWlzZSA9IG5ldyBQcm9taXNlKChyZXNvbHZlKSA9PiB7CiAgICAgIGNvbnN0IHJlYWRlciA9IG5ldyBGaWxlUmVhZGVyKCk7CiAgICAgIHJlYWRlci5vbmxvYWQgPSAoZSkgPT4gewogICAgICAgIHJlc29sdmUoZS50YXJnZXQucmVzdWx0KTsKICAgICAgfTsKICAgICAgcmVhZGVyLnJlYWRBc0FycmF5QnVmZmVyKGZpbGUpOwogICAgfSk7CiAgICAvLyBXYWl0IGZvciB0aGUgZGF0YSB0byBiZSByZWFkeS4KICAgIGxldCBmaWxlRGF0YSA9IHlpZWxkIHsKICAgICAgcHJvbWlzZTogZmlsZURhdGFQcm9taXNlLAogICAgICByZXNwb25zZTogewogICAgICAgIGFjdGlvbjogJ2NvbnRpbnVlJywKICAgICAgfQogICAgfTsKCiAgICAvLyBVc2UgYSBjaHVua2VkIHNlbmRpbmcgdG8gYXZvaWQgbWVzc2FnZSBzaXplIGxpbWl0cy4gU2VlIGIvNjIxMTU2NjAuCiAgICBsZXQgcG9zaXRpb24gPSAwOwogICAgZG8gewogICAgICBjb25zdCBsZW5ndGggPSBNYXRoLm1pbihmaWxlRGF0YS5ieXRlTGVuZ3RoIC0gcG9zaXRpb24sIE1BWF9QQVlMT0FEX1NJWkUpOwo
gICAgICBjb25zdCBjaHVuayA9IG5ldyBVaW50OEFycmF5KGZpbGVEYXRhLCBwb3NpdGlvbiwgbGVuZ3RoKTsKICAgICAgcG9zaXRpb24gKz0gbGVuZ3RoOwoKICAgICAgY29uc3QgYmFzZTY0ID0gYnRvYShTdHJpbmcuZnJvbUNoYXJDb2RlLmFwcGx5KG51bGwsIGNodW5rKSk7CiAgICAgIHlpZWxkIHsKICAgICAgICByZXNwb25zZTogewogICAgICAgICAgYWN0aW9uOiAnYXBwZW5kJywKICAgICAgICAgIGZpbGU6IGZpbGUubmFtZSwKICAgICAgICAgIGRhdGE6IGJhc2U2NCwKICAgICAgICB9LAogICAgICB9OwoKICAgICAgbGV0IHBlcmNlbnREb25lID0gZmlsZURhdGEuYnl0ZUxlbmd0aCA9PT0gMCA/CiAgICAgICAgICAxMDAgOgogICAgICAgICAgTWF0aC5yb3VuZCgocG9zaXRpb24gLyBmaWxlRGF0YS5ieXRlTGVuZ3RoKSAqIDEwMCk7CiAgICAgIHBlcmNlbnQudGV4dENvbnRlbnQgPSBgJHtwZXJjZW50RG9uZX0lIGRvbmVgOwoKICAgIH0gd2hpbGUgKHBvc2l0aW9uIDwgZmlsZURhdGEuYnl0ZUxlbmd0aCk7CiAgfQoKICAvLyBBbGwgZG9uZS4KICB5aWVsZCB7CiAgICByZXNwb25zZTogewogICAgICBhY3Rpb246ICdjb21wbGV0ZScsCiAgICB9CiAgfTsKfQoKc2NvcGUuZ29vZ2xlID0gc2NvcGUuZ29vZ2xlIHx8IHt9OwpzY29wZS5nb29nbGUuY29sYWIgPSBzY29wZS5nb29nbGUuY29sYWIgfHwge307CnNjb3BlLmdvb2dsZS5jb2xhYi5fZmlsZXMgPSB7CiAgX3VwbG9hZEZpbGVzLAogIF91cGxvYWRGaWxlc0NvbnRpbnVlLAp9Owp9KShzZWxmKTsK", + "ok": true, + "headers": [ + [ + "content-type", + "application/javascript" + ] + ], + "status": 200, + "status_text": "" + } + }, + "base_uri": "https://localhost:8080/", + "height": 72 + }, + "id": "kTMLmgKeY_mo", + "outputId": "8c9898f1-2913-4546-bd10-4460027b1cea" + }, + "source": [ + "#To upload from your local drive, start with the following code:\n", + "from google.colab import files\n", + "uploaded = files.upload()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " Upload widget is only available when the cell has been executed in the\n", + " current browser session. 
Please rerun this cell to enable.\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {} + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Saving diabetes.csv to diabetes.csv\n" + ] + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z1GFG4bMyVwx" + }, + "source": [ + "#to import it into a dataframe (make sure the filename matches the name of the uploaded file)\n", + "import io\n", + "import pandas as pd\n", + "df = pd.read_csv(io.BytesIO(uploaded['diabetes.csv']))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 204 + }, + "id": "gdDKwmWza66t", + "outputId": "68197960-2482-42d1-ab6e-3f388df710b4" + }, + "source": [ + "df.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
<table>\n", + "<thead>\n", + "<tr><th></th><th>Pregnancies</th><th>Glucose</th><th>BloodPressure</th><th>SkinThickness</th><th>Insulin</th><th>BMI</th><th>DiabetesPedigreeFunction</th><th>Age</th><th>Outcome</th></tr>\n", + "</thead>\n", + "<tbody>\n", + "<tr><th>0</th><td>6</td><td>148</td><td>72</td><td>35</td><td>0</td><td>33.6</td><td>0.627</td><td>50</td><td>1</td></tr>\n", + "<tr><th>1</th><td>1</td><td>85</td><td>66</td><td>29</td><td>0</td><td>26.6</td><td>0.351</td><td>31</td><td>0</td></tr>\n", + "<tr><th>2</th><td>8</td><td>183</td><td>64</td><td>0</td><td>0</td><td>23.3</td><td>0.672</td><td>32</td><td>1</td></tr>\n", + "<tr><th>3</th><td>1</td><td>89</td><td>66</td><td>23</td><td>94</td><td>28.1</td><td>0.167</td><td>21</td><td>0</td></tr>\n", + "<tr><th>4</th><td>0</td><td>137</td><td>40</td><td>35</td><td>168</td><td>43.1</td><td>2.288</td><td>33</td><td>1</td></tr>\n", + "</tbody>\n", + "</table>\n", + "
" + ], + "text/plain": [ + " Pregnancies Glucose BloodPressure ... DiabetesPedigreeFunction Age Outcome\n", + "0 6 148 72 ... 0.627 50 1\n", + "1 1 85 66 ... 0.351 31 0\n", + "2 8 183 64 ... 0.672 32 1\n", + "3 1 89 66 ... 0.167 21 0\n", + "4 0 137 40 ... 2.288 33 1\n", + "\n", + "[5 rows x 9 columns]" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lum0Cxna1aIJ" + }, + "source": [ + "### Selecting Features\n", + "Here, you need to divide the given columns into two types of variables: the dependent variable (or target variable) and the independent variables (or feature variables)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sj-Renl5zI_j" + }, + "source": [ + "#split dataset in features and target variable\n", + "feature_cols = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']\n", + "X = df[feature_cols] # Features\n", + "y = df.Outcome # Target variable" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Vb8siCfQ3Umz" + }, + "source": [ + "### Splitting Data\n", + "To understand model performance, dividing the dataset into a training set and a test set is a good strategy.\n", + "\n", + "Let's split the dataset using the **train_test_split()** function. You need to pass three parameters: the features, the target, and the test set size. Additionally, you can set random_state so that the random split is reproducible." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2wJFeZ2M1jOh" + }, + "source": [ + "# split X and y into training and testing sets\n", + "from sklearn.model_selection import train_test_split\n", + "X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=0)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nY978WTH38Uv" + }, + "source": [ + "Here, the dataset is split into two parts in a ratio of 75:25. 
It means 75% of the data will be used for model training and 25% for model testing." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PylEEeof3-D5" + }, + "source": [ + "### Model Development and Prediction\n", + "First, import the LogisticRegression class and create a Logistic Regression classifier object using **LogisticRegression()**.\n", + "\n", + "Then, fit your model on the training set using fit() and perform prediction on the test set using predict()." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QCP2RFoa3fML", + "outputId": "6e4c9f1f-2f26-4191-bd7a-ad73c8eec17a" + }, + "source": [ + "# import the class\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# instantiate the model (using the default parameters)\n", + "logreg = LogisticRegression()\n", + "\n", + "# fit the model with data\n", + "logreg.fit(X_train,y_train)\n", + "\n", + "# predict the class labels for the test set\n", + "y_pred=logreg.predict(X_test)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.7/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):\n", + "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", + "\n", + "Increase the number of iterations (max_iter) or scale the data as shown in:\n", + " https://scikit-learn.org/stable/modules/preprocessing.html\n", + "Please also refer to the documentation for alternative solver options:\n", + " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", + " extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LbT8Y8VL4cDx" + }, + "source": [ + "# **Model Evaluation using Confusion Matrix**\n", + "\n", + "A **confusion matrix** is a table that is used to evaluate the performance of a classification model. 
It also lets you visualize the performance of an algorithm. The core idea of a confusion matrix is that the numbers of correct and incorrect predictions are summed up class-wise.\n", + "![CM.PNG]()\n", + "![CM2.PNG]()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# IMPORTANT\n", + "\n", + "Accuracy computed from the confusion matrix is not meaningful for **unbalanced classification**.\n", + "In cases where the class variable is imbalanced, it is recommended to measure performance using the Area Under the Precision-Recall Curve (AUPRC).\n", + "\n", + "❗ Example of **unbalanced classification**\n", + "\n", + "Suppose we have a dataset of credit card transactions where:\n", + "\n", + "Total transactions: 284,807\n", + "\n", + "Fraud transactions (class variable): 492\n", + "\n", + "This dataset is highly unbalanced, because the positive class (frauds) accounts for only 0.172% of all transactions.\n" + ], + "metadata": { + "id": "svvPVFGFFanw" + } + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zYYrt_Ku4Vhx", + "outputId": "a15506a2-d458-42f0-e7ae-aeba519dc930" + }, + "source": [ + "# import the metrics module\n", + "from sklearn import metrics\n", + "cnf_matrix = metrics.confusion_matrix(y_test, y_pred)\n", + "cnf_matrix" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[115, 15],\n", + " [ 25, 37]])" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vCtLXypF4pWd" + }, + "source": [ + "Here, you can see the confusion matrix in the form of an array object. The dimension of this matrix is 2x2 because this is a binary classification model. You have two classes, 0 and 1. Diagonal values represent correct predictions, while off-diagonal elements are incorrect predictions. In the output, 115 and 37 are correct predictions, and 25 and 15 are incorrect predictions."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XM9qUNYl4tpk" + }, + "source": [ + "### Visualizing Confusion Matrix using Heatmap\n", + "Let's visualize the results of the model in the form of a confusion matrix using matplotlib and seaborn.\n", + "\n", + "Here, you will visualize the confusion matrix using a heatmap." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "lXr1xgSJ4j_5" + }, + "source": [ + "# import required modules\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "%matplotlib inline" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 342 + }, + "id": "fR-1Aq9h4y5I", + "outputId": "694eb98d-fc22-4d63-832e-20b93795164e" + }, + "source": [ + "class_names=[0,1] # name of classes\n", + "fig, ax = plt.subplots()\n", + "tick_marks = np.arange(len(class_names))\n", + "plt.xticks(tick_marks, class_names)\n", + "plt.yticks(tick_marks, class_names)\n", + "# create heatmap\n", + "sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap=\"YlGnBu\" ,fmt='g')\n", + "ax.xaxis.set_label_position(\"top\")\n", + "plt.tight_layout()\n", + "plt.title('Confusion matrix', y=1.1)\n", + "plt.ylabel('Actual label')\n", + "plt.xlabel('Predicted label')\n" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Text(0.5, 257.44, 'Predicted label')" + ] + }, + "metadata": {}, + "execution_count": 24 + }, + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7cc8Gwr65MN1" + }, + "source": [ + "### Confusion Matrix Evaluation Metrics\n", + "Let's evaluate the model using model evaluation metrics such as accuracy, precision, and recall." + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "k9p9fJmP46tw", + "outputId": "6e7e9c88-2c7a-4d2b-8a10-abcf28c8acfb" + }, + "source": [ + "print(\"Accuracy:\",metrics.accuracy_score(y_test, y_pred))\n", + "print(\"Precision:\",metrics.precision_score(y_test, y_pred))\n", + "print(\"Recall:\",metrics.recall_score(y_test, y_pred))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Accuracy: 0.7916666666666666\n", + "Precision: 0.7115384615384616\n", + "Recall: 0.5967741935483871\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U-WHC8za5Z5J" + }, + "source": [ + "**We got a classification rate of about 79%, which is reasonably good accuracy.**\n", + "\n", + "**Precision**: Precision is about being precise, i.e., when the model predicts the positive class, how often it is correct. In this case, when the Logistic Regression model predicts that a patient has diabetes, it is correct about 71% of the time.\n", + "\n", + "**Recall**: Of the patients in the test set who actually have diabetes, the Logistic Regression model identifies about 60%." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6-crRCXI58oG" + }, + "source": [ + "## ROC Curve\n", + "The Receiver Operating Characteristic (ROC) curve is a plot of the true positive rate against the false positive rate. It shows the tradeoff between sensitivity and specificity."
+ ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 265 + }, + "id": "I4C7a8AE5Rfw", + "outputId": "950434de-211a-452b-eb7e-f89d56529940" + }, + "source": [ + "# predicted probability of the positive class (second column)\n", + "y_pred_proba = logreg.predict_proba(X_test)[:, 1]\n", + "# false positive rate and true positive rate at each threshold\n", + "fpr, tpr, _ = metrics.roc_curve(y_test, y_pred_proba)\n", + "auc = metrics.roc_auc_score(y_test, y_pred_proba)\n", + "plt.plot(fpr,tpr,label=\"data 1, auc=\"+str(auc))\n", + "plt.legend(loc=4)\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Sm7cDn8V6M1i" + }, + "source": [ + "The AUC score for this case is 0.86. An AUC score of 1 represents a perfect classifier, and 0.5 represents a worthless classifier.\n", + "\n", + "**Advantages**\n", + "Logistic regression is efficient and straightforward: it doesn't require high computation power, is easy to implement and easy to interpret, and is widely used by data analysts and scientists. It can work without feature scaling, although scaling often helps the solver converge, as the ConvergenceWarning above suggests. Logistic regression also provides a probability score for observations.\n", + "\n", + "**Disadvantages**\n", + "Logistic regression is not able to handle a large number of categorical features/variables. It is vulnerable to overfitting. It also cannot solve non-linear problems directly, which is why non-linear features must be transformed first. Logistic regression will not perform well with independent variables that are not correlated with the target variable or that are very similar or correlated to each other." + ] + } + ] +} \ No newline at end of file