diff --git a/.env b/.env
deleted file mode 100644
index 939c71c..0000000
--- a/.env
+++ /dev/null
@@ -1 +0,0 @@
-GOOGLE_API_KEY = "Enter Your API Key here"
diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..39bb49f
--- /dev/null
+++ b/.env.example
@@ -0,0 +1 @@
+GOOGLE_API_KEY=your_google_api_key_here
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..56c989e
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,33 @@
+# Environment variables
+.env
+
+# Claude Code
+CLAUDE.md
+.claude/
+.playwright-mcp/
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+
+# Virtual environments
+venv/
+env/
+ENV/
+.venv
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Streamlit
+.streamlit/secrets.toml
diff --git a/README.md b/README.md
index f210644..60d34e3 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
# YouTube Video Transcript Summarizer with Generative AI
**Introduction**
The YouTube Video Transcript Summarizer with GenAI is an innovative tool designed to save time by automatically generating concise summaries from YouTube video transcripts. This application leverages the YouTube Transcript API to retrieve video transcripts, and integrates Google's Gemini AI to summarize them, helping users get key takeaways quickly without watching the entire video. With a clean, user-friendly interface built using Streamlit, this project simplifies the process of obtaining summaries from video content, making it accessible to students, professionals, and anyone looking to boost their productivity.
@@ -9,12 +9,14 @@ The YouTube Video Transcript Summarizer with GenAI is an innovative tool designe
**Table of Contents**
1. Key Technologies and Skills
-2. Installation
-3. Usage
-4. Features
-5. Contributing
-6. License
-7. Contact
+2. Requirements
+3. Installation
+4. Usage
+5. Features
+6. Recent Updates & Improvements
+7. Contributing
+8. License
+9. Contact
@@ -27,18 +29,25 @@ The YouTube Video Transcript Summarizer with GenAI is an innovative tool designe
+**Requirements**
+- Python 3.7 or higher
+- Google API Key (for Gemini AI)
+- Internet connection (for YouTube Transcript API)
+
+**Key Dependencies:**
+- `youtube-transcript-api>=1.2.3` - For retrieving video transcripts
+- `google-generativeai` - For AI-powered summarization (Gemini 2.5 Flash)
+- `streamlit` - For web interface
+- `langcodes` - For language name conversion
+
+
+
**Installation**
To run this project, you need to install the following packages:
-```python
-pip install python-dotenv
-pip install streamlit
-pip install streamlit-extras
-pip install youtube-transcript-api
-pip install google-generativeai
-pip install langcodes
-pip install language_data
+```bash
+pip install -r requirements.txt
```
@@ -49,9 +58,18 @@ To use this project, follow these steps:
1. Clone the repository: ```git clone https://github.com/gopiashokan/YouTube-Video-Transcript-Summarizer-with-GenAI.git```
2. Install the required packages: ```pip install -r requirements.txt```
-3. Add your Google API key to the `.env` file.
-4. Run the Streamlit app: ```streamlit run app.py```
-5. Access the app in your browser at ```http://localhost:8501```
+3. Get your Google API key:
+ - Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
+ - Sign in with your Google account
+ - Click "Create API Key" to generate your key
+4. Create a `.env` file in the root directory (use `.env.example` as a template)
+5. Add your Google API key to the `.env` file:
+ ```
+ GOOGLE_API_KEY=your_actual_api_key_here
+ ```
+ **⚠️ IMPORTANT:** Never commit your `.env` file to Git. It contains sensitive API keys.
+6. Run the Streamlit app: ```streamlit run app.py```
+7. Access the app in your browser at ```http://localhost:8501```
@@ -59,9 +77,9 @@ To use this project, follow these steps:
#### YouTube Video Transcript Retrieval:
- - **Input Video Link:** Users can easily provide a YouTube video link to the application. The system automatically extracts the video ID from the URL and prepares the request for the transcript.
+ - **Input Video Link:** Users can easily provide a YouTube video link to the application. The system automatically extracts the video ID from various YouTube URL formats (standard, shortened, embed) and prepares the request for the transcript.
- - **Transcript Language Detection:** Using the `YouTube Transcript API`, the application detects all available transcript languages for the given video. This ensures that users can choose their preferred language for summarization.
+ - **Transcript Language Detection:** Using the `YouTube Transcript API` (v1.2.3+), the application detects all available transcript languages for the given video, so users can choose their preferred language for summarization. Transcripts are listed through the library's instance-based API (`YouTubeTranscriptApi().list(video_id)`) instead of the older static `list_transcripts` call.
- **Language Conversion:** The detected language codes are transformed into human-readable names using the `Langcodes` library, allowing users to effortlessly identify and select their preferred transcript language.
@@ -75,7 +93,7 @@ To use this project, follow these steps:
#### AI-Powered Summarization:
- - **Generative AI Model:** The project incorporates Google's Gemini AI `gemini-pro` model to generate summaries. The model processes the video transcript along with a carefully crafted prompt to deliver concise, accurate, and context-aware summaries, eliminating the need for users to watch the entire video.
+ - **Generative AI Model:** The project incorporates Google's Gemini AI `gemini-2.5-flash` model to generate summaries. The model processes the video transcript along with a carefully crafted prompt to deliver concise, accurate, and context-aware summaries, eliminating the need for users to watch the entire video.
- **Custom Prompting:** The system uses an intelligently designed prompt that guides the AI in producing relevant summaries, ensuring the key points from the video are captured and presented clearly.
@@ -87,6 +105,15 @@ To use this project, follow these steps:
- **Real-Time Interaction:** The application provides real-time feedback and results, allowing users to receive their video summaries almost instantly. This makes the experience not only efficient but also highly responsive to user actions.
+#### Recent Updates & Improvements:
+
+ - **API Compatibility:** Updated to the instance-based methods of the YouTube Transcript API (v1.2.3+), replacing the static `list_transcripts` and `get_transcript` calls
+ - **AI Model Upgrade:** Migrated from the deprecated `gemini-pro` model to `gemini-2.5-flash`
+ - **Enhanced Security:** Added `.gitignore` and `.env.example` so the real `.env` file, and the API key in it, stays out of the repository
+ - **Code Quality:** Replaced the deprecated `use_column_width` argument with `use_container_width`, removed the blanket `filterwarnings('ignore')` call, and routed failures through `st.error`
+ - **Documentation:** Expanded the setup instructions with a step-by-step environment configuration guide
+
+
#### References:
- Streamlit: [https://docs.streamlit.io/](https://docs.streamlit.io/)
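Review note on the `.env` steps in the README: the old `.env`, the new `.env.example`, and the README snippet each ship a different placeholder value, and `app.py` only checks that the variable is non-empty. A minimal sketch of a stricter check follows. The helper name `read_api_key` and the idea of rejecting placeholders are assumptions for illustration and are not part of this PR.

```python
import os

# Placeholder strings that appear in this repository's .env, .env.example and README.
PLACEHOLDER_VALUES = {
    "",
    "Enter Your API Key here",
    "your_google_api_key_here",
    "your_actual_api_key_here",
}


def read_api_key(env_var="GOOGLE_API_KEY", environ=None):
    """Return the configured key, or None if it is missing or still a placeholder.

    Hypothetical helper: `environ` can be any mapping, which keeps the
    function easy to exercise without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    # Tolerate surrounding whitespace and quotes left over from hand-edited files.
    value = (environ.get(env_var) or "").strip().strip('"').strip("'")
    return None if value in PLACEHOLDER_VALUES else value
```

With a check like this, a user who copies `.env.example` without editing it would see the "key not found" message instead of an authentication error from the API call.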
diff --git a/app.py b/app.py
index 615abd5..ed5ae58 100644
--- a/app.py
+++ b/app.py
@@ -1,18 +1,21 @@
import os
+import re
import langcodes
import google.generativeai as genai
import streamlit as st
from streamlit_extras.add_vertical_space import add_vertical_space
from dotenv import load_dotenv
from youtube_transcript_api import YouTubeTranscriptApi
-from warnings import filterwarnings
+from urllib.parse import urlparse, parse_qs
+
+import config
def streamlit_config():
# page configuration
- st.set_page_config(page_title='YouTube')
+ st.set_page_config(page_title=config.PAGE_TITLE)
# page header transparent color and Removes top padding
page_background_color = """
@@ -32,78 +35,152 @@ def streamlit_config():
st.markdown(page_background_color, unsafe_allow_html=True)
# title and position
- add_vertical_space(2)
- st.markdown(f'YouTube Transcript Summarizer with GenAI',
+ add_vertical_space(config.VERTICAL_SPACE_MEDIUM)
+ st.markdown(f'{config.APP_TITLE}',
unsafe_allow_html=True)
- add_vertical_space(2)
+ add_vertical_space(config.VERTICAL_SPACE_MEDIUM)
+
+
+def extract_video_id(video_link):
+    """
+    Extract video ID from various YouTube URL formats.
+    Supports:
+    - https://www.youtube.com/watch?v=VIDEO_ID
+    - https://youtu.be/VIDEO_ID
+    - https://www.youtube.com/embed/VIDEO_ID
+    - https://www.youtube.com/v/VIDEO_ID
+    """
+    try:
+        # Pattern for youtube.com URLs
+        if 'youtube.com' in video_link:
+            parsed_url = urlparse(video_link)
+            if parsed_url.path == '/watch':
+                video_id = parse_qs(parsed_url.query).get('v')
+                if video_id:
+                    return video_id[0]
+            elif '/embed/' in parsed_url.path:
+                return parsed_url.path.split('/embed/')[1].split('?')[0]
+            elif '/v/' in parsed_url.path:
+                return parsed_url.path.split('/v/')[1].split('?')[0]
+
+        # Pattern for youtu.be URLs
+        elif 'youtu.be' in video_link:
+            parsed_url = urlparse(video_link)
+            return parsed_url.path.lstrip('/')
+
+        # If it's already just the video ID (11 characters)
+        elif re.match(r'^[A-Za-z0-9_-]{11}$', video_link.strip()):
+            return video_link.strip()
+
+        return None
+
+    except Exception:
+        return None
+
+
+def _get_transcript_list(video_id):
+    """
+    Helper function to get transcript list for a YouTube video.
+    Returns TranscriptList object or None on error.
+    """
+    try:
+        ytt_api = YouTubeTranscriptApi()
+        return ytt_api.list(video_id)
+    except Exception as e:
+        st.error(config.ERROR_TRANSCRIPT_FETCH.format(error=str(e)))
+        return None
def extract_languages(video_id):
+ """
+ Extract available transcript languages for a YouTube video.
+ Returns tuple of (language_list, language_dict) or (None, None) on error.
+ """
+ # Get transcript list using helper function
+ transcript_list = _get_transcript_list(video_id)
+ if not transcript_list:
+ return None, None
- # Fetch the List of Available Transcripts for Given Video
- transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
+ try:
+ # Extract the Language Codes from List ---> ['en','ta']
+ available_transcripts = [i.language_code for i in transcript_list]
- # Extract the Language Codes from List ---> ['en','ta']
- available_transcripts = [i.language_code for i in transcript_list]
+ # Convert Language_codes to Human-Readable Language_names ---> 'en' into 'English'
+ language_list = list({langcodes.Language.get(i).display_name() for i in available_transcripts})
- # Convert Language_codes to Human-Readable Language_names ---> 'en' into 'English'
- language_list = list({langcodes.Language.get(i).display_name() for i in available_transcripts})
+ # Create a Dictionary Mapping Language_names to Language_codes
+ language_dict = {langcodes.Language.get(i).display_name():i for i in available_transcripts}
- # Create a Dictionary Mapping Language_names to Language_codes
- language_dict = {langcodes.Language.get(i).display_name():i for i in available_transcripts}
+ return language_list, language_dict
- return language_list, language_dict
+ except Exception as e:
+ st.error(config.ERROR_TRANSCRIPT_FETCH.format(error=str(e)))
+ return None, None
def extract_transcript(video_id, language):
-
+ """
+ Extract transcript text for a YouTube video in specified language.
+ Returns transcript string or None on error.
+ """
+ # Get transcript list using helper function
+ transcript_list = _get_transcript_list(video_id)
+ if not transcript_list:
+ return None
+
try:
- # Request Transcript for YouTube Video using API
- transcript_content = YouTubeTranscriptApi.get_transcript(video_id=video_id, languages=[language])
-
+ # Find transcript in the specified language
+ transcript = transcript_list.find_transcript([language])
+
+ # Fetch the actual transcript content
+ transcript_content = transcript.fetch()
+
# Extract Transcript Content from JSON Response and Join to Single Response
- transcript = ' '.join([i['text'] for i in transcript_content])
+ transcript_text = ' '.join([i.text for i in transcript_content])
+
+ return transcript_text
- return transcript
-
-
except Exception as e:
- add_vertical_space(5)
- st.markdown(f'{e}', unsafe_allow_html=True)
+ st.error(config.ERROR_TRANSCRIPT_EXTRACT.format(error=str(e)))
+ return None
def generate_summary(transcript_text):
-
+ """
+ Generate AI-powered summary using Google Gemini.
+ Returns summary string or None on error.
+ """
try:
+ # Check if API key exists
+ api_key = os.getenv(config.GOOGLE_API_KEY_ENV)
+ if not api_key:
+ st.error(config.ERROR_API_KEY_MISSING)
+ return None
+
# Configures the genai Library
- genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
+ genai.configure(api_key=api_key)
- # Initializes a Gemini-Pro Generative Model
- model = genai.GenerativeModel(model_name = 'gemini-pro')
+ # Initializes a Gemini 2.5 Flash Generative Model
+ model = genai.GenerativeModel(model_name=config.GEMINI_MODEL)
# Define a Prompt for AI Model
- prompt = """You are a YouTube video summarizer. You will be taking the transcript text and summarizing the entire video,
- providing the important points are proper sub-heading in a concise manner (within 500 words).
- Please provide the summary of the text given here: """
-
+ prompt = config.SUMMARY_PROMPT_TEMPLATE.format(word_limit=config.SUMMARY_WORD_LIMIT)
+
response = model.generate_content(prompt + transcript_text)
return response.text
except Exception as e:
- add_vertical_space(5)
- st.markdown(f'{e}', unsafe_allow_html=True)
+ st.error(config.ERROR_SUMMARY_GENERATE.format(error=str(e)))
+ return None
def main():
- # Filter the Warnings
- filterwarnings(action='ignore')
-
# Load the Environment Variables
load_dotenv()
@@ -112,67 +189,73 @@ def main():
# Initialize the Button Variable
button = False
+ video_id = None
+ language = None
with st.sidebar:
- image_url = 'https://raw.githubusercontent.com/gopiashokan/YouTube-Video-Transcript-Summarizer-with-GenAI/main/image/youtube_banner.JPG'
- st.image(image_url, use_column_width=True)
- add_vertical_space(2)
+ image_url = config.BANNER_IMAGE_PATH
+ st.image(image_url, use_container_width=True)
+ add_vertical_space(config.VERTICAL_SPACE_MEDIUM)
- # Get YouTube Video Link From User
- video_link = st.text_input(label='Enter YouTube Video Link')
+ # Get YouTube Video Link From User
+ video_link = st.text_input(label=config.LABEL_VIDEO_LINK)
if video_link:
# Extract the Video ID From URL
- video_id = video_link.split('=')[1].split('&')[0]
-
- # Extract Language from Video_ID
- language_list, language_dict = extract_languages(video_id)
-
- # User Select the Transcript Language
- language_input = st.selectbox(label='Select Transcript Language',
- options=language_list)
-
- # Get Language_code from Dict
- language = language_dict[language_input]
-
- # Click Submit Button
- add_vertical_space(1)
- button = st.button(label='Submit')
-
+ video_id = extract_video_id(video_link)
+
+ if not video_id:
+ st.error(config.ERROR_INVALID_URL)
+ else:
+ # Extract Language from Video_ID
+ language_list, language_dict = extract_languages(video_id)
+
+ if language_list and language_dict:
+ # User Select the Transcript Language
+ language_input = st.selectbox(label=config.LABEL_SELECT_LANGUAGE,
+ options=language_list)
+
+ # Get Language_code from Dict
+ language = language_dict[language_input]
+
+ # Click Submit Button
+ add_vertical_space(config.VERTICAL_SPACE_SMALL)
+ button = st.button(label=config.BUTTON_SUBMIT)
+
# User Enter the Video Link and Click Submit Button
- if button and video_link:
-
+ if button and video_link and video_id and language:
+
# UI Split into Columns
- _, col2, _ = st.columns([0.07,0.83,0.1])
+ _, col2, _ = st.columns([config.LAYOUT_LEFT_MARGIN, config.LAYOUT_CENTER_WIDTH, config.LAYOUT_RIGHT_MARGIN])
# Display the Video Thumbnail Image
with col2:
- st.image(image=f'http://img.youtube.com/vi/{video_id}/0.jpg',
- use_column_width=True)
+ st.image(image=config.YOUTUBE_THUMBNAIL_URL_TEMPLATE.format(video_id=video_id),
+ use_container_width=True)
# Extract Transcript from YouTube Video
- add_vertical_space(2)
- with st.spinner(text='Extracting Transcript...'):
+ add_vertical_space(config.VERTICAL_SPACE_MEDIUM)
+ with st.spinner(text=config.SPINNER_EXTRACTING):
transcript_text = extract_transcript(video_id, language)
+ if not transcript_text:
+ st.error(config.ERROR_TRANSCRIPT_FAILED_GENERIC)
+ return
+
# Generating Summary using Gemini AI
- with st.spinner(text='Generating Summary...'):
+ with st.spinner(text=config.SPINNER_GENERATING):
summary = generate_summary(transcript_text)
# Display the Summary
if summary:
st.write(summary)
+ else:
+ st.error(config.ERROR_SUMMARY_FAILED_GENERIC)
if __name__ == '__main__':
-
- try:
- main()
-
- except Exception as e:
- add_vertical_space(5)
- st.markdown(f'{e}', unsafe_allow_html=True)
+ main()
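Review note on `extract_video_id`: the function picks a branch with substring checks (`'youtube.com' in video_link`), so a link such as `https://notyoutube.com/watch?v=...` is treated as a YouTube URL, and the `youtu.be` branch returns whatever follows the slash without validating it. The sketch below shows one possible stricter variant that compares the parsed hostname and validates the 11-character ID. The name `extract_video_id_strict` and the host allow-list are assumptions for illustration, not code from this PR.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed ID shape, taken from the regex already used in app.py.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")
# Assumed allow-list of hosts; extend it if other YouTube domains are needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id_strict(video_link):
    """Return the video ID, or None if the link is not a recognised YouTube URL."""
    text = video_link.strip()
    if VIDEO_ID_RE.fullmatch(text):
        return text  # the input is already a bare video ID

    try:
        # Add a scheme when it is missing so urlparse fills in the hostname.
        parsed = urlparse(text if "://" in text else "https://" + text)
    except ValueError:
        return None

    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            for prefix in ("/embed/", "/v/"):
                if parsed.path.startswith(prefix):
                    candidate = parsed.path[len(prefix):].split("/")[0]
                    break

    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```

Validating the ID before returning it also keeps malformed input out of the thumbnail URL template and the transcript request.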
diff --git a/config.py b/config.py
new file mode 100644
index 0000000..1f30c1c
--- /dev/null
+++ b/config.py
@@ -0,0 +1,86 @@
+"""
+Application configuration constants.
+
+This module centralizes all configuration values, magic strings, and constants
+for the YouTube Transcript Summarizer application. This makes the application
+more maintainable and easier to customize.
+"""
+
+# ============================================================================
+# UI Configuration
+# ============================================================================
+
+APP_TITLE = "YouTube Transcript Summarizer with GenAI"
+PAGE_TITLE = "YouTube"
+
+# ============================================================================
+# Image URLs and Assets
+# ============================================================================
+
+# Local banner image (faster, more reliable than external URL)
+BANNER_IMAGE_PATH = "image/youtube_banner.JPG"
+
+# YouTube thumbnail URL template (HTTPS for security)
+YOUTUBE_THUMBNAIL_URL_TEMPLATE = "https://img.youtube.com/vi/{video_id}/0.jpg"
+
+# ============================================================================
+# AI Configuration
+# ============================================================================
+
+# Google Gemini model name
+GEMINI_MODEL = "gemini-2.5-flash"
+
+# Summary generation prompt template
+SUMMARY_PROMPT_TEMPLATE = """You are a YouTube video summarizer. You will be taking the transcript text and summarizing the entire video,
+ providing the important points with proper sub-headings in a concise manner (within {word_limit} words).
+ Please provide the summary of the text given here: """
+
+# Summary word limit
+SUMMARY_WORD_LIMIT = 500
+
+# ============================================================================
+# API Configuration
+# ============================================================================
+
+# Environment variable name for Google API key
+GOOGLE_API_KEY_ENV = "GOOGLE_API_KEY"
+
+# ============================================================================
+# Layout Configuration
+# ============================================================================
+
+# Column layout ratios for centered content display
+LAYOUT_LEFT_MARGIN = 0.07
+LAYOUT_CENTER_WIDTH = 0.83
+LAYOUT_RIGHT_MARGIN = 0.10
+
+# Vertical spacing units
+VERTICAL_SPACE_SMALL = 1
+VERTICAL_SPACE_MEDIUM = 2
+
+# ============================================================================
+# Error Messages
+# ============================================================================
+
+ERROR_INVALID_URL = "Invalid YouTube URL. Please enter a valid YouTube video link."
+ERROR_API_KEY_MISSING = f"Google API key not found. Please add {GOOGLE_API_KEY_ENV} to your .env file."
+ERROR_TRANSCRIPT_FETCH = "Error fetching transcripts: {error}"
+ERROR_TRANSCRIPT_EXTRACT = "Error extracting transcript: {error}"
+ERROR_SUMMARY_GENERATE = "Error generating summary: {error}"
+ERROR_TRANSCRIPT_FAILED_GENERIC = "Failed to extract transcript. Please try again."
+ERROR_SUMMARY_FAILED_GENERIC = "Failed to generate summary. Please try again."
+
+# ============================================================================
+# Input Labels
+# ============================================================================
+
+LABEL_VIDEO_LINK = "Enter YouTube Video Link"
+LABEL_SELECT_LANGUAGE = "Select Transcript Language"
+BUTTON_SUBMIT = "Submit"
+
+# ============================================================================
+# Spinner Messages
+# ============================================================================
+
+SPINNER_EXTRACTING = "Extracting Transcript..."
+SPINNER_GENERATING = "Generating Summary..."
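Review note on the message constants in `config.py`: the module mixes two styles. `ERROR_API_KEY_MISSING` is an f-string that is filled in once at import time, while the `{error}` and `{word_limit}` templates are filled in later with `str.format`. The sketch below illustrates the difference with shortened copies of the constants. The prompt text is abbreviated and `build_prompt` is a hypothetical helper, not a function in this PR.

```python
GOOGLE_API_KEY_ENV = "GOOGLE_API_KEY"

# Evaluated immediately: the variable name is baked into the string at import time.
ERROR_API_KEY_MISSING = (
    f"Google API key not found. Please add {GOOGLE_API_KEY_ENV} to your .env file."
)

# Deferred: the placeholders stay in the string until .format() is called.
ERROR_SUMMARY_GENERATE = "Error generating summary: {error}"
SUMMARY_PROMPT_TEMPLATE = "Summarize the transcript within {word_limit} words: "


def build_prompt(transcript_text, word_limit=500):
    """Format the template first, then append the transcript.

    Appending after formatting means braces inside the transcript are never
    interpreted as format placeholders.
    """
    return SUMMARY_PROMPT_TEMPLATE.format(word_limit=word_limit) + transcript_text
```

This matches the order used in `generate_summary` (`prompt + transcript_text`). Values passed as arguments to `.format`, such as `error=str(e)`, are likewise inserted verbatim and not re-parsed.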
diff --git a/requirements.txt b/requirements.txt
index 0e28e34..bbc5194 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,7 @@
python-dotenv
streamlit
streamlit-extras
-youtube-transcript-api
+youtube-transcript-api>=1.2.3
google-generativeai
langcodes
language_data
diff --git a/run.bat b/run.bat
new file mode 100644
index 0000000..1754cdd
--- /dev/null
+++ b/run.bat
@@ -0,0 +1 @@
+streamlit run app.py
\ No newline at end of file