This is the 6th-semester project for the Artificial Intelligence Laboratory course.
This project focuses on creating an extractive text summarization model using Term Frequency-Inverse Document Frequency (TF-IDF) to generate concise summaries from large textual datasets.
- tf_idf_built_in_function.ipynb: Implementation of the TF-IDF algorithm using built-in Python functions.
- tf_idf_raw_code.ipynb: Manual implementation of the TF-IDF algorithm from scratch.
- Images Folder:
  - output.png: The output of the summarization process.
  - process_flow.png: A visual representation of the process flow.
- ai_lab_final_report: The final report, available in .docx, .pdf, and .zip (for LaTeX) formats.
- Generated Text Data Folder:
  - bangladesh_small.txt: Sample text data used in the project.
- text_pre_processing.ipynb: Jupyter notebook for preprocessing text data.
- web_scraper.ipynb: Jupyter notebook for scraping text data from the web.
- final_presentation.pptx: The final presentation for the project.
- ai_proposal_cover: The cover page of the project proposal in .docx and .pdf formats.
- ai_proposal_main: The main content of the project proposal in .docx and .pdf formats.
- reference_pdf.zip: A collection of research papers and other reference materials for convenience.
Note: The reference_pdf.zip file contains some research papers that were downloaded in PDF format for convenience and were used for this project.
- These PDFs and materials may be subject to copyright.
- I do not own these materials nor do I have permission to distribute them.
- They are provided solely for educational purposes, to facilitate access to reference papers.
- Please cite these sources appropriately if you use them.
Code Execution:
- The code for the project is located in the Code folder.
- Use tf_idf_raw_code.ipynb to explore the raw implementation.
- Use tf_idf_built_in_function.ipynb for a version using built-in functions.
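For readers who want a feel for the from-scratch approach before opening the notebooks, here is a minimal sketch of TF-IDF sentence scoring. It is not taken from tf_idf_raw_code.ipynb: the function names (split_sentences, tokenize, score_sentences, summarize), the naive sentence splitter, and the unsmoothed idf = log(N / df) weighting are all assumptions chosen for illustration. Each sentence is treated as its own document.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def score_sentences(sentences):
    """Return one TF-IDF score per sentence, treating each sentence as a document."""
    tokenized = [tokenize(s) for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: the number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        # tf = count / sentence length, idf = log(N / df); sum over distinct terms.
        scores.append(sum(
            (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        ))
    return scores


def summarize(text, size):
    """Return the `size` highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:size])]
```

Calling summarize(text, 3) on the contents of a text file would return three sentences, matching the "size of your summary" prompt in spirit. Restoring the original sentence order is a design choice of this sketch so that the summary reads naturally; the notebooks may order their output differently.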
Preprocessing:
- The Preprocessing folder contains the scripts used to clean and preprocess the text data.
- text_pre_processing.ipynb handles text data cleaning.
- web_scraper.ipynb is used to scrape data from web sources.
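As a rough illustration of the kind of cleaning such a notebook might perform, the sketch below strips bracketed citation markers, collapses whitespace, and removes stop words. The specific steps, the function names (clean_text, normalize_tokens), and the tiny STOP_WORDS set are assumptions for illustration, not the contents of text_pre_processing.ipynb.

```python
import re
import string

# A tiny illustrative stop-word list (an assumption; a real project would
# likely use a fuller list from an NLP library).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "and", "by"}


def clean_text(raw):
    """Remove bracketed citation markers like [12] and collapse whitespace."""
    no_refs = re.sub(r"\[\d+\]", "", raw)
    return re.sub(r"\s+", " ", no_refs).strip()


def normalize_tokens(sentence):
    """Lowercase, drop ASCII punctuation, and remove stop words from a sentence."""
    table = str.maketrans("", "", string.punctuation)
    words = sentence.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]
```

Here clean_text is meant for raw scraped text before sentence splitting, while normalize_tokens is meant for individual sentences just before scoring, so that punctuation removal does not erase sentence boundaries too early.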
Final Report:
- The Final Report folder contains the final documentation of the project.
- You can find output.png and process_flow.png in the Images folder.
- The final report is available in .docx, .pdf, and .zip (for LaTeX) formats.
Presentation:
- The Presentation folder includes final_presentation.pptx, which summarizes the project for presentations.
Project Proposal:
- The Project Proposal folder contains the proposal documents in both .docx and .pdf formats.
Enter size of your summary: 3
3 lines sized summary:
Sentence: Russell's viper (Daboia russelii) is responsible for nearly half of snakebites in neighboring India, but in Bangladesh, where it’s known as chandra bora, it was thought to be an exceedingly rare species for more than a century.
Sentence: Hospitals in rural Bangladesh have reported an increase in people being bitten by snakes, especially by the Russell's viper, which is found in South Asia.
Sentence: A series of stories have been making rounds on social media, of people dying in different parts of Bangladesh from the bite of the Russell's viper, a venomous snake.
Feel free to fork this repository, create a new branch, and submit pull requests.
This project is open-source and available under the MIT License.