Issue #315

Changes Proposed

  1. New Database Format for --save Flag
    - Added database as a new choice for the --save argument (a minimal sketch follows this list)
    - Location: main.py lines 138-139
  2. Core Database Module
    - Created src/torbot/modules/database.py
    - Implements the SearchResultsDatabase class for SQLite management
    - No external database server required (uses the built-in sqlite3 module)
  3. Integration with LinkTree
    - Added a saveDatabase() method in src/torbot/modules/linktree.py (lines 159-195)
    - Extracts all discovered links and metadata for persistent storage
  4. Query Utilities
    - Created src/torbot/modules/db_query.py for result retrieval
    - Created scripts/query_database.py, a CLI for database operations
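
The exact diff in main.py is not reproduced here, so the snippet below is only a minimal sketch of how a database choice can be added to an argparse --save flag. The other choices shown ("tree", "json") and the surrounding flag names are illustrative assumptions, not a claim about TorBot's actual argument list.

```python
# Sketch only: the real flag wiring in main.py (lines 138-139) may differ.
import argparse

parser = argparse.ArgumentParser(prog="torbot")
parser.add_argument("-u", "--url", help="root .onion URL to crawl")
parser.add_argument("--depth", type=int, default=1, help="crawl depth")
parser.add_argument(
    "--save",
    # "tree" and "json" are assumed pre-existing choices; "database" is the new one.
    choices=["tree", "json", "database"],
    help="persist results in the chosen format",
)

args = parser.parse_args()
if args.save == "database":
    # hand the crawl results off to SearchResultsDatabase (see Core Features below)
    pass
```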

Explanation of Changes

Database Engine & Architecture
- Engine: SQLite (file-based, no database server required)
- Database file: <project_root>/torbot_search_results.db
- Auto-initialized on first use

Database Schema
searches Table (Search Metadata)

- id (INTEGER PRIMARY KEY): Auto-incrementing search ID
- root_url (TEXT): The root URL that was crawled
- search_timestamp (DATETIME): ISO 8601 formatted timestamp of search
- depth (INTEGER): Crawl depth setting used
- total_links (INTEGER): Count of total links discovered
- links_data (TEXT): JSON array of all link metadata
- created_at (DATETIME): Record creation timestamp

links Table (Individual Link Records)

- id (INTEGER PRIMARY KEY): Auto-incrementing link ID
- search_id (INTEGER): Foreign key referencing searches table
- url (TEXT): Full URL of discovered link
- title (TEXT): Page title or hostname
- status_code (INTEGER): HTTP response code (200, 404, etc.)
- classification (TEXT): Content classification from NLP module
- accuracy (REAL): Classification confidence score (0.0-1.0)
- emails (TEXT): JSON array of emails found on page
- phone_numbers (TEXT): JSON array of phone numbers found

Relationship: One search has many links (1:N relationship with CASCADE delete)
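
For reference, a schema matching the columns and the 1:N CASCADE relationship described above would look roughly like the sketch below. The real DDL lives in src/torbot/modules/database.py and may differ in naming, constraints, or defaults.

```python
import sqlite3

# Sketch of the schema described above; the actual statements in
# src/torbot/modules/database.py may differ in detail.
SCHEMA = """
CREATE TABLE IF NOT EXISTS searches (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    root_url         TEXT NOT NULL,
    search_timestamp DATETIME,          -- ISO 8601 string
    depth            INTEGER,
    total_links      INTEGER,
    links_data       TEXT,              -- JSON array of link metadata
    created_at       DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS links (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id      INTEGER NOT NULL,
    url            TEXT NOT NULL,
    title          TEXT,
    status_code    INTEGER,
    classification TEXT,
    accuracy       REAL,                -- confidence score, 0.0-1.0
    emails         TEXT,                -- JSON array
    phone_numbers  TEXT,                -- JSON array
    FOREIGN KEY (search_id) REFERENCES searches (id) ON DELETE CASCADE
);
"""

conn = sqlite3.connect("torbot_search_results.db")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs per connection
conn.executescript(SCHEMA)
```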

Metadata Captured Per Search

Root-Level Metadata:
✅ Root URL being crawled
✅ Exact timestamp of search (ISO 8601)
✅ Crawl depth configuration
✅ Total link count

Per-Link Metadata:
✅ Full URL
✅ Page title
✅ HTTP status code (connectivity indicator)
✅ Content classification (marketplace, forum, etc.)
✅ Classification accuracy/confidence
✅ Email addresses extracted
✅ Phone numbers extracted

Core Features:

  • Save Results -> SearchResultsDatabase.save_search_results() -> stores the search plus its links
  • Retrieve History -> get_search_history() -> queries past searches with an optional URL filter
  • Get Details -> get_search_by_id() -> full search details with all links
  • Close Connection -> close() -> proper resource cleanup
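
The method signatures are not shown in this description, so the following is only a sketch of how the class might be used; the keyword arguments to save_search_results() (root_url, depth, links) and its return value are assumptions for illustration.

```python
from torbot.modules.database import SearchResultsDatabase

# Sketch only: the real signatures in database.py may differ.
db = SearchResultsDatabase()  # opens/creates torbot_search_results.db

# Assumed call shape: root URL, crawl depth, and a list of per-link records.
search_id = db.save_search_results(
    root_url="http://example.onion",
    depth=2,
    links=[
        {
            "url": "http://example.onion/forum",
            "title": "Example Forum",
            "status_code": 200,
            "classification": "forum",
            "accuracy": 0.91,
            "emails": [],
            "phone_numbers": [],
        }
    ],
)

# Retrieve previous crawls, optionally filtered by root URL.
history = db.get_search_history(root_url="http://example.onion")

# Full details (search metadata plus every link row) for one search.
details = db.get_search_by_id(search_id)

db.close()  # release the SQLite connection
```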

Usage
Basic Save:

python main.py -u http://example.onion --depth 2 --save database
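
Because the output is a plain SQLite file, saved results can also be inspected with nothing but the standard library, independent of scripts/query_database.py. The sketch below pulls the most recent search and its links using the column names from the schema section above.

```python
import json
import sqlite3

# Inspect saved results directly; relies only on the schema described above.
conn = sqlite3.connect("torbot_search_results.db")
conn.row_factory = sqlite3.Row

search = conn.execute(
    "SELECT * FROM searches ORDER BY search_timestamp DESC LIMIT 1"
).fetchone()

if search is not None:
    print(search["root_url"], search["search_timestamp"], search["total_links"])
    for link in conn.execute(
        "SELECT url, status_code, classification, emails FROM links WHERE search_id = ?",
        (search["id"],),
    ):
        emails = json.loads(link["emails"] or "[]")  # stored as a JSON array
        print(link["url"], link["status_code"], link["classification"], emails)

conn.close()
```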

Benefits:

  • Persistence: Search results survive program restarts
  • Auditability: Full timestamp history of all crawls
  • Queryability: Filter and search previous results
  • Scalability: SQLite handles thousands of records efficiently
  • No Dependencies: Uses Python's built-in sqlite3 module
  • Relationship Integrity: Foreign keys prevent orphaned records
  • Export Ready: JSON data format enables easy integration with other tools
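
As a small illustration of the Export Ready point, the sketch below dumps the most recent search (including its links_data JSON) to a standalone file that other tools can consume; the output filename is arbitrary.

```python
import json
import sqlite3

# Sketch: export one saved search as JSON, using only the columns described above.
conn = sqlite3.connect("torbot_search_results.db")
conn.row_factory = sqlite3.Row

row = conn.execute("SELECT * FROM searches ORDER BY id DESC LIMIT 1").fetchone()
if row is not None:
    export = {
        "root_url": row["root_url"],
        "search_timestamp": row["search_timestamp"],
        "depth": row["depth"],
        "total_links": row["total_links"],
        "links": json.loads(row["links_data"] or "[]"),
    }
    with open("torbot_export.json", "w") as fh:  # arbitrary output path
        json.dump(export, fh, indent=2)

conn.close()
```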
