@caidanw
Contributor

@caidanw caidanw commented Dec 2, 2025

Summary

Validate a lexicon's DID authority through a DNS check, and validate that the lexicon matches the Zod schema from the official atproto SDK.

Also added a dev compose file, and squashed the db schema since there is no production data yet.

…n ingest route

- Use LexiconSchemaRecord type and type guard for incoming records
- Resolve and check NSID DID authority matches record DID
- Return error if record is invalid or authority does not match
- Refactor to use new @atproto/lexicon and @atproto/lexicon-resolver utilities
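
For readers unfamiliar with the authority check described in these bullets, here is a minimal sketch of the idea as I understand it from the atproto Lexicon resolution spec: drop the NSID's name segment, reverse the remaining authority into a domain, look up a TXT record at `_lexicon.<domain>`, and compare its `did=` value with the DID of the repo that published the record. The helper names (`lexiconDnsName`, `didFromTxt`, `authorityMatches`) are hypothetical and are not the `@atproto/lexicon-resolver` API. The DNS lookup itself is left out.

```typescript
// Sketch only: hypothetical helpers, not the @atproto/lexicon-resolver API.

/** Build the DNS name whose TXT record names the DID authority for an NSID. */
function lexiconDnsName(nsid: string): string {
  const segments = nsid.split(".");
  if (segments.length < 3) {
    throw new Error(`NSID needs at least three segments: ${nsid}`);
  }
  // Drop the trailing name segment, then reverse the authority back into
  // ordinary domain order: app.bsky.feed.post -> feed.bsky.app
  const authority = segments.slice(0, -1).reverse().join(".");
  return `_lexicon.${authority}`;
}

/** Extract the DID from TXT values of the form "did=did:plc:...". */
function didFromTxt(values: string[]): string | undefined {
  const dids = values
    .filter((value) => value.startsWith("did="))
    .map((value) => value.slice("did=".length));
  // Treat zero or several DID entries as unresolved rather than guessing.
  return dids.length === 1 ? dids[0] : undefined;
}

/** The gate: the resolved authority must equal the publishing repo's DID. */
function authorityMatches(
  resolvedDid: string | undefined,
  repoDid: string,
): boolean {
  return resolvedDid !== undefined && resolvedDid === repoDid;
}
```

An unresolved authority is treated as a mismatch, so a record is only accepted when the lookup positively confirms the publishing repo.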
…r NSID resolution

Remove extension from @atproto/lexicon-resolver type and explicitly define LexiconSchemaRecord with required 'id' property. Update type guard and documentation to clarify purpose for DID authority resolution.
…nd zod in ingest route

Parse incoming lexicon records with parseLexiconDoc, validate with zod, and improve error handling for invalid records. Store parsed LexiconDoc in the database instead of raw input.
- Replace single lexicons table with valid_lexicons and invalid_lexicons tables
- Add repo_did to primary key [nsid, cid, repo_did] to track lexicon migrations
- Store validation errors in invalid_lexicons for developer debugging
- Use different column names (data vs raw_data) for semantic clarity
- Add indexes on nsid and repo_did for both tables
- Include repo_rev field to track repository state at ingestion time
- Store valid lexicons in valid_lexicons table after successful parsing
- Store invalid lexicons in invalid_lexicons table with validation errors
- DNS validation acts as a gate before any storage (prevents DDoS)
- Include repo_did and repo_rev in all stored records
- Improve logging to distinguish valid vs invalid ingestion events
- Remove onConflictDoUpdate logic (primary key now includes repo_did)
- Combine initial lexicons table and updated schema into one migration
- Remove intermediate migration file (0001)
- Clean schema shows final valid_lexicons and invalid_lexicons tables
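
To illustrate why adding `repo_did` to the primary key matters, here is a small in-memory model of the `[nsid, cid, repo_did]` key with insert-if-absent semantics. It is a sketch under assumptions, not the Drizzle schema from this PR: the `LexiconTable` class and the camelCase field names are hypothetical stand-ins for the `valid_lexicons` columns.

```typescript
// In-memory model only; the real tables are defined with Drizzle in src/db/schema.ts.
interface LexiconRow {
  nsid: string;
  cid: string;
  repoDid: string; // models the repo_did column
  repoRev: string; // models the repo_rev column
  data: unknown;
}

class LexiconTable {
  private rows = new Map<string, LexiconRow>();

  // Composite key: the same nsid + cid published from another repo is a new row.
  private static key(row: { nsid: string; cid: string; repoDid: string }): string {
    return JSON.stringify([row.nsid, row.cid, row.repoDid]);
  }

  /** Insert unless the composite key already exists; returns whether a row was added. */
  insertIfAbsent(row: LexiconRow): boolean {
    const key = LexiconTable.key(row);
    if (this.rows.has(key)) return false;
    this.rows.set(key, row);
    return true;
  }

  /** All rows for an NSID, across every repo that has published it. */
  byNsid(nsid: string): LexiconRow[] {
    return Array.from(this.rows.values()).filter((row) => row.nsid === nsid);
  }
}
```

With this key, a lexicon that moves to a different repo shows up as a second row for the same NSID instead of overwriting the first, which is what makes migrations visible.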
…om Nexus and Postgres settings

Provides a Docker Compose override for development, disabling lexhub in Docker, reducing resource usage for Nexus and Postgres, and enabling debug logging and SQL query logging.
- Check for error shape (has 'issues' array) instead of instanceof check
- parseLexiconDoc throws errors that may not pass instanceof z.ZodError check
- Add onConflictDoNothing to invalid_lexicons insert
- Ensures invalid lexicons are now properly stored for debugging
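
A minimal sketch of the shape-based check described in this commit. One common reason an `instanceof z.ZodError` test fails is that the error was created by a different copy of zod than the one the caller imports; whether that is the cause here is an assumption. The helper names `hasIssues` and `describeError` are hypothetical and not taken from the PR.

```typescript
// Hypothetical helpers; no zod import is needed because only the shape is inspected.

/** True when the thrown value carries an `issues` array, whatever its class. */
function hasIssues(err: unknown): err is { issues: unknown[] } {
  return (
    typeof err === "object" &&
    err !== null &&
    "issues" in err &&
    Array.isArray((err as { issues: unknown }).issues)
  );
}

/** Flatten any thrown value into messages suitable for storing next to the raw record. */
function describeError(err: unknown): string[] {
  if (hasIssues(err)) {
    return err.issues.map((issue) =>
      typeof issue === "object" && issue !== null && "message" in issue
        ? String((issue as { message: unknown }).message)
        : String(issue),
    );
  }
  return [err instanceof Error ? err.message : String(err)];
}
```

Checking the shape keeps validation errors flowing into `invalid_lexicons` even when the class identity does not line up.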
Copilot AI left a comment

Pull request overview

This PR adds comprehensive lexicon validation to the ATProto lexicon hub by implementing DNS-based DID authority verification and Zod schema validation. The changes transform the simple lexicon storage system into a robust validation pipeline that separates valid from invalid lexicons while maintaining detailed debugging information.

Key Changes

  • Two-tier validation: DNS authority check followed by schema validation using @atproto/lexicon parser
  • Dual table architecture: Separate valid_lexicons and invalid_lexicons tables for production data and debugging
  • Enhanced schema: Added repo_did and repo_rev tracking with composite primary keys to handle lexicon migrations
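
The ordering of the two tiers can be sketched as follows. This is an illustration under assumptions rather than the code in `src/app/api/ingest/route.ts`: `classifyIngest` and `IngestOutcome` are hypothetical names, the DID is assumed to be resolved already, and the `parse` parameter stands in for a parser such as `parseLexiconDoc`.

```typescript
// Sketch of the gate-then-parse ordering; all names here are hypothetical.
type IngestOutcome =
  | { kind: "rejected"; reason: string } // nothing is stored
  | { kind: "valid"; doc: unknown } // would go to valid_lexicons
  | { kind: "invalid"; errors: string[] }; // would go to invalid_lexicons

function classifyIngest(
  repoDid: string,
  resolvedDid: string | undefined,
  record: unknown,
  parse: (record: unknown) => unknown,
): IngestOutcome {
  // Tier 1: authority gate. Records that fail here never reach parsing or storage.
  if (resolvedDid === undefined || resolvedDid !== repoDid) {
    return { kind: "rejected", reason: "DID authority does not match repo DID" };
  }
  // Tier 2: schema validation. Failures are kept, with their errors, for debugging.
  try {
    return { kind: "valid", doc: parse(record) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { kind: "invalid", errors: [message] };
  }
}
```

Running the authority gate first means unauthenticated input is dropped before any parsing or database work happens.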

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/util/lexicon.ts | New type guard and interface for lexicon schema records |
| src/db/schema.ts | Complete schema refactor with separate valid/invalid lexicon tables and migration tracking |
| src/app/api/ingest/route.ts | Added DNS validation and schema parsing with error handling for invalid lexicons |
| package.json | Added @atproto dependencies for lexicon parsing and DNS resolution |
| package-lock.json | Dependency lockfile updates for new @atproto packages |
| drizzle/0000_init.sql | Squashed migration creating dual table structure with appropriate indexes |
| drizzle/meta/_journal.json | Updated migration timestamp |
| compose.override.yaml | New development configuration for local development workflow |

@caidanw caidanw requested a review from Copilot December 2, 2025 20:33
Copilot finished reviewing on behalf of caidanw December 2, 2025 20:36
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.


@@ -1,26 +1,31 @@
CREATE TABLE "lexicons" (
"id" varchar(317) NOT NULL,
-- Create valid_lexicons table
Contributor

probably don't need this comment 😄

Contributor

same with a couple others

Contributor

should this be in .gitignore?

Contributor

@elijaharita elijaharita left a comment

2 potential things, otherwise looks good!

3 participants