Skip to content

Conversation

@HollywoodTonight
Copy link
Contributor

Please ensure your pull request adheres to the following guidelines:

  • make sure to link the related issues in this description
  • when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

Related Issues

Thanks for contributing!

@HollywoodTonight HollywoodTonight requested a review from a team November 4, 2025 11:28
HollywoodTonight and others added 5 commits November 14, 2025 15:14
- Add rank and traffic fields to AuditUrl schema (optional, nullable)
- Implement sortAuditUrls static method for sorting by multiple fields
- Add allBySiteIdSorted and allBySiteIdAndSourceSorted methods
- Update allBySiteIdAndAuditType to support sorting
- Add TypeScript definitions for new fields and methods
- Add 20 comprehensive unit tests for sorting functionality
- All tests passing (1147 total)
- Code coverage: 98.01% for audit-url module
- Add NODE_OPTIONS with --max-old-space-size=4096 to prevent heap out of memory errors
- Fixes FATAL ERROR: JavaScript heap out of memory in CI pipeline
Add platformType field to AuditUrl schema to categorize URLs as primary-site
or offsite platforms (Wikipedia, YouTube, social media, etc.).

Changes:
- Add platformType attribute with 11 supported platform types
- Add GSI for efficient querying by siteId and platformType
- Add collection methods: allBySiteIdAndPlatform(), allOffsiteUrls()
- Add model helper methods: isOffsitePlatform(), isPlatformType()
- Export PLATFORM_TYPES constant
- Update TypeScript definitions
- Add 33 comprehensive unit tests

Platform types supported:
primary-site, wikipedia, youtube-channel, reddit-community,
facebook-page, twitter-profile, linkedin-company, instagram-account,
tiktok-account, github-org, medium-publication

All methods support sorting and pagination.
Copy link
Member

@nitinja nitinja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, cant find any issues. Modern JS optional chaining can be used to shorten some of the functions like

this.getPlatformType?.() ?? this.platformType

@nitinja
Copy link
Member

nitinja commented Nov 18, 2025

I briefly discussed this with @ravkiran
We need a way to store one or more custom fields, just like Opportunity model has data field - to store fields specific to a particular type of audit.
In case of high value pages, it is AI rationale (and possibly couple more such fields like url-score), but it could be anything else.

@iuliag iuliag changed the title URL Store feat: URL store data model and data access Nov 18, 2025
Replace the 'source' string attribute with 'byCustomer' boolean to simplify
the data model and clearly distinguish between customer-added (true) and
system-added (false) URLs.

Changes:
- Schema: Replace 'source' with 'byCustomer' (boolean, default: true)
- Model: Replace isManualSource() with isCustomerUrl()
- Collection: Replace allBySiteIdAndSource with allBySiteIdByCustomer
- Collection: Replace allBySiteIdAndSourceSorted with allBySiteIdByCustomerSorted
- Collection: Replace removeForSiteIdAndSource with removeForSiteIdByCustomer
- Update GSI index from siteId+source to siteId+byCustomer
- Update TypeScript definitions

Migration mapping:
- source='manual' → byCustomer=true
- source='sitemap'/'discovery'/other → byCustomer=false
- Update test fixtures to use byCustomer instead of source
- Update model tests to use isCustomerUrl instead of isManualSource
- Update collection tests for byCustomer methods
- Update all unit tests for model and collection
- Update all integration tests
- Replace source with byCustomer throughout
- Replace isManualSource with isCustomerUrl
- Replace allBySiteIdAndSource with allBySiteIdByCustomer
- Replace removeForSiteIdAndSource with removeForSiteIdByCustomer
Resolve conflict in test/it/util/db.js:
- Use main branch's approach for disabling debug logging
- Keep console as default logger and disable debug inline
- Both approaches achieve same goal of preventing memory issues
Copy link
Contributor

@iuliag iuliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some observations below about the areas to cleanup for this iteration and having audits as a set to be able to filter at query time.

* @returns {boolean} True if the URL was added by a customer.
*/
isCustomerUrl() {
const byCustomer = this.getByCustomer ? this.getByCustomer() : this.byCustomer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this kind of check here and above in the file, i.e. if function not falsy then call function, otherwise return as property?

required: true,
default: [],
})
.addAttribute('rank', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove rank and traffic.
There are several "traffic" metrics that could be associated with a URL (organic, paid, agentic etc.), so these should end up as custom fields when these will be implemented.
There's no absolute rank for URLs, it all depends on use case.


// Get values using getter methods if available
switch (sortBy) {
case 'rank':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default: true,
})
.addAttribute('audits', {
type: 'list',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you turn this into a set and add a GSI by audits, you should be able to filter directly at query time.


/**
* Gets all audit URLs for a site that have a specific audit type enabled.
* Note: This performs filtering after retrieval since audits is an array.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you specify the audits as a set with a GSI then you should be able to filter directly when querying DynamoDB.

});

it('finds one audit URL by id', async () => {
const auditUrl = await AuditUrl.findById(sampleData.auditUrls[0].getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the siteId + URL the composite primary key for the entity? Do we need to have an additional "id" for it?

4. Get URLs by audit type: allBySiteIdAndAuditType(siteId, auditType) - filtered in code

Indexes:
- Primary: siteId (PK) + url (SK) - for unique identification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this enforced from the schema ?

required: false,
default: null,
})
.addAttribute('createdAt', {
Copy link
Contributor

@sandsinh sandsinh Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant, schema builder already adds this attribute, same for updatedAt, updatedBy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants