-
Notifications
You must be signed in to change notification settings - Fork 1.1k
chore: pre-release v0.0.28 #907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRemoved the integration route and its registration; markdown conversion functions now require a Playwright Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25–35 minutes
Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
server/src/markdownify/scrape.ts (2)
5-5: UsePagetype instead ofanyfor type safety.Since
Pageis already imported, update the parameter type for consistency and better type checking.-async function gotoWithFallback(page: any, url: string) { +async function gotoWithFallback(page: Page, url: string) {
80-125: Consider extracting duplicated sanitization logic.The
page.evaluateblock (lines 86-118) is identical to lines 32-64 inconvertPageToMarkdown. Extract this to a shared helper for maintainability.+async function sanitizePageContent(page: Page): Promise<string> { + return page.evaluate(() => { + const selectors = [ + "script", "style", "link[rel='stylesheet']", "noscript", "meta", + "svg", "img", "picture", "source", "video", "audio", "iframe", "object", "embed" + ]; + selectors.forEach(sel => document.querySelectorAll(sel).forEach(e => e.remove())); + document.querySelectorAll("*").forEach(el => { + [...el.attributes].forEach(attr => { + if (attr.name.startsWith("on")) el.removeAttribute(attr.name); + }); + }); + return document.documentElement.outerHTML; + }); +} export async function convertPageToMarkdown(url: string, page: Page): Promise<string> { try { logger.log('info', `[Scrape] Using existing page instance for markdown conversion of ${url}`); await gotoWithFallback(page, url); - const cleanedHtml = await page.evaluate(() => { /* ... duplicated code ... */ }); + const cleanedHtml = await sanitizePageContent(page); const markdown = await parseMarkdown(cleanedHtml, url); return markdown; } catch (error: any) { /* ... */ } } export async function convertPageToHTML(url: string, page: Page): Promise<string> { try { logger.log('info', `[Scrape] Using existing page instance for HTML conversion of ${url}`); await gotoWithFallback(page, url); - const cleanedHtml = await page.evaluate(() => { /* ... duplicated code ... */ }); + const cleanedHtml = await sanitizePageContent(page); return cleanedHtml; } catch (error: any) { /* ... */ } }server/src/browser-management/classes/RemoteBrowser.ts (1)
551-561: Adblock/CDP init: add a small safety guard and confirm enable/disable intentThe fallback to always establish a CDP session even if adblocker init fails looks good. Two small nits here:
this.currentPageis typed asPage | null | undefined, but you’re casting it toanythree times. A simple null check + localpagevariable would keep this type‑safe and clearer.- You enable blocking and then immediately disable it; that means you won’t actually have adblocking on for subsequent browsing. If the goal is just a one‑off “warm up”, that’s fine, but if the intent was to keep ads blocked, this behavior might be surprising.
You can keep current behavior but tighten things up like this:
- await blocker.enableBlockingInPage(this.currentPage as any); - this.client = await this.currentPage.context().newCDPSession(this.currentPage); - await blocker.disableBlockingInPage(this.currentPage as any); + if (!this.currentPage) { + throw new Error('No page available for adblocker initialization'); + } + + const page = this.currentPage; + await blocker.enableBlockingInPage(page as any); + this.client = await page.context().newCDPSession(page); + await blocker.disableBlockingInPage(page as any);If you do want adblocking to remain active for the lifetime of the page, consider dropping the
disableBlockingInPagecall (or moving it to a more explicit teardown path instead of right after initialization).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
server/src/browser-management/classes/RemoteBrowser.ts(1 hunks)server/src/markdownify/scrape.ts(2 hunks)server/src/routes/storage.ts(0 hunks)server/src/server.ts(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/routes/storage.ts
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/markdownify/scrape.ts (1)
server/src/markdownify/markdown.ts (1)
parseMarkdown(1-98)
🔇 Additional comments (3)
server/src/markdownify/scrape.ts (2)
1-1: LGTM! Proper type import for the new function signature.The
Pagetype import fromplaywright-corecorrectly types the new parameter.
26-72: LGTM! Good refactor to accept external Page instance.The change to accept an existing
Pageinstance instead of managing browser lifecycle internally is a solid improvement for resource management. Error handling with logging and re-throwing is appropriate.server/src/server.ts (1)
8-8: Integration route cleanly removed from server.The import statement on line 8 and routes exports have been properly updated. The integration route no longer exists in the codebase, and there are no dangling references to the route handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Dockerfile.backend(1 hunks)docker-compose.yml(2 hunks)maxun-core/package.json(1 hunks)package.json(3 hunks)server/src/api/record.ts(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- maxun-core/package.json
- docker-compose.yml
🔇 Additional comments (2)
package.json (1)
3-3: Version bump looks good.Consistent pre-release patch bump: root version and maxun-core dependency both updated to 0.0.28. No breaking dependency changes introduced.
Also applies to: 55-55
server/src/api/record.ts (1)
14-14: LGTM. The change fromplaywrighttoplaywright-coreis appropriate for this type-only import. The Page type definition remains compatible, and playwright-core 1.57.0 is a documented release version.Note: playwright-core 1.57.0 includes breaking changes in glob URL patterns, route.continue() cookie handling, and Page#accessibility removal—verify these don't affect other parts of the codebase if they use routing or accessibility APIs.
| @@ -1,4 +1,4 @@ | |||
| FROM --platform=$BUILDPLATFORM mcr.microsoft.com/playwright:v1.46.0-noble | |||
| FROM --platform=$BUILDPLATFORM node:20-slim | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Search for browser launch patterns that would require local Chromium
rg -n "chromium://|launch\(|launchPersistentContext" --type=ts --type=js -g '!node_modules' -g '!dist' | head -50Repository: getmaxun/maxun
Length of output: 159
🏁 Script executed:
# Search for CDP and remote browser usage patterns
rg -n "connectOverCDP|RemoteBrowser|remote.*browser" --type=ts --type=js -g '!node_modules' -g '!dist' -C 2 | head -100Repository: getmaxun/maxun
Length of output: 7400
🏁 Script executed:
# Check the Dockerfile.backend for any system dependency installation
cat Dockerfile.backendRepository: getmaxun/maxun
Length of output: 630
🏁 Script executed:
# Look for docker-compose or orchestration files that might provide browser services
fd -e docker-compose.yml -e docker-compose.yaml -o compose.yml -o compose.yaml | head -20Repository: getmaxun/maxun
Length of output: 199
🏁 Script executed:
# Get context around the chromium.launch() call
cat -n server/src/browser-management/browserConnection.ts | sed -n '40,80p'Repository: getmaxun/maxun
Length of output: 1873
🏁 Script executed:
# Check the full browserConnection.ts file to understand the context
wc -l server/src/browser-management/browserConnection.tsRepository: getmaxun/maxun
Length of output: 112
🏁 Script executed:
# Look at the full file if it's not too large
cat -n server/src/browser-management/browserConnection.tsRepository: getmaxun/maxun
Length of output: 6979
🏁 Script executed:
# Search for error handling or fallback patterns around this launch call
rg -n "chromium\.launch|catch|fallback|CDP" server/src/browser-management/browserConnection.ts -C 5Repository: getmaxun/maxun
Length of output: 3388
🏁 Script executed:
# Check the RemoteBrowser class/interface definition
rg -n "class RemoteBrowser|interface RemoteBrowser" --type=ts -C 10Repository: getmaxun/maxun
Length of output: 3148
Remove local Chromium fallback or add Chromium binaries to Dockerfile.backend.
Switching from mcr.microsoft.com/playwright:v1.46.0-noble to node:20-slim creates a critical runtime failure. The code in server/src/browser-management/browserConnection.ts implements a fallback mechanism: it attempts to connect to a remote browser service first, but if that fails, it calls launchLocalBrowser() (line 128), which requires Chromium binaries and system libraries not present in the minimal base image.
When the remote browser service is unavailable, chromium.launch() will fail because:
- Playwright binaries are not installed (requires
npx playwright install chromium) - System dependencies are missing (libx11-6, libxdamage1, etc.)
Either install Chromium in the Dockerfile with RUN npx playwright@1.57.0 install chromium (and system packages), or remove the fallback and enforce remote browser service as mandatory.
🤖 Prompt for AI Agents
In Dockerfile.backend around line 1, the base image node:20-slim lacks
Playwright Chromium binaries and required system libraries so the local browser
fallback in server/src/browser-management/browserConnection.ts
(launchLocalBrowser at line 128) will fail; either (A) modify this Dockerfile to
install the Playwright Chromium browser and the system dependencies (add apt
packages required by Chromium and run the Playwright install for the specific
Playwright version used), or (B) remove/disable the local fallback in
browserConnection.ts and make the remote browser service mandatory (fail fast
with a clear error if connection to the remote browser cannot be established).
Ensure the chosen approach is consistent with CI/deployment and document the
change in the Dockerfile comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
browser/server.ts (1)
59-60: Consider usingreplaceAll()for robustness.The implementation correctly customizes the WebSocket host for clients. However,
.replace('localhost', BROWSER_WS_HOST)only replaces the first occurrence. While WebSocket endpoints typically contain a single 'localhost', using.replaceAll()would be more robust.Additionally, the health endpoint (line 52) returns the original
wsEndpointwhile the root endpoint replaces the host. This appears intentional (health for internal monitoring, root for external clients), but please confirm this design choice.Apply this diff for a more robust replacement:
- const wsEndpoint = browserServer?.wsEndpoint().replace('localhost', BROWSER_WS_HOST) || ''; + const wsEndpoint = browserServer?.wsEndpoint().replaceAll('localhost', BROWSER_WS_HOST) || '';
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
browser/server.ts(2 hunks)docker-compose.yml(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-compose.yml
🔇 Additional comments (2)
browser/server.ts (2)
14-14: LGTM: Configuration addition follows existing patterns.The new
BROWSER_WS_HOSTconfiguration is consistent with the existing port configurations and provides sensible defaults.
48-56: LGTM: Clean refactoring for clarity.Extracting the
wsEndpointto a variable improves code clarity and follows good practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/components/recorder/RightSidePanel.tsx (1)
418-456: Highlight cleanup behavior looks good; consider de-duplicatingevaluateSelectorfor consistencyThe new blocks that clear auto-detected pagination highlights on completion and discard are functionally solid: they safely guard on
autoDetectedPagination?.selectorandiframeElement?.contentDocument, and they reset the same inline styles (outline,outlineOffset,zIndex) that are set during highlighting. IncludingautoDetectedPaginationin the callback deps ensures the latest selector is used.However, the
evaluateSelectorhelper is now duplicated here and in other places in this file (including a slightly more sophisticated variant that also handles comma-separated selectors with fallback). This duplication increases maintenance cost and makes it easy for highlight and cleanup behavior to diverge subtly if one copy is updated and others are not.It would be cleaner to extract a single shared helper (either at module scope or as a small utility imported from
clientPaginationDetector/a local helper file) that encapsulates the XPath vs. CSS selection logic and any fallback behavior, and then reuse it in:
- Pagination-type-change cleanup,
- Auto-detect highlighting,
- “Choose Different Element” handling,
- These new completion/discard cleanup blocks.
That would keep highlight and un-highlight logic symmetric and easier to evolve.
Also applies to: 789-827
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
browser/server.ts(2 hunks)src/components/recorder/RightSidePanel.tsx(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- browser/server.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/components/recorder/RightSidePanel.tsx (2)
src/helpers/clientPaginationDetector.ts (1)
evaluateSelector(290-318)maxun-core/src/browserSide/scraper.js (3)
selector(28-28)doc(225-225)node(664-664)
Summary by CodeRabbit
Refactor
Improvements
Behavior Change
Chores
✏️ Tip: You can customize this high-level summary in your review settings.