-
Notifications
You must be signed in to change notification settings - Fork 84
docs(clp-package): Add Spider user guide; Update deployment orchestration dev docs. #1647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughDocumentation adds Spider integration: a new "Using Spider" user guide and index entry, marks Celery queue/Redis startup as optional in multi-host instructions and adds optional Spider startup services, and extends the deployment/orchestration doc with Spider services, diagrams, tables and four deployment variants. No public API or code changes. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (5)📚 Learning: 2025-09-25T05:13:13.298ZApplied to files:
📚 Learning: 2025-08-08T06:59:42.436ZApplied to files:
📚 Learning: 2025-01-16T16:58:43.190ZApplied to files:
📚 Learning: 2025-07-23T09:54:45.185ZApplied to files:
📚 Learning: 2025-10-27T07:07:37.901ZApplied to files:
🪛 LanguageTooldocs/src/dev-docs/design-deployment-orchestration.md[duplication] ~100-~100: Possible typo: you repeated a word. (ENGLISH_WORD_REPEAT_RULE) [grammar] ~131-~131: You’ve repeated a verb. Did you mean to only write one of them? (REPEATED_VERBS) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
docs/src/dev-docs/design-deployment-orchestration.md(8 hunks)docs/src/user-docs/guides-multi-host.md(3 hunks)docs/src/user-docs/guides-using-spider.md(1 hunks)docs/src/user-docs/index.md(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.
Applied to files:
docs/src/user-docs/guides-multi-host.md
🪛 LanguageTool
docs/src/dev-docs/design-deployment-orchestration.md
[duplication] ~87-~87: Possible typo: you repeated a word.
Context: ...Engine] queue redis end end subgraph Initialization jobs db_...
(ENGLISH_WORD_REPEAT_RULE)
docs/src/user-docs/guides-using-spider.md
[typographical] ~7-~7: It appears that a comma is missing after this introductory phrase.
Context: ...tion with CLP may change in the future. Right now Spider only supports executing CLP comp...
(RIGHT_NOW_PUNCTUATION_COMMA)
[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ..." port: 6000 ``` 3. Optionally, before starting the package, update th...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (13)
docs/src/user-docs/guides-multi-host.md (3)
165-175: Clear annotations for optional Celery infrastructure services.The addition of "(optional, only if using Celery)" comments provides helpful context for users choosing between compression frameworks.
199-203: Spider scheduler service startup instruction is clear.The new spider-scheduler service follows the established pattern and includes appropriate optional annotation. The formatting and docker compose syntax are consistent with surrounding commands.
239-249: Spider compression worker service startup instructions align with guide structure.Both the standard compression-worker (now marked as Celery-optional) and the new spider-compression-worker service follow consistent patterns. The optional annotations properly guide users on when to start each service.
docs/src/dev-docs/design-deployment-orchestration.md (5)
32-33: Theme variables for diagram styling are correctly added.The new
tertiaryColorandclusterBkgtheme variables provide proper styling support for the Spider and Celery subgraph visualizations that follow.
45-45: Spider component nodes and dependencies are logically structured.The new spider-scheduler and spider-compression-worker nodes are correctly positioned in the diagram, and their dependencies from db_table_creator match the pattern used for other initialization-dependent services. This aligns with the learnings about compression scheduler initialization requirements.
Also applies to: 47-47, 69-70
84-102: Subgraph organization and styling effectively distinguish framework choices.The Celery and Spider subgraphs clearly separate framework-specific components, with appropriate colour-coding (gold for Celery, cyan for Spider). The styling helps users visually understand the architectural differences between deployment types. However, ensure the deployed docker-compose files (
docker-compose.yaml,docker-compose-base.yaml,docker-compose-spider.yaml,docker-compose-spider-base.yaml) actually exist and match the deployment types table.Also applies to: 123-129
141-157: Service table descriptions are clear and concise.The new rows for spider_scheduler, spider_compression_worker, queue, and redis correctly describe their roles. The optional nature of queue and redis for different configurations is implicit in the context of the deployment types section.
236-244: Deployment types table provides essential configuration guidance.The table clearly explains how
package.compression_scheduler.typeandpackage.query_enginesettings map to deployment variants and their corresponding docker-compose files. This is a valuable addition for users selecting their deployment strategy.docs/src/user-docs/index.md (1)
69-69: Guides toctree updated correctly with Spider guide entry.The new
guides-using-spiderentry is properly positioned in the Guides section and follows the established toctree structure.docs/src/user-docs/guides-using-spider.md (4)
1-10: Title and introduction are clear; note properly contextualizes Spider's current scope.The introductory note appropriately sets expectations that Spider is under active development and currently supports only compression tasks. This manages user expectations effectively.
12-17: Requirements section is specific and actionable.Version constraints are clearly specified, making it straightforward for users to verify their environment. The inclusion of python3-venv is a helpful detail often overlooked.
25-58: Configuration examples are clear and well-structured.The YAML snippets demonstrate the required
compression_scheduler.type: "spider"setting clearly, and the optional configurations for spider_db, spider_scheduler, and credentials are appropriately documented. The reference back to the quick-start guide for next steps maintains good flow between documents.
60-63: Reference links are consistent with project documentation patterns.External references to CLP releases, Docker, Docker Compose, and Spider are properly formatted and link to authoritative sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md
12-12: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
35-35: Trailing spaces
Expected: 0 or 2; Actual: 6
(MD009, no-trailing-spaces)
37-37: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
44-44: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
49-49: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
57-57: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
63-63: Files should end with a single newline character
(MD047, single-trailing-newline)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-spider.md (2)
25-62: Consider consolidating optional configuration into a dedicated subsection.The three optional configuration items (spider_db, spider_scheduler, and credentials.yaml updates) are currently interspersed with step 2 as separate sub-bullets, which creates a somewhat confusing structure with "(Optional)" repeated multiple times. This has been noted in prior reviews.
For improved readability and to reduce repetition, consider creating a dedicated "Optional Configuration" subsection after the main numbered steps to group all optional settings together:
1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package, but don't start the package just yet. + 2. Before starting the package, update the package's config file (`etc/clp-config.yaml`) as follows: - - * Set the `compression_scheduler.type` key to `"spider"`. + + Set the `compression_scheduler.type` key to `"spider"`: ```yaml compression_scheduler: type: "spider" ``` - * (Optional) Set the `spider_db`. - - ```yaml - spider_db: - db_name: "spider-db" - ``` - - * (Optional) Set the `spider_scheduler`. - - ```yaml - spider_scheduler: - host: "localhost" - port: 6000 - ``` - -3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`) + +3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP. + +### Optional Configuration + +You can optionally customize the following settings: + +* **spider_db**: Set the Spider database configuration. + + ```yaml + spider_db: + db_name: "spider-db" + ``` + +* **spider_scheduler**: Set the Spider scheduler connection details. + + ```yaml + spider_scheduler: + host: "localhost" + port: 6000 + ``` + +* **Spider database credentials**: Before starting the package, update the package's credential file (`etc/credentials.yaml`) to add Spider database credentials as follows: ```yaml spider_db: username: "spider_user" password: "spider_password" ``` - -4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.This improves the flow by keeping the main procedural steps sequential while clearly separating optional customizations.
64-67: Add trailing newline to end of file.The file is missing a trailing newline character, which is required by the MD047 linting rule.
[clp-releases]: https://github.com/y-scope/clp/releases [docker-compose]: https://docs.docker.com/compose/install/ [Docker]: https://docs.docker.com/engine/install/ [Spider]: https://github.com/y-scope/spider +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md
39-39: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
46-46: Trailing spaces
Expected: 0 or 2; Actual: 3
(MD009, no-trailing-spaces)
67-67: Files should end with a single newline character
(MD047, single-trailing-newline)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (1)
docs/src/user-docs/guides-using-spider.md (1)
1-10: Documentation content is accurate and well-structured.The guide provides clear setup instructions for integrating Spider with CLP. The introduction appropriately notes that Spider is under active development, and the requirements section specifies concrete version dependencies. Setup steps follow a logical flow from downloading the package to configuration to starting CLP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-spider.md (1)
38-60: Consider consolidating optional configuration into a single subsection.The optional configuration items (lines 38, 45, and 53) are currently scattered across separate numbered bullets with repeated "(Optional)" prefixes. Consolidating them under a single "Optional Configuration" subsection would improve readability and reduce structural repetition, making the required steps (compression_scheduler setup) stand out more clearly.
This could be restructured as:
### Setting up CLP with Spider 1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package, but don't start the package just yet. 2. Update the package's config file (`etc/clp-config.yaml`) as follows: * Set the `compression_scheduler.type` key to `"spider"`. ```yaml compression_scheduler: type: "spider" ``` - * (Optional) Set the `spider_db`. +### Optional Configuration + +Before starting the package, you may optionally configure the following: + +* Set the `spider_db`. ```yaml spider_db: db_name: "spider-db" ``` - * (Optional) Set the `spider_scheduler`. +* Set the `spider_scheduler`. ```yaml spider_scheduler: host: "localhost" port: 6000 ``` -3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`) - to add Spider database credentials as follows: +* Update the package's credential file (`etc/credentials.yaml`) to add Spider database credentials: ```yaml spider_db: username: "spider_user" password: "spider_password" ``` -4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP. +3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/user-docs/guides-using-spider.md(1 hunks)docs/src/user-docs/index.md(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md
67-67: Files should end with a single newline character
(MD047, single-trailing-newline)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
docs/src/user-docs/index.md (1)
70-70: LGTM – toctree entry added correctly.The new
guides-using-spiderentry is properly positioned within the Guides section and maintains consistent formatting with neighbouring entries. This change successfully exposes the new Spider guide in the documentation navigation structure.
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest lgtm. i'll check the aesthetics later
Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.
Applied to files:
docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
🔇 Additional comments (2)
docs/src/user-docs/guides-using-spider.md (2)
36-42: Verify configuration structure and field names match the latest CLP codebase.Past review feedback indicated the configuration examples need updating to match current code. Please confirm that:
- The
database.names.spiderstructure (lines 36–42) is the correct path and naming convention- The
spider_schedulerconfig fields and structure (lines 44–50) are accurate, particularly the host/port override mechanism- The credentials format in
databasesection withspider_usernameandspider_passwordfields (lines 52–59) matches the current expected format inetc/credentials.yamlIf these structures have changed in recent CLP updates, please provide the correct configuration examples.
Also applies to: 44-50, 52-59
63-66: Ensure file ends with a trailing newline.Per markdown linting rule MD047, the file must end with a single newline character after the final reference link. Verify that the file ends with exactly one newline after line 66 (
[Spider]: https://github.com/y-scope/spider).
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.
Applied to files:
docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
docs/src/user-docs/guides-using-spider.md (3)
14-16: Verify CLP version requirement is current.The minimum CLP version is specified as v0.8.0. Please confirm this matches the latest supported version for Spider integration and that it hasn't been updated since this guide was written.
55-59: Verify credentials YAML structure matches the actual configuration schema.The credentials example shows
spider_usernameandspider_passwordunder thedatabasesection. Please confirm this structure aligns with the actualetc/credentials.yamlschema expected by the latest CLP package, particularly regarding field naming and nesting.
61-66: Ensure file ends with a single trailing newline.According to the MD047 linting rule, the file should end with exactly one newline character. The provided code ends at line 66 with the reference link
[Spider]: https://github.com/y-scope/spider. Please verify that a trailing newline is present at the end of the file.
junhaoliao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest lgtm. for the title, how about
docs(clp-package): Add Spider user guide; Update deployment orchestration dev docs.
Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-spider.md (1)
7-9: Improve phrasing for conciseness and optimism per past feedback.The note should use "Currently" instead of "Right now" for better style, and "will be added soon" instead of "will be added later" for more optimistic phrasing. This aligns with past review feedback and the LanguageTool suggestion (AT_THE_MOMENT).
Apply this diff:
:::{note} Spider is under active development, and its integration with CLP may change in the future. -Right now, Spider only supports executing CLP compression tasks. Support for search tasks will be added -later. +Currently, Spider only supports executing CLP compression tasks. Support for search tasks will be added +soon. :::
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/dev-docs/design-deployment-orchestration.md(8 hunks)docs/src/user-docs/guides-using-spider.md(1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.
Applied to files:
docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md
[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...
(AT_THE_MOMENT)
docs/src/dev-docs/design-deployment-orchestration.md
[duplication] ~87-~87: Possible typo: you repeated a word.
Context: ...Engine] queue redis end end subgraph Initialization jobs db_...
(ENGLISH_WORD_REPEAT_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
docs/src/dev-docs/design-deployment-orchestration.md (3)
123-129: Verify edge style colours for accessibility and dark mode rendering.The added edge and subgraph styles use specific colour hex codes (
#ffd700,#00ced1,#ffffe0,#e0ffff). Confirm these colours meet contrast requirements and render correctly in both light and dark modes. Per past review context, aesthetics and dark mode issues were flagged as items to verify.
141-157: Verify service descriptions match latest code conventions.The service table descriptions at lines 148, 151-152 should be reviewed for consistency with past feedback. Past review comments suggested more precise wording for
spider_scheduler(as "Scheduler for Spider distributed task execution framework") and separate entries forcompression_workerandspider_compression_workerwith clearer distinctions.Confirm the following descriptions match the current implementation:
- Line 148:
spider_schedulerdescription- Lines 150-152: Descriptions for compression workers and query workers
Past suggestions indicated these could be more specific in distinguishing Celery vs. Spider task execution.
69-70: Verify Spider component dependency status types match actual deployment configuration.Spider components use
healthystatus fordb_table_creatordependencies (lines 69-70), while Celery-based schedulers usecompleted_successfully(lines 62, 65). This distinction is valid in Docker Compose (service_healthy waits for HEALTHCHECK; service_completed_successfully waits for container exit with code 0), but verify against the actual docker-compose files to ensure the diagram accurately reflects the current deployment orchestration logic.docs/src/user-docs/guides-using-spider.md (3)
52-59: Verify Spider credentials configuration field names and nesting.The example shows
spider_usernameandspider_passwordnested under adatabaseblock inetc/credentials.yaml. Confirm these field names and structure match the latest code, as previous feedback flagged these as needing verification.
44-50: Verify spider_scheduler configuration against CLP codebase—may not exist in current implementation.The field path (
spider_scheduler.host,spider_scheduler.port) and default values (localhost,6000) could not be verified against available CLP documentation or sources. Web searches found no authoritative reference to aspider_schedulercomponent in CLP (y-scope); CLP uses components namedquery_schedulerandcompression_schedulerinstead. Confirm whether this configuration section exists in the current CLP implementation or if the documentation references an outdated or removed component.
36-42: Confirm the correct database configuration field path for spider database naming.The current documentation shows
database.names.spider: "spider-db", but CLP's configuration documentation indicates database settings usedatabase.nameanddatabase.table_prefixkeys directly, not adatabase.namessubstructure. Verify whether this configuration path is correct for the current CLP version, or if it should be updated to match the actual configuration schema (e.g.,database.namefor database naming or a different approach for self-provisioned instances).
Description
This PR:
guides-using-spider.mdfor instructions on setting upSpiderconfig in CLP package.guides-using-spider.mdinto the toctree.guides-multi-node.mdfor instructions on runningSpidercontainers manually.design-deployment-orchestrationfor Spider containers and some minor improvements on existing graph.Checklist
breaking change.
Validation performed
task docs:serveruns successfully.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.