Skip to content

Conversation

@sitaowang1998
Copy link
Contributor

@sitaowang1998 sitaowang1998 commented Nov 21, 2025

Description

This PR:

  • Adds guides-using-spider.md for instructions on setting up Spider config in CLP package.
  • Adds the guides-using-spider.md into the toctree.
  • Updates guides-multi-node.md for instructions on running Spider containers manually.
  • Updates design-deployment-orchestration for Spider containers and some minor improvements on existing graph.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • task docs:serve runs successfully.
  • Website served looks good.

Summary by CodeRabbit

  • Documentation
    • Added a new user guide explaining prerequisites and step‑by‑step setup for integrating Spider.
    • Added a Guides navigation entry linking to the new Spider guide.
    • Updated deployment and orchestration docs and diagrams to include Spider components and four deployment variants.
    • Extended service tables and architecture visuals to show Spider scheduler and Spider compression worker.
    • Clarified startup wording: queue and Redis labelled optional for Celery; added optional Spider startup steps.

✏️ Tip: You can customize this high-level summary in your review settings.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner November 21, 2025 00:19
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

Walkthrough

Documentation adds Spider integration: a new "Using Spider" user guide and index entry, marks Celery queue/Redis startup as optional in multi-host instructions and adds optional Spider startup services, and extends the deployment/orchestration doc with Spider services, diagrams, tables and four deployment variants. No public API or code changes.

Changes

Cohort / File(s) Summary
New user guide & index
docs/src/user-docs/guides-using-spider.md, docs/src/user-docs/index.md
Added "Using Spider" user guide and registered it in the docs toctree; includes requirements, example configuration (compression_scheduler.type: "spider"), optional DB/credentials and host/port overrides, and quick-start continuation.
Multi-host doc updates
docs/src/user-docs/guides-multi-host.md
Clarified startup sequence: renamed queue/Redis blocks to indicate "if using Celery"; introduced optional spider-scheduler and spider-compression-worker startup entries and reflowed blocks alongside existing Celery steps.
Design / deployment orchestration
docs/src/dev-docs/design-deployment-orchestration.md
Added Spider components to architecture docs: new spider_scheduler and spider_compression_worker subgraphs/services, updated diagrams/edges/styles, expanded services table, and revised deployment types to four variants with Spider-specific docker-compose references.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus review on:
    • docs/src/dev-docs/design-deployment-orchestration.md — diagram consistency, subgraph/edge labels, deployment-type keys and docker-compose references.
    • docs/src/user-docs/guides-using-spider.md — requirement versions, YAML snippets, optional override examples and external links.
    • docs/src/user-docs/guides-multi-host.md — placement and wording of optional Celery vs Spider startup steps.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding Spider user guide documentation and updating deployment orchestration documentation.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 855279a and 65a3fab.

📒 Files selected for processing (1)
  • docs/src/dev-docs/design-deployment-orchestration.md (6 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
🪛 LanguageTool
docs/src/dev-docs/design-deployment-orchestration.md

[duplication] ~100-~100: Possible typo: you repeated a word.
Context: ...encies] queue redis end end subgraph Initialization jobs db_...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~131-~131: You’ve repeated a verb. Did you mean to only write one of them?
Context: ... AI mcp_server end %% Subgraph styles style celery_dependencies fill:#ffffe0 styl...

(REPEATED_VERBS)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
🔇 Additional comments (3)
docs/src/dev-docs/design-deployment-orchestration.md (3)

44-135: Well-structured diagram updates integrating Spider architecture.

The Mermaid diagram properly incorporates Spider components with clear visual distinction (teal colouring), correct dependency relationships, and logical subgraph organization. The nesting of celery_dependencies under Databases improves architectural clarity. Node declarations, references, and link styling are all consistent and accurate.

Please confirm the diagram renders correctly in both light and dark modes on the documentation website, as noted in a previous review concern.


146-162: Table 1 correctly documents Spider components with clear, parallel descriptions.

The new spider_scheduler and spider_compression_worker entries are properly described and positioned logically within the services table, maintaining consistency with existing entries and diagram nomenclature.


241-249: Deployment types table clearly documents all four configuration variants.

The table effectively maps the 2×2 matrix of compression_scheduler.type and package.query_engine settings to specific deployment types and docker-compose files. Naming is consistent and logically intuitive. Verify that the referenced docker-compose files (docker-compose-base.yaml, docker-compose.yaml, docker-compose-spider-base.yaml, docker-compose-spider.yaml) exist in the repository with the exact filenames shown.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sitaowang1998 sitaowang1998 marked this pull request as ready for review November 24, 2025 17:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b6c1fb and b12c48d.

📒 Files selected for processing (4)
  • docs/src/dev-docs/design-deployment-orchestration.md (8 hunks)
  • docs/src/user-docs/guides-multi-host.md (3 hunks)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
  • docs/src/user-docs/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
📚 Learning: 2025-10-17T19:59:25.596Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:315-315
Timestamp: 2025-10-17T19:59:25.596Z
Learning: In components/clp-package-utils/clp_package_utils/controller.py, worker log directories (compression_worker, query_worker, reducer) created via `mkdir()` do not need `_chown_paths_if_root()` calls because directories are created with the same owner as the script caller. This differs from infrastructure service directories (database, queue, Redis, results cache) which do require explicit ownership changes.

Applied to files:

  • docs/src/user-docs/guides-multi-host.md
🪛 LanguageTool
docs/src/dev-docs/design-deployment-orchestration.md

[duplication] ~87-~87: Possible typo: you repeated a word.
Context: ...Engine] queue redis end end subgraph Initialization jobs db_...

(ENGLISH_WORD_REPEAT_RULE)

docs/src/user-docs/guides-using-spider.md

[typographical] ~7-~7: It appears that a comma is missing after this introductory phrase.
Context: ...tion with CLP may change in the future. Right now Spider only supports executing CLP comp...

(RIGHT_NOW_PUNCTUATION_COMMA)


[style] ~50-~50: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ..." port: 6000 ``` 3. Optionally, before starting the package, update th...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (13)
docs/src/user-docs/guides-multi-host.md (3)

165-175: Clear annotations for optional Celery infrastructure services.

The addition of "(optional, only if using Celery)" comments provides helpful context for users choosing between compression frameworks.


199-203: Spider scheduler service startup instruction is clear.

The new spider-scheduler service follows the established pattern and includes appropriate optional annotation. The formatting and docker compose syntax are consistent with surrounding commands.


239-249: Spider compression worker service startup instructions align with guide structure.

Both the standard compression-worker (now marked as Celery-optional) and the new spider-compression-worker service follow consistent patterns. The optional annotations properly guide users on when to start each service.

docs/src/dev-docs/design-deployment-orchestration.md (5)

32-33: Theme variables for diagram styling are correctly added.

The new tertiaryColor and clusterBkg theme variables provide proper styling support for the Spider and Celery subgraph visualizations that follow.


45-45: Spider component nodes and dependencies are logically structured.

The new spider-scheduler and spider-compression-worker nodes are correctly positioned in the diagram, and their dependencies from db_table_creator match the pattern used for other initialization-dependent services. This aligns with the learnings about compression scheduler initialization requirements.

Also applies to: 47-47, 69-70


84-102: Subgraph organization and styling effectively distinguish framework choices.

The Celery and Spider subgraphs clearly separate framework-specific components, with appropriate colour-coding (gold for Celery, cyan for Spider). The styling helps users visually understand the architectural differences between deployment types. However, ensure the deployed docker-compose files (docker-compose.yaml, docker-compose-base.yaml, docker-compose-spider.yaml, docker-compose-spider-base.yaml) actually exist and match the deployment types table.

Also applies to: 123-129


141-157: Service table descriptions are clear and concise.

The new rows for spider_scheduler, spider_compression_worker, queue, and redis correctly describe their roles. The optional nature of queue and redis for different configurations is implicit in the context of the deployment types section.


236-244: Deployment types table provides essential configuration guidance.

The table clearly explains how package.compression_scheduler.type and package.query_engine settings map to deployment variants and their corresponding docker-compose files. This is a valuable addition for users selecting their deployment strategy.

docs/src/user-docs/index.md (1)

69-69: Guides toctree updated correctly with Spider guide entry.

The new guides-using-spider entry is properly positioned in the Guides section and follows the established toctree structure.

docs/src/user-docs/guides-using-spider.md (4)

1-10: Title and introduction are clear; note properly contextualizes Spider's current scope.

The introductory note appropriately sets expectations that Spider is under active development and currently supports only compression tasks. This manages user expectations effectively.


12-17: Requirements section is specific and actionable.

Version constraints are clearly specified, making it straightforward for users to verify their environment. The inclusion of python3-venv is a helpful detail often overlooked.


25-58: Configuration examples are clear and well-structured.

The YAML snippets demonstrate the required compression_scheduler.type: "spider" setting clearly, and the optional configurations for spider_db, spider_scheduler, and credentials are appropriately documented. The reference back to the quick-start guide for next steps maintains good flow between documents.


60-63: Reference links are consistent with project documentation patterns.

External references to CLP releases, Docker, Docker Compose, and Spider are properly formatted and link to authoritative sources.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b12c48d and 45f05ba.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md

12-12: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


35-35: Trailing spaces
Expected: 0 or 2; Actual: 6

(MD009, no-trailing-spaces)


37-37: Trailing spaces
Expected: 0 or 2; Actual: 3

(MD009, no-trailing-spaces)


44-44: Trailing spaces
Expected: 0 or 2; Actual: 3

(MD009, no-trailing-spaces)


49-49: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


57-57: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


63-63: Files should end with a single newline character

(MD047, single-trailing-newline)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-spider.md (2)

25-62: Consider consolidating optional configuration into a dedicated subsection.

The three optional configuration items (spider_db, spider_scheduler, and credentials.yaml updates) are currently interspersed with step 2 as separate sub-bullets, which creates a somewhat confusing structure with "(Optional)" repeated multiple times. This has been noted in prior reviews.

For improved readability and to reduce repetition, consider creating a dedicated "Optional Configuration" subsection after the main numbered steps to group all optional settings together:

 1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
    but don't start the package just yet.
+
 2. Before starting the package, update the package's config file (`etc/clp-config.yaml`) as follows:
-
-    * Set the `compression_scheduler.type` key to `"spider"`.
+
+    Set the `compression_scheduler.type` key to `"spider"`:
 
       ```yaml
       compression_scheduler:
         type: "spider"
       ```
 
-    * (Optional) Set the `spider_db`.
-   
-        ```yaml
-        spider_db:
-            db_name: "spider-db"
-        ```
-
-    * (Optional) Set the `spider_scheduler`.
-   
-        ```yaml
-        spider_scheduler:
-            host: "localhost"
-            port: 6000
-        ```
-
-3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`)
+
+3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.
+
+### Optional Configuration
+
+You can optionally customize the following settings:
+
+* **spider_db**: Set the Spider database configuration.
+
+  ```yaml
+  spider_db:
+      db_name: "spider-db"
+  ```
+
+* **spider_scheduler**: Set the Spider scheduler connection details.
+
+  ```yaml
+  spider_scheduler:
+      host: "localhost"
+      port: 6000
+  ```
+
+* **Spider database credentials**: Before starting the package, update the package's credential file (`etc/credentials.yaml`)
    to add Spider database credentials as follows:
 
     ```yaml
     spider_db:
       username: "spider_user"
       password: "spider_password"
     ```
-
-4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.

This improves the flow by keeping the main procedural steps sequential while clearly separating optional customizations.


64-67: Add trailing newline to end of file.

The file is missing a trailing newline character, which is required by the MD047 linting rule.

 [clp-releases]: https://github.com/y-scope/clp/releases
 [docker-compose]: https://docs.docker.com/compose/install/
 [Docker]: https://docs.docker.com/engine/install/
 [Spider]: https://github.com/y-scope/spider
+
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45f05ba and 8caa973.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md

39-39: Trailing spaces
Expected: 0 or 2; Actual: 3

(MD009, no-trailing-spaces)


46-46: Trailing spaces
Expected: 0 or 2; Actual: 3

(MD009, no-trailing-spaces)


67-67: Files should end with a single newline character

(MD047, single-trailing-newline)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: antlr-code-committed (macos-15)
🔇 Additional comments (1)
docs/src/user-docs/guides-using-spider.md (1)

1-10: Documentation content is accurate and well-structured.

The guide provides clear setup instructions for integrating Spider with CLP. The introduction appropriately notes that Spider is under active development, and the requirements section specifies concrete version dependencies. Setup steps follow a logical flow from downloading the package to configuration to starting CLP.

@kirkrodrigues kirkrodrigues requested review from junhaoliao and removed request for LinZhihao-723 November 28, 2025 18:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-spider.md (1)

38-60: Consider consolidating optional configuration into a single subsection.

The optional configuration items (lines 38, 45, and 53) are currently scattered across separate numbered bullets with repeated "(Optional)" prefixes. Consolidating them under a single "Optional Configuration" subsection would improve readability and reduce structural repetition, making the required steps (compression_scheduler setup) stand out more clearly.

This could be restructured as:

 ### Setting up CLP with Spider

 1. Follow the [quick-start](quick-start/index.md) guide to download and extract the CLP package,
    but don't start the package just yet.

 2. Update the package's config file (`etc/clp-config.yaml`) as follows:

     * Set the `compression_scheduler.type` key to `"spider"`.

       ```yaml
       compression_scheduler:
         type: "spider"
       ```

-    * (Optional) Set the `spider_db`.
+### Optional Configuration
+
+Before starting the package, you may optionally configure the following:
+
+* Set the `spider_db`.

       ```yaml
       spider_db:
           db_name: "spider-db"
       ```

-    * (Optional) Set the `spider_scheduler`.
+* Set the `spider_scheduler`.

       ```yaml
       spider_scheduler:
           host: "localhost"
           port: 6000
       ```

-3. (Optional) Before starting the package, update the package's credential file (`etc/credentials.yaml`)
-   to add Spider database credentials as follows:
+* Update the package's credential file (`etc/credentials.yaml`) to add Spider database credentials:

     ```yaml
     spider_db:
       username: "spider_user"
       password: "spider_password"
     ```

-4. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.
+3. Continue following the [quick-start](./quick-start/index.md#using-clp) guide to start CLP.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8caa973 and 933d527.

📒 Files selected for processing (2)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
  • docs/src/user-docs/index.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-spider.md

67-67: Files should end with a single newline character

(MD047, single-trailing-newline)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
docs/src/user-docs/index.md (1)

70-70: LGTM – toctree entry added correctly.

The new guides-using-spider entry is properly positioned within the Guides section and maintains consistent formatting with neighbouring entries. This change successfully exposes the new Spider guide in the documentation navigation structure.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest lgtm. i'll check the aesthetics later

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4133d67 and a6fea95.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build (macos-15)
🔇 Additional comments (2)
docs/src/user-docs/guides-using-spider.md (2)

36-42: Verify configuration structure and field names match the latest CLP codebase.

Past review feedback indicated the configuration examples need updating to match current code. Please confirm that:

  1. The database.names.spider structure (lines 36–42) is the correct path and naming convention
  2. The spider_scheduler config fields and structure (lines 44–50) are accurate, particularly the host/port override mechanism
  3. The credentials format in database section with spider_username and spider_password fields (lines 52–59) matches the current expected format in etc/credentials.yaml

If these structures have changed in recent CLP updates, please provide the correct configuration examples.

Also applies to: 44-50, 52-59


63-66: Ensure file ends with a trailing newline.

Per markdown linting rule MD047, the file must end with a single newline character after the final reference link. Verify that the file ends with exactly one newline after line 66 ([Spider]: https://github.com/y-scope/spider).

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6fea95 and 28a2fdc.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (3)
docs/src/user-docs/guides-using-spider.md (3)

14-16: Verify CLP version requirement is current.

The minimum CLP version is specified as v0.8.0. Please confirm this matches the latest supported version for Spider integration and that it hasn't been updated since this guide was written.


55-59: Verify credentials YAML structure matches the actual configuration schema.

The credentials example shows spider_username and spider_password under the database section. Please confirm this structure aligns with the actual etc/credentials.yaml schema expected by the latest CLP package, particularly regarding field naming and nesting.


61-66: Ensure file ends with a single trailing newline.

According to the MD047 linting rule, the file should end with exactly one newline character. The provided code ends at line 66 with the reference link [Spider]: https://github.com/y-scope/spider. Please verify that a trailing newline is present at the end of the file.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest lgtm. for the title, how about

docs(clp-package): Add Spider user guide; Update deployment orchestration dev docs.

@sitaowang1998 sitaowang1998 changed the title docs(package): Add document about using Spider in docker compose packaging. docs(clp-package): Add Spider user guide; Update deployment orchestration dev docs. Dec 10, 2025
sitaowang1998 and others added 2 commits December 10, 2025 11:52
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/src/user-docs/guides-using-spider.md (1)

7-9: Improve phrasing for conciseness and optimism per past feedback.

The note should use "Currently" instead of "Right now" for better style, and "will be added soon" instead of "will be added later" for more optimistic phrasing. This aligns with past review feedback and the LanguageTool suggestion (AT_THE_MOMENT).

Apply this diff:

 :::{note}
 Spider is under active development, and its integration with CLP may change in the future.
-Right now, Spider only supports executing CLP compression tasks. Support for search tasks will be added
-later.
+Currently, Spider only supports executing CLP compression tasks. Support for search tasks will be added
+soon.
 :::
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 28a2fdc and 855279a.

📒 Files selected for processing (2)
  • docs/src/dev-docs/design-deployment-orchestration.md (8 hunks)
  • docs/src/user-docs/guides-using-spider.md (1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-08-17T16:19:24.478Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/README.md:5-6
Timestamp: 2025-08-17T16:19:24.478Z
Learning: In the CLP project, README.md files can have multiple H1 headings when it makes sense structurally (e.g., "Contributing" as a top-level section alongside the main title). The MD025 markdown linting rule should be ignored in such cases where multiple H1s are intentional.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-11-17T22:58:50.056Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-11-17T22:58:50.056Z
Learning: In the y-scope/clp repository, when enabling new linting tools (ruff, mypy) on Python components, the team uses an incremental approach: first enable the tools with errors allowed (exit code 0), apply only safe auto-fixable fixes, then address remaining issues in follow-up PRs. During the initial enablement PR, reviews should focus on correctness of auto-fixes rather than flagging new code quality issues.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2024-11-15T16:28:08.644Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the `components/package-template/src/sbin/` directory, keep them simple and avoid adding additional validation code.

Applied to files:

  • docs/src/user-docs/guides-using-spider.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-08T06:59:42.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • docs/src/dev-docs/design-deployment-orchestration.md
🪛 LanguageTool
docs/src/user-docs/guides-using-spider.md

[style] ~7-~7: For conciseness, consider replacing this expression with an adverb.
Context: ...tion with CLP may change in the future. Right now, Spider only supports executing CLP com...

(AT_THE_MOMENT)

docs/src/dev-docs/design-deployment-orchestration.md

[duplication] ~87-~87: Possible typo: you repeated a word.
Context: ...Engine] queue redis end end subgraph Initialization jobs db_...

(ENGLISH_WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
docs/src/dev-docs/design-deployment-orchestration.md (3)

123-129: Verify edge style colours for accessibility and dark mode rendering.

The added edge and subgraph styles use specific colour hex codes (#ffd700, #00ced1, #ffffe0, #e0ffff). Confirm these colours meet contrast requirements and render correctly in both light and dark modes. Per past review context, aesthetics and dark mode issues were flagged as items to verify.


141-157: Verify service descriptions match latest code conventions.

The service table descriptions at lines 148, 151-152 should be reviewed for consistency with past feedback. Past review comments suggested more precise wording for spider_scheduler (as "Scheduler for Spider distributed task execution framework") and separate entries for compression_worker and spider_compression_worker with clearer distinctions.

Confirm the following descriptions match the current implementation:

  • Line 148: spider_scheduler description
  • Lines 150-152: Descriptions for compression workers and query workers

Past suggestions indicated these could be more specific in distinguishing Celery vs. Spider task execution.


69-70: Verify Spider component dependency status types match actual deployment configuration.

Spider components use healthy status for db_table_creator dependencies (lines 69-70), while Celery-based schedulers use completed_successfully (lines 62, 65). This distinction is valid in Docker Compose (service_healthy waits for HEALTHCHECK; service_completed_successfully waits for container exit with code 0), but verify against the actual docker-compose files to ensure the diagram accurately reflects the current deployment orchestration logic.

docs/src/user-docs/guides-using-spider.md (3)

52-59: Verify Spider credentials configuration field names and nesting.

The example shows spider_username and spider_password nested under a database block in etc/credentials.yaml. Confirm these field names and structure match the latest code, as previous feedback flagged these as needing verification.


44-50: Verify spider_scheduler configuration against CLP codebase—may not exist in current implementation.

The field path (spider_scheduler.host, spider_scheduler.port) and default values (localhost, 6000) could not be verified against available CLP documentation or sources. Web searches found no authoritative reference to a spider_scheduler component in CLP (y-scope); CLP uses components named query_scheduler and compression_scheduler instead. Confirm whether this configuration section exists in the current CLP implementation or if the documentation references an outdated or removed component.


36-42: Confirm the correct database configuration field path for spider database naming.

The current documentation shows database.names.spider: "spider-db", but CLP's configuration documentation indicates database settings use database.name and database.table_prefix keys directly, not a database.names substructure. Verify whether this configuration path is correct for the current CLP version, or if it should be updated to match the actual configuration schema (e.g., database.name for database naming or a different approach for self-provisioned instances).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants