Commit bda5b90

PR Feedback; more precise definition of the cron
1 parent 26a3e3d commit bda5b90

1 file changed

+12
-27
lines changed

docs/multiple-collectors.md

Lines changed: 12 additions & 27 deletions
@@ -34,6 +34,8 @@ The table below details a set of keywords, or a glossary of terms, that appear t
3434
| **MAX_JOB_FAILS** | Maximum number of failures before a job is marked as failed. |
3535
| **Assigning a job** | The act of allocating one or more *jobs* to a collector. |
3636
| **website** | A standalone server responsible for inserting work into the queue. |
37+
| **backfilling** | Occurs when a commit's parent_sha does not have the same configuration as the request currently being enqueued. In this case, jobs with the requested configuration are added so that the commit can be benchmarked against its parent under matching conditions. |
38+
| **benchmark index** | A set of SHAs and release tags that have completed benchmark requests. Saves database lookups. |
3739

3840
## Programs that need to be available
3941

@@ -51,19 +53,25 @@ There are two major components in the new system; the website (CRON) and the col
5153

5254
It's simplest to show how the new system works by walking through it step by step. We will start with the website, which accepts requests as a web server and also has a cron job for managing the queue. This is the entry point for how work is queued.
5355

54-
Step 1:
56+
Step 1 - Creating requests:
57+
58+
The CRON will fetch all master commits and check their SHAs against the benchmark index. If a SHA does not exist in the index, a request for it will be added to the database. The same process applies to releases, using the same logic to determine whether a request needs to be stored in the database.
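
As a rough illustration of the index check described above, the following sketch filters a list of SHAs down to those that still need a request. The names `BenchmarkIndex` and `missing_requests` are hypothetical and not taken from the codebase, and the index is assumed to be a plain in-memory set.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the benchmark index: the SHAs and release
/// tags whose benchmark requests have already completed.
pub struct BenchmarkIndex {
    completed: HashSet<String>,
}

impl BenchmarkIndex {
    /// Builds an index from any collection of SHAs or tags.
    pub fn new<I, S>(completed: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            completed: completed.into_iter().map(Into::into).collect(),
        }
    }

    /// Returns true if the SHA or tag is already in the index.
    pub fn contains(&self, sha: &str) -> bool {
        self.completed.contains(sha)
    }
}

/// Returns the SHAs that are not in the index, preserving input order.
/// In this sketch these are the ones for which a request would be stored.
pub fn missing_requests<'a>(index: &BenchmarkIndex, shas: &[&'a str]) -> Vec<&'a str> {
    shas.iter()
        .copied()
        .filter(|sha| !index.contains(sha))
        .collect()
}
```

Because the lookup happens against an in-memory set, only SHAs that are genuinely missing would need to touch the database.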
59+
60+
Try commits are added on an ad hoc basis by rustc developers manually making an HTTP request to benchmark a commit. There will be a period of time where the artifact for a try commit is not ready for benchmarking, and the request will be in the state `waiting_for_artifacts`. Once the artifact is ready the request will move to `artifacts_ready`, indicating that the request is ready for benchmarking. This is updated through a web hook on the webserver.
5561

56-
The website also runs a CRON that will split a pending `benchmark_request` (request) into `benchmark_job`'s (jobs) and mark `in_progress` jobs as complete.
62+
Step 2 - Creating jobs:
5763

58-
When a request is created through a web hook there will be a period of time where the artifact is not ready for benchmarking and will be in the state `waiting_for_artifacts`. Once the artifact is ready the request will move to `artifacts_ready`, indicating that the request is ready for benchmarking. This is updated through an endpoint on the webserver.
64+
The CRON will create a queue and, if the first request in the queue is not `in_progress`, will dequeue that request and split it into `benchmark_job`s (jobs). If the request has a parent tag, a request will be made and jobs will also be enqueued for the parent. If the jobs for the parent already exist, the database will simply ignore them. This process of finding jobs that need to be populated for the parent is "backfilling".
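
The "insert and ignore duplicates" behaviour behind backfilling can be sketched as follows. This is a minimal in-memory model, not the real database layer: `Job`, `JobTable` and `enqueue_with_backfill` are hypothetical names, and a job is assumed to be identified by its SHA plus a single configuration string.

```rust
use std::collections::HashSet;

/// Hypothetical job key: one commit benchmarked under one configuration.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Job {
    pub sha: String,
    pub config: String,
}

/// In-memory stand-in for the job table. Inserting an existing job is a no-op.
#[derive(Default)]
pub struct JobTable {
    rows: HashSet<Job>,
}

impl JobTable {
    /// Enqueues one job per configuration and returns how many were new.
    pub fn enqueue(&mut self, sha: &str, configs: &[&str]) -> usize {
        let mut inserted = 0;
        for config in configs {
            let job = Job {
                sha: sha.to_string(),
                config: config.to_string(),
            };
            // `HashSet::insert` returns false when the job already exists.
            if self.rows.insert(job) {
                inserted += 1;
            }
        }
        inserted
    }
}

/// Enqueues jobs for a request and, if it has a parent, for the parent too.
/// Returns (new jobs for the request, new jobs backfilled for the parent).
pub fn enqueue_with_backfill(
    table: &mut JobTable,
    sha: &str,
    parent_sha: Option<&str>,
    configs: &[&str],
) -> (usize, usize) {
    let backfilled = match parent_sha {
        Some(parent) => table.enqueue(parent, configs),
        None => 0,
    };
    let own = table.enqueue(sha, configs);
    (own, backfilled)
}
```

If the parent was previously benchmarked under only one configuration, only the missing configurations count as backfilled; the existing ones are silently skipped.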
5965

6066
The states go as follows:
6167

6268
`waiting_for_artifacts` -> `artifacts_ready` -> `in_progress` -> `completed`
6369

6470
Only one request can presently be `in_progress` at any one time. If a request is in progress the CRON does not start splitting up other requests into jobs.
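
The state progression and the one-request-at-a-time rule can be modelled with a small enum. This is only a sketch: the type and function names are hypothetical, and only the four state names come from the text above.

```rust
/// The four request states, in the order they are visited.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RequestStatus {
    WaitingForArtifacts,
    ArtifactsReady,
    InProgress,
    Completed,
}

impl RequestStatus {
    /// Returns the state that follows this one, or None for the final state.
    pub fn next(self) -> Option<RequestStatus> {
        match self {
            RequestStatus::WaitingForArtifacts => Some(RequestStatus::ArtifactsReady),
            RequestStatus::ArtifactsReady => Some(RequestStatus::InProgress),
            RequestStatus::InProgress => Some(RequestStatus::Completed),
            RequestStatus::Completed => None,
        }
    }

    /// The state name as written in this document.
    pub fn as_str(self) -> &'static str {
        match self {
            RequestStatus::WaitingForArtifacts => "waiting_for_artifacts",
            RequestStatus::ArtifactsReady => "artifacts_ready",
            RequestStatus::InProgress => "in_progress",
            RequestStatus::Completed => "completed",
        }
    }
}

/// The CRON may start splitting a new request only when no request is
/// currently in progress.
pub fn may_start_new_request(statuses: &[RequestStatus]) -> bool {
    !statuses.iter().any(|s| *s == RequestStatus::InProgress)
}
```

Encoding the transitions in one `match` keeps the allowed order in a single place, so a request cannot skip from `waiting_for_artifacts` straight to `in_progress`.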
6571

66-
Step 2:
72+
Step 3 - Completing requests:
73+
74+
If the request at the head of the queue is `in_progress`, the CRON will check whether all the jobs associated with the request are in the state `failure` or `success`. If they are, the request will be marked as `completed`.
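
The completion check amounts to asking whether every job has reached a terminal state. A minimal sketch, assuming a hypothetical `JobStatus` enum (the `Queued` variant and the treatment of an empty job list are assumptions, not taken from the source):

```rust
/// Hypothetical job states. `Success` and `Failure` are the terminal ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    InProgress,
    Success,
    Failure,
}

impl JobStatus {
    /// A job is finished once it has either succeeded or failed.
    pub fn is_finished(self) -> bool {
        matches!(self, JobStatus::Success | JobStatus::Failure)
    }
}

/// Returns true when every job of the request is finished.
/// Assumption: a request with no jobs is not treated as complete.
pub fn request_is_complete(jobs: &[JobStatus]) -> bool {
    !jobs.is_empty() && jobs.iter().all(|job| job.is_finished())
}
```

Note that a failed job still counts towards completion, so one failing benchmark does not block the queue.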
6775

6876
From here if a request is marked as `completed` then the next request that is in the state `artifacts_ready` will be expanded into the jobs needed to fulfil the request. This will be all the combinations of target, profile,
6977

@@ -89,29 +97,6 @@ Step 4:
8997

9098
The collector's health is monitored by updating a heartbeat column in the `collector_config` table. The UI will indicate the collector as offline if it is inactive for a specified period of time. This should be caught either by error logs or someone viewing the page and subsequently reporting the collector as offline in Zulip.
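
The offline check could look something like the sketch below. The struct, the function name and the idea of passing in the elapsed time since the last heartbeat are all assumptions made for illustration; the source only states that a heartbeat column exists and that a timeout applies.

```rust
use std::time::Duration;

/// Hypothetical view of one collector: its name and how long ago its
/// heartbeat was last updated.
pub struct CollectorHeartbeat {
    pub name: String,
    pub since_last_heartbeat: Duration,
}

/// Returns the names of collectors whose last heartbeat is older than
/// `threshold`, in input order.
pub fn offline_collectors<'a>(
    collectors: &'a [CollectorHeartbeat],
    threshold: Duration,
) -> Vec<&'a str> {
    collectors
        .iter()
        .filter(|c| c.since_last_heartbeat > threshold)
        .map(|c| c.name.as_str())
        .collect()
}
```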
9199

92-
### Master artifacts
93-
94-
The website maintains a set of all master commits that have been completed in memory so it is able to quickly determine if the commit needs benchmarking.
95-
96-
- If the request's sha is in the benchmark index, nothing happens.
97-
- If the request is `in_progress`, check [request completion](#Checking-request-completion).
98-
- If the request is `waiting_for_parent` commit benchmark to be completed, nothing happens.
99-
- If the request is missing, we will recursively find a set of parent master commits that are missing data (by looking at their status in `benchmark_request`).
100-
- If the set is non-empty, these commits will be handled recursively with the same logic as this commit.
101-
- If the set is empty, the request will be *enqueued*.
102-
103-
### Try artifacts
104-
105-
The website will go through all try artifacts in `benchmark_request` that are not yet marked as `completed`.
106-
107-
- If the request is `waiting for artifacts`, do nothing (sometime later a GH notification will switch the status to `waiting for parent` once the artifacts are ready).
108-
- If the request is `waiting for parent`:
109-
- Recursively find a set of **grandparent** master commits that are missing data (by looking at their status in `benchmark_request`). This could happen on the edge switch from `waiting for artifacts` to `waiting for parent` in the GH webhook handler, or it could happen in each cron invocation.
110-
- If that set is empty, generate all necessary **parent** jobs and check if they are all completed in the `job_queue`.
111-
- If yes, *enqueue* the request.
112-
- If not, insert these jobs into the jobqueue. This is where backfilling happens, as we can backfill e.g. new backends for a parent master commit that was only benchmarked for LLVM before.
113-
- If the request is `in_progress`, check [request completion](#Checking-request-completion).
114-
115100
## Queue ordering
116101
The queue is ordered by priority. We assume that there is a collector online that is currently looking for work.
117102
- In progress requests: if there is a request whose state is `in_progress`, the collector will take this request. For this to happen, it presumably errored at some point and is restarting.
