Commit 0810743

Add docs explaining polling control
1 parent d94bf52 commit 0810743

7 files changed

+81
-8
lines changed

docs/_static/img/polling-multiproc-default.svg

Lines changed: 1 addition & 0 deletions

docs/_static/img/polling-multiproc-randomize.svg

Lines changed: 1 addition & 0 deletions

docs/_static/img/polling-rates.svg

Lines changed: 1 addition & 0 deletions

docs/config_reference.rst

Lines changed: 2 additions & 0 deletions
@@ -1936,6 +1936,8 @@ General Configuration
19361936
Timeout value in seconds used when checking if a git repository exists.
19371937

19381938

1939+
.. _polling_config:
1940+
19391941
.. py:attribute:: general.poll_randomize_ms
19401942
19411943
:required: No

docs/manpage.rst

Lines changed: 2 additions & 0 deletions
@@ -2100,6 +2100,8 @@ Whenever an environment variable is associated with a configuration option, its
21002100
.. versionadded:: 3.10.0
21012101

21022102

2103+
.. _polling_envvars:
2104+
21032105
.. envvar:: RFM_POLL_RANDOMIZE_MS
21042106

21052107
Range of randomization of the polling interval in milliseconds.

docs/polling.rst

Lines changed: 54 additions & 0 deletions
@@ -3,3 +3,57 @@
33
====================================
44
Understanding job polling in ReFrame
55
====================================
6+
7+
8+
ReFrame executes the "compile" and "run" phases of the :doc:`test pipeline <pipeline>` by spawning "jobs" that will build and execute the test, respectively.
9+
A job may be a simple local process or a batch job submitted to a job scheduler, such as Slurm.
10+
11+
ReFrame monitors the progress of its spawned jobs through polling.
12+
It does so in a careful way to avoid overloading the software infrastructure of the job scheduler.
13+
For example, it will try to poll the status of all its pending jobs at once using a single job scheduler command.
14+
15+
ReFrame adjusts its polling rate dynamically using an exponential decay function to ensure both high interactivity and low load.
16+
Polling starts at a high rate and -- in the absence of any job status changes -- gradually decays to a minimum value.
17+
After this point the polling rate remains constant.
18+
However, whenever a job completes, ReFrame resets its polling rate to the maximum, so as to quickly reap any other jobs that finish around the same time.
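The decay-and-reset behaviour described above can be sketched as follows. This is only an illustrative model, not ReFrame's actual implementation: the exact formula is an assumption, and the names ``instant_poll_rate`` and ``simulate`` are hypothetical. The default values are the ones quoted in the figure caption of this document.

```python
import math


def instant_poll_rate(elapsed_s, rate_max=10.0, rate_min=0.1, decay=0.1):
    """Return the polling rate (polls/s) `elapsed_s` seconds after a reset.

    Assumed model, not ReFrame's verified formula: the rate starts at
    `rate_max` and decays exponentially towards `rate_min`.
    """
    return rate_min + (rate_max - rate_min) * math.exp(-decay * elapsed_s)


def simulate(job_end_times, duration_s, step_s=1.0):
    """Sample the rate over time, resetting the decay when a job completes."""
    samples = []
    last_reset = 0.0
    pending = sorted(job_end_times)
    t = 0.0
    while t <= duration_s:
        # A job that finished since the previous sample resets the decay.
        while pending and pending[0] <= t:
            pending.pop(0)
            last_reset = t
        samples.append((t, instant_poll_rate(t - last_reset)))
        t += step_s
    return samples


if __name__ == '__main__':
    # Two hypothetical jobs finishing at t=10s and t=20s.
    for t, rate in simulate([10, 20], duration_s=30):
        print(f'{t:5.1f}s  {rate:6.3f} polls/s')
```

In this model the sampled rate falls between job completions and jumps back to the maximum at each completion, which is the sawtooth pattern the text describes.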
19+
20+
The following figure shows the instant polling rates (desired and current) as well as the global polling rate measured from the beginning of the run loop.
21+
The workload is a series of 6 tests, where the i-th test sleeps for ``10*i`` seconds.
22+
23+
.. figure:: _static/img/polling-rates.svg
24+
:align: center
25+
26+
:sub:`Instant and global polling rates of ReFrame as it executes a workload of six tests that sleep for different amounts of time. The default polling settings are used (poll_rate_max=10, poll_rate_min=0.1, poll_rate_decay=0.1)`
27+
28+
Note how ReFrame resets the instant polling rate whenever a test job finishes.
29+
30+
Users can control the maximum and minimum instant polling rates as well as the polling rate decay through either :ref:`environment variables <polling_envvars>` or :ref:`configuration parameters <polling_config>`.
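As a rough illustration, a configuration fragment setting these parameters might look like the sketch below. The option names are taken from the figure caption above; placing them in the ``general`` section is an assumption made by analogy with ``general.poll_randomize_ms``, so check the configuration reference for the exact names and accepted values.

```python
# Hypothetical ReFrame configuration fragment (Python config file).
# Option names come from the figure caption; their placement under
# 'general' is assumed by analogy with general.poll_randomize_ms.
site_configuration = {
    'general': [
        {
            'poll_rate_max': 10,     # polls per second right after a reset
            'poll_rate_min': 0.1,    # rate the decay settles at
            'poll_rate_decay': 0.1,  # decay constant of the polling rate
        }
    ]
}
```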
31+
32+
33+
Polling randomization
34+
---------------------
35+
36+
If multiple ReFrame processes execute the same workload at the same time, then the aggregated poll rate can be quite high, potentially stressing the batch scheduler infrastructure.
37+
The following picture shows the histogram of polls when running 10 ReFrame processes concurrently, each of them executing a series of 6 tests with varying sleep times (see above):
38+
39+
40+
.. figure:: _static/img/polling-multiproc-default.svg
41+
:align: center
42+
43+
:sub:`Poll count histogram of 10 ReFrame processes running the same workload concurrently. Each histogram bin corresponds to a second.`
44+
45+
Note how the total polling rate can significantly exceed the maximum polling rate set in each ReFrame process.
46+
47+
One option would be to reduce the maximum polling rate of every process, so that their aggregation falls below a certain threshold.
48+
Alternatively, you can instruct ReFrame to randomize the polling interval duration.
49+
This has a less drastic effect than reducing the maximum polling rate, but it preserves the original polling characteristics while smoothing out the spikes.
50+
51+
The following figure shows the poll histogram obtained by setting ``RFM_POLL_RANDOMIZE_MS=-500,1500``.
52+
This allows ReFrame to randomly shorten the polling interval by up to 500ms or extend it by up to 1500ms.
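The effect of such a range on a single polling interval can be sketched as follows. This is a minimal illustration under stated assumptions, not ReFrame's code: the helper name ``randomized_interval`` is hypothetical, and both the uniform distribution and the clamping at zero are assumptions.

```python
import random


def randomized_interval(interval_s, low_ms=-500, high_ms=1500, rng=random):
    """Return `interval_s` shifted by a random offset in [low_ms, high_ms].

    Assumptions: the offset is drawn uniformly and the result is clamped
    so that the interval never becomes negative.
    """
    offset_ms = rng.uniform(low_ms, high_ms)
    return max(0.0, interval_s + offset_ms / 1000.0)


if __name__ == '__main__':
    rng = random.Random(42)  # seeded only to make the demo repeatable
    # A nominal 2s interval ends up somewhere between 1.5s and 3.5s.
    for _ in range(5):
        print(f'{randomized_interval(2.0, rng=rng):.3f}s')
```

Because each process draws its own offsets, processes that start in lockstep drift apart, which is why the aggregated polls spread out over time.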
53+
54+
.. figure:: _static/img/polling-multiproc-randomize.svg
55+
:align: center
56+
57+
:sub:`Poll count histogram of 10 ReFrame processes executing the same workload using polling interval randomization. Each histogram bin corresponds to a second.`
58+
59+
Note how the spikes are now less pronounced and the polls are better distributed across time.
