
Commit 45b237b

JohnGarbutt, markgoddard
authored and committed
Support batching up commands
When you have around 60 baremetal nodes attached to a single switch, it takes a long time to execute all those commands. This gets worse when you limit the number of concurrent SSH connections.

Here we look to batch up commands to send to the switch together using a single connection. The results of each port's commands are returned when available.

This is implemented using etcd as a queueing system. Commands are added to an input key, then a worker thread processes the available commands for a particular switch device. We pull off the queue using the version at which the keys were added, giving a FIFO-style queue. The result of each command set is added to an output key, which the original request thread is watching. Distributed locks are used to serialise the processing of commands for each switch device.

Various neat etcd features are used here to alleviate some of the issues of distributed task coordination, including transactions, leases, watches, historical key/value tracking, etc.

Co-Authored-By: Mark Goddard <mark@stackhpc.com>
Change-Id: I8c458bbc94df5630cfede5434bcdbe527988059c
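The queueing scheme described in the commit message can be sketched in a single process. This is a minimal sketch under stated assumptions, not the driver's code or the etcd3gw client API: an in-memory FakeKV class stands in for etcd, a global revision counter models the version at which a key was written, a threading.Condition models an etcd watch, and a plain local lock per device models the distributed lock. The key layout (/ngs/<device>/input/... and /ngs/<device>/output/...) and the helpers submit, process_batch and wait_result are hypothetical names chosen for illustration.

```python
import itertools
import threading


class FakeKV:
    """In-memory stand-in for etcd: each put is stamped with a global revision."""

    def __init__(self):
        self._cond = threading.Condition()
        self._revisions = itertools.count(1)
        self._data = {}  # key -> (revision at which it was written, value)

    def put(self, key, value):
        with self._cond:
            revision = next(self._revisions)
            self._data[key] = (revision, value)
            self._cond.notify_all()  # wake anyone "watching" for a key
            return revision

    def get_prefix(self, prefix):
        with self._cond:
            items = [(rev, key, value) for key, (rev, value) in self._data.items()
                     if key.startswith(prefix)]
        # Oldest revision first gives FIFO ordering across requests.
        return sorted(items, key=lambda item: item[0])

    def delete(self, key):
        with self._cond:
            self._data.pop(key, None)

    def watch(self, key, timeout):
        """Block until key exists (or timeout expires); return its value or None."""
        with self._cond:
            if self._cond.wait_for(lambda: key in self._data, timeout):
                return self._data[key][1]
            return None


# Hypothetical stand-in for the distributed per-device lock.
_device_locks = {}


def submit(kv, device, request_id, commands):
    """Request side: enqueue one port's command set under the input prefix."""
    kv.put(f"/ngs/{device}/input/{request_id}", commands)


def process_batch(kv, device, send_batch):
    """Worker side: drain all pending command sets for one device in FIFO order."""
    with _device_locks.setdefault(device, threading.Lock()):
        pending = kv.get_prefix(f"/ngs/{device}/input/")
        if not pending:
            return 0
        # send_batch is assumed to use one connection for the whole batch and
        # to return one result per command set, in the same order.
        results = send_batch(device, [commands for _, _, commands in pending])
        for (_, key, _), result in zip(pending, results):
            request_id = key.rsplit("/", 1)[-1]
            kv.put(f"/ngs/{device}/output/{request_id}", result)
            kv.delete(key)
        return len(pending)


def wait_result(kv, device, request_id, timeout=60.0):
    """Request side: watch the output key for this request."""
    return kv.watch(f"/ngs/{device}/output/{request_id}", timeout)


if __name__ == "__main__":
    kv = FakeKV()
    submit(kv, "switch-1", "port-a", ["interface 1/1", "switchport access vlan 10"])
    submit(kv, "switch-1", "port-b", ["interface 1/2", "switchport access vlan 20"])

    def fake_send(device, batch):
        # Pretend to open one session and apply every command set in it.
        return [f"{device}: applied {len(cmds)} commands" for cmds in batch]

    print(process_batch(kv, "switch-1", fake_send))  # 2
    print(wait_result(kv, "switch-1", "port-b", timeout=1))  # switch-1: applied 2 commands
```

The transactions and leases mentioned in the commit message are left out of this sketch; it only shows the FIFO ordering by revision, the per-request output keys, and the per-device serialisation.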
1 parent 0c7f61b, commit 45b237b


9 files changed, +987 -11 lines changed


doc/source/configuration.rst

Lines changed: 42 additions & 4 deletions
@@ -123,8 +123,9 @@ for the Dell PowerConnect device::
     ngs_switchport_mode = access
 
 Dell PowerConnect devices have been seen to have issues with multiple
-concurrent configuration sessions. See :ref:`synchronization` for details on
-how to limit the number of concurrent active connections to each device.
+concurrent configuration sessions. See :ref:`synchronization` and
+:ref:`batching` for details on how to limit the number of concurrent active
+connections to each device.
 
 for the Brocade FastIron (ICX) device::
 
@@ -248,8 +249,16 @@ connection URL for the backend should be configured as follows::
     [ngs_coordination]
     backend_url = <backend URL>
 
-The default is to limit the number of concurrent active connections to each
-device to one, but the number may be configured per-device as follows::
+The backend URL format includes the Tooz driver as the scheme, with driver
+options passed using query string parameters. For example, to use the
+``etcd3gw`` driver with an API version of ``v3`` and a path to a CA
+certificate::
+
+    [ngs_coordination]
+    backend_url = etcd3+https://etcd.example.com?api_version=v3,ca_cert=/path/to/ca/cert.crt
+
+The default behaviour is to limit the number of concurrent active connections
+to each device to one, but the number may be configured per-device as follows::
 
     [genericswitch:device-hostname]
     ngs_max_connections = <max connections>
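The per-device limit documented in this hunk behaves like a counting semaphore per switch, with a default acquire timeout of 60 seconds. A minimal single-process analog, assuming Python's threading.BoundedSemaphore in place of the Tooz coordination backend configured through [ngs_coordination] backend_url; the MAX_CONNECTIONS values and the connection_slot helper are hypothetical:

```python
import threading
from contextlib import contextmanager

# Hypothetical per-device values; in the driver they come from the
# ngs_max_connections and acquire_timeout options, not from this dict.
MAX_CONNECTIONS = {"switch-a": 1, "switch-b": 2}
DEFAULT_MAX_CONNECTIONS = 1  # default: one active connection per device
ACQUIRE_TIMEOUT = 60.0

_semaphores = {}
_registry_lock = threading.Lock()


def _semaphore_for(device):
    """Create the device's semaphore on first use, sized by its limit."""
    with _registry_lock:
        if device not in _semaphores:
            limit = MAX_CONNECTIONS.get(device, DEFAULT_MAX_CONNECTIONS)
            _semaphores[device] = threading.BoundedSemaphore(limit)
        return _semaphores[device]


@contextmanager
def connection_slot(device, timeout=ACQUIRE_TIMEOUT):
    """Hold one of the device's connection slots, or fail after timeout seconds."""
    semaphore = _semaphore_for(device)
    if not semaphore.acquire(timeout=timeout):
        raise TimeoutError(f"no free connection slot for {device} after {timeout}s")
    try:
        yield
    finally:
        semaphore.release()
```

With a limit of one, a second caller for the same switch waits until the first releases its slot, and gives up with TimeoutError once the timeout expires.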
@@ -263,6 +272,35 @@ timeout of 60 seconds before failing. This timeout can be configured as follows
     ...
     acquire_timeout = <timeout in seconds>
 
+.. _batching:
+
+Batching
+========
+
+For many network devices there is a significant SSH connection overhead which
+is incurred for each network or port configuration change. In a large scale
+system with many concurrent changes, this overhead adds up quickly. Since the
+Antelope release, the Generic Switch driver includes support to batch up switch
+configuration changes and apply them together using a single SSH connection.
+
+This is implemented using etcd as a queueing system. Commands are added
+to an input key, then a worker thread processes the available commands
+for a particular switch device. We pull off the queue using the version
+at which the keys were added, giving a FIFO style queue. The result of
+each command set are added to an output key, which the original request
+thread is watching. Distributed locks are used to serialise the
+processing of commands for each switch device.
+
+The etcd endpoint is configured using the same ``[ngs_coordination]
+backend_url`` option used in :ref:`synchronization`, with the limitation that
+only ``etcd3gw`` is supported.
+
+Additionally, each device that will use batched configuration should include
+the following option::
+
+    [genericswitch:device-hostname]
+    ngs_batch_requests = True
+
 Disabling Inactive Ports
 ========================
 
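To make the connection-overhead argument in the new Batching section concrete, the sketch below counts SSH connections to one switch with and without batching. FakeSession, Connector, apply_unbatched and apply_batched are hypothetical illustrations, and the switch commands are placeholders; the driver's real device classes and connection handling are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Hypothetical stand-in for one SSH session to a switch."""
    device: str
    sent: list = field(default_factory=list)

    def send_commands(self, commands):
        self.sent.extend(commands)
        return f"{self.device}: applied {len(commands)} commands"


@dataclass
class Connector:
    """Counts sessions opened; each one models a full SSH connect and login."""
    opened: int = 0

    def connect(self, device):
        self.opened += 1
        return FakeSession(device)


def apply_unbatched(connector, device, changes):
    # One connection per port or network change.
    return [connector.connect(device).send_commands(cmds) for cmds in changes]


def apply_batched(connector, device, changes):
    # One connection for the whole batch; results are still one per change.
    session = connector.connect(device)
    return [session.send_commands(cmds) for cmds in changes]


if __name__ == "__main__":
    # One placeholder change per baremetal node, around 60 nodes on one switch.
    changes = [[f"interface 1/{n}", "switchport access vlan 100"] for n in range(1, 61)]
    unbatched, batched = Connector(), Connector()
    assert apply_unbatched(unbatched, "switch-1", changes) == apply_batched(batched, "switch-1", changes)
    print(unbatched.opened, batched.opened)  # 60 1
```

Each change still gets its own result, in line with each port's results being returned individually, while the number of connections drops from one per change to one per batch.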