Commit 585508f

Author: Dan (committed)
Commit message: Updated doc strings
Parent: b635a94

File tree: 2 files changed (+129 / -53 lines)

README.rst

Lines changed: 77 additions & 25 deletions
@@ -1,9 +1,10 @@
+============
parallel-ssh
============

Asynchronous parallel SSH client library.

-Run commands via SSH over tens/hundreds/thousands+ number of servers asynchronously and with minimal system load on the client host.
+Run SSH commands over many servers - hundreds to hundreds of thousands - asynchronously and with minimal system load on the client host.

.. image:: https://img.shields.io/pypi/v/parallel-ssh.svg
   :target: https://pypi.python.org/pypi/parallel-ssh
@@ -18,6 +19,8 @@ Run commands via SSH over tens/hundreds/thousands+ number of servers asynchronou

.. _`read the docs`: http://parallel-ssh.readthedocs.org/en/latest/

+.. contents:: Table of Contents
+
************
Installation
************
@@ -26,11 +29,11 @@ Installation

  pip install parallel-ssh

-As of version ``0.93.0`` pip version >= ``6.0.0`` is required for Python 2.6 compatibility. This limitation will be removed in ``1.0.0`` release which will drop ``2.6`` support.
+As of version ``0.93.0``, pip version >= ``6.0.0`` is required for Python 2.6 compatibility with newer versions of gevent, which have dropped 2.6 support. This limitation will be removed in post-``1.0.0`` releases, which will deprecate ``2.6`` support.

-To upgrade pip and setuptools run the following - use of ``virtualenv`` is recommended so as not to override system provided packages::
+To upgrade ``pip``, run the following - use of ``virtualenv`` is recommended so as not to override system provided packages::

-  pip install -U pip setuptools
+  pip install -U pip
  pip install parallel-ssh

*************
@@ -43,6 +46,8 @@ Run ``ls`` on two remote hosts in parallel with ``sudo``.

::

+  from __future__ import print_function
+
  from pssh import ParallelSSHClient
  hosts = ['myhost1', 'myhost2']
  client = ParallelSSHClient(hosts)
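
The ``run_command`` call that produces the output dictionary shown in the next hunk falls outside the diff context. Purely for illustration, a minimal sketch of such a call - the command string and the ``sudo`` keyword are assumptions based on the example's description ::

  # Hypothetical invocation matching the example's description:
  # run `ls` with sudo on all configured hosts and print the resulting
  # per-host output dictionary.
  output = client.run_command('ls -ltrh /tmp/', sudo=True)
  print(output)
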
@@ -51,7 +56,7 @@ Run ``ls`` on two remote hosts in parallel with ``sudo``.
  {'myhost1': {'exit_code': None, 'stdout': <generator>, 'stderr': <generator>, 'channel': <channel>, 'cmd' : <greenlet>, 'exception' : None},
   'myhost2': {'exit_code': None, 'stdout': <generator>, 'stderr': <generator>, 'channel': <channel>, 'cmd' : <greenlet>, 'exception' : None}}

-Stdout and stderr buffers are available in output. Iterating on them can be used to get output as it becomes available. Iteration ends *only when command has finished*.
+Stdout and stderr buffers are available in output. Iterating on them can be used to get output as it becomes available. Iteration ends *only when the command has finished*, though it may be interrupted and resumed at any point.

::
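
The literal block following the ``::`` above is outside the diff context. Purely for illustration, a minimal sketch of iterating the stdout buffers as described - the loop itself is an assumption, not part of this diff ::

  # Print each host's output line by line as it becomes available
  for host in output:
      for line in output[host]['stdout']:
          print("Host %s - output: %s" % (host, line))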

@@ -72,7 +77,7 @@ Exit codes become available once stdout/stderr is iterated on or ``client.join(o
  0
  0

-Joining on the connection pool can be used to block and wait for all parallel commands to finish if output is not required. ::
+Joining on the connection pool can be used to block and wait for all parallel commands to finish *if output is not needed*. ::

  client.pool.join()
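
Another illustrative sketch, not part of the diff: joining on the commands via the documented ``client.join(output)`` call and then reading the per-host exit codes from the output dictionary ::

  # Block until all commands have finished, then gather exit codes
  client.join(output)
  for host in output:
      print("Host %s exit code: %s" % (host, output[host]['exit_code']))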

@@ -84,39 +89,80 @@ Similarly, if only exit codes are needed but not output ::
  print(output[client.hosts[0]]['exit_code'])
  0

-There is a also host logger that can be enabled to log output from remote hosts. The helper function ``pssh.utils.enable_host_logger`` will enable host logging to stdout, for example ::
+There is also a built-in host logger that can be enabled to log output from remote hosts. The helper function ``pssh.utils.enable_host_logger`` will enable host logging to stdout, for example ::

  import pssh.utils
  pssh.utils.enable_host_logger()
-  output = client.run_command('uname')
-  client.join(output)
+  client.join(client.run_command('uname'))

  [localhost] Linux

+*****************
+Design And Goals
+*****************
+
+``ParallelSSH``'s design goals and motivation are to provide a *library* for running *asynchronous* SSH commands in parallel, with little to no load induced on the system by doing so, and with the intended usage being completely programmatic and non-interactive.
+
+To meet these goals, API driven solutions are preferred first and foremost. This frees up the developer to drive the library via any method desired, be that environment variables, CI driven tasks, command line tools, existing OpenSSH or new configuration files, from within an application and so on.
+
+********
+Scaling
+********
+
+Some guidelines on scaling ``ParallelSSH`` client and pool size numbers.
+
+In general, long-lived commands with little or no output *gathering* will scale better. Pool sizes in the multiple thousands have been used successfully, with little CPU overhead in the single process running them, in these use cases.
+
+Conversely, many short-lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regard to CPU overhead in the (g)event loop. Multiple processes, each with its own event loop, may be used to scale this use case further as CPU overhead allows.
+
+Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered is overhead increased.
+
+Technical Details
+******************
+
+To understand why this is, consider that in co-operative multitasking, which is used in this project via the ``gevent`` module, a co-routine (greenlet) needs to ``yield`` the event loop to allow others to execute - *co-operation*. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes overhead with other co-routines that also want to use the event loop.
+
+This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regard to scaling improvements from increasing pool size.
+
+On the other end of the spectrum, long-lived remote commands that generate *no* output only need the event loop at the start, when they are establishing connections, and at the end, when they are finished and need to gather exit codes, which results in practically zero CPU overhead at any time other than the start or end of command execution.
+
+Output *generation* is done remotely and has no effect on the event loop until output is gathered - that is, until output buffers are iterated on. Only at that point does the event loop need to be held.

**************************
Frequently asked questions
**************************

:Q:
-  Why should I use this module and not, for example, `fabric <https://github.com/fabric/fabric>`_?
+  Why should I use this library and not, for example, `fabric <https://github.com/fabric/fabric>`_?

:A:
-  ParallelSSH's design goals and motivation are to provide a *library* for running *asynchronous* SSH commands in parallel with **no** load induced on the system by doing so with the intended usage being completely programmatic and non-interactive - Fabric provides none of these goals.
-
-  Fabric is a port of `Capistrano <https://github.com/capistrano/capistrano>`_ from ruby to python. Its design goals are to provide a faithful port of capistrano with its `tasks` and `roles` to python with interactive command line being the intended usage. Its use as a library is non-standard and in `many <https://github.com/fabric/fabric/issues/521>`_ `cases <https://github.com/fabric/fabric/pull/674>`_ `just <https://github.com/fabric/fabric/pull/1215>`_ `plain <https://github.com/fabric/fabric/issues/762>`_ `broken <https://github.com/fabric/fabric/issues/1068>`_.
-
-  Furthermore, its parallel commands use a combination of both threads and processes with extremely high CPU usage and system load while running. Fabric currently stands at over 6,000 lines of code, majority of which is untested, particularly if used as a library as opposed to less than 700 lines of code mostly consisting of documentation strings currently in `ParallelSSH` with over 80% code test coverage.
+  In short, the tools are intended for different use cases.
+
+  ``ParallelSSH`` satisfies use cases for a parallel SSH client library that scales well over hundreds to hundreds of thousands of hosts - per `Design And Goals`_ - a use case that is very common on cloud platforms and in virtual machine automation. It should be used where such a use case applies.
+
+  Fabric and tools like it, on the other hand, are not well suited to such use cases for many reasons - performance and differing design goals in particular. The similarity is only that these tools also make use of SSH to run their commands.
+
+  ``ParallelSSH`` is, in other words, well suited to be the SSH client that tools like Fabric, Ansible and others use to run their commands, rather than a direct replacement for them.
+
+  By focusing on providing a well-defined, lightweight library - actual code is a few hundred lines - ``ParallelSSH`` is far better suited for *run this command on X number of hosts*, a use case for which frameworks like Fabric, Capistrano and others are overkill and, unsurprisingly, as that is not what they are for, ill-suited to and do not perform particularly well with.
+
+  Fabric and tools like it are high level deployment frameworks - as opposed to general purpose libraries - for building tasks to perform on hosts matching a role, with task chaining and a DSL-like syntax, and are primarily intended for command line use, for which the framework is a good fit. That is very far removed from an SSH client library, and there is no intention to add any such framework, tasks/roles, task chaining or similar functionality to ``ParallelSSH``.
+
+  Fabric in particular is a port of `Capistrano <https://github.com/capistrano/capistrano>`_ from Ruby to Python. Its design goals are to provide a faithful port of Capistrano with its `tasks` and `roles` framework to Python, with interactive command line being the intended usage.
+
+  Furthermore, Fabric's use as a library is non-standard and in `many <https://github.com/fabric/fabric/issues/521>`_ `cases <https://github.com/fabric/fabric/pull/674>`_ `just <https://github.com/fabric/fabric/pull/1215>`_ `plain <https://github.com/fabric/fabric/issues/762>`_ `broken <https://github.com/fabric/fabric/issues/1068>`_, and currently stands at over 7,000 lines of code, most of which lacks code testing.
+
+  In addition, Fabric's parallel command implementation uses a combination of both threads and processes with extremely high CPU usage and system load while running with as few as tens of hosts.

:Q:
  Is Windows supported?

:A:
  The library installs and works on Windows though not formally supported as unit tests are currently Posix system based.

-  Pip versions >= 8.0 are required for binary package installation of `gevent` on Windows, a dependency of `ParallelSSH`.
+  Pip versions >= 8.0 are required for binary package installation of ``gevent`` on Windows, a dependency of ``ParallelSSH``.

-  Though `ParallelSSH` is pure python code and will run on any platform that has a working Python interpreter, its `gevent` dependency contains native code which either needs a binary package to be provided for the platform or to be built from source. Binary packages for `gevent` are provided for OSX, Linux and Windows platforms as of this time of writing.
+  Though ``ParallelSSH`` is pure Python code and will run on any platform that has a working Python interpreter, its ``gevent`` dependency contains native code which either needs a binary package to be provided for the platform or to be built from source. Binary packages for ``gevent`` are provided for OSX, Linux and Windows platforms as of the time of writing.

:Q:
  Are SSH agents used?
@@ -130,19 +176,19 @@ Frequently asked questions
  Can ParallelSSH forward my SSH agent?

:A:
-  SSH agent forwarding, what ``ssh -A`` does on the command line, is supported and enabled by default. Creating an object as ``ParallelSSHClient(forward_ssh_agent=False)`` will disable that behaviour.
+  SSH agent forwarding, what ``ssh -A`` does on the command line, is supported and enabled by default. Creating an object as ``ParallelSSHClient(forward_ssh_agent=False)`` will disable this behaviour.

:Q:
  Is tunneling/proxying supported?

:A:
-  Yes, `ParallelSSH` natively supports tunelling through an intermediate SSH server. Connecting to a remote host is accomplished via an SSH tunnel using the SSH's protocol direct TCP tunneling feature, using local port forwarding. This is done natively in python and tunnel connections are asynchronous like all other connections in the `ParallelSSH` library. For example, client -> proxy SSH server -> remote SSH destination.
+  Yes, `ParallelSSH` natively supports tunnelling - also known as proxying - through an intermediate SSH server. Connecting to a remote host is accomplished via an SSH tunnel using the SSH protocol's direct TCP tunnelling feature, using local port forwarding. This is done natively in Python and tunnel connections are asynchronous like all other connections in the `ParallelSSH` library. For example, client -> proxy SSH server -> remote SSH destination.

-  Use the ``proxy_host`` and ``proxy_port`` parameters to configure your proxy.
+  Use the ``proxy_host`` and ``proxy_port`` parameters to configure your proxy::

-  >>> client = ParallelSSHClient(hosts, proxy_host='my_ssh_proxy_host')
-
-  Note that while connections from the ParallelSSH client to the tunnel host are asynchronous, connections from the tunnel host to the remote destination(s) may not be, depending on the SSH server implementation. If the SSH server uses threading to implement its tunelling and that server is used to tunnel to a large number of remote destinations system load on the tunnel server will increase linearly according to number of remote hosts.
+    client = ParallelSSHClient(hosts, proxy_host='my_ssh_proxy_host')
+
+  Note that while connections from the ParallelSSH `client` to the tunnel host are asynchronous, connections from the tunnel host to the remote destination(s) may not be, depending on the SSH server implementation. If the SSH server uses threading to implement its tunnelling and that server is used to tunnel to a large number of remote destinations, system load on the tunnel server will increase linearly with the number of threads used.

:Q:
  Is there a way to programmatically provide an SSH key?
@@ -170,11 +216,17 @@ SFTP is supported (SCP version 2) natively, no ``scp`` command required.
For example to copy a local file to remote hosts in parallel::

  from pssh import ParallelSSHClient, utils
+  from gevent import joinall
+
  utils.enable_logger(utils.logger)
  hosts = ['myhost1', 'myhost2']
  client = ParallelSSHClient(hosts)
-  client.copy_file('../test', 'test_dir/test')
-  client.pool.join()
+  greenlets = client.copy_file('../test', 'test_dir/test')
+  joinall(greenlets, raise_error=True)

  Copied local file ../test to remote destination myhost1:test_dir/test
  Copied local file ../test to remote destination myhost2:test_dir/test
+
+There is similar capability to copy remote files to local ones, suffixed with the host's name, with the ``copy_remote_file`` function.
+
+Directory recursion is supported in both cases - it defaults to off.
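
Purely for illustration, a minimal sketch of the ``copy_remote_file`` capability mentioned above - not part of the diff; the paths are placeholders and only the function name and the greenlet/``joinall`` pattern come from the README text ::

  # Hypothetical: copy a file from each remote host to a local copy,
  # which the library suffixes with the originating host's name.
  greenlets = client.copy_remote_file('remote_dir/remote_file', 'local_file')
  joinall(greenlets, raise_error=True)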
