|
| 1 | +# Parallel Processing Documentation |
| 2 | + |
| 3 | +This page explains how to use the `thread.ParallelProcessing` class! |
| 4 | + |
| 5 | +<br /> |
| 6 | +<details> |
| 7 | + <summary>Jump to</summary> |
| 8 | + <ul> |
| 9 | + <li><a href='#importing-the-class'> Import the class</a></li> |
| 10 | + <li><a href='#initializing-a-parallel-process'> Initialize a parallel process </a></li> |
| 11 | + </ul> |
| 12 | +</details> |
| 13 | + |
| 14 | + |
| 15 | +Don't have the thread library? [See here](./getting-started.md) for how to install thread. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Importing the class |
| 20 | + |
| 21 | +```py |
| 22 | +from thread import ParallelProcessing |
| 23 | +``` |
| 24 | + |
| 25 | +<br /> |
| 26 | + |
| 27 | + |
| 28 | +## How does it work? |
| 29 | + |
| 30 | +Parallel processing is most useful for speeding up data processing on large datasets. |
| 31 | + |
| 32 | +What it does: |
| 33 | +```py |
| 34 | +dataset = [1, 2, 3, ..., 2e10] |
| 35 | + |
| 36 | +# Splits into chunks as evenly as possible |
| 37 | +# thread_count = min(max_threads, len(dataset)) |
| 38 | +# n == len(chunks) == thread_count |
| 39 | +chunks = [[1, 2, 3, ...], [50, 51, 52, ...], ...] |
| 40 | + |
| 41 | +# Initialize and run n threads |
| 42 | +# each thread handles 1 chunk of data and passes it to the function |
| 43 | + |
| 44 | +# processed data is arranged back in order |
| 45 | + |
| 46 | +# processed data is returned as a list[Data_Out] |
| 47 | +``` |
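
The steps above can be sketched with the standard library alone. This is a minimal illustration and not the `thread` library's actual implementation: the helper names `split_evenly` and `process_in_parallel` are hypothetical, the default `max_threads=8` is arbitrary, and applying `function` to each item of a chunk (rather than to the chunk as a whole) is an assumption.

```python
import threading
from typing import Callable, List, Sequence, TypeVar

T_in = TypeVar("T_in")
T_out = TypeVar("T_out")


def split_evenly(dataset: Sequence[T_in], n: int) -> List[Sequence[T_in]]:
    """Split dataset into n contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(dataset), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks take one additional item each
        size = base + (1 if i < extra else 0)
        chunks.append(dataset[start:start + size])
        start += size
    return chunks


def process_in_parallel(
    function: Callable[[T_in], T_out],
    dataset: Sequence[T_in],
    max_threads: int = 8,  # arbitrary default chosen for this sketch
) -> List[T_out]:
    """Hypothetical helper: chunk the dataset, run one thread per chunk, merge in order."""
    if not dataset:
        return []

    thread_count = min(max_threads, len(dataset))
    chunks = split_evenly(dataset, thread_count)
    results: List[List[T_out]] = [[] for _ in range(thread_count)]

    def worker(index: int) -> None:
        # Each thread writes only to its own slot, so chunk order is preserved
        results[index] = [function(item) for item in chunks[index]]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Flatten the per-chunk outputs back into a single ordered list
    return [item for chunk in results for item in chunk]
```

Because every worker writes to a slot indexed by its chunk number, the flattened output keeps the order of the input no matter which thread finishes first.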
| 48 | + |
| 49 | +<br /> |
| 50 | + |
| 51 | + |
| 52 | +## Initializing a parallel process |
| 53 | + |
| 54 | +A simple example |
| 55 | +```py |
| 56 | +def my_data_processor(data: Data_In) -> Data_Out: ... |
| 57 | + |
| 58 | +# Recommended way |
| 59 | +my_processor = ParallelProcessing( |
| 60 | +  function = my_data_processor, |
| 61 | +  dataset = list(range(0, n)) |
| 62 | +) |
| 63 | + |
| 64 | +# OR |
| 65 | +# Not the recommended way |
| 66 | +my_processor = ParallelProcessing(my_data_processor, list(range(0, n))) |
| 67 | +``` |
| 68 | + |
| 69 | +It can be run by invoking the `start()` method: |
| 70 | +```py |
| 71 | +my_processor.start() |
| 72 | +``` |
| 73 | + |
| 74 | +> [!NOTE] |
| 75 | +> The underlying **threading.Thread()** objects from Python will only be initialized when **start()** is invoked |
| 76 | +
|
| 77 | +<br /> |
| 78 | + |
| 79 | + |
| 80 | +### Parameters |
| 81 | + |
| 82 | +* function : (DataProcessor, dataset, *args, **kwargs) -> Any | Data_Out |
| 83 | + > This should be a function that takes in a dataset and/or anything and returns Data_Out and/or anything |
| 84 | +
|
| 85 | +* dataset : Sequence[Data_In] = () |
| 86 | + > This should be an iterable sequence of arguments passed to the `DataProcessor` function<br /> |
| 87 | + > (e.g. `('foo', 'bar')`) |
| 88 | + |
| 89 | +* *overflow_args : Overflow_In |
| 90 | + > These are arguments passed to [**thread.Thread**](./threading.md#parameters) |
| 91 | +
|
| 92 | +* **overflow_kwargs : Overflow_In |
| 93 | + > These are keyword arguments passed to [**thread.Thread**](./threading.md#parameters)<br /> |
| 94 | + > [!NOTE] |
| 95 | + > If `args` is present, then it will automatically be removed from kwargs and joined with `overflow_args` |
| 96 | +
|
| 97 | +* **Raises** AssertionError: max_threads is invalid |
| 98 | + |
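
The note on `overflow_kwargs` can be illustrated with a small sketch. The helper `merge_overflow` is hypothetical and not part of the `thread` library, and placing the values from `args` after the positional overflow arguments is an assumption, since the documentation does not state the order.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


def merge_overflow(
    overflow_args: Sequence[Any],
    overflow_kwargs: Mapping[str, Any],
) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    """Hypothetical helper: move an `args` entry out of kwargs and join it with overflow_args."""
    kwargs = dict(overflow_kwargs)         # copy so the caller's mapping is not mutated
    extra = tuple(kwargs.pop("args", ()))  # remove `args` from kwargs if it is present
    # Assumed order: positional overflow arguments first, then the values from `args`
    return tuple(overflow_args) + extra, kwargs


args, kwargs = merge_overflow((1, 2), {"args": (3,), "daemon": True})
# args is now (1, 2, 3) and kwargs is {"daemon": True}
```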
| 99 | +<br /> |
| 100 | + |
| 101 | + |
| 102 | +### Attributes |
| 103 | + |
| 104 | +These are attributes of the [`ParallelProcessing`](#importing-the-class) class |
| 105 | + |
| 106 | +* results : List[Data_Out] |
| 107 | + > The result values<br /> |
| 108 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror)<br /> |
| 109 | + > **Raises** [`ThreadNotRunningError`](./exceptions.md#threadnotrunningerror)<br /> |
| 110 | + > **Raises** [`ThreadStillRunningError`](./exceptions.md#threadstillrunningerror) |
| 111 | +
|
| 112 | +<br /> |
| 113 | + |
| 114 | + |
| 115 | +### Methods |
| 116 | + |
| 117 | +These are methods of the [`ParallelProcessing`](#importing-the-class) class |
| 118 | + |
| 119 | +* start : () -> None |
| 120 | + > Initializes the threads and starts them<br /> |
| 121 | + > **Raises** [`ThreadStillRunningError`](./exceptions.md#threadstillrunningerror) |
| 122 | +
|
| 123 | +* is_alive : () -> bool |
| 124 | + > Indicates whether the thread is still alive<br /> |
| 125 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror) |
| 126 | +
|
| 127 | +* get_return_values : () -> Data_Out |
| 128 | + > Halts the current thread execution until the thread completes |
| 129 | +
|
| 130 | +* join : () -> JoinTerminatedStatus |
| 131 | + > Halts the current thread execution until a thread completes or exceeds the timeout<br /> |
| 132 | + > **Raises** [`ThreadNotInitializedError`](./exceptions.md#threadnotinitializederror)<br /> |
| 133 | + > **Raises** [`ThreadNotRunningError`](./exceptions.md#threadnotrunningerror) |
| 134 | +
|
| 135 | +<br /> |
| 136 | + |
| 137 | + |
| 138 | +Now you know how to use the [`ParallelProcessing`](#importing-the-class) class! |
| 139 | + |
| 140 | +[See here](./threading.md) for how to use the `thread.Thread` class! |