Commit d5e5511

Rewrote readme lead and added "asynchronous" section.
Added async/throttle to list of suggested libraries in composer.json.
1 parent 65fc071 commit d5e5511

File tree

2 files changed: +40 -15 lines changed

README.md

Lines changed: 37 additions & 13 deletions
@@ -8,13 +8,15 @@ Porter <img src="https://github.com/ScriptFUSION/Porter/blob/master/docs/images/
88
[![Test coverage][Coverage image]][Coverage]
99
[![Code style][Style image]][Style]
1010

11-
Porter is the PHP data importer. She fetches data from anywhere, from the local file system to third party online services, and returns an [iterator](#record-collections). Porter is a fully pluggable import framework that can be extended with [connectors](#connectors) for any protocol and [transformers](#transformers) to manipulate data immediately after import.
11+
### Scalable and durable data imports for publishing and consuming APIs
1212

13-
Ready-to-use data [providers][Provider] include all the necessary connectors and other dependencies to access popular online services such as [Stripe][Stripe provider] for online payments, the [European Central Bank][ECB provider] for foreign exchange rates or [Steam][Steam provider] for its complete PC games library and more. Porter's provider library is limited right now, and some implementations are incomplete, but we hope the PHP community will rally around Porter's abstractions and become the de facto framework for publishing online services, APIs, web scrapers and data dumps. Porter's interfaces have undergone intensive scrutiny and several iterations during years of production use to ensure they are efficient, robust, flexible, testable and easy to implement.
13+
Porter is the all-purpose PHP data importer. She fetches data from anywhere and serves it as a [record collection](#record-collections) for iterable data sets, to encourage processing one record at a time instead of loading entire data sets into memory at once. Her [durability](#durability) feature provides automatic and transparent recovery from intermittent network connectivity problems by default.
1414

15-
Porter's key [durability](#durability) feature ensures recoverable connection failures are transparently retried up to five times by default, with increasing delays between each attempt until the fetch is successful. This helps ensure intermittent network failures will not disrupt the entire import operation. Special care has been taken to ensure Porter's features are safe for concurrency, such that multiple imports can be paused and resumed simultaneously, which is especially important for iterators implemented with generators (which can be paused) as well as the upcoming asynchronous imports in v5.
15+
Porter's interface trichotomy of [providers](#providers), [resources](#resources) and [connectors](#connectors) maps well to APIs. A typical API, for example GitHub, would define the Provider as GitHub, a resource as `GetUser` or `ListRepositories` and the Connector could be `HttpConnector`.
1616

17-
###### Quick links
17+
Porter provides a dual API for synchronous and [asynchronous](#asynchronous) imports, both of which are concurrency safe, so multiple imports can be paused and resumed simultaneously. Asynchronous mode allows large scale imports across multiple connections to work at maximum efficiency without waiting for each network call to complete.
18+
19+
###### Porter network quick links
1820

1921
[![][Porter icon]][Provider]
2022
[![][Porter transformers icon]][Porter transformers]
@@ -31,6 +33,7 @@ Contents
3133
1. [Overview](#overview)
3234
1. [Import specifications](#import-specifications)
3335
1. [Record collections](#record-collections)
36+
1. [Asynchronous](#asynchronous)
3437
1. [Transformers](#transformers)
3538
1. [Filtering](#filtering)
3639
1. [Durability](#durability)
@@ -48,22 +51,23 @@ Contents
4851
Benefits
4952
--------
5053

51-
* Formally defines a structured data import framework with the following concepts: [providers](#providers) represent one or more [resources](#resources) that fetch data from [connectors](#connectors).
52-
* Provides efficient in-memory data processing interfaces to handle large data sets one record at a time, via iterators, which can be implemented using generators.
53-
* Offers post-import [transformations](#transformers), such as [filtering](#filtering) and [mapping][MappingTransformer], to transform third-party into data useful for first-party applications.
54+
* Defines a formal structure for data import APIs: [providers](#providers) represent one or more [resources](#resources) that fetch data from [connectors](#connectors).
55+
* Provides efficient data processing interfaces to handle large data sets one record at a time, via iterators, which can be implemented using generators.
56+
* [Asynchronous](#asynchronous) imports offer highly efficient CPU-bound data processing for large scale imports across multiple connections concurrently.
5457
* Protects against intermittent network failures with [durability](#durability) features.
58+
* Offers post-import [transformations](#transformers), such as [filtering](#filtering) and [mapping][MappingTransformer], to transform third-party into data useful for first-party applications.
5559
* Supports PSR-6 [caching](#caching), at the connector level, for each fetch operation.
5660
* Joins two or more linked data sets together using [sub-imports][Sub-imports] automatically.
5761

5862
Quick start
5963
-----------
6064

61-
To get started quickly, try our [quick start guide][Quickstart]. For a more thorough introduction, continue reading this document.
65+
To get started quickly, try our [quick start guide][Quickstart]. For a more thorough introduction continue reading.
6266

6367
Understanding this manual
6468
-------------------------
6569

66-
The first half of this manual covers Porter's main features and how to use them. The second half covers architecture, interface and implementation details for Porter developers. There's an intermission inbetween so you'll know where the division is!
70+
The first half of this manual covers Porter's main API for consuming data services. The second half covers architecture, interface and implementation details for publishing data services. There's an intermission in-between so you'll know where the separation is!
6771

6872
Text marked as `inline code` denotes literal code, as it would appear in a PHP file. For example, `Porter` refers specifically to the class of the same name within this library, whereas *Porter* refers to this entire project as a whole.
6973

@@ -74,7 +78,7 @@ Usage
7478

7579
Create a `new Porter` instance—we'll usually only need one per application. Porter's constructor requires a [PSR-11][PSR-11] compatible `ContainerInterface` that acts as a repository of [providers](#providers).
7680

77-
When integrating Porter into a typical MVC framework application, we'll usually have a service locator or DI container implementing this interface already. We can simply inject the entire container into Porter. Although it's probably safer to create a separate container just for Porter's providers, it usually doesn't matter.
81+
When integrating Porter into a typical MVC framework application, we'll usually have a service locator or DI container implementing this interface already. We can simply inject the entire container into Porter, although it's best practice to create a separate container just for Porter's providers.
7882

7983
Without a framework, pick any [PSR-11 compatible library][PSR-11 search] and inject an instance of its container class. We could even write our own container since the interface is easy to implement, but using an existing library is beneficial, particularly since most support lazy-loading of services. If you're not sure which to use, [Joomla DI](https://github.com/joomla-framework/di) is fairly lightweight and straightforward.
8084

@@ -126,7 +130,7 @@ The following data flow diagram gives a high level overview of Porter's main int
126130

127131
</div>
128132

129-
Our application calls `Porter::import()` with an `ImportSpecification` and receives `PorterRecords` in return. Everything else happens internally so we don't need to worry about it unless writing custom providers, resources or connectors.
133+
Our application calls `Porter::import()` with an `ImportSpecification` and receives `PorterRecords` back. Everything else happens internally so we don't need to worry about it unless writing custom providers, resources or connectors.
130134

131135
Import specifications
132136
---------------------
@@ -150,7 +154,7 @@ Record collections are `Iterator`s, guaranteeing imported data is enumerable usi
150154

151155
### Details
152156

153-
Record collections may be `Countable`, depending on whether the imported data was countable and whether any destructive operations were performed after import. Filtering is a destructive operation since it may remove records and therefore the count reported by a `ProviderResource` would no longer be accurate. It is the responsibility of the resource to supply the number of records in its collection by returning an iterator that implements `Countable`, such as `ArrayIterator` or `CountableProviderRecords`. When a countable iterator is detected, Porter returns `CountablePorterRecords` provided no destructive operations were performed.
157+
Record collections may be `Countable`, depending on whether the imported data was countable and whether any destructive operations were performed after import. Filtering is a destructive operation since it may remove records and therefore the count reported by a `ProviderResource` would no longer be accurate. It is the responsibility of the resource to supply the total number of records in its collection by returning an iterator that implements `Countable`, such as `ArrayIterator`, or more commonly, `CountableProviderRecords`. When a countable iterator is used, Porter returns `CountablePorterRecords`, provided no destructive operations were performed.
154158

155159
Record collections are composed by Porter using the decorator pattern. If provider data is not modified, `PorterRecords` will decorate the `ProviderRecords` returned from a `ProviderResource`. That is, `PorterRecords` has a pointer back to the previous collection, which could be written as: `PorterRecords` → `ProviderRecords`. If a [filter](#filtering) was applied, the collection stack would be `PorterRecords` → `FilteredRecords` → `ProviderRecords`. Normally this is an unimportant detail but can sometimes be useful for debugging.
156160

@@ -162,6 +166,26 @@ Since record collections are just objects, it is possible to define derived type
162166

163167
The result of a successful `Porter::import` call is always an instance of `PorterRecords` or `CountablePorterRecords`, depending on whether the number of records is known. If we need to access methods of the original collection, returned by the provider, we can call `findFirstCollection()` on the collection. For an example, see [CurrencyRecords][CurrencyRecords] of the [European Central Bank Provider][ECB] and its associated [test case][ECB test].
164168

169+
Asynchronous
170+
------------
171+
172+
The new asynchronous API, introduced in version 5, is built on top of the fully programmable asynchronous framework, [Amp]. The synchronous API is not compatible with the asynchronous API so one must decide which to use. In general, the asynchronous API should be preferred for new projects because async can do everything sync can do, including emulating synchronous behaviour, but sync code cannot behave asynchronously without significant refactoring.
173+
174+
We must be inside the async event loop to begin programming asynchronously. Let's illustrate how to rewrite the [earlier example](#importing-data) asynchronously.
175+
176+
```php
177+
\Amp\Loop::run(function (): \Generator {
178+
    $records = $porter->importAsync(new AsyncImportSpecification(new DailyForexRates));
179+
180+
    while (yield $records->advance()) {
181+
        $record = $records->current();
182+
        // Insert breakpoint or var_dump($record) here to examine each record.
183+
    }
184+
});
185+
```
186+
187+
Programming asynchronously requires an understanding of Amp, the async framework. Further details can be found in the official Amp documentation.
188+
165189
Transformers
166190
------------
167191

@@ -527,7 +551,7 @@ Porter is published under the open source GNU Lesser General Public License v3.0
527551
[Mapper]: https://github.com/ScriptFUSION/Mapper
528552
[PSR-6]: https://www.php-fig.org/psr/psr-6
529553
[PSR-11]: https://www.php-fig.org/psr/psr-11
530-
[PSR-11 search]: https://packagist.org/explore/?dFR[tags][0]=psr-11&hFR[type][0]=library
554+
[PSR-11 search]: https://packagist.org/explore/?type=library&tags=psr-11
531555
[Porter icon]: https://avatars3.githubusercontent.com/u/16755913?v=3&s=35 "Porter providers"
532556
[Porter transformers icon]: https://avatars2.githubusercontent.com/u/24607042?v=3&s=35 "Porter transformers"
533557
[Porter connectors icon]: https://avatars3.githubusercontent.com/u/25672142?v=3&s=35 "Porter connectors"

composer.json

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,6 @@
11
{
22
    "name": "scriptfusion/porter",
3-
    "description": "Scalable and durable data import abstraction for APIs.",
3+
    "description": "Scalable and durable data import for publishing and consuming APIs.",
44
    "authors": [
55
        {
66
            "name": "Bilge",
@@ -25,7 +25,8 @@
2525
    },
2626
    "suggest" : {
2727
        "connectors/http": "Provides an HTTP connector for Porter providers.",
28-
        "transformers/mapping-transformer": "Transforms records using Mappings."
28+
        "transformers/mapping-transformer": "Transforms records using Mappings.",
29+
        "async/throttle": "Limits throughput of asynchronous imports."
2930
    },
3031
    "autoload": {
3132
        "psr-4": {
