Commit 09b4e1d

Add stored array documentation
1 parent 49feebd commit 09b4e1d

2 files changed: +110 -0 lines changed

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ This software is licensed under the BSD-3-Clause license. See the LICENSE file f
    basic_usage
    xhighfive
+   stored_arrays

 .. toctree::
    :caption: API REFERENCE

docs/source/stored_arrays.rst

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
.. Copyright (c) 2016, Wolf Vollprecht, Johan Mabille and Sylvain Corlay

   Distributed under the terms of the BSD 3-Clause License.

   The full license is in the file LICENSE, distributed with this software.

Stored Arrays
=============

Arrays can be stored on a file system using ``xfile_array``, enabling
persistence of data. This type of array is a file-backed cached ``xarray``,
meaning that you can use it as a normal array, and it will be flushed to the
file when it is destroyed or when ``flush()`` is explicitly called (provided
that its content has changed). Various file systems can be used, e.g. the local
file system or Google Cloud Storage, and data can be stored in various formats,
e.g. GZip or Blosc.

File Mode
---------

A file array can be created using one of the following three file modes:

- ``load``: the array is loaded from the file, meaning that the file must
  already exist, otherwise an exception is thrown.
- ``init``: the array will initialize the file, meaning that its content will
  be flushed regardless of any pre-existing file.
- ``init_on_fail``: the array is loaded from the file if it exists, otherwise
  the array will initialize the file. An initialization value can be used to
  fill the array.

The default mode is ``load``.
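
The decision each mode makes can be sketched with the standard library alone.
The following is not the ``xfile_array`` implementation: ``file_mode`` and
``should_load`` are hypothetical names used only for illustration, and the
sketch models nothing more than whether existing file content is read or
ignored.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the three modes described above.
enum class file_mode { load, init, init_on_fail };

// Returns true if existing file content should be read,
// false if the array should start fresh and overwrite the file on flush.
bool should_load(const std::string& path, file_mode mode)
{
    // A file is considered to exist if it can be opened for reading.
    const bool exists = std::ifstream(path).good();
    switch (mode)
    {
        case file_mode::load:
            // "load" requires the file to be there.
            if (!exists)
            {
                throw std::runtime_error("file not found: " + path);
            }
            return true;
        case file_mode::init:
            // "init" ignores any pre-existing file.
            return false;
        case file_mode::init_on_fail:
            // "init_on_fail" loads when possible, initializes otherwise.
            return exists;
    }
    return false;  // not reached: all enumerators are handled above
}
```

With a missing file, ``load`` throws, while ``init`` and ``init_on_fail`` both
report that the array starts from fresh content.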

Example: on-disk file array
---------------------------

.. code:: cpp

    #include <cassert>
    #include <cstddef>
    #include <vector>

    #include <xtensor-io/xfile_array.hpp>
    #include <xtensor-io/xio_binary.hpp>
    #include <xtensor-io/xio_disk_handler.hpp>

    int main()
    {
        // an on-disk file array stored in binary format
        using file_array = xt::xfile_array<double, xt::xio_disk_handler<xt::xio_binary_config>>;
        // since the file doesn't already exist, we use the "init" file mode
        file_array a1("a1.bin", xt::xfile_mode::init);

        std::vector<std::size_t> shape = {2, 2};
        a1.resize(shape);

        a1(0, 1) = 1.;
        // the in-memory value is changed, but not the on-disk file yet.
        // the on-disk file will change when the array is explicitly flushed,
        // or when it is destroyed (e.g. when going out of scope)

        a1.flush();
        // now the on-disk file has changed

        // a2 points to a1's file, we use the "load" file mode
        file_array a2("a1.bin", xt::xfile_mode::load);
        // the binary format doesn't store the shape
        a2.resize(shape);

        // a1 and a2 are equal
        assert(xt::all(xt::equal(a1, a2)));

        return 0;
    }
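
The reason ``a2`` must be resized by hand can be illustrated with a
standard-library sketch. It assumes that a shape-less binary format amounts to
a plain dump of the element bytes; the actual layout written by
``xio_binary_config`` is not described here, and ``write_raw`` and ``read_raw``
are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write the elements as raw bytes: no header and no shape information.
void write_raw(const std::string& path, const std::vector<double>& data)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(double)));
}

// Read the elements back. The file cannot tell us the shape,
// so the caller has to supply it.
std::vector<double> read_raw(const std::string& path,
                             const std::vector<std::size_t>& shape)
{
    std::size_t count = 1;
    for (std::size_t extent : shape)
    {
        count *= extent;
    }
    std::vector<double> data(count);
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    // The file must contain at least as many bytes as the shape implies.
    assert(static_cast<std::size_t>(in.gcount()) == count * sizeof(double));
    return data;
}
```

Writing four doubles and reading them back with the shape ``{2, 2}`` returns
the same four values; the file alone would not say whether they form a 2x2, a
4x1 or a flat array.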

Stored Chunked Arrays
---------------------

Like a regular array, a chunked array can be stored on a file system. Under
the hood, it uses ``xfile_array`` to store the chunks. Rather than keeping one
file array per chunk (which could have a huge memory footprint), only a
limited number of file arrays are held at the same time, in a chunk pool.
The container responsible for managing the chunk pool (i.e. mapping logical
chunks in the array to physical chunks in the pool) is the
``xchunk_store_manager``, but you should not use it directly. Instead, we
provide factory functions to create a stored chunked array, as shown below:

.. code-block:: cpp

    #include <cstddef>
    #include <string>
    #include <vector>

    #include "xtensor-io/xchunk_store_manager.hpp"
    #include "xtensor-io/xio_binary.hpp"
    #include "xtensor-io/xio_disk_handler.hpp"

    int main()
    {
        namespace fs = ghc::filesystem;

        std::vector<std::size_t> shape = {4, 4};
        std::vector<std::size_t> chunk_shape = {2, 2};
        std::string chunk_dir = "chunks1";
        fs::create_directory(chunk_dir);
        double init_value = 5.5;
        std::size_t pool_size = 2;  // a maximum of 2 chunks will be held in memory

        auto a1 = xt::chunked_file_array<double, xt::xio_disk_handler<xt::xio_binary_config>>(shape, chunk_shape, chunk_dir, init_value, pool_size);

        a1(2, 1) = 1.2;  // this assigns to chunk (1, 0) in memory
        a1(1, 2) = 3.4;  // this assigns to chunk (0, 1) in memory
        a1(0, 0) = 5.6;  // because the pool is full, this saves chunk (1, 0) to disk
                         // and assigns to chunk (0, 0) in memory
        // when a1 is destroyed, all the modified chunks are saved to disk
        // this can be forced with a1.chunks().flush()
    }
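
The bookkeeping described in the comments above can be modelled with a small
standard-library sketch. It is not ``xchunk_store_manager``: ``chunk_of`` and
``chunk_pool`` are hypothetical names, and the oldest-first eviction policy is
an assumption chosen because it reproduces the sequence in the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

using chunk_index = std::array<std::size_t, 2>;

// Map a 2-D element index to the index of the chunk that contains it.
chunk_index chunk_of(std::size_t i, std::size_t j, const chunk_index& chunk_shape)
{
    return {i / chunk_shape[0], j / chunk_shape[1]};
}

// Toy pool that keeps at most pool_size chunks resident in memory.
// Assumption: when the pool is full, the chunk loaded first is evicted.
class chunk_pool
{
public:

    explicit chunk_pool(std::size_t pool_size)
        : m_pool_size(pool_size)
    {
        assert(pool_size > 0);
    }

    // Make chunk c resident. Returns true if another chunk had to be
    // evicted (i.e. saved to disk), in which case it is written to evicted.
    bool touch(const chunk_index& c, chunk_index& evicted)
    {
        for (const auto& resident : m_resident)
        {
            if (resident == c)
            {
                return false;  // already in memory, nothing to do
            }
        }
        bool did_evict = false;
        if (m_resident.size() == m_pool_size)
        {
            evicted = m_resident.front();  // oldest resident chunk
            m_resident.pop_front();
            did_evict = true;
        }
        m_resident.push_back(c);
        return did_evict;
    }

private:

    std::size_t m_pool_size;
    std::deque<chunk_index> m_resident;
};
```

For a chunk shape of ``{2, 2}`` and a pool of size 2, touching the chunks of
elements ``(2, 1)``, ``(1, 2)`` and ``(0, 0)`` in that order evicts chunk
``(1, 0)`` on the third access, as in the example.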
