tree/dataframe/src/RDataFrame.cxx
+25 -4 (25 additions & 4 deletions)
@@ -1265,10 +1265,28 @@ In that case, RDataFrame will snapshot the filtered columns in a memory-efficien
1265
1265
default-constructed object in case of classes. If none of the filters pass like in row 6, the entire event is omitted from the snapshot.
1266
1266
1267
1267
To tell apart a genuine `0` (like `x` in row 0) from a variation that didn't pass the selection, RDataFrame writes a bitmask for each event, indicating which variations
1268
-
are valid (see last column). A mapping of column names to this bitmask is placed in the same file as the output dataset, and automatically loaded when
1269
-
RDataFrame opens a file that was snapshot with variations.
1270
-
Attempting to read such missing values with RDataFrame will produce an error, but RDataFrame can either skip these values or fill in defaults as
1271
-
described in the \ref missing-values "section on dealing with missing values".
1268
+
are valid (see last column). The bitmask is implemented as a 64-bit `std::bitset` in memory, written to the output
1269
+
dataset as a `std::uint64_t`. For every 64 columns, a new bitmask column is added to the output dataset.
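To make the packing in the added lines above concrete, here is a minimal sketch, not the RDataFrame implementation. It converts a 64-bit `std::bitset` into the integer that would be stored, and computes which bitmask column and bit a column would use if bits were handed out in order, 64 per bitmask column. The helper names `maskSlotFor` and `toStoredMask` are hypothetical, and the in-order assignment is an assumption that the patch does not state.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns (index of the bitmask column, index of the bit)
// for the column with running number `columnIndex`.
// ASSUMPTION: bits are assigned in order, 64 per bitmask column.
std::pair<std::size_t, unsigned int> maskSlotFor(std::size_t columnIndex)
{
   return {columnIndex / 64, static_cast<unsigned int>(columnIndex % 64)};
}

// Hypothetical helper: converts the in-memory 64-bit bitset into the
// unsigned 64-bit integer that is written to the output dataset.
std::uint64_t toStoredMask(const std::bitset<64> &mask)
{
   return static_cast<std::uint64_t>(mask.to_ullong());
}
```

Under this assumption, columns 0 to 63 share the first bitmask column and column 64 starts a second one.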
1270
+
1271
+
Each column that might contain invalid values is connected to exactly one bit in one bitmask. A mapping of column names
1272
+
to the corresponding bitmask is placed in the same file as the output dataset, with a name that follows the pattern
1273
+
`"R_rdf_branchToBitmaskMapping_<NAME_OF_THE_DATASET>"`. It is of type
1274
+
`std::unordered_map<std::string, std::pair<std::string, unsigned int>>`, and maps a column name to the name of the
1275
+
bitmask column and the index of the relevant bit. For example, in the same file as the dataset "Events" there would be
1276
+
an object named `R_rdf_branchToBitmaskMapping_Events`. This object could, for instance, describe a connection such as:
1277
+
1278
+
~~~
1279
+
muon_pt --> (R_rdf_mask_Events_0, 42)
1280
+
~~~
1281
+
1282
+
which means that the validity of the entries in `muon_pt` is established by the bit `42` in the bitmask found in the
1283
+
column `R_rdf_mask_Events_0`.
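The lookup described above could be sketched as follows. This is an illustration only: RDataFrame performs this check internally, and the helper `isValid` and the `maskValues` container are hypothetical. The sketch assumes that a set bit means "valid" and that a column without an entry in the mapping can never be invalid.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Same type as the mapping object stored next to the dataset.
using BranchToBitmaskMapping =
   std::unordered_map<std::string, std::pair<std::string, unsigned int>>;

// Hypothetical helper: decides whether `column` holds a valid value in one event.
// `maskValues` holds, for that event, the value of every bitmask column,
// keyed by the name of the bitmask column.
bool isValid(const BranchToBitmaskMapping &mapping,
             const std::unordered_map<std::string, std::uint64_t> &maskValues,
             const std::string &column)
{
   const auto entry = mapping.find(column);
   if (entry == mapping.end())
      return true; // ASSUMPTION: unmapped columns are always valid

   const std::string &maskName = entry->second.first;
   const unsigned int bit = entry->second.second;

   // at() throws std::out_of_range if the bitmask column is absent.
   const std::uint64_t mask = maskValues.at(maskName);

   // ASSUMPTION: a set bit marks the value as valid.
   return ((mask >> bit) & std::uint64_t{1}) != 0;
}
```

With the mapping `muon_pt --> (R_rdf_mask_Events_0, 42)`, `isValid` inspects bit 42 of the value read from `R_rdf_mask_Events_0` for the current event.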
1284
+
1285
+
When RDataFrame opens a file, it checks for the existence of this mapping between columns and bitmasks, and loads it automatically if found. As such,
1286
+
RDataFrame makes the handling of these bitmasks completely transparent to the user.
1287
+
1288
+
If the corresponding bit marks a value as invalid, reading it yields a missing value. Such missing values follow the
1289
+
rules described in the \ref missing-values "section on dealing with missing values" and can be dealt with accordingly.
1272
1290
1273
1291
\note Snapshot with variations is currently restricted to single-threaded TTree snapshots.
1274
1292
@@ -1780,6 +1798,9 @@ more of its entries. For example:
1780
1798
- When joining different datasets horizontally according to some index value
1781
1799
(e.g. the event number), if the index does not find a match in one or more
1782
1800
other datasets for a certain entry.
1801
+
- If, for a certain event, a column is invalid because it results from a Snapshot
1802
+
with systematic variations, and that variation didn't pass its filters. For
1803
+
more details, see \ref snapshot-with-variations.
1783
1804
1784
1805
For example, suppose that column "y" does not have a value for entry 42: