Rethink the whole way we interact with data: Session, CheckedSession, FileHandler, LazySession (#727), open_excel, ... See also the refactoring in #761 and #614.
Dataset API:
- `__init__(connect_string, max_memory=None, **kwargs)` -- filepath or connection string; kwargs are passed to the underlying Dataset implementation (compression option, Excel option, ...). If max_memory is not None, the Dataset will transparently flush some of its content (probably based on LRU) to "disk" when more memory is needed.
- `open(**kwargs)` -- open/connect to the underlying storage. Kwargs here override those passed in `__init__`. Normally called via `__enter__`.
- `__enter__` and `__exit__` (to be usable as a context manager)
- `read(key=None)` -- read a single key, multiple keys (when key is a list), or everything (if key is None) and return the values. Unsure this explicit method makes sense. Maybe `__getitem__`, with an optional `load()`, is enough.
- `load(key=None)` -- load a single key, multiple keys (when key is a list), or everything (if key is None) and return nothing.
- `open_key(key=None)` -- in the future, for returning a lazy object which will load data when actually accessed. Can potentially load only part of that key (array/...). This needs further thought.
- `__getattr__` -> forwards to `__getitem__`
- `__getitem__(key)` -> equivalent to `load(key)` if not loaded yet, then returns the array (or use `open_key(key)` instead???)
- `__setitem__(key, value)` -> add a new value or change an existing one.
- `close()` -- close file/connection to underlying storage. Normally called via `__exit__`.
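To make the proposal easier to discuss, here is a rough sketch of how the methods listed above could fit together. It is only an illustration under several assumptions: the storage is a plain JSON file (chosen so the example has no dependencies), `max_memory` is accepted but flushing is not implemented, `open_key` is left out because it needs further thought, and writing modified values back in `close()` is a guess that the proposal does not specify.

```python
import json
import os
import tempfile
from pathlib import Path


class Dataset:
    """Sketch of the proposed API, backed by a JSON file (illustrative assumption)."""

    def __init__(self, connect_string, max_memory=None, **kwargs):
        self.connect_string = connect_string
        self.max_memory = max_memory  # stored only; LRU flushing is not sketched
        self.kwargs = kwargs
        self._storage = None          # raw content of the underlying storage
        self._loaded = {}             # values currently held in memory
        self._dirty = False

    def open(self, **kwargs):
        # kwargs given here override those passed to __init__
        self.options = {**self.kwargs, **kwargs}
        self._storage = json.loads(Path(self.connect_string).read_text())
        return self

    def close(self):
        # Assumption: modified values are written back when closing.
        if self._dirty:
            merged = {**self._storage, **self._loaded}
            Path(self.connect_string).write_text(json.dumps(merged))
        self._storage = None

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def _keys(self, key):
        # None means everything, a list means several keys, else a single key
        if key is None:
            return list({**self._storage, **self._loaded})
        return key if isinstance(key, list) else [key]

    def load(self, key=None):
        for k in self._keys(key):
            if k not in self._loaded:
                self._loaded[k] = self._storage[k]

    def read(self, key=None):
        self.load(key)
        if key is None or isinstance(key, list):
            return {k: self._loaded[k] for k in self._keys(key)}
        return self._loaded[key]

    def __getitem__(self, key):
        self.load(key)  # no-op if the key is already loaded
        return self._loaded[key]

    def __setitem__(self, key, value):
        self._loaded[key] = value
        self._dirty = True

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


# Usage: create a small file, then access it through the context manager.
path = os.path.join(tempfile.mkdtemp(), "demo.json")
Path(path).write_text(json.dumps({"pop": [1, 2, 3], "births": [4, 5]}))

with Dataset(path) as ds:
    pop = ds["pop"]        # loads "pop" on first access
    births = ds.births     # attribute access forwards to __getitem__
    ds["deaths"] = [0, 1]  # add a new value
    everything = ds.read() # read all keys
```

In this sketch `read()` is just `load()` plus returning the values, which supports the doubt expressed above about whether an explicit `read` method is needed.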
Misc thoughts:
- I think excel.Workbook should be a subclass of Dataset
- We could/should also implement a generic "read" top-level function which would open a dataset, read the array and close it, to replace/complement the read_* functions.
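As an illustration of the generic top-level "read" idea, the following sketch opens a dataset, reads one key and closes it again. `JsonDataset` is a hypothetical stand-in used only to keep the example self-contained; the name, the JSON storage and the `read(connect_string, key, **kwargs)` signature are all assumptions, not part of the proposal.

```python
import json
import os
import tempfile
from pathlib import Path


class JsonDataset:
    """Hypothetical minimal dataset: a context manager with item access."""

    def __init__(self, connect_string, **kwargs):
        self.connect_string = connect_string
        self.kwargs = kwargs
        self._data = None

    def __enter__(self):
        self._data = json.loads(Path(self.connect_string).read_text())
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._data = None

    def __getitem__(self, key):
        return self._data[key]


def read(connect_string, key, **kwargs):
    """Open a dataset, read a single key and close the dataset again."""
    with JsonDataset(connect_string, **kwargs) as ds:
        return ds[key]


# Usage: one call instead of an explicit open / read / close sequence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
Path(demo_path).write_text(json.dumps({"pop": [1, 2, 3]}))
result = read(demo_path, "pop")
```

A real version would presumably pick the Dataset implementation from the connection string (file extension or scheme) instead of hard-coding one class.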