|
1 | | -# Work in progress |
| 1 | +# Custom Types |
| 2 | + |
| 3 | +In modern scientific research, data pipelines often involve complex workflows that |
| 4 | +generate diverse data types. From high-dimensional imaging data to machine learning |
| 5 | +models, these data types frequently exceed the basic representations supported by |
| 6 | +traditional relational databases. For example: |
| 7 | + |
| 8 | ++ A lab working on neural connectivity might use graph objects to represent brain |
| 9 | + networks. |
| 10 | ++ Researchers processing raw imaging data might store custom objects for pre-processing |
| 11 | + configurations. |
| 12 | ++ Computational biologists might store fitted machine learning models or parameter |
| 13 | + objects for downstream predictions. |
| 14 | + |
| 15 | +To handle these diverse needs, DataJoint provides the `dj.AttributeAdapter` class. |
| 16 | +Subclassing it lets researchers store and retrieve complex, non-standard data types, |
| 17 | +such as Python objects or data structures, in a relational database while maintaining |
| 18 | +the reproducibility, modularity, and query capabilities required for scientific workflows. |
| 19 | + |
| 20 | +## Uses in Scientific Research |
| 21 | + |
| 22 | +Imagine a neuroscience lab studying neural connectivity. Researchers might generate |
| 23 | +graphs (e.g., `networkx.Graph`) to represent connections between brain regions, where: |
| 24 | + |
| 25 | ++ Nodes are brain regions. |
| 26 | ++ Edges represent connections weighted by signal strength or another metric. |
| 27 | + |
| 28 | +Storing these graph objects in a database alongside other experimental data (e.g., |
| 29 | +subject metadata, imaging parameters) ensures: |
| 30 | + |
| 31 | +1. Centralized Data Management: All experimental data and analysis results are stored |
| 32 | + together for easy access and querying. |
| 33 | +2. Reproducibility: The exact graph objects used in analysis can be retrieved later for |
| 34 | + validation or further exploration. |
| 35 | +3. Scalability: Graph data can be integrated into workflows for larger datasets or |
| 36 | + across experiments. |
| 37 | + |
| 38 | +However, relational databases do not natively support graphs. This is where |
| 39 | +`dj.AttributeAdapter` becomes essential: it lets researchers define custom logic for |
| 40 | +serializing graphs (e.g., as edge lists) and deserializing them back into Python |
| 41 | +objects, bridging the gap between advanced data types and the database. |
| 42 | + |
| 43 | +### Example: Storing Graphs in DataJoint |
| 44 | + |
| 45 | +To store a `networkx.Graph` object in a DataJoint table, define an adapter class, |
| 46 | +instantiate it, and reference the instance by name in angle brackets in the table definition: |
| 47 | + |
| 48 | +```python |
| 49 | +import datajoint as dj |
| 50 | +import networkx as nx |
| 51 | + |
| 52 | + |
| 53 | +class GraphAdapter(dj.AttributeAdapter): |
| 54 | + |
| 55 | +    attribute_type = 'longblob'  # this is how the attribute will be declared |
| 56 | + |
| 57 | +    def put(self, obj): |
| 58 | +        # convert the nx.Graph object into an edge list |
| 59 | +        # (edge attributes and isolated nodes are not preserved) |
| 60 | +        assert isinstance(obj, nx.Graph) |
| 61 | +        return list(obj.edges) |
| 62 | + |
| 63 | +    def get(self, value): |
| 64 | +        # convert edge list back into an nx.Graph |
| 65 | +        return nx.Graph(value) |
| 66 | + |
| 67 | + |
| 68 | +# instantiate for use as a datajoint type |
| 69 | +graph = GraphAdapter() |
| 70 | + |
| 71 | + |
| 72 | +# define a table with a graph attribute |
| 73 | +schema = dj.schema('test_graphs') |
| 74 | + |
| 75 | + |
| 76 | +@schema |
| 77 | +class Connectivity(dj.Manual): |
| 78 | +    definition = """ |
| 79 | +    conn_id : int |
| 80 | +    --- |
| 81 | +    conn_graph = null : <graph>  # a networkx.Graph object |
| 82 | +    """ |
| 83 | +``` |
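
The edge-list conversion in `put` keeps only node pairs, so connection weights and regions without any edges would not survive a round trip. The sketch below illustrates one way the serialize/deserialize pair could retain both. It is a minimal illustration under stated assumptions, not DataJoint's or networkx's API: the functions `graph_to_record` and `record_to_graph`, the `{node: {neighbor: weight}}` graph layout, and the sample region names are all hypothetical, chosen so the code runs without a database connection or networkx installed.

```python
# Hypothetical stand-ins for the bodies of an adapter's put() and get().
# A graph is modeled as {node: {neighbor: weight}}; this layout is an
# assumption made for illustration, not a networkx or DataJoint structure.


def graph_to_record(adjacency):
    """Serialize an undirected weighted graph into plain lists."""
    nodes = sorted(adjacency)  # assumes node labels are mutually sortable
    edges = []
    for u in nodes:
        for v, weight in adjacency[u].items():
            if u <= v:  # store each undirected edge only once
                edges.append((u, v, weight))
    return {"nodes": nodes, "edges": sorted(edges)}


def record_to_graph(record):
    """Rebuild the adjacency mapping from the stored record."""
    adjacency = {node: {} for node in record["nodes"]}
    for u, v, weight in record["edges"]:
        adjacency[u][v] = weight
        adjacency[v][u] = weight
    return adjacency


brain = {
    "V1": {"V2": 0.8},
    "V2": {"V1": 0.8, "MT": 0.5},
    "MT": {"V2": 0.5},
    "A1": {},  # isolated region with no recorded connections
}

record = graph_to_record(brain)
restored = record_to_graph(record)
print(restored == brain)
```

Storing the node list separately from the edge list is what lets isolated regions survive, and carrying the weight in each edge tuple keeps the signal-strength information. In a real adapter the same idea would be applied to the `networkx.Graph` object inside `put` and `get`.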