Commit dee7214

authored
Merge branch 'develop' into SYNPY-1673
2 parents af737b1 + cbf384b commit dee7214

29 files changed

+7586
-21
lines changed

Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
1+
# How to Create Metadata Curation Workflows
2+
3+
This guide shows you how to set up metadata curation workflows in Synapse using the curator extension. You'll learn to find appropriate schemas and create curation tasks for your research data.
4+
5+
## What you'll accomplish
6+
7+
By following this guide, you will:
8+
9+
- Find and select the right JSON schema for your data type
10+
- Create a metadata curation workflow with automatic validation
11+
- Set up either file-based or record-based metadata collection
12+
- Configure curation tasks that guide collaborators through metadata entry
13+
14+
## Prerequisites
15+
16+
- A Synapse account with project creation permissions
17+
- A Python environment with `synapseclient` and the `curator` extension installed (for example, `pip install --upgrade "synapseclient[curator]"`)
18+
- An existing Synapse project and folder where you want to manage metadata
19+
- A JSON Schema registered in Synapse (many schemas are already available for Sage-affiliated projects, or you can register your own by following the [JSON Schema tutorial](../../../tutorials/python/json_schema.md))
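
Before running the steps below, it can help to confirm that the extension is importable. The following is a minimal sketch using only the standard library; the helper name `is_module_available` is hypothetical and not part of `synapseclient`.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current Python path."""
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


# Module path taken from the import statement used later in this guide.
curator_ready = is_module_available("synapseclient.extensions.curator")
if not curator_ready:
    print('Curator extension not found; try: pip install --upgrade "synapseclient[curator]"')
```

If the check fails, reinstalling with the `curator` extra shown in the prerequisites is the first thing to try.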
20+
21+
## Step 1: Authenticate and import required functions
22+
23+
```python
24+
from synapseclient.extensions.curator import (
25+
create_record_based_metadata_task,
26+
create_file_based_metadata_task,
27+
query_schema_registry
28+
)
29+
from synapseclient import Synapse
30+
31+
syn = Synapse()
32+
syn.login()
33+
```
34+
35+
## Step 2: Find the right schema for your data
36+
37+
Before creating a curation task, identify which JSON schema matches your data type. Many schemas are already registered in Synapse for Sage-affiliated projects. The schema registry contains validated schemas organized by data coordination center (DCC) and data type.
38+
39+
**If you need to register your own schema**, follow the [JSON Schema tutorial](../../../tutorials/python/json_schema.md) to understand the registration process.
40+
41+
```python
42+
# Find the latest schema for your specific data type
43+
schema_uri = query_schema_registry(
44+
synapse_client=syn,
45+
dcc="ad", # Your data coordination center, check out the `syn69735275` table if you do not know your code
46+
datatype="IndividualAnimalMetadataTemplate" # Your specific data type
47+
)
48+
49+
print("Latest schema URI:", schema_uri)
50+
```
51+
52+
**When to use this approach:** You know your DCC and data type, you want the most current schema version, and the schema has already been registered in <https://www.synapse.org/Synapse:syn69735275/tables/>.
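
This guide does not state what `query_schema_registry` returns when nothing matches. A defensive sketch, assuming the latest-only lookup yields either a non-empty string or something falsy such as `None`, is to check the result before passing it on. The helper `require_schema_uri` and the placeholder URI are hypothetical.

```python
from typing import Optional


def require_schema_uri(result: Optional[str], dcc: str, datatype: str) -> str:
    """Return the URI unchanged, or raise if the lookup produced nothing usable."""
    if not isinstance(result, str) or not result.strip():
        raise ValueError(
            f"No schema found for dcc={dcc!r}, datatype={datatype!r}; "
            "check the schema registry table."
        )
    return result


# Placeholder value standing in for the result of query_schema_registry(...).
looked_up = "example-schema-uri"
checked_uri = require_schema_uri(looked_up, "ad", "IndividualAnimalMetadataTemplate")
print("Using schema:", checked_uri)
```

Failing early here gives a clearer error than letting an empty value reach the task-creation calls in Step 3.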
53+
54+
**Alternative - browse available schemas:**
55+
```python
56+
# Get all versions to see what's available
57+
all_schemas = query_schema_registry(
58+
synapse_client=syn,
59+
dcc="ad",
60+
datatype="IndividualAnimalMetadataTemplate",
61+
return_latest_only=False
62+
)
63+
```
64+
65+
## Step 3: Choose your metadata workflow type
66+
67+
### Option A: Record-based metadata
68+
69+
Use this when metadata is normalized in structured records to eliminate duplication and ensure consistency.
70+
71+
```python
72+
record_set, curation_task, data_grid = create_record_based_metadata_task(
73+
synapse_client=syn,
74+
project_id="syn123456789", # Your project ID
75+
folder_id="syn987654321", # Folder where files are stored
76+
record_set_name="AnimalMetadata_Records",
77+
record_set_description="Centralized metadata for animal study data",
78+
curation_task_name="AnimalMetadata_Curation", # Must be unique within the project
79+
upsert_keys=["StudyKey"], # Fields that uniquely identify records
80+
instructions="Complete all required fields according to the schema. Use StudyKey to link records to your data files.",
81+
schema_uri=schema_uri, # Schema found in Step 2
82+
bind_schema_to_record_set=True
83+
)
84+
85+
print(f"Created RecordSet: {record_set.id}")
86+
print(f"Created CurationTask: {curation_task.task_id}")
87+
```
88+
89+
**What this creates:**
90+
91+
- A RecordSet where metadata is stored as structured records (like a spreadsheet)
92+
- A CurationTask that guides users through completing the metadata
93+
- Automatic schema binding for validation
94+
- A data grid interface for easy metadata entry
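
The `upsert_keys` argument names the fields that uniquely identify a record. As a conceptual illustration only (this is not how Synapse implements it), an upsert on a key updates the matching record if one exists and appends a new record otherwise. The fields `species` and `sex` below are hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def upsert(records: List[Record], incoming: Record, keys: Sequence[str]) -> List[Record]:
    """Merge `incoming` into the record with matching key fields, or append it."""
    def key_of(record: Record) -> tuple:
        return tuple(record.get(k) for k in keys)

    target = key_of(incoming)
    for i, existing in enumerate(records):
        if key_of(existing) == target:
            # Same key: incoming values overwrite or extend the existing record.
            merged = {**existing, **incoming}
            return records[:i] + [merged] + records[i + 1:]
    # No match: the incoming record is new.
    return records + [incoming]


records = [{"StudyKey": "S1", "species": "mouse"}]
updated = upsert(records, {"StudyKey": "S1", "sex": "female"}, ["StudyKey"])
appended = upsert(updated, {"StudyKey": "S2", "species": "rat"}, ["StudyKey"])
print(appended)
```

Choosing a key that is truly unique per record (here `StudyKey`) is what keeps repeated submissions from creating duplicates.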
95+
96+
### Option B: File-based metadata (for unique per-file metadata)
97+
98+
Use this when metadata describes individual data files and is stored as annotations directly on each file.
99+
100+
```python
101+
entity_view_id, task_id = create_file_based_metadata_task(
102+
synapse_client=syn,
103+
folder_id="syn987654321", # Folder containing your data files
104+
curation_task_name="FileMetadata_Curation", # Must be unique within the project
105+
instructions="Annotate each file with metadata according to the schema requirements.",
106+
attach_wiki=True, # Creates a wiki in the folder with the entity view
107+
entity_view_name="Animal Study Files View",
108+
schema_uri=schema_uri # Schema found in Step 2
109+
)
110+
111+
print(f"Created EntityView: {entity_view_id}")
112+
print(f"Created CurationTask: {task_id}")
113+
```
114+
115+
**What this creates:**
116+
117+
- An EntityView that displays all files in the folder
118+
- A CurationTask for guided metadata entry
119+
- Automatic schema binding to the folder for validation
120+
- Optional wiki attached to the folder
121+
122+
## Complete example script
123+
124+
Here's the full script that demonstrates both workflow types:
125+
126+
```python
127+
from pprint import pprint
128+
from synapseclient.extensions.curator import (
129+
create_record_based_metadata_task,
130+
create_file_based_metadata_task,
131+
query_schema_registry
132+
)
133+
from synapseclient import Synapse
134+
135+
# Step 1: Authenticate
136+
syn = Synapse()
137+
syn.login()
138+
139+
# Step 2: Find schema
140+
schema_uri = query_schema_registry(
141+
synapse_client=syn,
142+
dcc="ad",
143+
datatype="IndividualAnimalMetadataTemplate"
144+
)
145+
print("Using schema:", schema_uri)
146+
147+
# Step 3A: Create record-based workflow
148+
record_set, curation_task, data_grid = create_record_based_metadata_task(
149+
synapse_client=syn,
150+
project_id="syn123456789",
151+
folder_id="syn987654321",
152+
record_set_name="AnimalMetadata_Records",
153+
record_set_description="Centralized animal study metadata",
154+
curation_task_name="AnimalMetadata_Curation",
155+
upsert_keys=["StudyKey"],
156+
instructions="Complete metadata for all study animals using StudyKey to link records to data files.",
157+
schema_uri=schema_uri,
158+
bind_schema_to_record_set=True
159+
)
160+
161+
print("Record-based workflow created:")
162+
print(f" RecordSet: {record_set.id}")
163+
print(f" CurationTask: {curation_task.task_id}")
164+
165+
# Step 3B: Create file-based workflow
166+
entity_view_id, task_id = create_file_based_metadata_task(
167+
synapse_client=syn,
168+
folder_id="syn987654321",
169+
curation_task_name="FileMetadata_Curation",
170+
instructions="Annotate each file with complete metadata according to schema.",
171+
attach_wiki=True,
172+
entity_view_name="Animal Study Files View",
173+
schema_uri=schema_uri
174+
)
175+
176+
print("File-based workflow created:")
177+
print(f" EntityView: {entity_view_id}")
178+
print(f" CurationTask: {task_id}")
179+
```
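
Because each `curation_task_name` must be unique within the project, reusing the literal names above while experimenting may cause clashes. This guide does not say how Synapse responds to a duplicate name, so the following is only one possible workaround: a hypothetical helper that appends a UTC timestamp to a base name.

```python
from datetime import datetime, timezone
from typing import Optional


def unique_task_name(base: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp to `base` so repeated runs get distinct names."""
    moment = now if now is not None else datetime.now(timezone.utc)
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base}_{stamp}"


# Fixed time used here so the result is reproducible.
example = unique_task_name(
    "AnimalMetadata_Curation",
    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(example)
```

The generated string can then be passed as `curation_task_name` in either workflow.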
180+
181+
## Additional utilities
182+
183+
### Validate schema binding on folders
184+
185+
Use this script to verify the schema on a folder against the items contained within that folder:
186+
187+
```python
188+
from synapseclient import Synapse
189+
from synapseclient.models import Folder
190+
191+
# The Synapse ID of the Folder whose schema binding you want to validate.
192+
FOLDER_ID = ""
193+
194+
syn = Synapse()
195+
syn.login()
196+
197+
folder = Folder(id=FOLDER_ID).get()
198+
schema_validation = folder.validate_schema()
199+
200+
print(f"Schema validation result for folder {FOLDER_ID}: {schema_validation}")
201+
```
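
The utility scripts in this section ship with empty ID placeholders. A small guard can catch an unfilled value before any request is made. This sketch assumes IDs follow the `syn` plus digits form used in this guide's examples; the helper `check_synapse_id` is hypothetical.

```python
import re

# Assumption: Synapse IDs look like "syn" followed by digits, as in this guide.
SYNAPSE_ID_PATTERN = re.compile(r"^syn\d+$")


def check_synapse_id(value: str, label: str) -> str:
    """Return the trimmed ID, or raise if it is empty or malformed."""
    candidate = value.strip()
    if not SYNAPSE_ID_PATTERN.match(candidate):
        raise ValueError(f"{label} must look like 'syn123456789', got {value!r}")
    return candidate


folder_id = check_synapse_id("syn987654321", "FOLDER_ID")
print("Validated:", folder_id)
```

Calling the guard on `FOLDER_ID` or `PROJECT_ID` at the top of a script turns a forgotten placeholder into an immediate, readable error.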
202+
203+
### List existing curation tasks
204+
205+
Use this script to see all curation tasks in a project:
206+
207+
```python
208+
from pprint import pprint
209+
from synapseclient import Synapse
210+
from synapseclient.models.curation import CurationTask
211+
212+
PROJECT_ID = "" # The Synapse ID of the project to list tasks from
213+
214+
syn = Synapse()
215+
syn.login()
216+
217+
for curation_task in CurationTask.list(
218+
project_id=PROJECT_ID
219+
):
220+
    pprint(curation_task)
221+
```
222+
223+
## References
224+
225+
### API Documentation
226+
227+
- [query_schema_registry][synapseclient.extensions.curator.query_schema_registry] - Search for schemas in the registry
228+
- [create_record_based_metadata_task][synapseclient.extensions.curator.create_record_based_metadata_task] - Create RecordSet-based curation workflows
229+
- [create_file_based_metadata_task][synapseclient.extensions.curator.create_file_based_metadata_task] - Create EntityView-based curation workflows
230+
- [Folder.bind_schema][synapseclient.models.Folder.bind_schema] - Bind schemas to folders
231+
- [Folder.validate_schema][synapseclient.models.Folder.validate_schema] - Validate folder schema compliance
232+
- [CurationTask.list][synapseclient.models.CurationTask.list] - List curation tasks in a project
233+
234+
### Related Documentation
235+
236+
- [JSON Schema Tutorial](../../../tutorials/python/json_schema.md) - Learn how to register schemas
237+
- [Schema Registry](https://synapse.org/Synapse:syn69735275/tables/) - Browse available schemas
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Evaluation
2+
3+
Contained within this file are experimental interfaces for working with the Synapse Python
4+
Client. Unless otherwise noted these interfaces are subject to change at any time. Use
5+
at your own risk.
6+
7+
## API reference
8+
9+
::: synapseclient.models.Evaluation
    options:
        inherited_members: true
        members:
        - store_async
        - get_async
        - delete_async
        - get_all_evaluations_async
        - get_available_evaluations_async
        - get_evaluations_by_project_async
        - get_acl_async
        - update_acl_async
        - get_permissions_async
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
# Evaluation
2+
3+
Contained within this file are experimental interfaces for working with the Synapse Python
4+
Client. Unless otherwise noted these interfaces are subject to change at any time. Use
5+
at your own risk.
6+
7+
## API reference
8+
9+
::: synapseclient.models.Evaluation
    options:
        inherited_members: true
        members:
        - store
        - get
        - delete
        - get_all_evaluations
        - get_available_evaluations
        - get_evaluations_by_project
        - get_acl
        - update_acl
        - get_permissions
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
::: synapseclient.extensions.curator
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Evaluations
2+
An Evaluation is essentially a container that organizes and manages the submission, assessment, and scoring of data, models, or other research artifacts.
3+
It allows teams to set up challenges where participants contribute their work, and those contributions can be systematically reviewed or scored.
4+
5+
This tutorial will walk you through the basics of working with Evaluations using the Synapse Python client.
6+
7+
## Tutorial Purpose
8+
In this tutorial you will:
9+
10+
1. Create and update an Evaluation on Synapse
11+
1. Update the ACL (Access Control List) of an Evaluation on Synapse
12+
1. Retrieve and delete all Evaluations from a given Project
13+
14+
## Prerequisites
15+
* You have completed the [Project](./project.md) tutorial, or have an existing Project on Synapse to work from
16+
* You have a Synapse user or team ID to share the evaluation with
17+
18+
## 1. Create and update an Evaluation on Synapse
19+
20+
In this first part, we'll show you how to interact with an Evaluation object and introduce its two core methods, `store()` and `get()`.
21+
22+
```python
23+
{!docs/tutorials/python/tutorial_scripts/evaluation.py!lines=5-46}
24+
```
25+
26+
## 2. Update the ACL of an Evaluation on Synapse
27+
28+
Like Synapse entities, Evaluations have ACLs that can be used to control who has access to your evaluations and what level of access they have. Updating the ACL of an Evaluation object is slightly different from updating other Evaluation components, because the ACL is not an attribute of the Evaluation object. Let's see an example of how this looks:
29+
30+
```python
31+
{!docs/tutorials/python/tutorial_scripts/evaluation.py!lines=54-64}
32+
```
33+
34+
You can also remove principals from an ACL by passing `update_acl` an empty list for the `access_type` argument, like so:
35+
36+
```python
37+
{!docs/tutorials/python/tutorial_scripts/evaluation.py!lines=66-67}
38+
```
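
To make the empty-list behavior concrete, here is a conceptual model of an ACL as a plain dictionary. It is not the client's implementation: the function `apply_acl_update`, the `principal_id` parameter name, the principal ID, and the access type strings are all illustrative assumptions.

```python
from typing import Dict, List

# Conceptual ACL: principal ID mapped to the access types granted to it.
Acl = Dict[int, List[str]]


def apply_acl_update(acl: Acl, principal_id: int, access_type: List[str]) -> Acl:
    """Return a new ACL with the principal's access replaced or removed."""
    updated = {pid: list(perms) for pid, perms in acl.items()}
    if access_type:
        # A non-empty list sets the principal's access to exactly these types.
        updated[principal_id] = list(access_type)
    else:
        # An empty list removes the principal from the ACL entirely.
        updated.pop(principal_id, None)
    return updated


acl: Acl = {}
granted = apply_acl_update(acl, 1234567, ["READ", "SUBMIT"])
revoked = apply_acl_update(granted, 1234567, [])
print(granted, revoked)
```

The point of the model is that removal is expressed as "grant nothing" rather than through a separate delete call.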
39+
40+
## 3. Retrieve and delete all Evaluations from a given Project
41+
42+
Now we will show how you can retrieve lists of Evaluation objects, rather than retrieving them one by one with `get()`. This is useful when you want to perform the same action on every evaluation in a given project, as we do here:
43+
44+
```python
45+
{!docs/tutorials/python/tutorial_scripts/evaluation.py!lines=69-75}
46+
```
47+
48+
## Source code for this tutorial
49+
50+
<details class="quote">
51+
<summary>Click to show me</summary>
52+
53+
```python
54+
{!docs/tutorials/python/tutorial_scripts/evaluation.py!}
55+
```
56+
</details>
57+
58+
## References
59+
- [Evaluation][synapseclient.models.Evaluation]
60+
- [Project][synapseclient.models.Project]
61+
- [syn.login][synapseclient.Synapse.login]
