Commit c581356

[SYNPY-1672] Extract JSON Schema creation code from schematic (#1266)
* Extract convert and create json schema functions from schematic, renamed to create json schemas
1 parent 81ea30c commit c581356

8 files changed, +6372 -6 lines changed

.github/workflows/build.yml

Lines changed: 2 additions & 2 deletions
@@ -84,15 +84,15 @@ jobs:
 path: |
 ${{ steps.get-dependencies.outputs.site_packages_loc }}
 ${{ steps.get-dependencies.outputs.site_bin_dir }}
-key: ${{ runner.os }}-${{ matrix.python }}-build-${{ env.cache-name }}-${{ hashFiles('setup.py') }}-v27
+key: ${{ runner.os }}-${{ matrix.python }}-build-${{ env.cache-name }}-${{ hashFiles('setup.py') }}-v28
 
 - name: Install py-dependencies
 if: steps.cache-dependencies.outputs.cache-hit != 'true'
 shell: bash
 run: |
 python -m pip install --upgrade pip
 
-pip install -e ".[boto3,pandas,pysftp,tests]"
+pip install -e ".[boto3,pandas,pysftp,tests,curator]"
 
 # ensure that numpy c extensions are installed on windows
 # https://stackoverflow.com/a/59346525

Pipfile

Lines changed: 1 addition & 1 deletion
@@ -10,4 +10,4 @@ synapseclient = {file = ".", path = "."}
 python_version = "3.12.6"
 
 [dev-packages]
-synapseclient = {file = ".", editable = true, path = ".", extras = ["dev", "tests", "pandas", "pysftp", "boto3", "docs"]}
+synapseclient = {file = ".", editable = true, path = ".", extras = ["dev", "tests", "pandas", "pysftp", "boto3", "docs", "curator"]}

setup.cfg

Lines changed: 5 additions & 0 deletions
@@ -107,6 +107,11 @@ pandas =
 
 curator =
 %(pandas)s
+pandarallel>=1.6.4
+inflection>=0.5.1
+networkx>=2.2.8
+dataclasses-json>=0.6.1
+rdflib>=6.0.0
 
 pysftp =
 pysftp>=0.2.8,<0.3
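
Review note: because the `curator` extra now pulls in five additional packages, code that imports the schema generation module may fail in environments installed without that extra. The following is a minimal sketch, not part of this commit, of how a caller could check for those packages up front. The helper name `missing_modules` is hypothetical, and the import names are assumed from the distribution names above (for example, `dataclasses-json` is imported as `dataclasses_json`).

```python
from importlib.util import find_spec

# Assumed import names for the packages added to the "curator" extra.
CURATOR_MODULES = [
    "pandarallel",
    "inflection",
    "networkx",
    "dataclasses_json",
    "rdflib",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CURATOR_MODULES)
    if missing:
        print("Missing curator dependencies:", ", ".join(missing))
        print('Try: pip install "synapseclient[curator]"')
    else:
        print("All curator dependencies are importable.")
```

The check uses `find_spec` rather than importing each package, so it stays cheap and has no side effects.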

synapseclient/extensions/curator/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -6,10 +6,13 @@
 
 from .file_based_metadata_task import create_file_based_metadata_task
 from .record_based_metadata_task import create_record_based_metadata_task
+from .schema_generation import generate_jsonld, generate_jsonschema
 from .schema_registry import query_schema_registry
 
 __all__ = [
 "create_file_based_metadata_task",
 "create_record_based_metadata_task",
 "query_schema_registry",
+"generate_jsonld",
+"generate_jsonschema",
 ]

synapseclient/extensions/curator/readme.md

Lines changed: 59 additions & 3 deletions
@@ -12,24 +12,30 @@ The curator extension is designed around three core principles:
 
 ## Module Structure
 
-The curator extension consists of three focused modules:
+The curator extension consists of four focused modules:
 
 ```
 synapseclient/extensions/curator/
 ├── __init__.py # Clean public API surface
 ├── file_based_metadata_task.py # File-annotation workflows
 ├── record_based_metadata_task.py # Structured record workflows
-└── schema_registry.py # Schema discovery and validation
+├── schema_registry.py # Schema discovery and validation
+└── schema_generation.py # Data model and JSON Schema generation
 ```
 
 ## Public API Design
 
-The module exposes three main functions that follow consistent design patterns:
+The module exposes five main functions that follow consistent design patterns:
 
+**Metadata Curation Workflows:**
 - **`create_file_based_metadata_task()`** - Configurable file-annotation curation workflows
 - **`create_record_based_metadata_task()`** - Configurable structured-record curation workflows
 - **`query_schema_registry()`** - Flexible schema discovery with custom filtering
 
+**Data Model and Schema Generation:**
+- **`generate_jsonld()`** - Convert CSV data models to JSON-LD format with validation
+- **`generate_jsonschema()`** - Generate JSON Schema validation files from data models
+
 ## Configuration and Flexibility
 
 ### Extensive Parameter Control
@@ -167,6 +173,56 @@ The module provides composable building blocks that can be combined to create so
 - Version filtering (latest-only or all versions)
 - Dynamic filter construction using keyword arguments
 
+### Data Model and Schema Generation
+
+**Purpose**: Create and validate data models, then generate JSON Schema validation files.
+
+The schema generation workflow consists of two key functions that work together:
+
+#### JSON-LD Data Model Generation (`generate_jsonld`)
+
+Converts CSV-based data model specifications into standardized JSON-LD format with comprehensive validation:
+
+**Input Requirements**:
+- CSV file with attributes, validation rules, dependencies, and valid values
+- Columns defining display names, descriptions, requirements, and relationships
+
+**Validation Performed**:
+- Required field presence checks
+- Dependency cycle detection (ensures valid DAG structure)
+- Blacklisted character detection in display names
+- Reserved name conflict checking
+- Graph structure validation
+
+**Configuration Levers**:
+- Label format selection (`class_label` vs `display_label`)
+- Custom output path or automatic naming
+- Comprehensive error and warning logging
+
+**Output**: JSON-LD file suitable for schema generation and other data model operations
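
Review note: the "dependency cycle detection" check listed above can be illustrated with a short sketch. This is not the code added by the commit (which lists `networkx` as a dependency and may well use it); it is a standard-library stand-in using `graphlib`, and the function name `find_dependency_cycle` and the attribute names in the usage example are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_dependency_cycle(depends_on):
    """Return one cycle as a list of attribute names, or None if the graph is a DAG.

    depends_on maps each attribute name to the attribute names it depends on.
    """
    sorter = TopologicalSorter(depends_on)
    try:
        # prepare() raises CycleError if the dependency graph contains a cycle.
        sorter.prepare()
    except CycleError as err:
        # The second element of args is the list of nodes forming the cycle;
        # its first and last entries are the same node.
        return err.args[1]
    return None


if __name__ == "__main__":
    # Hypothetical attribute names, for illustration only.
    valid_model = {"Biospecimen": ["Patient"], "Patient": []}
    broken_model = {"Biospecimen": ["Patient"], "Patient": ["Biospecimen"]}
    print(find_dependency_cycle(valid_model))
    print(find_dependency_cycle(broken_model))
```

Reporting the nodes in the cycle, rather than only the fact that one exists, fits the "fail fast with clear messages" philosophy the README describes.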
+
+#### JSON Schema Generation (`generate_jsonschema`)
+
+Generates JSON Schema validation files from JSON-LD data models, translating validation rules into schema constraints:
+
+**Supported Validation Rules**:
+- Type validation (string, number, integer, boolean)
+- Enum constraints from valid values
+- Required field enforcement (including component-specific requirements)
+- Range constraints (`inRange` → min/max)
+- Pattern matching (`regex` → JSON Schema patterns)
+- Format validation (`date`, `url`)
+- Array handling (`list` rules)
+- Conditional dependencies (if/then schemas)
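
Review note: to make the rule-to-constraint mapping above concrete, here is a minimal sketch of how such a translation might look. It is an assumption-laden illustration, not the implementation in this commit: the rule string syntax (`"inRange 0 120"`, `"regex ^[A-Z]+$"`) is simplified, the function names are hypothetical, and mapping `url` to the JSON Schema `uri` format is a guess.

```python
def rule_to_constraints(rule):
    """Translate one simplified rule string into JSON Schema keywords."""
    name, _, argument = rule.strip().partition(" ")
    if name in {"string", "number", "integer", "boolean"}:
        return {"type": name}
    if name == "inRange":
        # Assumed syntax: "inRange <minimum> <maximum>".
        low, high = (float(part) for part in argument.split())
        return {"type": "number", "minimum": low, "maximum": high}
    if name == "regex":
        # Assumed syntax: "regex <pattern>".
        return {"type": "string", "pattern": argument}
    if name == "date":
        return {"type": "string", "format": "date"}
    if name == "url":
        # Assumption: URLs are expressed with the "uri" format.
        return {"type": "string", "format": "uri"}
    raise ValueError(f"Unsupported rule: {rule!r}")


def property_schema(rules, valid_values=None):
    """Build the schema for one property from its rules and optional valid values."""
    item = {}
    for rule in rules:
        if rule != "list":
            item.update(rule_to_constraints(rule))
    if valid_values:
        item["enum"] = list(valid_values)
    # A "list" rule wraps the per-item constraints in an array schema.
    if "list" in rules:
        return {"type": "array", "items": item}
    return item


if __name__ == "__main__":
    print(property_schema(["inRange 0 120"]))
    print(property_schema(["list", "string"], valid_values=["blood", "tissue"]))
```

Keeping each rule as a small dictionary fragment makes it straightforward to merge several rules for the same attribute.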
+
+**Configuration Levers**:
+- Component selection (specific data types or all components)
+- Label format for property names
+- Custom output directory structure
+- Component-based rule application using `#Component` syntax
+
+**Output**: JSON Schema files for each component, enabling validation of submitted manifests
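
Review note: the "conditional dependencies (if/then schemas)" item in the README text above refers to JSON Schema's `if`/`then` keywords. The sketch below shows one plausible shape for such a clause; it is an illustration under assumptions, not the output format of `generate_jsonschema`. The function names and the attribute names (`Diagnosis`, `CancerType`) are hypothetical.

```python
def conditional_requirement(trigger, trigger_value, dependents):
    """Build an if/then clause: when `trigger` equals `trigger_value`,
    every attribute in `dependents` becomes required."""
    return {
        "if": {
            "properties": {trigger: {"const": trigger_value}},
            # Without "required", the "if" would also match when the trigger is absent.
            "required": [trigger],
        },
        "then": {"required": list(dependents)},
    }


def add_conditionals(schema, conditionals):
    """Return a copy of `schema` with the clauses appended under "allOf"."""
    merged = dict(schema)
    merged["allOf"] = list(schema.get("allOf", [])) + list(conditionals)
    return merged


if __name__ == "__main__":
    base = {"type": "object", "properties": {"Diagnosis": {"type": "string"}}}
    clause = conditional_requirement("Diagnosis", "Cancer", ["CancerType"])
    print(add_conditionals(base, [clause]))
```

Collecting the clauses under `allOf` lets several independent conditions coexist in one component schema.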
+
 ## Development Philosophy
 
 ### Fail Fast with Clear Messages