@@ -12,24 +12,30 @@ The curator extension is designed around three core principles:

  ## Module Structure

- The curator extension consists of three focused modules:
+ The curator extension consists of four focused modules:

  ```
  synapseclient/extensions/curator/
  ├── __init__.py # Clean public API surface
  ├── file_based_metadata_task.py # File-annotation workflows
  ├── record_based_metadata_task.py # Structured record workflows
- └── schema_registry.py # Schema discovery and validation
+ ├── schema_registry.py # Schema discovery and validation
+ └── schema_generation.py # Data model and JSON Schema generation
  ```

  ## Public API Design

- The module exposes three main functions that follow consistent design patterns:
+ The module exposes five main functions that follow consistent design patterns:

+ **Metadata Curation Workflows:**
  - **`create_file_based_metadata_task()`** - Configurable file-annotation curation workflows
  - **`create_record_based_metadata_task()`** - Configurable structured-record curation workflows
  - **`query_schema_registry()`** - Flexible schema discovery with custom filtering

+ **Data Model and Schema Generation:**
+ - **`generate_jsonld()`** - Convert CSV data models to JSON-LD format with validation
+ - **`generate_jsonschema()`** - Generate JSON Schema validation files from data models
+
  ## Configuration and Flexibility

  ### Extensive Parameter Control
@@ -167,6 +173,56 @@ The module provides composable building blocks that can be combined to create so
  - Version filtering (latest-only or all versions)
  - Dynamic filter construction using keyword arguments

+ ### Data Model and Schema Generation
+
+ **Purpose**: Create and validate data models, then generate JSON Schema validation files.
+
+ The schema generation workflow consists of two key functions that work together:
+
+ #### JSON-LD Data Model Generation (`generate_jsonld`)
+
+ Converts CSV-based data model specifications into standardized JSON-LD format with comprehensive validation:
+
+ **Input Requirements**:
+ - CSV file with attributes, validation rules, dependencies, and valid values
+ - Columns defining display names, descriptions, requirements, and relationships
+
+ **Validation Performed**:
+ - Required field presence checks
+ - Dependency cycle detection (ensures valid DAG structure)
+ - Blacklisted character detection in display names
+ - Reserved name conflict checking
+ - Graph structure validation
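
The dependency cycle check listed above can be illustrated with a minimal, standalone sketch. It is not the extension's implementation: the helper name `find_dependency_cycle` and the shape of the input mapping are assumptions made for this example, and it relies only on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def find_dependency_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None for a valid DAG.

    `dependencies` maps each attribute name to the names it depends on.
    Hypothetical helper for illustration; not part of the curator extension.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # prepare() raises CycleError if the graph contains any cycle.
        sorter.prepare()
    except CycleError as err:
        # The second element of args holds the cycle; its first and last
        # nodes are the same.
        return list(err.args[1])
    return None


# Example data model fragments (invented for this sketch).
valid_model = {"Biospecimen": ["Patient"], "Patient": []}
broken_model = {"Biospecimen": ["Patient"], "Patient": ["Biospecimen"]}

print(find_dependency_cycle(valid_model))
print(find_dependency_cycle(broken_model))
```

Reporting the offending cycle, rather than only a pass/fail flag, fits the fail-fast style of the module, since the author of the CSV can see which attributes to untangle.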
+
+ **Configuration Levers**:
+ - Label format selection (`class_label` vs `display_label`)
+ - Custom output path or automatic naming
+ - Comprehensive error and warning logging
+
+ **Output**: JSON-LD file suitable for schema generation and other data model operations
+
+ #### JSON Schema Generation (`generate_jsonschema`)
+
+ Generates JSON Schema validation files from JSON-LD data models, translating validation rules into schema constraints:
+
+ **Supported Validation Rules**:
+ - Type validation (string, number, integer, boolean)
+ - Enum constraints from valid values
+ - Required field enforcement (including component-specific requirements)
+ - Range constraints (`inRange` → min/max)
+ - Pattern matching (`regex` → JSON Schema patterns)
+ - Format validation (`date`, `url`)
+ - Array handling (`list` rules)
+ - Conditional dependencies (if/then schemas)
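
To make the rule-to-constraint mapping above concrete, here is a rough sketch of how rule strings might translate into a JSON Schema fragment for a single property. The rule spellings (`str`, `num`, `int`, `bool`, `inRange LOW HIGH`, `regex PATTERN`) and the function name `build_property_schema` are assumptions for illustration; the syntax actually accepted by `generate_jsonschema` may differ.

```python
# Assumed, simplified rule names for illustration only.
TYPE_RULES = {"str": "string", "num": "number", "int": "integer", "bool": "boolean"}
FORMAT_RULES = {"date": "date", "url": "uri"}


def build_property_schema(rules, valid_values=None):
    """Translate a list of rule strings into a JSON Schema fragment.

    Hypothetical helper; not the curator extension's own translation logic.
    """
    schema = {}
    is_list = False
    for rule in rules:
        # Split the rule name from its (optional) argument string.
        name, _, rest = rule.partition(" ")
        if name in TYPE_RULES:
            schema["type"] = TYPE_RULES[name]
        elif name in FORMAT_RULES:
            schema["type"] = "string"
            schema["format"] = FORMAT_RULES[name]
        elif name == "inRange":
            low, high = rest.split()
            schema["minimum"] = float(low)
            schema["maximum"] = float(high)
        elif name == "regex":
            schema["pattern"] = rest
        elif name == "list":
            is_list = True
        else:
            raise ValueError(f"Unsupported rule: {rule!r}")
    if valid_values:
        schema["enum"] = list(valid_values)
    # A list rule wraps the per-item constraints in an array schema.
    if is_list:
        return {"type": "array", "items": schema}
    return schema


print(build_property_schema(["int", "inRange 0 120"]))
print(build_property_schema(["str", "list"], valid_values=["blood", "tissue"]))
```

Raising on an unknown rule, instead of silently skipping it, keeps a typo in the data model from producing a schema that is more permissive than intended.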
+
+ **Configuration Levers**:
+ - Component selection (specific data types or all components)
+ - Label format for property names
+ - Custom output directory structure
+ - Component-based rule application using `#Component` syntax
+
+ **Output**: JSON Schema files for each component, enabling validation of submitted manifests
+
  ## Development Philosophy

  ### Fail Fast with Clear Messages
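
As a sketch of what per-component output could look like, the following assembles one JSON Schema document per component, including an if/then clause for a conditional dependency, and writes each to its own file. The function names, the `<component>.schema.json` file naming, and the choice of draft-07 are all assumptions for this example, not documented behavior of `generate_jsonschema`.

```python
import json
from pathlib import Path


def build_component_schema(component, properties, required=(), conditionals=()):
    """Assemble a JSON Schema document for one component.

    `conditionals` holds (trigger_property, trigger_value, dependent_properties)
    tuples; each becomes an if/then clause. Hypothetical helper for illustration.
    """
    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": component,
        "type": "object",
        "properties": dict(properties),
        "required": list(required),
    }
    clauses = [
        {
            # The trigger must be present and equal to the given value...
            "if": {
                "properties": {trigger: {"const": value}},
                "required": [trigger],
            },
            # ...for the dependent properties to become required.
            "then": {"required": list(dependents)},
        }
        for trigger, value, dependents in conditionals
    ]
    if clauses:
        schema["allOf"] = clauses
    return schema


def write_component_schemas(schemas, output_dir):
    """Write one JSON file per component and return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for component, schema in schemas.items():
        path = out / f"{component}.schema.json"  # assumed naming convention
        path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

Listing `required: [trigger]` inside the `if` clause matters: without it, a manifest row that omits the trigger property would satisfy the `if` vacuously and be forced to supply the dependent fields.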