| 1 | +# A Data Package built with Seedcase packages |
| 2 | + |
| 3 | +This [Data Package](https://datapackage.org/) was generated from the |
| 4 | +[`template-data-package`](https://github.com/seedcase-project/template-data-package) |
| 5 | +Seedcase template. |
| 6 | + |
| 7 | +## Project files and folders |
| 8 | + |
| 9 | +- `docs/`: Documentation about using and developing the Data Package, |
| 10 | + including this README file. |
| 11 | +- `scripts/`: Python scripts for creating and managing the Data |
| 12 | + Package. Files describing the data will be generated here. |
| 13 | +- `.copier-answers.yml`: Contains the answers you gave when copying |
| 14 | + the project from the template. **You should not modify this file |
| 15 | + directly.** |
| 16 | +- `.cz.toml`: |
| 17 | + [Commitizen](https://commitizen-tools.github.io/commitizen/) |
| 18 | + configuration file for managing versions and changelogs. |
| 19 | +- `.pre-commit-config.yaml`: [Pre-commit](https://pre-commit.com/) |
| 20 | + configuration file for managing and running checks before each |
| 21 | + commit. |
| 22 | +- `.typos.toml`: [typos](https://github.com/crate-ci/typos) spell |
| 23 | + checker configuration file. |
| 24 | +- `CITATION.cff`: Structured citation metadata for your project. |
| 25 | +- `justfile`: [`just`](https://just.systems/man/en/) configuration |
| 26 | + file for scripting project tasks. |
| 27 | +- `main.py`: Central script file for the Data Package. This is where |
| 28 | + helper scripts are invoked and work together to create and manage |
| 29 | + the Data Package. |
| 30 | +- `pyproject.toml`: Main Python project configuration file defining |
| 31 | + metadata and dependencies. |
| 32 | +- `README.md`: Autogenerated description of the Data Package. Not a |
| 33 | + development guide. Information on using and developing the project |
| 34 | + should be included in the `docs/` folder. |
| 35 | +- `ruff.toml`: [Ruff](https://docs.astral.sh/ruff/) configuration file |
| 36 | + for linting and formatting Python code. |
| 37 | +- `uv.lock`: Lockfile used by [`uv`](https://docs.astral.sh/uv/) to |
| 38 | + record exact versions of installed dependencies. |
| 39 | + |
| 40 | +## How to develop your Data Package |
| 41 | + |
| 42 | +In your new project generated from the `template-data-package`, the |
| 43 | +first steps for creating and developing your Data Package are already |
| 44 | +set up in `main.py`. For more detailed instructions on using Seedcase |
| 45 | +Sprout to organise your Data Package, see the |
| 46 | +[guide](https://sprout.seedcase-project.org/docs/guide/) on Sprout's |
| 47 | +website. You can read more about the files and folders created by |
| 48 | +`main.py` on the |
| 49 | +[Outputs](https://sprout.seedcase-project.org/docs/design/interface/outputs) |
| 50 | +page of the design documentation. |
| 51 | + |
| 52 | +### Creating package properties |
| 53 | + |
| 54 | +1. Run `main.py` to create the `scripts/package_properties.py` file for |
| 55 | + the properties of your Data Package. |
| 56 | + |
| 57 | + ``` bash |
| 58 | + just build |
| 59 | + ``` |
| 60 | + |
| 61 | + You can also run `main.py` by clicking the "Run" button in your IDE. |
| 62 | + |
| 63 | +2. Open `scripts/package_properties.py` and fill in all required |
| 64 | + fields. Also fill in any optional fields you find useful. You can |
| 65 | + always update these later. Make sure to save the file. |
| 66 | + |
| 67 | +3. In `main.py`, uncomment the lines referencing the |
| 68 | + `package_properties` and `package_path` variables. |
| 69 | + |
| 70 | +4. Rerun `main.py` to create the `datapackage.json` and `README.md` |
| 71 | + files for your Data Package. |
| 72 | + |
| 73 | +### Creating a new resource |
| 74 | + |
| 75 | +#### With data to add to the resource |
| 76 | + |
| 77 | +While you can create resource properties without data, it is a lot more |
| 78 | +challenging. If at all possible, only create a resource properties |
| 79 | +object once you have data that can pre-fill at least some of the |
| 80 | +important fields. To use Sprout, the data must already be in a tidy |
| 81 | +format. Once it is, load the data as a Polars data frame into the |
| 82 | +`raw_data` variable in `main.py`. |
| 83 | + |
| 84 | +1. Uncomment lines up to and including the creation of resource |
| 85 | + properties. |
| 86 | + |
| 87 | +2. Fill in the `resource_name` argument. |
| 88 | + |
| 89 | +3. Rerun `main.py` to create the |
| 90 | + `scripts/resource_properties_<name>.py` file for the properties of |
| 91 | + the new resource. |
| 92 | + |
| 93 | +4. Open `scripts/resource_properties_<name>.py` and fill in all |
| 94 | + required fields. Also fill in any optional fields you find useful. |
| 95 | + You can always update these later. Make sure to save the file. |
| 96 | + |
| 97 | +5. In `scripts/package_properties.py`, import your new resource |
| 98 | + properties by uncommenting the import line and updating it with the |
| 99 | + name of your resource. Also uncomment the `resources` field and update |
| 100 | + the name of the resource properties in the array to match your new resource. |
| 101 | + |
| 102 | +6. In `main.py`, import your new resource properties by uncommenting |
| 103 | + the import line and updating it with the name of your resource. |
| 104 | + |
| 105 | +7. Uncomment everything else in the `main.py` file and rename the |
| 106 | + `resource_properties` variable to the name of the new resource |
| 107 | + properties you just imported. |
| 108 | + |
| 109 | +8. Rerun `main.py`. This will: |
| 110 | + |
| 111 | + - Update `datapackage.json` and `README.md`. |
| 112 | + - Create a `resources/` folder containing a folder for your new |
| 113 | + resource. In here, you will find a `batch/` folder with the |
| 114 | + individual data batches you've uploaded for this resource and a |
| 115 | + `data.parquet` file containing all resource data. |
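To check that the last step produced the files described above, a small standard-library sketch like the following can help. The `expected_outputs` and `missing_outputs` helpers and the `patients` resource name are hypothetical. The layout is taken only from the description in this README, and it assumes the resource folder is named after the resource, so adjust it if your project differs.

```python
from pathlib import Path


def expected_outputs(root: Path, resource_name: str) -> list[Path]:
    """List the paths this README says appear after rerunning `main.py`."""
    # Assumption: the resource folder is named after the resource.
    resource_dir = root / "resources" / resource_name
    return [
        root / "datapackage.json",
        root / "README.md",
        resource_dir / "batch",
        resource_dir / "data.parquet",
    ]


def missing_outputs(root: Path, resource_name: str) -> list[Path]:
    """Return the expected paths that do not exist yet."""
    return [p for p in expected_outputs(root, resource_name) if not p.exists()]


if __name__ == "__main__":
    # "patients" is a placeholder; use your own resource name.
    for path in missing_outputs(Path("."), "patients"):
        print(f"Missing: {path}")
```
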
| 116 | + |
| 117 | +## How to use the `justfile` |
| 118 | + |
| 119 | +The `justfile` contains scripts, or "recipes", that are shorthand for |
| 120 | +performing common project tasks. You can get an overview of available |
| 121 | +recipes by running |
| 122 | + |
| 123 | +``` bash |
| 124 | +just |
| 125 | +``` |
| 126 | + |
| 127 | +in the project root. |
| 128 | + |
| 129 | +You can run a recipe by typing |
| 130 | + |
| 131 | +``` bash |
| 132 | +just <recipe-name> |
| 133 | +``` |
| 134 | + |
| 135 | +A simple workflow would be to run: |
| 136 | + |
| 137 | +1. `just build` repeatedly while working on a new feature, to test that |
| 138 | + it works. |
| 139 | +2. `just run-all` before submitting your work for review, to make sure |
| 140 | + all checks pass. |
| 141 | + |
| 142 | +## Versioning and changelog |
| 143 | + |
| 144 | +This project uses |
| 145 | +[Commitizen](https://commitizen-tools.github.io/commitizen/) to update |
| 146 | +versions and generate changelogs. Based on your [Conventional |
| 147 | +Commits](https://www.conventionalcommits.org/en/v1.0.0/) messages, it |
| 148 | +automatically updates the version in both `pyproject.toml` and |
| 149 | +`datapackage.json`. The [Data Package](https://datapackage.org/) |
| 150 | +standard suggests using its own version of [Semantic |
| 151 | +Versioning](https://datapackage.org/recipes/data-package-version/), so |
| 152 | +follow these conventions when making commits to this repository. |
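
To make the link between commit messages and version numbers concrete, here is a rough sketch of the usual Conventional Commits mapping: a `fix` bumps the patch version, a `feat` bumps the minor version, and a breaking change bumps the major version. The `bump_version` function is hypothetical and only illustrates the idea. Commitizen's actual behaviour depends on its configuration in `.cz.toml`.

```python
import re

# Matches the header of a Conventional Commits message, e.g. "feat(api)!: text".
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_version(version: str, commit_message: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for one commit message."""
    major, minor, patch = (int(part) for part in version.split("."))
    lines = commit_message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return version  # Not a conventional commit: leave the version alone.

    breaking = bool(match["breaking"]) or "BREAKING CHANGE:" in commit_message
    if breaking:
        return f"{major + 1}.0.0"
    if match["type"] == "feat":
        return f"{major}.{minor + 1}.0"
    if match["type"] == "fix":
        return f"{major}.{minor}.{patch + 1}"
    return version  # Other types (docs, chore, ...) do not change the version.


print(bump_version("0.3.1", "fix: handle empty batches"))
print(bump_version("0.3.1", "feat: add a new resource"))
print(bump_version("0.3.1", "feat!: rename a column"))
```

Writing commit messages in this style is what lets the tooling pick the right version bump without manual edits to `pyproject.toml` or `datapackage.json`.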