Conversation

@anatoly-scherbakov
Collaborator

@anatoly-scherbakov anatoly-scherbakov commented Oct 18, 2023

Eliminate the manual step of downloading specifications for the tests to use. Instead, put the specifications into a dedicated directory using the Git submodules mechanism.

@anatoly-scherbakov
Collaborator Author

closes #181

@BigBlueHat
Contributor

@anatoly-scherbakov let's give this a try. Can you expand the README in this PR also to explain what the pre-testing steps would include now (i.e. git submodule init etc.)?
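
Presumably something like the following, though the exact paths depend on how this PR lays things out:

    git submodule update --init --recursive                # check out the pinned spec repositories
    python tests/runtests.py specifications/json-ld-api    # path is illustrative only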

It would be great to simplify testing.

Thanks for exploring this idea!
🎩

@anatoly-scherbakov anatoly-scherbakov marked this pull request as ready for review November 12, 2023 16:37
@anatoly-scherbakov
Collaborator Author

@BigBlueHat @davidlehn My apologies for the delay; I think I have something to show here now.

@BigBlueHat
Contributor

Hey @anatoly-scherbakov, sorry for the delay here. I continue to feel a bit squeamish about using git submodules. It's not a pattern we typically use in our other implementations, and it's a rarely used (and oft derided...) git feature. I'm also concerned about the overhead of keeping the SHAs pinned accurately.

However, I did find that the jsonld.js implementation has a nice approach to this same problem using npm run commands: https://github.com/digitalbazaar/jsonld.js/blob/main/package.json#L96-L101

My thought is that we could do something similar here: provide a simple Python script that runs the git clone --depth 1 commands, and ideally also honors environment variables (as the JS implementation does) in case someone already has the test suites on hand in different locations (as we often do, because we work on those as well).
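
A rough sketch of what such a script might look like; the environment variable names, suite list, and URLs below are only illustrative, not part of jsonld.js or this PR:

    import os
    import subprocess
    from pathlib import Path

    # Illustrative only: variable names, directories, and URLs are placeholders.
    SUITES = {
        'JSONLD_API_TESTS': ('test-suites/json-ld-api', 'https://github.com/w3c/json-ld-api.git'),
        'JSONLD_FRAMING_TESTS': ('test-suites/json-ld-framing', 'https://github.com/w3c/json-ld-framing.git'),
    }

    def fetch_suites():
        for env_var, (default_dir, url) in SUITES.items():
            # An environment variable can point at an existing sibling checkout instead.
            target = Path(os.environ.get(env_var, default_dir))
            if target.exists():
                continue
            subprocess.run(['git', 'clone', '--depth', '1', url, str(target)], check=True)

    if __name__ == '__main__':
        fetch_suites()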

Would you be up for taking a crack at that?

@anatoly-scherbakov
Collaborator Author

@BigBlueHat thank you for your review. I understand the concern, but I might disagree in certain regards; let me try to elaborate.

Let's analyze the approach taken in jsonld.js.

    "fetch-json-ld-org-test-suite": "if [ ! -e test-suites/json-ld.org ]; then git clone --depth 1 https://github.com/json-ld/json-ld.org.git test-suites/json-ld.org; fi",

What happens if the JSON-LD test suite is updated?

  1. A developer already has a downloaded copy of the JSON-LD test suite locally.
    • The developer is not notified about the change.
    • Even if they do notice it, they have to manually remove the spec directory to refresh it.
    • Different developers will inevitably debug the implementation against different versions of the test suite, which leads to issues that are potentially complicated to debug.
    • It is hard to track which exact version of the test suite one has, and Git doesn't help with that.
  2. A developer submits their changes to jsonld.js to CI, and CI fails.
    • That's because the spec has been updated and CI downloaded the latest version.
    • First of all, the developer needs to realize that this has happened.
    • Then, they have to download the latest spec and debug what has changed,
    • which wasn't in the original scope of their pull request.

I feel these cases can be very annoying and might hurt developer productivity.

Updating the spec version is not tied to an explicit human decision, which goes against normal dependency-management practice in many environments.

For instance, with Python's Poetry, you have to issue poetry update to update your dependencies; otherwise, their versions stay explicitly pinned in poetry.lock. The same applies to cargo (the Rust dependency manager), to npm (with its package-lock.json), and others.

Git submodules provide essentially the same level of control over dependency versions. It is easy to provide a make command which upgrades them, say

make upgrade-submodules

I'll add it in a moment in another commit. This will upgrade each submodule to its latest version; afterwards, the developer can run tests, make sure everything is 🟢, and submit to CI.
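
For illustration, upgrading a single pin and recording it stays an explicit, commit-level operation, roughly:

    git submodule update --remote specifications/json-ld-api   # move the pin to the latest upstream commit
    git add specifications/json-ld-api
    git commit -m "Upgrade json-ld-api test suite"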

As to the popularity of git submodules, a quick search on GitHub shows a number of popular projects relying on them: pytorch, ghc, draw.io, mono, ceph, obs-studio, micropython, qemu, and more.

Let me know what you think!

@anatoly-scherbakov
Collaborator Author

@BigBlueHat

P.S. To address the concern about testing pyld against a test suite that already lives elsewhere on disk, one can run:

python tests/runtests.py ~/projects/json-ld-api

…as documented in README.rst. I don't think git submodules would be an obstacle for local development in this fashion.

@BigBlueHat
Contributor

@anatoly-scherbakov coming back to this again--sorry for the wait.

The experience of using the submodules was fine--not much different from the approach we took in jsonld.js--so I'm good with it.

@davidlehn unless you're diametrically opposed to this, I'd like to get it merged sooner than later.

Contributor

@BigBlueHat BigBlueHat left a comment

Small tweak which isn't a blocker--and could be done later--so approving.

@anatoly-scherbakov
Collaborator Author

Thanks for the review @BigBlueHat! I will write up an issue about the Makefile shortly, but I do not believe I have the rights to merge this PR.

This might also be related to my being unable to create a PR in this repository that has another PR as its base.

@@ -0,0 +1,11 @@
[submodule "specifications/json-ld-api"]
Member

These are the specs, but since their sole purpose here is for testing, could this be named test-suites/? That's what the scripts in jsonld.js use.

Collaborator Author

Thanks for reviewing!

I can rename those. A few things concern me a bit though:

  • Each repository we're checking out contains not just tests but the whole specification, which might be confusing and violate the principle of least surprise;
  • We will likely add more tests to the tests directory which will not be using the specifications, and the existence of both tests and test-suites will be confusing;
  • At some point, we might want to use the specifications for purposes other than tests, for instance to aid documentation generation.

What do you think?

Contributor

I prefer the specifications/ folder name, as I do hope/expect we'll get more unit tests added to tests/ at some point, and test-suites/ would certainly be confusing.

Collaborator

@mielvds mielvds Nov 6, 2025

I agree with keeping specifications/, but should we not place it under tests/?

Collaborator Author

@mielvds this would mean a path like tests/specifications/json-ld-api/tests/, which is a lot of nesting. Also: what if we want to reuse the specifications for purposes other than tests?

For instance, they could be extremely useful for LLM-assisted work on the library, because the specs are authoritative.

@@ -0,0 +1,2 @@
upgrade-submodules:
git submodule update --remote --init --recursive
Member

How do you remove all the checked out dirs without leaving git thinking there are unstaged changes? Put another way, how do you undo this command?

Collaborator Author

If you want to undo the version upgrade of the submodules, you can do:

git reset --hard

If you've modified the submodules locally after having upgraded them (for debugging purposes, perhaps), then you do:

git submodule foreach git reset --hard

Collaborator

Should we add those to the Makefile for convenience?

Collaborator Author

  • During a normal workflow, I guess modification of the submodules won't happen often. reset --hard is a destructive operation; I'm not sure we should create a shortcut for it 🤔

@davidlehn
Member

If this PR is going to go all-in, then consider updating docs to handle what #157 was doing to address #156.

@davidlehn
Member

  • I don't necessarily disagree with the reasoning to use submodules. As a counterpoint, pyld and jsonld.js are over a decade old and have gotten by without using that feature. It's a matter of tradeoffs between using different commands and dealing with different problems. I'm not sure what the best approach is. Certainly better docs and tooling would help in whatever approach is used.
  • Adding this feature forces the maintainers to learn to use it. Help us (me) out with a few more snippets of docs perhaps in the README and/or CONTRIBUTING. Maybe even more script/makefile targets to run. Specifically, what do we (I) need to type to keep the test suites synced up all the time?
  • I have concerns about forgetting to update the test suite repo refs and not noticing newly added tests are failing. I can imagine complex processes to address this, but what is the theory here? I guess the idea is to trade off possibly running with updated tests unintentionally for possibly running with outdated tests unintentionally.
  • I put in some comments, partly directed towards making the former workflow possible with minimal hassle. I found that worked well for me. Especially when I was working on js and python at the same time and explicitly wanted to use shared sibling test suites. That's a particular use case for sure, but a real one.
  • I suppose I'm willing to try this change if pressed. To me it seems either method is fine for any potential contributors. There is almost no input here, so I'm not sure others feel strongly at all.

@davidlehn
Member

@anatoly-scherbakov Could you also change the commit message with the fancy "+", and the one with ellipses, arrow, and math symbol, to use simple ASCII? It's nicer to use chars people can easily type. Feel free to argue. :-)

@anatoly-scherbakov anatoly-scherbakov force-pushed the 181-git-submodules branch 2 times, most recently from 5798f0b to 440a251 Compare February 6, 2024 13:36
@anatoly-scherbakov
Collaborator Author

@davidlehn thanks again for the review!

I have applied a few requested changes.

Help us (me) out with a few more snippets of docs perhaps in the README and/or CONTRIBUTING. Maybe even more script/makefile targets to run.

Yes! I feel that less documentation and more automation is for the better :)

I have concerns about forgetting to update the test suite repo refs and not noticing newly added tests are failing.

Indeed, this is the other side of the coin. Most dependency management tools (poetry, npm, ...) seem to be okay with that, staying on the safe side to make builds predictable.

Tools like Dependabot, which post notifications about out-of-date dependencies and even create PRs automatically, are often leveraged to address this. I can easily imagine a rather simple scheduled GitHub Actions workflow that would update the spec submodules every week, run the tests, and create a PR with that upgrade. Do you think we should create an issue for that?
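
Roughly, such a scheduled job would only need to run something like the following and open a PR if everything stays green (the test command is illustrative):

    git submodule update --remote --recursive               # move every spec pin to its latest commit
    python tests/runtests.py specifications/json-ld-api     # run the suite against the updated pins
    # if green: commit the new pins and open a pull request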

I put in some comments, partly directed towards making the former workflow possible with minimal hassle. I found that worked well for me. Especially when I was working on js and python at the same time and explicitly wanted to use shared sibling test suites. That's a particular use case for sure, but a real one.

I see. The runtests.py script allows you to specify the path to the specification. Say you have a development version of a spec somewhere on your local disk; you can easily run the tests against that version by passing its path to runtests.py.

Could you also change the commit message with the fancy "+" and one with ellipses, arrow, and math symbol to use simple ASCII? Nicer to use chars people can easily type. Feel free to argue. :-)

No argument here: I just like those fancy Unicode characters, especially in commit messages 🙂, and have set up my Linux to type them easily, but this is purely a matter of taste. Removed ✅

@BigBlueHat BigBlueHat removed this from the v2.0.4 milestone Feb 6, 2024
@anatoly-scherbakov anatoly-scherbakov changed the base branch from master to add-github-workflow February 6, 2024 17:56
@anatoly-scherbakov
Collaborator Author

I've rebased this branch on top of #185.

@BigBlueHat
Contributor

@BigBlueHat the previous address was https://github.com/json-ld/normalization/tests. Right now, https://github.com/json-ld/normalization redirects to https://github.com/w3c-ccg/rdf-dataset-canonicalization. Is that the link we should use?

This should be the correct location of the old test suites:
https://github.com/w3c-ccg/rdf-dataset-canonicalization/tree/main/tests

So, this still needs an updated URL in this (or another) PR. The rdf-canon ones, though, introduce a whole new host of concerns.

Sorry I didn't see that earlier!

@anatoly-scherbakov
Collaborator Author

  • Added rdf-dataset-canonicalization spec submodule,
  • Reconfigured the tests to use it and to ignore rdf-canon submodule,
  • Added make test command.

Should we replace rdf-canon references everywhere in the docs as well, given that we're going to get them back anyway later?

@BigBlueHat
Contributor

  • Added rdf-dataset-canonicalization spec submodule,
  • Reconfigured the tests to use it and to ignore rdf-canon submodule,
  • Added make test command.

Should we replace rdf-canon references everywhere in the docs as well, given that we're going to get them back anyway later?

Yeah... I think we need to add the rdf-canon stuff as a separate PR. Otherwise, we're pulling all this code and content down and then not using it at all... which would be very confusing.

@mielvds
Collaborator

mielvds commented Nov 6, 2025

I understand @davidlehn's concerns; submodules can be a hassle. That said, support for submodules has improved a lot over the years and, in this case, they solve the version alignment between the repos. Technically, you can do this with a git clone of a specific commit plus make, but it won't be as robust or transparent (see the sketch below). I suggest we try it. It's not very destructive, so rolling back should be easy if need be.
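
A rough sketch of that manual alternative, for comparison (the URL and path are illustrative):

    git clone https://github.com/w3c/json-ld-api.git specifications/json-ld-api
    git -C specifications/json-ld-api checkout <some-pinned-commit>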

@davidlehn davidlehn force-pushed the add-github-workflow branch from 01fe0f8 to f9772ea Compare November 7, 2025 15:14
# Conflicts:
#	.github/workflows/main.yaml
#	CHANGELOG.md
#	README.rst
@anatoly-scherbakov
Collaborator Author

I created a replacement PR in the main repo instead of my fork. Closing this one.
