@haxtibal haxtibal commented Nov 7, 2025

Until now, finding a static merge candidate for source nodes relied on having a UID field set on the source code side. Now we can also use MIDs for this purpose. Merge by MID takes priority over merge by UID.

This requires thinking about several edge cases. See test descriptions for some of them.
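
To illustrate the priority rule, here is a minimal sketch of a merge-candidate lookup. It is not StrictDoc's actual implementation: `Node`, `find_merge_candidate`, and the two lookup dictionaries are hypothetical names, and falling back to the UID when a MID is present but unmatched is an assumption, not something this PR states.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a node; not StrictDoc's real class.
    uid: Optional[str] = None
    mid: Optional[str] = None


def find_merge_candidate(
    source_node: Node,
    nodes_by_mid: Dict[str, Node],
    nodes_by_uid: Dict[str, Node],
) -> Optional[Node]:
    """Return the sdoc node to merge with, trying MID before UID."""
    # MID match takes priority.
    if source_node.mid is not None and source_node.mid in nodes_by_mid:
        return nodes_by_mid[source_node.mid]
    # Assumption: fall back to UID when there is no MID match.
    if source_node.uid is not None and source_node.uid in nodes_by_uid:
        return nodes_by_uid[source_node.uid]
    return None
```

A source node carrying both fields is therefore merged into the node that matches its MID, even if its UID points at a different node.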

@haxtibal haxtibal force-pushed the tdmg/source_node_mid branch from 7b45145 to cdf5a9e Compare November 8, 2025 22:28
# TODO:
# If we really want to support changing the auto-assigned MID,
# at least the graph database and the document search index need an update (remove old MID, add new MID).
# I currently struggle to update the search index.

@haxtibal haxtibal Nov 8, 2025

I think I've cleaned things up a bit now. Here is my remaining problem: I am trying to allow and handle this case

# sdoc
[SRC_NODE]
UID: SREQ-1

# example.c
/*
 * UID: SREQ-1
 * MID: 12345678
 */
example_1() { }

meaning a MID from source code would overwrite the earlier auto-assigned MID. However, the auto-assigned MID has already been entered into the graph and the search index at that point (at least; maybe into other places too?), so I would need to update them. I know how to update the graph, but couldn't figure out how to update the search index consistently.

What do you think, would it be easy enough to do these updates, or should I simply not allow that edge case (exit with error)?

EDIT: Personally, I tend towards not allowing it. That edge case is not needed for the Linux showcase. I don't like the idea of having to modify already established graph connections while we're still in the traceability construction phase. Rather, it should be possible to conceptually separate things into a "compile" and a "link" phase, as you already suggested. Source node parsing and merging would be part of the compile phase. At its end we would know all nodes along with some "I would like to link to..." information, but nothing would actually be linked yet. Only the final link stage would add links to the graph DB and create the search index.
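
A rough sketch of that compile/link separation, to make the idea concrete. All names here (`PendingNode`, `compile_phase`, `link_phase`) are hypothetical, and a set of edges plus a plain dict stand in for the graph DB and the search index. The point it shows: because nothing is indexed during the compile phase, overwriting an auto-assigned MID there needs no later fix-up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PendingNode:
    # Hypothetical node record; not StrictDoc's real data model.
    uid: Optional[str]
    mid: str
    link_targets: List[str] = field(default_factory=list)  # UIDs to link to


def compile_phase(
    sdoc_nodes: List[PendingNode], source_nodes: List[PendingNode]
) -> List[PendingNode]:
    """Parse-and-merge step: record link wishes, but link nothing yet."""
    by_uid = {n.uid: n for n in sdoc_nodes if n.uid is not None}
    result = list(sdoc_nodes)
    for src in source_nodes:
        target = by_uid.get(src.uid) if src.uid is not None else None
        if target is None:
            # No merge candidate: the source node stays a node of its own.
            result.append(src)
            continue
        # Safe here: the auto-assigned MID is not in any graph or index yet.
        target.mid = src.mid
        target.link_targets.extend(src.link_targets)
    return result


def link_phase(
    nodes: List[PendingNode],
) -> Tuple[Set[Tuple[str, str]], Dict[str, PendingNode]]:
    """Final step: build graph edges and the search index from final MIDs."""
    search_index = {n.mid: n for n in nodes}
    mid_by_uid = {n.uid: n.mid for n in nodes if n.uid is not None}
    edges = set()
    for node in nodes:
        for target_uid in node.link_targets:
            # An unknown UID raises KeyError; real code would report it properly.
            edges.add((node.mid, mid_by_uid[target_uid]))
    return edges, search_index
```

With the `SREQ-1` example above, the auto-assigned MID is simply replaced by `12345678` before any edge or index entry is created.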

@stanislaw stanislaw Nov 9, 2025

I need to think about your comment but something that is not clear to me even before and in general is how we want to auto-generate the UUID for both source code and sidecar when both or either of SDoc node's or source file's MID/UUID do not exist yet. Is my understanding correct that we will not have the human-readable UID at all in the Linux context?

# sdoc
[SRC_NODE]
 * Has no MID or UID, or MID only but the source node may not have it initially?

# example.c
/*
 * Has no MID or UID, or MID only but the SDoc document node may not have it initially?
 */
example_1() { }

Collaborator

In other words, it is some sort of chicken-and-egg problem. How do we imagine the workflow of auto-generating the MID/UUID between source code and sidecars?

Contributor Author

something that is not clear to me even before and in general is how we want to auto-generate the UUID for both source code and sidecar when both or either of SDoc node or source file do not exist yet

We can look at ELISA's trace_events.c annotations and their idgen.py script.

Source code starts off like

/*
 * SPDX-Req-ID: [TODO: automatically generate it]
 * ...
 */

Then one shall call idgen.py generate trace_events.c to calculate sha256sum("linux" + "trace_events.c" + instance + code), where instance is the text after SPDX-Text: in the comment, and code is the full C-function definition without comment.
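
The hash described above can be sketched as follows. This is not the actual idgen.py code: the function name `generate_req_id`, the plain concatenation without separators, the UTF-8 encoding, and the hex output are all assumptions, and the real script may normalize its inputs differently.

```python
import hashlib


def generate_req_id(project: str, file_name: str, instance: str, code: str) -> str:
    """Hash project + file name + instance text + function code with SHA-256.

    Assumption: the parts are concatenated as-is and encoded as UTF-8.
    """
    payload = project + file_name + instance + code
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative inputs only; the instance text and code are made up.
req_id = generate_req_id(
    "linux",
    "trace_events.c",
    "text after SPDX-Text: in the comment",
    "int example_1(void) { return 0; }",
)
print(req_id)
```

Note that any change to the function body or to the instance text yields a different ID, which is part of why the sidecar question matters.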

The script acknowledges the problem you have mentioned

# TODO: since sidecar is not yet defined, this script doesn't consider the sidecar added content to the instance.

I see a few options:

  1. For initial uuid generation, only hash over content in source code but neglect the sdoc part (that's what idgen.py currently does). Copying the generated UUID to sdoc is a manual step. Only the second run will have the nodes merged.
  2. Start off with SPDX-Req-ID: UUID-TICKET-123, and MID: UUID-TICKET-123 in the related sdoc. When StrictDoc sees such a preliminary UID, it will replace it with a properly calculated hash value.
  3. Start off with SPDX-Req-ID: [TODO: automatically generate it], and MID: tracing.c/__ftrace_event_enable_disable in the related sdoc. Let StrictDoc merge by conventional MID and replace the conventional UID with a properly calculated hash value.
  4. Use UID (manually assigned) + MID, and merge by UID.
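
To make option 2 more concrete, here is a minimal sketch of how a preliminary ID could be swapped for the final one in both files. StrictDoc has no such feature today as far as this thread goes; the function `finalize_preliminary_id`, the `UUID-` prefix convention for placeholders, and the decision to fail when the sdoc side does not carry the same placeholder are all assumptions for illustration.

```python
import re
from typing import Tuple

# Assumption: a preliminary ID is anything starting with "UUID-".
SRC_FIELD = re.compile(r"(SPDX-Req-ID:\s*)(UUID-[\w-]+)")


def finalize_preliminary_id(
    source_text: str, sdoc_text: str, final_id: str
) -> Tuple[str, str]:
    """Replace the same preliminary ID in the source comment and the sdoc."""
    match = SRC_FIELD.search(source_text)
    if match is None:
        # Nothing preliminary found; leave both texts untouched.
        return source_text, sdoc_text
    placeholder = match.group(2)
    sdoc_field = re.compile(r"(MID:\s*)" + re.escape(placeholder) + r"(?![\w-])")
    if sdoc_field.search(sdoc_text) is None:
        raise ValueError(f"no sdoc node with preliminary MID {placeholder}")
    new_source = (
        source_text[: match.start(2)] + final_id + source_text[match.end(2):]
    )
    new_sdoc = sdoc_field.sub(lambda m: m.group(1) + final_id, sdoc_text)
    return new_source, new_sdoc
```

The merge itself would still happen on the preliminary value during the first run; only afterwards would both sides be rewritten to the calculated hash.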

I have no clear favorite among those options right now. Maybe we should ask Gabriele?

Is my understanding correct that we will not have the UID at all in the Linux context?

Yes, that's also my understanding. The pilot work nowhere mentions a UID. If we wanted one, it's up to us to propose it.
