draft: feat(backend/sdoc_source_code): Support merge by MID #2549
7b45145 to
cdf5a9e
Compare
# TODO:
# If we really want to support changing the auto-assigned MID,
# at least the graph database and the document search index need an update (remove old MID, add new MID).
# I currently struggle to update the search index.
I think I've cleaned it up a bit now. Here is my remaining problem: I try to allow and handle this case
# sdoc
[SRC_NODE]
UID: SREQ-1
# example.c
/*
* UID: SREQ-1
* MID: 12345678
*/
example_1() { }
meaning a MID from source code would overwrite the earlier auto-assigned MID. However, the auto-assigned MID has already been entered into the graph and the search index at an early stage (at least; maybe into other places as well?), so I would need to update them. I know how to update the graph, but couldn't figure out how to consistently update the search index.
What do you think, would it be easy enough to do these updates, or should I simply not allow that edge case (exit with error)?
EDIT: Personally, I tend towards not allowing it. That edge case is not needed for the Linux showcase. I don't like the idea of having to modify already established graph connections while we're still in the traceability construction phase. Rather, it should be possible to conceptually separate things into a "compile" and a "link" phase, as you already suggested. Source node parsing and merging would be part of the compile phase. At its end we would know all nodes with some "I would like to link to..." information, but nothing is actually linked yet. Only the final link stage will add links to the graph DB and create the search index.
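To make the compile/link idea more concrete, here is a minimal sketch. All names (`PendingNode`, `compile_phase`, `link_phase`) are hypothetical and not StrictDoc's actual classes, and the graph and search index are reduced to a plain dict and a sorted list just for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class PendingNode:
    """A node as known after the compile phase: nothing is linked yet."""
    mid: str
    wants_links_to: List[str] = field(default_factory=list)


def compile_phase(parsed: Iterable[Tuple[str, List[str]]]) -> Dict[str, PendingNode]:
    """Collect nodes keyed by MID and record their link wishes.

    Entries with the same MID are merged. No graph or index is touched here.
    """
    nodes: Dict[str, PendingNode] = {}
    for mid, targets in parsed:
        node = nodes.setdefault(mid, PendingNode(mid))
        node.wants_links_to.extend(targets)
    return nodes


def link_phase(nodes: Dict[str, PendingNode]):
    """Create graph edges and the search index only now, from the final MIDs."""
    graph: Dict[str, List[str]] = {mid: [] for mid in nodes}
    for node in nodes.values():
        for target in node.wants_links_to:
            if target not in nodes:
                raise KeyError(f"{node.mid} links to unknown MID {target}")
            graph[node.mid].append(target)
    search_index = sorted(nodes)
    return graph, search_index


nodes = compile_phase([("12345678", ["abc"]), ("abc", [])])
graph, search_index = link_phase(nodes)
```

With this split, a MID coming from source code can still replace an auto-assigned one during the compile phase, because neither the graph nor the search index exists yet.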
I need to think about your comment, but something that has been unclear to me from the start, and in general, is how we want to auto-generate the UUID for both source code and sidecar when the MID/UUID of the SDoc node, of the source file, or of both does not exist yet. Is my understanding correct that we will not have the human-readable UID at all in the Linux context?
# sdoc
[SRC_NODE]
* Has no MID or UID, or MID only but the source node may not have it initially?
# example.c
/*
* Has no MID or UID, or MID only but the SDoc document node may not have it initially?
*/
example_1() { }
In other words, it is some sort of a chicken-and-egg problem. How are we imagining the workflow of auto-generating MID/UUID between source code and sidecars?
> something that is not clear to me even before and in general is how we want to auto-generate the UUID for both source code and sidecar when both or either of SDoc node or source file do not exist yet
We can look at ELISA's trace_events.c annotations and their idgen.py script.
Source code starts off like
/*
* SPDX-Req-ID: [TODO: automatically generate it]
* ...
*/
Then one shall call idgen.py generate trace_events.c to calculate sha256sum("linux" + "trace_events.c" + instance + code), where instance is the text after SPDX-Text: in the comment, and code is the full C-function definition without comment.
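For reference, the calculation described above could look roughly like this. It is only a sketch and not the actual idgen.py code: plain concatenation without separators, UTF-8 encoding and a hex digest are my assumptions, and `calc_req_id` as well as the sample `instance` and `code` strings are made up.

```python
import hashlib


def calc_req_id(project: str, file_name: str, instance: str, code: str) -> str:
    """Return the hex SHA-256 digest of project + file_name + instance + code.

    Assumption: the four parts are concatenated as-is and encoded as UTF-8.
    """
    payload = project + file_name + instance + code
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


req_id = calc_req_id(
    "linux",
    "trace_events.c",
    "Enable or disable a trace event.",   # hypothetical text after SPDX-Text:
    "int example_1(void) { return 0; }",  # hypothetical C function definition
)
print(req_id)
```

Since the hash covers `instance` and `code`, any edit to the comment text or to the function body yields a different ID, which is part of what makes the sidecar question tricky.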
The script acknowledges the problem you have mentioned
# TODO: since sidecar is not yet defined, this script doesn't consider the sidecar added content to the instance.
I see a few options:
- For initial uuid generation, only hash over content in source code but neglect the sdoc part (that's what idgen.py currently does). Copying the generated UUID to sdoc is a manual step. Only the second run will have the nodes merged.
- Start off with SPDX-Req-ID: UUID-TICKET-123, and MID: UUID-TICKET-123 in related sdoc. When StrictDoc sees such a preliminary UID, it will replace it with a proper calculated hash value.
- Start off with SPDX-Req-ID: [TODO: automatically generate it], and MID: tracing.c/__ftrace_event_enable_disable in related sdoc. Let StrictDoc merge by conventional MID and replace conventional UID with proper calculated hash value.
- Use UID (manually assigned) + MID, and merge by UID.
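The options that start from a placeholder all need a step that swaps the placeholder for the calculated value. A minimal sketch of that step, assuming the placeholder text is exactly the one shown earlier; `fill_req_id` is a hypothetical helper, not something StrictDoc or idgen.py provides:

```python
import re

PLACEHOLDER = "[TODO: automatically generate it]"

# Matches the SPDX-Req-ID field only while it still holds the placeholder.
_PATTERN = re.compile(r"(SPDX-Req-ID:\s*)" + re.escape(PLACEHOLDER))


def fill_req_id(source_text: str, new_id: str) -> str:
    """Replace the first placeholder SPDX-Req-ID with new_id.

    Text without a placeholder is returned unchanged, so IDs that were
    already assigned are never overwritten.
    """
    return _PATTERN.sub(lambda m: m.group(1) + new_id, source_text, count=1)


source = "/*\n * SPDX-Req-ID: [TODO: automatically generate it]\n */\n"
print(fill_req_id(source, "0123abcd"))
```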
I have no clear favorite among those options right now. Maybe we should ask Gabriele?
> Is my understanding correct that we will not have the UID at all in the Linux context?
Yes, that's also my understanding. The pilot work nowhere mentions a UID. If we wanted one, it's up to us to propose it.
Until now, finding a static merge candidate for source nodes relied on having a UID field set on the source code side. Now we can also use MIDs for this purpose. Merge by MID gets priority over merge by UID. This requires thinking about several edge cases. See test descriptions for some of them.
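A minimal sketch of the lookup order described here. The function name and the two dict-based lookups are hypothetical stand-ins, not the real StrictDoc data structures, and falling back to the UID when a MID is present but matches nothing is an assumption about one of the edge cases:

```python
from typing import Dict, Optional


def find_merge_candidate(
    source_fields: Dict[str, str],
    nodes_by_mid: Dict[str, str],
    nodes_by_uid: Dict[str, str],
) -> Optional[str]:
    """Return the static node to merge a source node into, or None.

    A MID match takes priority over a UID match.
    """
    mid = source_fields.get("MID")
    if mid is not None and mid in nodes_by_mid:
        return nodes_by_mid[mid]
    uid = source_fields.get("UID")
    if uid is not None and uid in nodes_by_uid:
        return nodes_by_uid[uid]
    return None


nodes_by_mid = {"12345678": "node-A"}
nodes_by_uid = {"SREQ-1": "node-B"}

# Both fields are set and point at different nodes: the MID wins.
candidate = find_merge_candidate(
    {"UID": "SREQ-1", "MID": "12345678"}, nodes_by_mid, nodes_by_uid
)
```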
cdf5a9e to
97af3e1
Compare