-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C#: Overlay extraction support. #20425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
C#: Overlay extraction support. #20425
Conversation
19c75fe to
41e57d9
Compare
41e57d9 to
beb6cd3
Compare
beb6cd3 to
cd1ba5f
Compare
csharp/extractor/Semmle.Extraction.CSharp.Standalone/Extractor.cs
Dismissed
Show dismissed
Hide dismissed
1155ee2 to
d158b1f
Compare
d158b1f to
871c6c9
Compare
csharp/extractor/Semmle.Extraction.CSharp/Extractor/OverlayInfo.cs
Dismissed
Show dismissed
Hide dismissed
cd0b31c to
b749636
Compare
6db88cf to
915770b
Compare
5fe0d0e to
6ea8338
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces overlay extraction support for C#, enabling incremental database creation. The strategy extracts only signature information (types, methods, parameters) for unchanged files while fully extracting changed files, reusing locations from the base database. Key changes include adding discard predicates, enabling overlay compilation, and modifying the extraction context to skip expression/statement extraction for unchanged files.
Reviewed Changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
csharp/ql/lib/semmlecode.csharp.dbscheme |
Adds databaseMetadata, overlayChangedFiles relations and @locatable type for overlay support |
csharp/ql/lib/semmle/code/csharp/Overlay.qll |
Implements entity discard predicates for overlay analysis |
csharp/ql/lib/semmle/code/csharp/Conversion.qll |
Refactors type constraint logic by extracting typeConstraintToBaseType predicate |
csharp/ql/lib/qlpack.yml |
Enables overlay compilation with compileForOverlayEval: true |
csharp/ql/lib/csharp.qll |
Imports Overlay module |
csharp/extractor/Semmle.Util/EnvironmentVariables.cs |
Adds methods to detect overlay mode and read overlay changes file path |
csharp/extractor/Semmle.Extraction.CSharp/Extractor/OverlayInfo.cs |
Implements overlay information tracking and changed file detection |
csharp/extractor/Semmle.Extraction.CSharp/Extractor/Context.cs |
Adds OnlyScaffold flag to control skeleton extraction for unchanged files |
| Multiple entity extractors | Conditionally skip location extraction when OnlyScaffold is true |
csharp/codeql-extractor.yml |
Declares overlay support version |
| Test files | Adds overlay test infrastructure and expected results |
| Upgrade/downgrade files | Provides database schema migration support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ns and some named TRAP entities.
…s when not scaffolding and only extract attributes when they are in source code in overlay mode.
|
Based on your DCA results (I haven't reviewed the code), it sounds to me like this is good enough to staff-ship 🎉 but that more follow-up work is needed.
That should be fine. With or without incrementality, the PR view doesn't show alerts in generated files because they're not in the changed-files list in the UI.
That's fine for now, but do you see a path forward to only extracting skeletons of the (transitive) dependencies of changed files? That's what we do for Java. The ideal is that the extraction process should take time proportional to the number of changed files (plus a minimal set of dependencies).
This is comparable to what we've seen for other languages. Analysis time should come down when
In summary, these numbers look good to me.
That's a small reduction compared to what we've seen in other languages. I'm guessing it's because the C# extractor does a lot of work setting up the build and the dependencies before it gets to the actual code. I hope github/codeql-action#3117 will help here, or at least it will become clearer what work remains. If there's any computation you want to cache between the full analysis and the PR, it can be stored in Extracting only the necessary dependencies as skeleton files should also help. |
|
@jbj : Thank you very much for the feedback! I really appreciate it!
Yes, exactly. We could consider always extracting generated files, if we can't get a mapping between original <-> generated files. As you mention, this is less urgent.
Really good question!
Yes, I think downloading dependencies is a large portion of the overall time. Do you know, whether it will be possible to measure the benefit of cached dependencies using DCA? |
I know that |
hvitved
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, great work 💪
| display_name: "C#" | ||
| version: 1.22.1 | ||
| column_kind: "utf16" | ||
| overlay_support_version: 20250626 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOI, where does this date come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here.
| /// ] | ||
| /// } | ||
| /// </summary> | ||
| public record ChangedFiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it actually be made private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can 😄
| { | ||
| trapFile.commentblock(this); | ||
| WriteLocationToTrap(trapFile.commentblock_location, this, Context.CreateLocation(Symbol.Location)); | ||
| if (!Context.OnlyScaffold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it that we not simply return when Context.OnlyScaffold holds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the "named" TRAP entities, I have just tried to avoid re-extracting the source locations. Will try and make the change. In this case the child information is extracted below. I will restructure a bit to streamline implementation.
| location = Context.CreateLocation(Location); | ||
| trapFile.commentline(this, Type == CommentLineType.MultilineContinuation ? CommentLineType.Multiline : Type, Text, RawText); | ||
| WriteLocationToTrap(trapFile.commentline_location, this, location); | ||
| if (!Context.OnlyScaffold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
| if (info.Symbol is INamespaceSymbol namespaceSymbol) | ||
| { | ||
| var ns = Namespace.Create(Context, namespaceSymbol); | ||
| trapFile.using_namespace_directives(this, ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we extracting this in scaffold mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, yes, they are indeed * ID elements, so we should probably avoid extracting them altogether when scaffolding. Will try and make that change.
| */ | ||
| overlay[local] | ||
| private class StarEntity = | ||
| @expr or @stmt or @diagnostic or @extractor_message or @using_directive or @type_mention; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@local_variable uses * as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and, of course, @location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only source locations are * entities. Furthermore, a location is not considered a @locatable. There is a separate discard predicate for handling locations.
| private predicate discardStarEntity(@locatable e) { | ||
| e instanceof StarEntity and | ||
| // Entities with *-ids can exist either in base or overlay, but not both. | ||
| exists(DiscardableEntity de | de = e | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e = any(DiscardableEntity de | is slightly less verbose.
| private predicate discardNamedEntity(@locatable e) { | ||
| not e instanceof StarEntity and | ||
| // Entities with named IDs can exist both in base, overlay, or both. | ||
| exists(DiscardableEntity de | de = e | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
| } | ||
|
|
||
| overlay[local] | ||
| private class PossibleDiscardableEntity extends DiscardableEntity instanceof @locatable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instanceof @locatable is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can put all the logic directly in the DiscardableEntity class instead.
| overlay[local] | ||
| abstract private class DiscardableEntity extends @locatable { | ||
| /** Gets the path to the file in which this element occurs. */ | ||
| abstract string getPath(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getFilePath would be better, I think.
1ca50aa to
9d300e3
Compare
In this PR we
The strategy for doing overlay extraction in C# is as follows.
The CLI supplies a JSON file with "changed" files. This information is used in the extraction context in C#.
There are three DCA experiments
nightly/nightlywithout overlay (internally available here). This experiment shows no change in performance (but an increase in DIL sizes) and a small alert discrepancy which is likely due to wobliness. This experiment is to validate that enabling overlay compilation in the C# QL library don't have any negative side effects.code-scanningsecurity queries on full extraction compared to overlay extraction. According to DCA we get 97.4% accuracy (which is above the 90% min target). It looks like the missing results all are located in generated files. At the moment we only extract "changed" files fully in overlay mode, which could explain the missing results as the files are generated on the fly by the extractor (by running source generators). That is, we are only doing a skeleton extraction of the generated files as they are considered "unchanged".There is still plenty that could be done to improve the quality of the overlay database and query execution. Known possible improvements
overlayannotations.