Skip to content

Commit 218c2a5

Browse files
authored
Merge pull request #14751 from owen-mc/go/feature/use-use-flow
Go: Switch from def-use flow to use-use flow
2 parents cbe34f1 + f35d28d commit 218c2a5

File tree

119 files changed

+2316
-1698
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

119 files changed

+2316
-1698
lines changed
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
category: breaking
3+
---
4+
* The member predicate `writesField` on `DataFlow::Write` now uses the post-update node for `base` when that is the node being updated, which is in all cases except initializing a struct literal. A new member predicate `writesFieldPreUpdate` has been added for cases where this behaviour is not desired.
5+
* The member predicate `writesElement` on `DataFlow::Write` now uses the post-update node for `base` when that is the node being updated, which is in all cases except initializing an array/slice/map literal. A new member predicate `writesElementPreUpdate` has been added for cases where this behaviour is not desired.
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: majorAnalysis
3+
---
4+
* The shape of the Go data-flow graph has changed. Previously for code like `x := def(); use1(x); use2(x)`, there would be edges from the definition of `x` to each use. Now there is an edge from the definition to the first use, then another from the first use to the second, and so on. This means that data-flow barriers work differently - flow will not reach any uses after the barrier node. Where this is not desired it may be be necessary to add an additional flow step to propagate the flow forward. Additionally, when a variable may be subject to a side-effect, such as updating an array, passing a pointer to a function that might write through it or writing to a field of a struct, there is now a dedicated post-update node representing the variable after this side-effect has taken place. Previously post-update nodes were aliases for either a variable's definition, or were equal to the pre-update node. This led to backwards steps in the data-flow graph, which could cause false positives. For example, in the previous code there would be an edge from `x` in `use2(x)` back to the definition of `x`. If we define our sources as any argument of `use2` and our sinks as any argument of `use1` then this would lead to a false positive path. Now there are distinct post-update nodes and no backwards edge to the definition, so we will not find this false positive path.
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* For the query `go/unvalidated-url-redirection`, when untrusted data is assigned to the `Host` field of a `url.URL` struct, we consider the whole struct untrusted. We now also include the case when this happens during struct initialization, for example `&url.URL{Host: untrustedData}`.
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: deprecated
3+
---
4+
* The member predicate `writesComponent` on `DataFlow::Write` has been deprecated. Instead, use `writesFieldPreUpdate` and `writesElementPreUpdate`, or their new versions `writesField` and `writesElement`.

go/ql/lib/semmle/go/controlflow/ControlFlowGraph.qll

Lines changed: 82 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,8 @@ module ControlFlow {
118118
/** Gets the left-hand side of this write. */
119119
IR::WriteTarget getLhs() { result = super.getLhs() }
120120

121+
private predicate isInitialization() { super.isInitialization() }
122+
121123
/** Gets the right-hand side of this write. */
122124
DataFlow::Node getRhs() { super.getRhs() = result.asInstruction() }
123125

@@ -132,49 +134,112 @@ module ControlFlow {
132134

133135
/**
134136
* Holds if this node sets the value of field `f` on `base` (or its implicit dereference) to
135-
* `rhs`.
137+
* `rhs`, where `base` represents the post-update value.
138+
*
139+
* For example, for the assignment `x.width = newWidth`, `base` is the post-update node of
140+
* either the data-flow node corresponding to `x` or (if `x` is a pointer) the data-flow node
141+
* corresponding to the implicit dereference `*x`, `f` is the field referenced by `width`, and
142+
* `rhs` is the data-flow node corresponding to `newWidth`. If this `WriteNode` is a struct
143+
* initialization then there is no post-update node and `base` is the struct literal being
144+
* initialized.
145+
*/
146+
predicate writesField(DataFlow::Node base, Field f, DataFlow::Node rhs) {
147+
exists(DataFlow::Node b | this.writesFieldPreUpdate(b, f, rhs) |
148+
this.isInitialization() and base = b
149+
or
150+
not this.isInitialization() and
151+
b = base.(DataFlow::PostUpdateNode).getPreUpdateNode()
152+
)
153+
}
154+
155+
/**
156+
* Holds if this node sets the value of field `f` on `base` (or its implicit dereference) to
157+
* `rhs`, where `base` represents the pre-update value.
136158
*
137159
* For example, for the assignment `x.width = newWidth`, `base` is either the data-flow node
138160
* corresponding to `x` or (if `x` is a pointer) the data-flow node corresponding to the
139-
* implicit dereference `*x`, `f` is the field referenced by `width`, and `rhs` is the data-flow
140-
* node corresponding to `newWidth`.
161+
* implicit dereference `*x`, `f` is the field referenced by `width`, and `rhs` is the
162+
* data-flow node corresponding to `newWidth`.
141163
*/
142-
predicate writesField(DataFlow::Node base, Field f, DataFlow::Node rhs) {
164+
predicate writesFieldPreUpdate(DataFlow::Node base, Field f, DataFlow::Node rhs) {
165+
this.writesFieldInsn(base.asInstruction(), f, rhs.asInstruction())
166+
}
167+
168+
private predicate writesFieldInsn(IR::Instruction base, Field f, IR::Instruction rhs) {
143169
exists(IR::FieldTarget trg | trg = super.getLhs() |
144170
(
145-
trg.getBase() = base.asInstruction() or
146-
trg.getBase() = MkImplicitDeref(base.asExpr())
171+
trg.getBase() = base or
172+
trg.getBase() = MkImplicitDeref(base.(IR::EvalInstruction).getExpr())
147173
) and
148174
trg.getField() = f and
149-
super.getRhs() = rhs.asInstruction()
175+
super.getRhs() = rhs
150176
)
151177
}
152178

153179
/**
154180
* Holds if this node sets the value of element `index` on `base` (or its implicit dereference)
155181
* to `rhs`.
156182
*
157-
* For example, for the assignment `xs[i] = v`, `base` is either the data-flow node
158-
* corresponding to `xs` or (if `xs` is a pointer) the data-flow node corresponding to the
159-
* implicit dereference `*xs`, `index` is the data-flow node corresponding to `i`, and `rhs`
160-
* is the data-flow node corresponding to `base`.
183+
* For example, for the assignment `xs[i] = v`, `base` is the post-update node of the data-flow
184+
* node corresponding to `xs` or (if `xs` is a pointer) the implicit dereference `*xs`, `index`
185+
* is the data-flow node corresponding to `i`, and `rhs` is the data-flow node corresponding to
186+
* `base`. If this `WriteNode` corresponds to the initialization of an array/slice/map then
187+
* there is no need for a post-update node and `base` is the array/slice/map literal being
188+
* initialized.
161189
*/
162190
predicate writesElement(DataFlow::Node base, DataFlow::Node index, DataFlow::Node rhs) {
191+
exists(DataFlow::Node b | this.writesElementPreUpdate(b, index, rhs) |
192+
this.isInitialization() and base = b
193+
or
194+
not this.isInitialization() and
195+
b = base.(DataFlow::PostUpdateNode).getPreUpdateNode()
196+
)
197+
}
198+
199+
/**
200+
* Holds if this node sets the value of element `index` on `base` (or its implicit dereference)
201+
* to `rhs`.
202+
*
203+
* For example, for the assignment `xs[i] = v`, `base` is the post-update node of the data-flow
204+
* node corresponding to `xs` or (if `xs` is a pointer) the implicit dereference `*xs`, `index`
205+
* is the data-flow node corresponding to `i`, and `rhs` is the data-flow node corresponding to
206+
* `base`. If this `WriteNode` corresponds to the initialization of an array/slice/map then
207+
* there is no need for a post-update node and `base` is the array/slice/map literal being
208+
* initialized.
209+
*/
210+
predicate writesElementPreUpdate(DataFlow::Node base, DataFlow::Node index, DataFlow::Node rhs) {
211+
this.writesElementInsn(base.asInstruction(), index.asInstruction(), rhs.asInstruction())
212+
}
213+
214+
private predicate writesElementInsn(
215+
IR::Instruction base, IR::Instruction index, IR::Instruction rhs
216+
) {
163217
exists(IR::ElementTarget trg | trg = super.getLhs() |
164218
(
165-
trg.getBase() = base.asInstruction() or
166-
trg.getBase() = MkImplicitDeref(base.asExpr())
219+
trg.getBase() = base or
220+
trg.getBase() = MkImplicitDeref(base.(IR::EvalInstruction).getExpr())
167221
) and
168-
trg.getIndex() = index.asInstruction() and
169-
super.getRhs() = rhs.asInstruction()
222+
trg.getIndex() = index and
223+
super.getRhs() = rhs
170224
)
171225
}
172226

227+
/**
228+
* DEPRECATED: Use the disjunct of `writesElement` and `writesField`, or `writesFieldPreUpdate`
229+
* and `writesElementPreUpdate`, instead.
230+
*
231+
* Holds if this node sets any field or element of `base` (or its implicit dereference) to
232+
* `rhs`, where `base` represents the pre-update value.
233+
*/
234+
deprecated predicate writesComponent(DataFlow::Node base, DataFlow::Node rhs) {
235+
this.writesElementPreUpdate(base, _, rhs) or this.writesFieldPreUpdate(base, _, rhs)
236+
}
237+
173238
/**
174239
* Holds if this node sets any field or element of `base` to `rhs`.
175240
*/
176-
predicate writesComponent(DataFlow::Node base, DataFlow::Node rhs) {
177-
this.writesElement(base, _, rhs) or this.writesField(base, _, rhs)
241+
predicate writesComponentInstruction(IR::Instruction base, IR::Instruction rhs) {
242+
this.writesElementInsn(base, _, rhs) or this.writesFieldInsn(base, _, rhs)
178243
}
179244
}
180245

go/ql/lib/semmle/go/controlflow/IR.qll

Lines changed: 11 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -430,18 +430,25 @@ module IR {
430430
*/
431431
class WriteInstruction extends Instruction {
432432
WriteTarget lhs;
433+
Boolean initialization;
433434

434435
WriteInstruction() {
435-
lhs = MkLhs(this, _)
436-
or
437-
lhs = MkLiteralElementTarget(this)
436+
(
437+
lhs = MkLhs(this, _)
438+
or
439+
lhs = MkResultWriteTarget(this)
440+
) and
441+
initialization = false
438442
or
439-
lhs = MkResultWriteTarget(this)
443+
lhs = MkLiteralElementTarget(this) and initialization = true
440444
}
441445

442446
/** Gets the target to which this instruction writes. */
443447
WriteTarget getLhs() { result = lhs }
444448

449+
/** Holds if this instruction initializes a literal. */
450+
predicate isInitialization() { initialization = true }
451+
445452
/** Gets the instruction computing the value this instruction writes. */
446453
Instruction getRhs() { none() }
447454

go/ql/lib/semmle/go/dataflow/SSA.qll

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -166,6 +166,13 @@ class SsaDefinition extends TSsaDefinition {
166166
) {
167167
this.getLocation().hasLocationInfo(filepath, startline, startcolumn, endline, endcolumn)
168168
}
169+
170+
/**
171+
* Gets the first instruction that the value of this `SsaDefinition` can
172+
* reach without passing through any other instructions, but possibly through
173+
* phi nodes.
174+
*/
175+
IR::Instruction getAFirstUse() { firstUse(this, result) }
169176
}
170177

171178
/**
@@ -410,3 +417,12 @@ DataFlow::Node getASimilarReadNode(DataFlow::Node node) {
410417
result = readFields.similar().getAUse()
411418
)
412419
}
420+
421+
/**
422+
* Gets an instruction such that `pred` and `result` form an adjacent
423+
* use-use-pair of the same`SsaSourceVariable`, that is, the value read in
424+
* `pred` can reach `result` without passing through any other use or any SSA
425+
* definition of the variable except for phi nodes and uncertain implicit
426+
* updates.
427+
*/
428+
IR::Instruction getAnAdjacentUse(IR::Instruction pred) { adjacentUseUse(pred, result) }

go/ql/lib/semmle/go/dataflow/SsaImpl.qll

Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -199,6 +199,8 @@ private module Internal {
199199
/**
200200
* Holds if the `i`th node of `bb` is a use or an SSA definition of variable `v`, with
201201
* `k` indicating whether it is the former or the latter.
202+
*
203+
* Note this includes phi nodes, whereas `ref` above only includes explicit writes and captures.
202204
*/
203205
private predicate ssaRef(ReachableBasicBlock bb, int i, SsaSourceVariable v, RefKind k) {
204206
useAt(bb, i, v) and k = ReadRef()
@@ -290,6 +292,172 @@ private module Internal {
290292
or
291293
rewindReads(bb, i, v) = 1 and result = getDefReachingEndOf(bb.getImmediateDominator(), v)
292294
}
295+
296+
private module AdjacentUsesImpl {
297+
/** Holds if `v` is defined or used in `b`. */
298+
private predicate varOccursInBlock(SsaSourceVariable v, ReachableBasicBlock b) {
299+
ssaRef(b, _, v, _)
300+
}
301+
302+
/** Holds if `v` occurs in `b` or one of `b`'s transitive successors. */
303+
private predicate blockPrecedesVar(SsaSourceVariable v, ReachableBasicBlock b) {
304+
varOccursInBlock(v, b)
305+
or
306+
exists(getDefReachingEndOf(b, v))
307+
}
308+
309+
/**
310+
* Holds if `v` occurs in `b1` and `b2` is one of `b1`'s successors.
311+
*
312+
* Factored out of `varBlockReaches` to force join order compared to the larger
313+
* set `blockPrecedesVar(v, b2)`.
314+
*/
315+
pragma[noinline]
316+
private predicate varBlockReachesBaseCand(
317+
SsaSourceVariable v, ReachableBasicBlock b1, ReachableBasicBlock b2
318+
) {
319+
varOccursInBlock(v, b1) and
320+
b2 = b1.getASuccessor()
321+
}
322+
323+
/**
324+
* Holds if `b2` is a transitive successor of `b1` and `v` occurs in `b1` and
325+
* in `b2` or one of its transitive successors but not in any block on the path
326+
* between `b1` and `b2`. Unlike `varBlockReaches` this may include blocks `b2`
327+
* where `v` is dead.
328+
*
329+
* Factored out of `varBlockReaches` to force join order compared to the larger
330+
* set `blockPrecedesVar(v, b2)`.
331+
*/
332+
pragma[noinline]
333+
private predicate varBlockReachesRecCand(
334+
SsaSourceVariable v, ReachableBasicBlock b1, ReachableBasicBlock mid, ReachableBasicBlock b2
335+
) {
336+
varBlockReaches(v, b1, mid) and
337+
not varOccursInBlock(v, mid) and
338+
b2 = mid.getASuccessor()
339+
}
340+
341+
/**
342+
* Holds if `b2` is a transitive successor of `b1` and `v` occurs in `b1` and
343+
* in `b2` or one of its transitive successors but not in any block on the path
344+
* between `b1` and `b2`.
345+
*/
346+
private predicate varBlockReaches(
347+
SsaSourceVariable v, ReachableBasicBlock b1, ReachableBasicBlock b2
348+
) {
349+
varBlockReachesBaseCand(v, b1, b2) and
350+
blockPrecedesVar(v, b2)
351+
or
352+
varBlockReachesRecCand(v, b1, _, b2) and
353+
blockPrecedesVar(v, b2)
354+
}
355+
356+
/**
357+
* Holds if `b2` is a transitive successor of `b1` and `v` occurs in `b1` and
358+
* `b2` but not in any block on the path between `b1` and `b2`.
359+
*/
360+
private predicate varBlockStep(
361+
SsaSourceVariable v, ReachableBasicBlock b1, ReachableBasicBlock b2
362+
) {
363+
varBlockReaches(v, b1, b2) and
364+
varOccursInBlock(v, b2)
365+
}
366+
367+
/**
368+
* Gets the maximum rank among all SSA references to `v` in basic block `bb`.
369+
*/
370+
private int maxSsaRefRank(ReachableBasicBlock bb, SsaSourceVariable v) {
371+
result = max(ssaRefRank(bb, _, v, _))
372+
}
373+
374+
/**
375+
* Holds if `v` occurs at index `i1` in `b1` and at index `i2` in `b2` and
376+
* there is a path between them without any occurrence of `v`.
377+
*/
378+
pragma[nomagic]
379+
predicate adjacentVarRefs(
380+
SsaSourceVariable v, ReachableBasicBlock b1, int i1, ReachableBasicBlock b2, int i2
381+
) {
382+
exists(int rankix |
383+
b1 = b2 and
384+
ssaRefRank(b1, i1, v, _) = rankix and
385+
ssaRefRank(b2, i2, v, _) = rankix + 1
386+
)
387+
or
388+
maxSsaRefRank(b1, v) = ssaRefRank(b1, i1, v, _) and
389+
varBlockStep(v, b1, b2) and
390+
ssaRefRank(b2, i2, v, _) = 1
391+
}
392+
393+
predicate variableUse(SsaSourceVariable v, IR::Instruction use, ReachableBasicBlock bb, int i) {
394+
bb.getNode(i) = use and
395+
exists(SsaVariable sv |
396+
sv.getSourceVariable() = v and
397+
use = sv.getAUse()
398+
)
399+
}
400+
}
401+
402+
private import AdjacentUsesImpl
403+
404+
/**
405+
* Holds if the value defined at `def` can reach `use` without passing through
406+
* any other uses, but possibly through phi nodes.
407+
*/
408+
cached
409+
predicate firstUse(SsaDefinition def, IR::Instruction use) {
410+
exists(SsaSourceVariable v, ReachableBasicBlock b1, int i1, ReachableBasicBlock b2, int i2 |
411+
adjacentVarRefs(v, b1, i1, b2, i2) and
412+
def.definesAt(b1, i1, v) and
413+
variableUse(v, use, b2, i2)
414+
)
415+
or
416+
exists(
417+
SsaSourceVariable v, SsaPhiNode redef, ReachableBasicBlock b1, int i1, ReachableBasicBlock b2,
418+
int i2
419+
|
420+
adjacentVarRefs(v, b1, i1, b2, i2) and
421+
def.definesAt(b1, i1, v) and
422+
redef.definesAt(b2, i2, v) and
423+
firstUse(redef, use)
424+
)
425+
}
426+
427+
/**
428+
* Holds if `use1` and `use2` form an adjacent use-use-pair of the same SSA
429+
* variable, that is, the value read in `use1` can reach `use2` without passing
430+
* through any other use or any SSA definition of the variable.
431+
*/
432+
cached
433+
predicate adjacentUseUseSameVar(IR::Instruction use1, IR::Instruction use2) {
434+
exists(SsaSourceVariable v, ReachableBasicBlock b1, int i1, ReachableBasicBlock b2, int i2 |
435+
adjacentVarRefs(v, b1, i1, b2, i2) and
436+
variableUse(v, use1, b1, i1) and
437+
variableUse(v, use2, b2, i2)
438+
)
439+
}
440+
441+
/**
442+
* Holds if `use1` and `use2` form an adjacent use-use-pair of the same
443+
* `SsaSourceVariable`, that is, the value read in `use1` can reach `use2`
444+
* without passing through any other use or any SSA definition of the variable
445+
* except for phi nodes and uncertain implicit updates.
446+
*/
447+
cached
448+
predicate adjacentUseUse(IR::Instruction use1, IR::Instruction use2) {
449+
adjacentUseUseSameVar(use1, use2)
450+
or
451+
exists(
452+
SsaSourceVariable v, SsaPhiNode def, ReachableBasicBlock b1, int i1, ReachableBasicBlock b2,
453+
int i2
454+
|
455+
adjacentVarRefs(v, b1, i1, b2, i2) and
456+
variableUse(v, use1, b1, i1) and
457+
def.definesAt(b2, i2, v) and
458+
firstUse(def, use2)
459+
)
460+
}
293461
}
294462

295463
import Internal

0 commit comments

Comments
 (0)