Skip to content

Commit c0c45cf

Browse files
authored
Merge pull request #126 from lqd/origin_of_symmetry
book: document input relations, naive rules, location-insensitive rules
2 parents ab8eb57 + a2f6c52 commit c0c45cf

File tree

8 files changed

+423
-4
lines changed

8 files changed

+423
-4
lines changed

book/src/SUMMARY.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,9 @@
66
- [Generating inputs from rustc](./generate_inputs.md)
77
- [Rules](./rules.md)
88
- [Atoms](./rules/atoms.md)
9-
- [Initialization](./rules/initialization.md)
9+
- [Relations](./rules/relations.md)
10+
- [Initialization analysis](./rules/initialization.md)
11+
- [Liveness analysis](./rules/liveness.md)
12+
- [Loan analysis](./rules/loans.md)
1013
- [Testing Polonius](./testing.md)
1114
- [See also](./see_also.md)

book/src/rules.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,3 +2,9 @@
22

33
These chapters document and explain the Polonius rules, primarily in
44
datalog form.
5+
6+
First, we'll describe the [atoms](./rules/atoms.md), and the [relations](./rules/relations.md) they are stored in. Then, we'll look at the polonius computation in more detail. It's a pipeline consisting of multiple steps and analyses:
7+
8+
- [Initialization analysis](./rules/initialization.md) will compute move and initialization errors, as well as the initialization and uninitialization data used by the next step.
9+
- [Liveness analysis](./rules/liveness.md) will compute which origins are live at which points in the control flow graph, used by the next step.
10+
- [Loan analysis](./rules/loans.md) (the core of "borrow checking") will compute illegal access errors, and illegal subset relationships errors. This is currently done with different variants (with different datalog rules) which will be described in that section.

book/src/rules/atoms.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
# Atoms
2+
13
Polonius defines the following **atoms**. To Polonius, these are
24
opaque identifiers that identify particular things within the input
35
program (literally they are newtype'd integers). Their meaning and

book/src/rules/initialization.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
1-
# Initialization
1+
# Initialization analysis
22

33
**These rules are not yet fully merged.**

book/src/rules/liveness.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# Liveness analysis
2+
3+
**These rules are not yet described.**

book/src/rules/loans.md

Lines changed: 243 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,243 @@
1+
# Loan analysis
2+
3+
Loan analysis is the heart of the borrow checker, and will compute:
4+
- illegal access errors: an action on a loan, that is illegal to perform
5+
- illegal subset relations errors: missing relationships between placeholder origins
6+
7+
This is done in multiple variants, whose goals are different: performance, readability, tests and validation.
8+
9+
Broadly speaking, the goals of the analysis are 1) to track loans:
10+
- from the point and origin in which they are issued, to the points where they are invalidated
11+
- flowing from origin to origin at a single point, via their `subset` relationships
12+
- flowing from point to point in the CFG, according to the origins' liveness (stopping at points where a loan is killed)
13+
14+
And 2) to track undeclared relationships between placeholder origins.
15+
16+
Any live loan which is invalidated will be an illegal access error, any placeholder which flows into another placeholder unexpectedly will be an illegal subset relation error.
17+
18+
### Inputs
19+
20+
The input relations will be described below, but the [dedicated page](./relations.md) will have more information about them.
21+
22+
```prolog
23+
// Indicates that the `loan` was "issued" at the given `point`, creating a
24+
// reference with the `origin`. Effectively, `origin` may refer to data from
25+
// `loan` starting at `point` (this is usually the point *after* a borrow rvalue).
26+
.decl loan_issued_at(Origin:origin, Loan:loan, Point:point)
27+
.input loan_issued_at
28+
29+
// When some prefix of the path borrowed at `loan` is assigned at `point`.
30+
// Indicates that the path borrowed by the `loan` has changed in some way that the
31+
// loan no longer needs to be tracked. (In particular, mutations to the path that
32+
// was borrowed no longer invalidate the loan)
33+
.decl loan_killed_at(Loan:loan, Point:point)
34+
.input loan_killed_at
35+
36+
// Indicates that the `loan` is invalidated by some action
37+
// taking place at `point`; if any origin that references this loan is live,
38+
// this is an error.
39+
.decl loan_invalidated_at(Loan:loan, Point:point)
40+
.input loan_invalidated_at
41+
42+
// When we require `origin1@point: origin2@point`.
43+
// Indicates that `origin1 <= origin2` -- i.e., the set of loans in `origin1`
44+
// are a subset of those in `origin2`.
45+
.decl subset_base(Origin1:origin, Origin2:origin, Point:point)
46+
.input subset_base
47+
48+
// Describes a placeholder `origin`, with its associated placeholder `loan`.
49+
.decl placeholder(Origin:origin, Loan:loan)
50+
.input placeholder
51+
52+
// These reflect the `'a: 'b` relations that are either declared by the user on
53+
// function declarations or which are inferred via implied bounds.
54+
// For example: `fn foo<'a, 'b: 'a, 'c>(x: &'c &'a u32)` would have two entries:
55+
// - one for the user-supplied subset `'b: 'a`
56+
// - and one for the `'a: 'c` implied bound from the `x` parameter,
57+
// (note that the transitive relation `'b: 'c` is not necessarily included
58+
// explicitly, but rather inferred by polonius).
59+
.decl known_placeholder_subset(Origin1:origin, Origin2:origin)
60+
.input known_placeholder_subset
61+
```
62+
63+
The datalog rules below are considered the "naive" implementation, as it computes the whole transitive closure of the subset relation, but are easy to describe and explain. They are implemented using the datafrog engine in the [Naive variant](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/output/naive.rs).
64+
65+
Some trivial differences exist with the implementation:
66+
- the use of the `;` alternative operator in the rules
67+
- some API limitations about joins in the implementation, sometimes requiring intermediate steps per join (and these can sometimes be shared between different rules)
68+
69+
### Subsets between origins
70+
71+
The rules below compute the complete graph of subsets between origins: starting from the non-transitive subsets, we close over this relation at a given point in the CFG (regardless of liveness). Liveness is then taken into account to propagate these transitive subsets along the CFG: if an origin flows into another at a given point, and they both are live at the successor points (reminder: placeholder origins are considered live at all points), the relationship is propagated to the successor points.
72+
73+
```prolog
74+
.decl subset(Origin1:origin, Origin2:origin, Point:point)
75+
76+
// R1: the initial subsets are the non-transitive `subset_base` static input
77+
subset(Origin1, Origin2, Point) :-
78+
subset_base(Origin1, Origin2, Point).
79+
80+
// R2: compute the subset transitive closure, at a given point
81+
subset(Origin1, Origin3, Point) :-
82+
subset(Origin1, Origin2, Point),
83+
subset(Origin2, Origin3, Point).
84+
85+
// R3: propagate subsets along the CFG, according to liveness
86+
subset(Origin1, Origin2, TargetPoint) :-
87+
subset(Origin1, Origin2, SourcePoint),
88+
cfg_edge(SourcePoint, TargetPoint),
89+
(origin_live_on_entry(Origin1, TargetPoint); placeholder(Origin1, _)),
90+
(origin_live_on_entry(Origin2, TargetPoint); placeholder(Origin2, _)).
91+
```
92+
93+
### The origins contain loans
94+
95+
The rules below compute what loans are contained in which origins, at given points of the CFG: starting from the "issuing point and origin", a loan is propagated via the subsets computed above, at a given point in the CFG. Liveness is then taken into account to propagate these loans along the CFG: if a loan is contained in an origin at a given point, and that the origin is live at the successor points, the loan is propagated to the successor points. A subtlety here is that there are points in the CFG where a loan can be killed, and that will stop propagation. Rule 6 uses both liveness and kill points to decide whether the loan should be propagated further in the CFG.
96+
97+
```prolog
98+
.decl origin_contains_loan_on_entry(Origin:origin, Loan:loan, Point:point)
99+
100+
// R4: the issuing origins are the ones initially containing loans
101+
origin_contains_loan_on_entry(Origin, Loan, Point) :-
102+
loan_issued_at(Origin, Loan, Point).
103+
104+
// R5: propagate loans within origins, at a given point, according to subsets
105+
origin_contains_loan_on_entry(Origin2, Loan, Point) :-
106+
origin_contains_loan_on_entry(Origin1, Loan, Point),
107+
subset(Origin1, Origin2, Point).
108+
109+
// R6: propagate loans along the CFG, according to liveness
110+
origin_contains_loan_on_entry(Origin, Loan, TargetPoint) :-
111+
origin_contains_loan_on_entry(Origin, Loan, SourcePoint),
112+
!loan_killed_at(Loan, SourcePoint),
113+
cfg_edge(SourcePoint, TargetPoint),
114+
(origin_live_on_entry(Origin, TargetPoint); placeholder(Origin, _)).
115+
```
116+
117+
### Loan liveness, and illegal access errors
118+
119+
With the information computed above, we can compute illegal accesses errors. It is an error to invalidate a loan that is live at a given point. A loan is live at a point if it is contained in an origin that is live at that point.
120+
121+
```prolog
122+
.decl loan_live_at(Loan:loan, Point:point)
123+
124+
// R7: compute whether a loan is live at a given point, i.e. whether it is
125+
// contained in a live origin at this point
126+
loan_live_at(Loan, Point) :-
127+
origin_contains_loan_on_entry(Origin, Loan, Point),
128+
(origin_live_on_entry(Origin, Point); placeholder(Origin, _)).
129+
130+
.decl errors(Loan:loan, Point:point)
131+
132+
// R8: compute illegal access errors, i.e. an invalidation of a live loan
133+
errors(Loan, Point) :-
134+
loan_invalidated_at(Loan, Point),
135+
loan_live_at(Loan, Point).
136+
```
137+
138+
### Placeholder subsets, and illegal subset relations errors
139+
140+
These errors can be computed differently depending on the variant, but the goal is the same: if the analysis detects that a placeholder origin ultimately flows into another placeholder origin, that relationship needs to be declared or it is an error.
141+
142+
The `Naive` rules variant computes the complete subset transitive closure and can more easily detect whether one of these facts links two placeholder origins. The `LocationInsensitive` rules variant does not compute transitive subsets at all, and uses loan propagation to detect these errors (if a placeholder loan flows into a placeholder origin). The `Opt` optimized rules variant only computes the transitive closure of some origins according to their liveness and possible contribution to any error (mostly the ones dying along an edge, and the origins they can reach), and tracks the transitive subsets of placeholders explicitly.
143+
144+
```prolog
145+
.decl subset_errors(Origin1:origin, Origin2:origin, Point:point)
146+
147+
// R9: compute illegal subset relations errors, i.e. the undeclared subsets
148+
// between two placeholder origins.
149+
subset_errors(Origin1, Origin2, Point) :-
150+
subset(Origin1, Origin2, Point),
151+
placeholder_origin(Origin1),
152+
placeholder_origin(Origin2),
153+
!known_placeholder_subset(Origin1, Origin2).
154+
```
155+
156+
### Location Insensitive analysis
157+
158+
The rules above document the `Naive` variant of loan analysis, as it is conceptually simple and describes all the important parts computed by the Polonius model. This variant is "naive" in the sense that to stay clear and simple, the rules compute more things than strictly required. In particular, it computes the complete transitive subsets of all origins, as well as the loans contained by each origin at every point of the CFG.
159+
160+
In practice, different "grades" of borrow-checking can be useful: each with different levels of precision in what it accepts and with different computational complexity requirements. The lowest of such grades, the `LocationInsensitive` variant, trades off precision for speed by ignoring both the location where subsets happen, and the origins' contents at the CFG points. The idea is: if an analysis would find no error when ignoring path- and flow-sensitivity, then the full analysis would find no error either. If it does find potential errors, then the full analysis will find a subset of these location-insensitive errors.
161+
162+
This can be used as a quick pre-pass: if there are no errors, a full, expensive, analysis does not need to run, otherwise, only the loans where potential errors occur would need to be fully checked to remove false positives.
163+
164+
The inputs are the same as the `Naive` variant, but ignore the CFG points from the `subset`s. Subsets are not tracked, and are used to approximate loan propagation inside origins (regardless of liveness and location-sensitivity) in `origin_contains_loan`:
165+
166+
```prolog
167+
.decl subset(Origin1:origin, Origin2:origin)
168+
169+
// R1: the subsets are the non-transitive `subset_base` static input,
170+
// with their location stripped.
171+
subset(Origin1, Origin2) :-
172+
subset_base(Origin1, Origin2, _).
173+
174+
.decl origin_contains_loan(Origin:origin, Loan:loan)
175+
176+
// R2: the issuing origins are the ones initially containing loans.
177+
origin_contains_loan(Origin, Loan) :-
178+
loan_issued_at(Origin, Loan, _).
179+
180+
// R3: the placeholder origins also contain their placeholder loan.
181+
origin_contains_loan(Origin, Loan) :-
182+
placeholder_loan(Origin, Loan).
183+
184+
// R4: propagate the loans from the origins to their subsets.
185+
origin_contains_loan(Origin2, Loan) :-
186+
origin_contains_loan(Origin1, Loan),
187+
subset(Origin1, Origin2).
188+
189+
.decl loan_live_at(Loan:loan, Point:point)
190+
191+
// R5a: Approximate loan liveness. If an origin is live at a given
192+
// point, and it contains a loan *anywhere* in the CFG, that loan is
193+
// considered live at that point.
194+
loan_live_at(Loan, Point) :-
195+
origin_contains_loan(Origin, Loan),
196+
(origin_live_on_entry(Origin, Point); placeholder_origin(Origin)).
197+
198+
.decl potential_errors(Loan:loan, Point:point)
199+
200+
// R5b: Compute potential illegal access errors, i.e. invalidations
201+
// of live loans.
202+
potential_errors(Loan, Point) :-
203+
loan_invalidated_at(Loan, Point),
204+
loan_live_at(Loan, Point).
205+
```
206+
207+
Note: rules "5a" and "5b" above are named to match [the implementation](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/output/location_insensitive.rs) which computes `potential_errors` as a single "rule 5" without materializing the `loan_live_at` intermediate relation of "rule 5a".
208+
209+
Illegal subset relation errors (which are by definition about "subsets") can still be computed by propagating the placeholder loans, and detecting when they unexpectedly flow into another placeholder origin: one where this specific relationship between the two placeholders was not declared.
210+
211+
```prolog
212+
.decl potential_subset_errors(Origin1:origin, Origin2:origin)
213+
214+
// R6: compute potential illegal subset relations errors, i.e. the
215+
// placeholder loans which ultimately flowed into another placeholder
216+
// origin unexpectedly.
217+
potential_subset_errors(Origin1, Origin2) :-
218+
placeholder(Origin1, Loan1),
219+
placeholder(Origin2, _),
220+
origin_contains_loan(Origin2, Loan1),
221+
!placeholder_known_to_contain(Origin2, Loan1).
222+
```
223+
224+
This requires a simple input equivalent to the transitive closure of `known_placeholder_subset`, tracking the placeholder loans a given placeholder origin is known to contain instead, and is computed like so:
225+
226+
```prolog
227+
.decl placeholder_known_to_contain(Origin:origin, Loan:loan)
228+
229+
placeholder_known_to_contain(Origin, Loan) :-
230+
placeholder(Origin, Loan).
231+
232+
placeholder_known_to_contain(Origin2, Loan1) :-
233+
placeholder_known_to_contain(Origin1, Loan1),
234+
known_placeholder_subset(Origin1, Origin2).
235+
```
236+
237+
### To be continued
238+
239+
In the current implementation, this quick `LocationInsensitive` filter is used as a pre-pass to another optimized variant, as part of [the `Hybrid` algorithm](https://github.com/rust-lang/polonius/blob/2cf8336f7ff9932270160a392ca5be3c804b7f41/polonius-engine/src/output/mod.rs#L42).
240+
241+
A more detailed description of the rules in this `Opt` variant will be added later but it computes the same data as the `Naive` variant described above, more efficiently, by limiting where the subset transitive closure is computed: some origins are short-lived, or part of a subsection of the subset graph into which no loan ever flows, and therefore don't contribute to errors or loan propagation. There's no need to track these specific cases.
242+
243+
In the meantime, [the implementation](https://github.com/rust-lang/polonius/blob/master/polonius-engine/src/output/datafrog_opt.rs) documents the relations and rules it uses in its computation.

0 commit comments

Comments
 (0)