@@ -4,7 +4,7 @@ Author: Bob Nystrom
 
 Status: In-progress
 
-Version 0.3 (see [CHANGELOG](#CHANGELOG) at end)
+Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)
 
 Experiment flag: unquoted-imports
 
@@ -107,18 +107,18 @@ import widget.tla.proto/client/component;
 ```
 
 You can probably infer what's going on from the before and after, but the basic
-idea is that the library is a slash-separated series of dotted identifier
-segments. The first segment is the name of the package. The rest is the path to
-the library within that package. A `.dart` extension is implicitly added to the
-end. If there is only a single segment, it is treated as the package name and
-its last dotted component is the path. If the package name is `dart`, it's a
-"dart:" library import.
+idea is that the library is a slash-separated series of path segments, each of
+which is a dot-separated series of identifier components. The first segment is
+the name of the package. The rest is the path to the library within that
+package. A `.dart` extension is implicitly added to the end. If there is only a
+single segment, it is treated as the package name and its last dotted component
+is the path. If the package name is `dart`, it's a "dart:" library import.
 
 The way I think about the proposed syntax is that relative imports are
 *physical* in that they specify the actual relative path on the file system from
 the current library to another library *file*. Because those are physical file
 paths, they use string literals and file extensions as they do today. SDK and
-package imports are *logical* in that you don't know where the library your
+package imports are *logical* in that you don't know where the library you're
 importing lives on your disk. What you know is its *logical name* and the
 relative location of the library you want inside that package. Since these are
 abstract references to a *library*, they are unquoted and omit the file
@@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:
 
 ### Path separator
 
-An import shorthand syntax that only supported a single identifier would work
+A package shorthand syntax that only supported a single identifier would work
 for packages like `test` and `args` that only expose a single library, but
 would fail for even very common libraries like `package:flutter/material.dart`.
 So we need some notion of a package name and a path within that package.
@@ -220,27 +220,29 @@ import flutter/material;
 Is the `flutter/material` part a single token or three (`flutter`, `/`, and
 `material`)? The main advantage of tokenizing it as a single monolithic token is
 that we could potentially allow characters or identifiers in there that aren't
-otherwise valid Dart. For example, we could let you use reserved words as path
-segments:
+otherwise valid Dart. For example, we could let you use hyphens as word
+separators, as in:
 
 ```dart
-import weird_package/for/if/ok;
+import weird-package/but-ok;
 ```
 
 The disadvantage is that the tokenizer doesn't generally have enough context to
-know when it should tokenize `foo/bar` as a single import path token versus
+know when it should tokenize `foo/bar` as a single package path token versus
 three tokens that are presumably dividing two variables named `foo` and `bar`.
 
-Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize an
-import path as a single token. Instead, it's tokenized using Dart's current
+Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize a
+package path as a single token. Instead, it's tokenized using Dart's current
 lexical grammar.
 
-This means you can't have a path segment that's a reserved word or is otherwise
-not a valid Dart identifier. Fortunately, our published guidance has *always*
-told users that [package names][name guideline] and [directories][directory
-guideline] should be valid Dart identifiers. Pub will complain if you try to
-publish a package whose name isn't a valid identifier. Likewise, the linter will
-flag directory or library names that aren't identifiers.
+This means you can't have a path component that uses some combination of
+characters that isn't currently a single token in Dart, like `hyphen-separated`
+or `123LeadingDigits`. A path component must be an identifier (including
+built-in identifiers) or a reserved word. Fortunately, our published guidance
+has *always* told users that [package names][name guideline] and
+[directories][directory guideline] should be valid Dart identifiers. Pub will
+complain if you try to publish a package whose name isn't a valid identifier.
+Likewise, the linter will flag file names that aren't identifiers.
 
 [name guideline]: https://dart.dev/tools/pub/pubspec#name
 [directory guideline]: https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
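The component rule above reduces to a lexical check: regular identifiers, built-in identifiers, contextual keywords, and reserved words all share one shape (an ASCII letter, `_`, or `$`, followed by letters, digits, `_`, or `$`). Here is a minimal sketch of that check, which also sorts components into the categories used by the corpus analysis. `ComponentKind`, `classify`, and `isPathComponent` are hypothetical names, and the two word sets are small illustrative samples, not the complete lists from the Dart specification.

```dart
/// Hypothetical categories for a single path component.
enum ComponentKind { identifier, builtIn, reserved, invalid }

// Illustrative samples only; the Dart specification has longer lists.
const _reservedSample = {'class', 'else', 'for', 'if'};
const _builtInSample = {'abstract', 'covariant', 'dynamic', 'interface'};

// Every kind of identifier, including reserved words, has this shape.
final _identifierShape = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// Classifies [component]. Contextual keywords such as `show` are ordinary
/// identifiers lexically, so they land in [ComponentKind.identifier].
ComponentKind classify(String component) {
  if (!_identifierShape.hasMatch(component)) return ComponentKind.invalid;
  if (_reservedSample.contains(component)) return ComponentKind.reserved;
  if (_builtInSample.contains(component)) return ComponentKind.builtIn;
  return ComponentKind.identifier;
}

/// Under this proposal, every identifier-shaped component is accepted.
bool isPathComponent(String component) =>
    classify(component) != ComponentKind.invalid;
```

With this sketch, `for`, `covariant`, and `material` are all accepted as path components, while `hyphen-separated` and `123LeadingDigits` are rejected.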
@@ -258,18 +260,52 @@ in a large corpus of pub packages and open source widgets:
  69 ( 0.010%): dotted with non-identifiers =
 ```
 
-This splits every "package:" import's path into segments separated by `/`. Then
-for each segment, it reports whether the segment is a valid identifier, a
-built-in identifier like `dynamic` or `covariant`, etc. Almost all segments are
-either valid identifiers, or dotted identifiers where each subcomponent is a
-valid identifier.
+This splits every "package:" path into segments separated by `/`. Then it splits
+segments into components separated by `.`. For each component, the analysis
+reports whether the component is a valid identifier, a built-in identifier like
+`dynamic` or `covariant`, or a reserved word like `for` or `if`.
 
-(For the very small number that aren't, they can continue to use the old quoted
-"package:" import syntax to import the library.)
+Components that are not some kind of identifier (regular, reserved, or built-in)
+are vanishingly rare. In those few cases, if a user can't simply rename the
+file, they can continue to use the old quoted "package:" syntax to refer to the
+file.
 
-I think this approach is much simpler than trying to add special lexing rules.
-It's consistent with how Java, C# and other languages parse their imports. It
-does mean users can do silly things like:
+### Reserved words and semi-reserved words
+
+One confusing area of Dart that the previous table hints at is that Dart has
+several categories of identifiers that vary in how user-accessible they are:
+
+* Reserved words like `for` and `class` can never be used by a user as a
+  regular identifier in any context.
+
+* Built-in identifiers like `abstract` and `interface` can't be used as *type*
+  names but can be used as other kinds of identifiers.
+
+* Contextual keywords like `await` and `show` behave like keywords in some
+  specific contexts but are usable as regular identifiers everywhere else.
+
+This leads to confusion about which of these flavors of identifiers can be used
+in package paths. Which of these, if any, are valid?
+
+```dart
+import if/else;
+import abstract/interface;
+import show/hide;
+```
+
+Many Dart users (including experts, some of whom may be members of the Dart
+language team) don't know the full list of reserved or semi-reserved words. We
+don't want users to run into problems determining which identifiers work in
+package paths. To that end, we allow *all* reserved words and identifiers,
+including built-in identifiers and contextual keywords, as path components.
+
+### Whitespace and comments
+
+Even though the unquoted path is tokenized as separate tokens, we don't allow
+whitespace or comments to appear between them as we do in most other places in
+the language.
+
+We could allow users to write code like:
 
 ```dart
 import strange /* comment */ . but
@@ -281,7 +317,37 @@ import strange /* comment */ . but
  fine;
 ```
 
-But they can also choose to *not* do that.
+This wouldn't cause any problems for a Dart implementation. It would simply
+discard the whitespace and comments as it does elsewhere, and the resulting path
+would be `strange.but/another/fine`.
+
+However, it would likely cause problems for Dart *users* and for simpler tools
+and scripts that work with Dart code. In particular, we often see homegrown tools
+that want to "parse" a Dart file to find its package references and traverse the
+dependency graph. While these tools ideally should use a full Dart parser (like
+the one in the [analyzer package][], which is freely available), the reality is
+that users often cobble together simple scripts using regexes to do this kind of
+parsing, or they need to write these tools in a language other than Dart. In
+those cases, if the package path happens to contain whitespace or comments, the
+tool will likely silently fail to recognize the package path.
+
+[analyzer package]: https://pub.dev/packages/analyzer
+
+Also, we find no compelling *use* for whitespace and comments inside package
+paths. So this proposal makes them an error: all of the tokens in the path must
+be directly adjacent with no whitespace, newlines, or comments between them, and
+the previous import is an error. However, we still allow comments and whitespace
+elsewhere in a directive, outside of the path. These are all valid:
+
+```dart
+import /* Weird but OK. */ some/path;
+export some/path; // Hi there.
+part some/path // Before the semicolon? Really?
+;
+```
+
+The syntax that results from the above few sections is simple to tokenize and
+parse while looking like a single opaque "unquoted string" to users and tools.
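To illustrate the tooling argument, here is a sketch of the kind of lightweight, regex-based script described above, written in Dart for consistency. It assumes the rule that a path has no internal whitespace or comments, so the path is always one contiguous run of identifier-shaped components joined by `.` or `/`. `findUnquotedPaths` is a hypothetical name, and like any regex approach it does not handle every valid directive: it misses `import /* Weird but OK. */ some/path;`, and it skips `part` directives to avoid the `part of` ambiguity.

```dart
// Matches an `import` or `export` keyword at the start of a line followed by
// an unquoted path: identifier-shaped components joined by `.` or `/`, with
// nothing in between, ending at whitespace or `;`.
final _unquotedDirective = RegExp(
  r'^\s*(?:import|export)\s+'
  r'([A-Za-z_$][A-Za-z0-9_$]*(?:[./][A-Za-z_$][A-Za-z0-9_$]*)*)'
  r'(?=[\s;])',
  multiLine: true,
);

/// Hypothetical helper: returns every unquoted import or export path found in
/// [source], in order. Quoted directives are skipped because an unquoted path
/// can't start with a quote.
List<String> findUnquotedPaths(String source) => [
      for (final match in _unquotedDirective.allMatches(source)) match[1]!,
    ];
```

If whitespace and comments were allowed inside paths, this same pattern would read `import strange /* comment */ . but` as the path `strange` and silently drop the rest. Under this proposal, that input is a compile-time error instead.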
 
 ## Syntax
 
@@ -291,54 +357,57 @@ We add a new rule and hang it off the existing `uri` rule already used by import
 and export directives:
 
 ```
-uri                  ::= stringLiteral | packagePath
-packagePath          ::= packagePathSegment ( '/' packagePathSegment )*
-packagePathSegment   ::= dottedIdentifierList
-dottedIdentifierList ::= identifier ('.' identifier)*
+uri              ::= stringLiteral | packagePath
+packagePath      ::= pathSegment ( '/' pathSegment )*
+pathSegment      ::= segmentComponent ( '.' segmentComponent )*
+segmentComponent ::= IDENTIFIER
+                   | RESERVED_WORD
+                   | BUILT_IN_IDENTIFIER
+                   | OTHER_IDENTIFIER
 ```
 
-An import or export can continue to use a `stringLiteral` for the quoted form
-(which is what they will do for relative imports). But they can also use a
-`packagePath`, which is a slash-separated series of segments, each of which is a
-series of dot-separated identifiers. *(The `dottedIdentifierList` rule is
-already in the grammar and is shown here for clarity.)*
+It is a compile-time error if any whitespace, newlines, or comments occur
+between any of the `segmentComponent`, `/`, or `.` tokens in a `packagePath`.
+*In other words, there can be nothing except the terminals themselves from the
+first `segmentComponent` in the `packagePath` to the last.*
+
+*An import, export, or part directive can continue to use a `stringLiteral` for
+the quoted form (which is what they will do for relative references). But they
+can also use a `packagePath`, which is a slash-separated series of segments,
+each of which is a series of dot-separated components.*
 
 ### Part directive lookahead
 
-*There are two directives for working with part files, `part` and `part of`. The
-`of` identifier is not a reserved word in Dart. This means that when the parser
-sees `part of`, it doesn't immediately know if it is looking at a `part`
-directive followed by an unquoted identifier like `part of;` or `part
-of.some/other.thing;` versus a `part of` directive like `part of thing;` or
-`part of 'uri.dart';` It must lookahead past the `of` identifier to see if the
-next token is `;`, `.`, `/`, or another identifier.*
+*There are two directives for working with part files, `part` and `part of`.
+Since `of` can be a path component, when the parser sees `part of`, it doesn't
+immediately know if it is looking at a `part` directive followed by an unquoted
+path like `part of;` or `part of.some/other.thing;` versus a `part of` directive
+like `part of thing;` or `part of 'uri.dart';`. It must look ahead past the `of`
+identifier to see if the next token is `;`, `.`, `/`, or another identifier.*
 
 *This may add some complexity to parsing, but should be minor. Dart's grammar
 has other places that require much more (sometimes unbounded) lookahead.*
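A minimal sketch of the one-token lookahead described above, assuming the lexemes that follow the `part` keyword are available as a list of strings. `isPartOfDirective` is a hypothetical name; a real parser would work on its own token objects rather than strings.

```dart
/// Hypothetical helper. [tokens] holds the lexemes that follow the `part`
/// keyword, for example `['of', 'thing', ';']`. Returns true for a
/// `part of` directive and false for a `part` directive (including one whose
/// unquoted path starts with the component `of`).
bool isPartOfDirective(List<String> tokens) {
  // Without a leading `of`, this can only be a `part` directive.
  if (tokens.length < 2 || tokens[0] != 'of') return false;

  // One token of lookahead past `of` decides: `;`, `.`, or `/` means `of`
  // is the first component of an unquoted path.
  const pathContinuations = {';', '.', '/'};
  return !pathContinuations.contains(tokens[1]);
}
```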
 
 ## Static semantics
 
 The semantics of the new syntax are defined by taking the `packagePath` and
-converting it to a string. The directive then behaves as if the user had written
-a string literal containing that string. The process is:
+converting it to a URI string. The directive then behaves as if the user had
+written a string literal containing that URI. The process is:
 
-1. Let the *segment* for a `packagePathSegment` be a string defined by the
-    ordered concatenation of the `identifier` and `.` terminals in the
-    `packagePathSegment`, with all whitespace and comments removed. *So if
-    `packagePathSegment` is `a . b /* comment */ . c`, then its *segment* is
+1. Let the *segment* for a `pathSegment` be a string defined by the ordered
+    concatenation of the `segmentComponent` and `.` terminals in the
+    `pathSegment`. *So if `pathSegment` is `a.b.c`, then its *segment* is
     "a.b.c".*
 
-2. Let *segments* be an ordered list of the segments of each
-    `packagePathSegment` in `packagePath`. *In other words, this and the
-    preceding step take the `packagePath` and convert it to a list of segment
-    strings while discarding whitespace and comments. So if `packagePathSegment`
-    is `a . b /* comment */ / c / d . e`, then *segments* is ["a.b", "c",
-    "d.e"].*
+2. Let *segments* be an ordered list of the segments of each `pathSegment` in
+    `packagePath`. *In other words, this and the preceding step take the
+    `packagePath` and convert it to a list of segment strings. So if
+    `packagePath` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*
 
 3. If the first segment in *segments* is "dart":
 
-    1. It is a compile error if there are no subsequent segments. *There's no
-       "dart:dart" or "package:dart/dart.dart" library. We reserve the right
+    1. It is a compile-time error if there are no subsequent segments. *There's
+       no "dart:dart" or "package:dart/dart.dart" library. We reserve the right
        to use `import dart;` in the future to mean something useful.*
 
     2. Let *path* be the concatenation of the remaining segments, separated
@@ -347,38 +416,38 @@ a string literal containing that string. The process is:
        imports. But a custom Dart embedder or future version of Dart could in
        theory introduce directories for SDK libraries.*
 
-    3. The URI is "dart:*path*". *So `import dart/async;` desugars to
-       `import "dart:async";`.*
+    3. The URI is "dart:*path*". *So `import dart/async;` imports the library
+       `"dart:async"`.*
 
 4. Else if there is only a single segment:
 
     1. Let *name* be the segment.
 
-    2. Let *path* be the last identifier in the segment. *If the segment is
-       only a single identifier, this is the entire segment. Otherwise, it's
-       the last identifier after the last `.`. So in `foo`, *path* is `foo`.
-       In `foo.bar.baz`, it's `baz`.*
+    2. Let *path* be the last `segmentComponent` in the segment. *If the
+       segment is only a single `segmentComponent`, this is the entire segment.
+       Otherwise, it's the last component after the last `.`. So in `foo`,
+       *path* is `foo`. In `foo.bar.baz`, it's `baz`.*
 
-    3. The URI is "package:*name*/*path*.dart". *So `import test;` desugars to
-       `import "package:test/test.dart";`, and `import server.api;` desugars
-       to `import "package:server.api/api.dart";`.*
+    3. The URI is "package:*name*/*path*.dart". *So `import test;` imports the
+       library `"package:test/test.dart"`, and `import server.api;` imports
+       `"package:server.api/api.dart"`.*
 
 5. Else:
 
     1. Let *path* be the concatenation of the segments, separated by `/`.
 
-    3. The URI is "package:*path*.dart". *So `import a/b/c/d;` desugars to
-       `import "package:a/b/c/d.dart";`.
+    2. The URI is "package:*path*.dart". *So `import a/b/c/d;` imports
+       `"package:a/b/c/d.dart"`.*
 
 Once the `packagePath` has been converted to a string, the directive behaves
 exactly as if the user had written a `stringLiteral` containing that same
 string.
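The Dart implementation that follows in this proposal takes an already-split list of segments. Steps 1 and 2, combined with the rule that nothing may appear between a path's tokens, could be sketched over the raw source text of a `packagePath` like this. It assumes the caller has already isolated the text from the first `segmentComponent` to the last; `isPackagePath` and `pathSegments` are hypothetical names.

```dart
// A component is identifier-shaped: this one pattern covers regular
// identifiers, built-in identifiers, contextual keywords, and reserved words.
const _component = r'[A-Za-z_$][A-Za-z0-9_$]*';

// A segment is one or more components joined by `.`.
const _segment = _component + r'(?:\.' + _component + ')*';

// A whole path is one or more segments joined by `/`, with nothing else
// (no whitespace, newlines, or comments) anywhere in between.
final _packagePath = RegExp('^' + _segment + '(?:/' + _segment + r')*$');

/// Returns true if [text] is a well-formed `packagePath` with no whitespace
/// or comments between its tokens.
bool isPackagePath(String text) => _packagePath.hasMatch(text);

/// Steps 1 and 2: converts the source text of a `packagePath` to its list of
/// segment strings, or throws a [FormatException] if the text is malformed.
List<String> pathSegments(String text) {
  if (!isPackagePath(text)) {
    throw FormatException('Not a valid unquoted package path', text);
  }
  // Once the text is known to be well formed, splitting on `/` yields the
  // segments, each still containing its `.` separators.
  return text.split('/');
}
```

For example, `pathSegments('a.b/c/d.e')` yields the segments "a.b", "c", and "d.e", while text such as `a . b` is rejected.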
 
-Given the list of segments, here is a complete implementation of the desugaring
-logic in Dart:
+Given the list of segments, here is a complete Dart implementation of the logic
+to convert an unquoted path to the effective URI it refers to:
 
 ```dart
-String desugar(List<String> segments) => switch (segments) {
+String toUri(List<String> segments) => switch (segments) {
   ['dart'] => 'ERROR. Not allowed to import just "dart"',
   ['dart', ...var rest] => 'dart:${rest.join('/')}',
   [var name] => 'package:$name/${name.split('.').last}.dart',
@@ -409,15 +478,15 @@ may make a breaking change and remove support for the old syntax.
 
 The `part of` directive allows a library name after `of` instead of a string
 literal. With this proposal, that syntax is now ambiguous. Is it interpreted
-as a library name, or as an unquoted URI that should be desugared to a URI?
+as a library name, or as an unquoted path that should be converted to a URI?
 In other words, given:
 
 ```dart
 part of foo.bar;
 ```
 
 Is the file saying it's a part of the library containing `library foo.bar;` or
-that it's part of the library found at URI `package:foo/bar.dart`?
+that it's part of the library found at URI `package:foo.bar/bar.dart`?
 
 Library names in `part of` directives have been deprecated for many years
 because the syntax doesn't work well with many tools. How is a given tool
@@ -463,7 +532,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
 library at `package:foo.bar/bar.dart`, not part of the library with name `foo.bar`.
 
 Users affected by the breakage can and should update their `part of` directive
-to point to the URI of the library that the file is a part, using either the
+to point to the URI of the library that the file is a part of, using either the
 quoted or unquoted syntax.
 
 ### Language versioning
@@ -487,7 +556,7 @@ Since the static semantics are so simple, it is trivial to write a `dart fix`
 that automatically converts existing "dart:" and "package:" string-based
 directives to the new syntax. A handful of regexes are sufficient to break an
 existing import into a series of slash-separated segments which are
-dot-separated identifiers. Then the above snippet of Dart code will convert that
+dot-separated components. Then the above snippet of Dart code will convert that
 to the new syntax.
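One way to sketch the core of such a fix is the inverse of the conversion described in the static semantics: take an existing quoted URI and return the unquoted path that maps back to the same URI, or `null` when there is none. `toUnquoted` is a hypothetical helper, not the actual `dart fix` implementation; it assumes only "dart:" URIs and "package:" URIs ending in `.dart` are candidates, and it prefers the single-segment shorthand when it applies.

```dart
// Identifier-shaped component; covers reserved and built-in words as well.
final _component = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// True if [segment] is one or more components joined by `.`.
bool _isSegment(String segment) =>
    segment.split('.').every(_component.hasMatch);

/// Hypothetical helper: returns the unquoted form of a quoted "dart:" or
/// "package:" URI, or null if the URI has no unquoted equivalent.
String? toUnquoted(String uri) {
  if (uri.startsWith('dart:')) {
    final path = uri.substring('dart:'.length);
    return path.split('/').every(_isSegment) ? 'dart/$path' : null;
  }
  if (!uri.startsWith('package:') || !uri.endsWith('.dart')) return null;

  // Strip the scheme and the implicit `.dart` extension.
  final body = uri.substring('package:'.length, uri.length - '.dart'.length);
  final segments = body.split('/');

  // Need a package name plus a path. A package named "dart" would be read
  // back as a "dart:" library, so it can't use the unquoted syntax.
  if (segments.length < 2 || segments.first == 'dart') return null;
  if (!segments.every(_isSegment)) return null;

  // Shorthand: "package:name/last.dart" becomes "name" when "last" is the
  // final dotted component of the package name.
  final name = segments.first;
  if (segments.length == 2 && segments[1] == name.split('.').last) {
    return name;
  }
  return segments.join('/');
}
```

The sketch deliberately leaves URIs with hyphenated or digit-leading components, and any package named `dart`, in their quoted form.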
 
 ### Lint
@@ -501,6 +570,12 @@ new unquoted style whenever an existing directive could use it.
 
 ## Changelog
 
+### 0.4
+
+- Allow reserved words and built-in identifiers as path components (#3984).
+
+- Disallow whitespace and comments inside package paths (#3983).
+
 ### 0.3
 
 - Address breaking change in `part of` directives with library names.