@@ -32,8 +32,6 @@ Literals are tokens used in [literal expressions].
 | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
 | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
-| [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
-| [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` |
 
 \* The number of `#`s on each side of the same literal must be equivalent.
 
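You can check the footnote's rule about matching `#` counts using the raw byte string literals that remain in the table. The following is a minimal sketch written as an ordinary Rust program. It uses only byte and raw byte string literals, which every edition accepts, and each assertion restates one property of the rows above: the delimiters must balance, and raw forms process no escapes (`N/A`).

```rust
// Raw byte string literals from the table: the number of `#`s before the
// opening quote must equal the number after the closing quote.
fn main() {
    // Zero and one `#` delimit the same five bytes.
    assert_eq!(br"hello", b"hello");
    assert_eq!(br#"hello"#, b"hello");

    // With two `#`s, the sequence `"#` may appear in the body
    // without ending the literal.
    assert_eq!(br##"a"#b"##, b"a\"#b");

    // Raw forms process no escapes, so `\x52` stays four bytes,
    // while the non-raw form turns it into the single byte `R`.
    assert_eq!(br"\x52", b"\\x52");
    assert_eq!(b"\x52", b"R");

    println!("all raw byte string checks passed");
}
```

More `#`s are needed only when the body itself contains a quote followed by `#`s.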
@@ -330,107 +328,6 @@ b"\x52"; b"R"; br"R"; // R
 b"\\x52"; br"\x52";                  // \x52
 ```
 
-### C string and raw C string literals
-
-#### C string literals
-
-> **<sup>Lexer</sup>**\
-> C_STRING_LITERAL :\
-> &nbsp;&nbsp; `c"` (\
-> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
-> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
-> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
-> &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>
-
-A _C string literal_ is a sequence of Unicode characters and _escapes_,
-preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
-followed by the character `U+0022`. If the character `U+0022` is present within
-the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
-Alternatively, a C string literal can be a _raw C string literal_, defined
-below. The type of a C string literal is [`&core::ffi::CStr`][CStr].
-
-[CStr]: ../core/ffi/struct.CStr.html
-
-C strings are implicitly terminated by byte `0x00`, so the C string literal
-`c""` is equivalent to manually constructing a `&CStr` from the byte string
-literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
-permitted within a C string.
-
-Some additional _escapes_ are available in non-raw C string literals. An escape
-starts with a `U+005C` (`\`) and continues with one of the following forms:
-
-* A _byte escape_ starts with `U+0078` (`x`) and is followed by exactly
-  two _hex digits_. It denotes the byte equal to the provided hex value.
-* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
-  by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
-  (`}`). It denotes the Unicode code point equal to the provided hex value,
-  encoded as UTF-8.
-* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
-  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
-  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
-* The _backslash escape_ is the character `U+005C` (`\`) which must be
-  escaped in order to denote its ASCII encoding `0x5C`.
-
-The escape sequences `\0`, `\x00`, and `\u{0000}` are permitted within the token
-but will be rejected as invalid, as C strings may not contain byte `0x00` except
-as the implicit terminator.
-
-A C string represents bytes with no defined encoding, but a C string literal
-may contain Unicode characters above `U+007F`. Such characters will be replaced
-with the bytes of that character's UTF-8 representation.
-
-The following C string literals are equivalent:
-
-```rust
-c"æ";        // LATIN SMALL LETTER AE (U+00E6)
-c"\u{00E6}";
-c"\xC3\xA6";
-```
-
-> **Edition Differences**: C string literals are accepted in the 2021 edition or
-> later. In earlier editions the token `c""` is lexed as `c ""`.
-
-#### Raw C string literals
-
-> **<sup>Lexer</sup>**\
-> RAW_C_STRING_LITERAL :\
-> &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
->
-> RAW_C_STRING_CONTENT :\
-> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
-> &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`
-
-Raw C string literals do not process any escapes. They start with the
-character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
-of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
-_raw C string body_ can contain any sequence of Unicode characters and is
-terminated only by another `U+0022` (double-quote) character, followed by the
-same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
-(double-quote) character.
-
-All characters contained in the raw C string body represent themselves in UTF-8
-encoding. The characters `U+0022` (double-quote) (except when followed by at
-least as many `U+0023` (`#`) characters as were used to start the raw C string
-literal) or `U+005C` (`\`) do not have any special meaning.
-
-> **Edition Differences**: Raw C string literals are accepted in the 2021
-> edition or later. In earlier editions the token `cr""` is lexed as `cr ""`,
-> and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).
-
-#### Examples for C string and raw C string literals
-
-```rust
-c"foo"; cr"foo";                     // foo
-c"\"foo\""; cr#""foo""#;             // "foo"
-
-c"foo #\"# bar";
-cr##"foo #"# bar"##;                 // foo #"# bar
-
-c"\x52"; c"R"; cr"R";                // R
-c"\\x52"; cr"\x52";                  // \x52
-```
-
 ### Number literals
 
 A _number literal_ is either an _integer literal_ or a _floating-point
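The removed section describes what a `c"…"` literal denotes. The `c` prefix is lexed as a C string literal only in the 2021 edition or later. For that reason, the sketch below does not write `c"…"` at all. Instead, it spells each literal as a byte string with an explicit `0x00` terminator and wraps it with the standard library's `CStr::from_bytes_with_nul`, so it compiles in any edition. The helper `ae` is a hypothetical name used only for illustration. The sketch mirrors three claims from the removed text:

- `c""` is just the terminator byte.
- A non-ASCII character such as `æ` is stored as its UTF-8 bytes.
- An interior `0x00` is not allowed.

```rust
use std::ffi::CStr;

/// Hypothetical helper: the `CStr` that `c"æ"` would denote, written as a
/// byte string so this compiles in every edition. `æ` (U+00E6) is stored as
/// its UTF-8 bytes C3 A6, followed by the implicit terminator.
fn ae() -> &'static CStr {
    CStr::from_bytes_with_nul(b"\xC3\xA6\x00").expect("exactly one trailing NUL")
}

fn main() {
    // `c""` corresponds to a `CStr` holding only the terminator byte.
    let empty = CStr::from_bytes_with_nul(b"\x00").unwrap();
    assert_eq!(empty.to_bytes(), b"");
    assert_eq!(empty.to_bytes_with_nul(), b"\x00");

    // `c"æ"`, `c"\u{00E6}"`, and `c"\xC3\xA6"` all denote these two bytes.
    assert_eq!(ae().to_bytes(), "æ".as_bytes());
    assert_eq!("\u{00E6}".as_bytes(), b"\xC3\xA6");

    // A 0x00 anywhere other than the end is rejected, matching the rule that
    // `\0`, `\x00`, and `\u{0000}` are invalid inside a C string literal.
    assert!(CStr::from_bytes_with_nul(b"a\x00b\x00").is_err());

    // A byte sequence with no terminator at all is rejected as well.
    assert!(CStr::from_bytes_with_nul(b"abc").is_err());

    println!("all CStr checks passed");
}
```

`from_bytes_with_nul` is used here because it enforces exactly the invariant the removed text states: one `0x00`, and only at the end.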
@@ -731,17 +628,17 @@ them are referred to as "token trees" in [macros]. The three types of brackets
 ## Reserved prefixes
 
 > **<sup>Lexer 2021+</sup>**\
-> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
+> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `r` or `br`_</sub> | `_` ) `"`\
 > RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
-> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`
+> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br`_</sub> | `_` ) `#`
 
 Some lexical forms known as _reserved prefixes_ are reserved for future use.
 
 Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.
 
 Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix.
 
-Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.
+Similarly the `r`, `b`, and `br` prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.
 
 > **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
 >
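You can observe the exemption for the `r`, `b`, and `br` prefixes from inside a macro. The following is a minimal sketch that assumes a hypothetical `count_tts!` helper, which counts the token trees it receives:

- Each prefixed literal arrives as a single token.
- An identifier separated from a following string literal by whitespace arrives as two tokens.

Writing the same identifier directly against the quote (for example `z"x"`) is the reserved-prefix case. The 2021 edition reports it as a lexer error, so the runnable code leaves it out.

```rust
// Hypothetical helper: counts the token trees passed to it.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // `r`, `b`, and `br` are literal prefixes, not reserved prefixes:
    // each of these is a single literal token.
    assert_eq!(count_tts!(r"a"), 1);
    assert_eq!(count_tts!(b"a"), 1);
    assert_eq!(count_tts!(br#"a"#), 1);
    assert_eq!(count_tts!(b'a'), 1);

    // With whitespace before the quote, `z` is an ordinary identifier
    // followed by a separate string literal token.
    assert_eq!(count_tts!(z "x"), 2);

    println!("token counts match");
}
```

The sketch uses a token-counting macro only because it makes token boundaries visible without relying on compiler diagnostics.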