| [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) |
| [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` |
| [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) |
| [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` |

\* The number of `#`s on each side of the same literal must be equivalent.

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
```

### C string and raw C string literals

#### C string literals

> **<sup>Lexer</sup>**\
> C_STRING_LITERAL :\
> &nbsp;&nbsp; `c"` (\
> &nbsp;&nbsp; &nbsp;&nbsp; ~\[`"` `\` _IsolatedCR_]\
> &nbsp;&nbsp; &nbsp;&nbsp; | BYTE_ESCAPE\
> &nbsp;&nbsp; &nbsp;&nbsp; | UNICODE_ESCAPE\
> &nbsp;&nbsp; &nbsp;&nbsp; | STRING_CONTINUE\
> &nbsp;&nbsp; )<sup>\*</sup> `"` SUFFIX<sup>?</sup>

A _C string literal_ is a sequence of Unicode characters and _escapes_,
preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
followed by the character `U+0022`. If the character `U+0022` is present within
the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
Alternatively, a C string literal can be a _raw C string literal_, defined
below. The type of a C string literal is [`&core::ffi::CStr`][CStr].

[CStr]: ../core/ffi/struct.CStr.html

C strings are implicitly terminated by byte `0x00`, so the C string literal
`c""` is equivalent to manually constructing a `&CStr` from the byte string
literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not
permitted within a C string.
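The sketch below checks the two facts above: the literal's type, and its implicit terminator. It compares `c""` and `c"hello"` against `CStr` values built by hand. It assumes Rust 1.77 or later (when C string literals were stabilized) and the 2021 edition. The function names are illustrative only.

```rust
use core::ffi::CStr;

/// The literal `c""` has type `&'static CStr`, so it can be returned directly.
fn empty_literal() -> &'static CStr {
    c""
}

/// Builds the same empty C string by hand from the byte string `b"\x00"`.
/// `from_bytes_with_nul` accepts the slice only if its single 0x00 is last.
fn empty_from_bytes() -> &'static CStr {
    CStr::from_bytes_with_nul(b"\x00").expect("exactly one trailing nul")
}

/// The bytes of `c"hello"`, including the implicit 0x00 terminator.
fn hello_bytes_with_nul() -> &'static [u8] {
    c"hello".to_bytes_with_nul()
}
```

`to_bytes_with_nul` exposes the terminator, while `to_bytes` omits it.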

Some additional _escapes_ are available in non-raw C string literals. An escape
starts with a `U+005C` (`\`) and continues with one of the following forms:

* A _byte escape_ starts with `U+0078` (`x`) and is followed by exactly
  two _hex digits_. It denotes the byte equal to the provided hex value.
* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed
  by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D`
  (`}`). It denotes the Unicode code point equal to the provided hex value,
  encoded as UTF-8.
* A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072`
  (`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
  `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively.
* The _backslash escape_ is the character `U+005C` (`\`) which must be
  escaped in order to denote its ASCII encoding `0x5C`.
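The sketch below puts every escape form from the list into one literal. It then compares the literal's bytes, without the terminator, against an ordinary byte string. It assumes Rust 1.77+ and the 2021 edition. `escaped` is a hypothetical helper name.

```rust
use core::ffi::CStr;

/// One literal using each non-raw escape form:
/// `\x41` (byte escape), `\u{E6}` (code point escape, stored as UTF-8 bytes),
/// `\n`, `\r`, `\t` (whitespace escapes) and `\\` (backslash escape).
fn escaped() -> &'static CStr {
    c"\x41\u{E6}\n\r\t\\"
}
```

The code point escape contributes two bytes (`0xC3 0xA6`), and every other escape contributes one. That makes seven bytes before the terminator.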

The escape sequences `\0`, `\x00`, and `\u{0000}` are accepted by the lexer as
part of the token, but the literal is then rejected as invalid, because C strings
may not contain byte `0x00` except as the implicit terminator.
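A literal such as `c"a\0b"` is rejected by the compiler, so it cannot appear in a compiling example. The closest runtime analogue is the standard library's `CStr::from_bytes_with_nul`, which enforces the same rule: no `0x00` before the terminator. It returns an error instead of failing to compile. This is a library check, not the lexer's, and `is_valid_c_string` is a hypothetical helper.

```rust
use core::ffi::CStr;

/// `true` if `bytes` is a valid C string: it ends in 0x00 and contains no
/// other 0x00 byte (the same invariant C string literals must satisfy).
fn is_valid_c_string(bytes: &[u8]) -> bool {
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```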

A C string represents bytes with no defined encoding, but a C string literal
may contain Unicode characters above `U+007F`. Such characters will be replaced
with the bytes of that character's UTF-8 representation.

The following C string literals are equivalent:

```rust
c"æ";        // LATIN SMALL LETTER AE (U+00E6)
c"\u{00E6}";
c"\xC3\xA6";
```
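To confirm the equivalence rather than just state it, the sketch below collects the three spellings into an array. It compares them pairwise and against the two UTF-8 bytes of `U+00E6`. It assumes Rust 1.77+ and the 2021 edition. `spellings` is an illustrative name.

```rust
use core::ffi::CStr;

/// The three spellings of "æ" shown above, as `&'static CStr` values.
fn spellings() -> [&'static CStr; 3] {
    [c"æ", c"\u{00E6}", c"\xC3\xA6"]
}
```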

> **Edition Differences**: C string literals are accepted in the 2021 edition or
> later. In earlier editions the token `c""` is lexed as `c ""`.

#### Raw C string literals

> **<sup>Lexer</sup>**\
> RAW_C_STRING_LITERAL :\
> &nbsp;&nbsp; `cr` RAW_C_STRING_CONTENT SUFFIX<sup>?</sup>
>
> RAW_C_STRING_CONTENT :\
> &nbsp;&nbsp; &nbsp;&nbsp; `"` ( ~ _IsolatedCR_ )<sup>* (non-greedy)</sup> `"`\
> &nbsp;&nbsp; | `#` RAW_C_STRING_CONTENT `#`

Raw C string literals do not process any escapes. They start with the
character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256
of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. The
_raw C string body_ can contain any sequence of Unicode characters and is
terminated only by another `U+0022` (double-quote) character, followed by the
same number of `U+0023` (`#`) characters that preceded the opening `U+0022`
(double-quote) character.

All characters contained in the raw C string body are represented by their UTF-8
encoding. The characters `U+0022` (double-quote) (except when followed by at
least as many `U+0023` (`#`) characters as were used to start the raw C string
literal) and `U+005C` (`\`) do not have any special meaning.
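The three rules above are: no escape processing, `#`-delimited bodies that may contain `"`, and UTF-8 storage of non-ASCII characters. The sketch below shows each of them on a small raw literal. It assumes Rust 1.77+ and the 2021 edition. The function names are hypothetical.

```rust
use core::ffi::CStr;

/// No escape processing: the body `\n` stays as two bytes, 0x5C and 0x6E.
fn raw_backslash_n() -> &'static CStr {
    cr"\n"
}

/// One `#` lets the body contain a bare `"`; the literal ends only at `"#`.
fn raw_with_quote() -> &'static CStr {
    cr#"say "hi""#
}

/// Non-ASCII characters are stored as their UTF-8 bytes.
fn raw_unicode() -> &'static CStr {
    cr"æ"
}
```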

> **Edition Differences**: Raw C string literals are accepted in the 2021
> edition or later. In earlier editions the token `cr""` is lexed as `cr ""`,
> and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).

#### Examples for C string and raw C string literals

```rust
c"foo"; cr"foo";                     // foo
c"\"foo\""; cr#""foo""#;             // "foo"

c"foo #\"# bar";
cr##"foo #"# bar"##;                 // foo #"# bar

c"\x52"; c"R"; cr"R";                // R
c"\\x52"; cr"\x52";                  // \x52
```
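Each line of the example block pairs a non-raw spelling with a raw spelling of the same C string. The sketch below gathers those pairs so they can be compared directly. It assumes Rust 1.77+ and the 2021 edition. `equivalent_pairs` is an illustrative name.

```rust
use core::ffi::CStr;

/// (non-raw, raw) pairs taken from the example block above;
/// each pair should denote the same bytes.
fn equivalent_pairs() -> [(&'static CStr, &'static CStr); 5] {
    [
        (c"foo", cr"foo"),
        (c"\"foo\"", cr#""foo""#),
        (c"foo #\"# bar", cr##"foo #"# bar"##),
        (c"\x52", cr"R"),
        (c"\\x52", cr"\x52"),
    ]
}
```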

### Number literals

A _number literal_ is either an _integer literal_ or a _floating-point
## Reserved prefixes

> **<sup>Lexer 2021+</sup>**\
> RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b` or `c` or `r` or `br` or `cr`_</sub> | `_` ) `"`\
> RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD <sub>_Except `b`_</sub> | `_` ) `'`\
> RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD <sub>_Except `r` or `br` or `cr`_</sub> | `_` ) `#`

Some lexical forms known as _reserved prefixes_ are reserved for future use.

Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix.

Note that raw identifiers, raw string literals, raw byte string literals, and raw C string literals may contain a `#` character but are not interpreted as containing a reserved prefix.

Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.
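One way to observe that `c` and `cr` are not treated as reserved prefixes is to count tokens with a small `macro_rules!` macro. Each C string literal below reaches the macro as a single token tree. Under the 2021 edition, an unlisted prefix such as `z"hi"` would instead be reported as a lexer error. `count_tts` is a hypothetical helper, and the sketch assumes Rust 1.77+ and the 2021 edition.

```rust
/// Counts the token trees passed to it (hypothetical helper, used only to
/// observe how the lexer splits its input).
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Each C string literal, raw or not, arrives as exactly one token.
fn literal_token_counts() -> [usize; 3] {
    [count_tts!(c"hi"), count_tts!(cr"hi"), count_tts!(cr#"hi"#)]
}
```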

> **Edition Differences**: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).
>