Commit 6f9d25b

Add a way to override the lexer in a Parser instance
Currently, this is possible, but it would require a lot of code duplication and pinning to an exact version of tolerant-php-parser, which discourages doing so.

I can think of the following use cases for this. My main reason is to reuse token_get_all() elsewhere, but being able to parse `T_FN` in php < 7.4 is also convenient.

- Multiple applications needing to use the result of token_get_all for the same file.
  If none of them modify the array, it's much faster to reuse the same array than to create it again.

  For example, Phan's language server mode will potentially use tolerant-php-parser. In addition to that, it also uses token_get_all in InlineHTMLPlugin (to check for misuse of inline HTML) and sometimes in BuiltinSuppressionPlugin (to list T_COMMENT/T_DOC_COMMENT containing `@phan-suppress-*`).

  Aside: https://wiki.php.net/rfc/token_as_object would be faster and more memory efficient than token_get_all() in php 8.0, if it gets approved.
- Needing to call tolerant-php-parser on the same token stream multiple times (e.g. an application that modifies the Microsoft\PhpParser\Node instances (but not tokens), or which creates data structures from the original Node but usually discards them to save memory).
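The first use case above (one token_get_all() result shared by several consumers) can be sketched in plain PHP. This is a minimal illustration, not Phan's actual plugin code: `findSuppressionComments()` and `findInlineHtml()` are hypothetical helper names, and the sample source is made up.

```php
<?php
// Sample input: inline HTML followed by PHP code with suppression comments.
$source = <<<'PHP'
<h1>Title</h1>
<?php
// @phan-suppress-next-line PhanUndeclaredVariable
echo $x;
/** @phan-suppress PhanUnreferencedFunction */
function f() {}
// ordinary comment
PHP;

// Tokenize once; every check below only reads this array.
$tokens = @token_get_all($source);

/** Hypothetical helper: comment tokens that mention @phan-suppress. */
function findSuppressionComments(array $tokens): array
{
    $found = [];
    foreach ($tokens as $token) {
        if (is_array($token)
            && ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT)
            && strpos($token[1], '@phan-suppress') !== false
        ) {
            $found[] = ['line' => $token[2], 'text' => trim($token[1])];
        }
    }
    return $found;
}

/** Hypothetical helper: starting lines of inline HTML tokens. */
function findInlineHtml(array $tokens): array
{
    $lines = [];
    foreach ($tokens as $token) {
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $lines[] = $token[2];
        }
    }
    return $lines;
}

$suppressions = findSuppressionComments($tokens);
$inlineHtmlLines = findInlineHtml($tokens);
```

Because neither helper modifies `$tokens`, the same array could also be handed to a parser that accepts a precomputed token stream, which is what the hooks added in this commit are meant to allow.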
1 parent c5e2bf5 commit 6f9d25b

2 files changed, 33 additions and 2 deletions

src/Parser.php

Lines changed: 16 additions & 1 deletion
@@ -135,6 +135,21 @@ public function __construct() {
         $this->returnTypeDeclarationTokens = \array_merge([TokenKind::VoidReservedWord, TokenKind::NullReservedWord, TokenKind::FalseReservedWord, TokenKind::StaticKeyword], $this->parameterTypeDeclarationTokens);
     }
 
+    /**
+     * This method exists so that it can be overridden in subclasses.
+     * Any subclass must return a token stream that is equivalent to the contents in $fileContents for this to work properly.
+     *
+     * Possible reasons for applications to override the lexer:
+     *
+     * - Imitate token stream of a newer/older PHP version (e.g. T_FN is only available in php 7.4)
+     * - Reuse the result of token_get_all to create a Node again.
+     * - Reuse the result of token_get_all in a different library.
+     */
+    protected function makeLexer(string $fileContents): TokenStreamProviderInterface
+    {
+        return TokenStreamProviderFactory::GetTokenStreamProvider($fileContents);
+    }
+
     /**
      * Generates AST from source file contents. Returns an instance of SourceFileNode, which is always the top-most
      * Node-type of the tree.
@@ -143,7 +158,7 @@ public function __construct() {
      * @return SourceFileNode
      */
     public function parseSourceFile(string $fileContents, string $uri = null) : SourceFileNode {
-        $this->lexer = TokenStreamProviderFactory::GetTokenStreamProvider($fileContents);
+        $this->lexer = $this->makeLexer($fileContents);
 
         $this->reset();
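The docblock above lists imitating a newer PHP version's token stream as one reason to override the lexer. A rough sketch of what such post-processing might look like on raw token_get_all() output follows. It is an illustration under assumptions, not code from this commit: `FN_TOKEN_ID`, `isArrowFunctionStart()`, `emulateFnTokens()` and `tokensToString()` are hypothetical names, the fallback ID 100001 is arbitrary, and the lookaround heuristic does not cover every way `fn` can appear as an ordinary name.

```php
<?php
// Use the real T_FN where it exists (PHP >= 7.4); otherwise an arbitrary stand-in ID.
define('FN_TOKEN_ID', defined('T_FN') ? T_FN : 100001);

/** Heuristic: is the T_STRING `fn` at index $i the start of an arrow function? */
function isArrowFunctionStart(array $tokens, int $i): bool
{
    // `->fn`, `::fn` and `function fn` use `fn` as a plain name, so skip those.
    for ($j = $i - 1; $j >= 0; $j--) {
        $prev = $tokens[$j];
        if (is_array($prev) && $prev[0] === T_WHITESPACE) {
            continue;
        }
        if (is_array($prev) && in_array($prev[0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION], true)) {
            return false;
        }
        break;
    }
    // An arrow function continues with `(` or `&(`.
    $count = count($tokens);
    for ($j = $i + 1; $j < $count; $j++) {
        $next = $tokens[$j];
        if (is_array($next) && $next[0] === T_WHITESPACE) {
            continue;
        }
        return $next === '(' || $next === '&';
    }
    return false;
}

/** Rewrites matching T_STRING `fn` tokens to FN_TOKEN_ID; token text is untouched. */
function emulateFnTokens(array $tokens): array
{
    foreach ($tokens as $i => $token) {
        if (is_array($token)
            && $token[0] === T_STRING
            && strtolower($token[1]) === 'fn'
            && isArrowFunctionStart($tokens, $i)
        ) {
            $tokens[$i][0] = FN_TOKEN_ID;
        }
    }
    return $tokens;
}

/** Concatenates token text; the result should equal the original source. */
function tokensToString(array $tokens): string
{
    $out = '';
    foreach ($tokens as $token) {
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

$arrowSource = '<?php $f = fn($x) => $x + 1;';
$arrowTokens = emulateFnTokens(@token_get_all($arrowSource));
```

On PHP 7.4 and later the tokenizer already emits `T_FN`, so `emulateFnTokens()` changes nothing there. Only token IDs are rewritten, so the concatenated text still equals the input, which matches the equivalence requirement stated in the docblock.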

src/PhpTokenizer.php

Lines changed: 17 additions & 1 deletion
@@ -74,7 +74,7 @@ public static function getTokensArrayFromContent(
             $content = $prefix . $content;
         }
 
-        $tokens = @\token_get_all($content);
+        $tokens = static::tokenGetAll($content, $parseContext);
 
         $arr = array();
         $fullStart = $start = $pos = $initialPos;
@@ -147,6 +147,22 @@ public static function getTokensArrayFromContent(
         return $arr;
     }
 
+    /**
+     * @param string $content the raw php code
+     * @param ?int $parseContext can be SourceElements when extracting doc comments.
+     *                           Having this available may be useful for subclasses to decide whether or not to post-process results, cache results, etc.
+     * @return array[]|string[] an array of tokens. When concatenated, these tokens must equal $content.
+     *
+     * This exists so that it can be overridden in subclasses, e.g. to cache the result of tokenizing entire files.
+     * Applications using tolerant-php-parser may often end up needing to use the token stream for other reasons that are hard to do in the resulting AST,
+     * such as iterating over T_COMMENTS, checking for inline html,
+     * looking up all tokens (including skipped tokens) on a given line, etc.
+     */
+    protected static function tokenGetAll(string $content, $parseContext): array
+    {
+        return @\token_get_all($content);
+    }
+
     const TOKEN_MAP = [
         T_CLASS_C => TokenKind::Name,
         T_DIR => TokenKind::Name,
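The change from `@\token_get_all(...)` to `static::tokenGetAll(...)` relies on late static binding, so a subclass override is picked up even though the caller is a static method in the parent. The sketch below shows that mechanism with a caching override. To stay runnable without the library, `BaseTokenizer` is a hypothetical stand-in that mirrors only the hook added to PhpTokenizer, `CachingTokenizer` and its `$misses` counter are made up for illustration, and treating a null `$parseContext` as "whole file" is an assumption.

```php
<?php
// Hypothetical stand-in for PhpTokenizer, reduced to the hook added in this commit.
class BaseTokenizer
{
    public static function getTokens(string $content, $parseContext = null): array
    {
        // static:: resolves to the class the call was made on, so overrides apply.
        return static::tokenGetAll($content, $parseContext);
    }

    protected static function tokenGetAll(string $content, $parseContext): array
    {
        return @\token_get_all($content);
    }
}

// Hypothetical subclass that caches whole-file tokenization results.
class CachingTokenizer extends BaseTokenizer
{
    /** @var array<string, array> cached token arrays keyed by content hash */
    private static $cache = [];

    /** @var int number of times the real tokenizer ran for a cacheable input */
    public static $misses = 0;

    protected static function tokenGetAll(string $content, $parseContext): array
    {
        if ($parseContext !== null) {
            // Assumption: a non-null context means a fragment (e.g. a doc comment); skip caching.
            return parent::tokenGetAll($content, $parseContext);
        }
        $key = sha1($content);
        if (!isset(self::$cache[$key])) {
            self::$misses++;
            self::$cache[$key] = parent::tokenGetAll($content, $parseContext);
        }
        return self::$cache[$key];
    }
}

$code = '<?php echo 1;';
$first = CachingTokenizer::getTokens($code);
$second = CachingTokenizer::getTokens($code); // served from the cache
```

PHP arrays are copy-on-write, so returning the cached array is cheap as long as callers only read it. A real application would likely also want a size limit or eviction policy, which this sketch leaves out.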
