Commit 943f70e

Update README.md
1 parent e3fb3e4 commit 943f70e

1 file changed: +22 -0 lines changed

README.md

Lines changed: 22 additions & 0 deletions
@@ -1,2 +1,24 @@
# GPT-3-Encoder-PHP
PHP BPE Text Encoder for GPT-2 / GPT-3

## About

GPT-2 and GPT-3 use byte pair encoding to turn text into a series of integers to feed into the model. This is a PHP implementation of OpenAI's original Python encoder/decoder, which can be found [here](https://github.com/openai/gpt-2).

This specific encoder is used in one of my [WordPress plugins](https://coderevolution.ro) to count the number of tokens a string will use when sent to the OpenAI API.

## Usage

The mbstring PHP extension is required for this tool to work correctly when non-ASCII characters are present in the tokenized text ([details on how to install mbstring](https://www.php.net/manual/en/mbstring.installation.php)).
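
Before encoding user-supplied text, it can help to confirm the extension is actually available; the following is a minimal illustrative sketch (not part of this library) using PHP's built-in `extension_loaded()`:

```php
// Illustrative check (not part of this library): make sure mbstring is loaded
// before tokenizing text that may contain non-ASCII characters.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring PHP extension is required to tokenize non-ASCII text.');
}
```
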
```php
// The encoder file from this repository must be loaded first (e.g. via require_once)
// so that gpt_encode() is available.
$prompt = "Many words map to one token, but some don't: indivisible. Unicode characters like emojis may be split into many tokens containing the underlying bytes: 🤚🏾 Sequences of characters commonly found next to each other may be grouped together: 1234567890";

$token_array = gpt_encode($prompt);
```
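
Since the prompt is encoded into an array of token IDs (assuming `gpt_encode()` returns a plain PHP array, as the variable name above suggests), the token count mentioned earlier is simply the array length:

```php
// Illustrative follow-up: count how many tokens the prompt will consume
// when sent to the OpenAI API. $token_array comes from the example above.
$token_count = count($token_array);
echo "Token count: " . $token_count . PHP_EOL;
```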

0 commit comments
