The LuaJIT Language Toolkit is a Lua implementation of the Lua programming language itself.
5
-
It generates LuaJIT's bytecode complete with debug informations.
6
-
The generated bytecode, in turn can be run by the LuaJIT's virtual machine.
4
+
The LuaJIT Language Toolkit is an implementation of the Lua programming language written in Lua itself.
5
+
It works by generating LuaJIT bytecode, including the debug information, and using the LuaJIT virtual machine to run the generated bytecode.
7
6
8
-
On itself this tookit does not do anything useful since LuaJIT is able to generate and run the bytecode for any Lua program.
7
+
On its own the language toolkit does not do anything useful, since LuaJIT does the same things natively.
9
8
The purpose of the language toolkit is to provide a starting point to implement a programming language that targets the LuaJIT virtual machine.
10
9
11
-
With the LuaJIT Language Toolkit is easy to create a new language or extend the Lua language because the parser is cleanly separated from the bytecode generator and the virtual machine run time environment.
10
+
With the LuaJIT Language Toolkit it is easy to create a new language or modify the Lua language because the parser is cleanly separated from the bytecode generator and the virtual machine.
12
11
13
12
The toolkit actually implements a complete pipeline that parses a Lua program, generates an AST tree and generates the bytecode.
14
13
@@ -66,20 +65,40 @@ The Lexer's code is an almost literal translation of the LuaJIT's lexer.
66
65
Parser
67
66
---
68
67
69
-
The parser takes the flow of tokens as given by the lexer and form the statements and expressions according to the language's grammar.
70
-
The parser takes a list of user supplied rules that are invoked each time a parsing rule is completed.
71
-
The user's module can return a result that will be passed to the other rules's invocation.
68
+
The parser takes the flow of tokens as given by the lexer and forms the statements and expressions according to the language's grammar.
69
+
The parser is based on a list of parsing rules that are invoked each time the input matches a given rule.
70
+
When the input matches a rule, the corresponding function in the AST module is called to build an AST node.
71
+
The generated nodes are in turn passed as arguments to the other parsing rules until the whole program is parsed and a complete AST tree is built for the program text.
72
72
73
-
For example, the grammar rule for the "return" statement is:
73
+
The AST tree is very useful since it abstracts the structure of the program and is easier to manipulate.
74
+
75
+
What distinguishes the language toolkit from LuaJIT is that the parsing phase generates an AST tree, and the bytecode generation is done in a separate phase, only once the AST tree is complete.
76
+
77
+
LuaJIT itself operates differently.
78
+
During the parsing phase it does not generate any AST; instead the bytecode is generated directly and loaded into memory to be executed by the VM.
79
+
This means that LuaJIT's C implementation performs three operations:
80
+
81
+
- parse the program text
82
+
- generate the bytecode
83
+
- load the bytecode into memory
84
+
85
+
in one single pass.
86
+
This approach is remarkable in itself and very efficient, but it makes it difficult to modify or extend the programming language.
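
The difference between the two designs can be sketched with a toy example. The code below is only an illustration, not the toolkit's implementation: the `parse` and `generate` functions, the node shapes and the `PUSH`/`ADD` pseudo-instructions are all hypothetical and unrelated to real LuaJIT bytecode. It shows the parser producing a complete tree first, and a separate generator walking that tree afterwards.

```lua
-- Hypothetical two-phase pipeline: names and node shapes are illustrative only.

-- Phase 1: parse a sum such as "1 + 2 + 3" into a left-associative AST.
local function parse(src)
    local node
    for num in src:gmatch("%d+") do
        local lit = { kind = "Literal", value = tonumber(num) }
        if node then
            node = { kind = "Add", left = node, right = lit }
        else
            node = lit
        end
    end
    return node
end

-- Phase 2: walk the finished AST and emit stack-machine pseudo-instructions.
local function generate(node, out)
    out = out or {}
    if node.kind == "Literal" then
        out[#out + 1] = "PUSH " .. node.value
    else
        generate(node.left, out)
        generate(node.right, out)
        out[#out + 1] = "ADD"
    end
    return out
end

local ast = parse("1 + 2 + 3")
local code = generate(ast)
print(table.concat(code, "\n"))
```

Because the tree exists as a plain value between the two calls, a language extension could inspect or rewrite it before `generate` runs, which is not possible when parsing and code generation happen in a single pass.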
87
+
88
+
Parsing Rule example
89
+
~~~~~
90
+
91
+
To illustrate how parsing works in the language toolkit, let us look at an example.
92
+
The grammar rule for the "return" statement is:
74
93
75
94
```
76
95
explist ::= {exp ','} exp
77
96
78
97
return_stmt ::= return [explist]
79
98
```
80
99
81
-
In this case the toolkit parser rule will parse the optional expression list by calling the function `expr_list`.
82
-
Then, once the expressions are parsed the user's rule `ast:return_stmt(exps, line)` will be invoked by passing the expressions list obtained before.
100
+
In this case the toolkit parser's rule will parse the optional expression list by calling the function `expr_list`.
101
+
Then, once the expressions are parsed, the AST function `ast:return_stmt(exps, line)` is invoked with the expression list obtained before.
83
102
84
103
```lua
85
104
local function parse_return(ast, ls, line)
@@ -95,9 +114,7 @@ local function parse_return(ast, ls, line)
95
114
end
96
115
```
97
116
98
-
As you cas see the user's parsing rules are invoked using the `ast` object.
99
-
100
-
With the LuaJIT Language Toolkit a set of rules are defined in "lua-ast.lua" to build the AST of the program.
117
+
As you can see, the AST functions are invoked using the `ast` object.
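
To make the interaction between a parsing rule and the `ast` object concrete, here is a self-contained sketch. It is an assumption-laden stand-in, not the toolkit's code: the stub lexer `new_lexer`, the `EndOfBlock` table, the simplified `expr_list` and the node fields (`kind`, `arguments`, `line`) are all hypothetical, and the body of `parse_return` is a guess at the general shape of such a rule.

```lua
-- Hypothetical stand-ins: the real lexer and AST modules are richer than this.
local EndOfBlock = { TK_eof = true, TK_end = true }

-- A stub lexer state over a pre-tokenized list.
local function new_lexer(tokens)
    local ls = { tokens = tokens, pos = 1 }
    ls.token = tokens[1]
    function ls:next()
        self.pos = self.pos + 1
        self.token = self.tokens[self.pos] or "TK_eof"
    end
    return ls
end

-- A stub AST builder: each method returns a plain table node.
local ast = {}
function ast:return_stmt(exps, line)
    return { kind = "ReturnStatement", arguments = exps, line = line }
end

-- Stand-in for expr_list: collects comma-separated tokens as expressions.
local function expr_list(ast, ls)
    local exps = { ls.token }
    ls:next()
    while ls.token == "," do
        ls:next()
        exps[#exps + 1] = ls.token
        ls:next()
    end
    return exps
end

local function parse_return(ast, ls, line)
    ls:next() -- skip the 'return' keyword
    local exps
    if EndOfBlock[ls.token] or ls.token == ";" then
        exps = {} -- bare return
    else
        exps = expr_list(ast, ls)
    end
    return ast:return_stmt(exps, line)
end

local node = parse_return(ast, new_lexer({ "return", "a", ",", "b" }), 1)
print(node.kind, #node.arguments)
```

The parsing rule never builds a node itself. It only decides which `ast` method to call and with which arguments, so swapping in a different `ast` object changes the tree that is produced without touching the parser.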
101
118
102
119
In addition the parser provides further information about:
103
120
@@ -113,7 +130,11 @@ The Abstract Syntax Tree (AST)
113
130
---
114
131
115
132
The abstract syntax tree represents the whole Lua program with all the information.
116
-
If you implement a new programming language you can implement some transformations of the AST tree if you need.
133
+
134
+
One possible approach to implement a new programming language is to generate an AST tree that corresponds to the target programming language and to transform that tree into a Lua AST tree in a separate phase.
135
+
136
+
Another possible approach is to act from the parser itself and directly generate the appropriate Lua AST nodes.
137
+
117
138
Currently the language toolkit does not perform any transformation and just passes the AST tree to the bytecode generator module.
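
A separate transformation phase of the kind described above could look like the following sketch. Everything in it is hypothetical: the `transform` function, the `CompoundAssign` node of an imaginary source language, and the `Assignment` and `BinaryExpression` shapes are invented for illustration and may not match the node layout the toolkit actually uses.

```lua
-- Hypothetical node shapes; the toolkit's real AST nodes may differ.
local function transform(node)
    if type(node) ~= "table" then return node end
    -- Rebuild the node with transformed children (the input is left untouched).
    local out = {}
    for k, v in pairs(node) do out[k] = transform(v) end
    if out.kind == "CompoundAssign" then
        -- Desugar `x op= e` into `x = x op e`.
        return {
            kind = "Assignment",
            target = out.target,
            value = {
                kind = "BinaryExpression",
                operator = out.operator,
                left = out.target,
                right = out.value,
            },
        }
    end
    return out
end

-- AST of an imaginary language that supports `x += 1`.
local src = { kind = "Block", body = {
    { kind = "CompoundAssign", operator = "+",
      target = { kind = "Identifier", name = "x" },
      value = { kind = "Literal", value = 1 } },
} }

local lua_ast = transform(src)
print(lua_ast.body[1].kind, lua_ast.body[1].value.operator)
```

After the pass the tree contains only constructs that plain Lua has, so under this approach it could be handed to the existing bytecode generator unchanged.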
118
139
119
140
Bytecode Generator
@@ -122,7 +143,8 @@ Bytecode Generator
122
143
Once the AST tree is generated it can be fed to the bytecode generator module that will generate the corresponding LuaJIT bytecode.
123
144
124
145
The bytecode generator is based on the original work of Richard Hundt for the Nyanga programming language.
125
-
It was greatly modified by myself to produce optimized code similar to what LuaJIT generate itself.
146
+
It was largely modified by me to produce optimized code similar to what LuaJIT itself generates.
147
+
A lot of work was also done to ensure the correctness of the bytecode and of the debug information.
126
148
127
149
Alternative Lua Code generator
128
150
------------------------------
@@ -135,26 +157,77 @@ The Lua code generator has the advantage of being more simple and more safe as t
135
157
Currently the Lua Code Generator backend does not preserve the line numbers of the original source code. This is meant to be fixed in the future.
136
158
137
159
Use this backend instead of the bytecode generator if you prefer a safer backend to convert the Lua AST to code.
138
-
The module can be used also to pretty-printing a Lua AST tree since the code itself is propably the most human readable representation of the AST tree.
160
+
The module can also be used to pretty-print a Lua AST tree, since the code itself is probably the most human-readable representation of the AST tree.
161
+
162
+
C API
163
+
---
164
+
165
+
The language toolkit provides a very simple C API to implement a custom language.
166
+
The functions provided by the C API are:
167
+
168
+
```c
169
+
/* The functions below are the equivalent of the luaL_* corresponding
170
+
functions. */
171
+
extern int language_init(lua_State *L);
172
+
extern int language_report(lua_State *L, int status);
extern int language_loadfile(lua_State *L, const char *filename);
175
+
176
+
177
+
/* This function pushes onto the stack a Lua table with the functions:
178
+
loadstring, loadfile, dofile and loader.
179
+
The first three functions can replace the Lua functions while the
180
+
last one, loader, can be used as a customized "loader" function for
181
+
the "require" function. */
182
+
extern int luaopen_langloaders(lua_State *L);
183
+
```
184
+
185
+
The functions above can be used to create a custom LuaJIT executable that uses the language toolkit implementation.
186
+
187
+
When the `language_*` functions are used, an independent `lua_State` is created behind the scenes and used to compile the bytecode.
188
+
Once the bytecode is generated it is loaded into the user's `lua_State` ready to be executed.
189
+
The approach of using a separate Lua state ensures that the process of compiling does not interfere with the user's application.
190
+
191
+
It should be noted that even when an executable is created with the C API the lang/* Lua files need to be available at run time because they are used by the language toolkit's Lua state.
139
192
140
193
Running the Application
141
194
---
142
195
143
196
The application can be run with the following command:
144
197
145
198
```
146
-
luajit run.lua <filename>
199
+
luajit run.lua [lua-options] <filename>
147
200
```
148
201
149
202
The "run.lua" script will just invoke the complete pipeline of the lexer, parser and bytecode generator and it will pass the bytecode to luajit with "loadstring".
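
The last step works because `loadstring` accepts a bytecode string as well as source text. The sketch below demonstrates only that step and makes one substitution that should be read as an assumption: since the toolkit's generator is not available here, the bytecode is produced by the native compiler through `string.dump`. The `load_chunk` alias is also hypothetical and is there only so the snippet runs on Lua versions where `loadstring` was replaced by `load`.

```lua
-- `loadstring` exists in Lua 5.1 and LuaJIT; later Lua versions use `load`.
local load_chunk = loadstring or load

-- Stand-in for the toolkit's bytecode generator: here the native compiler
-- produces the bytecode string via string.dump.
local bytecode = string.dump(function() return 6 * 7 end)

-- A bytecode string is loaded the same way as source text.
local fn = assert(load_chunk(bytecode, "=example"))
print(fn())
```

In the toolkit's case the string passed to `loadstring` would come from its own bytecode generator rather than from `string.dump`.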
150
203
151
-
The script "run.lua" can optionally show the generated bytecode using the "-bl" flag. For example:
204
+
The language toolkit also provides a customized executable named `luajit-x` that uses the language toolkit's toolchain instead of the native one.
205
+
Otherwise the program `luajit-x` works exactly like luajit itself and accepts the same options.
206
+
207
+
This means that you can experiment with the language by modifying the Lua implementation of the language and test the changes immediately without recompiling anything by using `luajit-x` as a REPL.
208
+
209
+
Generated Bytecode
210
+
~~~~~~~~~~~~
211
+
212
+
You can inspect the bytecode generated by the language toolkit by using the "-b" options.
213
+
They can be invoked either with standard luajit by using "run.lua" or directly using the customized program `luajit-x`.
214
+
215
+
For example you can inspect the bytecode using the following command:
152
216
153
217
```
154
218
luajit run.lua -bl tests/test-1.lua
155
219
```
156
220
157
-
will print on the screen:
221
+
or alternatively:
222
+
223
+
```
224
+
./src/luajit-x -bl tests/test-1.lua
225
+
```
226
+
227
+
where we suppose that you are running `luajit-x` from the language toolkit's root directory.
228
+
This is somewhat *required* since the `luajit-x` program needs to find the lang/* Lua modules when it is executed.
229
+
230
+
Either way, when you use one of the two commands above to generate the bytecode you will obtain on the screen:
158
231
159
232
```
160
233
-- BYTECODE -- "test-1.lua":0-7
@@ -188,8 +261,79 @@ In the example above the generated bytecode will be *identical* to those generat
188
261
This is not an accident, since the Language Toolkit's bytecode generator is designed to produce the same bytecode that LuaJIT itself would generate.
189
262
Yet in some cases the generated code will differ but this is not considered a problem as long as the generated code is still correct.
190
263
264
+
Bytecode Annotated Dump
265
+
~~~~~~~~~~~~~~
266
+
267
+
In addition to the standard LuaJIT bytecode functions the language toolkit also supports a special debug mode where the bytecode is printed byte-by-byte in hex format with some annotations on the right side of the screen.
268
+
The annotations will explain the meaning of each chunk of bytes and decode them as appropriate.
This kind of output is especially useful for debugging the language toolkit itself because it accounts for every byte of the bytecode and includes all the sections of the bytecode.
326
+
For example you will be able to inspect the `kgc` or `knum` sections where the prototype's constants are stored.
327
+
The output will include also the debug section in decoded form so that it can be easily inspected.
328
+
191
329
Current Status
192
330
---
193
331
194
332
Currently the LuaJIT Language Toolkit should be considered beta software.
333
+
195
334
The implementation is now complete in terms of features and well tested, even for the most complex cases, and a complete test suite is used to verify the correctness of the generated bytecode.
335
+
336
+
The language toolkit is currently capable of executing itself.
337
+
This means that the language toolkit is able to correctly compile and load all of its modules and execute them correctly.
338
+
339
+
Yet some bugs are probably present and you should be cautious when you use the language toolkit.