Commit 42fa21b

Merge branch 'c-api'
Conflicts: lang/bytecode.lua lang/generator.lua
2 parents 7d9480b + 3644a55 commit 42fa21b

55 files changed: 3155 additions, 317 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
11
*~
22
.out*
3+
.deps
4+
*.o
5+
*.a
36
tests/log/*

Makefile

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
2+
# Makefile
3+
#
4+
# Copyright (C) 2014 Francesco Abbate
5+
#
6+
# This program is free software; you can redistribute it and/or modify
7+
# it under the terms of the GNU General Public License as published by
8+
# the Free Software Foundation; either version 3 of the License, or (at
9+
# your option) any later version.
10+
#
11+
# This program is distributed in the hope that it will be useful, but
12+
# WITHOUT ANY WARRANTY; without even the implied warranty of
13+
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
14+
# General Public License for more details.
15+
#
16+
# You should have received a copy of the GNU General Public License
17+
# along with this program; if not, write to the Free Software
18+
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
19+
#
20+
21+
all:
22+
$(MAKE) -C src
23+
24+
clean:
25+
$(MAKE) -C src clean
26+
27+
.PHONY: clean all

README.md

Lines changed: 164 additions & 20 deletions
@@ -1,14 +1,13 @@
11
LuaJIT Language Toolkit
22
===
33

4-
The LuaJIT Language Toolkit is a Lua implementation of the Lua programming language itself.
5-
It generates LuaJIT's bytecode complete with debug informations.
6-
The generated bytecode, in turn can be run by the LuaJIT's virtual machine.
4+
The LuaJIT Language Toolkit is an implementation of the Lua programming language written in Lua itself.
5+
It works by generating LuaJIT bytecode, including the debug information, and uses the LuaJIT virtual machine to run the generated bytecode.
76

8-
On itself this tookit does not do anything useful since LuaJIT is able to generate and run the bytecode for any Lua program.
7+
By itself the language toolkit does not do anything useful, since LuaJIT does the same things natively.
98
The purpose of the language toolkit is to provide a starting point for implementing a programming language that targets the LuaJIT virtual machine.
109

11-
With the LuaJIT Language Toolkit is easy to create a new language or extend the Lua language because the parser is cleanly separated from the bytecode generator and the virtual machine run time environment.
10+
With the LuaJIT Language Toolkit it is easy to create a new language or modify the Lua language because the parser is cleanly separated from the bytecode generator and the virtual machine.
1211

1312
The toolkit actually implements a complete pipeline that parses a Lua program, generates an AST and generates the bytecode.
1413

@@ -66,20 +65,40 @@ The Lexer's code is an almost literal translation of the LuaJIT's lexer.
6665
Parser
6766
---
6867

69-
The parser takes the flow of tokens as given by the lexer and form the statements and expressions according to the language's grammar.
70-
The parser takes a list of user supplied rules that are invoked each time a parsing rule is completed.
71-
The user's module can return a result that will be passed to the other rules's invocation.
68+
The parser takes the flow of tokens as given by the lexer and forms the statements and expressions according to the language's grammar.
69+
The parser is based on a list of parsing rules that are invoked each time the input matches a given rule.
70+
When the input matches a rule, a corresponding function in the AST module is called to build an AST node.
71+
The generated nodes are in turn passed as arguments to the other parsing rules until the whole program is parsed and a complete AST is built for the program text.
7272

73-
For example, the grammar rule for the "return" statement is:
73+
The AST is very useful since it abstracts the structure of the program and is easier to manipulate.
74+
75+
What distinguishes the language toolkit from LuaJIT is that the parser phase generates an AST, and the bytecode generation is done in a separate phase only when the AST is completely generated.
76+
77+
LuaJIT itself operates differently.
78+
During the parsing phase it does not generate any AST but instead the bytecode is directly generated and loaded into the memory to be executed by the VM.
79+
This means that LuaJIT's C implementation performs three operations:
80+
81+
- parse the program text
82+
- generate the bytecode
83+
- load the bytecode into memory
84+
85+
in one single pass.
86+
This approach is remarkable in itself and very efficient, but it makes it difficult to modify or extend the programming language.
87+
88+
Parsing Rule example
89+
~~~~~
90+
91+
To illustrate how parsing works in the language toolkit, consider an example.
92+
The grammar rule for the "return" statement is:
7493
7594
```
7695
explist ::= {exp ','} exp
7796
7897
return_stmt ::= return [explist]
7998
```
8099
81-
In this case the toolkit parser rule will parse the optional expression list by calling the function `expr_list`.
82-
Then, once the expressions are parsed the user's rule `ast:return_stmt(exps, line)` will be invoked by passing the expressions list obtained before.
100+
In this case the toolkit parser's rule will parse the optional expression list by calling the function `expr_list`.
101+
Then, once the expressions are parsed the AST's rule `ast:return_stmt(exps, line)` will be invoked by passing the expressions list obtained before.
83102
84103
```lua
85104
local function parse_return(ast, ls, line)
@@ -95,9 +114,7 @@ local function parse_return(ast, ls, line)
95114
end
96115
```
97116
98-
As you cas see the user's parsing rules are invoked using the `ast` object.
99-
100-
With the LuaJIT Language Toolkit a set of rules are defined in "lua-ast.lua" to build the AST of the program.
117+
As you can see, the AST functions are invoked using the `ast` object.
101118
102119
In addition the parser provides additional information about:
103120
@@ -113,7 +130,11 @@ The Abstract Syntax Tree (AST)
113130
---
114131
115132
The abstract syntax tree represents the whole Lua program with all its information.
116-
If you implement a new programming language you can implement some transformations of the AST tree if you need.
133+
134+
One possible approach to implementing a new programming language is to generate an AST that corresponds to the target programming language and to transform it into a Lua AST in a separate phase.
135+
136+
Another possible approach is to act from the parser itself and directly generate the appropriate Lua AST nodes.
137+
117138
Currently the language toolkit does not perform any transformation and just passes the AST to the bytecode generator module.
118139
119140
Bytecode Generator
@@ -122,7 +143,8 @@ Bytecode Generator
122143
Once the AST is generated it can be fed to the bytecode generator module, which will generate the corresponding LuaJIT bytecode.
123144
124145
The bytecode generator is based on the original work of Richard Hundt for the Nyanga programming language.
125-
It was greatly modified by myself to produce optimized code similar to what LuaJIT generate itself.
146+
It was largely modified by myself to produce optimized code similar to what LuaJIT itself generates.
147+
A lot of work was also done to ensure the correctness of the bytecode and of the debug information.
126148
127149
Alternative Lua Code generator
128150
------------------------------
@@ -135,26 +157,77 @@ The Lua code generator has the advantage of being more simple and more safe as t
135157
Currently the Lua Code Generator backend does not preserve the line numbers of the original source code. This is meant to be fixed in the future.
136158
137159
Use this backend instead of the bytecode generator if you prefer a safer backend to convert the Lua AST to code.
138-
The module can be used also to pretty-printing a Lua AST tree since the code itself is propably the most human readable representation of the AST tree.
160+
The module can also be used to pretty-print a Lua AST, since the code itself is probably the most human-readable representation of the AST.
161+
162+
C API
163+
---
164+
165+
The language toolkit provides a very simple C API to implement a custom language.
166+
The functions provided by the C API are:
167+
168+
```c
169+
/* The functions below are the equivalents of the corresponding
170+
   luaL_* functions. */
171+
extern int language_init(lua_State *L);
172+
extern int language_report(lua_State *L, int status);
173+
extern int language_loadbuffer(lua_State *L, const char *buff, size_t sz, const char *name);
174+
extern int language_loadfile(lua_State *L, const char *filename);
175+
176+
177+
/* This function pushes on the stack a Lua table with the functions:
178+
   loadstring, loadfile, dofile and loader.
179+
   The first three functions can replace the Lua functions while the
180+
   last one, loader, can be used as a customized "loader" function for
181+
   the "require" function. */
182+
extern int luaopen_langloaders(lua_State *L);
183+
```
184+
185+
The functions above can be used to create a custom LuaJIT executable that uses the language toolkit implementation.
186+
187+
When the `language_*` functions are used, an independent `lua_State` is created behind the scenes and used to compile the bytecode.
188+
Once the bytecode is generated it is loaded into the user's `lua_State` ready to be executed.
189+
The approach of using a separate Lua state ensures that the process of compiling does not interfere with the user's application.
190+
191+
It should be noted that even when an executable is created with the C API the lang/* Lua files need to be available at run time because they are used by the language toolkit's Lua state.
139192
140193
Running the Application
141194
---
142195
143196
The application can be run with the following command:
144197
145198
```
146-
luajit run.lua <filename>
199+
luajit run.lua [lua-options] <filename>
147200
```
148201
149202
The "run.lua" script will just invoke the complete pipeline of the lexer, parser and bytecode generator and it will pass the bytecode to luajit with "loadstring".
150203
151-
The script "run.lua" can optionally show the generated bytecode using the "-bl" flag. For example:
204+
The language toolkit also provides a customized executable named `luajit-x` that uses the language toolkit's toolchain instead of the native one.
205+
Otherwise the program `luajit-x` works exactly like luajit itself and accepts the same options.
206+
207+
This means that you can experiment with the language by modifying the Lua implementation of the language and test the changes immediately without recompiling anything by using `luajit-x` as a REPL.
208+
209+
Generated Bytecode
210+
~~~~~~~~~~~~
211+
212+
You can inspect the bytecode generated by the language toolkit by using the "-b" options.
213+
These options can be used either with standard luajit through "run.lua" or directly with the customized program `luajit-x`.
214+
215+
For example you can inspect the bytecode using the following command:
152216
153217
```
154218
luajit run.lua -bl tests/test-1.lua
155219
```
156220
157-
will print on the screen:
221+
or alternatively:
222+
223+
```
224+
./src/luajit-x -bl tests/test-1.lua
225+
```
226+
227+
where we assume that you are running `luajit-x` from the language toolkit's root directory.
228+
This is *required* since the `luajit-x` program needs to find the lang/* Lua modules when it is executed.
229+
230+
Either way, when you use one of the two commands above to generate the bytecode, the following is printed on the screen:
158231
159232
```
160233
-- BYTECODE -- "test-1.lua":0-7
@@ -188,8 +261,79 @@ In the example above the generated bytecode will be *identical* to those generat
188261
This is not a coincidence since the Language Toolkit's bytecode generator is designed to produce the same bytecode that LuaJIT itself would generate.
189262
Yet in some cases the generated code will differ but this is not considered a problem as long as the generated code is still correct.
190263
264+
Bytecode Annotated Dump
265+
~~~~~~~~~~~~~~
266+
267+
In addition to the standard LuaJIT bytecode functions the language toolkit also supports a special debug mode where the bytecode is printed byte-by-byte in hex format with some annotations on the right side of the screen.
268+
The annotations will explain the meaning of each chunk of bytes and decode them as appropriate.
269+
270+
For example:
271+
272+
```
273+
luajit run.lua -bx tests/test-1.lua
274+
```
275+
276+
will print on the screen something like:
277+
278+
```
279+
1b 4c 4a 01 | Header LuaJIT 2.0 BC
280+
00 | Flags: None
281+
11 40 74 65 73 74 73 2f | Chunkname: @tests/test-1.lua
282+
74 65 73 74 2d 31 2e 6c |
283+
75 61 |
284+
| .. prototype ..
285+
8a 01 | prototype length 138
286+
02 | prototype flags PROTO_VARARG
287+
00 | parameters number 0
288+
07 | framesize 7
289+
00 01 01 12 | size uv: 0 kgc: 1 kn: 1 bc: 19
290+
31 | debug size 49
291+
00 07 | firstline: 0 numline: 7
292+
| .. bytecode ..
293+
32 00 00 00 | 0001 TNEW 0 0
294+
27 01 01 00 | 0002 KSHORT 1 1
295+
27 02 0a 00 | 0003 KSHORT 2 10
296+
27 03 01 00 | 0004 KSHORT 3 1
297+
49 01 04 80 | 0005 FORI 1 => 0010
298+
20 05 04 04 | 0006 => MULVV 5 4 4
299+
14 05 00 05 | 0007 ADDVN 5 5 0 ; 1
300+
39 05 04 00 | 0008 TSETV 5 0 4
301+
4b 01 fc 7f | 0009 FORL 1 => 0006
302+
27 01 01 00 | 0010 => KSHORT 1 1
303+
27 02 0a 00 | 0011 KSHORT 2 10
304+
27 03 01 00 | 0012 KSHORT 3 1
305+
49 01 04 80 | 0013 FORI 1 => 0018
306+
34 05 00 00 | 0014 => GGET 5 0 ; "print"
307+
36 06 04 00 | 0015 TGETV 6 0 4
308+
3e 05 02 01 | 0016 CALL 5 1 2
309+
4b 01 fc 7f | 0017 FORL 1 => 0014
310+
47 00 01 00 | 0018 => RET0 0 1
311+
| .. uv ..
312+
| .. kgc ..
313+
0a 70 72 69 6e 74 | kgc: "print"
314+
| .. knum ..
315+
02 | knum int: 1
316+
| .. debug ..
317+
01 | pc001: line 1
318+
02 | pc002: line 2
319+
02 | pc003: line 2
320+
02 | pc004: line 2
321+
02 | pc005: line 2
322+
...
323+
```
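
The jump targets shown in the dump can be recomputed from the raw instruction bytes. The following C fragment is a minimal sketch and not code from the toolkit: the helper names `bc_operand_d` and `bc_jump_target` and the constant `JUMP_BIAS` are hypothetical, and the layout (opcode, A, then a little-endian 16 bit D operand biased by 0x8000) is an assumption inferred from the listing, for example `49 01 04 80` at position 0005 jumping to 0010.

```c
#include <assert.h>
#include <stdint.h>

/* Bias added to jump offsets stored in the D operand.
   Assumption: 0x8000, which is consistent with the dump. */
#define JUMP_BIAS 0x8000

/* Extract the 16 bit D operand from a 4-byte instruction laid out as
   opcode, A, D-low, D-high (the byte order shown in the dump). */
static unsigned bc_operand_d(const uint8_t ins[4])
{
    assert(ins != 0);
    return (unsigned)ins[2] | ((unsigned)ins[3] << 8);
}

/* Compute the 1-based target of a jump instruction located at the
   1-based position pc, as printed after "=>" in the listing. */
static int bc_jump_target(const uint8_t ins[4], int pc)
{
    return pc + 1 + ((int)bc_operand_d(ins) - JUMP_BIAS);
}
```

With the bytes of the listing, `bc_jump_target` gives 10 for the `FORI` at position 5 and 6 for the `FORL` at position 9.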
324+
325+
This kind of output is especially useful for debugging the language toolkit itself because it accounts for every byte of the bytecode and includes all of its sections.
326+
For example, you will be able to inspect the `kgc` or `knum` sections where the prototype's constants are stored.
327+
The output also includes the debug section in decoded form so that it can be easily inspected.
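
As an illustration of how the size fields in such a dump are decoded, the sketch below checks the header signature and reads a variable-length integer. It assumes that lengths are stored in unsigned LEB128 form, which is consistent with `8a 01` being annotated as a prototype length of 138 and with the leading `11` matching the 17 bytes of the chunk name. The function names are hypothetical and are not part of the toolkit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return non-zero if the buffer starts with the signature shown in
   the dump: ESC 'L' 'J' (1b 4c 4a). */
static int is_luajit_bytecode(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 3 && buf[0] == 0x1b && buf[1] == 'L' && buf[2] == 'J';
}

/* Decode an unsigned LEB128 value starting at buf[*pos] and advance
   *pos past it. Each byte carries 7 payload bits, least significant
   group first; the high bit marks that another byte follows. */
static uint32_t read_uleb128(const uint8_t *buf, size_t len, size_t *pos)
{
    uint32_t value = 0;
    int shift = 0;

    assert(buf != NULL && pos != NULL);
    while (*pos < len && shift <= 28) {
        uint8_t byte = buf[(*pos)++];
        value |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }
    return value;
}
```

Applied to the two bytes `8a 01`, `read_uleb128` yields 0x0a + (0x01 << 7) = 138, the prototype length printed in the annotated dump.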
328+
191329
Current Status
192330
---
193331
194332
Currently LuaJIT Language Toolkit should be considered as beta software.
333+
195334
The implementation is now complete in terms of features and well tested, even for the most complex cases, and a complete test suite is used to verify the correctness of the generated bytecode.
335+
336+
The language toolkit is currently capable of executing itself.
337+
This means that the language toolkit is able to correctly compile and load all of its modules and execute them correctly.
338+
339+
Yet some bugs are probably present and you should be cautious when you use the language toolkit.
File renamed without changes.
File renamed without changes.
File renamed without changes.
