Brainfuck interpreter and compiler with optimizer and ELF output.
bzt 3232ef5753 Fixed typo | 3 jaren geleden | |
---|---|---|
tests | 3 jaren geleden | |
LICENSE | 3 jaren geleden | |
Makefile | 3 jaren geleden | |
README.md | 3 jaren geleden | |
bf.c | 3 jaren geleden |
Why? Because why not? Stupid example on how to generate machine independent bytecode, run it through various optimizer techniques, then interpret it in a VM, or compile it into C or Assembly source or directly into a native ELF executable.
Just run make
. No library dependencies of any kind, just binutils and gcc.
BrainFuck by bzt Copyright (C) 2021 MIT license
./bf [-p] [-O[01234]] [-m<size>] [-w|-d] <bf source> [-b|-c|-s|-o] [out]
-p turn on profiling
-O optimization level
-m set program memory size
-w use word cells (16 bit)
-d use dword cells (32 bit)
-b output bytecode
-c output C source
-s output x86_64 assembly
-o output ELF binary
Specify the BF script as single parameter.
$ ./bf tests/hello.bf
Hello World!
With -p
it will do profiling too:
$ ./bf -p tests/hello.bf
Optimizer: 0.000004 sec
Hello World!
Execution: 0.000052 sec
Syntax and run-time checks are made on the script. Error messages look like this:
<source file>:<line>:<col>: <message>
With the -o
flag, BrainFuck compiler outputs ELF executable (x86_64 only, requires the binutils package to be installed):
$ ./bf tests/hello.bf -o hello
$ ./hello
Hello World!
You can also create ANSI C and x86_64 sources (which in turn can be compiled into native code) with this compiler. For example:
$ ./bf tests/hello.bf -c hello.c
$ gcc hello.c -o hello
$ ./hello
Hello World!
The -c
flag outputs ANSI C source. You can compile that with gcc -O3 but that's still not the real thing. Instead, let
Brainfuck do the optimizations and use the -s
flag to output x86_64 Assembly source (this way the result is about 4 times
faster than the gcc compiled C source).
$ ./bf tests/hello.bf -s hello.s
$ gcc hello.s -o hello
$ ./hello
Hello World!
You can specify the cell size (which defaults to byte) with -w
(word) and -d
(dword), like in these examples:
$ ./bf tests/bitwidth.bf
Hello World! 255
$ ./bf -w tests/bitwidth.bf
Hello world! 65535
$ ./bf -d tests/bitwidth.bf
Hello, world!
$
To set the program's memory limit (in cells, defaults to 32768 cells), use -m
like
$ ./bf -m8192 tests/hello.bf
Hello World!
Optimization can be set with -O0
through -O4
(level 4 only makes sense with the compiler, not with the interpreter).
Flag | Optimization |
---|---|
-O0 | no optimization at all (sloooooow) |
-O1 | run-length encode tokens |
-O2 | like -O1 plus dead loop elimination and zero out cell |
-O3 | like -O2 plus fast find and cell move mnemonics |
-O4 | like -O3 but without run-time boundary checks |
You can save the optimized, intermediate bytecode into a file, like this:
$ ./bf tests/hello.bf -b hello.bc
$ hexdump -C hello.bc
00000000 42 43 21 c0 00 19 01 c0 00 16 11 31 01 21 c0 00 |BC!........1.!..|
00000010 15 01 23 01 4f 2b 01 5c f1 00 01 31 f2 00 01 01 |..#.O+.\...1....|
00000020 31 58 53 2a 11 51 02 04 34 60 12 23 60 11 31 60 |1XS*.Q..4..#..1.|
00000030 60 23 60 11 31 60 03 60 12 60 23 60 36 60 01 31 |.#..1.....#.6..1|
00000040 60 12 21 60 11 60 |..!...|
Then the bytecode can be read back as if it were a plain text BF script (but this time no optimizations will take place):
$ ./bf hello.bc
Hello World!
$
The bytecode file starts with a 2 bytes magic, BC
, followed by encoded instructions:
OpCode | Mnemonic | Source | Description |
---|---|---|---|
i000xxxx | INC | > | increment pointer by x |
i001xxxx | DEC | < | decrement pointer by x |
i010xxxx | ADD | + | add x to the pointed cell |
i011xxxx | SUB | - | substract x from the pointed cell |
i100xxxx | JZ | [ | jump to x if pointed cell is zero |
i101xxxx | JNZ | ] | jump to x if pointed cell is non-zero |
0110???? | PUT | . | output pointed cell |
0111???? | GET | , | get input into pointed cell |
11110000 | ZRO | [-] | zero out cell |
1111i001 | FFZ x | [>] | find first zero cell in x big steps |
1111i010 | FLZ x | [<] | find last zero cell in x big steps |
1111i011 | MVD x | [-<+>] | move cell value to down cell |
1111i100 | MVU x | [->+<] | move cell value to up cell |
The highest bit, i
indicates the existance of an immedate operand. That is 16 bit wide, and together with the
command byte's lower tetrad forms a big endian 0 .. 0xFFFFF value.
For the additional mnemonics (upper tetrad 0xF0), i
is the highest bit of the lower tetrad, and immediate is 3
bytes long if its set, otherwise 2 bytes. The only exception is ZRO, which has no argument at all.
First, measure Hello World:
$ ./bf -p tests/hello.bf
Optimizer: 0.000004 sec
Hello World!
Execution: 0.000052 sec
Then let's add profiling and save as C source, then compile it as optimized native code and execute :-)
$ ./bf -p tests/hello.bf -c hello.c
Optimizer: 0.000003 sec
$ gcc -O3 hello.c -o hello
$ ./hello
Hello World!
Execution: 0.000069 sec
$
Haha, the interpreter was actually faster than the gcc optimized code! Let's try with Assembly directly compiled to ELF:
$ ./bf -p -O3 hello.bc -o hello
Optimizer: 0.000003 sec
$ ./hello
Hello World!
Execution: 0.000067 sec
$
Obviously if you do this with a much larger script that computes a lot more, then the Assembly is going to be faster:
$ ./bf -p tests/mandelbrot.bf >/dev/null
Optimizer: 0.000121 sec
Execution: 9.085870 sec
$ ./bf -p tests/mandelbrot.bf -o mandelbrot
Optimizer: 0.000121 sec
$ ./mandelbrot >/dev/null
Execution: 1.393856 sec
$
Here the speed up was little less than 7 times over the interpreter.
Free and Open Source, licensed under the MIT license.
Cheers, bzt