An attempt at keeping a record of where we were when, and speculation on what the log might contain soon
Project kickoff. Beginnings of the compilation strategy document, and the first WebAssembly files built by hand, to make sure that the compilation strategy will work out.
Lots of confusion around nominal types. For context, Scheme is quite
monomorphic in flavor (e.g. length
only works on lists) but it has to
contend with dynamically-typed objects. You don't need to do so much
polymorphic dispatch but you do need to be able to quickly dynamically
check that an object is of the expected type. For WebAssembly as hosted
by a JS implementation, each GC object starts with an internal
"map"/"shape"/"structure"/"hidden class" word (the nomenclature depends
on the engine but it's all the same); the fastest checks will simply
check than an object's first word has a given value.
However the abstraction that exposes cheap map word checks to WebAssembly (nominal types / run-time types) was removed from the GC MVP. You can still do dynamic checks, but they are structural, and some objects have the same shape. So in these first couple weeks there was significant confusion on whether structural checks would be sufficient, or whether you would need a tag word.
First, some work on making the basic-types example represent all fundamental types that there are in Guile.
Hash tables threw a bit of a spanner in the works. The initial idea was
to punt to the embedder; for a JS host, you'd use
Map
.
But how would you implement identity-based hashing in Wasmtime?
Assuming a GC that moves objects, you can't simply use object address as
the key. It's asking too much of the host to implement some strange
side table.
So, as the JVM does, we now explicitly include space for a hash code in
all objects, even pairs (which before we were trying to keep lean, as in
Guile). On the positive side this hash word can contain a tag, so
dynamic checks are cheaper than before. Relatedly, because string
is not a subtype of
eq
, we needed to
wrap strings anyway; this gives us a way to have mutable strings if we
end up needing to have that.
All in all, the arc is towards using a little more memory, but not taking short-cuts to compromise on semantics, relative to what native Guile does. In the end it will be fine.
Some additional run-time work in the basic-types example to show how to implement a symbol table, a keyword table, string hashing, struct vtables, and so on.
Now that the hand-written basic-types example supports all types that Guile does, we start on making WebAssembly from Scheme. First, a port of wassemble to Scheme. As we go, we update wassemble for more recent additions to the wasm standard. There are actually a couple parts of the assembler: one to take s-expression input that mostly corresponds to the WebAssembly textual grammar, and parses to Guile records; and one that take a Guile WebAssembly record and assembles it to bytes.
Having gotten that to work, next up was a disassembler. This one was built to handle all of the WebAssembly that V8 can handle, including experimental extensions. For GC, this corresponds to the Milestone 6 MVP document. Representing all WebAssembly features required some expansions to the ../module/wasm/types.scm, and corresponding updates to the assembler.
Finally, after getting the parser working and the assembler back up to speed on (almost) all the features that the parser can parse, tied things together with some tests.
Gave a talk about our efforts at BOB 2023. Seems to have been well-received.
On the implementation side, started looking at the
tailify
pass. Right now if you compile the trivial Scheme
program 42
, you get this:
L0:
v0 := self
L1(...)
L1:
receive()
v1 := const 42 ; val
v2 := restore1[ptr]() ; ret
tail calli v2(v1)
Here we see two blocks. (Really it's just one, but it renders as two
because the $kentry
has two successors: the $kclause
and the
$ktail
). L0
is the entry, which binds self
as the closure and
then continues to L1
which parses the args, expecting 0 additional
args. Then we define the return value (42
), then pop a ptr
from the
stack and call it indirectly (tail calli
).
The thing is, ptr
is a really lazy way to describe the type of the
continuation. Can we assume that it's a return continuation or are
there other kinds of pointers that we might see? Turns out, yes, for
now at least we can assume it's a return continuation; the other uses of
ptr
-typed locals in native Guile are for interior pointers for strings
and pointers to the struct-unboxed-field bitvector, neither of which we
will have for the wasm target.
But, we still have some work to do on the compiler front-end, to avoid
eager "instruction explosion" of CPS primitives in (language tree-il
compile-cps)
to primitives that correspond to the native Guile VM
rather than what we will use in wasm.
Aaaaaaahhhhh, finally:
guild compile-wasm -o 42.wasm 42.scm
wrote `42.wasm`
We also have a new reflect.wat / reflect.js run-time library to allow these WebAssembly files to be loaded in web browsers or other places where you have a JavaScript implementation with a capable WebAssembly implementation.
Some progress, in that the compiler now supports all kinds of constant literals, and we are starting to implement the different primcalls. Then, non-tail function calls; once that is working we'll be close to having a working Scheme.
Some interesting progress: the test suite and JS host harness has been enhanced so as to compile multiple compilation units and have them interoperate. To do this we need to ensure they share the same run-time (stack pointers, etc). Probably there are some more guard-rails to make here but for now it's an interesting development.
We now compile value returns, tail calls, conditionals, and a few minimal primcalls (add, add/immediate, sub, sub/immediate, mul). We can do non-tail calls too. Recursive fac and fib run (though don't handle overflow to bignum yet); speed appears to currently be about 15x slower than Guile native. Something to work on over time.
Conversations from Hoot sync meeting:
In medium term wanna define more functions in scheme. Want to compile expressions with lexically available environment, but that's kind of a separate thing
Might be in medium term that we could define functions in scheme that have their implementation in webassembly
Add special logic to compiler to emit WASM code instead of in other ways to compile a function (kinda like inline-asm)
Gotta make it nicer to write WASM! Especially all those damn make-type-use things. If you're parsing WAT, you get around a lot of those
In that file maybe have more emacs indentation rules so the code
doesn't look terrible. Should put in local variables or .dir-locals.el
or etc.
Prioritization for this week:
Land self-assembly MRs
Move stdlib from hoot compile to hoot stdlib
Compute stdlib return standard wasm module but parse from WAT instead of more imperative mode. Put it in the WAT DSL instead of in scheme.
Use WAT more. Less explicit assembly records, more WAT parsing
Moved the unify-returns pass into Guile (wip-tailify
branch), and made
it so that Guile can select Hoot-specific backend passes to its CPS.
But the big change is that Guile no longer eagerly explodes
e.g. vector-length
into generic low-level word-ref/immediate
instructions when converting from Tree-IL to CPS; we keep the various
type checks and bailouts as part of explicit control flow, but CPS
conversion residualizes e.g. vector-ref
etc. This will let Hoot
allocate objects with Wasm/GC typed objects.
Finished adding support for vectors, pairs, closures, variables/boxes, structs, and started on bytevectors. Next up will be the dynamic environment.
In WebAssembly, vector-length
needs to work on a value of type
(struct $vector)
. But generally what is flowing around is the (ref
eq)
unitype, so we have to introduce explicit ref.cast
operations.
Still, type checks can help us automatically recover this information:
because vector-length
is dominated by a vector?
check, we should be
able to use br_on_cast
or similar.
Right now we're just going to insert casts right before vector-ref
et
al, as we emit the vector-ref
. But really we should instead do like
this:
$vector
casts right before each
vector accessor (or pairs, structs, etc).vector?
primcall.$vector
cast after the type check. This is already a win because
probably there's more than one access.br_on_cast
if a branch is followed by
a cast. Not sure how this integrates with the "beyond relooper"
design though!Some good starter tasks for new contributors:
The "wat" component of the assembler isn't yet updated for reference types / GC. It would be nice to fix that so that we can replace our use of binaryen to use our own assembler instead.
We should have a wasm->wat serializer. Probably a good way to test
that the wat->wasm parser for reftypes is working: take
basic-test.wasm
, parse to records via parse-wasm
, then take it to
wat and then back to wasm and binary.
It would be nice to add a "symbolizer" pass for parsed wasm files.
That way you can write some of the standard library in wat, use
binaryen to compile, optimize, and validate it, and when you want to
pull in parts of the standard library you can just parse the compiled
wasm via compute-stdlib
in (hoot compile)
. To do this though,
intramodule references need to be symbolic (by-name) rather than by
index, to allow the module to be picked apart and combined with the
generated wasm.
The WebAssembly tool conventions define a way to serialize names to object files. Probably we should do this, for debuggability.
Currently the subset of the Scheme language that is supported is
quite minimal. We need to expand this. See (hoot
compile), anywhere it says
"unimplemented", and add tests to
test-constants.scm
or some other file
there.