12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046104710481049105010511052105310541055105610571058105910601061106210631064106510661067106810691070107110721073107410751076107710781079108010811082108310841085108610871088108910901091109210931094109510961097109810991100110111021103110411051106110711081109111011111112111311141115111611171118111911201121112211231124112511261127112811291130113111321133113411351136113711381139114011411142114311441145114611471148114911501151115211531154115511561157115811591160116111621163116411651166116711681169117011711172117311741175117611771178117911801181118211831184118511861187118811891190119111921193119411951196119711981199120012011202120312041205120612071208120912101211121212131214121512161217121812191220122112221223122412251226122712281229123012311232123312341235123612371238123912401241124212431244124512461247124812491250125112521253125412551256125712581259126012611262126312641265126612671268126912701271127212731274127512761277127812791280128112821283128412851286128712881289129012911292129312941295129612971298129913001301130213031304130513061307130813091310131113121313131413151316131713181319132013211322132313241325132613271328132913301331133213331334133513361337133813391340134113421343134413451346134713481349135013511352135313541355135613571358135913601361136213631364136513661367136813691370137113721373137413751376137713781379138013811382138313841385138613871388138913901391139213931394139513961397139813991400140114021403140414051406140714081409 |
- @c -*-texinfo-*-
- @c This is part of the GNU Guile Reference Manual.
- @c Copyright (C) 2008-2016, 2018, 2020
- @c Free Software Foundation, Inc.
- @c See the file guile.texi for copying conditions.
- @node Compiling to the Virtual Machine
- @section Compiling to the Virtual Machine
- Compilers! The word itself inspires excitement and awe, even among
- experienced practitioners. But a compiler is just a program: an
- eminently hackable thing. This section aims to describe Guile's
- compiler in such a way that interested Scheme hackers can feel
- comfortable reading and extending it.
- @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
- know how to compile your @code{.scm} file.
- @menu
- * Compiler Tower::
- * The Scheme Compiler::
- * Tree-IL::
- * Continuation-Passing Style::
- * Bytecode::
- * Writing New High-Level Languages::
- * Extending the Compiler::
- @end menu
- @node Compiler Tower
- @subsection Compiler Tower
- Guile's compiler is quite simple -- its @emph{compilers}, to put it more
- accurately. Guile defines a tower of languages, starting at Scheme and
- progressively simplifying down to languages that resemble the VM
- instruction set (@pxref{Instruction Set}).
- Each language knows how to compile to the next, so each step is simple
- and understandable. Furthermore, this set of languages is not hardcoded
- into Guile, so it is possible for the user to add new high-level
- languages, new passes, or even different compilation targets.
- Languages are registered in the module, @code{(system base language)}:
- @example
- (use-modules (system base language))
- @end example
- They are registered with the @code{define-language} form.
- @deffn {Scheme Syntax} define-language @
- [#:name] [#:title] [#:reader] [#:printer] @
- [#:parser=#f] [#:compilers='()] @
- [#:decompilers='()] [#:evaluator=#f] @
- [#:joiner=#f] [#:for-humans?=#t] @
- [#:make-default-environment=make-fresh-user-module] @
- [#:lowerer=#f] [#:analyzer=#f] [#:compiler-chooser=#f]
- Define a language.
- This syntax defines a @code{<language>} object, bound to @var{name} in
- the current environment. In addition, the language will be added to the
- global language set. For example, this is the language definition for
- Scheme:
- @example
- (define-language scheme
- #:title "Scheme"
- #:reader (lambda (port env) ...)
- #:compilers `((tree-il . ,compile-tree-il))
- #:decompilers `((tree-il . ,decompile-tree-il))
- #:evaluator (lambda (x module) (primitive-eval x))
- #:printer write
- #:make-default-environment (lambda () ...))
- @end example
- @end deffn
- The interesting thing about having languages defined this way is that
- they present a uniform interface to the read-eval-print loop. This
- allows the user to change the current language of the REPL:
- @example
- scheme@@(guile-user)> ,language tree-il
- Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
- tree-il@@(guile-user)> ,L scheme
- Happy hacking with Scheme! To switch back, type `,L tree-il'.
- scheme@@(guile-user)>
- @end example
- Languages can be looked up by name, as they were above.
- @deffn {Scheme Procedure} lookup-language name
- Looks up a language named @var{name}, autoloading it if necessary.
- Languages are autoloaded by looking for a variable named @var{name} in
- a module named @code{(language @var{name} spec)}.
- The language object will be returned, or @code{#f} if there does not
- exist a language with that name.
- @end deffn
- When Guile goes to compile Scheme to bytecode, it will ask the Scheme
- language to choose a compiler from Scheme to the next language on the
- path from Scheme to bytecode. Performing this computation recursively
- builds transformations from a flexible chain of compilers. The next
- link will be obtained by invoking the language's compiler chooser, or if
- not present, from the language's compilers field.
- A language can specify an analyzer, which is run before a term of that
- language is lowered and compiled. This is where compiler warnings are
- issued.
- If a language specifies a lowerer, that procedure is called on
- expressions before compilation. This is where optimizations and
- canonicalizations go.
- Finally a language's compiler translates a lowered term from one
- language to the next one in the chain.
- There is a notion of a ``current language'', which is maintained in the
- @code{current-language} parameter, defined in the core @code{(guile)}
- module. This language is normally Scheme, and may be rebound by the
- user. The run-time compilation interfaces
- (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
- and target languages.
- The normal tower of languages when compiling Scheme goes like this:
- @itemize
- @item Scheme
- @item Tree Intermediate Language (Tree-IL)
- @item Continuation-Passing Style (CPS)
- @item Bytecode
- @end itemize
- As discussed before (@pxref{Object File Format}), bytecode is in ELF
- format, ready to be serialized to disk. But when compiling Scheme at
- run time, you want a Scheme value: for example, a compiled procedure.
- For this reason, so as not to break the abstraction, Guile defines a
- fake language at the bottom of the tower:
- @itemize
- @item Value
- @end itemize
- Compiling to @code{value} loads the bytecode into a procedure, turning
- cold bytes into warm code.
- Perhaps this strangeness can be explained by example:
- @code{compile-file} defaults to compiling to bytecode, because it
- produces object code that has to live in the barren world outside the
- Guile runtime; but @code{compile} defaults to compiling to @code{value},
- as its product re-enters the Guile world.
- @c FIXME: This doesn't work anymore :( Should we add some kind of
- @c special GC pass, or disclaim this kind of code, or what?
- Indeed, the process of compilation can circulate through these
- different worlds indefinitely, as shown by the following quine:
- @example
- ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
- @end example
- @node The Scheme Compiler
- @subsection The Scheme Compiler
- The job of the Scheme compiler is to expand all macros and all of Scheme
- to its most primitive expressions. The definition of ``primitive
- expression'' is given by the inventory of constructs provided by
- Tree-IL, the target language of the Scheme compiler: procedure calls,
- conditionals, lexical references, and so on. This is described more
- fully in the next section.
- The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
- that it is completely implemented by the macro expander. Since the
- macro expander has to run over all of the source code already in order
- to expand macros, it might as well do the analysis at the same time,
- producing Tree-IL expressions directly.
- Because this compiler is actually the macro expander, it is extensible.
- Any macro which the user writes becomes part of the compiler.
- The Scheme-to-Tree-IL expander may be invoked using the generic
- @code{compile} procedure:
- @lisp
- (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
- @result{}
- #<tree-il (call (toplevel +) (const 1) (const 2))>
- @end lisp
- @code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
- equivalent to calling the macro expander as @code{(macroexpand @var{foo}
- 'c '(compile load eval))}. @xref{Macro Expansion}.
- @code{compile-tree-il}, the procedure dispatched by @code{compile} to
- @code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
- it conform to the general form of compiler procedures in Guile's
- language tower.
- Compiler procedures take three arguments: an expression, an
- environment, and a keyword list of options. They return three values:
- the compiled expression, the corresponding environment for the target
- language, and a ``continuation environment''. The compiled expression
- and environment will serve as input to the next language's compiler.
- The ``continuation environment'' can be used to compile another
- expression from the same source language within the same module.
- For example, you might compile the expression, @code{(define-module
- (foo))}. This will result in a Tree-IL expression and environment. But
- if you compiled a second expression, you would want to take into account
- the compile-time effect of compiling the previous expression, which puts
- the user in the @code{(foo)} module. That is the purpose of the
- ``continuation environment''; you would pass it as the environment when
- compiling the subsequent expression.
- For Scheme, an environment is a module. By default, the @code{compile}
- and @code{compile-file} procedures compile in a fresh module, such
- that bindings and macros introduced by the expression being compiled
- are isolated:
- @example
- (eq? (current-module) (compile '(current-module)))
- @result{} #f
- (compile '(define hello 'world))
- (defined? 'hello)
- @result{} #f
- (define / *)
- (eq? (compile '/) /)
- @result{} #f
- @end example
- Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
- @code{current-reader}}) are isolated:
- @example
- (compile '(fluid-set! current-reader (lambda args 'fail)))
- (fluid-ref current-reader)
- @result{} #f
- @end example
- Nevertheless, having the compiler and @dfn{compilee} share the same name
- space can be achieved by explicitly passing @code{(current-module)} as
- the compilation environment:
- @example
- (define hello 'world)
- (compile 'hello #:env (current-module))
- @result{} world
- @end example
- @node Tree-IL
- @subsection Tree-IL
- Tree Intermediate Language (Tree-IL) is a structured intermediate
- language that is close in expressive power to Scheme. It is an
- expanded, pre-analyzed Scheme.
- Tree-IL is ``structured'' in the sense that its representation is
- based on records, not S-expressions. This gives a rigidity to the
- language that ensures that compiling to a lower-level language only
- requires a limited set of transformations. For example, the Tree-IL
- type @code{<const>} is a record type with two fields, @code{src} and
- @code{exp}. Instances of this type are created via @code{make-const}.
- Fields of this type are accessed via the @code{const-src} and
- @code{const-exp} procedures. There is also a predicate, @code{const?}.
- @xref{Records}, for more information on records.
- @c alpha renaming
- All Tree-IL types have a @code{src} slot, which holds source location
- information for the expression. This information, if present, will be
- residualized into the compiled object code, allowing backtraces to
- show source information. The format of @code{src} is the same as that
- returned by Guile's @code{source-properties} function. @xref{Source
- Properties}, for more information.
- Although Tree-IL objects are represented internally using records,
- there is also an equivalent S-expression external representation for
- each kind of Tree-IL. For example, the S-expression representation
- of @code{#<const src: #f exp: 3>} expression would be:
- @example
- (const 3)
- @end example
- Users may program with this format directly at the REPL:
- @example
- scheme@@(guile-user)> ,language tree-il
- Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
- tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
- @result{} 42
- @end example
- The @code{src} fields are left out of the external representation.
- One may create Tree-IL objects from their external representations via
- calling @code{parse-tree-il}, the reader for Tree-IL. If any source
- information is attached to the input S-expression, it will be
- propagated to the resulting Tree-IL expressions. This is probably the
- easiest way to compile to Tree-IL: just make the appropriate external
- representations in S-expression format, and let @code{parse-tree-il}
- take care of the rest.
- @deftp {Scheme Variable} <void> src
- @deftpx {External Representation} (void)
- An empty expression. In practice, equivalent to Scheme's @code{(if #f
- #f)}.
- @end deftp
- @deftp {Scheme Variable} <const> src exp
- @deftpx {External Representation} (const @var{exp})
- A constant.
- @end deftp
- @deftp {Scheme Variable} <primitive-ref> src name
- @deftpx {External Representation} (primitive @var{name})
- A reference to a ``primitive''. A primitive is a procedure that, when
- compiled, may be open-coded. For example, @code{cons} is usually
- recognized as a primitive, so that it compiles down to a single
- instruction.
- Compilation of Tree-IL usually begins with a pass that resolves some
- @code{<module-ref>} and @code{<toplevel-ref>} expressions to
- @code{<primitive-ref>} expressions. The actual compilation pass has
- special cases for calls to certain primitives, like @code{apply} or
- @code{cons}.
- @end deftp
- @deftp {Scheme Variable} <lexical-ref> src name gensym
- @deftpx {External Representation} (lexical @var{name} @var{gensym})
- A reference to a lexically-bound variable. The @var{name} is the
- original name of the variable in the source program. @var{gensym} is a
- unique identifier for this variable.
- @end deftp
- @deftp {Scheme Variable} <lexical-set> src name gensym exp
- @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
- Sets a lexically-bound variable.
- @end deftp
- @deftp {Scheme Variable} <module-ref> src mod name public?
- @deftpx {External Representation} (@@ @var{mod} @var{name})
- @deftpx {External Representation} (@@@@ @var{mod} @var{name})
- A reference to a variable in a specific module. @var{mod} should be
- the name of the module, e.g.@: @code{(guile-user)}.
- If @var{public?} is true, the variable named @var{name} will be looked
- up in @var{mod}'s public interface, and serialized with @code{@@};
- otherwise it will be looked up among the module's private bindings,
- and is serialized with @code{@@@@}.
- @end deftp
- @deftp {Scheme Variable} <module-set> src mod name public? exp
- @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
- @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
- Sets a variable in a specific module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-ref> src name
- @deftpx {External Representation} (toplevel @var{name})
- References a variable from the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-set> src name exp
- @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
- Sets a variable in the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-define> src name exp
- @deftpx {External Representation} (define @var{name} @var{exp})
- Defines a new top-level variable in the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <conditional> src test then else
- @deftpx {External Representation} (if @var{test} @var{then} @var{else})
- A conditional. Note that @var{else} is not optional.
- @end deftp
- @deftp {Scheme Variable} <call> src proc args
- @deftpx {External Representation} (call @var{proc} . @var{args})
- A procedure call.
- @end deftp
- @deftp {Scheme Variable} <primcall> src name args
- @deftpx {External Representation} (primcall @var{name} . @var{args})
- A call to a primitive. Equivalent to @code{(call (primitive @var{name})
- . @var{args})}. This construct is often more convenient to generate and
- analyze than @code{<call>}.
- As part of the compilation process, instances of @code{(call (primitive
- @var{name}) . @var{args})} are transformed into primcalls.
- @end deftp
- @deftp {Scheme Variable} <seq> src head tail
- @deftpx {External Representation} (seq @var{head} @var{tail})
- A sequence. The semantics is that @var{head} is evaluated first, and
- any resulting values are ignored. Then @var{tail} is evaluated, in tail
- position.
- @end deftp
- @deftp {Scheme Variable} <lambda> src meta body
- @deftpx {External Representation} (lambda @var{meta} @var{body})
- A closure. @var{meta} is an association list of properties for the
- procedure. @var{body} is a single Tree-IL expression of type
- @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
- an alternate clause, this makes Tree-IL's @code{<lambda>} have the
- expressiveness of Scheme's @code{case-lambda}.
- @end deftp
- @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
- @deftpx {External Representation} @
- (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
- @var{body})@
- [@var{alternate}])
- One clause of a @code{case-lambda}. A @code{lambda} expression in
- Scheme is treated as a @code{case-lambda} with one clause.
- @var{req} is a list of the procedure's required arguments, as symbols.
- @var{opt} is a list of the optional arguments, or @code{#f} if there
- are no optional arguments. @var{rest} is the name of the rest
- argument, or @code{#f}.
- @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
- (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
- keyword corresponding to the argument named @var{name}, and whose
- corresponding gensym is @var{var}, or @code{#f} if there are no keyword
- arguments. @var{inits} are tree-il expressions corresponding to all of
- the optional and keyword arguments, evaluated to bind variables whose
- value is not supplied by the procedure caller. Each @var{init}
- expression is evaluated in the lexical context of previously bound
- variables, from left to right.
- @var{gensyms} is a list of gensyms corresponding to all arguments:
- first all of the required arguments, then the optional arguments if
- any, then the rest argument if any, then all of the keyword arguments.
- @var{body} is the body of the clause. If the procedure is called with
- an appropriate number of arguments, @var{body} is evaluated in tail
- position. Otherwise, if there is an @var{alternate}, it should be a
- @code{<lambda-case>} expression, representing the next clause to try.
- If there is no @var{alternate}, a wrong-number-of-arguments error is
- signaled.
- @end deftp
- @deftp {Scheme Variable} <let> src names gensyms vals exp
- @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
- Lexical binding, like Scheme's @code{let}. @var{names} are the original
- binding names, @var{gensyms} are gensyms corresponding to the
- @var{names}, and @var{vals} are Tree-IL expressions for the values.
- @var{exp} is a single Tree-IL expression.
- @end deftp
- @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
- @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
- @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
- A version of @code{<let>} that creates recursive bindings, like
- Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
- @end deftp
- @deftp {Scheme Variable} <prompt> escape-only? tag body handler
- @deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
- A dynamic prompt. Instates a prompt named @var{tag}, an expression,
- during the dynamic extent of the execution of @var{body}, also an
- expression. If an abort occurs to this prompt, control will be passed
- to @var{handler}, also an expression, which should be a procedure. The
- first argument to the handler procedure will be the captured
- continuation, followed by all of the values passed to the abort. If
- @var{escape-only?} is true, the handler should be a @code{<lambda>} with
- a single @code{<lambda-case>} body expression with no optional or
- keyword arguments, and no alternate, and whose first argument is
- unreferenced. @xref{Prompts}, for more information.
- @end deftp
- @deftp {Scheme Variable} <abort> tag args tail
- @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
- An abort to the nearest prompt with the name @var{tag}, an expression.
- @var{args} should be a list of expressions to pass to the prompt's
- handler, and @var{tail} should be an expression that will evaluate to
- a list of additional arguments. An abort will save the partial
- continuation, which may later be reinstated, resulting in the
- @code{<abort>} expression evaluating to some number of values.
- @end deftp
- There are two Tree-IL constructs that are not normally produced by
- higher-level compilers, but instead are generated during the
- source-to-source optimization and analysis passes that the Tree-IL
- compiler does. Users should not generate these expressions directly,
- unless they feel very clever, as the default analysis pass will generate
- them as necessary.
- @deftp {Scheme Variable} <let-values> src names gensyms exp body
- @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
- Like Scheme's @code{receive} -- binds the values returned by
- evaluating @code{exp} to the @code{lambda}-like bindings described by
- @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
- @code{<let-values>} is an optimization of a @code{<call>} to the
- primitive, @code{call-with-values}.
- @end deftp
- @deftp {Scheme Variable} <fix> src names gensyms vals body
- @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
- Like @code{<letrec>}, but only for @var{vals} that are unset
- @code{lambda} expressions.
- @code{fix} is an optimization of @code{letrec} (and @code{let}).
- @end deftp
- Tree-IL is a convenient compilation target from source languages. It
- can be convenient as a medium for optimization, though CPS is usually
- better. The strength of Tree-IL is that it does not fix order of
- evaluation, so it makes some code motion a bit easier.
- Optimization passes performed on Tree-IL currently include:
- @itemize
- @item Open-coding (turning toplevel-refs into primitive-refs,
- and calls to primitives to primcalls)
- @item Partial evaluation (comprising inlining, copy propagation, and
- constant folding)
- @end itemize
- @node Continuation-Passing Style
- @subsection Continuation-Passing Style
- @cindex CPS
- Continuation-passing style (CPS) is Guile's principal intermediate
- language, bridging the gap between languages for people and languages
- for machines. CPS gives a name to every part of a program: every
- control point, and every intermediate value. This makes it an excellent
- medium for reasoning about programs, which is the principal job of a
- compiler.
- @menu
- * An Introduction to CPS::
- * CPS in Guile::
- * Building CPS::
- * CPS Soup::
- * Compiling CPS::
- @end menu
- @node An Introduction to CPS
- @subsubsection An Introduction to CPS
- Consider the following Scheme expression:
- @lisp
- (begin
- (display "The sum of 32 and 10 is: ")
- (display 42)
- (newline))
- @end lisp
- Let us identify all of the sub-expressions in this expression,
- annotating them with unique labels:
- @lisp
- (begin
- (display "The sum of 32 and 10 is: ")
- |k1 k2
- k0
- (display 42)
- |k4 k5
- k3
- (newline))
- |k7
- k6
- @end lisp
- Each of these labels identifies a point in a program. One label may be
- the continuation of another label. For example, the continuation of
- @code{k7} is @code{k6}. This is because after evaluating the value of
- @code{newline}, performed by the expression labelled @code{k7}, we
- continue to apply it in @code{k6}.
- Which expression has @code{k0} as its continuation? It is either the
- expression labelled @code{k1} or the expression labelled @code{k2}.
- Scheme does not have a fixed order of evaluation of arguments, though it
- does guarantee that they are evaluated in some order. Unlike general
- Scheme, continuation-passing style makes evaluation order explicit. In
- Guile, this choice is made by the higher-level language compilers.
- Let us assume a left-to-right evaluation order. In that case the
- continuation of @code{k1} is @code{k2}, and the continuation of
- @code{k2} is @code{k0}.
- With this example established, we are ready to give an example of CPS in
- Scheme:
- @smalllisp
- (lambda (ktail)
- (let ((k1 (lambda ()
- (let ((k2 (lambda (proc)
- (let ((k0 (lambda (arg0)
- (proc k4 arg0))))
- (k0 "The sum of 32 and 10 is: ")))))
- (k2 display))))
- (k4 (lambda _
- (let ((k5 (lambda (proc)
- (let ((k3 (lambda (arg0)
- (proc k7 arg0))))
- (k3 42)))))
- (k5 display))))
- (k7 (lambda _
- (let ((k6 (lambda (proc)
- (proc ktail))))
- (k6 newline)))))
- (k1))
- @end smalllisp
- Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
- is by nature much more verbose than ``direct-style'' intermediate
- languages like Tree-IL. At the same time, CPS is simpler than full
- Scheme, because it makes things more explicit.
- In the original program, the expression labelled @code{k0} is in effect
- context. Any values it returns are ignored. In Scheme, this fact is
- implicit. In CPS, we can see it explicitly by noting that its
- continuation, @code{k4}, takes any number of values and ignores them.
- Compare this to @code{k2}, which takes a single value; in this way we
- can say that @code{k1} is in a ``value'' context. Likewise @code{k6} is
- in tail context with respect to the expression as a whole, because its
- continuation is the tail continuation, @code{ktail}. CPS makes these
- details manifest, and gives them names.
- @node CPS in Guile
- @subsubsection CPS in Guile
- @cindex continuation, CPS
- Guile's CPS language is composed of @dfn{continuations}. A continuation
- is a labelled program point. If you are used to traditional compilers,
- think of a continuation as a trivial basic block. A program is a
- ``soup'' of continuations, represented as a map from labels to
- continuations.
- @cindex term, CPS
- @cindex expression, CPS
- Like basic blocks, each continuation belongs to only one function. Some
- continuations are special, like the continuation corresponding to a
- function's entry point, or the continuation that represents the tail of
- a function. Others contain a @dfn{term}. A term contains an
- @dfn{expression}, which evaluates to zero or more values. The term also
- records the continuation to which it will pass its values. Some terms,
- like conditional branches, may continue to one of a number of
- continuations.
- Continuation labels are small integers. This makes it easy to sort them
- and to group them into sets. Whenever a term refers to a continuation,
- it does so by name, simply recording the label of the continuation.
- Continuation labels are unique among the set of labels in a program.
- Variables are also named by small integers. Variable names are unique
- among the set of variables in a program.
- For example, a simple continuation that receives two values and adds
- them together can be matched like this, using the @code{match} form from
- @code{(ice-9 match)}:
- @smallexample
- (match cont
- (($ $kargs (x-name y-name) (x-var y-var)
- ($ $continue k src ($ $primcall '+ #f (x-var y-var))))
- (format #t "Add ~a and ~a and pass the result to label ~a"
- x-var y-var k)))
- @end smallexample
- Here we see the most common kind of continuation, @code{$kargs}, which
- binds some number of values to variables and then evaluates a term.
- @deftp {CPS Continuation} $kargs names vars term
- Bind the incoming values to the variables @var{vars}, with original
- names @var{names}, and then evaluate @var{term}.
- @end deftp
- The @var{names} of a @code{$kargs} are just for debugging, and will end
- up residualized in the object file for use by the debugger.
- The @var{term} in a @code{$kargs} is always a @code{$continue}, which
- evaluates an expression and continues to a continuation.
- @deftp {CPS Term} $continue k src exp
- Evaluate the expression @var{exp} and pass the resulting values (if any)
- to the continuation labelled @var{k}. The source information associated
- with the expression may be found in @var{src}, which is either an alist
- as in @code{source-properties} or is @code{#f} if there is no associated
- source.
- @end deftp
- There are a number of expression kinds. Above you see an example of
- @code{$primcall}.
- @deftp {CPS Expression} $primcall name param args
- Perform the primitive operation identified by @code{name}, a well-known
- symbol, passing it the arguments @var{args}, and pass all resulting
- values to the continuation.
- @var{param} is a constant parameter whose interpretation is up to the
- primcall in question. Usually it's @code{#f} but for a primcall that
- might need some compile-time constant information -- such as
- @code{add/immediate}, which adds a constant number to a value -- the
- parameter holds this information.
- The set of available primitives includes many primitives known to
- Tree-IL and then some more; see the source code for details. Note that
- some Tree-IL primcalls need to be converted to a sequence of lower-level
- CPS primcalls. Again, see @code{(language tree-il compile-cps)} for
- full details.
- @end deftp
- @cindex dominate, CPS
- The variables that are used by @code{$primcall}, or indeed by any
- expression, must be defined before the expression is evaluated. An
- equivalent way of saying this is that predecessor @code{$kargs}
- continuation(s) that bind the variables(s) used by the expression must
- @dfn{dominate} the continuation that uses the expression: definitions
- dominate uses. This condition is trivially satisfied in our example
- above, but in general to determine the set of variables that are in
- ``scope'' for a given term, you need to do a flow analysis to see what
- continuations dominate a term. The variables that are in scope are
- those variables defined by the continuations that dominate a term.
- Here is an inventory of the kinds of expressions in Guile's CPS
- language, besides @code{$primcall} which has already been described.
- Recall that all expressions are wrapped in a @code{$continue} term which
- specifies their continuation.
- @deftp {CPS Expression} $const val
- Continue with the constant value @var{val}.
- @end deftp
- @deftp {CPS Expression} $prim name
- Continue with the procedure that implements the primitive operation
- named by @var{name}.
- @end deftp
- @deftp {CPS Expression} $call proc args
- Call @var{proc} with the arguments @var{args}, and pass all values to
- the continuation. @var{proc} and the elements of the @var{args} list
- should all be variable names. The continuation identified by the term's
- @var{k} should be a @code{$kreceive} or a @code{$ktail} instance.
- @end deftp
- @deftp {CPS Expression} $values args
- Pass the values named by the list @var{args} to the continuation.
- @end deftp
- @deftp {CPS Expression} $prompt escape? tag handler
- @end deftp
- @cindex higher-order CPS
- @cindex CPS, higher-order
- @cindex first-order CPS
- @cindex CPS, first-order
- There are two sub-languages of CPS, @dfn{higher-order CPS} and
- @dfn{first-order CPS}. The difference is that in higher-order CPS,
- there are @code{$fun} and @code{$rec} expressions that bind functions or
- mutually-recursive functions in the implicit scope of their use sites.
- Guile transforms higher-order CPS into first-order CPS by @dfn{closure
- conversion}, which chooses representations for all closures and which
- arranges to access free variables through the implicit closure parameter
- that is passed to every function call.
- @deftp {CPS Expression} $fun body
- Continue with a procedure. @var{body} names the entry point of the
- function, which should be a @code{$kfun}. This expression kind is only
- valid in higher-order CPS, which is the CPS language before closure
- conversion.
- @end deftp
- @deftp {CPS Expression} $rec names vars funs
- Continue with a set of mutually recursive procedures denoted by
- @var{names}, @var{vars}, and @var{funs}. @var{names} is a list of
- symbols, @var{vars} is a list of variable names (unique integers), and
- @var{funs} is a list of @code{$fun} values. Note that the @code{$kargs}
- continuation should also define @var{names}/@var{vars} bindings.
- @end deftp
- The contification pass will attempt to transform the functions declared
- in a @code{$rec} into local continuations. Any remaining @code{$fun}
- instances are later removed by the closure conversion pass. If the
- function has no free variables, it gets allocated as a constant.
- @deftp {CPS Expression} $const-fun label
- A constant which is a function whose entry point is @var{label}. As a
- constant, instances of @code{$const-fun} with the same @var{label} will
- not allocate; the space for the function is allocated as part of the
- compilation unit.
- In practice, @code{$const-fun} expressions are reified by CPS-conversion
- for functions whose call sites are not all visible within the
- compilation unit and which have no free variables. This expression kind
- is part of first-order CPS.
- @end deftp
- Otherwise, if the closure has free variables, it will be allocated at
- its definition site via an @code{allocate-words} primcall and its free
- variables initialized there. The code pointer in the closure is
- initialized from a @code{$code} expression.
- @deftp {CPS Expression} $code label
- Continue with the value of @var{label}, which should denote some
- @code{$kfun} continuation in the program. Used when initializing the
- code pointer of closure objects.
- @end deftp
- However, If the closure can be proven to never escape its scope then
- other lighter-weight representations can be chosen. Additionally, if
- all call sites are known, closure conversion will hard-wire the calls by
- lowering @code{$call} to @code{$callk}.
- @deftp {CPS Expression} $callk label proc args
- Like @code{$call}, but for the case where the call target is known to be
- in the same compilation unit. @var{label} should denote some
- @code{$kfun} continuation in the program. In this case the @var{proc}
- is simply an additional argument, since it is not used to determine the
- call target at run-time.
- @end deftp
- To summarize: a @code{$continue} is a CPS term that continues to a
- single label. But there are other kinds of CPS terms that can continue
- to a different number of labels: @code{$branch}, @code{$switch},
- @code{$throw}, and @code{$prompt}.
- @deftp {CPS Term} $branch kf kt src op param args
- Evaluate the branching primcall @var{op}, with arguments @var{args} and
- constant parameter @var{param}, and continue to @var{kt} with zero
- values if the test is true. Otherwise continue to @var{kf}.
- The @code{$branch} term is like a @code{$continue} term with a
- @code{$primcall} expression, except that instead of binding a value and
- continuing to a single label, the result of the test is not bound but
- instead used to choose the continuation label.
- The set of operations (corresponding to @var{op} values) that are valid
- in a @var{$branch} is limited. In the general case, bind the result of
- a test expression to a variable, and then make a @code{$branch} on a
- @code{true?} op referencing that variable. The optimizer should inline
- the branch if possible.
- @end deftp
- @deftp {CPS Term} $switch kf kt* src arg
- Continue to a label in the list @var{k*} according to the index argument
- @var{arg}, or to the default continuation @var{kf} if @var{arg} is
- greater than or equal to the length @var{k*}. The index variable
- @var{arg} is an unboxed, unsigned 64-bit value.
- The @code{$switch} term is like C's @code{switch} statement. The
- compiler to CPS can generate a @code{$switch} term directly, if the
- source language has such a concept, or it can rely on the CPS optimizer
- to turn appropriate chains of @code{$branch} statements to
- @code{$switch} instances, which is what the Scheme compiler does.
- @end deftp
- @deftp {CPS Term} $throw src op param args
- Throw a non-resumable exception. Throw terms do not continue at all.
- The usual value of @var{op} is @code{throw}, with two arguments
- @var{key} and @var{args}. There are also some specific primcalls that
- compile to the VM @code{throw/value} and @code{throw/value+data}
- instructions; see the code for full details.
- The advantage of having @code{$throw} as a term is that, because it does
- not continue, this allows the optimizer to gather more information from
- type predicates. For example, if the predicate is @code{char?} and the
- @var{kf} continues to a throw, the set of labels dominated by @var{kt}
- is larger than if the throw notationally continued to some label that
- would never be reached by the throw.
- @end deftp
- @deftp {CPS Term} $prompt k kh src escape? tag
- Push a prompt on the stack identified by the variable name @var{tag},
- which may be escape-only if @var{escape?} is true, and continue to
- @var{kh} with zero values. If the body aborts to this prompt, control
- will proceed at the continuation labelled @var{kh}, which should be a
- @code{$kreceive} continuation. Prompts are later popped by
- @code{pop-prompt} primcalls.
- @end deftp
- At this point we have described terms, expressions, and the most common
- kind of continuation, @code{$kargs}. @code{$kargs} is used when the
- predecessors of the continuation can be instructed to pass the values
- where the continuation wants them. For example, if a @code{$kargs}
- continuation @var{k} binds a variable @var{v}, and the compiler decides
- to allocate @var{v} to slot 6, all predecessors of @var{k} should put
- the value for @var{v} in slot 6 before jumping to @var{k}. One
- situation in which this isn't possible is receiving values from function
- calls. Guile has a calling convention for functions which currently
- places return values on the stack. A continuation of a call must check
- that the number of values returned from a function matches the expected
- number of values, and then must shuffle or collect those values to named
- variables. @code{$kreceive} denotes this kind of continuation.
- @deftp {CPS Continuation} $kreceive arity k
- Receive values on the stack. Parse them according to @var{arity}, and
- then proceed with the parsed values to the @code{$kargs} continuation
- labelled @var{k}. As a limitation specific to @code{$kreceive},
- @var{arity} may only contain required and rest arguments.
- @end deftp
- @code{$arity} is a helper data structure used by @code{$kreceive} and
- also by @code{$kclause}, described below.
- @deftp {CPS Data} $arity req opt rest kw allow-other-keys?
- A data type declaring an arity. @var{req} and @var{opt} are lists of
- source names of required and optional arguments, respectively.
- @var{rest} is either the source name of the rest variable, or @code{#f}
- if this arity does not accept additional values. @var{kw} is a list of
- the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing
- the keyword arguments. @var{allow-other-keys?} is true if other keyword
- arguments are allowed and false otherwise.
- Note that all of these names with the exception of the @var{var}s in the
- @var{kw} list are source names, not unique variable names.
- @end deftp
- Additionally, there are three specific kinds of continuations that are
- only used in function entries.
- @deftp {CPS Continuation} $kfun src meta self tail clause
- Declare a function entry. @var{src} is the source information for the
- procedure declaration, and @var{meta} is the metadata alist as described
- above in Tree-IL's @code{<lambda>}. @var{self} is a variable bound to
- the procedure being called, and which may be used for self-references.
- @var{tail} is the label of the @code{$ktail} for this function,
- corresponding to the function's tail continuation. @var{clause} is the
- label of the first @code{$kclause} for the first @code{case-lambda}
- clause in the function, or otherwise @code{#f}.
- @end deftp
- @deftp {CPS Continuation} $ktail
- A tail continuation.
- @end deftp
- @deftp {CPS Continuation} $kclause arity cont alternate
- A clause of a function with a given arity. Applications of a function
- with a compatible set of actual arguments will continue to the
- continuation labelled @var{cont}, a @code{$kargs} instance representing
- the clause body. If the arguments are incompatible, control proceeds to
- @var{alternate}, which is a @code{$kclause} for the next clause, or
- @code{#f} if there is no next clause.
- @end deftp
- @node Building CPS
- @subsubsection Building CPS
- Unlike Tree-IL, the CPS language is built to be constructed and
- deconstructed with abstract macros instead of via procedural
- constructors or accessors, or instead of S-expression matching.
- Deconstruction and matching is handled adequately by the @code{match}
- form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction
- is handled by a set of mutually builder macros:
- @code{build-term}, @code{build-cont}, and @code{build-exp}.
- In the following interface definitions, consider @code{term} and
- @code{exp} to be built by @code{build-term} or @code{build-exp},
- respectively. Consider any other name to be evaluated as a Scheme
- expression. Many of these forms recognize @code{unquote} in some
- contexts, to splice in a previously-built value; see the specifications
- below for full details.
- @deffn {Scheme Syntax} build-term ,val
- @deffnx {Scheme Syntax} build-term ($continue k src exp)
- @deffnx {Scheme Syntax} build-exp ,val
- @deffnx {Scheme Syntax} build-exp ($const val)
- @deffnx {Scheme Syntax} build-exp ($prim name)
- @deffnx {Scheme Syntax} build-exp ($fun kentry)
- @deffnx {Scheme Syntax} build-exp ($const-fun kentry)
- @deffnx {Scheme Syntax} build-exp ($code kentry)
- @deffnx {Scheme Syntax} build-exp ($rec names syms funs)
- @deffnx {Scheme Syntax} build-exp ($call proc (arg ...))
- @deffnx {Scheme Syntax} build-exp ($call proc args)
- @deffnx {Scheme Syntax} build-exp ($callk k proc (arg ...))
- @deffnx {Scheme Syntax} build-exp ($callk k proc args)
- @deffnx {Scheme Syntax} build-exp ($primcall name param (arg ...))
- @deffnx {Scheme Syntax} build-exp ($primcall name param args)
- @deffnx {Scheme Syntax} build-exp ($values (arg ...))
- @deffnx {Scheme Syntax} build-exp ($values args)
- @deffnx {Scheme Syntax} build-exp ($prompt escape? tag handler)
- @deffnx {Scheme Syntax} build-term ($branch kf kt src op param (arg ...))
- @deffnx {Scheme Syntax} build-term ($branch kf kt src op param args)
- @deffnx {Scheme Syntax} build-term ($switch kf kt* src arg)
- @deffnx {Scheme Syntax} build-term ($throw src op param (arg ...))
- @deffnx {Scheme Syntax} build-term ($throw src op param args)
- @deffnx {Scheme Syntax} build-term ($prompt k kh src escape? tag)
- @deffnx {Scheme Syntax} build-cont ,val
- @deffnx {Scheme Syntax} build-cont ($kargs (name ...) (sym ...) term)
- @deffnx {Scheme Syntax} build-cont ($kargs names syms term)
- @deffnx {Scheme Syntax} build-cont ($kreceive req rest kargs)
- @deffnx {Scheme Syntax} build-cont ($kfun src meta self ktail kclause)
- @deffnx {Scheme Syntax} build-cont ($kclause ,arity kbody kalt)
- @deffnx {Scheme Syntax} build-cont ($kclause (req opt rest kw aok?) kbody)
- Construct a CPS term, expression, or continuation.
- @end deffn
- There are a few more miscellaneous interfaces as well.
- @deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords?
- A procedural constructor for @code{$arity} objects.
- @end deffn
- @deffn {Scheme Syntax} rewrite-term val (pat term) ...
- @deffnx {Scheme Syntax} rewrite-exp val (pat exp) ...
- @deffnx {Scheme Syntax} rewrite-cont val (pat cont) ...
- Match @var{val} against the series of patterns @var{pat...}, using
- @code{match}. The body of the matching clause should be a template in
- the syntax of @code{build-term}, @code{build-exp}, or @code{build-cont},
- respectively.
- @end deffn
- @node CPS Soup
- @subsubsection CPS Soup
- We describe programs in Guile's CPS language as being a kind of ``soup''
- because all continuations in the program are mixed into the same
- ``pot'', so to speak, without explicit markers as to what function or
- scope a continuation is in. A program in CPS is a map from continuation
- labels to continuation values. As discussed in the introduction, a
- continuation label is an integer. No label may be negative.
- As a matter of convention, label 0 should map to the @code{$kfun}
- continuation of the entry to the program, which should be a function of
- no arguments. The body of a function consists of the labelled
- continuations that are reachable from the function entry. A program can
- refer to other functions, either via @code{$fun} and @code{$rec} in
- higher-order CPS, or via @code{$const-fun}, @code{$callk}, and allocated
- closures in first-order CPS. The program logically contains all
- continuations of all functions reachable from the entry function. A
- compiler pass may leave unreachable continuations in a program;
- subsequent compiler passes should ensure that their transformations and
- analyses only take reachable continuations into account. It's OK though
- if transformation runs over all continuations if including the
- unreachable continuations has no effect on the transformations on the
- live continuations.
- @cindex intmap
- The ``soup'' itself is implemented as an @dfn{intmap}, a functional
- array-mapped trie specialized for integer keys. Intmaps associate
- integers with values of any kind. Currently intmaps are a private data
- structure only used by the CPS phase of the compiler. To work with
- intmaps, load the @code{(language cps intmap)} module:
- @example
- (use-modules (language cps intmap))
- @end example
- Intmaps are functional data structures, so there is no constructor as
- such: one can simply start with the empty intmap and add entries to it.
- @example
- (intmap? empty-intmap) @result{} #t
- (define x (intmap-add empty-intmap 42 "hi"))
- (intmap? x) @result{} #t
- (intmap-ref x 42) @result{} "hi"
- (intmap-ref x 43) @result{} @i{error: 43 not present}
- (intmap-ref x 43 (lambda (k) "yo!")) @result{} "yo"
- (intmap-add x 42 "hej") @result{} @i{error: 42 already present}
- @end example
- @code{intmap-ref} and @code{intmap-add} are the core of the intmap
- interface. There is also @code{intmap-replace}, which replaces the
- value associated with a given key, requiring that the key was present
- already, and @code{intmap-remove}, which removes a key from an intmap.
- Intmaps have a tree-like structure that is well-suited to set operations
- such as union and intersection, so there are also the binary
- @code{intmap-union} and @code{intmap-intersect} procedures. If the
- result is equivalent to either argument, that argument is returned
- as-is; in that way, one can detect whether the set operation produced a
- new result simply by checking with @code{eq?}. This makes intmaps
- useful when computing fixed points.
- If a key is present in both intmaps and the associated values are not
- the same in the sense of @code{eq?}, the resulting value is determined
- by a ``meet'' procedure, which is the optional last argument to
- @code{intmap-union}, @code{intmap-intersect}, and also to
- @code{intmap-add}, @code{intmap-replace}, and similar functions. The
- meet procedure will be called with the two values and should return the
- intersected or unioned value in some domain-specific way. If no meet
- procedure is given, the default meet procedure will raise an error.
- To traverse over the set of values in an intmap, there are the
- @code{intmap-next} and @code{intmap-prev} procedures. For example, if
- intmap @var{x} has one entry mapping 42 to some value, we would have:
- @example
- (intmap-next x) @result{} 42
- (intmap-next x 0) @result{} 42
- (intmap-next x 42) @result{} 42
- (intmap-next x 43) @result{} #f
- (intmap-prev x) @result{} 42
- (intmap-prev x 42) @result{} 42
- (intmap-prev x 41) @result{} #f
- @end example
- There is also the @code{intmap-fold} procedure, which folds over keys
- and values in the intmap from lowest to highest value, and
- @code{intmap-fold-right} which does so in the opposite direction. These
- procedures may take up to 3 seed values. The number of values that the
- fold procedure returns is the number of seed values.
- @example
- (define q (intmap-add (intmap-add empty-intmap 1 2) 3 4))
- (intmap-fold acons q '()) @result{} ((3 . 4) (1 . 2))
- (intmap-fold-right acons q '()) @result{} ((1 . 2) (3 . 4))
- @end example
- When an entry in an intmap is updated (removed, added, or changed), a
- new intmap is created that shares structure with the original intmap.
- This operation ensures that the result of existing computations is not
- affected by future computations: no mutation is ever visible to user
- code. This is a great property in a compiler data structure, as it lets
- us hold a copy of a program before a transformation and use it while we
- build a post-transformation program. Updating an intmap is O(log
- @var{n}) in the size of the intmap.
- However, the O(log @var{n}) allocation costs are sometimes too much,
- especially in cases when we know that we can just update the intmap in
- place. As an example, say we have an intmap mapping the integers 1 to
- 100 to the integers 42 to 141. Let's say that we want to transform this
- map by adding 1 to each value. There is already an efficient
- @code{intmap-map} procedure in the @code{(language cps utils}) module,
- but if we didn't know about that we might do:
- @example
- (define (intmap-increment map)
- (let lp ((k 0) (map map))
- (let ((k (intmap-next map k)))
- (if k
- (let ((v (intmap-ref map k)))
- (lp (1+ k) (intmap-replace map k (1+ v))))
- map))))
- @end example
- @cindex intmap, transient
- @cindex transient intmaps
- Observe that the intermediate values created by @code{intmap-replace}
- are completely invisible to the program -- only the last result of
- @code{intmap-replace} value is needed. The rest might as well share
- state with the last one, and we could update in place. Guile allows
- this kind of interface via @dfn{transient intmaps}, inspired by
- Clojure's transient interface (@uref{http://clojure.org/transients}).
- The in-place @code{intmap-add!} and @code{intmap-replace!} procedures
- return transient intmaps. If one of these in-place procedures is called
- on a normal persistent intmap, a new transient intmap is created. This
- is an O(1) operation. In all other respects the interface is like their
- persistent counterparts, @code{intmap-add} and @code{intmap-replace}.
- If an in-place procedure is called on a transient intmap, the intmap is
- mutated in-place and the same value is returned.
- If a persistent operation like @code{intmap-add} is called on a
- transient intmap, the transient's mutable substructure is then marked as
- persistent, and @code{intmap-add} then runs on a new persistent intmap
- sharing structure but not state with the original transient. Mutating a
- transient will cause enough copying to ensure that it can make its
- change, but if part of its substructure is already ``owned'' by it, no
- more copying is needed.
- We can use transients to make @code{intmap-increment} more efficient.
- The two changed elements have been marked @strong{like this}.
- @example
- (define (intmap-increment map)
- (let lp ((k 0) (map map))
- (let ((k (intmap-next map k)))
- (if k
- (let ((v (intmap-ref map k)))
- (lp (1+ k) (@strong{intmap-replace!} map k (1+ v))))
- (@strong{persistent-intmap} map)))))
- @end example
- Be sure to tag the result as persistent using the
- @code{persistent-intmap} procedure to prevent the mutability from
- leaking to other parts of the program. For added paranoia, you could
- call @code{persistent-intmap} on the incoming map, to ensure that if it
- were already transient, that the mutations in the body of
- @code{intmap-increment} wouldn't affect the incoming value.
- In summary, programs in CPS are intmaps whose values are continuations.
- See the source code of @code{(language cps utils)} for a number of
- useful facilities for working with CPS values.
- @node Compiling CPS
- @subsubsection Compiling CPS
- Compiling CPS in Guile has three phases: conversion, optimization, and
- code generation.
- CPS conversion is the process of taking a higher-level language and
- compiling it to CPS. Source languages can do this directly, or they can
- convert to Tree-IL (which is probably easier) and let Tree-IL convert to
- CPS later. Going through Tree-IL has the advantage of running Tree-IL
- optimization passes, like partial evaluation. Also, the compiler from
- Tree-IL to CPS handles assignment conversion, in which assigned local
- variables (in Tree-IL, locals that are @code{<lexical-set>}) are
- converted to being boxed values on the heap. @xref{Variables and the
- VM}.
- After CPS conversion, Guile runs some optimization passes over the CPS.
- Most optimization in Guile is done on the CPS language. The one major
- exception is partial evaluation, which for historic reasons is done on
- Tree-IL.
- The major optimization performed on CPS is contification, in which
- functions that are always called with the same continuation are
- incorporated directly into a function's body. This opens up space for
- more optimizations, and turns procedure calls into @code{goto}. It can
- also make loops out of recursive function nests. Guile also does dead
- code elimination, common subexpression elimination, loop peeling and
- invariant code motion, and range and type inference.
- The rest of the optimization passes are really cleanups and
- canonicalizations. CPS spans the gap between high-level languages and
- low-level bytecodes, which allows much of the compilation process to be
- expressed as source-to-source transformations. Such is the case for
- closure conversion, in which references to variables that are free in a
- function are converted to closure references, and in which functions are
- converted to closures. There are a few more passes to ensure that the
- only primcalls left in the term are those that have a corresponding
- instruction in the virtual machine, and that their continuations expect
- the right number of values.
- Finally, the backend of the CPS compiler emits bytecode for each
- function, one by one. To do so, it determines the set of live variables
- at all points in the function. Using this liveness information, it
- allocates stack slots to each variable, such that a variable can live in
- one slot for the duration of its lifetime, without shuffling. (Of
- course, variables with disjoint lifetimes can share a slot.) Finally
- the backend emits code, typically just one VM instruction, for each
- continuation in the function.
- @node Bytecode
- @subsection Bytecode
- As mentioned before, Guile compiles all code to bytecode, and that
- bytecode is contained in ELF images. @xref{Object File Format}, for
- more on Guile's use of ELF.
- To produce a bytecode image, Guile provides an assembler and a linker.
- The assembler, defined in the @code{(system vm assembler)} module, has a
- relatively straightforward imperative interface. It provides a
- @code{make-assembler} function to instantiate an assembler and a set of
- @code{emit-@var{inst}} procedures to emit instructions of each kind.
- The @code{emit-@var{inst}} procedures are actually generated at
- compile-time from a machine-readable description of the VM. With a few
- exceptions for certain operand types, each operand of an emit procedure
- corresponds to an operand of the corresponding instruction.
- Consider @code{allocate-words}, from @pxref{Memory Access Instructions}.
- It is documented as:
- @deftypefn Instruction {} allocate-words s12:@var{dst} s12:@var{nwords}
- @end deftypefn
- Therefore the emit procedure has the form:
- @deffn {Scheme Procedure} emit-allocate-words asm dst nwords
- @end deffn
- All emit procedure take the assembler as their first argument, and
- return no useful values.
- The argument types depend on the operand types. @xref{Instruction Set}.
- Most are integers within a restricted range, though labels are generally
- expressed as opaque symbols. Besides the emitters that correspond to
- instructions, there are a few additional helpers defined in the
- assembler module.
- @deffn {Scheme Procedure} emit-label asm label
- Define a label at the current program point.
- @end deffn
- @deffn {Scheme Procedure} emit-source asm source
- Associate @var{source} with the current program point.
- @end deffn
- @deffn {Scheme Procedure} emit-cache-ref asm dst key
- @deffnx {Scheme Procedure} emit-cache-set! asm key val
- Macro-instructions to implement compilation-unit caches. A single cache
- cell corresponding to @var{key} will be allocated for the compilation
- unit.
- @end deffn
- @deffn {Scheme Procedure} emit-load-constant asm dst constant
- Load the Scheme datum @var{constant} into @var{dst}.
- @end deffn
- @deffn {Scheme Procedure} emit-begin-program asm label properties
- @deffnx {Scheme Procedure} emit-end-program asm
- Delimit the bounds of a procedure, with the given @var{label} and the
- metadata @var{properties}.
- @end deffn
- @deffn {Scheme Procedure} emit-load-static-procedure asm dst label
- Load a procedure with the given @var{label} into local @var{dst}. This
- macro-instruction should only be used with procedures without free
- variables -- procedures that are not closures.
- @end deffn
- @deffn {Scheme Procedure} emit-begin-standard-arity asm req nlocals alternate
- @deffnx {Scheme Procedure} emit-begin-opt-arity asm req opt rest nlocals alternate
- @deffnx {Scheme Procedure} emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate
- @deffnx {Scheme Procedure} emit-end-arity asm
- Delimit a clause of a procedure.
- @end deffn
- The linker is a complicated beast. Hackers interested in how it works
- would do well do read Ian Lance Taylor's series of articles on linkers.
- Searching the internet should find them easily. From the user's
- perspective, there is only one knob to control: whether the resulting
- image will be written out to a file or not. If the user passes
- @code{#:to-file? #t} as part of the compiler options (@pxref{The Scheme
- Compiler}), the linker will align the resulting segments on page
- boundaries, and otherwise not.
- @deffn {Scheme Procedure} link-assembly asm #:page-aligned?=#t
- Link an ELF image, and return the bytevector. If @var{page-aligned?} is
- true, Guile will align the segments with different permissions on
- page-sized boundaries, in order to maximize code sharing between
- different processes. Otherwise, padding is minimized, to minimize
- address space consumption.
- @end deffn
- To write an image to disk, just use @code{put-bytevector} from
- @code{(ice-9 binary-ports)}.
- Compiling object code to the fake language, @code{value}, is performed
- via loading objcode into a program, then executing that thunk with
- respect to the compilation environment. Normally the environment
- propagates through the compiler transparently, but users may specify the
- compilation environment manually as well, as a module. Procedures to
- load images can be found in the @code{(system vm loader)} module:
- @lisp
- (use-modules (system vm loader))
- @end lisp
- @deffn {Scheme Variable} load-thunk-from-file file
- @deffnx {C Function} scm_load_thunk_from_file (file)
- Load object code from a file named @var{file}. The file will be mapped
- into memory via @code{mmap}, so this is a very fast operation.
- @end deffn
- @deffn {Scheme Variable} load-thunk-from-memory bv
- @deffnx {C Function} scm_load_thunk_from_memory (bv)
- Load object code from a bytevector. The data will be copied out of the
- bytevector in order to ensure proper alignment of embedded Scheme
- values.
- @end deffn
- Additionally there are procedures to find the ELF image for a given
- pointer, or to list all mapped ELF images:
- @deffn {Scheme Variable} find-mapped-elf-image ptr
- Given the integer value @var{ptr}, find and return the ELF image that
- contains that pointer, as a bytevector. If no image is found, return
- @code{#f}. This routine is mostly used by debuggers and other
- introspective tools.
- @end deffn
- @deffn {Scheme Variable} all-mapped-elf-images
- Return all mapped ELF images, as a list of bytevectors.
- @end deffn
- @node Writing New High-Level Languages
- @subsection Writing New High-Level Languages
- In order to integrate a new language @var{lang} into Guile's compiler
- system, one has to create the module @code{(language @var{lang} spec)}
- containing the language definition and referencing the parser,
- compiler and other routines processing it. The module hierarchy in
- @code{(language brainfuck)} defines a very basic Brainfuck
- implementation meant to serve as easy-to-understand example on how to
- do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
- for more information about the Brainfuck language itself.
- @node Extending the Compiler
- @subsection Extending the Compiler
- At this point we take a detour from the impersonal tone of the rest of
- the manual. Admit it: if you've read this far into the compiler
- internals manual, you are a junkie. Perhaps a course at your university
- left you unsated, or perhaps you've always harbored a desire to hack the
- holy of computer science holies: a compiler. Well you're in good
- company, and in a good position. Guile's compiler needs your help.
- There are many possible avenues for improving Guile's compiler.
- Probably the most important improvement, speed-wise, will be some form
- of optimized ahead-of-time native compilation with global register
- allocation. A first pass could simply extend the compiler to also emit
- machine code in addition to bytecode, pre-filling the corresponding JIT
- data structures referenced by the @code{instrument-entry} bytecodes.
- @xref{Instrumentation Instructions}.
- The compiler also needs help at the top end, adding new high-level
- compilers. We have JavaScript and Emacs Lisp mostly complete, but they
- could use some love; Lua would be nice as well, but whatever language it
- is that strikes your fancy would be welcome too.
- Compilers are for hacking, not for admiring or for complaining about.
- Get to it!
|