MITDDC
/
clu-1976-1989
spiegel van https://github.com/MITDDC/clu-1976-1989


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106
							.dv xgp
.fo 0 fonts;25vg kst
.fo 1 fonts;25vgb kst
.fo 2 fonts;25vri kst
.fo 3 fonts;31vgb kst
.nr verbose 1
.so r;r macros
.ls 1.5
.tr ` 
An idealized scheme of compilation is that the parser produces a tree with
"forms" in it.
A form is an "unknown" idn appearing as an expression or constant
(in particular, as a type).
The legality of the form is unknown until equates are processed.
After equates have been processed (or as they are processed),
a new tree is built that contains no forms.
The type checker and code generator work on this new tree.

This is not a suitable implementation of compilation;
rebuilding the parse tree is expensive in time (it requires an extra pass)
and space.
The important decision to make is: when and how to process equates.
There are two possible ways to go.
One way is to process equates after parsing,
and make the parse tree (in particular, idns and typespecs) mutable
so that rebuilding the tree is not necessary.
The other way is to process equates during the parse.
This way,
only module headers need to be rebuilt,
and typespecs (and perhaps idns) can be immutable.

Before going on,
some background will be helpful.
For efficiency,
the parser needs to do two things (in addition to parsing):
canonicalize idns, and canonicalize typespecs.
Idns need to be canonicalized for declaration- and type- checking,
and for emitting (short, uniform) variable names.
Typespecs need to be canonicalized for efficient type comparison,
and for emitting (a minimal number of) type descriptors.
Canonicalization involves,
at the least,
inserting a unique (to the compilation) id in the idn or typespec,
and may also mean having a unique object for each idn or typespec.

If we process equates after parsing then we need mutable idns so that,
once we compute the value of an equate,
we can "substitute" the value everywhere in one fell swoop.
(For efficiency I want an idn to contain info, not just a pointer to info.)
We need mutable typespecs so that,
once we discover that x`=`int, we can substitute int for the typespec x
everywhere in the tree in one operation.
(I don't want to just change the idn contained in the typespec,
because this makes type comparison slower and more complicated.)
Note that equality on typespecs is now equality on uids,
not rep equality.
However,
the parser is fairly free of semantic information;
it knows about scope,
but that's all.
Canonicalizing idns allows declaration checking to be done along with type checking.

If we put declaration checking and equate processing into the "parser",
we get a cleaner type checker,
one that does not have to worry about the possibility that x is not a type
in x$foo.
However,
we have not gotten rid of any problems;
the parser still has to do the checking.
We could get immutable typespecs.
Idns might still be mutable,
but the only idns a typespec could contain are parameters,
and these could be represented by integers instead of actual idns.
The good thing about this scheme is that there is no tricky use of mutability.
One bad thing is that we have thrown a lot of semantic information into the "parser".
But even worse,
I think,
is that there are then two sets of abstract syntax equates for types and expressions,
and three sets of procedures for dealing with them.
(One for parsing equates and module headers,
one for rebuilding module headers,
and one for "normal" parsing.)
The addition of these procedures would double the size of the (existing) parser.

A question that is sort of related to all of this is,
how does the canonicalization of idns and typespecs interact with our notion of
using a compilation environment (CE), and using the library for type-checking?
A CE need not be the most obvious map from strings to values.
Rather,
we note that the parser uses two (hash) tables for canonicalizing idns and typespecs;
a CE can almost be represented by such a pair of tables.
However, the idn table cannot be used by the parser unless the parser does
both declaration checking and equate processing.
If the parser does not do these,
then a CE is represented by a type table and a map from strings to values.
A CE might contain information only down to DUs,
and the DU specs would have to be retrieved from the library,
or it might contain all of the specs, too,
in which case the library need not be accessed during type checking.
As for the representation of typespecs within DU specs,
it doesn't really matter if they are actual typespecs or just something like typespecs.
They will have to be (re-)canonicalized when read in,
so a recursive walk is necessary regardless of their representation.
Writing them out as actual typespecs puts out some "useless" info,
but not all that much.