type-decisions.txt 20 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556
  1. Notes on the type system.
  2. =========================
  3. 2007-Sep-09
  4. -----------
  5. NO DAMN TUPLES
  6. Decided against transparent tuple and union types. There is already
  7. enough machinery to do heterogeneous sums and unions with recs. Tuples
  8. tend to only grow to 2 or 3 before they're a programmer hazard
  9. anyways, and tup[int,float] is not much different than pair[int,float]
  10. where pair is a library module alias for tup2[a,b]. We can probably
  11. define tupN up to 9 and be done with it.
  12. Likewise for unions, it's rare to usefully union anything other than a
  13. pair of things or "anything". We need a separate dyn for "anything"
  14. and a pair-union we can do with either[a,b]. In a pinch,
  15. either3[a,b,c].
  16. There is too much supporting syntax and structural/nominal baggage,
  17. and it's too tempting to get into destructuring assignment in order to
  18. "pick apart and name" the parts of a tuple. Neither of these are
  19. desirable. You get field names with records.
  20. There was an earlier thought about keeping tuples around for handling
  21. polymorphic apply, but I decided this is not really a problem. You can
  22. do apply in terms of a nested type associated with the function type:
  23. the arg-record type. It's the "call message" from hermes. Same thing.
  24. DATA AND TYPE
  25. *Did* decide to split rec and alt into "data" declarations (not
  26. "type") such that nominal types ( / ML style data constructors) are
  27. introduced with "data" and transparent types with "type". Type
  28. expressions as we might see in a type parameter list can only use
  29. transparent type stuff, or name data types by name; rec and alt
  30. declarations are not legal in type expressions. This helps clarify
  31. several use contexts of type expressions.
  32. LITERALS AND NEW
  33. A primary example is that these are all valid "new" expressions:
  34. new foo { a=b, c=d } // where foo is a rec data type
  35. new func(int a) -> b { ... }
  36. new vec[int] { 1, 2, 3 }
  37. But this, in particular, is not!
  38. // Butt-ugly, wrong, and incomparable with rec foo!
  39. new rec {int a; int c;} { a = b, c = d } // NO!
  40. // Do this instead
  41. data foo = rec { int a, int c }
  42. new foo { a = b, c = d }
  43. NO WAIT, WE NEED TUPLES
  44. Crap. I've gone back and forth on tuples so many times my head is
  45. getting dizzy. Here is a possible reason we need tuples. The possible
  46. reason is that a function that looks like it's of type (int,int)->int
  47. is supposed to be compatible with any other function of that type,
  48. not just those that *name the args the same way*
  49. If we associate an arg record with a function, and say the function is
  50. "really" from x->y where x is the arg record and y is the return
  51. record, we're really saying "records are structurally comparable". But
  52. we don't want them to be! We want record types x and y to be
  53. incompatible since x is a different name than y.
  54. I think this might be a clincher.
  55. *Sigh* ok.
  56. Earlier I was thinking of a "stripped down" set of rules given the
  57. notion of tuples: give up on parameter passing modes entirely and
  58. imply it all from limitation. I think we may still be ok with
  59. that. There was a corner case -- a single limited function arg with a
  60. single non-limited slot you *don't* want to move in -- but I can
  61. imagine a few possible solutions to that:
  62. 1. limited implies move-in-and-out, not just move-in. Possible, but
  63. also a little sticky; you have to move an option in if you
  64. want the function to consume the arg and not give it back. Bleah.
  65. 2. maybe the problem is imaginary. limited implies move-in only if
  66. the arg is still an lval during tuple-expression evaluation. If
  67. it's any other sort of lval, we evaluate fully to an rval,
  68. meaning we've already copied from the lval into a temp slot and
  69. we've simply going to move the *temp slot* into the function and
  70. back.
  71. but no, that's a little muddled. Evaluating an expression on the
  72. RHS of a move should produce an lval -- in fact it's
  73. syntactically restricted to an lval -- but on the sides of a binary
  74. "," operator or inside a 1-ary function param list, maybe it
  75. should produce an lval as well? tricky. lvals as anything other
  76. than syntactic categories are js "REFs", which are a little bogus.
  77. the central problem is that we have no 1-ary tuple type, and it's
  78. odd or possibly silly to even propose one (what, parens?)
  79. Another possibility is to give up on the "no param modes, just use
  80. limitation and build expressions" argument, and say the following:
  81. There's an argument tuple. Tuples are denoted tup[p,q,r]. The arg
  82. tuple is not built by the commas in the arg list. It's built by the
  83. call binding code analyzing the param passing modes and copying or
  84. moving stuff into the tuple, then moving the tuple, then copying or
  85. moving stuff out into the result scope, then repeating on the way
  86. back.
  87. Yeah. That's probably what we're going to have to do. If we're going
  88. to do that, maybe we should support an sum[p,q,r] transparent union as
  89. well?
  90. Or alternatively, we could spell tup[p,q,r] as "rec[p,q,r]" since rec
  91. is a prohibited keyword in type expressions the rest of the time, and
  92. we could spell mix[p,q,r] as "alt[p,q,r]". Then we could call the
  93. former "anonymous records" and the latter "anonymous alternatives".
  94. Ooh, shivers! I like this. It has a nice ... orthogonality!
  95. Though actually, using "tup" and "sum" might be better. Gives the
  96. concepts words. They're really different concepts (and nested anys
  97. may in fact normalize out...)
  98. ok, WE HAVE TUPLES BUT THEY ARE CALLED TUP[A,B,C] AND THEY DO NOT INVOLVE
  99. THE BINARY COMMA "OPERATOR" OR SUCH NONSENSE, AND WE STILL NEED CODE TO
  100. ANALYZE PARAMETER MODES
  101. 2007-Sep-10
  102. -----------
  103. Curious. Is alt really that different than sum? consider:
  104. rec nodefoo { ... }
  105. rec nodebar { ... }
  106. type node = alt[nodefoo, nodebar];
  107. node x <- rec nodefoo { ... };
  108. Well, consider instead the notion of encoding some semantic meaning
  109. in the constructors. Say you were trying to differentiate centigrade from
  110. fahrenheit.
  111. alt temp { dec fahr;
  112. dec cent; }
  113. you can construct new temp instances, say, like so:
  114. temp x = temp.fahr(10.0);
  115. now let's look at how you replicate this without
  116. rec fahr { dec degrees }
  117. rec cent { dec degrees }
  118. type temp = alt[fahr, cent];
  119. temp x = rec fahr { dec = 10.0 }
  120. I think I'd prefer the ability to do the first. but I'll have to sleep
  121. on it. damn.
  122. 2007-Sep-11
  123. -----------
  124. Sleep says: no anonymous sums. They're not useful enough to
  125. justify. So then:
  126. Declarations:
  127. data foo = alt { int a; int b; }
  128. data bar = rec { int x; int y; }
  129. expressions:
  130. foo p = foo.a(10);
  131. bar q = bar { x = 10; y = 12; }
  132. how do we syntactically select the second case? we've seen an lval; if
  133. the lval is a name *and* the peek is '{', we accept it as a
  134. record-constructor call. Otherwise it's a parse error. No 'new'.
  135. foo.a(10) is a normal function call (to a function the system
  136. generates for the type record foo). the alt-ness of foo is only
  137. special in an alt stmt.
  138. you get enums this way too:
  139. data state = alt { running; stopped; }
  140. state k = state.running;
  141. 'state.running' is a constant artifact, also allocated by the runtime. Again,
  142. we differentiate a constant alt arm from a constructor alt arm
  143. 2007-Sep-14
  144. -----------
  145. although, sigh, if we permit bar { ... } as a form of expression,
  146. we're not quite LL(1) anymore. Hm. yeah, we still are: we just define
  147. the *grammar* as permitting lvals on the LHS and restrict the
  148. *semantics* to say "only name lvals, not any old lvals". It still
  149. parses as lvals, since names are a subset of lvals, and you don't need
  150. to backtrack if you assume only lval.
  151. also note: using state.running is no good; we want 'state' to refer to a
  152. data/type artifact, not a record full of functions. Let's adopt what MLs
  153. do here and just say the constructors are unqualified functions / constants.
  154. alt color { red, green, blue }
  155. means you can use unqualified 'red', 'green' and 'blue'
  156. constants. Period. Then 'color' is a data/type, and color.foo are
  157. various attributes of that type.
  158. some exploration of protocols, constraints, 'plugs', and the relationship
  159. between procs and progs.
  160. One way is to say this:
  161. lim native proc;
  162. native prog;
  163. lim rec obj[PLUGS] {
  164. proc self;
  165. prog code;
  166. PLUGS plugs;
  167. }
  168. plug canvas {
  169. chan draw_rect(int x, int y, nat w, nat h) -> nil;
  170. chan clear() -> nil;
  171. }
  172. auto prog my_canvas(canvas) {
  173. plug mycanv(canvas) {
  174. port draw_rect(int x, int y, nat w, nat h) -> nil {
  175. // blah blah.
  176. }
  177. port clear() -> nil {
  178. //
  179. }
  180. }
  181. }
  182. let my_canvas.ty cobj = my_canvas.new(x, y, z);
  183. let canvas c = cobj.plugs.mycanv;
  184. c.draw_rect(10, 10, 20, 20);
  185. Various points:
  186. - the prog p defines p.plug_ty and p.ty = obj[p.plug_ty]. you probably
  187. only ever have to refer to p.ty and p.new to instantiate one.
  188. - each plug name inside a prog must be unique, because they make up
  189. fields in the implicit 'plugs' record type associated with the prog.
  190. - if you don't use plug foo(bar) but just say plug foo, the plug is
  191. called foo.
  192. - plug renaming is legal inside the prog using plug foo(bar) { ... }.
  193. so there is never any name collision during multiple-plug support.
  194. - plugs can be multiply-supported in a prog using plug foo2(bar) { ... }.
  195. this gives you a second bar plug, perhaps one that does debugging,
  196. or uses a different strategy, or is simply a newer version.
  197. - plugs can be delegated using "plug foo(bar) = other;" where other is
  198. a name of a plug found somewhere in the prog.
  199. - plugs can be partially delegated using
  200. "plug foo(bar) = other { .. exeception ports ... }"
  201. which is as close as we're going to come to supporting "inheritence"!
  202. 2008-May-24
  203. -----------
  204. some simplifications:
  205. Plugs, dead. ADTs (modules) live. Types can be abstract in
  206. them. Visible inside, invisible outside. Existential style,
  207. SML+functor style or possibly a bit like mixml but with any
  208. simplifications required to make it behave with out DAG storage model
  209. (i.e. perhaps iso-recursive cyclical types permitted in module,
  210. cyclical function calls, but no cyclical value ownership. Module
  211. refcounted as a whole? possibly but assume existential-style at worst.)
  212. Channels with signatures and call protocols dead. Auto ports
  213. dead. Channels with *data types* live, buffered and async (overflow =>
  214. drop, erlang style, or at least call a condition handler) using
  215. movable (non-copyable) ports and alt statements that literally just
  216. pull out a value from the queue. predicate: armed(port).
  217. Move in / out / inout on fn signatures dead. Copyables are copied unless
  218. annotated with @, which means alias (as before). Non-copyable are moved in
  219. and dropped. Non-copyable return values are moved-out. Encode what you
  220. mean in copyable/non-copyable nature. If you have a move-out on yield it
  221. is actually threaded in and out of the callee with each yield; you can
  222. modify it in the caller and pass it "back down", between yields. This
  223. provides a system for 'send', the facility in some generator-systems,
  224. without as much complexity.
  225. copyability of a composite structure is inferred by the contents of
  226. the structure, is provided by the compiler, cannot be sensibly
  227. dropped, cannot be asserted.
  228. init is implied as true for all statically named members of a
  229. composite structure (rec or alt), implied as true for every incoming
  230. arg of a fn and outgoing ret val of a fn, and every local val in a
  231. proc. init is *not* implied on local slots, it's computed by dataflow.
  232. I guess you can put prove init(x); someplace to ensure it's doing what
  233. you mean.
  234. 2008-May-30
  235. -----------
  236. Can you alias a non-copyable? Hmm. There is very little point! You
  237. would not be able to copy an alias-of-a-non-copyable and, as it's an
  238. alias, you would not be able to move (modify) it either. The only use
  239. would be taking an alias to a non-copyable record that has copyable
  240. components you wish to extract. Useful? Perhaps. Does this violate the
  241. ownership model? I do not think so. Hard to prove though. Let's assume
  242. no aliasing of non-copyables. An alias is a cheap, non-transplanting
  243. copy. There are not many things to do with non-copyables anyways.
  244. 2008-June-03
  245. ------------
  246. More on module systems. Let's assume we have support for modules with
  247. types and values. And suppose we have first-class modules, so I can
  248. say
  249. // a module type
  250. type mt = mod { type e; fn mk()->e; fn eat(e)->nil; }
  251. // a 1st class module value, of module type
  252. let mt mv = mod { type e = int; fn mk()->e { ret 1; } fn eat(int i) { } }
  253. // a module use, turning a first class module into
  254. // a member of the static environment.
  255. use mt mm = mv;
  256. let mm.e me = mm.mk();
  257. mm.eat(me);
  258. What happens when we let 'me' escape from this scope? It has a type
  259. that cannot be denoted statically outside the scope, to begin with! So
  260. there would be no way of writing a function that gives it as a return
  261. value. Hooray for not supporting type inference :)
  262. Maybe this will work? Why don't most module systems work this way? Hmm.
  263. 2008-June-05
  264. ------------
  265. Concrete example. Two files implementing sets. One uses lists, one uses
  266. trees. Common module type interface for both.
  267. type set[e] = mod { type t;
  268. let t empty;
  269. fn union(@t a, @t b) -> t;
  270. fn singleton(@e v) -> t;
  271. fn contains(@e v) -> bool; }
  272. mod listSet[e] = set[e] { type t = list[e].t;
  273. let t empty = new list[e].t({});
  274. fn union(@t a, @t b) -> t { ret list[e].union(a,b); }
  275. fn singleton(@e v) -> t { ret new list[e].t({v}); }
  276. fn contains(@e v) -> bool { ret list[e].contains(v); } }
  277. mod treeSet[e] = set[e] { type t = tree[e].t;
  278. let t empty = new tree[e].t({});
  279. fn union(@t a, @t b) -> t { ret tree[e].union(a,b); }
  280. fn singleton(@e v) -> t { ret new tree[e].t({v}); }
  281. fn contains(@e v) -> bool { ret tree[e].contains(v); } }
  282. fn doSomeSetStuff[e](@set[e] s, @e elt) -> nil
  283. {
  284. use set[e] sm = s[e];
  285. sm.t v = sm.union(sm.empty, sm.singleton(elt));
  286. }
  287. let set[int] s1 = listSet[int];
  288. let set[int] s2 = treeSet[int];
  289. let set[int] s3 = mod set[int] { type t = []; ... }
  290. doSomeSetStuff[int](s1, 10);
  291. doSomeSetStuff[int](s2, 10);
  292. doSomeSetStuff[int](s3, 10);
  293. This has 4 new syntactic forms (the first-class module literal
  294. expression above doesn't work):
  295. - a possibly-generative module declaration: mod modName[params] = modType[args] { mod-elts }
  296. - a module type expression: mod { mod-ty-elts }
  297. - a module expression: mod modType[args] { mod-elts }
  298. - a module-use that "opens" a module, assigning new opaque types to
  299. each of its existential members.
  300. 2008-June-08
  301. ------------
  302. prog is a special form for a reason.
  303. first, prog has a dtor, which is simply useful. but we have finally so we can argue that
  304. a dtor is not enough to argue for a prog.
  305. more importantly, prog has local vars that are visible to all fns declared in prog, but
  306. can only be written-to by init, main and fini (none of which call one another).
  307. prog can also have dyns declared in it, which are visible to all the fns within it.
  308. these two make them *somewhat* like classes.
  309. (also they have an implicit module of chans connected to all their
  310. ports, manufactured when they're allocated)
  311. 2008-June-13
  312. ------------
  313. ALL TYPES COPYABLE, NOT ALL TRANSMITTABLE
  314. You can always form another reference to a process or port, to stick
  315. elsewhere in a data structure. You just can't always <- send it
  316. through a channel. So the proper predicate on a type is
  317. "transmittable", not "copyable". You won't form cycles because you
  318. can't store "into" a proc or port. The "move" operator is dead.
  319. One simply cannot declare a port or chan that carries a non-transmit
  320. type. Therefore each type is marked with whether or not it's a transmit
  321. type, and every type-parametric binding is marked with whether or not
  322. it can be instantiated for non-transmit types.
  323. Since this inference might be confusing, we require that parametric
  324. bindings carry a leading 'lim' on parameters that can handle limited
  325. types, and we make a 'lim type' type declaration form for marking
  326. limited types (which is checked against the definition). All other
  327. binding-sugar forms also carry a prefix-'lim' form: lim fn f(...) or
  328. lim mod m(...). Module and function expressions can close over limited
  329. values.
  330. Prog declarations can *not* close over limited values, because a 'lim prog'
  331. could never be spawned. It's by-definition a nonsense type.
  332. TUPLES
  333. With this change, the tuple constructor "," is back, functions taking
  334. "multiple values" just take a tuple, destructuring bind on tuples is
  335. available, and we're looking more and more like aleph. Woo.
  336. CHAN, send and receive syntaxes
  337. We do something a little like alef here, but not entirely:
  338. unary receive: <-port or <-vec[port], permitting say foo(<-port)
  339. binary receive: var <- port or var <- vec[port]
  340. binary send: chan <| data
  341. TAGS, DATA and TYPE
  342. the "tag" data constructor is for unions and enums, not alt. alt is
  343. a prefix keyword for 'alt type', 'alt tag', and 'alt port' constructs.
  344. tag types are like ocaml tag types: sugar for a
  345. sum-injection/iso-recursive-fold operation and a corresponding
  346. sum-projection/iso-recursive-unfold operation when used in
  347. destructuring context ('alt tag').
  348. The iso-recursive thing supports multiple mutually-iso-recursive types
  349. by bundling up all the mutually-recursive types in a scope, sorting them,
  350. mu-binding them and then defining each binding as an indexed projection
  351. out of the mu-binding. Thus, as follows:
  352. so:
  353. {
  354. type exp = tag varexp(var) | letexp(dec,exp);
  355. type dec = tag valdec(var,exp) | seqdec(dec,dec);
  356. }
  357. turns into bindings:
  358. exp -> #1 @ u[A,B].(var+(B,A), (var,A)+(B,B))
  359. dec -> #2 @ u[A,B].(var+(B,A), (var,A)+(B,B))
  360. this is absolutely delicious. hooray for re-covering ground someone
  361. else has done. Note that there are awful details about tag-naming the
  362. members of the type and considering them equal if they have equal
  363. structure (and equal tag-names? ooh). I'm somewhat partial to this
  364. design since it's simple and makes (some) more programs typecheck,
  365. while (some) others -- mostly those doing fancy things with modules
  366. and opaque types -- not. I'm ok with that tradeoff, I think. Though I
  367. will read through dreyer tldi 03 to be sure.
  368. 2008-June-28
  369. ------------
  370. An unfortunate fact: suppose we want to implement to make a module
  371. implementation of a data structure -- say a hashtable -- and you want
  372. it to be abstract and mutable, then you want to do *at least* things
  373. like:
  374. tab = tab.insert(tab, key, val)
  375. but doing that with CoW means that by the time you're in tab.insert
  376. you have 2 live refs to the exterior vec and you're going to have to
  377. duplicate it. Ouch!
  378. Besides, you really want to do something more like:
  379. tab.insert(tab, key, val)
  380. as a single statement, without having to update tab. this means you
  381. really want writable aliases as a param slot mode.
  382. What are the semantics?
  383. - writable aliases can only be formed in calls
  384. - writable aliases can only be formed on things you *already have*
  385. write permission on, have already CoW'ed
  386. - you can't form 2 aliases to the same thing, or to anything that
  387. *might be* the same thing. so if you form a write-alias a vec member
  388. in a callexpr, all other members of that vec are prohibited from
  389. being write-aliased simultaneously (since we can't statically know
  390. whether you are hitting the same slot in multiple
  391. index-expressions).
  392. - all typestates of write-aliased slots drop to nil in the caller
  393. *during* the call -- in case the call happens to be an iterator --
  394. and the callee is obliged to re-establish the write-aliased slots to
  395. the signature typestate before returning. Any preds that cover the
  396. write-aliased slot in the caller and are missing from the write-aliased
  397. slot signature typestate are dropped in the caller. If it wants them
  398. back it must assert them when it regains control.
  399. So we re-form ~ and ^ to mean read-alias and write-alias, respectively,
  400. and steal # from tuple-indexing (since we have destructuring assignment now
  401. anyways) and use it for static metaprogram escape (think #include!)