This document contains notes about the book "Programming in Standard ML" by Robert Harper.
The keyword val
introduces a new binding:
val something = 3; something;
:results: val it = 3 : int :end:
What follows is a definition of signatures of a regular expression package, as described in the book.
signature REGEXP = sig datatype regexp = Zero | One | Char of char | Plus of regexp * regexp | Times of regexp * regexp | Star of regexp exception SyntaxError of string val parse : string -> regexp val format : regexp -> string end
signature MATCHER = sig structure RegExp : REGEXP val accepts : RegExp.regexp -> string -> bool end
:results:
signature REGEXP
sig
datatype regexp
Char of char
| One |
| Plus of regexp * regexp |
| Star of regexp |
| Times of regexp * regexp |
| Zero |
exception SyntaxError of string
val parse : string -> regexp
val format : regexp -> string
end
signature MATCHER
sig
structure RegExp :
sig
datatype regexp
Char of char
| One |
| Plus of regexp * regexp |
| Star of regexp |
| Times of regexp * regexp |
| Zero |
exception SyntaxError of string
val parse : string -> regexp
val format : regexp -> string
end
val accepts : RegExp.regexp -> string -> bool
end
:end:
What can we get from the written definitions and the text in the book?
signature
is used to describe modules. Apparently this is how one describesdatatype
is used to define a type regexp
(regular expression), which isdatatype
are assigned using the equal sign =
. The ideadatatype
are separated by pipes |
.val
is used to introduce binding definitions inside module descriptions as:
after the binding->
to separate types of input argumentsMATCHER
depends on REGEXP
.The book goes on to show how one would write structure declarations:
structure RegExp :> REGEXP = ... structure Matcher :> MATCHER = ...
From this we can gather, that apparently to implement a signature using a
structure, one writes not :
, but :>
instead and then declare the structure
to be equal to the actual implementation. The actual implementation needs to
conform to the signature REGEXP
. This is type checked. SML also type checks,
that the implementation does not make use of anything, that is not specified in
the signature.
The book shows then a usage example:
val regexp = Matcher.RegExp.parse "(a+b)*" val matcher = Matcher.accepts regexp (* currying in action, returning a function *) val ex1 = matches "aabba" (* true *) val ex2 = matches "abac" (* false *)
One should use the long identifier Matcher.RegExp.parse
instead of
~RegExp.parse~, since SML makes a difference between the two and there can be
issues with using RegExp.parse
instead. The book calls this sharing, as the
implementation of RegExp.parse
, if used directly, could be used in multiple
contexts in the code. However, always writing the long identifier, the full path
to the function or member could become unreadable and typing it annoying. There
are ways to alleviate this problem:
structure M = Matcher structure R = M.RegExp (* can also use dots here *) val regexp = R.parse "((a + %).(b + %))*" val matches = M.accepts regexp val ex1 = matches "aabba" val ex2 = matches "abac"
:results: stdIn:36.15-36.22 Error: unbound structure: Matcher :end:
The book shows an overview of an implementation for the regular expression package:
structure RegExp :> REGEXP = struct
datatype regexp
Zero | One | Char of char |
Plus of regexp * regexp |
Times of regexp * regexp |
Star of regexp
fun tokenize s
...
fun parse s =
let
val (r, s') =
parse_rexp (tokenize (String.explode s))
in
case s' of
nil => r
| _ => raise SyntaxError "Bad input."
end
handle LexicalError =>
raise SyntaxError "Bad input."
...
fun format r =
String.implode (format_exp r)
end
What can we understand from the above code?
struct
andlet~-~in~-~end
, which creates bindings for use withinraise
is used to raise exceptions.handle
is used to specify, how to handle an exception from afun
is used to indicate implementations of functions. After=
toend
keyword. This iscase
for distinguishing multiple cases and the cases are_
stands for a case, in which the value of the=>
is used to separate cases from their consequences.String
, for example String.explode
anddatatype token
AtSign | Percent | Literal of char |
PlusSign | TimesSign |
Asterisk | LParen | RParen
exception LexicalError
fun tokenize nil
nil
| tokenize (#"+" :: cs) = PlusSign :: tokenize cs
| tokenize (#"." :: cs) = TimesSign :: tokenize cs
| tokenize (#"*" :: cs) = Asterisk :: tokenize cs
| tokenize (#"(" :: cs) = LParen :: tokenize cs
| tokenize (#")" :: cs) = RParen :: tokenize cs
| tokenize (#"@" :: cs) = AtSign :: tokenize cs
| tokenize (#"%" :: cs) = Percent :: tokenize cs
| tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs
| tokenize (#"\\" :: nil) = raise LexicalError
| tokenize (#" " :: cs) = tokenize cs
| tokenize (c :: cs) = Literal c :: tokenize cs
:results: datatype token = Asterisk | AtSign | | LParen | | Literal of char | | Percent | | PlusSign | | RParen | | TimesSign | exception LexicalError val tokenize = fn : char list -> token list :end:
What can we understand from the above code?
#
in front.::
is used as an operator for consing onto lists.nil
is the list terminating thing and probably the empty list.TYPE list
.The book goes on giving an unexplained dump of abbreviations in a grammar for regular expressions, which is then implemented. It seems difficult to understand it, because the abbreviations are not explained. Was it really too much work, to write out what the abbreviations stand for? The grammar is the following:
rexp :: = rtrm | rtrm+rexp rtrm :: = rfac | rfac.rtrm rfac :: = ratm | ratm* ratm :: = @ | % | a | (rexp)
What I can guess is:
I have no idea what rfac
stands for. "Factor"? But what sense does that make?
The code, that follows the grammar is not much clearer to me.
For this reason I skip the rest of the regular expression package implementation and go on with the rest of the book.
3; 3 + 4; 4 div 3; 4 mod 3;
val it = 7 : int