Introduction

This document contains notes about the book "Programming in Standard ML" by Robert Harper.

Keywords

val

The keyword val introduces a new binding:

val something = 3; something;

Result: val it = 3 : int
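
The REPL reports every binding with its name, value and type. A few more bindings to try; the names are made up for illustration:

```sml
(* Each val introduces a binding; the REPL echoes name, value and type. *)
val something = 3
val doubled = something * 2     (* a binding may use earlier bindings: 6 *)
val greeting : string = "hi"    (* an optional type annotation after : *)

(* Rebinding a name shadows the old binding; it does not mutate it.
   doubled keeps the value it was computed with. *)
val something = "three"
```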

A regular expression package

Specification / Type Declaration

What follows is the signature MATCHER of the regular expression package, as described in the book.

signature MATCHER = sig
  structure RegExp : REGEXP
  val accepts : RegExp.regexp -> string -> bool
end
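
MATCHER refers to a second signature, REGEXP, which is not reproduced in these notes. The following is a reconstruction consistent with the observations below and with the implementation overview further down (a datatype regexp with the constructors used there, an exception SyntaxError, and the functions parse and format). Treat the exact specification as an assumption, not as a quote from the book.

```sml
(* Assumed shape of REGEXP; constructors match the implementation overview. *)
signature REGEXP = sig
  datatype regexp =
      Zero                       (* matches no string *)
    | One                        (* matches only the empty string *)
    | Char of char               (* matches exactly this character *)
    | Plus of regexp * regexp    (* alternation *)
    | Times of regexp * regexp   (* concatenation *)
    | Star of regexp             (* iteration *)
  exception SyntaxError of string
  val parse : string -> regexp
  val format : regexp -> string
end

(* MATCHER as in the notes: it contains a structure matching REGEXP. *)
signature MATCHER = sig
  structure RegExp : REGEXP
  val accepts : RegExp.regexp -> string -> bool
end
```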

What can we get from the written definitions and the text in the book?

signature is used to describe modules. Apparently this is how one describes an interface: the types, values and exceptions a module has to provide.

datatype is used to define a type regexp (regular expression), whose values take one of several alternative forms, called constructors.

The alternatives of a datatype follow the equal sign =.

Alternatives for datatype are separated by pipes |.

val is used inside signatures to specify the values, including functions, that an implementing structure has to provide.

The type of a binding is specified after a colon : following the binding name.

SML has the concept of exceptions.

SML starts each declaration with a keyword, for example signature, structure, datatype, val, fun or exception.

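Function types use the arrow -> to separate the argument type from the result type. The arrow associates to the right, so RegExp.regexp -> string -> bool means RegExp.regexp -> (string -> bool): accepts takes a regexp and returns a function from strings to booleans.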

There seems to be a convention of naming signatures in all capital letters.

MATCHER depends on REGEXP: it contains a substructure RegExp that has to match the signature REGEXP.

Signatures are descriptions of what needs to be implemented.

Structures implement signatures.

In SML the dot ~.~ is used to access structure components. For example, RegExp.regexp in MATCHER refers to the type regexp of the substructure RegExp.
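
To see the arrow types, the dot notation and currying together, here is a small made-up signature PREFIX with a structure Prefix (hypothetical names; only String.isPrefix comes from the Basis Library):

```sml
(* string -> string -> bool reads as string -> (string -> bool). *)
signature PREFIX = sig
  val hasPrefix : string -> string -> bool
end

structure Prefix :> PREFIX = struct
  (* String.isPrefix p s is true when p is a prefix of s. *)
  fun hasPrefix p s = String.isPrefix p s
end

(* The dot selects a structure component; supplying only the first
   argument returns a function of type string -> bool. *)
val startsWithAb = Prefix.hasPrefix "ab"
val yes = startsWithAb "abba"   (* true *)
val no = startsWithAb "baab"    (* false *)
```

This is the same pattern as Matcher.accepts regexp in the usage example: one argument now, the string later.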

The book goes on to show how one would write structure declarations:

structure RegExp :> REGEXP = ...
structure Matcher :> MATCHER = ...

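From this we can gather how a structure is declared to implement a signature: after the structure name comes :> and the signature name, and then the structure is set equal to the actual implementation. A plain colon : is also allowed (transparent ascription); :> (opaque ascription) additionally hides the definitions of the types the signature leaves abstract. Either way the implementation is type checked against the signature: RegExp has to provide everything REGEXP specifies, with matching types. Anything the implementation defines beyond the signature is not accessible to users of the structure.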
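
The difference between : and :> can be tried with a tiny made-up signature COUNTER (hypothetical, not from the book), implemented once transparently and once opaquely:

```sml
signature COUNTER = sig
  type t
  val zero : t
  val incr : t -> t
  val value : t -> int
end

(* Transparent ascription: the definition t = int stays visible. *)
structure Transparent : COUNTER = struct
  type t = int
  val zero = 0
  fun incr n = n + 1
  fun value n = n
  val secret = 42   (* not in COUNTER, so invisible outside *)
end

(* Opaque ascription: t becomes abstract outside the structure. *)
structure Opaque :> COUNTER = struct
  type t = int
  val zero = 0
  fun incr n = n + 1
  fun value n = n
end

val a = Transparent.incr 5                      (* 6: an int is a Transparent.t *)
val b = Opaque.value (Opaque.incr Opaque.zero)  (* 1 *)
(* Both of these are rejected by the type checker:
   Opaque.incr 5        because 5 is an int, not an Opaque.t
   Transparent.secret   because secret is not part of COUNTER *)
```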

The book then shows a usage example:

val regexp = Matcher.RegExp.parse "(a+b)*"
val matches = Matcher.accepts regexp (* currying in action, returning a function *)
val ex1 = matches "aabba" (* true *)
val ex2 = matches "abac" (* false *)

One should use the long identifier Matcher.RegExp.parse instead of ~RegExp.parse~. Because Matcher is sealed with :>, the type Matcher.RegExp.regexp is abstract and is not known to be the same type as RegExp.regexp, so a regular expression produced by RegExp.parse could not be passed to Matcher.accepts. Making two such types agree is what the book calls sharing. However, always writing the long identifier, the full path to the function or member, could become unreadable, and typing it is annoying. There are ways to alleviate this problem:

structure M = Matcher
structure R = M.RegExp (* can also use dots here *)
val regexp = R.parse "((a + %).(b + %))*"
val matches = M.accepts regexp
val ex1 = matches "aabba"
val ex2 = matches "abac"

Result (Matcher was not defined in my SML/NJ session, so the example cannot run): stdIn:36.15-36.22 Error: unbound structure: Matcher
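
The long-identifier issue and the abbreviation trick can be tried without the regular expression package. A minimal sketch with a hypothetical signature BOX: sealing the same code twice with :> yields two unrelated abstract types, while structure S = A only introduces a shorter name for A.

```sml
signature BOX = sig
  type t
  val make : int -> t
  val get : t -> int
end

structure IntBox = struct
  type t = int
  fun make n = n
  fun get n = n
end

(* Two opaque sealings of the same code give two distinct abstract types. *)
structure A :> BOX = IntBox
structure B :> BOX = IntBox

(* An abbreviation is just another name: S.t is the same type as A.t. *)
structure S = A
val x = S.get (A.make 3)   (* 3 *)
(* Rejected: B.get (A.make 3), because A.t and B.t are different types. *)
```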

Implementation

The book shows an overview of an implementation for the regular expression package:

structure RegExp :> REGEXP = struct
  datatype regexp =
      Zero | One | Char of char
    | Plus of regexp * regexp
    | Times of regexp * regexp
    | Star of regexp
  fun tokenize s ...
  fun parse s =
    let
      val (r, s') = parse_rexp (tokenize (String.explode s))
    in
      case s' of
          nil => r
        | _ => raise SyntaxError "Bad input."
    end
    handle LexicalError => raise SyntaxError "Bad input."
  ...
  fun format r = String.implode (format_exp r)
end

What can we understand from the above code?

Structures:

Structures, which are signature implementations, are wrapped by struct and end.

Let:

There is a syntax form let ... in ... end, which creates bindings that can only be used in the expression between in and end.
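
A small made-up function to show the scoping of let (Math.sqrt is from the Basis Library):

```sml
(* The bindings between let and in are visible only up to end. *)
fun hypotenuse (a : real, b : real) : real =
  let
    val a2 = a * a
    val b2 = b * b
  in
    Math.sqrt (a2 + b2)
  end

val h = hypotenuse (3.0, 4.0)   (* 5.0 *)
(* a2 and b2 are not defined out here. *)
```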

Exception handling:

The keyword raise is used to raise exceptions.

The keyword handle is used to specify how to handle an exception raised by the expression in front of it. In parse, a LexicalError is turned into a SyntaxError this way.
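
A sketch of raise and handle with hypothetical exceptions, including a handler that raises a different exception, which is the pattern parse uses:

```sml
(* Exceptions are declared before use; they may carry a value. *)
exception Negative of int
exception BadInput of string

fun checkedSqrt (n : int) : real =
  if n < 0 then raise Negative n
  else Math.sqrt (Real.fromInt n)

(* handle applies to the expression directly in front of it. *)
fun safeSqrt (n : int) : real =
  checkedSqrt n handle Negative _ => 0.0

(* A handler can translate one exception into another. *)
fun strictSqrt (n : int) : real =
  checkedSqrt n
  handle Negative k => raise BadInput ("negative: " ^ Int.toString k)

val r1 = safeSqrt 16    (* 4.0 *)
val r2 = safeSqrt ~4    (* 0.0, the exception was caught *)
```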

Functions:

The keyword fun introduces a function definition. After fun follows the name of the function.

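A function always returns a single value, but that value can be a tuple, which effectively returns multiple values. A val binding can take the tuple apart, as in val (r, s') = parse_rexp ... in parse.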

After the function name follow the names of its arguments.

After the argument names follows the equal sign = and then the body of the function.

Function definitions do not end with the keyword end: the body is a single expression. end only closes constructs such as let, struct and sig.
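
Two small made-up functions illustrating these points: a curried definition, and a function returning a tuple that a val binding destructures:

```sml
(* fun name arg1 arg2 = body; no end keyword, the body is one expression. *)
fun add x y = x + y

(* One result, but it can be a tuple: quotient and remainder. *)
fun divMod (n : int, d : int) : int * int = (n div d, n mod d)

(* A val binding can destructure the tuple, as parse does. *)
val (q, r) = divMod (17, 5)   (* q = 3, r = 2 *)
val seven = add 3 4
```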

Cases:

There is a keyword case (case expression of ...) for distinguishing multiple cases, and the cases are separated by pipes |.

The underscore _ is a wildcard pattern: it matches any value without giving it a name.

The fat arrow => is used to separate cases from their consequences.
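
A made-up function using case, list patterns and the wildcard; the rules are tried from top to bottom:

```sml
fun describe (xs : int list) : string =
  case xs of
      nil => "empty"
    | [x] => "one element: " ^ Int.toString x
    | _ => "several elements"   (* _ matches anything, binds nothing *)

val d0 = describe []          (* "empty" *)
val d1 = describe [7]         (* "one element: 7" *)
val d2 = describe [1, 2, 3]   (* "several elements" *)
```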

Strings:

Strings are written in double quotes.

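String.explode and String.implode are not static methods but functions in the structure String of the Basis Library: String.explode turns a string into a list of characters, and String.implode turns a list of characters back into a string.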
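
A quick check of the two String functions; rev, map and Char.toUpper are further Basis Library functions, used here only for illustration:

```sml
(* String.explode : string -> char list
   String.implode : char list -> string *)
val cs = String.explode "abc"                      (* [#"a", #"b", #"c"] *)
val back = String.implode (rev cs)                 (* "cba" *)
val upper = String.implode (map Char.toUpper cs)   (* "ABC" *)
```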

Implementation of tokenize

What can we understand from the book's code for tokenize (not reproduced in these notes)?

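Characters are written like a one-character string with a # in front, for example #"a".
:: is the cons operator: x :: xs puts the element x in front of the list xs.
SML can do recursive function calls.
Pattern matching can destructure lists: c :: cs matches a non-empty list with head c and tail cs.
Exceptions have to be declared before the functions that raise them.
nil is the empty list (also written []); every list ends in nil.
Backslashes in strings or character literals need to be escaped, for example #"\\".
The book mentions a few concepts and associates them with characters of the regular expression syntax:
Alternation: ~+~: choose between things (constructor Plus)
Concatenation: ~.~: put things together (constructor Times)
Empty regular expression: ~@~: matches no string at all (constructor Zero)
Null string: ~%~: matches only the empty string (constructor One)
Iteration: ~*~: zero or more repetitions (constructor Star)
List types are written as TYPE list, for example char list.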
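
Since the book's tokenize is not reproduced here, the following sketch is written in the same style and exercises each point in the list above. The token type and its constructor names are assumptions, not necessarily the book's.

```sml
(* Hypothetical token type for the regular expression syntax. *)
datatype token =
    AtSign | Percent | Literal of char
  | PlusSign | TimesSign | Asterisk | LParen | RParen

(* Declared before tokenize, which raises it. *)
exception LexicalError

(* Recursion over the char list; each clause destructures it with ::. *)
fun tokenize nil = nil
  | tokenize (#"+" :: cs) = PlusSign :: tokenize cs
  | tokenize (#"." :: cs) = TimesSign :: tokenize cs
  | tokenize (#"*" :: cs) = Asterisk :: tokenize cs
  | tokenize (#"(" :: cs) = LParen :: tokenize cs
  | tokenize (#")" :: cs) = RParen :: tokenize cs
  | tokenize (#"@" :: cs) = AtSign :: tokenize cs
  | tokenize (#"%" :: cs) = Percent :: tokenize cs
  | tokenize (#"\\" :: c :: cs) = Literal c :: tokenize cs   (* escaped char *)
  | tokenize (#"\\" :: nil) = raise LexicalError              (* lone backslash *)
  | tokenize (#" " :: cs) = tokenize cs                       (* skip spaces *)
  | tokenize (c :: cs) = Literal c :: tokenize cs

val ts = tokenize (String.explode "(a + %)*")
(* [LParen, Literal #"a", PlusSign, Percent, RParen, Asterisk] *)
```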

The book goes on to give a grammar for regular expressions, which it then implements. It is hard to follow, because the grammar uses abbreviations that are never explained. Was it really too much work to write out what the abbreviations stand for? The grammar is the following:

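The abbreviations read like the nonterminals of the usual grammar for arithmetic expressions:

~rexp~: regular expression
~rtrm~: term
~rfac~: factor
~ratm~: atom

"Factor" makes sense once alternation ~+~ is read like addition and concatenation ~.~ like multiplication: an expression is an alternation of terms, a term is a concatenation of factors, and a factor is an atom, possibly followed by the postfix iteration ~*~. This layering is what gives ~.~ a higher precedence than ~+~.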
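
The grammar itself did not make it into these notes. Assuming it has the usual expression/term/factor/atom shape, with + for alternation, . for concatenation and postfix * for iteration, a recursive descent parser needs one function per nonterminal. This is a sketch under that assumption, not the book's code: it works on characters instead of tokens to stay short, skips spaces, and rejects the operator characters as literals.

```sml
(* Assumed grammar, layered like arithmetic expressions:
     rexp ::= rtrm | rtrm + rexp      expression: alternation of terms
     rtrm ::= rfac | rfac . rtrm      term: concatenation of factors
     rfac ::= ratm | ratm *           factor: an atom, optionally iterated
     ratm ::= @ | % | c | ( rexp )    atom *)
datatype regexp =
    Zero | One | Char of char
  | Plus of regexp * regexp | Times of regexp * regexp | Star of regexp

exception SyntaxError of string

(* One function per nonterminal. Each parses a prefix of the input and
   returns the parsed regexp together with the remaining characters. *)
fun rexp cs =
      (case rtrm cs of
           (t, #"+" :: rest) =>
             let val (e, rest') = rexp rest in (Plus (t, e), rest') end
         | result => result)
and rtrm cs =
      (case rfac cs of
           (f, #"." :: rest) =>
             let val (t, rest') = rtrm rest in (Times (f, t), rest') end
         | result => result)
and rfac cs =
      (case ratm cs of
           (a, #"*" :: rest) => (Star a, rest)
         | result => result)
and ratm (#"@" :: rest) = (Zero, rest)
  | ratm (#"%" :: rest) = (One, rest)
  | ratm (#"(" :: rest) =
      (case rexp rest of
           (e, #")" :: rest') => (e, rest')
         | _ => raise SyntaxError "expected )")
  | ratm (c :: rest) =
      if Char.contains "+.*()@%" c
      then raise SyntaxError ("unexpected " ^ str c)
      else (Char c, rest)
  | ratm nil = raise SyntaxError "unexpected end of input"

(* Spaces are dropped first; the whole input must be consumed. *)
fun parse s =
  case rexp (List.filter (not o Char.isSpace) (String.explode s)) of
      (r, nil) => r
    | _ => raise SyntaxError "trailing input"
```

Because rexp only looks for + after a complete term, "a+b.c" parses as Plus (Char #"a", Times (Char #"b", Char #"c")), so concatenation binds tighter than alternation.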

The code that follows the grammar is not much clearer to me.

For this reason I skip the rest of the regular expression package implementation and go on with the rest of the book.

Test lab

3; 3 + 4; 4 div 3; 4 mod 3;

val it = 3 : int
val it = 7 : int
val it = 1 : int
val it = 1 : int
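
div and mod are integer division and remainder. With negative operands they round toward negative infinity, unlike C's / and %; the Basis Library also offers Int.quot and Int.rem, which round toward zero. Negative literals are written with ~. A few lines to try in the REPL:

```sml
(* div/mod round toward negative infinity; quot/rem round toward zero. *)
val a = 4 div 3            (* 1 *)
val b = 4 mod 3            (* 1 *)
val c = ~7 div 2           (* ~4 *)
val d = ~7 mod 2           (* 1 *)
val e = Int.quot (~7, 2)   (* ~3 *)
val f = Int.rem (~7, 2)    (* ~1 *)
val g = 7.0 / 2.0          (* 3.5: / is division on reals only *)
```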
