- @c -*-texinfo-*-
- @c This is part of the GNU Guile Reference Manual.
- @c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
- @c See the file guile.texi for copying conditions.
- @c
- @c Based on the documentation at
- @c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
- @c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
- @c rest of `sxml-match'.)
- @c
- @c Converted to Texinfo and modified by Ludovic Courtès, 2010.
- @node sxml-match
- @section @code{sxml-match}: Pattern Matching of SXML
- @cindex pattern matching (SXML)
- @cindex SXML pattern matching
- The @code{(sxml match)} module provides syntactic forms for pattern
- matching of SXML trees, in a ``by example'' style reminiscent of the
- pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
- systems. @xref{SXML}, for more information on SXML.
- The following example@footnote{This example is taken from a paper by
- Krishnamurthi et al. Their paper was the first to show the usefulness of the
- @code{syntax-rules} style of pattern matching for transformation of XML, though
- the language described, XT3D, is an XML language.} provides a brief
- illustration, transforming a music album catalog language into HTML.
- @lisp
- (define (album->html x)
-   (sxml-match x
-     ((album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
-      `(ul (li ,t)
-           (li (b ,n) (i ,f)) ...))))
- @end lisp
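- As a usage sketch, the call below applies @code{album->html} to a small
- catalog. The album data is an illustrative assumption and is not taken from
- the paper cited above.
- @lisp
- ;; Assumes album->html from the example above has been defined.
- (album->html
-  '(album (@@ (title "Example Album"))
-          (catalog (num "1") (fmt "CD"))
-          (catalog (num "2") (fmt "LP"))))
- @result{} (ul (li "Example Album") (li (b "1") (i "CD")) (li (b "2") (i "LP")))
- @end lisp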
- Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
- @code{sxml-match-let*}.
- Compared to a standard s-expression pattern matcher (@pxref{Pattern
- Matching}), @code{sxml-match} provides the following benefits:
- @itemize
- @item
- matching of SXML elements does not depend on any degree of normalization of the
- SXML;
- @item
- matching of SXML attributes (within an element) is unordered; the order of
- the attributes specified within the pattern need not match their order within
- the element being matched;
- @item
- all attributes specified in the pattern must be present in the element being
- matched; in the spirit that XML is ``extensible'', the element being matched may
- include additional attributes not specified in the pattern.
- @end itemize
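- The attribute rules above can be seen in a small sketch (the element and
- attribute names are made up for illustration). The pattern lists @code{a}
- before @code{b}, while the element lists them in the opposite order and carries
- an extra attribute @code{c}; the match should still succeed.
- @lisp
- (use-modules (sxml match))
- (sxml-match '(e (@@ (b 2) (c 3) (a 1)))
-   ((e (@@ (a ,x) (b ,y))) (list x y)))
- @result{} (1 2)
- @end lisp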
- The present module is a descendant of WebIt!, and was inspired by an
- s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
- Dybvig at Indiana University.
- @unnumberedsubsec Syntax
- @code{sxml-match} provides a @code{case}-like form for pattern matching of XML
- nodes.
- @deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
- Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
- (one or more), each consisting of a pattern and one or more expressions to be
- evaluated if the pattern match succeeds. Optionally, each @var{clause} within
- @code{sxml-match} may include a @dfn{guard expression}.
- @end deffn
- The pattern notation is based on that of Scheme's @code{syntax-rules} and
- @code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
- is given below:
- @verbatim
- match-form ::= (sxml-match input-expression
-                  clause+)
- clause ::= [node-pattern action-expression+]
-          | [node-pattern (guard expression*) action-expression+]
- node-pattern ::= literal-pattern
-                | pat-var-or-cata
-                | element-pattern
-                | list-pattern
- literal-pattern ::= string
-                   | character
-                   | number
-                   | #t
-                   | #f
- attr-list-pattern ::= (@ attribute-pattern*)
-                     | (@ attribute-pattern* . pat-var-or-cata)
- attribute-pattern ::= (tag-symbol attr-val-pattern)
- attr-val-pattern ::= literal-pattern
-                    | pat-var-or-cata
-                    | (pat-var-or-cata default-value-expr)
- element-pattern ::= (tag-symbol attr-list-pattern?)
-                   | (tag-symbol attr-list-pattern? nodeset-pattern)
-                   | (tag-symbol attr-list-pattern?
-                                 nodeset-pattern? . pat-var-or-cata)
- list-pattern ::= (list nodeset-pattern)
-                | (list nodeset-pattern? . pat-var-or-cata)
-                | (list)
- nodeset-pattern ::= node-pattern
-                   | node-pattern ...
-                   | node-pattern nodeset-pattern
-                   | node-pattern ... nodeset-pattern
- pat-var-or-cata ::= (unquote var-symbol)
-                   | (unquote [var-symbol*])
-                   | (unquote [cata-expression -> var-symbol*])
- @end verbatim
- Within a list or element body pattern, ellipses may appear only once, but may be
- followed by zero or more node patterns.
- Guard expressions cannot refer to the return values of catamorphisms.
- Ellipses in the output expressions must appear only in an expression context;
- ellipses are not allowed in a syntactic form.
- The sections below illustrate specific aspects of the @code{sxml-match} pattern
- matcher.
- @unnumberedsubsec Matching XML Elements
- The example below illustrates the pattern matching of an XML element:
- @lisp
- (sxml-match '(e (@@ (i 1)) 3 4 5)
-   ((e (@@ (i ,d)) ,a ,b ,c) (list d a b c))
-   (,otherwise #f))
- @end lisp
- Each clause in @code{sxml-match} contains two parts: a pattern and one or more
- expressions which are evaluated if the pattern is successfully matched. The
- example above matches an element @code{e} with an attribute @code{i} and three
- children.
- Pattern variables must be ``unquoted'' in the pattern. The above expression
- binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
- to @code{5}.
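- With these bindings, the example above evaluates to @code{(1 3 4 5)}. The
- sketch below, using a made-up input element @code{f}, shows the final clause
- acting as a catch-all: @code{,otherwise} is an ordinary pattern variable, so
- it matches any input that the earlier clauses reject.
- @lisp
- (use-modules (sxml match))
- (sxml-match '(f 1)
-   ((e (@@ (i ,d)) ,a ,b ,c) (list d a b c))
-   (,otherwise #f))
- @result{} #f
- @end lisp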
- @unnumberedsubsec Ellipses in Patterns
- As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
- Note that the pattern @code{item ...} specifies zero-or-more matches of the
- pattern @code{item}.
- The use of ellipses in a pattern is illustrated in the code fragment below,
- where nested ellipses are used to match the children of repeated instances of an
- @code{a} element, within an element @code{d}.
- @lisp
- (define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
- (sxml-match x
-   ((d (a ,b ...) ...)
-    (list (list b ...) ...)))
- @end lisp
- The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
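- Because an ellipsis also matches zero occurrences, a pattern written for
- repeated children should still match an element that has none. A minimal
- sketch, using made-up @code{ul}, @code{li}, and @code{items} elements:
- @lisp
- (use-modules (sxml match))
- (sxml-match '(ul)
-   ((ul (li ,text) ...) `(items ,text ...)))
- @result{} (items)
- @end lisp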
- @unnumberedsubsec Ellipses in Quasiquote'd Output
- Within the body of an @code{sxml-match} form, a slightly extended version of
- quasiquote is provided, which allows the use of ellipses. This is illustrated
- in the example below.
- @lisp
- (sxml-match '(e 3 4 5 6 7)
-   ((e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end"))
-   (,otherwise #f))
- @end lisp
- The general pattern is that @code{`(something ,i ...)} is rewritten as
- @code{`(something ,@@i)}.
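- In the example above, @code{i} is matched against the children @code{3},
- @code{4}, and @code{5}, so the result should be
- @code{("start" (wrap 3) (wrap 4) (wrap 5) "end")}. As a rough sketch of what the
- ellipsis stands for (this is not the actual expansion produced by the macro),
- the same list can be built in plain Scheme with @code{map} and
- @code{unquote-splicing}:
- @lisp
- (let ((i '(3 4 5)))
-   `("start" ,@@(map (lambda (x) (list 'wrap x)) i) "end"))
- @result{} ("start" (wrap 3) (wrap 4) (wrap 5) "end")
- @end lisp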
- @unnumberedsubsec Matching Nodesets
- A nodeset pattern is designated by a list in the pattern, beginning with the
- identifier @code{list}. The example below illustrates matching a nodeset.
- @lisp
- (sxml-match '("i" "j" "k" "l" "m")
-   ((list ,a ,b ,c ,d ,e)
-    `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))))
- @end lisp
- This example wraps each nodeset item in an HTML paragraph element. It can be
- simplified by using an ellipsis:
- @lisp
- (sxml-match '("i" "j" "k" "l" "m")
-   ((list ,i ...)
-    `((p ,i) ...)))
- @end lisp
- This version will match nodesets of any length, and wrap each item in the
- nodeset in an HTML paragraph element.
- @unnumberedsubsec Matching the ``Rest'' of a Nodeset
- Matching the ``rest'' of a nodeset is achieved by using a @code{. ,rest}
- pattern at the end of an element or nodeset pattern, where @var{rest} is a
- pattern variable. This is illustrated in the example below:
- @lisp
- (sxml-match '(e 3 (f 4 5 6) 7)
-   ((e ,a (f . ,y) ,d)
-    (list a y d)))
- @end lisp
- The above expression returns @code{(3 (4 5 6) 7)}.
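- The same notation can follow other node patterns, in which case the variable
- receives only the children that remain. A minimal sketch, with a made-up input
- and the illustrative variable names @var{head} and @var{tail}:
- @lisp
- (use-modules (sxml match))
- (sxml-match '(e 1 2 3)
-   ((e ,head . ,tail) (list head tail)))
- @result{} (1 (2 3))
- @end lisp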
- @unnumberedsubsec Matching the Unmatched Attributes
- Sometimes it is useful to bind a list of attributes present in the element being
- matched, but which do not appear in the pattern. This is achieved by using a
- @code{. ,rest} pattern at the end of the attribute list pattern. This is
- illustrated in the example below:
- @lisp
- (sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
-   ((a (@@ (y ,www) . ,qqq) ,t ,u ,v)
-    (list www qqq t u v)))
- @end lisp
- The above expression matches the attribute @code{y} and binds a list of the
- remaining attributes to the variable @var{qqq}. The result of the above
- expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
- This type of pattern also allows the binding of all attributes:
- @lisp
- (sxml-match '(a (@@ (z 1) (y 2) (x 3)))
-   ((a (@@ . ,qqq))
-    qqq))
- @end lisp
- @unnumberedsubsec Default Values in Attribute Patterns
- It is possible to specify a default value for an attribute which is used if the
- attribute is not present in the element being matched. This is illustrated in
- the following example:
- @lisp
- (sxml-match '(e 3 4 5)
-   ((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)))
- @end lisp
- The value @code{1} is used when the attribute @code{z} is absent from the
- element @code{e}.
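- The example above therefore evaluates to @code{(1 3 4 5)}. For comparison, the
- sketch below supplies the attribute explicitly (the value @code{9} is made up);
- the attribute's own value should then be used instead of the default.
- @lisp
- (use-modules (sxml match))
- (sxml-match '(e (@@ (z 9)) 3 4 5)
-   ((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)))
- @result{} (9 3 4 5)
- @end lisp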
- @unnumberedsubsec Guards in Patterns
- Guards may be added to a pattern clause via the @code{guard} keyword. A guard
- may include zero or more expressions, which are evaluated only if the pattern
- is matched. The body of the clause is evaluated only if all of the guard
- expressions evaluate to a true value.
- The use of guard expressions is illustrated below:
- @lisp
- (sxml-match '(a 2 3)
-   ((a ,n) (guard (number? n)) n)
-   ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
- @end lisp
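- The example above evaluates to @code{5}: the first clause does not match
- because the element has two children, and both guard expressions of the second
- clause hold. When a guard fails, matching continues with the next clause, as
- in the sketch below (the input and the symbol @code{no-match} are made up for
- illustration).
- @lisp
- (use-modules (sxml match))
- (sxml-match '(a "x" 3)
-   ((a ,m ,n) (guard (number? m) (number? n)) (+ m n))
-   (,otherwise 'no-match))
- @result{} no-match
- @end lisp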
- @unnumberedsubsec Catamorphisms
- The example below illustrates the use of explicit recursion within an
- @code{sxml-match} form. This example implements a simple calculator for the
- basic arithmetic operations, which are represented by the XML elements
- @code{plus}, @code{minus}, @code{times}, and @code{div}.
- @lisp
- (define simple-eval
-   (lambda (x)
-     (sxml-match x
-       (,i (guard (integer? i)) i)
-       ((plus ,x ,y) (+ (simple-eval x) (simple-eval y)))
-       ((times ,x ,y) (* (simple-eval x) (simple-eval y)))
-       ((minus ,x ,y) (- (simple-eval x) (simple-eval y)))
-       ((div ,x ,y) (/ (simple-eval x) (simple-eval y)))
-       (,otherwise (error "simple-eval: invalid expression" x)))))
- @end lisp
- Using the catamorphism feature of @code{sxml-match}, a more concise version of
- @code{simple-eval} can be written. The pattern @code{,(x)} recursively invokes
- the pattern matcher on the value bound in this position.
- @lisp
- (define simple-eval
-   (lambda (x)
-     (sxml-match x
-       (,i (guard (integer? i)) i)
-       ((plus ,(x) ,(y)) (+ x y))
-       ((times ,(x) ,(y)) (* x y))
-       ((minus ,(x) ,(y)) (- x y))
-       ((div ,(x) ,(y)) (/ x y))
-       (,otherwise (error "simple-eval: invalid expression" x)))))
- @end lisp
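- Both definitions should behave identically on well-formed input. As a usage
- sketch (the expression below is made up for illustration), the nested
- @code{times} and @code{minus} elements are evaluated first and their results
- are then added:
- @lisp
- ;; Assumes either definition of simple-eval above has been evaluated.
- (simple-eval '(plus (times 2 3) (minus 10 4)))
- @result{} 12
- @end lisp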
- @unnumberedsubsec Named-Catamorphisms
- It is also possible to explicitly name the operator in the ``cata'' position.
- Where @code{,(id*)} recurs to the top of the current @code{sxml-match},
- @code{,(cata -> id*)} recurs to @code{cata}. @code{cata} must evaluate to a
- procedure which takes one argument, and returns as many values as there are
- identifiers following @code{->}.
- Named catamorphism patterns allow processing to be split into multiple, mutually
- recursive procedures. This is illustrated in the example below: a
- transformation that formats a ``TV Guide'' into HTML.
- @lisp
- (define (tv-guide->html g)
-   (define (cast-list cl)
-     (sxml-match cl
-       ((CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
-        `(div (ul (li ,ch ": " ,a) ...)))))
-   (define (prog p)
-     (sxml-match p
-       ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
-                 (Description ,desc ...))
-        `(div (p ,start-time
-                 (br) ,series-title
-                 (br) ,desc ...)))
-       ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
-                 (Description ,desc ...)
-                 ,(cast-list -> cl))
-        `(div (p ,start-time
-                 (br) ,series-title
-                 (br) ,desc ...)
-              ,cl))))
-   (sxml-match g
-     ((TVGuide (@@ (start ,start-date)
-                  (end ,end-date))
-               (Channel (Name ,nm) ,(prog -> p) ...) ...)
-      `(html (head (title "TV Guide"))
-             (body (h1 "TV Guide")
-                   (div (h2 ,nm) ,p ...) ...)))))
- @end lisp
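- A smaller sketch of the same mechanism may be easier to follow. The procedure
- @code{item->entry} and the @code{items} and @code{item} elements are
- hypothetical names chosen for illustration: each child of @code{items} is
- passed to @code{item->entry}, and the single value it returns is bound to
- @var{entry}.
- @lisp
- (use-modules (sxml match))
- ;; Hypothetical helper: converts one `item' element into an `li' element.
- (define (item->entry it)
-   (sxml-match it
-     ((item ,text) `(li ,text))))
- (sxml-match '(items (item "a") (item "b"))
-   ((items ,(item->entry -> entry) ...)
-    `(ul ,entry ...)))
- @result{} (ul (li "a") (li "b"))
- @end lisp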
- @unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
- @deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
- @deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
- These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
- an XML pattern in the binding position, rather than a simple variable.
- @end deffn
- For example, the expression below:
- @lisp
- (sxml-match-let (((a ,i ,j) '(a 1 2)))
-   (+ i j))
- @end lisp
- binds the variables @var{i} and @var{j} to @code{1} and @code{2}, the children
- of the given XML value, and evaluates to @code{3}.
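- As with @code{let*}, @code{sxml-match-let*} processes its bindings in
- sequence, so a later expression may refer to variables bound by an earlier
- pattern. A minimal sketch, with made-up elements @code{a} and @code{b}:
- @lisp
- (use-modules (sxml match))
- (sxml-match-let* (((a ,i ,j) '(a 1 2))
-                   ((b ,k) (list 'b (+ i j))))
-   (list i j k))
- @result{} (1 2 3)
- @end lisp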
- @c Local Variables:
- @c coding: utf-8
- @c End: