@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
@c
@c Based on the documentation at
@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
@c rest of `sxml-match'.)
@c
@c Converted to Texinfo and modified by Ludovic Courtès, 2010.

@node sxml-match
@section @code{sxml-match}: Pattern Matching of SXML

@cindex pattern matching (SXML)
@cindex SXML pattern matching

The @code{(sxml match)} module provides syntactic forms for pattern
matching of SXML trees, in a ``by example'' style reminiscent of the
pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
systems.  @xref{SXML}, for more information on SXML.

The following example@footnote{This example is taken from a paper by
Krishnamurthi et al.  Their paper was the first to show the usefulness of the
@code{syntax-rules} style of pattern matching for transformation of XML, though
the language described, XT3D, is an XML language.} provides a brief
illustration, transforming a music album catalog language into HTML.

@lisp
(define (album->html x)
  (sxml-match x
    ((album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
     `(ul (li ,t)
          (li (b ,n) (i ,f)) ...))))
@end lisp

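
As a usage sketch, the procedure can be applied to a small catalog.  The
definition is repeated so that the fragment stands alone; the album title and
catalog entries are made-up sample data, not taken from the original paper.

@lisp
(use-modules (sxml match))

(define (album->html x)
  (sxml-match x
    ((album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
     `(ul (li ,t)
          (li (b ,n) (i ,f)) ...))))

;; Hypothetical input: one album with two catalog entries.
(album->html
 '(album (@@ (title "Example Album"))
         (catalog (num "001") (fmt "CD"))
         (catalog (num "002") (fmt "LP"))))
@result{} (ul (li "Example Album") (li (b "001") (i "CD")) (li (b "002") (i "LP")))
@end lisp
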
Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
@code{sxml-match-let*}.

Compared to a standard s-expression pattern matcher (@pxref{Pattern
Matching}), @code{sxml-match} provides the following benefits:

@itemize
@item
matching of SXML elements does not depend on any degree of normalization of the
SXML;

@item
matching of SXML attributes (within an element) is unordered; the order of
the attributes specified within the pattern need not match their order within
the element being matched;

@item
all attributes specified in the pattern must be present in the element being
matched; in the spirit that XML is ``extensible'', the element being matched may
include additional attributes not specified in the pattern.
@end itemize

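
The attribute-related points above can be sketched as follows.  The element
and attribute names are arbitrary placeholders chosen for illustration: the
pattern lists @code{a} before @code{b} and does not mention @code{c}, while the
input carries the attributes in a different order and includes @code{c}.

@lisp
(use-modules (sxml match))

;; Attributes appear as b, a, c in the input, but as a, b in the pattern.
;; The extra attribute c is not mentioned in the pattern and is ignored.
(sxml-match '(e (@@ (b 2) (a 1) (c 3)) "text")
  ((e (@@ (a ,a) (b ,b)) ,body)
   (list a b body)))
@result{} (1 2 "text")
@end lisp
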
The present module is a descendant of WebIt!, and was inspired by an
s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
Dybvig at Indiana University.

@unnumberedsubsec Syntax

@code{sxml-match} provides a @code{case}-like form for pattern matching of XML
nodes.

@deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
(one or more), each consisting of a pattern and one or more expressions to be
evaluated if the pattern match succeeds.  Optionally, each @var{clause} within
@code{sxml-match} may include a @dfn{guard expression}.
@end deffn

The pattern notation is based on that of Scheme's @code{syntax-rules} and
@code{syntax-case} macro systems.  The grammar for the @code{sxml-match} syntax
is given below:

@verbatim
match-form ::= (sxml-match input-expression
                 clause+)

clause ::= [node-pattern action-expression+]
         | [node-pattern (guard expression*) action-expression+]

node-pattern ::= literal-pattern
               | pat-var-or-cata
               | element-pattern
               | list-pattern

literal-pattern ::= string
                  | character
                  | number
                  | #t
                  | #f

attr-list-pattern ::= (@ attribute-pattern*)
                    | (@ attribute-pattern* . pat-var-or-cata)

attribute-pattern ::= (tag-symbol attr-val-pattern)

attr-val-pattern ::= literal-pattern
                   | pat-var-or-cata
                   | (pat-var-or-cata default-value-expr)

element-pattern ::= (tag-symbol attr-list-pattern?)
                  | (tag-symbol attr-list-pattern? nodeset-pattern)
                  | (tag-symbol attr-list-pattern?
                                nodeset-pattern? . pat-var-or-cata)

list-pattern ::= (list nodeset-pattern)
               | (list nodeset-pattern? . pat-var-or-cata)
               | (list)

nodeset-pattern ::= node-pattern
                  | node-pattern ...
                  | node-pattern nodeset-pattern
                  | node-pattern ... nodeset-pattern

pat-var-or-cata ::= (unquote var-symbol)
                  | (unquote [var-symbol*])
                  | (unquote [cata-expression -> var-symbol*])
@end verbatim

Within a list or element body pattern, ellipses may appear only once, but may be
followed by zero or more node patterns.

Guard expressions cannot refer to the return values of catamorphisms.

Ellipses in the output expressions must appear only in an expression context;
ellipses are not allowed in a syntactic form.

The sections below illustrate specific aspects of the @code{sxml-match} pattern
matcher.

@unnumberedsubsec Matching XML Elements

The example below illustrates the pattern matching of an XML element:

@lisp
(sxml-match '(e (@@ (i 1)) 3 4 5)
  ((e (@@ (i ,d)) ,a ,b ,c) (list d a b c))
  (,otherwise #f))
@end lisp

Each clause in @code{sxml-match} contains two parts: a pattern and one or more
expressions which are evaluated if the pattern is successfully matched.  The
example above matches an element @code{e} with an attribute @code{i} and three
children.

Pattern variables must be ``unquoted'' in the pattern.  The above expression
binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
to @code{5}, so it evaluates to @code{(1 3 4 5)}.

@unnumberedsubsec Ellipses in Patterns

As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
Note that the pattern @code{item ...} specifies zero-or-more matches of the
pattern @code{item}.

The use of ellipses in a pattern is illustrated in the code fragment below,
where nested ellipses are used to match the children of repeated instances of an
@code{a} element, within an element @code{d}.

@lisp
(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))

(sxml-match x
  ((d (a ,b ...) ...)
   (list (list b ...) ...)))
@end lisp

The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.

@unnumberedsubsec Ellipses in Quasiquote'd Output

Within the body of an @code{sxml-match} form, a slightly extended version of
quasiquote is provided, which allows the use of ellipses.  This is illustrated
in the example below.

@lisp
(sxml-match '(e 3 4 5 6 7)
  ((e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end"))
  (,otherwise #f))
@end lisp

The general pattern is that @code{`(something ,i ...)} is rewritten as
@code{`(something ,@@i)}.

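
To make the rewriting rule concrete, the sketch below sets the ellipsis form
beside an ordinary quasiquote that splices in a mapped list.  The second
expression is only an illustration of the intended result in plain Scheme, not
a description of how the macro is expanded internally.

@lisp
(use-modules (sxml match))

;; Extended quasiquote: the unquoted expression is repeated for each i.
(sxml-match '(e 3 4 5 6 7)
  ((e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")))
@result{} ("start" (wrap 3) (wrap 4) (wrap 5) "end")

;; The same list built with standard quasiquote and unquote-splicing,
;; with the matched children written out by hand.
(let ((matched '(3 4 5)))
  `("start" ,@@(map (lambda (i) (list 'wrap i)) matched) "end"))
@result{} ("start" (wrap 3) (wrap 4) (wrap 5) "end")
@end lisp
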
@unnumberedsubsec Matching Nodesets

A nodeset pattern is designated by a list in the pattern, beginning with the
identifier @code{list}.  The example below illustrates matching a nodeset.

@lisp
(sxml-match '("i" "j" "k" "l" "m")
  ((list ,a ,b ,c ,d ,e)
   `((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))))
@end lisp

This example wraps each nodeset item in an HTML paragraph element.  It can be
simplified by using an ellipsis:

@lisp
(sxml-match '("i" "j" "k" "l" "m")
  ((list ,i ...)
   `((p ,i) ...)))
@end lisp

This version will match nodesets of any length, and wrap each item in the
nodeset in an HTML paragraph element.

@unnumberedsubsec Matching the ``Rest'' of a Nodeset

Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
at the end of an element or nodeset pattern.

This is illustrated in the example below:

@lisp
(sxml-match '(e 3 (f 4 5 6) 7)
  ((e ,a (f . ,y) ,d)
   (list a y d)))
@end lisp

The above expression returns @code{(3 (4 5 6) 7)}.

@unnumberedsubsec Matching the Unmatched Attributes

Sometimes it is useful to bind a list of attributes present in the element being
matched, but which do not appear in the pattern.  This is achieved by using a
@code{. rest)} pattern at the end of the attribute list pattern.  This is
illustrated in the example below:

@lisp
(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
  ((a (@@ (y ,www) . ,qqq) ,t ,u ,v)
   (list www qqq t u v)))
@end lisp

The above expression matches the attribute @code{y} and binds a list of the
remaining attributes to the variable @var{qqq}.  The result of the above
expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.

This type of pattern also allows the binding of all attributes:

@lisp
(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
  ((a (@@ . ,qqq))
   qqq))
@end lisp

@unnumberedsubsec Default Values in Attribute Patterns

It is possible to specify a default value for an attribute which is used if the
attribute is not present in the element being matched.  This is illustrated in
the following example:

@lisp
(sxml-match '(e 3 4 5)
  ((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)))
@end lisp

The value @code{1} is used when the attribute @code{z} is absent from the
element @code{e}.

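
The following sketch contrasts the two cases, with and without the attribute.
The helper name @code{z-or-default} is hypothetical and exists only to apply
the same pattern to both inputs.

@lisp
(use-modules (sxml match))

;; Hypothetical helper: same pattern as above, wrapped in a procedure.
(define (z-or-default elem)
  (sxml-match elem
    ((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c))))

;; No z attribute: d takes the default value 1.
(z-or-default '(e 3 4 5))
@result{} (1 3 4 5)

;; z attribute present: d is bound to its value.
(z-or-default '(e (@@ (z 9)) 3 4 5))
@result{} (9 3 4 5)
@end lisp
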
@unnumberedsubsec Guards in Patterns

Guards may be added to a pattern clause via the @code{guard} keyword.  A guard
expression may include zero or more expressions which are evaluated only if the
pattern is matched.  The body of the clause is only evaluated if the guard
expressions evaluate to @code{#t}.

The use of guard expressions is illustrated below:

@lisp
(sxml-match '(a 2 3)
  ((a ,n) (guard (number? n)) n)
  ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
@end lisp

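
A guard can also distinguish between clauses whose patterns are identical.
The sketch below uses a hypothetical procedure @code{classify} and a made-up
element name @code{n}; when a guard fails, matching moves on to the next
clause.

@lisp
(use-modules (sxml match))

;; Hypothetical example: classify the single child of an n element.
(define (classify x)
  (sxml-match x
    ((n ,v) (guard (and (number? v) (negative? v))) 'negative)
    ((n ,v) (guard (number? v)) 'non-negative)
    (,otherwise 'not-a-number)))

(classify '(n -3))      @result{} negative
(classify '(n 4))       @result{} non-negative
(classify '(n "four"))  @result{} not-a-number
@end lisp

The two tests in the first clause are combined with @code{and} inside a single
guard expression, so that @code{negative?} is never applied to a non-number
regardless of how multiple guard expressions are evaluated.
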
@unnumberedsubsec Catamorphisms

The example below illustrates the use of explicit recursion within an
@code{sxml-match} form.  This example implements a simple calculator for the
basic arithmetic operations, which are represented by the XML elements
@code{plus}, @code{minus}, @code{times}, and @code{div}.

@lisp
(define simple-eval
  (lambda (x)
    (sxml-match x
      (,i (guard (integer? i)) i)
      ((plus ,x ,y) (+ (simple-eval x) (simple-eval y)))
      ((times ,x ,y) (* (simple-eval x) (simple-eval y)))
      ((minus ,x ,y) (- (simple-eval x) (simple-eval y)))
      ((div ,x ,y) (/ (simple-eval x) (simple-eval y)))
      (,otherwise (error "simple-eval: invalid expression" x)))))
@end lisp

Using the catamorphism feature of @code{sxml-match}, a more concise version of
@code{simple-eval} can be written.  The pattern @code{,(x)} recursively invokes
the pattern matcher on the value bound in this position.

@lisp
(define simple-eval
  (lambda (x)
    (sxml-match x
      (,i (guard (integer? i)) i)
      ((plus ,(x) ,(y)) (+ x y))
      ((times ,(x) ,(y)) (* x y))
      ((minus ,(x) ,(y)) (- x y))
      ((div ,(x) ,(y)) (/ x y))
      (,otherwise (error "simple-eval: invalid expression" x)))))
@end lisp

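
As a usage sketch, a trimmed-down copy of the catamorphism version (keeping
only @code{plus} and @code{times} so that the fragment stands alone) can be
applied to a nested expression.  The input expression is made up for
illustration.

@lisp
(use-modules (sxml match))

;; Reduced copy of simple-eval: integers, plus and times only.
(define simple-eval
  (lambda (x)
    (sxml-match x
      (,i (guard (integer? i)) i)
      ((plus ,(x) ,(y)) (+ x y))
      ((times ,(x) ,(y)) (* x y))
      (,otherwise (error "simple-eval: invalid expression" x)))))

;; The inner times element is evaluated first by the recursive match,
;; so the plus clause receives 1 and 6.
(simple-eval '(plus 1 (times 2 3)))
@result{} 7
@end lisp
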
@unnumberedsubsec Named-Catamorphisms

It is also possible to explicitly name the operator in the ``cata'' position.
Where @code{,(id*)} recurs to the top of the current @code{sxml-match},
@code{,(cata -> id*)} recurs to @code{cata}.  @code{cata} must evaluate to a
procedure which takes one argument, and returns as many values as there are
identifiers following @code{->}.

Named catamorphism patterns allow processing to be split into multiple, mutually
recursive procedures.  This is illustrated in the example below: a
transformation that formats a ``TV Guide'' into HTML.

@lisp
(define (tv-guide->html g)
  (define (cast-list cl)
    (sxml-match cl
      ((CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
       `(div (ul (li ,ch ": " ,a) ...)))))
  (define (prog p)
    (sxml-match p
      ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description ,desc ...))
       `(div (p ,start-time
                (br) ,series-title
                (br) ,desc ...)))
      ((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
                (Description ,desc ...)
                ,(cast-list -> cl))
       `(div (p ,start-time
                (br) ,series-title
                (br) ,desc ...)
             ,cl))))
  (sxml-match g
    ((TVGuide (@@ (start ,start-date)
                  (end ,end-date))
              (Channel (Name ,nm) ,(prog -> p) ...) ...)
     `(html (head (title "TV Guide"))
            (body (h1 "TV Guide")
                  (div (h2 ,nm) ,p ...) ...)))))
@end lisp

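
A much smaller sketch of the same mechanism is shown below.  The procedure
@code{double} and the element name @code{pair} are hypothetical; each child of
the element is passed to @code{double}, and the single value it returns is
bound to the identifier following @code{->}.

@lisp
(use-modules (sxml match))

;; Hypothetical one-argument procedure used in the ``cata'' position.
(define (double n) (* 2 n))

(sxml-match '(pair 3 4)
  ((pair ,(double -> a) ,(double -> b))
   (list a b)))
@result{} (6 8)
@end lisp
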
@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}

@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
an XML pattern in the binding position, rather than a simple variable.
@end deffn

For example, the expression below:

@lisp
(sxml-match-let (((a ,i ,j) '(a 1 2)))
  (+ i j))
@end lisp

binds the variables @var{i} and @var{j} to @code{1} and @code{2}, the children
of the XML value given, and evaluates to @code{3}.

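
By analogy with @code{let*}, a later binding in @code{sxml-match-let*} may
refer to variables bound by an earlier pattern.  The following is a minimal
sketch with made-up element names @code{a} and @code{b}.

@lisp
(use-modules (sxml match))

;; The second binding expression uses i and j from the first pattern.
(sxml-match-let* (((a ,i ,j) '(a 1 2))
                  ((b ,k) `(b ,(+ i j))))
  (list i j k))
@result{} (1 2 3)
@end lisp
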
@c Local Variables:
@c coding: utf-8
@c End: