  1. @c -*-texinfo-*-
  2. @c This is part of the GNU Guile Reference Manual.
  3. @c Copyright (C) 2013, 2017, 2021 Free Software Foundation, Inc.
  4. @c See the file guile.texi for copying conditions.
  5. @c SXPath documentation based on SXPath.scm by Oleg Kiselyov,
  6. @c which is in the public domain according to <http://okmij.org/ftp/>
  7. @c and <http://ssax.sourceforge.net/>.
  8. @node SXML
  9. @section SXML
  10. SXML is a native representation of XML in terms of standard Scheme data
  11. types: lists, symbols, and strings. For example, the simple XML
  12. fragment:
  13. @example
  14. <parrot type="African Grey"><name>Alfie</name></parrot>
  15. @end example
  16. may be represented with the following SXML:
  17. @example
  18. (parrot (@@ (type "African Grey")) (name "Alfie"))
  19. @end example
  20. SXML is very general, and is capable of representing all of XML.
  21. Formally, this means that SXML is a conforming implementation of the
  22. @uref{http://www.w3.org/TR/xml-infoset/,XML Information Set} standard.
  23. Guile includes several facilities for working with XML and SXML:
  24. parsers, serializers, and transformers.
  25. @menu
  26. * SXML Overview:: XML, as it was meant to be
  27. * Reading and Writing XML:: Convenient XML parsing and serializing
  28. * SSAX:: Custom functional-style XML parsers
  29. * Transforming SXML:: Munging SXML with @code{pre-post-order}
  30. * SXML Tree Fold:: Fold-based SXML transformations
  31. * SXPath:: XPath for SXML
  32. * sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
  33. * sxml apply-templates:: A more XSLT-like approach to SXML transformations
  34. @end menu
  35. @node SXML Overview
  36. @subsection SXML Overview
  37. (This section needs to be written; volunteers welcome.)
  38. @node Reading and Writing XML
  39. @subsection Reading and Writing XML
  40. The @code{(sxml simple)} module presents a basic interface for parsing
  41. XML from a port into the Scheme SXML format, and for serializing it back
  42. to text.
  43. @example
  44. (use-modules (sxml simple))
  45. @end example
  46. @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
  47. [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
  48. [#:entities='()] [#:default-entity-handler=#f] @
  49. [#:doctype-handler=#f]
  50. Use SSAX to parse an XML document into SXML. Takes one optional
  51. argument, @var{string-or-port}, which defaults to the current input
  52. port. Returns the resulting SXML document. If @var{string-or-port} is
  53. a port, it will be left pointing at the next available character in the
  54. port.
  55. @end deffn
  56. As is normal in SXML, XML elements parse as tagged lists. Attributes,
  57. if any, are placed after the tag, within an @code{@@} element. The root
  58. of the resulting SXML will be contained in a special tag, @code{*TOP*}.
  59. This tag will contain the root element of the XML, but also any prior
  60. processing instructions.
  61. @example
  62. (xml->sxml "<foo/>")
  63. @result{} (*TOP* (foo))
  64. (xml->sxml "<foo>text</foo>")
  65. @result{} (*TOP* (foo "text"))
  66. (xml->sxml "<foo kind=\"bar\">text</foo>")
  67. @result{} (*TOP* (foo (@@ (kind "bar")) "text"))
  68. (xml->sxml "<?xml version=\"1.0\"?><foo/>")
  69. @result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
  70. @end example
  71. All namespaces in the XML document must be declared, via @code{xmlns}
  72. attributes. SXML elements built from non-default namespaces will have
  73. their tags prefixed with their URI. Users can specify custom prefixes
  74. for certain namespaces with the @code{#:namespaces} keyword argument to
  75. @code{xml->sxml}. A namespace can be removed by using a @code{#f} custom
  76. prefix.
  77. @example
  78. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
  79. @result{} (*TOP* (http://example.org/ns1:foo "text"))
  80. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
  81. #:namespaces '((ns1 . "http://example.org/ns1")))
  82. @result{} (*TOP* (ns1:foo "text"))
  83. (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
  84. #:namespaces '((ns2 . "http://example.org/ns2")))
  85. @result{} (*TOP* (foo (ns2:baz)))
  86. @end example
  87. By default, namespaces passed to @code{xml->sxml} are treated as if they
  88. were declared on the root element. Passing a false
  89. @code{#:declare-namespaces?} argument will disable this behavior,
  90. requiring in-document declarations of namespaces before use.
  91. @example
  92. (xml->sxml "<foo><ns2:baz/></foo>"
  93. #:namespaces '((ns2 . "http://example.org/ns2")))
  94. @result{} (*TOP* (foo (ns2:baz)))
  95. (xml->sxml "<foo><ns2:baz/></foo>"
  96. #:namespaces '((ns2 . "http://example.org/ns2"))
  97. #:declare-namespaces? #f)
  98. @result{} error: undeclared namespace: `ns2'
  99. @end example
  100. By default, all whitespace in XML is significant. Passing the
  101. @code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
  102. whitespace before, after, and between elements, treating it as
  103. insignificant. Whitespace in text fragments is left alone.
  104. @example
  105. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
  106. @result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
  107. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
  108. #:trim-whitespace? #t)
  109. @result{} (*TOP* (foo (bar " Alfie the parrot! ")))
  110. @end example
  111. Parsed entities may be declared with the @code{#:entities} keyword
  112. argument, or handled with the @code{#:default-entity-handler}. By
  113. default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
  114. @code{&apos;} and @code{&quot;} entities are defined, as well as the
  115. @code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
  116. numeric character entities.
  117. @example
  118. (xml->sxml "<foo>&amp;</foo>")
  119. @result{} (*TOP* (foo "&"))
  120. (xml->sxml "<foo>&nbsp;</foo>")
  121. @result{} error: undefined entity: nbsp
  122. (xml->sxml "<foo>&#xA0;</foo>")
  123. @result{} (*TOP* (foo "\xa0"))
  124. (xml->sxml "<foo>&nbsp;</foo>"
  125. #:entities '((nbsp . "\xa0")))
  126. @result{} (*TOP* (foo "\xa0"))
  127. (xml->sxml "<foo>&nbsp; &foo;</foo>"
  128. #:default-entity-handler
  129. (lambda (port name)
  130. (case name
  131. ((nbsp) "\xa0")
  132. (else
  133. (format (current-warning-port)
  134. "~a:~a:~a: undefined entity: ~a\n"
  135. (or (port-filename port) "<unknown file>")
  136. (port-line port) (port-column port)
  137. name)
  138. (symbol->string name)))))
  139. @print{} <unknown file>:0:17: undefined entity: foo
  140. @result{} (*TOP* (foo "\xa0 foo"))
  141. @end example
  142. By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
  143. declaration, if any. This behavior can be overridden with the
  144. @code{#:doctype-handler} argument, which should be a procedure of three
  145. arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
  146. the internal doctype subset (as a string or @code{#f} if not present).
  147. The handler should return keyword arguments as multiple values, as if it
  148. were calling its continuation with keyword arguments. The continuation
  149. accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
  150. in the same format that @code{xml->sxml} itself takes. These entities
  151. and namespaces will be prepended to those given to the @code{xml->sxml}
  152. invocation.
  153. @example
  154. (define (handle-foo docname systemid internal-subset)
  155. (case docname
  156. ((foo)
  157. (values #:entities '((greets . "<i>Hello, world!</i>"))))
  158. (else
  159. (values))))
  160. (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
  161. #:doctype-handler handle-foo)
  162. @result{} (*TOP* (p (i "Hello, world!")))
  163. @end example
  164. If the document has no doctype declaration, the @var{doctype-handler} is
  165. invoked with @code{#f} for the three arguments.
  166. In the future, the continuation may accept other keyword arguments, for
  167. example to validate the parsed SXML against the doctype.
  168. @deffn {Scheme Procedure} sxml->xml tree [port]
  169. Serialize the SXML tree @var{tree} as XML. The output will be written to
  170. the current output port, unless the optional argument @var{port} is
  171. present.
  172. @end deffn
  173. @deffn {Scheme Procedure} sxml->string sxml
  174. Detag an sxml tree @var{sxml} into a string. Does not perform any
  175. formatting.
  176. @end deffn
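The two serializers can be compared on a small tree. The following is a
minimal sketch: the sample tree is illustrative only, and
@code{with-output-to-string} is used here merely to capture what
@code{sxml->xml} writes to the current output port.

@example
(use-modules (sxml simple))

;; An illustrative SXML tree (not taken from any real document).
(define tree '(p "Hello, " (b "world") "!"))

;; sxml->xml writes markup to a port; capture it as a string here.
(with-output-to-string
  (lambda () (sxml->xml tree)))
;; @result{} "<p>Hello, <b>world</b>!</p>"

;; Special characters in text are escaped on output.
(with-output-to-string
  (lambda () (sxml->xml '(p "a < b"))))
;; @result{} "<p>a &lt; b</p>"

;; sxml->string drops the tags and concatenates the text.
(sxml->string tree)
;; @result{} "Hello, world!"
@end example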
  177. @node SSAX
  178. @subsection SSAX: A Functional XML Parsing Toolkit
  179. Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
  180. toolkit, SSAX.
  181. @subsubsection History
  182. Back in the 1990s, when the world was young again and XML was the
  183. solution to all of its problems, there were basically two kinds of XML
  184. parsers out there: DOM parsers and SAX parsers.
  185. A DOM parser reads through an entire XML document, building up a tree of
  186. ``DOM objects'' representing the document structure. They are very easy
  187. to use, but sometimes you don't actually want all of the information in
  188. a document; building an object tree is not necessary if all you want to
  189. do is to count word frequencies in a document, for example.
  190. SAX parsers were created to give the programmer more control over the
  191. parsing process. A programmer gives the SAX parser a number of
  192. ``callbacks'': functions that will be called on various features of the
  193. XML stream as they are encountered. SAX parsers are more efficient, but
  194. much harder to use, as users typically have to manually maintain a
  195. stack of open elements.
  196. Kiselyov realized that the SAX programming model could be made much
  197. simpler if the callbacks were formulated not as a linear fold across the
  198. features of the XML stream, but as a @emph{tree fold} over the structure
  199. implicit in the XML. In this way, the user has a very convenient,
  200. functional-style interface that can still generate optimal parsers.
  201. The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
  202. DOM-style parser built using SSAX, though it returns SXML instead of DOM
  203. objects.
  204. @subsubsection Implementation
  205. @code{(sxml ssax)} is a package of low-to-high level lexing and parsing
  206. procedures that can be combined to yield a SAX, a DOM, a validating
  207. parser, or a parser intended for a particular document type. The
  208. procedures in the package can be used separately to tokenize or parse
  209. various pieces of XML documents. The package supports XML Namespaces,
  210. internal and external parsed entities, user-controlled handling of
  211. whitespace, and validation. This module therefore is intended to be a
  212. framework, a set of ``Lego blocks'' you can use to build a parser
  213. following any discipline and performing validation to any degree. As an
  214. example of the parser construction, the source file includes a
  215. semi-validating SXML parser.
  216. SSAX has the ``sequential'' feel of SAX yet the ``functional style'' of DOM.
  217. Like a SAX parser, the framework scans the document only once and
  218. permits incremental processing. An application that handles document
  219. elements in order can run as efficiently as possible. @emph{Unlike} a
  220. SAX parser, the framework does not require an application to register
  221. stateful callbacks and surrender control to the parser. Rather, it is
  222. the application that can drive the framework -- calling its functions to
  223. get the current lexical or syntax element. These functions do not
  224. maintain or mutate any state save the input port. Therefore, the
  225. framework permits parsing of XML in a pure functional style, with the
  226. input port being a monad (or a linear, read-once parameter).
  227. Besides the @var{port}, there is another monad -- @var{seed}. Most of
  228. the middle- and high-level parsers are single-threaded through the
  229. @var{seed}. The functions of this framework do not process or affect
  230. the @var{seed} in any way: they simply pass it around as an instance of
  231. an opaque datatype. User functions, on the other hand, can use the seed
  232. to maintain the user's state, to accumulate parsing results, etc. A user
  233. can freely mix their own functions with those of the framework. On the
  234. other hand, the user may wish to instantiate a high-level parser:
  235. @code{SSAX:make-elem-parser} or @code{SSAX:make-parser}. In the latter
  236. case, the user must provide functions of specific signatures, which are
  237. called at predictable moments during the parsing: to handle character
  238. data, element data, or processing instructions (PI). The functions are
  239. always given the @var{seed}, among other parameters, and must return the
  240. new @var{seed}.
  241. From a functional point of view, XML parsing is a combined
  242. pre-post-order traversal of a ``tree'' that is the XML document itself.
  243. This down-and-up traversal tells the user about an element when its
  244. start tag is encountered. The user is notified about the element once
  245. more, after all of the element's children have been handled. The process of
  246. XML parsing therefore is a fold over the raw XML document. Unlike a
  247. fold over trees defined in [1], the parser is necessarily
  248. single-threaded -- obviously as elements in a text XML document are laid
  249. down sequentially. The parser therefore is a tree fold that has been
  250. transformed to accept an accumulating parameter [1,2].
  251. Formally, the denotational semantics of the parser can be expressed as
  252. @smallexample
  253. parser:: (Start-tag -> Seed -> Seed) ->
  254. (Start-tag -> Seed -> Seed -> Seed) ->
  255. (Char-Data -> Seed -> Seed) ->
  256. XML-text-fragment -> Seed -> Seed
  257. parser fdown fup fchar "<elem attrs> content </elem>" seed
  258. = fup "<elem attrs>" seed
  259. (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))
  260. parser fdown fup fchar "char-data content" seed
  261. = parser fdown fup fchar "content" (fchar "char-data" seed)
  262. parser fdown fup fchar "elem-content content" seed
  263. = parser fdown fup fchar "content" (
  264. parser fdown fup fchar "elem-content" seed)
  265. @end smallexample
  266. Compare the last two equations with the left fold
  267. @smallexample
  268. fold-left kons elem:list seed = fold-left kons list (kons elem seed)
  269. @end smallexample
  270. The real parser created by @code{SSAX:make-parser} is slightly more
  271. complicated, to account for processing instructions, entity references,
  272. namespaces, processing of document type declaration, etc.
  273. The XML standard document referred to in this module is
  274. @uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
  275. The present file also defines a procedure that parses the text of an XML
  276. document or of a separate element into SXML, an S-expression-based model
  277. of an XML Information Set. SXML is also an Abstract Syntax Tree of an
  278. XML document. SXML is similar but not identical to DOM; SXML is
  279. particularly suitable for Scheme-based XML/HTML authoring, SXPath
  280. queries, and tree transformations. See SXML.html for more details.
  281. SXML is a term implementation of evaluation of the XML document [3].
  282. The other implementation is context-passing.
  283. The present framework fully supports the XML Namespaces Recommendation:
  284. @uref{http://www.w3.org/TR/REC-xml-names/}.
  285. Other links:
  286. @table @asis
  287. @item [1]
  288. Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
  289. ICFP'98, 1998, pp. 273-279.
  290. @item [2]
  291. Richard S. Bird, The promotion and accumulation strategies in
  292. transformational programming, ACM Trans. Progr. Lang. Systems,
  293. 6(4):487-504, October 1984.
  294. @item [3]
  295. Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
  296. Pearl. Proc ICFP'00, pp. 186-197.
  297. @end table
  298. @subsubsection Usage
  299. @deffn {Scheme Procedure} current-ssax-error-port
  300. @end deffn
  301. @deffn {Scheme Procedure} with-ssax-error-to-port port thunk
  302. @end deffn
  303. @deffn {Scheme Procedure} xml-token? _
  304. @verbatim
  305. -- Scheme Procedure: pair? x
  306. Return `#t' if X is a pair; otherwise return `#f'.
  307. @end verbatim
  308. @end deffn
  309. @deffn {Scheme Syntax} xml-token-kind token
  310. @end deffn
  311. @deffn {Scheme Syntax} xml-token-head token
  312. @end deffn
  313. @deffn {Scheme Procedure} make-empty-attlist
  314. @end deffn
  315. @deffn {Scheme Procedure} attlist-add attlist name-value
  316. @end deffn
  317. @deffn {Scheme Procedure} attlist-null? x
  318. Return @code{#t} if @var{x} is the empty list, else @code{#f}.
  319. @end deffn
  320. @deffn {Scheme Procedure} attlist-remove-top attlist
  321. @end deffn
  322. @deffn {Scheme Procedure} attlist->alist attlist
  323. @end deffn
  324. @deffn {Scheme Procedure} attlist-fold kons knil lis1
  325. @end deffn
  326. @deffn {Scheme Procedure} define-parsed-entity! entity str
  327. Define a new parsed entity. @var{entity} should be a symbol.
  328. Instances of &@var{entity}; in XML text will be replaced with the string
  329. @var{str}, which will then be parsed.
  330. @end deffn
  331. @deffn {Scheme Procedure} reset-parsed-entity-definitions!
  332. Restore the set of parsed entity definitions to its initial state.
  333. @end deffn
  334. @deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
  335. @end deffn
  336. @deffn {Scheme Procedure} ssax:skip-internal-dtd port
  337. @end deffn
  338. @deffn {Scheme Procedure} ssax:read-pi-body-as-string port
  339. @end deffn
  340. @deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
  341. @end deffn
  342. @deffn {Scheme Procedure} ssax:read-markup-token port
  343. @end deffn
  344. @deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
  345. @end deffn
  346. @deffn {Scheme Procedure} ssax:read-char-ref port
  347. @end deffn
  348. @deffn {Scheme Procedure} ssax:read-attributes port entities
  349. @end deffn
  350. @deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
  351. @end deffn
  352. @deffn {Scheme Procedure} ssax:read-external-id port
  353. @end deffn
  354. @deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
  355. @end deffn
  356. @deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
  357. @end deffn
  358. @deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
  359. @end deffn
  360. @deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
  361. @end deffn
  362. @deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
  363. @end deffn
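As a rough illustration of the seed-passing style described above, the
following sketch builds a parser that only counts elements. The helper
name @code{count-elements} is hypothetical. The handler argument lists
follow the upstream SSAX documentation, and the sketch assumes that the
handlers not supplied here (such as @code{DOCTYPE} and @code{PI}) fall
back to their defaults.

@example
(use-modules (sxml ssax))

;; Hypothetical helper: count the elements in the XML read from PORT.
(define (count-elements port)
  ((ssax:make-parser
    ;; Called at each start tag; returns the seed for the children.
    NEW-LEVEL-SEED
    (lambda (elem-gi attributes namespaces expected-content seed)
      seed)
    ;; Called at each end tag; SEED is the result from the children.
    FINISH-ELEMENT
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (+ seed 1))
    ;; Called for character data, which this parser ignores.
    CHAR-DATA-HANDLER
    (lambda (string1 string2 seed)
      seed))
   port 0))

;; Each element is counted once, when its end tag is reached.
(call-with-input-string "<a><b/><c>text</c></a>" count-elements)
@end example

No SXML tree is built here: the seed is just a number threaded through
the handlers, which is what makes this style suitable for single-pass
processing.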
  364. @node Transforming SXML
  365. @subsection Transforming SXML
  366. @subsubsection Overview
  367. @heading SXML expression tree transformers
  368. @subheading Pre-Post-order traversal of a tree and creation of a new tree
  369. @smallexample
  370. pre-post-order:: <tree> x <bindings> -> <new-tree>
  371. @end smallexample
  372. where
  373. @smallexample
  374. <bindings> ::= (<binding> ...)
  375. <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
  376. (<trigger-symbol> *macro* . <handler>) |
  377. (<trigger-symbol> <new-bindings> . <handler>) |
  378. (<trigger-symbol> . <handler>)
  379. <trigger-symbol> ::= XMLname | *text* | *default*
  380. <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
  381. @end smallexample
  382. The @code{pre-post-order} function, in the @code{(sxml transform)}
  383. module, visits the nodes and nodelists pre-post-order (depth-first).
  384. For each @code{<Node>} of the form @code{(@var{name} <Node> ...)}, it
  385. looks up an association with the given @var{name} among its
  386. @var{<bindings>}. If the lookup fails, @code{pre-post-order} tries to locate a
  387. @code{*default*} binding. It's an error if the latter attempt fails as
  388. well. Having found a binding, the @code{pre-post-order} function first
  389. checks to see if the binding is of the form
  390. @smallexample
  391. (<trigger-symbol> *preorder* . <handler>)
  392. @end smallexample
  393. If it is, the handler is @emph{applied} to the current node. Otherwise, the
  394. pre-post-order function first calls itself recursively for each child of
  395. the current node, with @var{<new-bindings>} prepended to the
  396. @var{<bindings>} in effect. The result of these calls is passed to the
  397. @var{<handler>} (along with the head of the current @var{<Node>}). To be
  398. more precise, the handler is @emph{applied} to the head of the current node
  399. and its processed children. The result of the handler, which should also
  400. be a @code{<tree>}, replaces the current @var{<Node>}. If the current
  401. @var{<Node>} is a text string or other atom, a special binding with a
  402. symbol @code{*text*} is looked up.
  403. A binding can also be of the form
  404. @smallexample
  405. (<trigger-symbol> *macro* . <handler>)
  406. @end smallexample
  407. This is equivalent to @code{*preorder*} described above. However, the
  408. result is then re-processed with the current stylesheet.
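A small stylesheet makes the traversal concrete. This is a minimal
sketch: the input document and the tag names are made up for the
example, and only plain (post-order) handlers are used. Each handler
receives the tag followed by the already-processed children.

@example
(use-modules (sxml transform))

;; Illustrative input document.
(define doc
  '(article (title "Parrots") (para "Alfie is a parrot.")))

(pre-post-order
 doc
 `((article . ,(lambda (tag . kids) (cons 'body kids)))
   (title . ,(lambda (tag . kids) (cons 'h1 kids)))
   ;; Fallback for any other element: rebuild the node unchanged.
   (*default* . ,(lambda (tag . kids) (cons tag kids)))
   ;; Text nodes are passed through as they are.
   (*text* . ,(lambda (tag text) text))))
;; @result{} (body (h1 "Parrots") (para "Alfie is a parrot."))
@end example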
  409. @subsubsection Usage
  410. @deffn {Scheme Procedure} SRV:send-reply . fragments
  411. Output the @var{fragments} to the current output port.
  412. The fragments are a list of strings, characters, numbers, thunks,
  413. @code{#f}, @code{#t} -- and other fragments. The function traverses the
  414. tree depth-first, writes out strings and characters, executes thunks,
  415. and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
  416. anything was written at all; otherwise the result is @code{#f}. If
  417. @code{#t} occurs among the fragments, it is not written out but causes
  418. the result of @code{SRV:send-reply} to be @code{#t}.
  419. @end deffn
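For instance, nested fragments are flattened as they are written. In
this sketch the output is captured with @code{with-output-to-string} so
that both the text written and the return value can be inspected; the
variable names are illustrative.

@example
(use-modules (sxml transform))

(define result #f)

(define output
  (with-output-to-string
    (lambda ()
      (set! result
            (SRV:send-reply "<p>"
                            (list "Hello, " #f '() "world")
                            #\!
                            "</p>")))))

output
;; @result{} "<p>Hello, world!</p>"
result
;; @result{} #t
@end example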
  420. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  421. @end deffn
  422. @deffn {Scheme Procedure} post-order tree bindings
  423. @end deffn
  424. @deffn {Scheme Procedure} pre-post-order tree bindings
  425. @end deffn
  426. @deffn {Scheme Procedure} replace-range beg-pred end-pred forest
  427. @end deffn
  428. @node SXML Tree Fold
  429. @subsection SXML Tree Fold
  430. @subsubsection Overview
  431. @code{(sxml fold)} defines a number of variants of the @dfn{fold}
  432. algorithm for use in transforming SXML trees. Additionally it defines
  433. the layout operator, @code{fold-layout}, which might be described as a
  434. context-passing variant of SSAX's @code{pre-post-order}.
  435. @subsubsection Usage
  436. @deffn {Scheme Procedure} foldt fup fhere tree
  437. The standard multithreaded tree fold.
  438. @var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
  439. @end deffn
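As a sketch of how the two procedures cooperate, the following call
counts the text strings in an illustrative tree. @var{fhere} maps each
atom to a count and @var{fup} sums the counts of a node's children.

@example
(use-modules (sxml fold))

(foldt (lambda (kid-counts) (apply + kid-counts))   ; fup
       (lambda (atom) (if (string? atom) 1 0))      ; fhere
       '(doc (title "Hello") (para "World" (em "!"))))
;; @result{} 3
@end example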
  440. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  441. The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
  442. for more information.
  443. @end deffn
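The following sketch threads a single seed through an illustrative tree
to list its tags in document order. @var{fdown} records a tag on the
way down, @var{fup} keeps whatever the children accumulated, and
@var{fhere} leaves the seed alone for atoms.

@example
(use-modules (sxml fold))

(reverse
 (foldts (lambda (seed tree) (cons (car tree) seed))   ; fdown
         (lambda (seed kid-seed tree) kid-seed)        ; fup
         (lambda (seed atom) seed)                     ; fhere
         '()
         '(a (b) (c "x"))))
;; @result{} (a b c)
@end example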
  444. @deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
  445. A variant of @code{foldts} that allows pre-order tree
  446. rewrites. Originally defined in Andy Wingo's 2007 paper,
  447. @emph{Applications of fold to XML transformation}.
  448. @end deffn
  449. @deffn {Scheme Procedure} fold-values proc list . seeds
  450. A variant of @code{fold} that allows multi-valued seeds. Note that the
  451. order of the arguments differs from that of @code{fold}. @xref{SRFI-1
  452. Fold and Map}.
  453. @end deffn
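For example, a sum and a count can be accumulated in one pass. This is
a minimal sketch; @var{proc} receives the list element followed by the
current seeds and returns the new seeds as multiple values.

@example
(use-modules (sxml fold))

(call-with-values
    (lambda ()
      (fold-values (lambda (x sum count)
                     (values (+ sum x) (+ count 1)))
                   '(1 2 3 4)
                   0 0))
  list)
;; @result{} (10 4)
@end example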
  454. @deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
  455. A variant of @code{foldts*} that allows multi-valued
  456. seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
  457. of fold to XML transformation}.
  458. @end deffn
  459. @deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
  460. A traversal combinator in the spirit of @code{pre-post-order}.
  461. @xref{Transforming SXML}.
  462. @code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
  463. @emph{Applications of fold to XML transformation}.
  464. @example
  465. bindings := (<binding>...)
  466. binding := (<tag> <handler-pair>...)
  467. | (*default* . <post-handler>)
  468. | (*text* . <text-handler>)
  469. tag := <symbol>
  470. handler-pair := (pre-layout . <pre-layout-handler>)
  471. | (post . <post-handler>)
  472. | (bindings . <bindings>)
  473. | (pre . <pre-handler>)
  474. | (macro . <macro-handler>)
  475. @end example
  476. @table @var
  477. @item pre-layout-handler
  478. A function of three arguments:
  479. @table @var
  480. @item kids
  481. the kids of the current node, before traversal
  482. @item params
  483. the params of the current node
  484. @item layout
  485. the layout coming into this node
  486. @end table
  487. @var{pre-layout-handler} is expected to use this information to return a
  488. layout to pass to the kids. The default implementation returns the
  489. layout given in the arguments.
  490. @item post-handler
  491. A function of five arguments:
  492. @table @var
  493. @item tag
  494. the current tag being processed
  495. @item params
  496. the params of the current node
  497. @item layout
  498. the layout coming into the current node, before any kids were processed
  499. @item klayout
  500. the layout after processing all of the children
  501. @item kids
  502. the already-processed child nodes
  503. @end table
  504. @var{post-handler} should return two values, the layout to pass to the
  505. next node and the final tree.
  506. @item text-handler
  507. @var{text-handler} is a function of three arguments:
  508. @table @var
  509. @item text
  510. the string
  511. @item params
  512. the current params
  513. @item layout
  514. the current layout
  515. @end table
  516. @var{text-handler} should return two values, the layout to pass to the
  517. next node and the value to which the string should transform.
  518. @end table
  519. @end deffn
  520. @node SXPath
  521. @subsection SXPath
  522. @subsubsection Overview
  523. @heading SXPath: SXML Query Language
  524. SXPath is a query language for SXML, an instance of XML Information Set
  525. (Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
  526. definition of SXML and more details. SXPath is also a translation into
  527. Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
  528. XPath and SXPath describe means of selecting a set of Infoset's items or
  529. their properties.
  530. To facilitate queries, XPath maps the XML Infoset into an explicit tree,
  531. and introduces important notions of a location path and a current,
  532. context node. A location path denotes a selection of a set of nodes
  533. relative to a context node. Any XPath tree has a distinguished, root
  534. node -- which serves as the context node for absolute location paths.
  535. A location path is recursively defined as a location step joined with a
  536. location path. A location step is a simple query of the database
  537. relative to a context node. A step may include expressions that further
  538. filter the selected set. Each node in the resulting set is used as a
  539. context node for the adjoining location path. The result of the step is
  540. a union of the sets returned by the latter location paths.
  541. The SXML representation of the XML Infoset (see SSAX.scm) is rather
  542. suitable for querying as it is. Bowing to the XPath specification, we
  543. will refer to SXML information items as 'Nodes':
  544. @example
  545. <Node> ::= <Element> | <attributes-coll> | <attrib>
  546. | "text string" | <PI>
  547. @end example
  548. This production can also be described as
  549. @example
  550. <Node> ::= (name . <Nodeset>) | "text string"
  551. @end example
  552. An (ordered) set of nodes is just a list of the constituent nodes:
  553. @example
  554. <Nodeset> ::= (<Node> ...)
  555. @end example
  556. Nodesets, and Nodes other than text strings are both lists. A <Nodeset>
  557. however is either an empty list, or a list whose head is not a symbol. A
  558. symbol at the head of a node is either an XML name (in which case it's a
  559. tag of an XML element), or an administrative name such as '@@'. This
  560. uniform list representation makes processing rather simple and elegant,
  561. while avoiding confusion. The multi-branch tree structure formed by the
  562. mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
  563. processing by functional languages.
  564. A location path is in fact a composite query over an XPath tree or its
  565. branch. A single step is a combination of a projection, selection or a
  566. transitive closure. Multiple steps are combined via join and union
  567. operations. This insight allows us to @emph{elegantly} implement XPath
  568. as a sequence of projection and filtering primitives -- converters --
  569. joined by @dfn{combinators}. Each converter takes a node and returns a
  570. nodeset which is the result of the corresponding query relative to that
  571. node. A converter can also be called on a set of nodes. In that case it
  572. returns a union of the corresponding queries over each node in the set.
  573. The union is easily implemented as a list append operation as all nodes
  574. in a SXML tree are considered distinct, by XPath conventions. We also
  575. preserve the order of the members in the union. Query combinators are
  576. high-order functions: they take converter(s) (which is a Node|Nodeset ->
  577. Nodeset function) and compose or otherwise combine them. We will be
  578. concerned with only relative location paths [XPath]: an absolute
  579. location path is a relative path applied to the root node.
  580. Similarly to XPath, SXPath defines full and abbreviated notations for
  581. location paths. In both cases, the abbreviated notation can be
  582. mechanically expanded into the full form by simple rewriting rules. In
  583. the case of SXPath the corresponding rules are given in the
  584. documentation of the @code{sxpath} procedure.
  585. @xref{sxpath-procedure-docs,,SXPath procedure documentation}.
  586. The regression test suite at the end of the file @file{SXPATH-old.scm}
  587. shows a representative sample of SXPaths in both notations, juxtaposed
  588. with the corresponding XPath expressions. Most of the samples are
  589. borrowed literally from the XPath specification.
  590. Much of the following material is taken from the SXPath sources by Oleg
  591. Kiselyov et al.
  592. @subsubsection Basic Converters and Applicators
  593. A converter is a function mapping a nodeset (or a single node) to another
  594. nodeset. Its type can be represented like this:
  595. @example
  596. type Converter = Node|Nodeset -> Nodeset
  597. @end example
  598. A converter can also play the role of a predicate: in that case, if a
  599. converter, applied to a node or a nodeset, yields a non-empty nodeset,
  600. the converter-predicate is deemed satisfied. Likewise, an empty nodeset
  601. is equivalent to @code{#f} in denoting failure.
  602. @deffn {Scheme Procedure} nodeset? x
  603. Return @code{#t} if @var{x} is a nodeset.
  604. @end deffn
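The distinction between a node and a nodeset rests on the head of the
list, as the following sketch with made-up nodes shows.

@example
(use-modules (sxml xpath))

;; A list whose head is not a symbol is a nodeset.
(nodeset? '((para "one") (para "two")))
;; @result{} #t

;; So is the empty list.
(nodeset? '())
;; @result{} #t

;; A list headed by a symbol is a single node.
(nodeset? '(para "one"))
;; @result{} #f

;; A text string is a node as well.
(nodeset? "one")
;; @result{} #f
@end example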
  605. @deffn {Scheme Procedure} node-typeof? crit
  606. This function implements a 'Node test' as defined in Sec. 2.3 of the
  607. XPath document. A node test is one of the components of a location
  608. step. It is also a converter-predicate in SXPath.
  609. The function @code{node-typeof?} takes a type criterion and returns a
  610. function, which, when applied to a node, will tell if the node satisfies
  611. the test.
  612. The criterion @var{crit} is a symbol, one of the following:
  613. @table @code
  614. @item id
  615. tests if the node has the right name (id)
  616. @item @@
  617. tests if the node is an <attributes-coll>
  618. @item *
  619. tests if the node is an <Element>
  620. @item *text*
  621. tests if the node is a text node
  622. @item *PI*
  623. tests if the node is a PI (processing instruction) node
  624. @item *any*
  625. @code{#t} for any type of node
  626. @end table
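As an illustration, the sketch below applies a few node tests to a
small element. The name @code{elem} and the sample data are made up for
this example:
@example
(use-modules (sxml xpath))

;; A sample element node with one text child.
(define elem '(name "Alfie"))

((node-typeof? 'name) elem)       ; name matches @result{} #t
((node-typeof? '*) elem)          ; any element @result{} #t
((node-typeof? '*text*) "Alfie")  ; a text node @result{} #t
((node-typeof? '*text*) elem)     ; an element is not text @result{} #f
((node-typeof? 'parrot) elem)     ; wrong name @result{} #f
@end example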
  627. @end deffn
  628. @deffn {Scheme Procedure} node-eq? other
  629. A curried equivalence converter predicate that takes a node @var{other}
  630. and returns a function that takes another node. The two nodes are
  631. compared using @code{eq?}.
  632. @end deffn
  633. @deffn {Scheme Procedure} node-equal? other
  634. A curried equivalence converter predicate that takes a node @var{other}
  635. and returns a function that takes another node. The two nodes are
  636. compared using @code{equal?}.
  637. @end deffn
  638. @deffn {Scheme Procedure} node-pos n
Select the @var{n}th element of a nodeset and return it as a singleton
nodeset. Elements are counted from 1. If the @var{n}th element does not
exist, return an empty nodeset. If @var{n} is negative, the node is
picked counting from the tail of the list.
  643. @example
((node-pos 1) nodeset)  ; return the head of the nodeset (if it exists)
((node-pos 2) nodeset)  ; return the node after that (if it exists)
((node-pos -1) nodeset) ; select the last node of a non-empty nodeset
((node-pos -2) nodeset) ; select the last-but-one node (if it exists)
  648. @end example
  649. @end deffn
  650. @deffn {Scheme Procedure} filter pred?
A filter applicator, which introduces a filtering context. The argument
converter @var{pred?} is treated as a predicate: a result of @code{#f}
or @code{nil} (the empty nodeset) means failure. The resulting converter
keeps only those nodes of its input for which @var{pred?} succeeds.
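A small sketch of @code{filter} in use follows. Guile's core also
provides a procedure named @code{filter}, so this example imports the
module with a prefix; the prefix @code{sx:} is an arbitrary choice made
for the illustration, not something the module requires:
@example
(use-modules ((sxml xpath) #:prefix sx:))

;; Keep only the nodes named `a' from a three-node nodeset.
((sx:filter (sx:node-typeof? 'a)) '((a "1") (b "2") (a "3")))
;; @result{} ((a "1") (a "3"))
@end example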
  654. @end deffn
  655. @deffn {Scheme Procedure} take-until pred?
  656. @example
  657. take-until:: Converter -> Converter, or
  658. take-until:: Pred -> Node|Nodeset -> Nodeset
  659. @end example
  660. Given a converter-predicate @var{pred?} and a nodeset, apply the
  661. predicate to each element of the nodeset, until the predicate yields
  662. anything but @code{#f} or @code{nil}. Return the elements of the input
  663. nodeset that have been processed until that moment (that is, which fail
  664. the predicate).
  665. @code{take-until} is a variation of the @code{filter} above:
  666. @code{take-until} passes elements of an ordered input set up to (but not
  667. including) the first element that satisfies the predicate. The nodeset
  668. returned by @code{((take-until (not pred)) nset)} is a subset -- to be
  669. more precise, a prefix -- of the nodeset returned by @code{((filter
  670. pred) nset)}.
  671. @end deffn
  672. @deffn {Scheme Procedure} take-after pred?
  673. @example
  674. take-after:: Converter -> Converter, or
  675. take-after:: Pred -> Node|Nodeset -> Nodeset
  676. @end example
  677. Given a converter-predicate @var{pred?} and a nodeset, apply the
  678. predicate to each element of the nodeset, until the predicate yields
  679. anything but @code{#f} or @code{nil}. Return the elements of the input
  680. nodeset that have not been processed: that is, return the elements of
  681. the input nodeset that follow the first element that satisfied the
  682. predicate.
  683. @code{take-after} along with @code{take-until} partition an input
  684. nodeset into three parts: the first element that satisfies a predicate,
  685. all preceding elements and all following elements.
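The partition can be sketched as follows, with a made-up nodeset
@code{nset} and a predicate @code{b?} defined only for this example:
@example
(use-modules (sxml xpath))

(define nset '((a "1") (b "2") (c "3") (b "4")))
(define b? (node-typeof? 'b))

;; Everything before the first node that satisfies b?.
((take-until b?) nset)   ; @result{} ((a "1"))

;; Everything after the first node that satisfies b?.
((take-after b?) nset)   ; @result{} ((c "3") (b "4"))
@end example
The first matching node, @code{(b "2")}, appears in neither result.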
  686. @end deffn
  687. @deffn {Scheme Procedure} map-union proc lst
Apply @var{proc} to each element of @var{lst} and return the list of results.
If @var{proc} returns a nodeset, splice it into the result.
  690. From another point of view, @code{map-union} is a function
  691. @code{Converter->Converter}, which places an argument-converter in a joining
  692. context.
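As a rough illustration of the splicing behavior, the sketch below uses
the ordinary list procedures @code{cdr} and @code{car} as @var{proc};
the input list is sample data:
@example
(use-modules (sxml xpath))

;; Each cdr is a (possibly empty) list of children, which is spliced
;; into the result instead of being nested.
(map-union cdr '((a "1" "2") (b) (c "3")))
;; @result{} ("1" "2" "3")

;; A result that is not a list is simply collected.
(map-union car '((a "1" "2") (b) (c "3")))
;; @result{} (a b c)
@end example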
  693. @end deffn
  694. @deffn {Scheme Procedure} node-reverse node-or-nodeset
  695. @example
  696. node-reverse :: Converter, or
  697. node-reverse:: Node|Nodeset -> Nodeset
  698. @end example
  699. Reverses the order of nodes in the nodeset. This basic converter is
  700. needed to implement a reverse document order (see the XPath
  701. Recommendation).
  702. @end deffn
  703. @deffn {Scheme Procedure} node-trace title
  704. @example
  705. node-trace:: String -> Converter
  706. @end example
  707. @code{(node-trace title)} is an identity converter. In addition it
  708. prints out the node or nodeset it is applied to, prefixed with the
  709. @var{title}. This converter is very useful for debugging.
  710. @end deffn
  711. @subsubsection Converter Combinators
  712. Combinators are higher-order functions that transmogrify a converter or
  713. glue a sequence of converters into a single, non-trivial converter. The
  714. goal is to arrive at converters that correspond to XPath location paths.
  715. From a different point of view, a combinator is a fixed, named
  716. @dfn{pattern} of applying converters. Given below is a complete set of
  717. such patterns that together implement XPath location path specification.
  718. As it turns out, all these combinators can be built from a small number
  719. of basic blocks: regular functional composition, @code{map-union} and
  720. @code{filter} applicators, and the nodeset union.
  721. @deffn {Scheme Procedure} select-kids test-pred?
  722. @code{select-kids} takes a converter (or a predicate) as an argument and
  723. returns another converter. The resulting converter applied to a nodeset
  724. returns an ordered subset of its children that satisfy the predicate
  725. @var{test-pred?}.
  726. @end deffn
  727. @deffn {Scheme Procedure} node-self pred?
  728. Similar to @code{select-kids} except that the predicate @var{pred?} is
  729. applied to the node itself rather than to its children. The resulting
  730. nodeset will contain either one component, or will be empty if the node
  731. failed the predicate.
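The difference between @code{select-kids} and @code{node-self} can be
sketched on a small tree; @code{doc} is a sample value introduced for
this example:
@example
(use-modules (sxml xpath))

(define doc '(doc (a "1") (b "2") (a "3")))

;; Test the children of doc.
((select-kids (node-typeof? 'a)) doc)
;; @result{} ((a "1") (a "3"))

;; Test doc itself: it passes, so it is returned in a singleton nodeset.
((node-self (node-typeof? 'doc)) doc)
;; @result{} ((doc (a "1") (b "2") (a "3")))

;; doc is not named `a', so the result is empty.
((node-self (node-typeof? 'a)) doc)
;; @result{} ()
@end example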
  732. @end deffn
  733. @deffn {Scheme Procedure} node-join . selectors
  734. @example
  735. node-join:: [LocPath] -> Node|Nodeset -> Nodeset, or
  736. node-join:: [Converter] -> Converter
  737. @end example
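Join the sequence of location steps or paths @var{selectors} into a
single converter. Each selector is applied to every node in the result
of the previous one, and the results are combined as with
@code{map-union}.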
  739. @end deffn
  740. @deffn {Scheme Procedure} node-reduce . converters
  741. @example
  742. node-reduce:: [LocPath] -> Node|Nodeset -> Nodeset, or
  743. node-reduce:: [Converter] -> Converter
  744. @end example
  745. A regular functional composition of converters. From a different point
  746. of view, @code{((apply node-reduce converters) nodeset)} is equivalent
  747. to @code{(foldl apply nodeset converters)}, i.e., folding, or reducing,
  748. a list of converters with the nodeset as a seed.
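The sketch below contrasts @code{node-join} and @code{node-reduce} on a
made-up tree @code{doc}. With @code{node-join} each step is applied to
every node selected by the previous step; with @code{node-reduce} each
converter receives the whole result of the previous one, which is what a
positional converter such as @code{node-pos} needs:
@example
(use-modules (sxml xpath))

(define doc '(doc (sec (p "x") (p "y")) (sec (p "z"))))

;; All p children of all sec children of doc.
((node-join (select-kids (node-typeof? 'sec))
            (select-kids (node-typeof? 'p)))
 doc)
;; @result{} ((p "x") (p "y") (p "z"))

;; The first of the sec children of doc.
((node-reduce (select-kids (node-typeof? 'sec))
              (node-pos 1))
 doc)
;; @result{} ((sec (p "x") (p "y")))
@end example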
  749. @end deffn
  750. @deffn {Scheme Procedure} node-or . converters
  751. @example
  752. node-or:: [Converter] -> Converter
  753. @end example
  754. This combinator applies all converters to a given node and produces the
  755. union of their results. This combinator corresponds to a union
  756. (@code{|} operation) for XPath location paths.
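A short sketch on sample data follows. In this example the results of
the individual converters are concatenated in the order the converters
are given, so the result is not in document order:
@example
(use-modules (sxml xpath))

(define doc '(doc (a "1") (b "2") (a "3")))

((node-or (select-kids (node-typeof? 'a))
          (select-kids (node-typeof? 'b)))
 doc)
;; @result{} ((a "1") (a "3") (b "2"))
@end example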
  757. @end deffn
  758. @deffn {Scheme Procedure} node-closure test-pred?
  759. @example
  760. node-closure:: Converter -> Converter
  761. @end example
  762. Select all @emph{descendants} of a node that satisfy a
  763. converter-predicate @var{test-pred?}. This combinator is similar to
  764. @code{select-kids} but applies to grand... children as well. This
  765. combinator implements the @code{descendant::} XPath axis. Conceptually,
  766. this combinator can be expressed as
  767. @example
(define (node-closure f)
  (node-or
   (select-kids f)
   (node-reduce (select-kids (node-typeof? '*)) (node-closure f))))
  772. @end example
  773. This definition, as written, looks somewhat like a fixpoint, and it will
  774. run forever. It is obvious however that sooner or later
  775. @code{(select-kids (node-typeof? '*))} will return an empty nodeset. At
  776. this point further iterations will no longer affect the result and can
  777. be stopped.
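For instance, the following sketch selects every @code{b} descendant of
a sample tree, regardless of depth. The tree @code{doc} is made up for
this example:
@example
(use-modules (sxml xpath))

(define doc '(doc (b "y") (a (b "x"))))

;; Finds the b that is a child of doc and the b nested inside a.
((node-closure (node-typeof? 'b)) doc)
;; @result{} ((b "y") (b "x"))
@end example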
  778. @end deffn
  779. @deffn {Scheme Procedure} node-parent rootnode
  780. @example
  781. node-parent:: RootNode -> Converter
  782. @end example
  783. @code{(node-parent rootnode)} yields a converter that returns a parent
  784. of a node it is applied to. If applied to a nodeset, it returns the
  785. list of parents of nodes in the nodeset. The @var{rootnode} does not
  786. have to be the root node of the whole SXML tree -- it may be a root node
  787. of a branch of interest.
  788. Given the notation of Philip Wadler's paper on semantics of XSLT,
  789. @verbatim
  790. parent(x) = { y | y=subnode*(root), x=subnode(y) }
  791. @end verbatim
Therefore, @code{node-parent} is not a fundamental converter: it can
be expressed through the existing ones. Still, @code{node-parent} is a
rather convenient converter. It corresponds to a @code{parent::} axis
of SXPath. Note that the @code{parent::} axis can be used with an
attribute node as well.
  797. @end deffn
  798. @anchor{sxpath-procedure-docs}
  799. @deffn {Scheme Procedure} sxpath path
  800. Evaluate an abbreviated SXPath.
  801. @example
  802. sxpath:: AbbrPath -> Converter, or
  803. sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
  804. @end example
  805. @var{path} is a list. It is translated to the full SXPath according to
  806. the following rewriting rules:
  807. @example
(sxpath '())
  @result{} (node-join)
(sxpath '(path-component ...))
  @result{} (node-join (sxpath1 path-component) (sxpath '(...)))
(sxpath1 '//)
  @result{} (node-or
       (node-self (node-typeof? '*any*))
       (node-closure (node-typeof? '*any*)))
(sxpath1 '(equal? x))
  @result{} (select-kids (node-equal? x))
(sxpath1 '(eq? x))
  @result{} (select-kids (node-eq? x))
(sxpath1 ?symbol)
  @result{} (select-kids (node-typeof? ?symbol))
(sxpath1 procedure)
  @result{} procedure
(sxpath1 '(?symbol ...))
  @result{} (sxpath1 '((?symbol) ...))
(sxpath1 '(path reducer ...))
  @result{} (node-reduce (sxpath path) (sxpathr reducer) ...)
(sxpathr number)
  @result{} (node-pos number)
(sxpathr path-filter)
  @result{} (filter (sxpath path-filter))
  832. @end example
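As an illustration of the abbreviated notation, here is a minimal sketch
that applies a few paths to a sample tree. The tree @code{doc} is made
up for this example:
@example
(use-modules (sxml xpath))

(define doc '(doc (a "1") (b "2") (a "3")))

;; A symbol selects children with that name.
((sxpath '(a)) doc)          ; @result{} ((a "1") (a "3"))

;; Steps are chained: the text children of the a children.
((sxpath '(a *text*)) doc)   ; @result{} ("1" "3")

;; A number after a step acts as a positional reducer.
((sxpath '((a 2))) doc)      ; @result{} ((a "3"))
@end example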
  833. @end deffn
  834. @node sxml ssax input-parse
  835. @subsection (sxml ssax input-parse)
  836. @subsubsection Overview
  837. A simple lexer.
  838. The procedures in this module surprisingly often suffice to parse an
  839. input stream. They either skip, or build and return tokens, according to
  840. inclusion or delimiting semantics. The list of characters to expect,
  841. include, or to break at may vary from one invocation of a function to
  842. another. This allows the functions to easily parse even
  843. context-sensitive languages.
Encountering EOF is generally treated as an error, and an exception is
raised when it happens. Cases where EOF is acceptable are mentioned
specifically. The list of expected characters
  846. (characters to skip until, or break-characters) may include an EOF
  847. "character", which is to be coded as the symbol, @code{*eof*}.
  848. The input stream to parse is specified as a @dfn{port}, which is usually
  849. the last (and optional) argument. It defaults to the current input port
  850. if omitted.
  851. If the parser encounters an error, it will throw an exception to the key
  852. @code{parser-error}. The arguments will be of the form @code{(@var{port}
  853. @var{message} @var{specialising-msg}*)}.
  854. The first argument is a port, which typically points to the offending
  855. character or its neighborhood. You can then use @code{port-column} and
  856. @code{port-line} to query the current position. @var{message} is the
  857. description of the error. Other arguments supply more details about the
  858. problem.
  859. @subsubsection Usage
  860. @deffn {Scheme Procedure} peek-next-char [port]
  861. @end deffn
  862. @deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
  863. @end deffn
  864. @deffn {Scheme Procedure} skip-until arg [port]
  865. @end deffn
  866. @deffn {Scheme Procedure} skip-while skip-chars [port]
  867. @end deffn
  868. @deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
  869. @end deffn
  870. @deffn {Scheme Procedure} next-token-of incl-list/pred [port]
  871. @end deffn
  872. @deffn {Scheme Procedure} read-text-line [port]
  873. @end deffn
  874. @deffn {Scheme Procedure} read-string n [port]
  875. @end deffn
@deffn {Scheme Procedure} find-string-from-port? str port [max-no-char]
Look for the string @var{str} in the input port @var{port}, optionally
within the first @var{max-no-char} characters.
  879. @end deffn
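As a rough sketch of how these procedures are used, the example below
reads two space-separated words from a string port with
@code{next-token}. It assumes that @code{next-token} first skips any
characters listed in @var{prefix-skipped-chars} and then reads up to,
but not including, the first break character. The input string and the
names @code{w1} and @code{w2} are made up for this illustration:
@example
(use-modules (sxml ssax input-parse))

(call-with-input-string "  hello world"
  (lambda (port)
    ;; Skip leading spaces, then read until a space or EOF.
    (let* ((w1 (next-token '(#\space) '(#\space *eof*) "first word" port))
           (w2 (next-token '(#\space) '(#\space *eof*) "second word" port)))
      (list w1 w2))))
;; @result{} ("hello" "world")
@end example
Listing @code{*eof*} among the break characters makes EOF an acceptable
end of the second token instead of an error.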
  880. @node sxml apply-templates
  881. @subsection (sxml apply-templates)
  882. @subsubsection Overview
  883. Pre-order traversal of a tree and creation of a new tree:
  884. @smallexample
  885. apply-templates:: tree x <templates> -> <new-tree>
  886. @end smallexample
  887. where
  888. @smallexample
  889. <templates> ::= (<template> ...)
  890. <template> ::= (<node-test> <node-test> ... <node-test> . <handler>)
  891. <node-test> ::= an argument to node-typeof? above
  892. <handler> ::= <tree> -> <new-tree>
  893. @end smallexample
This procedure does a @emph{normal}, pre-order traversal of an SXML
tree. It walks the tree, checking each node against the list of
templates.
If a match is found (it must be unique, i.e., unambiguous), the
corresponding handler is invoked and given the current node as an
argument. The result from the handler, which must be a @code{<tree>},
takes the place of the current node in the resulting tree. The name of
the function is not accidental: it closely resembles the
@code{apply-templates} function of XSLT.
  903. @subsubsection Usage
  904. @deffn {Scheme Procedure} apply-templates tree templates
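The following is a minimal sketch, not a definitive recipe. It builds a
single template that matches @code{em} elements inside @code{p} elements
inside @code{doc}, and whose handler returns the first child of the
matched node. The tree and the handler name @code{em-text} are made up
for this example. Note that a template is a dotted list ending in the
handler, hence the use of @code{cons*}:
@example
(use-modules (sxml apply-templates))

(define tree '(doc (p "a" (em "b")) (p "c" (em "d"))))

;; Hypothetical handler: return the first child of the matched node.
(define (em-text node) (cadr node))

;; One template: the node tests doc, p, em, followed by the handler.
(apply-templates tree (list (cons* 'doc 'p 'em em-text)))
;; @result{} ("b" "d")
@end example
Nodes that match no template, such as the text nodes @code{"a"} and
@code{"c"} here, contribute nothing to the result.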
  905. @end deffn