  1. @c -*-texinfo-*-
  2. @c This is part of the GNU Guile Reference Manual.
  3. @c Copyright (C) 2013 Free Software Foundation, Inc.
  4. @c See the file guile.texi for copying conditions.
  5. @node SXML
  6. @section SXML
  7. SXML is a native representation of XML in terms of standard Scheme data
  8. types: lists, symbols, and strings. For example, the simple XML
  9. fragment:
  10. @example
  11. <parrot type="African Grey"><name>Alfie</name></parrot>
  12. @end example
  13. may be represented with the following SXML:
  14. @example
  15. (parrot (@@ (type "African Grey")) (name "Alfie"))
  16. @end example
  17. SXML is very general, and is capable of representing all of XML.
  18. Formally, this means that SXML is a conforming implementation of the
  19. @uref{http://www.w3.org/TR/xml-infoset/,XML Information Set} standard.
  20. Guile includes several facilities for working with XML and SXML:
  21. parsers, serializers, and transformers.
  22. @menu
  23. * SXML Overview:: XML, as it was meant to be
  24. * Reading and Writing XML:: Convenient XML parsing and serializing
  25. * SSAX:: Custom functional-style XML parsers
  26. * Transforming SXML:: Munging SXML with @code{pre-post-order}
  27. * SXML Tree Fold:: Fold-based SXML transformations
  28. * SXPath:: XPath for SXML
  29. * sxml apply-templates:: A more XSLT-like approach to SXML transformations
  30. * sxml ssax input-parse:: The SSAX tokenizer, optimized for Guile
  31. @end menu
  32. @node SXML Overview
  33. @subsection SXML Overview
  34. (This section needs to be written; volunteers welcome.)
  35. @node Reading and Writing XML
  36. @subsection Reading and Writing XML
  37. The @code{(sxml simple)} module presents a basic interface for parsing
  38. XML from a port into the Scheme SXML format, and for serializing it back
  39. to text.
  40. @example
  41. (use-modules (sxml simple))
  42. @end example
  43. @deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
  44. [#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
  45. [#:entities='()] [#:default-entity-handler=#f] @
  46. [#:doctype-handler=#f]
  47. Use SSAX to parse an XML document into SXML. Takes one optional
  48. argument, @var{string-or-port}, which defaults to the current input
  49. port. Returns the resulting SXML document. If @var{string-or-port} is
  50. a port, it will be left pointing at the next available character in the
  51. port.
  52. @end deffn
  53. As is normal in SXML, XML elements parse as tagged lists. Attributes,
  54. if any, are placed after the tag, within an @code{@@} element. The root
  55. of the resulting XML will be contained in a special tag, @code{*TOP*}.
  56. This tag will contain the root element of the XML, but also any prior
  57. processing instructions.
  58. @example
  59. (xml->sxml "<foo/>")
  60. @result{} (*TOP* (foo))
  61. (xml->sxml "<foo>text</foo>")
  62. @result{} (*TOP* (foo "text"))
  63. (xml->sxml "<foo kind=\"bar\">text</foo>")
  64. @result{} (*TOP* (foo (@@ (kind "bar")) "text"))
  65. (xml->sxml "<?xml version=\"1.0\"?><foo/>")
  66. @result{} (*TOP* (*PI* xml "version=\"1.0\"") (foo))
  67. @end example
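Since @var{string-or-port} may also be a port, the same call works on any
input port. The following is a minimal sketch; the input text is made up
for illustration, and @code{call-with-input-string} merely stands in for
whatever port the XML actually comes from.

@example
(use-modules (sxml simple))

;; Parse from a port instead of a string.
(call-with-input-string "<foo>text</foo>" xml->sxml)
@result{} (*TOP* (foo "text"))
@end example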
  68. All namespaces in the XML document must be declared, via @code{xmlns}
  69. attributes. SXML elements built from non-default namespaces will have
  70. their tags prefixed with their URI. Users can specify custom prefixes
  71. for certain namespaces with the @code{#:namespaces} keyword argument to
  72. @code{xml->sxml}.
  73. @example
  74. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
  75. @result{} (*TOP* (http://example.org/ns1:foo "text"))
  76. (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
  77. #:namespaces '((ns1 . "http://example.org/ns1")))
  78. @result{} (*TOP* (ns1:foo "text"))
  79. (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
  80. #:namespaces '((ns2 . "http://example.org/ns2")))
  81. @result{} (*TOP* (foo (ns2:baz)))
  82. @end example
  83. By default, namespaces passed to @code{xml->sxml} are treated as if they
  84. were declared on the root element. Passing a false
  85. @code{#:declare-namespaces?} argument will disable this behavior,
  86. requiring in-document declarations of namespaces before use.
  87. @example
  88. (xml->sxml "<foo><ns2:baz/></foo>"
  89. #:namespaces '((ns2 . "http://example.org/ns2")))
  90. @result{} (*TOP* (foo (ns2:baz)))
  91. (xml->sxml "<foo><ns2:baz/></foo>"
  92. #:namespaces '((ns2 . "http://example.org/ns2"))
  93. #:declare-namespaces? #f)
  94. @result{} error: undeclared namespace: `ns2'
  95. @end example
  96. By default, all whitespace in XML is significant. Passing the
  97. @code{#:trim-whitespace?} keyword argument to @code{xml->sxml} will trim
  98. whitespace before, after, and between elements, treating it as
  99. ``insignificant''. Whitespace in text fragments is left alone.
  100. @example
  101. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
  102. @result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
  103. (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
  104. #:trim-whitespace? #t)
  105. @result{} (*TOP* (foo (bar " Alfie the parrot! ")))
  106. @end example
  107. Parsed entities may be declared with the @code{#:entities} keyword
  108. argument, or handled with the @code{#:default-entity-handler}. By
  109. default, only the standard @code{&lt;}, @code{&gt;}, @code{&amp;},
  110. @code{&apos;} and @code{&quot;} entities are defined, as well as the
  111. @code{&#@var{N};} and @code{&#x@var{N};} (decimal and hexadecimal)
  112. numeric character entities.
  113. @example
  114. (xml->sxml "<foo>&amp;</foo>")
  115. @result{} (*TOP* (foo "&"))
  116. (xml->sxml "<foo>&nbsp;</foo>")
  117. @result{} error: undefined entity: nbsp
  118. (xml->sxml "<foo>&#xA0;</foo>")
  119. @result{} (*TOP* (foo "\xa0"))
  120. (xml->sxml "<foo>&nbsp;</foo>"
  121. #:entities '((nbsp . "\xa0")))
  122. @result{} (*TOP* (foo "\xa0"))
  123. (xml->sxml "<foo>&nbsp; &foo;</foo>"
  124. #:default-entity-handler
  125. (lambda (port name)
  126. (case name
  127. ((nbsp) "\xa0")
  128. (else
  129. (format (current-warning-port)
  130. "~a:~a:~a: undefined entity: ~a\n"
  131. (or (port-filename port) "<unknown file>")
  132. (port-line port) (port-column port)
  133. name)
  134. (symbol->string name)))))
  135. @print{} <unknown file>:0:17: undefined entity: foo
  136. @result{} (*TOP* (foo "\xa0 foo"))
  137. @end example
  138. By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
  139. declaration, if any. This behavior can be overridden with the
  140. @code{#:doctype-handler} argument, which should be a procedure of three
  141. arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
  142. the internal doctype subset (as a string or @code{#f} if not present).
  143. The handler should return keyword arguments as multiple values, as if it
  144. were calling its continuation with keyword arguments. The continuation
  145. accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
  146. in the same format that @code{xml->sxml} itself takes. These entities
  147. and namespaces will be prepended to those given to the @code{xml->sxml}
  148. invocation.
  149. @example
  150. (define (handle-foo docname systemid internal-subset)
  151. (case docname
  152. ((foo)
  153. (values #:entities '((greets . "<i>Hello, world!</i>"))))
  154. (else
  155. (values))))
  156. (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
  157. #:doctype-handler handle-foo)
  158. @result{} (*TOP* (p (i "Hello, world!")))
  159. @end example
  160. If the document has no doctype declaration, the @var{doctype-handler} is
  161. invoked with @code{#f} for the three arguments.
  162. In the future, the continuation may accept other keyword arguments, for
  163. example to validate the parsed SXML against the doctype.
  164. @deffn {Scheme Procedure} sxml->xml tree [port]
  165. Serialize the SXML tree @var{tree} as XML. The output will be written to
  166. the current output port, unless the optional argument @var{port} is
  167. present.
  168. @end deffn
  169. @deffn {Scheme Procedure} sxml->string sxml
  170. Detag an SXML tree @var{sxml} into a string. Does not perform any
  171. formatting.
  172. @end deffn
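As a brief sketch of the two serializers (the trees below are invented
for illustration): @code{sxml->xml} writes markup to a port, which can be
captured with @code{call-with-output-string}, while @code{sxml->string}
keeps only the strings found in the tree.

@example
(use-modules (sxml simple))

;; Serialize to a string by capturing the output port.
(call-with-output-string
  (lambda (port)
    (sxml->xml '(foo (@@ (kind "bar")) "text") port)))
@result{} "<foo kind=\"bar\">text</foo>"

;; Drop the tags and keep the character data.
(sxml->string '(p "Hello, " (b "world") "!"))
@result{} "Hello, world!"
@end example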
  173. @node SSAX
  174. @subsection SSAX: A Functional XML Parsing Toolkit
  175. Guile's XML parser is based on Oleg Kiselyov's powerful XML parsing
  176. toolkit, SSAX.
  177. @subsubsection History
  178. Back in the 1990s, when the world was young again and XML was the
  179. solution to all of its problems, there were basically two kinds of XML
  180. parsers out there: DOM parsers and SAX parsers.
  181. A DOM parser reads through an entire XML document, building up a tree of
  182. ``DOM objects'' representing the document structure. DOM parsers are very easy
  183. to use, but sometimes you don't actually want all of the information in
  184. a document; building an object tree is not necessary if all you want to
  185. do is to count word frequencies in a document, for example.
  186. SAX parsers were created to give the programmer more control over the
  187. parsing process. A programmer gives the SAX parser a number of
  188. ``callbacks'': functions that will be called on various features of the
  189. XML stream as they are encountered. SAX parsers are more efficient, but
  190. much harder to use, as users typically have to manually maintain a
  191. stack of open elements.
  192. Kiselyov realized that the SAX programming model could be made much
  193. simpler if the callbacks were formulated not as a linear fold across the
  194. features of the XML stream, but as a @emph{tree fold} over the structure
  195. implicit in the XML. In this way, the user has a very convenient,
  196. functional-style interface that can still generate optimal parsers.
  197. The @code{xml->sxml} interface from the @code{(sxml simple)} module is a
  198. DOM-style parser built using SSAX, though it returns SXML instead of DOM
  199. objects.
  200. @subsubsection Implementation
  201. @code{(sxml ssax)} is a package of low-to-high level lexing and parsing
  202. procedures that can be combined to yield a SAX, a DOM, a validating
  203. parser, or a parser intended for a particular document type. The
  204. procedures in the package can be used separately to tokenize or parse
  205. various pieces of XML documents. The package supports XML Namespaces,
  206. internal and external parsed entities, user-controlled handling of
  207. whitespace, and validation. This module therefore is intended to be a
  208. framework, a set of ``Lego blocks'' you can use to build a parser
  209. following any discipline and performing validation to any degree. As an
  210. example of the parser construction, this file includes a semi-validating
  211. SXML parser.
  212. SSAX has a ``sequential'' feel of SAX yet a ``functional style'' of DOM.
  213. Like a SAX parser, the framework scans the document only once and
  214. permits incremental processing. An application that handles document
  215. elements in order can run as efficiently as possible. @emph{Unlike} a
  216. SAX parser, the framework does not require that an application register
  217. stateful callbacks and surrender control to the parser. Rather, it is
  218. the application that can drive the framework -- calling its functions to
  219. get the current lexical or syntax element. These functions do not
  220. maintain or mutate any state save the input port. Therefore, the
  221. framework permits parsing of XML in a pure functional style, with the
  222. input port being a monad (or a linear, read-once parameter).
  223. Besides the @var{port}, there is another monad -- @var{seed}. Most of
  224. the middle- and high-level parsers are single-threaded through the
  225. @var{seed}. The functions of this framework do not process or affect
  226. the @var{seed} in any way: they simply pass it around as an instance of
  227. an opaque datatype. User functions, on the other hand, can use the seed
  228. to maintain the user's state, to accumulate parsing results, etc. A user
  229. can freely mix their own functions with those of the framework. On the
  230. other hand, the user may wish to instantiate a high-level parser:
  231. @code{SSAX:make-elem-parser} or @code{SSAX:make-parser}. In the latter
  232. case, the user must provide functions of specific signatures, which are
  233. called at predictable moments during the parsing: to handle character
  234. data, element data, or processing instructions (PI). The functions are
  235. always given the @var{seed}, among other parameters, and must return the
  236. new @var{seed}.
  237. From a functional point of view, XML parsing is a combined
  238. pre-post-order traversal of a ``tree'' that is the XML document itself.
  239. This down-and-up traversal tells the user about an element when its
  240. start tag is encountered. The user is notified about the element once
  241. more, after all of the element's children have been handled. The process of
  242. XML parsing therefore is a fold over the raw XML document. Unlike a
  243. fold over trees defined in [1], the parser is necessarily
  244. single-threaded -- obviously as elements in a text XML document are laid
  245. down sequentially. The parser therefore is a tree fold that has been
  246. transformed to accept an accumulating parameter [1,2].
  247. Formally, the denotational semantics of the parser can be expressed as
  248. @smallexample
  249. parser:: (Start-tag -> Seed -> Seed) ->
  250. (Start-tag -> Seed -> Seed -> Seed) ->
  251. (Char-Data -> Seed -> Seed) ->
  252. XML-text-fragment -> Seed -> Seed
  253. parser fdown fup fchar "<elem attrs> content </elem>" seed
  254. = fup "<elem attrs>" seed
  255. (parser fdown fup fchar "content" (fdown "<elem attrs>" seed))
  256. parser fdown fup fchar "char-data content" seed
  257. = parser fdown fup fchar "content" (fchar "char-data" seed)
  258. parser fdown fup fchar "elem-content content" seed
  259. = parser fdown fup fchar "content" (
  260. parser fdown fup fchar "elem-content" seed)
  261. @end smallexample
  262. Compare the last two equations with the left fold
  263. @smallexample
  264. fold-left kons elem:list seed = fold-left kons list (kons elem seed)
  265. @end smallexample
  266. The real parser created by @code{SSAX:make-parser} is slightly more
  267. complicated, to account for processing instructions, entity references,
  268. namespaces, processing of document type declaration, etc.
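The equations above can be made concrete with a small sketch. It assumes
that @code{NEW-LEVEL-SEED}, @code{FINISH-ELEMENT} and
@code{CHAR-DATA-HANDLER} are the only handlers that must be supplied and
that the remaining ones have usable defaults; @code{count-elements} is a
hypothetical helper, not part of the module. The three handlers play the
roles of @code{fdown}, @code{fup} and @code{fchar}, and the seed is a
running count of elements.

@example
(use-modules (sxml ssax))

;; Hypothetical helper: count the elements in the XML read from PORT.
(define (count-elements port)
  ((ssax:make-parser
    ;; fdown: entering an element; hand the running count to its children
    NEW-LEVEL-SEED
    (lambda (elem-gi attributes namespaces expected-content seed)
      seed)
    ;; fup: leaving an element; SEED is the count after its children
    FINISH-ELEMENT
    (lambda (elem-gi attributes namespaces parent-seed seed)
      (+ seed 1))
    ;; fchar: character data leaves the count unchanged
    CHAR-DATA-HANDLER
    (lambda (string1 string2 seed)
      seed))
   port 0))

(call-with-input-string "<a><b/><c>text</c></a>" count-elements)
@result{} 3
@end example

No tree is ever built here: the only state is the integer seed threaded
through the handlers.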
  269. The XML standard document referred to in this module is
  270. @uref{http://www.w3.org/TR/1998/REC-xml-19980210.html}
  271. The present file also defines a procedure that parses the text of an XML
  272. document or of a separate element into SXML, an S-expression-based model
  273. of an XML Information Set. SXML is also an Abstract Syntax Tree of an
  274. XML document. SXML is similar but not identical to DOM; SXML is
  275. particularly suitable for Scheme-based XML/HTML authoring, SXPath
  276. queries, and tree transformations. See SXML.html for more details.
  277. SXML is a term implementation of evaluation of the XML document [3].
  278. The other implementation is context-passing.
  279. The present framework fully supports the XML Namespaces Recommendation:
  280. @uref{http://www.w3.org/TR/REC-xml-names/}.
  281. Other links:
  282. @table @asis
  283. @item [1]
  284. Jeremy Gibbons, Geraint Jones, "The Under-appreciated Unfold," Proc.
  285. ICFP'98, 1998, pp. 273-279.
  286. @item [2]
  287. Richard S. Bird, The promotion and accumulation strategies in
  288. transformational programming, ACM Trans. Progr. Lang. Systems,
  289. 6(4):487-504, October 1984.
  290. @item [3]
  291. Ralf Hinze, "Deriving Backtracking Monad Transformers," Functional
  292. Pearl. Proc ICFP'00, pp. 186-197.
  293. @end table
  294. @subsubsection Usage
  295. @deffn {Scheme Procedure} current-ssax-error-port
  296. @end deffn
  297. @deffn {Scheme Procedure} with-ssax-error-to-port port thunk
  298. @end deffn
  299. @deffn {Scheme Procedure} xml-token? _
  300. @verbatim
  301. -- Scheme Procedure: pair? x
  302. Return `#t' if X is a pair; otherwise return `#f'.
  303. @end verbatim
  304. @end deffn
  305. @deffn {Scheme Syntax} xml-token-kind token
  306. @end deffn
  307. @deffn {Scheme Syntax} xml-token-head token
  308. @end deffn
  309. @deffn {Scheme Procedure} make-empty-attlist
  310. @end deffn
  311. @deffn {Scheme Procedure} attlist-add attlist name-value
  312. @end deffn
  313. @deffn {Scheme Procedure} attlist-null? x
  314. Return @code{#t} if @var{x} is the empty list, else @code{#f}.
  315. @end deffn
  316. @deffn {Scheme Procedure} attlist-remove-top attlist
  317. @end deffn
  318. @deffn {Scheme Procedure} attlist->alist attlist
  319. @end deffn
  320. @deffn {Scheme Procedure} attlist-fold kons knil lis1
  321. @end deffn
  322. @deffn {Scheme Procedure} define-parsed-entity! entity str
  323. Define a new parsed entity. @var{entity} should be a symbol.
  324. Instances of &@var{entity}; in XML text will be replaced with the string
  325. @var{str}, which will then be parsed.
  326. @end deffn
  327. @deffn {Scheme Procedure} reset-parsed-entity-definitions!
  328. Restore the set of parsed entity definitions to its initial state.
  329. @end deffn
  330. @deffn {Scheme Procedure} ssax:uri-string->symbol uri-str
  331. @end deffn
  332. @deffn {Scheme Procedure} ssax:skip-internal-dtd port
  333. @end deffn
  334. @deffn {Scheme Procedure} ssax:read-pi-body-as-string port
  335. @end deffn
  336. @deffn {Scheme Procedure} ssax:reverse-collect-str-drop-ws fragments
  337. @end deffn
  338. @deffn {Scheme Procedure} ssax:read-markup-token port
  339. @end deffn
  340. @deffn {Scheme Procedure} ssax:read-cdata-body port str-handler seed
  341. @end deffn
  342. @deffn {Scheme Procedure} ssax:read-char-ref port
  343. @end deffn
  344. @deffn {Scheme Procedure} ssax:read-attributes port entities
  345. @end deffn
  346. @deffn {Scheme Procedure} ssax:complete-start-tag tag-head port elems entities namespaces
  347. @end deffn
  348. @deffn {Scheme Procedure} ssax:read-external-id port
  349. @end deffn
  350. @deffn {Scheme Procedure} ssax:read-char-data port expect-eof? str-handler seed
  351. @end deffn
  352. @deffn {Scheme Procedure} ssax:xml->sxml port namespace-prefix-assig
  353. @end deffn
  354. @deffn {Scheme Syntax} ssax:make-parser . kw-val-pairs
  355. @end deffn
  356. @deffn {Scheme Syntax} ssax:make-pi-parser orig-handlers
  357. @end deffn
  358. @deffn {Scheme Syntax} ssax:make-elem-parser my-new-level-seed my-finish-element my-char-data-handler my-pi-handlers
  359. @end deffn
  360. @node Transforming SXML
  361. @subsection Transforming SXML
  362. @subsubsection Overview
  363. @heading SXML expression tree transformers
  364. @subheading Pre-Post-order traversal of a tree and creation of a new tree
  365. @smallexample
  366. pre-post-order:: <tree> x <bindings> -> <new-tree>
  367. @end smallexample
  368. where
  369. @smallexample
  370. <bindings> ::= (<binding> ...)
  371. <binding> ::= (<trigger-symbol> *preorder* . <handler>) |
  372. (<trigger-symbol> *macro* . <handler>) |
  373. (<trigger-symbol> <new-bindings> . <handler>) |
  374. (<trigger-symbol> . <handler>)
  375. <trigger-symbol> ::= XMLname | *text* | *default*
  376. <handler> :: <trigger-symbol> x [<tree>] -> <new-tree>
  377. @end smallexample
  378. The pre-post-order function visits the nodes and nodelists
  379. pre-post-order (depth-first). For each @code{<Node>} of the form
  380. @code{(@var{name} <Node> ...)}, it looks up an association with the
  381. given @var{name} among its @var{<bindings>}. If the lookup fails,
  382. @code{pre-post-order} tries to locate a @code{*default*} binding. It's
  383. an error if the latter attempt fails as well. Having found a binding,
  384. the @code{pre-post-order} function first checks to see if the binding is
  385. of the form
  386. @smallexample
  387. (<trigger-symbol> *preorder* . <handler>)
  388. @end smallexample
  389. If it is, the handler is @emph{applied} to the current node. Otherwise, the
  390. pre-post-order function first calls itself recursively for each child of
  391. the current node, with @var{<new-bindings>} prepended to the
  392. @var{<bindings>} in effect. The result of these calls is passed to the
  393. @var{<handler>} (along with the head of the current @var{<Node>}). To be
  394. more precise, the handler is @emph{applied} to the head of the current node
  395. and its processed children. The result of the handler, which should also
  396. be a @code{<tree>}, replaces the current @var{<Node>}. If the current
  397. @var{<Node>} is a text string or other atom, a special binding with a
  398. symbol @code{*text*} is looked up.
  399. A binding can also be of the form
  400. @smallexample
  401. (<trigger-symbol> *macro* . <handler>)
  402. @end smallexample
  403. This is equivalent to the @code{*preorder*} form described above, except
  404. that the result is processed again with the current stylesheet.
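The following sketch illustrates ordinary and @code{*preorder*} bindings
side by side. The document and the target tag names (@code{h1},
@code{p}, @code{pre}) are made up for the example. An ordinary handler
receives the tag followed by the already-processed children, whereas a
@code{*preorder*} handler receives the node's children untouched.

@example
(use-modules (sxml transform))

(define doc '(doc (title "Hi") (code (b "raw")) (para "text")))

(pre-post-order
 doc
 `((title . ,(lambda (tag . kids) (cons 'h1 kids)))
   (para  . ,(lambda (tag . kids) (cons 'p kids)))
   ;; *preorder*: the children are passed in unprocessed
   (code *preorder* . ,(lambda (tag . kids) (cons 'pre kids)))
   ;; *text*: called for every string (or other atom)
   (*text* . ,(lambda (tag str) (string-upcase str)))
   ;; *default*: rebuild any other element unchanged
   (*default* . ,(lambda (tag . kids) (cons tag kids)))))
@result{} (doc (h1 "HI") (pre (b "raw")) (p "TEXT"))
@end example

Note that the string inside @code{code} is not upcased, because its
subtree was never traversed.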
  405. @subsubsection Usage
  406. @deffn {Scheme Procedure} SRV:send-reply . fragments
  407. Output the @var{fragments} to the current output port.
  408. The fragments are a list of strings, characters, numbers, thunks,
  409. @code{#f}, @code{#t} -- and other fragments. The function traverses the
  410. tree depth-first, writes out strings and characters, executes thunks,
  411. and ignores @code{#f} and @code{'()}. The function returns @code{#t} if
  412. anything was written at all; otherwise the result is @code{#f}. If
  413. @code{#t} occurs among the fragments, it is not written out but causes
  414. the result of @code{SRV:send-reply} to be @code{#t}.
  415. @end deffn
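A short sketch of this behavior, with invented fragments; the output is
captured with @code{with-output-to-string} so that it can be shown as a
value:

@example
(use-modules (sxml transform))

;; Nested lists are flattened; #f is skipped.
(with-output-to-string
  (lambda ()
    (SRV:send-reply "<p>" (list "Hello" #\, " " (list 42 #f)) "</p>")))
@result{} "<p>Hello, 42</p>"

;; Nothing is written here, so the result is #f.
(SRV:send-reply #f '())
@result{} #f
@end example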
  416. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  417. @end deffn
  418. @deffn {Scheme Procedure} post-order tree bindings
  419. @end deffn
  420. @deffn {Scheme Procedure} pre-post-order tree bindings
  421. @end deffn
  422. @deffn {Scheme Procedure} replace-range beg-pred end-pred forest
  423. @end deffn
  424. @node SXML Tree Fold
  425. @subsection SXML Tree Fold
  426. @subsubsection Overview
  427. @code{(sxml fold)} defines a number of variants of the @dfn{fold}
  428. algorithm for use in transforming SXML trees. Additionally it defines
  429. the layout operator, @code{fold-layout}, which might be described as a
  430. context-passing variant of SSAX's @code{pre-post-order}.
  431. @subsubsection Usage
  432. @deffn {Scheme Procedure} foldt fup fhere tree
  433. The standard multithreaded tree fold.
  434. @var{fup} is of type [a] -> a. @var{fhere} is of type object -> a.
  435. @end deffn
  436. @deffn {Scheme Procedure} foldts fdown fup fhere seed tree
  437. The single-threaded tree fold originally defined in SSAX. @xref{SSAX},
  438. for more information.
  439. @end deffn
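To contrast @code{foldts} with @code{foldt}, here is a minimal sketch
over a made-up tree. @code{foldts} threads a single seed through the
whole traversal, while @code{foldt} folds each child independently and
then combines the results.

@example
(use-modules (sxml fold))

(define tree '(a (b) (c "text" (d "more"))))

;; Count elements: fdown and fhere pass the seed on, fup adds one.
(foldts (lambda (seed tree) seed)
        (lambda (seed kid-seed tree) (+ kid-seed 1))
        (lambda (seed atom) seed)
        0
        tree)
@result{} 4

;; Count strings: fhere scores each atom, fup sums the scores.
(foldt (lambda (kid-results) (apply + kid-results))
       (lambda (atom) (if (string? atom) 1 0))
       tree)
@result{} 2
@end example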
  440. @deffn {Scheme Procedure} foldts* fdown fup fhere seed tree
  441. A variant of @code{foldts} that allows pre-order tree
  442. rewrites. Originally defined in Andy Wingo's 2007 paper,
  443. @emph{Applications of fold to XML transformation}.
  444. @end deffn
  445. @deffn {Scheme Procedure} fold-values proc list . seeds
  446. A variant of @code{fold} that allows multi-valued seeds. Note that the
  447. order of the arguments differs from that of @code{fold}. @xref{SRFI-1
  448. Fold and Map}.
  449. @end deffn
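For instance, the following sketch computes the sum and the length of a
list in one pass. @var{proc} receives the element followed by the current
seeds and returns the new seeds as multiple values;
@code{call-with-values} is used here only to show both results as a
list.

@example
(use-modules (sxml fold))

(call-with-values
    (lambda ()
      (fold-values (lambda (x sum count)
                     (values (+ sum x) (+ count 1)))
                   '(1 2 3 4)
                   0 0))
  list)
@result{} (10 4)
@end example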
  450. @deffn {Scheme Procedure} foldts*-values fdown fup fhere tree . seeds
  451. A variant of @code{foldts*} that allows multi-valued
  452. seeds. Originally defined in Andy Wingo's 2007 paper, @emph{Applications
  453. of fold to XML transformation}.
  454. @end deffn
  455. @deffn {Scheme Procedure} fold-layout tree bindings params layout stylesheet
  456. A traversal combinator in the spirit of @code{pre-post-order}.
  457. @xref{Transforming SXML}.
  458. @code{fold-layout} was originally presented in Andy Wingo's 2007 paper,
  459. @emph{Applications of fold to XML transformation}.
  460. @example
  461. bindings := (<binding>...)
  462. binding := (<tag> <handler-pair>...)
  463. | (*default* . <post-handler>)
  464. | (*text* . <text-handler>)
  465. tag := <symbol>
  466. handler-pair := (pre-layout . <pre-layout-handler>)
  467. | (post . <post-handler>)
  468. | (bindings . <bindings>)
  469. | (pre . <pre-handler>)
  470. | (macro . <macro-handler>)
  471. @end example
  472. @table @var
  473. @item pre-layout-handler
  474. A function of three arguments:
  475. @table @var
  476. @item kids
  477. the kids of the current node, before traversal
  478. @item params
  479. the params of the current node
  480. @item layout
  481. the layout coming into this node
  482. @end table
  483. @var{pre-layout-handler} is expected to use this information to return a
  484. layout to pass to the kids. The default implementation returns the
  485. layout given in the arguments.
  486. @item post-handler
  487. A function of five arguments:
  488. @table @var
  489. @item tag
  490. the current tag being processed
  491. @item params
  492. the params of the current node
  493. @item layout
  494. the layout coming into the current node, before any kids were processed
  495. @item klayout
  496. the layout after processing all of the children
  497. @item kids
  498. the already-processed child nodes
  499. @end table
  500. @var{post-handler} should return two values, the layout to pass to the
  501. next node and the final tree.
  502. @item text-handler
  503. @var{text-handler} is a function of three arguments:
  504. @table @var
  505. @item text
  506. the string
  507. @item params
  508. the current params
  509. @item layout
  510. the current layout
  511. @end table
  512. @var{text-handler} should return two values, the layout to pass to the
  513. next node and the value to which the string should transform.
  514. @end table
  515. @end deffn
  516. @node SXPath
  517. @subsection SXPath
  518. @subsubsection Overview
  519. @heading SXPath: SXML Query Language
  520. SXPath is a query language for SXML, an instance of XML Information Set
  521. (Infoset) in the form of s-expressions. See @code{(sxml ssax)} for the
  522. definition of SXML and more details. SXPath is also a translation into
  523. Scheme of an XML Path Language, @uref{http://www.w3.org/TR/xpath,XPath}.
  524. XPath and SXPath describe means of selecting a set of Infoset's items or
  525. their properties.
  526. To facilitate queries, XPath maps the XML Infoset into an explicit tree,
  527. and introduces important notions of a location path and a current,
  528. context node. A location path denotes a selection of a set of nodes
  529. relative to a context node. Any XPath tree has a distinguished, root
  530. node -- which serves as the context node for absolute location paths.
  531. Location path is recursively defined as a location step joined with a
  532. location path. A location step is a simple query of the database
  533. relative to a context node. A step may include expressions that further
  534. filter the selected set. Each node in the resulting set is used as a
  535. context node for the adjoining location path. The result of the step is
  536. a union of the sets returned by the latter location paths.
  537. The SXML representation of the XML Infoset (see SSAX.scm) is rather
  538. suitable for querying as it is. Bowing to the XPath specification, we
  539. will refer to SXML information items as 'Nodes':
  540. @example
  541. <Node> ::= <Element> | <attributes-coll> | <attrib>
  542. | "text string" | <PI>
  543. @end example
  544. This production can also be described as
  545. @example
  546. <Node> ::= (name . <Nodeset>) | "text string"
  547. @end example
  548. An (ordered) set of nodes is just a list of the constituent nodes:
  549. @example
  550. <Nodeset> ::= (<Node> ...)
  551. @end example
  552. Nodesets, and Nodes other than text strings are both lists. A <Nodeset>
  553. however is either an empty list, or a list whose head is not a symbol. A
  554. symbol at the head of a node is either an XML name (in which case it's a
  555. tag of an XML element), or an administrative name such as '@@'. This
  556. uniform list representation makes processing rather simple and elegant,
  557. while avoiding confusion. The multi-branch tree structure formed by the
  558. mutually-recursive datatypes <Node> and <Nodeset> lends itself well to
  559. processing by functional languages.
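The distinction can be checked with @code{nodeset?}, as in this small
sketch (the sample data is invented):

@example
(use-modules (sxml xpath))

;; A list whose head is not a symbol is a nodeset; so is the empty list.
(nodeset? '((name "Alfie") (name "Polly")))
@result{} #t
(nodeset? '())
@result{} #t

;; A list headed by a symbol is a single node.
(nodeset? '(name "Alfie"))
@result{} #f
@end example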
  560. A location path is in fact a composite query over an XPath tree or its
  561. branch. A single step is a combination of a projection, selection or a
  562. transitive closure. Multiple steps are combined via join and union
  563. operations. This insight allows us to @emph{elegantly} implement XPath
  564. as a sequence of projection and filtering primitives -- converters --
  565. joined by @dfn{combinators}. Each converter takes a node and returns a
  566. nodeset which is the result of the corresponding query relative to that
  567. node. A converter can also be called on a set of nodes. In that case it
  568. returns a union of the corresponding queries over each node in the set.
  569. The union is easily implemented as a list append operation as all nodes
  570. in a SXML tree are considered distinct, by XPath conventions. We also
  571. preserve the order of the members in the union. Query combinators are
  572. high-order functions: they take converter(s) (which is a Node|Nodeset ->
  573. Nodeset function) and compose or otherwise combine them. We will be
  574. concerned with only relative location paths [XPath]: an absolute
  575. location path is a relative path applied to the root node.
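
The converter and combinator vocabulary above can be sketched with the
procedures listed under Usage, below. The tree and the names
@code{chapters} and @code{titles} are hypothetical, chosen for
illustration only.

@example
(use-modules (sxml xpath))

;; A hypothetical tree, used only for illustration.
(define tree
  '(doc (chapter (title "One") (para "a"))
        (chapter (title "Two"))))

;; Two converters: each selects the children that pass a node test.
(define chapters (select-kids (node-typeof? 'chapter)))
(define titles   (select-kids (node-typeof? 'title)))

;; Called on a single node, a converter returns a nodeset.
(chapters tree)
;; => ((chapter (title "One") (para "a")) (chapter (title "Two")))

;; Called on a nodeset, it returns the ordered union of the results.
(titles (chapters tree))
;; => ((title "One") (title "Two"))

;; A combinator builds a new converter out of existing ones.
((node-join chapters titles) tree)
;; => ((title "One") (title "Two"))
@end example
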

Similarly to XPath, SXPath defines full and abbreviated notations for
location paths. In both cases, the abbreviated notation can be
mechanically expanded into the full form by simple rewriting rules. In
the case of SXPath, the corresponding rules are given as comments to the
@code{sxpath} function, below. The regression test suite at the end of
this file shows a representative sample of SXPaths in both notations,
juxtaposed with the corresponding XPath expressions. Most of the samples
are borrowed literally from the XPath specification, while the others
are adjusted for our running example, tree1.
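
To make the two notations concrete, here is a minimal sketch that
writes the XPath expression @code{chapter/title} both ways. It assumes
that a symbol step expands to @code{select-kids} applied to
@code{node-typeof?}, and that consecutive steps are joined with
@code{node-join}. The tree is hypothetical.

@example
(use-modules (sxml xpath))

;; A hypothetical tree, used only for illustration.
(define tree
  '(doc (chapter (title "One"))
        (chapter (title "Two"))))

;; Abbreviated notation: a list of steps handed to sxpath.
(define abbreviated (sxpath '(chapter title)))

;; Full notation: the same query spelled out with a combinator.
(define full
  (node-join (select-kids (node-typeof? 'chapter))
             (select-kids (node-typeof? 'title))))

(abbreviated tree)  ; => ((title "One") (title "Two"))
(full tree)         ; => ((title "One") (title "Two"))
@end example
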

@subsubsection Usage

@deffn {Scheme Procedure} nodeset? x
@end deffn

@deffn {Scheme Procedure} node-typeof? crit
@end deffn

@deffn {Scheme Procedure} node-eq? other
@end deffn

@deffn {Scheme Procedure} node-equal? other
@end deffn

@deffn {Scheme Procedure} node-pos n
@end deffn

@deffn {Scheme Procedure} filter pred?
@verbatim
 -- Scheme Procedure: filter pred list
     Return all the elements of 2nd arg LIST that satisfy predicate
     PRED. The list is not disordered - elements that appear in the
     result list occur in the same order as they occur in the argument
     list. The returned list may share a common tail with the argument
     list. The dynamic order in which the various applications of pred
     are made is not specified.

          (filter even? '(0 7 8 8 43 -4)) => (0 8 8 -4)
@end verbatim
@end deffn

@deffn {Scheme Procedure} take-until pred?
@end deffn

@deffn {Scheme Procedure} take-after pred?
@end deffn

@deffn {Scheme Procedure} map-union proc lst
@end deffn

@deffn {Scheme Procedure} node-reverse node-or-nodeset
@end deffn

@deffn {Scheme Procedure} node-trace title
@end deffn

@deffn {Scheme Procedure} select-kids test-pred?
@end deffn

@deffn {Scheme Procedure} node-self pred?
@end deffn

@deffn {Scheme Procedure} node-join . selectors
@end deffn

@deffn {Scheme Procedure} node-reduce . converters
@end deffn

@deffn {Scheme Procedure} node-or . converters
@end deffn

@deffn {Scheme Procedure} node-closure test-pred?
@end deffn

@deffn {Scheme Procedure} node-parent rootnode
@end deffn

@deffn {Scheme Procedure} sxpath path
@end deffn

@node sxml ssax input-parse
@subsection (sxml ssax input-parse)
@subsubsection Overview
A simple lexer.

The procedures in this module surprisingly often suffice to parse an
input stream. They either skip, or build and return tokens, according to
inclusion or delimiting semantics. The list of characters to expect,
include, or to break at may vary from one invocation of a function to
another. This allows the functions to easily parse even
context-sensitive languages.

EOF is generally treated as an error: a procedure that encounters it
throws an exception. Exceptions to this rule are mentioned specifically.
The list of expected characters (characters to skip until, or break
characters) may include an EOF "character", which is to be coded as the
symbol @code{*eof*}.
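
As a minimal sketch of the @code{*eof*} convention, the following reads
two space-separated words with @code{next-token} (listed under Usage,
below). The helper @code{read-word} and the input string are
hypothetical. Including @code{*eof*} among the break characters is what
lets the last word end at the end of input.

@example
(use-modules (sxml ssax input-parse))

;; Hypothetical helper: skip leading spaces, then read up to the
;; next space or the end of input.
(define (read-word port)
  (next-token '(#\space) '(#\space *eof*) "reading a word" port))

(define words
  (call-with-input-string "  hello world"
    (lambda (port)
      (let* ((w1 (read-word port))
             (w2 (read-word port)))
        (list w1 w2)))))

words  ; => ("hello" "world")
@end example
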

The input stream to parse is specified as a @dfn{port}, which is usually
the last (and optional) argument. It defaults to the current input port
if omitted.

If the parser encounters an error, it will throw an exception with the
key @code{parser-error}. The arguments will be of the form
@code{(@var{port} @var{message} @var{specialising-msg}*)}.

The first argument is a port, which typically points to the offending
character or its neighborhood. You can then use @code{port-column} and
@code{port-line} to query the current position. @var{message} is the
description of the error. Other arguments supply more details about the
problem.
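
The error protocol described above might be handled along these lines.
This is a sketch only: the helper @code{try-read-statement} and its
inputs are hypothetical, and it assumes that reaching the end of input
without @code{*eof*} among the break characters is reported through
@code{parser-error}.

@example
(use-modules (sxml ssax input-parse))

;; Hypothetical helper: read a token terminated by a semicolon.
;; *eof* is not a break character here, so input that ends before
;; a semicolon is expected to raise parser-error.
(define (try-read-statement str)
  (call-with-input-string str
    (lambda (port)
      (catch 'parser-error
        (lambda ()
          (next-token '() '(#\;) "reading a statement" port))
        ;; The handler receives the key, then the port, the message,
        ;; and any further details.
        (lambda (key err-port message . details)
          (list 'parse-failed
                (port-line err-port)
                (port-column err-port)))))))

(try-read-statement "x = 1;")  ; => "x = 1"
(try-read-statement "x = 1")   ; a list starting with parse-failed
@end example
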

@subsubsection Usage

@deffn {Scheme Procedure} peek-next-char [port]
@end deffn

@deffn {Scheme Procedure} assert-curr-char expected-chars comment [port]
@end deffn

@deffn {Scheme Procedure} skip-until arg [port]
@end deffn

@deffn {Scheme Procedure} skip-while skip-chars [port]
@end deffn

@deffn {Scheme Procedure} next-token prefix-skipped-chars break-chars [comment] [port]
@end deffn

@deffn {Scheme Procedure} next-token-of incl-list/pred [port]
@end deffn

@deffn {Scheme Procedure} read-text-line [port]
@end deffn

@deffn {Scheme Procedure} read-string n [port]
@end deffn

@deffn {Scheme Procedure} find-string-from-port? str <input-port> . max-no-char
Looks for @var{str} in @var{<input-port>}, optionally within the first
@var{max-no-char} characters.
@end deffn
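
A minimal usage sketch follows, assuming the procedure returns a true
value when the string is found and @code{#f} otherwise. The wrapper
@code{contains?} is hypothetical and only normalizes the result to a
boolean.

@example
(use-modules (sxml ssax input-parse))

;; Hypothetical wrapper: does TEXT contain NEEDLE?
(define (contains? needle text)
  (call-with-input-string text
    (lambda (port)
      (and (find-string-from-port? needle port) #t))))

(contains? "cd" "abcdef")  ; => #t
(contains? "xy" "abcdef")  ; => #f
@end example
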

@node sxml apply-templates
@subsection (sxml apply-templates)
@subsubsection Overview
Pre-order traversal of a tree and creation of a new tree:

@smallexample
apply-templates:: tree x <templates> -> <new-tree>
@end smallexample

where

@smallexample
<templates> ::= (<template> ...)
<template>  ::= (<node-test> <node-test> ... <node-test> . <handler>)
<node-test> ::= an argument to node-typeof? above
<handler>   ::= <tree> -> <new-tree>
@end smallexample

This procedure does a @emph{normal}, pre-order traversal of an SXML
tree. It walks the tree, checking each node against the list of
templates.

If a match is found (which must be unique, i.e., unambiguous), the
corresponding handler is invoked and given the current node as an
argument. The result from the handler, which must be a @code{<tree>},
takes the place of the current node in the resulting tree. The name of
the function is not accidental: it resembles rather closely an
@code{apply-templates} function of XSLT.
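
The template grammar above can be sketched as follows. The tree, the
handler @code{title->h1}, and the name @code{templates} are hypothetical.
The sketch assumes that handler results are appended into the new tree,
which is why the handler returns a list of nodes rather than a bare
node.

@example
(use-modules (sxml apply-templates))

;; A hypothetical tree, used only for illustration.
(define tree '(doc (title "Hello") (para "text")))

;; Hypothetical handler: rename a title element to h1.  It returns
;; a list holding the new node.
(define (title->h1 node)
  (list (cons 'h1 (cdr node))))

;; One template: the node tests doc and title, with the handler in
;; the tail position of the (improper) list.
(define templates
  (list (cons* 'doc 'title title->h1)))

(apply-templates tree templates)
;; => ((h1 "Hello"))
@end example

In this sketch the @code{para} element matches no template and so
contributes nothing to the result.
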

@subsubsection Usage

@deffn {Scheme Procedure} apply-templates tree templates
@end deffn