package.html 8.5 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176
  1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  2. <html>
  3. <head>
  4. <title>The gnu.lists package</title>
  5. </head>
  6. <body>
  7. <p>Contains utility classes and interfaces for sequences (lists), arrays, and trees.</p>
  8. <p>Features:
  9. <ul>
  10. <li>
  11. The classes and interfaces are compatible with the Java2 Collections
  12. framework, but do not require them. A configuration option allows you
  13. to specify whether Sequence extends java.util.List or not.</li>
  14. <li>
  15. Positions in a Sequence and iterators are the same kind of object,
  16. and are represented using magic cookies. In most collections frameworks,
  17. when you define a new class that implements a sequence, you would
  18. typically define a new iterator class as well. You never do that
  19. with <code>gnu.lists</code>; instead you implement suitable methods
  20. in the sequence class itself. This approach may seem strange, but it has
  21. important performance advantages, especially in applications
  22. where you want to go up and down nested sequences (i.e. in trees, such
  23. as XML elements), or remember sets of positions (such as XML node-sets).</li>
  24. <li>
  25. The classes are designed for efficiency, especially concentrating on
  26. minimizing object allocation.</li>
  27. <li>
  28. The framework supports general multi-dimensional arrays.</li>
  29. <li>
  30. The package is self-contained; it does not depend on classes or
  31. features not in JDK 1.1.</li>
  32. <li>
  33. Various useful sequence implementations, including Lisp-style linked
  34. lists; arrays with various element types; adjustable arrays that use a
  35. "buffer-gap" to make insertions and deletions more efficient; and
  36. adjustable arrays that automatically update positions on insertions
  37. and deletions.</li>
  38. <li> The <a href="TreeList.html"><code>TreeList</code></a> class can compactly represent
  39. a nested heterogenous tree. One intended usage is as a very efficient
  40. representation for XML data.</li>
  41. <li>
  42. The <a href="Consumer.html"><code>Consumer</code></a> interface is an abstraction of "data sinks" -
  43. objects that can receive data and do something with them. It is
  44. similar to SAX's DocumentHandler but at a more abstract (and sometimes
  45. more efficient) level.
  46. </ul>
  47. <h2 id="iteration">The iteration and position framework</h2>
  48. <p>
  49. A <i>position</i> points <em>between</em> two elements in its "containing"
  50. sequence, or it points before the first element (if any), or after the last.
  51. Sometimes, we loosely say that the position refers to or points to
  52. an element of the sequence; in that case we mean the position is
  53. immediately before the element.
  54. <p>
  55. An <i>iterator</i> is an object that has a current position,
  56. but that can be moved so it gets a new position.
  57. What needs to be emphasized is that <em>all</em>
  58. <a href="Sequence.html"><code>Sequence</code></a> implementations.
  59. use the <em>same</em> <a href="SeqPosition.html"><code>SeqPosition</code></a>
  60. class to implement positions and iteration.
  61. In other "collections frameworks" each sequence class has its corresponding
  62. iterator class, but in this framework you instead add the methods
  63. that handle iteration to the sequence class itself, extending
  64. the <a href="AbstractSequence.html"><code>AbstractSequence</code></a> class.
  65. A consequence of this is that if you have an unused
  66. <a href="SeqPosition.html"><code>SeqPosition</code></a> object, you can
  67. use it to iterate over <em>any</em> <code>Sequence</code> you choose.
  68. This potentially saves memory allocation, which gains you the most
  69. when iterating over a nested sequence structure, like a tree of XML data.
  70. <p>
  71. We do this by requiring that any position be representable using a pair
  72. of an <code>int</code> and an <code>Object</code>.
  73. In other words, the state of an iterator is
  74. restricted to be an (<code>int</code>, <code>Object</code>) pair.
  75. Of course that is all you <em>could</em> need, since if you need more
  76. state you can always create a helper object,
  77. and store the extra state there. The assumption we make though is that for
  78. most <code>Sequence</code>s, an (<code>int</code>, <code>Object</code>)
  79. would be enough, without creating any new objects (beyond,
  80. sometimes, the <code>SeqPosition</code> itself).
  81. <p>
  82. The <code>int</code> part of a position-state is normally
  83. named <code>ipos</code>, and the <code>Object</code> part of a
  84. position-state is named <code>xpos</code>.
  85. Neither of these values have any meaning in themselves, they only have
  86. meaning as interpreted by the specific <code>Sequence</code>.
  87. For example, <code>ipos</code> might be the offset of the position in
  88. the sequence, but it might also some "magic cookie".
  89. If a <code>Sequence</code> supports insertions and deletions,
  90. and you want positions to be automatically adjusted, then a position
  91. <em>has</em> to be a magic cookie rather than an offset.
  92. (A typical implementation is for such a position to be an index
  93. into a table of positions managed by the <code>Sequence</code> itself.)
  94. So a complete position is actually a (<code>int</code>, <code>Object</code>,
  95. <code>AbstractSequence</code>) triple, where the <code>AbstractSequence</code>
  96. component is the object that interprets the (<code>int</code>,
  97. <code>Object</code>) pair.
  98. Normally, the <code>AbstractSequence</code> part of a position triple is the
  99. <code>Sequence</code> "containing" the position, but it can be any
  100. <code>AbstractSequence</code>
  101. that implements the various methods that provide the semantics
  102. for the (<code>int</code>, <code>Object</code>) pair.
  103. <p>
  104. The <a href="PositionContainer.html"><code>PositionContainer</code></a>
  105. interface is a more abstract version of <code>SeqPosition</code>, in
  106. that it can represents more than one position.
  107. For example, instead of an array of separately allocated
  108. <code>SeqPosition</code> objects, you might use some data structure
  109. that implements <code>PositionContainer</code>,
  110. which is abstractly a list of (<code>int</code>, <code>Object</code>,
  111. <code>Sequence</code>)-triples.
  112. <h2>The consumer protocol</h2>
  113. <p>
  114. You typically use an iteration framework it process the elements
  115. of a sequence in such a way that the data consumer is active
  116. and in control, while the sequence itself (the data producer) is a
  117. passive object. The <a href="Consumer.html"><code>Consumer</code></a>
  118. works the other way round: The data producer is active and in control,
  119. while the data consumer is passive, consuming elements when requested
  120. by the data producer. The <code>Consumer</code> is a more abstract
  121. variant of <a href="http://www.megginson.com/SAX/index.html">SAX</a>'s
  122. <code>ContentHandler</code> interface for processing "events";
  123. however <code>Consumer</code> is intended for general value sequences,
  124. and is less tied to XML.</p>
  125. <h2>Iteration</h2>
  126. <pre>
  127. int iter = 0; // standard start position
  128. for (;;)
  129. {
  130. iter = list.nextPos(iter);
  131. if (iter == 0)
  132. break;
  133. Object value = list.getPosPrevious(iter);
  134. ... use value ...;
  135. }
  136. </pre>
  137. <h2>Position values for buffer-based sequences</h2>
  138. <p>
  139. The position encoding for sequences implemented using an array is simple.
  140. Assume the next index (as returned by <code>nextIndex</code>) is <var>i</var>.
  141. If the position is a before/backwards position, then position value
  142. is <code>(</code><var>i</var><code>&lt;&lt;1)</code>.
  143. If the position is an after/forwards position, then position value
  144. is <code>((</code><var>i</var><code>&lt;&lt;1))|1</code>.</p>
  145. <p>
  146. But what each each item in the buffer has variable size?
  147. One example is a UTF-8 string. Another example is a buffer of text
  148. where each "item" is a line.
  149. Then we have the choice: Should the position value encode the
  150. index (making <code>nextIndex</code> and <code>get</code> cheap), or
  151. should it encode the offset into the buffer (making sequential access cheap)?
  152. Since sequential access is preferred, we do the latter.
  153. Thus for a before/backwards position, if the buffer offset for
  154. item <var>i</var> is <var>p<sub>i</sub></var>, then the position value
  155. is <code>(</code><var>p<sub>i</sub></var><code> &lt;&lt; 1)</code>.
  156. However, there is a complication when moving forwards using
  157. a <code>ListIterator</code> since using <code>set</code> or <code>remove</code>
  158. affects the <em>previous</em> item. It may be much more expensive to
  159. calculate the buffer position of the previous item than the next item.
  160. (Given <var>p<sub>i</sub></var> it may be cheap to
  161. calculate <var>p<sub>i+1</sub></var> but expensive to
  162. calculate <var>p<sub>i-1</sub></var>.)
  163. Therefore, we suggest using
  164. <code>(((</code><var>p<sub>i-1</sub></var>+1)<code>&lt;&lt;1))|1</code>,
  165. where <var>p<sub>-1</sub></var> is defined as -1.
  166. The addition of 1 allows us to handle the case of <var>i=0</var> without
  167. conflicting with the use of -1 to mean end-of-sequence.
  168. Plus it makes the previous case of a simple array a special case.
  169. </body>
  170. </html>