README.syntax 9.3 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210
  1. 1. OVERVIEW
  2. This README file describes the syntax of the arguments that may be passed to
  3. the FTS3 MATCH operator used for full-text queries. For example, if table
  4. "t1" is an Fts3 virtual table, the following SQL query:
  5. SELECT * FROM t1 WHERE <col> MATCH <full-text query>
  6. may be used to retrieve all rows that match a specified for full-text query.
  7. The text "<col>" should be replaced by either the name of the fts3 table
  8. (in this case "t1"), or by the name of one of the columns of the fts3
  9. table. <full-text-query> should be replaced by an SQL expression that
  10. computes to a string containing an Fts3 query.
  11. If the left-hand-side of the MATCH operator is set to the name of the
  12. fts3 table, then by default the query may be matched against any column
  13. of the table. If it is set to a column name, then by default the query
  14. may only match the specified column. In both cases this may be overriden
  15. as part of the query text (see sections 2 and 3 below).
  16. As of SQLite version 3.6.8, Fts3 supports two slightly different query
  17. formats; the standard syntax, which is used by default, and the enhanced
  18. query syntax which can be selected by compiling with the pre-processor
  19. symbol SQLITE_ENABLE_FTS3_PARENTHESIS defined.
  20. -DSQLITE_ENABLE_FTS3_PARENTHESIS
  21. 2. STANDARD QUERY SYNTAX
  22. When using the standard Fts3 query syntax, a query usually consists of a
  23. list of terms (words) separated by white-space characters. To match a
  24. query, a row (or column) of an Fts3 table must contain each of the specified
  25. terms. For example, the following query:
  26. <col> MATCH 'hello world'
  27. matches rows (or columns, if <col> is the name of a column name) that
  28. contain at least one instance of the token "hello", and at least one
  29. instance of the token "world". Tokens may be grouped into phrases using
  30. quotation marks. In this case, a matching row or column must contain each
  31. of the tokens in the phrase in the order specified, with no intervening
  32. tokens. For example, the query:
  33. <col> MATCH '"hello world" joe"
  34. matches the first of the following two documents, but not the second or
  35. third:
  36. "'Hello world', said Joe."
  37. "One should always greet the world with a cheery hello, thought Joe."
  38. "How many hello world programs could their be?"
  39. As well as grouping tokens together by phrase, the binary NEAR operator
  40. may be used to search for rows that contain two or more specified tokens
  41. or phrases within a specified proximity of each other. The NEAR operator
  42. must always be specified in upper case. The word "near" in lower or mixed
  43. case is treated as an ordinary token. For example, the following query:
  44. <col> MATCH 'engineering NEAR consultancy'
  45. matches rows that contain both the "engineering" and "consultancy" tokens
  46. in the same column with not more than 10 other words between them. It does
  47. not matter which of the two terms occurs first in the document, only that
  48. they be seperated by only 10 tokens or less. The user may also specify
  49. a different required proximity by adding "/N" immediately after the NEAR
  50. operator, where N is an integer. For example:
  51. <col> MATCH 'engineering NEAR/5 consultancy'
  52. searches for a row containing an instance of each specified token seperated
  53. by not more than 5 other tokens. More than one NEAR operator can be used
  54. in as sequence. For example this query:
  55. <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy'
  56. searches for a row that contains an instance of the token "reliable"
  57. seperated by not more than two tokens from an instance of "engineering",
  58. which is in turn separated by not more than 5 other tokens from an
  59. instance of the term "consultancy". Phrases enclosed in quotes may
  60. also be used as arguments to the NEAR operator.
  61. Similar to the NEAR operator, one or more tokens or phrases may be
  62. separated by OR operators. In this case, only one of the specified tokens
  63. or phrases must appear in the document. For example, the query:
  64. <col> MATCH 'hello OR world'
  65. matches rows that contain either the term "hello", or the term "world",
  66. or both. Note that unlike in many programming languages, the OR operator
  67. has a higher precedence than the AND operators implied between white-space
  68. separated tokens. The following query matches documents that contain the
  69. term 'sqlite' and at least one of the terms 'fantastic' or 'impressive',
  70. not those that contain both 'sqlite' and 'fantastic' or 'impressive':
  71. <col> MATCH 'sqlite fantastic OR impressive'
  72. Any token that is part of an Fts3 query expression, whether or not it is
  73. part of a phrase enclosed in quotes, may have a '*' character appended to
  74. it. In this case, the token matches all terms that begin with the characters
  75. of the token, not just those that exactly match it. For example, the
  76. following query:
  77. <col> MATCH 'sql*'
  78. matches all rows that contain the term "SQLite", as well as those that
  79. contain "SQL".
  80. A token that is not part of a quoted phrase may be preceded by a '-'
  81. character, which indicates that matching rows must not contain the
  82. specified term. For example, the following:
  83. <col> MATCH '"database engine" -sqlite'
  84. matches rows that contain the phrase "database engine" but do not contain
  85. the term "sqlite". If the '-' character occurs inside a quoted phrase,
  86. it is ignored. It is possible to use both the '-' prefix and the '*' postfix
  87. on a single term. At this time, all Fts3 queries must contain at least
  88. one term or phrase that is not preceded by the '-' prefix.
  89. Regardless of whether or not a table name or column name is used on the
  90. left hand side of the MATCH operator, a specific column of the fts3 table
  91. may be associated with each token in a query by preceding a token with
  92. a column name followed by a ':' character. For example, regardless of what
  93. is specified for <col>, the following query requires that column "col1"
  94. of the table contains the term "hello", and that column "col2" of the
  95. table contains the term "world". If the table does not contain columns
  96. named "col1" and "col2", then an error is returned and the query is
  97. not run.
  98. <col> MATCH 'col1:hello col2:world'
  99. It is not possible to associate a specific table column with a quoted
  100. phrase or a term preceded by a '-' operator. A '*' character may be
  101. appended to a term associated with a specific column for prefix matching.
  102. 3. ENHANCED QUERY SYNTAX
  103. The enhanced query syntax is quite similar to the standard query syntax,
  104. with the following four differences:
  105. 1) Parenthesis are supported. When using the enhanced query syntax,
  106. parenthesis may be used to overcome the built-in precedence of the
  107. supplied binary operators. For example, the following query:
  108. <col> MATCH '(hello world) OR (simple example)'
  109. matches documents that contain both "hello" and "world", and documents
  110. that contain both "simple" and "example". It is not possible to forumlate
  111. such a query using the standard syntax.
  112. 2) Instead of separating tokens and phrases by whitespace, an AND operator
  113. may be explicitly specified. This does not change query processing at
  114. all, but may be used to improve readability. For example, the following
  115. query is handled identically to the one above:
  116. <col> MATCH '(hello AND world) OR (simple AND example)'
  117. As with the OR and NEAR operators, the AND operator must be specified
  118. in upper case. The word "and" specified in lower or mixed case is
  119. handled as a regular token.
  120. 3) The '-' token prefix is not supported. Instead, a new binary operator,
  121. NOT, is included. The NOT operator requires that the query specified
  122. as its left-hand operator matches, but that the query specified as the
  123. right-hand operator does not. For example, to query for all rows that
  124. contain the term "example" but not the term "simple", the following
  125. query could be used:
  126. <col> MATCH 'example NOT simple'
  127. As for all other operators, the NOT operator must be specified in
  128. upper case. Otherwise it will be treated as a regular token.
  129. 4) Unlike in the standard syntax, where the OR operator has a higher
  130. precedence than the implicit AND operator, when using the enhanced
  131. syntax implicit and explict AND operators have a higher precedence
  132. than OR operators. Using the enhanced syntax, the following two
  133. queries are equivalent:
  134. <col> MATCH 'sqlite fantastic OR impressive'
  135. <col> MATCH '(sqlite AND fantastic) OR impressive'
  136. however, when using the standard syntax, the query:
  137. <col> MATCH 'sqlite fantastic OR impressive'
  138. is equivalent to the enhanced syntax query:
  139. <col> MATCH 'sqlite AND (fantastic OR impressive)'
  140. The precedence of all enhanced syntax operators, in order from highest
  141. to lowest, is:
  142. NEAR (highest precedence, tightest grouping)
  143. NOT
  144. AND
  145. OR (lowest precedence, loosest grouping)
  146. Using the advanced syntax, it is possible to specify expressions enclosed
  147. in parenthesis as operands to the NOT, AND and OR operators. However both
  148. the left and right hand side operands of NEAR operators must be either
  149. tokens or phrases. Attempting the following query will return an error:
  150. <col> MATCH 'sqlite NEAR (fantastic OR impressive)'
  151. Queries of this form must be re-written as:
  152. <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive'