#
#
#            Nim's Runtime Library
#        (c) Copyright 2016 Andreas Rumpf
#
#    See the file "copying.txt", included in this
#    distribution, for details about the copyright.
#

##[
This module contains a `scanf`:idx: macro that can be used for extracting
substrings from an input string. This is often easier than regular expressions.
Some examples as an appetizer:

.. code-block:: nim
  # check if input string matches a triple of integers:
  const input = "(1,2,4)"
  var x, y, z: int
  if scanf(input, "($i,$i,$i)", x, y, z):
    echo "matches and x is ", x, " y is ", y, " z is ", z

  # check if input string matches an ISO date followed by an identifier followed
  # by whitespace and a floating point number:
  var year, month, day: int
  var identifier: string
  var myfloat: float
  if scanf(input, "$i-$i-$i $w$s$f", year, month, day, identifier, myfloat):
    echo "yes, we have a match!"

As can be seen from the examples, strings are matched verbatim except for
substrings starting with ``$``. These constructions are available:

=================   ========================================================
``$b``              Matches a binary integer. This uses ``parseutils.parseBin``.
``$o``              Matches an octal integer. This uses ``parseutils.parseOct``.
``$i``              Matches a decimal integer. This uses ``parseutils.parseInt``.
``$h``              Matches a hex integer. This uses ``parseutils.parseHex``.
``$f``              Matches a floating point number. Uses ``parseFloat``.
``$w``              Matches an ASCII identifier: ``[A-Za-z_][A-Za-z_0-9]*``.
``$s``              Skips optional whitespace.
``$$``              Matches a single dollar sign.
``$.``              Matches if the end of the input string has been reached.
``$*``              Matches until the token following the ``$*`` was found.
                    The match is allowed to be of 0 length.
``$+``              Matches until the token following the ``$+`` was found.
                    The match must consist of at least one char.
``${foo}``          User defined matcher. Uses the proc ``foo`` to perform
                    the match. See below for more details.
``$[foo]``          Call user defined proc ``foo`` to **skip** some optional
                    parts in the input string. See below for more details.
=================   ========================================================

Even though ``$*`` and ``$+`` look similar to the regular expressions ``.*``
and ``.+``, they work quite differently: there is no non-deterministic
state machine involved and the matches are non-greedy. For example, ``[$*]``
matches ``[xyz]`` via ``parseutils.parseUntil``.
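
The non-greedy behaviour can be sketched as follows. This is a minimal
illustration that assumes the module is imported as ``strscans``; the variable
name ``inner`` is made up for the example:

```nim
import strscans

var inner: string
# `$*` collects characters up to (but not including) the token `]`.
doAssert scanf("[xyz]", "[$*]", inner)
doAssert inner == "xyz"
# The match stops at the *first* `]`, so only "a" is captured here.
doAssert scanf("[a][b]", "[$*]", inner)
doAssert inner == "a"
```

Because ``scanf`` only checks a prefix of the input, the trailing ``[b]`` in the
second call is simply left unconsumed.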

Furthermore, no backtracking is performed. If parsing fails after a value
has already been bound to a matched subexpression, this value is not restored
to its original value. This rarely causes problems in practice, and if it
does, it is easy enough to bind to a temporary variable first.
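
One possible way to apply the temporary-variable advice is sketched below.
The names ``major``, ``minor``, ``tmpMajor`` and ``tmpMinor`` are hypothetical
and only serve the illustration:

```nim
import strscans

var major = 1
var minor = 2
# Parse into temporaries first; commit only if the whole pattern matched.
var tmpMajor, tmpMinor: int
if scanf("3.x", "$i.$i", tmpMajor, tmpMinor):
  major = tmpMajor
  minor = tmpMinor
# The match failed at the second `$i`, so the real variables are untouched ...
doAssert major == 1 and minor == 2
# ... but the first temporary had already been bound and was not restored.
doAssert tmpMajor == 3
```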

Startswith vs full match
========================

``scanf`` returns true if the input string **starts with** the specified
pattern. If instead it should only return true if there is also nothing
left in the input, append ``$.`` to your pattern.
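
The difference can be illustrated with a short sketch, assuming the module is
imported as ``strscans`` (the variable ``n`` is illustrative):

```nim
import strscans

var n: int
# Prefix match: the pattern only has to match the start of the input.
doAssert scanf("42abc", "$i", n)
doAssert n == 42
# With a trailing `$.` the whole input has to be consumed.
doAssert not scanf("42abc", "$i$.", n)
doAssert scanf("42", "$i$.", n)
```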

User definable matchers
=======================

One very nice advantage over regular expressions is that ``scanf`` is
extensible with ordinary Nim procs. The proc is either enclosed in ``${}``
or in ``$[]``. ``${}`` matches and binds the result
to a variable (that was passed to the ``scanf`` macro) while ``$[]`` merely
skips optional tokens without binding a result.

In this example, we define a helper proc ``someSep`` that skips some separators
which we then use in our scanf pattern to help us in the matching process:

.. code-block:: nim

  proc someSep(input: string; start: int; seps: set[char] = {':','-','.'}): int =
    # Note: The parameters and return value must match what ``scanf`` requires
    result = 0
    while start+result < input.len and input[start+result] in seps: inc result

  if scanf(input, "$w$[someSep]$w", key, value):
    ...

It is also possible to pass arguments to a user definable matcher:

.. code-block:: nim

  proc ndigits(input: string; intVal: var int; start: int; n: int): int =
    # matches exactly ``n`` digits. Matchers need to return 0 if nothing
    # matched or otherwise the number of processed chars.
    var x = 0
    var i = 0
    while i < n and i+start < input.len and input[i+start] in {'0'..'9'}:
      x = x * 10 + input[i+start].ord - '0'.ord
      inc i
    # only overwrite if we had a match
    if i == n:
      result = n
      intVal = x

  # match an ISO date extracting year, month, day at the same time.
  # Also ensure the input ends after the ISO date:
  var year, month, day: int
  if scanf("2013-01-03", "${ndigits(4)}-${ndigits(2)}-${ndigits(2)}$.", year, month, day):
    ...

The scanp macro
===============

This module also implements a ``scanp`` macro, whose syntax somewhat resembles
an EBNF or PEG grammar, except that it uses Nim's expression syntax and so has
to use prefix instead of postfix operators.

==============   ===============================================================
``(E)``          Grouping
``*E``           Zero or more
``+E``           One or more
``?E``           Zero or One
``E{n,m}``       From ``n`` up to ``m`` times ``E``
``~E``           Not predicate
``a ^* b``       Shortcut for ``?(a *(b a))``. Usually used for separators.
``a ^+ b``       Shortcut for ``(a *(b a))``. Usually used for separators.
``'a'``          Matches a single character
``{'a'..'b'}``   Matches a character set
``"s"``          Matches a string
``E -> a``       Bind matching to some action
``$_``           Access the currently matched character
==============   ===============================================================

Note that unordered or ordered choice operators (``/``, ``|``) are
not implemented.
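
As a small sketch of how repetition, binding and ``$_`` combine, the following
parses a number of one to four digits followed by a dash. It assumes the module
is imported as ``strscans``; the names ``digits``, ``year`` and ``idx`` are
illustrative:

```nim
import strscans

const digits = {'0'..'9'}
var year = 0
var idx = 0
# Each matched digit is folded into `year` by the bound action;
# `$_` is the character at the current position.
doAssert scanp("2016-08", idx,
               `digits`{1,4} -> (year = year * 10 + ord($_) - ord('0')), '-')
doAssert year == 2016
doAssert idx == 5  # four digits plus the dash were consumed
```

Note that ``idx`` is passed in by the caller and advanced by the macro, so a
follow-up ``scanp`` call can continue where this one stopped.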

Simple example that parses the ``/etc/passwd`` file line by line:

.. code-block:: nim

  const
    etc_passwd = """root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh
  bin:x:2:2:bin:/bin:/bin/sh
  sys:x:3:3:sys:/dev:/bin/sh
  nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
  messagebus:x:103:107::/var/run/dbus:/bin/false
  """

  proc parsePasswd(content: string): seq[string] =
    result = @[]
    var idx = 0
    while true:
      var entry = ""
      if scanp(content, idx, +(~{'\L', '\0'} -> entry.add($_)), '\L'):
        result.add entry
      else:
        break

The ``scanp`` macro maps the grammar code into Nim code that performs the parsing.
The parsing is performed with the help of 3 helper templates that can be
implemented for a custom type.

These templates need to be named ``atom`` and ``nxt``. ``atom`` should be
overloaded to handle both single characters and sets of characters.

.. code-block:: nim

  import streams

  template atom(input: Stream; idx: int; c: char): bool =
    ## Used in scanp for the matching of atoms (usually chars).
    peekChar(input) == c

  template atom(input: Stream; idx: int; s: set[char]): bool =
    peekChar(input) in s

  template nxt(input: Stream; idx, step: int = 1) =
    inc(idx, step)
    setPosition(input, idx)

  if scanp(content, idx, +( ~{'\L', '\0'} -> entry.add(peekChar($input))), '\L'):
    result.add entry

Calling ordinary Nim procs inside the macro is possible:

.. code-block:: nim

  proc digits(s: string; intVal: var int; start: int): int =
    var x = 0
    while result+start < s.len and s[result+start] in {'0'..'9'} and s[result+start] != ':':
      x = x * 10 + s[result+start].ord - '0'.ord
      inc result
    intVal = x

  proc extractUsers(content: string): seq[string] =
    # Extracts the username and home directory
    # of each entry (with a UID of at least 1000)
    const
      digits = {'0'..'9'}
    result = @[]
    var idx = 0
    while true:
      var login = ""
      var uid = 0
      var homedir = ""
      if scanp(content, idx, *(~ {':', '\0'}) -> login.add($_), ':', * ~ ':', ':',
              digits($input, uid, $index), ':', *`digits`, ':', * ~ ':', ':',
              *('/', * ~{':', '/'}) -> homedir.add($_), ':', *('/', * ~{'\L', '/'}), '\L'):
        if uid >= 1000:
          result.add login & " " & homedir
      else:
        break

When used for matching, keep in mind that, as with ``scanf``, no backtracking
is performed.

.. code-block:: nim

  proc skipUntil(s: string; until: string; unless = '\0'; start: int): int =
    # Skips all characters until the string `until` is found. Returns 0
    # if the char `unless` is found first or the end is reached.
    var i = start
    var u = 0
    while true:
      if i >= s.len or s[i] == unless:
        return 0
      elif s[i] == until[0]:
        u = 1
        while i+u < s.len and u < until.len and s[i+u] == until[u]:
          inc u
        if u >= until.len: break
      inc(i)
    result = i+u-start

  iterator collectLinks(s: string): string =
    const quote = {'\'', '"'}
    var idx, old = 0
    var res = ""
    while idx < s.len:
      old = idx
      if scanp(s, idx, "<a", skipUntil($input, "href=", '>', $index),
              `quote`, *( ~`quote`) -> res.add($_)):
        yield res
        res = ""
      idx = old + 1

  for r in collectLinks(body):
    echo r

In this example both macros are combined seamlessly in order to maximise
efficiency and perform different checks.

.. code-block:: nim

  iterator parseIps*(soup: string): string =
    ## ipv4 only!
    const digits = {'0'..'9'}
    var a, b, c, d: int
    var buf = ""
    var idx = 0
    while idx < soup.len:
      if scanp(soup, idx, (`digits`{1,3}, '.', `digits`{1,3}, '.',
               `digits`{1,3}, '.', `digits`{1,3}) -> buf.add($_)):
        discard buf.scanf("$i.$i.$i.$i", a, b, c, d)
        if (a >= 0 and a <= 254) and
           (b >= 0 and b <= 254) and
           (c >= 0 and c <= 254) and
           (d >= 0 and d <= 254):
          yield buf
      buf.setLen(0) # need to clear `buf` each time, because it might contain garbage
      idx.inc

]##


import macros, parseutils

proc conditionsToIfChain(n, idx, res: NimNode; start: int): NimNode =
  assert n.kind == nnkStmtList
  if start >= n.len: return newAssignment(res, newLit true)
  var ifs: NimNode = nil
  if n[start+1].kind == nnkEmpty:
    ifs = conditionsToIfChain(n, idx, res, start+3)
  else:
    ifs = newIfStmt((n[start+1],
                    newTree(nnkStmtList, newCall(bindSym"inc", idx, n[start+2]),
                                     conditionsToIfChain(n, idx, res, start+3))))
  result = newTree(nnkStmtList, n[start], ifs)

proc notZero(x: NimNode): NimNode = newCall(bindSym"!=", x, newLit 0)

proc buildUserCall(x: string; args: varargs[NimNode]): NimNode =
  let y = parseExpr(x)
  result = newTree(nnkCall)
  if y.kind in nnkCallKinds: result.add y[0]
  else: result.add y
  for a in args: result.add a
  if y.kind in nnkCallKinds:
    for i in 1..<y.len: result.add y[i]

macro scanf*(input: string; pattern: static[string]; results: varargs[typed]): bool =
  ## See top level documentation of this module about how ``scanf`` works.
  template matchBind(parser) {.dirty.} =
    var resLen = genSym(nskLet, "resLen")
    conds.add newLetStmt(resLen, newCall(bindSym(parser), inp, results[i], idx))
    conds.add resLen.notZero
    conds.add resLen

  template at(s: string; i: int): char = (if i < s.len: s[i] else: '\0')
  template matchError() =
    error("type mismatch between pattern '$" & pattern[p] & "' (position: " & $p &
      ") and " & $getTypeInst(results[i]) & " var '" & repr(results[i]) & "'")

  var i = 0
  var p = 0
  var idx = genSym(nskVar, "idx")
  var res = genSym(nskVar, "res")
  let inp = genSym(nskLet, "inp")
  result = newTree(nnkStmtListExpr, newLetStmt(inp, input), newVarStmt(idx, newLit 0), newVarStmt(res, newLit false))
  var conds = newTree(nnkStmtList)
  var fullMatch = false
  while p < pattern.len:
    if pattern[p] == '$':
      inc p
      case pattern[p]
      of '$':
        var resLen = genSym(nskLet, "resLen")
        conds.add newLetStmt(resLen, newCall(bindSym"skip", inp, newLit($pattern[p]), idx))
        conds.add resLen.notZero
        conds.add resLen
      of 'w':
        if i < results.len and getType(results[i]).typeKind == ntyString:
          matchBind "parseIdent"
        else:
          matchError
        inc i
      of 'b':
        if i < results.len and getType(results[i]).typeKind == ntyInt:
          matchBind "parseBin"
        else:
          matchError
        inc i
      of 'o':
        if i < results.len and getType(results[i]).typeKind == ntyInt:
          matchBind "parseOct"
        else:
          matchError
        inc i
      of 'i':
        if i < results.len and getType(results[i]).typeKind == ntyInt:
          matchBind "parseInt"
        else:
          matchError
        inc i
      of 'h':
        if i < results.len and getType(results[i]).typeKind == ntyInt:
          matchBind "parseHex"
        else:
          matchError
        inc i
      of 'f':
        if i < results.len and getType(results[i]).typeKind == ntyFloat:
          matchBind "parseFloat"
        else:
          matchError
        inc i
      of 's':
        conds.add newCall(bindSym"inc", idx, newCall(bindSym"skipWhitespace", inp, idx))
        conds.add newEmptyNode()
        conds.add newEmptyNode()
      of '.':
        if p == pattern.len-1:
          fullMatch = true
        else:
          error("invalid format string")
      of '*', '+':
        if i < results.len and getType(results[i]).typeKind == ntyString:
          var min = ord(pattern[p] == '+')
          var q=p+1
          var token = ""
          while q < pattern.len and pattern[q] != '$':
            token.add pattern[q]
            inc q

          var resLen = genSym(nskLet, "resLen")
          conds.add newLetStmt(resLen, newCall(bindSym"parseUntil", inp, results[i], newLit(token), idx))
          # `$*` accepts an empty match (min == 0), `$+` needs at least one
          # char (min == 1), as described in the module documentation:
          conds.add newCall(bindSym">=", resLen, newLit min)
          conds.add resLen
        else:
          matchError
        inc i
      of '{':
        inc p
        var nesting = 0
        let start = p
        while true:
          case pattern.at(p)
          of '{': inc nesting
          of '}':
            if nesting == 0: break
            dec nesting
          of '\0': error("expected closing '}'")
          else: discard
          inc p
        let expr = pattern.substr(start, p-1)
        if i < results.len:
          var resLen = genSym(nskLet, "resLen")
          conds.add newLetStmt(resLen, buildUserCall(expr, inp, results[i], idx))
          conds.add newCall(bindSym"!=", resLen, newLit 0)
          conds.add resLen
        else:
          error("no var given for $" & expr & " (position: " & $p & ")")
        inc i
      of '[':
        inc p
        var nesting = 0
        let start = p
        while true:
          case pattern.at(p)
          of '[': inc nesting
          of ']':
            if nesting == 0: break
            dec nesting
          of '\0': error("expected closing ']'")
          else: discard
          inc p
        let expr = pattern.substr(start, p-1)
        conds.add newCall(bindSym"inc", idx, buildUserCall(expr, inp, idx))
        conds.add newEmptyNode()
        conds.add newEmptyNode()
      else: error("invalid format string")
      inc p
    else:
      var token = ""
      while p < pattern.len and pattern[p] != '$':
        token.add pattern[p]
        inc p
      var resLen = genSym(nskLet, "resLen")
      conds.add newLetStmt(resLen, newCall(bindSym"skip", inp, newLit(token), idx))
      conds.add resLen.notZero
      conds.add resLen
  result.add conditionsToIfChain(conds, idx, res, 0)
  if fullMatch:
    result.add newCall(bindSym"and", res,
      newCall(bindSym">=", idx, newCall(bindSym"len", inp)))
  else:
    result.add res

template atom*(input: string; idx: int; c: char): bool =
  ## Used in scanp for the matching of atoms (usually chars).
  ## EOF is matched as ``'\0'``.
  (idx < input.len and input[idx] == c) or (idx == input.len and c == '\0')

template atom*(input: string; idx: int; s: set[char]): bool =
  (idx < input.len and input[idx] in s) or (idx == input.len and '\0' in s)

template hasNxt*(input: string; idx: int): bool = idx < input.len

#template prepare*(input: string): int = 0
template success*(x: int): bool = x != 0

template nxt*(input: string; idx, step: int = 1) = inc(idx, step)

macro scanp*(input, idx: typed; pattern: varargs[untyped]): bool =
  ## See top level documentation of this module about how ``scanp`` works.
  type StmtTriple = tuple[init, cond, action: NimNode]

  template interf(x): untyped = bindSym(x, brForceOpen)

  proc toIfChain(n: seq[StmtTriple]; idx, res: NimNode; start: int): NimNode =
    if start >= n.len: return newAssignment(res, newLit true)
    var ifs: NimNode = nil
    if n[start].cond.kind == nnkEmpty:
      ifs = toIfChain(n, idx, res, start+1)
    else:
      ifs = newIfStmt((n[start].cond,
                      newTree(nnkStmtList, n[start].action,
                              toIfChain(n, idx, res, start+1))))
    result = newTree(nnkStmtList, n[start].init, ifs)

  proc attach(x, attached: NimNode): NimNode =
    if attached == nil: x
    else: newStmtList(attached, x)

  proc placeholder(n, x, j: NimNode): NimNode =
    if n.kind == nnkPrefix and n[0].eqIdent("$"):
      let n1 = n[1]
      if n1.eqIdent"_" or n1.eqIdent"current":
        result = newTree(nnkBracketExpr, x, j)
      elif n1.eqIdent"input":
        result = x
      elif n1.eqIdent"i" or n1.eqIdent"index":
        result = j
      else:
        error("unknown pattern " & repr(n))
    else:
      result = copyNimNode(n)
      for i in 0 ..< n.len:
        result.add placeholder(n[i], x, j)

  proc atm(it, input, idx, attached: NimNode): StmtTriple =
    template `!!`(x): untyped = attach(x, attached)
    case it.kind
    of nnkIdent:
      var resLen = genSym(nskLet, "resLen")
      result = (newLetStmt(resLen, newCall(it, input, idx)),
                newCall(interf"success", resLen),
                !!newCall(interf"nxt", input, idx, resLen))
    of nnkCallKinds:
      # *{'A'..'Z'} !! s.add(!_)
      template buildWhile(input, idx, init, cond, action): untyped =
        while hasNxt(input, idx):
          init
          if not cond: break
          action

      # (x) a  # bind action a to (x)
      if it[0].kind == nnkPar and it.len == 2:
        result = atm(it[0], input, idx, placeholder(it[1], input, idx))
      elif it.kind == nnkInfix and it[0].eqIdent"->":
        # bind matching to some action:
        result = atm(it[1], input, idx, placeholder(it[2], input, idx))
      elif it.kind == nnkInfix and it[0].eqIdent"as":
        let cond = if it[1].kind in nnkCallKinds: placeholder(it[1], input, idx)
                   else: newCall(it[1], input, idx)
        result = (newLetStmt(it[2], cond),
                  newCall(interf"success", it[2]),
                  !!newCall(interf"nxt", input, idx, it[2]))
      elif it.kind == nnkPrefix and it[0].eqIdent"*":
        let (init, cond, action) = atm(it[1], input, idx, attached)
        result = (getAst(buildWhile(input, idx, init, cond, action)),
                  newEmptyNode(), newEmptyNode())
      elif it.kind == nnkPrefix and it[0].eqIdent"+":
        # x+ is the same as xx*
        result = atm(newTree(nnkTupleConstr, it[1], newTree(nnkPrefix, ident"*", it[1])),
                      input, idx, attached)
      elif it.kind == nnkPrefix and it[0].eqIdent"?":
        # optional.
        let (init, cond, action) = atm(it[1], input, idx, attached)
        if cond.kind == nnkEmpty:
          error("'?' operator applied to a non-condition")
        else:
          result = (newTree(nnkStmtList, init, newIfStmt((cond, action))),
                    newEmptyNode(), newEmptyNode())
      elif it.kind == nnkPrefix and it[0].eqIdent"~":
        # not operator
        let (init, cond, action) = atm(it[1], input, idx, attached)
        if cond.kind == nnkEmpty:
          error("'~' operator applied to a non-condition")
        else:
          result = (init, newCall(bindSym"not", cond), action)
      elif it.kind == nnkInfix and it[0].eqIdent"|":
        let a = atm(it[1], input, idx, attached)
        let b = atm(it[2], input, idx, attached)
        if a.cond.kind == nnkEmpty or b.cond.kind == nnkEmpty:
          error("'|' operator applied to a non-condition")
        else:
          result = (newStmtList(a.init,
                newIfStmt((a.cond, a.action), (newTree(nnkStmtListExpr, b.init, b.cond), b.action))),
              newEmptyNode(), newEmptyNode())
      elif it.kind == nnkInfix and it[0].eqIdent"^*":
        # a ^* b is rewritten to: (a *(b a))?
        #exprList = expr ^+ comma
        template tmp(a, b): untyped = ?(a, *(b, a))
        result = atm(getAst(tmp(it[1], it[2])), input, idx, attached)
      elif it.kind == nnkInfix and it[0].eqIdent"^+":
        # a ^+ b is rewritten to: (a *(b a))
        template tmp(a, b): untyped = (a, *(b, a))
        result = atm(getAst(tmp(it[1], it[2])), input, idx, attached)
      elif it.kind == nnkCommand and it.len == 2 and it[0].eqIdent"pred":
        # enforce that the wrapped call is interpreted as a predicate, not a non-terminal:
        result = (newEmptyNode(), placeholder(it[1], input, idx), newEmptyNode())
      else:
        var resLen = genSym(nskLet, "resLen")
        result = (newLetStmt(resLen, placeholder(it, input, idx)),
                  newCall(interf"success", resLen), !!newCall(interf"nxt", input, idx, resLen))
    of nnkStrLit..nnkTripleStrLit:
      var resLen = genSym(nskLet, "resLen")
      result = (newLetStmt(resLen, newCall(interf"skip", input, it, idx)),
                newCall(interf"success", resLen), !!newCall(interf"nxt", input, idx, resLen))
    of nnkCurly, nnkAccQuoted, nnkCharLit:
      result = (newEmptyNode(), newCall(interf"atom", input, idx, it), !!newCall(interf"nxt", input, idx))
    of nnkCurlyExpr:
      if it.len == 3 and it[1].kind == nnkIntLit and it[2].kind == nnkIntLit:
        var h = newTree(nnkTupleConstr, it[0])
        for count in 2i64 .. it[1].intVal: h.add(it[0])
        for count in it[1].intVal .. it[2].intVal-1: h.add(newTree(nnkPrefix, ident"?", it[0]))
        result = atm(h, input, idx, attached)
      elif it.len == 2 and it[1].kind == nnkIntLit:
        var h = newTree(nnkTupleConstr, it[0])
        for count in 2i64 .. it[1].intVal: h.add(it[0])
        result = atm(h, input, idx, attached)
      else:
        error("invalid pattern")
    of nnkPar, nnkTupleConstr:
      if it.len == 1 and it.kind == nnkPar:
        result = atm(it[0], input, idx, attached)
      else:
        # concatenation:
        var conds: seq[StmtTriple] = @[]
        for x in it: conds.add atm(x, input, idx, attached)
        var res = genSym(nskVar, "res")
        result = (newStmtList(newVarStmt(res, newLit false),
            toIfChain(conds, idx, res, 0)), res, newEmptyNode())
    else:
      error("invalid pattern")

  #var idx = genSym(nskVar, "idx")
  var res = genSym(nskVar, "res")
  result = newTree(nnkStmtListExpr, #newVarStmt(idx, newCall(interf"prepare", input)),
                                    newVarStmt(res, newLit false))
  var conds: seq[StmtTriple] = @[]
  for it in pattern:
    conds.add atm(it, input, idx, nil)
  result.add toIfChain(conds, idx, res, 0)
  result.add res
  when defined(debugScanp):
    echo repr result


when isMainModule:
  proc twoDigits(input: string; x: var int; start: int): int =
    if start+1 < input.len and input[start] == '0' and input[start+1] == '0':
      result = 2
      x = 13
    else:
      result = 0

  proc someSep(input: string; start: int; seps: set[char] = {';',',','-','.'}): int =
    result = 0
    while start+result < input.len and input[start+result] in seps: inc result

  proc demangle(s: string; res: var string; start: int): int =
    while result+start < s.len and s[result+start] in {'_', '@'}: inc result
    res = ""
    while result+start < s.len and s[result+start] > ' ' and s[result+start] != '_':
      res.add s[result+start]
      inc result
    while result+start < s.len and s[result+start] > ' ':
      inc result

  proc parseGDB(resp: string): seq[string] =
    const
      digits = {'0'..'9'}
      hexdigits = digits + {'a'..'f', 'A'..'F'}
      whites = {' ', '\t', '\C', '\L'}
    result = @[]
    var idx = 0
    while true:
      var prc = ""
      var info = ""
      if scanp(resp, idx, *`whites`, '#', *`digits`, +`whites`, ?("0x", *`hexdigits`, " in "),
               demangle($input, prc, $index), *`whites`, '(', * ~ ')', ')',
               *`whites`, "at ", +(~{'\C', '\L'} -> info.add($_)) ):
        result.add prc & " " & info
      else:
        break

  var key, val: string
  var intval: int
  var floatval: float
  doAssert scanf("abc:: xyz 89 33.25", "$w$s::$s$w$s$i $f", key, val, intval, floatVal)
  doAssert key == "abc"
  doAssert val == "xyz"
  doAssert intval == 89
  doAssert floatVal == 33.25

  var binval: int
  var octval: int
  var hexval: int
  doAssert scanf("0b0101 0o1234 0xabcd", "$b$s$o$s$h", binval, octval, hexval)
  doAssert binval == 0b0101
  doAssert octval == 0o1234
  doAssert hexval == 0xabcd

  let xx = scanf("$abc", "$$$i", intval)
  doAssert xx == false

  let xx2 = scanf("$1234", "$$$i", intval)
  doAssert xx2

  let yy = scanf(";.--Breakpoint00 [output]", "$[someSep]Breakpoint${twoDigits}$[someSep({';','.','-'})] [$+]$.", intVal, key)
  doAssert yy
  doAssert key == "output"
  doAssert intVal == 13

  var ident = ""
  var idx = 0
  let zz = scanp("foobar x x x xWZ", idx, +{'a'..'z'} -> add(ident, $_), *(*{' ', '\t'}, "x"), ~'U', "Z")
  doAssert zz
  doAssert ident == "foobar"

  const digits = {'0'..'9'}
  var year = 0
  var idx2 = 0
  if scanp("201655-8-9", idx2, `digits`{4,6} -> (year = year * 10 + ord($_) - ord('0')), "-8", "-9"):
    doAssert year == 201655

  const gdbOut = """
      #0 @foo_96013_1208911747@8 (x0=...)
          at c:/users/anwender/projects/nim/temp.nim:11
      #1 0x00417754 in tempInit000 () at c:/users/anwender/projects/nim/temp.nim:13
      #2 0x0041768d in NimMainInner ()
          at c:/users/anwender/projects/nim/lib/system.nim:2605
      #3 0x004176b1 in NimMain ()
          at c:/users/anwender/projects/nim/lib/system.nim:2613
      #4 0x004176db in main (argc=1, args=0x712cc8, env=0x711ca8)
          at c:/users/anwender/projects/nim/lib/system.nim:2620"""
  const result = @["foo c:/users/anwender/projects/nim/temp.nim:11",
          "tempInit000 c:/users/anwender/projects/nim/temp.nim:13",
          "NimMainInner c:/users/anwender/projects/nim/lib/system.nim:2605",
          "NimMain c:/users/anwender/projects/nim/lib/system.nim:2613",
          "main c:/users/anwender/projects/nim/lib/system.nim:2620"]
  #doAssert parseGDB(gdbOut) == result

  # bug #6487
  var count = 0

  proc test(): string =
    inc count
    result = ",123123"

  var a: int
  discard scanf(test(), ",$i", a)
  doAssert count == 1
  641. result = ",123123"
  642. var a: int
  643. discard scanf(test(), ",$i", a)
  644. doAssert count == 1