loop.texi 31 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635
  1. @c Copyright (C) 2006-2015 Free Software Foundation, Inc.
  2. @c Free Software Foundation, Inc.
  3. @c This is part of the GCC manual.
  4. @c For copying conditions, see the file gcc.texi.
  5. @c ---------------------------------------------------------------------
  6. @c Loop Representation
  7. @c ---------------------------------------------------------------------
  8. @node Loop Analysis and Representation
  9. @chapter Analysis and Representation of Loops
  10. GCC provides extensive infrastructure for work with natural loops, i.e.,
  11. strongly connected components of CFG with only one entry block. This
  12. chapter describes representation of loops in GCC, both on GIMPLE and in
  13. RTL, as well as the interfaces to loop-related analyses (induction
  14. variable analysis and number of iterations analysis).
  15. @menu
  16. * Loop representation:: Representation and analysis of loops.
  17. * Loop querying:: Getting information about loops.
  18. * Loop manipulation:: Loop manipulation functions.
  19. * LCSSA:: Loop-closed SSA form.
  20. * Scalar evolutions:: Induction variables on GIMPLE.
  21. * loop-iv:: Induction variables on RTL.
  22. * Number of iterations:: Number of iterations analysis.
  23. * Dependency analysis:: Data dependency analysis.
  24. * Omega:: A solver for linear programming problems.
  25. @end menu
  26. @node Loop representation
  27. @section Loop representation
  28. @cindex Loop representation
  29. @cindex Loop analysis
  30. This chapter describes the representation of loops in GCC, and functions
  31. that can be used to build, modify and analyze this representation. Most
  32. of the interfaces and data structures are declared in @file{cfgloop.h}.
  33. Loop structures are analyzed and this information disposed or updated
  34. at the discretion of individual passes. Still most of the generic
  35. CFG manipulation routines are aware of loop structures and try to
  36. keep them up-to-date. By this means an increasing part of the
  37. compilation pipeline is setup to maintain loop structure across
  38. passes to allow attaching meta information to individual loops
  39. for consumption by later passes.
  40. In general, a natural loop has one entry block (header) and possibly
  41. several back edges (latches) leading to the header from the inside of
  42. the loop. Loops with several latches may appear if several loops share
  43. a single header, or if there is a branching in the middle of the loop.
  44. The representation of loops in GCC however allows only loops with a
  45. single latch. During loop analysis, headers of such loops are split and
  46. forwarder blocks are created in order to disambiguate their structures.
  47. Heuristic based on profile information and structure of the induction
  48. variables in the loops is used to determine whether the latches
  49. correspond to sub-loops or to control flow in a single loop. This means
  50. that the analysis sometimes changes the CFG, and if you run it in the
  51. middle of an optimization pass, you must be able to deal with the new
  52. blocks. You may avoid CFG changes by passing
  53. @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery,
  54. note however that most other loop manipulation functions will not work
  55. correctly for loops with multiple latch edges (the functions that only
  56. query membership of blocks to loops and subloop relationships, or
  57. enumerate and test loop exits, can be expected to work).
  58. Body of the loop is the set of blocks that are dominated by its header,
  59. and reachable from its latch against the direction of edges in CFG@. The
  60. loops are organized in a containment hierarchy (tree) such that all the
  61. loops immediately contained inside loop L are the children of L in the
  62. tree. This tree is represented by the @code{struct loops} structure.
  63. The root of this tree is a fake loop that contains all blocks in the
  64. function. Each of the loops is represented in a @code{struct loop}
  65. structure. Each loop is assigned an index (@code{num} field of the
  66. @code{struct loop} structure), and the pointer to the loop is stored in
  67. the corresponding field of the @code{larray} vector in the loops
  68. structure. The indices do not have to be continuous, there may be
  69. empty (@code{NULL}) entries in the @code{larray} created by deleting
  70. loops. Also, there is no guarantee on the relative order of a loop
  71. and its subloops in the numbering. The index of a loop never changes.
  72. The entries of the @code{larray} field should not be accessed directly.
  73. The function @code{get_loop} returns the loop description for a loop with
  74. the given index. @code{number_of_loops} function returns number of
  75. loops in the function. To traverse all loops, use @code{FOR_EACH_LOOP}
  76. macro. The @code{flags} argument of the macro is used to determine
  77. the direction of traversal and the set of loops visited. Each loop is
  78. guaranteed to be visited exactly once, regardless of the changes to the
  79. loop tree, and the loops may be removed during the traversal. The newly
  80. created loops are never traversed, if they need to be visited, this
  81. must be done separately after their creation. The @code{FOR_EACH_LOOP}
  82. macro allocates temporary variables. If the @code{FOR_EACH_LOOP} loop
  83. were ended using break or goto, they would not be released;
  84. @code{FOR_EACH_LOOP_BREAK} macro must be used instead.
  85. Each basic block contains the reference to the innermost loop it belongs
  86. to (@code{loop_father}). For this reason, it is only possible to have
  87. one @code{struct loops} structure initialized at the same time for each
  88. CFG@. The global variable @code{current_loops} contains the
  89. @code{struct loops} structure. Many of the loop manipulation functions
  90. assume that dominance information is up-to-date.
  91. The loops are analyzed through @code{loop_optimizer_init} function. The
  92. argument of this function is a set of flags represented in an integer
  93. bitmask. These flags specify what other properties of the loop
  94. structures should be calculated/enforced and preserved later:
  95. @itemize
  96. @item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
  97. changes to CFG will be performed in the loop analysis, in particular,
  98. loops with multiple latch edges will not be disambiguated. If a loop
  99. has multiple latches, its latch block is set to NULL@. Most of
  100. the loop manipulation functions will not work for loops in this shape.
  101. No other flags that require CFG changes can be passed to
  102. loop_optimizer_init.
  103. @item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
  104. a way that each loop has only one entry edge, and additionally, the
  105. source block of this entry edge has only one successor. This creates a
  106. natural place where the code can be moved out of the loop, and ensures
  107. that the entry edge of the loop leads from its immediate super-loop.
  108. @item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
  109. force the latch block of each loop to have only one successor. This
  110. ensures that the latch of the loop does not belong to any of its
  111. sub-loops, and makes manipulation with the loops significantly easier.
  112. Most of the loop manipulation functions assume that the loops are in
  113. this shape. Note that with this flag, the ``normal'' loop without any
  114. control flow inside and with one exit consists of two basic blocks.
  115. @item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
  116. edges in the strongly connected components that are not natural loops
  117. (have more than one entry block) are marked with
  118. @code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags. The
  119. flag is not set for blocks and edges that belong to natural loops that
  120. are in such an irreducible region (but it is set for the entry and exit
  121. edges of such a loop, if they lead to/from this region).
  122. @item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
  123. and updated for each loop. This makes some functions (e.g.,
  124. @code{get_loop_exit_edges}) more efficient. Some functions (e.g.,
  125. @code{single_exit}) can be used only if the lists of exits are
  126. recorded.
  127. @end itemize
  128. These properties may also be computed/enforced later, using functions
  129. @code{create_preheaders}, @code{force_single_succ_latches},
  130. @code{mark_irreducible_loops} and @code{record_loop_exits}.
  131. The properties can be queried using @code{loops_state_satisfies_p}.
  132. The memory occupied by the loops structures should be freed with
  133. @code{loop_optimizer_finalize} function. When loop structures are
  134. setup to be preserved across passes this function reduces the
  135. information to be kept up-to-date to a minimum (only
  136. @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} set).
  137. The CFG manipulation functions in general do not update loop structures.
  138. Specialized versions that additionally do so are provided for the most
  139. common tasks. On GIMPLE, @code{cleanup_tree_cfg_loop} function can be
  140. used to cleanup CFG while updating the loops structures if
  141. @code{current_loops} is set.
  142. At the moment loop structure is preserved from the start of GIMPLE
  143. loop optimizations until the end of RTL loop optimizations. During
  144. this time a loop can be tracked by its @code{struct loop} and number.
  145. @node Loop querying
  146. @section Loop querying
  147. @cindex Loop querying
  148. The functions to query the information about loops are declared in
  149. @file{cfgloop.h}. Some of the information can be taken directly from
  150. the structures. @code{loop_father} field of each basic block contains
  151. the innermost loop to that the block belongs. The most useful fields of
  152. loop structure (that are kept up-to-date at all times) are:
  153. @itemize
  154. @item @code{header}, @code{latch}: Header and latch basic blocks of the
  155. loop.
  156. @item @code{num_nodes}: Number of basic blocks in the loop (including
  157. the basic blocks of the sub-loops).
  158. @item @code{depth}: The depth of the loop in the loops tree, i.e., the
  159. number of super-loops of the loop.
  160. @item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
  161. sub-loop, and the sibling of the loop in the loops tree.
  162. @end itemize
  163. There are other fields in the loop structures, many of them used only by
  164. some of the passes, or not updated during CFG changes; in general, they
  165. should not be accessed directly.
  166. The most important functions to query loop structures are:
  167. @itemize
  168. @item @code{flow_loops_dump}: Dumps the information about loops to a
  169. file.
  170. @item @code{verify_loop_structure}: Checks consistency of the loop
  171. structures.
  172. @item @code{loop_latch_edge}: Returns the latch edge of a loop.
  173. @item @code{loop_preheader_edge}: If loops have preheaders, returns
  174. the preheader edge of a loop.
  175. @item @code{flow_loop_nested_p}: Tests whether loop is a sub-loop of
  176. another loop.
  177. @item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
  178. to a loop (including its sub-loops).
  179. @item @code{find_common_loop}: Finds the common super-loop of two loops.
  180. @item @code{superloop_at_depth}: Returns the super-loop of a loop with
  181. the given depth.
  182. @item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
  183. number of insns in the loop, on GIMPLE and on RTL.
  184. @item @code{loop_exit_edge_p}: Tests whether edge is an exit from a
  185. loop.
  186. @item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
  187. with @code{EDGE_LOOP_EXIT} flag.
  188. @item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
  189. @code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
  190. loop in depth-first search order in reversed CFG, ordered by dominance
  191. relation, and breath-first search order, respectively.
  192. @item @code{single_exit}: Returns the single exit edge of the loop, or
  193. @code{NULL} if the loop has more than one exit. You can only use this
  194. function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
  195. @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
  196. @item @code{just_once_each_iteration_p}: Returns true if the basic block
  197. is executed exactly once during each iteration of a loop (that is, it
  198. does not belong to a sub-loop, and it dominates the latch of the loop).
  199. @end itemize
  200. @node Loop manipulation
  201. @section Loop manipulation
  202. @cindex Loop manipulation
  203. The loops tree can be manipulated using the following functions:
  204. @itemize
  205. @item @code{flow_loop_tree_node_add}: Adds a node to the tree.
  206. @item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
  207. @item @code{add_bb_to_loop}: Adds a basic block to a loop.
  208. @item @code{remove_bb_from_loops}: Removes a basic block from loops.
  209. @end itemize
  210. Most low-level CFG functions update loops automatically. The following
  211. functions handle some more complicated cases of CFG manipulations:
  212. @itemize
  213. @item @code{remove_path}: Removes an edge and all blocks it dominates.
  214. @item @code{split_loop_exit_edge}: Splits exit edge of the loop,
  215. ensuring that PHI node arguments remain in the loop (this ensures that
  216. loop-closed SSA form is preserved). Only useful on GIMPLE.
  217. @end itemize
  218. Finally, there are some higher-level loop transformations implemented.
  219. While some of them are written so that they should work on non-innermost
  220. loops, they are mostly untested in that case, and at the moment, they
  221. are only reliable for the innermost loops:
  222. @itemize
  223. @item @code{create_iv}: Creates a new induction variable. Only works on
  224. GIMPLE@. @code{standard_iv_increment_position} can be used to find a
  225. suitable place for the iv increment.
  226. @item @code{duplicate_loop_to_header_edge},
  227. @code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
  228. on GIMPLE) duplicate the body of the loop prescribed number of times on
  229. one of the edges entering loop header, thus performing either loop
  230. unrolling or loop peeling. @code{can_duplicate_loop_p}
  231. (@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
  232. loop.
  233. @item @code{loop_version}, @code{tree_ssa_loop_version}: These function
  234. create a copy of a loop, and a branch before them that selects one of
  235. them depending on the prescribed condition. This is useful for
  236. optimizations that need to verify some assumptions in runtime (one of
  237. the copies of the loop is usually left unchanged, while the other one is
  238. transformed in some way).
  239. @item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
  240. extra iterations to make the number of iterations divisible by unroll
  241. factor, updating the exit condition, and removing the exits that now
  242. cannot be taken. Works only on GIMPLE.
  243. @end itemize
  244. @node LCSSA
  245. @section Loop-closed SSA form
  246. @cindex LCSSA
  247. @cindex Loop-closed SSA form
  248. Throughout the loop optimizations on tree level, one extra condition is
  249. enforced on the SSA form: No SSA name is used outside of the loop in
  250. that it is defined. The SSA form satisfying this condition is called
  251. ``loop-closed SSA form'' -- LCSSA@. To enforce LCSSA, PHI nodes must be
  252. created at the exits of the loops for the SSA names that are used
  253. outside of them. Only the real operands (not virtual SSA names) are
  254. held in LCSSA, in order to save memory.
  255. There are various benefits of LCSSA:
  256. @itemize
  257. @item Many optimizations (value range analysis, final value
  258. replacement) are interested in the values that are defined in the loop
  259. and used outside of it, i.e., exactly those for that we create new PHI
  260. nodes.
  261. @item In induction variable analysis, it is not necessary to specify the
  262. loop in that the analysis should be performed -- the scalar evolution
  263. analysis always returns the results with respect to the loop in that the
  264. SSA name is defined.
  265. @item It makes updating of SSA form during loop transformations simpler.
  266. Without LCSSA, operations like loop unrolling may force creation of PHI
  267. nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
  268. updated locally. However, since we only keep real operands in LCSSA, we
  269. cannot use this advantage (we could have local updating of real
  270. operands, but it is not much more efficient than to use generic SSA form
  271. updating for it as well; the amount of changes to SSA is the same).
  272. @end itemize
  273. However, it also means LCSSA must be updated. This is usually
  274. straightforward, unless you create a new value in loop and use it
  275. outside, or unless you manipulate loop exit edges (functions are
  276. provided to make these manipulations simple).
  277. @code{rewrite_into_loop_closed_ssa} is used to rewrite SSA form to
  278. LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
  279. LCSSA is preserved.
  280. @node Scalar evolutions
  281. @section Scalar evolutions
  282. @cindex Scalar evolutions
  283. @cindex IV analysis on GIMPLE
  284. Scalar evolutions (SCEV) are used to represent results of induction
  285. variable analysis on GIMPLE@. They enable us to represent variables with
  286. complicated behavior in a simple and consistent way (we only use it to
  287. express values of polynomial induction variables, but it is possible to
  288. extend it). The interfaces to SCEV analysis are declared in
  289. @file{tree-scalar-evolution.h}. To use scalar evolutions analysis,
  290. @code{scev_initialize} must be used. To stop using SCEV,
  291. @code{scev_finalize} should be used. SCEV analysis caches results in
  292. order to save time and memory. This cache however is made invalid by
  293. most of the loop transformations, including removal of code. If such a
  294. transformation is performed, @code{scev_reset} must be called to clean
  295. the caches.
  296. Given an SSA name, its behavior in loops can be analyzed using the
  297. @code{analyze_scalar_evolution} function. The returned SCEV however
  298. does not have to be fully analyzed and it may contain references to
  299. other SSA names defined in the loop. To resolve these (potentially
  300. recursive) references, @code{instantiate_parameters} or
  301. @code{resolve_mixers} functions must be used.
  302. @code{instantiate_parameters} is useful when you use the results of SCEV
  303. only for some analysis, and when you work with whole nest of loops at
  304. once. It will try replacing all SSA names by their SCEV in all loops,
  305. including the super-loops of the current loop, thus providing a complete
  306. information about the behavior of the variable in the loop nest.
  307. @code{resolve_mixers} is useful if you work with only one loop at a
  308. time, and if you possibly need to create code based on the value of the
  309. induction variable. It will only resolve the SSA names defined in the
  310. current loop, leaving the SSA names defined outside unchanged, even if
  311. their evolution in the outer loops is known.
  312. The SCEV is a normal tree expression, except for the fact that it may
  313. contain several special tree nodes. One of them is
  314. @code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
  315. expressed. The other one is @code{POLYNOMIAL_CHREC}. Polynomial chrec
  316. has three arguments -- base, step and loop (both base and step may
  317. contain further polynomial chrecs). Type of the expression and of base
  318. and step must be the same. A variable has evolution
  319. @code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
  320. loop) equivalent to @code{x_1} in the following example
  321. @smallexample
  322. while (@dots{})
  323. @{
  324. x_1 = phi (base, x_2);
  325. x_2 = x_1 + step;
  326. @}
  327. @end smallexample
  328. Note that this includes the language restrictions on the operations.
  329. For example, if we compile C code and @code{x} has signed type, then the
  330. overflow in addition would cause undefined behavior, and we may assume
  331. that this does not happen. Hence, the value with this SCEV cannot
  332. overflow (which restricts the number of iterations of such a loop).
  333. In many cases, one wants to restrict the attention just to affine
  334. induction variables. In this case, the extra expressive power of SCEV
  335. is not useful, and may complicate the optimizations. In this case,
  336. @code{simple_iv} function may be used to analyze a value -- the result
  337. is a loop-invariant base and step.
  338. @node loop-iv
  339. @section IV analysis on RTL
  340. @cindex IV analysis on RTL
  341. The induction variable on RTL is simple and only allows analysis of
  342. affine induction variables, and only in one loop at once. The interface
  343. is declared in @file{cfgloop.h}. Before analyzing induction variables
  344. in a loop L, @code{iv_analysis_loop_init} function must be called on L.
  345. After the analysis (possibly calling @code{iv_analysis_loop_init} for
  346. several loops) is finished, @code{iv_analysis_done} should be called.
  347. The following functions can be used to access the results of the
  348. analysis:
  349. @itemize
  350. @item @code{iv_analyze}: Analyzes a single register used in the given
  351. insn. If no use of the register in this insn is found, the following
  352. insns are scanned, so that this function can be called on the insn
  353. returned by get_condition.
  354. @item @code{iv_analyze_result}: Analyzes result of the assignment in the
  355. given insn.
  356. @item @code{iv_analyze_expr}: Analyzes a more complicated expression.
  357. All its operands are analyzed by @code{iv_analyze}, and hence they must
  358. be used in the specified insn or one of the following insns.
  359. @end itemize
  360. The description of the induction variable is provided in @code{struct
  361. rtx_iv}. In order to handle subregs, the representation is a bit
  362. complicated; if the value of the @code{extend} field is not
  363. @code{UNKNOWN}, the value of the induction variable in the i-th
  364. iteration is
  365. @smallexample
  366. delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
  367. @end smallexample
  368. with the following exception: if @code{first_special} is true, then the
  369. value in the first iteration (when @code{i} is zero) is @code{delta +
  370. mult * base}. However, if @code{extend} is equal to @code{UNKNOWN},
  371. then @code{first_special} must be false, @code{delta} 0, @code{mult} 1
  372. and the value in the i-th iteration is
  373. @smallexample
  374. subreg_@{mode@} (base + i * step)
  375. @end smallexample
  376. The function @code{get_iv_value} can be used to perform these
  377. calculations.
  378. @node Number of iterations
  379. @section Number of iterations analysis
  380. @cindex Number of iterations analysis
  381. Both on GIMPLE and on RTL, there are functions available to determine
  382. the number of iterations of a loop, with a similar interface. The
  383. number of iterations of a loop in GCC is defined as the number of
  384. executions of the loop latch. In many cases, it is not possible to
  385. determine the number of iterations unconditionally -- the determined
  386. number is correct only if some assumptions are satisfied. The analysis
  387. tries to verify these conditions using the information contained in the
  388. program; if it fails, the conditions are returned together with the
  389. result. The following information and conditions are provided by the
  390. analysis:
  391. @itemize
  392. @item @code{assumptions}: If this condition is false, the rest of
  393. the information is invalid.
  394. @item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
  395. this condition is true, the loop exits in the first iteration.
  396. @item @code{infinite}: If this condition is true, the loop is infinite.
  397. This condition is only available on RTL@. On GIMPLE, conditions for
  398. finiteness of the loop are included in @code{assumptions}.
  399. @item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
  400. that gives number of iterations. The number of iterations is defined as
  401. the number of executions of the loop latch.
  402. @end itemize
  403. Both on GIMPLE and on RTL, it necessary for the induction variable
  404. analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
  405. On GIMPLE, the results are stored to @code{struct tree_niter_desc}
  406. structure. Number of iterations before the loop is exited through a
  407. given exit can be determined using @code{number_of_iterations_exit}
  408. function. On RTL, the results are returned in @code{struct niter_desc}
  409. structure. The corresponding function is named
  410. @code{check_simple_exit}. There are also functions that pass through
  411. all the exits of a loop and try to find one with easy to determine
  412. number of iterations -- @code{find_loop_niter} on GIMPLE and
  413. @code{find_simple_exit} on RTL@. Finally, there are functions that
  414. provide the same information, but additionally cache it, so that
  415. repeated calls to number of iterations are not so costly --
  416. @code{number_of_latch_executions} on GIMPLE and @code{get_simple_loop_desc}
  417. on RTL.
  418. Note that some of these functions may behave slightly differently than
  419. others -- some of them return only the expression for the number of
  420. iterations, and fail if there are some assumptions. The function
  421. @code{number_of_latch_executions} works only for single-exit loops.
  422. The function @code{number_of_cond_exit_executions} can be used to
  423. determine number of executions of the exit condition of a single-exit
  424. loop (i.e., the @code{number_of_latch_executions} increased by one).
  425. @node Dependency analysis
  426. @section Data Dependency Analysis
  427. @cindex Data Dependency Analysis
  428. The code for the data dependence analysis can be found in
  429. @file{tree-data-ref.c} and its interface and data structures are
  430. described in @file{tree-data-ref.h}. The function that computes the
  431. data dependences for all the array and pointer references for a given
  432. loop is @code{compute_data_dependences_for_loop}. This function is
  433. currently used by the linear loop transform and the vectorization
  434. passes. Before calling this function, one has to allocate two vectors:
  435. a first vector will contain the set of data references that are
  436. contained in the analyzed loop body, and the second vector will contain
  437. the dependence relations between the data references. Thus if the
  438. vector of data references is of size @code{n}, the vector containing the
  439. dependence relations will contain @code{n*n} elements. However if the
  440. analyzed loop contains side effects, such as calls that potentially can
  441. interfere with the data references in the current analyzed loop, the
  442. analysis stops while scanning the loop body for data references, and
  443. inserts a single @code{chrec_dont_know} in the dependence relation
  444. array.
  445. The data references are discovered in a particular order during the
  446. scanning of the loop body: the loop body is analyzed in execution order,
  447. and the data references of each statement are pushed at the end of the
  448. data reference array. Two data references syntactically occur in the
  449. program in the same order as in the array of data references. This
  450. syntactic order is important in some classical data dependence tests,
  451. and mapping this order to the elements of this array avoids costly
  452. queries to the loop body representation.
  453. Three types of data references are currently handled: ARRAY_REF,
  454. INDIRECT_REF and COMPONENT_REF@. The data structure for the data reference
  455. is @code{data_reference}, where @code{data_reference_p} is a name of a
  456. pointer to the data reference structure. The structure contains the
  457. following elements:
  458. @itemize
  459. @item @code{base_object_info}: Provides information about the base object
  460. of the data reference and its access functions. These access functions
  461. represent the evolution of the data reference in the loop relative to
  462. its base, in keeping with the classical meaning of the data reference
  463. access function for the support of arrays. For example, for a reference
  464. @code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
  465. one for each array subscript, are:
  466. @code{@{i_init, + i_step@}_1, @{j_init, +, j_step@}_2}.
  467. @item @code{first_location_in_loop}: Provides information about the first
  468. location accessed by the data reference in the loop and about the access
  469. function used to represent evolution relative to this location. This data
  470. is used to support pointers, and is not used for arrays (for which we
  471. have base objects). Pointer accesses are represented as a one-dimensional
  472. access that starts from the first location accessed in the loop. For
  473. example:
  474. @smallexample
  475. for1 i
  476. for2 j
  477. *((int *)p + i + j) = a[i][j];
  478. @end smallexample
  479. The access function of the pointer access is @code{@{0, + 4B@}_for2}
  480. relative to @code{p + i}. The access functions of the array are
  481. @code{@{i_init, + i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
  482. relative to @code{a}.
  483. Usually, the object the pointer refers to is either unknown, or we can't
  484. prove that the access is confined to the boundaries of a certain object.
  485. Two data references can be compared only if at least one of these two
  486. representations has all its fields filled for both data references.
  487. The current strategy for data dependence tests is as follows:
  488. If both @code{a} and @code{b} are represented as arrays, compare
  489. @code{a.base_object} and @code{b.base_object};
  490. if they are equal, apply dependence tests (use access functions based on
  491. base_objects).
  492. Else if both @code{a} and @code{b} are represented as pointers, compare
  493. @code{a.first_location} and @code{b.first_location};
  494. if they are equal, apply dependence tests (use access functions based on
  495. first location).
  496. However, if @code{a} and @code{b} are represented differently, only try
  497. to prove that the bases are definitely different.
  498. @item Aliasing information.
  499. @item Alignment information.
  500. @end itemize
  501. The structure describing the relation between two data references is
  502. @code{data_dependence_relation} and the shorter name for a pointer to
  503. such a structure is @code{ddr_p}. This structure contains:
  504. @itemize
  505. @item a pointer to each data reference,
  506. @item a tree node @code{are_dependent} that is set to @code{chrec_known}
  507. if the analysis has proved that there is no dependence between these two
  508. data references, @code{chrec_dont_know} if the analysis was not able to
  509. determine any useful result and potentially there could exist a
  510. dependence between these data references, and @code{are_dependent} is
  511. set to @code{NULL_TREE} if there exist a dependence relation between the
  512. data references, and the description of this dependence relation is
  513. given in the @code{subscripts}, @code{dir_vects}, and @code{dist_vects}
  514. arrays,
  515. @item a boolean that determines whether the dependence relation can be
  516. represented by a classical distance vector,
  517. @item an array @code{subscripts} that contains a description of each
  518. subscript of the data references. Given two array accesses a
  519. subscript is the tuple composed of the access functions for a given
  520. dimension. For example, given @code{A[f1][f2][f3]} and
  521. @code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
  522. g2), (f3, g3)}.
  523. @item two arrays @code{dir_vects} and @code{dist_vects} that contain
  524. classical representations of the data dependences under the form of
  525. direction and distance dependence vectors,
  526. @item an array of loops @code{loop_nest} that contains the loops to
  527. which the distance and direction vectors refer to.
  528. @end itemize
  529. Several functions for pretty printing the information extracted by the
  530. data dependence analysis are available: @code{dump_ddrs} prints with a
  531. maximum verbosity the details of a data dependence relations array,
  532. @code{dump_dist_dir_vectors} prints only the classical distance and
  533. direction vectors for a data dependence relations array, and
  534. @code{dump_data_references} prints the details of the data references
  535. contained in a data reference array.
  536. @node Omega
  537. @section Omega a solver for linear programming problems
  538. @cindex Omega a solver for linear programming problems
  539. The data dependence analysis contains several solvers triggered
  540. sequentially from the less complex ones to the more sophisticated.
  541. For ensuring the consistency of the results of these solvers, a data
  542. dependence check pass has been implemented based on two different
  543. solvers. The second method that has been integrated to GCC is based
  544. on the Omega dependence solver, written in the 1990's by William Pugh
  545. and David Wonnacott. Data dependence tests can be formulated using a
  546. subset of the Presburger arithmetics that can be translated to linear
  547. constraint systems. These linear constraint systems can then be
  548. solved using the Omega solver.
  549. The Omega solver is using Fourier-Motzkin's algorithm for variable
  550. elimination: a linear constraint system containing @code{n} variables
  551. is reduced to a linear constraint system with @code{n-1} variables.
  552. The Omega solver can also be used for solving other problems that can
  553. be expressed under the form of a system of linear equalities and
  554. inequalities. The Omega solver is known to have an exponential worst
  555. case, also known under the name of ``omega nightmare'' in the
  556. literature, but in practice, the omega test is known to be efficient
  557. for the common data dependence tests.
  558. The interface used by the Omega solver for describing the linear
  559. programming problems is described in @file{omega.h}, and the solver is
  560. @code{omega_solve_problem}.