Analysis of cycle costs for SH4:

-> udiv_le128: 5
-> udiv_ge64k: 6
-> udiv udiv_25: 10
-> pos_divisor: 3
-> pos_result linear: 5
-> pos_result - -: 5
-> div_le128: 7
-> div_ge64k: 9
sdivsi3 -> udiv_25: 13
udiv_25 -> div_ge64k_end: 15
div_ge64k_end -> rts: 13
div_le128 -> div_le128_2: 2, r1 latency 3
udiv_le128 -> div_le128_2: 2, r1 latency 3
(u)div_le128 -> div_by_1: 9
(u)div_le128 -> rts: 17
div_by_1(_neg) -> rts: 4
div_ge64k -> div_r8: 2
div_ge64k -> div_ge64k_2: 3
udiv_ge64k -> udiv_r8: 3
udiv_ge64k -> div_ge64k_2: 3 + LS
(u)div_ge64k -> div_ge64k_end: 13
div_r8 -> div_r8_2: 2
udiv_r8 -> div_r8_2: 2 + LS
(u)div_r8 -> rts: 21

-> - + neg_result: 5
-> + - neg_result: 5
-> div_le128_neg: 7
-> div_ge64k_neg: 9
-> div_r8_neg: 11
-> <64k div_ge64k_neg_end: 28
-> >=64k div_ge64k_neg_end: 22
div_ge64k_neg_end ft -> rts: 14
div_r8_neg_end -> rts: 4
div_r8_neg -> div_r8_neg_end: 18
div_le128_neg -> div_by_1_neg: 4
div_le128_neg -> rts: 18
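The arcs above are branch targets inside the SH4 assembly division routines;
the per-divisor-range totals in the tables below are, roughly, the sums of
these arc costs along the path a given operand pair takes.  For orientation
only (a hedged C sketch, not the actual routine): functionally the signed
entry point is a sign fixup around an unsigned divide, which is what the
pos_divisor / pos_result / neg_result arcs refer to, while the *_le128 /
*_ge64k / *_r8 labels select a path by operand magnitude.  sdivsi3_model is
a made-up name for illustration, not the library symbol.

/* Hedged C model of the signed entry point.  */
int
sdivsi3_model (int a, int b)
{
  /* Magnitudes via unsigned arithmetic (two's complement, so INT_MIN works).  */
  unsigned ua = a < 0 ? 0u - (unsigned) a : (unsigned) a;
  unsigned ub = b < 0 ? 0u - (unsigned) b : (unsigned) b;
  unsigned uq = ua / ub;                       /* unsigned core (udiv_* arcs) */
  /* Result sign: the pos_result / neg_result arcs above.  */
  return (a < 0) != (b < 0) ? (int) (0u - uq) : (int) uq;
}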
sh4-200 absolute divisor range:
              1    [2..128]   [129..64K)   [64K..|dividend|/256]   >=64K, >|dividend|/256
udiv         18       22          38                32                       30
sdiv pos:    20       24          41                35                       32
sdiv neg:    15       25          42                36                       33
sh4-300 absolute divisor range:
           8 bit   16 bit   24 bit   > 24 bit
udiv         15      35       28        25
sdiv         14      36       34        31
fp-based:

unsigned: 42 + 3 + 3 (lingering ftrc latency + sts fpul,rx) at caller's site
signed:   33 + 3 + 3 (lingering ftrc latency + sts fpul,rx) at caller's site
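The fp-based strategy converts both operands to floating point, divides in
the FPU, and truncates the quotient back to an integer; the trailing 3 + 3
cycles are the lingering ftrc latency and the sts fpul,rx that the caller
still pays.  A hedged C sketch of what it computes (not the actual routine):
a 32-bit quotient is exactly representable in a double, so the truncating
convert yields the integer quotient.

int
sdiv_fp_model (int a, int b)
{
  return (int) ((double) a / (double) b);
}

unsigned
udiv_fp_model (unsigned a, unsigned b)
{
  return (unsigned) ((double) a / (double) b);
}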
call-div1: divisor range:
            [1..64K)   >= 64K
unsigned:      63         58
signed:        76         76
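call-div1 uses the SH div0u/div0s/div1 step instructions, which retire one
quotient bit per step, so the routine runs a long sequence of steps largely
independent of operand size; hence the higher, flatter cycle counts above.
A hedged C model of the bit-at-a-time idea (restoring form; the hardware
div1 step is non-restoring, but the per-bit structure is the same):

unsigned
udiv_div1_model (unsigned a, unsigned b)
{
  unsigned q = 0, r = 0;
  for (int i = 31; i >= 0; i--)
    {
      r = (r << 1) | ((a >> i) & 1);   /* shift in the next dividend bit */
      if (r >= b)                      /* one division step              */
        {
          r -= b;
          q |= 1u << i;
        }
    }
  return q;
}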
SFUNC_STATIC call overhead:
        mov.l   0f,r1
        bsrf    r1
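; (presumably the literal at 0f holds the target's PC-relative displacement
; for bsrf, so a static call costs one literal load plus the delayed branch)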
SFUNC_GOT call overhead - current:
        mov.l   0f,r1
        mova    0f,r0
        mov.l   1f,r2
        add     r1,r0
        mov.l   @(r0,r2),r0
        jmp     @r0
; 3 cycles worse than SFUNC_STATIC
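; (reading of the sequence, not verified: mova 0f,r0 plus the literal at 0f
; form the GOT address, the literal at 1f is the function's GOT offset, and
; the indexed load fetches the function's address for the jmp)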
SFUNC_GOT call overhead - improved assembler:
        mov.l   0f,r1
        mova    0f,r0
        mov.l   @(r0,r1),r0
        jmp     @r0
; 2 cycles worse than SFUNC_STATIC
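; (presumably the assembler folds the GOT displacement and the function's GOT
; slot offset into the single literal at 0f, removing the mov.l 1f,r2 / add
; pair; this needs relocation support, hence "improved assembler")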
Copyright (C) 2006-2015 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.