r800-djnz.txt 6.0 KB

It seems that in some cases on the R800 a djnz or jr instruction takes one
cycle longer than expected. I discovered this purely by accident: I was doing
another timing test in which I varied the number of instructions inside a loop.
For one specific number of instructions I got an unexpected result. This
happened when a djnz instruction was the very last instruction in a 256-byte
memory page. In the following test I investigate this in more detail.

Test-program:

        org   #c000

        di
        ld    b,0         ; B=0, so the djnz loops 256 times
        ds    248         ; vary between 248-251
        out   (#e6),a     ; reset the E6-timer
loop    djnz  loop
        in    a,(#e6)     ; read timer, low byte
        ld    l,a
        in    a,(#e7)     ; read timer, high byte
        ld    h,a         ; elapsed E6-ticks are returned in HL
        ret

By varying the value of the 'ds' directive between 248 and 251, the start
address of the djnz instruction varies from 0xC0FD to 0xC100. For each position
we measure how long the test takes using the E6-timer (this timer ticks once
every 28 R800 clock cycles; there's also approx 18% overhead for refresh):

 djnz-start-address | real turborGT | openmsx-rev11821 | openmsx-fixed
--------------------+---------------+------------------+--------------
       0xC0FD       |    31 - 33    |     31 - 32      |    31 - 32
       0xC0FE       |    41 - 42    |  31 - 32  (!!!)  |    41 - 42
       0xC0FF       |    52 - 53    |     52 - 53      |    52 - 53
       0xC100       |    31 - 32    |     31 - 32      |    31 - 32

(There's some variation in the measured E6-ticks; the table lists the range of
all measured values.)

Translated to clock cycles per iteration:

 address | real | rev11821 | fixed
---------+------+----------+------
 0xC0FD  |  3   |    3     |   3
 0xC0FE  |  4   |    3     |   4
 0xC0FF  |  5   |    5     |   5
 0xC100  |  3   |    3     |   3

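The translation between cycles per iteration and E6-ticks can be sketched
roughly as follows. This is only an estimate under stated assumptions: the
function and constant names are made up for illustration, the 18% refresh
overhead is the approximate figure quoted above, and the few instructions
outside the loop are ignored.

```python
# Rough estimate of the E6-timer reading for a 256-iteration loop.
# All names here are hypothetical; the constants come from the text above.
CYCLES_PER_E6_TICK = 28   # the E6-timer ticks once every 28 R800 cycles
REFRESH_OVERHEAD = 0.18   # "approx 18%", so the result is only approximate
ITERATIONS = 256          # ld b,0 makes the loop run 256 times


def estimated_e6_ticks(cycles_per_iteration):
    """Estimate the E6-ticks measured for a loop body of the given cost."""
    cycles = ITERATIONS * cycles_per_iteration
    return cycles * (1 + REFRESH_OVERHEAD) / CYCLES_PER_E6_TICK


if __name__ == "__main__":
    for cycles in (3, 4, 5):
        print(cycles, "cycles/iteration ->",
              round(estimated_e6_ticks(cycles), 1), "E6-ticks (estimate)")
```

Because the overhead figure is approximate, the estimates only land near the
measured ranges; what matters is that each extra cycle per iteration shifts the
reading by roughly 10 ticks, which is the step visible in the measurements.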
There are 3 different cases in this test:

 0xC0FD and 0xC100:
   The djnz instruction takes 3 cycles. This is the normal case for a djnz
   instruction that does execute a jump. There are no page-breaks: all
   opcode bytes are fetched from the same 256-byte memory page.
 0xC0FE:
   This is the unexpected case: for some as yet unknown reason the instruction
   takes 1 cycle extra. Note that all opcode bytes are fetched from the
   same 256-byte memory page. See below for a possible explanation.
 0xC0FF:
   Here the two opcode bytes of the djnz instruction are located in different
   memory pages, so per iteration there are two extra cycles for two
   page-breaks (one when fetching the second opcode byte, one when jumping
   back to the first), thus 5 cycles in total.

I did a similar test for the jr instruction:

        org   #c000

        di
        ld    b,0         ; B=0, so the djnz/jr pair loops 256 times
        ds    242         ; vary between 239-242
        out   (#e6),a     ; reset the E6-timer
test2a  djnz  test2b
        in    a,(#e6)     ; read timer, low byte
        ld    l,a
        in    a,(#e7)     ; read timer, high byte
        ld    h,a         ; elapsed E6-ticks are returned in HL
        ret
test2b  jr    test2a

 jr-start-address | real turborGT | openmsx-rev11821 | openmsx-fixed
------------------+---------------+------------------+--------------
      0xC0FD      |    63 - 64    |     62 - 63      |    62 - 63
      0xC0FE      |    73 - 74    |  62 - 64  (!!!)  |    73 - 74
      0xC0FF      |    83 - 84    |     83 - 84      |    83 - 84
      0xC100      |    83 - 84    |     83 - 85      |    83 - 84

Translated to clock cycles per iteration (djnz and jr together):

 address | real | rev11821 | fixed
---------+------+----------+------
 0xC0FD  |  6   |    6     |   6
 0xC0FE  |  7   |    6     |   7
 0xC0FF  |  8   |    8     |   8
 0xC100  |  8   |    8     |   8

Again there are 3 different cases in this test:

 0xC0FD:
   Both the jr and the djnz instruction take 3 cycles (when they do jump).
   There are no page-breaks and no extra penalty cycles.
 0xC0FE:
   Now the jr instruction is the last one in a memory page, and it shows the
   same penalty cycle as the djnz instruction did. Again there is no 'real'
   page-break in this test.
 0xC0FF and 0xC100:
   In both these cases there are two page-breaks per iteration (but at
   different locations for the two cases). So 8 cycles per iteration in total.

Possible explanation (this is just a guess, but it does explain the
measurements):

This behaviour could be explained by some (very limited) pipeline behaviour in
the R800: it seems that the decision to cause a page-break on the next
instruction is already made during the execution of the current instruction,
after the last opcode byte is fetched but before the destination address of
the jump is calculated. (When the jump does actually go to another page, that
is of course also a reason for a page-break.)

It's possible that all R800 instructions have this pipeline behaviour, but that
it's only visible in the djnz and jr instructions. All other instructions
either don't jump (the PC simply increments), in which case it's not possible
to externally detect when exactly the 'must-fetch-opcode-using-page-break'
decision is made, or they do jump but already cause a page-break for another
reason, so the possible extra page-break is always masked. The list of
instructions that cause a 'jump' is:

 djnz, jr, jr-conditional
   These show the extra page-break behaviour.
 jp, jp-conditional, jp (hl/ix/iy)
   There's _always_ a forced page-break after these instructions. (The exact
   reason why is not clear to me.)
 call, call-conditional, ret, ret-conditional, rst, accept-an-IRQ
   All these instructions (or events) push to or pop from the stack. When
   switching between data read/write and opcode fetch there's always a
   page-break.
 otir, otdr, inir, indr, ldir, lddr, cpir, cpdr
   (When these are repeated, PC is decremented by 2.) All these instructions
   do a data read or write and thus there already is a page-break when
   switching back to opcode fetching (even if opcode and data are in the
   same memory page).
 halt
   I'm not sure about this instruction: it's possible that it constantly
   decrements PC by one (after it has been incremented). At least this is how
   the Z80 does it. So it might be the case that a halt instruction as the
   last instruction in a memory page 'loops' a bit slower, though this effect
   will be hard to notice. At the moment I don't know how to write a test to
   detect this. Suggestions are welcome.

As you've already seen in the measurements above, I implemented a forced
page-break in the djnz and jr instructions when they are located at address
0x..FE. After this fix the measurements in openMSX match the real hardware.

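As a sanity check on why testing for address 0x..FE is enough: for a 2-byte
instruction, "the incremented PC leaves the page of the last opcode byte" and
"the instruction starts at 0x..FE" are the same condition. The sketch below
verifies this for every 16-bit address. It is an illustration with made-up
function names, not the actual openMSX code.

```python
# Compare the guessed rule with the implemented 0x..FE check.
# Both function names are hypothetical, chosen for this illustration only.

def pc_leaves_page(start, length=2):
    """Guessed rule: PC after the last opcode byte is in another page."""
    last = (start + length - 1) & 0xFFFF
    nxt = (start + length) & 0xFFFF
    return (last >> 8) != (nxt >> 8)


def starts_at_fe(start):
    """Implemented rule: the instruction is located at address 0x..FE."""
    return (start & 0xFF) == 0xFE


mismatches = [a for a in range(0x10000)
              if pc_leaves_page(a) != starts_at_fe(a)]

if __name__ == "__main__":
    print("addresses where the two rules disagree:", len(mismatches))
```

The equivalence only holds for 2-byte instructions such as djnz and jr; for
other lengths the start address to test for would be different.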
Wouter