123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168 |
- It seems in some cases on R800 a djnz or jr instruction take one cycle longer
- than expected. I discovered this purely by accident: I was doing another timing
- test in which I varied the number of instructions inside a loop. For a specific
- amount of instructions I got an expected result. This happened when a djnz
- instruction was the very last instruction in a 256-byte memory page. In the
- following test I'll investigate this in some more detail.
- Test-program:
- org #c000
- di
- ld b,0
- ds 248 ; vary between 248-251
- out (#e6),a
- loop djnz loop
- in a,(#e6)
- ld l,a
- in a,(#e7)
- ld h,a
- ret
- By varying the value of the 'ds' directive between 248 and 251, the start
- address of the djnz instruction varies from 0xc0fd to 0xc100. For each position
- we measure how long the test takes using the E6-timer (this timer ticks once
- every 28 R800 clock cycles, there's also approx 18% overhead for refresh)
- djnz-start-address | real turborGT | openmsx-rev11821 | openmsx-fixed
- --------------------+---------------+------------------+--------------
- 0xC0FD | 31 - 33 | 31 - 32 | 31 - 32
- 0xC0FE | 41 - 42 | 31 - 32 (!!!) | 41 - 42
- 0xC0FF | 52 - 53 | 52 - 53 | 52 - 53
- 0xC100 | 31 - 32 | 31 - 32 | 31 - 32
- (there's some variation in the measured E6-ticks, we list the range of all
- measured values in this table)
- Translated to clock ticks per iteration:
- address | real | rev11821 | fixed
- --------+------+----------+------
- 0xC0FD | 3 | 3 | 3
- 0xC0FE | 4 | 3 | 4
- 0xC0FF | 5 | 5 | 5
- 0xC100 | 3 | 3 | 3
- There are 3 different cases in this test:
- 0xC0FD and 0xC100:
- djnz instruction takes 3 cycles, this is the normal case for a djnz
- instruction which does execute a jump. There are no page-breaks: all
- opcode bytes are fetched from the same 256-byte memory page.
- 0xC0FE:
- This is the unexpected case, for some yet unknown reason the instruction
- takes 1 cycle extra. Note that all opcode bytes are fetched from the
- same 256-byte memory page. See below for a possible explanation.
- 0xC0FF:
- Here both opcode bytes of the djnz instruction are located in a different
- memory page, so per iteration there are two extra cycles for two page break,
- thus 5 cycles in total.
- I did a similar test for the jr instruction:
- org #c000
- di
- ld b,0
- ds 242 ; vary between 239-242
- out (#e6),a
- test2a djnz test2b
- in a,(#e6)
- ld l,a
- in a,(#e7)
- ld h,a
- ret
- test2b jr test2a
- jr-start-address | real turborGT | openmsx-rev11821 | openmsx-fixed
- ------------------+---------------+------------------+--------------
- 0xC0FD | 63 - 64 | 62 - 63 | 62 - 63
- 0xC0FE | 73 - 74 | 62 - 64 (!!!) | 73 - 74
- 0xC0FF | 83 - 84 | 83 - 84 | 83 - 84
- 0xC100 | 83 - 84 | 83 - 85 | 83 - 84
- address | real | rev11821 | fixed
- --------+------+----------+------
- 0xC0FD | 6 | 6 | 6
- 0xC0FE | 7 | 6 | 7
- 0xC0FF | 8 | 8 | 8
- 0xC100 | 8 | 8 | 8
- Again 3 different cases in this test:
- 0xC0FD:
- Both jr and djnz instruction take 3 cycles (when they do jump) there are no
- page break or no extra penalty cycles.
- 0xC0FE:
- Now the jr instruction is the last in a memory page, this shows the same
- penalty cycle as a djnz instruction. Again there is no 'real' page break in
- this test.
- 0xC0FF and 0xC100:
- In both these cases there are two page breaks per iteration (but at different
- locations for the two cases). So 8 cycles per iteration in total.
- Possible explanation (this is just a guess but it does explain the
- measurements):
- This behaviour could be explained by some (very limited) pipeline behaviour in
- the R800: it seems that the decision to cause a page-break on the next
- instruction is already made during the execution of the current instruction,
- after the last opcode byte is fetched, but before the destination address of
- the jump is calculated. (Though when the jump does actually go to another page,
- this is also a reason for a page-break).
- It's possible all R800 instructions have this pipeline behaviour, but it's only
- visible in the djnz and the jr instructions. All other instructions either don't
- jump (the PC simply increments) and then it's not possible to externally detect
- when exactly the 'must-fetch-opcode-using-page-break-decision' is made. Or the
- instruction does jump but it already causes a page-break for another reason, so
- the possible extra page-break is always masked. The list of instructions that
- cause a 'jump' is:
- djnz, jr, jr-conditional
- These show the extra page-break behaviour
- jp, jp-conditional, jp (hl/ix/iy)
- There's _always_ a forced page-break after this instruction. (The exact
- reason why is not clear to me.)
- call, call-conditional, ret, ret-conditional, rst, accept-an-IRQ
- All these instructions (or events) push/pop on/from the stack. When
- switching between data read/write and opcode fetch there's always a
- page-break;
- otir, otdr, inir, indr, ldir, lddr, cpir, cpdr
- (When these are repeated PC is decremented by 2.) All these instructions
- do a data read or write and thus there already is a page-break when
- switching back to opcode fetching (even if opcode and data are in the
- same memory page).
- halt
- I'm not sure about this instruction: it's possible this instruction
- constantly decrements PC by one (after it's been increased). At least
- this is how Z80 does it. So it might be the case that a halt instruction
- as the last instruction in a memory page 'loops' a bit slower. Though
- this effect will be hard to notice. At the moment I don't know how to
- write a test to detect this. Suggestions are welcome.
- As you've already seen in the measurements above: I implemented a forced
- page-break in the djnz and jr instructions when they are located at address
- 0x..FE. After this fix the measurements in openMSX match the real hardware.
- Wouter
|