- I was investigating whether the 'R' register behaves the same on Z80 and R800.
- I initially wrote this program:
- di
- ld hl,#0100
- xor a
- ld b,a
- ld r,a
- loop: ld a,r ; (2)
- ld (hl),a ; (4)
- inc hl ; (1)
- djnz loop ; (3)
- ei
- ret
- On Z80, for every iteration of this loop, R is increased by 5 (+2 for 'ld a,r'
- and +1 for the other 3 instructions). On R800 R is also increased by 5 each
- iteration. Though sometimes it's increased by 6!
- The increase by 6 instead of 5 happened every 18th or 19th iteration
- (alternating). One iteration takes 10 cycles (taking R800 page-breaks into
- account), so that's on average once every 185 clock cycles.
- This number 185 is very close to the number of useful cycles in a full R800
- refresh cycle. Currently (2010-08-16) R800-refresh in openMSX stalls the R800
- for 26 cycles every 210 cycles (so that leaves 210-26=184 useful cycles).
- So all this led me to the hypothesis that an R800-refresh cycle also increases
- the R register by one.
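The arithmetic behind this hypothesis can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the inputs (the 18/19 alternation, 10 cycles per iteration, and the 210/26 openMSX model) are the values quoted above.

```python
def average_refresh_period(iterations_between_events, cycles_per_iteration):
    """Average number of cycles between two extra R increments."""
    mean_iterations = sum(iterations_between_events) / len(iterations_between_events)
    return mean_iterations * cycles_per_iteration


# The extra increment shows up alternately every 18th and 19th iteration,
# and one iteration of the first test loop takes 10 cycles.
measured = average_refresh_period([18, 19], 10)

# openMSX model at the time of writing: a 26-cycle stall every 210 cycles.
modelled = 210 - 26

print(measured, modelled)
```

The two numbers differ by about one cycle, which is what makes the refresh explanation plausible.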
- This would be very useful because it would make it possible to measure in more
- detail how refresh works exactly on R800 (the current model in openMSX (210/26)
- is not 100% correct). So that's what I'll do in the rest of this document.
- New test program. It records sequences longer than 256 iterations, and it's
- important that each iteration takes the same number of cycles. So I had to
- replace the djnz instruction.
- di
- ld hl,#0100
- xor a
- ld r,a
- loop: ld a,r ; (2)
- ld (hl),a ; (4)
- inc hl ; (1)
- ld a,h ; (1)
- cp #c0 ; (2)
- jr nz,loop ; (3)
- [ some code to transform memory block #0100-#C000 into
- the differential of this block, so it's easy to see by
- how much R increased each iteration ]
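The transformation step that is left out of the listing can be sketched off-line in Python. This is an illustration, not the routine that ran on the MSX: the function name and the sample values are made up, and the 8-bit wrap-around mirrors the byte subtraction (`sub (hl)`) used by the `l2` loop in the full test program later in this document.

```python
def difference_table(samples):
    """Replace each sample by the 8-bit difference to the next sample."""
    return [(nxt - cur) & 0xFF for cur, nxt in zip(samples, samples[1:])]


# Hypothetical recorded R values, including a wrap from 0xF9 to 0x00.
samples = [0xF2, 0xF9, 0x00, 0x08, 0x0F]
print(" ".join(f"{d:02X}" for d in difference_table(samples)))  # 07 07 08 07
```

A list of N samples gives N-1 differences, so the last recorded sample has no difference of its own.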
- Each iteration takes 13 cycles. Per iteration R is increased by either 7 or 8.
- The difference table (a small portion) looked like this:
- 07 07 07 07 07 07 07 07 07 07 07 07 07 08 07 07
- 07 07 07 07 07 07 07 07 07 07 07 08 07 07 07 07
- 07 07 07 07 07 07 07 07 07 08 07 07 07 07 07 07
- 07 07 07 07 07 07 07 08 07 07 07 07 07 07 07 07
- 07 07 07 07 07 07 08 07 07 07 07 07 07 07 07 07
- 07 07 07 07 08 07 07 07 07 07 07 07 07 07 07 07
- 07 07 08 07 07 07 07 07 07 07 07 07 07 07 07 07
- 08 07 07 07 07 07 07 07 07 07 07 07 07 07 07 08
- 07 07 07 07 07 07 07 07 07 07 07 07 07 07 08 07
- 07 07 07 07 07 07 07 07 07 07 07 07 08 07 07 07
- ...
- These tables always have only two different numbers, in this case a series of
- 7's followed by one 8, then again a series of 7's followed by one 8 and so on.
- The actual number '7' or '8' is not important for timing, only the pattern of
- these numbers in the table is important. So I'll create a more compact notation
- for the table above:
- 14 14 14 14 15 14 14 14 14 15 14 ... (these are decimal numbers)
- or even
- 4*14 15 ...
- This means there are 4 repetitions of '13*0x07 + 1*0x08' (length 14) followed by
- one occurrence of '14*0x07 + 1*0x08' (length 15).
- So this notation indicates how many iterations there are before there is a
- refresh.
- For this test on average there are '(4 * 14 + 1 * 15) / 5 = 14.2' iterations
- between two refresh cycles. Each iteration takes 13 cycles. So that's 184.6
- useful clock cycles between refresh cycles.
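The step from a difference table to the compact notation, and from there to the number of useful cycles, can be sketched as follows. The function names are hypothetical, and the input is rebuilt from the '4*14 15' pattern described above rather than taken from a real memory dump.

```python
def refresh_gaps(diffs):
    """Lengths of the runs that end in the larger of the two values."""
    high = max(diffs)
    gaps, run = [], 0
    for d in diffs:
        run += 1
        if d == high:      # this iteration contained a refresh
            gaps.append(run)
            run = 0
    return gaps            # a trailing incomplete run is dropped


def useful_cycles(gaps, cycles_per_iteration):
    """Average number of useful cycles between two refreshes."""
    return sum(gaps) / len(gaps) * cycles_per_iteration


# 4 repetitions of 13*0x07 + 0x08, then once 14*0x07 + 0x08.
diffs = ([7] * 13 + [8]) * 4 + [7] * 14 + [8]
gaps = refresh_gaps(diffs)
print(gaps)                     # [14, 14, 14, 14, 15]
print(useful_cycles(gaps, 13))  # about 184.6
```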
- Next I realized this test program could be sped up by one cycle:
- ld c,#c0
- loop: ld a,r ; (2)
- ld (hl),a ; (4)
- inc hl ; (1)
- ld a,h ; (1)
- cp c ; (1)
- jr nz,loop ; (3)
- An iteration now takes 12 cycles (R is still increased by 7 or 8).
- The difference table looks like this:
- 16 3*15 16 2*15 16 2*15 (this sequence repeats all the time)
- That's (3*16+7*15)/10 = 15.3 iterations
- or 15.3*12 = 183.6 clock cycles between refreshes.
- Then I started inserting extra instructions in this test program:
- ld c,#c0
- loop: ld a,r ; (2)
- ld (hl),a ; (4)
- [***]
- inc hl ; (1)
- ld a,h ; (1)
- cp c ; (1)
- jr nz,loop ; (3)
- * NOP
- 13 cycles per iteration (R increases 8 or 9)
- pattern: 8*14 15
- cycles/refresh = (8*14+15)/9*13 = 183.44
- * 2 x NOP
- 14 cycles/iteration (R incr 9 or 10)
- pattern: 5*13 14
- cycles/refresh = 184.33
- I noticed that the start of the difference table was a bit different. It went
- like this:
- 9*13 14 7*13 14 5*13 14 5*13 14 5*13 14
- So it took some time before it stabilized on the '5*13 14' pattern. For the
- rest of the tests I didn't look at the start of the table anymore. I only
- searched for the 'stable' pattern.
- I ran this same test again, but now the (stable) pattern was
- 9*13 14
- that gives
- cycles/refresh = 183.4
- This is very interesting behaviour: depending on some (yet unknown) initial
- conditions, _the_whole_test_ runs slightly faster or slower. This is
- interesting because it may give a clue to why there is considerable variation
- to *some* speed measurements on a real R800 (see doc/r800-test.txt) while for
- other tests the results are much more stable. (Also in the current R800 openMSX
- emulation, the speed measurements are always stable).
- I didn't repeat the previous tests. Maybe they also show different patterns
- on different runs. I did repeat all following tests (there always seem to be
- either 1 or 2 different stable patterns).
- * NEG (like 2xNOP also takes 2 cycles)
- 14 cycles/iteration (R incr 9 or 10)
- pattern: 7*13 14
- cycles/refresh = 183.75
- * 3 x NOP
- 15 cycles/iteration (R incr 10 or 11)
- pattern 1: 2*12 13 -> cycles/refresh = 185
- pattern 2: 3*12 13 -> cycles/refresh = 183.75
- * IM 1 (3 cycles, like 3xNOP)
- 15 cycles/iteration (R incr 9 or 10)
- pattern: same as 3 x NOP
- * 4 x NOP
- 16 cycles/iteration (R incr 11 or 12)
- pattern: 11 12
- cycles/refresh = 184
- * LD (HL),A (4 cycles (with page-breaks))
- 16 cycles/iteration (R incr 8 or 9)
- pattern: same as 4 x NOP
- * 5 x NOP
- 17 cycles/iteration (R incr 12 or 13)
- pattern 1: 4*11 10 -> cycles/refresh = 183.6
- pattern 2: 7*11 10 6*11 10 -> cycles/refresh = 184.73
- * BIT 0,(HL) (5 cycles)
- 17 cycles/iteration (R incr 9 or 10)
- pattern 1: 5*11 10 4*11 10 -> cycles/refresh = 183.90
- pattern 2: 6*11 10 -> cycles/refresh = 184.57
- I still need to do more experiments, but some *guesses* so far: the number of
- useful cycles always seems to be within (183, 185]. That's a variation of more
- than one clock cycle. Maybe refresh also waits for an even clock cycle (just
- like IO does). This could explain why there are two stable patterns for some
- tests (in one case you have to insert an extra cycle to align to an even number
- of cycles, in the other case you're already aligned). It could also explain why
- there can be variation in speed between different runs of the same test. And
- finally it would explain why the documentation (e.g. atoc or even the turbor
- datapack) talks about half clock cycles for the duration of the refresh (in 50%
- of the cases it needs to add one cycle). Those docs talk about a refresh of
- 21.5 cycles every 222 cycles. Those numbers seem wrong to me (they don't match
- measurements on real HW), but at least I now have an idea about where that
- half clock cycle comes from.
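The claim that the useful cycle count stays within (183, 185] can be re-checked against the stable patterns listed above. The sketch below is only a convenience for that check: the function names and the table layout are made up, while the patterns and cycle counts are copied from the measurements in this document. Exact fractions are used so that the value 185 is not pushed over the boundary by floating-point rounding.

```python
from fractions import Fraction


def parse_pattern(text):
    """Expand the compact notation, e.g. '8*14 15' -> [14]*8 + [15]."""
    gaps = []
    for token in text.split():
        if "*" in token:
            count, length = token.split("*")
            gaps += [int(length)] * int(count)
        else:
            gaps.append(int(token))
    return gaps


def useful_cycles(pattern, cycles_per_iteration):
    """Exact average number of useful cycles between two refreshes."""
    gaps = parse_pattern(pattern)
    return Fraction(sum(gaps), len(gaps)) * cycles_per_iteration


# (test, cycles per iteration, stable pattern), as measured above.
MEASUREMENTS = [
    ("cp #c0 loop", 13, "4*14 15"),
    ("cp c loop", 12, "16 3*15 16 2*15 16 2*15"),
    ("NOP", 13, "8*14 15"),
    ("2 x NOP", 14, "5*13 14"),
    ("2 x NOP", 14, "9*13 14"),
    ("NEG", 14, "7*13 14"),
    ("3 x NOP", 15, "2*12 13"),
    ("3 x NOP", 15, "3*12 13"),
    ("4 x NOP", 16, "11 12"),
    ("5 x NOP", 17, "4*11 10"),
    ("5 x NOP", 17, "7*11 10 6*11 10"),
    ("BIT 0,(HL)", 17, "5*11 10 4*11 10"),
    ("BIT 0,(HL)", 17, "6*11 10"),
]

results = [(name, pattern, useful_cycles(pattern, cycles))
           for name, cycles, pattern in MEASUREMENTS]
for name, pattern, cycles in results:
    inside = 183 < cycles <= 185
    print(f"{name:12s} {pattern:26s} {float(cycles):7.2f} {inside}")
```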
- ---------
- I implemented the above guess (refresh waits for even clock cycle) in openMSX
- revision 11643. I can now more or less reproduce the results above: The
- number of useful clock cycles per refresh is also in range (183, 185], but the
- number per test is not the same as above (the 'stable patterns' from the tests
- above are different). Also in openMSX there's always only one stable pattern,
- while on a real R800, for some tests, there clearly were two different possible
- patterns.
- New test: above we measured the number of useful clock cycles per 'refresh
- cycle'. Now we're going to measure how many cycles the refresh itself takes.
- The idea goes like this: by observing the R register we can detect that there
- was a refresh, so by doing a longer test we can count how many refresh cycles
- there were. We can combine this with measuring the total time of the test
- (using the E6-timer). We can also calculate how many 'useful' cycles there
- were in the whole test. So the difference between actual and useful cycles
- must be the cycles spent on refresh. And finally, if we divide that by the
- counted number of refreshes, we should know how many cycles a single refresh takes.
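Written out as a calculation, the idea looks roughly like this. The function and constant names are hypothetical. The factor of 28 cycles per E6-tick and the example numbers are taken from the first real-machine result in the test results further down.

```python
E6_CYCLES_PER_TICK = 28  # R800 cycles per E6-timer tick, as used in this document


def cycles_per_refresh(e6_ticks, iterations, cycles_per_iteration, refreshes):
    """Average stall per refresh, derived from total time and useful work."""
    total_cycles = e6_ticks * E6_CYCLES_PER_TICK
    useful_cycles = iterations * cycles_per_iteration
    return (total_cycles - useful_cycles) / refreshes


# Real machine, no extra instructions: 0x5B75 ticks, 0xBB00 iterations of
# 12 cycles each, 0x0C31 iterations in which a refresh was observed.
print(round(cycles_per_refresh(0x5B75, 0xBB00, 12, 0x0C31), 3))  # 25.985
```

The result depends directly on the assumed cycle count per iteration, so an error of one cycle there would shift the outcome considerably.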
- Full test program:
- org #c000
- di
- ld hl,#0100
- ld c,#bc
- ld e,3
- l1 ld a,r ; This loop waits till there was a refresh.
- ld d,a ; It's an attempt to get more stable measurements.
- ld a,r ; Note that this loop will not terminate in Z80
- sub d ; mode.
- sub e
- jr z,l1
- out (#e6),a
- loop ld a,r ; (2) Actual test loop, same as in tests above
- ld (hl),a ; (4) One iteration takes 12 cycles
- inc hl ; (1)
- ;[***] ; extra instructions here (see below)
- ld a,h ; (1)
- cp c ; (1)
- jr nz,loop ; (3/2) 3 cycles when jump is taken, 2 otherwise
- in a,(#e6)
- ld l,a
- in a,(#e7)
- ld h,a
- ld (#be00),hl ; store total duration of the test
- ld hl,#0100 ; Next we do some post-processing on the data:
- l2 inc hl ; First calculate the difference table, same
- ld a,(hl) ; routine as in tests above (though not shown
- dec hl ; there).
- sub (hl)
- ld (hl),a
- inc hl
- ld a,h
- cp c
- jr nz,l2
- ld hl,#bc00
- ld de,#bc00+1
- ld bc,#200-1
- ld (hl),0
- ldir
- ld de,#100 ; Next we calculate a histogram.
- ld hl,#bc00 ; We expect this histogram to be all zero, except for
- l3 ld a,(de) ; two entries: the iterations with and without a
- ld l,a ; refresh cycle.
- inc (hl) ; There is a single other point in this histogram,
- jr nz,nc ; that's because we don't calculate the difference of
- inc h ; the very last iteration correctly (we'd have to
- inc (hl) ; sample the R register one more time outside the
- dec h ; loop). But for now we ignore this outlier.
- nc inc de
- ld a,d
- cp h
- jr nz, l3
- ei
- ret
- Test results:
- * no extra instructions:
- real machine:
- histogram: 0x07: 0xAECE
- 0x08: 0x0C31
- 0x66: 0x0001 -> belongs to 0x07 (this can be seen from the pattern
- in the difference table); I won't show this
- entry anymore in the next results
- E6-ticks: 0x5B75
- total cycles: E6-timer * 28 cycles/E6-tick = 655564 cycles
- useful cycles: 0xBB00 iterations * 12 cycles/iteration = 574464 cycles
- overhead cycles: 655564 - 574464 = 81100 cycles
- cycles per refresh = 81100 cycles / 0xc31 refreshes
- = 25.985 cycles/refresh
- openMSX revision 11648:
- histogram: 0x07: 0xAECD+1
- 0x08: 0x0C32
- E6-ticks: 0x5B78
- --> 26.004 cycles/refresh
- * extra instruction: NOP (1 cycle, increases R by 1)
- real machine:
- histogram: 0x8: 0xADCA
- 0x9: 0x0D36
- E6-ticks: 0x6318
- --> 26.011 cycles/refresh
- openMSX revision 11648:
- histogram: 0x8: 0xADC8
- 0x9: 0x0D38
- E6-ticks: 0x632A
- --> 26.144 cycles/refresh
- * extra instruction: IM 1 (3 cycles, incr R 2)
- real machine:
- histogram: 0x9: 0xABCA
- 0xA: 0x0F36
- E6-ticks: 0x721B
- --> 25.636 cycles/refresh
- openMSX revision 11648:
- histogram: 0x9: 0xABBC
- 0xA: 0x0F44
- E6-ticks: 0x727E
- --> 26.25 cycles/refresh
- !! Big difference between openMSX and real MSX !!
- * extra instructions: EXX ; MULUW HL,BC ; EXX (1+36+1 cycles, incr R 1+2+1)
- real machine:
- histogram: 0xB: 0x882D
- 0xC: 0x32D3
- E6-ticks: 0x17D33 (of course I couldn't measure how many times the
- timer overflowed, but it must be one time)
- --> 26.042 cycles/refresh
- openMSX revision 11648:
- histogram: 0xB: 0x8801
- 0xC: 0x32FF
- E6-ticks: 0x17E80
- --> 26.669 cycles/refresh
- !! Big difference between openMSX and real MSX !!
- Preliminary conclusion:
- Refresh seems to take about 26 cycles, which is also the value we currently
- use in openMSX. But especially for the 'IM 1' case there's still a
- relatively big difference between openMSX and the real hardware. This needs
- more experiments.
- One other difference between Z80 and R800:
- di
- xor a
- ld r,a
- ld a,r
- ld c,a ; Z80: c=2 R800: c=1
- ld a,r
- ld b,a ; Z80: b=5 R800: b=4
- ld a,r ; Z80: a=8 R800: a=7
- And of course once in a while the numbers for R800 are (partially) increased by
- one.
- This difference can possibly be explained by a difference in the order of
- storing the result to the 'R' register and increasing the 'R' register on each
- M1 cycle.