123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428 |
- Goal of the experiment
- ======================
- On a MSX2+ machine or a turboR machine in Z80 mode, there are extra wait cycles
- when accessing a VDP IO-port (0x98-0x9b) compared to other IO ports (these are
- introduced by the T9769{A,B,C} chip).
- In openMSX we also added those wait cycles in R800 mode. That's likely wrong.
- We verify this hypothesis in this experiment. Spoiler: in R800 mode there are
- indeed no extra wait cycles.
- Sub-goals
- =========
- I don't have my real MSX computers permanently connected anymore. When I do set
- them up, it's not too much extra effort to run a few more tests.
- Emulation of Z80 timing is pretty straight forward. R800 is more complex. We
- know about the following peculiarities:
- - Align to even cycle on IO instructions (because R800, which runs at 7MHz,
- needs to access the IO bus, which runs at 3.5MHz).
- - Doing two accesses to VDP IO-ports introduces a delay when those accesses are
- "too close together".
- - Refresh behavior. Every so many cycles (current estimate: 210 cycles @7MHz),
- normal R800 instruction execution gets interrupted for a refresh period
- (current estimate: this period lasts 25.5 cycles. Best guess for that half
- cycle: because 50% chance it needs to align to an even cycle).
- Design of the experiment
- ========================
- I (repeatedly) run test programs with the following structure:
- DI
- OUT (#20),A ; reset counter
- <body>
- IN A,(#20) ; latch counter, and read 32-bit value in [DE:HL]
- LD L,A
- IN A,(#21)
- LD H,A
- IN A,(#22)
- LD E,A
- IN A,(#23)
- LD D,A
- EI
- RET
- Where <body> varies from test-to-test.
- Alex Wulms gave me a FPGA test board that implements a 32-bit counter which
- runs at 3.5MHz and is accessed via IO-ports 0x20-0x23. In openMSX this device
- is emulated as the 'hirestimer' extension.
- - Writing (any) value to port 0x20 resets the counter to zero.
- - That counter increments at a frequency of 3.5MHz (because the MSX IO-bus runs
- at this frequency).
- - Reading IO port 0x20 latches the value of the counter and returns the 8 LSB
- bits.
- - Reading IO ports 0x21-0x23 retrieves the other bits of the latched value.
- So in Z80 mode (runs at 3.5MHz) this board allows to make cycle-accurate
- measurements. But the R800 runs at 7MHz, so we can only measure up-to 2 cycles
- accurate.
- The test code is located at the start of a 256-byte block (specifically at
- address 0x0100 in my case). This ensures that none of the instruction fetches
- take an extra cycle for the "page-break".
- I ran all tests on a turbor-GT machine, both in Z80 and R800 mode.
- (Not important) I ran all tests in compass, because that allows me to easily:
- - Edit the test program
- - Re-assemble (CTRL-A)
- - Execute (CTRL-G)
- - Look at the result: the value in registers DE:HL (F4)
- The actual tests
- ================
- In this section I'll present _processed_ test results. Because often the
- details are not important for our main goal and too detailed results would only
- add confusion. However these details could be important to (in the future)
- investigate the sub-goals. Therefor I still included the raw results in an
- appendix.
- 1) Establish a base-line
- ------------------------
- Replace <body> with a sequence of n NOP instructions.
- n | Z80 | R800
- ---+-----+-----
- 0 | 12 | 5
- 1 | 17 | 5
- 2 | 22 | 6
- 3 | 27 | 6
- 4 | 32 | 7
- 5 | 37 | 7
- The case n=0 (an empty <body>) is important. It measures the time between
- resetting the counter and immediately after reading it back. Thus
- OUT (#20),A ; reset counter
- IN A,(#20) ; read it back
- This means that for the actual duration of the <body>, we must subtract this
- n=0 result.
- On a Z80 both an 'OUT (n),A' and 'IN A,(n)' instruction take 12 cycles. So the
- value 12 in the above table makes sense. (Actually it also depends where
- exactly within the OUT instruction the counter gets reset and where exactly the
- counter gets latched, but apparently that happens at the same relative offsets
- in the OUT/IN instructions).
- On Z80, for the n>=1 cases we then get the, totally as expected, result that a
- NOP instruction takes 5 cycles.
- For R800 n=0 we measure 5 cycles at 3.5MHz, this could be either 9 or 10 cycles
- at 7MHz. In our current R800 emulation model an IN or OUT instruction takes 9
- cycles plus an optional cycle to 'align' the 7MHz R800-bus with the 3.5MHz
- IO-bus. So that would be 10 cycles @7MHz in this case which matches the
- measured value.
- The case R800 n=1 also measures 5, that's because the cost of the NOP
- instruction is "absorbed" by no longer having to wait for 1 cycle to align both
- buses. And also for n>=2, the measured value only goes up every 2nd increment
- of n.
- 2a) Measure a single OUT instruction, non-VDP port
- --------------------------------------------------
- In these group of tests the body of our test program is
- NOP, repeated m times
- OUT (#30),A
- NOP, repeated n times
- I arbitrarily picked IO port 0x30 because it has no function and because (I
- hoped) it introduces no extra wait cycles when accessing it. I put a variable
- amount (possibly zero) of NOP instruction both in front and after the OUT
- instruction.
- Here are the results:
- m | n | Z80 | R800
- ---+---+-----+------------
- 0 | 0 | 24 | 10 (sometimes 22)
- 0 | 1 | 29 | 10 (sometimes 23)
- 0 | 2 | 34 | 11 (sometimes 23)
- 0 | 3 | 39 | 11 (sometimes 24)
- 0 | 4 | 44 | 12 (sometimes 24)
- 0 | 5 | 49 | 12 (sometimes 25)
- --+---+-----+------------
- 1 | 0 | 29 | 10 (*)
- 1 | 1 | 34 | 10 (*)
- 1 | 2 | 39 | 11 (sometimes 23,24,25)
- --+---+-----+------------
- 2 | 0 | 34 | 11 (sometimes 23)
- 2 | 1 | 39 | 11 (sometimes 24)
- 2 | 2 | 44 | 12 (sometimes 24,25)
- (*) The lack of a 'sometimes' annotation does not mean it never occurs,
- instead it probably means I didn't measure long enough to observe it.
- Z80 is completely as expected. Here's the formula that predicts the result:
- 12 + 12 + 5*(m+n)
- The 'normal' R800 values (ignoring the 'sometimes' annotations) can also be
- explained. There are now two IO instructions that can 'absorb' the cost of a
- NOP. Thus again the measured value only goes up by 1 for every 2nd increment of
- 'm' or 'n' independently.
- Now for the 'sometimes' values. For example for R800 n=0 we mostly measured
- value 11, but from time time instead we measured the value 23. Very roughly
- speaking, the chance of measuring the alternative value went up with the
- duration of the test (so mostly visible in test 3b with high 'n' values).
- This phenomenon can be explained by the R800 refresh stuff. At regular
- intervals the R800 gets interrupted from normal instruction execution for
- refresh. If such an interruption falls in the test then we'll measure a higher
- duration. The value of this extra duration is 12 or 13 (glossing over the m=1,
- n=2 result). This corresponds with ~25 R800 cycles. (This matches the current
- emulation model).
- 2b) Measure a single OUT instruction, VDP port
- ----------------------------------------------
- This is the most important test in this experiment (from the point of view of
- the original goal). It's the same test as in 2a) but with port 0x30 replaced
- with 0x98.
- m | n | Z80 | R800
- ---|---+-----+------------
- 0 | 0 | 25 | 10 (sometimes 22)
- 0 | 1 | 30 | 10 (sometimes 23)
- 0 | 2 | 35 | 11 (sometimes 23)
- 0 | 3 | 40 | 11 (sometimes 24)
- 0 | 4 | 45 | 12 (sometimes 24)
- 0 | 5 | 50 | 12 (sometimes 25)
- --+---+-----+------------
- 1 | 0 | 30 | 10 (sometimes 22)
- 1 | 1 | 35 | 10 (sometimes 23)
- 1 | 2 | 40 | 11 (sometimes 23,25)
- --+---+-----+------------
- 2 | 0 | 35 | 11 (sometimes 23)
- 2 | 1 | 40 | 11 (*)
- 2 | 2 | 45 | 12 (*)
- Comparing 2a) with 2b) gives:
- - Identical results for R800.
- - For Z80 2b) is exactly 1 cycle higher than 2a).
- Thus we can confirm our hypothesis:
- - On a turboR in Z80 mode, accessing IO-port 0x98-9b introduces 1 wait cycle.
- - But doing the same in R800 mode does not.
- 3a) Two OUT instructions, VDP-port followed by non-VDP port
- -----------------------------------------------------------
- Ok, so in the previous test we've already accomplished our goal. But we know
- that there's something else going on in R800 (but not in Z80) mode, namely wait
- cycles between VDP accesses that are "too close together". While we're add it,
- let's also do some measurements on those.
- To get a new base-line, we first measure access to a VDP-port followed by
- access to a non-VDP port. More specifically, the new test body is:
- OUT (#98),A
- NOP, repeated n times
- OUT (#30),A
- Which results in:
- n | Z80 | R800
- ---+-----+------------------
- 0 | 37 | 15 (sometimes 27)
- 1 | 42 | 15 (sometimes 28)
- 2 | 47 | 16 (sometimes 28)
- 3b) Two VDP-OUT instructions
- ----------------------------
- Now the more interesting test, we make the test body:
- OUT (#98),A
- NOP, repeated n times
- OUT (#98),A
- And we get:
- n | Z80 | R800
- ---+-----+-------------------------
- 0 | 38 | 41 (sometimes 53)
- 1 | 43 | 41 (sometimes 53)
- 2 | 48 | 41 (sometimes 53)
- 5 | 63 | 41 (sometimes 53)
- 10 | 88 | 41 (sometimes 53)
- 20 | 138 | 41 (sometimes 53)
- 30 | 188 | 41 (sometimes 42,53)
- 40 | 238 | 41 (sometimes 47,48,53)
- 50 | 288 | 41 (sometimes 52,53)
- 51 | 293 | 41 (sometimes 53)
- 52 | 298 | 41 (sometimes 53,54)
- 53 | 303 | 41 (sometimes 54,55)
- 54 | 308 | 42 (sometimes 54,55)
- 55 | 313 | 42 (sometimes 55,56)
- 56 | 318 | 43 (sometimes 55,56)
- 57 | 323 | 43 (sometimes 56,57)
- 58 | 328 | 44 (sometimes 56,57)
- 59 | 333 | 44 (sometimes 56,57,58)
- 60 | 338 | 45 (sometimes 57,58)
- 65 | 363 | 47 (sometimes 59,60,61)
- 70 | 388 | 50 (sometimes 62,63)
- (*) for higher values of 'n' the qualification 'sometimes' becomes more and
- more frequent
- The trend is clear. For n<=53 the output remains constant. For n>53 the output
- is again linearly rising, and in fact following the same curve as for non-VDP
- ports.
- If you work out the details of how many wait cycles are added, you'll find that
- it matches pretty well with the current openMSX emulation model: openMSX
- requires at least 62 cycles @7MHz between two consecutive VDP IO access. (TODO
- see next chapter.
- TODO Comparison with openMSX
- ============================
- TODO repeat these experiments in openMSX and compare.
- [I plan to work on this soon, but need to take a break and want already to
- publish this.]
- Appendix: raw test measurements
- ===============================
- Notes:
- - Test results are written as hex values (all other values are decimal)
- - Comma separated values within one cell are repeated measurements of the same
- test.
- - In some cases I did not repeated a test, in other cases many times. This is
- based on when I 'guessed' the variation in the result was interesting. Or
- when I was expecting some variation, but didn't get it in the initial
- measurements, I kept measuring until I did see a different result.
- - Though that does mean there is some bias in the results. E.g. you cannot
- use these results to get an accurate estimate of the frequency at which the
- different values occur.
- 1) No (extra) OUT instructions
- <body> =
- NOP, repeated n times
- n | Z80 | R800
- --+-------+-----
- 0 | c | 5
- 1 | 11,11 | 5,5
- 2 | 16,16 | 6,6
- 3 | 1b,1b | 6,6
- 4 | 20,20 | 7,7
- 5 | 25,25 | 7,7
- 2a) Single OUT instruction, non-VDP port
- <body> =
- NOP, repeated m times
- OUT (#30),A
- NOP, repeated n times
- m | n | Z80 | R800
- --+---+-------+------------
- 0 | 0 | 18,18 | 16,a,a,a,16
- 0 | 1 | 1d | a,a,17,a,a
- 0 | 2 | 22 | b,b,b,b,b,17
- 0 | 3 | 27,27 | b,b,b,b,18
- 0 | 4 | 2c | 18,c,c,c
- 0 | 5 | 31,31 | c,19,19,c,c
- --+---+-------+------------
- 1 | 0 | 1d,1d | a,a,a,a,a,a,a ... (46 measurements)
- 1 | 1 | 22 | a,a,a,a,a,a,a ... (11 measurements)
- 1 | 2 | 27 | b,b,b,b,b,19,18,17,17
- --+---+-------+------------
- 2 | 0 | 22 | b,b,b,b,17
- 2 | 1 | 27 | b,b,b,b,b,b,b,18
- 2 | 2 | 2c | 19,c,c,c,18,c
- 2b) Single OUT instruction, VDP port
- <body> =
- NOP, repeated m times
- OUT (#98),A
- NOP, repeated n times
- m | n | Z80 | R800
- --|---+-------+------------
- 0 | 0 | 19,19 | a,a,a,a,16
- 0 | 1 | 1e | a,a,a,17
- 0 | 2 | 23 | b,b,b,17
- 0 | 3 | 28 | b,b,b,b,b,18
- 0 | 4 | 2d | c,18,c
- 0 | 5 | 32 | c,19,c,c
- --+---+-------+------------
- 1 | 0 | 1e | a,a,a,16,a,a
- 1 | 1 | 23 | a,17,a,a,17
- 1 | 2 | 28 | b,b,b,b,b,b,b,b,19,17,b,b
- --+---+-------+------------
- 2 | 0 | 23 | b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,17,17,b
- 2 | 1 | 28 | b,b,b,b,b,b,b
- 2 | 2 | 2d | c,c,c,c,c,c,c,c,c,c
- 3a) Two OUT instructions, VDP-port followed by non-VDP port
- <body> =
- OUT (#98),A
- NOP, repeated n times
- OUT (#30),A
- n | Z80 | R800
- --+-------+------------------
- 0 | 25,25 | f,1b,f,f,f,f
- 1 | 2a | 1c,f,f,f,f
- 2 | 2f | 1c,10,10,1c,10,10
- 3b) Two OUT instructions, VDP-port followed by another VDP port
- <body> =
- OUT (#98),A
- NOP, repeated n times
- OUT (#98),A
- n | Z80 | R800
- ---+--------+-------------------------
- 0 | 26,26 | 29,29,29,35,29,29,29
- 1 | 2b | 29,29,35,29,29,29,35
- 2 | 30 | 29,35,29,35,29,29,29
- 5 | 3f | 35,29,29,29,29
- 10 | 58 | 29,29,35,29,29,29,35
- 20 | 8a | 29,29,29,29,35
- 30 | bc,bc | 2a,29,35,29,29,29,29,29
- 40 | ee,ee | 29,29,30,35,29,29,2f,30,29,29,29,29,2f (30 not a typo!)
- 50 | 120,120| 35,29,34,29,29,29,29,34,29,29,29,25 (34 not a typo!)
- 51 | 125 | 29,35,29,29,29,29,29,35,35,29,29
- 52 | 12a | 36,29,35,29,29,29,36,29,36,29,36,29,36,35
- 53 | 12f,12f| 29,29,29,36,29,37,29,29,29,29,36,36,29,29
- 54 | 134 | 2a,2a,2a,2a,2a,2a,37,2a,2a,2a,36,2a,2a,2a,37,2a,2a,2a,36,2a,37
- 55 | 139 | 2a,38,36,38,2a,38,37,2a,37,2a,38,36,2a,37,2a,38
- 56 | 13e | 2b,2b,2b,2b,2b,38,37,2b,2b,37,2b,2b,38
- 57 | 143 | 39,37,37,38,2b,39,38,2b,2b,2b,38,2b,2b,2b,38
- 58 | 148 | 38,39,2c,38,2c,2c,2c,2c,39,38,38,2c,2c,2c,2c,39
- 59 | 14d | 39,2c,39,39,39,2c,38,3a,39,2c,2c,39,2c,2c,2c
- 60 | 152,152| 39,2d,39,2d,3a,2d,3a,39,2d,39,2d
- 65 | 16b | 3d,3b,3c,3b,2f,2f,3d,3c,3b,2f,2f,2f,3c,2f
- 70 | 184,184| 3e,3e,3f,32,3e,3e,32,32,3f,32,3f,32,3e,3f,32
|