turbor-vdp-io.txt

Goal of the experiment
======================

On an MSX2+ machine, or on a turboR machine in Z80 mode, there are extra wait
cycles when accessing a VDP IO port (0x98-0x9b) compared to other IO ports
(these are introduced by the T9769{A,B,C} chip).

In openMSX we also added those wait cycles in R800 mode. That's likely wrong.
We verify this hypothesis in this experiment. Spoiler: in R800 mode there are
indeed no extra wait cycles.

Sub-goals
=========

I don't have my real MSX computers permanently connected anymore. When I do set
them up, it's not too much extra effort to run a few more tests.

Emulation of Z80 timing is pretty straightforward. R800 is more complex. We
know about the following peculiarities:
- Alignment to an even cycle on IO instructions (because the R800, which runs
  at 7MHz, needs to access the IO bus, which runs at 3.5MHz).
- Two accesses to VDP IO ports introduce a delay when those accesses are "too
  close together".
- Refresh behavior. Every so many cycles (current estimate: 210 cycles @7MHz),
  normal R800 instruction execution gets interrupted for a refresh period
  (current estimate: this period lasts 25.5 cycles; best guess for that half
  cycle: there's a 50% chance it needs to align to an even cycle).

Design of the experiment
========================

I (repeatedly) run test programs with the following structure:

    DI
    OUT (#20),A   ; reset counter
    <body>
    IN  A,(#20)   ; latch counter, and read 32-bit value in [DE:HL]
    LD  L,A
    IN  A,(#21)
    LD  H,A
    IN  A,(#22)
    LD  E,A
    IN  A,(#23)
    LD  D,A
    EI
    RET

Where <body> varies from test to test.

Alex Wulms gave me an FPGA test board that implements a 32-bit counter which
runs at 3.5MHz and is accessed via IO ports 0x20-0x23. In openMSX this device
is emulated as the 'hirestimer' extension.
- Writing (any) value to port 0x20 resets the counter to zero.
- The counter increments at a frequency of 3.5MHz (because the MSX IO bus runs
  at this frequency).
- Reading IO port 0x20 latches the value of the counter and returns its 8 least
  significant bits.
- Reading IO ports 0x21-0x23 retrieves the other bits of the latched value.

So in Z80 mode (which runs at 3.5MHz) this board allows cycle-accurate
measurements. But the R800 runs at 7MHz, so there we can only measure with an
accuracy of 2 cycles.
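
To make this resolution argument concrete, here is a small Python sketch of the post-processing of a single measurement. It assumes the four latched bytes end up in D, E, H and L exactly as in the test program above. It follows the reasoning used later in this text, where 5 ticks could be either 9 or 10 R800 cycles. The function names are made up for this illustration.

```python
def latched_ticks(d: int, e: int, h: int, l: int) -> int:
    """Combine the bytes read from ports 0x23..0x20 (stored in D, E, H, L)
    into the 32-bit counter value [DE:HL], in 3.5MHz ticks."""
    for b in (d, e, h, l):
        if not 0 <= b <= 0xFF:
            raise ValueError("each register holds a single byte")
    return (d << 24) | (e << 16) | (h << 8) | l


def r800_cycle_candidates(ticks: int) -> tuple[int, int]:
    """One counter tick spans 2 R800 cycles (7MHz vs 3.5MHz), so a tick count
    only narrows the duration down to two candidates (e.g. 5 -> 9 or 10)."""
    return (2 * ticks - 1, 2 * ticks)


if __name__ == "__main__":
    # Example: the R800 baseline (empty <body>) reads back DE:HL = 0000:0005.
    t = latched_ticks(0x00, 0x00, 0x00, 0x05)
    print(t, r800_cycle_candidates(t))  # 5 (9, 10)
```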

The test code is located at the start of a 256-byte block (specifically at
address 0x0100 in my case). This ensures that none of the instruction fetches
take an extra cycle for the "page-break".
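
As a sanity check of that placement, you can add up the size of the largest test program (test 3b with n=70, described below). This Python sketch uses the standard Z80 instruction lengths: DI, EI, RET, NOP and LD r,r are 1 byte, and OUT (n),A and IN A,(n) are 2 bytes. The helper names are hypothetical.

```python
# Standard Z80 encoding lengths (bytes) of the instructions used in the tests.
SIZE = {"DI": 1, "EI": 1, "RET": 1, "NOP": 1, "LD r,r": 1,
        "OUT (n),A": 2, "IN A,(n)": 2}


def program_size(body: list[str]) -> int:
    """Size of DI + reset + <body> + 4x (IN, LD) + EI + RET."""
    prologue = ["DI", "OUT (n),A"]
    epilogue = ["IN A,(n)", "LD r,r"] * 4 + ["EI", "RET"]
    return sum(SIZE[i] for i in prologue + body + epilogue)


def fits_in_page(start: int, size: int) -> bool:
    """True if bytes start .. start+size-1 all lie in one 256-byte page."""
    return start // 256 == (start + size - 1) // 256


if __name__ == "__main__":
    body_3b = ["OUT (n),A"] + ["NOP"] * 70 + ["OUT (n),A"]
    size = program_size(body_3b)
    print(size, fits_in_page(0x0100, size))  # 91 True
```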

I ran all tests on a turboR GT machine, both in Z80 and in R800 mode.

(Not important) I ran all tests in Compass, because that allows me to easily:
- Edit the test program
- Re-assemble (CTRL-A)
- Execute (CTRL-G)
- Look at the result: the value in registers DE:HL (F4)

The actual tests
================

In this section I'll present _processed_ test results, because often the
details are not important for our main goal and overly detailed results would
only add confusion. However, these details could be important to investigate
the sub-goals in the future. Therefore I still included the raw results in an
appendix.

1) Establish a base-line
------------------------

Replace <body> with a sequence of n NOP instructions.

    n | Z80 | R800
    --+-----+-----
    0 |  12 |    5
    1 |  17 |    5
    2 |  22 |    6
    3 |  27 |    6
    4 |  32 |    7
    5 |  37 |    7

The case n=0 (an empty <body>) is important. It measures the time between
resetting the counter and reading it back immediately afterwards:

    OUT (#20),A   ; reset counter
    IN  A,(#20)   ; read it back

This means that to get the actual duration of the <body>, we must subtract this
n=0 result.

On a Z80, both an 'OUT (n),A' and an 'IN A,(n)' instruction take 12 cycles. So
the value 12 in the above table makes sense. (Actually it also depends on where
exactly within the OUT instruction the counter gets reset and where exactly the
counter gets latched, but apparently that happens at the same relative offsets
in the OUT/IN instructions.)

On Z80, for the n>=1 cases we then get the (totally expected) result that a NOP
instruction takes 5 cycles.

For R800 n=0 we measure 5 cycles at 3.5MHz; this could be either 9 or 10 cycles
at 7MHz. In our current R800 emulation model an IN or OUT instruction takes 9
cycles, plus an optional cycle to 'align' the 7MHz R800 bus with the 3.5MHz IO
bus. So that would be 10 cycles @7MHz in this case, which matches the measured
value.

The case R800 n=1 also measures 5. That's because the cost of the NOP
instruction is "absorbed" by no longer having to wait 1 cycle to align both
buses. Also for n>=2, the measured value only goes up on every 2nd increment
of n.
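
The alignment argument can be turned into a tiny timing model. This Python sketch is a simplification made for illustration, not the openMSX code. It rests on three assumptions:
- Every IN/OUT instruction must start on an even 7MHz cycle and then takes 9 cycles.
- A NOP takes 1 cycle on the R800.
- The counter is reset and latched at the same offset within OUT and IN, so the measurement is just the distance between the starts of the two instructions.

The Z80 column simply follows from 12 cycles for the IO instructions and 5 cycles per NOP.

```python
def align_even(cycle: int) -> int:
    """Round up to the next even 7MHz cycle (an IO access needs a 3.5MHz slot)."""
    return cycle + (cycle % 2)


def r800_baseline_ticks(n: int) -> int:
    """Counter value (3.5MHz ticks) for 'OUT (#20),A; n x NOP; IN A,(#20)'."""
    out_start = 0                 # OUT (#20),A starts on an even cycle
    t = out_start + 9 + n         # OUT takes 9 cycles, each NOP 1 cycle
    in_start = align_even(t)      # IN waits for the next even cycle
    return (in_start - out_start) // 2  # 2 R800 cycles per counter tick


def z80_baseline_ticks(n: int) -> int:
    """On Z80 one cycle is one counter tick: 12 for OUT/IN plus 5 per NOP."""
    return 12 + 5 * n


if __name__ == "__main__":
    print([z80_baseline_ticks(n) for n in range(6)])   # [12, 17, 22, 27, 32, 37]
    print([r800_baseline_ticks(n) for n in range(6)])  # [5, 5, 6, 6, 7, 7]
```

In this model an odd number of NOPs fills the alignment cycle that the IN instruction would otherwise have to wait for. That is the "absorbed" cost described above.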

2a) Measure a single OUT instruction, non-VDP port
--------------------------------------------------

In this group of tests the body of our test program is:

    NOP, repeated m times
    OUT (#30),A
    NOP, repeated n times

I arbitrarily picked IO port 0x30 because it has no function and because (I
hoped) it introduces no extra wait cycles when accessing it. I put a variable
number (possibly zero) of NOP instructions both before and after the OUT
instruction.

Here are the results:

    m | n | Z80 | R800
    --+---+-----+-------------------------
    0 | 0 |  24 | 10 (sometimes 22)
    0 | 1 |  29 | 10 (sometimes 23)
    0 | 2 |  34 | 11 (sometimes 23)
    0 | 3 |  39 | 11 (sometimes 24)
    0 | 4 |  44 | 12 (sometimes 24)
    0 | 5 |  49 | 12 (sometimes 25)
    --+---+-----+-------------------------
    1 | 0 |  29 | 10 (*)
    1 | 1 |  34 | 10 (*)
    1 | 2 |  39 | 11 (sometimes 23,24,25)
    --+---+-----+-------------------------
    2 | 0 |  34 | 11 (sometimes 23)
    2 | 1 |  39 | 11 (sometimes 24)
    2 | 2 |  44 | 12 (sometimes 24,25)

(*) The lack of a 'sometimes' annotation does not mean it never occurs; it
probably means I didn't measure long enough to observe it.

Z80 is completely as expected. Here's the formula that predicts the result:

    12 + 12 + 5*(m+n)

The 'normal' R800 values (ignoring the 'sometimes' annotations) can also be
explained. There are now two IO instructions that can 'absorb' the cost of a
NOP. Thus again the measured value only goes up by 1 for every 2nd increment of
'm' or 'n', independently.
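
Both observations fit simple closed-form formulas. This Python sketch checks them against all 'normal' values of the table above. The R800 formula 10 + floor(m/2) + floor(n/2) is only a summary fitted to the measured values, not something derived from a hardware model. The table contents are transcribed by hand.

```python
# 'Normal' values from table 2a: (m, n) -> (Z80, R800), in 3.5MHz ticks.
TABLE_2A = {
    (0, 0): (24, 10), (0, 1): (29, 10), (0, 2): (34, 11),
    (0, 3): (39, 11), (0, 4): (44, 12), (0, 5): (49, 12),
    (1, 0): (29, 10), (1, 1): (34, 10), (1, 2): (39, 11),
    (2, 0): (34, 11), (2, 1): (39, 11), (2, 2): (44, 12),
}


def z80_2a(m: int, n: int) -> int:
    # baseline (12) + OUT (#30),A (12) + (m+n) NOPs of 5 cycles each
    return 12 + 12 + 5 * (m + n)


def r800_2a(m: int, n: int) -> int:
    # Only every 2nd NOP before or after the OUT adds a tick; the other one is
    # absorbed by an alignment cycle of the following IO instruction.
    return 10 + m // 2 + n // 2


if __name__ == "__main__":
    mismatches = [k for k, v in TABLE_2A.items() if (z80_2a(*k), r800_2a(*k)) != v]
    print(mismatches)  # []
```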

Now for the 'sometimes' values. For example, for R800 m=2, n=0 we mostly
measured the value 11, but from time to time we measured the value 23 instead.
Very roughly speaking, the chance of measuring the alternative value went up
with the duration of the test (so it's mostly visible in test 3b with high 'n'
values).

This phenomenon can be explained by the R800 refresh behavior. At regular
intervals the R800 gets interrupted from normal instruction execution for a
refresh. If such an interruption falls within the test, we measure a longer
duration. This extra duration is 12 or 13 (glossing over the m=1, n=2 result),
which corresponds to ~25 R800 cycles. (This matches the current emulation
model.)
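
The "longer test, more alternatives" observation can be made plausible with a back-of-the-envelope Python sketch. It assumes that refreshes occur strictly every 210 cycles, with a phase that is random relative to the start of the test. This is an assumption, not something these measurements establish. Under it, the chance that a refresh starts inside a window of d cycles is d/210, for d up to 210. As noted in the appendix, the raw data are biased and not suited to verify such frequencies.

```python
REFRESH_PERIOD = 210  # current estimate of the refresh interval, in 7MHz cycles


def refresh_hit_probability(duration_cycles: float) -> float:
    """Chance that a refresh starts inside a window of 'duration_cycles',
    assuming strictly periodic refreshes with a uniformly random phase."""
    return min(1.0, duration_cycles / REFRESH_PERIOD)


def ticks_to_cycles(ticks: int) -> int:
    return 2 * ticks  # one 3.5MHz counter tick = two 7MHz R800 cycles


if __name__ == "__main__":
    # Normal durations: test 2a (m=0, n=0) = 10 ticks, test 3b (n=70) = 50 ticks.
    for ticks in (10, 50):
        print(ticks, round(refresh_hit_probability(ticks_to_cycles(ticks)), 2))
    # The observed extra durations of 12 or 13 ticks, expressed in R800 cycles:
    print([ticks_to_cycles(t) for t in (12, 13)])  # [24, 26]
```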

2b) Measure a single OUT instruction, VDP port
----------------------------------------------

This is the most important test in this experiment (from the point of view of
the original goal). It's the same test as in 2a), but with port 0x30 replaced
by 0x98.

    m | n | Z80 | R800
    --+---+-----+-------------------------
    0 | 0 |  25 | 10 (sometimes 22)
    0 | 1 |  30 | 10 (sometimes 23)
    0 | 2 |  35 | 11 (sometimes 23)
    0 | 3 |  40 | 11 (sometimes 24)
    0 | 4 |  45 | 12 (sometimes 24)
    0 | 5 |  50 | 12 (sometimes 25)
    --+---+-----+-------------------------
    1 | 0 |  30 | 10 (sometimes 22)
    1 | 1 |  35 | 10 (sometimes 23)
    1 | 2 |  40 | 11 (sometimes 23,25)
    --+---+-----+-------------------------
    2 | 0 |  35 | 11 (sometimes 23)
    2 | 1 |  40 | 11 (*)
    2 | 2 |  45 | 12 (*)

Comparing 2a) with 2b) gives:
- Identical results for R800.
- For Z80, 2b) is exactly 1 cycle higher than 2a).

Thus we can confirm our hypothesis:
- On a turboR in Z80 mode, accessing IO ports 0x98-0x9b introduces 1 wait cycle.
- But doing the same in R800 mode does not.
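
The conclusion can be written down as a small rule, shown here as a Python sketch. The function names are hypothetical. The sketch only covers the wait cycle measured in this experiment: 1 extra cycle per access to ports 0x98-0x9b in Z80 mode, none in R800 mode. It does not model the R800 delay between VDP accesses that are close together. As a check, it reproduces the Z80 column of table 2b from the formula of test 2a.

```python
def vdp_io_wait_cycles(port: int, mode: str) -> int:
    """Extra wait cycles (from the T9769) for one access to 'port'."""
    if mode not in ("Z80", "R800"):
        raise ValueError(f"unknown CPU mode: {mode}")
    is_vdp_port = 0x98 <= port <= 0x9B
    return 1 if (is_vdp_port and mode == "Z80") else 0


def z80_single_out(port: int, m: int, n: int) -> int:
    # baseline (12) + OUT (port),A (12 + wait) + (m+n) NOPs of 5 cycles each
    return 12 + 12 + vdp_io_wait_cycles(port, "Z80") + 5 * (m + n)


if __name__ == "__main__":
    print(z80_single_out(0x30, 0, 0), z80_single_out(0x98, 0, 0))  # 24 25
    print(z80_single_out(0x98, 2, 2))                               # 45
```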

3a) Two OUT instructions, VDP-port followed by non-VDP port
-----------------------------------------------------------

OK, so with the previous test we've already accomplished our goal. But we know
that there's something else going on in R800 (but not in Z80) mode, namely wait
cycles between VDP accesses that are "too close together". While we're at it,
let's also do some measurements on those.

To get a new base-line, we first measure an access to a VDP port followed by an
access to a non-VDP port. More specifically, the new test body is:

    OUT (#98),A
    NOP, repeated n times
    OUT (#30),A

Which results in:

    n | Z80 | R800
    --+-----+------------------
    0 |  37 | 15 (sometimes 27)
    1 |  42 | 15 (sometimes 28)
    2 |  47 | 16 (sometimes 28)

3b) Two VDP-OUT instructions
----------------------------

Now the more interesting test. We make the test body:

    OUT (#98),A
    NOP, repeated n times
    OUT (#98),A

And we get:

     n | Z80 | R800
    ---+-----+-------------------------
     0 |  38 | 41 (sometimes 53)
     1 |  43 | 41 (sometimes 53)
     2 |  48 | 41 (sometimes 53)
     5 |  63 | 41 (sometimes 53)
    10 |  88 | 41 (sometimes 53)
    20 | 138 | 41 (sometimes 53)
    30 | 188 | 41 (sometimes 42,53)
    40 | 238 | 41 (sometimes 47,48,53)
    50 | 288 | 41 (sometimes 52,53)
    51 | 293 | 41 (sometimes 53)
    52 | 298 | 41 (sometimes 53,54)
    53 | 303 | 41 (sometimes 54,55)
    54 | 308 | 42 (sometimes 54,55)
    55 | 313 | 42 (sometimes 54,55,56)
    56 | 318 | 43 (sometimes 55,56)
    57 | 323 | 43 (sometimes 55,56,57)
    58 | 328 | 44 (sometimes 56,57)
    59 | 333 | 44 (sometimes 56,57,58)
    60 | 338 | 45 (sometimes 57,58)
    65 | 363 | 47 (sometimes 59,60,61)
    70 | 388 | 50 (sometimes 62,63)

Note: for higher values of 'n', the 'sometimes' values occur more and more
frequently.

The trend is clear. For n<=53 the output remains constant. For n>53 the output
rises linearly again, and in fact follows the same curve as for the non-VDP
port in test 3a).
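
In formula form, the 'normal' R800 values of test 3b are the maximum of a constant 41 and the test 3a curve 15 + floor(n/2). This is a summary fitted to the tables, not a hardware model. The Python sketch below checks it against the hand-transcribed values.

```python
# 'Normal' R800 values (3.5MHz ticks) from tables 3a and 3b.
TABLE_3A = {0: 15, 1: 15, 2: 16}
TABLE_3B = {0: 41, 1: 41, 2: 41, 5: 41, 10: 41, 20: 41, 30: 41, 40: 41,
            50: 41, 51: 41, 52: 41, 53: 41, 54: 42, 55: 42, 56: 43, 57: 43,
            58: 44, 59: 44, 60: 45, 65: 47, 70: 50}


def r800_3a(n: int) -> int:
    return 15 + n // 2            # VDP port followed by a non-VDP port


def r800_3b(n: int) -> int:
    # The second VDP access is held back until a minimum distance has passed;
    # once the NOPs alone take longer than that, the 3a curve takes over.
    return max(41, r800_3a(n))


if __name__ == "__main__":
    assert all(r800_3a(n) == v for n, v in TABLE_3A.items())
    bad = [n for n, v in TABLE_3B.items() if r800_3b(n) != v]
    print(bad)  # []
```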

If you work out the details of how many wait cycles are added, you'll find that
this matches pretty well with the current openMSX emulation model: openMSX
requires at least 62 cycles @7MHz between two consecutive VDP IO accesses.
(TODO: see next chapter.)
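
One way to work out those details is to extend a simplified R800 timing model with a minimum distance between two VDP accesses. You then search for the distance that reproduces the 'normal' values of test 3b. The model is an assumption of this sketch, not a description of the openMSX code:
- Every IO instruction starts on an even 7MHz cycle and takes 9 cycles.
- A NOP takes 1 cycle.
- Refresh is ignored.
- The minimum distance `gap` is measured between the starts of the two VDP OUT instructions.

Under these assumptions a gap of 61 or 62 cycles fits. The 2-cycle measurement resolution can't tell these two apart. Either value is consistent with the 62 cycles used by openMSX.

```python
# A subset of the 'normal' R800 values of table 3b: n -> 3.5MHz ticks.
TABLE_3B = {0: 41, 30: 41, 53: 41, 54: 42, 60: 45, 70: 50}


def align_even(cycle: int) -> int:
    return cycle + (cycle % 2)   # IO instructions start on even 7MHz cycles


def r800_3b_ticks(n: int, gap: int) -> int:
    """Simulate 'OUT (#20),A; OUT (#98),A; n x NOP; OUT (#98),A; IN A,(#20)'."""
    t = 0                                    # OUT (#20),A resets the counter
    first_vdp = align_even(t + 9)            # first OUT (#98),A starts here
    t = first_vdp + 9 + n                    # first VDP OUT + n NOPs
    second_vdp = align_even(max(t, first_vdp + gap))  # respect the minimum gap
    in_start = align_even(second_vdp + 9)    # IN A,(#20) latches the counter
    return in_start // 2                     # 2 R800 cycles per counter tick


def fitting_gaps(candidates: range) -> list[int]:
    return [g for g in candidates
            if all(r800_3b_ticks(n, g) == v for n, v in TABLE_3B.items())]


if __name__ == "__main__":
    print(fitting_gaps(range(40, 80)))  # [61, 62]
```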

TODO Comparison with openMSX
============================

TODO: repeat these experiments in openMSX and compare.
[I plan to work on this soon, but I need to take a break and already want to
publish this.]

Appendix: raw test measurements
===============================

Notes:
- Test results are written as hex values (all other values are decimal).
- Comma-separated values within one cell are repeated measurements of the same
  test.
- In some cases I did not repeat a test, in other cases I repeated it many
  times. This was based on when I 'guessed' the variation in the result was
  interesting. Or, when I was expecting some variation but didn't get it in the
  initial measurements, I kept measuring until I did see a different result.
  - That does mean there is some bias in the results. E.g. you cannot use
    these results to get an accurate estimate of the frequency at which the
    different values occur.
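
For reference, here is one plausible way to derive the processed tables from these raw cells, as a Python sketch. It assumes the most frequent value in a cell is the 'normal' one and every other value is a 'sometimes' value. The function name is hypothetical.

```python
from collections import Counter


def summarize(cell: str) -> tuple[int, list[int]]:
    """Turn a raw cell like '16,a,a,a,16' (hex values) into the decimal
    'normal' value plus the sorted list of alternative ('sometimes') values."""
    values = [int(v, 16) for v in cell.split(",")]
    # most_common() breaks ties by first occurrence in the cell.
    normal, _count = Counter(values).most_common(1)[0]
    return normal, sorted(set(values) - {normal})


if __name__ == "__main__":
    print(summarize("16,a,a,a,16"))                                # (10, [22])
    print(summarize("29,29,30,35,29,29,2f,30,29,29,29,29,2f"))     # (41, [47, 48, 53])
```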

1) No (extra) OUT instructions

<body> =
    NOP, repeated n times

    n | Z80   | R800
    --+-------+-----
    0 | c     | 5
    1 | 11,11 | 5,5
    2 | 16,16 | 6,6
    3 | 1b,1b | 6,6
    4 | 20,20 | 7,7
    5 | 25,25 | 7,7

2a) Single OUT instruction, non-VDP port

<body> =
    NOP, repeated m times
    OUT (#30),A
    NOP, repeated n times

    m | n | Z80   | R800
    --+---+-------+-------------------------------------
    0 | 0 | 18,18 | 16,a,a,a,16
    0 | 1 | 1d    | a,a,17,a,a
    0 | 2 | 22    | b,b,b,b,b,17
    0 | 3 | 27,27 | b,b,b,b,18
    0 | 4 | 2c    | 18,c,c,c
    0 | 5 | 31,31 | c,19,19,c,c
    --+---+-------+-------------------------------------
    1 | 0 | 1d,1d | a,a,a,a,a,a,a ... (46 measurements)
    1 | 1 | 22    | a,a,a,a,a,a,a ... (11 measurements)
    1 | 2 | 27    | b,b,b,b,b,19,18,17,17
    --+---+-------+-------------------------------------
    2 | 0 | 22    | b,b,b,b,17
    2 | 1 | 27    | b,b,b,b,b,b,b,18
    2 | 2 | 2c    | 19,c,c,c,18,c

2b) Single OUT instruction, VDP port

<body> =
    NOP, repeated m times
    OUT (#98),A
    NOP, repeated n times

    m | n | Z80   | R800
    --+---+-------+---------------------------------------
    0 | 0 | 19,19 | a,a,a,a,16
    0 | 1 | 1e    | a,a,a,17
    0 | 2 | 23    | b,b,b,17
    0 | 3 | 28    | b,b,b,b,b,18
    0 | 4 | 2d    | c,18,c
    0 | 5 | 32    | c,19,c,c
    --+---+-------+---------------------------------------
    1 | 0 | 1e    | a,a,a,16,a,a
    1 | 1 | 23    | a,17,a,a,17
    1 | 2 | 28    | b,b,b,b,b,b,b,b,19,17,b,b
    --+---+-------+---------------------------------------
    2 | 0 | 23    | b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,17,17,b
    2 | 1 | 28    | b,b,b,b,b,b,b
    2 | 2 | 2d    | c,c,c,c,c,c,c,c,c,c

3a) Two OUT instructions, VDP-port followed by non-VDP port

<body> =
    OUT (#98),A
    NOP, repeated n times
    OUT (#30),A

    n | Z80   | R800
    --+-------+------------------
    0 | 25,25 | f,1b,f,f,f,f
    1 | 2a    | 1c,f,f,f,f
    2 | 2f    | 1c,10,10,1c,10,10

3b) Two OUT instructions, VDP-port followed by another VDP port

<body> =
    OUT (#98),A
    NOP, repeated n times
    OUT (#98),A

     n | Z80     | R800
    ---+---------+--------------------------------------------------------------
     0 | 26,26   | 29,29,29,35,29,29,29
     1 | 2b      | 29,29,35,29,29,29,35
     2 | 30      | 29,35,29,35,29,29,29
     5 | 3f      | 35,29,29,29,29
    10 | 58      | 29,29,35,29,29,29,35
    20 | 8a      | 29,29,29,29,35
    30 | bc,bc   | 2a,29,35,29,29,29,29,29
    40 | ee,ee   | 29,29,30,35,29,29,2f,30,29,29,29,29,2f (30 not a typo!)
    50 | 120,120 | 35,29,34,29,29,29,29,34,29,29,29,25 (34 not a typo!)
    51 | 125     | 29,35,29,29,29,29,29,35,35,29,29
    52 | 12a     | 36,29,35,29,29,29,36,29,36,29,36,29,36,35
    53 | 12f,12f | 29,29,29,36,29,37,29,29,29,29,36,36,29,29
    54 | 134     | 2a,2a,2a,2a,2a,2a,37,2a,2a,2a,36,2a,2a,2a,37,2a,2a,2a,36,2a,37
    55 | 139     | 2a,38,36,38,2a,38,37,2a,37,2a,38,36,2a,37,2a,38
    56 | 13e     | 2b,2b,2b,2b,2b,38,37,2b,2b,37,2b,2b,38
    57 | 143     | 39,37,37,38,2b,39,38,2b,2b,2b,38,2b,2b,2b,38
    58 | 148     | 38,39,2c,38,2c,2c,2c,2c,39,38,38,2c,2c,2c,2c,39
    59 | 14d     | 39,2c,39,39,39,2c,38,3a,39,2c,2c,39,2c,2c,2c
    60 | 152,152 | 39,2d,39,2d,3a,2d,3a,39,2d,39,2d
    65 | 16b     | 3d,3b,3c,3b,2f,2f,3d,3c,3b,2f,2f,2f,3c,2f
    70 | 184,184 | 3e,3e,3f,32,3e,3e,32,32,3f,32,3f,32,3e,3f,32