  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  3. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  4. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  5. <head>
  6. <link title="Purple" rel="stylesheet" href="manual-purple.css" type="text/css" />
  7. <title>V9938 VRAM timings, part II</title>
  8. </head>
  9. <body>
  10. <h1>V9938 VRAM timings, part II</h1>
  11. Measurements done by: Joost Yervante Damad, Alex Wulms, Wouter Vermaelen<br/>
  12. Analysis done by: Wouter Vermaelen<br/>
  13. Text written by: Wouter Vermaelen<br/>
  14. with help from the rest of the openMSX team.
  15. <h2>Follow-up</h2>
  16. <h5>V9938</h5>
  17. <p>This text is a follow-up on the earlier <a href="vdp-timing.html">V9938 VRAM
  18. timings</a> document. If you haven't done so already it's probably a very good
  19. idea to (re-)read that document first, because this text assumes you remember
  20. all the earlier details ;-)</p>
  21. <p>In early 2013 the openMSX team made some logic-analyzer measurements of the
  22. actual communication between a V9938 and the connected VRAM. The goal of those
  23. measurements was to improve the emulation of the VDP command engine in openMSX.
  24. That goal was fully achieved: to the best of our knowledge the timing of the
  25. VDP command engine is now fully accurate. For example, recent versions of openMSX
  26. generate pictures identical to those of real MSX machines for the LINE-speed test
  27. picture shown in the <a href="vdp-timing.html#motivation">motivation</a>
  28. section of the previous document.</p>
  29. <p>A nice side effect of these measurements was that, next to the command
  30. engine speed, we also obtained information about what happens when the CPU
  31. (Z80) reads or writes the VRAM too fast, and what exactly is "too fast". This
  32. behaviour is now also implemented in openMSX, though so far only for the V9938
  33. bitmap screen modes (the information in this text should allow us to also
  34. implement the timing for the other screen modes).</p>
  35. <p>Because the focus was the command engine and because a V9938 can only
  36. execute commands in bitmap screen modes (screen 5-8), our measurements were
  37. mostly focused on those screen modes. And especially the analysis of the
  38. results was initially focused on the bitmap screen modes. In this text we will
  39. now look at the other screen modes, even though there is less measurement data
  40. available for those modes.</p>
  41. <h5>TMS9918</h5>
  42. <p>The second half of this text looks at the TMS9918 VDP. We only made
  43. measurements on a V9938 and as we'll see below the results cannot be
  44. extrapolated to TMS9918. Luckily there are other sources of information
  45. available that allow us to piece together similar (though less detailed) timing
  46. diagrams for the TMS9918 as for the V9938.</p>
  47. <h5>results</h5>
  48. <p>As in the previous document, we'll start by presenting all the
  49. results in one big diagram. I strongly recommend opening this image in an image
  50. viewer that makes it easy to scroll and zoom in and out (so maybe not a
  51. web-browser). It may also be useful to have this image open while reading the
  52. later sections in this text.</p>
  53. <img src="vdp-timing-v2.png" width="1200"/>
  54. <p>At the top of this diagram the results of the previous article are repeated;
  55. these are the V9938 bitmap modes. In the middle are the V9938 character- and
  56. text-modes. And at the bottom you see the TMS9918 results. They're all included
  57. in one big diagram to make them easier to compare.</p>
  58. <p>Horizontally you see the detailed timing of one display line. On V9938 one
  59. line takes 1368 cycles, on TMS9918 it takes 342 cycles. The line is divided in
  60. different phases (indicated by the different background colors) corresponding
  61. to the left- and right-border, display cycle, etc. Notice that the text modes
  62. have different border widths than the other modes. Also notice that the lengths
  63. of these periods are not exactly the same between the two VDP types.</p>
  64. <p>Because these horizontal phases don't fully correspond between the two VDP
  65. types, I had to make a choice in how to align the results of both VDPs. I
  66. chose to align the display cycle (the period where the actual pixels are
  67. shown). This does mean that 'cycle 0' is not located in exactly the same
  68. horizontal position for both VDPs. But this doesn't matter: as explained in the
  69. previous article, 'cycle 0' is an arbitrary choice anyway. Just keep it in mind
  70. when comparing the results.</p>
  71. <p>For the V9938 the diagram shows the RAS and CAS0/1 signals, for TMS9918 that
  72. information is missing. But that's OK: on V9938 there are burst and non-burst
  73. accesses, there are idle cycles, dummy accesses, etc. So the RAS/CAS signals do
  74. provide extra information. As we'll see in the 2nd half of this text, on
  75. TMS9918 the communication with the VRAM is much simpler, so the RAS/CAS signals
  76. wouldn't contribute as much extra information anyway.</p>
  77. <h2>V9938 (MSX2 VDP)</h2>
  78. <h3>Character modes</h3>
  79. <p>In this section we'll look at the 3 character-based display modes: Graphics
  80. Mode 1, Graphics Mode 2 and Multi-color Mode (aka screen 1, 2 and 3). As we'll
  81. see, from a VRAM access point of view, these 3 modes are closely related.</p>
  82. <h4>Graphics mode 2 (aka screen 2)</h4>
  83. <h5>refresh</h5>
  84. <p>The VRAM needs to be regularly refreshed. The V9938 does this by regular
  85. reads (from all banks of the DRAM chips). Those reads are located at the
  86. following moments in time (VDP cycle counts within one display line). Remember
  87. that one display line has 1368 VDP cycles and that cycle 0 is semi-arbitrarily
  88. chosen within the line (so only relative numbers matter).</p>
  89. <table>
  90. <tr><td>284</td><td>412</td><td>540</td><td>668</td>
  91. <td>796</td><td>924</td><td>1052</td><td>1180</td></tr>
  92. </table>
  93. <p>Note that these are the exact same locations as for the bitmap screen
  94. modes.</p>
  95. <h5>character rendering</h5>
  96. <p>To be able to render each character we need to:</p>
  97. <ul>
  98. <li>Read 1 byte from the name-table &rarr;
  99. Which character should be displayed?</li>
  100. <li>Read 1 byte from the pattern-table &rarr;
  101. How does (this line of) that character look?</li>
  102. <li>Read 1 byte from the color-table &rarr;
  103. What are the colors of (this line of) that character?</li>
  104. </ul>
  105. <p>There are 32 characters on a line, so these 3 reads also repeat 32 times.
  106. The name-, pattern- and color-table reads are located at these respective
  107. moments in time:</p>
  108. <table>
  109. <tr><td>214 + 32n + 0</td><td>214 + 32n + 18</td><td>214 + 32n + 24</td><td>with 0 &le; n &lt; 32</td></tr>
  110. </table>
  111. <p>Notice there's some room between the name-table read and the
  112. pattern-/color-table reads. <i>The address required to read from the latter two
  113. tables depends on the result obtained from the first read. So possibly there's
  114. extra room at this location to give the VDP more time to calculate those
  115. addresses.</i> In fact exactly 2&times;6 cycles fit between the name- and
  116. pattern-table read. This leaves room for 2 other VRAM accesses:</p>
  117. <ul>
  118. <li>The first of those is used for refresh (once every 4 characters) or as a
  119. potential CPU/cmd access slot.</li>
  120. <li>The second access is used to read 32 sprite y-coordinates. These are
  121. required to figure out which of the 32 possible sprites are visible on
  122. the next display line.</li>
  123. </ul>
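<p>As a small illustration (not something that was measured separately), the
per-character schedule above can be written down as a short Python sketch. The
cycle numbers come from the tables above. The names used (character_schedule,
EXTRA_1, ...) are made up for this example, and placing the first spare access
at offset 6, directly after the 6-cycle name-table read, is an assumption that
happens to line up with the refresh positions.</p>
<pre>
```python
# Graphics mode 2 on V9938: VRAM accesses for the 32 characters of one line.
LINE_START = 214          # cycle of the first name-table read
CHAR_PERIOD = 32          # VDP cycles per character
NUM_CHARS = 32
NAME, PATTERN, COLOR = 0, 18, 24   # offsets within one character (from the table)
EXTRA_1 = 6               # ASSUMPTION: first spare access follows the name read
REFRESH = {284, 412, 540, 668, 796, 924, 1052, 1180}  # refresh table above


def character_schedule(n):
    """Return the access times (in VDP cycles) for character n (0..31)."""
    base = LINE_START + CHAR_PERIOD * n
    extra = base + EXTRA_1
    return {
        "name": base + NAME,
        "pattern": base + PATTERN,
        "color": base + COLOR,
        "extra_1": extra,
        "extra_1_use": "refresh" if extra in REFRESH else "cpu/cmd",
    }


schedule = [character_schedule(n) for n in range(NUM_CHARS)]
refresh_hits = [s["extra_1"] for s in schedule if s["extra_1_use"] == "refresh"]
cpu_slots = [s["extra_1"] for s in schedule if s["extra_1_use"] == "cpu/cmd"]

print("refresh positions:", refresh_hits)
print("potential CPU/cmd slots:", cpu_slots)
```
</pre>
<p>With these numbers all 8 refresh positions fall on such a first spare
access (once every 4 characters), and the remaining 24 positions (220, 252,
316, ...) are the candidates for CPU/cmd access within the display area.</p>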
  124. <h5>sprite rendering</h5>
  125. <p>Above we saw there are 32 reads to figure out which sprites are visible. In
  126. this section we'll see what additional VRAM reads are performed to actually
  127. render the visible sprites.</p>
  128. <p>In sprite mode 1 there are at most 4 sprites visible on a line. For each
  129. of those we need to read the sprite attributes (this is the x- and
  130. y-coordinate, the sprite pattern number and the sprite color). And we also need
  131. to read 1 or 2 bytes from the sprite-pattern-table (for 8&times;8 or
  132. 16&times;16 sprites). Similar to sprite mode 2, the y-coordinate is re-read
  133. (the sprite-visibility pass already read it). The VDP always reads 2 sprite
  134. pattern bytes (for 8&times;8 sprites that 2nd byte is ignored). And even if
  135. fewer than 4 sprites are actually visible, the VDP always performs VRAM reads
  136. for all 4 sprites (and ignores the results of the redundant reads).</p>
  137. <ul>
  138. <li>The 4 sprite attributes are read (using a burst read) at times:
  139. <table>
  140. <tr><td>x+0</td><td>x+4</td><td>x+8</td><td>x+12</td><td>with x = 1242, 1306, 6, 70</td></tr>
  141. </table>
  142. </li>
  143. <li>The 2 sprite pattern bytes are read (also using a burst read) at times:
  144. <table>
  145. <tr><td>x+0</td><td>x+4</td><td>with x = 1274, 1342, 38, 102</td></tr>
  146. </table>
  147. </li>
  148. </ul>
  149. <p>Next there are a bunch of dummy reads:</p>
  150. <ul>
  151. <li>The burst that reads the 4 attribute bytes is extended with 2 dummy reads
  152. (often but not always these seem to read the x,y-coordinates of the 31st
  153. sprite).</li>
  154. <li>For every (burst) read from the sprite-pattern-table, 16 cycles later,
  155. there's a second dummy (burst) read of 2 bytes from the sprite pattern
  156. table.</li>
  157. <li>Right after every read from the sprite-pattern-table (4&times;2 times),
  158. there's a 1 byte read from closely before the start of the
  159. sprite-attribute-table (where in sprite mode 2 the sprite-color-table is
  160. located).</li>
  161. </ul>
  162. <p>(As you may have guessed already) all these dummy reads can be explained by
  163. looking at how the V9938 implements sprite mode 2:</p>
  164. <ul>
  165. <li>Per group of 2 sprites (4 groups), sprite mode 2 reads 6 bytes from the
  166. sprite attribute table (using a single(*) burst read). Sprite mode 1 reads
  167. 4&times;(4+2) bytes (in a single burst). In both cases that's 24 bytes
  168. total.</li>
  169. <li>In total, sprite mode 2 reads 8&times;2 bytes from the
  170. sprite-pattern-table. Sprite mode 1 reads (4+4)&times;2 bytes.</li>
  171. <li>In total, sprite mode 2 reads 8 bytes from the sprite-color-table. Sprite
  172. mode 1 does 8 dummy reads.</li>
  173. </ul>
  174. <p>(*)<i>Actually, this is not true. But I believe this is how it was
  175. intended: the 6 bytes are located in the same DRAM bank. In reality the
  176. V9938 performs 2 burst reads of 3 bytes each. But because there are only
  177. 28 cycles available it has to use a, strictly speaking, invalid DRAM
  178. timing. See previous article for more details.</i></p>
  179. <p>So the timing of all sprite mode 2 reads is (almost) identical to all the
  180. sprite mode 1 normal + dummy reads, only slightly shifted in time.</p>
  181. <p><i>I guess all these dummy reads in sprite mode 1 are the result of
  182. chip-area optimizations done by Yamaha engineers. In the current design the
  183. same state machine can generate the CAS/RAS signals for both sprite modes. And
  184. similarly a large part of the address generation logic for sprite mode 1 and 2
  185. can be shared.</i> But unfortunately it does mean the VRAM bandwidth used for
  186. those dummy reads is not available for CPU VRAM access or (on V9958) for
  187. command execution.</p>
  188. <h5>dummy reads</h5>
  189. <p>Next to all the dummy reads related to sprite rendering, there are also
  190. dummy reads at these moments in time:</p>
  191. <table>
  192. <tr><td>182</td><td>200</td><td>206</td><td>1218</td></tr>
  193. </table>
  194. <p>These are the locations where you would expect respectively reads from the
  195. name-, pattern-, color-, and sprite-attribute-table if you'd extend a display
  196. line to 33 instead of 32 characters. This is similar to how in the bitmap
  197. screen modes there is a dummy preamble for the bitmap data and a dummy
  198. postamble for the sprite y-coordinate. All 4 dummy reads read address
  199. 0x1ffff.</p>
  200. <h5>CPU/command-engine access slots</h5>
  201. <p>These are the available CPU/cmd access slots:</p>
  202. <table>
  203. <tr><td>32</td><td>96</td><td>166</td><td>174</td><td>188</td>
  204. <td>220</td><td>252</td><td>316</td><td>348</td><td>380</td></tr>
  205. <tr><td>444</td><td>476</td><td>508</td><td>572</td><td>604</td>
  206. <td>636</td><td>700</td><td>732</td><td>764</td><td>828</td></tr>
  207. <tr><td>860</td><td>892</td><td>956</td><td>988</td><td>1020</td>
  208. <td>1084</td><td>1116</td><td>1148</td><td>1212</td><td>1268</td></tr>
  209. <tr><td>1334</td></tr>
  211. </table>
  212. <p>This is very similar to the CPU/cmd slots in bitmap mode with sprites
  213. enabled. In both cases there are 31 available slots and the largest distance
  214. between (the start of) two slots is 70 VDP cycles. Only some of the slots
  215. are slightly shifted when comparing both modes.</p>
  216. <h4>Graphics mode 1 (aka screen 1)</h4>
  217. <p>From a VRAM access point of view this mode is quasi identical to Graphics
  218. mode 2. It does exactly the same number of VRAM reads as Graphics mode 2 at
  219. exactly the same moments in time. Only the addresses in the color-table are a
  220. bit different. So it does re-read the same color information for all 8 lines
  221. in a character.</p>
  222. <h4>Multi-color mode (aka screen 3)</h4>
  223. <p>Maybe surprisingly, from a VRAM access point of view, this mode is also
  224. quasi identical to Graphics mode 2. This mode doesn't have a color-table. But
  225. the V9938 still uses the same VRAM access schema as it uses for Graphics mode
  226. 2. The reads from the color-table are replaced with dummy reads from address
  227. 0x1ffff, but all the rest is identical.</p>
  228. <p>So unfortunately the VRAM bandwidth used to access the color-table does not
  229. become available for CPU-VRAM access (or the command engine on V9958).</p>
  230. <h3>Text modes</h3>
  231. <p>Now we'll look at the two text modes. We'll again see that, from a VRAM
  232. access point of view, both modes are very similar.</p>
  233. <h4>Text mode 2 (aka screen 0, width 80)</h4>
  234. <h5>refresh</h5>
  235. <p>In all V9938 bitmap and character modes the refresh was handled identically.
  236. For text modes it's different. Now there are only 7 (instead of 8) refresh
  237. reads per display line and they are clustered together near the start of the
  238. line, located at these moments in time:</p>
  239. <table>
  240. <tr><td>74</td><td>82</td><td>90</td><td>98</td>
  241. <td>106</td><td>114</td><td>122</td></tr>
  242. </table>
  243. <p><i>So apparently 7 refreshes per line are enough to keep the DRAM content
  244. intact. Too bad the other screen modes use 8. Also using only 7 could have made
  245. command execution slightly faster.</i></p>
  246. <h5>dummy reads</h5>
  247. <p>There are 2 dummy reads from address 0x1ffff located at:</p>
  248. <table>
  249. <tr><td>230</td><td>238</td></tr>
  250. </table>
  251. <h5>text rendering</h5>
  252. <p>The text rendering itself is pretty straight-forward. We need to read 80
  253. bytes from the name-table and also 80 bytes from the pattern-table (the
  254. pattern-addresses depend on the values read from the name-table). For the blink
  255. color feature we also need 80 bits (10 bytes) from the color-table.</p>
  256. <p>Rendering is performed in 20 groups of 4 characters. Each group starts
  257. reading 4 bytes from the name-table using a burst read, these reads are
  258. located at:</p>
  259. <table>
  260. <tr><td>g+0</td><td>g+4</td><td>g+8</td><td>g+12</td></tr>
  261. </table>
  262. <p>with g one of:</p>
  263. <table>
  264. <tr><td>246</td><td>294</td><td>342</td><td>390</td><td>438</td>
  265. <td>486</td><td>534</td><td>582</td><td>630</td><td>678</td></tr>
  266. <tr><td>726</td><td>774</td><td>822</td><td>870</td><td>918</td>
  267. <td>966</td><td>1014</td><td>1062</td><td>1110</td><td>1158</td></tr>
  268. </table>
  269. <p>Next we read 1 byte from the color-table (at cycle g+18). This gives 8 bits,
  270. so we only need to do this for every other group (in the other group this
  271. access is used as a CPU/cmd access slot). Last we read 4 bytes from the pattern
  272. table. These must be non-burst reads because potentially bits 15-8 of the
  273. pattern-address for each character are different. These reads start at:</p>
  274. <table>
  275. <tr><td>g+24</td><td>g+30</td><td>g+36</td><td>g+42</td></tr>
  276. </table>
  277. <p>Note that combined all reads in 1 group take 48 cycles, and that's also the
  278. distance between 2 groups (in character mode there were a few spare cycles in
  279. a group). So it is really required to process the characters in groups of 4,
  280. otherwise burst reads aren't possible and all the required VRAM accesses don't
  281. fit in the available cycle budget (1 narrow pixel is 2 VDP cycles, thus 4
  282. characters of 6 pixels each take 48 cycles).</p>
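<p>The group layout described above can be sketched in a few lines of Python.
The offsets are the ones listed in the tables. The function and field names
are invented for this example, the 6-cycle length of a single non-burst read
is the V9938 access time mentioned elsewhere in this text, and the choice that
the even-numbered groups are the ones doing the color-table read is an
assumption (it is consistent with slots such as 312 = 294+18 in the slot table
of this mode).</p>
<pre>
```python
# Text mode 2 on V9938: 20 groups of 4 characters per display line.
FIRST_GROUP = 246         # cycle where the first group starts
GROUP_PERIOD = 48         # VDP cycles between two groups
NUM_GROUPS = 20
ACCESS_CYCLES = 6         # length of one non-burst access


def group_schedule(k):
    """Return the access times (in VDP cycles) of group k (0..19)."""
    g = FIRST_GROUP + GROUP_PERIOD * k
    return {
        "g": g,
        "name": [g + d for d in (0, 4, 8, 12)],        # one burst read
        "color_or_cpu": g + 18,
        "uses_color": k % 2 == 0,                      # ASSUMPTION, see lead-in
        "pattern": [g + d for d in (24, 30, 36, 42)],  # 4 non-burst reads
    }


groups = [group_schedule(k) for k in range(NUM_GROUPS)]
color_reads = [grp["color_or_cpu"] for grp in groups if grp["uses_color"]]
cpu_slots_in_display = [grp["color_or_cpu"] for grp in groups if not grp["uses_color"]]
# Cycles from the start of a group to the end of its last pattern read.
group_length = groups[0]["pattern"][-1] + ACCESS_CYCLES - groups[0]["g"]

print("group starts:", [grp["g"] for grp in groups])
print("CPU/cmd slots inside the display area:", cpu_slots_in_display)
print("cycles used per group:", group_length)
```
</pre>
<p>For every group the last pattern read ends 48 cycles after the group
started, so the groups are packed back to back. Under the stated assumption
the odd-numbered groups contribute the slots 312, 408, ..., 1176.</p>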
  283. <h5>CPU/command-engine access slots</h5>
  284. <p>These are the positions of the available CPU/cmd access slots:</p>
  285. <table>
  286. <tr><td>2</td><td>10</td><td>18</td><td>26</td><td>34</td>
  287. <td>42</td><td>50</td><td>58</td><td>66</td><td>166</td></tr>
  288. <tr><td>174</td><td>182</td><td>190</td><td>198</td><td>206</td>
  289. <td>214</td><td>222</td><td>312</td><td>408</td><td>504</td></tr>
  290. <tr><td>600</td><td>696</td><td>792</td><td>888</td><td>984</td>
  291. <td>1080</td><td>1176</td><td>1206</td><td>1214</td><td>1222</td></tr>
  292. <tr><td>1230</td><td>1238</td><td>1246</td><td>1254</td><td>1262</td>
  293. <td>1270</td><td>1278</td><td>1286</td><td>1294</td><td>1302</td></tr>
  294. <tr><td>1310</td><td>1318</td><td>1326</td><td>1336</td><td>1346</td>
  295. <td>1354</td><td>1362</td></tr>
  296. </table>
  297. <p>There are 47 access slots, but they are very unevenly distributed. Often the
  298. distance between two slots is 96 cycles, and one time even 100 VDP cycles! This
  299. means that, even though there are more slots compared to bitmap/character mode,
  300. the Z80 must access the VRAM more slowly in this mode! So to be safe there must
  301. be 20 Z80 cycles between two CPU-VRAM accesses (see previous article for the
  302. details of this calculation).</p>
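<p>The distances quoted above are easy to recompute from the slot tables. The
following Python sketch (the helper name slot_gaps is made up for this
example) takes the slot positions of the character modes and of the text modes
as listed in this text and computes the distance between successive slots,
wrapping around at the end of the 1368-cycle line.</p>
<pre>
```python
CYCLES_PER_LINE = 1368

# CPU/cmd slots in Graphics mode 2 (character modes), copied from the table.
CHARACTER_SLOTS = [
    32, 96, 166, 174, 188, 220, 252, 316, 348, 380,
    444, 476, 508, 572, 604, 636, 700, 732, 764, 828,
    860, 892, 956, 988, 1020, 1084, 1116, 1148, 1212, 1268,
    1334,
]

# CPU/cmd slots in Text mode 2, copied from the table.
TEXT_SLOTS = [
    2, 10, 18, 26, 34, 42, 50, 58, 66, 166,
    174, 182, 190, 198, 206, 214, 222, 312, 408, 504,
    600, 696, 792, 888, 984, 1080, 1176, 1206, 1214, 1222,
    1230, 1238, 1246, 1254, 1262, 1270, 1278, 1286, 1294, 1302,
    1310, 1318, 1326, 1336, 1346, 1354, 1362,
]


def slot_gaps(slots, period=CYCLES_PER_LINE):
    """Distances between successive slot starts, including the wrap-around
    from the last slot of a line to the first slot of the next line."""
    ordered = sorted(slots)
    following = ordered[1:] + [ordered[0] + period]
    return [b - a for a, b in zip(ordered, following)]


char_gaps = slot_gaps(CHARACTER_SLOTS)
text_gaps = slot_gaps(TEXT_SLOTS)

print("character modes:", len(CHARACTER_SLOTS), "slots, largest gap", max(char_gaps))
print("text modes:     ", len(TEXT_SLOTS), "slots, largest gap", max(text_gaps))
```
</pre>
<p>This gives 31 slots with a largest distance of 70 cycles for the character
modes, and 47 slots with a largest distance of 100 cycles (between 66 and 166)
for the text modes.</p>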
  303. <h4>Text mode 1 (aka screen 0, width 40)</h4>
  304. <p>As mentioned before, from a VRAM access point of view, Text mode 1 is
  305. similar to Text mode 2. Actually from a VRAM access allocation point of view
  306. it's identical, only the actual VRAM read addresses are different. This may
  307. seem strange because Text mode 1 logically needs a lot less data than Text mode
  308. 2. This is because over half of the performed reads are dummy reads:</p>
  309. <ul>
  310. <li>In Text mode 1 there are still 4 (burst) reads from the name-table, but the
  311. 3rd and 4th are dummy reads. In our (limited) measurements the address for
  312. read 3 and 4 was the same as for read 2, but with CAS1 active instead of
  313. CAS0 (but that doesn't really matter as the result is ignored).</li>
  314. <li>There are also 10 dummy reads at the locations that are reserved for reads
  315. from the color-table. All these read address 0x1ffff.</li>
  316. <li>And similarly there are 4 reads from the pattern-table, but the 3rd and
  317. 4th read address 0x1ffff.</li>
  318. </ul>
  319. <h5>CPU/command-engine access slots</h5>
  320. <p>The available CPU/cmd access slots are identical to those in Text mode 2.
  321. <i>It's unfortunate there are so many dummy reads in this mode. If this weren't
  322. the case, the available VRAM bandwidth for CPU accesses (or commands on V9958)
  323. could have been a lot higher. Especially because the Z80 already cannot access
  324. VRAM very fast in this mode. And also because, as we'll see below, on TMS9918
  325. there's no such timing constraint for this mode.</i></p>
  326. <h3>Stuff not measured</h3>
  327. <p>There are a number of V9938 cases we don't have measurements for. As already
  328. said in the beginning of this text, the original goal of these measurements was
  329. to improve the accuracy of the command engine emulation. And we only had a very
  330. limited amount of time the day we did this experiment. Re-doing the experiment
  331. is certainly possible and not even that hard. But it takes a lot of time to
  332. set up and, as we'll see below, wouldn't yield much useful extra knowledge. But of
  333. course I'd be very happy to hear from other people who do want to repeat and/or
  334. extend our measurements!</p>
  335. <h4>Graphics mode 3 (aka screen 4)</h4>
  336. <p>This very likely uses the same timing as Graphics mode 2 (aka screen 2), but
  337. with the address generation logic of sprite mode 1 replaced with the one for
  338. sprite mode 2. The timing of all VRAM accesses, even the sprite accesses, can
  339. remain identical. <i>The existence of this screen mode might have been
  340. (another) reason to design the timing of sprite mode 1 on V9938 in such a
  341. strange way.</i></p>
  342. <h4>Character modes with sprites disabled</h4>
  343. <p>In bitmap modes, when sprite rendering is disabled, the VRAM bandwidth that
  344. was allocated to sprite rendering becomes available for CPU/cmd accesses. Likely
  345. the same is true for character modes (and text modes never have sprites).
  346. Unfortunately since we didn't measure this combination we don't know exactly
  347. where those slots are located. But it should be possible to make a very
  348. reasonable estimate.</p>
  349. <h4>Text modes with screen disabled (or vertical border)</h4>
  350. <p>All non-text modes behave identically when screen display is disabled (and the
  351. behavior during screen disabled is identical to the behavior during vertical
  352. border lines): the VRAM reads for screen- and sprite-rendering are gone and
  353. replaced by CPU/cmd access slots, but e.g. the refresh accesses remain. In the
  354. 2 text modes those refresh accesses are located in different positions compared
  355. to bitmap/character modes. It's not known whether:</p>
  356. <ul>
  357. <li>Screen-disabled in text-mode is the same as screen-disabled in the other
  358. screen modes.</li>
  359. <li>Or whether it has a dedicated schema with the refresh-reads in the same
  360. positions as in the text-mode screen-enabled case.</li>
  361. <!-- TODO S#2 HR bit in vertical border, screen-disabled -->
  362. </ul>
  363. <p>This might make a difference for the exact position of the CPU/cmd access
  364. slots. But because there are usually plenty of slots available in screen-off
  365. mode, this likely won't matter (much).</p>
  366. <h2>TMS9918 (MSX1 VDP)</h2>
  367. <p>All our measurements were performed on a V9938 (MSX2 VDP). It's very likely
  368. we can extrapolate the results to a V9958 (MSX2+ VDP). But for sure we cannot
  369. use these results to derive anything meaningful for the TMS9918. Fortunately
  370. there already is some interesting information available in
  371. <a href="http://spatula-city.org/~im14u2c/vdp-99xx/">these documents</a> from
  372. Karl Guttag. Especially this
  373. <a href="http://spatula-city.org/~im14u2c/vdp-99xx/e2/1978_9918_Master_Timing_by_Serigio_Maggi.jpg">
  374. timing picture</a> looks promising. Combined with information found in the
  375. <a href="http://map.grauw.nl/resources/video/texasinstruments_tms9918.pdf">
  376. TMS9918 application manual</a> I was able to deduce the stuff below. <i>This
  377. wasn't easy because that timing picture does contain some (confusing) mistakes,
  378. although I can easily forgive those mistakes because drawing this stuff by hand
  379. is very tedious</i> ;-)</p>
  380. <h5>general (memory) timings</h5>
  381. <p>The TMS9918 runs at 5.37MHz (1.5&times;3.58MHz, 4&times; slower than the
  382. V9938). One display line takes 342 cycles (as expected, 4&times; fewer than on
  383. V9938). One memory access takes only 2 cycles or 372ns. So compared to V9938
  384. each memory access takes slightly longer (on V9938 one access takes 6 cycles or
  385. 279ns). The TMS9918 never uses burst memory reads.</p>
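<p>The numbers in this paragraph follow directly from the clock frequencies. A
quick Python check, assuming the exact frequency behind the rounded "3.58MHz"
is 3.579545MHz, and taking the V9938 clock as 4 times the TMS9918 clock (as
stated above). The helper name duration_ns is only used for this example.</p>
<pre>
```python
BASE_MHZ = 3.579545            # ASSUMPTION: exact value behind "3.58MHz"
TMS9918_MHZ = 1.5 * BASE_MHZ   # about 5.37MHz
V9938_MHZ = 4 * TMS9918_MHZ    # the TMS9918 is 4 times slower than the V9938


def duration_ns(cycles, clock_mhz):
    """Duration of a number of clock cycles in nanoseconds."""
    return cycles / clock_mhz * 1000.0


tms_access_ns = duration_ns(2, TMS9918_MHZ)       # one TMS9918 memory access
v9938_access_ns = duration_ns(6, V9938_MHZ)       # one V9938 non-burst access
tms_line_us = duration_ns(342, TMS9918_MHZ) / 1000.0
v9938_line_us = duration_ns(1368, V9938_MHZ) / 1000.0

print(f"TMS9918 access: {tms_access_ns:.1f}ns, V9938 access: {v9938_access_ns:.1f}ns")
print(f"line time: {tms_line_us:.2f}us (TMS9918), {v9938_line_us:.2f}us (V9938)")
```
</pre>
<p>Both VDPs end up with the same duration for one display line, which is what
you'd expect since both have to produce the same video signal.</p>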
  386. <h5>Graphics mode 2 (aka screen 2)</h5>
  387. <p>For the actual arrangement of the accesses I'll refer to the big timing
  388. diagram (see top of this article). Most things are reasonably straight-forward.
  389. One notable thing is the arrangement of the 32 sprite y-coordinate reads (for
  390. the visibility check): the first 8 follow a different pattern than the last 24.
  391. This avoids overly long periods without a CPU VRAM access slot. Reading
  392. the other sprite data (during the horizontal border) also shows some
  393. irregularities. As on V9938, the y-coordinates of the visible sprites are
  394. read twice, but apart from these 4 redundant reads, there are no dummy reads or
  395. idle cycles (unlike V9938). It is not known whether the TMS9918 performs reads
  396. for sprites that are not actually visible or whether those slots are available for
  397. CPU access (V9938 performs dummy reads).</p>
  398. <p>The TMS9918 application manual mentions in section 2.1.5 "&hellip; CPU
  399. windows occur once every 16 memory cycles &hellip;". This confirms the
  400. above.</p>
  401. <h5>Graphics mode 1 (aka screen 1)</h5>
  402. <p>I couldn't find anything specific about the timing of this mode in the above
  403. documentation. But because it requires the same number of reads from VRAM as
  404. Graphics mode 2, it's logical to assume the timing is identical.</p>
  405. <h5>Multi-color mode (aka screen 3)</h5>
  406. <p>The documentation also doesn't have specific timing information for this
  407. mode. In this mode the color-table isn't used, so one possibility is that
  408. accesses to the color-table are replaced by CPU access slots (not the case on
  409. V9938). This is confirmed by the following quote from the application manual,
  410. section 2.1.5 "&hellip; in the Multicolor mode, CPU windows occur at least once
  411. out of every four memory cycles &hellip;". Though when you look at the
  412. sprite-accesses in the horizontal border area this quote isn't true: there's
  413. still one location where there are 15 memory cycles between 2 CPU access slots!
  414. <i>On the other hand, it would be possible to distribute the sprite and cpu
  415. accesses more evenly in the horizontal border. So maybe that hand-drawn timing
  416. picture is wrong? Or maybe it doesn't correspond to the final TMS9918
  417. design?</i></p>
  418. <h5>Text mode 1 (aka screen 0, width 40)</h5>
  419. <p>Again see the big timing diagram. There's nothing really special about this
  420. mode. The following quote from the application manual confirms this
  421. arrangement: section 2.1.5 "&hellip; In the Text mode the CPU windows occur at
  422. least once out of every three memory cycles &hellip;". Note that reads from the
  423. name-table are not immediately followed by reads from the pattern-table.
  424. <i>Possibly because the addresses in the latter table depend on the results
  425. from the former reads and the VDP needs time to calculate those
  426. addresses.</i></p>
  427. <h5>display-disabled</h5>
  428. <p>On TMS9918 there's no register that allows disabling sprite rendering
  429. (there is on V9938), but it is possible to disable the whole screen rendering.
  430. The documentation does hint that the screen-disabled behavior is the same as
  431. the behavior during the vertical border (just as on V9938).</p>
  432. <p>You may have noticed that in the above TMS9918 display-mode sections we
  433. didn't mention any refresh reads. Also note that the hand-drawn TMS9918 timing
  434. picture mentions something called 'refresh mode' (but nothing called 'vertical
  435. border' or 'screen-disabled'). So I believe that on TMS9918 the VRAM is not
  436. refreshed during each display line, but instead it's refreshed during the
  437. vertical border.</p>
  438. <p>On V9938 each display line performs 8 refresh reads (only 7 in text mode),
  439. so it takes 64 lines or about 4ms to refresh 128kB VRAM (and only 2ms if you
  440. rely on RAS-without-CAS refresh). On TMS9918 each vertical border line performs
  441. 32 refresh reads. So during the whole vertical border the full 16kB VRAM is
  442. fully refreshed multiple times. Though between two vertical borders there are
  443. 192 display lines or about 12ms. So DRAM chips connected to TMS9918 have to be
  444. able to retain their content longer without refresh than those connected to
  445. V9938. The TMS9918 refresh schema does make more efficient use of the available
  446. VRAM bandwidth (only do refresh when there's plenty of bandwidth available). On
  447. the other hand the TMS9918 schema would make something like the
  448. V9938-overscan-trick impossible (overscan = show display lines everywhere,
  449. 'skip' the vertical border).</p>
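<p>The "4ms" and "12ms" figures are simply line counts multiplied by the
duration of one display line. A quick Python check, assuming the exact value
behind "3.58MHz" is 3.579545MHz, so that one line of 342 TMS9918 cycles takes
about 63.7 microseconds (the helper name lines_to_ms is made up for this
example):</p>
<pre>
```python
TMS9918_MHZ = 1.5 * 3.579545       # ASSUMPTION: exact value behind "5.37MHz"
LINE_US = 342 / TMS9918_MHZ        # duration of one display line in microseconds


def lines_to_ms(lines):
    """Duration of a number of display lines in milliseconds."""
    return lines * LINE_US / 1000.0


v9938_refresh_reads = 64 * 8                  # 8 refresh reads on each of 64 lines
v9938_refresh_period_ms = lines_to_ms(64)     # time needed for those 64 lines
tms9918_display_gap_ms = lines_to_ms(192)     # display area without any refresh

print(f"V9938: {v9938_refresh_reads} refresh reads in {v9938_refresh_period_ms:.2f}ms")
print(f"TMS9918: {tms9918_display_gap_ms:.2f}ms between two vertical borders")
```
</pre>
<p>So the DRAM behind a TMS9918 has to survive roughly three times longer
without refresh than the 64-line period on V9938.</p>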
  450. <h5>MSX1 CPU-VRAM access</h5>
  451. <p>So what do the above timings mean for an MSX1 Z80 programmer? In various fora
  452. (MSX or other) you find discussions about how fast it's allowed to access the
  453. VRAM (read/write data from/to IO port 0x98). The general consensus seems to be
  454. "at least 29 Z80 cycles between two accesses". For example an OUT(#98),A
  455. instruction takes 12 cycles (on MSX), so you need 17 extra cycles before the
  456. next such instruction.</p>
  457. <p>This value of 29 cycles seems to come directly from the TMS9918 application
  458. manual: it says in the worst case there must be 6&micro;s+2&micro;s=8&micro;s
  459. between two accesses. Translated to Z80 cycles this gives 28.6 and rounded up
  460. 29 Z80 cycles. Though IMHO this result isn't very satisfactory. That value
  461. 8&micro;s is only given with one significant digit, so it could just as well be
  462. 7.5&micro;s or 8.5&micro;s. Rounded up to the nearest Z80 cycle that's between
  463. 27 and 31 Z80 cycles. Many people use the 29-cycles rule and apparently that
  464. works fine in practice. But you also see reports that only 28 cycles
  465. often(?)/always(?) work as well. It would be nice if we could measure the
  466. exact value.</p>
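<p>The conversion from microseconds to Z80 cycles used above can be reproduced
with a few lines of Python. This is only a sketch of the arithmetic: it assumes
the MSX Z80 runs at exactly 3.579545MHz, and the function name z80_cycles is
made up for this example.</p>
<pre>
```python
import math

Z80_MHZ = 3.579545        # ASSUMPTION: exact MSX Z80 clock (about 3.58MHz)
OUT_CYCLES = 12           # OUT instruction on MSX, as mentioned above


def z80_cycles(microseconds):
    """Convert a duration in microseconds to (fractional) Z80 cycles."""
    return microseconds * Z80_MHZ


nominal = z80_cycles(8.0)                 # the 8 microseconds from the manual
low, high = z80_cycles(7.5), z80_cycles(8.5)
rounded = [math.ceil(c) for c in (low, nominal, high)]
extra_after_out = rounded[1] - OUT_CYCLES

print(f"8us is {nominal:.1f} Z80 cycles")
print("rounded up (7.5us, 8us, 8.5us):", rounded)
print("extra cycles needed after an OUT:", extra_after_out)
```
</pre>
<p>The three rounded values are the 27, 29 and 31 Z80 cycles mentioned above,
and 29 cycles leaves 17 cycles to fill after a 12-cycle OUT instruction.</p>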
  467. <p><i>That value 2&micro;s is also mentioned in table 2.2 of the TMS9918
  468. application manual. Other values in that table seem to be accurate to
  469. &plusmn;0.05&micro;s, so it's possible (even likely?) those 2&micro;s can be
  470. read as 2.0&micro;s. It must be an integer multiple of VDP clock cycles: 10
  471. cycles is 1.86&micro;s, 11 cycles is 2.05&micro;s. If I have to make a guess
  472. I'd pick the latter (this still results in a total CPU-VRAM access time of 29
  473. Z80 cycles). Though in the rest of this text I'm still assuming the larger
  474. uncertainty interval.</i></p>
  475. <p>Sometimes you find discussions about when it's allowed to go faster than the
  476. worst case requirement. Here the consensus is that in the vertical border you
  477. can go as fast as you want (seems to be correct, see below). Sometimes you see
  478. suggestions that it's also fine to go faster in the horizontal border or when
  479. sprites aren't used (this seems wrong, or at least only partly correct).</p>
  480. <p>Anyway, in the remainder of this section I'd like to dig a little deeper.
  481. Now that the exact VRAM access allocation schemas are known we can say a little
  482. more. But unfortunately some details will remain unclear.</p>
  483. <ul>
  484. <li><p>In the worst case (Graphics mode 1 and 2) there are 16 memory cycles
  485. (32 VDP cycles, 21.3 Z80 cycles) between two CPU slots. The application
  486. manual also mentions an additional CPU-access waiting time of 2&micro;s.
  487. Though as explained above that could be anywhere from
  488. 1.5&micro;s to 2.5&micro;s, this is between 8 to 13 VDP cycles or 5 to 9
  489. Z80 cycles. <i>(It's not clear where this time is coming from, maybe
  490. something similar to the V9938 'slot-reservation' delay of 16 cycles).</i>
  491. Combined this gives between 26.7 and 30.3 Z80 cycles. But unfortunately
  492. this isn't more accurate than the range we already found above.</p>
  493. <p>In this <a href="http://www.msx.org/forum/development/msx-development/bitbuster-depack-vram?page=6">
  494. post</a> dvik suggests it's OK to use tighter timings when "sprites aren't
  495. used". This would mean that the memory slots that are otherwise used for
  496. sprite rendering are given to the CPU. But what does "sprites aren't
  497. used" mean? If it means there simply aren't any sprites visible, the
  498. TMS9918 still has to fetch 32 y-coordinates to figure out there indeed
  499. aren't any sprites visible (and then it can maybe omit the reads for the
  500. actual sprite rendering). But this doesn't improve the worst case timing.
  501. Another possibility is to explicitly disable sprite rendering. TMS9918 has
  502. no bit in some register to do this. The only possibility I see is to have
  503. a sprite with y-coordinate = 208. It <b>might</b> indeed be the case that
  504. the TMS9918 stops fetching sprite y-coordinates in this scenario (the
  505. V9938 does not), but without further tests I personally wouldn't trust
  506. this. It would be nice if someone could confirm or reject this
  507. theory.</p></li>
  508. <li><p>In the best case (the vertical border), there are only 4 VDP cycles
  509. between CPU slots. Taking the uncertainty of those '2&micro;s' into
  510. account, that results in a minimum distance of 8-12 Z80 cycles between two
  511. VRAM accesses. The fastest Z80 I/O instruction takes 12 cycles (on MSX,
  512. taking the extra Z80 wait cycle into account). So this confirms that in
  513. the vertical border you can indeed access VRAM as fast as you want.</p></li>
  514. <li><p>Text mode has at most 6 VDP cycles between CPU slots. So rounded that's
  515. somewhere between 10 and 13 Z80 cycles. So it's <i>likely</i> OK to access
  516. the VRAM as fast as you want in text mode as well, but we can only be
  517. certain if we know a more accurate value for those '2&micro;s'. It's worth
  518. repeating that on V9938 you can <b>not</b> access VRAM as fast in this
  519. mode. Keep that in mind when writing MSX1 software that needs to be upwards
  520. compatible with MSX2.</p></li>
  521. <li><p>Multi-color mode is unclear: the application manual says there's a CPU
  522. slot at least every 4 memory accesses. But as explained above I don't
  523. believe this (it's true for the display area, but not for the horizontal
  524. border). If it were true you only need 11-15 Z80 cycles between two
  525. CPU-VRAM accesses. But if you do take the border into account you get
  526. 26-29 Z80 cycles, thus only slightly better than Graphics mode 2.</p></li>
  527. </ul>
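<p>The estimates in the list above all follow the same recipe: convert the
largest distance between two CPU slots from VDP cycles to Z80 cycles (divide
by 1.5) and add the waiting time of 1.5&micro;s to 2.5&micro;s. A Python
sketch of that arithmetic, assuming a Z80 clock of exactly 3.579545MHz. The
dictionary keys and the function name z80_range are only labels for this
example.</p>
<pre>
```python
VDP_PER_Z80 = 1.5         # TMS9918 clock is 1.5 times the Z80 clock
Z80_MHZ = 3.579545        # ASSUMPTION: exact MSX Z80 clock (about 3.58MHz)
WAIT_US = (1.5, 2.5)      # uncertainty interval for the '2 microseconds'

# Largest distance between two CPU slots, in VDP cycles (from the list above).
GAPS = {
    "graphics 1/2": 32,                 # 16 memory cycles
    "vertical border": 4,
    "text": 6,
    "multi-color (manual)": 8,          # 4 memory cycles
    "multi-color (incl. border)": 30,   # 15 memory cycles
}


def z80_range(gap_vdp_cycles):
    """Lower and upper estimate, in Z80 cycles, of the required distance
    between two CPU-VRAM accesses for a given slot distance."""
    gap = gap_vdp_cycles / VDP_PER_Z80
    return tuple(gap + wait * Z80_MHZ for wait in WAIT_US)


ranges = {mode: z80_range(gap) for mode, gap in GAPS.items()}

for mode, (lower, upper) in ranges.items():
    print(f"{mode:28s} {lower:5.1f} .. {upper:5.1f} Z80 cycles")
```
</pre>
<p>The results are fractional. The list above rounds them to whole Z80 cycles,
because a Z80 program can of course only wait an integer number of cycles.</p>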
  528. <hr/>
  529. <p align="right" style="font-size:smaller;">
  530. 2014/08/09, Wouter Vermaelen
  531. </p>
  532. </body>
  533. </html>