
--------------- title page ----------------
-------------- credit page ----------------
------------ problem slide 1 --------------
- why FPGA?
  - CPU? computational power
  - GPU? communication facilities
  - model? human brain
----------- the machine slide -------------
- what we want to build
------------ problem slide 2 --------------
- simple model
  - differential equations (example equations after this slide's notes)
  - more tractable
- real-time deadline
- some neurons live on the same FPGA
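
The notes do not name the simple model; assuming it is an Izhikevich-style two-variable spiking neuron (a common choice for this kind of real-time simulation, and an assumption here), the differential equations would look like:

    \frac{dv}{dt} = 0.04v^2 + 5v + 140 - u + I, \qquad
    \frac{du}{dt} = a(bv - u), \qquad
    \text{with reset: if } v \ge 30\,\text{mV, then } v \leftarrow c,\ u \leftarrow u + d.

Each neuron is then a cheap per-timestep state update, which is what makes the model tractable under the real-time deadline; the hard part is moving the resulting spike traffic between FPGAs.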
---------- requirements slide --------------
- latency >> bandwidth
- inevitable faults (thousands of links at Gbps)
- enable small FPGAs
- fully exhaust resources
  - transceivers
  - max. links, max. link rates
- gain bandwidth, reduce hops
- heterogeneous interop
--------- standard IP cores slide ----------
- many standards
- many difficulties
----------- architecture slide -------------
- difficulties of custom communication
- additional work
- serial transceiver layer
  - send/receive 32-bit words
- physical
  - conversion of words
  - idle symbols/alignment
- link
  - serialization of flits/words
- reliability
  - CRC (without header), sequence number, acknowledgement
  - unacknowledged flits kept in replay buffer, resent (see reliability sketch after this slide's notes)
- routing and switching
  - hop-by-hop routing (see routing sketch after this slide's notes)
- abstraction
  - primitives for applications
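
A minimal sketch of the link-layer reliability scheme noted above (sequence number, acknowledgement, replay buffer, CRC not covering the header). Names, flit layout, CRC choice and buffer depth are invented for illustration; this is not the actual Bluelink implementation:

    /* illustrative only: a minimal go-back-N style replay buffer for
       32-bit flits; flit layout, CRC choice and buffer depth are
       assumptions for this sketch, not the actual Bluelink design   */
    #include <stdint.h>
    #include <stdbool.h>

    #define REPLAY_DEPTH 16              /* assumed replay-buffer depth */

    typedef struct {
        uint8_t  seq;                    /* sequence number (header)    */
        uint32_t payload;                /* 32-bit data word            */
        uint8_t  crc;                    /* CRC over the payload only,  */
    } flit_t;                            /* i.e. "CRC without header"   */

    typedef struct {
        flit_t  buf[REPLAY_DEPTH];       /* unacknowledged flits        */
        uint8_t next_seq;                /* next sequence number        */
        uint8_t unacked;                 /* flits awaiting an ack       */
    } tx_link_t;

    /* toy CRC-8 (poly 0x07) over the payload; a real core would use a
       stronger CRC computed in hardware                               */
    static uint8_t crc8(uint32_t w) {
        uint8_t crc = 0;
        for (int i = 0; i < 32; i++) {
            uint8_t bit = (uint8_t)(((w >> i) & 1u) ^ (crc >> 7));
            crc = (uint8_t)((crc << 1) ^ (bit ? 0x07 : 0x00));
        }
        return crc;
    }

    /* queue a word: it goes on the wire and into the replay buffer    */
    bool link_send(tx_link_t *l, uint32_t word, flit_t *wire_out) {
        if (l->unacked == REPLAY_DEPTH)
            return false;                        /* back-pressure      */
        flit_t f = { .seq = l->next_seq, .payload = word,
                     .crc = crc8(word) };
        l->buf[f.seq % REPLAY_DEPTH] = f;        /* keep for replay    */
        l->next_seq++;
        l->unacked++;
        *wire_out = f;
        return true;
    }

    /* cumulative ack up to and including acked_seq: drop those flits  */
    void link_ack(tx_link_t *l, uint8_t acked_seq) {
        uint8_t  oldest = (uint8_t)(l->next_seq - l->unacked);
        unsigned n = (uint8_t)(acked_seq - oldest) + 1u;
        if (n <= l->unacked)
            l->unacked -= (uint8_t)n;
    }

    /* on a CRC error or missing ack, resend everything still unacked  */
    void link_replay(tx_link_t *l, void (*resend)(const flit_t *)) {
        uint8_t oldest = (uint8_t)(l->next_seq - l->unacked);
        for (uint8_t i = 0; i < l->unacked; i++)
            resend(&l->buf[(uint8_t)(oldest + i) % REPLAY_DEPTH]);
    }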
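
And a correspondingly small sketch of what hop-by-hop routing means: each node consults only its own table to pick the next link, and the next node repeats the decision. Again the names and table layout are assumptions, not the real switching logic:

    /* illustrative only: generic table-driven hop-by-hop routing, not
       the actual Bluelink switching logic                             */
    #include <stdint.h>

    #define NUM_NODES  16                /* assumed network size       */
    #define LINK_LOCAL 0xFF              /* "deliver to this node"     */

    /* next_link[d] = which of this node's links leads towards node d;
       filled in at configuration time (contents are placeholders)     */
    static uint8_t next_link[NUM_NODES];

    /* each hop decides only the next link; the next node repeats this */
    uint8_t route(uint8_t dest, uint8_t my_id) {
        if (dest == my_id)
            return LINK_LOCAL;           /* flit has arrived           */
        return next_link[dest];          /* forward one more hop       */
    }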
----------- abstractions slide -------------
- packets
  - send/receive buffers
  - polling/interrupts signal packet delivery (see packet sketch after this slide's notes)
- bluespec
  - adds 10-20 extra cycles of latency
  - FIFO-type abstraction
- remote DMA
  - direct memory access
  - read/write translation
  - transparency
  - with bursts
  - blocking
    - read/write until successful
    - deadlock risk (see blocking-read sketch after this slide's notes)
- software pipes
  - linux pipe semantics
  - testing application on pc
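
A rough sketch of the packet abstraction from the application's point of view: send/receive buffers plus polling to learn that a packet has arrived. All names are invented and the "network" is a loopback stub, so this is only the shape of the interface, not the real Bluelink/Bluespec one:

    /* illustrative only: invented names and a loopback stub, not the
       real Bluelink API; shows the shape of the packet abstraction:
       send and receive buffers plus polling for delivery              */
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint8_t  dest;                   /* destination node id        */
        uint16_t len;                    /* payload length in bytes    */
        uint8_t  data[256];              /* 256 B max is an assumption */
    } packet_t;

    #define RX_SLOTS 8
    static packet_t rx_buf[RX_SLOTS];    /* receive buffer (stub)      */
    static unsigned rx_head, rx_tail;

    /* send: this stub just loops the packet back into the rx buffer   */
    bool pkt_send(const packet_t *p) {
        if (rx_tail - rx_head == RX_SLOTS)
            return false;                /* send/receive buffer full   */
        rx_buf[rx_tail++ % RX_SLOTS] = *p;
        return true;
    }

    /* polling: has a packet been delivered?  (an interrupt could raise
       the same condition instead)                                     */
    bool pkt_poll(void) { return rx_tail != rx_head; }

    /* pop one packet from the receive buffer                          */
    bool pkt_recv(packet_t *out) {
        if (!pkt_poll())
            return false;
        *out = rx_buf[rx_head++ % RX_SLOTS];
        return true;
    }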
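
A similar sketch for the blocking remote reads/writes: retry until successful, which is convenient but is exactly where the deadlock risk comes from (e.g. two nodes each spinning on a request whose reply is stuck behind the other's blocked traffic). Names are invented and remote memory is faked locally so the sketch compiles:

    /* illustrative only: invented names, with remote memory faked
       locally so the sketch compiles; in reality each access would be
       translated into request/response flits over the interconnect    */
    #include <stdint.h>
    #include <stdbool.h>

    #define REMOTE_WORDS 1024
    static uint32_t remote_mem[REMOTE_WORDS];   /* stand-in for a
                                                   remote node's memory */

    /* non-blocking attempts: in a real system these can fail
       transiently (request queue full, reply not back yet, ...)       */
    static bool rdma_read_try(uint32_t addr, uint32_t *val) {
        if (addr >= REMOTE_WORDS) return false;
        *val = remote_mem[addr];
        return true;
    }
    static bool rdma_write_try(uint32_t addr, uint32_t val) {
        if (addr >= REMOTE_WORDS) return false;
        remote_mem[addr] = val;
        return true;
    }

    /* blocking wrappers: retry until successful (the "blocking" bullet);
       spinning here is where the deadlock risk comes from if the reply
       can itself be held up behind this node's stalled traffic        */
    uint32_t rdma_read(uint32_t addr) {
        uint32_t val;
        while (!rdma_read_try(addr, &val))
            ;
        return val;
    }
    void rdma_write(uint32_t addr, uint32_t val) {
        while (!rdma_write_try(addr, val))
            ;
    }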
------------- results slide ----------------
- altera core
- many comparisons, 4 key results
- inherent area/performance trade-off
- bandwidth
  - utilization when instantiating each system as many times as necessary to use all transceiver resources
  - protocols in black implement reliability
- latency
  - comparison with 10G bluelink/ethernet
  - flits can be accepted in a single cycle
  - lightly loaded case more likely with more transceivers
- overhead
  - better use up to 256-bit packets
- area
  - bluelink compares very favorably
  - 10G bluelink has 65% of the LUTs/regs of 10G ethernet
  - 40G will fit in the same area
  - 15% memory of 10G bluelink