    --------------- title page ----------------
    -------------- credit page ----------------
    ------------ problem slide 1 --------------
  - why FPGA?
    - CPU? limited computational power
    - GPU? limited communication facilities
  - model? human brain
    ----------- the machine slide -------------
  - what we want to build
    ------------ problem slide 2 --------------
  - simple model
    - differential equations
    - more tractable
  - real-time deadline
  - some neurons live on the same FPGA
    ---------- requirements slide --------------
  - latency matters more than bandwidth
  - inevitable faults (thousands of links at Gbps rates)
  - enable small FPGAs
  - fully exhaust resources
    - transceivers
    - max. links, max. link rates
      - gain bandwidth, reduce hops
  - heterogeneous interoperability
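Why "max. links" also means "reduce hops" can be shown with a toy calculation. This is a minimal sketch under loudly stated assumptions: 64 FPGAs wired either as a ring (2 links each), an 8x8 torus (4 links each) or a 4x4x4 torus (6 links each). These topologies are illustrative only and are not claimed to be the machine in this talk; the code just measures average shortest-path hop count with a breadth-first search.

```python
from collections import deque
from itertools import product


def torus_neighbours(coord, dims):
    """Yield the neighbours of `coord` in a torus (a ring when dims has one entry)."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (nxt[axis] + step) % size
            yield tuple(nxt)


def average_hops(dims):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes (BFS from each node)."""
    nodes = list(product(*(range(size) for size in dims)))
    total = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in torus_neighbours(cur, dims):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total / (len(nodes) * (len(nodes) - 1))


if __name__ == "__main__":
    # Hypothetical 64-FPGA systems; links per node = 2 * number of torus dimensions.
    for dims in [(64,), (8, 8), (4, 4, 4)]:
        print(dims, 2 * len(dims), "links/node:", round(average_hops(dims), 2), "hops on average")
```

Under these assumptions the averages come out at about 16.25, 4.06 and 3.05 hops, so every extra transceiver used as a link shortens paths as well as adding bandwidth.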
    --------- standard IP cores slide ----------
  - many standards
  - many difficulties
    ----------- architecture slide -------------
  - difficulties of custom communication
    - additional work
  - serial transceiver layer
    - send/receive 32-bit words
  - physical
    - conversion of words
    - idle symbols/alignment
  - link
    - serialization of flits into words
  - reliability
    - CRC (without header), sequence number, acknowledgement
    - unacknowledged flits held in a replay buffer and resent
  - routing and switching
    - hop-by-hop routing
  - abstraction
    - primitives for applications
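The reliability layer (CRC, sequence number, acknowledgement, replay buffer) can be sketched in software. This is a minimal go-back-N style model, not the hardware design: Python's zlib CRC-32 stands in for whatever CRC the link actually uses, and the 8-bit sequence numbers, replay depth of 16, CRC coverage and cumulative acknowledgements are all assumptions made for illustration.

```python
import zlib
from collections import OrderedDict

SEQ_MOD = 256      # hypothetical: 8-bit sequence numbers
REPLAY_DEPTH = 16  # hypothetical replay-buffer depth (kept well below SEQ_MOD)


def crc_of(seq, payload):
    """CRC-32 over the sequence number and payload (coverage is an assumption)."""
    return zlib.crc32(bytes([seq]) + payload)


def make_flit(seq, payload):
    return {"seq": seq, "payload": payload, "crc": crc_of(seq, payload)}


class Sender:
    def __init__(self):
        self.next_seq = 0
        self.replay = OrderedDict()  # seq -> flit, oldest first; all still unacknowledged

    def send(self, payload):
        if len(self.replay) >= REPLAY_DEPTH:
            raise BufferError("replay buffer full: wait for acknowledgements")
        flit = make_flit(self.next_seq, payload)
        self.replay[self.next_seq] = flit
        self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return flit

    def on_ack(self, seq):
        """Cumulative ack: free every buffered flit up to and including `seq`."""
        if seq is None or seq not in self.replay:
            return  # stale or duplicate ack: nothing new acknowledged
        while True:
            oldest, _ = self.replay.popitem(last=False)
            if oldest == seq:
                break

    def resend(self):
        """On timeout: replay every unacknowledged flit in the original order."""
        return list(self.replay.values())


class Receiver:
    def __init__(self):
        self.expected = 0
        self.last_acked = None
        self.delivered = []

    def receive(self, flit):
        """Deliver only in-order flits with a good CRC; return the seq to acknowledge."""
        good = crc_of(flit["seq"], flit["payload"]) == flit["crc"]
        if good and flit["seq"] == self.expected:
            self.delivered.append(flit["payload"])
            self.last_acked = flit["seq"]
            self.expected = (self.expected + 1) % SEQ_MOD
        # Corrupted or out-of-order flits are dropped; re-acknowledging the last
        # good flit lets the sender recover even if an earlier ack was lost.
        return self.last_acked
```

In use, the sender passes whatever `receive` returns to `on_ack`, and calls `resend()` when no acknowledgement arrives within some timeout; a corrupted flit therefore costs a replay rather than a lost word.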
    ----------- abstractions slide -------------
  - packets
    - send/receive buffers
    - polling/interrupts signal packet delivery
  - Bluespec
    - adds 10-20 cycles of extra latency
    - FIFO-type abstraction
  - remote DMA
    - direct memory access
    - read/write translation
    - transparency
    - with bursts
  - blocking
    - read/write until successful
    - deadlock risk
  - software pipes
    - Linux pipe semantics
    - test applications on a PC
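The remote DMA idea (a memory write transparently translated into packets, with bursts) can be sketched as follows. Everything concrete here is hypothetical: the packet fields `dest_node`, `address` and `data`, word addressing, and a maximum burst of 4 words are assumptions, not the real packet format. Only the write path is modelled; in this sketch a read would be a request packet answered by data packets.

```python
from dataclasses import dataclass
from typing import List

MAX_BURST_WORDS = 4  # hypothetical: longest burst carried by one packet, in 32-bit words


@dataclass
class WritePacket:
    dest_node: int   # hypothetical header field: which FPGA owns the memory
    address: int     # starting word address in the remote memory
    data: List[int]  # 1..MAX_BURST_WORDS 32-bit words


def translate_write(dest_node, address, words, max_burst=MAX_BURST_WORDS):
    """Turn one remote memory write into a sequence of burst packets."""
    if any(not 0 <= w < 2**32 for w in words):
        raise ValueError("each word must fit in 32 bits")
    return [
        WritePacket(dest_node, address + offset, list(words[offset:offset + max_burst]))
        for offset in range(0, len(words), max_burst)
    ]


def apply_writes(memory, packets):
    """Remote side: perform each burst as a local write (memory is a list of words)."""
    for p in packets:
        if p.address < 0 or p.address + len(p.data) > len(memory):
            raise IndexError("burst falls outside remote memory")
        memory[p.address:p.address + len(p.data)] = p.data
```

For example, `translate_write(3, 100, list(range(10)))` yields three packets starting at word addresses 100, 104 and 108 carrying 4, 4 and 2 words; the application only sees a single write.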
    ------------- results slide ----------------
  - Altera core
  - many comparisons, 4 key results
  - inherent area/performance trade-off
  - bandwidth
    - utilization when instantiating each system as many times as necessary to use all transceiver resources
    - protocols in black implement reliability
  - latency
    - comparison of 10G BlueLink with 10G Ethernet
    - flits can be accepted in a single cycle
    - lightly loaded case more likely with more transceivers
  - overhead
    - better link utilization for packets of up to 256 bits
  - area
    - BlueLink compares very favorably
    - 10G BlueLink uses 65% of the LUTs/registers of 10G Ethernet
    - 40G BlueLink will fit in the same area
    - 10G BlueLink uses 15% of the memory of 10G Ethernet