Chelsio N210 10Gb Ethernet Network Controller
Driver Release Notes for Linux
Version 2.1.1
June 20, 2005

CONTENTS
========
 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT

INTRODUCTION
============

This document describes the Linux driver for the Chelsio 10Gb Ethernet
Network Controller. This driver supports the Chelsio N210 NIC and is backward
compatible with the Chelsio N110 model 10Gb NICs.

FEATURES
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
settings to achieve the highest performance under various types of network
load.

The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.

By default, adaptive-rx is disabled.
To enable adaptive-rx:
    ethtool -C <interface> adaptive-rx on

To disable adaptive-rx:
    ethtool -C <interface> adaptive-rx off

After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx:
    ethtool -C <interface> rx-usecs <microseconds>

For example, to set the timer latency value to 100us on eth0:
    ethtool -C eth0 rx-usecs 100

You may also provide a timer latency value while disabling adaptive-rx:
    ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

If adaptive-rx is disabled and a timer latency value is specified, the timer
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.

To view the status of the adaptive-rx and timer latency values:
    ethtool -c <interface>
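
The disable-and-set sequence above can be wrapped in a small helper. The
following is a minimal sketch and is not part of the driver: the function name
rx_latency_cmd is hypothetical, and it only prints the ethtool command so that
you can review it before running it as root.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the driver): build the ethtool
# command that disables adaptive-rx and sets a fixed timer latency.
# It prints the command instead of running it.
rx_latency_cmd() {
    iface=$1
    usecs=$2
    # Reject anything that is not a non-negative integer.
    case $usecs in
        ''|*[!0-9]*)
            echo "usage: rx_latency_cmd <interface> <microseconds>" >&2
            return 1
            ;;
    esac
    echo "ethtool -C $iface adaptive-rx off rx-usecs $usecs"
}

# Example: print the command for a 100us timer latency on eth0.
rx_latency_cmd eth0 100
```

Validating the value first avoids handing a malformed rx-usecs argument to
ethtool from a boot script.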

TCP Segmentation Offloading (TSO) Support
-----------------------------------------

This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card,
thereby reducing system CPU utilization and enhancing performance.

The interface used to control this feature is ethtool version 1.8 or higher.
Please see the ethtool manpage for additional usage information.

By default, TSO is enabled.
To disable TSO:
    ethtool -K <interface> tso off

To enable TSO:
    ethtool -K <interface> tso on

To view the status of TSO:
    ethtool -k <interface>
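
When scripting, it can help to check the TSO state before changing it. The
sketch below is an illustration only: the function name tso_state is
hypothetical, and it assumes that ethtool -k reports a line of the form
"tcp segmentation offload: on" (some ethtool releases print the hyphenated
"tcp-segmentation-offload: on" instead). Check the output of your ethtool
version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <interface>` output on stdin and
# print the TSO state ("on" or "off"). The line format is an assumption;
# both the spaced and the hyphenated spelling are accepted.
tso_state() {
    awk -F': *' '/^[ \t]*tcp[- ]segmentation[- ]offload:/ {
        split($2, word, " ")   # drop trailing notes such as "[fixed]"
        print word[1]
    }'
}
```

Typical use would be: ethtool -k eth0 | tso_state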

PERFORMANCE
===========

The following information is provided as an example of how to change system
parameters for "performance tuning" and what values to use. You may or may not
want to change these system parameters, depending on your server/workstation
application. Doing so is not warranted in any way by Chelsio Communications,
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
of data or damage to equipment.

Your distribution may have a different way of doing things, or you may prefer
a different method. These commands are shown only to provide an example of
what to do and are by no means definitive.

The following system changes last only until you reboot your system. You may
want to write a script that runs at boot-up and applies the optimal settings
for your system.

Setting PCI Latency Timer:
    setpci -d 1425:* 0x0c.l=0x0000F800

Disabling TCP timestamp:
    sysctl -w net.ipv4.tcp_timestamps=0

Disabling SACK:
    sysctl -w net.ipv4.tcp_sack=0

Setting large number of incoming connection requests:
    sysctl -w net.ipv4.tcp_max_syn_backlog=3000

Setting maximum receive socket buffer size:
    sysctl -w net.core.rmem_max=1024000

Setting maximum send socket buffer size:
    sysctl -w net.core.wmem_max=1024000

Set smp_affinity (on a multiprocessor system) to a single CPU:
    echo 1 > /proc/irq/<interrupt_number>/smp_affinity

Setting default receive socket buffer size:
    sysctl -w net.core.rmem_default=524287

Setting default send socket buffer size:
    sysctl -w net.core.wmem_default=524287

Setting maximum option memory buffers:
    sysctl -w net.core.optmem_max=524287

Setting maximum backlog (# of unprocessed packets before kernel drops):
    sysctl -w net.core.netdev_max_backlog=300000

Setting TCP read buffers (min/default/max):
    sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

Setting TCP write buffers (min/default/max):
    sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

Setting TCP buffer space (min/pressure/max, in pages):
    sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
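
Because these settings are lost at reboot, they can be collected into a
script that runs at boot-up. The following is a sketch using the example
values listed above, not a recommended configuration. The DRY_RUN switch and
the function names are hypothetical; by default the script only prints the
commands, and the printed form drops the quotes around multi-value settings.
The smp_affinity step is left out because it depends on the interrupt number
of your system.

```shell
#!/bin/sh
# Sketch of a boot-time tuning script built from the example values above.
# DRY_RUN is a hypothetical switch: it defaults to 1, so commands are only
# printed. Run with DRY_RUN=0 as root to actually apply them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_tuning() {
    # Offset 0x0c is the dword that holds the cache line size (byte 0)
    # and the latency timer (byte 1), so 0xF8 lands in the latency timer.
    run setpci -d '1425:*' 0x0c.l=0x0000F800

    run sysctl -w net.ipv4.tcp_timestamps=0
    run sysctl -w net.ipv4.tcp_sack=0
    run sysctl -w net.ipv4.tcp_max_syn_backlog=3000
    run sysctl -w net.core.rmem_max=1024000
    run sysctl -w net.core.wmem_max=1024000
    run sysctl -w net.core.rmem_default=524287
    run sysctl -w net.core.wmem_default=524287
    run sysctl -w net.core.optmem_max=524287
    run sysctl -w net.core.netdev_max_backlog=300000
    run sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
    run sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
    run sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
}

apply_tuning
```

Reviewing the dry-run output first makes it easier to drop settings that do
not suit your application before anything is changed.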

TCP window size for single connections:

The receive buffer (RX_WINDOW) size must be at least as large as the
Bandwidth-Delay Product of the communication link between the sender and
receiver. Due to the variations of RTT, you may want to increase the buffer
size up to 2 times the Bandwidth-Delay Product. Reference page 289 of
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

At 10Gb speeds, use the following formula:
    RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
    Example for an RTT of 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes

RX_WINDOW sizes of 256KB - 512KB should be sufficient.

Setting the min, default, and max receive buffer (RX_WINDOW) size:
    sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
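
The formula above can be worked through in the shell. This is a minimal
sketch: the helper name rx_window_bytes is hypothetical, and it takes the RTT
in microseconds so that the arithmetic stays in integers (1.25 MBytes per
millisecond is 1250 bytes per microsecond).

```shell
#!/bin/sh
# Worked version of the formula above. At 10Gb/s the link carries
# 1.25 MBytes per millisecond, i.e. 1250 bytes per microsecond.
# rx_window_bytes is a hypothetical helper name.
rx_window_bytes() {
    rtt_us=$1
    echo $((1250 * rtt_us))
}

# RTT of 100us: minimum window, then doubled to allow for RTT variation.
min_window=$(rx_window_bytes 100)
echo "minimum RX_WINDOW: $min_window bytes"
echo "with 2x headroom: $((2 * min_window)) bytes"
```

For a 100us RTT this gives 125000 bytes, or 250000 bytes with the 2x margin,
which is consistent with the 256KB - 512KB guidance.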

TCP window size for multiple connections:

The receive buffer (RX_WINDOW) size may be calculated in the same way as for
a single connection, but should be divided by the number of connections. The
smaller window prevents congestion and facilitates better pacing,
especially if/when MAC level flow control does not work well or when it is
not supported on the machine. Experimentation may be necessary to attain
the correct value. This method is provided as a starting point for the
correct receive buffer size.

Setting the min, default, and max receive buffer (RX_WINDOW) size is
performed in the same manner as for a single connection.
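
As a starting point only, the per-connection value can be computed by dividing
the 10Gb Bandwidth-Delay Product by the number of connections. The helper name
per_conn_window_bytes is hypothetical, and the result is rounded down by
integer division.

```shell
#!/bin/sh
# Hypothetical helper: starting-point receive window per connection,
# i.e. the 10Gb/s bandwidth-delay product (1250 bytes per microsecond
# of RTT) divided by the number of connections. Integer division.
per_conn_window_bytes() {
    rtt_us=$1
    connections=$2
    echo $((1250 * rtt_us / connections))
}

# Example: 100us RTT shared by 4 connections.
per_conn_window_bytes 100 4
```

The example prints 31250. Treat the result as an initial value to refine by
experiment, as described above.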

DRIVER MESSAGES
===============

The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.

Driver up:
    Chelsio Network Driver - version 2.1.1

NIC detected:
    eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

Link up:
    eth#: link is up at 10 Gbps, full duplex

Link down:
    eth#: link is down
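
To pick the link state messages out of a busy log, a simple filter is enough.
This is a sketch only: the function name link_events is hypothetical, and the
pattern is built from the two link messages listed above.

```shell
#!/bin/sh
# Hypothetical helper: keep only the link up/down messages listed above.
# Reads log text on stdin, so it works with any log file location.
link_events() {
    grep -E 'eth[0-9]+: link is (up at|down)'
}
```

Typical use would be: link_events < /var/log/messages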

KNOWN ISSUES
============

These issues have been identified during testing. The following information
is provided as a workaround for each problem. In some cases, the problem is
inherent to Linux or to a particular Linux distribution and/or hardware
platform.

1. Large number of TCP retransmits on a multiprocessor (SMP) system.

    On a system with multiple CPUs, the interrupt (IRQ) for the network
    controller may be bound to more than one CPU. This will cause TCP
    retransmits if the packet data is split across different CPUs and
    reassembled in a different order than expected.

    To eliminate the TCP retransmits, set smp_affinity on the particular
    interrupt to a single CPU. You can locate the interrupt (IRQ) used on
    the N110/N210 by using ifconfig:
        ifconfig <dev_name> | grep Interrupt

    Set the smp_affinity to a single CPU:
        echo 1 > /proc/irq/<interrupt_number>/smp_affinity

    It is strongly suggested that you do not run the irqbalance daemon on
    your system, as this will change any smp_affinity setting you have
    applied. The irqbalance daemon runs on a 10 second interval and binds
    interrupts to the least loaded CPU determined by the daemon. To disable
    this daemon:
        chkconfig --level 2345 irqbalance off

    By default, some Linux distributions enable the kernel feature,
    irqbalance, which performs the same function as the daemon. To disable
    this feature, add the following option to the kernel line in your
    bootloader configuration:
        noirqbalance

    Example using the Grub bootloader:
        title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
                root (hd0,0)
                kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
                initrd /initrd-2.4.21-27.ELsmp.img
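
The value written to smp_affinity is a hexadecimal CPU bitmask in which bit N
selects CPU N, which is why "1" binds the interrupt to CPU 0. The sketch below
computes the mask for a different single CPU; the helper name cpu_mask is
hypothetical.

```shell
#!/bin/sh
# smp_affinity takes a hexadecimal CPU bitmask: bit N selects CPU N.
# cpu_mask is a hypothetical helper that prints the mask for one CPU.
cpu_mask() {
    cpu=$1
    printf '%x\n' $((1 << cpu))
}

cpu_mask 0   # CPU 0 -> 1 (the value used in the example above)
cpu_mask 3   # CPU 3 -> 8
```

The result can then be written as root, for example:
echo $(cpu_mask 3) > /proc/irq/<interrupt_number>/smp_affinity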

2. After running insmod, the driver is loaded and the incorrect network
   interface is brought up without running ifup.

    When using 2.4.x kernels, including RHEL kernels, the Linux kernel
    invokes a script named "hotplug". This script is primarily used to
    automatically bring up USB devices when they are plugged in; however,
    the script also attempts to automatically bring up a network interface
    after loading the kernel module. The hotplug script does this by scanning
    the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
    for HWADDR=<mac_address>.

    If the hotplug script does not find the HWADDR within any of the
    ifcfg-eth# files, it will bring up the device with the next available
    interface name. If this interface is already configured for a different
    network card, your new interface will have an incorrect IP address and
    network settings.

    To solve this issue, you can add the HWADDR=<mac_address> key to the
    interface config file of your network controller.

    To disable this "hotplug" feature, you may add the driver (module name)
    to the "blacklist" file located in /etc/hotplug. However, it has been
    noted that this does not work for network devices, because the net.agent
    script does not use the blacklist file. Instead, remove or rename the
    net.agent script located in /etc/hotplug to disable this feature.
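
Before loading the driver, you can check whether any interface config file
already carries the HWADDR key for your card. The following is a minimal
sketch: the function name ifcfg_for_mac is hypothetical, and it assumes the
key is written at the start of a line as HWADDR=<mac_address>.

```shell
#!/bin/sh
# Hypothetical helper: list the ifcfg-eth# files in a directory that
# contain HWADDR=<mac_address> for the given MAC (case-insensitive).
# Prints nothing and returns non-zero when no file matches.
ifcfg_for_mac() {
    dir=$1
    mac=$2
    grep -il "^HWADDR=$mac" "$dir"/ifcfg-eth* 2>/dev/null
}
```

Typical use would be:
ifcfg_for_mac /etc/sysconfig/network-scripts <mac_address>
If nothing is printed, no config file is tied to that MAC address and the
hotplug script would pick the next available interface name.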

3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
   on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

    If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
    chipset, you may experience the "133-MHz Mode Split Completion Data
    Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
    PCI-X bus.

    AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
    can provide stale data via split completion cycles to a PCI-X card that
    is operating at 133 MHz", causing data corruption.

    AMD provides three workarounds for this problem; Chelsio recommends the
    first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

            Data Length (bytes): 1k
            Total allowed outstanding transactions: 2

    Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
    section 56, "133-MHz Mode Split Completion Data Corruption" for more
    details on this bug and the workarounds suggested by AMD.

    It may be possible to work outside AMD's recommended PCI-X settings. Try
    increasing the Data Length to 2k bytes for increased performance. If you
    have issues with these settings, please revert to the "safe" settings
    and duplicate the problem before submitting a bug or asking for support.

    NOTE: The default setting on most systems is 8 outstanding transactions
          and 2k bytes data length.
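
To find out whether a system contains this chipset, you can search the lspci
device listing. This is a sketch under an assumption: it relies on the local
PCI ID database naming the bridge with the string "AMD-8131", which may not
hold on every system. The function name has_amd8131 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if lspci output on stdin mentions the
# AMD-8131 bridge. Matching on the device name is an assumption about
# the local PCI ID database.
has_amd8131() {
    grep -q 'AMD-8131'
}
```

Typical use would be: lspci | has_amd8131 && echo "AMD-8131 found"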

4. On multiprocessor systems, it has been noted that an application which
   is handling 10Gb networking can switch between CPUs, causing degraded
   and/or unstable performance.

    If running on an SMP system and taking performance measurements, it
    is suggested you either run the latest netperf-2.4.0+ or use a binding
    tool such as Tim Hockin's procstate utilities (runon)
    <http://www.hockin.org/~thockin/procstate/>.

    Binding netserver and netperf (or other applications) to particular
    CPUs can make a significant difference in performance measurements.
    You may need to experiment to find which CPU to bind the application to
    in order to achieve the best performance for your system.

    If you are developing an application designed for 10Gb networking,
    please keep in mind you may want to look at the system calls
    sched_setaffinity & sched_getaffinity to bind your application.

    If you are just running user-space applications such as ftp, telnet,
    etc., you may want to try the runon tool provided by Tim Hockin's
    procstate utility. You could also try binding the interface to a
    particular CPU:
        runon 0 ifup eth0
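
If runon is not available, one alternative is the taskset utility from
util-linux, which sets the CPU affinity of a command through
sched_setaffinity. taskset is not part of the driver package or the procstate
utilities, so treat the sketch below as an assumption about your system. The
helper name bind_cmd is hypothetical and it only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: print a taskset command line that pins a command
# to one CPU. taskset (util-linux) is an assumed alternative to runon.
bind_cmd() {
    cpu=$1
    shift
    echo "taskset -c $cpu $*"
}

# Example: pin netserver to CPU 1 (printed, not executed).
bind_cmd 1 netserver
```

As with runon, you may need to try several CPUs to find the binding that
performs best on your system.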

SUPPORT
=======

If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com

===============================================================================

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

 Copyright (c) 2003-2005 Chelsio Communications. All rights reserved.

===============================================================================