Kernel NFS Server Statistics
============================

This document describes the format and semantics of the statistics
which the kernel NFS server makes available to userspace. These
statistics are available in several text form pseudo files, each of
which is described separately below.

In most cases you don't need to know these formats, as the nfsstat(8)
program from the nfs-utils distribution provides a helpful command-line
interface for extracting and printing them.

All the files described here are formatted as a sequence of text lines,
separated by newline '\n' characters. Lines beginning with a hash
'#' character are comments intended for humans and should be ignored
by parsing routines. All other lines contain a sequence of fields
separated by whitespace.
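
The line format described above can be handled by a very small parsing
routine. The following is only a sketch in Python; the function name
parse_stats_text is hypothetical and is not part of nfs-utils or of
the kernel.

```python
def parse_stats_text(text):
    """Split statistics text into a list of field lists.

    Lines beginning with '#' are comments and are skipped; every other
    non-empty line is split on whitespace. (Hypothetical helper, for
    illustration only.)
    """
    rows = []
    for line in text.split("\n"):
        if line.startswith("#"):
            continue  # comment intended for humans
        fields = line.split()
        if fields:  # ignore the empty string after the final newline
            rows.append(fields)
    return rows
```

The fields are returned as strings; converting them to numbers is left
to the caller, since the meaning of each field depends on the file.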

/proc/fs/nfsd/pool_stats
------------------------

This file is available in kernels from 2.6.30 onwards, if the
/proc/fs/nfsd filesystem is mounted (it almost always should be).

The first line is a comment which describes the fields present in
all the other lines. The other lines present the following data as
a sequence of unsigned decimal numeric fields. One line is shown
for each NFS thread pool.
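
As an illustration, the per-pool lines can be turned into records keyed
by field name. The sketch below is in Python and assumes that the
leading comment lists the field names separated by whitespace, in
column order; that assumption and the name parse_pool_stats are not
taken from this document and should be checked against a real file.

```python
def parse_pool_stats(text):
    """Return one dict per thread pool, keyed by field name.

    Assumes the first comment line names the columns in order
    (an assumption made for this sketch).
    """
    names = None
    pools = []
    for line in text.splitlines():
        if line.startswith("#"):
            if names is None:
                names = line.lstrip("#").split()
            continue
        fields = line.split()
        if not fields:
            continue
        values = [int(field) for field in fields]  # unsigned decimal fields
        if names is None or len(names) != len(values):
            raise ValueError("field count does not match the header comment")
        pools.append(dict(zip(names, values)))
    return pools
```

A caller would typically pass in the result of reading
/proc/fs/nfsd/pool_stats, for example
parse_pool_stats(open("/proc/fs/nfsd/pool_stats").read()).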

All counters are 64 bits wide and wrap naturally. There is no way
to zero these counters; instead, applications should do their own
rate conversion.
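
Rate conversion means sampling a counter twice and dividing the
difference by the elapsed time. Because the counters wrap at 64 bits,
the difference should be taken modulo 2^64. A minimal sketch in Python
follows; the function names are hypothetical.

```python
COUNTER_MODULUS = 1 << 64  # counters are 64 bits wide


def counter_delta(previous, current):
    """Difference between two samples, allowing for one wrap."""
    return (current - previous) % COUNTER_MODULUS


def counter_rate(previous, current, interval_seconds):
    """Events per second between two samples of the same counter."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current) / interval_seconds
```

The modulo handles a single wrap between samples; at realistic packet
rates a 64-bit counter cannot wrap more than once within any sensible
sampling interval.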

pool
        The id number of the NFS thread pool to which this line applies.
        This number does not change.

        Thread pool ids are a contiguous set of small integers starting
        at zero. The maximum value depends on the thread pool mode, but
        currently cannot be larger than the number of CPUs in the system.
        Note that in the default case there will be a single thread pool
        which contains all the nfsd threads and all the CPUs in the system,
        and thus this file will have a single line with a pool id of "0".

packets-arrived
        Counts how many NFS packets have arrived. More precisely, this
        is the number of times that the network stack has notified the
        sunrpc server layer that new data may be available on a transport
        (e.g. a TCP or UDP socket or an NFS/RDMA endpoint).

        Depending on the NFS workload patterns and various network stack
        effects (such as Large Receive Offload) which can combine packets
        on the wire, this may be either more or less than the number
        of NFS calls received (a statistic which is available elsewhere).
        However, this is a more accurate and less workload-dependent
        measure of how much CPU load is being placed on the sunrpc server
        layer due to NFS network traffic.

sockets-enqueued
        Counts how many times an NFS transport is enqueued to wait for
        an nfsd thread to service it, i.e. no nfsd thread was considered
        available.

        The circumstance this statistic tracks indicates that there was NFS
        network-facing work to be done but it couldn't be done immediately,
        thus introducing a small delay in servicing NFS calls. The ideal
        rate of change for this counter is zero; significantly non-zero
        values may indicate a performance limitation.

        This can happen either because there are too few nfsd threads in the
        thread pool for the NFS workload (the workload is thread-limited),
        or because the NFS workload needs more CPU time than is available in
        the thread pool (the workload is CPU-limited). In the former case,
        configuring more nfsd threads will probably improve the performance
        of the NFS workload. In the latter case, the sunrpc server layer is
        already choosing not to wake idle nfsd threads because there are too
        many nfsd threads which want to run but cannot, so configuring more
        nfsd threads will make no difference whatsoever. The overloads-avoided
        statistic (see below) can be used to distinguish these cases.
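
The distinction between the thread-limited and CPU-limited cases can be
expressed as a simple decision on the two rates. The following Python
sketch is only an illustration: the function name and the threshold of
one event per second are arbitrary assumptions, not values defined by
the kernel, and a real monitoring tool would need to tune them.

```python
def classify_enqueue_cause(enqueued_rate, overloads_avoided_rate,
                           threshold=1.0):
    """Guess why transports are being enqueued on a pool.

    Both rates are in events per second. The threshold is an arbitrary
    cut-off chosen for this sketch.
    """
    if enqueued_rate <= threshold:
        return "ok"              # little or no queueing delay
    if overloads_avoided_rate > threshold:
        return "cpu-limited"     # more nfsd threads will not help
    return "thread-limited"      # more nfsd threads will probably help
```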

threads-woken
        Counts how many times an idle nfsd thread is woken to try to
        receive some data from an NFS transport.

        This statistic tracks the circumstance where incoming
        network-facing NFS work is being handled quickly, which is a good
        thing. The ideal rate of change for this counter will be close
        to but less than the rate of change of the packets-arrived counter.

overloads-avoided
        Counts how many times the sunrpc server layer chose not to wake an
        nfsd thread, despite the presence of idle nfsd threads, because
        too many nfsd threads had been recently woken but could not get
        enough CPU time to actually run.

        This statistic counts a circumstance where the sunrpc layer
        heuristically avoids overloading the CPU scheduler with too many
        runnable nfsd threads. The ideal rate of change for this counter
        is zero. Significant non-zero values indicate that the workload
        is CPU-limited. Usually this is associated with heavy CPU usage
        on all the CPUs in the nfsd thread pool.

        If a sustained large overloads-avoided rate is detected on a pool,
        the top(1) utility should be used to check for the following
        pattern of CPU usage on all the CPUs associated with the given
        nfsd thread pool.

         - %us ~= 0 (as you're *NOT* running applications on your NFS server)
         - %wa ~= 0
         - %id ~= 0
         - %sy + %hi + %si ~= 100

        If this pattern is seen, configuring more nfsd threads will *not*
        improve the performance of the workload. If this pattern is not
        seen, then something more subtle is wrong.
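
The CPU usage pattern above can also be checked mechanically once the
per-CPU percentages have been obtained by some means. The Python sketch
below assumes they are supplied as a dict keyed by the column names
used by top(1); the function name and the tolerance of 5 percentage
points standing in for "~=" are assumptions made for illustration.

```python
def matches_cpu_limited_pattern(cpu, tolerance=5.0):
    """Check one CPU's usage percentages against the pattern.

    cpu maps "us", "wa", "id", "sy", "hi" and "si" to percentages.
    The tolerance is an arbitrary choice for this sketch.
    """
    kernel_time = cpu["sy"] + cpu["hi"] + cpu["si"]
    return (cpu["us"] <= tolerance
            and cpu["wa"] <= tolerance
            and cpu["id"] <= tolerance
            and kernel_time >= 100.0 - tolerance)
```

The pattern has to hold on every CPU associated with the pool, so a
caller would apply it as
all(matches_cpu_limited_pattern(c) for c in pool_cpus), where
pool_cpus is a hypothetical list of such dicts.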

threads-timedout
        Counts how many times an nfsd thread triggered an idle timeout,
        i.e. was not woken to handle any incoming network packets for
        some time.

        This statistic counts a circumstance where there are more nfsd
        threads configured than can be used by the NFS workload. This is
        a clue that the number of nfsd threads can be reduced without
        affecting performance. Unfortunately, it's only a clue and not
        a strong indication, for a couple of reasons:

         - Currently the rate at which the counter is incremented is quite
           slow; the idle timeout is 60 minutes. Unless the NFS workload
           remains constant for hours at a time, this counter is unlikely
           to be providing information that is still useful.

         - It is usually a wise policy to provide some slack,
           i.e. configure a few more nfsds than are currently needed,
           to allow for future spikes in load.

Note that incoming packets on NFS transports will be dealt with in
one of three ways. An nfsd thread can be woken (threads-woken counts
this case), or the transport can be enqueued for later attention
(sockets-enqueued counts this case), or the packet can be temporarily
deferred because the transport is currently being used by an nfsd
thread. This last case is not very interesting and is not explicitly
counted, but can be inferred from the other counters thus:

packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
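
As a worked example of the formula above, here is a short Python
sketch. The function name is hypothetical; the arguments may be raw
counter values or differences taken over the same sampling interval.

```python
def packets_deferred(packets_arrived, sockets_enqueued, threads_woken):
    """Infer the number of deferred packets from the other counters."""
    return packets_arrived - (sockets_enqueued + threads_woken)
```

For instance, with 1000 packets arrived, 10 transports enqueued and 900
threads woken, the inferred number of deferred packets is 90.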

More
----

Descriptions of the other statistics files should go here.

Greg Banks <gnb@sgi.com>
26 Mar 2009