The Sysadmin Notebook  

Notes on Linux Performance Monitoring

To monitor system performance on a Linux system, the 'top' program is a good starting point. Running 'top' without options displays a five-line 'Summary Area' at the top of the screen, followed by the 'Task Area'.

The five-line 'Summary Area' is broken into:

  1. One-line system load summary
  2. Two-line summary of tasks and CPU load
  3. Two-line summary of memory usage

Type 'h' in top to list the available commands.

The three 'Summary Area' parts can be toggled on/off using 'l', 't' and 'm' for load, tasks and memory respectively. On a multiprocessor system, '1' toggles the CPU load display between one line per processor and a single combined line for all processors.

The 'Task Area' lists running tasks in a dynamic table. Each row corresponds to a single process, and each column provides information as listed in the table below.

To toggle a column on/off, type 'f' to access the fields dialogue, and select the appropriate toggle letter. To order tasks by a different column, type 'F' or 'O' and select the appropriate toggle letter. To change the order the fields are displayed in, type 'o', and select the appropriate toggle letter: lowercase moves the field to the right, uppercase moves the field to the left.

Type 'k' to pick a process to kill or 'r' to renice a process.

Task Area columns in top

Label    Description                          Toggle
PID      Process Id                           A
PPID     Parent Process Id                    B
RUSER    Real username                        C
UID      User Id                              D
USER     Username                             E
GROUP    Group name                           F
TTY      Controlling TTY                      G
PR       Priority                             H
NI       Nice value                           I
P        Last used CPU                        J
%CPU     CPU usage                            K
TIME     CPU time in seconds                  L
TIME+    CPU time in hundredths of a second   M
%MEM     Memory usage                         N
VIRT     Virtual Image in kb                  O
SWAP     Swapped size in kb                   P
RES      Resident size in kb                  Q
CODE     Code size in kb                      R
DATA     Data plus Stack size in kb           S
SHR      Shared memory size in kb             T
nFLT     Page fault count                     U
nDRT     Dirty pages count                    V
S        Process Status                       W
COMMAND  Command name                         X
WCHAN    Sleeping in function                 Y
Flags    Task Flags                           Z
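
top can also be run non-interactively, which helps when a snapshot needs to be logged or piped to another tool. The sketch below assumes a procps-style top that supports batch mode ('-b') and an iteration count ('-n'), and the default five-line Summary Area; the function names are hypothetical.

```shell
# Hypothetical helpers; assume a procps-style top with -b (batch) and -n (iterations).

# Print one full snapshot (Summary Area plus Task Area) and exit.
top_snapshot() {
    top -b -n 1
}

# Print only the Summary Area, assuming the default five-line layout.
top_summary() {
    top_snapshot | head -n 5
}
```

The output of top_summary can then be redirected to a file or compared between runs.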

vmstat

Reports virtual memory statistics. Simply typing 'vmstat' in a terminal session will show a summary of memory statistics since system startup. An optional delay (in seconds) can be specified to produce the first summary display followed by delta value updates each 'delay' seconds. A second optional count can be specified to limit the number of reports produced.

The command 'vmstat 5 5' will produce 5 reports. The first line gives averages since startup, and each subsequent line covers the 5-second interval that has just passed, until 5 reports in total are displayed.

linux:~> vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 1803304  74796 609296    0    0   153    71  408  730 10  2 78  9
 0  0      0 1803068  74944 609380    0    0    28    16  474  527  2  1 94  3
 0  1      0 1802672  75160 609408    0    0    41    17  478  570  2  1 93  4
 0  0      0 1802612  75224 609460    0    0    10    30  428  478  1  1 94  4
 0  0      0 1802528  75344 609508    0    0    22   147  557  633  1  1 94  4

The report is organised in columns for processes, memory, swap usage, I/O, system, and cpu usage. Each column is further divided as follows.

Procs
  r: The number of processes waiting for run time.
  b: The number of processes in uninterruptible sleep (waiting for I/O or something else).
Memory (in KB)
  swpd: the amount of virtual memory used
  free: the amount of idle memory
  buff: the amount of memory used as buffers
  cache: the amount of memory used as cache
  inact: the amount of inactive memory (-a option)
  active: the amount of active memory (-a option)
Swap
  si: Amount of memory swapped in from disk (per second)
  so: Amount of memory swapped to disk (per second)
IO
  bi: Blocks received from a block device (blocks/s)
  bo: Blocks sent to a block device (blocks/s)
System
  in: The number of interrupts per second, including the clock
  cs: The number of context switches per second
CPU (percentages of total CPU time)
  us: Time spent running non-kernel code (user time, including nice time)
  sy: Time spent running kernel code (system time)
  id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time
  wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle
  st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown

Sustained high swap rates (si and so) are usually bad. The system ends up spending most of its time swapping and makes little progress on actual work. You will also see the number of runnable (r) and blocked (b) processes increase. If the situation gets bad enough and free memory gets too low, the kernel's out-of-memory (OOM) killer will start killing processes to reclaim memory. At this point, either reducing the number of processes that normally run or adding RAM are about the only options.
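
A quick way to spot swapping in a longer vmstat run is to pull out the peak si and so values. This is only a sketch: the function name is hypothetical, and it assumes the default column layout shown in the sample output above (si in field 7, so in field 8). Header lines are skipped by only accepting lines whose first field is a number.

```shell
# Hypothetical helper: read vmstat output on stdin and print the highest
# si and so values seen. Assumes the default layout (si = field 7, so = field 8).
# Note that the first data line holds averages since boot, not an interval.
max_swap_rates() {
    awk '$1 ~ /^[0-9]+$/ {
             if ($7 + 0 > si) si = $7 + 0
             if ($8 + 0 > so) so = $8 + 0
         }
         END { printf "max si=%d so=%d\n", si, so }'
}
```

For example, 'vmstat 5 12 | max_swap_rates' summarises one minute of samples. Values that stay well above zero across several runs suggest sustained swapping.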

If the r column is consistently higher than the number of CPUs in the machine, you are most likely CPU-bound and would benefit from more or faster processors. You can find out what's eating your CPU time using a tool such as top. With similar reasoning, you can deduce whether your bottleneck is caused by memory or I/O and then use an appropriate tool to narrow down the problem or upgrade your hardware. Don't forget that, regardless of how much memory you have, the free column eventually dwindles. This is normal and is a result of your memory being used for I/O cache and buffers -- it is not necessarily indicative of a memory shortage.
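
The comparison between the r column and the CPU count can be sketched as a small filter. The function name is hypothetical, and the script assumes r is the first field of each vmstat data line, as in the sample output above.

```shell
# Hypothetical helper: count vmstat samples whose run queue (r, field 1)
# exceeds the CPU count given as the first argument.
count_cpu_saturated() {
    awk -v ncpu="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 > ncpu + 0 { n++ }
                      END { print n + 0 }'
}
```

On a system with GNU coreutils, 'vmstat 5 12 | count_cpu_saturated "$(nproc)"' prints how many of the twelve samples had more runnable processes than CPUs. A count close to the number of samples points towards a CPU-bound workload.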

iostat

The iostat command at its most basic provides an overview of CPU and disk I/O statistics. Below the first line (which contains the system's kernel version and hostname, along with the current date), iostat displays an overview of the system's average CPU utilization since the last reboot. On current sysstat versions, the CPU utilization report includes the following percentages: time spent in user mode (%user), in user mode at altered priority (%nice), in kernel mode (%system), waiting for I/O (%iowait), stolen by a hypervisor (%steal), and idle (%idle).

Below the CPU utilization report is the device utilization report. This report contains one line for each active disk device on the system.
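
Like vmstat, iostat accepts an interval and a count, and the two reports can be requested separately. The wrappers below are a sketch with hypothetical names and default values; they assume the sysstat version of iostat, where '-c' selects the CPU report, '-d' the device report, and '-x' extended device statistics.

```shell
# Hypothetical wrappers around sysstat's iostat.
# Arguments: interval in seconds (default 5), number of reports (default 3).

# CPU utilization report only.
iostat_cpu() {
    iostat -c "${1:-5}" "${2:-3}"
}

# Device utilization report only, with extended statistics.
iostat_devices() {
    iostat -d -x "${1:-5}" "${2:-3}"
}
```

As with vmstat, the first report covers the time since boot and later reports cover each interval.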

mpstat

On multiprocessor systems, mpstat allows the utilization for each CPU to be displayed individually, making it possible to determine how effectively each CPU is being used.
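
A minimal sketch of a per-CPU report, assuming the sysstat version of mpstat, where '-P ALL' selects every processor. The function name and its default interval and count are hypothetical.

```shell
# Hypothetical wrapper: one line per CPU plus the combined "all" line.
# Arguments: interval in seconds (default 5), number of reports (default 3).
mpstat_all() {
    mpstat -P ALL "${1:-5}" "${2:-3}"
}
```

If one CPU is close to fully busy while the others sit idle, the workload is probably not spreading across processors.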

free

Displays the amount of free and used memory. free can repeat its output at a fixed interval with 'free -s <delay>', but a better solution is to run free using the watch command. For example, to display memory utilization every two seconds (the default display interval for watch), use:

watch free

To get an update every second and highlight the changes, try:

watch -n 1 -d free
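
For scripted checks, the output of free can be reduced to a single number. The sketch below uses a hypothetical function name and assumes the 'Mem:' line lists total, used and free as its first three values, which holds for the procps versions of free. As noted in the vmstat section, a low free value on its own is not necessarily a sign of memory shortage, because idle memory is used for cache and buffers.

```shell
# Hypothetical helper: read the output of free on stdin and print free
# memory as a percentage of total (Mem: line, total = field 2, free = field 4).
mem_free_pct() {
    awk '$1 == "Mem:" { printf "%.1f\n", $4 / $2 * 100 }'
}
```

Usage: 'free | mem_free_pct'.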