Linux Performance Monitoring
Notes on Linux Performance Monitoring
To monitor system performance on a Linux system, the 'top' program is a good starting point. Running 'top' without options will display:
- A five-line 'Summary Area'
- A 'Task Area', showing a list of running processes ordered by CPU usage
The five-line 'Summary Area' is broken into:
- One-line system load summary
- Two-line summary of tasks and CPU load
- Two-line summary of memory usage
Type 'h' in top to list the available commands.
The three 'Summary Area' parts can be toggled on/off using 'l', 't' and 'm', for load, tasks and memory respectively. On a multiprocessor system, '1' toggles the CPU load display between one line per processor and a single combined line for all processors.
The 'Task Area' lists running tasks in a dynamic table. Each row corresponds to a single process, and each column provides information as per the table below.
To toggle a column on/off, type 'f' to access the fields dialogue, and select the appropriate toggle letter. To order tasks by a different column, type 'F' or 'O' and select the appropriate toggle letter. To change the order the fields are displayed in, type 'o', and select the appropriate toggle letter: lowercase moves the field to the right, uppercase moves the field to the left.
Type 'k' to pick a process to kill or 'r' to renice a process.
| Label | Description | Toggle |
|---|---|---|
| PID | Process Id | A |
| PPID | Parent Process Id | B |
| RUSER | Real username | C |
| UID | User Id | D |
| USER | Username | E |
| GROUP | Group name | F |
| TTY | Controlling TTY | G |
| PR | Priority | H |
| NI | Nice value | I |
| P | Last used CPU | J |
| %CPU | CPU usage | K |
| TIME | CPU time in seconds | L |
| TIME+ | CPU time in hundredths | M |
| %MEM | Memory usage | N |
| VIRT | Virtual Image in kb | O |
| SWAP | Swapped size in kb | P |
| RES | Resident size in kb | Q |
| CODE | Code size in kb | R |
| DATA | Data plus Stack size in kb | S |
| SHR | Shared memory size in kb | T |
| nFLT | Page fault count | U |
| nDRT | Dirty pages count | V |
| S | Process Status | W |
| COMMAND | Command name | X |
| WCHAN | Sleeping in function | Y |
| Flags | Task Flags | Z |
vmstat
Reports virtual memory statistics. Simply typing 'vmstat' in a terminal session shows a single summary of averages since system startup. An optional delay (in seconds) can be specified to produce that first summary followed by updates covering each 'delay'-second interval. A second optional argument, a count, limits the number of reports produced.
The command 'vmstat 5 5' will produce 5 reports. The first line gives the averages since startup, and each subsequent line covers the 5-second interval that has just passed, until 5 reports in total are displayed.
linux:~> vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 1803304  74796 609296    0    0   153    71  408  730 10  2 78  9
 0  0      0 1803068  74944 609380    0    0    28    16  474  527  2  1 94  3
 0  1      0 1802672  75160 609408    0    0    41    17  478  570  2  1 93  4
 0  0      0 1802612  75224 609460    0    0    10    30  428  478  1  1 94  4
 0  0      0 1802528  75344 609508    0    0    22   147  557  633  1  1 94  4
The report is organised into column groups for processes, memory, swap, I/O, system, and CPU usage. Each group is further divided as follows.
- Procs
  - r: The number of processes waiting for run time.
  - b: The number of processes in uninterruptible sleep (waiting for I/O or something else).
- Memory (in KB)
  - swpd: the amount of virtual memory used
  - free: the amount of idle memory
  - buff: the amount of memory used as buffers
  - cache: the amount of memory used as cache
  - inact: the amount of inactive memory (-a option)
  - active: the amount of active memory (-a option)
- Swap
  - si: Amount of memory swapped in from disk (per second)
  - so: Amount of memory swapped to disk (per second)
- IO
  - bi: Blocks received from a block device (blocks/s)
  - bo: Blocks sent to a block device (blocks/s)
- System
  - in: The number of interrupts per second, including the clock
  - cs: The number of context switches per second
- CPU (percentages of total CPU time)
  - us: Time spent running non-kernel code (user time, including nice time)
  - sy: Time spent running kernel code (system time)
  - id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time
  - wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle
  - st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown
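The column names above make vmstat output easy to process in a script. The following is a minimal Python sketch, not part of vmstat itself: the function name `parse_vmstat` and the `SAMPLE` text are illustrative, and it assumes the default report layout in which the line naming the columns starts with `r`.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts, one per report line."""
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The line naming each column starts with "r" and contains "swpd".
        if fields[0] == "r" and "swpd" in fields:
            columns = fields
            continue
        # A report line has one non-negative integer per column.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


# Two report lines taken from the 'vmstat 5 5' output shown earlier.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 1803304  74796 609296    0    0   153    71  408  730 10  2 78  9
 0  1      0 1802672  75160 609408    0    0    41    17  478  570  2  1 93  4
"""

if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print(row["r"], row["b"], row["free"], row["id"])
```

Keying each row by column name keeps later checks readable (for example `row["si"]` rather than a positional index) and tolerates extra columns such as st.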
Sustained high swap rates (si and so) are usually bad. The system ends up spending most of its time swapping and makes little progress on any actual work. You will also see the number of runnable (r) and blocked (b) processes increase. If the situation gets bad enough and free memory gets too low, the out-of-memory (OOM) killer will start killing processes to reclaim memory. At this point, either reducing the number of processes that normally run or adding additional RAM are about the only options.
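One way to turn "sustained" into something checkable is to look for several consecutive reports with swap activity. The sketch below assumes each report is a dict with `si` and `so` keys; the function name, the threshold of 100 per second and the run length of 3 are arbitrary illustrative choices, not recommended values.

```python
def sustained_swapping(rows, threshold=100, min_consecutive=3):
    """Return True if si + so exceeds threshold in min_consecutive reports in a row.

    rows: list of dicts with integer "si" and "so" values, one per vmstat report.
    threshold and min_consecutive are assumptions to tune for your system.
    """
    streak = 0
    for row in rows:
        if row["si"] + row["so"] > threshold:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # a quiet interval breaks the run
    return False


if __name__ == "__main__":
    quiet = [{"si": 0, "so": 0}] * 5
    thrashing = [{"si": 200, "so": 300}] * 3
    print(sustained_swapping(quiet), sustained_swapping(thrashing))
```

Because the first vmstat report is an average since startup, it is probably best to drop it before calling a check like this.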
If the r column is consistently higher than the number of CPUs in the machine, you are most likely CPU-bound and would benefit from more or faster processors. You can find out what's eating your CPU time using a tool such as top. With similar reasoning, you can deduce whether your bottleneck is caused by memory or I/O and then use an appropriate tool to narrow down the problem or upgrade your hardware. Don't forget that, regardless of how much memory you have, the free column eventually dwindles. This is normal and is a result of your memory being used for I/O cache and buffers -- it is not necessarily indicative of a memory shortage.
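The rule of thumb about the r column can be sketched as a small check. This is an illustration only: the function name is hypothetical, and reading "consistently" as "at least 80% of samples" is an assumption.

```python
import os


def is_cpu_bound(run_queue_samples, cpu_count=None, fraction=0.8):
    """Return True if r exceeds the CPU count in at least `fraction` of samples.

    run_queue_samples: values of the r column from successive vmstat reports.
    cpu_count: number of CPUs; defaults to what os.cpu_count() reports.
    fraction: assumed meaning of "consistently" (0.8 = 80% of samples).
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if not run_queue_samples:
        return False
    over = sum(1 for r in run_queue_samples if r > cpu_count)
    return over / len(run_queue_samples) >= fraction


if __name__ == "__main__":
    # The r values from the 'vmstat 5 5' example above are all 0.
    print(is_cpu_bound([0, 0, 0, 0, 0], cpu_count=4))
    print(is_cpu_bound([5, 6, 7, 5, 6], cpu_count=4))
```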
iostat
The iostat command at its most basic provides an overview of CPU and disk I/O statistics. Below the first line (which contains the system's kernel version and hostname, along with the current date), iostat displays an overview of the system's average CPU utilization since the last reboot. The CPU utilization report includes the following percentages:
- Percentage of time spent in user mode (running applications, etc.)
- Percentage of time spent in user mode (for processes that have altered their scheduling priority using nice(2))
- Percentage of time spent in kernel mode
- Percentage of time spent idle
Below the CPU utilization report is the device utilization report. This report contains one line for each active disk device on the system.
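To see where since-boot percentages like these can come from, the sketch below computes them from the aggregate `cpu` line of /proc/stat, whose first four counters are user, nice, system and idle time. It is an illustration rather than a description of how iostat is implemented; the function name and the sample counter values are made up.

```python
def cpu_percentages(stat_line):
    """Compute since-boot CPU percentages from the aggregate 'cpu' line of /proc/stat.

    The counters are, in order: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already counted in
    user/nice, so only the first eight are summed for the total.
    """
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(v) for v in fields[1:]]
    if len(counters) < 4:
        raise ValueError("expected at least four counters")
    total = sum(counters[:8])
    if total == 0:
        raise ValueError("counters are all zero")
    names = ("user", "nice", "system", "idle")
    return {name: 100.0 * value / total for name, value in zip(names, counters)}


if __name__ == "__main__":
    # Made-up counter values, chosen so the arithmetic is easy to follow.
    print(cpu_percentages("cpu  100 50 50 800 0 0 0 0 0 0"))
```

On a Linux system the real line can be obtained with `open("/proc/stat").readline()`. The four percentages add up to less than 100 whenever the remaining counters (iowait, irq, softirq, steal) are non-zero.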
mpstat
On multiprocessor systems, mpstat allows the utilization for each CPU to be displayed individually, making it possible to determine how effectively each CPU is being used.
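The per-CPU view can be approximated by comparing two snapshots of the `cpu0`, `cpu1`, ... lines in /proc/stat. The sketch below is a rough illustration, not mpstat's implementation: the function name is hypothetical, and treating I/O wait as non-busy time is a choice made here for simplicity.

```python
def per_cpu_busy(before, after):
    """Return {cpu_name: busy percentage} between two /proc/stat snapshots.

    before, after: full text of /proc/stat read at two different moments.
    Idle and iowait time are both treated as "not busy" (an assumption).
    """
    def totals(text):
        result = {}
        for line in text.splitlines():
            fields = line.split()
            # Per-processor lines are named cpu0, cpu1, ...; skip the aggregate 'cpu' line.
            if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
                counters = [int(v) for v in fields[1:9]]
                idle = counters[3] + (counters[4] if len(counters) > 4 else 0)
                result[fields[0]] = (sum(counters), idle)
        return result

    first, second = totals(before), totals(after)
    busy = {}
    for name, (total2, idle2) in second.items():
        if name not in first:
            continue
        total1, idle1 = first[name]
        elapsed = total2 - total1
        if elapsed > 0:
            busy[name] = 100.0 * (elapsed - (idle2 - idle1)) / elapsed
    return busy


if __name__ == "__main__":
    # Made-up snapshots: cpu0 is mostly idle, cpu1 is mostly busy.
    t0 = "cpu0 100 0 100 800 0 0 0 0\ncpu1 100 0 100 800 0 0 0 0\n"
    t1 = "cpu0 200 0 100 1700 0 0 0 0\ncpu1 1000 0 100 900 0 0 0 0\n"
    print(per_cpu_busy(t0, t1))
```

A large gap between the busiest and least busy CPU suggests the load is not being spread evenly across processors.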
free
Displays the amount of free and used memory. free -s can repeat the display at a set interval, but a better solution is to run free using the watch command. For example, to display memory utilization every two seconds (the default display interval for watch), use:
watch free
To get an update every second and highlight the changes, try:
watch -n 1 -d free
