Chapter 14 Study Guide
Troubleshooting Methodology
- Proactive Maintenance
Taking the necessary steps to minimize future problems. Includes performing system backups and identifying potential problem areas.
- Reactive Maintenance
Correcting problems when they arise. Always document the solution to help quickly resolve future problems.
Troubleshooting Procedures
- Gather as much information as possible
Check system log files. Run information utilities such as ps or mount. The command "tail -f /path/to/logfile" opens a log file for continuous viewing.
- Isolate the problem
Determine whether the problem is persistent or intermittent, and how many users are affected.
- List possible causes and solutions
Google is your best friend.
- Implement and test solution
- Document your solution and process
- Prioritize problems
Solve the most severe problems first. Spending too much time on small problems can result in reduced productivity.
- Try to solve the root of the problem
A short-term solution might fail in the long term because of an underlying problem.
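The information-gathering step can be sketched as a small shell function. This is only an illustration, not a standard tool: the function name recent_errors and the keyword list (error, fail, warn) are assumptions chosen for the example.

```shell
#!/bin/sh
# recent_errors FILE [COUNT]
# Print the lines among the last COUNT lines (default 200) of FILE that
# mention an error, failure, or warning. The keyword list is an
# illustrative assumption; adjust it for the service being investigated.
recent_errors() {
    logfile=$1
    count=${2:-200}
    tail -n "$count" "$logfile" | grep -i -E 'error|fail|warn'
}
```

For example, recent_errors /var/log/messages 500 scans the last 500 lines once, while tail -f /var/log/messages keeps the file open so new entries can be watched as the problem is reproduced.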
Hardware Related Problems
- Can come from damaged hardware or improper hardware or software configuration
- Using the dmesg command or viewing the /var/log/boot.log and /var/log/messages files can isolate most hardware problems
- Missing or incorrect drivers prevent the OS from using the associated hardware
Use lsusb to view only USB devices. Use lspci to view only PCI devices.
- The lsmod command lists the drivers (modules) loaded into the kernel
By comparing the output of dmesg, lsusb, and lspci with the lsmod output, you can determine if a driver is missing
- Hard drives are the most common hardware component to fail
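The driver comparison described above can be partly automated. The sketch below assumes the usual lsmod layout (one header line, then the module name in the first column); the function name module_loaded and the module name e1000 used in the usage note are examples only.

```shell
#!/bin/sh
# module_loaded NAME
# Read lsmod-style output on standard input and succeed (exit status 0)
# if a module called NAME appears in the first column. The header line
# is skipped. Assumes the usual "Module  Size  Used by" layout.
module_loaded() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}
```

A typical use would be: lsmod | module_loaded e1000 || echo "e1000 is not loaded". On systems where it is available, lspci -k also reports which kernel driver is handling each PCI device, which makes the comparison easier.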
Software Related Problems
- Can be application or OS related
- Application Related Problems
Can fail during execution due to missing program libraries and files, process restrictions, or conflicting applications.
Identify missing files in a package by using the -V option with the rpm command.
Use the ldd command to identify which shared libraries a program requires.
It is good practice to run the ldconfig command to ensure the list of shared library directories is up to date.
The ulimit command can be used to increase the number of processes the user can start in a shell.
- Operating System Related Problems
Typically include problems with X Windows, boot loaders, and filesystems.
Use the xwininfo or xdpyinfo commands to attempt to isolate problems with X Windows.
Adding the word "linear" to, and removing the word "compact" from, the /etc/lilo.conf file often fixes LILO boot loader problems.
GRUB boot loader errors are typically the result of a missing file in the /boot directory.
Filesystems can become corrupted by heavy hard drive use.
Corrupted filesystems can be identified by very slow write requests, errors printed to the console, or failure to mount.
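Checking a program for missing shared libraries with ldd can be sketched as follows. The function name missing_libs is an assumption for illustration, and the filter relies on ldd marking unresolved libraries with "=> not found", which is the format used by glibc-based systems.

```shell
#!/bin/sh
# missing_libs
# Read ldd output on standard input and print the name of every shared
# library that could not be located (lines containing "=> not found").
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, ldd /path/to/program | missing_libs prints nothing when every library resolves. If a library is reported missing after it has been installed, running ldconfig and repeating the check is a reasonable next step.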
User Interface Related Problems
- Users need to understand how to use their desktop environment, but often will not
- Assistive technologies are tools you can use to modify your desktop experience
Accessed by opening the System menu and navigating to Preferences, Assistive Technologies.
Linux Performance
Monitor system performance using the command-line utilities included in the sysstat package. To make it easier to identify performance problems, a network administrator should run performance utilities on healthy Linux systems to develop a baseline.
Performance Problems:
- Software
- Hardware
- Combination of the two
Software Problems
Software that requires too many system resources may monopolize the CPU, memory, and peripheral devices, creating poor performance. Too many running processes or rogue processes can have the same effect.
Hardware Problems
Improperly configured hardware (may still work).
Old hardware (most companies retire computer equipment after two to five years of use).
Jabbering: failing hardware sending large amounts of information to the CPU when not in use.
Resolutions
Software problems can sometimes be resolved by changing hardware.
Move or remove the software.
Upgrade the CPU or add another CPU.
Use bus mastering peripheral components (devices that can perform processes normally performed by the CPU).
Add RAM to increase system speed.
Replace slower disk drives with faster ones.
Use disk striping RAID.
Keep CD and DVD drives on a separate hard disk controller.
Monitoring Performance with sysstat Utilities
Using information from the /proc directory and system devices, the utilities in the System Statistics (sysstat) package monitor the system. To install the latest version of sysstat on a Linux system, run: yum install sysstat
Three of the System Statistics (sysstat) package performance monitoring utilities include:
- mpstat (multiple processor statistics) command
- iostat (input/output statistics) command
- sar (system activity reporter) command
mpstat (multiple processor statistics)
Used to monitor CPU performance for all processors on the system since the system was started or rebooted.
To monitor a single CPU, use the -P option followed by the processor number.
Example: mpstat -P 0 displays statistics for the first processor on the system.
Limited in abilities.
Examining the Output of the mpstat command
- %user= % of time the processor spent executing user programs and daemons
- %nice= % of time the processor spent executing programs and daemons that had nondefault nice values
- %sys= % of time the processor spent executing at the system (kernel) level
- %iowait= % of time the CPU was idle when an outstanding disk I/O request existed.
- %irq= % of time the CPU spent servicing hardware interrupts
- %soft= % of time the CPU spent servicing software interrupts
- %steal= % of time a virtual CPU spent waiting while the hypervisor serviced another virtual processor
- %guest= % of time the CPU spent running a virtual processor
- %idle= % of time the CPU did not spend executing tasks. Should be greater than 25% over a long period of time.
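The 25% idle guideline can be checked from mpstat output with a short script. This is a sketch under two assumptions: that %idle is the last column of each data line (true for the sysstat versions I am aware of, but worth confirming against the header on your system), and that the final numeric line (for example the Average: line) is the one of interest. The function names last_idle and idle_ok are made up for the example.

```shell
#!/bin/sh
# last_idle
# Read mpstat output on standard input and print the %idle value (last
# column) of the final data line. Header lines are skipped because their
# last field is not a number. A decimal comma is converted to a point.
last_idle() {
    awk '$NF ~ /^[0-9]+([.,][0-9]+)?$/ { idle = $NF }
         END { sub(",", ".", idle); print idle }'
}

# idle_ok [MIN]
# Succeed if the %idle value read from standard input is greater than
# MIN percent (default 25, the guideline from the list above).
idle_ok() {
    idle=$(last_idle)
    awk -v idle="$idle" -v min="${1:-25}" 'BEGIN { exit !(idle + 0 > min + 0) }'
}
```

A possible use is: mpstat 1 5 | idle_ok || echo "CPU idle time is below 25%". A single low reading is not a problem by itself; the guideline applies to idle time over a long period.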
iostat (input/output statistics)
Measures the flow of information to and from disk devices.
Displays CPU statistics similar to mpstat.
Limited in abilities.
Adds transfers per second (tps) and block
sar (system activity reporter)
Displays more information than the mpstat or iostat command.
Displays CPU statistics by default.
Most widely used performance monitoring tool on UNIX and Linux systems.
Scheduled using the cron daemon to run every 10 minutes; the results for the current day are logged to a file in the /var/log/sa directory called sa#, where # represents the day of the month.
Only one month of records is kept, but this can be changed by editing the cron table located at /etc/cron.d/sysstat.
Can display different statistics by specifying options.
Common options with the sar command
- CPU Usage of ALL CPUs (sar -u) Default
- CPU Usage of Individual CPU or Core (sar -P, followed by the processor number or ALL)
- Memory Free and Used (sar -r)
- Display swapping statistics (sar -W)
- Reports run queue and load average (sar -q)
- sar -u 1 3 Displays real-time CPU usage every second, 3 times.
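Because sar keeps one data file per day, earlier statistics can be read back with the -f option. The helper below is a sketch: the name sa_file is made up, and it assumes the day number in the file name is written with two digits (sa05 rather than sa5), which should be confirmed by listing /var/log/sa on the system in question.

```shell
#!/bin/sh
# sa_file DAY
# Print the path of the sar data file for the given day of the month,
# assuming the two-digit /var/log/sa/saDD naming.
sa_file() {
    day=${1#0}    # drop one leading zero so that 08 and 09 are not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

For example, sar -r -f "$(sa_file 5)" would display the memory statistics recorded on the 5th, and sa_file "$(date +%d)" gives the file for the current day.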
Other Performance Monitoring Utilities
- top utility (discussed in Chapter 9)
- The free command displays the total amounts of physical and swap memory (in kilobytes) and their utilization.
- The vmstat command displays more information than the free command and helps indicate whether more physical memory is required.
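Memory utilization can be reduced to a single figure from the output of free. The sketch below assumes the Mem: line lists total in the second column and used in the third, and the function name mem_used_percent is made up. Older versions of free count buffers and cache as used memory, so the figure can overstate real memory pressure on those systems.

```shell
#!/bin/sh
# mem_used_percent
# Read the output of free on standard input and print the percentage of
# physical memory in use, truncated to a whole number, computed from the
# total and used columns of the Mem: line.
mem_used_percent() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", ($3 * 100) / $2 }'
}
```

A typical use is free | mem_used_percent. To judge whether more physical memory is needed, vmstat 1 5 is a useful follow-up: sustained nonzero values in its si and so (swap in and swap out) columns suggest the system is swapping regularly.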