Learning Objectives
By the end of this chapter, you should be able to:
Troubleshooting Levels
What are the 3 Troubleshooting Levels a sys admin can be at?
Even the best-administered systems will develop problems. Troubleshooting can isolate whether a problem arises from software or hardware, and whether it is local to the system or comes from the local network or the Internet.
Troubleshooting properly requires judgment and experience. While it will always be something of an art form, following methodical procedures helps isolate the sources of problems in a reproducible fashion.
Basic Troubleshooting Techniques
Troubleshooting involves a series of steps that are repeated until a solution is found. A basic recipe might be:
If, on the other hand, you elect to respect your intuition and check hunches, make sure you can get enough data quickly to decide whether to continue along an intuitive path or abandon it, based on whether it looks productive.
While ignoring intuition can sometimes make solving a problem take longer, the troubleshooter's previous track record is the critical benchmark for deciding whether to invest resources this way. In other words, useful intuition is not magic; it is distilled experience.
Things to Check: Networking
Network problems can be caused by either software or hardware, and the cause can be as simple as a device driver that is not loaded or a network cable that is not connected. If the network is up and running but performance is terrible, the issue falls under performance tuning rather than troubleshooting. Such problems may be external to the machine, or may require adjusting networking parameters such as buffer sizes.
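A first pass over these basics (interface present, link up, address and route configured, host reachable) might look like the sketch below. It assumes the iproute2 ip tool and ping are installed. The run and net_quick_check helpers and the DRY_RUN variable are hypothetical names introduced here for illustration only.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# net_quick_check [HOST]: work outward from the local machine.
net_quick_check() {
    host="${1:-127.0.0.1}"   # host to ping; defaults to loopback
    run ip link show         # interface listed? state UP, or NO-CARRIER (cable)?
    run ip addr show         # is an address assigned?
    run ip route show        # is there a default route?
    run ping -c 3 "$host"    # is the target reachable?
}
```

Running DRY_RUN=1 net_quick_check 192.0.2.1 only prints the four commands, which is a safe way to review the sequence before running it for real. An interface that is missing from the ip link show output is a hint that its driver may not be loaded.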
What to check when there are networking issues:
Details:
Things to Check: File Integrity
There are a number of ways to check for corrupt configuration files and binaries.
For RPM-based systems use:
$ rpm -V some_package
to check a single package, and:
$ rpm -Va
to check all packages.
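When rpm -V finds differences, each output line begins with a string of flag characters followed by the file name; no output means nothing differs. The sketch below decodes a subset of those flags (S, M, 5, U, G, T) as described in the rpm manual page. The function name explain_rpm_verify is hypothetical, and the sample line in the usage note is made up for illustration.

```shell
#!/bin/sh
# explain_rpm_verify LINE: print one message per recognized flag.
# Only a subset of the rpm -V flags is handled here.
explain_rpm_verify() {
    flags="${1%% *}"          # first whitespace-separated field of the line
    if [ "$flags" = "missing" ]; then
        echo "file is missing"
        return 0
    fi
    case "$flags" in *S*) echo "size differs" ;; esac
    case "$flags" in *M*) echo "mode (permissions or file type) differs" ;; esac
    case "$flags" in *5*) echo "digest (checksum) differs" ;; esac
    case "$flags" in *U*) echo "owner differs" ;; esac
    case "$flags" in *G*) echo "group differs" ;; esac
    case "$flags" in *T*) echo "modification time differs" ;; esac
    return 0
}
```

For example, explain_rpm_verify 'S.5....T.  c /etc/ssh/sshd_config' reports a changed size, digest, and modification time. For a configuration file (marked c), that usually just means it was edited locally; the same flags on a binary deserve a closer look.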
On Debian-based systems, integrity checking is typically done with debsums. Running debsums some_package verifies the checksums of the files in that package. However, not all packages ship checksums, so coverage may be incomplete:
$ debsums [options] some_package
aide does intrusion detection and is another way to check for changes in files:
$ aide --check
will scan your files and compare them against the baseline database recorded by an earlier scan.
Boot Process Failures
If the system fails to boot properly or fully, being familiar with what happens at each stage is important in identifying particular sources of problems.
Boot Process Failures:
Details:
Filesystem Corruption and Recovery
If, during the boot process, one or more filesystems fail to mount, fsck may be used to attempt repair. However, before doing that, one should check that /etc/fstab has not been misconfigured or corrupted. Note once again that you could have a problem with a filesystem type that the running kernel does not understand.
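One quick sanity check of /etc/fstab is to look for entries with too few fields. The sketch below flags any non-comment line with fewer than four fields (device, mount point, type, options), since the last two fields may be omitted. It is only a rough structural check, not a full validation, and the function name fstab_short_lines is hypothetical.

```shell
#!/bin/sh
# fstab_short_lines [FILE]: print "LINENO: text" for entries with fewer than
# four fields; exit status is 1 if any were found, 0 otherwise.
fstab_short_lines() {
    awk '
        /^[ \t]*(#|$)/ { next }                       # skip comments, blank lines
        NF < 4 { printf "%d: %s\n", NR, $0; bad = 1 } # too few fields
        END { exit bad + 0 }
    ' "${1:-/etc/fstab}"
}
```

Called with no argument it reads /etc/fstab; passing a path lets you check a backup copy or an edited draft before putting it in place.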
If the root filesystem has been mounted you can examine this file, but / may have been mounted as read-only, so to edit the file and fix it, you can run:
$ sudo mount -o remount,rw /
to remount it with write permission.
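To find out whether the remount is needed at all, you can inspect the mount options of /. The sketch below assumes findmnt from util-linux is available; the function names opts_are_readonly and root_is_readonly are hypothetical. The option string is wrapped in commas so that an option such as errors=remount-ro is not mistaken for ro.

```shell
#!/bin/sh
# opts_are_readonly OPTIONS: succeed if the comma-separated list contains "ro".
opts_are_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# root_is_readonly: succeed if / is currently mounted read-only.
root_is_readonly() {
    opts_are_readonly "$(findmnt -n -o OPTIONS /)"
}
```

A possible use is: root_is_readonly && sudo mount -o remount,rw /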
If /etc/fstab seems to be correct, you can move on to fsck. First, you should try:
$ sudo mount -a
to try to mount all filesystems. If this does not succeed completely, you can try to mount the problem filesystems manually. Run fsck first only to examine the filesystem; afterwards, you can run it again to have it try to fix any errors found.
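For the examine-only pass, fsck accepts -n, which reports problems without attempting repairs. Its exit status is a sum of bit values documented in the fsck manual page, and the sketch below decodes the most common ones to help decide whether a repair pass is needed. The function name explain_fsck_status is hypothetical, and /dev/sdXN in the usage note is a placeholder for an unmounted device, not a real name.

```shell
#!/bin/sh
# explain_fsck_status STATUS: describe an fsck exit status (common bits only).
explain_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "system should be rebooted"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    return 0
}
```

A possible sequence is: sudo fsck -n /dev/sdXN; explain_fsck_status $?
If that reports errors left uncorrected, a second run without -n lets fsck attempt the repair.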
Using the Virtual Consoles
By default, Linux defines ___ virtual consoles (also called virtual terminals) to allow local access to the system. The first six are usually login text consoles. Console #___ is used by most distributions as the system console. The #___ console is usually the graphical console, if you have one; however, some distributions (including RHEL) use console 1.
You can use Ctrl-Alt-FX (where X is the console number) to switch between consoles; for example, Ctrl-Alt-F5 switches to console 5.
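When the key combination is not practical, for example from a script, the chvt command from the kbd package can switch consoles instead, and fgconsole prints the number of the active one. The sketch below is a minimal wrapper, assuming chvt is installed and run as root; the function name switch_console and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# switch_console N: switch to virtual console N using chvt.
# With DRY_RUN=1 the command is printed instead of executed.
switch_console() {
    case "$1" in
        ''|0|*[!0-9]*)
            echo "usage: switch_console N (a positive integer)" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ chvt $1"
    else
        chvt "$1"
    fi
}
```

Calling switch_console 5 as root has the same effect as pressing Ctrl-Alt-F5.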