Q18 Troubleshooting Inaccessible Server Flashcards
Suppose you are a network administrator and find that an important company server is not accessible.
Describe the steps you would take to troubleshoot this issue and identify the root cause of the inaccessibility.
This follows a systematic problem-solving approach, often an OSI-model-based (layer-by-layer) methodology:
- Define and Scope the Problem:
◦ Confirm the server is actually inaccessible.
◦ Is it inaccessible from all clients/locations, or only from specific ones? Is it inaccessible for all services, or just one? The answers help determine whether the problem is server-specific, network-wide, location-specific, or service-specific.
◦ Gather information from users experiencing the issue (when it started, what changed).
- Check Basic Connectivity (Physical/Data Link/Network Layers):
◦ Physical (especially if local clients are affected): check physical connections (cables, switch link lights) and port status on network devices.
◦ Network (Layer 3):
▪ From a client, attempt to ping the server by its IP address.
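* If successful, the network path to the server and the server host itself (at the IP layer) are likely functional. Proceed to the name-resolution and service checks.
* If unsuccessful, there is a problem with the network path, the server host is down at Layer 3, or ICMP is being filtered by a firewall (so a failed ping alone does not prove the host is down).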
▪ If ping by IP fails, ping other known-good devices on the server’s local subnet and on your own local subnet to isolate the network segment that is failing.
▪ Use traceroute (or mtr) from the client to the server’s IP address to identify where connectivity breaks along the path. This helps pinpoint problematic routers or network segments.
▪ Check network device configurations (routers, switches) along the path, especially the default gateway of the client and server, and routing tables.
- Check Name Resolution (If Ping by Name Fails but IP Succeeds):
◦ If you can ping the server by IP but not by its hostname, the issue is with DNS name resolution.
◦ Check the client’s DNS configuration (/etc/resolv.conf or equivalent).
◦ Check the DNS server(s) the client uses. Verify that the DNS server is running and has the correct record (A/AAAA) for the server’s hostname.
- Verify Service Availability (Application Layer):
◦ If basic IP connectivity is fine, log in to the server itself (e.g., via console, SSH if accessible via another IP/interface, or KVM).
◦ Check if the specific server application/service process is running using commands like ps aux | grep <service-name>.
◦ Check if the service is listening on the expected port using netstat -tulnp (or ss -tulnp on systems where netstat is not installed).
◦ Look for recent errors in the server’s application logs and system logs (/var/log/syslog, dmesg, etc.) around the time the problem started.
- Check Access Control and Firewalls:
◦ Verify that local server firewalls (e.g., iptables rules), host-based access control files (/etc/hosts.allow, /etc/hosts.deny), and network firewalls along the path are not blocking the client’s access to the server’s IP and port.
- Check Server Load and Resources:
◦ If the service is running but unresponsive, the server might be overloaded. Use tools like top, htop, or vmstat on the server to check CPU, memory, and disk usage.
◦ Check the number of active connections to the server’s service port using netstat; an unusually high count may indicate a DoS attack.
- Review Recent Changes: Consider any recent system updates, configuration changes, or software installations on the server or network that coincided with the start of the issue.
- Document: Throughout the process, take notes on the steps you take, the commands you run, and their output. This keeps track of what has been tried and what each attempt showed.
- Devise and Test Solution: Based on the identified root cause, plan and implement a solution (e.g., restart service, fix firewall rule, address resource issue). Test thoroughly to confirm the server is accessible again.
- Document Solution: Record the problem, the cause, and how it was fixed for future reference.
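The layered checks above (name resolution, then IP reachability, then the service port) can be approximated from the client side with a small Python probe. This is a minimal sketch, not a replacement for ping or traceroute: it uses a TCP connection attempt instead of ICMP (so it needs no raw-socket privileges), and the function name `probe_tcp` and its verdict strings are illustrative assumptions. A "timeout" verdict is ambiguous by nature: the host may be down, the path broken, or a firewall silently dropping packets.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Return a coarse, layer-oriented verdict for reaching host:port over TCP."""
    try:
        # Name resolution first: failing here points at DNS, not the network path.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Use the first address returned (a fuller tool might try each in turn).
    family, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            # The TCP handshake exercises both the path and the listening service.
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            # The host answered but rejected the connection: nothing is listening
            # on the port, or a firewall is actively rejecting it.
            return "refused"
        except socket.timeout:
            # No answer at all: host down, path broken, or packets silently dropped.
            return "timeout"
        except OSError:
            # Other failures, e.g. "network is unreachable" from the local stack.
            return "unreachable"
    return "open"
```

For example, `probe_tcp("server.example.com", 443)` (a hypothetical host) returning "refused" suggests the host is reachable and the investigation should move to the service and firewall checks, while "dns-failure" points straight at name resolution.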
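For the name-resolution step, a sketch like the following extracts the nameserver entries from resolv.conf-style text and checks whether the client’s resolver maps the hostname to the address you expect. It goes through the system resolver via `socket.getaddrinfo`, so it also honors /etc/hosts; the function names and the expected address are assumptions for illustration.

```python
import socket


def nameservers_from_resolv_conf(text: str) -> list[str]:
    """Extract 'nameserver' entries from resolv.conf-formatted text."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';' in the first column).
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def check_resolution(hostname: str, expected_ip: str) -> str:
    """Compare what the system resolver returns with the address you expect."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        return f"resolution failed: {exc}"
    # Each entry's sockaddr starts with the address string; deduplicate them.
    addresses = sorted({info[4][0] for info in infos})
    if expected_ip in addresses:
        return "ok"
    # A stale or wrong A/AAAA record shows up as a mismatch rather than a failure.
    return "mismatch: " + ", ".join(addresses)
```

On a Linux client, the first function can be fed the contents of /etc/resolv.conf to confirm which DNS servers are actually configured before checking each one for the server’s record.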
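On Linux, the connection counts that netstat reports are read from /proc/net/tcp (IPv4) and /proc/net/tcp6 (IPv6). The sketch below tallies sockets by TCP state for one local port, which is one way to quantify the "unusually high number of active connections" check in the resource step. It is Linux-specific, the function name `connection_states` is hypothetical, and what counts as unusually high depends on the service’s normal baseline.

```python
from collections import Counter

# TCP state codes as written in /proc/net/tcp (from the kernel's tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def connection_states(proc_net_tcp: str, local_port: int) -> Counter:
    """Count sockets bound to local_port by state, from /proc/net/tcp-format text."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # the first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # local_address is "<hex ip>:<hex port>"; the port is ordinary hex.
        port = int(local_address.rsplit(":", 1)[1], 16)
        if port == local_port:
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Reading /proc/net/tcp and passing its contents with the service port (for example 80) gives a per-state breakdown; a large SYN_RECV count in particular is a classic sign of a SYN flood, whereas many ESTABLISHED sockets may simply be heavy legitimate load.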
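The load check in the resource step can also be scripted for a quick, repeatable snapshot. This minimal sketch assumes a Linux host: it reads the 1-minute load average with `os.getloadavg()`, available memory from the `MemAvailable` line of /proc/meminfo (present on kernels 3.14 and later), and root-filesystem usage with `shutil.disk_usage()`. The warning thresholds are arbitrary placeholders, not recommended values; top, htop, and vmstat remain the tools for interactive investigation.

```python
import os
import shutil


def mem_available_fraction(meminfo_text: str) -> float:
    """Return MemAvailable / MemTotal from /proc/meminfo-format text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest:
            values[key] = int(rest.split()[0])  # values are reported in kB
    return values["MemAvailable"] / values["MemTotal"]


def assess(load1: float, cpus: int, mem_avail: float, disk_used: float,
           max_load_per_cpu: float = 1.0, min_mem: float = 0.10,
           max_disk: float = 0.90) -> list[str]:
    """Turn raw readings into warnings; the default thresholds are assumptions."""
    warnings = []
    if load1 / cpus > max_load_per_cpu:
        warnings.append(f"high load: {load1:.2f} on {cpus} CPUs")
    if mem_avail < min_mem:
        warnings.append(f"low memory: {mem_avail:.0%} available")
    if disk_used > max_disk:
        warnings.append(f"disk nearly full: {disk_used:.0%} used")
    return warnings


def snapshot(path: str = "/") -> list[str]:
    """Collect live readings on a Linux host and assess them."""
    load1, _, _ = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = mem_available_fraction(f.read())
    usage = shutil.disk_usage(path)
    return assess(load1, os.cpu_count() or 1, mem, usage.used / usage.total)
```

Keeping the parsing and the threshold logic separate from `snapshot()` means the same rules can be applied to readings copied from another machine.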
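For the documentation step, a small wrapper can run each diagnostic command and append the command, a timestamp, the exit code, and the output to a notes file, so the record is built as you work rather than reconstructed afterwards. This is a sketch under assumptions: the function name `run_and_log` and the notes-file path are hypothetical, and commands are passed as argument lists rather than shell strings to avoid quoting surprises.

```python
import datetime
import subprocess


def run_and_log(cmd: list[str], notes_path: str, timeout: float = 30.0) -> int:
    """Run a diagnostic command and append what was run and its output to a notes file."""
    started = datetime.datetime.now().isoformat(timespec="seconds")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        code, output = result.returncode, result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A command that cannot run (missing binary) or hangs is a finding too.
        code, output = -1, f"could not run: {exc}"
    with open(notes_path, "a", encoding="utf-8") as notes:
        notes.write(f"[{started}] $ {' '.join(cmd)}\n")
        notes.write(f"exit={code}\n{output}\n")
    return code
```

For example, `run_and_log(["ping", "-c", "3", "192.0.2.10"], "notes.txt")` (a hypothetical address and path) records the ping attempt and its result alongside every other check.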