Troubleshooting and Performance Optimization Flashcards
troubleshooting methodology
- identify problem
- establish theory of probable cause
- test the theory
- establish plan of action
- implement a solution/escalate
- verify functionality
- perform root cause analysis
- document the solution
refined troubleshooting
- identify problem scope
- reproduce the problem
- check log files
- read documentation
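The "check log files" step can be sketched in Python. This is a minimal illustration, not a standard tool: the keyword pattern and the function name find_error_lines are assumptions, and real logs may need different keywords.

```python
import re

# Assumed keyword list for illustration; adjust to the log format in use.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed|ure)?|critical|panic)\b", re.IGNORECASE)

def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that match ERROR_PATTERN."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if ERROR_PATTERN.search(text):
            hits.append((number, text.rstrip("\n")))
    return hits

if __name__ == "__main__":
    # Made-up sample lines, not output from a real system.
    sample = [
        "boot ok\n",
        "disk0: read error at sector 12\n",
        "service started\n",
    ]
    for number, text in find_error_lines(sample):
        print(f"{number}: {text}")
```

To scan a real file, pass an open file object, since iterating over it yields lines.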
BIOS failure possible causes
- overheating
- unsupported features
- newer options may require UEFI
BIOS failure possible solutions
- keep server rooms/data centers properly ventilated
- update (flash) BIOS
- acquire UEFI motherboards/enable UEFI options
POST failure possible causes
- TPM firmware detects a boot configuration change
- failed hardware components
POST failure possible solutions
- enter TPM recovery code/configure boot options
- search for reported POST code to identify problem
- replace failed components
memory failure possible causes
- POST failure message
- random OS freezes/reboots
memory failure possible solutions
- run memory diagnostics
- replace failed components
processor failure/performance degradation possible causes
- overheating
- throttling slows CPU as temperature increases
- VMs with manual CPU affinity specified are performing poorly
processor failure/performance degradation possible solutions
- ensure HVAC is running correctly
- don’t manually link VMs to specific CPU cores
boot sequence failure possible causes
- OS not found due to changing disk order/partitions
- booting from USB might fail if not enabled in BIOS
boot sequence failure possible solutions
- configure bootable disk order in BIOS
- configure bootable disk partitions in OS
- flash BIOS so USB boot is supported
storage failure possible causes
- drive failure
- RAID array drive failures resulting in slow performance
storage failure possible solutions
- run disk diagnostics
- replace failed drives
- have hot spare disks in place
power failure possible causes
- failed power supply
- power surge
power failure possible solutions
- use redundant power sources
- use UPSs
- use surge protectors
environment failure possible causes
- HVAC malfunctioning causes overheating
- accumulated dust hampers airflow/adds a layer of insulation
- low humidity increases risk of ESD
environment failure possible solutions
- ensure HVAC is running properly
- clear dust from components/air intake fans
- ensure HVAC keeps consistent relative humidity
crash cart tools
- multimeter to test power supplies
- hardware diagnostics tools for components
- can of compressed air to remove dust
- antistatic wrist strap/ESD mats
- tools for testing bad RAM chips
logon failure possible causes
- incorrect credentials
- corrupt user profile
- can’t locate authentication server
logon failure possible solutions
- reset user password
- save old user profile/remove corrupt user profile and registry references
- ensure client station points to correct DNS server
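One way to test whether a client can locate the authentication server is to check that its name resolves through the configured DNS server. The sketch below uses Python's standard socket module. The function name resolve_host and the host name dc01.example.com are placeholders, not real infrastructure.

```python
import socket

def resolve_host(hostname):
    """Return the sorted unique addresses the system resolver gives for hostname.

    Returns an empty list if the lookup fails.
    """
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({entry[4][0] for entry in results})

if __name__ == "__main__":
    # Placeholder name; substitute the actual authentication server.
    addresses = resolve_host("dc01.example.com")
    if addresses:
        print("resolved to:", ", ".join(addresses))
    else:
        print("lookup failed: check the client's DNS server setting")
```

An empty result only shows that resolution failed from this client. It does not say whether the DNS setting, the DNS server, or the record itself is at fault.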
user unable to access resource possible causes
- insufficient permissions
- encryption is enabled
- Windows UAC configuration is too restrictive
- UNIX/Linux sudo is not configured to enable user access to certain commands
user unable to access resource possible solutions
- check user effective access
- check group membership
- ensure user has decryption key
- loosen UAC settings
- modify sudoers configuration file
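A quick way to see what the current account can do with a resource is os.access from Python's standard library. This is a rough sketch: check_access is a hypothetical helper, and os.access tests the real user and group IDs by default, so it does not replace a full review of effective access, group membership, or encryption keys.

```python
import os

def check_access(path):
    """Report whether the current user can read, write, or execute path."""
    return {
        "exists": os.path.exists(path),
        "read": os.access(path, os.R_OK),
        "write": os.access(path, os.W_OK),
        "execute": os.access(path, os.X_OK),
    }

if __name__ == "__main__":
    # Check the current directory as a harmless example target.
    for permission, allowed in check_access(".").items():
        print(f"{permission}: {allowed}")
```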
memory leak possible causes
- poorly written software
- malware
- runaway processes
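A leak or runaway process usually shows up as memory use that keeps climbing and never falls back. The sketch below is a simple heuristic over memory samples taken at intervals. The function name looks_like_leak and the 50 MB threshold are assumptions for illustration, not established values.

```python
def looks_like_leak(samples_mb, min_growth_mb=50.0):
    """Return True if memory use never drops between samples and grows by
    at least min_growth_mb overall. Needs at least three samples."""
    if len(samples_mb) < 3:
        return False
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples_mb, samples_mb[1:]))
    total_growth = samples_mb[-1] - samples_mb[0]
    return never_drops and total_growth >= min_growth_mb

if __name__ == "__main__":
    # Made-up readings in MB, not measurements from a real process.
    print(looks_like_leak([100, 130, 170, 220]))
    print(looks_like_leak([100, 130, 90, 120]))
```

A process with a legitimately growing workload can also trip this check, so treat a positive result as a prompt to investigate rather than proof of a leak.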