Troubleshooting Patterns
General Linux troubleshooting patterns, written without site-specific or sensitive details.
Use this page when something is broken but the cause is not clear yet.
Basic troubleshooting mindset
Do not guess first. Check facts.
Ask:
1. What is the exact problem?
2. When did it start?
3. What changed recently?
4. Is it one user, one service, one server, or many systems?
5. Is there an error message?
6. Can the problem be reproduced?
7. What do the logs say?
First command set
When the issue is unclear, start with:
hostnamectl
uptime
df -h
free -h
systemctl --failed
ip a
ip r
ss -tulpn
journalctl -xe
These commands help check:
system identity
uptime and load
disk usage
memory usage
failed services
network addresses
routes
listening ports
recent errors
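The first command set can be wrapped in a small script so the output lands in one place. This is a minimal sketch: run_if_present and first_look are hypothetical helper names, and the --no-pager and -n 50 options are added here only to keep the output non-interactive and short.

```shell
#!/bin/sh
# Sketch: run each first-look command, skipping tools that are not installed.
# run_if_present and first_look are hypothetical names, not standard utilities.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '\n===== %s =====\n' "$*"
        "$@" 2>&1
    else
        printf '\n===== %s: not installed, skipped =====\n' "$1"
    fi
}

first_look() {
    run_if_present hostnamectl
    run_if_present uptime
    run_if_present df -h
    run_if_present free -h
    run_if_present systemctl --failed --no-pager
    run_if_present ip a
    run_if_present ip r
    run_if_present ss -tulpn
    run_if_present journalctl -xe --no-pager -n 50
}

# Uncomment to run on a real host and keep a copy of the output:
# first_look | tee "first-look-$(date +%Y%m%d-%H%M%S).txt"
```

Saving the output gives you a baseline to compare against after a fix.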
Pattern: service is not working
First checks:
systemctl status SERVICE_NAME
journalctl -u SERVICE_NAME -n 100
journalctl -u SERVICE_NAME -p err
systemctl --failed
Questions:
1. Is the service running?
2. Is it failed?
3. Did it fail after restart or reboot?
4. Does the log show missing files, permissions, ports, or config errors?
5. Is the service listening on the expected port?
Check port:
ss -tulpn | grep :PORT
Pattern: server is slow
First checks:
uptime
top
free -h
df -h
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head
Questions:
1. Is CPU usage high?
2. Is memory full?
3. Is swap heavily used?
4. Is disk full?
5. Is one process causing the issue?
6. Did a backup, scan, or batch job start?
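To judge whether the load shown by uptime is high, it helps to divide it by the number of CPUs. The sketch below assumes a hypothetical helper named load_per_cpu. As a rough rule of thumb, a value well above 1.00 means more tasks are waiting than the CPUs can serve, but on Linux the load average also counts tasks blocked on I/O, so treat it as a hint and not a verdict.

```shell
# load_per_cpu LOAD CPUS: print LOAD divided by CPUS with two decimals.
# Hypothetical helper for illustration.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# On a real host, pass the 1-minute load average and the CPU count:
# load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```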
Pattern: disk is full
First checks:
df -h
df -i
sudo du -h --max-depth=1 /var | sort -h
sudo lsof | grep deleted
Questions:
1. Which filesystem is full?
2. Is disk space full or are inodes full?
3. Which directory is growing?
4. Are logs too large?
5. Are deleted files still held open by a process?
6. Can files be safely archived, compressed, or removed?
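Questions 1 and 2 can be answered quickly by filtering df output for filesystems above a threshold. This is a sketch: full_filesystems is a hypothetical helper, the 90 percent limit is only an example, and the parsing assumes mount points without spaces.

```shell
# full_filesystems THRESHOLD: read `df -P` style output on stdin and print
# "MOUNTPOINT USE%" for every filesystem at or above THRESHOLD percent.
# Hypothetical helper; assumes mount points contain no spaces.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Disk space:
# df -P | full_filesystems 90
# Inodes (same column layout):
# df -Pi | full_filesystems 90
```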
Pattern: memory issue
First checks:
free -h
top
ps aux --sort=-%mem | head
swapon --show
Check for OOM killer:
journalctl -k | grep -i "out of memory"
dmesg -T | grep -i "killed process"
Questions:
1. Is memory actually full?
2. Is swap being used?
3. Which process uses memory?
4. Is this normal for the application?
5. Was a process killed by the OOM killer?
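For question 1, note that Linux uses otherwise idle memory for cache, so a low "free" value alone does not mean memory is full. The "available" figure is usually the better indicator. The sketch below, with a hypothetical helper named mem_available_pct, computes it as a percentage from /proc/meminfo.

```shell
# mem_available_pct: read /proc/meminfo-style text on stdin and print
# MemAvailable as a whole-number percentage of MemTotal.
# Hypothetical helper for illustration.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# On a real host:
# mem_available_pct < /proc/meminfo
```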
Pattern: network issue
First checks:
ip a
ip r
ip route get 192.0.2.10
cat /etc/resolv.conf
dig example.com
ping -c 4 example.com
ss -tulpn
Questions:
1. Does the server have the correct IP?
2. Is the route/default gateway correct?
3. Does DNS resolve?
4. Can the server reach the destination IP?
5. Is the service listening?
6. Is a firewall blocking traffic?
Pattern: DNS issue
First checks:
ping -c 4 192.0.2.10
ping -c 4 example.com
cat /etc/resolv.conf
dig example.com
dig @8.8.8.8 example.com
cat /etc/hosts
Questions:
1. Does IP connectivity work?
2. Does hostname resolution fail?
3. Which DNS server is configured?
4. Is /etc/hosts overriding anything?
5. Does a different DNS server work?
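The first two questions decide whether this is really a DNS problem. A minimal sketch of that decision follows. classify_dns is a hypothetical helper that takes two exit statuses, and 192.0.2.10 is a documentation placeholder that must be replaced with a real, reachable IP.

```shell
# classify_dns IP_STATUS NAME_STATUS
# Each argument is an exit status: 0 means that check succeeded.
# Hypothetical helper for illustration.
classify_dns() {
    if [ "$1" -ne 0 ]; then
        echo "connectivity problem: check IP, route, and firewall before DNS"
    elif [ "$2" -ne 0 ]; then
        echo "likely DNS problem: IP works, name lookup fails"
    else
        echo "IP and name lookup both work"
    fi
}

# On a real host (replace the placeholder address and name):
# ping -c 1 -W 2 192.0.2.10 >/dev/null 2>&1; ip_status=$?
# getent hosts example.com >/dev/null 2>&1; name_status=$?
# classify_dns "$ip_status" "$name_status"
```

getent hosts resolves names the same way most applications do, including /etc/hosts, while dig queries DNS servers directly. Comparing the two helps answer question 4.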
Pattern: permission denied
First checks:
whoami
id
ls -l FILE
ls -ld DIRECTORY
namei -l /path/to/file
getenforce
Questions:
1. Which user is running the command?
2. Who owns the file?
3. What are the permissions?
4. Does the user belong to the correct group?
5. Are parent directory permissions blocking access?
6. Is SELinux involved?
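For question 5, every directory on the way to the file needs the execute (search) bit for the user. If namei is not available, a rough equivalent can be sketched with GNU stat. walk_path is a hypothetical helper name, and the sketch does not follow symlinks the way namei does.

```shell
# walk_path PATH: print mode, owner, group, and name for "/" and every
# directory down to PATH itself. Hypothetical helper; needs GNU stat (-c).
# The body runs in a subshell so its variables do not leak.
walk_path() (
    p="$1"
    chain=""
    while :; do
        chain="$p
$chain"
        case "$p" in /|.) break ;; esac   # stop at the root or a relative top
        p="$(dirname "$p")"
    done
    printf '%s' "$chain" | while IFS= read -r dir; do
        stat -c '%A %U %G %n' "$dir"
    done
)

# walk_path /path/to/file
```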
Pattern: SSH login fails
First checks:
ssh -vvv USERNAME@example-server
On the server:
getent passwd USERNAME
id USERNAME
sudo passwd -S USERNAME
ls -ld /home/USERNAME
ls -ld /home/USERNAME/.ssh
ls -l /home/USERNAME/.ssh/authorized_keys
systemctl status sshd               # the unit is named ssh on Debian and Ubuntu
journalctl -u sshd -n 100
sudo tail -n 100 /var/log/secure    # /var/log/auth.log on Debian and Ubuntu
Questions:
1. Is the username correct?
2. Does the user exist?
3. Is the account locked?
4. Is the shell valid?
5. Are SSH key permissions correct?
6. Does sshd allow the login method?
7. Is the firewall allowing SSH?
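For question 5, sshd with StrictModes enabled (the usual default) refuses key logins when the home directory, .ssh, or authorized_keys is writable by group or others. The sketch below checks only that one condition. check_ssh_perms is a hypothetical helper, and it does not check ownership, which also matters.

```shell
# check_ssh_perms HOME_DIR: report whether the home directory, .ssh, and
# authorized_keys are writable by group or others.
# Hypothetical helper; needs GNU stat (-c). Exits non-zero on any finding.
check_ssh_perms() (
    status=0
    for path in "$1" "$1/.ssh" "$1/.ssh/authorized_keys"; do
        if [ ! -e "$path" ]; then
            echo "MISSING $path"
            status=1
            continue
        fi
        mode="$(stat -c '%a' "$path")"
        # 022 is the octal mask for the group-write and other-write bits.
        if [ $(( 0$mode & 022 )) -ne 0 ]; then
            echo "TOO OPEN $mode $path"
            status=1
        else
            echo "ok $mode $path"
        fi
    done
    exit "$status"
)

# sudo sh -c '. ./check_ssh_perms.sh; check_ssh_perms /home/USERNAME'
```

A common fix is chmod 700 on .ssh and chmod 600 on authorized_keys, both owned by the user.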
Pattern: service is not listening on port
First checks:
systemctl status SERVICE_NAME
journalctl -u SERVICE_NAME -n 100
ss -tulpn | grep :PORT
Questions:
1. Is the service running?
2. Is it configured for the expected port?
3. Is another process already using the port?
4. Is the service listening only on localhost?
5. Does the log show bind or permission errors?
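Question 4 is easy to miss in raw ss output. The sketch below labels each socket on a given port as loopback-only or not. listen_scope is a hypothetical helper, and it assumes the column layout of ss -tulpn, where the local address is the fifth field.

```shell
# listen_scope PORT: read `ss -tulpn` output on stdin and, for each socket
# bound to PORT, print the protocol, local address, and its scope.
# Hypothetical helper for illustration.
listen_scope() {
    awk -v port="$1" 'NR > 1 {
        addr = $5
        p = addr
        sub(/.*:/, "", p)            # text after the last colon is the port
        if (p != port) next
        host = addr
        sub(/:[^:]*$/, "", host)     # text before the last colon is the address
        if (host ~ /^127\./ || host == "[::1]")
            print $1, addr, "localhost only"
        else
            print $1, addr, "non-loopback"
    }'
}

# sudo ss -tulpn | listen_scope 8080
```

A "non-loopback" result only means the socket is not bound to localhost. A firewall can still block it.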
Pattern: problem after reboot
First checks:
uptime
last reboot
systemctl --failed
journalctl -b
journalctl -b -1
df -h
findmnt
ip a
ip r
Questions:
1. Did all services start?
2. Did all filesystems mount?
3. Are there failed units?
4. Did the network come up correctly?
5. Was /etc/fstab changed?
6. Was a package or kernel updated?
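Question 2 can be checked by comparing /etc/fstab with what is actually mounted. This is a sketch with a hypothetical helper named missing_mounts. It skips swap, "none", and noauto entries, and it may still list entries that are intentionally unmounted, such as automounts.

```shell
# missing_mounts FSTAB MOUNTS: print fstab mount points that do not appear
# in MOUNTS. MOUNTS uses the /proc/mounts layout (mount point in field 2).
# Hypothetical helper for illustration.
missing_mounts() {
    awk 'NR == FNR { mounted[$2] = 1; next }    # first file: mounted paths
         /^[ \t]*(#|$)/ { next }                # skip comments and blank lines
         $2 !~ /^\// { next }                   # skip swap and "none" entries
         $4 ~ /(^|,)noauto(,|$)/ { next }       # skip entries not mounted at boot
         !($2 in mounted) { print $2 }' "$2" "$1"
}

# missing_mounts /etc/fstab /proc/mounts
```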
Pattern: logs show errors
Useful commands:
journalctl -xe
journalctl -p err -b
journalctl -u SERVICE_NAME -n 100
sudo tail -n 100 /var/log/messages    # /var/log/syslog on Debian and Ubuntu
dmesg -T | tail -n 100
Questions:
1. What exact error is repeated?
2. When did it start?
3. Which service or process reports it?
4. Is the error caused by disk, network, permissions, config, or dependency?
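To answer question 1, count identical messages instead of reading the log top to bottom. The sketch below uses a hypothetical helper named top_messages. Messages that embed changing values such as PIDs or timestamps will not group together, so the counts are a lower bound.

```shell
# top_messages [N]: read log lines on stdin and print the N most frequent
# lines with their counts (default 10). Hypothetical helper.
top_messages() {
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# -o cat prints only the message text, so identical messages line up:
# journalctl -p err -b -o cat --no-pager | top_messages 5
```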
Pattern: unknown issue
Use a wide first check:
date
hostnamectl
uptime
df -h
df -i
free -h
systemctl --failed
ip a
ip r
ss -tulpn
journalctl -p err -b
Then narrow down:
service issue → systemd/logs
network issue → networking/firewall/DNS
disk issue → filesystem/storage
permission issue → ownership/groups/SELinux
performance issue → CPU/memory/disk/processes
Documenting a fix
When you solve something, write down:
Problem:
Symptoms:
Commands used:
Finding:
Fix:
Verification:
What I learned:
Example:
Problem:
The service was not reachable.
Finding:
The service had failed because another process was already using the same port.
Fix:
Stopped the conflicting process and restarted the correct service.
Verification:
The service was running and listening on the expected port.
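The note template from this section can be generated so every fix is recorded the same way. This is a sketch: new_fix_note is a hypothetical helper, and the Date line and the fixes.txt file name are additions assumed here, not part of the template.

```shell
# new_fix_note TITLE: print an empty fix note with the template fields.
# Hypothetical helper; the Date line is an assumed addition.
new_fix_note() {
    printf 'Date: %s\n' "$(date +%Y-%m-%d)"
    printf 'Problem: %s\n' "$1"
    for field in 'Symptoms' 'Commands used' 'Finding' 'Fix' 'Verification' 'What I learned'; do
        printf '%s:\n' "$field"
    done
}

# new_fix_note "Service not reachable" >> fixes.txt
```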