Troubleshooting Patterns

General Linux troubleshooting patterns, written with placeholder names and example addresses so they are safe to share publicly.

Use this page when something is broken but the cause is not clear yet.


Basic troubleshooting mindset

Do not start by guessing. Check the facts first.

Ask:

1. What is the exact problem?
2. When did it start?
3. What changed recently?
4. Is it one user, one service, one server, or many systems?
5. Is there an error message?
6. Can the problem be reproduced?
7. What do the logs say?

First command set

When the issue is unclear, start with:

hostnamectl
uptime
df -h
free -h
systemctl --failed
ip a
ip r
ss -tulpn
journalctl -xe

These commands help check:

system identity
uptime and load
disk usage
memory usage
failed services
network addresses
routes
listening ports
recent errors
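
The command set above can be wrapped in a small script so that the output of every check ends up in one place. This is a minimal sketch, not part of the original checklist: the function names run_check and first_checks and the header format are assumptions, and any command that is not installed is skipped instead of being treated as an error.

```shell
# run_check: print a header, then run the command if it is installed.
# The name and the output format are illustrative assumptions.
run_check() {
    printf '\n=== %s ===\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf 'skipped: %s not found\n' "$1"
    fi
}

# first_checks: the first command set from this page, in the same order.
first_checks() {
    run_check hostnamectl
    run_check uptime
    run_check df -h
    run_check free -h
    run_check systemctl --failed
    run_check ip a
    run_check ip r
    run_check ss -tulpn
    run_check journalctl -xe --no-pager
}

# Usage (not run automatically):
#   first_checks > "first-checks-$(date +%Y%m%dT%H%M%S).txt"
```

Saving the output to a timestamped file keeps a record of the state before anything is changed.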

Pattern: service is not working

First checks:

systemctl status SERVICE_NAME
journalctl -u SERVICE_NAME -n 100
journalctl -u SERVICE_NAME -p err
systemctl --failed

Questions:

1. Is the service running?
2. Is it failed?
3. Did it fail after restart or reboot?
4. Does the log show missing files, permissions, ports, or config errors?
5. Is the service listening on the expected port?

Check port:

ss -tulpn | grep :PORT
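
A plain grep for :PORT also matches longer port numbers, so searching for :80 also returns lines for :8080. The sketch below is a stricter filter. The function name filter_port is hypothetical, and it assumes the ss output layout in which the port number is followed by whitespace.

```shell
# filter_port PORT: keep only lines where ":PORT" is followed by whitespace,
# so that port 80 does not also match 8080 or 8000.
filter_port() {
    grep -E ":$1[[:space:]]"
}

# Sample lines in the style of `ss -tulpn` output (made up for illustration).
sample='tcp LISTEN 0 128 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=100,fd=6))
tcp LISTEN 0 128 127.0.0.1:8080 0.0.0.0:* users:(("java",pid=200,fd=9))'

# Prints only the line for port 80.
printf '%s\n' "$sample" | filter_port 80

# On a live system: ss -tulpn | filter_port 80
```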

Pattern: server is slow

First checks:

uptime
top
free -h
df -h
ps aux --sort=-%cpu | head
ps aux --sort=-%mem | head

Questions:

1. Is CPU usage high?
2. Is memory full?
3. Is swap heavily used?
4. Is disk full?
5. Is one process causing the issue?
6. Did a backup, scan, or batch job start?
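
Whether a load average is high depends on how many CPUs the server has: a load of 4 is busy on 2 CPUs but light on 16. The sketch below divides the 1-minute load by the CPU count. The function name load_per_cpu is an assumption, and the live part relies on /proc/loadavg and nproc being available.

```shell
# load_per_cpu LOAD CPUS: print LOAD divided by CPUS with two decimals.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# On Linux, the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    load_per_cpu "$load1" "$(nproc)"
fi
```

A result that stays well above 1.00 suggests more runnable or I/O-blocked tasks than the CPUs can serve. Treat it as a hint about where to look next, not as a verdict.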

Pattern: disk is full

First checks:

df -h
df -i
sudo du -h --max-depth=1 /var | sort -h
sudo lsof | grep deleted

Questions:

1. Which filesystem is full?
2. Is disk space full or are inodes full?
3. Which directory is growing?
4. Are logs too large?
5. Are deleted files still held open by a process?
6. Can files be safely archived, compressed, or removed?
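
To answer the first two questions quickly, the df output can be filtered for filesystems above a usage threshold. This is a sketch under assumptions: the function name fs_over is hypothetical, it expects the POSIX layout printed by df -P, and it does not handle mount points that contain spaces.

```shell
# fs_over PERCENT: read `df -P` output on stdin and print the mount point
# and usage of every filesystem at or above PERCENT.
# Assumes mount points contain no spaces.
fs_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Made-up sample in `df -P` layout, so the sketch runs anywhere.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 10000 9500 500 95% /
/dev/sda2 10000 2000 8000 20% /home'

# Prints only the root filesystem, which is at 95%.
printf '%s\n' "$sample" | fs_over 90

# On a live system:
#   df -P  | fs_over 90    # disk space
#   df -Pi | fs_over 90    # inodes (with GNU df)
```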

Pattern: memory issue

First checks:

free -h
top
ps aux --sort=-%mem | head
swapon --show

Check for OOM killer:

journalctl -k | grep -i "out of memory"
dmesg -T | grep -i "killed process"

Questions:

1. Is memory actually full?
2. Is swap being used?
3. Which process uses memory?
4. Is this normal for the application?
5. Was a process killed by the OOM killer?
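
For the first question, a low "free" value alone does not mean memory is full, because Linux uses otherwise idle memory for cache. The "available" column of free, which comes from MemAvailable in /proc/meminfo, is the more useful number. The sketch below prints it as a percentage of total memory; the function name mem_available_pct is an assumption.

```shell
# mem_available_pct: read /proc/meminfo-style text on stdin and print
# MemAvailable as a whole-number percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Run against the live system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    mem_available_pct < /proc/meminfo
fi
```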

Pattern: network issue

First checks:

ip a
ip r
ip route get 192.0.2.10
cat /etc/resolv.conf
dig example.com
ping -c 4 example.com
ss -tulpn

Questions:

1. Does the server have the correct IP?
2. Is the route/default gateway correct?
3. Does DNS resolve?
4. Can the server reach the destination IP?
5. Is the service listening?
6. Is a firewall blocking traffic?

Pattern: DNS issue

First checks:

ping -c 4 192.0.2.10
ping -c 4 example.com
cat /etc/resolv.conf
dig example.com
dig @8.8.8.8 example.com
cat /etc/hosts

Questions:

1. Does IP connectivity work?
2. Does hostname resolution fail?
3. Which DNS server is configured?
4. Is /etc/hosts overriding anything?
5. Does a different DNS server work?
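
The first two questions separate a DNS problem from a general connectivity problem. The sketch below shows that decision as code. The function name classify_lookup and its messages are hypothetical; the arguments are the exit statuses of an IP check and a name lookup.

```shell
# classify_lookup IP_RC NAME_RC: each argument is the exit status of a
# check, where 0 means success and anything else means failure.
classify_lookup() {
    if [ "$1" -ne 0 ]; then
        echo "IP connectivity fails: check routing and firewall before DNS"
    elif [ "$2" -ne 0 ]; then
        echo "IP works but name resolution fails: likely a DNS problem"
    else
        echo "IP and name resolution both work: look elsewhere"
    fi
}

# Example with hard-coded results: IP reachable (0), name lookup failed (2).
classify_lookup 0 2

# On a live system (192.0.2.10 and example.com are placeholders):
#   ping -c 1 192.0.2.10 >/dev/null 2>&1; ip_rc=$?
#   getent hosts example.com >/dev/null 2>&1; name_rc=$?
#   classify_lookup "$ip_rc" "$name_rc"
```

getent hosts resolves names the way most applications do, including /etc/hosts, while dig queries a DNS server directly. A difference between the two results points at /etc/hosts or the resolver configuration.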

Pattern: permission denied

First checks:

whoami
id
ls -l FILE
ls -ld DIRECTORY
namei -l /path/to/file
getenforce

Questions:

1. Which user is running the command?
2. Who owns the file?
3. What are the permissions?
4. Does the user belong to the correct group?
5. Are parent directory permissions blocking access?
6. Is SELinux involved?
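
Question 5 matters because reaching a file requires execute (search) permission on every directory above it. namei -l shows this chain directly. Where namei is not installed, the same idea can be sketched in plain shell; the function name parent_chain is hypothetical.

```shell
# parent_chain PATH: print PATH and each of its parent directories,
# one per line, ending at "/" (or "." for a relative path).
parent_chain() {
    p=$1
    while :; do
        printf '%s\n' "$p"
        case $p in
            /|.) break ;;
        esac
        p=$(dirname -- "$p")
    done
}

# Show the chain for a placeholder path.
parent_chain /path/to/file

# To inspect ownership and permissions at each level:
#   parent_chain /path/to/file | while IFS= read -r d; do ls -ld -- "$d"; done
```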

Pattern: SSH login fails

First checks:

ssh -vvv USERNAME@example-server

On the server:

getent passwd USERNAME
id USERNAME
sudo passwd -S USERNAME
ls -ld /home/USERNAME
ls -ld /home/USERNAME/.ssh
ls -l /home/USERNAME/.ssh/authorized_keys
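systemctl status sshd
journalctl -u sshd -n 100
sudo tail -n 100 /var/log/secure

On Debian and Ubuntu, the unit is usually named ssh instead of sshd, and authentication messages go to /var/log/auth.log instead of /var/log/secure.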

Questions:

1. Is the username correct?
2. Does the user exist?
3. Is the account locked?
4. Is the shell valid?
5. Are SSH key permissions correct?
6. Does sshd allow the login method?
7. Is the firewall allowing SSH?
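
For question 5, a common cause is a home directory, .ssh directory, or authorized_keys file that is writable by group or others, which sshd rejects when StrictModes is enabled. The sketch below tests for that. The function names too_open and check_path are hypothetical, and stat -c %a is GNU stat syntax, so this assumes a GNU userland.

```shell
# too_open MODE: succeed (exit 0) if the octal MODE grants write access
# to group or others.
too_open() {
    [ $(( 0$1 & 022 )) -ne 0 ]
}

# check_path PATH: report whether PATH is group- or world-writable.
# Uses `stat -c %a` (GNU stat), which is an assumption about the system.
check_path() {
    mode=$(stat -c %a -- "$1" 2>/dev/null) || { echo "missing: $1"; return 1; }
    if too_open "$mode"; then
        echo "too open ($mode): $1"
    else
        echo "ok ($mode): $1"
    fi
}

# Example with a fixed mode: 775 is group-writable.
if too_open 775; then echo "775 is too open"; fi

# On the server (USERNAME is a placeholder, as elsewhere on this page):
#   check_path /home/USERNAME
#   check_path /home/USERNAME/.ssh
#   check_path /home/USERNAME/.ssh/authorized_keys
```

Ownership matters as well: these paths should belong to the user who is logging in, which the ls -ld and ls -l checks above already show.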

Pattern: service is not listening on port

First checks:

systemctl status SERVICE_NAME
journalctl -u SERVICE_NAME -n 100
ss -tulpn | grep :PORT

Questions:

1. Is the service running?
2. Is it configured for the expected port?
3. Is another process already using the port?
4. Is the service listening only on localhost?
5. Does the log show bind or permission errors?
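
Question 4 can be answered from the local address column of the ss output. The sketch below classifies that value; the function name bind_scope and its labels are hypothetical.

```shell
# bind_scope ADDR:PORT: classify a local address as shown by `ss -tulpn`.
bind_scope() {
    case "${1%:*}" in
        127.*|'[::1]')       echo "loopback only" ;;
        0.0.0.0|'*'|'[::]')  echo "all interfaces" ;;
        *)                   echo "specific address" ;;
    esac
}

# Examples with made-up addresses.
bind_scope 127.0.0.1:5432
bind_scope 0.0.0.0:80
bind_scope 192.0.2.10:8080
```

A service reported as "loopback only" is reachable from the server itself but not from other hosts, which explains a port that looks open locally and closed remotely.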

Pattern: problem after reboot

First checks:

uptime
last reboot
systemctl --failed
journalctl -b
journalctl -b -1
df -h
findmnt
ip a
ip r

Questions:

1. Did all services start?
2. Did all filesystems mount?
3. Are there failed units?
4. Did network come up correctly?
5. Was /etc/fstab changed?
6. Was a package or kernel updated?
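
For question 2, the mount points listed in /etc/fstab can be compared with what is actually mounted. This is a sketch under assumptions: the function name fstab_mountpoints is hypothetical, and it skips comments, swap entries, "none" targets, and entries marked noauto.

```shell
# fstab_mountpoints: read fstab-format text on stdin and print the mount
# points that are expected to be mounted at boot.
fstab_mountpoints() {
    awk '!/^[[:space:]]*#/ && NF >= 4 &&
         $2 != "none" && $2 != "swap" && $3 != "swap" &&
         $4 !~ /(^|,)noauto(,|$)/ { print $2 }'
}

# Made-up sample, so the sketch runs without reading the real /etc/fstab.
sample='# /etc/fstab sample
UUID=aaaa /      ext4 defaults 0 1
UUID=bbbb /data  xfs  defaults 0 2
UUID=cccc none   swap sw       0 0
UUID=dddd /mnt/x ext4 noauto   0 0'

# Prints "/" and "/data".
printf '%s\n' "$sample" | fstab_mountpoints

# On a live system, report expected mount points that are not mounted:
#   fstab_mountpoints < /etc/fstab | while IFS= read -r m; do
#       mountpoint -q "$m" || echo "not mounted: $m"
#   done
```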

Pattern: logs show errors

Useful commands:

journalctl -xe
journalctl -p err -b
journalctl -u SERVICE_NAME -n 100
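sudo tail -n 100 /var/log/messages
dmesg -T | tail -n 100

On Debian and Ubuntu, the general system log is usually /var/log/syslog instead of /var/log/messages.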

Questions:

1. What exact error is repeated?
2. When did it start?
3. Which service or process reports it?
4. Is the error caused by disk, network, permissions, config, or dependency?
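
To find which error is repeated, the messages can be counted instead of read one by one. The sketch below uses a hypothetical helper named count_repeats and a few made-up messages.

```shell
# count_repeats: read log messages on stdin, one per line, and print each
# distinct message with its count, most frequent first.
count_repeats() {
    sort | uniq -c | sort -rn
}

# Made-up messages for illustration.
printf '%s\n' 'disk quota exceeded' 'connection refused' 'disk quota exceeded' |
    count_repeats

# On a live system (-o cat prints only the message text, without timestamps):
#   journalctl -p err -b -o cat | count_repeats | head -n 10
```

Messages that embed a changing value such as a PID or a request ID count as distinct lines, so the top of the list favors messages with fixed text.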

Pattern: unknown issue

Use a wide first check:

date
hostnamectl
uptime
df -h
df -i
free -h
systemctl --failed
ip a
ip r
ss -tulpn
journalctl -p err -b

Then narrow down:

service issue → systemd/logs
network issue → networking/firewall/DNS
disk issue → filesystem/storage
permission issue → ownership/groups/SELinux
performance issue → CPU/memory/disk/processes

Documenting a fix

When you solve something, write down:

Problem:
Symptoms:
Commands used:
Finding:
Fix:
Verification:
What I learned:

Example:

Problem:
The service was not reachable.

Finding:
The service had failed because another process was using the same port.

Fix:
Stopped the conflicting process and restarted the correct service.

Verification:
The service was running and listening on the expected port.
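
To make the note-taking habit easier to keep, the template can be printed from a small shell function. This is only a convenience sketch; the function name new_fix_note and the file name in the usage comment are assumptions.

```shell
# new_fix_note: print an empty fix note with the headings from this page.
new_fix_note() {
    cat <<'EOF'
Problem:
Symptoms:
Commands used:
Finding:
Fix:
Verification:
What I learned:
EOF
}

# Print the empty template.
new_fix_note

# Usage: new_fix_note > "fix-$(date +%Y%m%d).txt"
```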