Cluster
Basic Veritas Cluster Server (VCS) status checks and safety notes.
Use this page when checking cluster status, service groups, resources, nodes, and general cluster health.
Safety rule
Before running Veritas commands, ask:
1. Am I on the correct server?
2. Is this production or test?
3. Am I only checking status, or changing state?
4. Which service group is affected?
5. Which node is active?
6. Is the group frozen?
7. Is there a fault?
8. Do I have approval to change something?
If you are unsure about any of these, run only read-only status commands.
Basic terms
Cluster = group of servers working together
Node = one server in the cluster
Service group = group of resources managed together
Resource = one managed item, such as an IP address, mount point, disk group, or application
Online = running
Offline = stopped
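Faulted = the resource or group went down unexpectedly or failed to come online; it usually has to be cleared before it can be brought online on that node again
Frozen = the cluster will not bring the group online, take it offline, or fail it over until it is unfrozen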
First cluster status check
hastatus -sum
This is usually the first command to run. It prints a one-time summary and exits (-sum is short for -summary).
It can show:
cluster state
nodes
service groups
which node a group is running on
faulted resources
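To pick out faulted groups from the summary, a small filter can help. This is only a sketch: it assumes the group lines in the hastatus -sum output start with B and end with the state column, which is the usual layout but should be checked on your version. The function name faulted_groups is made up for this example.

```shell
# Sketch: list group/system pairs whose state contains FAULTED.
# Assumption: group lines in "hastatus -sum" look like
#   B  GROUP  SYSTEM  PROBED  AUTODISABLED  STATE
# Verify this layout on your own cluster before relying on it.

faulted_groups() {
    # Reads summary text on stdin; prints "GROUP SYSTEM STATE" per match.
    awk '$1 == "B" && $NF ~ /FAULTED/ { print $2, $3, $NF }'
}

# Usage on a cluster node (read-only):
#   hastatus -sum | faulted_groups
```

No output means no group line matched. It does not prove the cluster is healthy, so still read the full summary.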
Watch cluster status live
hastatus
Exit with:
Ctrl + C
Check service groups
hagrp -state
Check one service group:
hagrp -state SERVICE_GROUP
Display one service group:
hagrp -display SERVICE_GROUP
Check resources
hares -state
Display all resources:
hares -display
Display one resource:
hares -display RESOURCE_NAME
Check systems/nodes
hasys -display
Check if a group is frozen
hagrp -display SERVICE_GROUP | grep -i frozen
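The grep above normally matches two attributes: Frozen (persistent freeze) and TFrozen (temporary freeze). A value of 1 means that kind of freeze is set. The sketch below prints just those two values. It assumes the usual four-column hagrp -display layout (Group, Attribute, System, Value), and the function name frozen_flags is made up for this example.

```shell
# Sketch: show only the Frozen and TFrozen values for one group.
# Assumption: "hagrp -display" prints lines shaped like
#   GROUP  ATTRIBUTE  SYSTEM  VALUE
# Verify this layout on your own cluster before relying on it.

frozen_flags() {
    # Reads "hagrp -display SERVICE_GROUP" text on stdin.
    # Prints one "ATTRIBUTE=VALUE" line per freeze attribute found.
    awk '$2 == "Frozen" || $2 == "TFrozen" { print $2 "=" $NF }'
}

# Usage on a cluster node (read-only):
#   hagrp -display SERVICE_GROUP | frozen_flags
```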
Common read-only checks
These commands only display status or configuration; they do not change cluster state:
hastatus -sum
hagrp -state
hagrp -display
hares -state
hares -display
hasys -display
Even so, confirm which server and environment you are on before running them.
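One way to stick to the read-only list is a small wrapper that refuses anything else. This is a sketch, not a Veritas feature: the function names is_read_only and run_if_read_only are made up, and the allowlist is simply the six commands listed above.

```shell
# Sketch: run a cluster command only if it is on this page's read-only list.
# The allowlist matches on the command name plus its first option.

is_read_only() {
    case "$1 $2" in
        "hastatus -sum"|"hagrp -state"|"hagrp -display"|\
        "hares -state"|"hares -display"|"hasys -display")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

run_if_read_only() {
    if is_read_only "$1" "$2"; then
        "$@"
    else
        echo "refusing: '$*' is not on the read-only list" >&2
        return 1
    fi
}

# Usage:
#   run_if_read_only hagrp -state SERVICE_GROUP     # runs
#   run_if_read_only hagrp -switch SERVICE_GROUP -to NODE_NAME   # refused
```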
Commands that change state
These commands change cluster state and can affect availability.
Do not run them without approval and a clear understanding of what they will do.
hagrp -online SERVICE_GROUP -sys NODE_NAME
hagrp -offline SERVICE_GROUP -sys NODE_NAME
hagrp -switch SERVICE_GROUP -to NODE_NAME
hagrp -freeze SERVICE_GROUP
hagrp -unfreeze SERVICE_GROUP
hagrp -clear SERVICE_GROUP
hares -clear RESOURCE_NAME
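As an extra guard against running one of these on the wrong server, the command can be wrapped in a prompt that makes you type the hostname first. This is only a sketch: confirm_on_host is a made-up helper, it compares against the output of uname -n, and it does not replace approval or change control.

```shell
# Sketch: ask the operator to type this server's node name before a
# state-changing command is run. Anything else cancels the command.

confirm_on_host() {
    local expected typed
    expected=$(uname -n)
    printf 'About to run on %s: %s\nType the hostname to continue: ' \
        "$expected" "$*" >&2
    read -r typed || return 1
    if [ "$typed" = "$expected" ]; then
        "$@"
    else
        echo "cancelled: hostname did not match" >&2
        return 1
    fi
}

# Usage (only with approval):
#   confirm_on_host hagrp -switch SERVICE_GROUP -to NODE_NAME
```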
OS checks before blaming the cluster
Sometimes the cluster is only reporting a problem whose real cause is in the OS or the application. Check the basics on the node first:
hostnamectl
uptime
df -h
free -h
ip a
ip r
ss -tulpn
systemctl --failed
journalctl -xe
First command set
hastatus -sum
hagrp -state
hares -state
hasys -display
hostnamectl
uptime
df -h
free -h
systemctl --failed
journalctl -xe
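To keep a record of this first pass, the same commands can be run in one go and saved to a file. The sketch below uses a made-up helper, run_checks, that takes each command line as one argument, skips commands that are not installed, and notes non-zero exit codes. It splits each argument on spaces, so it only suits simple commands like the ones on this page.

```shell
# Sketch: run a list of read-only checks and label each block of output.

run_checks() {
    local cmd name
    for cmd in "$@"; do
        name=${cmd%% *}            # first word = command name
        echo "===== $cmd"
        if command -v "$name" >/dev/null 2>&1; then
            # Unquoted on purpose so "df -h" splits into command + option.
            $cmd 2>&1 || echo "(exit status $?)"
        else
            echo "(skipped: $name not found)"
        fi
    done
}

# Usage (output file name is only an example):
#   run_checks "hastatus -sum" "hagrp -state" "hares -state" "hasys -display" \
#       "hostnamectl" "uptime" "df -h" "free -h" "systemctl --failed" \
#       "journalctl -xe --no-pager" > "cluster-check-$(date +%Y%m%d-%H%M%S).txt"
```

The --no-pager option on journalctl keeps it from waiting for input when run from a terminal.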
Dangerous actions
Be careful with:
hagrp -online
hagrp -offline
hagrp -switch
hagrp -freeze
hagrp -unfreeze
hagrp -clear
hares -clear
stopping cluster services
changing cluster configuration
changing disk/mount/network resources