Cluster

Basic Veritas Cluster Server (VCS) checks and safety notes.

Use this page when checking cluster status, service groups, resources, nodes, and general cluster health.


Safety rule

Before running Veritas commands, ask:

1. Am I on the correct server?
2. Is this production or test?
3. Am I only checking status, or changing state?
4. Which service group is affected?
5. Which node is active?
6. Is the group frozen?
7. Is there a fault?
8. Do I have approval to change something?

If unsure, only run read-only status commands.
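Checklist items 1 and 8 can be partly enforced in a script. This is a minimal sketch that assumes bash on the cluster node. confirm_node is a hypothetical helper, not a Veritas command. It prints the current host name and only returns success if the operator types the same name back.

```shell
#!/usr/bin/env bash
# Minimal sketch: make the operator retype the node name before a
# state-changing action (checklist items 1 and 8).
# confirm_node is a hypothetical helper, not part of Veritas.

confirm_node() {
    local current typed
    current=$(uname -n)                 # host name of the machine we are on
    printf 'You are on: %s\n' "$current" >&2
    printf 'Type the node name to confirm: ' >&2
    IFS= read -r typed || return 1      # no input: treat as "not confirmed"
    if [ "$typed" = "$current" ]; then
        return 0
    fi
    printf 'Node name does not match; aborting.\n' >&2
    return 1
}
```

Usage: confirm_node && hagrp -switch SERVICE_GROUP -to NODE_NAME

This only proves you are on the server you think you are. It does not replace approval.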


Basic terms

Cluster       = group of servers working together
Node          = one server in the cluster
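Service group = group of resources managed together and failed over as a unit
Resource      = one managed item, such as an IP address, mount, disk group, or application
Online        = running
Offline       = stopped
Faulted       = a resource failed or went offline unexpectedly; it usually stays faulted until the fault is cleared
Frozen        = VCS will not bring the group online, take it offline, or fail it over until it is unfrozen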

First cluster status check

hastatus -sum

This is usually the first command to run.

It can show:

cluster state
nodes
service groups
which node a group is running on
faulted resources

Watch cluster status live

hastatus

Exit with:

Ctrl + C

Check service groups

hagrp -state

Check one service group:

hagrp -state SERVICE_GROUP

Display one service group:

hagrp -display SERVICE_GROUP

Check resources

hares -state

Display all resources:

hares -display

Display one resource:

hares -display RESOURCE_NAME

Check systems/nodes

hasys -display

Check if a group is frozen

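hagrp -display SERVICE_GROUP | grep -i frozen

This matches both the Frozen (persistent freeze) and TFrozen (temporary freeze) attributes. A value of 1 means the group is frozen. A value of 0 means it is not.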
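If you want a single yes/no answer instead of reading the display output, the check can be wrapped in a function. This is a minimal sketch. It assumes your VCS version supports hagrp -value GROUP ATTRIBUTE; if it does not, use the grep check instead. group_freeze_state is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: report whether a service group is frozen.
# Assumes `hagrp -value GROUP ATTRIBUTE` is available in your VCS version.
# Frozen = persistent freeze, TFrozen = temporary freeze (1 = frozen).
# group_freeze_state is a hypothetical helper name.
# Exit status: 0 = frozen, 1 = not frozen, 2 = could not query.

group_freeze_state() {
    local group=$1 persistent temporary
    persistent=$(hagrp -value "$group" Frozen) || return 2
    temporary=$(hagrp -value "$group" TFrozen) || return 2
    # Strip any whitespace around the values before comparing.
    persistent=${persistent//[[:space:]]/}
    temporary=${temporary//[[:space:]]/}
    if [ "$persistent" = "1" ]; then
        echo "$group: frozen (persistent)"
    elif [ "$temporary" = "1" ]; then
        echo "$group: frozen (temporary)"
    else
        echo "$group: not frozen"
        return 1
    fi
}
```

Usage: group_freeze_state SERVICE_GROUP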

Common read-only checks

These are generally status/display commands:

hastatus -sum
hagrp -state
hagrp -display
hares -state
hares -display
hasys -display

They do not change cluster state, but still confirm which server and cluster you are on before acting on what they show.


Commands that change state

These commands can affect availability.

Do not run them without approval and a clear understanding of the impact.

hagrp -online SERVICE_GROUP -sys NODE_NAME
hagrp -offline SERVICE_GROUP -sys NODE_NAME
hagrp -switch SERVICE_GROUP -to NODE_NAME
hagrp -freeze SERVICE_GROUP
hagrp -unfreeze SERVICE_GROUP
hagrp -clear SERVICE_GROUP
hares -clear RESOURCE_NAME
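The split between read-only and state-changing commands can be turned into a guard that refuses the second kind by default. This is a minimal sketch. The lists mirror only the commands on this page and are not a complete list of VCS options; anything not listed is treated as unsafe. vcs_changes_state, run_vcs and the VCS_CHANGE_APPROVED variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: classify a VCS command before running it.
# The lists below mirror the commands on this page only; they are NOT a
# complete list of every state-changing VCS command or option.
# vcs_changes_state, run_vcs and VCS_CHANGE_APPROVED are hypothetical names.

# Return 0 = changes state, 1 = read-only, 2 = unknown (treat as unsafe).
vcs_changes_state() {
    case "$1 ${2:-}" in
        "hagrp -online" | "hagrp -offline" | "hagrp -switch" | \
        "hagrp -freeze" | "hagrp -unfreeze" | "hagrp -clear" | \
        "hares -clear")
            return 0 ;;
        "hastatus -sum" | "hastatus " | "hagrp -state" | "hagrp -display" | \
        "hares -state" | "hares -display" | "hasys -display")
            return 1 ;;
        *)
            return 2 ;;
    esac
}

# Run a command only if it is read-only, or if the operator has
# explicitly set VCS_CHANGE_APPROVED=yes for a state-changing command.
run_vcs() {
    local rc=0
    vcs_changes_state "$@" || rc=$?
    case $rc in
        1) "$@" ;;
        0)
            if [ "${VCS_CHANGE_APPROVED:-}" = "yes" ]; then
                "$@"
            else
                echo "refused: '$*' changes cluster state; set VCS_CHANGE_APPROVED=yes once approved" >&2
                return 1
            fi ;;
        *)
            echo "refused: '$*' is not on the known command list" >&2
            return 1 ;;
    esac
}
```

Usage: run_vcs hagrp -state SERVICE_GROUP runs directly. VCS_CHANGE_APPROVED=yes run_vcs hagrp -switch SERVICE_GROUP -to NODE_NAME is needed for a switch. The variable is a speed bump, not a record of approval.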

OS checks before blaming the cluster

Sometimes the cluster is only showing a problem caused by the OS or application.

hostnamectl
uptime
df -h
free -h
ip a
ip r
ss -tulpn
systemctl --failed
journalctl -xe
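A full filesystem is an easy OS-level cause to miss. It may make an application or mount resource misbehave while the cluster only shows the symptom. df -h is meant for reading. For a script, df -P gives one line per filesystem in a fixed column order. This is a minimal sketch. full_filesystems is a hypothetical helper, and 90 is only an example threshold.

```shell
#!/usr/bin/env bash
# Minimal sketch: list filesystems at or above a usage threshold.
# Reads `df -P` output on stdin; -P (POSIX format) keeps each filesystem
# on one line, with field 5 = capacity (e.g. 93%) and field 6 = mount point.
# full_filesystems is a hypothetical helper; 90 is only an example threshold.
# Mount points that contain spaces are not handled.

full_filesystems() {
    local threshold=${1:-90}
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)          # "93%" -> "93"
        if (use + 0 >= limit + 0)   # compare as numbers
            print $6, $5
    }'
}
```

Usage: df -P | full_filesystems 90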

First command set

hastatus -sum
hagrp -state
hares -state
hasys -display
hostnamectl
uptime
df -h
free -h
systemctl --failed
journalctl -xe
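The first command set can also be captured to a file, so the output can be attached to a ticket or handover. This is a minimal sketch that assumes bash. capture_first_checks and the file name pattern are hypothetical. --no-pager and -n 200 are added so systemctl and journalctl do not stop in a pager. Commands missing on the host are noted and skipped. Some of the commands, such as journalctl, may need root to show full output.

```shell
#!/usr/bin/env bash
# Minimal sketch: run the read-only first command set and save the output
# to a timestamped file (for example, to attach to a ticket or handover).
# Missing commands are noted and skipped instead of stopping the run.
# capture_first_checks and the file name pattern are hypothetical.
# --no-pager and -n 200 are added so systemctl/journalctl do not open a pager.

capture_first_checks() {
    local outdir=${1:-.} outfile cmd
    outfile="$outdir/cluster-check-$(uname -n)-$(date +%Y%m%d-%H%M%S).txt"
    local -a checks=(
        "hastatus -sum"
        "hagrp -state"
        "hares -state"
        "hasys -display"
        "hostnamectl"
        "uptime"
        "df -h"
        "free -h"
        "systemctl --failed --no-pager"
        "journalctl -xe --no-pager -n 200"
    )
    {
        for cmd in "${checks[@]}"; do
            echo "== $cmd =="
            if command -v "${cmd%% *}" >/dev/null 2>&1; then
                # Unquoted on purpose so the string splits into command + args.
                $cmd 2>&1 || echo "(exit status $?)"
            else
                echo "(not found on this host)"
            fi
            echo
        done
    } > "$outfile"
    echo "$outfile"
}
```

Usage: capture_first_checks /tmp prints the path of the file it wrote.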

Dangerous actions

Be careful with:

hagrp -online
hagrp -offline
hagrp -switch
hagrp -freeze
hagrp -unfreeze
hagrp -clear
hares -clear
stopping cluster services (for example hastop)
changing cluster configuration (for example after haconf -makerw)
changing disk/mount/network resources