This MIB module describes the vCenter High Availability Service (VCHA).
A VCHA cluster consists of three VMs identified by a single instance UUID.
The first is the Active vCenter VM that serves client requests. The
second is the Passive VM, which is identical to the Active vCenter VM in
terms of database and filesystem state. The Passive VM constantly
receives updates from the Active VM and takes over the role of the
Active vCenter VM in the event of a failover. The third is the Witness
VM, which acts as a quorum VM in a VCHA cluster. The sole purpose of the
Witness VM is to avoid the classic split-brain problem in a VCHA cluster.
                  client
                    +
                    |
                    |
   +----------------v---+        +--------------------+
   |    Public IP       |        |                    |
   |                    |        |                    |
   |  Active vCenter    |        |  Passive vCenter   |
   |                    |        |                    |
   +---Private-IP+------+        +------+Private-IP---+
          ^  <-------------------------->  ^
          |     DB & File replication      |
          +                                +
          +                                +
          +                                +
          +------>                <--------+
                 +----Private-IP----+
                 |                  |
                 | Witness vCenter  |
                 |     (Quorum)     |
                 |                  |
                 +------------------+
An event is not repeated for the duration of a given state once that
state has been entered. It is highly recommended that the administrator
connect the SNMP trap receiver to both the public network and the
vCenter HA cluster network, so that the monitoring system is notified
as long as either network is up.
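The notifications below can be routed on the trap receiver by OID. A minimal sketch in Python (the table and the `classify_trap` helper are illustrative, not part of the MIB; the severities for vmwVchaClusterStateChanged and vmwVchaClusterModeChanged are not labeled in this document and are assumed informative):

```python
# Map the VCHA trap OIDs listed in this document to their names and the
# severity each description states, so warnings can be routed separately
# from informative events. This table mirrors the catalog below.
VCHA_TRAPS = {
    ".1.3.6.1.4.1.6876.53.0.105": ("vmwVchaNodeLeft", "warning"),
    ".1.3.6.1.4.1.6876.53.0.110": ("vmwVchaNodeIsolated", "warning"),
    # Severity for the two *Changed* traps is an assumption of this sketch.
    ".1.3.6.1.4.1.6876.53.0.130": ("vmwVchaClusterStateChanged", "info"),
    ".1.3.6.1.4.1.6876.53.0.150": ("vmwVchaClusterModeChanged", "info"),
    ".1.3.6.1.4.1.6876.53.0.205": ("vmwVchaPublicIpUp", "info"),
    ".1.3.6.1.4.1.6876.53.0.206": ("vmwVchaPublicIpDown", "info"),
    ".1.3.6.1.4.1.6876.53.0.210": ("vmwVchaFailoverTriggered", "info"),
    ".1.3.6.1.4.1.6876.53.0.220": ("vmwVchaFailoverSucceeded", "info"),
    ".1.3.6.1.4.1.6876.53.0.225": ("vmwVchaFailoverFailedDisabledMode", "warning"),
    ".1.3.6.1.4.1.6876.53.0.226": ("vmwVchaFailoverFailedNodeLost", "warning"),
    ".1.3.6.1.4.1.6876.53.0.227": ("vmwVchaFailoverFailedPassiveNotReady", "warning"),
    ".1.3.6.1.4.1.6876.53.0.230": ("vmwVchaContinueAsActive", "info"),
    ".1.3.6.1.4.1.6876.53.0.300": ("vmwVchaDbReplicationStateChanged", "info"),
    ".1.3.6.1.4.1.6876.53.0.350": ("vmwVchaFileReplicationStateChanged", "info"),
}

def classify_trap(oid: str):
    """Return (name, severity) for a known VCHA trap OID, else (None, None)."""
    return VCHA_TRAPS.get(oid, (None, None))
```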
This informative notification is sent from the Active node when it
notices a peer node rejoining the cluster. It is sent only once.
vmwVchaNodeLeft
.1.3.6.1.4.1.6876.53.0.105
This warning notification is sent from the Active node when it notices
a peer node has left the cluster. It is sent only once. The operator
should check the liveness and connectivity of the departed node and try
to bring it back, either by rebooting the appliance or by resolving the
network problem.
vmwVchaNodeIsolated
.1.3.6.1.4.1.6876.53.0.110
This warning notification is sent when a node is network isolated from
the cluster. This notification can only be sent from the isolated node,
not by other nodes in the cluster. After being isolated, the node will
reboot itself, triggering a coldStart notification. In case of an
Active node failure, the cluster will trigger a re-election, and every
slave node will be declared isolated temporarily until the cluster
re-election completes.
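Since an isolated node reboots itself, a receiver can confirm the reboot by pairing vmwVchaNodeIsolated with the standard SNMPv2 coldStart trap from the same source. A sketch (the event-tuple shape, helper name, and time window are assumptions of this sketch, not defined by the MIB):

```python
NODE_ISOLATED = ".1.3.6.1.4.1.6876.53.0.110"
COLD_START = ".1.3.6.1.6.3.1.1.5.1"  # standard SNMPv2 coldStart trap

def confirmed_reboots(events, window_s=600):
    """events: iterable of (timestamp_s, source, oid) tuples as recorded
    by the trap receiver. Returns the set of sources whose isolation was
    followed by a coldStart from the same source within window_s seconds."""
    isolated_at = {}
    confirmed = set()
    for ts, source, oid in sorted(events):
        if oid == NODE_ISOLATED:
            isolated_at[source] = ts
        elif oid == COLD_START:
            t0 = isolated_at.get(source)
            if t0 is not None and ts - t0 <= window_s:
                confirmed.add(source)
    return confirmed
```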
vmwVchaClusterStateChanged
.1.3.6.1.4.1.6876.53.0.130
This notification is sent only once from the Active node when the
vCenter HA cluster state changes to healthy, degraded, or isolated.
Please see VmwVchaClusterStateType for a detailed description of each
state. The administrator should also receive another notification
describing the state change of the cluster subsystem (cluster
membership, DB replication, or file replication) that triggered the
cluster state change.
vmwVchaClusterModeChanged
.1.3.6.1.4.1.6876.53.0.150
This notification is sent only once from the Active node when vCenter
HA cluster mode changes to enabled, maintenance, or disabled.
vmwVchaPublicIpUp
.1.3.6.1.4.1.6876.53.0.205
This informative notification is sent only once when the public IP
address is brought up on the Active node. At this time, the Active node
is reachable from the client and will be able to serve client requests
when services are up and running.
vmwVchaPublicIpDown
.1.3.6.1.4.1.6876.53.0.206
This informative notification is sent only once when the public
network interface is brought down on the Active node. This can happen
when InitiateFailover is invoked on the Active node or when the vcha
process gracefully shuts down, resulting in a reboot of the appliance
(triggered by network isolation). During this time, clients cannot
connect to vCenter Server, and users will experience downtime until the
public network interface is brought up. In either case, users should
not expect more than five minutes of downtime. If the VCHA cluster
still cannot be reached, the operator should verify the reachability of
each node through the cluster network.
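The five-minute expectation above can be checked mechanically by pairing vmwVchaPublicIpDown with the following vmwVchaPublicIpUp. A sketch (the event-tuple shape and helper name are assumptions, not part of the MIB):

```python
PUBLIC_IP_DOWN = ".1.3.6.1.4.1.6876.53.0.206"
PUBLIC_IP_UP = ".1.3.6.1.4.1.6876.53.0.205"

def excessive_downtimes(events, limit_s=300):
    """events: iterable of (timestamp_s, oid) tuples from the trap
    receiver. Returns (down_ts, up_ts) pairs where the public IP stayed
    down longer than limit_s seconds (default: the five-minute bound)."""
    flagged, down_at = [], None
    for ts, oid in sorted(events):
        if oid == PUBLIC_IP_DOWN:
            down_at = ts
        elif oid == PUBLIC_IP_UP and down_at is not None:
            if ts - down_at > limit_s:
                flagged.append((down_at, ts))
            down_at = None
    return flagged
```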
vmwVchaFailoverTriggered
.1.3.6.1.4.1.6876.53.0.210
This informative notification is sent only once when a failover is
triggered from the Active node to the Passive node. The Passive node
should take over the Active role if the cluster is in a healthy state.
vmwVchaFailoverSucceeded
.1.3.6.1.4.1.6876.53.0.220
This informative notification is sent only once when the Passive node
takes over the Active role and brings up the public network interface.
vmwVchaFailoverFailedDisabledMode
.1.3.6.1.4.1.6876.53.0.225
This warning notification is sent only once when the Active node fails
to initiate a failover because the cluster is in disabled mode.
vmwVchaFailoverFailedNodeLost
.1.3.6.1.4.1.6876.53.0.226
This warning notification is sent only once when the Active node fails
to initiate a failover because the cluster does not have all three
nodes connected.
vmwVchaFailoverFailedPassiveNotReady
.1.3.6.1.4.1.6876.53.0.227
This warning notification is sent only once when the Active node fails
to initiate a failover because the vPostgres service on the Passive
node is not ready to take over.
vmwVchaContinueAsActive
.1.3.6.1.4.1.6876.53.0.230
This informative notification is sent only once when the last Active
node continues as the Active node to serve client requests. This can
happen in several scenarios:
1. After a planned failover is triggered, the DB or file replicator
fails to flush data to the Passive node, and the failover does not
proceed because it would cause data loss.
2. After a planned or forced failover is triggered, the Passive node
fails to pick up the Active role for reasons such as: automatic
failover cannot happen in maintenance mode, or the cluster is in
disabled mode.
vmwVchaDbReplicationStateChanged
.1.3.6.1.4.1.6876.53.0.300
This informative notification is sent only once from the Active node
when the database replication state changes to sync, async, or no
replication. Database replication is not healthy when it is in the
async or no-replication state. Reasons include large network delays or
the vPostgres service becoming unresponsive on the Passive node.
vmwVchaFileReplicationStateChanged
.1.3.6.1.4.1.6876.53.0.350
This informative notification is sent only once from the Active node
when the file replication state changes to in-sync or out-of-sync. The
file replication state is out-of-sync when VCHA fails to set a watch on
a file on the Active node or fails to replicate a file from the Active
node to the Passive node. Administrators should check the corresponding
KB article for the recovery action.
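The two replication notifications above can feed a single health check on the receiver side. A minimal sketch (the state strings and helper name are assumptions of this sketch, not the MIB's enumeration labels):

```python
def replication_healthy(db_state: str, file_state: str) -> bool:
    """Per the descriptions above: replication is healthy only when DB
    replication is synchronous and file replication is in sync."""
    return db_state == "sync" and file_state == "in_sync"
```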