Connection Manager

From VSI OpenVMS Wiki

Connection Manager is the part of the OpenVMS Cluster software that ensures the integrity of cluster membership and maintains a clear record of which nodes are members.

Functions

Connection Manager is responsible for:

  • preventing cluster partitioning (see below)
  • tracking which nodes in an OpenVMS Cluster are active and which are not
  • delivering messages to remote nodes
  • removing nodes from the cluster
  • providing a highly available message service through which other software components, such as the Distributed Lock Manager, can synchronize access to shared resources.

Cluster Partitioning

Cluster partitioning is a condition in which nodes in an existing OpenVMS Cluster configuration divide into two or more independent clusters. Cluster partitioning can result in data file corruption because the distributed lock manager cannot coordinate access to shared resources for multiple OpenVMS Cluster systems. The connection manager prevents cluster partitioning using a quorum algorithm.

Quorum Algorithm

The quorum algorithm is a mathematical method for determining whether a majority of OpenVMS Cluster members are present, so that resources can be shared safely across the cluster. Quorum is the number of votes that must be present for the cluster to function.
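As a rough illustration of the majority test, the sketch below uses the commonly documented OpenVMS formula, quorum = (EXPECTED_VOTES + 2) / 2, with the result rounded down. The function names and the example vote counts are hypothetical, and the real connection manager takes further inputs into account (for example, quorum disk votes), so treat this as a simplified model rather than the actual algorithm.

```python
def estimated_quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES, rounded down."""
    return (expected_votes + 2) // 2


def has_quorum(member_votes, expected_votes: int) -> bool:
    """True if the votes of the members currently present reach quorum."""
    return sum(member_votes) >= estimated_quorum(expected_votes)


# Hypothetical three-node cluster, one vote per node (EXPECTED_VOTES = 3).
print(estimated_quorum(3))        # 2
print(has_quorum([1, 1, 1], 3))   # True: all members present
print(has_quorum([1, 1], 3))      # True: one member lost, 2 votes remain
print(has_quorum([1], 3))         # False: a lone node does not have quorum
```

Provided EXPECTED_VOTES reflects the total number of votes in the cluster, two disjoint groups of nodes cannot both reach quorum, which is why at most one partition can continue to function.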

See Quorum for details on calculating cluster votes, two-node clusters, and more.

State Transitions

OpenVMS Cluster state transitions occur when a computer joins or leaves an OpenVMS Cluster system and when the cluster recognizes a quorum disk state change. The connection manager controls these events to ensure the preservation of data integrity throughout the cluster.

Adding a Member

Early in its boot sequence, a computer seeking membership in an OpenVMS Cluster system sends messages to current members asking to join the cluster.

The first cluster member that receives the membership request acts as the new computer's advocate and proposes reconfiguring the cluster to include it. While the new computer is booting, no applications are affected.

The connection manager will not allow a computer to join the OpenVMS Cluster system if the node's value for EXPECTED_VOTES would raise quorum above the calculated votes, because that would cause the OpenVMS Cluster to suspend activity.
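The admission rule above can be pictured as a check made before a join is allowed. The sketch below is only one interpretation: it assumes that quorum is derived as (EXPECTED_VOTES + 2) / 2, rounded down, and that the "calculated votes" are the votes of the current members plus those of the joining node. The function and parameter names are hypothetical.

```python
def quorum_from(expected_votes: int) -> int:
    """Quorum implied by a given EXPECTED_VOTES value (rounded down)."""
    return (expected_votes + 2) // 2


def may_join(current_votes: int, joiner_votes: int,
             joiner_expected_votes: int) -> bool:
    """Assumed rule: refuse the join if the joiner's EXPECTED_VOTES implies
    a quorum that the cluster, including the joiner, could not meet."""
    votes_after_join = current_votes + joiner_votes
    return quorum_from(joiner_expected_votes) <= votes_after_join


# Two voting members are running; a third node with one vote asks to join.
print(may_join(current_votes=2, joiner_votes=1, joiner_expected_votes=3))  # True
print(may_join(current_votes=2, joiner_votes=1, joiner_expected_votes=9))  # False
```

In the second call the joiner's EXPECTED_VOTES of 9 implies a quorum of 5, which three votes cannot satisfy, so admitting the node would hang the cluster.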

During a configuration change due to a computer being added to an OpenVMS Cluster, all current OpenVMS Cluster members must establish communications with the new computer. Once communications are established, the new computer is admitted to the cluster. In some cases, the lock database is rebuilt.

Losing a Member

During normal cluster operation, messages sent from one computer to another are acknowledged when received. If a message is not acknowledged within a period determined by the OpenVMS Cluster communications software, a failure is considered detected and the repair attempt phase begins.

In the repair attempt phase, the system assumes that the virtual circuit to an OpenVMS Cluster member is broken and that the path needs to be repaired. Repair attempts continue for the interval specified by the PAPOLLINTERVAL system parameter; if the path is not restored by then, it is considered irrevocably broken, and the cluster must be reconfigured.

If a cluster member is shut down or fails, OpenVMS causes datagrams to be sent from that computer to the other members. These datagrams state the computer's intention to sever communications and to stop sharing resources. In this case, the failure detection and repair attempt phases are bypassed, and the reconfiguration phase begins immediately.
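The sequence of phases described so far can be summarized as a small state machine. This is a schematic sketch only: the phase names follow the text, but the function, its parameters, and the use of plain seconds are assumptions made for illustration and do not correspond to the connection manager's actual interfaces.

```python
from enum import Enum, auto


class Phase(Enum):
    NORMAL = auto()
    REPAIR_ATTEMPT = auto()
    RECONFIGURATION = auto()


def next_phase(phase, *, ack_timed_out=False, shutdown_datagram=False,
               path_repaired=False, repair_elapsed=0.0, repair_interval=0.0):
    """Return the phase that follows `phase`, given the observed events.

    repair_interval stands in for the interval set by PAPOLLINTERVAL.
    """
    # A member announcing its departure bypasses detection and repair.
    if shutdown_datagram:
        return Phase.RECONFIGURATION
    # An unacknowledged message starts the repair attempt phase.
    if phase is Phase.NORMAL and ack_timed_out:
        return Phase.REPAIR_ATTEMPT
    if phase is Phase.REPAIR_ATTEMPT:
        if path_repaired:
            return Phase.NORMAL
        # Repair window exhausted: the path is treated as irrevocably broken.
        if repair_elapsed >= repair_interval:
            return Phase.RECONFIGURATION
    return phase


p = next_phase(Phase.NORMAL, ack_timed_out=True)
print(p.name)  # REPAIR_ATTEMPT
p = next_phase(p, repair_elapsed=5.0, repair_interval=5.0)
print(p.name)  # RECONFIGURATION
print(next_phase(Phase.NORMAL, shutdown_datagram=True).name)  # RECONFIGURATION
```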

In the reconfiguration phase, one of the remaining computers acts as coordinator and exchanges messages with all other cluster members to determine an optimal cluster configuration with the most members and the most votes. This phase, during which all user (application) activity is blocked, usually lasts less than 3 seconds, although the actual time depends on the configuration.
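To make the idea of an "optimal" configuration concrete, the sketch below picks, from a set of candidate groups of nodes that can all communicate with each other, the group with the most members, using total votes to break ties. The ordering of these two criteria, the function name, and the sample data are assumptions for illustration; the source does not specify how the coordinator weighs them.

```python
def pick_configuration(candidates, votes):
    """Choose the candidate group with the most members, then the most votes.

    candidates: iterable of sets of node names that can all communicate
    votes:      dict mapping node name to its number of votes
    """
    return max(
        candidates,
        key=lambda members: (len(members), sum(votes[n] for n in members)),
    )


# Hypothetical cluster: C is a non-voting member.
votes = {"A": 1, "B": 1, "C": 0}
candidates = [{"A", "B"}, {"B", "C"}, {"C"}]

best = pick_configuration(candidates, votes)
print(sorted(best))  # ['A', 'B']
```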

During the recovery phase, the following stages occur in parallel or consecutively:

  • I/O operations started prior to the transition complete
  • the lock database is rebuilt
  • if quorum is lost, disks undergo mount verification
  • quorum disk votes are validated
  • disks are rebuilt
  • XFC cache is flushed
  • clusterwide logicals are recovered.

During the application recovery phase, the journal file is replayed, recovery units are cleaned up, and users log in again.

Cluster Membership

OpenVMS Cluster systems based on a LAN or IP network use a cluster group number and a cluster password to allow multiple independent OpenVMS Cluster systems to coexist on the same extended LAN or IP network and to prevent accidental access to a cluster by unauthorized computers.

The cluster group number and password are stored in the cluster authorization file, CLUSTER_AUTHORIZE.DAT in SYS$COMMON:[SYSEXE]. This file is created during the installation of the operating system, if you indicate that you want to set up a cluster that utilizes the shared memory or the LAN. The installation procedure then prompts you for the cluster group number and password.
