[et_pb_section fb_built=»1″ admin_label=»section» _builder_version=»3.22″][et_pb_row admin_label=»row» _builder_version=»3.22″ background_size=»initial» background_position=»top_left» background_repeat=»repeat»][et_pb_column type=»4_4″ _builder_version=»3.0.47″][et_pb_text _builder_version=»3.22.2″ background_size=»initial» background_position=»top_left» background_repeat=»repeat»]
The vCloud Director (VCD) 9.7 appliance introduced an embedded PostgreSQL database that runs in an HA configuration.
This article describes how to recover from a primary or standby database failure in a High Availability Cluster.
1. Desired State of the High Availability Cluster
The current state of the cluster can be checked on the "vCloud Director Appliance Management" page, which is available at "https://primary_cell_ip_address:5480".
If you can log in to the console of a cell, you can also get the current state with the following command.
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status    | Upstream | Location | Connection string
-------+-------+---------+-----------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 5066  | vcd01 | standby |   running | vcd02    | default  | host=192.168.194.11 user=repmgr dbname=repmgr
 28278 | vcd02 | primary | * running |          | default  | host=192.168.194.12 user=repmgr dbname=repmgr
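The desired state shown above (every node running, exactly one primary) can also be checked by a script. The following is a minimal sketch, not part of the appliance: the function name `cluster_is_healthy` is hypothetical, and it assumes the column layout of the `repmgr cluster show` output exactly as printed in this article.

```shell
# Reads `repmgr cluster show` output on stdin.
# Exit status 0: all nodes are "running" and exactly one of them is the primary.
# Exit status 1: anything else (including empty input).
cluster_is_healthy() {
    awk -F'|' '
        # Data rows only: the header has no digits in column 1,
        # and the separator line contains no "|" at all.
        NF >= 7 && $1 ~ /[0-9]/ {
            role = $3; status = $4
            gsub(/ /, "", role)
            gsub(/[^a-z]/, "", status)   # drop padding and markers such as "*", "-", "?"
            if (status != "running") bad++
            else if (role == "primary") primaries++
        }
        END { exit !(bad == 0 && primaries == 1) }'
}
```

A possible way to use it on a cell: `sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show | cluster_is_healthy && echo "cluster OK"`.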
2. Example with some failed components of the HA Cluster.
Overview from the "vCloud Director Appliance Management" page:
Output from the cell console:
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status        | Upstream | Location | Connection string
-------+-------+---------+---------------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running     | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 14714 | vcd01 | primary | - failed      |          | default  | host=192.168.194.11 user=repmgr dbname=repmgr
 27985 | vcd03 | standby | ? unreachable | vcd01    | default  | host=192.168.194.13 user=repmgr dbname=repmgr
As the example above shows, the primary node (vcd01) and one standby node (vcd03) of the HA cluster have failed.
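The node IDs in the first column are what the unregister commands in the recovery sections expect. As a rough sketch, the failed nodes and their IDs could be pulled out of the output like this. The function name `failed_nodes` is hypothetical, and the parsing assumes the column layout printed in this article.

```shell
# Reads `repmgr cluster show` output on stdin and prints one line per node
# that is not "running", in the form: <id> <name> <role> <status>
failed_nodes() {
    awk -F'|' '
        # Skip the header (no digits in column 1) and the separator (no "|").
        NF >= 7 && $1 ~ /[0-9]/ {
            id = $1; name = $2; role = $3; status = $4
            gsub(/ /, "", id); gsub(/ /, "", name); gsub(/ /, "", role)
            gsub(/[^a-z]/, "", status)   # "- failed" becomes "failed", "? unreachable" becomes "unreachable"
            if (status != "running") print id, name, role, status
        }'
}
```

Piping the `repmgr cluster show` command from above into `failed_nodes` only lists the nodes. It does not change the cluster.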
3. Recommended way to recover a primary node
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr primary unregister --node-id=14714
INFO: node vcd01 (ID: 14714) was successfully unregistered

sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status        | Upstream | Location | Connection string
-------+-------+---------+---------------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running     | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 28278 | vcd02 | primary | * running     |          | default  | host=192.168.194.12 user=repmgr dbname=repmgr
 27985 | vcd03 | standby | ? unreachable | vcd01    | default  | host=192.168.194.13 user=repmgr dbname=repmgr
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status        | Upstream | Location | Connection string
-------+-------+---------+---------------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running     | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 5066  | vcd01 | standby |   running     | vcd02    | default  | host=192.168.194.11 user=repmgr dbname=repmgr
 27985 | vcd03 | standby | ? unreachable | vcd01    | default  | host=192.168.194.13 user=repmgr dbname=repmgr
 28278 | vcd02 | primary | * running     |          | default  | host=192.168.194.12 user=repmgr dbname=repmgr
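In the output above, vcd03 still lists the old primary vcd01 as its upstream. A sketch like the following could flag such entries automatically. The function name `stale_standbys` is hypothetical, and the parsing assumes the column layout printed in this article.

```shell
# Reads `repmgr cluster show` output on stdin and prints the name of every
# standby whose upstream is not the currently running primary.
stale_standbys() {
    awk -F'|' '
        # Skip the header (no digits in column 1) and the separator (no "|").
        NF >= 7 && $1 ~ /[0-9]/ {
            name = $2; role = $3; status = $4; upstream = $5
            gsub(/ /, "", name); gsub(/ /, "", role); gsub(/ /, "", upstream)
            gsub(/[^a-z]/, "", status)
            if (role == "primary" && status == "running") primary = name
            if (role == "standby") up[name] = upstream
        }
        END {
            # Compare after reading all rows, because the primary may be listed last.
            for (n in up) if (up[n] != primary) print n
        }' | sort
}
```

For the output above this would print only vcd03, the node that is handled in the next section.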
4. Recommended way to recover a standby node
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr standby unregister --node-id=27985
INFO: connecting to local standby
INFO: connecting to primary database
NOTICE: unregistering node 27985

sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status    | Upstream | Location | Connection string
-------+-------+---------+-----------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 5066  | vcd01 | standby |   running | vcd02    | default  | host=192.168.194.11 user=repmgr dbname=repmgr
 28278 | vcd02 | primary | * running |          | default  | host=192.168.194.12 user=repmgr dbname=repmgr
sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show
 ID    | Name  | Role    | Status    | Upstream | Location | Connection string
-------+-------+---------+-----------+----------+----------+-----------------------------------------------
 3889  | vcd04 | standby |   running | vcd02    | default  | host=192.168.194.14 user=repmgr dbname=repmgr
 5066  | vcd01 | standby |   running | vcd02    | default  | host=192.168.194.11 user=repmgr dbname=repmgr
 28278 | vcd02 | primary | * running |          | default  | host=192.168.194.12 user=repmgr dbname=repmgr
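After an unregister, the removed node ID should no longer appear in the listing. A small sketch to confirm this from a script follows. The function name `node_is_registered` is hypothetical, and it assumes the node ID is in the first column, as in the output printed in this article.

```shell
# Usage: <repmgr cluster show output> | node_is_registered <node-id>
# Exit status 0 if a row with that node ID exists, 1 otherwise.
node_is_registered() {
    awk -F'|' -v id="$1" '
        # Skip the header (no digits in column 1) and the separator (no "|").
        NF >= 7 && $1 ~ /[0-9]/ {
            row_id = $1
            gsub(/ /, "", row_id)
            if (row_id == id) found = 1
        }
        END { exit !found }'
}
```

For example, `sudo -n -u postgres /opt/vmware/vpostgres/current/bin/repmgr cluster show | node_is_registered 27985 || echo "node 27985 is gone"` would confirm that the failed standby has been removed.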
You may also refer to the following documentation from VMware:
VMware documentation: how to recover from a primary database failure
VMware documentation: how to check the status of cells in a High Availability database cluster
If you have further questions, I am happy to answer them in the comments.
[/et_pb_text][/et_pb_column][/et_pb_row][/et_pb_section]