gs_sdr

Background

openGauss 3.1.0 and later versions provide the gs_sdr tool to implement cross-region remote disaster recovery (DR) without using additional storage media. The tool provides functions such as streaming DR establishment, DR failover, planned primary/standby switchover, DR removal, DR status monitoring, and displaying the help information and version number.

Prerequisites

You have logged in to the OS as OS user omm; the gs_sdr command must be run by user omm.

Syntax

  • Establishing a DR Relationship

    gs_sdr -t start -m [primary|disaster_standby] [-U DR_USERNAME] [-W DR_PASSWORD] [-X XMLFILE] [--json JSONFILE] [--time-out=SECS] [-l LOGFILE]
    
  • Promoting DR Instance to Primary

    gs_sdr -t failover [-l LOGFILE] 
    
  • Planned Primary/Standby Switchover

    gs_sdr -t switchover -m [primary|disaster_standby] [--time-out=SECS] [-l LOGFILE]
    
  • DR Removal

    gs_sdr -t stop [-X XMLFILE] [--json JSONFILE] [-l LOGFILE]
    
  • Monitoring DR Status

    gs_sdr -t query [-l LOGFILE]
    

Parameter Description

gs_sdr has the following types of parameters:

  • Common parameters

    • -t

      Specifies the type of the gs_sdr command.

      Value range: start, failover, switchover, stop, or query.

    • -l

      Specifies a log file and its storage path.

      Default value: $GAUSSLOG/om/gs_sdr-YYYY-MM-DD_hhmmss.log

    • -?, --help

      Displays the help information.

    • -V, --version

      Displays version information.

  • Parameters for establishing a DR relationship:

    • -m

      Expected role of the cluster in the DR relationship.

      Value range: primary or disaster_standby.

    • -U

      Name of the DR user with the streaming replication permission.

    • -W

      Password of the DR user.

      NOTE:

      1. Before the DR relationship is established, you must create a DR user on the primary cluster for DR authentication. The primary and standby clusters must use the same DR username and password. After a DR relationship is established, the user password cannot be changed; to change the username or password, remove the DR relationship, modify them, and then establish the DR relationship again. The DR user password cannot contain spaces or any of the following characters: |;&$<>`'"{}()[]~*?!\n
      2. If the -U and -W parameters are not specified on the command line, they can be entered in interactive mode during the establishment.
    • -X

      XML file used during cluster installation. DR information can be configured in the XML file for DR establishment; that is, three parameters (localStreamIpmap1, remoteStreamIpmap1, and remotedataPortBase) are added to the file.

      The following shows how to configure the new parameters. The values are examples, and each parameter is preceded by a comment.

      <!-- Information about the node deployment on each server -->
      <DEVICELIST>
      <DEVICE sn="pekpomdev00038">
      <!-- Number of primary DNs that need to be deployed on the current host -->
      <PARAM name="dataNum" value="1"/>
      <!-- Base port number of the primary DN -->
      <PARAM name="dataPortBase" value="26000"/>
      <!-- Mapping between the SSH reliable channel IP address and the streaming replication IP address of each DN shard node in the local cluster -->
      <PARAM name="localStreamIpmap1" value="(10.244.44.216,172.31.12.58),(10.244.45.120,172.31.0.91)"/>
      <!-- Mapping between the SSH reliable channel IP address and the streaming replication IP address of each DN shard node in the peer cluster -->
      <PARAM name="remoteStreamIpmap1" value="(10.244.45.144,172.31.2.200),(10.244.45.40,172.31.0.38),(10.244.46.138,172.31.11.145),(10.244.48.60,172.31.9.37),(10.244.47.240,172.31.11.125)"/>
      <!-- Port number of the primary DN in the peer cluster -->
      <PARAM name="remotedataPortBase" value="26000"/>
      </DEVICE>
      </DEVICELIST>
      
    • --json

      JSON file containing DR information.

      The following shows how to configure the JSON file. The values are examples.

      {
          "remoteClusterConf": {
              "port": 26000,
              "shards": [[
                  {"ip": "10.244.45.144", "dataIp": "172.31.2.200"},
                  {"ip": "10.244.45.40", "dataIp": "172.31.0.38"},
                  {"ip": "10.244.46.138", "dataIp": "172.31.11.145"},
                  {"ip": "10.244.48.60", "dataIp": "172.31.9.37"},
                  {"ip": "10.244.47.240", "dataIp": "172.31.11.125"}
              ]]
          },
          "localClusterConf": {
              "port": 26000,
              "shards": [[
                  {"ip": "10.244.44.216", "dataIp": "172.31.12.58"},
                  {"ip": "10.244.45.120", "dataIp": "172.31.0.91"}
              ]]
          }
      }
      Parameter description:
      # remoteClusterConf: DN shard information of the peer cluster. port indicates the port of the primary DN in the peer cluster, and each entry such as {"ip": "10.244.45.144", "dataIp": "172.31.2.200"} indicates the mapping between the SSH reliable channel IP address and the streaming replication IP address of a DN shard node in the peer cluster.
      # localClusterConf: DN shard information of the local cluster. port indicates the port of the primary DN in the local cluster, and each entry such as {"ip": "10.244.44.216", "dataIp": "172.31.12.58"} indicates the mapping between the SSH reliable channel IP address and the streaming replication IP address of a DN shard node in the local cluster.
      

      NOTE:

      Either -X or --json can be used to configure DR information. If both parameters are specified in the command, the JSON file prevails.

    • --time-out=SECS

      Specifies the timeout period for the primary cluster to wait for the connection to the standby cluster. If the connection times out, the OM script exits automatically. Unit: s

      Value range: a positive integer. The recommended value is 1200.

      Default value: 1200
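
For illustration, the --json DR configuration can also be generated programmatically. The following Python sketch builds and round-trip-checks such a file; the file name streaming_dr_conf.json is a placeholder, and the IP addresses and ports are the example values from this section, to be replaced with those of your environment:

```python
import json

# DN shard topology of the local and peer clusters.
# "ip" is the SSH reliable channel IP address; "dataIp" is the
# streaming replication IP address of the same node.
dr_conf = {
    "remoteClusterConf": {
        "port": 26000,  # port of the primary DN in the peer cluster
        "shards": [[
            {"ip": "10.244.45.144", "dataIp": "172.31.2.200"},
            {"ip": "10.244.45.40", "dataIp": "172.31.0.38"},
        ]],
    },
    "localClusterConf": {
        "port": 26000,  # port of the primary DN in the local cluster
        "shards": [[
            {"ip": "10.244.44.216", "dataIp": "172.31.12.58"},
            {"ip": "10.244.45.120", "dataIp": "172.31.0.91"},
        ]],
    },
}

# Write the file that will be passed via --json.
with open("streaming_dr_conf.json", "w") as f:
    json.dump(dr_conf, f)

# Round-trip check: the file must parse and contain both cluster sections.
with open("streaming_dr_conf.json") as f:
    loaded = json.load(f)
assert {"remoteClusterConf", "localClusterConf"} <= loaded.keys()
```

The generated file can then be supplied to the tool, for example as gs_sdr -t start -m primary --json streaming_dr_conf.json.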

  • Parameters for promoting a DR cluster to primary:

    None.

  • Parameters for removing the DR relationship:

    • -X

      XML file used during cluster installation, with DR information configured; that is, the file is extended with the three parameters localStreamIpmap1, remoteStreamIpmap1, and remotedataPortBase.

    • --json

      JSON file containing local and peer DR information.

      NOTE:

      For details about how to configure -X and --json, see the parameters for establishing a DR relationship in this section.

  • Parameters for querying the DR status:

    None.

    The DR status query result is described as follows:

| Item | Meaning | Value | Description | Remarks |
|------|---------|-------|-------------|---------|
| hadr_cluster_stat | Database instance status in streaming DR | normal | The database instance does not participate in streaming DR. | - |
| | | full_backup | Full data replication in the primary database instance is in progress. | This status is available only for the primary database instance in streaming DR. |
| | | archive | Streaming log replication in the primary database instance is in progress. | This status is available only for the primary database instance in streaming DR. |
| | | backup_fail | Full data replication in the primary database instance fails. | This status is available only for the primary database instance in streaming DR. |
| | | archive_fail | Streaming log replication in the primary database instance fails. | This status is available only for the primary database instance in streaming DR. |
| | | switchover | Planned primary/standby switchover is in progress. | This status is available for both the primary and standby database instances in streaming DR. |
| | | restore | Full data restoration in the DR database instance is in progress. | This status is available only for DR database instances in streaming DR. |
| | | restore_fail | Full data restoration in the DR database instance fails. | This status is available only for DR database instances in streaming DR. |
| | | recovery | Streaming log replication in the DR database instance is in progress. | This status is available only for DR database instances in streaming DR. |
| | | recovery_fail | Streaming log replication in the DR database instance fails. | This status is available only for DR database instances in streaming DR. |
| | | promote | The DR database instance is being promoted to primary. | This status is available only for DR database instances in streaming DR. |
| | | promote_fail | The DR database instance fails to be promoted to primary. | This status is available only for DR database instances in streaming DR. |
| hadr_switchover_stat | Progress of the planned switchover between the primary and standby database instances in streaming DR | Percentage | Switchover progress. | - |
| hadr_failover_stat | Progress of promoting a streaming DR database instance to primary | Percentage | Failover progress. | - |
| RTO | Time required for data restoration when a disaster occurs | Null | Streaming DR is interrupted due to database instance shutdown or network exceptions. | This item can be queried only on the primary database instance in streaming DR. |
| | | Not null | Time required for data restoration, in seconds. | |
| RPO | Duration of data loss in the database instance when a disaster occurs | Null | Streaming DR is interrupted due to database instance shutdown or network exceptions. | This item can be queried only on the primary database instance in streaming DR. |
| | | Not null | Duration in which data of the database instance may be lost, in seconds. | |
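
The hadr_cluster_stat values above fall into steady, in-progress, and failed groups. The following Python sketch encodes that grouping for monitoring scripts; the state names come from the table, while the category labels are an illustrative interpretation, not tool output:

```python
# States from the hadr_cluster_stat table, grouped by outcome.
FAILED_STATES = {"backup_fail", "archive_fail", "restore_fail",
                 "recovery_fail", "promote_fail"}
IN_PROGRESS_STATES = {"full_backup", "archive", "switchover",
                      "restore", "recovery", "promote"}

def classify(stat: str) -> str:
    """Classify a hadr_cluster_stat value reported by gs_sdr -t query."""
    if stat == "normal":
        return "not participating in streaming DR"
    if stat in FAILED_STATES:
        return "failed"
    if stat in IN_PROGRESS_STATES:
        return "in progress"
    return "unknown"
```

A monitoring job could, for example, raise an alarm whenever classify() returns "failed".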

Examples

  • Establish a DR relationship in a primary cluster.

    gs_sdr -t start -m primary -X /opt/install_streaming_primary_cluster.xml --time-out=1200 -U 'hadr_user' -W 'opengauss@123'
    --------------------------------------------------------------------------------
    Streaming disaster recovery start 2b9bc268d8a111ecb679fa163e2f2d28
    --------------------------------------------------------------------------------
    Start create streaming disaster relationship ...
    Got step:[-1] for action:[start].
    Start first step of streaming start.
    Start common config step of streaming start.
    Start generate hadr key files.
    Streaming key files already exist.
    Finished generate and distribute hadr key files.
    Start encrypt hadr user info.
    Successfully encrypt hadr user info.
    Start save hadr user info into database.
    Successfully save hadr user info into database.
    Start update pg_hba config.
    Successfully update pg_hba config.
    Start second step of streaming start.
    Successfully check cluster status is: Normal
    Successfully check instance status.
    Successfully check cm_ctl is available.
    Successfully check cluster is not under upgrade opts.
    Start checking disaster recovery user.
    Successfully check disaster recovery user.
    Start prepare secure files.
    Start copy hadr user key files.
    Successfully copy secure files.
    Start fourth step of streaming start.
    Starting reload wal_keep_segments value: 16384.
    Successfully reload wal_keep_segments value: 16384.
    Start fifth step of streaming start.
    Successfully set [/omm/CMServer/backup_open][0].
    Start sixth step of streaming start.
    Start seventh step of streaming start.
    Start eighth step of streaming start.
    Waiting main standby connection..
    Main standby already connected.
    Successfully check cluster status is: Normal
    Start ninth step of streaming start.
    Starting reload wal_keep_segments value: {'6001': '128'}.
    Successfully reload wal_keep_segments value: {'6001': '128'}.
    Successfully removed step file.
    Successfully do streaming disaster recovery start.
    
  • Establish a DR relationship in a standby cluster.

    gs_sdr -t start -m disaster_standby -X /opt/install_streaming_standby_cluster.xml --time-out=1200 -U 'hadr_user' -W 'opengauss@123'
    --------------------------------------------------------------------------------
    Streaming disaster recovery start e34ec1e4d8a111ecb617fa163e77e94a
    --------------------------------------------------------------------------------
    Start create streaming disaster relationship ...
    Got step:[-1] for action:[start].
    Start first step of streaming start.
    Start common config step of streaming start.
    Start update pg_hba config.
    Successfully update pg_hba config.
    Start second step of streaming start.
    Successfully check cluster status is: Normal
    Successfully check instance status.
    Successfully check cm_ctl is available.
    Successfully check cluster is not under upgrade opts.
    Start build key files from remote cluster.
    Start copy hadr user key files.
    Successfully build and distribute key files to all nodes.
    Start fourth step of streaming start.
    Start fifth step of streaming start.
    Successfully set [/omm/CMServer/backup_open][2].
    Stopping the cluster by node.
    Successfully stopped the cluster by node for streaming cluster.
    Start sixth step of streaming start.
    Start seventh step of streaming start.
    Start eighth step of streaming start.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal...
    Successfully started standby instances.
    Successfully check cluster status is: Normal
    Start ninth step of streaming start.
    Successfully removed step file.
    Successfully do streaming disaster recovery start.
    
  • Demote a primary cluster to standby as planned.

    gs_sdr -t switchover -m disaster_standby
    --------------------------------------------------------------------------------
    Streaming disaster recovery switchover 6897d15ed8a411ec82acfa163e2f2d28
    --------------------------------------------------------------------------------
    Start streaming disaster switchover ...
    Streaming disaster cluster switchover...
    Successfully check cluster status is: Normal
    Parse cluster conf from file.
    Successfully parse cluster conf from file.
    Successfully check cluster is not under upgrade opts.
    Got step:[-1] for action:[switchover].
    Stopping the cluster.
    Successfully stopped the cluster.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal...
    Successfully started standby instances.
    Start checking truncation, please wait...
    Stopping the cluster.
    Successfully stopped the cluster.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal...
    Successfully started standby instances.
    .
    The cluster status is Normal.
    Successfully removed step file.
    Successfully do streaming disaster recovery switchover.
    
  • Promote a standby cluster to primary as planned.

    gs_sdr -t switchover -m primary
    --------------------------------------------------------------------------------
    Streaming disaster recovery switchover 20542bbcd8a511ecbbdbfa163e77e94a
    --------------------------------------------------------------------------------
    Start streaming disaster switchover ...
    Streaming disaster cluster switchover...
    Waiting for cluster and instances normal...
    Successfully check cluster status is: Normal
    Parse cluster conf from file.
    Successfully parse cluster conf from file.
    Successfully check cluster is not under upgrade opts.
    Waiting for switchover barrier...
    Got step:[-1] for action:[switchover].
    Stopping the cluster by node.
    Successfully stopped the cluster by node for streaming cluster.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal...
    Successfully started standby instances.
    Successfully check cluster status is: Normal
    Successfully removed step file.
    Successfully do streaming disaster recovery switchover.
    
  • Promote a DR cluster to primary.

    gs_sdr -t failover
    --------------------------------------------------------------------------------
    Streaming disaster recovery failover 65535214d8a611ecb804fa163e2f2d28
    --------------------------------------------------------------------------------
    Start streaming disaster failover ...
    Got step:[-1] for action:[failover].
    Successfully check cluster status is: Normal
    Successfully check cluster is not under upgrade opts.
    Parse cluster conf from file.
    Successfully parse cluster conf from file.
    Got step:[-1] for action:[failover].
    Starting drop all node replication slots
    Finished drop all node replication slots
    Stopping the cluster by node.
    Successfully stopped the cluster by node for streaming cluster.
    Start remove replconninfo for instance:6001
    Start remove replconninfo for instance:6002
    Start remove replconninfo for instance:6003
    Start remove replconninfo for instance:6005
    Start remove replconninfo for instance:6004
    Successfully removed replconninfo for instance:6001
    Successfully removed replconninfo for instance:6004
    Successfully removed replconninfo for instance:6003
    Successfully removed replconninfo for instance:6002
    Successfully removed replconninfo for instance:6005
    Start remove pg_hba config.
    Finished remove pg_hba config.
    Starting the cluster.
    Successfully started primary instance. Please wait for standby instances.
    Waiting cluster normal...
    Successfully started standby instances.
    Successfully check cluster status is: Normal
    Try to clean hadr user info.
    Successfully clean hadr user info from database.
    Successfully removed step file.
    Successfully do streaming disaster recovery failover.
    
  • Remove the DR relationship from a primary cluster.

    gs_sdr -t stop -X /opt/install_streaming_standby_cluster.xml
    --------------------------------------------------------------------------------
    Streaming disaster recovery stop dae8539ed8a611ecade9fa163e77e94a
    --------------------------------------------------------------------------------
    Start remove streaming disaster relationship ...
    Got step:[-1] for action:[stop].
    Start first step of streaming stop.
    Start second step of streaming start.
    Successfully check cluster status is: Normal
    Check cluster type succeed.
    Successfully check cluster is not under upgrade opts.
    Start third step of streaming stop.
    Start remove replconninfo for instance:6001
    Start remove replconninfo for instance:6002
    Successfully removed replconninfo for instance:6001
    Successfully removed replconninfo for instance:6002
    Start remove cluster file.
    Finished remove cluster file.
    Start fourth step of streaming stop.
    Start remove pg_hba config.
    Finished remove pg_hba config.
    Start fifth step of streaming start.
    Starting drop all node replication slots
    Finished drop all node replication slots
    Start sixth step of streaming stop.
    Successfully check cluster status is: Normal
    Try to clean hadr user info.
    Successfully clean hadr user info from database.
    Successfully removed step file.
    Successfully do streaming disaster recovery stop.
    
  • Query the DR status.

    gs_sdr -t query
    --------------------------------------------------------------------------------
    Streaming disaster recovery query 1201b062d8a411eca83efa163e2f2d28
    --------------------------------------------------------------------------------
    Start streaming disaster query ...
    Successfully check cluster is not under upgrade opts.
    Start check archive.
    Start check recovery.
    Start check RPO & RTO.
    Successfully execute streaming disaster recovery query, result:
    {'hadr_cluster_stat': 'archive', 'hadr_failover_stat': '', 'hadr_switchover_stat': '', 'RPO': '0', 'RTO': '0'}
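
The result line printed by gs_sdr -t query is formatted like a Python dict literal, so it can be parsed with the standard library. A minimal sketch, assuming the output format shown above:

```python
import ast

# Last line of the example query output above.
result_line = ("{'hadr_cluster_stat': 'archive', 'hadr_failover_stat': '', "
               "'hadr_switchover_stat': '', 'RPO': '0', 'RTO': '0'}")

# literal_eval safely parses the dict literal without executing code.
result = ast.literal_eval(result_line)
assert result["hadr_cluster_stat"] == "archive"  # log replication in progress
# RPO/RTO of '0' means no measurable replication lag between the clusters.
print(result["RPO"], result["RTO"])  # prints: 0 0
```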
    