Automatic Job Retry upon Failure

Availability

This feature has been available since openGauss 1.0.0.

Introduction

If an error occurs in batch processing jobs due to network exceptions or deadlocks, failed jobs are automatically retried.

Benefits

In common fault scenarios, such as network exceptions and deadlocks, failed queries are retried automatically, which improves database usability.

Description

openGauss provides the following job retry mechanism: gsql retry.

  • The gsql retry mechanism uses a unique error code (SQLSTATE) to identify an error that requires a retry. The client tool gsql is enhanced to support this. The error code configuration file retry_errcodes.conf lists the errors that require a retry and is stored in the installation directory at the same level as gsql. gsql provides the \set RETRY [number] command to enable or disable the retry function. The number of retries ranges from 5 to 10, and the default value is 5. When the function is enabled, gsql reads this configuration file, and the error retry controller keeps the error code list in a container. If an error listed in the configuration file occurs after the function is enabled, the controller resends the cached query statement to the server. It keeps retrying until the query succeeds or until the maximum number of retries is exceeded, at which point an error is reported.
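The control flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the gsql implementation: the names QueryError, validate_retry_times, and run_with_retry are hypothetical, and the sketch assumes that the first attempt does not count as a retry.

```python
from typing import Callable, Set


class QueryError(Exception):
    """Hypothetical error type carrying the SQLSTATE reported by the server."""

    def __init__(self, sqlstate: str, message: str = ""):
        super().__init__(message or sqlstate)
        self.sqlstate = sqlstate


def validate_retry_times(number: int = 5) -> int:
    """Accept only the documented range of 5 to 10 (default 5)."""
    if not 5 <= number <= 10:
        raise ValueError("the number of retries must be between 5 and 10")
    return number


def run_with_retry(execute: Callable[[str], str], query: str,
                   retry_codes: Set[str], max_retries: int = 5) -> str:
    """Resend the cached query while the error is in the retry list.

    Assumption: the initial attempt is not counted, so the query is sent
    at most max_retries + 1 times.
    """
    max_retries = validate_retry_times(max_retries)
    retries = 0
    while True:
        try:
            return execute(query)
        except QueryError as err:
            # Errors outside the configured list are reported immediately.
            if err.sqlstate not in retry_codes:
                raise
            # Report the last error once the retry budget is used up.
            if retries >= max_retries:
                raise
            retries += 1
```

Here execute stands in for whatever sends the statement to the server, and retry_codes plays the role of the list loaded from retry_errcodes.conf.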

Enhancements

None

Constraints

  • Functionality constraints:

    • Retrying increases the execution success rate but does not guarantee success.
  • Error type constraints:

    Only the error types in Table 1 are supported.

    Table 1 Supported error types

    | Error Type | Error Code | Remarks |
    | --- | --- | --- |
    | CONNECTION_RESET_BY_PEER | YY001 | TCP communication error. Print information: "Connection reset by peer" |
    | STREAM_CONNECTION_RESET_BY_PEER | YY002 | TCP communication error. Print information: "Stream connection reset by peer" (communication between DNs) |
    | LOCK_WAIT_TIMEOUT | YY003 | Lock wait timeout. Print information: "Lock wait timeout" |
    | CONNECTION_TIMED_OUT | YY004 | TCP communication error. Print information: "Connection timed out" |
    | SET_QUERY_ERROR | YY005 | Failed to deliver the SET command. Print information: "Set query error" |
    | OUT_OF_LOGICAL_MEMORY | YY006 | Failed to apply for memory. Print information: "Out of logical memory" |
    | SCTP_MEMORY_ALLOC | YY007 | SCTP communication error. Print information: "Memory allocate error" |
    | SCTP_NO_DATA_IN_BUFFER | YY008 | SCTP communication error. Print information: "SCTP no data in buffer" |
    | SCTP_RELEASE_MEMORY_CLOSE | YY009 | SCTP communication error. Print information: "Release memory close" |
    | SCTP_TCP_DISCONNECT | YY010 | SCTP and TCP communication error. Print information: "SCTP, TCP disconnect" |
    | SCTP_DISCONNECT | YY011 | SCTP communication error. Print information: "SCTP disconnect" |
    | SCTP_REMOTE_CLOSE | YY012 | SCTP communication error. Print information: "Stream closed by remote" |
    | SCTP_WAIT_POLL_UNKNOW | YY013 | Waiting for an unknown poll. Print information: "SCTP wait poll unknow" |
    | SNAPSHOT_INVALID | YY014 | Invalid snapshot. Print information: "Snapshot invalid" |
    | ERRCODE_CONNECTION_RECEIVE_WRONG | YY015 | Failed to receive a connection. Print information: "Connection receive wrong" |
    | OUT_OF_MEMORY | 53200 | Out of memory. Print information: "Out of memory" |
    | CONNECTION_EXCEPTION | 08000 | Failed to communicate with DNs due to connection errors. Print information: "Connection exception" |
    | ADMIN_SHUTDOWN | 57P01 | System shutdown by the administrator. Print information: "Admin shutdown" |
    | STREAM_REMOTE_CLOSE_SOCKET | XX003 | Remote socket disabled. Print information: "Stream remote close socket" |
    | ERRCODE_STREAM_DUPLICATE_QUERY_ID | XX009 | Duplicate query. Print information: "Duplicate query id" |
    | ERRCODE_STREAM_CONCURRENT_UPDATE | YY016 | Concurrent stream query and update. Print information: "Stream concurrent update" |

  • Statement type constraints:

    Single-statement stored procedures, functions, and anonymous blocks are supported. Statements in transaction blocks are not supported.

  • Statement constraints of a stored procedure:

    • If an error occurs during the execution of a stored procedure containing EXCEPTION (including statement block execution and statement execution in EXCEPTION), the stored procedure can be retried, unless the error is captured by EXCEPTION, in which case the stored procedure cannot be retried.
    • Advanced packages that use global variables are not supported.
    • DBE_TASK is not supported.
    • PKG_UTIL file operations are not supported.
  • Data import constraints:

    • The COPY FROM STDIN statement is not supported.
    • The gsql \copy from meta-command is not supported.
    • Data cannot be imported using JDBC CopyManager copyIn.
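To make the constraints above concrete, the following Python sketch combines three of them into one eligibility check: the error code must appear in Table 1, the statement must not run inside a transaction block, and the statement must not be a COPY FROM STDIN or \copy from import. The function name is_retry_eligible and the regular expressions are hypothetical and deliberately simplistic. They illustrate the rules and do not reflect how gsql classifies statements.

```python
import re

# SQLSTATE values listed in Table 1: YY001 to YY016 plus five others.
SUPPORTED_SQLSTATES = frozenset(
    [f"YY{n:03d}" for n in range(1, 17)]
    + ["53200", "08000", "57P01", "XX003", "XX009"]
)

# Rough textual heuristics (assumptions) for the unsupported import paths.
_COPY_FROM_STDIN = re.compile(r"^\s*copy\b.*\bfrom\s+stdin\b",
                              re.IGNORECASE | re.DOTALL)
_GSQL_COPY_FROM = re.compile(r"^\s*\\copy\b.*\bfrom\b",
                             re.IGNORECASE | re.DOTALL)


def is_retry_eligible(sqlstate: str, statement: str,
                      in_transaction_block: bool) -> bool:
    """Return True if a failed statement may be retried under these rules."""
    if sqlstate not in SUPPORTED_SQLSTATES:
        return False  # error type constraint
    if in_transaction_block:
        return False  # statement type constraint
    if _COPY_FROM_STDIN.match(statement) or _GSQL_COPY_FROM.match(statement):
        return False  # data import constraint
    return True
```

The stored procedure constraints (errors captured by EXCEPTION, advanced packages that use global variables, DBE_TASK, PKG_UTIL file operations) are not modeled here because they cannot be detected from the statement text alone.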

Dependencies

This feature takes effect only if the gsql tool works normally and the error list is correctly configured.

    openGauss 2024-05-07 00:46:52