Data Processing Based on Extension Connector

Function

SQL on openGauss is one of the functions of Extension Connector. It allows a openGauss database instance to send SQL statements to another openGauss database instance for execution and obtain the execution result. The execution result can be used in the associated query of the local table or imported to the local table or local file system.

Supported openGauss Data Types

Currently, SQL on openGauss supports only the following data types.

Table 1 Data types supported by SQL on openGauss

General Data Type

Target Data Type

Local Data Type

NUMERIC

TINYINT

TINYINT

SMALLINT

SMALLINT

INTEGER

INTEGER

BIGINT

BIGINT

FLOAT4

FLOAT4

FLOAT8

FLOAT8

NUMERIC(p,s)

NUMERIC(p,s)

STRING

CHAR

CHAR/TEXT

NCHAR

NCHAR/TEXT

VARCHAR

VARCHAR/TEXT

VARCHAR2

VARCHAR2/TEXT

NVARCHAR2

NVARCHAR2/TEXT

TEXT

TEXT

BOOLEAN

BOOLEAN

BOOLEAN

DATE

TIMESTAMP[(p)]

TIMESTAMP[(p)]

TIMESTAMP[(p)] WITH TIME ZONE

TIMESTAMP[(p)] WITH TIME ZONE

INTERVAL[FIELDS]

INTERVAL[FIELDS]

DATE

DATE

NOTICE:

  • To receive a type of data returned by openGauss, specify its corresponding data type in the AS clause. If the returned type of openGauss is not listed in the table or its corresponding data type is not specified, the type conversion may generate an incorrect result or fail. For example, VARCHAR (10) returned by openGauss needs to be received using VARCHAR(n) (n>=10) or TEXT.

  • If the encoding mode of openGauss is set to SQL_ASCII, the length() function returns the number of bytes of the string rather than the actual number of characters. For example, run the following statement to query for the length of data returned by exec_on_extension:
    select c2,length(c2) from exec_on_extension('libra','select * from a;') as (c1 int, c2 text);
    The second column returned is the number of bytes of the string, rather than the number of characters.

  • For a string of the CHAR(n) type in openGauss, if its length is less than n, it will be automatically padded with spaces. These spaces will be retained when the string is transferred to openGauss and converted to the TEXT type.

  • For the TIMESTAMP[(p)] WITH TIME ZONE data type, the time zone of the remote database must be the same as that of the local database. Otherwise, an error may occur.

Interconnection Configuration

SQL on openGauss requires unixODBC-2.3.4 and openGauss ODBC, and openGauss ODBC requires the support of the unixODBC-2.3.0 library. The configuration procedure is similar to that for SQL on Oracle.

  1. Log in as the OS user omm to the primary node of the database.

    Perform steps 2 to 7 on the node and do not switch to another node in this process.

  2. (Optional) Configure the data source key files.

    The key files are used to encrypt the sensitive fields username and password in the data source. If the files are not set, the system uses the server.key.cipher and server.key.rand key files in the $GAUSSHOME/bin directory by default.

    1. Use gs_guc to generate key files.

      gs_guc encrypt –M source –K user key string –D  key file storage directory
      
      • The user key string must contain at least eight characters of three types.
      • Key files datasource.key.cipher and datasource.key.rand are generated. The file name cannot be changed.
      • The datasource.key.cipher and datasource.key.rand files must be placed to the $GAUSSHOME/bin directory on each node of the database instance.
    2. Save the key files to $GAUSSHOME/bin. Then, use the gs_om ec tool to send the key files to other nodes of the database instance.

      gs_om -t ec -m install --key-files --force
      

      For details, see “Server Tools > gs_om” in Tool Reference.

  3. Prepare package.zip and put it in $GAUSSHOME/utilslib/fc_conf/$DSN. If the path does not exist, create it first. $DSN indicates a folder naming after the DSN. The DSN name must consist of letters, digits, and underscores (_). The package contains the following files:

    openGauss-*.*.0-ODBC.tar.gz
    

    The package can be obtained by decompressing the installation package. Run the following command to compress it into package.zip:

    zip -r package.zip openGauss-*.*.0-ODBC.tar.gz
    
  4. Configure (remote) openGauss database instance listening parameters.

    1. Log in as the OS user omm to the primary node of the database.

    2. Set the listening IP address of the remote database instance (skip this step if you set the listening IP address by running the remoteip command).

      Add the IP addresses or host names (separated by commas) of the NICs providing external services to the listening list on the DN receiving remote services (assume that the host name is Linux-235 and the IP address is 10.11.12.16). Generally, the IP addresses are local. If they already exist in the list, you do not need to set them again. Run the following command:

      gs_guc reload -Z datanode -N Linux-235 -I all -c "listen_addresses='localhost,10.11.12.16'"
      

      For details, see Configuring a Data Source in the Linux OS in 7.

    3. Set the authentication mode on the DN of the remote database instance.

      Assume that IP addresses of nodes of the local openGauss database instance are 10.11.12.13, 10.11.12.14, and 10.11.12.15. To set the SHA-256 connection authentication mode, perform the following operations on the remote openGauss database instance:

      gs_guc reload -Z datanode -N all -I all -h "host all all 10.11.12.13/32 sha256"
      gs_guc reload -Z datanode -N all -I all -h "host all all 10.11.12.14/32 sha256"
      gs_guc reload -Z datanode -N all -I all -h "host all all 10.11.12.15/32 sha256"
      

      If the local openGauss database instance has multiple nodes and the IP addresses of the nodes are consecutive and in the same network segment, you can set the IP addresses in batches. For example:

      #Allow accesses from hosts with the IP address 10.11.12.x:
      gs_guc reload -Z datanode -N all -I all -h "host all all 10.11.12.0/24 sha256"
      #Allow accesses from hosts with the IP address 10.11.x.x:
      gs_guc reload -Z datanode -N all -I all -h "host all all 10.11.0.0/16 sha256"
      

      If there are many nodes in the local openGauss database and their IP addresses are inconsecutive or in different network segments, you can specify the IP addresses using the remoteIP function of Extension Connector with each IP address occupying a line. (Save the IP addresses to a text file that you have the read permission, for example, /opt/host/hostfile. The file path and name can be customized.) Then run the following command:

      gs_om -t ec -m add -N /opt/host/hostfile -U username --type=remoteip
      gs_om -t ec -m add -N /opt/host/hostfile -U username --type=remoteip -L  #-L indicates the local mode. You need to perform this step on every node.
      
    4. Restart the remote database instance (skip this step if you set the IP address by running the remoteip command).

      gs_om -t stop
      gs_om -t start
      
  5. Set the DSN.ini configuration file.

    Generate the DSN.ini file in $GAUSSHOME/utilslib/fc_conf/$DSN. Ensure that the file content and format are correct. For example:

    [libra]                              # DSN
    Driver=mppdbDriver                   #Driver name
    Description=Sample PostgreSQL DSN    #Optional description
    Servername=XX.XX.XX.XX               # IP address of the DN of the remote openGauss database instance to be connected
    Database=postgres                    # Database name of the remote openGauss database instance to be connected
    Port=XXXX                            # DN port number of the remote openGauss database instance to be connected
    Sslmode=allow                        #Enable the SSL mode.
    Fetch=10000                          #Amount of data fetched from the remote server each time the server cursor is used. The default value is 100. Adjust the value based on your memory usage and query result.
    
  6. Deploy Extension Connector.

    gs_om -t ec -m add -N DSN -U username --type=native
    gs_om -t ec -m add -N DSN -U username --type=native -L  #-L indicates the local mode.
    

    NOTE: If the local mode is used, perform steps 1 to 5 on each node.

  7. Run the restart command of the Extension Connector to restart the database instance and stop the om_monitor process so that the openGauss process can detect the change of environment variables.

    gs_om -t ec -m restart -U username
    gs_om -t ec -m restart -U username -L   #-L indicates the local mode.
    gs_om -t stop && gs_om -t start         #Run this command only in local mode.
    

    NOTE:

    • If the local mode is used, perform steps 1 to 6 on each node.
    • In local mode, the database instance does not need to be started or stopped. Therefore, you need to manually run the command for starting or stopping the database instance.
  8. Create a data source.

    Connect to a database (see Connecting to a Database) and create a data source. For example, to create the ds_libra data source corresponding to the remote openGauss database instance, run the following command:

    postgres=# CREATE DATA SOURCE ds_libra type 'MPPDB' OPTIONS (DSN 'libra', USERNAME 'libra_user', PASSWORD 'libra_pwd', ENCODING 'UTF8');
    

    The field meanings and modification methods are similar to those in 9 in “SQL on Oracle”.

  9. Connect to the remote openGauss database instance.

    Assume that the ds_libra data source has been created. Use exec_on_extension to connect to the remote openGauss database to query the table a (c1 int):

    postgres=# SELECT * FROM exec_on_extension('ds_libra', 'select * from a;') AS (c1 int);
    

Precautions

  1. For details about the usage restrictions on the $GAUSSHOME/utilslib directory and the env_ec file, see “Server Tools > gs_om” in Tool Reference.

  2. Users in the database must have the read and write permissions for the $GAUSSHOME/utilslib directory, its content, and the $GAUSSHOME/bin/datasource.key.* file. You are advised to check user permissions every time before you use gs_om ec, preventing upgrade and scale-out failures.

  3. Ensure that the content in package.zip is complete and that the content in the DSN.ini file is correct. Do not modify them after the installation and deployment are complete. Otherwise, the environment probably cannot be used.

  4. After a database administrator creates a data source (corresponding to a DSN in the odbc.ini file) and grants its use permissions to a user, the user can use exec_on_extension to access the remote database corresponding to the DSN defined in the odbc.ini file.

  5. The memory of remote DNs is used under the current memory restriction. When a large amount of data is queried, the memory of the remote database instance may be insufficient. As a result, the remote database instance cannot be operated.

  6. During data migration, you need to specify the table structure by using the AS clause. For example, to migrate data from the a (c1 int) table in the remote openGauss database instance to the local b (c1 int) table, run the following statement:

    postgres=# CREATE TABLE b AS SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    INSERT 0 1
    
  7. omm users cannot establish remote connections.

Examples

-- Create a data source.
postgres=# CREATE DATA SOURCE librA OPTIONS (dsn 'odbc_librA', username 'mppcom', password 'Gs@123456');


-- Create a remote table and insert data.
postgres=# SELECT * FROM exec_on_extension('librA', 'create table a (c1 int);') AS (c1 text);
 c1 
----
(0 rows)
postgres=# SELECT * FROM exec_on_extension('librA', 'insert into a values (911);') AS (c1 text);
 c1 
----
(0 rows)
-- View the execution plan. If the plan is displayed in multiple columns in the destination database, the result should contain the same number of columns.
-- Set explain_perf_mode to normal for the remote openGauss database instance.
postgres=# SELECT * FROM exec_on_extension('librA', 'explain select * from a;') AS (c1 text);
                    QUERY PLAN                    
--------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0)
   Node/s: All datanodes
(2 rows)
postgres=# SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int);
 c1  
-----
 911
(1 row)
-- The sent SQL statement returns two columns, but exec_on_extension returns only one column. Match the column on the left first.
postgres=# SELECT * FROM exec_on_extension('librA', 'select * from a a1 inner join a a2 on a1.c1=a2.c1;') AS (c1 int);
 c1  
-----
 911
 911
(2 rows)

-- Associate with the local table.
postgres=# CREATE TABLE b AS SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
INSERT 0 1
postgres=# INSERT INTO b SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int);
INSERT 0 1
postgres=# SELECT * FROM b WHERE b.c1 in (SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int));
 c1  
-----
 911
 911
(2 rows)

-- Let other users use the data source.
postgres=# CREATE USER tmp_usr IDENTIFIED BY 'Gs@123456';

postgres=# GRANT USAGE ON DATA SOURCE librA TO tmp_usr;

postgres=# \c - tmp_usr
postgres=> SELECT * FROM exec_on_extension('librA', 'select * from a;') AS (c1 int);
 c1  
-----
 911
(1 row)

-- Delete the data source, table, and user.
postgres=>\c - omm
postgres=# SELECT * FROM exec_on_extension('librA', 'drop table a;') AS (c1 text);
 c1 
----
(0 rows)
postgres=# DROP DATA SOURCE librA;

postgres=# DROP TABLE b;

postgres=# DROP USER tmp_usr;

Troubleshooting

For details about common issues that occur while Extension Connector interconnects with openGauss, see Table 2.

Table 2 Common troubleshooting for interconnection between Extension Connector and openGauss

Error Type

Error Information

Troubleshooting

Deployment error

[GAUSS-50201] : The $GAUSSHOME/utilslib/fc_conf/ora/ does not exist.

View automatic deployment logs and rectify the fault based on the error information in the logs. Then perform the deployment again.

The log path is as follows:

  • The default path is $GAUSSLOG/om.
  • If the gs_om tool is used for automatic deployment and -l is specified, logs are stored in the specified directory.

Connection error

ERROR: source "spark_ds" does not exist

Create a data source. For details, see CREATE DATA SOURCE.

Execution error

ERROR: invalid input syntax for integer

Check the LirbA data types supported by Extension Connector. For details, see Table 1.

Execution error

ERROR: dn_6033_6034: DSN:oracle,Fail to exec SQL with the ODBC connection! Detail can be found in node log of 'dn_6033_6034'.

DETAIL: [Oracle][ODBC][Ora]ORA-00942: table or view does not exist

If a table or view definition does not exist in the SQL statement to be executed, an error is reported. Extension Connector returns Oracle database errors. If other syntax errors occur during SQL statement execution in the Oracle database, Extension Connector also reruns errors.

Helpful Links

CREATE DATA SOURCE, CREATE DATA SOURCE, SQL on Oracle, and “Server Tools > gs_om” in the Tool Reference.

Feedback
编组 3备份
    openGauss 2024-05-19 00:45:18
    cancel