Data Check
The data check project openGauss-tools-datachecker-performance consists of the check service and extract service. The check service is used for data check, and the extract service is used for data extraction and normalization.
Principles
Full check:
After full data migration is complete, the extract service extracts data from the source MySQL database and target openGauss database, normalizes the data, and pushes the data to Kafka. Finally, the check service extracts data from Kafka, checks the data, and outputs the check result.
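The normalize-then-compare idea behind the full check can be illustrated with a small shell sketch. This is a toy under stated assumptions, not the tool's actual algorithm: it assumes each side has already been dumped to a text file with one row per line, it leaves Kafka out entirely, and the function names normalize_rows and diff_rows are hypothetical.

```shell
# Illustrative only: compare two row dumps after normalizing them.
# The file format (one row per line) and both function names are assumptions.

# normalize_rows FILE: trim trailing whitespace and sort the rows,
# so that row order and padding do not affect the comparison.
normalize_rows() {
    sed 's/[[:space:]]*$//' "$1" | LC_ALL=C sort
}

# diff_rows SOURCE SINK: print rows found only in SOURCE prefixed with "- "
# and rows found only in SINK prefixed with "+ ".
diff_rows() {
    src_tmp=$(mktemp) && sink_tmp=$(mktemp) || return 1
    normalize_rows "$1" > "$src_tmp"
    normalize_rows "$2" > "$sink_tmp"
    LC_ALL=C comm -23 "$src_tmp" "$sink_tmp" | sed 's/^/- /'
    LC_ALL=C comm -13 "$src_tmp" "$sink_tmp" | sed 's/^/+ /'
    rm -f "$src_tmp" "$sink_tmp"
}
```

An empty result from diff_rows means the two dumps agree after normalization.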
Incremental check:
The Debezium service captures incremental changes from the source MySQL database and writes them to the specified topic. The source extract service then processes the incremental data in that topic and triggers the incremental check.
Environment Preparation
ARM+openEuler 20.03 or x86+CentOS 5.7
Installing Using Source Code
Software dependencies:
JDK 11, Git, Maven, Kafka, and Debezium (the source connector, required only for incremental check)
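Before building, it can help to confirm that the command-line dependencies are on the PATH. The following is a minimal sketch: the function name missing_tools is hypothetical, and it covers only java, git, and mvn, because Kafka and Debezium are usually unpacked into a directory rather than installed as commands.

```shell
# missing_tools NAME...: print each command that cannot be found on the PATH.
# The function name is hypothetical; "command -v" is the POSIX lookup.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage:
#   missing=$(missing_tools java git mvn)
#   [ -z "$missing" ] || echo "Missing tools: $missing"
```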
Installation and operation procedure:
Run the git command to download the source code.
git clone https://gitee.com/opengauss/openGauss-tools-datachecker-performance.git
Run the maven command to build the check and extract JAR packages.
mvn clean package -Dmaven.test.skip=true
Copy the JAR packages and the config directory to the specified deployment directory.
Configure related files.
Configuration file of the check end: application.yml
server:
  port: 9000
spring:
  kafka:
    bootstrap-servers: 192.168.0.114:9092 # Kafka cluster address.
data:
  check:
    data-path: D:\code\tool # Local path for storing data check results.
    bucket-expect-capacity: 10 # The minimum bucket capacity is 1.
    source-uri: http://127.0.0.1:9002 # Source service address and port number (server.port).
    sink-uri: http://127.0.0.1:9001 # Sink service address and port number (server.port).
Configuration file of the extraction end and source end: application-source.yml
server:
  port: 9002
spring:
  check:
    server-uri: http://127.0.0.1:9000 # Data check service address.
  extract:
    schema: test # Source data instance.
    databaseType: MS # The source database type is set to MS (MySQL).
    debezium-enable: false # Whether to enable the incremental Debezium configuration. Disabled by default.
    debezium-topic: topic # Debezium listens to incremental data in tables and uses a single topic to manage it.
    debezium-groupId: debezium-extract-group # Consumer group used to read the Debezium incremental migration topic.
    debezium-topic-partitions: 1 # Number of partitions of the topic that Debezium writes to.
    debezium-tables: # Names of the tables that Debezium listens to. Takes effect only on the source end.
      table1,
      table2
    debezium-time-period: 1 # Period of the Debezium incremental check, in minutes (24 x 60 corresponds to one day).
    debezium-num-period: 1000 # Threshold on the number of incremental change records for the incremental check. The default value is 1000. The threshold must be greater than 100.
  datasource:
    druid:
      dataSourceOne:
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/test?useSSL=false&useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC&allowPublicKeyRetrieval=true
        username: jack # User name of the source MySQL database for check.
        password: test@123 # Password of the source MySQL database user for check.
Configuration file of the extraction end and target end: application-sink.yml
server:
  port: 9001
spring:
  check:
    server-uri: http://127.0.0.1:9000 # Data check service address.
  extract:
    schema: test # Schema of the sink openGauss database to check.
    databaseType: OG # The sink database type is set to OG (openGauss).
  datasource:
    druid:
      dataSourceOne:
        driver-class-name: org.opengauss.Driver
        # Connection address of the sink openGauss database to check.
        url: jdbc:opengauss://127.0.0.1:15432/test?useSSL=false&useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC&batchMode=OFF
        username: jack # User name of the sink openGauss database for check.
        password: test@123 # Password of the sink openGauss database user for check.
Start the service.
Start ZooKeeper.
cd /data/kafka/confluent-7.2.0
bin/zookeeper-server-start -daemon etc/kafka/zookeeper.properties
Start Kafka.
bin/kafka-server-start.sh -daemon etc/kafka/server.properties
Start the Kafka Connect Debezium connector (required only for incremental check). The mysql-conect.properties file configures the Debezium connector.
bin/connect-standalone -daemon etc/kafka/connect-standalone.properties etc/kafka/mysql-conect.properties
Start the extract services and the check service.
sh extract-endpoints.sh start|restart|stop
sh check-endpoint.sh start|restart|stop
Run the following command to perform full check:
curl -X 'POST' 'http://localhost:9000/start/check?checkMode=FULL' -H 'accept: */*' -d ''
Clear the full check environment.
curl -X 'POST' 'http://localhost:9000/stop/clean/check' -H 'accept: */*' -d ''
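The two requests above can be wrapped in small shell functions so that the host and port are set in one place. This is a sketch only: the variable names CHECK_HOST and CHECK_PORT and the function names are hypothetical, while the URLs, headers, and empty body are taken from the commands above.

```shell
# Address of the check service; both variable names are assumptions.
CHECK_HOST=${CHECK_HOST:-localhost}
CHECK_PORT=${CHECK_PORT:-9000}

# start_full_check: send the request that starts a full check.
start_full_check() {
    curl -sS -X POST \
        "http://${CHECK_HOST}:${CHECK_PORT}/start/check?checkMode=FULL" \
        -H 'accept: */*' -d ''
}

# clean_check: send the request that clears the full check environment.
clean_check() {
    curl -sS -X POST \
        "http://${CHECK_HOST}:${CHECK_PORT}/stop/clean/check" \
        -H 'accept: */*' -d ''
}
```

Calling start_full_check and, once the results have been collected, clean_check reproduces the manual sequence.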
To start incremental check, modify the configuration file on the source end:
debezium-enable: true
Then configure the other debezium-related settings and start the service to enable incremental check.
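The debezium-enable switch can also be flipped from a script instead of by hand. The sketch below assumes the setting appears in the file as "debezium-enable: false" on its own line; the function name enable_debezium is hypothetical.

```shell
# enable_debezium FILE: change "debezium-enable: false" to "true" in FILE,
# keeping the indentation and any trailing comment on that line.
enable_debezium() {
    sed 's/^\([[:space:]]*debezium-enable:[[:space:]]*\)false/\1true/' "$1" > "$1.tmp" \
        && mv "$1.tmp" "$1"
}

# Example usage:
#   enable_debezium config/application-source.yml
```

The other debezium-related settings still need to be reviewed manually, because their values depend on the environment.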
Binary Installation
Download the package from the link below, decompress it, edit the configuration files, and run the shell scripts to start the services. For details about the configuration and the operation procedure, see Installing Using Source Code.
https://opengauss.obs.cn-south-1.myhuaweicloud.com/3.1.0/tools/openGauss-datachecker-performance-3.1.0.tar.gz
tar -zxvf openGauss-datachecker-performance-3.1.0.tar.gz
The decompression directory contains:
- datachecker-check-0.0.1.jar: The check service JAR file.
- datachecker-extract-0.0.1.jar: The extract service JAR file.
The config directory contains:
- application.yml: Configuration file of the check end
- application-source.yml: Configuration file of the source end
- application-sink.yml: Configuration file of the target end
- check-endpoint.sh: Script for starting the check service
- extract-endpoints.sh: Script for starting the data extract service on the source and target ends
Uninstallation
Delete the corresponding JAR package and related configuration files.
Precautions
- The JDK version must be 11+.
- The current version supports data check only between the source MySQL database and the target openGauss database.
- The current version supports only data check and does not support table object check.
- The current version does not support geographical location data check.
- MySQL 5.7 or later is required.
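The JDK requirement above can be checked from a script. The sketch below parses a Java version string into its major version, handling both the legacy "1.8.0_292" form and the newer "11.0.16" form. The function names java_major and jdk_ok are hypothetical.

```shell
# java_major VERSION: print the major version of a Java version string.
# "1.8.0_292" yields 8, "11.0.16" yields 11, "17" yields 17.
java_major() {
    case "$1" in
        1.*) v=${1#1.}; printf '%s\n' "${v%%.*}" ;;
        *)   printf '%s\n' "${1%%[.+-]*}" ;;
    esac
}

# jdk_ok VERSION: succeed if the major version is 11 or later.
jdk_ok() {
    [ "$(java_major "$1")" -ge 11 ]
}

# Example usage (java -version writes to stderr):
#   ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
#   jdk_ok "$ver" || echo "JDK 11 or later is required, found $ver"
```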