Using DataChecker to Ensure Data Accuracy After Migration
Using DataChecker to Ensure Data Accuracy After Migration 
We have introduced several tools for migrating data from Oracle or MySQL to openGauss. Now, we can use the DataChecker tool to ensure data accuracy after migration.
1 Introduction to DataChecker 
DataChecker is a tool written in Java for checking data consistency between two databases. Some of its architecture and implementation are based on Alibaba's open-source data migration tool yugong.
Code repository: https://gitee.com/opengauss/openGauss-tools-datachecker
- 1.1 Application Scenario - Generally, DataChecker is used to verify data accuracy after the data is migrated. After migrating a large amount of data from one database to another, you need to check whether the migrated data is accurate and complete. In this case, you can use DataChecker to check whether the data in the two databases is consistent. 
- 1.2 Implementation Principles - The architecture of DataChecker consists of two parts: Extractor and Applier.  - Extractor is used to extract data from the source database. Data is extracted in batches based on the sequence of the data in the source table. - Applier is used to locate the data extracted by Extractor in the target database, compare the columns one by one, and return the result. 
2 Usage Guide 
- 2.1 Environment Requirements - Operating System - DataChecker is developed based on Java with bat and shell scripts. It supports both Windows and Linux. - JDK 1.6.25 or later is recommended. - Database - The source database supports MySQL and will support Oracle in the future. - The target database supports only openGauss. 
- 2.2 Downloading DataChecker - You can download the source code and compiled package at https://gitee.com/opengauss/openGauss-tools-datachecker. - Self-compilation: - git clone git@gitee.com:opengauss/openGauss-tools-datachecker.git cd openGauss-tools-datachecker mvn clean install -Dmaven.test.skip -Denv=release- If you do not want to compile the binary package by yourself, you can obtain the complied binary package DataChecker-1.0.0-SNAPSHOT.tar.gz in the target folder in the cloned home directory. 
- 2.3 Directory Structure - The structure of the target directory is as follows: - The bin directory contains three files, namely, startup.bat, startup.sh, and stop.sh, for starting and stopping programs in Windows and Linux. - The conf directory contains two configuration files. Generally, only gauss.properties is configured. - The lib directory stores the dependency files required for running. - The logs directory stores the result logs after running. 
- 2.4 Configuration Modification - Modify the configuration in the /conf/ gauss.properties file. Generally, you only need to modify basic information, such as the addresses of the source and target databases and the tables to be verified. For other information, you can use the default values or modify it as required. 
- 2.5 Starting and Stopping the Tool - Starting the Tool in Linux - sh startup.sh - Stopping the Tool in Linux - sh stop.sh - Starting the Tool in Windows - startup.bat - Stopping the Tool in Windows - You can directly close the terminal. 
- 2.6 Log Description - The log structure is as follows: - The table.log file in the gauss directory records all logs in the entire verification process. - The summary.log file in the summary directory records the names of all tables whose verification results are incorrect. That is, the data in the two tables is inconsistent. - ${table} indicates the name of each table. In the ${table} directory, the table.log file records all logs generated during verification of a table, the extractor.log file records all logs generated during data extraction, and the applier.log file records all logs generated during verification implementation (data comparison). The check.log file records the data that fails to be verified in a specific line. If the check.log file does not exist, the verification result is correct. 
- 2.7 Example - Preparing the Database - Create a table in the mysql schema in MySQL, as shown in the following figure. - Assume that after data is migrated to openGauss, only four of the five data records are successfully migrated, as shown in the following figure.  - Configuring gauss.properties  - Running startup.bat or startup.sh - Viewing Logs - Check the /logs/summary/summary.log file and locate the mysql.test table where the error occurs. - Access /logs/mysql.test/ to view details.  - There are four log files. You can mainly view check.log.  - The record whose ID is 5 and whose name is 5 fails to be migrated.