openGauss

开源数据库

openGauss社区官网

开源社区

使用PGLOADER工具向mogdb数据库导入csv格式数据

tracy2022-04-18使用PGLOADER工具向mogdb数据库导入csv格式数据

使用 PGLOADER 工具向 mogdb 数据库导入 csv 格式数据

本文出处:https://www.modb.pro/db/81349

操作系统版本:centos7.6 数据库版本:mogdb2.0.1

一、安装 pgloader 工具

0.准备工作

下载 pgloader 安装包: pgloader-3.6.2.tar.gz

修改数据库参数:

gs_guc reload -D $PGDATA -c "password_encryption_type=1"

设置数据库白名单:

gs_guc reload -D $PGDATA -h "host all all ***.***.***.***/16 md5"

1.生成 rmp 包

a.Install the EPEL repo.

 # yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

b.Install rpmbuild dependencies:

 # yum -y install yum-utils rpmdevtools @"Development Tools"

c.Install pgloader build dependencies:

 # tar-zxvf pgloader-3.6.2.tar.gz

 # cd pgloader-3.6.2

 # yum-builddep pgloader.spec

d.Download pgloader source:

 # spectool -g -R pgloader.spec

此步骤如果下载文件失败,参照报错信息,手动创建目录,手动下载文件并改名:

# mkdir -p /root/rpmbuild/SOURCES/ # cd /root/rpmbuild/SOURCES/ # wget https://github.com/dimitri/pgloader/archive/v3.6.1.tar.gz  # mv pgloader-3.6.1.tar.gz v3.6.1.tar.gz

e.Build the source and binary RPMs (see rpmbuild --help for other build options):

 # rpmbuild -ba pgloader.spec

2.安装 pgloader RPM 包

# rpm -ivh  /root/rpmbuild/RPMS/x86_64/pgloader-3.6.1-22.el7.x86_64.rpm

3.安装完成,查看 pgloader 版本:

[root@og201 pgloader-3.6.2]# pgloader --version pgloader version "3.6.1"

二、导入 CSV 格式数据

1.准备 pgloader 配置文件和 csv 数据文件

csv.load

LOAD CSV
     FROM 'path/to/file.csv' (x, y, a, b, c, d)
     INTO postgresql://<username>:<user_password>@<database_ip>:<port>/<db_name>?csv (a, b, d, c)

     WITH truncate,
          skip header = 1,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by ','

      SET client_encoding to 'latin1',
          work_mem to '12MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ drop table if exists csv; $$,
    $$ create table csv (
        a bigint,
        b bigint,
        c char(2),
        d text
       );
  $$;

file.csv

Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States"

2.导入:

# pgloader csv.load
2021-07-09T16:50:22.013000+08:00 LOG pgloader version "3.6.1"
2021-07-09T16:50:22.026000+08:00 WARNING pgloader always talk to PostgreSQL in utf-8, client_encoding has been forced to 'utf8'.
2021-07-09T16:50:22.543000+08:00 LOG report summary reset
             table name     errors       rows      bytes      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0                     0.012s
            before load          0          2                     0.038s
-----------------------  ---------  ---------  ---------  --------------
             "pg"."csv"          0          6     0.2 kB          0.079s
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          0          1                     0.036s
COPY Threads Completion          0          2                     0.099s
-----------------------  ---------  ---------  ---------  --------------
      Total import time6     0.2 kB          0.135s

3.登录数据库查看导入数据:

pgloader=# \c - pg
Password for user pg:
Non-SSL connection (SSL connection is recommended when requiring high-security)
You are now connected to database "pgloader" as user "pg".
pgloader=> \dt
                        List of relations
 Schema | Name | Type  | Owner |             Storage
--------+------+-------+-------+----------------------------------
 pg     | csv  | table | pg    | {orientation=row,compression=no}
(1 row)

pgloader=> table csv;
    a     |    b     | c  |       d
----------+----------+----+----------------
 33996344 | 33996351 | GB | United Kingdom
 50331648 | 68257567 | US | United States
 68257568 | 68257599 | CA | Canada
 68257600 | 68259583 | US | United States
 68259584 | 68259599 | CA | Canada
 68259600 | 68296775 | US | United States
(6 rows)

pgloader=>

4.导入数据存在错误时,错误信息会记录在/tmp/pgloader/csv 目录下

# cat path/to/file.csv
Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States"
"4.18.32.71","68259600","68296775","US","United States"
# pgloader csv.load
2021-07-14T08:25:45.025000+08:00 LOG pgloader version "3.6.1"
2021-07-14T08:25:45.123000+08:00 WARNING pgloader always talk to PostgreSQL in utf-8, client_encoding has been forced to 'utf8'.
2021-07-14T08:25:45.657000+08:00 ERROR PostgreSQL ["\"public\".\"csv\""] Database error 22P02: invalid input syntax for integer: "US"
CONTEXT: COPY csv, line 7, column b: "US"
2021-07-14T08:25:45.667000+08:00 LOG report summary reset
             table name     errors       rows      bytes      total time
-----------------------  ---------  ---------  ---------  --------------
                  fetch          0          0                     0.010s
            before load          0          2                     0.052s
-----------------------  ---------  ---------  ---------  --------------
         "public"."csv"          1          6     0.2 kB          0.072s
-----------------------  ---------  ---------  ---------  --------------
        Files Processed          0          1                     0.028s
COPY Threads Completion          0          2                     0.087s
-----------------------  ---------  ---------  ---------  --------------
      Total import time          1          6     0.2 kB          0.115s
[root@centos-7-pg12 pgloader]# ls -l /tmp/pgloader/csv/
total 8
-rw-r--r-- 1 root root  30 Jul 14 08:25 csv.dat
-rw-r--r-- 1 root root 103 Jul 14 08:25 csv.log
[root@centos-7-pg12 pgloader]# cat /tmp/pgloader/csv/csv.dat
68296775        US      \N      United States

[root@centos-7-pg12 pgloader]# cat /tmp/pgloader/csv/csv.log
Database error 22P02: invalid input syntax for integer: "US"
CONTEXT: COPY csv, line 7, column b: "US"