How MogDB/openGauss Stores and Displays Rare Chinese Characters: 㼆㱔䶮𬎆(王莹)

eygle, 2022-04-12


This article was first published on Modb (墨天轮): https://www.modb.pro/db/130498

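Recently, some friends in the WeChat group of Enmotech's lecture series (云和恩墨大讲堂) discussed how to store rare Chinese characters. Whatever the database, MogDB or Oracle alike, storing rare characters comes down to one factor: the character set.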

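Whether a character can be stored correctly depends first on whether the database's current character set can represent it. If the character set includes the character, it can be stored normally.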
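The rule above can be sketched in Python. Python's codecs stand in for database character sets here. That is an assumption: a database's conversion tables are not guaranteed to match Python's codecs exactly. The helper `can_represent` is hypothetical. It tries to encode the character and reports whether that succeeds.

```python
def can_represent(ch: str, encoding: str) -> bool:
    """Return True if `ch` can be encoded in `encoding` (hypothetical helper).

    Python codecs are used as an approximation of database character sets;
    this is not the database's own conversion logic.
    """
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # 䶮, 㼆 and 𬎆, written as escapes so the source stays ASCII-safe
    for ch in ["\u4dae", "\u3f06", "\U0002c386"]:
        for enc in ["utf-8", "latin-1"]:
            print(f"U+{ord(ch):04X} in {enc}: {can_represent(ch, enc)}")
```

For example, `can_represent("\u4dae", "latin-1")` returns `False` because Latin-1 only covers U+0000 to U+00FF. The same character succeeds with `"utf-8"`.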

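Note, however, that a character that should store correctly can still lose its correct form during the write and turn into garbled text (mojibake) if something in the environment converts it incorrectly, for example when the client's declared encoding does not match the encoding of the text it actually sends.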
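Two common ways this happens can be simulated in Python. These are assumed scenarios, not a trace of what MogDB does internally. In the first, UTF-8 bytes are decoded by a layer that wrongly assumes Latin-1. In the second, a client whose encoding cannot represent the character substitutes `?`, after which the original is unrecoverable.

```python
# U+4DAE (䶮), one of the rare characters from the title.
RARE = "\u4dae"

# Scenario 1 (assumed): the bytes really are UTF-8, but some layer decodes
# them as Latin-1, so each byte becomes a separate, wrong character.
utf8_bytes = RARE.encode("utf-8")          # b'\xe4\xb6\xae'
mojibake = utf8_bytes.decode("latin-1")    # 'ä¶®' (three characters)

# If the garbled text is then stored as UTF-8, it takes 6 bytes instead of 3.
double_encoded = mojibake.encode("utf-8")

# Scenario 2 (assumed): a client whose encoding cannot represent the
# character silently substitutes '?'.
lossy = RARE.encode("ascii", errors="replace")  # b'?'

# Decoding the substitute cannot bring the original character back.
recovered = lossy.decode("ascii")

if __name__ == "__main__":
    print(utf8_bytes.hex(","))        # e4,b6,ae
    print(ascii(mojibake))
    print(double_encoded.hex(","))    # c3,a4,c2,b6,c2,ae
    print(lossy, recovered == RARE)   # b'?' False
```

The first case can sometimes be repaired by reversing the wrong conversion. The second cannot, because the information was discarded before the data reached the database.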

Pronunciations of the rare characters in the title: 㼆 yíng, 㱔 suǒ, 䶮 yǎn

One of them is special: 𬎆 (王莹). If your client has no font that supports it, you may not be able to see this character at all.
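This sketch shows what makes 𬎆 special. It is U+2C386, in plane 2 (CJK Unified Ideographs Extension E), outside the Basic Multilingual Plane. 䶮 (U+4DAE) and 㼆 (U+3F06) are in Extension A inside the BMP. Fonts covering Extension E are plausibly less common, which would explain the missing glyph. The helper `describe` and the `SAMPLES` names are hypothetical.

```python
def describe(ch: str) -> dict:
    """Report where a character sits in Unicode (hypothetical helper)."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "plane": cp >> 16,  # 0 = Basic Multilingual Plane (BMP)
        "utf8_bytes": len(ch.encode("utf-8")),
        # UTF-16 needs a surrogate pair (2 code units) outside the BMP
        "utf16_units": len(ch.encode("utf-16-be")) // 2,
    }


SAMPLES = {
    "yan": "\u4dae",             # 䶮
    "ying": "\u3f06",            # 㼆
    "wang_ying": "\U0002c386",   # 𬎆 (王莹)
}

if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(name, describe(ch))
```

Expected results: `wang_ying` reports plane 2, 4 UTF-8 bytes and 2 UTF-16 code units. The other two report plane 0, 3 bytes and 1 code unit.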

Let's look at how MogDB behaves. We used the Modb (墨天轮) hands-on lab platform, whose character set settings are as follows:

enmotech=# select * from v$nls_parameters;
      parameter       |           value            |                                 description
----------------------+----------------------------+------------------------------------------------------------------------------
 lc_collate           | en_US.UTF-8                | Shows the collation order locale.
 lc_ctype             | en_US.UTF-8                | Shows the character classification and case conversion locale.
 lc_messages          | en_US.UTF-8                | Sets the language in which messages are displayed.
 lc_monetary          | en_US.UTF-8                | Sets the locale for formatting monetary amounts.
 lc_numeric           | en_US.UTF-8                | Sets the locale for formatting numbers.
 lc_time              | en_US.UTF-8                | Sets the locale for formatting date and time values.
 nls_timestamp_format | DD-Mon-YYYY HH:MI:SS.FF AM | defines the default timestamp format to use with the TO_TIMESTAMP functions.
 NLS_CHARACTERSET     | UTF8                       | Database/Server encoding
(8 rows)
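The row that matters here is `NLS_CHARACTERSET | UTF8`. UTF-8 can encode every Unicode code point up to U+10FFFF, using 1 to 4 bytes depending on the code point's range (RFC 3629). The hypothetical helper `utf8_length` below sketches that range rule and cross-checks it against Python's encoder.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses for code point `cp` (RFC 3629 ranges)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable in UTF-8")
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # rest of the BMP, including CJK Extension A
    return 4      # supplementary planes, e.g. CJK Extension E


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u4dae", "\U0002c386"]:
        assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
        print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s)")
```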

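A demonstration with rare characters:

If the code is pasted as plain text, clients that lack font support may not display some of the characters: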

enmotech=# create table mogdb (cname varchar2(10));
CREATE TABLE
enmotech=#
enmotech=# insert into mogdb values('䶮');
INSERT 0 1
enmotech=# insert into mogdb values('㼆');
INSERT 0 1
enmotech=# insert into mogdb values('𬎆');
INSERT 0 1
enmotech=# select cname,dump(cname) from mogdb;
 cname |       dump
-------+-------------------
 䶮   | Len=3 e4,b6,ae
 㼆   | Len=3 e3,bc,86
 𬎆  | Len=4 f0,ac,8e,86
(3 rows)

enmotech=#
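The `dump` output confirms the point. 䶮 and 㼆 (Extension A, inside the BMP) take 3 bytes each in UTF-8, while 𬎆 takes 4. The Python sketch below reproduces the `Len=N xx,yy,...` format seen in the transcript. `dump_like` is a hypothetical helper that mirrors the output format only, not MogDB's `dump` implementation. It assumes the UTF-8 database encoding shown earlier.

```python
def dump_like(value: str, encoding: str = "utf-8") -> str:
    """Format encoded bytes like the dump() output above (hypothetical helper)."""
    data = value.encode(encoding)
    return f"Len={len(data)} " + ",".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    for ch in ["\u4dae", "\u3f06", "\U0002c386"]:  # 䶮, 㼆, 𬎆
        print(f"U+{ord(ch):04X}", dump_like(ch))
```

The `encoding` parameter shows how the byte sequence would differ under another character set. That comparison only holds to the extent that Python's codec matches the database's conversion tables.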