Routine Maintenance
- Routine Maintenance Check Items
- Checking OS Parameters
- Checking openGauss Health Status
- Checking Database Performance
- Checking and Deleting Logs
- Checking Time Consistency
- Checking the Number of Application Connections
- Routinely Maintaining Tables
- Routinely Recreating an Index
- Data Security Maintenance Suggestions
To ensure data security in GaussDB Kernel and prevent accidents, such as data loss and illegal data access, read this section carefully.
Routine Maintenance Check Items
Checking openGauss Status
openGauss provides tools to check database and instance status, ensuring that databases and instances are running properly to provide data services.
Check instance status.
gs_check -U omm -i CheckClusterState
Check parameters.
postgres=# SHOW parameter_name;
Modify parameters.
gs_guc reload -D /gaussdb/data/dbnode -c "paraname=value"
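For example, to check and then adjust a specific parameter, you can combine the two commands above. work_mem is used here only as an illustration; substitute the parameter and value that your situation calls for:
postgres=# SHOW work_mem;
gs_guc reload -D /gaussdb/data/dbnode -c "work_mem=64MB"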
Checking Lock Information
The lock mechanism is an important method of ensuring data consistency. Checking lock information helps you learn about database transactions and the database running status.
Query lock information in the database.
postgres=# SELECT * FROM pg_locks;
Query the status of threads waiting to acquire locks.
postgres=# SELECT * FROM pg_thread_wait_status WHERE wait_status = 'acquire lock';
Query the status of events waiting to acquire locks.
postgres=# SELECT node_name, thread_name, tid, wait_status, query_id FROM pgxc_thread_wait_status WHERE wait_status = 'acquire lock';
Kill a system process.
Search for a system process that is running and run the following command to end the process:
ps ux
kill -9 pid
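For example, to narrow the process list to a hypothetical stuck gaussdb process before ending it (the process name and PID placeholder below are illustrative only):
ps ux | grep gaussdb        # note the PID in the second column
kill -9 <pid>               # replace <pid> with the PID found above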
Collecting Event Statistics
SQL statements that run for a long time occupy a large amount of system resources. You can check when events occur and how much memory they use to learn about the database running status.
Query the time points of an event.
Run the following command to query the thread start time, transaction start time, SQL start time, and status change time of the event:
postgres=# SELECT backend_start,xact_start,query_start,state_change FROM pg_stat_activity;
Query the number of sessions on the current server.
postgres=# SELECT count(*) FROM pg_stat_activity;
Collect system-level statistics.
Run the following command to query information about the session that uses the maximum memory:
postgres=# SELECT * FROM pv_session_memory_detail() ORDER BY usedsize desc limit 10;
Checking Objects
Tables, indexes, partitions, and constraints are key storage objects of a database. A database administrator needs to routinely maintain key information and these objects.
View table details.
postgres=# \d+ table_name
Query table statistics.
postgres=# SELECT * FROM pg_statistic;
View index details.
postgres=# \d+ index_name
Query partitioned table information.
postgres=# SELECT * FROM pg_partition;
Collect statistics.
Run the ANALYZE statement to collect related statistics on the database.
Run the VACUUM statement to reclaim space and update statistics.
Query constraint information.
postgres=# SELECT * FROM pg_constraint;
Checking an SQL Report
Run the EXPLAIN statement to view execution plans.
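A minimal example, using the customer table that appears later in "Routinely Maintaining Tables" (any existing table works). EXPLAIN ANALYZE additionally executes the statement and reports actual run times:
postgres=# EXPLAIN SELECT * FROM customer;
postgres=# EXPLAIN ANALYZE SELECT * FROM customer;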
Backing Up Data
Never forget to back up data. During routine work, check that backups are executed and that the backup data is valid to ensure data security and encryption security.
Export a specified user.
gs_dump dbname -p port -f out.sql -U user_name -W password
Export a schema.
gs_dump dbname -p port -n schema_name -f out.sql
Export a table.
gs_dump dbname -p port -t table_name -f out.sql
Checking Basic Information
Basic information includes versions, components, and patches. Periodic database information checks and records are important for database life cycle management.
Check version information.
postgres=# SELECT version();
Check table size and database size.
postgres=# SELECT pg_table_size('table_name');
postgres=# SELECT pg_database_size('database_name');
Checking OS Parameters
Check Method
Use the gs_checkos tool provided by openGauss to check the OS status.
Prerequisites
- The hardware and network are working properly.
- The trust relationship of user root among the hosts is normal.
- Only user root is authorized to run the gs_checkos command.
Procedure
Log in to a server as user root.
Run the following command to check OS parameters of servers where the openGauss nodes are deployed:
gs_checkos -i A
Check the OS parameters to ensure that openGauss has passed the pre-installation check and can efficiently operate after it is installed. For details about the check items, see “Server Tools > gs_checkos” in the openGauss Tool Reference.
Examples
Before running the gs_checkos command, execute pre-processing scripts by running gs_preinstall to prepare the environment. The following uses parameter A as an example:
gs_checkos -i A
Checking items:
A1. [ OS version status ] : Normal
A2. [ Kernel version status ] : Normal
A3. [ Unicode status ] : Normal
A4. [ Time zone status ] : Normal
A5. [ Swap memory status ] : Normal
A6. [ System control parameters status ] : Normal
A7. [ File system configuration status ] : Normal
A8. [ Disk configuration status ] : Normal
A9. [ Pre-read block size status ] : Normal
A10.[ IO scheduler status ] : Normal
A11.[ Network card configuration status ] : Normal
A12.[ Time consistency status ] : Warning
A13.[ Firewall service status ] : Normal
A14.[ THP service status ] : Normal
Total numbers:14. Abnormal numbers:0. Warning number:1.
The following uses parameter B as an example:
gs_checkos -i B
Setting items:
B1. [ Set system control parameters ] : Normal
B2. [ Set file system configuration value ] : Normal
B3. [ Set pre-read block size value ] : Normal
B4. [ Set IO scheduler value ] : Normal
B5. [ Set network card configuration value ] : Normal
B6. [ Set THP service ] : Normal
Total numbers:6. Abnormal numbers:0. Warning number:0.
Exception Handling
If you use the gs_checkos tool to check the OS and the command output shows Abnormal, run the following command to view detailed error information:
gs_checkos -i A --detail
The Abnormal state cannot be ignored because an OS in this state affects cluster installation. The Warning state does not affect cluster installation and can therefore be ignored.
If the check result for OS version status (A1) is Abnormal, replace the OSs that fall outside the mixed programming scope with OSs within the scope.
If the check result for kernel version status (A2) is Warning, the platform kernel versions in the cluster are inconsistent.
If the check result for Unicode status (A3) is Abnormal, set the same character set for all the hosts. You can add export LANG=unicode to the /etc/profile file.
vim /etc/profile
If the check result for time zone status (A4) is Abnormal, set the same time zone for all the hosts. You can copy the time zone file in the /usr/share/zoneinfo/ directory as the /etc/localtime file.
cp /usr/share/zoneinfo/$primary_time_zone/$secondary_time_zone /etc/localtime
If the check result for swap memory status (A5) is Abnormal, a possible cause is that the swap memory is larger than the physical memory. You can troubleshoot this issue by reducing the swap memory or increasing the physical memory.
If the check result for system control parameter status (A6) is Abnormal, troubleshoot this issue in either of the following two ways:
Run the following command:
gs_checkos -i B1
Modify the /etc/sysctl.conf file based on the error message and run sysctl -p to make it take effect.
vim /etc/sysctl.conf
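For example, if the error message reports that net.ipv4.tcp_retries1 does not match the expected value, add the expected setting to /etc/sysctl.conf and reload it. The parameter name and value below are illustrative only; follow the actual error message:
net.ipv4.tcp_retries1 = 5
sysctl -p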
If the check result for file system configuration status (A7) is Abnormal, run the following command to troubleshoot this issue:
gs_checkos -i B2
If the check result for disk configuration status (A8) is Abnormal, set the disk mounting format to rw,noatime,inode64,allocsize=16m.
Mount the disk with the following XFS options (run the man mount command for details):
rw,noatime,inode64,allocsize=16m
You can also set the XFS parameter in the /etc/fstab file. For example:
/dev/data /data xfs rw,noatime,inode64,allocsize=16m 0 0
If the check result for pre-read block size status (A9) is Abnormal, run the following command to troubleshoot this issue:
gs_checkos -i B3
If the check result for I/O scheduling status (A10) is Abnormal, run the following command to troubleshoot this issue:
gs_checkos -i B4
If the check result for NIC configuration status (A11) is Warning, run the following command to troubleshoot this issue:
gs_checkos -i B5
If the check result for time consistency status (A12) is Abnormal, verify that the NTP service has been installed and started and has synchronized time from the NTP clock.
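For example, on a host that uses ntpd, you can verify the service status and its synchronization peers as follows (the commands are illustrative and vary with the OS; chrony-based systems use chronyc sources instead):
systemctl status ntpd
ntpq -p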
If the check result for firewall status (A13) is Abnormal, disable the firewall. Run the following commands:
SUSE:
SuSEfirewall2 stop
RedHat7:
systemctl disable firewalld
RedHat6:
service iptables stop
If the check result for THP service status (A14) is Abnormal, run the following command to troubleshoot this issue:
gs_checkos -i B6
Checking openGauss Health Status
Check Method
Use the gs_check tool provided by openGauss to check the openGauss health status.
Precautions
- Only user root is authorized to check new nodes added during cluster scale-out. In other cases, the check can be performed only by user omm.
- Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
- If -i is not set to a root item or no such items are contained in the check item list of the scenario specified by -e, you do not need to enter the name or password of user root.
- You can run --skip-root-items to skip root items.
- Check the consistency between the new node and existing nodes. Run the gs_check command on an existing node and specify the --hosts parameter. The IP address of the new node needs to be written into the hosts file.
Procedure
Method 1:
Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database status:
gs_check -i CheckClusterState
In the command, -i indicates the check item and is case-sensitive. The format is -i CheckClusterState, -i CheckCPU or -i CheckClusterState,CheckCPU.
Checkable items are listed in “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference. You can create a check item as needed.
Method 2:
Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database health status:
gs_check -e inspect
In the command, -e indicates the inspection scenario and is case-sensitive. The format is -e inspect or -e upgrade.
The inspection scenarios include inspect (routine inspection), upgrade (inspection before upgrade), expand (inspection before cluster scale-out), binary_upgrade (inspection before in-place upgrade), and health (health inspection). You can create an inspection scenario as needed.
Method 3:
Log in as the OS user omm to the primary node of the database.
Copy the inspection tool gs_check and the inspection directory to all new hosts.
Write the IP addresses of the new hosts into the ipListFile file and separate the IP addresses with line feeds.
Run the following command to check the new nodes before cluster scale-out:
gs_check -e expand_new_node --hosts ipListFile
-e must be set to expand_new_node, indicating inspection before cluster scale-out.
The openGauss inspection is performed to check openGauss status during openGauss running or to check the environment and conditions before critical operations, such as upgrade or scale-out. For details about the inspection items and scenarios, see “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference.
Examples
Check result of a single item:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3
Checking... [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes. success: 3
Analysis the check result successfully
Success. All check items run completed. Total:1 Success:1 Failed:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz
Local execution result:
perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L
2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK
2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749) 12/29/17 _x86_64_
17:09:24 CPU %user %nice %system %iowait %steal %idle
17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
17:09:26 all 0.25 0.00 0.13 0.00 0.00 99.62
17:09:27 all 0.25 0.00 0.25 0.13 0.00 99.37
17:09:28 all 0.38 0.00 0.25 0.00 0.13 99.25
17:09:29 all 1.00 0.00 0.88 0.00 0.00 98.12
Average: all 0.43 0.00 0.35 0.03 0.03 99.17
Check result of a scenario:
[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2
Checking... [ ] 21/59
Checking... [=========================] 59/59
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes. success: 2
CheckDBParams...............................OK
The item run on 1 nodes. success: 1
CheckDebugSwitch............................OK
The item run on 2 nodes. success: 2
CheckDirPermissions.........................OK
The item run on 2 nodes. success: 2
CheckReadonlyMode...........................OK
The item run on 1 nodes. success: 1
CheckEnvProfile.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
GAUSSHOME /usr1/gaussdb/app
LD_LIBRARY_PATH /usr1/gaussdb/app/lib
PATH /usr1/gaussdb/app/bin
CheckBlockdev...............................OK
The item run on 2 nodes. success: 2
CheckCurConnCount...........................OK
The item run on 1 nodes. success: 1
CheckCursorNum..............................OK
The item run on 1 nodes. success: 1
CheckPgxcgroup..............................OK
The item run on 1 nodes. success: 1
CheckDiskFormat.............................OK
The item run on 2 nodes. success: 2
CheckSpaceUsage.............................OK
The item run on 2 nodes. success: 2
CheckInodeUsage.............................OK
The item run on 2 nodes. success: 2
CheckSwapMemory.............................OK
The item run on 2 nodes. success: 2
CheckLogicalBlock...........................OK
The item run on 2 nodes. success: 2
CheckIOrequestqueue.....................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'
CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes. success: 2
CheckIOConfigure............................OK
The item run on 2 nodes. success: 2
CheckMTU....................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
1500
CheckPing...................................OK
The item run on 2 nodes. success: 2
CheckRXTX...................................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256
CheckNetWorkDrop............................OK
The item run on 2 nodes. success: 2
CheckMultiQueue.............................OK
The item run on 2 nodes. success: 2
CheckEncoding...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
LANG=en_US.UTF-8
CheckFirewall...............................OK
The item run on 2 nodes. success: 2
CheckKernelVer..............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64
CheckMaxHandle..............................OK
The item run on 2 nodes. success: 2
CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06
CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.
CheckSysParams..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.
CheckTHP....................................OK
The item run on 2 nodes. success: 2
CheckTimeZone...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
+0800
CheckCPU....................................OK
The item run on 2 nodes. success: 2
CheckSshdService............................OK
The item run on 2 nodes. success: 2
CheckSshdConfig.........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
Warning reason: UseDNS parameter is not set; expected: no
CheckCrondService...........................OK
The item run on 2 nodes. success: 2
CheckStack..................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
8192
CheckNoCheckSum.............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
Nochecksum value is N,Check items pass.
CheckSysPortRange...........................OK
The item run on 2 nodes. success: 2
CheckMemInfo................................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G
CheckHyperThread............................OK
The item run on 2 nodes. success: 2
CheckTableSpace.............................OK
The item run on 1 nodes. success: 1
CheckSctpService............................OK
The item run on 2 nodes. success: 2
CheckSysadminUser...........................OK
The item run on 1 nodes. success: 1
CheckGUCConsistent..........................OK
All DN instance guc value is consistent.
CheckMaxProcMemory..........................OK
The item run on 1 nodes. success: 1
CheckBootItems..............................OK
The item run on 2 nodes. success: 2
CheckHashIndex..............................OK
The item run on 1 nodes. success: 1
CheckPgxcRedistb............................OK
The item run on 1 nodes. success: 1
CheckNodeGroupName..........................OK
The item run on 1 nodes. success: 1
CheckTDDate.................................OK
The item run on 1 nodes. success: 1
CheckDilateSysTab...........................OK
The item run on 1 nodes. success: 1
CheckKeyProAdj..............................OK
The item run on 2 nodes. success: 2
CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun 2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/gaussdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun 1 16:15:15 2020 /usr1/gaussdb/app/bin/gaussdb -D /usr1/gaussdb/data/dn1 -M standby
CheckFilehandle.............................OK
The item run on 2 nodes. success: 2
CheckRouting................................OK
The item run on 2 nodes. success: 2
CheckNICModel...............................OK
The item run on 2 nodes. success: 2 (consistent)
The success on all nodes value:
version: 1.0.0
model: Red Hat, Inc. Virtio network device
CheckDropCache..........................WARNING
The item run on 2 nodes. warning: 2
The warning[host240,host157] value:
No DropCache process is running
CheckMpprcFile..............................NG
The item run on 2 nodes. ng: 2
The ng[host240,host157] value:
There is no mpprc file
Analysis the check result successfully
Failed. All check items run completed. Total:59 Success:52 Warning:5 NG:2
For more information please refer to /usr1/gaussdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz
Exception Handling
Troubleshoot exceptions detected in the inspection by following instructions in this section.
Table 1 Check of openGauss running status
Checking Database Performance
Check Method
Use the gs_checkperf tool provided by openGauss to check hardware performance.
Prerequisites
- openGauss is running properly.
- Services are running properly on the database.
Procedure
Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database performance:
gs_checkperf
For details about performance statistical items, see “Server Tools > gs_checkperf > Performance Check Items” in the openGauss Tool Reference.
Examples
A simple performance statistics result is displayed on the screen as follows:
gs_checkperf -i pmk -U omm
Cluster statistics information:
Host CPU busy time ratio : 1.43 %
MPPDB CPU time % in busy time : 1.88 %
Shared Buffer Hit ratio : 99.96 %
In-memory sort ratio : 100.00 %
Physical Reads : 4
Physical Writes : 25
DB size : 70 MB
Total Physical writes : 25
Active SQL count : 2
Session count : 3
Exception Handling
After you use the gs_checkperf tool to check the cluster performance, if the performance is abnormal, troubleshoot the issue by following instructions in this section.
Table 1 Cluster-level performance status
Table 2 Node-level performance status
Table 3 Session/process-level performance status
Table 4 SSD performance status
Run the following command to check whether the SSD is faulty. If it is, analyze the specific cause.
gs_checkperf -i SSD -U omm
Checking and Deleting Logs
You are advised to check OS logs and database run logs monthly to monitor system status and troubleshoot problems, and to delete database run logs monthly to save disk space.
Checking OS Logs
You are advised to check OS logs monthly to detect and prevent potential OS problems.
Procedure
Run the following command to check OS log files:
vim /var/log/messages
(Pay attention to words like kernel, error, and fatal in logs generated within the last month and handle the problems based on the alarm information.)
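For example, instead of browsing the whole file, you can filter it for the keywords mentioned above (a simple sketch; adjust the keyword list as needed):
grep -inE "kernel|error|fatal" /var/log/messages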
Checking openGauss Run Logs
A database can still run when errors occur during the execution of some operations. However, data may be inconsistent before and after such errors occur. Therefore, you are advised to check openGauss run logs monthly to detect potential problems in time.
Prerequisites
- The host used for collecting logs is running properly, and the network connection is normal. Database installation users trust each other.
- An OS tool (for example, gstack) that the log collection tool requires has been installed. If it is not installed, an error message is displayed, and this collection item is skipped.
Procedure
Log in as the OS user omm to the primary node of the database.
Run the following command to collect database logs:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59"
In the command, 20160616 01:01 indicates the start time of the log and 20160616 23:59 indicates the end time of the log.
Based on the command output in the previous step, access the related log collection directory, decompress the collected database logs, and check these logs.
Assume that collected logs are stored in /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158.tar.gz.
tar -xvzf /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158.tar.gz
cd /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158
Examples
Run the gs_collector command together with parameters --begin-time and --end-time:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59"
If information similar to the following is displayed, the logs have been archived:
Successfully collected files All results are stored in /tmp/gaussdba_mppdb/collector_20160616_175615.tar.gz.
Run the gs_collector command together with parameters --begin-time, --end-time, and -h:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59" -h plat2
If information similar to the following is displayed, the logs have been archived:
Successfully collected files All results are stored in /tmp/gaussdba_mppdb/collector_20160616_190225.tar.gz.
Run the gs_collector command together with parameters --begin-time, --end-time, and -f:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59" -f /opt/software/gaussdb/output
If information similar to the following is displayed, the logs have been archived:
Successfully collected files All results are stored in /opt/software/gaussdb/output/collector_20160616_190511.tar.gz.
Run the gs_collector command together with parameters --begin-time, --end-time, and --keyword:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59" --keyword="os"
If information similar to the following is displayed, the logs have been archived:
Successfully collected files. All results are stored in /tmp/gaussdba_mppdb/collector_20160616_190836.tar.gz.
Run the gs_collector command together with parameters --begin-time, --end-time, and -o:
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59" -o /opt/software/gaussdb/output
If information similar to the following is displayed, the logs have been archived:
Successfully collected files. All results are stored in /opt/software/gaussdb/output/collector_20160726_113711.tar.gz.
Run the gs_collector command together with parameters --begin-time, --end-time, and -l (the file name extension must be .log):
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59" -l /opt/software/gaussdb/logfile.log
If information similar to the following is displayed, the logs have been archived:
Successfully collected files. All results are stored in /opt/software/gaussdb/output/collector_20160726_113711.tar.gz.
Cleaning Run Logs
A large number of run logs will be generated during database running and occupy huge disk space. You are advised to delete expired run logs and retain logs generated within one month.
Procedure
- Log in as the OS user omm to any host in the GaussDB Kernel cluster.
- Clean logs.
Back up logs generated over one month ago to other disks.
Access the directory where logs are stored.
cd $GAUSSLOG
Access the corresponding sub-directory and run the following command to delete logs generated one month ago:
rm log_file_name
The naming convention of a log file is postgresql-year-month-day_HHMMSS.
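For example, to list and then delete run logs older than 30 days (a sketch only; confirm the file list and finish the backup before deleting anything):
cd $GAUSSLOG
find . -name "postgresql-*" -mtime +30
find . -name "postgresql-*" -mtime +30 -exec rm {} \;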
Checking Time Consistency
Database transaction consistency is guaranteed by a logical clock and is not affected by OS time. However, OS time inconsistency will lead to problems, such as abnormal backend O&M and monitoring functions. Therefore, you are advised to monthly check time consistency among nodes.
Procedure
Log in as the OS user omm to any host in the GaussDB Kernel cluster.
Create a configuration file that records each cluster node. (You can store the mpphosts file in any directory; the /tmp directory is recommended.)
vim /tmp/mpphosts
Add the host name of each node.
plat1
plat2
plat3
Save the configuration file.
:wq!
Run the following command and write the time on each node into the /tmp/sys_ctl-os1.log file:
for ihost in `cat /tmp/mpphosts`; do ssh -n -q $ihost "hostname;date"; done > /tmp/sys_ctl-os1.log
Check time consistency between the nodes based on the command output. The time difference should not exceed 30s.
cat /tmp/sys_ctl-os1.log
plat1
Thu Feb 9 16:46:38 CST 2017
plat2
Thu Feb 9 16:46:49 CST 2017
plat3
Thu Feb 9 16:46:14 CST 2017
Checking the Number of Application Connections
If the number of connections between applications and the database exceeds the maximum value, new connections cannot be established. You are advised to daily check the number of connections, release idle connections in time, or increase the allowed maximum number of connections.
Procedure
Log in as the OS user omm to the primary node of the database.
Run the following command to connect to the database:
gsql -d postgres -p 8000
postgres is the name of the database to be connected, and 8000 is the port number of the database primary node.
If information similar to the following is displayed, the connection succeeds:
gsql ((openGauss 1.0 build 290d125f) compiled at 2020-05-08 02:59:43 commit 2143 last mr 131)
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=#
Run the following SQL statement to check the number of connections:
postgres=# SELECT count(*) FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
Information similar to the following is displayed. 2 indicates that two applications are connected to the database.
 count
-------
     2
(1 row)
View the allowed maximum connections.
postgres=# SHOW max_connections;
Information similar to the following is displayed. 200 indicates the currently allowed maximum number of connections.
 max_connections
-----------------
 200
(1 row)
Exception Handling
If the number of connections in the command output is close to the value of max_connections of the database, delete existing connections or change the upper limit based on site requirements.
Run the following SQL statement to view information about connections whose state is idle and whose state_change column has not been updated for a long time:
postgres=# SELECT * FROM pg_stat_activity where state='idle' order by state_change;
Information similar to the following is displayed:
 datid | datname  |       pid       | usesysid | usename  | application_name |  client_addr   | client_hostname | client_port |         backend_start         | xact_start |          query_start          |         state_change          | waiting | enqueue | state | resource_pool |                    query
-------+----------+-----------------+----------+----------+------------------+----------------+-----------------+-------------+-------------------------------+------------+-------------------------------+-------------------------------+---------+---------+-------+---------------+----------------------------------------------
 13626 | postgres | 140390162233104 |       10 | gaussdba |                  |                |                 |          -1 | 2016-07-15 14:08:59.474118+08 |            | 2016-07-15 14:09:04.496769+08 | 2016-07-15 14:09:04.496975+08 | f       |         | idle  | default_pool  | select count(group_name) from pgxc_group;
 13626 | postgres | 140390132872976 |       10 | gaussdba | cn_5002          | 10.180.123.163 |                 |       48614 | 2016-07-15 14:11:16.014871+08 |            | 2016-07-15 14:21:17.346045+08 | 2016-07-15 14:21:17.346095+08 | f       |         | idle  | default_pool  | SET SESSION AUTHORIZATION DEFAULT;RESET ALL;
(2 rows)
Release idle connections.
Check each connection and release them after obtaining approval from the users of the connections. Run the following SQL command to release a connection using pid obtained in the previous step:
postgres=# SELECT pg_terminate_backend(140390132872976);
Information similar to the following is displayed:
postgres=# SELECT pg_terminate_backend(140390132872976);
 pg_terminate_backend
----------------------
 t
(1 row)
If no connections can be released, go to the next step.
Increase the maximum number of connections.
gs_guc set -D /gaussdb/data/dbnode -c "max_connections= 800"
800 is the new maximum value.
Restart database services to make the new settings take effect.
NOTE: The restart results in operation interruption. Properly plan the restart to avoid affecting users.
gs_om -t stop && gs_om -t start
Routinely Maintaining Tables
To ensure proper database running, you need to routinely run VACUUM FULL and ANALYZE after insert and delete operations, as appropriate for the customer scenario, to reclaim space and update statistics and thereby obtain better performance.
Related Concepts
You need to routinely run VACUUM, VACUUM FULL, and ANALYZE to maintain tables, because:
- VACUUM FULL can be used to reclaim disk space occupied by updated or deleted data and combine small-size data files.
- VACUUM can be used to maintain a visibility map for each table that tracks pages containing only tuples visible to all active transactions. An ordinary index scan uses the map to check whether the corresponding tuples are visible to the current transaction; if visibility cannot be determined from the map, the heap tuples are fetched and checked. Therefore, keeping the visibility map of a table up to date also accelerates index-only scans.
- Running VACUUM also prevents loss of old data caused by transaction ID wraparound when the number of executed transactions exceeds the database threshold.
- ANALYZE can be used to collect statistics on tables in databases. The statistics are stored in the system catalog PG_STATISTIC. Then the query optimizer uses the statistics to work out the most efficient execution plan.
Procedure
Run the VACUUM or VACUUM FULL command to reclaim disk space.
VACUUM:
Run VACUUM for a table.
postgres=# VACUUM customer;
VACUUM
This statement can be executed concurrently with ordinary database operations, including SELECT, INSERT, UPDATE, and DELETE, but not with statements such as ALTER TABLE.
Run VACUUM for the table partition.
postgres=# VACUUM customer_par PARTITION ( P1 );
VACUUM
VACUUM FULL:
postgres=# VACUUM FULL customer;
VACUUM
While this command is running, an exclusive lock is held on the table and all other operations on it are suspended.
Run ANALYZE to update statistics.
postgres=# ANALYZE customer;
ANALYZE
Run ANALYZE VERBOSE to update statistics and display table information.
postgres=# ANALYZE VERBOSE customer;
ANALYZE
You can run VACUUM ANALYZE at the same time to optimize the query.
postgres=# VACUUM ANALYZE customer;
VACUUM
NOTE: VACUUM and ANALYZE cause a substantial increase in I/O traffic, which may affect other active sessions. Therefore, you are advised to set the cost-based vacuum delay feature by specifying the vacuum_cost_delay parameter. For details, see “GUC Parameters > Resource Consumption > Cost-based Vacuum Delay” in the Developer Guide.
Delete a table.
postgres=# DROP TABLE customer;
postgres=# DROP TABLE customer_par;
postgres=# DROP TABLE part;
If the following information is displayed, the tables have been deleted:
DROP TABLE
Maintenance Suggestions
- Routinely run VACUUM FULL for large tables. If the database performance deteriorates, run VACUUM FULL for the entire database. If the database performance is stable, you are advised to run VACUUM FULL monthly.
- Routinely run VACUUM FULL on system catalogs, especially PG_ATTRIBUTE.
- Enable automatic vacuum processes (AUTOVACUUM) in the system. The processes automatically run the VACUUM and ANALYZE statements to reclaim the space occupied by records marked as deleted and to update statistics in the table.
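For example, to check whether automatic vacuum is enabled and turn it on if it is not (a sketch assuming the autovacuum GUC parameter and the data directory path used earlier in this document):
postgres=# SHOW autovacuum;
gs_guc reload -D /gaussdb/data/dbnode -c "autovacuum=on"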
Routinely Recreating an Index
Background
When data deletion is repeatedly performed in the database, index keys will be deleted from the index pages, resulting in index bloat. Recreating an index routinely improves query efficiency.
The database supports B-tree indexes. Recreating a B-tree index routinely helps improve query efficiency.
- If a large amount of data is deleted, index keys on the index pages are removed, but the space is not fully reclaimed, resulting in index bloat. Recreating an index helps reclaim the wasted space.
- In a newly created index, pages with adjacent logical structures tend to have adjacent physical structures. Therefore, a new index achieves a higher access speed than an index that has been updated for multiple times.
Methods
Use either of the following two methods to recreate an index:
Run the DROP INDEX statement to delete the index and run the CREATE INDEX statement to create an index.
When you delete an index, a temporary exclusive lock is added on the parent table, blocking related read and write operations. During index creation, write operations are blocked, whereas read operations are not; however, they can use only sequential scans.
Run REINDEX to recreate an index.
- When you run the REINDEX TABLE statement to recreate an index, an exclusive lock is added to block related read/write operations.
- When you run the REINDEX INTERNAL TABLE statement to recreate an index for a desc table (such as column-store cudesc table), an exclusive lock is added to block related read/write operations on the table.
Procedure
Assume the ordinary index areaS_idx exists in the area_id column of the imported table areaS. Use either of the following two methods to recreate an index:
Run the DROP INDEX statement to delete the index and run the CREATE INDEX statement to create an index.
Delete the index.
postgres=# DROP INDEX areaS_idx;
DROP INDEX
Create the index.
postgres=# CREATE INDEX areaS_idx ON areaS (area_id);
CREATE INDEX
Run REINDEX to recreate an index.
Run REINDEX TABLE to recreate an index.
postgres=# REINDEX TABLE areaS;
REINDEX
Run REINDEX INTERNAL TABLE to recreate an index for a desc table (such as column-store cudesc table).
postgres=# REINDEX INTERNAL TABLE areaS;
REINDEX
NOTE: Before you recreate an index, you can increase the values of maintenance_work_mem and psort_work_mem to accelerate the index recreation.
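For example, to raise the two parameters for the current session only before rebuilding the index (the values are illustrative; size them according to the available memory):
postgres=# SET maintenance_work_mem = '1GB';
postgres=# SET psort_work_mem = '512MB';
postgres=# REINDEX TABLE areaS;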
Data Security Maintenance Suggestions
To ensure data security in GaussDB Kernel and prevent accidents, such as data loss and illegal data access, read this section carefully.
Preventing Data Loss
You are advised to plan routine physical backup and store backup files in a reliable medium. If a serious error occurs in the system, you can use the backup files to restore the system to the state at the backup point.
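A minimal sketch of such a plan, assuming the gs_basebackup tool, the port used elsewhere in this document, and a backup directory on a separate, reliable disk (adjust the tool, port, and destination to your environment):
gs_basebackup -D /backup/gaussdb/$(date +%Y%m%d) -p 8000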
Preventing Illegal Data Access
- You are advised to manage database users based on their permission hierarchies. A database administrator creates users and grants permissions to the users based on service requirements to ensure users properly access the database.
- You are advised to deploy GaussDB Kernel servers and clients (or applications developed based on the client library) in trusted internal networks. If the servers and clients must be deployed in an untrusted network, enable SSL encryption before services are started to ensure data transmission security. Note that enabling the SSL encryption function compromises database performance.
Preventing System Logs from Leaking Personal Data
Delete personal data before sending debug logs to others for analysis.
NOTE: When the log level log_min_messages is set to DEBUGx (where x indicates the debug level and ranges from 1 to 5), the information recorded in debug logs may contain personal data.
Delete personal data before sending system logs to others for analysis. If the execution of an SQL statement fails, the failed SQL statement is recorded in a system log by default, and SQL statements may contain personal data.
Set log_min_error_statement to PANIC to prevent error SQL statements from being recorded in system logs. However, once this function is disabled, it is difficult to locate the causes of faults when they occur.
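For example, a sketch that reuses the gs_guc syntax shown earlier to apply this setting (keep the trade-off above in mind before enabling it):
gs_guc reload -D /gaussdb/data/dbnode -c "log_min_error_statement=panic"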