Routine Maintenance

Routine Maintenance Check Items
Checking OS Parameters
Checking openGauss Health Status
Checking Database Performance
Checking and Deleting Logs
Checking Time Consistency
Checking the Number of Application Connections
Routinely Maintaining Tables
Routinely Recreating an Index
Data Security Maintenance Suggestions

To ensure data security in GaussDB Kernel and prevent accidents, such as data loss and illegal data access, read this section carefully.

Routine Maintenance Check Items

Checking openGauss Status

openGauss provides tools to check database and instance status, ensuring that databases and instances are running properly to provide data services.

Check instance status.
```
gs_check -U omm -i CheckClusterState
```
Check parameters.
```
postgres=# SHOW parameter_name;
```

Modify parameters.

gs_guc reload  -D /gaussdb/data/dbnode -c "paraname=value"

Checking Lock Information

The lock mechanism is an important method to ensure data consistency. Information check helps learn database transactions and database running status.

Query lock information in the database.
```
postgres=# SELECT * FROM pg_locks;
```

Query the status of threads waiting to acquire locks.

postgres=# SELECT * FROM pg_thread_wait_status WHERE wait_status = 'acquire lock';

Query the status of events waiting to acquire locks.

postgres=# SELECT node_name, thread_name, tid, wait_status, query_id FROM pgxc_thread_wait_status WHERE wait_status = 'acquire lock';

Kill a system process.
Search for a system process that is running and run the following command to end the process:
```
ps ux
kill -9 pid
```

Collecting Event Statistics

Long-time running of SQL statements will occupy a lot of system resources. You can check event occurrence time and occupied memory to learn about database running status.

Query the time points about an event.
Run the following command to query the thread start time, transaction start time, SQL start time, and status change time of the event:
```
postgres=# SELECT backend_start,xact_start,query_start,state_change FROM pg_stat_activity;
```

Query the number of sessions on the current server.

postgres=# SELECT count(*) FROM pg_stat_activity;

Collect system-level statistics.
Run the following command to query information about the session that uses the maximum memory:
```
postgres=# SELECT * FROM pv_session_memory_detail() ORDER BY usedsize desc limit 10;
```

Checking Objects

Tables, indexes, partitions, and constraints are key storage objects of a database. A database administrator needs to routinely maintain key information and these objects.

View table details.
```
postgres=# \d+ table_name 
```
Query table statistics.
```
postgres=# SELECT * FROM pg_statistic;
```
View index details.
```
postgres=# \d+ index_name
```
Query partitioned table information.
```
postgres=# SELECT * FROM pg_partition;
```
Collect statistics.
Run the ANALYZE statement to collect related statistics on the database.
Run the VACUUM statement to reclaim space and update statistics.

Query constraint information.

postgres=# SELECT * FROM pg_constraint;

Checking an SQL Report

Run the EXPLAIN statement to view execution plans.

Backing Up Data

Never forget to back up data. During the routine work, the backup execution and backup data validity need to be checked to ensure data security and encryption security.

Export a specified user.

gs_dump dbname -p port -f out.sql -U user_name -W password

Export a schema.

gs_dump dbname -p port -n schema_name -f out.sql

Export a table.

gs_dump dbname -p port -t table_name -f out.sql

Checking Basic Information

Basic information includes versions, components, and patches. Periodic database information checks and records are important for database life cycle management.

Check version information.
```
postgres=# SELECT version();
```

Check table size and database size.

postgres=# SELECT pg_table_size('table_name');
postgres=# SELECT pg_database_size('database_name');

Checking OS Parameters

Check Method

Use the gs_checkos tool provided by openGauss to check the OS status.

Prerequisites

The hardware and network are working properly.
The trust relationship of user root among the hosts is normal.
Only user root is authorized to run the gs_checkos command.

Procedure

Log in to a server as user root.
Run the following command to check OS parameters of servers where the openGauss nodes are deployed:
```
gs_checkos -i A
```
Check the OS parameters to ensure that openGauss has passed the pre-installation check and can efficiently operate after it is installed. For details about the check items, see “Server Tools > gs_checkos” in the openGauss Tool Reference.

Examples

Before running the gs_checkos command, execute pre-processing scripts by running gs_preinstall to prepare the environment. The following uses parameter A as an example:

gs_checkos -i A
Checking items:
    A1. [ OS version status ]                                   : Normal
    A2. [ Kernel version status ]                               : Normal
    A3. [ Unicode status ]                                      : Normal
    A4. [ Time zone status ]                                    : Normal
    A5. [ Swap memory status ]                                  : Normal
    A6. [ System control parameters status ]                    : Normal
    A7. [ File system configuration status ]                    : Normal
    A8. [ Disk configuration status ]                           : Normal
    A9. [ Pre-read block size status ]                          : Normal
    A10.[ IO scheduler status ]                                 : Normal
    A11.[ Network card configuration status ]                   : Normal
    A12.[ Time consistency status ]                             : Warning
    A13.[ Firewall service status ]                             : Normal
    A14.[ THP service status ]                                  : Normal
Total numbers:14. Abnormal numbers:0. Warning number:1.

The following uses parameter B as an example:

gs_checkos -i B
Setting items:
    B1. [ Set system control parameters ]                       : Normal
    B2. [ Set file system configuration value ]                 : Normal
    B3. [ Set pre-read block size value ]                       : Normal
    B4. [ Set IO scheduler value ]                              : Normal
    B5. [ Set network card configuration value ]                : Normal
    B6. [ Set THP service ]                                     : Normal
    B7. [ Set RemoveIPC value ]                                 : Normal
    B8. [ Set Session Process ]                                 : Normal
Total numbers:6. Abnormal numbers:0. Warning number:0.

Exception Handling

If you use the gs_checkos tool to check the OS and the command output shows Abnormal, run the following command to view detailed error information:

gs_checkos -i A --detail

The Abnormal state cannot be ignored because the OS in this state affects cluster installation. The Warning state does not affect cluster installation and thereby can be ignored.

If the check result for OS version status (A1) is Abnormal, replace OSs out of the mixed programming scope with those within the scope.
If the check result for kernel version status (A2) is Warning, the platform kernel versions in the cluster are inconsistent.
If the check result for Unicode status (A3) is Abnormal, set the same character set for all the hosts. You can add export LANG=unicode to the /etc/profile file.
```
vim /etc/profile
```
If the check result for time zone status (A4) is Abnormal, set the same time zone for all the hosts. You can copy the time zone file in the /usr/share/zoneinfo/ directory as the /etc/localtime file.
```
cp /usr/share/zoneinfo/$primary time zone/$secondary time zone /etc/localtime
```
If the check result for swap memory status (A5) is Abnormal, a possible cause is that the swap memory is larger than the physical memory. You can troubleshoot this issue by reducing the swap memory or increasing the physical memory.
If the check result for system control parameter status (A6) is Abnormal, troubleshoot this issue in either of the following two ways:
- Run the following command:
```
gs_checkos -i B1
```
- Modify the /etc/sysctl.conf file based on the error message and run sysctl -p to make it take effect.
```
vim /etc/sysctl.conf
```
If the check result for file system configuration status (A7) is Abnormal, run the following command to troubleshoot this issue:
```
gs_checkos -i B2
```
If the check result for disk configuration status (A8) is Abnormal, set the disk mounting format to rw,noatime,inode64,allocsize=16m.
Run the man mount command to mount the XFS parameter:
```
rw,noatime,inode64,allocsize=16m
```
You can also set the XFS parameter in the /etc/fstab file. For example:
```
/dev/data /data xfs rw,noatime,inode64,allocsize=16m 0 0
```
If the check result for pre-read block size status (A9) is Abnormal, run the following command to troubleshoot this issue:
```
gs_checkos -i B3
```
If the check result for I/O scheduling status (A10) is Abnormal, run the following command to troubleshoot this issue:
```
gs_checkos -i B4
```
If the check result for NIC configuration status (A11) is Warning, run the following command to troubleshoot this issue:
```
gs_checkos -i B5
```
If the check result for time consistency status (A12) is Abnormal, verify that the NTP service has been installed and started and has synchronized time from the NTP clock.
If the check result for firewall status (A13) is Abnormal, disable the firewall. Run the following commands:
- SUSE:
```
SuSEfirewall2 stop
```
- RedHat7:
```
systemctl disable firewalld
```
- RedHat6:
```
service iptables stop
```
If the check result for THP service status (A14) is Abnormal, run the following command to troubleshoot this issue:
```
gs_checkos -i B6
```

Checking openGauss Health Status

Check Method

Use the gs_check tool provided by openGauss to check the openGauss health status.

Precautions

Only user root is authorized to check new nodes added during cluster scale-out. In other cases, the check can be performed only by user omm.
Parameter -i or -e must be set. -i specifies a single item to be checked, and -e specifies an inspection scenario where multiple items will be checked.
If -i is not set to a root item or no such items are contained in the check item list of the scenario specified by -e, you do not need to enter the name or password of user root.
You can run --skip-root-items to skip root items.
Check the consistency between the new node and existing nodes. Run the gs_check command on an existing node and specify the --hosts parameter. The IP address of the new node needs to be written into the hosts file.

Procedure

Method 1:

Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database status:
```
gs_check -i CheckClusterState
```
In the command, -i indicates the check item and is case-sensitive. The format is -i CheckClusterState, -i CheckCPU or -i CheckClusterState,CheckCPU.
Checkable items are listed in “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference. You can create a check item as needed.

Method 2:

Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database health status:
```
gs_check -e inspect
```
In the command, -e indicates the inspection scenario and is case-sensitive. The format is -e inspect or -e upgrade.
The inspection scenarios include inspect (routine inspection), upgrade (inspection before upgrade), Install (install inspection ), binary_upgrade (inspection before in-place upgrade), slow_node (node inspection), longtime (time-consuming inspection) and health (health inspection). You can create an inspection scenario as needed.

The openGauss inspection is performed to check openGauss status during openGauss running or to check the environment and conditions before critical operations, such as upgrade or scale-out. For details about the inspection items and scenarios, see “Server Tools > gs_check > openGauss status checks” in the openGauss Tool Reference.

Examples

Check result of a single item:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
Parsing the check items config file successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:1 Nodes:3

Checking...               [=========================] 1/1
Start to analysis the check result
CheckCPU....................................OK
The item run on 3 nodes.  success: 3

Analysis the check result successfully
Success. All check items run completed. Total:1  Success:1  Failed:0
For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz

Local execution result:

perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L

2017-12-29 17:09:29 [NAM] CheckCPU
2017-12-29 17:09:29 [STD] Check the CPU usage of the host. If the value of idle is greater than 30% and the value of iowait is less than 30%, this item passes the check. Otherwise, this item fails the check.
2017-12-29 17:09:29 [RST] OK

2017-12-29 17:09:29 [RAW]
Linux 4.4.21-69-default (lfgp000700749)  12/29/17  _x86_64_

17:09:24        CPU     %user     %nice   %system   %iowait    %steal     %idle
17:09:25        all      0.25      0.00      0.25      0.00      0.00     99.50
17:09:26        all      0.25      0.00      0.13      0.00      0.00     99.62
17:09:27        all      0.25      0.00      0.25      0.13      0.00     99.37
17:09:28        all      0.38      0.00      0.25      0.00      0.13     99.25
17:09:29        all      1.00      0.00      0.88      0.00      0.00     98.12
Average:        all      0.43      0.00      0.35      0.03      0.03     99.17

Check result of a scenario:

[perfadm@SIA1000131072 Check]$ gs_check -e inspect
Parsing the check items config file successfully
The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
Please enter root privileges user[root]:root
Please enter password for user[root]:
Please enter password for user[root] on the node[10.244.57.240]:
Check root password connection successfully
Distribute the context file to remote hosts successfully
Start to health check for the cluster. Total Items:59 Nodes:2

Checking...               [                         ] 21/59
Checking...               [=========================] 59/59
Start to analysis the check result
CheckClusterState...........................OK
The item run on 2 nodes.  success: 2

CheckDBParams...............................OK
The item run on 1 nodes.  success: 1

CheckDebugSwitch............................OK
The item run on 2 nodes.  success: 2

CheckDirPermissions.........................OK
The item run on 2 nodes.  success: 2

CheckReadonlyMode...........................OK
The item run on 1 nodes.  success: 1

CheckEnvProfile.............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
GAUSSHOME        /usr1/gaussdb/app
LD_LIBRARY_PATH  /usr1/gaussdb/app/lib
PATH             /usr1/gaussdb/app/bin


CheckBlockdev...............................OK
The item run on 2 nodes.  success: 2

CheckCurConnCount...........................OK
The item run on 1 nodes.  success: 1

CheckCursorNum..............................OK
The item run on 1 nodes.  success: 1

CheckPgxcgroup..............................OK
The item run on 1 nodes.  success: 1

CheckDiskFormat.............................OK
The item run on 2 nodes.  success: 2

CheckSpaceUsage.............................OK
The item run on 2 nodes.  success: 2

CheckInodeUsage.............................OK
The item run on 2 nodes.  success: 2

CheckSwapMemory.............................OK
The item run on 2 nodes.  success: 2

CheckLogicalBlock...........................OK
The item run on 2 nodes.  success: 2

CheckIOrequestqueue.....................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'

CheckMaxAsyIOrequests.......................OK
The item run on 2 nodes.  success: 2

CheckIOConfigure............................OK
The item run on 2 nodes.  success: 2

CheckMTU....................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
1500

CheckPing...................................OK
The item run on 2 nodes.  success: 2

CheckRXTX...................................NG
The item run on 2 nodes.  ng: 2
The ng[host240,host157] value:
NetWork[eth0]
RX: 256
TX: 256


CheckNetWorkDrop............................OK
The item run on 2 nodes.  success: 2

CheckMultiQueue.............................OK
The item run on 2 nodes.  success: 2

CheckEncoding...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
LANG=en_US.UTF-8

CheckFirewall...............................OK
The item run on 2 nodes.  success: 2

CheckKernelVer..............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
3.10.0-957.el7.x86_64

CheckMaxHandle..............................OK
The item run on 2 nodes.  success: 2

CheckNTPD...................................OK
host240: NTPD service is running, 2020-06-02 17:00:28
host157: NTPD service is running, 2020-06-02 17:00:06


CheckOSVer..................................OK
host240: The current OS is centos 7.6 64bit.
host157: The current OS is centos 7.6 64bit.


CheckSysParams..........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.


CheckTHP....................................OK
The item run on 2 nodes.  success: 2

CheckTimeZone...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
+0800

CheckCPU....................................OK
The item run on 2 nodes.  success: 2

CheckSshdService............................OK
The item run on 2 nodes.  success: 2

CheckSshdConfig.........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:

Warning reason: UseDNS parameter is not set; expected: no

CheckCrondService...........................OK
The item run on 2 nodes.  success: 2

CheckStack..................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
8192

CheckNoCheckSum.............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
Nochecksum value is N,Check items pass.

CheckSysPortRange...........................OK
The item run on 2 nodes.  success: 2

CheckMemInfo................................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
totalMem: 31.260929107666016G

CheckHyperThread............................OK
The item run on 2 nodes.  success: 2

CheckTableSpace.............................OK
The item run on 1 nodes.  success: 1

CheckSctpService............................OK
The item run on 2 nodes.  success: 2

CheckSysadminUser...........................OK
The item run on 1 nodes.  success: 1

CheckGUCConsistent..........................OK
All DN instance guc value is consistent.

CheckMaxProcMemory..........................OK
The item run on 1 nodes.  success: 1

CheckBootItems..............................OK
The item run on 2 nodes.  success: 2

CheckHashIndex..............................OK
The item run on 1 nodes.  success: 1

CheckPgxcRedistb............................OK
The item run on 1 nodes.  success: 1

CheckNodeGroupName..........................OK
The item run on 1 nodes.  success: 1

CheckTDDate.................................OK
The item run on 1 nodes.  success: 1

CheckDilateSysTab...........................OK
The item run on 1 nodes.  success: 1

CheckKeyProAdj..............................OK
The item run on 2 nodes.  success: 2

CheckProStartTime.......................WARNING
host157:
STARTED COMMAND
Tue Jun  2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/gaussdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
Mon Jun  1 16:15:15 2020 /usr1/gaussdb/app/bin/gaussdb -D /usr1/gaussdb/data/dn1 -M standby


CheckFilehandle.............................OK
The item run on 2 nodes.  success: 2

CheckRouting................................OK
The item run on 2 nodes.  success: 2

CheckNICModel...............................OK
The item run on 2 nodes.  success: 2  (consistent)
The success on all nodes value:
version: 1.0.0
model: Red Hat, Inc. Virtio network device


CheckDropCache..........................WARNING
The item run on 2 nodes.  warning: 2
The warning[host240,host157] value:
No DropCache process is running

CheckMpprcFile..............................NG
The item run on 2 nodes.  ng: 2
The ng[host240,host157] value:
There is no mpprc file

Analysis the check result successfully
Failed. All check items run completed. Total:59   Success:52   Warning:5   NG:2
For more information please refer to /usr1/gaussdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz

Exception Handling

Troubleshoot exceptions detected in the inspection by following instructions in this section.

Table 1 Check of openGauss running status

Check Item	Abnormal Status	Solution
CheckClusterState (Checks the openGauss status.)	openGauss or openGauss instances are not started.	Run the following command to start openGauss and instances: gs_om -t start
CheckClusterState (Checks the openGauss status.)	The status of openGauss or openGauss instances is abnormal.	Check the status of hosts and instances. Troubleshoot this issue based on the status information. gs_check -i CheckClusterState
CheckDBParams (Checks database parameters.)	Database parameters have incorrect values.	Use the gs_guc tool to set the parameters to specified values.
CheckDebugSwitch (Checks debug logs.)	The log level is incorrect.	Use the gs_guc tool to set log_min_messages to specified content.
CheckDirPermissions (Checks directory permissions.)	The permission for a directory is incorrect.	Change the directory permission to a specified value (750 or 700). chmod 750 DIR
CheckReadonlyMode (Checks the read-only mode.)	The read-only mode is enabled.	Verify that the usage of the disk where database nodes are located does not exceed the threshold (60% by default) and no other O&M operations are performed. gs_check -i CheckDataDiskUsage ps ux Use the gs_guc tool to disable the read-only mode of openGauss. gs_guc reload -N all -I all -c 'default_transaction_read_only = off'
CheckEnvProfile (Checks environment variables.)	Environment variables are inconsistent.	Update the environment variable information.
CheckBlockdev (Checks pre-read blocks.)	The size of a pre-read block is not 16384 KB.	Use the gs_checkos tool to set the size of the pre-read block to 16384 KB and write the setting into the auto-startup file. gs_checkos -i B3
CheckCursorNum (Checks the number of cursors.)	The number of cursors fails to be checked.	Check whether the database is properly connected and whether the openGauss status is normal.
CheckPgxcgroup (Checks the data redistribution status.)	There are pgxc_group tables that have not been redistributed.	Proceed with the redistribution. gs_expand, gs_dropnode
CheckDiskFormat (Checks disk configurations.)	Disk configurations are inconsistent between nodes.	Configure disk specifications to be consistent between nodes.
CheckSpaceUsage (Checks the disk space usage.)	Disk space is insufficient.	Clear or expand the disk for the directory.
CheckInodeUsage (Checks the disk index usage.)	Disk indexes are insufficient.	Clear or expand the disk for the directory.
CheckSwapMemory (Checks the swap memory.)	The swap memory is greater than the physical memory.	Reduce or disable the swap memory.
CheckLogicalBlock (Checks logical blocks.)	The size of a logical block is not 512 KB.	Use the gs_checkos tool to set the size of the logical block to 512 KB and write the setting into the auto-startup file. gs_checkos -i B4
CheckIOrequestqueue (Checks I/O requests.)	The requested I/O is not 32768.	Use the gs_checkos tool to set the requested I/O to 32768 and write the setting into the auto-startup file. gs_checkos -i B4
CheckCurConnCount (Checks the number of current connections.)	The number of current connections exceeds 90% of the allowed maximum number of connections.	Break idle primary database node connections.
CheckMaxAsyIOrequests (Checks the maximum number of asynchronous requests.)	The maximum number of asynchronous requests is less than 104857600 or (Number of database instances on the current node x 1048576).	Use the gs_checkos tool to set the maximum number of asynchronous requests to the larger one between 104857600 and (Number of database instances on the current node x 1048576). gs_checkos -i B4
CheckMTU (Checks MTU values.)	MTU values are inconsistent between nodes.	Set the MTU value on each node to 1500 or 8192. ifconfig eth* MTU 1500
CheckIOConfigure (Checks I/O configurations.)	The I/O mode is not deadline.	Use the gs_checkos tool to set the I/O mode to deadline and write the setting into the auto-startup file. gs_checkos -i B4
CheckRXTX (Checks the RX/TX value.)	The NIC RX/TX value is not 4096.	Use the checkos tool to set the NIC RX/TX value to 4096 for openGauss. gs_checkos -i B5
CheckPing (Checks whether the network connection is normal.)	There are openGauss IP addresses that cannot be pinged.	Check the network settings, network status, and firewall status between the abnormal IP addresses.
CheckNetWorkDrop (Checks the network packet loss rate.)	The network packet loss rate is greater than 1%.	Check the network load and status between the corresponding IP addresses.
CheckMultiQueue (Checks the NIC multi-queue function.)	Multiqueue is not enabled for the NIC, and NIC interruptions are not bound to different CPU cores.	Enable multiqueue for the NIC, and bind NIC interruptions to different CPU cores.
CheckEncoding (Checks the encoding format.)	Encoding formats are inconsistent between nodes.	Write the same encoding format into /etc/profile for each node. echo "export LANG=XXX" >> /etc/profile
CheckActQryCount (Checks the archiving mode.)	The archiving mode is enabled, and the archiving directory is not under the primary database node directory.	Disable archiving mode or set the archiving directory to be under the primary database node directory.
CheckFirewall (Checks the firewall.)	The firewall is enabled.	Disable the firewall. redHat(CentOS)7.x: systemctl disable firewalld.service redHat(CentOS)6.x: service iptables down SuSE: SuSEfirewall2 down
CheckKernelVer (Checks kernel versions.)	Kernel versions are inconsistent between nodes.
CheckMaxHandle (Checks the maximum number of file handles.)	The maximum number of handles is less than 1000000.	Set the soft and hard limits in the 91-nofile.conf or 90-nofile.conf file to 1000000. gs_checkos -i B2
CheckNTPD (Checks the time synchronization service.)	The NTPD service is disabled or the time difference is greater than 1 minute.	Enable the NTPD service and set the time to be consistent.
CheckSysParams (Checks OS parameters.)	OS parameter settings do not meet requirements.	Use the gs_checkos tool or manually set parameters to values meeting requirements. gs_checkos -i B1 vim /etc/sysctl.conf
CheckTHP (Checks the THP service.)	The THP service is disabled.	Use the gs_checkos to enable the THP service. gs_checkos -i B6
CheckTimeZone (Checks time zones.)	Time zones are inconsistent between nodes.	Set time zones to be consistent between nodes. cp /usr/share/zoneinfo/$primary time zone/$secondary time zone /etc/localtime
CheckCPU (Checks the CPU.)	CPU usage is high or I/O waiting time is too long.	Upgrade CPUs or improve disk performance.
CheckSshdService (Checks the SSHD service.)	The SSHD service is disabled.	Enable the SSHD service and write the setting into the auto-startup file. service sshd start echo "server sshd start" >> initFile
CheckSshdConfig (Checks SSHD configurations.)	The SSHD service is incorrectly configured.	Reconfigure the SSHD service. PasswordAuthentication=no; MaxStartups=1000; UseDNS=yes; ClientAliveInterval=10800/ClientAliveInterval=0 Restart the service. server sshd start
CheckCrondService (Checks the Crond service.)	The Crond service is disabled.	Install and enable the Crond service.
CheckStack (Checks the stack size.)	The stack size is less than 3072.	Use the gs_checkos tool to set the stack size to 3072 and restart the processes with a smaller stack size. gs_checkos -i B2
CheckNoCheckSum (Checks the NoCheckSum parameter.)	NoCheckSum is incorrectly set or its value is inconsistent on each node.	Set NoCheckSum to a consistent value on each node. If redHat6.4 or redHat6.5 with the NIC binding mode bond0 exists, set NoCheckSum to Y. In other cases, set it to N. echo Y > /sys/module/sctp/parameters/no_checksums
CheckSysPortRange (Checks OS port configurations.)	OS IP ports are not within the required port range or openGauss ports are within the OS IP port range.	Set the OS IP ports within 26000 to 65535 and set the openGauss ports beyond the OS IP port range. vim /etc/sysctl.conf
CheckMemInfo (Checks the memory information.)	Memory sizes are inconsistent between nodes.	Use physical memory of the same specifications between nodes.
CheckHyperThread (Checks the hyper-threading.)	The CPU hyper-threading is disabled.	Enable the CPU hyper-threading.
CheckTableSpace (Checks tablespaces.)	The tablespace path is nested with the openGauss path or nested with the path of another tablespace.	Migrate tablespace data to the tablespace with a valid path.
CheckSctpService (Checks the SCTP service.)	The SCTP service is disabled.	Install and enable the SCTP service. modprobe sctp

Checking Database Performance

Check Method

Use the gs_checkperf tool provided by openGauss to check hardware performance.

Prerequisites

openGauss is running properly.
Services are running properly on the database.

Procedure

Log in as the OS user omm to the primary node of the database.
Run the following command to check the openGauss database performance:
```
gs_checkperf
```

For details about performance statistical items, see “Server Tools > gs_checkperf > Performance Check Items” in the openGauss Tool Reference.

Examples

Simple performance statistical result is displayed on the screen as follows:

gs_checkperf -i pmk -U omm
Cluster statistics information:
    Host CPU busy time ratio                     :    1.43       %
    MPPDB CPU time % in busy time                :    1.88       %
    Shared Buffer Hit ratio                      :    99.96      %
    In-memory sort ratio                         :    100.00     %
    Physical Reads                               :    4
    Physical Writes                              :    25
    DB size                                      :    70         MB
    Total Physical writes                        :    25
    Active SQL count                             :    2
    Session count                                :    3

Exception Handling

After you use the gs_checkperf tool to check the cluster performance, if the performance is abnormal, troubleshoot the issue by following instructions in this section.

Table 1 Cluster-level performance status

Abnormal Status	Solution
High CPU usage of hosts	1. Add high-performance CPUs, or replace current CPUs with them. 2. Run the top command to check which system processes cause high CPU usage, and run the kill command to stop unused processes. top
High CPU usage of GaussDB Kernel	1. Add high-performance CPUs, or replace current CPUs with them. 2. Run the top command to check which database processes cause high CPU usage, and run the kill command to stop unused processes. top 3. Use the gs_expand tool to add new hosts to lower the CPU usage.
Low hit ratio of the shared memory	1. Expand the memory. 2. Run the following command to check the OS configuration file /etc/sysctl.conf and increase the value of kernel.shmmax. vim /etc/sysctl.conf
Low in-memory sort ratio	Expand the memory.
High I/O and disk usage	1. Replace current disks with high-performance ones. 2. Adjust the data layout to evenly distribute I/O requests to all the physical disks. 3. Run VACUUM FULL for the entire database. vacuum full; 4. Clean up the disk space. 5. Reduce the number of concurrent connections.
Transaction statistics	Query the pg_stat_activity system catalog and disconnect unnecessary connections. (Log in to the database and run the postgres=# \d+ pg_stat_activity; command.)

Table 2 Node-level performance status

Abnormal Status	Solution
High CPU usage	1. Add high-performance CPUs, or replace current CPUs with them. 2. Run the top command to check which system processes cause high CPU usage, and run the kill command to stop unused processes. top
High memory usage	Expand or clean up the memory.
High I/O usage	1. Replace current disks with high-performance ones. 2. Clean up the disk space. 3. Use memory read/write to replace as much disk I/O as possible, putting frequently accessed files or data in the memory.

Abnormal Status

Solution

High CPU usage

1. Add high-performance CPUs, or replace current CPUs with them.

2. Run the top command to check which system processes cause high CPU usage, and run the kill command to stop unused processes.

top

High memory usage

Expand or clean up the memory.

High I/O usage

1. Replace current disks with high-performance ones.

2. Clean up the disk space.

3. Use memory read/write to replace as much disk I/O as possible, putting frequently accessed files or data in the memory.

Table 3 Session/process-level performance status

Abnormal Status	Solution
High CPU, memory, and I/O usage	Check which processes cause high CPU, memory, or I/O usage. If they are unnecessary processes, kill them; otherwise, analyze the specific cause of high usage. For example, if SQL statement execution occupies much memory, check whether the SQL statements need optimization.

Table 4 SSD performance status

Abnormal Status	Solution
SSD read/write fault	Run the following command to check whether SSD is faulty. If yes, analyze the specific cause. gs_checkperf -i SSD -U omm

Abnormal Status

Solution

SSD read/write fault

Run the following command to check whether SSD is faulty. If yes, analyze the specific cause.

gs_checkperf -i SSD -U omm

Checking and Deleting Logs

You are advised to check OS logs and database run logs monthly for monitoring system status and troubleshooting, and to delete database run logs monthly for saving disk space.

Checking OS Logs

You are advised to monthly check OS logs to detect and prevent potential OS problems.

Procedure

Run the following command to check OS log files:

vim /var/log/messages

(Pay attention to words like kernel, error, and fatal in logs generated within the last month and handle the problems based on the alarm information.)

Checking openGauss Run Logs

A database can still run when errors occur during the execution of some operations. However, data may be inconsistent before and after the error occurrences. Therefore, you are advised to monthly check openGauss run logs to detect potential problems in time.

Prerequisites

The host used for collecting logs is running properly, and the network connection is normal. Database installation users trust each other.
An OS tool (for example, gstack) that the log collection tool requires has been installed. If it is not installed, an error message is displayed, and this collection item is skipped.

Procedure

Log in as the OS user omm to the primary node of the database.
Run the following command to collect database logs:
```
gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59"
```
In the command, 20160616 01:01 indicates the start time of the log and 20160616 23:59 indicates the end time of the log.
Based on command output in 2, access the related log collection directory, decompress collected database logs, and check these logs.
Assume that collected logs are stored in /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158.tar.gz.
```
tar -xvzf /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158.tar.gz 
cd /opt/gaussdb/tmp/gaussdba_mppdb/collector_20160726_105158
```

Examples

Run the gs_collector command together with parameters --begin-time and --end-time:

gs_collector --begin-time="20160616 01:01" --end-time="20160616 23:59"