因磁阵IO慢导致性能异常的问题
问题现象
资源池化性能异常,性能过低或忽然降低。
定位方法
可通过数据库内置视图 statement_history 来查看慢SQL的性能瓶颈,分析相关的性能损耗点,发现在IO上。
openGauss=# select *,statement_detail_decode(details, 'plaintext', true) from statement_history WHERE QUERY LIKE 'insert%' ; -[ RECORD 1 ] ------------------------------------------------------------------ db_name | postgres schema_name | "$user",public origin_node | 0 user_name | carrot application_name | gsql client_addr | client_port | -1 unique_query_id | 3750366782 debug_query_id | 562949953433822 query | insert into a select * from a; start_time | 2024-10-12 09:48:13.171546+08 finish_time | 2024-10-12 09:48:16.273371+08 slow_sql_threshold | 0 transaction_id | 14247 thread_id | 281450315045296 session_id | 281450315045296 n_soft_parse | 0 n_hard_parse | 1 query_plan | Datanode Name: dn1 | Insert on a (cost=0.00..7562.88 rows=524288 width=4) | -> Seq Scan on a (cost=0.00..7562.88 rows=524288 width=4) | | n_returned_rows | 0 n_tuples_fetched | 8 n_tuples_returned | 526614 n_tuples_inserted | 524288 n_tuples_updated | 0 n_tuples_deleted | 0 n_blocks_fetched | 16267 n_blocks_hit | 13946 db_time | 3101823 cpu_time | 2931460 execution_time | 3100285 parse_time | 45 plan_time | 256 rewrite_time | 12 pl_execution_time | 0 pl_compilation_time | 0 data_io_time | 0 net_send_info | {"time":43543, "n_calls":2327, "size":169431} net_recv_info | {"time":0, "n_calls":0, "size":0} net_stream_send_info | {"time":0, "n_calls":0, "size":0} net_stream_recv_info | {"time":0, "n_calls":0, "size":0} lock_count | 4675 lock_time | 0 lock_wait_count | 0 lock_wait_time | 0 lock_max_count | 8 lwlock_count | 0 lwlock_wait_count | 0 lwlock_time | 0 lwlock_wait_time | 0 details | \xaf0000000241a500000006000000011300427566486173685461626c65536561726368009e3f000000000000011200537472617465677947657442756666657200960b0000000000000111004461746146696c655472756e636174 65000000000000000000010e004461746146696c65577269746500969d04000000000001100057414c427566666572416363657373000000000000000000040b00666c7573682064617461000d00000000000000 is_slow_sql | t trace_id | advise | net_send_time | 0 srt1_q | 14 srt2_simple_query | 567 srt3_analyze_rewrite | 565 srt4_plan_query | 2 srt5_light_query | 0 srt6_p | 0 srt7_b | 0 srt8_e | 0 srt9_d | 0 srt10_s | 0 srt11_c | 0 srt12_u | 0 srt13_before_query | 59 srt14_after_query | 6 rtt_unknown | 12 parent_query_id | 0 net_trans_time | 0 statement_detail_decode | ---------------Wait Events Area--------------- | '1' IO_EVENT DataFileWrite 302486 (us) | '2' IO_EVENT BufHashTableSearch 16286 (us) | '3' IO_EVENT StrategyGetBuffer 2966 (us) | '4' STATUS flush data 13 (us) | '5' IO_EVENT WALBufferAccess 0 (us) | '6' IO_EVENT DataFileTruncate 0 (us)
通过waitevent视图来查看相关的等待事件信息,等待最久的为IO相关的事件。
openGauss=# select * from dbe_perf.wait_events order by total_wait_time desc; nodename | type | event | wait | failed_wait | total_wait_time | avg_wait_time | max_wait_time | min_wait_time | last_updated ----------+----------+----------------------------+---------+-------------+-----------------+---------------+---------------+---------------+------------------------------- dn1 | IO_EVENT | DataFileRead | 65401 | 0 | 14800532 | 226 | 131456 | 91 | 2024-10-12 10:03:10.417273+08 dn1 | IO_EVENT | WALWrite | 50070 | 0 | 5831542 | 116 | 19795 | 85 | 2024-10-12 10:03:10.130683+08 dn1 | IO_EVENT | DataFileWrite | 25755 | 0 | 3481093 | 135 | 5476 | 90 | 2024-10-12 10:02:45.263841+08 dn1 | IO_EVENT | BufHashTableSearch | 2609675 | 0 | 2628273 | 1 | 14316 | 1 | 2024-10-12 10:03:36.548881+08 dn1 | IO_EVENT | DoubleWriteFileWrite | 1563 | 0 | 768891 | 491 | 17380 | 126 | 2024-10-12 10:03:10.416386+08 dn1 | IO_EVENT | ControlFileWriteUpdate | 2794 | 0 | 434213 | 155 | 15372 | 91 | 2024-10-12 10:02:45.260013+08 dn1 | IO_EVENT | StrategyGetBuffer | 20835 | 0 | 28848 | 1 | 39 | 1 | 2024-10-12 10:03:10.12946+08 dn1 | IO_EVENT | SLRUWrite | 38 | 0 | 10501 | 276 | 1599 | 103 | 2024-10-12 09:59:44.250612+08 dn1 | IO_EVENT | ControlFileSyncUpdate | 2794 | 0 | 2812 | 1 | 5 | 1 | 2024-10-12 10:02:45.260013+08 dn1 | IO_EVENT | SLRURead | 3 | 0 | 2197 | 732 | 795 | 673 | 2024-10-11 16:17:56.402803+08 dn1 | IO_EVENT | ControlFileWrite | 6 | 0 | 2160 | 360 | 1141 | 174 | 2024-10-11 10:43:27.570611+08 dn1 | IO_EVENT | DoubleWriteFileRead | 1 | 0 | 1174 | 1174 | 1174 | 1174 | 2024-10-11 10:43:27.238529+08 dn1 | IO_EVENT | SLRUFlushSync | 38 | 0 | 38 | 1 | 1 | 1 | 2024-10-12 09:59:44.250622+08 dn1 | IO_EVENT | ControlFileSync | 6 | 0 | 6 | 1 | 1 | 1 | 2024-10-11 10:43:27.570612+08 dn1 | IO_EVENT | LockFileCreateSync | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 dn1 | IO_EVENT | LockFileCreateWRITE | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 dn1 | IO_EVENT | RelationMapRead | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 dn1 | IO_EVENT | RelationMapSync | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 dn1 | IO_EVENT | RelationMapWrite | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 dn1 | IO_EVENT | ReplicationSlotRead | 0 | 0 | 0 | 0 | 0 | 0 | 2024-10-11 10:43:25.343805+08 ....
通过iostat来观察IO情况,观察所使用磁阵IO是否异常,例如sdbm磁阵盘的IO异常缓慢。
[carrot@openGauss111 openGauss-server]$ iostat -m -x 1 Linux 4.19.90-2003.4.0.0036.oe1.aarch64 (openGauss111) 10/12/2024 _aarch64_ (96 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 6.02 0.00 10.44 1.65 0.00 81.89 Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz aqu-sz %util dm-0 0.10 0.00 0.00 0.00 28.48 52.41 82.17 8.84 0.00 0.00 75.09 110.15 0.00 0.00 0.00 0.00 0.00 0.00 6.17 24.42 dm-1 0.00 0.00 0.00 0.00 22.91 64.46 0.01 0.00 0.00 0.00 811.50 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 dm-2 0.61 0.01 0.00 0.00 12.39 12.52 15.70 0.21 0.00 0.00 95.50 13.64 0.00 0.00 0.00 0.00 0.00 0.00 1.51 14.52 nvme0n1 5.61 0.66 0.00 0.00 1.06 119.75 52.04 3.30 1.88 3.48 1.18 64.97 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.12 nvme1n1 14216.11 1773.03 0.00 0.00 0.37 127.71 68.38 3.14 6.04 8.11 0.59 46.96 0.00 0.00 0.00 0.00 0.00 0.00 0.03 44.89 nvme2n1 6.36 0.71 0.00 0.00 1.04 113.90 7.91 0.77 0.12 1.46 0.49 99.62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 nvme3n1 4.36 0.54 0.00 0.00 1.13 127.57 4.99 0.58 0.08 1.57 0.66 118.57 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 sda 0.68 0.01 0.03 3.86 10.21 18.84 77.74 9.04 19.99 20.45 51.70 119.06 0.00 0.00 0.00 0.00 0.00 0.00 3.88 29.51 sdaa 11.97 0.71 0.00 0.00 0.31 60.47 0.01 0.00 0.00 0.00 0.22 10.48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.35 sdab 0.31 0.31 0.00 0.00 2.62 1018.33 0.42 0.00 0.00 0.00 0.34 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 sdac 13.57 0.54 0.00 0.00 0.28 40.52 0.01 0.00 0.00 0.00 0.22 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.36 sdbi 4.67 0.32 0.00 0.00 0.31 70.57 0.00 0.00 0.01 95.00 20.82 1280.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 sdbj 14.42 0.35 0.00 0.00 0.21 25.06 0.69 0.02 0.01 1.26 0.28 27.61 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.32 sdbk 5.55 0.30 0.00 0.00 0.29 55.78 0.07 0.01 0.01 10.14 1.14 156.79 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.15 sdbl 0.09 0.01 0.00 0.00 0.39 91.84 0.15 0.00 0.03 18.07 0.58 15.36 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 sdbm 2.66 0.14 0.00 0.00 0.30 55.09 1088 52 0.03 94.91 21.79 1256.71 0.00 0.00 0.00 0.00 69.4 0.00 0.00 99 sdbn 0.00 0.00 0.00 0.00 0.37 65.58 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdce 0.00 0.00 0.00 0.00 0.27 65.58 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdcf 2.87 0.05 0.00 0.00 0.26 17.83 0.01 0.00 0.00 0.00 1.33 202.57 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.08 sdcg 0.94 0.05 0.00 0.00 0.36 51.79 0.00 0.00 0.00 0.00 2.61 930.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 sdch 0.01 0.01 0.00 0.00 3.79 871.17 0.02 0.00 0.00 0.00 0.84 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdci 0.63 0.03 0.00 0.00 0.44 45.14 0.00 0.00 0.00 0.00 0.47 1.95 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 sdn 14.38 0.73 0.00 0.00 0.26 51.70 0.00 0.00 0.00 0.00 1.24 4.92 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.35 sdo 11.02 0.71 0.00 0.00 0.28 65.61 0.02 0.02 0.00 0.00 3.28 970.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.30 sdp 0.94 0.94 0.00 0.00 2.71 1023.83 0.32 0.00 0.00 0.00 0.39 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.14 sdq 17.75 0.41 0.00 0.00 0.27 23.94 0.03 0.00 0.00 0.00 0.23 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.47 sdr 0.00 0.00 0.00 0.00 30.06 64.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdv 46.89 0.77 0.00 0.00 0.32 16.71 20.70 0.30 8.33 28.69 0.42 14.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.74 sdw 16.25 0.57 0.00 0.00 0.23 36.06 8.69 0.71 0.00 0.00 0.55 83.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.71 sdx 0.34 0.34 0.00 0.00 2.67 1018.91 0.24 0.00 0.00 0.00 0.36 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.06 sdy 12.74 0.31 0.00 0.00 0.27 25.26 0.02 0.00 0.00 0.00 0.23 1.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33 sdz 24.44 0.77 0.00 0.00 0.25 32.19 0.01 0.00 0.00 35.11 0.29 14.79 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.59
确定为磁阵问题,登陆磁阵平台,查看磁阵IO状态、日志等,根据磁阵故障手册进一步定位。
问题根因
磁阵性能异常,磁阵网络异常或磁阵本身异常导致。
解决方案
若磁阵网络异常,则恢复网络。
若磁阵功能异常,则需要按照磁阵故障手册继续排查。
意见反馈