因磁阵IO慢导致性能异常的问题

问题现象

资源池化性能异常,性能过低或忽然降低。

定位方法

  1. 可通过数据库内置视图 statement_history 来查看慢SQL的性能瓶颈,分析相关的性能损耗点,发现在IO上。

    openGauss=# select *,statement_detail_decode(details, 'plaintext', true) from  statement_history WHERE QUERY LIKE 'insert%'
    ;
    -[ RECORD 1 ]
    ------------------------------------------------------------------
    db_name                 | postgres
    schema_name             | "$user",public
    origin_node             | 0
    user_name               | carrot
    application_name        | gsql
    client_addr             |
    client_port             | -1
    unique_query_id         | 3750366782
    debug_query_id          | 562949953433822
    query                   | insert into a select * from a;
    start_time              | 2024-10-12 09:48:13.171546+08
    finish_time             | 2024-10-12 09:48:16.273371+08
    slow_sql_threshold      | 0
    transaction_id          | 14247
    thread_id               | 281450315045296
    session_id              | 281450315045296
    n_soft_parse            | 0
    n_hard_parse            | 1
    query_plan              | Datanode Name: dn1
                            | Insert on a  (cost=0.00..7562.88 rows=524288 width=4)
                            |   ->  Seq Scan on a  (cost=0.00..7562.88 rows=524288 width=4)
                            |
                            |
    n_returned_rows         | 0
    n_tuples_fetched        | 8
    n_tuples_returned       | 526614
    n_tuples_inserted       | 524288
    n_tuples_updated        | 0
    n_tuples_deleted        | 0
    n_blocks_fetched        | 16267
    n_blocks_hit            | 13946
    db_time                 | 3101823
    cpu_time                | 2931460
    execution_time          | 3100285
    parse_time              | 45
    plan_time               | 256
    rewrite_time            | 12
    pl_execution_time       | 0
    pl_compilation_time     | 0
    data_io_time            | 0
    net_send_info           | {"time":43543, "n_calls":2327, "size":169431}
    net_recv_info           | {"time":0, "n_calls":0, "size":0}
    net_stream_send_info    | {"time":0, "n_calls":0, "size":0}
    net_stream_recv_info    | {"time":0, "n_calls":0, "size":0}
    lock_count              | 4675
    lock_time               | 0
    lock_wait_count         | 0
    lock_wait_time          | 0
    lock_max_count          | 8
    lwlock_count            | 0
    lwlock_wait_count       | 0
    lwlock_time             | 0
    lwlock_wait_time        | 0
    details                 | \xaf0000000241a500000006000000011300427566486173685461626c65536561726368009e3f000000000000011200537472617465677947657442756666657200960b0000000000000111004461746146696c655472756e636174
    65000000000000000000010e004461746146696c65577269746500969d04000000000001100057414c427566666572416363657373000000000000000000040b00666c7573682064617461000d00000000000000
    is_slow_sql             | t
    trace_id                |
    advise                  |
    net_send_time           | 0
    srt1_q                  | 14
    srt2_simple_query       | 567
    srt3_analyze_rewrite    | 565
    srt4_plan_query         | 2
    srt5_light_query        | 0
    srt6_p                  | 0
    srt7_b                  | 0
    srt8_e                  | 0
    srt9_d                  | 0
    srt10_s                 | 0
    srt11_c                 | 0
    srt12_u                 | 0
    srt13_before_query      | 59
    srt14_after_query       | 6
    rtt_unknown             | 12
    parent_query_id         | 0
    net_trans_time          | 0
    statement_detail_decode |         ---------------Wait Events Area---------------
                            | '1'     IO_EVENT        DataFileWrite                                       302486 (us)
                            | '2'     IO_EVENT        BufHashTableSearch                                   16286 (us)
                            | '3'     IO_EVENT        StrategyGetBuffer                                     2966 (us)
                            | '4'     STATUS          flush data                                              13 (us)
                            | '5'     IO_EVENT        WALBufferAccess                                          0 (us)
                            | '6'     IO_EVENT        DataFileTruncate                                         0 (us)
    
  2. 通过waitevent视图来查看相关的等待事件信息,等待最久的为IO相关的事件。

    openGauss=# select * from dbe_perf.wait_events order by total_wait_time desc;
    nodename |   type   |           event            |  wait   | failed_wait | total_wait_time | avg_wait_time | max_wait_time | min_wait_time |         last_updated
    ----------+----------+----------------------------+---------+-------------+-----------------+---------------+---------------+---------------+-------------------------------
    dn1      | IO_EVENT | DataFileRead               |   65401 |           0 |        14800532 |           226 |        131456 |            91 | 2024-10-12 10:03:10.417273+08
    dn1      | IO_EVENT | WALWrite                   |   50070 |           0 |         5831542 |           116 |         19795 |            85 | 2024-10-12 10:03:10.130683+08
    dn1      | IO_EVENT | DataFileWrite              |   25755 |           0 |         3481093 |           135 |          5476 |            90 | 2024-10-12 10:02:45.263841+08
    dn1      | IO_EVENT | BufHashTableSearch         | 2609675 |           0 |         2628273 |             1 |         14316 |             1 | 2024-10-12 10:03:36.548881+08
    dn1      | IO_EVENT | DoubleWriteFileWrite       |    1563 |           0 |          768891 |           491 |         17380 |           126 | 2024-10-12 10:03:10.416386+08
    dn1      | IO_EVENT | ControlFileWriteUpdate     |    2794 |           0 |          434213 |           155 |         15372 |            91 | 2024-10-12 10:02:45.260013+08
    dn1      | IO_EVENT | StrategyGetBuffer          |   20835 |           0 |           28848 |             1 |            39 |             1 | 2024-10-12 10:03:10.12946+08
    dn1      | IO_EVENT | SLRUWrite                  |      38 |           0 |           10501 |           276 |          1599 |           103 | 2024-10-12 09:59:44.250612+08
    dn1      | IO_EVENT | ControlFileSyncUpdate      |    2794 |           0 |            2812 |             1 |             5 |             1 | 2024-10-12 10:02:45.260013+08
    dn1      | IO_EVENT | SLRURead                   |       3 |           0 |            2197 |           732 |           795 |           673 | 2024-10-11 16:17:56.402803+08
    dn1      | IO_EVENT | ControlFileWrite           |       6 |           0 |            2160 |           360 |          1141 |           174 | 2024-10-11 10:43:27.570611+08
    dn1      | IO_EVENT | DoubleWriteFileRead        |       1 |           0 |            1174 |          1174 |          1174 |          1174 | 2024-10-11 10:43:27.238529+08
    dn1      | IO_EVENT | SLRUFlushSync              |      38 |           0 |              38 |             1 |             1 |             1 | 2024-10-12 09:59:44.250622+08
    dn1      | IO_EVENT | ControlFileSync            |       6 |           0 |               6 |             1 |             1 |             1 | 2024-10-11 10:43:27.570612+08
    dn1      | IO_EVENT | LockFileCreateSync         |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    dn1      | IO_EVENT | LockFileCreateWRITE        |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    dn1      | IO_EVENT | RelationMapRead            |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    dn1      | IO_EVENT | RelationMapSync            |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    dn1      | IO_EVENT | RelationMapWrite           |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    dn1      | IO_EVENT | ReplicationSlotRead        |       0 |           0 |               0 |             0 |             0 |             0 | 2024-10-11 10:43:25.343805+08
    ....
    
  3. 通过iostat来观察IO情况,观察所使用磁阵IO是否异常,例如sdbm磁阵盘的IO异常缓慢。

    [carrot@openGauss111 openGauss-server]$ iostat -m -x 1
    Linux 4.19.90-2003.4.0.0036.oe1.aarch64 (openGauss111)  10/12/2024      _aarch64_       (96 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            6.02    0.00   10.44    1.65    0.00   81.89
    
    Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
    dm-0             0.10      0.00     0.00   0.00   28.48    52.41   82.17      8.84     0.00   0.00   75.09   110.15    0.00      0.00     0.00   0.00    0.00     0.00    6.17  24.42
    dm-1             0.00      0.00     0.00   0.00   22.91    64.46    0.01      0.00     0.00   0.00  811.50    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.01
    dm-2             0.61      0.01     0.00   0.00   12.39    12.52   15.70      0.21     0.00   0.00   95.50    13.64    0.00      0.00     0.00   0.00    0.00     0.00    1.51  14.52
    nvme0n1          5.61      0.66     0.00   0.00    1.06   119.75   52.04      3.30     1.88   3.48    1.18    64.97    0.00      0.00     0.00   0.00    0.00     0.00    0.03   0.12
    nvme1n1       14216.11  1773.03     0.00   0.00    0.37   127.71   68.38      3.14     6.04   8.11    0.59    46.96    0.00      0.00     0.00   0.00    0.00     0.00    0.03  44.89
    nvme2n1          6.36      0.71     0.00   0.00    1.04   113.90    7.91      0.77     0.12   1.46    0.49    99.62    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.04
    nvme3n1          4.36      0.54     0.00   0.00    1.13   127.57    4.99      0.58     0.08   1.57    0.66   118.57    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.02
    sda              0.68      0.01     0.03   3.86   10.21    18.84   77.74      9.04    19.99  20.45   51.70   119.06    0.00      0.00     0.00   0.00    0.00     0.00    3.88  29.51
    sdaa            11.97      0.71     0.00   0.00    0.31    60.47    0.01      0.00     0.00   0.00    0.22    10.48    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.35
    sdab             0.31      0.31     0.00   0.00    2.62  1018.33    0.42      0.00     0.00   0.00    0.34     0.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.06
    sdac            13.57      0.54     0.00   0.00    0.28    40.52    0.01      0.00     0.00   0.00    0.22     2.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.36
    sdbi             4.67      0.32     0.00   0.00    0.31    70.57    0.00      0.00     0.01  95.00   20.82  1280.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.13
    sdbj            14.42      0.35     0.00   0.00    0.21    25.06    0.69      0.02     0.01   1.26    0.28    27.61    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.32
    sdbk             5.55      0.30     0.00   0.00    0.29    55.78    0.07      0.01     0.01  10.14    1.14   156.79    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.15
    sdbl             0.09      0.01     0.00   0.00    0.39    91.84    0.15      0.00     0.03  18.07    0.58    15.36    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.01
    sdbm             2.66      0.14     0.00   0.00    0.30    55.09    1088        52     0.03  94.91   21.79  1256.71    0.00      0.00     0.00   0.00    69.4     0.00    0.00     99
    sdbn             0.00      0.00     0.00   0.00    0.37    65.58    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
    sdce             0.00      0.00     0.00   0.00    0.27    65.58    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
    sdcf             2.87      0.05     0.00   0.00    0.26    17.83    0.01      0.00     0.00   0.00    1.33   202.57    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.08
    sdcg             0.94      0.05     0.00   0.00    0.36    51.79    0.00      0.00     0.00   0.00    2.61   930.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.03
    sdch             0.01      0.01     0.00   0.00    3.79   871.17    0.02      0.00     0.00   0.00    0.84     0.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
    sdci             0.63      0.03     0.00   0.00    0.44    45.14    0.00      0.00     0.00   0.00    0.47     1.95    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.03
    sdn             14.38      0.73     0.00   0.00    0.26    51.70    0.00      0.00     0.00   0.00    1.24     4.92    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.35
    sdo             11.02      0.71     0.00   0.00    0.28    65.61    0.02      0.02     0.00   0.00    3.28   970.80    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.30
    sdp              0.94      0.94     0.00   0.00    2.71  1023.83    0.32      0.00     0.00   0.00    0.39     0.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.14
    sdq             17.75      0.41     0.00   0.00    0.27    23.94    0.03      0.00     0.00   0.00    0.23     2.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.47
    sdr              0.00      0.00     0.00   0.00   30.06    64.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
    sdv             46.89      0.77     0.00   0.00    0.32    16.71   20.70      0.30     8.33  28.69    0.42    14.68    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.74
    sdw             16.25      0.57     0.00   0.00    0.23    36.06    8.69      0.71     0.00   0.00    0.55    83.07    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.71
    sdx              0.34      0.34     0.00   0.00    2.67  1018.91    0.24      0.00     0.00   0.00    0.36     0.50    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.06
    sdy             12.74      0.31     0.00   0.00    0.27    25.26    0.02      0.00     0.00   0.00    0.23     1.99    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.33
    sdz             24.44      0.77     0.00   0.00    0.25    32.19    0.01      0.00     0.00  35.11    0.29    14.79    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.59
    
  4. 确定为磁阵问题,登陆磁阵平台,查看磁阵IO状态、日志等,根据磁阵故障手册进一步定位。

问题根因

磁阵性能异常,磁阵网络异常或磁阵本身异常导致。

解决方案

若磁阵网络异常,则恢复网络。

若磁阵功能异常,则需要按照磁阵故障手册继续排查。

意见反馈
编组 3备份
    openGauss 2025-04-20 07:52:01
    取消