窗口函数

窗口函数

列存表目前只支持rank(expression)和row_number(expression)两个函数。

窗口函数与OVER语句一起使用。OVER语句用于对数据进行分组,并对组内元素进行排序。窗口函数用于给组内的值生成序号。

示例:

--创建表。
openGauss=# CREATE TABLE t2 (c1 INT, c2 INT);
--导入数据。
openGauss=# INSERT INTO t2 SELECT generate_series, generate_series FROM generate_series(1, 1000000);
--执行查询,查询正常无报错,且执行结果正确。
openGauss=# EXPLAIN ANALYZE SELECT nc1 FROM (
SELECT row_number() over() rid,
t2.c1 nc1
FROM t2
) WHERE rid BETWEEN 1 AND (1 + 10 - 1);
                                        QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
Subquery Scan on __unnamed_subquery__  (cost=0.00..0.42 rows=3 width=4) (actual time=0.201..0.228 rows=10 loops=1)
  Filter: (__unnamed_subquery__.rid >= 1)
  ->  WindowAgg  (cost=0.00..0.30 rows=10 width=4) (actual time=0.191..0.211 rows=10 loops=1)
        row_number_filter: (row_number() OVER () <= 10)
        ->  Seq Scan on t2  (cost=0.00..11977.45 rows=817445 width=4) (actual time=0.150..0.153 rows=11 loops=1)
Total runtime: 0.539 ms
(6 rows)
--环境清理,避免数据泄露。
openGauss=# DROP TABLE t2;

说明:

窗口函数中的order by后面必须跟字段名,若order by后面跟数字,该数字会被按照常量处理,因此对目标列没有起到排序的作用。

  • RANK()

    描述:RANK函数为各组内值生成跳跃排序序号,其中,相同的值具有相同序号。

    返回值类型:BIGINT

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, rank() OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
       d_moy | d_fy_week_seq | rank 
    -------+---------------+------
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             2 |    8
         1 |             2 |    8
         1 |             2 |    8
         1 |             2 |    8
         1 |             2 |    8
         1 |             2 |    8
         1 |             2 |    8
         1 |             3 |   15
         1 |             3 |   15
         1 |             3 |   15
         1 |             3 |   15
         1 |             3 |   15
         1 |             3 |   15
         1 |             3 |   15
         1 |             4 |   22
         1 |             4 |   22
         1 |             4 |   22
         1 |             4 |   22
         1 |             4 |   22
         1 |             4 |   22
         1 |             4 |   22
         1 |             5 |   29
         1 |             5 |   29
         2 |             5 |    1
         2 |             5 |    1
         2 |             5 |    1
         2 |             5 |    1
         2 |             5 |    1
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
    (42 rows)
    
  • ROW_NUMBER()

    描述:ROW_NUMBER函数为各组内值生成连续排序序号,其中,相同的值其序号也不相同。

    返回值类型:BIGINT

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, Row_number() OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim  WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq | row_number 
    -------+---------------+------------
         1 |             1 |          1
         1 |             1 |          2
         1 |             1 |          3
         1 |             1 |          4
         1 |             1 |          5
         1 |             1 |          6
         1 |             1 |          7
         1 |             2 |          8
         1 |             2 |          9
         1 |             2 |         10
         1 |             2 |         11
         1 |             2 |         12
         1 |             2 |         13
         1 |             2 |         14
         1 |             3 |         15
         1 |             3 |         16
         1 |             3 |         17
         1 |             3 |         18
         1 |             3 |         19
         1 |             3 |         20
         1 |             3 |         21
         1 |             4 |         22
         1 |             4 |         23
         1 |             4 |         24
         1 |             4 |         25
         1 |             4 |         26
         1 |             4 |         27
         1 |             4 |         28
         1 |             5 |         29
         1 |             5 |         30
         2 |             5 |          1
         2 |             5 |          2
         2 |             5 |          3
         2 |             5 |          4
         2 |             5 |          5
         2 |             6 |          6
         2 |             6 |          7
         2 |             6 |          8
         2 |             6 |          9
         2 |             6 |         10
         2 |             6 |         11
         2 |             6 |         12
    (42 rows)
    
  • DENSE_RANK()

    描述:DENSE_RANK函数为各组内值生成连续排序序号,其中,相同的值具有相同序号。

    返回值类型:BIGINT

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, dense_rank() OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq | dense_rank 
    -------+---------------+------------
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             2 |          2
         1 |             2 |          2
         1 |             2 |          2
         1 |             2 |          2
         1 |             2 |          2
         1 |             2 |          2
         1 |             2 |          2
         1 |             3 |          3
         1 |             3 |          3
         1 |             3 |          3
         1 |             3 |          3
         1 |             3 |          3
         1 |             3 |          3
         1 |             3 |          3
         1 |             4 |          4
         1 |             4 |          4
         1 |             4 |          4
         1 |             4 |          4
         1 |             4 |          4
         1 |             4 |          4
         1 |             4 |          4
         1 |             5 |          5
         1 |             5 |          5
         2 |             5 |          1
         2 |             5 |          1
         2 |             5 |          1
         2 |             5 |          1
         2 |             5 |          1
         2 |             6 |          2
         2 |             6 |          2
         2 |             6 |          2
         2 |             6 |          2
         2 |             6 |          2
         2 |             6 |          2
         2 |             6 |          2
    (42 rows)
    
  • PERCENT_RANK()

    描述:PERCENT_RANK函数为各组内对应值生成相对序号,即根据公式 (rank - 1) / (total rows - 1)计算所得的值。其中rank为该值依据RANK函数所生成的对应序号,totalrows为该分组内的总元素个数。

    返回值类型:DOUBLE PRECISION

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, percent_rank() OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq |   percent_rank   
    -------+---------------+------------------
         1 |             1 |                0
         1 |             1 |                0
         1 |             1 |                0
         1 |             1 |                0
         1 |             1 |                0
         1 |             1 |                0
         1 |             1 |                0
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             2 | .241379310344828
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             3 | .482758620689655
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             4 | .724137931034483
         1 |             5 |  .96551724137931
         1 |             5 |  .96551724137931
         2 |             5 |                0
         2 |             5 |                0
         2 |             5 |                0
         2 |             5 |                0
         2 |             5 |                0
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
         2 |             6 | .454545454545455
    (42 rows)
    
  • CUME_DIST()

    描述:CUME_DIST函数为各组内对应值生成累积分布序号。即根据公式(小于等于当前值的数据行数)/(该分组总行数totalrows)计算所得的相对序号。

    返回值类型:DOUBLE PRECISION

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, cume_dist() OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim e_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq |    cume_dist     
    -------+---------------+------------------
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             1 | .233333333333333
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             2 | .466666666666667
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             3 |               .7
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             4 | .933333333333333
         1 |             5 |                1
         1 |             5 |                1
         2 |             5 | .416666666666667
         2 |             5 | .416666666666667
         2 |             5 | .416666666666667
         2 |             5 | .416666666666667
         2 |             5 | .416666666666667
         2 |             6 |                1
         2 |             6 |                1
         2 |             6 |                1
         2 |             6 |                1
         2 |             6 |                1
         2 |             6 |                1
         2 |             6 |                1
    (42 rows)
    
  • NTILE(num_buckets integer)

    描述:NTILE函数根据num_buckets integer将有序的数据集合平均分配到num_buckets所指定数量的桶中,并将桶号分配给每一行。分配时应尽量做到平均分配。

    返回值类型:INTEGER

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, ntile(3) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq | ntile 
    -------+---------------+-------
         1 |             1 |     1
         1 |             1 |     1
         1 |             1 |     1
         1 |             1 |     1
         1 |             1 |     1
         1 |             1 |     1
         1 |             1 |     1
         1 |             2 |     1
         1 |             2 |     1
         1 |             2 |     1
         1 |             2 |     2
         1 |             2 |     2
         1 |             2 |     2
         1 |             2 |     2
         1 |             3 |     2
         1 |             3 |     2
         1 |             3 |     2
         1 |             3 |     2
         1 |             3 |     2
         1 |             3 |     2
         1 |             3 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             4 |     3
         1 |             5 |     3
         1 |             5 |     3
         2 |             5 |     1
         2 |             5 |     1
         2 |             5 |     1
         2 |             5 |     1
         2 |             5 |     2
         2 |             6 |     2
         2 |             6 |     2
         2 |             6 |     2
         2 |             6 |     3
         2 |             6 |     3
         2 |             6 |     3
         2 |             6 |     3
    (42 rows)
    
  • LAG(value any [, offset integer [, default any ]])

    描述:LAG函数为各组内对应值生成滞后值。即当前值对应的行数往前偏移offset位后所得行的value值作为序号。若经过偏移后行数不存在,则对应结果取为default值。若无指定,在默认情况下,offset取为1,default值取为NULL。

    返回值类型:与参数数据类型相同。

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, lag(d_moy,3,null) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
     d_moy | d_fy_week_seq | lag 
    -------+---------------+-----
         1 |             1 |    
         1 |             1 |    
         1 |             1 |    
         1 |             1 |   1
         1 |             1 |   1
         1 |             1 |   1
         1 |             1 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             2 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             3 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             4 |   1
         1 |             5 |   1
         1 |             5 |   1
         2 |             5 |    
         2 |             5 |    
         2 |             5 |    
         2 |             5 |   2
         2 |             5 |   2
         2 |             6 |   2
         2 |             6 |   2
         2 |             6 |   2
         2 |             6 |   2
         2 |             6 |   2
         2 |             6 |   2
         2 |             6 |   2
    (42 rows)
    
  • LEAD(value any [, offset integer [, default any ]])

    描述:LEAD函数为各组内对应值生成提前值。即当前值对应的行数向后偏移offset位后所得行的value值作为序号。若经过向后偏移后行数超过当前组内的总行数,则对应结果取为default值。若无指定,在默认情况下,offset取为1,default值取为NULL。

    返回值类型:与参数数据类型相同。

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, lead(d_fy_week_seq,2) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM  tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2;
    d_moy | d_fy_week_seq | lead
    -------+---------------+------
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    1
         1 |             1 |    2
         1 |             1 |    2
         1 |             2 |    2
         1 |             2 |    2
         1 |             2 |    2
         1 |             2 |    2
         1 |             2 |    2
         1 |             2 |    3
         1 |             2 |    3
         1 |             3 |    3
         1 |             3 |    3
         1 |             3 |    3
         1 |             3 |    3
         1 |             3 |    3
         1 |             3 |    4
         1 |             3 |    4
         1 |             4 |    4
         1 |             4 |    4
         1 |             4 |    4
         1 |             4 |    4
         1 |             4 |    4
         1 |             4 |    5
         1 |             4 |    5
         1 |             5 |     
         1 |             5 |     
         2 |             5 |    5
         2 |             5 |    5
         2 |             5 |    5
         2 |             5 |    6
         2 |             5 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |    6
         2 |             6 |     
         2 |             6 |     
    (42 rows)
    
  • FIRST_VALUE(value any)

    描述:FIRST_VALUE函数取各组内的第一个值作为返回结果。

    返回值类型:与参数数据类型相同。

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, first_value(d_fy_week_seq) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 7 ORDER BY 1,2; 
     d_moy | d_fy_week_seq | first_value 
    -------+---------------+-------------
         1 |             1 |           1
         1 |             1 |           1
         1 |             1 |           1
         1 |             1 |           1
         1 |             1 |           1
         1 |             1 |           1
         1 |             1 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             2 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             3 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             4 |           1
         1 |             5 |           1
         1 |             5 |           1
         2 |             5 |           5
         2 |             5 |           5
         2 |             5 |           5
         2 |             5 |           5
         2 |             5 |           5
         2 |             6 |           5
         2 |             6 |           5
         2 |             6 |           5
         2 |             6 |           5
         2 |             6 |           5
         2 |             6 |           5
         2 |             6 |           5
    (42 rows)
    
  • LAST_VALUE(value any)

    描述:LAST_VALUE函数取各组内的最后一个值作为返回结果。

    返回值类型:与参数数据类型相同。

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, last_value(d_moy) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 6 ORDER BY 1,2;
      d_moy | d_fy_week_seq | last_value 
    -------+---------------+------------
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             1 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             2 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             3 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             4 |          1
         1 |             5 |          1
         1 |             5 |          1
         2 |             5 |          2
         2 |             5 |          2
         2 |             5 |          2
         2 |             5 |          2
         2 |             5 |          2
    (35 rows)
    
  • NTH_VALUE(value any, nth integer)

    描述:NTH_VALUE函数返回该组内的第nth行作为结果。若该行不存在,则默认返回NULL。

    返回值类型:与参数数据类型相同。

    示例:

    openGauss=# SELECT d_moy, d_fy_week_seq, nth_value(d_fy_week_seq,6) OVER(PARTITION BY d_moy ORDER BY d_fy_week_seq) FROM tpcds.date_dim WHERE d_moy < 4 AND d_fy_week_seq < 6 ORDER BY 1,2;
     d_moy | d_fy_week_seq | nth_value 
    -------+---------------+-----------
         1 |             1 |         1
         1 |             1 |         1
         1 |             1 |         1
         1 |             1 |         1
         1 |             1 |         1
         1 |             1 |         1
         1 |             1 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             2 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             3 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             4 |         1
         1 |             5 |         1
         1 |             5 |         1
         2 |             5 |          
         2 |             5 |          
         2 |             5 |          
         2 |             5 |          
         2 |             5 |          
    (35 rows)
    
  • row_count()

    描述:上一条语句影响的行数。

    • 若上次执行的是DDL语句,比如create table或者drop语句,会返回0。
    • 对于insert/update/delete语句,以及其他会涉及元组修改的语句,会返回具体影响的行数。
    • 对于select语句,若为select into, 会返回导出的元组数量,反之若语句不涉及修改则返回-1。

    返回值类型:BIGINT

    示例:

    td_db=# insert into test_any_value values(1,1),(2,1),(3,2),(4,2);
    INSERT 0 4
    td_db=# select row_count();
    row_count
    -----------
            4
    (1 row)
    
意见反馈
编组 3备份
    openGauss 2024-12-23 00:51:56
    取消