gs_cgroup

Background

When jobs are batch processed in a cluster, server loads vary significantly because of the complexity of batch processing. To make full use of cluster resources, you need to manage these loads. gs_cgroup is the load management tool provided by openGauss. It can create and delete default and user-defined Cgroups, update resource quotas and allocations, display Cgroup configuration files and the Cgroup tree, and delete all Cgroups.

gs_cgroup creates Cgroup configuration files for the OS user of a database and generates the Cgroups configured by that user in the OS. It also lets users add or delete Cgroups, update Cgroup resource quotas, allocate CPU cores or I/O resources, set exception thresholds, and specify how exceptions are handled. gs_cgroup operates only on the Cgroups of the node where it is run. To configure a cluster consistently, run the same statement on every node.
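
Because gs_cgroup acts only on the local node, the same statement has to be issued once per node. The following is a minimal sketch of such a loop and is not part of the tool. It assumes passwordless ssh between nodes; the function name run_on_nodes and the host names node1 to node3 are placeholders. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: issue one gs_cgroup statement on every node.
# Usage: run_on_nodes "<statement>" host1 host2 ...
run_on_nodes() {
    stmt=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            printf 'ssh %s %s\n' "$host" "$stmt"
        else
            # Stop at the first failing node so that nodes do not drift apart silently.
            ssh "$host" "$stmt" || return 1
        fi
    done
}

# Example: create class1 with a 40% quota on three placeholder nodes.
run_on_nodes "gs_cgroup -c -S class1 -s 40" node1 node2 node3
```

Depending on the environment, a non-interactive ssh session may not load the profile that defines GAUSSHOME, so the remote statement may need to source it first.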

For details, see “Resource Load Management” in the Developer Guide.

Examples

  • Commands executed by a common user or the database administrator:
    1. Prerequisites: The GAUSSHOME environment variable is set to the database installation directory, and user root has created the default Cgroups for common users.

    2. Create Cgroups and set their resource quotas so that database jobs can be assigned to a Cgroup and use its resources. The database administrator creates a Class Cgroup for each database user.

      1. Create class and workload Cgroups.

        gs_cgroup -c -S class1 -s 40  
        

        Create the class1 Cgroup and allocate 40% of Class resources to it.

        gs_cgroup -c -S class1 -G grp1 -g 20 
        

        Create the grp1 Workload Cgroup under the class1 Cgroup and allocate 20% of class1 Cgroup resources to the Workload Cgroup.

      2. Delete the created grp1 Cgroup and class1 Cgroup.

        gs_cgroup -d -S class1 -G grp1
        

        Delete the created grp1 Cgroup.

        gs_cgroup -d -S class1
        

        Delete the created class1 Cgroup.

        NOTICE: If a Class Cgroup is deleted, its Workload Cgroups will be deleted as well.

    3. Update the resource quota for created Cgroups.

      1. Update dynamic resource quota.

        gs_cgroup -u -S class1 -G grp1 -g 30
        

        Update the resources allocated to the grp1 Workload Cgroup under the class1 Cgroup for the current user to 30% of class1 resources.

      2. Update the resource limitation quota.

        gs_cgroup --fixed -u -S class1 -G grp1 -g 30
        

        Set the number of CPU cores allocated to the grp1 Cgroup to 30% of cores allocated to its parent Cgroup class1.

    4. Update the range of the CPU cores in the Gaussdb Cgroup.

      gs_cgroup -u -T Gaussdb -f 0-20
      

      Set the range of CPU cores used by the GaussDB process to 0–20.

      NOTE: The -f parameter can only be used to set the range of the CPU cores in the Gaussdb Cgroup. For other Cgroups, if you need to set the number of cores, use the --fixed parameter.

    5. Set exception handling information. (The class:wg Cgroup must exist.)

      1. Terminate a job under the class:wg Cgroup when the job has been blocked for 1200s or has been executing for 2400s.

        gs_cgroup -S class -G wg -E "blocktime=1200,elapsedtime=2400" -a
        
      2. Terminate a job under the class:wg Cgroup when the size of its spilled data reaches 256 MB or the size of its broadcast data reaches 100 MB.

        gs_cgroup -S class -G wg -E "spillsize=256,broadcastsize=100" -a
        
      3. Demote a job under the Class Cgroup when the total CPU time taken to execute the job on all nodes reaches 100s.

        gs_cgroup -S class -E "allcputime=100" --penalty
        
      4. Demote a job under the Class Cgroup when the total time taken to execute the job on all nodes reaches 2400s and the skew of the CPU time reaches 90 percent.

        gs_cgroup -S class -E "qualificationtime=2400,cpuskewpercent=90"
        

        NOTICE:

        To set exception handling information for a Cgroup, ensure that the Cgroup has been created. Multiple specified thresholds are separated by commas (,). If no operation is specified, --penalty is used by default.

    6. Set the range of CPU cores for a Cgroup.

      Set the range of cores for the class:wg Cgroup to 20% of Class cores.

      gs_cgroup -S class -G wg -g 20 --fixed -u
      

      NOTICE: The range of cores for the Class or Workload Cgroup must be specified by the --fixed parameter.

    7. Roll back the previous step.

      gs_cgroup --recover
      

      NOTE: The --recover parameter can only roll back the latest addition, deletion, or modification made to the Class and Workload Cgroups.

    8. View information about Cgroups that have been created.

      1. View Cgroup information in configuration files.

        gs_cgroup -p 
        

        Cgroup configuration

        gs_cgroup -p
        
        Top Group information is listed:
        GID:   0 Type: Top    Percent(%): 1000( 50) Name: Root                  Cores: 0-47
        GID:   1 Type: Top    Percent(%):  833( 83) Name: Gaussdb:omm           Cores: 0-20
        GID:   2 Type: Top    Percent(%):  333( 40) Name: Backend               Cores: 0-20
        GID:   3 Type: Top    Percent(%):  499( 60) Name: Class                 Cores: 0-20
        
        Backend Group information is listed:
        GID:   4 Type: BAKWD  Name: DefaultBackend   TopGID:   2 Percent(%): 266(80) Cores: 0-20
        GID:   5 Type: BAKWD  Name: Vacuum           TopGID:   2 Percent(%):  66(20) Cores: 0-20
        
        Class Group information is listed:
        GID:  20 Type: CLASS  Name: DefaultClass     TopGID:   3 Percent(%): 166(20) MaxLevel: 1 RemPCT: 100 Cores: 0-20
        GID:  21 Type: CLASS  Name: class1           TopGID:   3 Percent(%): 332(40) MaxLevel: 2 RemPCT:  70 Cores: 0-20
        
        Workload Group information is listed:
        GID:  86 Type: DEFWD  Name: grp1:2           ClsGID:  21 Percent(%):  99(30) WDLevel:  2 Quota(%): 30 Cores: 0-5
        
        Timeshare Group information is listed:
        GID: 724 Type: TSWD   Name: Low              Rate: 1
        GID: 725 Type: TSWD   Name: Medium           Rate: 2
        GID: 726 Type: TSWD   Name: High             Rate: 4
        GID: 727 Type: TSWD   Name: Rush             Rate: 8
        
        Group Exception information is listed:
        GID:  20 Type: EXCEPTION Class: DefaultClass
        PENALTY: QualificationTime=1800 CPUSkewPercent=30
        
        GID:  21 Type: EXCEPTION Class: class1
        PENALTY: AllCpuTime=100 QualificationTime=2400 CPUSkewPercent=90
        
        GID:  86 Type: EXCEPTION Group: class1:grp1:2
        ABORT: BlockTime=1200 ElapsedTime=2400
        

        Table 1 lists the Cgroup configuration shown in the above example.

        Table 1 Cgroup configuration

        | GID | Type | Name | Percentage (%) | Remarks |
        | --- | --- | --- | --- | --- |
        | 0 | Top Cgroup | Root | 1000 indicates that the total system resources are divided into 1000 pieces. 50 in the parentheses indicates 50% of I/O resources. openGauss does not control I/O resources through Cgroups. Therefore, the following Cgroup information is only about CPU. | - |
        | 1 | Top Cgroup | Gaussdb:omm | Only one database program runs in a cluster. The default quota of the Gaussdb:omm Cgroup is 833. That is, the ratio of database programs to non-database programs is 5:1 (833:167). | - |
        | 2 | Top Cgroup | Backend | The number 40 in the parentheses indicates that the Backend Cgroup takes up 40% of the resources of the Gaussdb:dbuser Cgroup. | - |
        | 3 | Top Cgroup | Class | The number 60 in the parentheses indicates that the Class Cgroup takes up 60% of the resources of the Gaussdb:dbuser Cgroup. | - |
        | 4 | Backend Cgroup | DefaultBackend | The number 80 in the parentheses indicates the percentage of Backend Cgroup resources taken by the DefaultBackend group. | TopGID: specifies the GID (2) of the Backend group in the Top Cgroup. |
        | 5 | Backend Cgroup | Vacuum | The number 20 in the parentheses indicates the percentage of Backend Cgroup resources taken by the Vacuum group. | See the DefaultBackend row. |
        | 20 | Class Cgroup | DefaultClass | The number 20 in the parentheses indicates that the DefaultClass Cgroup takes up 20% of the Class Cgroup resources. There are only two Class Cgroups currently. Therefore, the system resource quotas for the Class Cgroups (499) are allocated in the ratio of 20:40 (166:332). | TopGID: GID (3) of the Class Cgroups in a Top Cgroup to which the DefaultClass and class1 Cgroups belong. MaxLevel: maximum number of levels for Workload Cgroups in a Class Cgroup. This parameter is set to 1 for DefaultClass because it has no Workload Cgroups. RemPCT: percentage of remaining resources in a Class Cgroup after its resources are allocated to Workload Cgroups. For example, the percentage of remaining resources in the class1 Cgroup is 70%. |
        | 21 | Class Cgroup | class1 | The number 40 in the parentheses indicates that the class1 Cgroup takes up 40% of the Class Cgroup resources. | See the DefaultClass row. |
        | 86 | Workload Cgroup | grp1:2 (This name is composed of the name of a Workload Cgroup and its level in the Class Cgroup. This grp1:2 Cgroup is the first Workload Cgroup under the class1 Cgroup, and its level is 2. Each Class Cgroup contains a maximum of 10 levels of Workload Cgroups.) | In this example, this Workload Cgroup takes up 30% of class1 Cgroup resources (332 x 30% = 99). | ClsGID: GID of the class1 Cgroup to which this Workload Cgroup belongs. WDLevel: level of this Workload Cgroup in the corresponding Class Cgroup. |
        | 724 | Timeshare Cgroup | Low | - | Rate: rate of resources allocated to a Timeshare Cgroup. The Low Cgroup has the minimum rate 1 and the Rush Cgroup has the maximum rate 8. The resource rate for Rush:High:Medium:Low is 8:4:2:1 under a Timeshare Cgroup. |
        | 725 | Timeshare Cgroup | Medium | - | See the Low row. |
        | 726 | Timeshare Cgroup | High | - | See the Low row. |
        | 727 | Timeshare Cgroup | Rush | - | See the Low row. |

      2. View the Cgroup tree in the OS.

        gs_cgroup -P displays a Cgroup tree. In the tree, shares indicates the value of cpu.shares, which specifies the dynamic quota of CPU resources in the OS, and cpus indicates the value of cpuset.cpus, which specifies the dynamic quota of CPUSET resources in the OS (number of cores that a Cgroup can use).

        gs_cgroup -P
        Mount Information:
        cpu:/dev/cgroup/cpu
        blkio:/dev/cgroup/blkio
        cpuset:/dev/cgroup/cpuset
        cpuacct:/dev/cgroup/cpuacct
        
        Group Tree Information:
        - Gaussdb:wangrui (shares: 5120, cpus: 0-20, weight: 1000)
                - Backend (shares: 4096, cpus: 0-20, weight: 400)
                        - Vacuum (shares: 2048, cpus: 0-20, weight: 200)
                        - DefaultBackend (shares: 8192, cpus: 0-20, weight: 800)
                - Class (shares: 6144, cpus: 0-20, weight: 600)
                        - class1 (shares: 4096, cpus: 0-20, weight: 400)
                                - RemainWD:1 (shares: 1000, cpus: 0-20, weight: 100)
                                        - RemainWD:2 (shares: 7000, cpus: 0-20, weight: 700)
                                                - Timeshare (shares: 1024, cpus: 0-20, weight: 500)
                                                        - Rush (shares: 8192, cpus: 0-20, weight: 800)
                                                        - High (shares: 4096, cpus: 0-20, weight: 400)
                                                        - Medium (shares: 2048, cpus: 0-20, weight: 200)
                                                        - Low (shares: 1024, cpus: 0-20, weight: 100)
                                        - grp1:2 (shares: 3000, cpus: 0-5, weight: 300)
                                - TopWD:1 (shares: 9000, cpus: 0-20, weight: 900)
                        - DefaultClass (shares: 2048, cpus: 0-20, weight: 200)
                                - RemainWD:1 (shares: 1000, cpus: 0-20, weight: 100)
                                        - Timeshare (shares: 1024, cpus: 0-20, weight: 500)
                                                - Rush (shares: 8192, cpus: 0-20, weight: 800)
                                                - High (shares: 4096, cpus: 0-20, weight: 400)
                                                - Medium (shares: 2048, cpus: 0-20, weight: 200)
                                                - Low (shares: 1024, cpus: 0-20, weight: 100)
                                - TopWD:1 (shares: 9000, cpus: 0-20, weight: 900)
        

Parameter Description

  • -a [--abort]

    Terminates a job when it exceeds an exception threshold.

  • -b pct

    Specifies the percentage of resources of the Top Backend Cgroup taken by a Backend Cgroup. The -B backendname parameter must be specified as well.

    Value Range

    • The value ranges from 1 to 99. By default, the CPU quota is 20% for the Vacuum Cgroup and 80% for the DefaultBackend Cgroup. The quota sum for the Vacuum and DefaultBackend Cgroups must be less than 100%.
  • -B name

    Specifies the name of a Backend Cgroup. Only the -u parameter can be used to change the resource quota for this Cgroup.

    The -b percent and -B backendname parameters need to be specified to set the resource proportion of database backend threads.

    Value range: a string with a maximum of 64 bytes.

  • -c

    Creates a Cgroup and specifies its name.

    A common user can specify -c and -S classname to create a Class Cgroup. If -G groupname is specified as well, a Workload Cgroup will be created under the Class Cgroup. The Workload Cgroup is at the bottom layer in the Class Cgroup. (Layer 4 is the bottom layer.)

  • -d

    Deletes Cgroups.

    A common user can specify the -d and -S classname parameters to delete a created Class Cgroup. If the -G groupname parameter is specified as well, the Workload Cgroup under the Class Cgroup is deleted, and related threads are put into the DefaultClass:DefaultWD:1 Cgroup. If the Workload Cgroup to be deleted is at a high level (level 1 is the top level), the hierarchy of the lower-level Cgroups is adjusted, and the related threads are loaded into the newly created Cgroups.

  • -E data

    Specifies the exception thresholds, including blocktime, elapsedtime, allcputime, spillsize, broadcastsize, qualificationtime, and cpuskewpercent. The thresholds are separated by commas (,). The value 0 cancels a setting. If a threshold is set to an invalid value, an error is reported.

    Table 2 Exception threshold types

    | Exception Threshold Type | Description | Value Range (0 Indicates Setting Canceled) | Operation upon Exception |
    | --- | --- | --- | --- |
    | blocktime | Job blocking duration. The unit is second. blocktime includes the total time spent in global and local concurrent queuing. | 0–UINT_MAX | abort |
    | elapsedtime | Execution time of a job that has not been finished. The unit is second. The time indicates the duration from the start point of execution to the current time point. | 0–UINT_MAX | abort |
    | allcputime | Total CPU time spent in executing a job on all nodes. The unit is second. | 0–UINT_MAX | abort, penalty |
    | cpuskewpercent | CPU time skew of a job executed on nodes. The value depends on the setting of qualificationtime. | 0–100 | abort, penalty |
    | qualificationtime | Interval for checking the CPU skew. The unit is second. This parameter must be set together with cpuskewpercent. | 0–UINT_MAX | none |
    | spillsize | Amount of job data spilled to disks on nodes. The unit is MB. | 0–UINT_MAX | abort |
    | broadcastsize | Size of broadcast operators of a job on nodes. The unit is MB. | 0–UINT_MAX | abort |

  • -h [--help]

    Displays the command help information.

  • -H

    Specifies the $GAUSSHOME information of the current user.

    Value range: a string with a maximum of 1023 characters.

  • -f

    Specifies the range of CPU cores used by the Gaussdb Cgroup. The range can be written as a-b or as a single core a. For other Cgroups, use the --fixed parameter to set the range of cores.

  • --fixed

    Specifies the percentage of the cores allocated to a Cgroup's parent group that the Cgroup can use, or specifies the I/O resources.

    To set the core range ratio, use --fixed together with -s, -g, -t, or -b.

    The ratio ranges from 0 to 100. The sum of the ratios of Cgroups at the same level must be less than or equal to 100. The value 0 indicates that a Cgroup has the same number of cores as its upper level. By default, the ratio for all Cgroups is 0. -f and --fixed cannot be configured at the same time. After --fixed is set, the range set by -f automatically becomes invalid. The ratio is displayed in the -p output as the quota value.

    To set the I/O resource quota, use --fixed together with -R, -r, -W, or -w.

  • -g pct

    Specifies the percentage of resources in a Class Cgroup taken by a Workload Cgroup. The -G groupname parameter needs to be specified as well. The -g pct parameter can be used with the -c parameter to create a Cgroup or with the -u parameter to update a Workload Cgroup.

    Value range: 1 to 99. By default, the CPU quota of a Workload Cgroup is 20%. The sum of CPU quotas for all Workload Cgroups must be less than 99%.

  • -G name

    Specifies the name of a Workload Cgroup. The -S classname parameter needs to be set to specify the Class Cgroup to which the Workload Cgroup belongs. The -G name parameter can be used with -c to create a Cgroup, with -d to delete a Cgroup, and with -u to update the resource quota for a Cgroup. Note that name cannot be a default Timeshare Cgroup name (Low, Medium, High, or Rush).

    If a user creates a Workload Cgroup, the name cannot contain colons (:). Cgroup names must be unique.

    Value range: a string with a maximum of 28 bytes.

  • -N [--group] name

    Specifies the Cgroup name in the short form class:wg.

  • -p

    Shows information about Cgroup configuration files.

  • -P

    Shows the structure of the Cgroup tree.

  • --penalty

    Demotes a job when the job exceeds an exception threshold. If no operation is specified, --penalty is used by default.

  • -r data

    Updates only the upper limit of the data read rate for I/O resources, that is, sets the value of blkio.throttle.read_bps_device. The parameter is a string in the form major:minor value, where major is the major device number of the disk to be accessed, minor is the minor device number, and value is the upper limit of bytes read per second. The upper limit ranges from 0 to ULONG_MAX, and 0 indicates that reads are not restricted. This parameter needs to be used with the -u parameter and Cgroup names. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is applied to the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • -R data

    Updates only the upper limit of read operations per second for I/O resources, that is, sets the value of blkio.throttle.read_iops_device. The value format is the same as that of the -r parameter. This parameter needs to be used with the -u parameter and Cgroup names. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is applied to the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • --recover

    Rolls back only the latest addition, deletion, or modification made to the Class and Workload Cgroups.

  • --revert

    Restores to the default status of the Cgroup.

  • -D mpoint

    Specifies a mount point. The default mount point is /dev/cgroup/subsystem.

  • -m

    Mounts the Cgroup.

  • -M

    Unmounts the Cgroup.

  • -U

    Specifies the database username.

  • --refresh

    Updates the status of the Cgroup.

  • -s pct

    Specifies the percentage of resources in the top Class Cgroup taken by a Class Cgroup. The -S classname parameter needs to be specified as well. The -s pct parameter can be used with the -c parameter to create a Cgroup or with the -u parameter to update a Class Cgroup.

    Value range: 1 to 99. By default, the CPU quota of the Class Cgroup is set to 20%. In R6C10, the CPU quota of the Class Cgroup is set to 40%. During the upgrade, the quota is not updated. The sum of the CPU quota of the newly created Class Cgroup and the default DefaultClass quota must be less than 100%.

  • -S name

    Specifies the name of a Class Cgroup. This parameter can be used with -c to create a Cgroup, with -d to delete a Cgroup, or with -u to update resource quota for a Cgroup. The name of a sub-Class Cgroup cannot contain the colon (:).

    Value range: a string with a maximum of 31 bytes.

  • -t percent

    Specifies the percentage of resources for top Cgroups (Root, Gaussdb:omm, Backend, and Class Cgroups). The -T name parameter needs to be specified as well.

    • For the Root Cgroup (-T Root; the name shown in the Cgroup configuration file is Root), percent indicates the percentage of the value of blkio.weight, and its minimum value is 10%. The CPU resource quota, such as the value of cpu.shares, cannot be changed.
    • For the Gaussdb:omm Cgroup, percent indicates the percentage of CPU resources taken by the Gaussdb:omm Cgroup. (The cpu.shares value for the Gaussdb:omm Cgroup can be obtained based on the quota 1024 for the Root Cgroup and the condition that only one database is available for the current system.) The I/O resource quota is 1000 and will not change.
    • For the Class or Backend Cgroup, percent indicates the percentage of resources in the Gaussdb Cgroup taken by the Class or Backend Cgroup.

    Value range: 1 to 99. By default, the quota of the Class Cgroup is 60%, and the quota of the Backend Cgroup is 40%. When the quota of the Class Cgroup is modified, the quota of the Backend Cgroup is updated automatically so that the two quotas sum to 100%.

  • -T name

    Specifies the names of top Cgroups.

    Value range: a string with a maximum of 64 bytes.

  • -u

    Updates Cgroups.

  • -V [--version]

    Displays version information about the gs_cgroup tool.

  • -w data

    Updates only the upper limit of the data write rate for I/O resources, that is, sets the value of blkio.throttle.write_bps_device. The value format is the same as that of the -r parameter, with value being the upper limit of bytes written per second. The -u parameter and the Cgroup name need to be specified as well. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is applied to the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.

  • -W data

    Updates only the upper limit of write operations per second for I/O resources, that is, sets the value of blkio.throttle.write_iops_device. The value format is the same as that of the -r parameter. The -u parameter and the Cgroup name need to be specified as well. If both the Class Cgroup name and Workload Cgroup name are specified, this parameter is applied to the Workload Cgroup.

    Value range: a string with a maximum of 32 characters.
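
Before a threshold list is passed to -E, it can be checked against Table 2. The sketch below is a hypothetical pre-check and not something gs_cgroup provides. It accepts only the seven threshold names from Table 2, requires values made of digits, limits cpuskewpercent to 0–100, and requires cpuskewpercent and qualificationtime to appear together. The UINT_MAX upper bound is not checked.

```shell
# Hypothetical pre-check for a -E threshold list such as
# "blocktime=1200,elapsedtime=2400". Returns 0 if the list looks valid.
check_thresholds() {
    has_skew=0
    has_qual=0
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas into $1, $2, ...
    IFS=$old_ifs
    [ "$#" -gt 0 ] || return 1
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # Values must be non-empty strings of digits.
        case $value in
            ''|*[!0-9]*) echo "bad value: $pair" >&2; return 1 ;;
        esac
        case $name in
            blocktime|elapsedtime|allcputime|spillsize|broadcastsize) ;;
            qualificationtime) has_qual=1 ;;
            cpuskewpercent)
                has_skew=1
                [ "$value" -le 100 ] || { echo "cpuskewpercent > 100" >&2; return 1; }
                ;;
            *) echo "unknown threshold: $name" >&2; return 1 ;;
        esac
    done
    # qualificationtime and cpuskewpercent must be set as a pair.
    [ "$has_skew" = "$has_qual" ] || { echo "set qualificationtime with cpuskewpercent" >&2; return 1; }
    return 0
}

# Possible use, reusing a command from the Examples section:
#   check_thresholds "spillsize=256,broadcastsize=100" &&
#       gs_cgroup -S class -G wg -E "spillsize=256,broadcastsize=100" -a
```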

NOTE:

Use the following method to obtain the major:minor value of a disk for the -r, -R, -w, and -W parameters. For example, obtain the device number of the disk corresponding to the /mpp directory.

> df 
Filesystem      1K-blocks      Used  Available Use% Mounted on
/dev/sda1       524173248  41012784  456534008   9% /
devtmpfs         66059264       236   66059028   1% /dev
tmpfs            66059264        88   66059176   1% /dev/shm
/dev/sdb1      2920486864 135987592 2784499272   5% /data
/dev/sdc1      2920486864  24747868 2895738996   1% /data1
/dev/sdd1      2920486864  24736704 2895750160   1% /mpp
/dev/sde1      2920486864  24750068 2895736796   1% /mpp1
> ls -l /dev/sdd
brw-rw---- 1 root disk 8, 48 Feb 26 11:20 /dev/sdd

NOTICE:

Check the device number of sdd rather than sdd1. Otherwise, an error will be reported. If the I/O quota string exceeds the maximum allowed string length after an update, the update is not saved in the configuration file. For example, if the maximum string length is 96 and the I/O resources of more than eight disks are updated, the limit may be exceeded. In that case, the update succeeds but is not saved in the configuration file.
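
The lookup in the NOTE above can be scripted. The sketch below is illustrative only. The function names are made up, it parses ls -l output in the format shown above, and it strips a trailing partition number with a simple pattern that suits sdX-style names but not names such as nvme0n1p1.

```shell
# Strip a trailing partition number: /dev/sdd1 -> /dev/sdd.
# Assumption: sdX-style names; NVMe-style names need different handling.
whole_disk() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Read one `ls -l` line for a block device on stdin and print major:minor.
# The 5th field is "major," and the 6th field is "minor".
major_minor() {
    awk '{ sub(/,$/, "", $5); print $5 ":" $6 }'
}

disk=$(whole_disk /dev/sdd1)
echo "$disk"    # /dev/sdd
echo 'brw-rw---- 1 root disk 8, 48 Feb 26 11:20 /dev/sdd' | major_minor    # 8:48
# On a live system: ls -l "$disk" | major_minor
```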
