Dynamic Data Anonymization

Availability

This feature is available as of V500R001C00.

Introduction

Data anonymization is an effective way to protect privacy in a database: it prevents attackers from snooping on private data. Dynamic data anonymization protects private data through customizable anonymization policies, preventing unauthorized users from viewing sensitive information while leaving the stored data unchanged. After the administrator specifies the objects to be anonymized and defines an anonymization policy, any query that touches database resources associated with that policy returns data anonymized according to the user's identity and the policy, which restricts attackers' access to private data.

Benefits

Data privacy protection is a required database security capability. It restricts attackers' access to private data and so keeps that data secure. Dynamic data anonymization protects specified database resources through configured anonymization policies. Policy configuration is flexible, so protection can be targeted at specific user scenarios.

Description

The dynamic data anonymization mechanism builds anonymization policies on resource labels. You can select an anonymization mode to suit service requirements or customize policies for specific users. The following SQL statements create a complete set of anonymization policies:

CREATE RESOURCE LABEL label_for_creditcard ADD COLUMN(user1.table1.creditcard);

CREATE RESOURCE LABEL label_for_name ADD COLUMN(user1.table1.name);

CREATE MASKING POLICY msk_creditcard creditcardmasking ON LABEL(label_for_creditcard);

CREATE MASKING POLICY msk_name randommasking ON LABEL(label_for_name) FILTER ON IP(local), ROLES(dev);

label_for_creditcard and label_for_name are the resource labels used for anonymization; each is attached to one column object (user1.table1.creditcard and user1.table1.name, respectively). creditcardmasking and randommasking are preset masking functions. msk_creditcard specifies that creditcardmasking is applied when any user accesses resources labeled label_for_creditcard, regardless of the access source. msk_name specifies that randommasking is applied when the user dev accesses resources labeled label_for_name from a local connection. If FILTER is not specified, the policy takes effect for all users; otherwise, it takes effect only in the specified user scenarios.
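To make the effect of the two policies concrete, the following sketch queries the labeled columns. It assumes that user1.table1 exists with character columns name and creditcard, that the security policy switch is enabled on the instance (in openGauss this is the enable_security_policy parameter), and that the sample row is hypothetical. The masked form shown in the comments follows the preset function descriptions rather than captured output.

```sql
-- Hypothetical sample row; user1.table1 is assumed to have character columns name and creditcard.
INSERT INTO user1.table1 (name, creditcard) VALUES ('alice', '4880-9898-4545-2525');

-- msk_creditcard has no FILTER, so the column is masked for any user from any source.
-- Expected form of the returned value: 'xxxx-xxxx-xxxx-2525'.
SELECT creditcard FROM user1.table1;

-- msk_name applies only to sessions matching FILTER ON IP(local), ROLES(dev):
-- such sessions see name masked by randommasking, other users see the stored value.
SELECT name FROM user1.table1;
```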

The following table shows the preset masking functions:

Masking Function	Example
creditcardmasking	'4880-9898-4545-2525' will be anonymized as 'xxxx-xxxx-xxxx-2525'. This function anonymizes digits except the last four digits.
basicemailmasking	'abcd@gmail.com' will be anonymized as 'xxxx@gmail.com'. This function anonymizes text before the first @.
fullemailmasking	'abcd@gmail.com' will be anonymized as 'xxxx@xxxxx.com'. This function anonymizes text before the first dot (.) (except @).
alldigitsmasking	'alex123alex' will be anonymized as 'alex000alex'. This function anonymizes only digits in the text.
shufflemasking	'hello word' will be anonymized as 'hlwoeor dl'. This weak anonymization function shuffles the positions of the characters. It is not recommended for strings with strong semantics.
randommasking	'hello word' will be anonymized as 'ad5f5ghdf5'. This function randomly anonymizes text by character.
maskall	'4880-9898-4545-2525' will be anonymized as 'xxxxxxxxxxxxxxxxxxx'.
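As an illustration of choosing a preset function per column, the sketch below masks an email column and a phone column. The columns user1.table1.email and user1.table1.phone, and the label and policy names, are hypothetical and serve only to show the pattern.

```sql
-- Hypothetical columns: email and phone are assumed to be character columns of user1.table1.
CREATE RESOURCE LABEL label_for_email ADD COLUMN(user1.table1.email);
CREATE RESOURCE LABEL label_for_phone ADD COLUMN(user1.table1.phone);

-- Keep the domain readable: per the table, 'abcd@gmail.com' becomes 'xxxx@gmail.com'.
CREATE MASKING POLICY msk_email basicemailmasking ON LABEL(label_for_email);

-- Mask digits only: per the table, each digit becomes 0 and other characters are kept.
CREATE MASKING POLICY msk_phone alldigitsmasking ON LABEL(label_for_phone);
```

Choosing basicemailmasking over fullemailmasking here is a trade-off: the domain stays visible, which helps support staff but reveals more than full masking would.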

The data types supported by each anonymization function are as follows:

Masking Function	Supported Data Types
creditcardmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in credit card format only)
basicemailmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in email format only)
fullemailmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data in email format only)
alldigitsmasking	BPCHAR, VARCHAR, NVARCHAR, TEXT (character data containing digits only)
shufflemasking	BPCHAR, VARCHAR, NVARCHAR, TEXT
randommasking	BPCHAR, VARCHAR, NVARCHAR, TEXT
maskall	BOOL, RELTIME, TIME, TIMETZ, INTERVAL, TIMESTAMP, TIMESTAMPTZ, SMALLDATETIME, ABSTIME, TEXT, BPCHAR, VARCHAR, NVARCHAR2, NAME, INT8, INT4, INT2, INT1, NUMERIC, FLOAT4, FLOAT8, CASH

If a masking function does not support the data type of a column, maskall is used by default. If maskall does not support the data type either, the value 0 is used as the anonymized result. If implicit conversion is involved, the data type after conversion determines how the data is anonymized. Once an anonymization policy is applied to a column and takes effect, operations on the data in that column work on the anonymized result.
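Because maskall is the only preset function whose supported list includes numeric and time types, one option is to name it explicitly for such columns rather than rely on the fallback. The sketch below assumes hypothetical columns user1.table1.salary (INT4) and user1.table1.hired_at (TIMESTAMP); the label and policy names are illustrative as well.

```sql
-- Hypothetical columns: salary is assumed to be INT4, hired_at is assumed to be TIMESTAMP.
CREATE RESOURCE LABEL label_for_salary ADD COLUMN(user1.table1.salary);
CREATE RESOURCE LABEL label_for_hired_at ADD COLUMN(user1.table1.hired_at);

-- Both types appear in the maskall row of the supported data types table.
CREATE MASKING POLICY msk_salary maskall ON LABEL(label_for_salary);
CREATE MASKING POLICY msk_hired_at maskall ON LABEL(label_for_hired_at);
```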

Dynamic data anonymization suits scenarios that are closely tied to actual services. It gives users masked query interfaces that match service requirements, so that raw data cannot be harvested through attacks such as credential stuffing.

Enhancements

None

Constraints

The dynamic data anonymization policy must be created by a user with the POLADMIN or SYSADMIN attribute, or by the initial user. Common users do not have the permission to access the security policy system catalog and system view.
Dynamic data anonymization takes effect only on data tables for which anonymization policies are configured. Audit logs are not within the effective scope of the anonymization policies.
In an anonymization policy, only one anonymization mode can be specified for a resource label.
Multiple anonymization policies cannot anonymize the same resource label unless each policy uses FILTER to specify the user scenarios in which it takes effect and those scenarios do not intersect. In that case, the policy that applies to the resource label is determined by the user scenario.
It is recommended that APP in FILTER be set to applications in the same trusted domain. Because a client can be forged, a client-side security mechanism is needed when APP is used, to reduce the risk of misuse. In general, setting APP is not recommended; if you do set it, be aware of the risk of client spoofing.
For INSERT or MERGE INTO operations with a query clause, if the source table contains anonymized columns, the inserted or updated values are the anonymized values and cannot be restored.
When the built-in security policy is enabled, the ALTER TABLE EXCHANGE PARTITION statement fails if the source table contains anonymized columns.
If a dynamic data anonymization policy is configured for a table, grant the trigger permission of the table to other users with caution to prevent other users from using the trigger to bypass the anonymization policy.
A maximum of 98 dynamic data anonymization policies can be created.
Only the seven preset masking functions listed above can be used.
Only data with the resource labels containing the COLUMN attribute can be anonymized.
Only columns in base tables can be anonymized.
Only the data queried using SELECT can be anonymized.
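The constraint on multiple policies for one resource label can be sketched as follows. The column user1.table1.address, the label and policy names, and the roles analyst and support are hypothetical; the roles are assumed to exist already. The two policies target the same label but are scoped to different roles, so their user scenarios do not intersect.

```sql
-- Hypothetical column and label.
CREATE RESOURCE LABEL label_for_address ADD COLUMN(user1.table1.address);

-- Sessions of role analyst see shuffled text.
CREATE MASKING POLICY msk_address_analyst shufflemasking ON LABEL(label_for_address) FILTER ON ROLES(analyst);

-- Sessions of role support see fully masked text.
CREATE MASKING POLICY msk_address_support maskall ON LABEL(label_for_address) FILTER ON ROLES(support);
```

A third policy on label_for_address without FILTER would overlap with both scenarios and is therefore expected to be rejected under the same constraint.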

Dependencies

None