Data Anonymization

Data anonymization is the process of masking personally identifiable information (PII) in user input before it is sent to an LLM service. This keeps sensitive user information out of reach of the LLM service and protects user privacy.

Process

graph LR
    Input --> Anonymization --> LLM --> Deanonymization --> Output
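
The flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GPTBots or Presidio implementation: the email regex, the `<EMAIL_n>` placeholder format, and the `fake_llm` stand-in are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately simple email pattern (illustration only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def anonymize(text):
    """Replace each email with a numbered placeholder; return text and mapping."""
    mapping = {}  # placeholder -> original value

    def _swap(match):
        value = match.group(0)
        # Reuse the same placeholder when a value appears more than once.
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder
        placeholder = f"<EMAIL_{len(mapping) + 1}>"
        mapping[placeholder] = value
        return placeholder

    return EMAIL_RE.sub(_swap, text), mapping


def deanonymize(text, mapping):
    """Restore the original values in the LLM output."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


def fake_llm(prompt):
    """Stand-in for the LLM call; it only ever sees placeholders."""
    return "Received: " + prompt


masked, mapping = anonymize("Email alice@example.com about the invoice")
reply = fake_llm(masked)              # the LLM never sees the address
print(deanonymize(reply, mapping))    # original value restored for the user
```

The mapping stays on the anonymization side of the pipeline, which is what allows the output to be restored without the LLM ever receiving the original value.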

Config

Currently, only the Microsoft Presidio anonymization service is available.

Group

Entities can be organized into groups, which makes them easier to select and use within agents.
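
As a rough mental model, a group can be thought of as a named list of entities. The sketch below is hypothetical: the group names and the `entities_for` helper are invented for illustration and do not reflect how GPTBots stores groups. The entity names are borrowed from Presidio's built-in entity naming.

```python
# Hypothetical groups; each maps a group name to the entities it contains.
ENTITY_GROUPS = {
    "CONTACT": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
    "FINANCIAL": ["CREDIT_CARD", "IBAN_CODE"],
}


def entities_for(groups):
    """Flatten the selected groups into an ordered, de-duplicated entity list."""
    selected = []
    for group in groups:
        for entity in ENTITY_GROUPS[group]:
            if entity not in selected:
                selected.append(entity)
    return selected


# An agent that enables both groups would anonymize all four entities.
print(entities_for(["CONTACT", "FINANCIAL"]))
```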

Entity

An entity is a type of sensitive data to be anonymized, such as an email address or a phone number. GPTBots has built-in support for a set of commonly used entities and also allows users to define custom entities to meet other anonymization needs.

New Entity

Name: The name of the entity, which can only contain uppercase letters and underscores.
Language: The language(s) supported by the entity. A single entity can support multiple languages.
Description: A brief introduction or explanation of the entity.
Regex Pattern: A regular expression used to match the entity.
Score (Confidence): The confidence level of the match, ranging from 0.0 to 1.0.
Sensitive Words: A list of exact words or phrases that will be identified as this entity if present in the text.
Context: A list of contextual keywords that help increase the matching score. If these words appear near a potential match in the text, Presidio will assign a higher confidence score to the match.
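
To show how these fields interact, here is a minimal, self-contained sketch. It is not Presidio's implementation: the `EntityDefinition` class, the `find_matches` helper, the `EMPLOYEE_ID` entity, the `context_boost` value of 0.25, and the score of 1.0 for sensitive words are all assumptions chosen for illustration. As a further simplification, context words are searched for anywhere in the text rather than only near the match.

```python
import re
from dataclasses import dataclass, field

NAME_RE = re.compile(r"[A-Z_]+")  # uppercase letters and underscores only


@dataclass
class EntityDefinition:
    """Hypothetical container mirroring the New Entity form fields."""
    name: str
    languages: list
    description: str
    regex_pattern: str
    score: float
    sensitive_words: list = field(default_factory=list)
    context: list = field(default_factory=list)

    def __post_init__(self):
        if not NAME_RE.fullmatch(self.name):
            raise ValueError("name may contain only uppercase letters and underscores")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        re.compile(self.regex_pattern)  # fail early on an invalid regex


def find_matches(entity, text, context_boost=0.25):
    """Return (matched_text, score) pairs; context_boost is an illustrative value."""
    lowered = text.lower()
    has_context = any(word.lower() in lowered for word in entity.context)
    results = []
    for match in re.finditer(entity.regex_pattern, text):
        score = entity.score
        if has_context:
            # Context words raise confidence, capped at 1.0.
            score = min(1.0, score + context_boost)
        results.append((match.group(0), score))
    for word in entity.sensitive_words:
        if word.lower() in lowered:
            # Assumption: exact sensitive words are treated as certain matches.
            results.append((word, 1.0))
    return results


employee_id = EntityDefinition(
    name="EMPLOYEE_ID",
    languages=["en"],
    description="Internal employee identifier",
    regex_pattern=r"\bEMP-\d{6}\b",
    score=0.5,
    sensitive_words=["EMP-ADMIN"],
    context=["employee", "staff"],
)

print(find_matches(employee_id, "Ticket opened for EMP-123456"))
print(find_matches(employee_id, "Employee EMP-123456 called support"))
```

In the second call the word "Employee" appears in the text, so the match receives a higher score than the base value configured for the regex pattern.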