Data Anonymization
Data anonymization masks the personally identifiable information (PII) that users enter before it is sent to an LLM service. Sensitive user information is therefore never accessible to the LLM service, which protects user privacy.
Process
Input --> Anonymization --> LLM --> Deanonymization --> Output
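
The flow above can be sketched in a few lines of Python. This is a minimal illustration and not GPTBots' or Presidio's implementation: the email regex, the `<EMAIL_n>` placeholder format, and the names `anonymize`, `deanonymize`, `run_pipeline`, and `echo_llm` are all assumptions made for the example. The point it shows is that the LLM only ever sees placeholders, and the original values are restored from a mapping kept outside the LLM call.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical pattern for illustration only; a real deployment relies on the
# configured anonymization service rather than a single regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each detected email with a numbered placeholder.

    Returns the masked text and a placeholder -> original value mapping.
    """
    mapping: Dict[str, str] = {}   # placeholder -> original value
    seen: Dict[str, str] = {}      # original value -> placeholder

    def substitute(match: "re.Match[str]") -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"<EMAIL_{len(seen) + 1}>"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back in place of their placeholders."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


def run_pipeline(user_input: str, llm: Callable[[str], str]) -> str:
    """Input -> Anonymization -> LLM -> Deanonymization -> Output."""
    masked, mapping = anonymize(user_input)
    reply = llm(masked)            # the LLM never receives the raw PII
    return deanonymize(reply, mapping)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption for this sketch)."""
    return f"Received: {prompt}"


if __name__ == "__main__":
    print(run_pipeline("Reach me at alice@example.com", echo_llm))
```

Keeping the mapping on the caller's side is what makes deanonymization possible without ever exposing the original values to the LLM service.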
Config
Currently, only the Microsoft Presidio anonymization service is available.
Group
Different entities can be grouped into separate categories, making it easier to select and use them within agents.
Entity
An entity is a type of information to be anonymized. GPTBots includes built-in support for a set of commonly used entities and also lets users define custom entities to meet other anonymization needs.
New Entity
- Name: The name of the entity, which can only contain uppercase letters and underscores.
- Language: The language(s) supported by the entity. A single entity can support multiple languages.
- Description: A brief introduction or explanation of the entity.
- Regex Pattern: A regular expression used to match the entity.
- Score (Confidence): The confidence level assigned to a match, ranging from 0.0 (lowest) to 1.0 (highest).
- Sensitive Words: A list of exact words or phrases that will be identified as this entity if present in the text.
- Context: A list of contextual keywords that help increase the matching score. If these words appear near a potential match in the text, Presidio will assign a higher confidence score to the match.
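
To make the interplay of these fields concrete, here is a small standard-library sketch. It is not the GPTBots or Presidio implementation: the `EntityDefinition` class, the `find_matches` helper, the `boost` and `window` parameters, the fixed score of 1.0 for sensitive words, and the `EMPLOYEE_ID` example are all assumptions chosen for illustration. It validates the name and score rules listed above, matches text by regex and by sensitive words, and raises the score when a context keyword appears near a match.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

NAME_RE = re.compile(r"[A-Z_]+")  # uppercase letters and underscores only


@dataclass
class EntityDefinition:
    """Hypothetical container mirroring the fields of the New Entity form."""
    name: str
    languages: List[str]
    description: str = ""
    regex_pattern: str = ""
    score: float = 0.5
    sensitive_words: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not NAME_RE.fullmatch(self.name):
            raise ValueError("name may only contain uppercase letters and underscores")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        if self.regex_pattern:
            re.compile(self.regex_pattern)  # fail early on an invalid regex


def find_matches(entity: EntityDefinition, text: str,
                 boost: float = 0.25, window: int = 30) -> List[Tuple[str, float]]:
    """Return (matched_text, score) pairs for one entity.

    `boost` and `window` are assumed values for this sketch: a context keyword
    within `window` characters of a match adds `boost` to its score (max 1.0).
    """
    spans: List[Tuple[int, int, float]] = []
    if entity.regex_pattern:
        for m in re.finditer(entity.regex_pattern, text):
            spans.append((m.start(), m.end(), entity.score))
    for word in entity.sensitive_words:
        # Sensitive words are exact phrases; scored 1.0 here by assumption.
        for m in re.finditer(re.escape(word), text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), 1.0))

    lowered = text.lower()
    results: List[Tuple[str, float]] = []
    for start, end, score in spans:
        nearby = lowered[max(0, start - window): end + window]
        if any(keyword.lower() in nearby for keyword in entity.context):
            score = min(1.0, score + boost)
        results.append((text[start:end], score))
    return results


employee_id = EntityDefinition(
    name="EMPLOYEE_ID",
    languages=["en"],
    description="Internal employee identifier (made-up format)",
    regex_pattern=r"\bE\d{6}\b",
    score=0.5,
    sensitive_words=["Project Falcon"],
    context=["employee", "badge"],
)

if __name__ == "__main__":
    print(find_matches(employee_id, "Employee badge E123456 was issued."))
    print(find_matches(employee_id, "Ticket E654321 closed."))
```

In this sketch the same regex match receives a higher score when "employee" or "badge" appears nearby, which is the effect the Context field is described as having. A low base Score combined with good context keywords is one way to keep a broad pattern from producing confident false positives.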