Frequently Asked Questions

27 questions about the EU GDPR Compliance Hub, answered with data.

Multi-Language Support (48 Languages)

Why does my PII detection tool miss names and IDs in German, French, and Polish documents?

Three-tier language support: spaCy language-native models for 25 high-resource languages (semantic understanding of names, places, and organizations in the native language), Stanza for 7 additional languages, and XLM-RoBERTa cross-lingual transformers for 16 lower-resource languages. This mirrors the academic best practice identified in 2024 hybrid PII detection research. Example: A compliance officer at a European BPO processes customer service data from Germany, France, Poland, and the Netherlands. Each country's customer records contain different national identifier formats. A single English-centric tool misses most of the non-English PII. anonym.legal's 48-language support with region-specific entity types (Steuer-ID, NIR, PESEL, BSN) covers all four countries in a single platform.
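The three-tier split described above can be pictured as a simple routing step. The sketch below is illustrative only: the language sets are small assumed subsets (this FAQ does not list which languages belong to which tier), and `pick_engine` is a hypothetical helper, not part of anonym.legal's API.

```python
# Assumed, partial tier assignments for illustration; the real lists are not given in this FAQ.
SPACY_LANGS = {"en", "de", "fr", "pl", "nl"}   # subset of the 25 spaCy languages (assumption)
STANZA_LANGS = {"he"}                          # subset of the 7 Stanza languages (assumption)
XLMR_LANGS = {"ar", "th", "vi", "id", "tl"}    # subset of the 16 XLM-RoBERTa languages (assumption)


def pick_engine(lang_code: str) -> str:
    """Return the name of the detection tier that would handle a language code."""
    code = lang_code.lower()
    if code in SPACY_LANGS:
        return "spacy"
    if code in STANZA_LANGS:
        return "stanza"
    if code in XLMR_LANGS:
        return "xlm-roberta"
    raise ValueError(f"unsupported language: {lang_code}")
```

Routing by language code keeps the language-native models in charge wherever they exist and falls back to the cross-lingual transformer only for the remaining languages.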

How do I anonymize customer data across DACH and Benelux regions with GDPR-compliant accuracy?

48-language detection stack with three complementary models. spaCy covers 25 high-resource languages natively, Stanza adds 7 more, and XLM-RoBERTa handles cross-lingual transfer for the remaining 16. 260+ entity types include DACH-specific identifiers (Steuer-ID, AHV-Nr, Sozialversicherungsnummer), French NIR/SIRET, Nordic personal identity numbers, and UK NHS/NI numbers. Example: A multinational HR software company processes employee onboarding documents across 18 EU countries. Their existing English-language PII tool misses 40% of non-English PII, creating GDPR Article 5 (data minimization) compliance gaps. anonym.legal's 48-language support closes this gap with pre-built regional identifiers, eliminating the need for country-specific custom configurations.

How do I detect PII in Arabic and Hebrew text with RTL formatting?

Full RTL support for Arabic, Hebrew, Persian, and Urdu. XLM-RoBERTa (cross-lingual transformer) provides language-agnostic entity recognition that works across script types. Stanza NER handles Hebrew (HE) specifically. Example: An Israeli legal tech firm processes employment contracts in Hebrew and English. Their US-built redaction tool fails entirely on the Hebrew sections, requiring manual review for every bilingual document. anonym.legal's Stanza-powered Hebrew NER detects names, addresses, and Israeli ID numbers (Teudat Zehut) without requiring transliteration or manual preprocessing.

We outsource customer support to a BPO in the Philippines. How do we ensure their agents' multilingual chat logs are anonymized before analysis?

48-language support includes APAC languages: Indonesian (ID), Thai (TH), Vietnamese (VI), Filipino (TL), and others via XLM-RoBERTa. Stanza covers additional APAC languages. Single deployment handles global customer support log anonymization. Example: A Singapore-based fintech processes 500,000 customer support chat logs monthly across 12 APAC languages. PDPA (Personal Data Protection Act) requires anonymization before analytics. Their current tool only processes English accurately. anonym.legal's multilingual support reduces their manual review burden from 60% of non-English logs to near-zero.

We process data from Brazil, India, and the EU. Do we need three different tools for CPF, PAN, and IBAN detection?

260+ entity types include Brazil CPF, India PAN, all EU IBAN formats, Brazilian CNPJ, Indian Aadhaar, and many more. The entity library is maintained and updated by the anonym.legal team. Organizations with global operations get comprehensive coverage from a single tool. Example: A London-based marketplace processes seller onboarding documents for merchants from 45 countries. They need to detect and anonymize national ID numbers for GDPR (EU), LGPD (Brazil), and DPDP (India) compliance. anonym.legal's 260+ entity type library covers all their regional identifier requirements without custom development.

How do I detect PII in Arabic and Hebrew text? Our RTL documents are completely missed by standard NER tools.

XLM-RoBERTa provides cross-lingual entity recognition for Arabic and Hebrew with full RTL text handling. The platform includes Arabic, Hebrew, Persian, and Urdu in its 48-language support stack. Example: A fintech company in Dubai processes KYC documents for EU clients. The documents contain Arabic customer names and UAE Emirates IDs alongside English business data. GDPR applies to the EU client relationship data. Without RTL PII detection, Arabic name fields are invisible to the compliance system.

We have documents mixing English and German. Does NER get confused when languages switch mid-document?

XLM-RoBERTa's cross-lingual transformer architecture is trained on multilingual corpora and handles mixed-language text natively without requiring explicit language switching. Combined with language-specific spaCy models for high-accuracy regions, the hybrid approach handles multilingual documents robustly. Example: A Swiss pharmaceutical company processes employment contracts that mix German, French, and English within a single document (Switzerland has four official languages). Their current tool misses French-section PII when configured for German. anonym.legal's multilingual stack processes all three languages simultaneously within the same document pass.

260+ Entity Types

Our tool detects US SSNs perfectly but misses German Steuer-IDs, French NIRs, and Swedish Personnummer. How do we get complete EU coverage?

260+ entity types include complete DACH coverage (Steuer-ID, AHV-Nr, Sozialversicherungsnummer), French identifiers (NIR, Carte Vitale, SIRET, SIREN), UK identifiers (NHS Number, NI Number, UTR), Nordic identifiers (Swedish Personnummer, Norwegian Fødselsnummer, Finnish Henkilötunnus), and all EU IBAN formats. This is 13x the coverage of standard Presidio (~20 default entity types). Example: A global HR manager at a multinational company processes payroll data for employees across 12 EU countries. Each country's national ID format is different. anonym.legal's 260+ entity types cover all 12 countries' formats in a single detection pass, eliminating the need for country-specific tool configurations or manual review for missed regional identifiers.

How do I detect Medical Record Numbers (MRNs) in clinical notes when every hospital has a different format?

The 260+ entity types include NPI numbers, DEA numbers, Medicare IDs, and health plan identifiers. The Custom Entity Creation feature allows healthcare organizations to define their specific MRN format once and apply it consistently. The AI-assisted pattern helper generates the regex from examples, removing the technical barrier for clinical informatics teams without regex expertise.

Our PII tool detects US SSNs but not German Steuer-IDs or French NIR numbers. How do we cover EU-specific identifiers?

260+ entity types include all major European national identifiers: DACH (Steuer-ID, AHV-Nr, Sozialversicherungsnummer), France (NIR, Carte Vitale, SIRET, SIREN), UK (NHS Number, NI Number, UTR), Nordic (Swedish Personnummer, Norwegian Fødselsnummer, Finnish Henkilötunnus), and others. Pre-built and maintained by the anonym.legal team. Example: A pan-European HR software provider processes onboarding documents for clients in 18 EU countries. Each country has its own national identifier format. Their US-built PII tool detects SSNs reliably but misses 14 of 18 EU country identifiers. anonym.legal's 260+ entity library covers all 18 countries' identifiers, closing the EU compliance gap without requiring custom development.

We process healthcare records and need to detect MRN numbers that are unique to each hospital. How do we build custom patterns?

Custom Entity Creation feature includes an AI-assisted pattern helper that suggests regex from provided examples. Healthcare teams provide 3-5 sample MRN values; the AI generates the appropriate regex pattern. The pattern is validated against additional examples. The custom entity is saved as a preset for reuse across all anonymization sessions. Example: A regional hospital system uses MRN format "SVHS-[0-9]{7}" for their 350,000 patient records. Their HIPAA compliance team needs to include MRN detection in their de-identification pipeline. Using anonym.legal's AI pattern helper, the team provides 5 example MRNs and receives a validated regex in under 2 minutes, without writing a single line of code.
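For readers who want to see what the "SVHS-[0-9]{7}" pattern from the example does in practice, here is a minimal Python sketch using the standard `re` module. The word boundaries and the `find_mrns` helper are additions for illustration; they are not taken from the product.

```python
import re

# The MRN format from the example above, wrapped in word boundaries (an added
# assumption) so that longer digit runs are not partially matched.
MRN_PATTERN = re.compile(r"\bSVHS-[0-9]{7}\b")


def find_mrns(text: str) -> list[str]:
    """Return every substring of `text` that looks like an SVHS medical record number."""
    return MRN_PATTERN.findall(text)
```

Called on a clinical note, `find_mrns` returns only the identifiers with exactly seven digits; shorter or longer digit runs after the prefix are ignored.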

We need to anonymize data containing internal employee IDs that don't follow any standard format โ€” what do we do?

AI-assisted custom entity creation allows non-programmers to define internal identifier patterns. Visual regex pattern builder provides a guided interface. Test interface validates patterns against sample data. Custom entities integrate with the full detection pipeline alongside all 260+ built-in types. Presets allow custom patterns to be saved and shared across the team. Example: A global logistics company's compliance team must anonymize employee records for an external HR audit. Employee IDs follow the format "EMP-[REGION]-[0-9]{6}" (e.g., "EMP-EU-123456"). anonym.legal's AI pattern helper generates the regex from 3 examples in 30 seconds. The custom pattern is added to the team's GDPR compliance preset. All subsequent anonymization sessions detect employee IDs automatically.
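The idea of testing a pattern against sample data can be sketched as follows. This is an illustration under assumptions: the REGION segment is taken to be 2 to 4 uppercase letters (the FAQ only shows "EU"), and `check_pattern` is a hypothetical helper, not the product's test interface.

```python
import re

# Assumption: REGION is 2-4 uppercase ASCII letters; only "EU" appears in the example above.
EMPLOYEE_ID = re.compile(r"EMP-[A-Z]{2,4}-[0-9]{6}")


def check_pattern(pattern, should_match, should_not_match):
    """Return (missed, false_hits): the samples the pattern classifies incorrectly."""
    missed = [s for s in should_match if not pattern.fullmatch(s)]
    false_hits = [s for s in should_not_match if pattern.fullmatch(s)]
    return missed, false_hits
```

Two empty lists mean the pattern agrees with every positive and negative sample, which is a reasonable bar before a custom entity is saved into a shared preset.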

Brazilian CPF numbers and Indian Aadhaar look nothing like a US SSN. How do we detect them in a single pipeline?

260+ entity types include Brazil CPF, CNPJ; India PAN, Aadhaar (where detectable by format); all US state driver's licenses, SSN, EIN, ITIN; all EU member state identifiers. Single anonymization pass covers global multi-regulatory compliance. Example: A UK-based global marketplace processes seller verification documents from 80 countries. Their compliance team needs to meet GDPR (EU sellers), LGPD (Brazilian sellers), and DPDP (Indian sellers) simultaneously. anonym.legal's 260+ entity library covers all three regulatory regimes' identifiers in a single processing pipeline, replacing three separate tools with one.
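Format matching alone tends to produce false positives for numeric identifiers, so detectors commonly add a checksum test. As an illustration (not a description of anonym.legal's internals), the sketch below applies the publicly documented CPF check-digit rule; the function name is hypothetical.

```python
import re


def cpf_is_valid(candidate: str) -> bool:
    """Validate a Brazilian CPF by its two mod-11 check digits."""
    digits = [int(c) for c in re.sub(r"[^0-9]", "", candidate)]
    # Eleven digits are required; sequences of one repeated digit are rejected by convention.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        # A remainder of 10 maps to check digit 0.
        if (total * 10) % 11 % 10 != digits[n]:
            return False
    return True
```

Running the check after a regex hit lets a pipeline discard 11-digit numbers that merely resemble a CPF.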

We're processing data that includes Bitcoin wallet addresses and SWIFT codes. Do PII tools cover financial crypto identifiers?

260+ entity types include cryptocurrency addresses (Bitcoin, Ethereum, and others), SWIFT codes, BICs, IBANs, bank account numbers, and routing numbers. Financial teams get comprehensive coverage for both traditional and crypto financial identifiers in a single anonymization pass. Example: A European crypto exchange processes KYC documents that include customer bank account IBANs, cryptocurrency wallet addresses used for initial funding, and SWIFT codes for wire transfers. A single anonym.legal anonymization pass detects and handles all three financial identifier types, with no separate tools or custom patterns required. MiCA compliance for crypto asset PII is covered alongside GDPR for traditional financial PII.
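To make the IBAN case concrete, the sketch below shows the standard mod-97 checksum that IBAN recognizers typically apply after a format match. It is a generic illustration, not anonym.legal's implementation; `iban_is_valid` is a hypothetical name, and country-specific length rules are deliberately left out.

```python
def iban_is_valid(candidate: str) -> bool:
    """Check an IBAN with the mod-97 rule (country-specific lengths are not checked)."""
    iban = candidate.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34) or not iban.isascii() or not iban.isalnum():
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the IBAN rule requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

Because the checksum catches most single-digit typos and random digit strings, it sharply reduces false positives compared with length-and-prefix matching alone.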

GDPR Compliance

The EDPB is running a 2025 enforcement sweep on right-to-erasure compliance. What do we need to do?

Zero-knowledge design means original text is never stored on anonym.legal servers, so the tool itself cannot be a source of data requiring erasure. For organizations processing data through anonym.legal, the tool supports GDPR-compliant anonymization (replacing PII with tokens or encrypted values) that satisfies data minimization requirements. The Desktop App's local processing ensures no cloud retention to complicate erasure requests. Example: A retail company's DPO receives a surge of right-to-erasure requests following a DPA awareness campaign. The company uses anonym.legal to anonymize customer purchase history for analytics, replacing names and contact details with tokens before analytics processing. When erasure requests arrive, the analytics datasets do not contain real customer data, so erasure from operational systems is sufficient. The DPO demonstrates GDPR-compliant data minimization to the investigating DPA.

TikTok was fined €530M for sending EU data to China. How do I ensure my anonymization tool doesn't create the same data transfer problem?

EU data storage (Hetzner data centers, Germany). Zero-knowledge architecture means original text is not stored on servers at all, so no EU data transfer issue arises. For organizations requiring absolute local processing, the Desktop App handles everything locally with no data leaving the device. Example: A French marketing agency processes customer email lists for targeted campaigns. They previously used a US-based data cleaning tool that received raw PII on US servers. Following the TikTok fine, their legal team flags this as a potential GDPR Article 46 violation. They switch to anonym.legal (EU-based Hetzner servers, zero-knowledge design) for all PII handling. The legal team documents EU data residency in their Article 30 records of processing activities.

The anonymization tool we're using stores our documents on US servers. Is that itself a GDPR violation?

All processing occurs on Hetzner infrastructure in EU data centers. Zero-knowledge architecture means original text never reaches anonym.legal servers; only encrypted output is stored. The DPIA is complete and available to enterprise customers. The Data Processing Agreement is governed by EU law. This directly resolves the compliance paradox: using anonym.legal to anonymize data does not itself create a GDPR data transfer.

The EDPB issued new pseudonymization guidelines in January 2025. Does our current tool meet the new standard?

anonym.legal explicitly offers both modes: irreversible anonymization (Replace/Redact/Mask/Hash: no recovery possible, output is truly anonymous under EDPB guidelines) and pseudonymization (Encrypt: reversible with key, output is pseudonymized personal data under GDPR). This explicit distinction allows DPOs to choose the appropriate method for their use case and document their choice correctly for regulatory purposes.

What's the difference between GDPR anonymization and pseudonymization, and why does it matter for our compliance?

anonym.legal offers all five methods: Replace (pseudonymization; GDPR still applies), Redact (near-anonymization, if comprehensive), Mask (pseudonymization), Hash (one-way, approaching anonymization), and Encrypt (pseudonymization with controlled reversibility). The Encrypt method with client-held keys provides the strongest pseudonymization control. Documentation helps organizations understand which method produces which GDPR outcome. Example: A Dutch data analytics company offers anonymized customer datasets to third-party researchers. Their DPO needs to determine whether their "anonymized" data removes GDPR obligations. Using anonym.legal's Redact method (permanent removal of PII with no token mapping), the resulting dataset has no pathway to re-identification, meeting GDPR's anonymization threshold. The DPO documents this determination in the DPIA. GDPR scope is removed for the analytics dataset.
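A rough sketch of how four of these methods differ in what they leave behind is shown below. The function names, the placeholder format, and the salted SHA-256 choice are all assumptions made for illustration; Encrypt is omitted because it needs a key-management design and a cryptography library that this FAQ does not specify.

```python
import hashlib


def replace_op(value: str, entity_type: str) -> str:
    """Replace: swap the value for a placeholder naming its entity type."""
    return f"<{entity_type}>"


def redact_op(value: str) -> str:
    """Redact: remove the value entirely."""
    return ""


def mask_op(value: str, keep_last: int = 4, char: str = "*") -> str:
    """Mask: hide all but the last `keep_last` characters."""
    visible = value[-keep_last:] if keep_last else ""
    return char * (len(value) - len(visible)) + visible


def hash_op(value: str, salt: bytes) -> str:
    """Hash: one-way salted SHA-256 digest (the salt handling here is an assumption)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
```

The practical difference for a DPO is what survives: a mask keeps part of the original, a hash keeps a stable join key, and a redaction keeps nothing that could link back to the person.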

Our DPO needs to sign off on our anonymization tool as part of our DPIA. What does a GDPR-compliant tool need to demonstrate?

ISO 27001 certified. DPIA complete. EU data storage (Hetzner). Zero-knowledge design (original text never stored; minimal data processor footprint). Data Processing Agreement available. Transparent architecture documentation available for DPO review. Example: An Austrian insurance company's DPO is completing a DPIA for their customer complaint anonymization process. The DPIA requires vendor assessment of anonym.legal as the anonymization tool. anonym.legal's ISO 27001 certificate, EU hosting documentation, DPIA, and DPA are provided. The DPO includes these in the DPIA documentation. The supervisory authority's subsequent audit finds the DPIA complete and compliant.

We received 500 data subject access requests in one month. How do we respond efficiently without manually processing each one?

Batch processing (1-5,000 files) with GDPR-compliant anonymization presets enables bulk DSAR preparation. A preset configured for "third-party PII removal" automatically detects and anonymizes references to other individuals in documents being prepared for DSAR response. The same preset can be applied across all documents in a DSAR batch. Example: A German telecommunications company receives 300 DSARs monthly following a DPA awareness campaign. Each DSAR requires reviewing communications (emails, service notes) to remove third-party PII (other customers mentioned in the records) before sending to the requesting subject. anonym.legal's batch processing with a "DSAR response" preset processes 50 documents per request in minutes, reducing DSAR response time from 3 weeks to 3 days.

Custom Entity Creation

Our healthcare system uses proprietary patient identifiers (MRN format: HOSP-YYYY-XXXXXX). HIPAA requires de-identification but no tool detects our format. We'd need to write custom code. Is there a simpler way?

Custom entity creation with AI-assisted regex generation is purpose-built for this use case. A compliance officer describes the MRN format ("Hospital identifier starting with HOSP, dash, 4-digit year, dash, 6-digit number") and receives a working regex pattern. Custom entity is saved, applied to all document processing, and shared with the team via presets. Zero engineering required. HIPAA Safe Harbor compliance for organization-specific identifiers is achievable in under an hour. Example: A regional hospital network (15 facilities) is preparing to share de-identified patient data with a university research partner. Their MRN format (HOSP-YYYY-XXXXXX) appears in thousands of discharge summary PDFs. Their compliance team uses anonym.legal to define the custom MRN pattern, validate it against a sample document set, and process the full research dataset in batch. The university receives HIPAA-compliant de-identified data. Compliance timeline: 3 days vs. 3 months for custom code development.

Our employee ID format is 'EMP-XXXXX' and none of the standard PII tools detect it. How do we anonymize internal identifiers that aren't standard PII types?

Custom entity creation with AI-assisted pattern generation. Users describe their identifier format in plain language ("Employee IDs that start with EMP followed by 5 digits") and the AI generates the appropriate regex pattern. Custom entities integrate seamlessly with the existing 260+ type detection. Results can be saved as presets and shared across teams. Zero engineering required: compliance and legal teams can define their own patterns. Example: A financial services firm has customer account numbers in the format "ACC-XXXXXXXX-XX" that appear throughout support ticket exports. Standard PII tools miss them entirely. Using anonym.legal's custom entity builder, their compliance team creates a pattern in 10 minutes. All 180,000 historical support tickets processed in batch now have account numbers redacted alongside standard PII. Re-identification risk eliminated without an engineering ticket.

We work with German tax identification numbers (Steueridentifikationsnummer): 11 digits starting with a non-zero digit. Standard tools don't detect them. Is there a way to add this?

The 260+ entity library includes major European national identifiers. For formats not yet covered, the custom entity builder allows compliance teams to add them using the AI pattern assistant or manually entering the regex. Once added, they're available in all processing modes and can be shared via presets to the entire team. The German Steueridentifikationsnummer, for example, can be added in under 5 minutes. Example: A German payroll outsourcing firm processes documents for 500 client companies. Their anonymization workflow missed Steueridentifikationsnummern in payslip PDFs because their previous tool (standard Presidio) had no German tax ID recognizer. After a DPA audit finding, they need to add this detection immediately. anonym.legal's custom entity creation lets their compliance officer add the pattern without waiting for an engineering sprint, closing a critical gap in one afternoon.
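As a rough illustration of the manual-regex route, the pattern below encodes only the format stated in the question (11 digits, first digit non-zero). It is a sketch: the official Steuer-ID also carries a check digit, which is not verified here, and the names are hypothetical.

```python
import re

# Format-only pattern: 11 digits, first digit 1-9, not embedded in a longer digit run.
# Assumption: the identifier is written without spaces; the check digit is NOT verified.
STEUER_ID = re.compile(r"(?<![0-9])[1-9][0-9]{10}(?![0-9])")


def find_steuer_ids(text: str) -> list[str]:
    """Return candidate German tax IDs found in `text`."""
    return STEUER_ID.findall(text)
```

Because any 11-digit number matches, a format-only rule like this is best paired with context words such as "Steuer-ID" or "IdNr" or with a checksum step to keep false positives down.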

I'm trying to build a GDPR-compliant customer support AI. The problem is customer messages contain our order IDs (ORD-XXXXXXX) alongside standard PII. I need to strip both before sending to the AI. How do I handle custom identifiers?

Custom entity creation for order IDs and account numbers in specific formats, combined with the default 260+ entity type detection, provides complete anonymization in a single pass. The Chrome Extension or MCP Server can apply custom entity detection in real-time as support agents type, preventing PII and custom identifiers from ever reaching external AI systems. Configuration is shareable across the support team via presets. Example: A SaaS company's customer support team uses Claude via their internal AI platform to draft support responses. Customer messages copied into the AI interface contained customer names, email addresses, and order IDs (ORD-XXXXXXX format). After a GDPR review, the DPO required anonymization before AI processing. anonym.legal's Chrome Extension with custom order ID entity detects and replaces all identifiers in real-time. Support team workflow unchanged, GDPR compliance achieved.
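The "strip before sending" step can be pictured with a small pre-processing function. This is a simplified sketch: it assumes the X's in ORD-XXXXXXX are digits, uses a deliberately loose email regex, and does not attempt name detection, which needs an NER model rather than a pattern. `scrub` is a hypothetical helper, not the extension's API.

```python
import re

# Assumption: each X in ORD-XXXXXXX stands for one digit.
ORDER_ID = re.compile(r"\bORD-[0-9]{7}\b")
# Loose email pattern for illustration only; not a full RFC 5322 matcher.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def scrub(message: str) -> str:
    """Replace order IDs and email addresses with placeholders before the text leaves the browser."""
    message = ORDER_ID.sub("<ORDER_ID>", message)
    return EMAIL.sub("<EMAIL>", message)
```

Running the custom pattern and the standard detectors in the same pass means the AI system only ever sees placeholders, which is the property the DPO in the example asked for.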

We're building a legal discovery tool and need to detect case reference numbers, attorney bar numbers, and court docket IDs, none of which are standard PII. How do we add legal-specific identifiers?

Custom entity creation supports legal identifier formats. Attorneys and compliance officers can define bar number formats (State + 6 digits), docket number formats (XX-CV-XXXXXX for federal civil), and matter number formats using the AI-assisted pattern builder. These custom entities integrate with standard PII detection, enabling comprehensive document review. The resulting preset can be shared across the legal team or sold as a product feature by legal tech vendors integrating via API. Example: A legal AI startup builds a document analysis tool for law firms. Their enterprise clients require redaction of client matter numbers alongside standard PII before documents are processed by their AI. Using anonym.legal's custom entity API, they add matter number detection to their pipeline in 2 days (vs. 3 months building a custom NLP model). Their enterprise contracts close without the compliance blocker.

Every hospital in our network has a different Medical Record Number format. How do I create custom detection rules without being a regex expert?

The AI-assisted pattern helper accepts plain-language examples ("These look like MRN numbers: MRN:1234567, MRN:9876543") and generates the appropriate regex pattern. The visual regex builder allows refinement. The test interface validates against sample text. Patterns are saved as named custom entities and can be shared across the team with Basic+ plans.
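To give a feel for what "regex from examples" means, here is a deliberately naive, rule-based sketch. It is not the product's AI helper: it only handles identifiers made of a shared literal prefix followed by digits, and `infer_pattern` is a hypothetical name.

```python
import os
import re


def infer_pattern(examples: list[str]) -> str:
    """Infer 'literal prefix + digits' regex source from sample identifiers."""
    prefix = os.path.commonprefix(examples)
    # Shared leading digits belong to the numeric part, not the literal prefix.
    prefix = prefix.rstrip("0123456789")
    tails = [e[len(prefix):] for e in examples]
    if not all(re.fullmatch(r"[0-9]+", t) for t in tails):
        raise ValueError("examples must be a shared prefix followed by digits")
    lengths = sorted({len(t) for t in tails})
    if len(lengths) == 1:
        quantifier = f"{{{lengths[0]}}}"
    else:
        quantifier = f"{{{lengths[0]},{lengths[-1]}}}"
    return re.escape(prefix) + "[0-9]" + quantifier
```

Given the two samples from the answer above, the inferred pattern accepts other seven-digit values with the same prefix and rejects shorter ones; real hospital formats with letters or separators in the variable part would need a richer approach.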

Published by George Curta, Founder of anonym.legal