AGENDA
Generative and Multimodal Anonymization for Unstructured Data
Challenge
The growing need to protect personal data, especially data that can directly or indirectly identify a person, has made anonymization a necessity. This project responds to that demand with a technological solution capable of anonymizing and pseudonymizing text, audio, and image information in a total of eight languages (Spanish, English, French, Italian, Portuguese, Catalan, Basque, and Galician). Using generative AI, this tool automatically detects and replaces sensitive data while maintaining the consistency and usefulness of the original content.
The solution addresses a critical problem in open data environments and model training, where the presence of facial features, voice, or linguistic patterns can compromise people's identity and generate risks of non-compliance with the General Data Protection Regulation (GDPR). The challenge is to provide a robust, scalable, and multilingual tool that allows personal data to be identified, replaced, or deleted in different ways, reducing legal risks and facilitating the secure use of data in business and research environments.
Solution
This tool is designed to be easily integrated into different environments, such as Data Spaces. This facilitates the secure use of information and the exchange of data between companies in a safe and simple manner. The solution is structured around four main features:
- Enables the anonymization of text, voice, and images in any audiovisual format, covering the various points where personal data may appear.
- Processes and protects documents and content in different languages, ensuring data privacy in international environments.
- Creates synonyms and consistent substitutions for identified entities, synthesizes anonymized speech, and generates artificial images that preserve content usability without compromising real identities.
- Enables secure data sharing, exchange, or trading between companies and institutions, ensuring regulatory compliance and reducing risk in collaborative systems.
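The idea of consistent substitutions for identified entities can be illustrated with a minimal sketch. It is not the project's implementation: the entity lookup table, the `pseudonymize` function, and the `LABEL_n` placeholder format are all hypothetical, and a fixed dictionary stands in for a real entity detector. The sketch only shows the consistency property, meaning that every occurrence of the same entity receives the same replacement.

```python
import re

# Hypothetical toy detector: a fixed lookup of known names, standing in for a
# real entity recognizer, which this page does not describe.
KNOWN_ENTITIES = {"Maria Lopez": "PERSON", "Zaragoza": "CITY"}


def pseudonymize(text, entities=KNOWN_ENTITIES):
    """Replace each known entity with a stable placeholder such as PERSON_1."""
    if not entities:
        return text, {}

    mapping = {}   # original surface form -> placeholder
    counters = {}  # entity label -> number of placeholders issued so far

    def substitute(match):
        surface = match.group(0)
        if surface not in mapping:
            label = entities[surface]
            counters[label] = counters.get(label, 0) + 1
            mapping[surface] = f"{label}_{counters[label]}"
        return mapping[surface]

    # Longest names first, so a longer entity wins over a shorter one it contains.
    names = sorted(entities, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in names)
    return re.sub(pattern, substitute, text), mapping


text = "Maria Lopez lives in Zaragoza. Maria Lopez works in Zaragoza."
result, mapping = pseudonymize(text)
print(result)
print(mapping)
```

Keeping the mapping alongside the rewritten text is what makes this pseudonymization rather than anonymization: whoever holds the mapping can reverse the substitution, so for full anonymization it would have to be discarded.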
Results
The resulting anonymization tool will include in its architecture the building blocks needed to connect to data spaces (registration, connection, security, and so on). Specifically, an existing data space will be selected, and integration tests and a final demonstration will be carried out on it.
Funding
This project is the responsibility of SEDIA, part of the Ministry of Economic Affairs and Digital Transformation. It is financed with NextGenerationEU funds.
Partners
Sigma Cognition is the main partner and collaborates with the company Itelligent (https://itelligent.es), which has proven experience in developing technologies for data spaces.
Project News and Events
November 7, 2024: the call for applications is published in the BOE (Official State Gazette) with the regulatory bases for the granting of aid to finance Research, Development and Innovation (R&D&I) projects on Data Spaces.
January 31, 2025: the application for the participation of Sigma Cognition in consortium with Itelligent is submitted.
May 19, 2025: the provisional resolution proposal of the AGENDA project is published in the electronic headquarters of the Ministry for Digital Transformation and Public Administration.
September 23-26, 2025: Sigma Cognition is attending the SEPLN2025 conference in Zaragoza, and AGENDA will be presented at the Tecnoling showroom. The brochure produced for the occasion is available below.
October 29-30, 2025: Techshow event in Madrid at the IFEMA venue, where we presented the project and gave a live demonstration of voice anonymization using several strategies, including generative AI.
Publications
Project leaflet produced for the SEPLN2025 conference.