7.6. Skills Data Space specific Technical Building Blocks

On top of the building blocks specified by the DSSC, the following building blocks are considered to be essential for the Skills Data Space.

7.6.1. Decentralised AI training

Description
  • Users grant or revoke the right to mobilise their data, wherever it is stored, to train AI models.
  • A decentralised federated learning protocol makes it possible to train AI models without disclosing users’ contributions, whereas today the training of AI requires providing the data in the clear to a central actor. Individuals no longer have to arbitrate between altruism and confidentiality, and since data is no longer shared, AI researchers can focus on their core business instead of spending time on compliance for data access.

By mobilising data at its source, decentralised learning increases the relevance of AI models by allowing them to train on more transversal, sensitive and up-to-date data from multiple sources: users no longer have to share their data to benefit from the service.
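
This section does not prescribe a particular protocol for the building block. As a rough, illustrative sketch of the basic idea (local training, with only model updates rather than raw records leaving each node), the following Python/NumPy snippet shows a single federated-averaging round over a toy linear model. The function names, the model and the data are hypothetical, and confidentiality of the individual updates is a separate concern, addressed by the masking described further below.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One contributor trains locally; the raw data (X, y) never leaves the node."""
    w = weights.copy()
    for _ in range(epochs):
        preds = X @ w                      # toy linear model for illustration
        grad = X.T @ (preds - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w, len(y)                       # updated weights and local sample count

def federated_round(global_weights, contributors):
    """The aggregator combines contributors' updates, weighted by their data volume."""
    updates = [local_update(global_weights, X, y) for X, y in contributors]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Illustrative use: three contributors, each holding a private local dataset.
rng = np.random.default_rng(0)
contributors = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]
global_weights = np.zeros(3)
for _ in range(10):                        # ten federated rounds
    global_weights = federated_round(global_weights, contributors)
```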

Key Functions in skills context
For the skills context and its specific requirements in terms of AI applications, the following functions are needed:

• AI providers need user data to train their models, while data providers need AI models to provide innovative features to their users. This building block addresses that need by linking AI providers and data providers through a secure and trusted decentralised learning protocol.

• Users give or revoke consent for their personal data through an intuitive UI, which is translated into a semantic description (a hypothetical consent-record sketch follows this list).

• Once a user gives consent to participate in training an AI model,

  • they grant access to the specified data through a consent mechanism,
  • they become one of the contributor nodes. Once enough users have given consent, an execution tree is computed, including contributors and aggregators (a sketch of one possible execution tree follows this list).

• Each contributor securely receives the AI model to train, including its weights and the related algorithms. The relevant user data, already in place at the contributor, is identified and queried.

• The computation is then performed in a secure environment to guarantee the robustness and trustworthiness of the execution.

• Once the contribution is computed, the result is split into shares and noise is added to each share to ensure the confidentiality of the contribution. The noise is computed in such a way that, at the end of the execution, aggregating all the contributions cancels the noise and produces the final trained model, which can be retrieved by the AI provider (a toy illustration of this masking follows this list).

• During the process, no user data is exposed whatsoever, ensuring the security and the privacy of the users.

• Any organisation or individual can implement its own learning model and submit it as a call for contribution so that individuals can participate with their data.

• Anonymised & High-Quality Statistical Data are usable:

  • as high-value data input for algorithms
  • as auditable and aggregable AI/ML outputs

• AI explainability and interpretation: Explainability is a key requirement for trust, yet federated learning has not yet been adequately studied in the context of inherently explainable models. Once the global model is computed, this block provides a reliable approach that delivers clear and understandable explanations of the model outputs.

• Handle data heterogeneity: This block quantifies user heterogeneity in terms of quantity, quality and distribution, as well as its impact on the overall model. A notion of score is to be defined to quantify the clients’ contributions to the overall model and reward clients who contribute positively (a simple scoring sketch follows this list).
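
The semantic description of consent mentioned in the list is not specified in this section. The sketch below only illustrates, with hypothetical field names, the kind of machine-readable consent record a consent service might produce and check before any local data is queried; it does not reproduce the DS4Skills vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical machine-readable consent: who allows which data for which training call."""
    user_id: str
    data_categories: list          # e.g. ["learning-records", "skills-profile"]
    purpose: str                   # e.g. "decentralised-training"
    call_for_contribution: str     # identifier of the AI model / training campaign
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    revoked_at: Optional[datetime] = None

    def is_active(self) -> bool:
        return self.revoked_at is None

    def revoke(self) -> None:
        self.revoked_at = datetime.now(timezone.utc)

def allowed(consents: list, user_id: str, category: str, call_id: str) -> bool:
    """A node only queries local data covered by an active consent for this call."""
    return any(
        c.user_id == user_id
        and c.call_for_contribution == call_id
        and category in c.data_categories
        and c.is_active()
        for c in consents
    )
```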
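
The list states that an execution tree of contributors and aggregators is computed once enough users have consented, without fixing its shape. As one plausible construction, the sketch below groups consenting contributor nodes under intermediate aggregators with a configurable fan-in; the naming and structure are assumptions made for illustration only.

```python
def build_execution_tree(contributor_ids, fan_in=4):
    """Group contributor nodes under aggregators, then aggregators under a root.

    Returns nested dicts of the form {"aggregator": name, "children": [...]},
    where leaves are contributor identifiers. Fan-in and naming are illustrative.
    """
    level = list(contributor_ids)
    depth = 0
    while len(level) > 1:
        depth += 1
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            next_level.append({"aggregator": f"agg-{depth}-{i // fan_in}",
                               "children": group})
        level = next_level
    return level[0]

# Example: 10 consenting contributors, aggregated 4 at a time.
tree = build_execution_tree([f"node-{i}" for i in range(10)], fan_in=4)
```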
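
The confidentiality mechanism described in the list, where each contribution is split into shares carrying noise that cancels out in the final aggregate, resembles additive masking as used in secure-aggregation schemes. The toy example below only demonstrates the cancellation property; a real protocol would also cover key agreement between nodes, dropouts and finite-field arithmetic, none of which is shown here.

```python
import numpy as np

def pairwise_masks(n_clients, dim, seed=0):
    """For each pair (i, j), client i adds a mask that client j subtracts,
    so all masks cancel when every masked contribution is summed."""
    rng = np.random.default_rng(seed)
    masks = [np.zeros(dim) for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = rng.normal(size=dim)   # secret shared between clients i and j
            masks[i] += m              # client i adds the mask
            masks[j] -= m              # client j subtracts the same mask
    return masks

# Toy contributions (e.g. local model updates) from three clients.
contributions = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masks = pairwise_masks(len(contributions), dim=2)

# Each client only reveals its masked contribution; individually these look like noise.
masked = [c + m for c, m in zip(contributions, masks)]

# The aggregator sums the masked values; the masks cancel, leaving only the true sum.
aggregate = sum(masked)
assert np.allclose(aggregate, sum(contributions))
```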
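
The heterogeneity block leaves the contribution score to be defined. One simple proxy, sketched below under the assumption of a linear model and a held-out validation set, is the leave-one-out change in validation loss when a client's update is excluded from the aggregate; positive scores indicate clients whose data improves the global model.

```python
import numpy as np

def validation_loss(weights, X_val, y_val):
    """Mean-squared error of a linear model on a held-out validation set."""
    preds = X_val @ weights
    return float(np.mean((preds - y_val) ** 2))

def leave_one_out_scores(client_updates, X_val, y_val):
    """Score each client by how much the aggregated model worsens without its update.

    client_updates: list of weight vectors (one local update per client).
    A positive score means the client improves the global model; a negative one
    flags low-quality or poorly distributed data. This is one simple proxy only.
    """
    full = np.mean(client_updates, axis=0)
    base = validation_loss(full, X_val, y_val)
    scores = []
    for i in range(len(client_updates)):
        others = [u for j, u in enumerate(client_updates) if j != i]
        reduced = np.mean(others, axis=0)
        scores.append(validation_loss(reduced, X_val, y_val) - base)
    return scores
```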

Dependencies and relationships
Commonly Used Standards in skills context
No standards found in DS4Skills inventory
Specs & Reference Implementations in skills context
No specifications and reference implementations found in DS4Skills inventory

Table 42: Decentralised AI training.
