Federated learning
Federated learning, also known as distributed learning, is a technique for building robust artificial intelligence models in which a model is trained on data held on local devices (nodes), which then transfer weights to a central model. With federated learning, models can potentially be trained on larger and/or more diverse data pools without any exchange of the underlying data between the nodes. Because the architecture allows multiple parties to collaborate on a common machine learning model without sharing sensitive clinical data, it enables the use of clinical data while addressing critical legal issues of data governance such as data privacy, data protection, and access rights.
Federated learning is a machine learning method belonging to a class of distributed systems that rely on the principle of remote execution, i.e. distributing copies of an algorithm to the multiple devices (nodes) where the data is kept, performing training iterations locally, and returning the results of the computation (e.g. updated neural network weights) to update the core algorithm. Decentralized variants of federated learning eliminate the central server that receives updates and instead exchange updates directly between the nodes.
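The training loop described above can be illustrated with a minimal sketch of weighted weight averaging (often called federated averaging). It assumes a toy one-variable linear model and synthetic data split across three nodes; the function names `local_update`, `federated_average` and `train_federated`, the learning rate, and the number of rounds are hypothetical choices made for illustration, not part of any particular framework.

```python
import random


def local_update(weights, data, lr=0.5, epochs=5):
    """Run a few gradient-descent steps for y ~ w*x + b on one node's own data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum((w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Combine node weights, weighting each node by its local dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train_federated(node_datasets, rounds=100):
    """Central loop: send weights out, train locally, average what comes back."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in node_datasets]
    for _ in range(rounds):
        # Only weights leave each node; the (x, y) records never do.
        updates = [local_update(global_weights, data) for data in node_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Assumption: synthetic, noise-free data following y = 2x + 1, held by three nodes.
rng = random.Random(0)
node_datasets = [
    [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]
    for n in (20, 50, 30)
]

w, b = train_federated(node_datasets)
print(f"w={w:.4f}, b={b:.4f}")  # should be close to w=2, b=1
```

In this sketch the central loop only ever sees weight pairs, which mirrors the property that the data remains on the nodes. A real deployment would add secure transport, handling of unavailable nodes, and a far larger model.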
The main advantage of federated learning is that the data remains with its owners while still being available for training algorithms.
Practical points
All sites (nodes) require computational capacity appropriate to their own dataset. Many hospital systems have complex computer and data infrastructures that include legacy systems, which can have profound effects on real-time performance, latency and throughput.
The potential benefits of federated learning include:
- privacy
  - every participant keeps control of its own clinical data
  - compliance with local governance of the clinical data
- more robust models
  - the possibility of building larger, more diverse datasets and thereby more robust AI algorithms with potentially less bias
The technique is not without potential problems, even given the relative ease of interoperability of much radiology data due to theoretically uniform file types and protocols. If a federated learning-driven algorithm produces bias, understanding its roots can be more difficult without access to all the training data, and requires each site to thoroughly analyze its own dataset and collaborate to identify differences. In terms of privacy, federated learning from extremely small datasets on one person, e.g. a retina scan on one user's mobile device, may be vulnerable to reconstruction attacks, in which bad actors try to reconstruct the user's identity from updates to the central algorithm.
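Why an update computed from a single record can leak that record can be shown with a deliberately simplified sketch. It assumes a toy linear model with a bias term and squared-error loss, where the gradient sent back for one record is the prediction error multiplied by the input; dividing the weight gradient by the bias gradient then returns the input exactly. The function names and values are hypothetical, and attacks on real deep networks are iterative and considerably more involved.

```python
def single_sample_gradient(weights, bias, x, y):
    """Gradient of 0.5 * (w.x + b - y)**2 for one private record (x, y)."""
    err = sum(w * xi for w, xi in zip(weights, x)) + bias - y
    grad_w = [err * xi for xi in x]  # each component is err * x_i
    grad_b = err                     # the bias gradient is err itself
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from the shared update alone, using grad_w = grad_b * x."""
    if grad_b == 0:
        raise ValueError("zero error: this update carries no information about x")
    return [g / grad_b for g in grad_w]


# Assumption: a made-up three-feature record standing in for private data.
private_x = [0.8, -1.5, 3.2]
grad_w, grad_b = single_sample_gradient([0.1, 0.2, -0.3], 0.05, private_x, y=1.0)

# The attacker sees only grad_w and grad_b, never private_x.
recovered = reconstruct_input(grad_w, grad_b)
print(recovered)
```

When an update is averaged over many records this simple division no longer isolates any one input, which is one reason the risk is greatest for extremely small datasets.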