On October 27, 2021, pursuant to the Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device Action Plan (Action Plan), the US Food and Drug Administration (FDA) released its Good Machine Learning Practice for Medical Device Development: Guiding Principles (Guiding Principles) developed in conjunction with Health Canada and the United Kingdom (UK) Medicines and Healthcare products Regulatory Agency (MHRA). In the Action Plan, FDA noted that stakeholders had called for FDA to encourage harmonization of the development of good machine learning practices (GMLP) through consensus standards efforts and other community initiatives. GMLP are AI/ML best practices (e.g., data management, feature extraction, training and evaluation) that are analogous to quality system practices or good software engineering practices.
FDA also solicited feedback from stakeholders on GMLP in its 2019 Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Discussion Paper and Request for Feedback. The 10 guiding principles, while not formal or binding, provide a helpful framework for developers and identify areas where collaborative bodies and international standards organizations could work to advance GMLP through the development of formal policies and guidance.
Leveraging Multi-Disciplinary Expertise Throughout the Total Product Life Cycle
Having an in-depth understanding of how the ML-enabled medical device will be integrated into the clinical workflow can help ensure that such devices are safe and effective. Developers should rethink the traditional device development process to include inputs from internal stakeholders such as the chief information security officer, privacy and data strategy personnel, and medical personnel. Input from these stakeholders may be needed earlier in the design and development process than is typical for traditional devices.
Implementing Good Software Engineering, Data Quality Assurance, Data Management and Security Practices
These practices include methodical risk management and design processes that capture and communicate design, implementation and risk management decisions and rationale, and that ensure data authenticity and integrity. Developers should also consider FDA’s Content of Premarket Submissions for Management of Cybersecurity in Medical Devices guidance and interoperability of ML-enabled devices within systems or workflows from different manufacturers.
Designing Clinical Studies with Participants and Data Sets That Are Representative of the Intended Patient Population
Consistent with FDA’s Enhancing the Diversity of Clinical Trial Populations — Eligibility Criteria, Enrollment Practices, and Trial Designs Guidance for Industry (discussed in depth here), data collection protocols should ensure that relevant characteristics of the intended patient population, use and measurement inputs are sufficiently represented in a sample of adequate size in the clinical study or training and test datasets. This allows results and use of data to be generalizable and helps mitigate bias.
Ensuring Training Data Sets Are Independent of Test Sets
Developers should consider sources of dependence (e.g., patient, data acquisition and site factors) and ensure that training datasets and test datasets are appropriately independent of one another. This principle suggests that regulators will expect developers to explain how they separated the training and test sets to control for bias and confounding factors.
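One common source of dependence is the same patient contributing records to both sets. The following is a minimal sketch, not a method prescribed by the Guiding Principles, of a patient-level split in Python. The record fields (patient_id, site, scan), the function name and the 20% test fraction are hypothetical choices made for illustration only.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test.

    `records` is a list of dicts with a hypothetical "patient_id" key.
    """
    # Group every record under the patient it belongs to.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients (not individual records) with a fixed seed so the
    # split is reproducible and can be documented.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test


# Illustrative data: 10 hypothetical patients with 5 records each.
records = [
    {"patient_id": f"P{i % 10}", "site": "A" if i % 2 else "B", "scan": i}
    for i in range(50)
]
train, test = split_by_patient(records)
overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
```

Splitting on the patient identifier rather than on individual records addresses only one of the dependence sources named above. Site and data acquisition factors could be handled the same way by grouping on those fields instead, at the cost of a smaller number of units to split.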
Ensuring Selected Reference Datasets Are Based Upon Best Available Methods
Developers should use the best available, accepted methods for developing a reference standard to ensure they collect clinically relevant and well-characterized data, and should ensure that they understand the limitations of the reference. Where available, developers should use accepted reference datasets in model development and testing. This may present a hurdle for ML-enabled devices that address disease states or therapeutic areas for which there is no single universally accepted reference standard.
Tailoring Model Design to the Available Data and Reflecting the Intended Use of the Device
Model design should be suited to the available data and actively mitigate against known risks (e.g., overfitting, performance degradation, security risks). The Guiding Principles suggest that the regulators may expect developers to provide more detailed information to demonstrate alignment between a product’s proposed intended use and indications for use and the design of the model in terms of mitigating risks and demonstrating efficacy and performance.
Placing Focus on the Performance of the Human-AI Team
To the extent the model has a human element, developers should consider human factors and interpretability of model outputs. Considerations that inform traditional device development, such as the impact of human factors, the need for specialized training to use the device, and the expected effect on clinical outcomes (i.e., improvements) and impact on clinical and other user workflows, will be equally important for machine-learning tools.
Demonstrating Device Performance by Testing During Clinically Relevant Conditions
Device performance should be evaluated independently of the training data set. Testing performance should consider the intended patient population, clinical environment, human users, measurement inputs and potential confounding factors.
Providing Users With Clear, Essential Information
Users should be provided with clear, contextually relevant information, including the product’s intended use and indications for use, information about the model’s performance in relevant subgroups, characteristics of the data used to train and test the model, acceptable inputs, known limitations, how to interpret the user interface and how the model integrates into the clinical workflow. Users also should be apprised of device modifications, updates from real-world performance monitoring, the basis for decision-making, and a way to communicate product concerns to the developers.
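As an illustration of what reporting performance in relevant subgroups might involve, the sketch below computes sensitivity (true positive rate) separately for each subgroup in a test set. It is an assumption-laden example rather than a required method: the field names (label, prediction, age_band), the choice of sensitivity as the metric and the sample rows are all hypothetical.

```python
from collections import defaultdict


def sensitivity_by_subgroup(rows, group_key):
    """Return {subgroup: sensitivity} over rows whose true label is positive.

    Each row is a dict with hypothetical keys "label" and "prediction"
    (1 = condition present, 0 = absent) plus the grouping field.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for row in rows:
        if row["label"] != 1:
            continue  # sensitivity only looks at truly positive cases
        if row["prediction"] == 1:
            true_pos[row[group_key]] += 1
        else:
            false_neg[row[group_key]] += 1

    groups = sorted(set(true_pos) | set(false_neg))
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Made-up test results for two hypothetical age bands.
rows = [
    {"age_band": "18-64", "label": 1, "prediction": 1},
    {"age_band": "18-64", "label": 1, "prediction": 1},
    {"age_band": "18-64", "label": 1, "prediction": 0},
    {"age_band": "18-64", "label": 1, "prediction": 1},
    {"age_band": "65+", "label": 1, "prediction": 1},
    {"age_band": "65+", "label": 1, "prediction": 0},
    {"age_band": "65+", "label": 0, "prediction": 0},
]
report = sensitivity_by_subgroup(rows, "age_band")
```

A gap between subgroups in a report like this is the kind of limitation the principle suggests users should be told about. Which subgroups and metrics are relevant depends on the device's intended use.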
Monitoring Deployed Models for Performance and Ensuring Retraining Risks Are Managed
Developers should monitor deployed models. Additionally, when models are trained after deployment, whether continually or periodically, developers should ensure that there are appropriate controls to manage risks of overfitting, unintended bias or degradation of the model (e.g., dataset drift) that could impact the safety or performance of the deployed model. Developers also should consider how to ensure that the datasets they use to develop and train models will not become stale or outdated over time. The Guiding Principles suggest that regulators will expect developers to consider how changes to real-world clinical assumptions, diagnosis or treatment standards may impact the tool’s performance over its expected lifecycle.
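One way a developer might watch for dataset drift is to compare the distribution of an input feature in post-deployment data against the distribution seen during development. The sketch below uses the population stability index (PSI) for that comparison. PSI is a swapped-in technique chosen for illustration, not one named in the Guiding Principles, and the bin count, the 0.25 alert threshold (a commonly cited rule of thumb) and the sample values are assumptions.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a current sample of one feature.

    Bins are equal-width over the reference range. Current values outside
    that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


ALERT_THRESHOLD = 0.25  # hypothetical; set per the device's risk analysis

# Made-up feature values: development data versus shifted field data.
development = [i / 100 for i in range(100)]
field = [0.5 + i / 200 for i in range(100)]

psi_unchanged = population_stability_index(development, development)
psi_shifted = population_stability_index(development, field)
needs_review = psi_shifted > ALERT_THRESHOLD
```

A check like this flags only that inputs have shifted. It does not show whether performance has degraded, so it would complement, not replace, monitoring of clinical performance.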
Although the Guiding Principles provide practical, common-sense principles for GMLP, the concepts are not necessarily new. The more challenging task for the regulators and for industry will be developing concrete practices, policies and procedures for ML tools within or alongside the existing framework for medical device quality system regulation in the United States, UK, European Union and other regions.
The Guiding Principles docket, FDA-2019-N-1185, is open for public comment. FDA recently announced that it plans to publish a draft guidance on Marketing Submission Recommendations for a Change Control Plan for Artificial Intelligence/Machine Learning (AI/ML)-Enabled Device Software Functions, as resources permit, in the current Fiscal Year 2022.