
School of Medicine Research using Artificial Intelligence


Purpose and Applicability

This guidance applies to UW-reviewed human subjects research involving the use of Artificial Intelligence Systems (AI) when:

  1. It is led by School of Medicine Principal Investigators (PIs); AND,
  2. It involves either the targeted enrollment of UW Medicine patients, OR use of UW Medicine Data. Note that UW Medicine Data is NOT limited to clinical records and includes many types of data stored in any UW Medicine system or application.

This guidance is intended to align with the UW Medicine Policy on Use of Artificial Intelligence in the Healthcare Setting and uses the definitions of AI and UW Medicine Data from the glossary of that policy. It covers both research involving the development of AI systems and the use of AI as a tool to facilitate the administration of a research study (e.g., recruitment, safety monitoring, data analysis). Exception: This guidance does not apply to the use of AI as a tool for the administration of research when: 1) the AI tool(s) have been approved for use in clinical care and UW Medicine business operations as described in UW Medicine’s policy; and 2) they are being used for their approved purpose.

HSD has revised its interpretation of the regulatory definition of a human subject to capture some research that should be reviewed by the IRB to mitigate the risks to subjects that may result from re-identification of their data. This means that IRB review will be required for some research involving AI and the secondary use of de-identified data that did not previously require IRB review. Use the Human Subjects Research Determination worksheet to determine if your research requires IRB review.

For research covered by this guidance, the UW IRB now requires researchers to complete and submit the SUPPLEMENT Artificial Intelligence form with their IRB application. The supplement is designed to be used with this guidance to develop and describe a plan to address the risks associated with research involving the use of AI, and to provide the IRB with the information necessary to complete its review of the risk mitigation plan.

When the research will not involve data collection through interaction with research participants (e.g., it involves only the use of secondary data), the supplement may be submitted in conjunction with the shorter IRB Protocol, No Contact form. Otherwise, the standard IRB Protocol form should be used.

 

Context

AI systems introduce unique and evolving risks when used in human research, stemming from their complexity, scale, and unpredictability. Unlike traditional technologies, AI can produce outputs that are fabricated, difficult to interpret, or that reflect and amplify societal biases. These systems may also re-identify individuals from datasets previously considered de-identified or reveal sensitive information, raising concerns about equity, participant safety, privacy, and confidentiality. Adaptive AI, which continues to learn and evolve based on new data or interactions, introduces additional challenges, such as performance drift, unpredictable behavior, and difficulty in validating outputs over time. These characteristics can complicate informed consent, challenge participant autonomy, obscure accountability, and increase the likelihood of evolving risks that may not be foreseeable at the outset of a study.

This guidance is designed to establish a standardized, risk-based approach to the review of research involving AI that will help the IRB identify the risks in a study, determine when risks have been appropriately mitigated, and communicate the IRB’s expectations to researchers.

The approach in this guidance is largely based on the white paper A Novel, Streamlined Approach to the IRB Review of Artificial Intelligence Human Subjects Research (AI HSR) and the Multi-Regional Clinical Trials Center’s (MRCT) Framework for Review of Clinical Research Involving Artificial Intelligence. Both the white paper and the MRCT framework call for the IRB to consider the stages of AI development when determining the level of oversight and risk mitigation measures required. The guidance also draws extensively from the Taxonomy of Trustworthiness for Artificial Intelligence and the National Institute of Standards and Technology (NIST) AI Risk Management Framework, as well as relevant FDA guidance and presentations from various IRB forums.

 

Role of the IRB

The UW IRB ensures that research involving AI adheres to the three fundamental ethical principles described in the Belmont Report: Beneficence, Justice and Respect for Persons. These principles are applied through the established regulatory criteria for IRB approval of research. As the unique and evolving risks introduced by AI technologies present new ethical challenges, the purpose of this section is to explain how the UW IRB interprets and applies both the Belmont Principles and federal human subjects regulations in the context of AI research. The UW IRB will also use the information in the remaining sections of this document to guide its review.

Beneficence.

Regulatory Criteria:

  • Risks to subjects are minimized by using procedures that are consistent with sound research design and that do not unnecessarily expose subjects to risk, and whenever appropriate, by using procedures already being performed on the subjects for diagnostic or treatment purposes.
  • Risks to subjects are reasonable in relation to anticipated benefits, if any, to subjects, and the importance of the knowledge that may reasonably be expected to result.
  • There is an adequate plan for monitoring the data collected to ensure the safety of subjects.
  • There is an adequate plan to protect the privacy of subjects and to maintain the confidentiality of data.

IRB review:

  • Evaluates whether the research plan includes sufficient measures to reduce harm that may result from AI system issues, such as inaccurate or misleading system outputs, performance drift, lack of transparency in decision-making processes, the potential for overreliance on AI, limited explainability of system outputs, and the risk of AI systems retaining or disclosing information beyond their intended scope.
  • Evaluates whether the potential benefits of AI research justify the risks, including the potential for AI to cause individual harm in a clinical setting and the potential benefits of AI both for individuals in clinical settings and, more broadly, for the improvement and scalability of medical care.
  • Evaluates whether there are sufficient data monitoring and feedback systems to detect and mitigate adverse effects that may be caused by errors, biases, or unexpected behaviors.
  • Evaluates whether there are sufficient privacy and security measures built into the AI system design, testing, deployment, and operation.

Justice.

Regulatory Criteria:

  • Selection of subjects is equitable. In making this assessment the IRB should take into account the purposes of the research and the setting in which the research will be conducted.

IRB review:

  • Evaluates whether there are sufficient measures in place to assess and mitigate computational bias (including biased input data and biased model design) so that underrepresented populations are not unfairly excluded, disproportionately affected, or disenfranchised by AI-driven decisions.
  • Encourages equitable distribution of research benefits and burdens across populations.

Respect for Persons.

Regulatory Criteria:

  • The IRB ensures that informed consent will be obtained from each participant (or their legally authorized representative) before they participate in the research, unless the research qualifies for a waiver of the consent requirements.
  • The IRB ensures that consent will be appropriately documented unless the research qualifies for a waiver of the requirement for documented consent.

IRB review:

  • Verifies that consent processes are clearly documented and that waivers are justified when applicable.
  • Ensures that the consent materials clearly explain the role of AI in the study, how data will be used, the risks and limitations of the AI system, the safeguards that will be in place, and how any results will be returned to participants.

 

IRB Review and Stage of Development or Use of AI

The questions in the SUPPLEMENT Artificial Intelligence are structured around the stage of development of the AI system, and the use of AI as a tool to facilitate the administration of a research study. Breaking the review process down into the stages of development allows for a more targeted and efficient evaluation of AI-related clinical research by addressing the specific challenges and considerations at each stage. Researchers can use the information and resources provided in the Identifying and Assessing Risks section to help them complete the supplement and the information in the Consent Considerations section to design the consent process.

For research involving the development of an AI system, the IRB will review only one stage at a time. Researchers must submit either a new application or a study modification for each additional stage with an updated supplement.

  • As defined in the stage descriptions below, Stage 1 and Stage 2 studies are usually eligible for expedited review. These studies do not directly impact patient healthcare or treatment and tend to involve minimal risk.
  • Stage 3 studies are more likely to require review by the convened IRB, particularly when they directly impact patient healthcare or treatment.
  • HSD will generally review Stage 2 and 3 studies in consultation with AI subject matter experts.
Stage 1 – Discovery

This stage focuses on the conceptual and exploratory development of AI algorithms. It involves gathering and early analysis of training data to explore potential use cases. During this stage, hypotheses are built and tested through iterative algorithm building on retrospective (sometimes prospective) datasets. The emphasis is on selecting appropriate algorithmic approaches and establishing preliminary associations to inform future development.

Stage 1 research must not impact participant or patient healthcare, treatment, or clinical decision-making.

Stage 1 research may not release results to the medical records, patients, or providers for clinical care purposes.

Stage 2 – Translation

This stage of AI development involves advancing AI systems in research from ‘conceptual development’ to ‘validation’, emphasizing performance testing and identifying risks. This stage may include:

  • Controlled real-world simulation (new, unseen data)
  • Performance metrics (sensitivity, specificity, accuracy; see the illustrative sketch following these stage descriptions)
  • Bias detection (identify and measure)
  • Safety testing (stress testing; harm identification and mitigation)
  • Clinician feedback (sandbox/offline)
  • Outcome comparison (performance comparison to existing tools/workflows)

Stage 2 research must not impact participant or patient healthcare, treatment, or clinical decision-making.

Stage 2 research may not release results to the medical records, patients, or providers for clinical care purposes.

Stage 3 – Deployment

The use of a tested and validated AI system within a research context to confirm clinical efficacy, safety, and risks. It involves clinical investigation to collect real-world evidence.

Stage 3 research has the potential to impact patient healthcare or treatment.

AI for administration of research

The use of artificial intelligence technologies to facilitate various aspects of the research process. This may include, but is not limited to, recruitment, data analysis, transcription, and patient monitoring.
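To make Stage 2 performance testing concrete, here is a minimal sketch, in Python, of how sensitivity, specificity, and accuracy can be computed from a binary confusion matrix. The labels and predictions are hypothetical placeholders, not part of this guidance.

```python
# Minimal sketch of Stage 2 performance testing: computing sensitivity,
# specificity, and accuracy from a binary confusion matrix.

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def performance_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")  # true negative rate
    accuracy = (tp + tn) / len(y_true)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}

# Hypothetical ground-truth labels and model predictions on new, unseen data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(performance_metrics(y_true, y_pred))
```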

 

Identifying and Assessing AI Related Risks

The sections below describe the primary risks that should be considered when conducting human research involving the use of AI, along with questions to consider and resources to assist in the design of a risk mitigation plan. The questions and resources are intended to aid researchers in completing the SUPPLEMENT Artificial Intelligence and to aid the IRB in its review of the study. The relevance of the questions and the applicability of the resources will vary depending on the stage of the study or the use of the AI system.

Accuracy and Reliability

Accuracy refers to the degree to which a model’s outputs are correct when compared to ground truth. AI systems can suffer from accuracy issues due to flawed training data, incomplete information, and limitations in their ability to distinguish between truth and falsehood. These issues can lead to incorrect predictions, biased outputs, and even the generation of fabricated or false information (i.e., hallucinations). In adaptive AI, this variability can be influenced by changes in input data, environmental context, or internal model updates, making it difficult to ensure consistent performance. A drift-monitoring sketch follows the questions below.

Questions to consider:
  • How will changes in the model be tracked and validated?
  • What are the known limitations of the model’s performance?
  • How will the AI model be validated for the specific purpose, population and setting of the study?
  • Will the AI model continue to learn or evolve during the study?
  • How will the model’s performance be monitored during the study?
  • What is the plan for handling errors and unexpected outputs?
  • How will human oversight be maintained throughout the research lifecycle?
  • Are there clear escalation pathways if the model fails or behaves unexpectedly?
  • Who is accountable for AI-driven decisions in the study?
  • If the AI model is not fully predictable, how will the study assess whether it can still be depended upon for its intended purposes?
  • What steps have been taken to assess and improve the completeness, quality, quantity, suitability, and representativeness of the data?
  • What is the desired and appropriate degree of automation, given the AI model’s characteristics and the context of its uses?
  • What are the consequences of inaccuracy in the context of a clinical use case?
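Because adaptive models can drift, a study’s monitoring plan often compares ongoing performance against the accuracy established at validation. Below is a minimal illustrative sketch of one such check; the window size, tolerance, and names (DriftMonitor, record, drifted) are hypothetical choices, not requirements of this guidance.

```python
# Illustrative sketch: monitoring an adaptive model for performance drift
# by comparing rolling accuracy against the accuracy measured at validation.

from collections import deque

class DriftMonitor:
    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy      # accuracy established at validation
        self.window = deque(maxlen=window)     # most recent correct/incorrect flags
        self.tolerance = tolerance             # allowed drop before escalation

    def record(self, prediction, ground_truth):
        self.window.append(prediction == ground_truth)

    def drifted(self):
        """True when rolling accuracy falls below baseline by more than tolerance."""
        if len(self.window) < self.window.maxlen:
            return False                       # not enough observations yet
        rolling = sum(self.window) / len(self.window)
        return rolling < self.baseline - self.tolerance

monitor = DriftMonitor(baseline_accuracy=0.90)
# In a study, each adjudicated case would be recorded as it accrues:
monitor.record(prediction=1, ground_truth=1)
if monitor.drifted():
    print("Escalate: rolling accuracy has fallen below the validated baseline")
```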
Bias and Equity

Bias arises in artificial intelligence models in multiple ways. The data sets used to train AI models can reflect the biases that pervade the societies and cultures that produced them. For example, generative AI models can exhibit bias by reinforcing cultural stereotypes present in their training data. In addition, the design of AI systems reflects the values, assumptions, and experiences of the decision makers responsible for their development. A subgroup performance comparison sketch follows the questions below.

Questions to consider:
  • Are there known disparities in how the AI system performs across gender, race, ability, age, or other dimensions? If so, how will these be addressed?
  • Are or were the data used to develop the system sufficient and representative of the population affected by the disease or condition, or of the general population?
  • What steps have been taken to assess and mitigate computational bias (including biased input data and biased model design)?
  • What steps have been taken to assess and mitigate ways in which systemic and human bias may influence the design, development, and deployment of the AI model?
  • Have issues of participant access (e.g. access to mobile devices or the internet) and/or digital literacy been sufficiently addressed in the study design?
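One common way to surface computational bias is to compare a model’s performance across demographic subgroups. The sketch below illustrates this under hypothetical data; the groups, records, and 10-point disparity margin are illustrative assumptions only.

```python
# A minimal sketch of one computational-bias check: comparing model accuracy
# across demographic subgroups. Records and field names are hypothetical;
# a real study would use validated cohort definitions.

from collections import defaultdict

records = [  # hypothetical (group, ground_truth, prediction) triples
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, truth, pred in records:
    totals[group] += 1
    correct[group] += int(truth == pred)

accuracy = {g: correct[g] / totals[g] for g in totals}
print(accuracy)  # e.g., {'A': 0.75, 'B': 0.5}

# Flag a disparity when subgroup accuracies differ by more than a chosen margin.
if max(accuracy.values()) - min(accuracy.values()) > 0.10:
    print("Potential performance disparity across subgroups; investigate.")
```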
Privacy and Security

AI systems raise significant privacy concerns due to their reliance on vast amounts of personal data for training and operation. This data can be vulnerable to breaches, misuse, unauthorized access, and re-identification of seemingly de-identified data and images, potentially revealing sensitive information and leading to harm. Providing data to third-party AI services for analysis may constitute a breach of participant privacy. A re-identification screening sketch follows the questions below.

Questions to consider:
  • Is the scope of the data appropriately limited?
  • Will the data be reused or shared beyond the scope of the study?
  • What measures are in place to ensure that privacy and security are built into the AI system design, testing, deployment, and operation?
  • What steps have been taken to ensure that information is not made available, or disclosed to unauthorized individuals, entities, or processes?
  • How will the security of the data that is used for training or created be ensured?
  • If data includes sensitive or personally identifiable information, what extra precautions will be taken?
  • What measures are in place to mitigate the risks associated with re-identification and inference of sensitive information, and how likely is it that this could be accomplished by someone with access and intent?
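As one illustration of assessing re-identification risk, the sketch below checks whether every combination of quasi-identifiers in a dataset occurs at least k times (k-anonymity). The fields, records, and k value are hypothetical; this is one screening heuristic, not a complete privacy assessment.

```python
# Illustrative sketch of one re-identification risk check: verifying that
# every combination of quasi-identifiers appears at least k times
# (k-anonymity). Dataset and field names are hypothetical.

from collections import Counter

QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")  # hypothetical fields

def smallest_group(rows):
    """Return the size of the rarest quasi-identifier combination."""
    groups = Counter(tuple(row[f] for f in QUASI_IDENTIFIERS) for row in rows)
    return min(groups.values())

rows = [
    {"zip3": "981", "birth_year": 1980, "sex": "F"},
    {"zip3": "981", "birth_year": 1980, "sex": "F"},
    {"zip3": "982", "birth_year": 1975, "sex": "M"},
]

k = 5
if smallest_group(rows) < k:
    print(f"Dataset is not {k}-anonymous; further generalization may be needed")
```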
Transparency and Explainability
Transparency in AI refers to the degree to which an AI system’s operations and decisions are clear, understandable, and accessible for review or scrutiny by users and stakeholders.

Explainability refers to the ability to provide a user-friendly explanation of the reasons behind an AI system’s output (e.g., a diagnosis or prediction), to provide an understanding of its decision process.

Several factors complicate AI transparency and explainability, such as the complexity of algorithms, limited visibility into training data, and the dynamic and adaptive nature of some models. An explainability sketch follows the questions below.

Questions to consider:
  • Will participants be informed that AI is being used in the study, or is it appropriate not to inform the participant?
  • Can the AI model’s outputs be explained, understood and interpreted by the relevant parties (e.g. researchers, clinical staff, and participants)?
  • Have any features or tools been added to make any uncertainty in the model easier to understand, such as confidence intervals, probabilistic estimates, or clear explanations?
  • How will participants be informed that a decision that impacts them was made by an AI system?
  • How will the research team communicate the model’s capabilities, benefits, and limitations and potential risks to the relevant parties?
  • Are there mechanisms in place to prevent overreliance or deference, and challenge AI-generated outputs? How will these mechanisms be used to retrain the model and reassess the validity of the outputs?
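As a simple illustration of explainability, the sketch below assumes a linear (logistic) risk model and reports each feature’s contribution alongside the prediction, so reviewers can see what drove the output. The weights and features are hypothetical, and many clinical AI systems are far less directly interpretable.

```python
# A minimal sketch of output explainability for a simple linear risk model:
# each feature's contribution (weight * value) is reported alongside the
# prediction. Weights and features are hypothetical.

import math

WEIGHTS = {"age": 0.03, "lab_value": 0.8, "prior_events": 0.5}  # hypothetical
BIAS = -4.0

def predict_with_explanation(features):
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-score))   # logistic link -> risk estimate
    return probability, contributions

prob, why = predict_with_explanation({"age": 70, "lab_value": 2.1, "prior_events": 1})
print(f"Estimated risk: {prob:.2f}")
for name, value in sorted(why.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: contribution {value:+.2f}")
```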

 

Consent Considerations

Informed consent in research involving artificial intelligence (AI) must address the unique risks and ethical complexities introduced by these technologies. The consent process should be tailored to the nature of the research, the stage of development, the role and function of the AI system, and the type of data used, and should address the unique risks, benefits, and uncertainties associated with the AI system. As AI models evolve, the associated risks and benefits can change; it is important to consider whether these changes could impact a participant’s decision to continue in the study and whether they create a need for reconsent or ongoing communication.

Most AI research involving only human data does not require direct interaction with participants and uses large-scale data sets. This research may qualify for a waiver of the informed consent requirement when it involves secondary use of existing data and poses no more than minimal risk of harm to subjects. When consent must be obtained from research participants, the information below should be included in the consent form in addition to the required elements of consent. Refer to HSD’s Designing the Consent Process guidance for additional information about designing an informed and meaningful consent process.

  1. Explain the role of AI in the study
    • Clearly state that AI is being used in the study and describe its role in the study (e.g., generates predictions, classifications, or decisions).
    • Indicate whether the AI system is static (fixed rules and predictable performance) or adaptive (learns and evolves in real time).
    • If applicable, explain whether AI outputs will be reviewed by a human before influencing decisions.
  2. Explain how data will be used
    • Explain whether data will be reused, shared, or commercialized, and if participants will share in any profit.
    • Disclose if data will be retained after participant withdrawal and describe limits for removal of data from the AI model, including the data entered in the model and the data that it generates.
  3. Describe risks and limitations
    • Describe any known and potential risks and limitations. These include but are not limited to:
      • Re-identification of de-identified data
      • Misclassification or incorrect predictions
      • Algorithmic bias and fairness
      • Psychological, social, or employment impacts
    • Explain that AI systems may evolve over time, and outputs may change as models are updated.
    • Explain what safeguards will be in place to mitigate these risks.
  4. Privacy and Confidentiality
    • Describe any potential privacy and confidentiality issues related to the sharing of data and use of AI.
    • If applicable, describe safeguards against re-identification, especially when combining datasets.
    • Explain how privacy and confidentiality will be protected (e.g., use of encryption, access controls, and secure storage).
  5. Return of Results
    • Explain whether, when, and how any results generated by the AI system will be returned to participants.

 

Related Materials

SUPPLEMENT Artificial Intelligence
 


Version Information


Version History

Version Number | Posted Date | Implementation Date | Change Notes
1.0 | 08.29.2025 | 08.29.2025 | Newly implemented guidance

Keywords: Artificial Intelligence; Large Language Models; Deep Learning; Generative AI; Machine Learning.