| To jump to a topic, click your choice below: | ||||
|---|---|---|---|---|
| Protocol Submission | AI FDA FAQs | Glossary of Risk | AI Video Series | Additional Resources |
The AI Protocol Submission Guidance for Researchers supports researchers conducting research that uses and/or develops Artificial Intelligence (AI). Researchers can use the guidance to determine what should be included in an AI protocol application.
Researchers should not feel obligated to address every concern or add information in every protocol application section below. For example, a study using an AI transcription service to transcribe recorded interviews may only need to name the AI model and confirm whether the third-party vendor has access to, and maintains ownership of, the data being transcribed. This information should also be included in the informed consent.
A general AI Design Description should include:
Provide more details if any of the following apply:
An AI Data Description should disclose whether:
An AI Data Security explanation should indicate:
An AI Informed Consent should include:
Consult the FDA Digital Health Center of Excellence and use the Digital Health Policy Navigator to determine whether the research is subject to FDA oversight and applicable FDA legal and regulatory requirements.
Consult the FDA website and graphic, the Final FDA Guidance, and/or the Digital Health Center of Excellence with questions.
For formal classification of a product, submit a 513(g) Request for Information to the applicable Office of Health Technology (OHT). You may also engage the FDA or request feedback through the Pre-Submission program.
Use the Federal Trade Commission (FTC) Mobile Health App Interactive Tool to determine which laws and rules may apply.
| TERM | DEFINITION |
|---|---|
| Anthropomorphism | Also known as over-personification, anthropomorphism ascribes human features/characteristics to the model. This can lead to overconfidence in the model's performance and lax human oversight. |
| Bias | AI bias occurs when data output perpetuates existing prejudices. Bias can be embedded in discriminatory training data sets, or it can be introduced through subjective algorithm development. |
| Data Drift | Data drift occurs when the statistical properties or characteristics of input data change in ways the model was not trained to handle. The model cannot generalize beyond the training data, which can lead to off-purpose data output and performance decline (see the drift-detection sketch after this table). |
| Data Fusion | Data fusion is the process of combining multiple data sources. The sources generally include raw data, and combining them can produce false positives that lead to inaccurate data profiles. Raw data is unlabeled data that has not been cleaned, organized, or summarized. Data fusion can also lead to re-identification. |
| Data Leaks | Data leaks refer to intended and unintended exposure of sensitive, private, or proprietary data. Data leakage commonly refers to vendor access and ownership of protected data as a stipulation for use of the third party’s AI model. Leaks can occur at any point in AI use or development. |
| Data Minimization | Data minimization is the process of identifying and inputting the fewest data points needed to fulfill the model's purpose. The intention is privacy protection. However, it can also lead to a loss of data that may limit results and impact accuracy. Another concern is the potential for unintentionally introducing bias (e.g., by excluding race, gender, or age data points). |
| Deepfakes | A deepfake is audiovisual content intentionally altered to disseminate false information. Deepfakes can contaminate results when inputting unsupervised data from open AI sources. Contaminated results could potentially infiltrate supervised databases and/or peer reviewed publications. This could lead to perpetuating the falsehood and raise intellectual property concerns. |
| False Negative | A false negative is a data output prediction/decision that incorrectly indicates an attribute/condition is not present when it is present. False negatives can lead to missed opportunities to participate in research in a pre-clinical effort to locate a target population. False negatives can also lead to a misdiagnosis in a clinical trial. |
| False Positive | A false positive is a data output prediction/decision that incorrectly indicates an attribute/condition is present when it is not present. A high rate of false positives generally indicates biased training data. As a result, false positives can carry misguided prejudice forward. |
| Hallucinations | A hallucination is nonsensical and inaccurate data output from a Large Language Model (LLM). Underlying causes for hallucinating LLM responses include the lack of real-world context and insufficient or poor-quality data. Vague prompting combined with an expectation to “guess” can also produce hallucinations. |
| Interpretability | Interpretability describes how a model makes a prediction or decision. In a model that combines different types of data, interpretability also includes how different data types interact. |
| Misclassification | AI data classification is the process of sorting and labeling data inputted into AI models. Misclassification occurs when the AI model incorrectly sorts and/or labels data output. |
| Overfitting | Overfitting occurs when a model performs poorly on new data because it has memorized the training data. It can arise for a variety of reasons (e.g., insufficient training data, training too long on the same data, too much emphasis on noise in the data that is uncommon in the real world); see the overfitting sketch after this table. |
| Over-optimization | Optimization is the process of adjusting the mathematical parameters of an algorithm to improve accuracy and reduce errors. Over-optimization occurs when an algorithm metric is too narrowly defined on a specific task. A hyper-tuned algorithm can develop blind spots that lead to misidentification or misunderstanding in a prediction or decision. |
| Over-reliance | Over-reliance involves placing too much trust in the output of a model. This can allow performance errors to go unnoticed that might otherwise be identified through more diligent human scrutiny. |
| Poisoned Data | Poisoned data is incorrect, biased, or mislabeled data that can contaminate training sets. Data pulled from the internet, third-party platforms, and government sources can introduce toxicity in the early stages of model development. |
| Re-identification | Re-identification is a process that links a de-identified data source back to the identity of the de-identified individual. It generally occurs when data sets are combined, allowing an AI model to match data points across the de-identified data and other publicly available data (see the re-identification sketch after this table). |
| Skewed Data | Models trained on skewed data generate unequal predictions and decisions. Skewed data can produce unjustifiably higher or lower rates of inconsistency for a specific demographic population. This can amplify existing biases. |
| Synthetic Data | Synthetic data is generated by models trained on real-world data for the purpose of producing statistically similar data. This can lead to re-identification of the de-identified real-world data. It can also generate an inaccurate representation of the real world and reinforce demographic inequities. |
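To make the Data Drift entry above more concrete, here is a minimal sketch (not part of the official guidance) of one common way to flag drift: comparing a feature's distribution in incoming data against its training-time distribution with a two-sample Kolmogorov-Smirnov test. The synthetic data, variable names, and the 0.05 threshold are illustrative assumptions, not recommendations.

```python
# Minimal, illustrative sketch of data drift detection.
# Assumes NumPy and SciPy are available; feature values and the 0.05
# significance threshold are hypothetical choices for illustration only.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Stand-ins for one feature column as seen at training time vs. in new data.
training_feature = rng.normal(loc=0.0, scale=1.0, size=1_000)
incoming_feature = rng.normal(loc=0.5, scale=1.2, size=1_000)  # shifted distribution

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the incoming
# data no longer matches the distribution the model was trained on.
statistic, p_value = ks_2samp(training_feature, incoming_feature)

if p_value < 0.05:
    print(f"Possible data drift (KS statistic={statistic:.3f}, p={p_value:.4f})")
else:
    print("No significant drift detected")
```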
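Similarly, for the Overfitting entry, this hedged sketch shows the classic warning sign of memorization: a large gap between accuracy on the training data and accuracy on held-out data. The dataset and model are synthetic stand-ins chosen only for illustration.

```python
# Minimal, illustrative sketch of spotting overfitting.
# Assumes scikit-learn is installed; the dataset and model choices are
# hypothetical and exist only to show the train/held-out accuracy gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset with some label noise.
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained tree can memorize the training data, including its noise.
model = DecisionTreeClassifier(random_state=0)  # no depth limit
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# A large gap between these two scores is a common sign of overfitting.
print(f"Training accuracy: {train_acc:.2f}")
print(f"Held-out accuracy: {test_acc:.2f}")
```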
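Finally, the Re-identification entry can be illustrated with a small, entirely fabricated example: joining a "de-identified" table to publicly available auxiliary records on shared quasi-identifiers (here, zip code and birth year) links sensitive attributes back to named individuals. None of the values below are real data.

```python
# Minimal, illustrative sketch of re-identification risk when data sets
# are combined. All records below are fabricated for illustration.
import pandas as pd

# "De-identified" study data: direct identifiers removed, but
# quasi-identifiers (zip code, birth year) remain.
deidentified = pd.DataFrame({
    "zip_code": ["40506", "40502"],
    "birth_year": [1980, 1975],
    "diagnosis": ["condition A", "condition B"],
})

# Publicly available auxiliary data that still carries names.
public_records = pd.DataFrame({
    "name": ["Pat Example", "Sam Example"],
    "zip_code": ["40506", "40502"],
    "birth_year": [1980, 1975],
})

# Joining on the shared quasi-identifiers links diagnoses back to names.
relinked = deidentified.merge(public_records, on=["zip_code", "birth_year"])
print(relinked[["name", "diagnosis"]])
```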
The AI Adventures in Human Subjects Research Video Series offers researchers responsible-use strategies for mitigating the risk of employing AI in human subjects research.
Episode 1 provides the purpose for creating the video series, presents the beneficence conundrum, and introduces the data ancestors. [7:10]
Episode 2 offers strategies for mitigating the risk of inputting potentially biased historical data sets into AI models. [4:54]
Episode 3 offers strategies for mitigating the risk of combining de-identified data with publicly available auxiliary information. [4:53]
Episode 4 offers strategies for mitigating the risk of mining open-source social media content that may or may not be publicly available. [6:43]
Episode 5 offers strategies for mitigating the risk of unintentionally including hallucinations or fabricated data in research involving human subjects. [6:10]
Since 2023, the UK Center for Applied Artificial Intelligence has been at the forefront of AI, making investments in people and technology to empower others to explore AI and create meaningful solutions. Today, the Center is a specialized community that empowers faculty, staff, researchers, and clinicians to use AI to advance research, improve health outcomes, enhance student experiences, and drive productivity. Our team includes early- and late-career software developers, project managers, data scientists, and advisors who guide our collaborators past common AI barriers, such as limited technical expertise and access to secure compute resources. Without requiring a technical background or costly investments, we help our collaborators quickly turn ideas into prototypes and prototypes into solutions that make an impact for the communities they serve. Since our inception, we've supported more than 100 projects and built a network of 43 partners and over 120 individual collaborators. Please fill out our Collaboration Intake Form to connect.
Recommendations on the Use of Generative AI in Research and Scholarly Activity. UK ADVANCE offers guidance in response to frequently asked questions (FAQs) about the use of artificial intelligence from the UK research community.