AI Hacking
Exploiting vulnerabilities in AI Models

Prompt Injection
Prompt injection occurs when the original instructions provided to a model are overridden, often for malicious purposes such as disclosing more information than it should, or generating harmful content.
Data Poisoning
When an attacker manipulates the training data so its generated output is incorrect or biased. Used to manipulate training data so that it fails to recognize correct security protocols like recognizing spam.
Model Theft
When an attacker gains unauthorized access to an AI model to steal the intellectual property that lies within and uses it for malicious purposes.
Privacy Leaking
The possibility of an AI model to inadvertently reveal sensitive information about the data it was trained on, even if the data was supposed to be kept confidential.
Model Drift
Potential for a a model's performance to drift over time due to changes in the data or the environment surrounding it.
Last updated