AI Hacking

Exploiting vulnerabilities in AI Models

Prompt Injection

Prompt injection occurs when the original instructions provided to a model are overridden, often for malicious purposes such as disclosing more information than it should, or generating harmful content.

Data Poisoning

When an attacker manipulates the training data so its generated output is incorrect or biased. Used to manipulate training data so that it fails to recognize correct security protocols like recognizing spam.

Model Theft

When an attacker gains unauthorized access to an AI model to steal the intellectual property that lies within and uses it for malicious purposes.

Privacy Leaking

The possibility of an AI model to inadvertently reveal sensitive information about the data it was trained on, even if the data was supposed to be kept confidential.

Model Drift

Potential for a a model's performance to drift over time due to changes in the data or the environment surrounding it.

Last updated