Under manipulations, are some AI models harder to audit?
2023-10-01

You are a regulator, and I am a platform hosting a Machine Learning (ML) model. You want to verify that my model does not discriminate against marginalized populations. I want to have the most accurate model on average (yes, I like to brag about that 97.98989% accuracy).
To verify that I am not too unfair, you conduct an audit: you ask me questions and I have to answer them. The thing is, I am a sneaky platform. I don't want to spend the time to build a model that is both fair and accurate. So during the audit, I answer your questions as fairly as I can, but once the audit is over, I will switch back to my most accurate model. I know that you are not naive: from time to time, you will check whether I still give the same answers to your questions. Can you craft questions that make it hard for me to manipulate the model after the audit while keeping the same answers on your questions?
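To make the interaction concrete, here is a minimal sketch of this audit protocol in Python. It is my own illustration, not code from the paper: `run_audit`, `recheck`, `platform_answer` and `query_pool` are hypothetical names, and the query budget is arbitrary.

```python
import random

def run_audit(platform_answer, query_pool, n_queries=100, seed=0):
    """The regulator samples queries and records the platform's answers."""
    rng = random.Random(seed)
    audit_queries = rng.sample(query_pool, n_queries)
    # The transcript is what the regulator keeps for later consistency checks.
    return {q: platform_answer(q) for q in audit_queries}

def recheck(platform_answer, transcript):
    """Later on, the regulator re-asks the audited queries and flags
    any query whose answer changed since the audit."""
    return [q for q, a in transcript.items() if platform_answer(q) != a]
```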
We build on previous work by Yan and Zhang and prove that, unfortunately, no auditing strategy performs significantly better than asking random questions.
Larger models are harder to audit
We saw that for some types of models (the ones with high capacity), no matter how clever the audit algorithm is, it cannot perform better than randomly sampling queries. But what about the models that are commonly used in practice?
It turns out that just by increasing the capacity (tree depth, number of trees, number of neurons...), the platform can always arbitrarily increase the manipulability of the audit. If you want to know more about how to measure this notion of capacity and of audit manipulability, check out the paper 😉.
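As a toy illustration of why capacity helps the platform (again my own sketch, not the construction from the paper): a model with enough capacity can simply memorize the fair answers it gave on the audited queries and behave like the most accurate model everywhere else.

```python
def manipulated_model(accurate_model, audit_transcript):
    """Return a model that agrees with the audit transcript on the audited
    queries and with the accurate (possibly unfair) model elsewhere.
    Assumes queries are hashable; names are hypothetical."""
    def model(x):
        if x in audit_transcript:       # query was seen during the audit
            return audit_transcript[x]  # keep the "fair" answer on record
        return accurate_model(x)        # revert to the most accurate model
    return model
```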
Manipulations are cheap for the platform
Thanks to the capacity of existing models, it is easy for the platform to appear fair during the audit and then switch to another model. But is the accuracy of this new model as high as if the platform did not have to jump through all those hoops to game the audit?
The cost of exhaustion measures this accuracy drop. It turns out that for most model types, the impact on accuracy is minimal.
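To make the accuracy drop tangible, here is a rough empirical proxy (not the paper's formal definition of the cost): compare the manipulated model against the unconstrained accurate model on held-out data. Since the two only disagree on the audited queries, the drop is bounded by how often those queries show up in the test set.

```python
def accuracy(model, data):
    """data is a list of (x, y) pairs; returns the fraction predicted correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def accuracy_drop(accurate_model, manipulated, test_data):
    """How much accuracy the platform gives up by staying consistent with
    the audit transcript (an illustrative estimate, not the paper's metric)."""
    return accuracy(accurate_model, test_data) - accuracy(manipulated, test_data)
```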
An attempt at relaxing the Manipulation-Proof auditing problem
In an earlier work, I attempted to relax the notion of Manipulation-Proof auditing. You can have a look at the paper and slides.