Bias is an overloaded word.
When people say an AI model is biased, they usually mean that the model is performing badly.
But ironically, poor model performance is often caused by various kinds of actual bias in the data or the algorithm.

Algorithms trained on biased data will end up producing results that reflect that bias. The bias can be detected and it can be mitigated, but we need to be on our toes. There are four distinct types of machine learning bias that we need to be aware of and guard against.

Sample bias
Sample bias is a problem with training data: it arises when the data used to train a model does not accurately represent the environment the model will operate in.
This science is well understood by social scientists, but not all data scientists are trained in sampling techniques.
We can use an obvious but illustrative example involving autonomous vehicles. Suppose the vision model is trained only on footage collected during the day: it will perform poorly at night, in conditions it has never seen. Training the algorithm on both daytime and nighttime data would eliminate this source of sample bias.
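As a minimal sketch of how that kind of imbalance might be caught before training, the hypothetical check below counts how a training set is split between daytime and nighttime images. The metadata field, function name, and threshold are illustrative assumptions, not part of any particular pipeline.

```python
# Hypothetical audit of a training set's lighting balance (illustrative names).
from collections import Counter

def audit_lighting_balance(examples, min_share=0.3):
    """Warn if daytime or nighttime images fall below min_share of the dataset."""
    counts = Counter(ex["time_of_day"] for ex in examples)  # assumed metadata label
    total = sum(counts.values())
    for condition in ("day", "night"):
        share = counts.get(condition, 0) / total
        if share < min_share:
            print(f"Sample bias risk: only {share:.0%} of images are '{condition}'")

# Toy usage: a training set that is 90% daytime footage
audit_lighting_balance(
    [{"time_of_day": "day"}] * 900 + [{"time_of_day": "night"}] * 100
)
```

A skewed count doesn't prove the model will fail at night, but it is a cheap early warning that the sample doesn't match the deployment environment.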
Prejudice bias
Prejudice bias is a result of training data that is influenced by cultural or other stereotypes.
For instance, imagine a computer vision algorithm that is being trained to understand people at work.
If the training data shows mostly men writing code and mostly women doing housework, the algorithm is likely to learn that coders are men and homemakers are women. This is prejudice bias: women can obviously code, and men can obviously cook.
The issue here is that training data decisions consciously or unconsciously reflected social stereotypes.
Decisions like these obviously require a sensitivity to stereotypes and prejudice.
It's up to humans to anticipate the behavior the model is supposed to express. Mathematics can't overcome prejudice.
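Humans can, however, make the skew visible before the model learns it. The sketch below is one hypothetical way to audit annotated training data; the "occupation" and "gender" fields are assumptions made purely for illustration.

```python
# Hypothetical audit of label co-occurrence in annotated training data.
from collections import defaultdict

def occupation_gender_breakdown(annotations):
    """For each occupation label, return the share of examples per gender tag."""
    counts = defaultdict(lambda: defaultdict(int))
    for ann in annotations:
        counts[ann["occupation"]][ann["gender"]] += 1
    return {
        occupation: {g: n / sum(by_gender.values()) for g, n in by_gender.items()}
        for occupation, by_gender in counts.items()
    }

# Toy usage: a heavily skewed set of "people at work" annotations
toy = (
    [{"occupation": "coder", "gender": "male"}] * 95
    + [{"occupation": "coder", "gender": "female"}] * 5
)
print(occupation_gender_breakdown(toy))  # {'coder': {'male': 0.95, 'female': 0.05}}
```

A breakdown like this doesn't remove the bias, but it puts the stereotype in the data in front of the humans who can decide how to rebalance or relabel it.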
Measurement bias
Systematic value distortion happens when there's an issue with the device used to observe or measure.
This kind of bias tends to skew the data in a particular direction.
Imagine, for example, that the camera used to collect training images distorts colors in a consistent way. The algorithm would be trained on image data that systematically fails to represent the environment it will operate in.
This kind of bias can't be avoided simply by collecting more data, because every new measurement taken with the same faulty device carries the same distortion.
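What can help is comparing simple statistics of the training data against data captured by the production equipment. The sketch below assumes batches of RGB images and an arbitrary drift threshold; it illustrates the idea rather than prescribing a method.

```python
# Hypothetical check for systematic color drift between two cameras.
import numpy as np

def channel_means(images):
    """Mean of the R, G, B channels across a batch of HxWx3 images."""
    return np.stack(images).mean(axis=(0, 1, 2))

def check_camera_drift(train_images, prod_images, tolerance=10.0):
    """Flag large per-channel differences between training and production images."""
    drift = np.abs(channel_means(train_images) - channel_means(prod_images))
    if np.any(drift > tolerance):
        print(f"Possible measurement bias: per-channel drift {drift}")

# Toy usage: the production camera reads systematically bluer images
rng = np.random.default_rng(0)
train = [rng.integers(0, 256, (32, 32, 3)) for _ in range(50)]
prod = [img + np.array([0, 0, 40]) for img in train]
check_camera_drift(train, prod)
```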
Algorithm bias
This final type of bias has nothing to do with data. In fact, this kind of bias is a reminder that bias is an overloaded word.
In machine learning, bias is a mathematical property of an algorithm.
The counterpart to bias in this context is variance.
Models with high variance fit the training data closely and embrace complexity, but they are sensitive to noise; models with high bias, by contrast, make strong simplifying assumptions and risk underfitting.
Importantly, data scientists are trained to arrive at an appropriate balance between these two properties.
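A quick way to see that balance is to fit models of different complexity to the same noisy data and compare training error with held-out error. The sketch below uses plain polynomial fits on synthetic data; the degrees, noise level, and seed are arbitrary choices for illustration.

```python
# Bias-variance sketch: underfit (degree 1) vs. overfit (degree 12) polynomial models.
import numpy as np

rng = np.random.default_rng(42)
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 30)
x_val = np.sort(rng.uniform(0, 1, 30))
y_val = np.sin(2 * np.pi * x_val) + rng.normal(0, 0.2, 30)

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, validation MSE {val_mse:.3f}")
```

In a run like this, the degree-1 model (high bias) tends to show high error on both sets, while the degree-12 model (high variance) drives training error down at the risk of doing worse on the validation points.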
Data scientists who understand all four types of AI bias will produce better models and better training data.
After all, AI algorithms are built by humans, and training data is assembled, cleaned, labeled, and annotated by humans.