Bias and Ambiguity in Facial Expression Datasets

This experimental research examines an open-source facial expression image dataset and addresses questions about bias and ambiguity.

Is it possible for datasets to be unbiased? Do facial expression datasets simply reinforce existing assumptions about what different expressions mean?

Introduction

The research phase was iterative and based on experimenting with visual elements and examining their connections.

This exploratory research centered on the idea of working with visual corpora, investigating recurring motifs, and examining relations within collections of images. The project examined the layers of meaning in facial expressions and questioned the assumptions about what these images represent.

Details of the Dataset

The dataset I used is open-source and available on Hugging Face. It contains a total of 800 images, with each of the following emotion categories represented by 100 images:

- Happy
- Anger
- Sad
- Contempt
- Disgust
- Fear
- Surprise
- Neutral

Each image is labeled with one of these emotion categories. I selected this dataset because it appeared relatively diverse in terms of age, race, and gender representation, which made it a good starting point for exploring potential biases and assumptions in facial expression data.

800 images in total · 8 emotion categories · 1 label per image
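For reference, below is a minimal sketch of how such a dataset could be loaded and checked with the Hugging Face `datasets` library; the dataset identifier and the label column name are hypothetical placeholders, not the actual repository details.

```python
# Minimal sketch: loading a facial expression dataset from Hugging Face.
# "username/facial-expressions" is a hypothetical placeholder for the real dataset id.
from collections import Counter
from datasets import load_dataset

dataset = load_dataset("username/facial-expressions", split="train")

# Each example is assumed to contain an image and an emotion label ("label" column).
print(dataset)
print(Counter(dataset["label"]))  # expected: 100 images per emotion category
```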

Process Overview

During this exploratory process, I conducted several analyses and experiments. Each step is explained below, and the results of one stage feed into the next.

Computational Image Analysis

Average Image Analysis

"Images are made up of pixels, which are data points that represent the relationships between different parts of the image."

In my dataset, there are 800 images of the same size, each showing a different individual. I wanted to create an average image for each emotion category. To do this, I used Google Colab to run code that overlays and averages the images.
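Below is a minimal sketch of the averaging step, assuming the images have been downloaded into one folder per emotion label and share a common resolution; the folder layout, file extension, and image size are my assumptions, not the dataset's actual structure.

```python
# Minimal sketch: pixel-wise average of all images in one emotion folder.
import numpy as np
from PIL import Image
from pathlib import Path

def average_image(folder: Path, size=(224, 224)) -> Image.Image:
    """Average all images in a folder into a single composite face."""
    stack = []
    for path in sorted(folder.glob("*.jpg")):
        img = Image.open(path).convert("RGB").resize(size)
        stack.append(np.asarray(img, dtype=np.float64))
    mean = np.mean(stack, axis=0)  # pixel-wise mean across all faces
    return Image.fromarray(mean.astype(np.uint8))

# One average face per emotion category (assumed folder names).
for emotion in ["happy", "anger", "sad", "contempt", "disgust", "fear", "surprise", "neutral"]:
    average_image(Path("dataset") / emotion).save(f"average_{emotion}.png")
```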

  • The average image (on the right) reveals consistent facial features due to the uniform image sizes, with eyes, nose, and mouth typically positioned in similar locations. Identifying gender in the averaged images becomes challenging.

  • The "sad" average appears older, while the "surprise" average resembles a younger person, prompting a closer examination of each image for dataset diversity.

  • It becomes difficult to assign a single label to the average images without their "original" labels.

Color Analysis

I ran color analysis code in Google Colab, generating graphs based on the mean RGB values of each image. The plot patterns reflect color similarities rather than structural or expressive features, with images of similar dominant colors (like skin tones, lighting, or backgrounds) clustering together.

The color analysis allowed me to visualize all the images in one graph and provided an opportunity to compare them side by side.
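As a rough illustration, here is a sketch of that color analysis, assuming the same folder layout as before; the actual Colab notebook may have plotted different channels or used a different projection.

```python
# Sketch: plot every image by its mean RGB color (assumed folder-per-label layout).
import numpy as np
from PIL import Image
from pathlib import Path
import matplotlib.pyplot as plt

means, labels = [], []
for folder in sorted(Path("dataset").iterdir()):
    for path in folder.glob("*.jpg"):
        rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.float64)
        means.append(rgb.reshape(-1, 3).mean(axis=0))  # mean (R, G, B) per image
        labels.append(folder.name)

means = np.array(means)
# Points cluster by dominant color (skin tone, lighting, background), not by expression.
plt.scatter(means[:, 0], means[:, 2], c=means / 255.0, s=12)
plt.xlabel("mean red value")
plt.ylabel("mean blue value")
plt.title("Images plotted by mean RGB color")
plt.savefig("color_analysis.png")
```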

  • The surprise dataset contains more images of younger individuals, which makes the average "surprise" face look younger.

  • Surprise and fear expressions are very similar, raising the question: Can a single face represent only one emotion, and if so, based on what criteria?


Analysis & Experiments

"Some basic human emotions (happiness, sadness, anger, fear, surprise, disgust and contempt) are innate and shared by everyone, and that they are accompanied across cultures by universal facial expressions."

Paul Ekman

Paul Ekman suggests that each basic emotion has a corresponding, universal facial expression. This means that regardless of cultural background, people are likely to display emotions in similar ways through facial cues.

However, analyzing this dataset showed me that facial expressions are inherently ambiguous, making them challenging to generalize. Distinguishing between certain emotions, like fear and surprise, is particularly difficult because their facial cues are quite similar. Key features that contribute to these expressions include the eyes, eyebrows, and mouth, with the eyebrows playing a crucial role in conveying specific emotions. This raises an interesting question:

Which facial features do we focus on when interpreting an emotion?

CodeChart Experiment

To investigate which facial features people focus on when interpreting emotions, I used open-source eye-tracking code to track participants' attention on images from my dataset. I selected one image for each emotion category and built the experiment in p5.js. The experiment flow is outlined below.

Experiment can be found here.

The experiment involved six participants selected through a snowballing method. Participants were provided with a link and instructed to complete the questionnaire using their laptops. Subsequently, the results were downloaded and overlaid using the Code Chart Visualizer.

The user responses were gathered and categorized under the assigned labels within the dataset. The resulting image depicts the areas where users directed their attention, with emotions they recognized represented by post-it notes. Notably, the green post-it notes indicate instances where users' recognitions matched the assigned labels.
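The overlays themselves were produced with the Code Chart Visualizer; purely as an illustration, a rough Python equivalent of the overlay step is sketched below, with a hypothetical CSV format ("x" and "y" pixel columns) standing in for the visualizer's actual input.

```python
# Hypothetical sketch: overlay participants' attention points on a dataset image.
# The real overlay was made with the Code Chart Visualizer; CSV columns are assumed.
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

def overlay_responses(image_path: str, responses_csv: str, out_path: str) -> None:
    img = Image.open(image_path).convert("RGB")
    points = pd.read_csv(responses_csv)  # assumed columns: x, y (pixel coordinates)
    plt.imshow(img)
    plt.scatter(points["x"], points["y"], s=80, c="red", alpha=0.5)
    plt.axis("off")
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()

overlay_responses("dataset/happy/example.jpg", "responses_happy.csv", "overlay_happy.png")
```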

Analysis of the Experiment

The analysis showed a general tendency for participants to focus on the "left eye" across emotions and highlighted influences such as background brightness on attention. However, it was difficult to draw concrete conclusions from the eye-tracking data, as the small sample size could not support statistically meaningful findings.

While some emotions (like disgust, happiness, and sadness) were easier to identify, others remained ambiguous, underscoring the challenge of assigning a single, definitive emotion to an image.

GenAI Experiments

Since the data from my sample didn’t yield concrete results but highlighted the ambiguity in assigning emotions to specific facial expressions, I became curious to see how a GenAI model (ChatGPT-4) would interpret emotions in these images and which facial features it would consider most relevant. I ran a quick experiment, and here is one of the results.

Prompt: I’m going to upload some pictures, and I’d like you to tell me which emotions you see in them and which facial features led you to interpret them that way.
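The experiment itself was run through the ChatGPT interface; below is a hedged sketch of how the same question could be posed programmatically via the OpenAI Python client (the model name, file path, and use of the API are my assumptions, not part of the original setup).

```python
# Hypothetical sketch: send one dataset image plus the prompt above to a GPT-4-class model.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_emotion(image_path: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Which emotions do you see in this picture, and which "
                          "facial features led you to interpret them that way?")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(ask_about_emotion("dataset/surprise/example.jpg"))
```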

Main Arguments

  • AI will always be biased, because humans are biased.


Image datasets inherit biases from the data and labeling processes, which reflect the biases of human creators and data sources. It’s critical to approach datasets with a thorough, critical perspective, recognizing that both images and labels can introduce bias.


  • Assigning a single emotion to images and anticipating agreement among individuals is challenging.

Human interpretations of facial expressions vary widely, and even people sometimes struggle to assign clear emotions to specific expressions. Training AI to consistently recognize and categorize complex expressions is challenging, as one expression may convey multiple emotions simultaneously.


  • It is not realistic to expect ML models to recognize facial expressions "correctly," given their inherent ambiguity.

Emotions are nuanced, complex, and often ambiguous. Reducing them to single labels oversimplifies the richness of human emotional experiences and overlooks the subjective nature of emotional interpretation. Context, cultural differences, and individual variations heavily influence how emotions are expressed and perceived.


If avoiding bias completely isn’t possible, how can we use it responsibly and for the benefit of humans? What is the line?

All knowledge can be data; however, some of it is implicit. How can we make implicit knowledge clear and tangible?

Do we really need AI to recognize human emotions? What is the value of automating such personal and subjective interpretations?