Last year, Google built a dataset called GoEmotions.
Why would you present a comment for labeling with no additional metadata?

The subreddit and the parent post it's replying to are especially important context.
Imagine you see the comment "his traps hide the fucking sun" by itself.
Would you have any idea what it means?

Probably not. Maybe that's why Google mislabeled it.
But what if you were told it came from the /r/nattyorjuice subreddit dedicated to bodybuilding?
Would you realize, then, that "traps" refers to someone's trapezius muscles?
The problem
This kind of data can't be properly labeled.
There are no shortcuts to gleaning insight into human communications.
We're not stupid like machines are.
Sometimes the machine is right, sometimes it's wrong, and there's no way to be sure one way or the other.
This particular kind of AI development is a grift.
And it's one of the oldest in the book.
The reason it's a grift is that you don't need AI to match keywords to labels.
Hell, you could do that in Microsoft Excel 20 years ago.
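Keyword-to-label matching really is that simple. Here's a minimal sketch of the idea; the keyword lists and function name are invented for illustration, not taken from GoEmotions:

```python
# Hypothetical keyword lists mapping words to emotion labels.
KEYWORD_LABELS = {
    "admiration": ["awesome", "great", "impressive"],
    "anger": ["hate", "furious", "annoying"],
}

def label_comment(comment: str) -> list[str]:
    """Return every label whose keywords appear in the comment."""
    text = comment.lower()
    return [
        label
        for label, keywords in KEYWORD_LABELS.items()
        if any(word in text for word in keywords)
    ]

print(label_comment("That lift was awesome"))  # ['admiration']
```

No model, no training run, no GPU: a lookup table and a loop, which any spreadsheet could replicate. The point isn't that this is good labeling, it's that surface-level matching doesn't require AI at all.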
A bit deeper
You know the dataset the AI was trained on contains mislabeled data.
We're not trying to find all the red cars in a dataset of automobile images.
Were making determinations about human beings.
If the AI screws up and misses some red cars, those cars are unlikely to suffer negative outcomes.
And if it accidentally labels some blue cars as red, those blue cars should be okay.
But this particular dataset is specifically built for decision-making related to human outcomes.
Again, we know for a fact that any AI model trained on this dataset will produce erroneous outputs.
That's something humans-in-the-loop can't help with.
It would require a person to review every single file that wasn't selected.
Whether its legal to do so or not is irrelevant.
Final thoughts
Google's researchers aren't stupid.
You can draw your own conclusions as to their motivations.