How useful is your AI?

It's not a simple question to answer.

If you're an AI developer, there's a good chance you think the answer is benchmarks.

What AI researchers can learn from the NFL Combine

But that's not the whole story.


To measure how well an AI handles a task, such as telling cats from dogs, we use a benchmark.


Basically, we grab a bunch of pictures of cats and dogs and we label them correctly.

Then we hide the labels from the AI and ask it to tell us what's in each image.

If it scores 9 out of 10, it's 90% accurate.

If we think 90% accuracy is good enough, we can call our model successful.

If not, we keep training and tweaking.
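The scoring step boils down to a few lines of code. Here is a minimal Python sketch of it; the function names, the ten hard-coded labels, the model's predictions, and the 0.9 "good enough" threshold are all illustrative assumptions, not any particular benchmark's implementation.

```python
# A minimal sketch of the cat/dog benchmark described above.
# The labels, predictions, and 0.9 threshold are illustrative assumptions.

def accuracy(labels, predictions):
    """Fraction of predictions that match the hidden ground-truth labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must be the same length")
    if not labels:
        raise ValueError("benchmark is empty")
    correct = sum(1 for truth, guess in zip(labels, predictions) if truth == guess)
    return correct / len(labels)


def passes_benchmark(labels, predictions, threshold=0.9):
    """True if accuracy meets the bar we decided counts as 'good enough'."""
    return accuracy(labels, predictions) >= threshold


# Ten labeled images; the model only ever sees the images, never these labels.
labels = ["cat", "dog", "cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog"]
# Hypothetical model output: it gets the last image wrong.
predictions = ["cat", "dog", "cat", "cat", "dog", "dog", "cat", "dog", "cat", "cat"]

score = accuracy(labels, predictions)
print(f"accuracy: {score:.0%}")  # accuracy: 90%
print("successful" if passes_benchmark(labels, predictions) else "keep training")
```

Note that the threshold is a choice we make, not something the benchmark hands us, which is exactly why a high score on one benchmark says little about usefulness elsewhere.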

But an AI that can only tell cats from dogs wouldn't be very useful outside of the benchmark leaderboards.

However, an AI capable of labeling all the objects in any given image would be very useful.

But there's no universal benchmark for labeling objects.

And that means any benchmark measuring how good an AI is at labeling images is an arbitrary one.

Does it matter what the categories are?

The NFL Combine gave scouts a single place to gather and judge players' performance at the same time.

Not only did this save time and money, but it also established a universal benchmark.

Of course, there are no guarantees in sports.

However, the Combine is just a small part of the scouting process.

Ideally, benchmarking in the AI world would simply represent the first round of rigor.
