In other words, theyre teaching AI how to hear and see at the same time.

Up front:Youve probably heard about transformer AI systems such as GPT-3.

At their core, they process and categorize data from a specific kind of media stream.

Google’s teaching AI to ‘see’ and ‘hear’ at the same time — here’s why that matters

Youd need a model thats been trained on videos and another model thats been trained on audio clips.

It’s free, every week, in your inbox.

The researchers dub their system PolyVit.

Article image

Quick take:This could be a huge deal for the business world.

One of the biggest problems facing companies hoping to implement AI stacks is compatibility.

There are literally hundreds of machine learning solutions out there and there are no guarantees theyll work together.

A paradigm where multimodal systems become the norm would be a godsend for weary administrators.

But it is a great step towards a one-size-fits all classification system, and thats pretty exciting stuff.

Also tagged with