Deep learning models owe their initial success to large servers with vast amounts of memory and clusters of GPUs.
The promises of deep learning gave rise to an entire industry of cloud computing services for deep neural networks.
Why TinyML?

Many applications require on-device inference.
For example, in some settings, such as drone rescue missions, internet connectivity is not guaranteed.

And the delay caused by the round trip to the cloud is prohibitive for applications that require real-time ML inference.
All these necessities have made on-device ML both scientifically and commercially attractive.
Your iPhone now runs facial recognition and speech recognition on-device.

Your Android phone can run on-device translation.
Your Apple Watch uses machine learning to detect movements and ECG patterns.
But these applications have also been made possible thanks to advances in hardware.

Our smartphones and wearables now pack more computing power than a server did 30 years ago.
Some even have specialized co-processors for ML inference.
At the same time, microcontrollers (MCUs) don't have the resources found in general-purpose computing devices.

Most of them don't have an operating system.
They mostly don't have a mains electricity source and must run on coin-cell batteries for years.
Therefore, fitting deep learning models on MCUs can open the way for many applications.

There have been several efforts to shrink deep learning models so that they fit on such devices. However, most of these efforts focus on reducing the number of parameters in the model.
The problem with pruning methods is that they don't address the memory bottleneck of neural networks.
To execute the model, a device needs as much memory as the model's peak activation size, which is typically dominated by the large feature maps of the early layers.
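To make the bottleneck concrete, here is a minimal sketch (with hypothetical layer shapes, not taken from the paper) of how peak activation memory can be estimated for a small CNN:

```python
# Illustrative sketch, not from the paper: estimating the peak activation
# memory of a small CNN. The layer shapes below are hypothetical.

def activation_bytes(h, w, c, bytes_per_value=1):
    # Memory needed to hold one activation tensor (int8 by default).
    return h * w * c * bytes_per_value

# (height, width, channels) of each layer's output feature map
layer_outputs = [
    (112, 112, 16),  # early layers: large spatial size, few channels
    (56, 56, 32),
    (28, 28, 64),
    (14, 14, 128),   # later layers: small spatial size, many channels
]

# A layer's input and output activations must be resident at the same time,
# so peak memory is driven by the largest consecutive pair.
peak = max(
    activation_bytes(*layer_outputs[i]) + activation_bytes(*layer_outputs[i + 1])
    for i in range(len(layer_outputs) - 1)
)
print(f"Peak activation memory: {peak / 1024:.0f} KB")
# Pruning weights shrinks the parameter count, but the large early
# feature maps still dominate this peak.
```

However much such a model is pruned, the early feature maps in this sketch still have to fit in RAM all at once.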

Another approach to optimizing neural networks is reducing the input size of the model.
A smaller input image requires a smaller CNN to perform prediction tasks.
However, reducing the input size presents its own challenges and is not efficient for all computer vision tasks.
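As a rough, hypothetical illustration of why smaller inputs help, the footprint of the first feature map shrinks quadratically with input resolution (here assuming a stride-2 first layer with 16 int8 output channels; these numbers are not from the paper):

```python
# Hypothetical numbers: size of the first feature map after a stride-2 conv
# with 16 output channels, stored as int8 (1 byte per value).
for resolution in (224, 160, 96):
    h = w = resolution // 2                 # output spatial size
    kb = h * w * 16 / 1024
    print(f"{resolution}x{resolution} input -> ~{kb:.0f} KB feature map")
# Memory shrinks quadratically with resolution, but so does the detail
# available for tasks such as detecting small objects.
```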

MCUNetV2 builds on previous work by the same group, which was accepted and presented at the NeurIPS 2020 conference.
The researchers' experiments show that MCUNetV2 can reduce the memory peak by a factor of eight.
It does this with patch-based inference: the memory-intensive early stage of the network runs on one small patch of the input at a time, so only that patch's activations need to be held in memory. The memory-saving benefits of patch-based inference come with a computation overhead tradeoff.
To overcome this limit, the researchers redistributed the receptive field across the different blocks of the neural network.
In CNNs, the receptive field is the region of the input image that influences a given output activation.
Larger receptive fields require larger patches and overlaps between patches, which creates a higher computation overhead.
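Below is a minimal, hypothetical sketch of the patch-based idea (a simplification for illustration, not the authors' implementation): the memory-hungry early stage runs on overlapping patches so that only one patch's activations are live at a time, and later blocks then operate on the much smaller stitched feature map.

```python
import numpy as np

def early_stage(patch):
    # Stand-in for the memory-hungry first blocks of the CNN.
    # Here: a fake stride-4 downsampling that produces 8 "channels".
    h, w = patch.shape[:2]
    return np.ones((h // 4, w // 4, 8)) * patch.mean()

def patch_based_inference(image, patch=64, overlap=8):
    # Run early_stage on overlapping patches so that only one patch's
    # activations are in memory at a time, then stitch the outputs.
    h, w = image.shape[:2]
    feature_map = np.zeros((h // 4, w // 4, 8))
    step = patch - overlap
    for y in range(0, h - overlap, step):
        for x in range(0, w - overlap, step):
            tile = image[y:y + patch, x:x + patch]
            out = early_stage(tile)  # peak memory ~ one patch, not the full image
            feature_map[y // 4:y // 4 + out.shape[0],
                        x // 4:x // 4 + out.shape[1]] = out
    # Later blocks then run layer-by-layer on this much smaller map.
    return feature_map

fm = patch_based_inference(np.random.rand(224, 224, 3))
print(fm.shape)  # (56, 56, 8)
```

The overlap between neighboring patches is what creates the redundant computation mentioned above, which is why the receptive field of the early blocks has to be kept small.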
The researchers tested the deep learning architecture in different applications on several microcontroller models with small memory capacity.
The researchers demonstrate MCUNetV2 in action with real-time person detection, visual wake words, and face/mask detection.
Our ability to capture data from the world has increased immensely thanks to advances in sensors and CPUs.
As Warden argued, processors and sensors are much more energy-efficient than radio transmitters such as Bluetooth and Wi-Fi.
"The physics of moving data around just seems to require a lot of energy. [It's] obvious there's a massive untapped market waiting to be unlocked with the right technology."
This is the gap that machine learning, and specifically deep learning, fills.
Thanks to MCUNetV2 and other advances in TinyML, Warden's forecast is fast turning into a reality, enabling applications that were previously impossible.
You can read the original article here.