Real-Time Deep Learning at the Edge



One of the major driving forces behind the push to run artificial intelligence models on-device is the reduction in latency this approach can offer. When inference relies on remote data centers, network latency is unavoidable and often unpredictable, which keeps applications from running in real time.

Of course, this move is not as simple as taking the same model that runs on a cluster of GPUs and deploying it to a microcontroller with a few tens of kilobytes of memory. The model must first be shrunk and optimized for the less powerful platform. But trim too aggressively and accuracy becomes unacceptable, so there is a limit to how far this can go. Often it is not far enough, and the constrained hardware ends up spending so many processing cycles on each inference that excessive latency creeps right back into the picture.

That lands us right back at the problem we started with, so it just won’t do. In response, researchers have proposed techniques known as patch-based layer fusion to speed up deep learning algorithms on resource-constrained hardware. These methods operate on small windows (or patches) of the input data at any given time, and they fuse together operations from multiple layers of a neural network so that intermediate results never need to be stored in full. Taken together, these optimizations speed up inference and reduce memory utilization.
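
To make the idea concrete, here is a minimal sketch of patch-based fusion (a toy illustration, not the researchers’ code): two small convolutions are fused and evaluated one output pixel at a time, so only a kernel-sized intermediate buffer ever exists, at the price of recomputing overlapping values. All of the function names and shapes here are made up for the example.

```python
# Hypothetical sketch of patch-based layer fusion: two 3x3 "valid" convolutions
# are evaluated one output pixel at a time, so only a tiny intermediate buffer
# exists instead of the full intermediate feature map.
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 2D convolution (cross-correlation), 'valid' padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def fused_patch_inference(x, k1, k2):
    """Compute conv(conv(x, k1), k2) without materializing the full intermediate map."""
    oh = x.shape[0] - k1.shape[0] - k2.shape[0] + 2
    ow = x.shape[1] - k1.shape[1] - k2.shape[1] + 2
    rh = k1.shape[0] + k2.shape[0] - 1      # receptive-field height of one output pixel
    rw = k1.shape[1] + k2.shape[1] - 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i:i + rh, j:j + rw]            # small input patch
            mid = conv2d_valid(window, k1)            # kernel-sized intermediate only
            out[i, j] = conv2d_valid(mid, k2)[0, 0]   # one fused output value
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, k1, k2 = rng.normal(size=(16, 16)), rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
    layer_by_layer = conv2d_valid(conv2d_valid(x, k1), k2)
    assert np.allclose(fused_patch_inference(x, k1, k2), layer_by_layer)
```

The recomputation across overlapping patches is the classic trade-off of this approach: memory goes down, but compute can go up, and striking the right balance is exactly the problem the work described next sets out to solve.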

Improving on this approach, a pair of researchers at Freie Universität Berlin and Inria have developed what they call msf-CNN. Using this method, convolutional neural networks can be tuned for optimal processing speed and memory utilization, making real-time execution of accurate models possible on even highly constrained hardware.

The msf-CNN technique builds on patch-based fusion by applying a graph-based search algorithm to determine the best way to fuse layers in a convolutional neural network. By modeling the network’s structure as a directed acyclic graph, the researchers can explore the entire fusion solution space, identifying configurations that minimize either peak RAM usage or compute cost. This graph-based search strategy enables msf-CNN to outperform previous solutions like MCUNetV2 and StreamNet in both flexibility and efficiency.
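
The shape of that search can be sketched with a toy dynamic program. Everything below is an assumption for illustration, not the paper’s cost model: layer names and cost formulas are made up, and the real msf-CNN search handles far richer network structures. Nodes are cut points between layers, an edge stands for fusing all layers between two cuts, and the best path through the graph is the fusion configuration with the lowest peak RAM, with compute as a tie-breaker.

```python
# Hypothetical sketch of searching fusion configurations as a graph.
# An edge (i, j) means "fuse layers i..j-1 into one patch-based segment".
# Costs are stand-ins; a real tool derives them from tensor shapes and recomputation.
from functools import lru_cache

LAYERS = ["conv1", "conv2", "conv3", "conv4", "dense"]   # toy network, made up

def segment_cost(i, j):
    """Placeholder cost of fusing layers i..j-1: (peak_ram_bytes, macs)."""
    n = j - i
    peak_ram = 1000 // n + 50 * n       # fusing more layers shrinks buffers...
    macs = 10_000 * n * n               # ...but re-computes overlapping patches
    return peak_ram, macs

@lru_cache(maxsize=None)
def best_partition(i=0):
    """Return (peak_ram, total_macs, cuts) for layers i..end,
    minimizing peak RAM and breaking ties on total compute."""
    if i == len(LAYERS):
        return 0, 0, ()
    best = None
    for j in range(i + 1, len(LAYERS) + 1):           # try every possible next segment
        seg_ram, seg_macs = segment_cost(i, j)
        rest_ram, rest_macs, rest_cuts = best_partition(j)
        cand = (max(seg_ram, rest_ram), seg_macs + rest_macs, (j,) + rest_cuts)
        if best is None or cand[:2] < best[:2]:
            best = cand
    return best

if __name__ == "__main__":
    ram, macs, cuts = best_partition()
    print(f"peak RAM ~{ram} B, ~{macs} MACs, segment boundaries at {list(cuts)}")
```

Swapping the objective (minimize compute instead of peak RAM) is just a change to the comparison key, which hints at the flexibility a graph-based formulation buys.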

To make this technology practical for real-world applications, the team implemented msf-CNN on a range of commercially available microcontrollers, including Arm Cortex-M, RISC-V, and ESP32 platforms. They also introduced enhancements to global pooling and dense layer operations, further reducing RAM consumption without adding compute overhead. Testing revealed that RAM utilization could be reduced by as much as 50% when compared with previous techniques.
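
As an illustration of the kind of pooling enhancement described (an assumption about the general idea, not the team’s actual implementation), global average pooling can be computed as a streaming per-channel sum so the classifier never needs the full feature map held in RAM:

```python
# Minimal sketch (assumed, not the paper's code) of streaming global average
# pooling: accumulate a per-channel running sum as rows of the feature map are
# produced, then apply the dense classifier to the tiny pooled vector.
import numpy as np

def streaming_gap_dense(row_producer, weights, bias, height):
    """row_producer(i) yields row i with shape (W, C); weights has shape (C, classes)."""
    channel_sum, width = None, None
    for i in range(height):
        row = row_producer(i)                          # only one row in RAM at a time
        width = row.shape[0]
        partial = row.sum(axis=0)                      # (C,) sum over this row's width
        channel_sum = partial if channel_sum is None else channel_sum + partial
    pooled = channel_sum / (height * width)            # global average, shape (C,)
    return pooled @ weights + bias                     # dense layer on the pooled vector

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    H, W, C, K = 8, 8, 16, 4
    fmap = rng.normal(size=(H, W, C))                  # stand-in for a conv output
    w, b = rng.normal(size=(C, K)), rng.normal(size=K)
    streamed = streaming_gap_dense(lambda i: fmap[i], w, b, H)
    reference = fmap.mean(axis=(0, 1)) @ w + b         # conventional buffered version
    assert np.allclose(streamed, reference)
```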

The source code for msf-CNN is publicly available on GitHub. Given the number of platforms that are already supported, and the wide range of applications that msf-CNN can be applied to, this work could make a big impact in the world of tiny hardware.
