New Stanford Compute-In-Memory Chip Promises to Bring Efficient AI to Low-Power Devices

By Sergio De Simone

Article originally posted on InfoQ.

In a paper recently published in Nature, Stanford researchers presented a new compute-in-memory (CIM) chip using resistive random-access memory (RRAM) that promises to bring energy-efficient AI capabilities to edge devices.

The fundamental idea behind the new chip is eliminating the separation between the compute and memory units, thus enabling AI processing within the memory itself. This removes the data transfers between memory and CPU and increases the amount of inference work that can be carried out on a limited battery budget.
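To make that idea concrete, here is a minimal sketch in Python/NumPy of the analog matrix-vector multiplication an RRAM crossbar performs in place. It is purely illustrative and not NeuRRAM’s actual circuitry: the weights are assumed to be encoded as differential pairs of conductances, inputs are applied as row voltages, and each column current sums the products by Ohm’s and Kirchhoff’s laws, so the multiply-accumulate happens where the weights are stored.

```python
import numpy as np

def crossbar_mvm(weights, inputs, g_min=1e-6, g_max=1e-4):
    """Idealized analog matrix-vector multiply on an RRAM crossbar.

    Weights are encoded as differential conductance pairs (G_pos - G_neg)
    so that negative values can be represented; inputs are applied as row
    voltages, and each column current is the dot product of voltages and
    conductances (Ohm's and Kirchhoff's laws).
    """
    w_max = np.abs(weights).max()
    scale = (g_max - g_min) / w_max                      # map |weight| -> conductance range
    g_pos = g_min + scale * np.clip(weights, 0, None)    # positive part of each weight
    g_neg = g_min + scale * np.clip(-weights, 0, None)   # negative part of each weight

    # Column currents I_j = sum_i V_i * G_ij, computed "in memory".
    i_pos = inputs @ g_pos
    i_neg = inputs @ g_neg

    # Differential read-out recovers the signed result in weight units.
    return (i_pos - i_neg) / scale

# Sanity check against an ordinary digital matrix-vector product.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # 8 input rows, 4 output columns
x = rng.normal(size=8)        # input voltages
print(np.allclose(crossbar_mvm(W, x), x @ W))  # True: same result, no data movement
```

In a physical device, the usable conductance range, programming noise, and ADC resolution all limit precision, which is why reaching accuracy comparable to software models is a notable claim.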

Dubbed NeuRRAM, the Stanford chip stores model weights in a dense, analogue, non-volatile RRAM device. This approach is not new, the researchers note, but such devices have yet to prove they can deliver on their promise.

Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design.

To address those conflicting requirements, NeuRRAM provides high versatility in reconfiguring its cores to adapt to diverse model architectures, according to the researchers. Additionally, its energy efficiency is twice that of previous RRAM CIM chips, while its accuracy remains comparable to that of existing software models across a variety of AI tasks:

We report fully hardware-measured inference results for a range of AI tasks including image classifications using CIFAR-10 and MNIST datasets, Google speech command recognition and MNIST image recovery, implemented with diverse AI models including convolutional neural networks (CNNs), long short-term memory (LSTM) and probabilistic graphical models.
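To give a rough sense of how such diverse models can share one crossbar substrate, the hypothetical sketch below lowers a convolution to the matrix-vector products a CIM array natively executes, using the standard im2col transformation. This is a generic illustration of the mapping idea, not the paper’s actual reconfiguration scheme, and the helper name im2col is our own.

```python
import numpy as np

def im2col(x, k):
    """Unroll all k-by-k patches of a 2-D input into rows (valid padding, stride 1)."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols, (out_h, out_w)

rng = np.random.default_rng(1)
image = rng.normal(size=(6, 6))
kernel = rng.normal(size=(3, 3))

# Each 3x3 patch becomes one input-voltage vector; the flattened kernel
# becomes one column of crossbar weights, so the whole convolution is
# just repeated matrix-vector products against the same stored weights.
patches, out_shape = im2col(image, 3)
feature_map = (patches @ kernel.ravel()).reshape(out_shape)
print(feature_map.shape)  # (4, 4)
```

The same lowering idea applies to an LSTM step, whose gate computations are also matrix-vector products, which is what makes a reconfigurable MVM substrate useful across such different architectures.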

Admittedly, those are not the largest models in use nowadays, but it should not be overlooked that NeuRRAM targets applications running at the edge or on low-power devices. Typical applications for such devices are wake-word detection for voice-enabled devices and human detection for security cameras. In the future, though, as the technology evolves further, “it could power real-time video analytics combined with speech recognition all within a tiny device”, says Weier Wan, first author of the paper.

The NeuRRAM chip is currently fabricated in a 130-nm CMOS process, and the Stanford researchers are now working out how to scale the technology while keeping it energy efficient and reducing latency. Their hope is that scaling the current design from 130-nm to 7-nm technology could deliver higher energy and area efficiency than today’s state-of-the-art edge inference accelerators.
