Memory that computes: closed-loop in-memory computing from Politecnico di Milano and the next frontier of AI

The next big leap in AI may come less from new algorithms than from how we move electrons. At Politecnico di Milano, a chip prototype performs analog computation directly in memory: RAM that not only stores data but also processes it, closing the loop without shuttling results back to the CPU at every step. This closed-loop in-memory approach targets the von Neumann bottleneck: the constant, energy-hungry traffic between processor and memory.

Modern AI workloads hinge on matrix products and vector operations. The Milan team’s in-memory design aims to accelerate exactly those patterns, trimming bus traffic and power. The upside is faster training and thriftier inference, with both economic and climate dividends. It is no surprise that research interest is tilting toward compute-in-memory, neuromorphic devices and analog schemes: the cost of AI is now rising faster than the marginal gains it buys.
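To make the matrix pattern concrete, here is a minimal sketch of how an idealized resistive crossbar could compute a matrix-vector product in place. It is an assumption for illustration, not a description of the Milan chip: the function name crossbar_matvec, the crossbar model and the numbers (in arbitrary units) are all hypothetical. Weights are stored as cell conductances, inputs are applied as row voltages, and each column wire sums the resulting currents.

```python
# Illustrative sketch only: an idealized crossbar model, not the Milan design.
def crossbar_matvec(conductances, voltages):
    """Return column currents I[j] = sum over i of G[i][j] * V[i].

    Row i is driven with voltage V[i]. Each cell passes a current
    G[i][j] * V[i] (Ohm's law), and the currents sharing a column wire
    add up (Kirchhoff's current law). Every cell therefore performs one
    multiply-accumulate at the place where its weight is stored.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("need one input voltage per crossbar row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical values in arbitrary units: a 3x2 weight array and 3 inputs.
G = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
V = [0.5, 1.0, 2.0]
I = crossbar_matvec(G, V)  # one output current per column
```

In software the two loops run step by step; in an analog array the same sums would settle physically and in parallel, which is where the hoped-for savings in data movement come from.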

If the approach lands, in-memory accelerators could appear in data centers and at the edge between 2026 and 2028. Likely scenarios include hybrid clusters where NPUs/GPUs handle general compute while memory banks perform in-situ multiply-accumulate operations (MACs), or edge nodes running local inference on tight energy budgets. In industry, this would benefit low-latency robotics and automotive applications; in health and finance, it would lower per-query costs while improving privacy by moving less data.

The field is broader than in-memory computing alone: it includes LPUs, NPUs, iGPUs, and unconventional avenues like thermodynamic or quantum logic. In-memory has a pragmatic edge: it speaks the matrix language at AI’s core. Challenges remain. Analog circuits suffer from noise, process variation, drift and limited linearity; they need calibration, mature toolchains and open standards; and they must prove lifetime reliability under heavy duty cycles to win hyperscaler trust.
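The role of calibration can be shown with a toy model. The sketch below is an assumption for illustration only: it represents process variation as a fixed gain error per output column and noise as Gaussian read noise, then estimates the gains from probe inputs with known answers. Drift and non-linearity are not modelled, and every name and number here (make_toy_device, estimate_gains, the gains 0.9 and 1.1) is hypothetical rather than taken from the Milan work.

```python
import random


def ideal_matvec(weights, x):
    """Exact digital reference: y[j] = sum over i of W[i][j] * x[i]."""
    return [sum(weights[i][j] * x[i] for i in range(len(x)))
            for j in range(len(weights[0]))]


def make_toy_device(weights, column_gain, noise_std, rng):
    """Build a toy analog array: each column output is scaled by a fixed
    gain error and perturbed by Gaussian read noise on every read."""
    def read(x):
        return [g * y + rng.gauss(0.0, noise_std)
                for g, y in zip(column_gain, ideal_matvec(weights, x))]
    return read


def estimate_gains(read, weights, rng, n_probes=200):
    """Least-squares fit of measured = gain * reference, per column,
    using random probe inputs whose ideal outputs are known."""
    n_rows, n_cols = len(weights), len(weights[0])
    num = [0.0] * n_cols
    den = [0.0] * n_cols
    for _ in range(n_probes):
        probe = [rng.uniform(0.0, 1.0) for _ in range(n_rows)]
        ref = ideal_matvec(weights, probe)
        meas = read(probe)
        for j in range(n_cols):
            num[j] += meas[j] * ref[j]
            den[j] += ref[j] * ref[j]
    return [n / d for n, d in zip(num, den)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
true_gain = [0.9, 1.1]  # hypothetical per-column gain errors
read = make_toy_device(W, true_gain, noise_std=0.01, rng=rng)
gain_est = estimate_gains(read, W, rng)

x = [0.5, 1.0, 2.0]
exact = ideal_matvec(W, x)
raw = read(x)
corrected = [r / g for r, g in zip(raw, gain_est)]
raw_error = max(abs(r - e) for r, e in zip(raw, exact))
corrected_error = max(abs(c - e) for c, e in zip(corrected, exact))
```

Comparing raw_error with corrected_error shows the point of the exercise: the fixed gain error can be calibrated away, while the random read noise stays in the result. A real device would also need recalibration as it drifts, which this sketch leaves out.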

Energy-wise, the impact could be substantial. Data movement dominates AI power draw, so cutting it can curb consumption. Coupled with data center efficiency policies and renewables, this could help contain AI’s carbon footprint. For cloud providers, every point of efficiency chips away at TCO; for developers, it puts larger models within reach of accessible hardware.

What’s next? Near-term: prototypes evaluated on standard benchmarks (matmul, conv, attention), with accuracy vs. energy tradeoffs published. Within a few product cycles: smart DIMMs or interposers bringing compute closer to HBM. If EDA tools and frameworks (PyTorch, JAX) expose tuned backends, adoption friction falls. Validation on real-world data and compatibility with quantization and pruning will be pivotal.

The subtext: AI can’t scale indefinitely on conventional FLOPS. Architectural innovation is back at center stage. The Milan effort joins a global move to compute where data lives. If results hold, closed-loop in-memory computing could become a standard pillar alongside GPUs and NPUs, aligning performance with sustainability.

Sources:

Politecnico di Milano: https://www.polimi.it
IEEE Xplore: https://ieeexplore.ieee.org
Nature Electronics: https://www.nature.com/natelectron
