As the demand for local AI processing grows, so does the strain on system memory. Lexar is addressing this challenge with a novel approach: offloading parts of large language models (LLMs) from DRAM to NAND Flash, significantly reducing memory requirements while maintaining functionality.

This shift is driven by a cost gap in hardware manufacturing: DRAM costs roughly six times more than NAND Flash. With Lexar's AI Storage Core SSD, users could run models with up to 122 billion parameters on as little as 32 GB of RAM, whereas traditional setups would require 128 GB or more.
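A rough back-of-the-envelope estimate shows why the 32 GB versus 128 GB comparison matters. The sketch below assumes weight size is simply parameter count times bits per parameter, ignores KV cache and runtime overhead, and treats the entire DRAM budget as available for weights. The bit widths are illustrative assumptions (the article does not say which precision Lexar used), and the function names are hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 10**9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_param / 8


def offload_split_gb(params_billion: float, bits_per_param: float,
                     dram_budget_gb: float) -> tuple[float, float]:
    """Return (GB kept in DRAM, GB that would spill to NAND)."""
    total = weight_footprint_gb(params_billion, bits_per_param)
    in_dram = min(total, dram_budget_gb)
    return in_dram, total - in_dram


if __name__ == "__main__":
    # 122B parameters and 32 GB of RAM come from the article;
    # the bit widths are assumed for illustration.
    for bits in (16, 8, 4):
        in_dram, on_nand = offload_split_gb(122, bits, 32)
        print(f"{bits}-bit: {in_dram:.0f} GB in DRAM, {on_nand:.0f} GB on NAND")
```

Under these assumptions, 8-bit weights come to about 122 GB, roughly in line with the 128 GB figure for a conventional setup, and even 4-bit weights (about 61 GB) would not fit in 32 GB without offloading.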

The approach comes with performance tradeoffs. While the technology lets larger models operate within constrained memory, latency increases notably with heavier workloads. For example, time-to-first-token (TTFT) can reach up to 8 seconds with larger context windows, and throughput drops significantly compared to pure DRAM configurations. However, Lexar's solution still outperforms traditional frameworks when RAM is limited.

Lexar AI Storage Core: A New Path for Local AI Workloads

Key specs:
  • DRAM reduction: Up to 40% less memory needed for large LLMs
  • Model support: Compatible with models of up to roughly 400 billion parameters (with performance tradeoffs)
  • Throughput: 15.6 tokens per second on a 32 GB RAM setup with Qwen 3.5 122B model
  • Latency: Time-to-first-token (TTFT) ranges from 2 seconds (smaller contexts) to 8 seconds (larger contexts)
  • Form factor: M.2 SSD with PCIe Gen 4 or Gen 5 support, featuring a custom Storage Processing Unit (SPU) for DRAM-less control
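
To put the throughput and latency figures in practical terms, the total wait for a response can be approximated as the time-to-first-token plus the output length divided by the generation rate. The sketch below assumes the 15.6 tokens per second figure is a steady decode rate that holds across context sizes, which the article does not confirm; the 500-token response length and the function name are illustrative.

```python
def response_time_s(ttft_s: float, output_tokens: int,
                    tokens_per_s: float) -> float:
    """Estimate end-to-end response time in seconds.

    Model: wait for the first token, then stream the rest at a fixed rate.
    """
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # TTFT bounds (2 s, 8 s) and 15.6 tokens/s come from the spec list;
    # the 500-token response is an assumed example length.
    for ttft in (2.0, 8.0):
        total = response_time_s(ttft, 500, 15.6)
        print(f"TTFT {ttft:.0f} s -> about {total:.0f} s for 500 tokens")
```

For a 500-token answer this works out to roughly 34 to 40 seconds, so at this throughput the generation rate, rather than the initial delay, accounts for most of the wait.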

The AI Storage Core SSD is designed to integrate seamlessly into existing systems, particularly in mini-PCs and desktops where space is limited. Lexar's concept involves a hot-swappable M.2 slot encased in a metal jacket, eliminating additional overhead while providing direct connectivity to the processor or chipset. This design, combined with the SPU, allows for precise data movement without relying on traditional DRAM.

For users balancing cost and performance, this approach could offer a viable alternative to high-RAM configurations. While larger models will always face latency challenges when offloaded to NAND, the potential cost savings make it an intriguing development, especially as AI workloads become more common. Whether this becomes a mainstream solution remains to be seen, but Lexar's design points to a future where SSDs play a more central role in AI processing.