TurboQuant: Inner Workings and Implications
Google's year-old research resurfaced in a recent blog post and spooked memory investors.
Google’s blog post on a quantization technique called TurboQuant caused a sharp selloff in memory stocks yesterday, on fears that KV-cache usage will drop significantly and that memory will no longer be a bottleneck. The method itself was published nearly a year ago; it is Google’s resurfacing of its prior research that is causing the jitters.
This is not the first time that efficiency improvements or niche use cases have triggered a selloff in the memory sector. The announcement of SRAM accelerators, for example, worried investors that HBM would no longer be useful. Memory investors have lived through cyclical booms and busts, so their trigger-happiness is not without reason. Given the run-up in memory stocks, everyone is looking for the top.
A few people have asked me to explore TurboQuant in a bit more depth. TurboQuant is a 2-stage algorithm combining PolarQuant and QJL that allows the KV-cache to physically occupy fewer bits, and thus enables longer context windows in LLMs.
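To see why fewer bits per KV-cache entry matters, a back-of-the-envelope sizing helps. The sketch below is illustrative only: the model shape (layers, KV heads, head dimension) is a hypothetical 70B-class configuration I've assumed, not anything from TurboQuant or Google's post, and the 4-bit figure simply stands in for "a lower-bit format."

```python
# Back-of-the-envelope KV-cache sizing: lower-bit storage translates
# directly into longer context windows at a fixed memory budget.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits):
    # 2x for keys and values; bits / 8 bytes per element
    return 2 * tokens * layers * kv_heads * head_dim * bits / 8

# Hypothetical 70B-class model shape (assumed for illustration)
layers, kv_heads, head_dim = 80, 8, 128
tokens = 128_000  # context length

fp16 = kv_cache_bytes(tokens, layers, kv_heads, head_dim, 16)
q4 = kv_cache_bytes(tokens, layers, kv_heads, head_dim, 4)

print(f"FP16  KV cache: {fp16 / 2**30:.1f} GiB")
print(f"4-bit KV cache: {q4 / 2**30:.1f} GiB")
```

At these assumed dimensions, FP16 storage needs roughly 39 GiB for a 128K-token context, while 4-bit storage needs about a quarter of that; equivalently, the same HBM budget holds roughly 4x the context.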
In this short note, we’ll go through TurboQuant briefly and tie it to what it means for the industry.

