Mamba Hybrid Model KV Cache
Why KV-cache capacity on Mamba-attention hybrid models is bounded by the Mamba state rather than the attention KV cache, and what that means for serving and capacity planning.