Ollama on RTX 5090 (32GB)
The RTX 5090 changes local inference mainly through its 32GB of VRAM: you can keep larger models and longer contexts fully on GPU more often than on 24GB cards. Raw speed helps, but staying fully on GPU is still the primary predictor of user-perceived performance.
The core mindset on the 5090 is still budget management: weights + KV cache + overhead must fit within 32GB. The card is fast enough that when it slows down, it is usually because that budget was exceeded and part of the model spilled to CPU, not because the GPU is weak.
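The budget above can be sketched with rough arithmetic. This is a back-of-the-envelope estimate, not a measurement: the model shape (layer count, KV heads, head dimension), the quantization bits, and the fixed overhead figure are all illustrative assumptions, and real runtimes add buffers this ignores.

```python
def estimate_vram_gb(n_params_b, bits_per_weight,
                     n_layers, n_kv_heads, head_dim,
                     context_len, kv_bytes=2, overhead_gb=1.5):
    """Rough VRAM budget: weights + KV cache + fixed overhead.

    All inputs are illustrative; quantized formats carry extra
    metadata and runtimes allocate scratch buffers not counted here.
    """
    # Weights: parameter count (billions) at the given bit width.
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    # KV cache: K and V tensors per layer, each holding
    # n_kv_heads * head_dim elements per cached token.
    kv_gb = (2 * n_layers * n_kv_heads * head_dim
             * context_len * kv_bytes) / 1e9
    return weights_gb + kv_gb + overhead_gb

# Hypothetical 32B model at ~4.5 bits/weight, 32k context,
# GQA-style shapes (64 layers, 8 KV heads, head dim 128), fp16 KV.
total = estimate_vram_gb(32, 4.5, 64, 8, 128, 32_768)
fits_in_5090 = total <= 32  # 32GB capacity
```

With these assumed numbers the estimate lands around 28GB, so the model fits, but doubling the context would push the KV cache alone past the remaining headroom. That sensitivity to context length is why fit, not raw throughput, dominates the experience.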