Ollama on 24GB GPUs (RTX 3090 / 4090)
24GB is a major local-LLM threshold: enough for stronger models and serious context windows while staying on a single consumer GPU. The biggest gains come from setting the context window explicitly per workload rather than leaving every model at the same default.
The qualitative jump from 16GB is that context becomes a real working tool, not just a risk to minimize. You can keep richer chat history and larger prompt packs without immediately forcing layers to spill off the GPU.
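
As a concrete illustration, here is a minimal sketch of setting the context window explicitly per request through Ollama's HTTP API. The model tag and the `num_ctx` value are placeholders; pick whatever fits your workload and your 24GB budget.

```python
import requests

# Minimal sketch: request a completion from a local Ollama server with an
# explicit context window instead of the model's default.
# The model tag and num_ctx value are examples, not recommendations.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",               # example model tag
        "prompt": "Summarize the following notes: ...",
        "stream": False,
        "options": {
            "num_ctx": 16384,                 # explicit context window
        },
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```

The same effect can be baked into a model variant with `PARAMETER num_ctx 16384` in a Modelfile, or set interactively with `/set parameter num_ctx 16384` inside `ollama run`, so the larger window applies without touching each request.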