What I learned running open-source LLMs locally with Ollama and llama.cpp, with LoRA adapters on top.
Why bother
Local inference is private, cheap once you've paid for the GPU, and lets you iterate quickly on adapters.
Tooling
`ollama` for serving, `llama.cpp` for low-level control over inference, `peft` for LoRA adapters.
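As a rough illustration of the serving side, here is a minimal sketch of calling a locally running Ollama server from Python using only the standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and that a model has already been pulled. The helper names `build_payload` and `generate` are made up for this example, not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the generated text.
    return body["response"]
```

With the server up, something like `generate("llama3", "Explain LoRA in one sentence.")` should return the completion as a string. The model name `llama3` is a placeholder for whichever model you have pulled.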
Data science work is part craft and part discipline: the best models are simple, well-validated, and easy to explain. In this post we walk through the intuition, just enough math to be useful, and a clean implementation you can drop into your own pipeline.