Debwashis Borman

What I learned running open-source LLMs locally with Ollama and llama.cpp, with LoRA adapters on top.

Why bother

Local inference is private, cheap once you've paid for the GPU, and lets you iterate quickly on adapters.
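
To put a rough number on why adapters are quick to iterate on: LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors, which adds rank * (d_in + d_out) parameters per adapted matrix. The sketch below is just that arithmetic. The 4096 width and rank 8 are illustrative assumptions, not figures from a specific model used here, and the function names are made up for this example.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in) and leaves W frozen.
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense matrix itself (bias ignored)."""
    return d_in * d_out


# Illustrative sizes only: a square 4096-wide projection with rank 8.
d_in, d_out, rank = 4096, 4096, 8
adapter = lora_param_count(d_in, d_out, rank)
dense = full_param_count(d_in, d_out)
print(f"adapter: {adapter:,} params, dense: {dense:,} params")
print(f"adapter is {adapter / dense:.2%} of the dense matrix")
```

With these sizes the adapter comes to well under one percent of the matrix it modifies, which is why training and swapping adapters stays cheap on a single GPU.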

Tooling

`ollama` for serving, `llama.cpp` for low-level work, `peft` for LoRA.
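
For the serving piece, here is a minimal sketch of calling a local Ollama server from Python using only the standard library. It assumes Ollama is running on its default address (`http://localhost:11434`) and that a model has already been pulled. The model name `llama3` and the helper names `build_generate_request` and `generate` are placeholders for this example, not something taken from my setup.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the reply is a single JSON object whose
    # "response" field holds the full completion.
    return body["response"]


# Usage, with the server running and a model pulled ("llama3" is a placeholder):
#   print(generate("llama3", "Explain LoRA in one sentence."))
```

Keeping request construction separate from the network call makes the payload easy to inspect or test without a server running.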

Data science work is part craft and part discipline: the best models are simple, well-validated, and easy to explain. This post walks through the intuition, just enough of the math to be useful, and a clean implementation you can drop into your own pipeline.


Notes on LoRA Fine-Tuning and Local LLM Inference
