Notes on LoRA Fine-Tuning and Local LLM Inference

Ddebwashis
Jul 8, 2025 · 8 min read

What I learned running open-source LLMs locally with Ollama and llama.cpp, with LoRA adapters on top.

Why bother

Local inference is private, cheap once you've paid for the GPU, and lets you iterate quickly on adapters.

Tooling

`ollama` for serving models, `llama.cpp` for low-level control over inference, and `peft` for LoRA adapters.
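To make the serving side concrete, here is a minimal sketch of calling a locally running Ollama server from Python using only the standard library. It assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the `/api/generate` endpoint with streaming turned off. The helper names `build_generate_request` and `generate` are hypothetical, chosen for illustration, and the model name you pass in is whatever you have pulled locally.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single, non-streamed completion."""
    payload = {
        "model": model,    # name of a model already pulled into Ollama
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, something like `generate("llama3", "Explain LoRA in one sentence.")` should return the completion as a string; `llama3` is only a placeholder for a model you have installed. Building the request separately from sending it keeps the payload easy to inspect without a live server.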

Data science work is part craft and part discipline: the best models are simple, well-validated, and easy to explain. In this post we walk through the intuition, just enough of the math to be useful, and a clean implementation you can drop into your own pipeline.
