Debwashis Borman

Sharing the encoder across Paraphrase, STS and Sentiment — what helped and what hurt.

Lesson 1 — Loss weighting matters

The naive sum of three losses lets the cheapest one dominate.
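One way to picture the fix is a weighted sum in place of the plain sum. The sketch below is illustrative only: the post does not say which weighting scheme was used, so the inverse-magnitude rule, the function names and the loss values are all assumptions. It rescales each task so that all three contribute equally at a chosen reference point.

```python
def combine_losses(losses, weights):
    """Return the weighted sum of per-task losses.

    Works for plain floats and for any value that supports * and +,
    so framework tensors can be passed in the same way.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(weights[task] * loss for task, loss in losses.items())


def inverse_magnitude_weights(reference_losses):
    """Weights proportional to 1 / loss, rescaled to sum to the task count.

    Assumes every reference loss is strictly positive. This is one simple
    heuristic among many, not the scheme from the post.
    """
    inverse = {task: 1.0 / value for task, value in reference_losses.items()}
    scale = len(inverse) / sum(inverse.values())
    return {task: w * scale for task, w in inverse.items()}


# Hypothetical loss values, for illustration only.
reference = {"paraphrase": 0.6, "sts": 4.0, "sentiment": 1.5}
weights = inverse_magnitude_weights(reference)

naive_total = combine_losses(reference, {task: 1.0 for task in reference})
balanced_total = combine_losses(reference, weights)
```

With these weights every task contributes the same amount to `balanced_total` at the reference point, whereas in `naive_total` the contributions are as uneven as the raw losses. In practice the weights would be tuned on validation data or updated during training.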

Lesson 2 — Task heads should be cheap

Small heads on top of a strong shared encoder generalise better than large task-specific heads.
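A rough structural sketch of the "cheap heads" idea follows, in dependency-free Python so that it runs anywhere. Everything here is hypothetical: `toy_encoder` stands in for BERT, the hidden size of 8 and the output sizes (1, 1 and 5) are assumptions for illustration, and a real model would use a deep learning framework. The point is only that each task adds a single linear layer on top of one shared encoder.

```python
import random


class LinearHead:
    """A single linear layer: about the cheapest task head possible."""

    def __init__(self, hidden_size, num_outputs, rng):
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(num_outputs)
        ]
        self.bias = [0.0] * num_outputs

    def __call__(self, pooled):
        return [
            sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(self.weight, self.bias)
        ]

    def num_parameters(self):
        return sum(len(row) for row in self.weight) + len(self.bias)


class MultitaskModel:
    """One shared encoder, one small head per task."""

    def __init__(self, encoder, hidden_size, task_outputs, seed=0):
        rng = random.Random(seed)
        self.encoder = encoder  # shared by every task
        self.heads = {
            task: LinearHead(hidden_size, n, rng)
            for task, n in task_outputs.items()
        }

    def forward(self, inputs, task):
        pooled = self.encoder(inputs)      # same representation for all tasks
        return self.heads[task](pooled)    # only this step is task-specific


def toy_encoder(token_ids, hidden_size=8):
    """Hypothetical stand-in for BERT: maps token ids to a fixed-size vector."""
    total = sum(token_ids)
    return [((total * (i + 1)) % 7) / 7.0 for i in range(hidden_size)]


model = MultitaskModel(
    toy_encoder,
    hidden_size=8,
    task_outputs={"paraphrase": 1, "sts": 1, "sentiment": 5},  # assumed sizes
)
sentiment_logits = model.forward([101, 2023, 102], task="sentiment")
```

Each head costs only `hidden_size * num_outputs + num_outputs` parameters, so nearly all of the model's capacity, and everything learned across tasks, sits in the shared encoder.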

Data science work is part craft and part discipline: the best models are simple, well-validated, and easy to explain. In this post we walk through the intuition, just enough math to be useful, and a clean implementation you can drop into your own pipeline.

Three Lessons from Multitask BERT
