Sharing the encoder across Paraphrase, STS and Sentiment — what helped and what hurt.
Lesson 1 — Loss weighting matters
The naive sum of three losses lets the cheapest one dominate.
Lesson 2 — Task heads should be cheap
Small heads + a strong shared encoder generalise better.
This is paragraph 1. Data science work is part craft and part discipline — the best models are simple, well-validated, and easy to explain. In this post we walk through the intuition, the math just enough to be useful, and a clean implementation you can drop into your own pipeline.
This is paragraph 2. Data science work is part craft and part discipline — the best models are simple, well-validated, and easy to explain. In this post we walk through the intuition, the math just enough to be useful, and a clean implementation you can drop into your own pipeline.