TF3: Compact Romanian LMs on Synthetic Microfiction
The TF3 paper is now on arXiv: *TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction*. This is the third and final piece of the TinyFabulist pipeline, and it brings together everything from TF1 and TF2.
What TF3 Demonstrates
The core question TF3 asks is: can a small language model trained exclusively on synthetic literary text learn to generate coherent Romanian prose? The answer, with caveats, is yes.
The model is a 51-million-parameter decoder-only transformer trained on the high-quality Romanian translations from the TF2 corpus (a rough parameter-budget sketch follows the list below). After training, it can:
- Generate grammatically correct Romanian sentences with proper diacritic usage
- Produce narrative text with recognizable fable structure (characters, conflict, resolution)
- Follow prompted constraints (e.g., “write a fable about a fox and a bear in a forest”)
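To make the 51M figure concrete, here is a back-of-the-envelope parameter count for one plausible configuration. The dimensions are my own illustrative assumptions, not the paper's actual vocabulary size, depth, or width; only the rough total matches the post.

```python
# Back-of-the-envelope parameter count for a decoder-only transformer.
# All dimensions below are ASSUMED for illustration; only the ~51M total
# comes from the post. Biases, layer norms, and positional embeddings
# are ignored, and the LM head is assumed tied to the token embedding.

def decoder_params(vocab_size: int, d_model: int, n_layers: int, d_ff: int) -> int:
    embeddings = vocab_size * d_model   # token embedding (tied LM head)
    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    mlp = 2 * d_model * d_ff            # up- and down-projections
    return embeddings + n_layers * (attention + mlp)

total = decoder_params(vocab_size=32_000, d_model=512, n_layers=11, d_ff=2048)
print(f"~{total / 1e6:.1f}M parameters")  # ~51.0M with these assumed dimensions
```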
What it cannot do – and this is important – is match the quality of models 100x its size. The generated text is sometimes repetitive, the morals can be shallow, and longer outputs tend to lose coherence. These are expected limitations for a model of this size.
Model Compression
The paper also explores model compression. The 51M-parameter model is compressed to approximately 26M parameters using a combination of pruning and knowledge distillation. The compressed model retains most of the generation quality while being significantly more practical to deploy on resource-constrained devices.
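The post doesn't spell out the exact recipe, but the two ingredients it names are standard. Below is a minimal PyTorch sketch of each: magnitude pruning over the linear layers, and a distillation loss that blends soft teacher targets with ordinary cross-entropy. Treat it as an illustration of the general technique, not the paper's implementation; logits are assumed flattened to shape (N, vocab).

```python
# Minimal sketches of the two compression ingredients named above.
# Illustrative only: NOT the paper's recipe or hyperparameters.
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def magnitude_prune(model: torch.nn.Module, amount: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights in every linear layer."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the pruning mask into the weights

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend soft teacher targets with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```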
Why This Matters
TF3 is not trying to beat GPT-4 at writing Romanian fables. Its contribution is scientific:
Controlled data attribution. Because the model was trained exclusively on synthetic data with known provenance, we can study what the model learned and attribute its capabilities (and failures) to specific properties of the training data.
Evidence for synthetic data utility. TF3 shows that synthetic literary text, generated by larger models and quality-filtered through the TF2 pipeline, is sufficient to train a functional (if limited) language model. This has implications for low-resource language modeling, where large natural corpora may not exist.
A complete pipeline. TF1 generates English fables, TF2 translates them to Romanian with quality evaluation, and TF3 trains a Romanian model on the result. Each stage is reproducible, and the full pipeline demonstrates a path from structured specifications to a deployable model.
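As a mental model, the whole thing composes like a single data flow. The sketch below uses hypothetical stand-in functions, not the actual TinyFabulist APIs:

```python
# The three stages as plain data flow. All names are hypothetical
# stand-ins for illustration, not the TinyFabulist repositories' APIs.

def tf1_generate(spec: dict) -> str:
    # stand-in for TF1's LLM-based English fable generation
    return f"A fable about {spec['characters']} in {spec['setting']}."

def tf2_translate_and_filter(english: list[str]) -> list[str]:
    # stand-in for TF2's translation plus quality evaluation;
    # the real stage scores each translation and drops low-quality ones
    return [f"RO:: {text}" for text in english]

def tf3_train(corpus: list[str]) -> str:
    # stand-in for TF3's from-scratch training on the filtered corpus
    return f"compact model trained on {len(corpus)} fables"

specs = [{"characters": "a fox and a bear", "setting": "a forest"}]
print(tf3_train(tf2_translate_and_filter([tf1_generate(s) for s in specs])))
```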
What Is Next
With TF3 complete, the three pillars of my thesis are in place. The remaining work focuses on evaluation – specifically, building a unified evaluation framework that works across all three tasks and studying whether open-weight judge panels can replace proprietary evaluation APIs. That work is underway, and I will share more about it soon.
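To illustrate the judge-panel idea in its simplest form: several open-weight models each score an output, and the panel aggregates their scores, here with a median so a single outlier judge can't dominate. Everything below is a hypothetical toy, not the framework itself:

```python
# Toy illustration of a judge panel: aggregate scores from several judges.
# The stub lambdas stand in for open-weight models prompted as evaluators;
# nothing here is the actual evaluation framework described above.
from statistics import median
from typing import Callable

def panel_score(output: str, judges: list[Callable[[str], float]]) -> float:
    """Median of per-judge scores; robust to a single outlier judge."""
    return median(judge(output) for judge in judges)

judges = [lambda text: 0.7, lambda text: 0.8, lambda text: 0.6]
print(panel_score("O vulpe și un urs se întâlnesc în pădure...", judges))  # 0.7
```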