I am excited to share that my first preprint as a PhD student is now on arXiv, and the accompanying dataset is live on HuggingFace. The paper is TF1-EN-3M: Three Million Synthetic Moral Fables from Open Language Models, and the dataset is available at klusai/ds-tf1-en-3m.

What We Released

TF1-EN-3M is a dataset of three million English moral fables, generated by multiple open-weight language models from structured YAML story specifications. Each fable in the dataset comes with its full provenance: the story elements that were specified, the prompt that was constructed, the model that generated it, and the generation parameters used.

The key properties of the dataset:

  • 3 million fables from models across multiple families (Llama, Qwen, Mistral, Phi, and others)
  • Structured provenance – every fable is traceable to its specification
  • Multi-dimensional evaluation scores for grammar, creativity, moral clarity, and adherence to the prompt
  • Fully reproducible – all generation code is open source
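
To make the provenance idea concrete, here is a minimal sketch of what a single record could look like. The field names, the prompt template, and the `is_traceable` check are assumptions made for illustration; they are not the dataset's actual column names or the paper's real prompt. The spec keys (character, setting, conflict, moral) mirror the story elements mentioned in this post.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FableRecord:
    """One fable plus its provenance (hypothetical field names)."""
    spec: dict               # story elements that were specified
    prompt: str              # prompt constructed from the spec
    model: str               # model that generated the fable
    generation_params: dict  # e.g. sampling settings
    fable: str               # the generated text


def build_prompt(spec: dict) -> str:
    """Render a spec into a prompt using an illustrative template."""
    return (
        f"Write a short moral fable about {spec['character']} "
        f"in {spec['setting']}. Conflict: {spec['conflict']}. "
        f"Moral: {spec['moral']}."
    )


def is_traceable(record: FableRecord) -> bool:
    """True if the stored prompt can be rebuilt from the stored spec."""
    return record.prompt == build_prompt(record.spec)


spec = {
    "character": "a patient tortoise",
    "setting": "a dry riverbed",
    "conflict": "a race against a boastful hare",
    "moral": "steady effort beats haste",
}
record = FableRecord(
    spec=spec,
    prompt=build_prompt(spec),
    model="example-model",  # placeholder, not a real model id
    generation_params={"temperature": 0.7},
    fable="(generated text would go here)",
)

# Each record serializes to a single JSON line.
line = json.dumps(asdict(record))
```

Keeping the spec next to the prompt means anyone can re-render the prompt and confirm that a fable really came from the specification it claims.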

The Evaluation Framework

One contribution I am particularly proud of is the evaluation approach. Rather than relying on a single automated metric, we score each fable across four dimensions on a 1-10 scale:

  • Grammar: Correctness and fluency of the prose
  • Creativity: Originality of narrative choices
  • Moral Clarity: How clearly the intended moral comes through
  • Adherence: Fidelity to the specified characters, setting, and conflict

These scores are assigned by LLM-based judges using detailed rubrics. Scoring each dimension separately shows where a fable succeeds or fails, for example fluent prose that ignores the specified conflict, which a single aggregate score would hide.
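
A small sketch shows why separate dimensions matter. The class name, field names, and the plain average used as the "aggregate" are assumptions made for illustration, not the paper's scoring code. The only details taken from the post are the four dimensions and the 1-10 range.

```python
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = ("grammar", "creativity", "moral_clarity", "adherence")


@dataclass(frozen=True)
class FableScores:
    """Four judge scores for one fable, each on a 1-10 scale."""
    grammar: float
    creativity: float
    moral_clarity: float
    adherence: float

    def __post_init__(self):
        # Reject anything outside the 1-10 scale.
        for name in DIMENSIONS:
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    def aggregate(self) -> float:
        """Plain average, used here only to illustrate a single score."""
        return mean(getattr(self, name) for name in DIMENSIONS)

    def weakest(self) -> str:
        """Name of the lowest-scoring dimension."""
        return min(DIMENSIONS, key=lambda name: getattr(self, name))


# Two made-up fables with the same average but different weaknesses.
fluent_but_off_spec = FableScores(grammar=9, creativity=7,
                                  moral_clarity=7, adherence=3)
faithful_but_plain = FableScores(grammar=6, creativity=5,
                                 moral_clarity=7, adherence=8)

for scores in (fluent_but_off_spec, faithful_but_plain):
    print(scores.aggregate(), scores.weakest())
```

Both example fables average 6.5, yet one fails on adherence and the other on creativity. A single number cannot tell them apart.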

Lessons from a First Paper

Writing and submitting this paper taught me several things:

Dataset papers require meticulous documentation. Readers want to know exactly how the data was generated, what quality controls were applied, and what the known limitations are. I spent as much time on the datasheet and methodology sections as on the results.

HuggingFace has become essential infrastructure. Releasing a dataset on HuggingFace makes it immediately usable. Within the first week, the dataset had downloads from researchers I have never met. That kind of reach would have been impossible with a personal website or university repository alone.

Preprints change the pace of research. Posting on arXiv before formal peer review means the work is available to the community immediately. The trade-off is that you are publishing work that has not yet been externally validated, which requires confidence in your methodology.

What Comes Next

TF1 establishes the generation and evaluation pipeline for English fables. The next step is extending this to cross-lingual work – specifically, translating the fable corpus into Romanian and studying what happens when literary translation meets open models. That project, which I am calling TF2, is already underway.