Research

Projects

Corpus · Critical Edition

● Active

Ramayana Authoritative Edition

Insight Publica Research Team

A multilingual critical edition of the Vālmīki Rāmāyaṇa comprising 493 sargas aligned across English (Griffith, 1870) and a new Malayalam translation. Textual discrepancies, anachronistic terms, and translator omission notes are systematically identified and corrected through human QC. The parallel corpus is formatted for AI training and NLP research.

Milestones

493 sargas · EN + ML aligned

Human QC: systematic review underway

Auto-QC script: 265 issues identified, 223 auto-fixed

HuggingFace upload: pending QC completion
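
As a rough illustration of the kind of check an Auto-QC pass might run over aligned sargas, the sketch below flags three hypothetical issue types: stray whitespace (auto-fixable), a missing side of a pair, and a leftover translator note. The function name `qc_pair`, the issue labels, the note pattern, and the sample pairs are all assumptions made for this example; the project's actual script and rules are not described here.

```python
import re

# Hypothetical pattern for leftover bracketed translator notes (an assumption,
# not the project's actual rule).
NOTE_PATTERN = re.compile(r"\[(?:omitted|translator'?s? note)[^\]]*\]", re.IGNORECASE)


def qc_pair(sarga_id, en, ml):
    """Return whitespace-normalized texts plus a list of (id, issue, status)."""
    issues = []
    fixed_en = " ".join(en.split())
    fixed_ml = " ".join(ml.split())
    # Whitespace problems can be repaired automatically.
    if (fixed_en, fixed_ml) != (en, ml):
        issues.append((sarga_id, "whitespace", "auto-fixed"))
    # An empty side cannot be repaired without a human reviewer.
    if not fixed_en or not fixed_ml:
        issues.append((sarga_id, "missing-side", "needs-review"))
    # Translator notes are only flagged, never rewritten automatically.
    if NOTE_PATTERN.search(fixed_en):
        issues.append((sarga_id, "translator-note", "needs-review"))
    return fixed_en, fixed_ml, issues


# Placeholder pairs for illustration only; not taken from the edition.
pairs = [
    (1, "Rama  went to the forest. ", "ML placeholder text"),
    (2, "[Omitted by the translator]", ""),
]

all_issues = []
cleaned = []
for sarga_id, en, ml in pairs:
    fixed_en, fixed_ml, issues = qc_pair(sarga_id, en, ml)
    cleaned.append((sarga_id, fixed_en, fixed_ml))
    all_issues.extend(issues)

auto_fixed = sum(1 for _, _, status in all_issues if status == "auto-fixed")
print(f"{len(all_issues)} issues identified, {auto_fixed} auto-fixed")
```

In this sketch, anything not marked `auto-fixed` would be left for the human QC pass.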

Infrastructure · AI Data

● Active

Parallel Corpus Initiative

Insight Publica Data Team

Building a systematic multilingual parallel corpus of classical and world literary texts for AI training data licensing and NLP research. The initiative targets 16+ languages per text, with Malayalam as the anchor translation. Corpora are formatted as JSONL datasets hosted on HuggingFace.
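
To make the JSONL format concrete, here is a minimal sketch of how one aligned segment could be stored and read back, with one JSON object per line. The field names (`work`, `segment_id`, `lang`, `text`) and the sample texts are illustrative assumptions, not the initiative's published schema.

```python
import io
import json

# Hypothetical record layout; the sample texts are placeholders.
records = [
    {"work": "panchatantra", "segment_id": 1, "lang": "en", "text": "Example sentence."},
    {"work": "panchatantra", "segment_id": 1, "lang": "ml", "text": "ഉദാഹരണ വാക്യം."},
]


def write_jsonl(rows, handle):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    for row in rows:
        handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(handle):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


def group_by_segment(rows):
    """Group rows into {(work, segment_id): {lang: text}} for parallel lookup."""
    grouped = {}
    for row in rows:
        key = (row["work"], row["segment_id"])
        grouped.setdefault(key, {})[row["lang"]] = row["text"]
    return grouped


# An in-memory buffer stands in for a .jsonl file on disk.
buffer = io.StringIO()
write_jsonl(records, buffer)
buffer.seek(0)
parallel = group_by_segment(read_jsonl(buffer))
```

Storing one language per line, as assumed here, lets a text grow toward the 16+ language target without changing the record shape; a layout with one line per segment and one key per language would work equally well.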

Milestones

1001 Nights: complete · Arabic + Malayalam · 1,001 nights

Bhagavad Gita: complete · 8 Indian languages · 674 verses

Panchatantra: complete · 16 languages · 2,273 segments

Constitution of India: complete · EN + ML · 383 articles

Ramayana: QC in progress

Forthcoming

● Planning

Mahabharata Critical Edition

Insight Publica

The Mahābhārata project will follow the Ramayana model: systematic alignment of Sanskrit, English, and Malayalam translations with scholarly apparatus. Given the scale (roughly 100,000 verses), this is a multi-year initiative.

Milestones

Source text selection: under review

Translation team: forming

Pipeline: inherits from Ramayana infrastructure