Projects
Ramayana Authoritative Edition
Insight Publica Research Team
A multilingual critical edition of the Vālmīki Rāmāyaṇa comprising 493 sargas aligned across English (Griffith, 1870) and a new Malayalam translation. Textual discrepancies, anachronistic terms, and translator omission notes are systematically identified and corrected through human QC. The parallel corpus is formatted for AI training and NLP research.
Milestones
493 sargas · EN + ML aligned
Human QC: systematic review underway
Auto-QC script: 265 issues identified, 223 auto-fixed
HuggingFace upload: pending QC completion
Parallel Corpus Initiative
Insight Publica Data Team
Building a systematic multilingual parallel corpus of classical and world literary texts for AI training data licensing and NLP research. The initiative targets 16+ languages per text, with Malayalam as the anchor translation. Corpora are formatted as JSONL datasets and hosted on HuggingFace.
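As a rough illustration of the JSONL format mentioned above, the sketch below writes one aligned segment per line and reads it back. The field names (`work`, `segment_id`, `en`, `ml`) and the placeholder values are assumptions made for this example, not the published schema of the datasets.

```python
import json

def to_jsonl(records):
    """Serialize dicts as JSONL: one JSON object per line.

    ensure_ascii=False keeps Malayalam and other scripts human-readable
    instead of escaping them as \\uXXXX sequences.
    """
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]

# Hypothetical record layout; the keys and values are placeholders only.
records = [
    {
        "work": "example-text",
        "segment_id": "1.1",
        "en": "<English segment>",
        "ml": "<Malayalam segment>",
    },
]

jsonl_text = to_jsonl(records)
round_tripped = from_jsonl(jsonl_text)
```

Keeping each aligned segment on its own line means a corpus can be streamed and filtered line by line without loading the whole file.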
Milestones
1001 Nights: complete · Arabic + Malayalam · 1,001 nights
Bhagavad Gita: complete · 8 Indian languages · 674 verses
Panchatantra: complete · 16 languages · 2,273 segments
Constitution of India: complete · EN + ML · 383 articles
Ramayana: QC in progress
Mahabharata Critical Edition
Insight Publica
The Mahābhārata project will follow the Ramayana model: systematic alignment of Sanskrit, English, and Malayalam translations with scholarly apparatus. Given the scale (100,000 verses), this is a multi-year initiative.
Milestones
Source text selection: under review
Translation team: forming
Pipeline: inherits from Ramayana infrastructure