| 1 | +--- |
| 2 | +title: "DMP '25 Final Report by Aman Chadha" |
| 3 | +excerpt: "Final Report for the project Modernizing Music Blocks’ i18n with AI-Assisted Translation" |
| 4 | +category: "DEVELOPER NEWS" |
| 5 | +date: "2025-09-17" |
| 6 | +slug: "2025-09-17-dmp-25-aman-chadha-final-report" |
| 7 | +author: "@/constants/MarkdownFiles/authors/aman-chadha.md" |
| 8 | +tags: "dmp25,sugarlabs,finalreport,aman-chadha" |
| 9 | +image: "assets/Images/c4gt_DMP.png" |
| 10 | +--- |
| 11 | + |
| 12 | +<!-- markdownlint-disable --> |
| 13 | + |
| 14 | +# DMP '25 Final Report by Aman Chadha |
| 15 | + |
| 16 | +## Contributor Details |
| 17 | + |
| 18 | +**Name:** Aman Chadha |
| 19 | +**Email:** [aman.chadha.mmi@gmail.com](mailto:aman.chadha.mmi@gmail.com) |
| 20 | +**GitHub:** [AmanChadha](https://github.com/ac-mmi) |
| 21 | +**Organization:** [Sugar Labs](https://www.sugarlabs.org/) |
| 22 | +**Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4731) |
| 23 | +**Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/pikurasa) |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## Project Overview |
| 28 | + |
| 29 | +Music Blocks is a learning platform for children worldwide. At present it mainly supports **English, Japanese, and Spanish**, so many learners must use the platform in a language other than their native one. The goal of this project was to **modernize Music Blocks’ i18n system** and introduce an **AI-assisted translation workflow** to improve accessibility and engagement globally. |
| 30 | + |
| 31 | +Key problems addressed: |
| 32 | +- The legacy `webL10n.js` system lacked **modern i18n features**, including fallback strategies and JSON-based translation support. |
| 33 | +- UI strings often lacked context, leading to ambiguous or inaccurate translations. |
| 34 | +- Translators faced difficulty translating terms with multiple meanings, like "duck" (pitch vs. volume). |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## Project Objectives |
| 39 | + |
| 40 | +- Migrate from **webL10n.js to i18next** for modern, modular, and maintainable i18n. |
| 41 | +- Automate translation of missing strings using **AI with contextual awareness**. |
| 42 | +- Ensure a **contributor-friendly workflow**, where human translators can review AI suggestions. |
| 43 | +- Expand accessibility for new languages and improve adoption by educators worldwide. |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +## Technical Approach |
| 48 | + |
| 49 | +### Framework Migration |
| 50 | + |
| 51 | +- **Why migration was needed:** |
| 52 | + - `webL10n.js` was outdated and lacked support for modern i18n features. |
| 53 | + - i18next supports **language-specific formatting**, flexible fallbacks, and **JSON-based translation files**. |
| 54 | + |
| 55 | +- **Process:** |
| 56 | + - Replaced `webL10n.js` references in the codebase with i18next API calls. |
| 57 | + - Added **fallback strategies** for string lookup: when a string has no exact match, its cleaned, lowercase, title-case, and hyphenated variants are tried. |
| 58 | + - Incrementally tested migration to ensure existing UI remained functional. |
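
The fallback order described above can be sketched as follows. This is a minimal illustration in Python (the project's own implementation lives in the JavaScript codebase alongside i18next). The function names `candidate_keys` and `lookup`, the whitespace-based cleaning rule, and the exact order of the variants are assumptions made for this example, not details taken from the migration PR.

```python
import re


def candidate_keys(text):
    """Yield lookup keys for `text`, from most to least specific.

    Hypothetical helper: the variant order is an assumption for illustration.
    """
    cleaned = re.sub(r"\s+", " ", text).strip()
    variants = [
        text,                       # exact string as written in the code
        cleaned,                    # cleaned text (whitespace normalized)
        cleaned.lower(),            # lowercase
        cleaned.title(),            # title case
        cleaned.replace(" ", "-"),  # hyphenated
    ]
    seen = set()
    for key in variants:
        if key not in seen:  # skip variants that collapse to the same key
            seen.add(key)
            yield key


def lookup(text, catalog):
    """Return the first translation found in `catalog`, else the original text."""
    for key in candidate_keys(text):
        if key in catalog:
            return catalog[key]
    return text
```

With a catalog that only contains the key `play music`, a call such as `lookup("Play  Music", catalog)` still resolves through the lowercase variant, while a string with no matching variant is returned unchanged so the UI never shows an empty label.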
| 59 | + |
| 60 | +### Context-Aware Translation (RAG Model) |
| 61 | + |
| 62 | +- Extracted **code context** for each `msgid` by taking the **5 lines above and below** each occurrence, along with any developer comments. |
| 63 | +- Stored context snippets in **context_ui_full.json** with metadata: source file, line numbers, and snippet. |
| 64 | +- Indexed the JSON in **ChromaDB**, a vector database optimized for semantic search. |
| 65 | +- Built a **RAG (retrieval-augmented generation) model** that retrieves the relevant context for each string and generates a clear explanation of how the string is used. |
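
The context-extraction step can be sketched roughly as below. This is a simplified example under stated assumptions: the function names, the record field names (`file`, `start_line`, `end_line`, `snippet`), and the naive substring match used to locate a `msgid` are all hypothetical. Indexing the resulting records in ChromaDB is not shown.

```python
import json


def extract_context(source_lines, match_index, window=5):
    """Return the lines within `window` lines of `match_index` (0-based)."""
    start = max(0, match_index - window)
    end = min(len(source_lines), match_index + window + 1)
    return {
        "start_line": start + 1,  # 1-based, inclusive
        "end_line": end,          # 1-based, inclusive
        "snippet": "\n".join(source_lines[start:end]),
    }


def build_context_records(path, source_text, msgids, window=5):
    """Build one context record per msgid found in `source_text`.

    Assumption: a msgid is located by plain substring match, and only its
    first occurrence in the file is recorded.
    """
    lines = source_text.splitlines()
    records = []
    for msgid in msgids:
        for index, line in enumerate(lines):
            if msgid in line:
                record = {"msgid": msgid, "file": path}
                record.update(extract_context(lines, index, window))
                records.append(record)
                break
    return records
```

The returned list can be serialized with `json.dumps(records, indent=2)` to produce a file in the spirit of `context_ui_full.json`. Clamping the window at the file boundaries keeps the snippet valid for strings that appear near the top or bottom of a source file.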
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +## AI-Assisted Translation Workflow |
| 70 | + |
| 71 | +### Workflow Steps |
| 72 | + |
| 73 | +1. **.PO to JSON Automation:** |
| 74 | + - Converted `.po` files to JSON using a Python script, enabling AI integration. |
| 75 | + |
| 76 | +2. **Translation with Context:** |
| 77 | + - Retrieved context using the RAG model. |
| 78 | + - Sent each `msgid` together with its retrieved context to the translation API, so that the string is translated in context. |
| 79 | + |
| 80 | +3. **Google Translate API Integration:** |
| 81 | + - Chose Google Translate for its **robustness, contextual translation quality, and reliability**. |
| 82 | + - Open-source alternatives such as LibreTranslate tended to produce **word-by-word translations** that made little use of the surrounding context. |
| 83 | + |
| 84 | +4. **Automated QA:** |
| 85 | + - Developed a **Selenium + GPT script** to validate translations automatically. |
| 86 | + - Detected inaccuracies and flagged strings for manual review by a human translator. |
| 87 | + |
| 88 | +5. **PO File Generation:** |
| 89 | + - Generated complete Arabic, Japanese, and Hindi `.po` files using the automated pipeline. |
| 90 | + |
| 91 | +--- |
| 92 | + |
| 93 | +### Key Python Translation Script |
| 94 | + |
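```python
import html  # needed for html.unescape below

from google.cloud import translate_v2 as translate

# Requires Google Cloud credentials to be configured in the environment.
translate_client = translate.Client()


def translate_prompt(msgid, context, target_lang="ar"):
    """Translate msgid together with its context and return only the msgid part."""
    # Prepend the msgid to its context so the API sees the string in use.
    prompt = f"{msgid}: {context}"
    result = translate_client.translate(prompt, target_language=target_lang)
    # The response can contain HTML entities (e.g. &#39;), so unescape them.
    translated = html.unescape(result["translatedText"]).strip()
    # Keep only the text before the first colon, i.e. the translated msgid.
    return translated.split(":")[0].strip() if ":" in translated else translated
```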
| 106 | + |
| 107 | +--- |
| 108 | + |
| 109 | +## Challenges & Solutions |
| 110 | + |
| 111 | +| Challenge | Solution | |
| 112 | +|-----------|---------| |
| 113 | +| Ambiguous UI strings (e.g., "duck", "pitch", "minor") | Implemented **RAG model** for context-aware translations | |
| 114 | +| Legacy i18n system | Migrated from `webL10n.js` → i18next with JSON support | |
| 115 | +| Automated translation validation | Built **Selenium + GPT-based QA system** to mark errors for review | |
| 116 | +| Open-source translation drawbacks | Used **Google Translate API** for higher quality and context handling | |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## Achievements |
| 121 | + |
| 122 | +- Successfully **migrated Music Blocks to i18next**. |
| 123 | +- Developed a **context-aware AI translation workflow**. |
| 124 | +- Generated **Arabic, Japanese, and Hindi `.po` files**. |
| 125 | +- Built an **automation pipeline** for the `.po → JSON → AI translation → validation → .po` cycle. |
| 126 | +- Created QA tooling to **check translation accuracy** before human review. |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## Key Learnings |
| 131 | + |
| 132 | +- Extracting and using **context** drastically improves translation accuracy. |
| 133 | +- Clean migration and testing are crucial when replacing legacy infrastructure. |
| 134 | +- Combining **AI automation with human review** ensures high-quality localization. |
| 135 | +- Open-source translation tools can be limited; commercial APIs may be necessary for production quality. |
| 136 | + |
| 137 | +--- |
| 138 | + |
| 139 | +## Future Work |
| 140 | + |
| 141 | +- Add support for more AI translation models (e.g., DeepL, OpenAI). |
| 142 | +- Extend automated QA to **more languages**. |
| 143 | +- Build a **web-based UI** for translators to review flagged translations. |
| 144 | +- Integrate GitHub Actions for automatic updates of `.po` files on new/modified strings. |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## Resources & References |
| 149 | + |
| 150 | +- **Music Blocks Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks) |
| 151 | +- **Migration PR:** [#4731](https://github.com/sugarlabs/musicblocks/pull/4731) |
| 152 | +- **i18next Documentation:** [i18next.com](https://www.i18next.com/) |
| 153 | +- **ChromaDB:** [chromadb.com](https://www.chromadb.com/) |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Conclusion |
| 158 | + |
| 159 | +This project modernized Music Blocks’ localization infrastructure, introduced **AI-assisted, context-aware translations**, and enabled **scalable multilingual support**. By combining **framework migration, RAG-based context generation, automated translation, and QA tooling**, Music Blocks is now better equipped to serve children worldwide in their **native languages**, improving engagement, accessibility, and global adoption. |
| 160 | + |
| 161 | +I am deeply grateful to my mentors, the Sugar Labs community, and C4GT for their guidance and support throughout this journey. |
| 162 | + |