India Pursues Dual Strategy to Establish AI Leadership

Editorial

India is rapidly advancing its ambitions in artificial intelligence, as demonstrated during Google’s annual I/O Connect event in Bengaluru this July. The event attracted over 1,800 developers, with discussions focusing on enhancing AI capabilities to accommodate the country’s linguistic diversity. With 22 official languages and numerous dialects, India faces a significant challenge in developing AI systems that serve its multilingual population effectively. Startups showcased innovative solutions to this challenge, exemplifying a dual strategy that balances immediate needs with long-term goals.

AI Startups Showcase Multilingual Solutions

Among the startups featured at the event was Sarvam AI, which introduced Sarvam-Translate, a multilingual model fine-tuned from Gemma, Google’s open large language model (LLM). Another notable participant, CoRover, presented BharatGPT, a chatbot designed to facilitate public services, including those used by the Indian Railway Catering and Tourism Corporation (IRCTC). During the event, Google announced that startups Sarvam, Soket AI, and Gnani are collaborating to develop next-generation AI models, leveraging Gemma’s capabilities.

This may appear contradictory, since three of these startups are also part of the ₹10,300 crore IndiaAI Mission, an initiative aimed at creating indigenous foundation models. But there is a practical rationale for using Gemma. Developing competitive models from scratch is resource-intensive, and India currently lacks the infrastructure to operate in isolation. With limited high-quality training datasets and pressing market demand, these startups are adopting a layered approach: refining existing models to address real-world problems now while gradually building the necessary data pipelines and domain expertise.

Building for the Future with Local Context

This strategy is exemplified by Project EKA, an open-source initiative led by Soket AI in collaboration with prominent educational institutions such as IIT Gandhinagar and IIT Roorkee. EKA aims to create a sovereign LLM, with a 7 billion-parameter model expected within four to five months and a 120 billion-parameter model planned within a ten-month cycle. “We’ve mapped four key domains: agriculture, law, education, and defence,” stated Abhishek Upperwal, co-founder of Soket AI. Each domain has its own dataset strategy, drawing on data from government bodies and public-sector use cases.

CoRover’s BharatGPT follows a similar dual strategy. It currently runs on a fine-tuned model that delivers AI services in multiple languages to government clients. “For applications in public health and railways, we needed a base model for quick fine-tuning,” explained Ankush Sabharwal, CoRover’s founder. The company’s current deployments both serve immediate needs and contribute to dataset creation, improving accessibility while paving the way for future independent models.

According to Amlan Mohanty, a technology policy expert, India’s approach is an innovative experiment that balances the urgency of deploying AI solutions with the goal of achieving autonomy. By building on open models like Gemma in the near term, India aims to reduce its dependency on foreign technology over time while ensuring cultural representation in AI applications.

Building AI capabilities is not merely a matter of national pride for India; it is about addressing pressing local issues. For instance, consider a migrant worker from Bihar in Maharashtra seeking medical assistance. If the doctor speaks Marathi and the AI tool explains findings in English, valuable nuances may be lost, highlighting the need for culturally grounded AI solutions. Local health workers require tools that understand region-specific medical terminology, while farmers need advice aligned with their local agricultural practices.

This dual-track strategy enables Indian developers to tackle immediate challenges while simultaneously building the datasets necessary for future sovereign AI infrastructure. As Upperwal puts it, “We don’t want to lose momentum. Fine-tuning models like Gemma allows us to address real-world problems today while developing independent models from scratch.” He emphasizes that these efforts are interconnected; one focuses on immediate utility, while the other aims for long-term independence.

A National Priority Amid Geopolitical Considerations

The IndiaAI Mission represents a strategic national response to geopolitical concerns. As AI becomes integral to sectors like education and governance, reliance on foreign platforms raises significant risks regarding data control and exposure. Recent incidents, such as Microsoft cutting off services to Nayara Energy due to sanctions, illustrate the volatility of depending on foreign tech providers. Such disruptions can jeopardize critical infrastructure and services, making the development of sovereign AI systems essential.

Moreover, India’s AI initiatives aim to ensure that local values and linguistic diversity are accurately represented. Most global AI models, trained predominantly on English-language datasets, struggle to handle the complexities of India’s multilingual landscape. Mohanty emphasizes that sovereignty in AI is about control over infrastructure and the terms of engagement, rather than complete isolation. “The more choice you have, the greater your sovereignty,” he states, noting that even major powers like the US and China balance domestic initiatives with strategic partnerships.

Despite the progress, India faces a significant barrier: the lack of high-quality training data, particularly in Indian languages. Manish Gupta, director of engineering at Google DeepMind India, noted that 72 out of 125 spoken languages in India lack any digital representation. To address this, Google is collaborating with the Indian Institute of Science to collect voice samples from across the country, aiming to capture over 14,000 hours of speech data in the initial phases.

This initiative will expand to cover all districts in India so that local language capabilities can be integrated into AI models effectively. Gupta highlights the importance of improving performance in lower-resource languages by applying techniques developed on widely spoken languages like English and Hindi.

India’s layered strategy serves as a potential model for other nations in the Global South facing similar challenges in building AI systems reflective of local contexts and values. This approach allows countries to enhance their capabilities while navigating resource constraints. “Full-stack sovereignty in AI is a marathon, not a sprint,” asserts Upperwal, emphasizing the need for rapid deployment and continuous learning.

As India works towards its sovereign AI goals, the question of dependency on foreign technology remains crucial. Despite partnerships with major tech firms, concerns linger about who controls model architecture and training processes. The future of India’s digital sovereignty may hinge on its ability to move from using open models to developing its own independent infrastructure before external circumstances change.
