India Pursues Dual Strategy to Build AI Capabilities and Sovereignty

India’s ambitions to become a leader in artificial intelligence (AI) gained significant attention during Google’s annual I/O Connect event in Bengaluru this July. With over 1,800 developers in attendance, discussions centered on the challenge of creating AI systems capable of catering to India’s linguistic diversity, which includes 22 official languages and countless dialects. This multifaceted challenge was on full display as startups showcased innovative solutions for navigating the country’s complex multilingual landscape.
Among the notable participants was Sarvam AI, which presented Sarvam-Translate, a multilingual model developed using Google’s open-source large language model (LLM), Gemma. CoRover also featured its AI chatbot, BharatGPT, designed for public services, including applications for the Indian Railway Catering and Tourism Corporation (IRCTC). At the event, Google highlighted that AI startups Sarvam, Soket AI, and Gnani are at the forefront of developing the next generation of AI models tailored for India, fine-tuning them on Gemma.
Building on Gemma may seem contradictory, as three of these startups are also involved in the ₹10,300 crore IndiaAI Mission, a government initiative aimed at creating indigenous foundational models trained on Indian data. The rationale for using Gemma lies in the resource-intensive nature of developing competitive models from scratch. India also faces a shortage of high-quality training datasets and a computational infrastructure that is still maturing, which makes a pragmatic approach essential.
Startups are currently adopting a layered strategy, fine-tuning existing open-source models to address immediate real-world problems while simultaneously building the necessary data pipelines and expertise for future independent models. Fine-tuning refers to the process of enhancing an existing LLM, which has already been trained on broad datasets, by further training it on local and specialized data to improve performance in specific contexts.
Building New Foundations for AI
Project EKA, led by Soket AI in collaboration with institutions such as IIT Gandhinagar, IIT Roorkee, and IISc Bangalore, exemplifies this layered approach. This open-source project aims to create a sovereign LLM from the ground up, using local resources for data, infrastructure, and training. The project anticipates delivering a 7-billion-parameter model within four to five months, with plans for a more advanced 120-billion-parameter model within ten months.
According to Abhishek Upperwal, co-founder of Soket AI, the initiative is focused on four key domains: agriculture, law, education, and defense, each with a clear strategy for data sourcing from government agencies and public sector use cases. The EKA pipeline is designed to operate independently of foreign infrastructure, with training conducted on India’s GPU cloud, and all resulting models will be open-sourced for public access.
CoRover’s BharatGPT illustrates this dual strategy, currently using a fine-tuned model to provide AI services in multiple Indian languages to various government clients, including the IRCTC and Bharat Electronics Ltd. Founder Ankush Sabharwal emphasizes the importance of using a base model that can be rapidly adapted to meet the needs of public health, railways, and space applications. “The deployment serves as both service delivery and a means of dataset creation,” Sabharwal explains.
Amlan Mohanty, a technology policy expert, describes India’s approach as a calculated experiment, leveraging models like Gemma for swift deployment while maintaining the long-term goal of autonomy. This strategy is critical as India aims to mitigate dependency on foreign technology and ensure cultural representation in its AI systems.
Addressing Local Needs through AI
The necessity for India to develop its AI capabilities goes beyond national pride; it is fundamentally about addressing local challenges. For example, consider a migrant from Bihar visiting a clinic in rural Maharashtra. The doctor speaks Marathi, but the AI tool provides explanations in English, leading to a breakdown in communication. This scenario highlights the importance of developing AI systems that are not only linguistically compatible but also culturally and contextually relevant.
Frontline health workers in Bihar require AI tools that understand local medical terminologies in Maithili, just as farmers in Maharashtra need crop advisories tailored to specific irrigation practices. Government portals should adeptly process citizen queries across multiple languages, including regional variations. These practical applications underscore the urgency of developing AI systems that can navigate linguistic and cultural nuances effectively.
Fine-tuning existing models offers Indian developers a means to meet these immediate needs while simultaneously constructing the datasets required for a sovereign AI infrastructure. This dual-track strategy enables rapid response to pressing challenges while laying the groundwork for long-term independence.
The IndiaAI Mission represents a national response to the growing geopolitical complexities surrounding AI technology. As AI applications become central to sectors like education, agriculture, and defense, reliance on foreign platforms poses risks related to data exposure and control. This concern was underscored when Microsoft recently suspended cloud services to Nayara Energy due to sanctions on its Russian-linked operations, demonstrating how foreign tech providers can become geopolitical leverage points.
Sovereign AI systems are vital not only for reducing dependency but also for ensuring that local values and regulatory frameworks are accurately represented. Most global AI models are trained on English-dominant datasets, rendering them inadequate for addressing the complexities of India’s diverse population.
Despite the momentum behind India’s sovereign AI initiatives, the lack of high-quality training data in Indian languages remains a significant challenge. According to Manish Gupta, director of engineering at Google DeepMind India, 72 Indian languages with more than 100,000 speakers each lack any digital representation.
To tackle this, Google has partnered with the Indian Institute of Science to collect voice samples across hundreds of districts. The first phase captured over 14,000 hours of speech data from 80 districts and 59 languages, with plans to expand coverage further. Gupta noted that data quality is often a hurdle, requiring significant cleanup efforts.
Google’s Gemma LLM incorporates insights from this extensive work, helping to enhance the performance of AI in lower-resource languages. The collaboration with Indian startups aims to cultivate inclusive tools that can scale beyond India, potentially benefiting other linguistically diverse regions in Southeast Asia and Africa.
A Roadmap for the Global South
India’s strategy of leveraging open models while concurrently developing sovereign capabilities serves as a potential roadmap for other nations in the Global South facing similar challenges. Many countries are grappling with how to build AI systems that accurately reflect local contexts without the advantage of extensive resources or mature data ecosystems.
For these nations, fine-tuning open models presents a viable pathway toward greater capability and control in the AI landscape. “Full-stack sovereignty in AI is a marathon, not a sprint,” Upperwal emphasizes. The 120-billion-parameter model will not be developed in isolation; it will depend on rapid deployment and on what is learned from it.
Countries like Singapore, Vietnam, and Thailand are already exploring similar methods, utilizing Gemma to kickstart their local LLM initiatives. By 2026, when India’s sovereign models are expected to be operational, these efforts may converge, with homegrown systems gradually taking precedence over borrowed technologies.
As India and other nations in the Global South progress toward achieving digital sovereignty, the critical question remains whether they can establish a complete, independent AI infrastructure before external conditions change. The journey is fraught with challenges, but the potential rewards in terms of capability and autonomy could redefine the technological landscape for the region.