17 Vital Datasets Delayed, As 2023 Comes To A Close
Several datasets which were to be released this year have been delayed, increasing the total number of delayed datasets critical for governance and decision-making
Mumbai: Seventeen important datasets that cover health, education, the economy, environment and the census are delayed, according to an IndiaSpend analysis of data from various government department websites based on this report from the Ministry of Statistics and Programme Implementation (MoSPI).
Of the 17 delayed datasets, four are late by two years (since 2021) and 13 are even more delayed, with no updates since 2020. Of these, delays to the census and household consumption surveys have a far-reaching impact on the way our policies are shaped.
“Various data sources serve distinct purposes, with the census being a fundamental requirement starting from the village level and extending to the national level,” says P.C. Mohanan, former acting chairman of the National Statistical Commission (NSC). “However, at the ground level, like in gram panchayats, only the 2011-12 data is available, lacking projection methodologies.”
For now, there is no clarity on when the already overdue census will be conducted. “Due to the outbreak of the COVID-19 pandemic, the Census 2021 and the related field activities have been postponed,” the home ministry said in response to a question in the Lok Sabha on December 5, 2023.
Experts express dissatisfaction with the government’s response. “If the government can conduct elections, they can conduct a census as well,” says Vikas Kumar, associate professor at the School of Development, Azim Premji University.
The census supplies the sampling frame for surveys. Presently, we are using the 2011 census as the sampling frame.
“That comes with a few challenges,” says Kumar. “India has seen massive urbanisation in the last decade, and we don’t have updated boundaries of urban and rural areas. You [the government] say that your data is representative of urban and rural folks, and that you have sampled this data according to the proportion of the urban and rural population. But you haven’t updated the sampling frame of the urban versus rural population distribution, the male-female distribution, the inter-state distribution etc., even though India has seen a lot of change. Your sample frame is not accurate, and you cannot therefore conduct proper sample surveys.”
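Kumar's point about an outdated sampling frame can be illustrated with a small, purely hypothetical calculation. The sketch below assumes made-up urban and rural spending averages and made-up population shares (none of these figures come from any survey or census) and shows how a national estimate shifts when strata are weighted by old frame shares rather than current ones.

```python
def weighted_mean(stratum_means, stratum_shares):
    """Combine per-stratum averages using population shares as weights."""
    total_share = sum(stratum_shares.values())
    return sum(
        stratum_means[name] * stratum_shares[name] for name in stratum_means
    ) / total_share


# All numbers below are hypothetical, chosen only for illustration.
# Average monthly spending per person in each stratum (same survey data).
stratum_means = {"urban": 6000.0, "rural": 3500.0}

# Urban/rural shares as recorded in an old sampling frame (assumed).
frame_shares = {"urban": 0.30, "rural": 0.70}

# Shares after a decade of urbanisation (assumed, unknown without a census).
actual_shares = {"urban": 0.40, "rural": 0.60}

estimate_with_old_frame = weighted_mean(stratum_means, frame_shares)
estimate_with_actual_shares = weighted_mean(stratum_means, actual_shares)

# The gap is the error introduced purely by the stale weights.
bias = estimate_with_old_frame - estimate_with_actual_shares

print(f"Estimate using old frame shares: {estimate_with_old_frame:.0f}")
print(f"Estimate using actual shares:    {estimate_with_actual_shares:.0f}")
print(f"Error from outdated frame:       {bias:.0f}")
```

In this toy case the survey responses are identical in both runs; only the weights differ. Real surveys use far more strata and more elaborate weighting, so this is a simplified picture of the problem, not a description of how the NSS computes its estimates.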
The consumption expenditure survey (CES), typically released every five years, was last conducted for the period July 2017-June 2018. It was published in November 2019, and withdrawn in the same month, citing “data quality issues” and a “significant increase in the divergence in not only the levels in the consumption pattern but also the direction of the change when compared to the other administrative data sources like the actual production of goods and services”. The government, therefore, is still working with the 2011-12 CES, which is 10 years out of date.
The latest CES was launched in August 2022, as per a response by MoSPI in the Lok Sabha in December 2022. Experts point out that the delay in the decadal census is causing issues with estimates for this survey as well.
Regularly updated CES data are essential, experts point out. “The data also plays a vital role in revising the consumer price index and GDP, highlighting its influence on various aspects of research and government policies,” Mohanan explains.
Among all ministries, MoSPI has the highest number of delayed reports--nine, including the CES reports and the survey on persons with disabilities, which is a five-year report that was last published in 2018, as per the ministry’s report from 2023.
This is followed by the Registrar General and Census Commissioner, which falls under the Ministry of Home Affairs and has six delayed reports, including the census and the Annual Report on Medical Cause of Death, which hasn’t been updated since 2020.
“Researchers are still unable to definitively determine the death rate or toll during the second wave of the pandemic due to the absence of CRS [Civil Registration System] data,” says Ashish Kulkarni, who writes at econ for everybody and is visiting faculty, teaching economics and statistics, at the Gokhale Institute of Politics and Economics.
Other issues with data
Half of the government datasets are presented in PDF format, while 40% adopt more user-friendly formats such as MS Excel, CSV and txt--as, for example, the 2023 report from MoSPI. Levels of disaggregation also vary, with 72% of datasets providing geographical breakdowns while 32% are available only at the all-India level.
In terms of periodicity, a significant portion of the compiled datasets--approximately 60%--is disseminated annually or monthly. Compilation methods for 55% of information from various ministries/departments fall under the category of 'administrative data,' while a smaller proportion, 14%, pertains to datasets, indicators, or registries gathered through surveys.
“Departmental data or administrative data are generally for their monitoring and evaluation needs,” Mohanan explains. “These do not usually represent the general population, but only those covered by the department. Surveys and censuses represent the whole population or domain. These provide data that can correlate several variables and help validate administrative claims. Independent data from surveys strengthens the national database and is more trusted than administrative data that may have restrictive definitions and coverage.”
Experts point out that while the total number of datasets is growing, the quality of datasets remains uneven across departments and states. Vikas Kumar argues, “Data is often disseminated in the form of PDFs or scanned documents rather than CSVs. But this is not the real problem, as now we have software that can convert PDFs into CSVs very easily.”
The main problem, he says, is that explanatory notes are not published on time. Even when they are published, it is done in bits and pieces, denying users a clear understanding of the larger context of data collection and processing.
“It’s like giving a buyer of a new model of car the user manual five years after the purchase,” says Kumar, who blames not just the government but also data users such as researchers and journalists who, in their race to publish ahead of their peers, often do not look at the explanatory notes or metadata.
“The recently concluded caste survey in Bihar is a case in point,” Kumar says. “After the data was released, everyone wanted to be the first to tweet or write, but hardly anyone cared to ask if the Bihar government had released the metadata.”
“Metadata is crucial for comprehending the data collection process, and detailing how surveys are conducted,” Vikas Kumar explains. “The final report represents the survey's outcomes, while metadata elucidates the survey's methodology. Many extensive datasets exhibit gaps and inconsistencies, requiring a thorough understanding.
“For instance, until about a decade ago, the National Sample Survey was restricted to 5 km of bus routes in rural Nagaland. The reach of bus routes changed over time and later the NSS introduced changes to cover the villages beyond bus routes as well. In other words, the population covered in surveys changed over time. However, insufficient metadata on the changes in the degree of coverage in NSS reports have meant that users do not have sufficient background information to understand the inexplicable changes in key statistics including consumption expenditure.”
Experts point to three main factors behind the delays in releasing datasets: narrative control, resource challenges, and lack of autonomy.
Narrative control
Azim Premji University's 2019 State of Working India report revealed a loss of five million jobs between 2016 and 2018.
In April 2019, Dharmendra Pradhan, then India's minister for petroleum and natural gas, dismissed a leaked National Sample Survey Office (NSSO) report that highlighted record unemployment figures. The report, highlights of which were published by Business Standard, pointed to a 6.1% unemployment rate.
Contradicting Pradhan, P.C. Mohanan, then acting chairman of the NSC, affirmed the NSC’s approval of the report in an interview with IndiaSpend in May 2019. The government’s refusal to accept the NSSO report led to Mohanan's resignation, and to concerns about the NSC being sidelined in key statistical matters.
Kulkarni points to another instance where the government pressured the World Bank to withdraw two reports on open defecation in India. “I would argue that it's challenging to view the withdrawals of some of these reports as anything other than politically convenient,” Kulkarni said. “Consider, for instance, the recent data or World Bank reports on the reduction of open defecation. While disagreement and critique are welcome, suggesting improvements in data collection or addressing errors would be a more constructive approach. Withdrawing the reports altogether seems counterproductive to the intended process.”
Vikas Kumar argues that the deterioration of India's census began in the 1970s, when the introduction of population control policies necessitated the freezing of delimitation, among other things.
“The quality of the census diminished as the latest data was no longer used in high-stakes exercises such as delimitation and federal redistribution. It declined further due to de-institutionalisation of the state and communalisation of politics, which manifested as growing delays in the release of politically sensitive data. Belated attempts to fix problems haven't helped.”
Resource challenges
Resource deficit is particularly acute at the state level, with complaints even from officials at MoSPI. The Union government allocates only 0.2% of its budget to the ministry, with the majority directed towards the Members of Parliament Local Area Development Scheme. A mere quarter of this allocation is dedicated to funding statistical activities, presenting a significant funding challenge for the National Sample Survey (NSS), particularly in expanding staff to meet survey demands, as per the June 2023 paper by Carnegie Endowment for International Peace.
The NSS faces staff shortages, and while enumerator vacancies are filled on a contractual basis, expanding supervisory staff proves challenging. This affects the quality of supervision, hindering the NSS's ability to conduct timely surveys for urgent policy-making needs.
Despite increased overall public spending on statistical activities in the past decade, the lack of coordination and regulation in creating numerous datasets poses challenges. The total number of datasets and the spending on them remain unknown, with concerns raised about the inefficient use of public resources on datasets that policymakers and researchers rarely utilise, as per the study.
Experts like Kumar also point to the fact that over the years, the government has failed to attract good talent. “Statistical agencies of the government no longer attract the best statisticians, who now have access to better opportunities in the private sector,” Kumar explains. “But it is not just about money--official statisticians face obstacles in the form of political interference, poor support systems, and slow adoption of technology.”
Loss of autonomy
“The data coming out of a survey can be inconvenient for the ruling party or government and, in such cases, there might be efforts to exert pressure on statistical agencies to either postpone or entirely withdraw reports,” Kulkarni pointed out. “This is done to eliminate the possibility of such inconvenient data being made public. To avoid suspicions of interference, it is logical for a statistical agency to operate independently from the government in power.”
Prasanta Chandra Mahalanobis, founder of the Indian Statistical Institute, proposed the establishment of a subcommittee on sampling within the United Nations Statistical Commission to formulate global standards and address the data gaps in newly independent nations through large-scale surveys. This resulted in the creation of the first global manual on sampling, the study cited above said.
Mohanan agrees with this point of view. “There needs to be autonomy granted to statistical officers for data collection and dissemination, not solely reliant on government decisions,” he says. “While efforts were made to establish autonomous systems with external advisory bodies, these initiatives have been somewhat diluted over time.
“Previously, bodies like the National Sample Survey and CSO [Central Statistical Organisation] had independent councils for decision-making, but now decisions are increasingly made by local departments, raising concerns about the influence of government decisions on statistical agencies. Ensuring true independence and autonomy for statistical agencies, free from government influences, is a critical issue that needs to be addressed.”
IndiaSpend has reached out to the offices of MoSPI, the Deputy Director General, Social Statistics Division, Ashutosh Ojha, and Deputy Director General for survey coordination on National Sample Surveys S.K. Mishra regarding the delay in data releases and the measures that the ministry is taking to improve the quality of data. We also reached out to the Registrar General & Census Commissioner, at the Ministry of Home Affairs, Mritunjay Kumar Narayan, and Additional Registrar General Sanjay regarding the delay in the conduct of Census 2021. We will update the story when we receive a response.
We welcome feedback. Please write to respond@indiaspend.org. We reserve the right to edit responses for language and grammar.