There is a heated debate in the country not only about how the economy is performing but also about the credibility of the numbers that measure economic progress. From gross domestic product (GDP) data to unemployment numbers, everything is under scrutiny. At times, even the government has rejected data put out by the Central Statistics Office (CSO), such as the Periodic Labour Force Survey (PLFS), which apparently did not paint a good picture of the jobs scenario.
To discuss the issue of data credibility, Business Today organised a roundtable that was attended by Pronab Sen, former Chief Statistician of India; P. Mohanan, former Acting Chairperson of the National Statistical Commission (NSC) who quit after the government refused to publish the PLFS data; Sudipto Mundle, Professor, National Institute of Public Finance and Policy and former acting chairperson of the NSC; Mahesh Vyas, Managing Director and CEO of Centre for Monitoring Indian Economy; and Laveesh Bhandari, an economist and a data scientist. The roundtable was moderated by Prosenjit Datta, Editor, Business Today, and Rajeev Dubey, Editor, BusinessToday.In.
Prosenjit Datta: Till five years ago, we would have accepted any government data on the economy. But over the last three or four years, the waters have been muddied by various controversies, and we do not know exactly what is going on. Mr Mohanan, since you were the last to resign over data-related controversies, tell us: how should we treat government data?
P. Mohanan: I would not go to the extent of saying all government data is suspect. I have been part of the government statistical system my entire life. Almost all data generating systems have external committees to check accuracy. That system has been very strong in the past, especially the governing council that looks into the NSSO data. Nobody would raise questions about the NSSO data. It is only in recent times that we are seeing a lot of media and public interest in these figures. To that extent, the system has to change and become transparent. The idea of trying to get figures the government is comfortable with published and withholding data it is not comfortable with is of recent origin.
In the case of GDP, there has been a major shift in the database after we started looking at corporate filings. However, unlike earlier, not all the data is in the public domain. Substantial data comes from the corporate database, and we are not clear how they use accounting standards and how Indian accounting standards have been converted for national accounting purposes.
As far as the NSSO survey data is concerned, it is already available, and you cannot change the results. But the government has been coming out with a lot of administrative data. This should be looked at with caution because a simple instruction can perhaps change the whole nature of the data. That kind of opaqueness (in generation of administrative data) is not good.
Pronab Sen: NSSO is the only primary data-generating body in the country. All other data come from a variety of sources. When you are looking at something that complex, you can pick any one data point and say this is garbage. And it might well be garbage. The question is, does it transform anything material? It is the job of statisticians to decide that. But there is a more fundamental problem. It relates to the popular understanding that more data is better data. By that logic, the NSSO's sample size is 1,30,000 and the CMIE's is 1,75,000, so the CMIE numbers must be better. The whole argument that big data is better for identifying individual behaviour is lousy.
So, now what you have is an imbalance in terms of the way official data is generated and the way people think it should be generated. Earlier, the rest of the world didn't know where the data was coming from, and as long as it was consistent and transparent, and the systems that generated it were seen as independent, it was trusted. Today, we are in a situation where we have this popular view on one side, while on the other, the statistical system itself continues to remain rooted in things we were doing in the past.
The first major break from that comes in the 2011/12 GDP estimates where we moved to a much larger MCA database. It was a fundamental shift, but the problem was that you continued to think as if you were in the old model. So, a lot of questions that came up reflected the lack of understanding of what this data was giving and what its limitations were. I believe the statistical system was not particularly good at communicating what this change was. The creation of the position of chief statistician was to bridge that (communication) gap. What has happened is that not only is there an attack on the data that exists, but the attack has gone far and questioned the data we have used in the past.
What should we do?
Sudipto Mundle: There is an established system of statistical organisations, and they are not different from what they were before, whether it is NSSO or CSO. You should know that the new kids on the block - the EPFO and others - are not meant to estimate unemployment. Institutions have a track record. So that sifting can be done very easily. Just to stick to the employment-unemployment story, you have the NSSO, the labour bureau survey, which is not as good as the NSS because the sampling is done in a different time cycle, but the sample design and all are not very different. Then you have an organisation like Mahesh's (Mahesh Vyas of CMIE). Whatever the government might say, CMIE has established its credibility. These are the two filters you could use. Third thing is collateral information. Sift within the chaotic stuff that is happening.
Prosenjit Datta: In jobs, at least, we can say there has been consistency in what the government has said. With GDP, the major problem is the break after the MCA 21 database was used. But even within the new system, we are facing sharp revisions, which was not the case earlier.
Mundle: Earlier, when there was a big change in generation of GDP figures, there would be an explanation. In this case, it did not happen. So, the question is, what do you do when the go-to government institutions are looking dodgy? This raises long-term issues and there are three parts to this. One, there may be some room for upgrading skills. The sense I got looking from outside is that by and large they have demoralised the institution. The other point is that the technology for data generation is changing very fast and that capacity building is not happening. It is an even bigger problem when you go to the state level. Thirdly, there is a serious governance issue and this has to do with the institutional architecture - how to ring-fence the government statistical generation system from the executive government of the day.
Mahesh Vyas: What do you do when you are faced with conflicting numbers? The first thing is obvious. Now, the official statistical machinery is apparently used more to support the government narrative. Having said this, I would say that the official statistical machinery is not highly compromised. What is compromised is the release of the information. What is probably compromised is the generation of GDP numbers. I don't think it is easy to manipulate the household survey.
As Mohanan said, unit-level data is available to the world and people can develop their own information. So, if they say 6.1 per cent, somebody else can come up with 7.1 per cent, but he has to explain how he got it. So, if you appreciate that the official statistical machinery is used for an apparently political narrative, what is the next step?
This is a nice game that can be played by people who want to control the narrative. So, CMIE is saying no additional employment is generated on a net basis. And say we want to claim seven million jobs are created, because according to the World Bank, eight million jobs have to be created in a year. So what do you do? You play a game. If I say seven million, then the media will say the government is saying seven, the CMIE is saying zero, the answer could be somewhere in between. What do you get? 3.5 million. Not a good outcome. So what do you do? All you need is some "intelligent" economist to say it is 13 million. The average becomes seven. The narrative is not only in using the official machinery, as that is not the end game. The end game is perception. The perceptions are the best when there is some mad hat who can say 15 million. The CMIE used to, and we still do to a small extent, look at GDP numbers, NSSO numbers to understand how it is done. After many years, we said let us create our own databases. So, there is a large alternative to official statistics, and this can be a way of putting pressure.
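The averaging game described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the three figures are the ones quoted in the discussion (in millions of jobs), while the function name and the idea that a casual reader settles on a simple mean are assumptions made here for illustration. Whether the "average becomes seven" refers to averaging all three claims or only the two extremes is not stated; both come out close to seven.

```python
from statistics import mean

# Figures quoted in the discussion, in millions of net new jobs per year.
CMIE_ESTIMATE = 0.0       # "no additional employment is generated on a net basis"
CLAIMED_ESTIMATE = 7.0    # the number the narrative wants to establish
INFLATED_ESTIMATE = 13.0  # the outlier offered by an "intelligent" economist


def perceived_midpoint(estimates):
    """Hypothetical helper: the simple average a casual reader might settle on."""
    return mean(estimates)


# With only two claims on the table, the midpoint undercuts the desired number.
two_way = perceived_midpoint([CMIE_ESTIMATE, CLAIMED_ESTIMATE])

# Adding one inflated claim drags the midpoint back up towards seven.
three_way = perceived_midpoint(
    [CMIE_ESTIMATE, CLAIMED_ESTIMATE, INFLATED_ESTIMATE]
)

# Averaging only the two extremes gives a similar result.
extremes_only = perceived_midpoint([CMIE_ESTIMATE, INFLATED_ESTIMATE])

print(two_way, round(three_way, 2), extremes_only)
```

The point of the sketch is that the midpoint depends entirely on which claims are in circulation, not on how any of them was estimated.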
EPFO and its reliability as jobs data
Rajeev Dubey: How reliable is the EPFO number?
Laveesh Bhandari: I stopped looking at it 10 years ago. The database is a mix of a lot of databases which sometimes match, sometimes don't. Then there was this issue of people joining new jobs with new numbers. Now, what happens is, if I go in as an analyst, I will come up with a number. After me, if you go, and you don't build on what I have done, and do your own study using an independent method, you will get something different. It was really stupid of Ghosh of SBI to come up with that number, but he did, because he just didn't know how to aggregate these disparities.
P. Mohanan: The problem with the EPFO is that these are not employment numbers at all. A total of 17.9 million have entered the EPFO. I will be happy if they say these are new jobs, but they will not say it, as it is a big number. So, they say it is four million, and it is a very small number. So, you got another number - 3.3 million people have re-entered. So, if you add, you get 7.3 million, which looks respectable. I have a suspicion their numbers are monthly subscriptions. Also, they say they have 16-17 crore entries in their database. I am not sure the number goes off when somebody retires. They say they have de-duplicated six crore subscribers using Aadhaar but the remaining two-thirds are yet to be cleared. So we have no clarity where the numbers have come from.
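The EPFO figures mentioned above can be laid side by side as a quick consistency check. This is a rough sketch using only the numbers quoted in the discussion; the variable names and the interpretation of the gap as "entries not covered by the headline" are assumptions made here, not EPFO definitions.

```python
# Figures as quoted in the discussion (millions of people, crore of entries).
TOTAL_ENTRANTS_MN = 17.9            # total who "have entered the EPFO"
NEW_JOBS_MN = 4.0                   # figure presented as new jobs
RE_ENTRANTS_MN = 3.3                # people who "have re-entered"
DATABASE_ENTRIES_CR = (16.0, 17.0)  # "16-17 crore entries"
DEDUPLICATED_CR = 6.0               # de-duplicated using Aadhaar

# Headline number obtained by adding new jobs and re-entrants.
headline_mn = round(NEW_JOBS_MN + RE_ENTRANTS_MN, 1)

# Entrants that the headline number does not cover (assumed interpretation).
uncovered_mn = round(TOTAL_ENTRANTS_MN - headline_mn, 1)

# Share of database entries still awaiting de-duplication, for both ends
# of the quoted 16-17 crore range.
remaining_share = [
    round(1 - DEDUPLICATED_CR / total, 3) for total in DATABASE_ENTRIES_CR
]

print(headline_mn, uncovered_mn, remaining_share)
```

On these figures, the share still to be cleared comes to a little under two-thirds at either end of the range, which matches the "remaining two-thirds" mentioned in the discussion.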
Pronab Sen: NSSO (data on jobs) does have limitations. It is not looking at specific industries. It gives you numbers by activities. To address this, in the wake of the 2008 global crisis, the labour bureau was asked to follow up on a bunch of high-employment sectors. These were supplements to the NSSO data, essentially from manufacturing only. On agriculture, NSSO gives you a possibly complete picture. In services, if you are not particularly interested in any specific service, NSSO gives you fairly good data.
Sudipto Mundle: The PLFS has two parts. For the urban sector, which takes care of most organised employment, quarterly data was already there. For agriculture, you have annual data, and then you combine both and have annual data for the economy. So, our CSO has this data, and despite all its limitations, there is no deterioration in quality. They are still coming out with these numbers. And then you have CMIE. I am not saying you only look at CMIE or government numbers. You have two sets of data and you can always do robustness checks.
The buck stops with CSO
Laveesh Bhandari: I think we are in a sense too kind to the CSO. When we moved to the new data in 2011, and I am talking about only the GDP data, it was always like introducing a new product. So, when you are coming up with something new on a highly respected number, and go for a sudden shift to a GDP series that is ostensibly incomparable, you are stuck. Where the CSO went wrong, and I think it must accept this blame, is that it made the shift too fast. Wait for 10 years, do the analysis required for comparability, accept that firms fudge data. When they are wrong, we need to understand where they are wrong, how they are wrong, and where their biases are, etc. I don't think the CSO has the people or depth to have done this in a short period - two or three or four years. It should have taken more time to bring in the new series. The CSO is heavily under-resourced. You cannot expect it to do all these new-age things when it is not really investing.
Pronab Sen: But the shift was important. Now the second point. We have a situation where earlier trained statisticians had no alternative jobs. The difference now is that a whole bunch of private sector entities, almost everyone in the private financial sector, has a team of statisticians. In premier colleges teaching statistics, campus recruitment is taking place at the graduation level, not at the master's level. So we are losing the cream (at the undergraduate level), and the quality of people we are picking up is lower than what it used to be sometime back. Another problem is our training hasn't evolved. Everybody gets the same training and the position you hold simply depends on your seniority.
Sudipto Mundle: By talking about the issues of getting talent into the organisation, you are talking about huge reforms. You have to think of a creative way of doing these reforms.
P. Mohanan: Let me add to what Prof Sen said. The issue is not resources. The government gives enough funds. But once you do the survey, there is no data analytics (capability) within the CSO. Many data sets are not interoperable - we have the agriculture census, the livestock census, the economic census, etc, and there is no way to integrate these data sets. So we are seeing so many resources go into collecting data, but nothing much is being done with that data.
Ring-fencing statistical system
Prosenjit Datta: Talking about too many data sets, in agriculture there are two sets of data - Central- and state-level data. So, which data get used?
Sudipto Mundle: For the Centre's own policy purposes, the Central data is used, but states use their data as well. At times it becomes a contentious issue. The states may need crop data and use their own data, which may be different from the Centre's data. Then negotiations start and adjustments are done. This takes me to a larger point. This business of government having a say on what the statistics say is not a new problem. There are three-four points on which we need huge discussions for reform. The most important part of it is ring-fencing statistics.
Rajeev Dubey: Are there examples where past governments suppressed or withheld data?
Sudipto Mundle: Thirty years ago, I was economic advisor to the government. One day I found that the then DG, CSO, was virtually in tears. He was given orders to produce one particular growth number and the guy was wondering what to do.
So I told him to give the number as a preliminary estimate (those days it was not called advanced estimate), and then give the real number in the revised estimate. So it was more subtle those days, but government intervention was always there.
Pronab Sen: The ring-fencing of the statistical system began in the Vajpayee era. He made a separate ministry for statistics and appointed the Rangarajan Commission, which also batted for ring-fencing. The UPA government did the ring-fencing with the creation of the NSC and the office of the CSI (Chief Statistician of India). That process has been going on since 1997, but now it is going backward.
Sudipto Mundle: I have a different perception about this. What is this NSC? It has three-four part-time members, it is not set up by any Act of Parliament but a government order. The secretariat for them, which is Mospi, can treat them with great respect but they have no power, no budget. If you want to ring-fence, you have to have at least a couple of full-time members, with expenditure being put as a charged item in the Budget, which means no voting takes place for the amount involved and the executive has no control over it. The statistical system should report to them and not the Mospi; and the CSI should not be reporting to the cabinet secretary.
Laveesh Bhandari: I want to qualify this. I think there should be some political oversight, obviously not on the output. I don't think any politician should decide the outcome. I think the system can do with some quality (political) oversight.
Pronab Sen: Political oversight can come only from either the government of the day (the executive) or Parliament. I prefer Parliament.
Sudipto Mundle: It is time for the Act, which has already been drafted, to be revived. If that gets done, NSC will become a statutory body, the entire statistical system will get oversight. That way even the Mospi, the secretariat for the NSC, will get empowered. If today, Mospi wants something to do with the census department (under the home ministry), it has no power.
Beyond GDP and jobs data
Prosenjit Datta: Generally the discourse is around GDP and jobs data, but what about other data points like the IIP? Does it still reflect the modern economy, given that it is an index of production and not an index of production and services? Or, for that matter, how relevant are the CPI and WPI, given how the basket of consumption has changed over time?
Sudipto Mundle: We are looking at a very negative side of the story. But over the years, due to demand for better data and statistics, three-four major initiatives have started. Look at what the gaps were. We all knew that annual survey-based statistics of industrial production, manufacturing, etc, were limited surveys. We augmented them with MCA data. Secondly, just like the annual survey of industries, there was a proposal to have an annual survey of services, which was also for the organised sector. Then, the big hole in the survey was the unorganised sector. What you have for this is the enterprise survey, which is now done regularly - both industry and services. So that's another big improvement.
On the price front, you are saying there are CPI and WPI, but no producer price data. That also has been initiated. Of course, we talked about the PLFS. So, the four key areas where we had deficiencies have been recognised and reforms started.
Pronab Sen: I will tell you where we are. As far as the informal sector data is concerned, earlier we used to have three surveys and, therefore, we used to have those surveys once in five years. But then we realised we have to increase the frequency of the surveys. So, the first thing we did was merge the three surveys into one. The results have not been great, just about okay. As far as the annual survey of services is concerned, it was supposed to be launched by now, except that it ran into a problem. The sample had to come from the GST data, but the GST guys failed to provide that data. All we want is names, addresses and some measure of size.
As far as prices are concerned, take the consumer price index. On the household pricing side, the divergence between national accounts consumption data and the household data is now 45 per cent. That is problem number one. So you don't know how representative the household data is.
Number two, as it is, the household consumption survey takes 2-2.5 hours. We cannot further lengthen it. After about an hour or so, people start getting upset. But the fact of the matter is that, over the years, the range of consumption has increased enormously, and a lot of it is not reflected in the survey. As a result, what you have is a category called miscellaneous, which gets all the hits.