Shekhar Mittal



I am a fourth-year Ph.D. student in the Global Economics and Management program at the Anderson School of Management, UCLA. I completed my Master of Public Policy degree at the Harris School of Public Policy at the University of Chicago in June 2013.

I am interested in poverty and development issues in emerging economies. Specifically, I want to understand how the incentives of various stakeholders such as politicians, bureaucrats, and citizens vary. I also want to analyze how this variation affects the achievement of intended developmental outcomes.

I received an undergraduate degree in Computer Engineering from the University of Delhi. I started my career with Cisco Systems as a software developer in their Core Routing Business Unit. I then moved to a start-up that was interested in urban development issues. Most of my time there was spent building our online platform and collaborating with citizen groups who could potentially use the platform. I followed it with a stint at a public policy think-tank, the Center for Civil Society. At CCS, I focussed on the elementary education sector with an emphasis on school choice and the then recently legislated Right of Children to Free and Compulsory Education Act.

I am available at and My CV, last updated in July 2017, can be found here. My LinkedIn profile can be found here. I am active on Twitter.


Ongoing Research

VAT in Emerging Economies: Does Third Party Verification Matter? (with Aprajit Mahajan)

A key stated advantage of the value-added tax (VAT) is that it allows the tax authority to verify transactions by comparing seller and buyer transaction reports. However, there is little evidence on how these paper trails actually affect VAT collections, particularly in low-compliance environments. We use a unique data set (the universe of VAT returns for the Indian state of Delhi over five years) and the timing of a policy that improved the tax authority's information about buyer-seller interactions to shed light on this issue. Using a difference-in-differences strategy, we find that the policy had a large and significant effect on wholesalers relative to retailers. We also document significant heterogeneity, with almost the entire increase being driven by changes in the behavior of the largest firms. We also find suggestive evidence that information and enforcement are complementary. Finally, we discuss the details of the policy implementation and argue that this policy, which seems simple in principle, faces substantial hurdles in execution, particularly in a system with limited resources.

Who's Bogus? Using Machine Learning to Identify Fraudulent Firms (with Aprajit Mahajan, Ofir Reich)

Improving the state's ability to tax effectively is increasingly seen as central to the development process, and value-added tax (VAT) is often proposed as a key tool towards accomplishing this goal. However, VAT implementation in many low-compliance environments is plagued by firms generating false paper trails. This demand for false paper trails has led to the creation of fraudulent firms (referred to as "bogus" firms by tax authorities) which issue fake receipts to genuine firms that allow the latter to lower their tax liability. We are requesting pilot funding to initiate the first stage of a long-term intervention to improve tax collections in Delhi (India). In this stage, we plan to apply machine learning methods on a large network data set (the universe of all tax returns for five years from Delhi) to identify fraudulent firms and then use on-the-field verification of such guesses to further improve the machine learning algorithm. In the second stage, we plan to implement an RCT with the tax authority that compares the authority's current method to our data-driven approach towards identifying fraudulent firms.

Other Research Experience

World Values Survey

With Prof Romain Wacziarg, I used the World Values Survey data to compute various measures of cultural distance between countries. We automated the computation in Stata and based the distance measure on differences in answer frequencies to a series of questions. The measure was calculated in Euclidean, Manhattan and FST forms.
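
As a rough illustration of the three forms (not the Stata code used in the project), the sketch below computes each distance for one question from two countries' answer-frequency vectors and then averages over questions. The function names, the equal weighting of the two countries in the FST calculation, and the simple averaging across questions are all assumptions made for this example.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two answer-frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Manhattan (city-block) distance between two answer-frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def fst(p, q):
    """Fixation index for one question, weighting both countries equally (assumption).

    FST = (H_total - H_within) / H_total, where H = 1 - sum of squared frequencies.
    """
    pooled = [(a + b) / 2 for a, b in zip(p, q)]
    h_total = 1 - sum(x ** 2 for x in pooled)
    h_within = ((1 - sum(a ** 2 for a in p)) + (1 - sum(b ** 2 for b in q))) / 2
    if h_total == 0:
        # Both countries give the same single answer: no diversity, no distance.
        return 0.0
    return (h_total - h_within) / h_total

def cultural_distance(freqs_a, freqs_b, metric):
    """Average a per-question metric over a series of questions (assumption)."""
    per_question = [metric(p, q) for p, q in zip(freqs_a, freqs_b)]
    return sum(per_question) / len(per_question)

# Hypothetical data: two questions, two answer options each.
country_a = [[1.0, 0.0], [0.5, 0.5]]
country_b = [[0.0, 1.0], [0.5, 0.5]]
for metric in (euclidean, manhattan, fst):
    print(metric.__name__, cultural_distance(country_a, country_b, metric))
```

Each argument to `cultural_distance` is a list with one frequency vector per question, in the same question order for both countries.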

United States Historical Election Returns

With Prof Paola Giuliano, I used the US historical election returns data to calculate the county-level vote share of all political parties from 1824 to 1968.
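
The vote-share calculation itself is simple arithmetic. A minimal sketch, assuming the returns have already been reshaped into (year, county, party, votes) rows, is shown below. The row layout and the example values are hypothetical and are not taken from the actual data set.

```python
from collections import defaultdict

def vote_shares(rows):
    """Return {(year, county, party): share of that county-year's total votes}."""
    party_votes = defaultdict(int)
    county_totals = defaultdict(int)
    for year, county, party, votes in rows:
        party_votes[(year, county, party)] += votes
        county_totals[(year, county)] += votes
    return {
        key: votes / county_totals[key[:2]]
        for key, votes in party_votes.items()
        if county_totals[key[:2]] > 0  # skip county-years with no recorded votes
    }

# Hypothetical rows for one county in one election year.
rows = [
    (1824, "County X", "Party A", 300),
    (1824, "County X", "Party B", 700),
]
for key, share in sorted(vote_shares(rows).items()):
    print(key, round(share, 3))
```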

Extreme coefficients of 2^k regressions

Under Prof Ed Leamer, I used Matlab to implement possible methods to efficiently determine the extreme coefficients of 2^k regressions. We also tried to develop efficient techniques to identify all the orthants of the 2^k regression coefficients.
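
For context, the brute-force baseline that efficient methods try to beat can be sketched as follows: run one regression for every subset of k optional regressors (2^k regressions in total) and record the smallest and largest coefficient on the variable of interest. This is only an illustrative sketch in plain Python, not the Matlab code from the project. The split into a "focus" variable and "doubtful" regressors, and all function names, are assumptions made for the example.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting.

    Assumes the system is non-singular (no perfectly collinear regressors).
    """
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """OLS coefficients of y on the given columns, via the normal equations."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)

def extreme_coefficients(focus, doubtful, y):
    """Return (min, max, count) of the focus coefficient over all 2^k subsets."""
    constant = [1.0] * len(y)
    estimates = []
    for size in range(len(doubtful) + 1):
        for subset in combinations(doubtful, size):
            beta = ols([constant, focus, *subset], y)
            estimates.append(beta[1])  # index 1 is the focus variable
    return min(estimates), max(estimates), len(estimates)
```

The cost doubles with every added optional regressor, which is why methods that avoid enumerating all subsets are of interest.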

Evaluation of Right to Public Service Schemes

With Professor Marianne Bertrand and Professor Paul Niehaus, I evaluated the implementation of the Right to Public Services Act that was introduced in Karnataka, India. We analysed the government data on service requests to identify patterns in the accuracy of the information being captured and in the time taken to resolve requests. (The report is available upon request.)

I went to Karnataka to understand the design and implementation of the Act. I interacted with all the government-level stakeholders and visited various government offices to witness the ground realities. I developed and piloted a survey questionnaire that captured the experience and perception of citizens who avail themselves of the services covered under the Act.

Measuring performance and analysing time-use of Indian Administrative Service (IAS) officers

This project was also with Professor Marianne Bertrand and Professor Paul Niehaus. I undertook a field trip to Bihar, India to understand the time-use of district magistrates. There, I shadowed two district magistrates for a week each, for as much time as they permitted. The goal of the exercise was multi-fold: to grasp the variation in the tasks that a district magistrate undertakes, and to understand the dynamics of her relationship with all the agents she interacts with, the constraints within which she is expected to deliver, and her role and responsibilities. Another aim of the exercise was to understand the organisational setup at the state/district/sub-district level.

I developed time-use tracking forms and survey questionnaires for the district magistrates, and formulated Right to Information Act questionnaires to obtain data that could help measure the performance of district-level IAS officers in their statutory and regulatory responsibilities.

In the initial stages, I reviewed government documents to understand the statutory and developmental responsibilities of IAS officers and narrowed the research problem. Alongside this, I wrote Perl scripts to get election data for national and state-level elections (1980 onwards) in India.

Effect of name order in Senate roll call voting pattern

Under Professor Pablo Montagnes, I analysed the impact of the order in which a senator is called to vote on her voting behavior. This is possible because senators in the US are called to vote in alphabetical order of their last names.

Using combined fixed effects at the senator level and at the congress level, we show that senators' level of agreement with their party goes up as we go down the order. Most of the magnitude of the result is driven by party unity. A senator also gets a signal from the way in which senators from her party have already voted (bandwagon effect).

Besides doing the Stata analysis, I also wrote Perl scripts to extract Senate and House roll-call data from 1940 into an analysable data set.


Summer paper:
"Mumbai municipal elections: Performance and incumbency effect analysis", Shekhar Mittal, Summer 2014.

Short independent paper:
"Distance as an instrument for measuring centralized control in government schools", Shekhar Mittal, December 2012.

Research designs/proposals as part of course-work:
"Partition of India: Long term effects of selection in migration", Shekhar Mittal, Winter 2015.
"Economic consequences of partition of British India", Shekhar Mittal, Fall 2014.
"Indian Politics: The Criminals beget Criminals Effect", Shekhar Mittal, May 2012.

Policy memo:
"Reservation in Private Schools under the Right to Education Act: Model for Implementation", Shekhar Mittal and Parth J Shah, December 2010.

Graduate presentations


I don't claim to be a computer science nerd but from time to time I wonder if there was a better way to do the task that I was performing. This page is for people like me. Here, I list tools that I have used, and found useful and fascinating. All of them are free and have an active online community. They made my work efficient and fun. If you think I should add something to this list, I am always up for trying new useful tools. (Disclaimer: In some cases, I would not have latched on to these tools had it not been for my computer sciency geeky friends.)

Drupal (Link)

If you are looking to create a website which goes beyond text/HTML/CSS requirements, Drupal is it. Often one needs to include functionality that is complicated but fairly common. Think integrating Google Maps, blogs, forums, wikis, anything and everything into one website. Chances are that you will find a module in Drupal which suits your needs closely.

Google Refine (Link)

If you look into government data from developing countries, chances are you regularly come across data with spelling and naming inconsistencies. Google Refine helps you solve that problem. There are other ways in which people have gotten around this problem (the reclink command in Stata; an application written by Prof Bhavnani). I have not tried those other ways. Besides, this one has the Google name behind it.
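
To give a sense of the kind of cleanup involved, here is a minimal sketch of "key collision" clustering, one common way to group name variants: each name is reduced to a normalized key, and names sharing a key are treated as candidates for merging. This is an illustration only, not Google Refine's own code. The function names and sample names are made up for the example.

```python
import re
from collections import defaultdict

def fingerprint(name):
    """Normalize a name: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = re.sub(r"[^\w\s]", "", name.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def cluster(names):
    """Group names that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]

# Hypothetical district names as they might appear across files.
names = ["New Delhi", "new delhi ", "Delhi, New", "Mumbai"]
for group in cluster(names):
    print(group)
```

This catches differences in case, spacing, punctuation and word order. Genuine misspellings need an approximate-matching method on top of it.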

FrontlineSMS (Link)

There are a lot of SMS server offerings out there. One has to pay for them and be sure of what she wants. On top of that, those services usually require you to have access to the internet, a luxury not easily available in developing countries. FrontlineSMS allows you to create and manage common SMS activities such as making announcements, conducting polls and sending automatic replies to incoming SMS. It is potentially a great tool to collect data from the field. All one needs is a laptop and a mobile phone.

Ushahidi (Link)

Crowdsourcing is going to be the way to collect data in the near future. Think of many kinds of issues (election violations, transparency, etc.) getting reported live and coming up on a map in a crowdsourced manner. Ushahidi will enable that.

Emacs (Link) plus Org-Mode (Link)

These tools have a somewhat steep learning curve but once you cross it, they are addictive! I use Emacs to write and edit assignments, papers, presentations, LaTeX files, scripts, anything and everything. One can open multiple files in the same window and compile them from there. The mouse becomes redundant. Org-mode allows you to create to-do lists, track agendas, create text files and then very easily export them to other formats such as LaTeX, PDF and HTML.

This website was completely built in Emacs and Org-mode.


Here you will find a few basic scripts that I put together (copied and modified from the internet) to clean data that was needed in our research. Feel free to use them. Send some more my way if you have them. Most scripts that we write have already been written by someone. There is no point in reinventing the wheel. Hopefully this list will grow with time.

Perl Scripts

a. Convert from PDF to text (Download)
Copying text from a PDF is easy. But if one has to do it for multiple files and multiple pages, it can be slow, painful and boring. This small script automates it.

b. Convert from text to CSV (Download)
One needs to know how to handle regular expressions in Perl to do this (if one wants to use this code, she will have to change the regular expression for sure). This script reads the text file line by line and puts the content in the required CSV format.
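
For readers who do not use Perl, the same two steps can be sketched in Python. This is not a translation of the downloadable scripts, only an illustration of the idea. It assumes the `pdftotext` command-line tool is installed, and the line layout captured by `LINE_PATTERN` (name, party, votes separated by two or more spaces) is entirely hypothetical. As with the Perl version, the regular expression has to be rewritten for each source file.

```python
import csv
import io
import re
import subprocess
from pathlib import Path

# Hypothetical line layout: "<name>  <party>  <votes>", e.g. "A. Kumar   INC   12,345".
LINE_PATTERN = re.compile(r"^(?P<name>.+?)\s{2,}(?P<party>\S+)\s{2,}(?P<votes>[\d,]+)$")

def pdftotext_command(pdf_path):
    """Build the command that writes <name>.txt next to <name>.pdf."""
    pdf_path = Path(pdf_path)
    return ["pdftotext", "-layout", str(pdf_path), str(pdf_path.with_suffix(".txt"))]

def convert_folder(folder):
    """Step a: run pdftotext on every PDF in a folder (needs pdftotext on the PATH)."""
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(pdftotext_command(pdf_path), check=True)

def text_to_csv(lines):
    """Step b: return CSV text for lines matching LINE_PATTERN; skip the rest."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "party", "votes"])
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            votes = match.group("votes").replace(",", "")  # drop thousands separators
            writer.writerow([match.group("name"), match.group("party"), votes])
    return buffer.getvalue()
```

Lines that do not match the pattern (page headers, blank lines) are silently skipped, so it is worth comparing the number of rows written against the source.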

Stata Scripts

Writing "for loops" in Stata is cumbersome. Here (download) is a Stata file in which I read specific cells from multiple tabs of the same Excel file and then append them into a single dataset.

Created: 2017-08-21 Mon 12:32