### THE SCIENCE AND MATHS OF OPINION POLL

Suppose there are 100 slips in a bag. Ninety-nine of them carry the same letter, either X or Y; the hundredth carries the other letter. If you were to pick a slip from the bag and it turned out to be X, would you expect the 99 matching slips to be X or Y? Most people would say "X." Some would have used 'probability theory' to arrive at the answer; others would describe it as 'common sense.' It is this common sense that is the basis of opinion polls.

Let us say there are 10,00,000 eligible voters in Tamil Nadu. Assume that there are only two parties, A and B. The winning party is getting 53 per cent of the votes (or more) and the other party is getting 47 per cent (or less). Suppose we make all possible lists of 1501 voters. On each list, we write down the party that has the support of 751 or more of the 1501 voters on that list. Thus each list will have the letter A or B written on it. The total number of lists is humongous, but one can compute it using software. Theory tells us that if party A has 53 per cent support or more, then over 99 per cent of the lists will have A written on them. Now if we randomly pick one of these lists and talk to the voters on it, we can figure out who is winning on that list. We can then say that the party winning on the list will also win the state, just as in the bag experiment explained earlier.

### Stratified sampling...

If instead of 10 lakh the state had 100 crore voters and we made lists of size 1501, 99 per cent of the lists would still show the name of the winning party. So the accuracy of the finding from the sample survey depends on the sample size and not on the sampling fraction. Picking a list purely at random in this way is called random sampling. Random sampling could be replaced by allocating the sample proportionately to each district while still choosing the voters within each district at random. This is called stratified sampling.
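The claim about lists of 1501 voters can be checked numerically. The sketch below is only an illustration under stated assumptions: it treats "all possible lists" as all equally likely samples drawn without replacement (a hypergeometric count), takes party A's support to be exactly 53 per cent, and uses a hypothetical helper name, `share_of_lists_won`. It computes the fraction of lists on which A has a majority, once for 10 lakh voters and once for 100 crore.

```python
from math import lgamma, exp

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def share_of_lists_won(population, support, sample_size):
    """Fraction of all lists of `sample_size` voters on which the party
    with the given true `support` has a strict majority.
    Assumes lists are samples without replacement (hypergeometric)."""
    supporters = round(population * support)
    others = population - supporters
    log_total_lists = log_comb(population, sample_size)
    majority = sample_size // 2 + 1  # 751 when the list has 1501 voters
    total = 0.0
    for k in range(majority, sample_size + 1):
        # Number of lists with exactly k supporters, divided by all lists.
        log_p = (log_comb(supporters, k)
                 + log_comb(others, sample_size - k)
                 - log_total_lists)
        total += exp(log_p)
    return total

# 10 lakh voters versus 100 crore voters, same list size of 1501.
for population in (10**6, 10**9):
    print(population, share_of_lists_won(population, 0.53, 1501))
```

Under these assumptions the two printed values should be nearly identical and close to 0.99, which is the sense in which the list size matters and the size of the state does not.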
If we increase the sample size to about 7400, the vote percentage for a party in the sample and in the state will differ by less than 1.5 percentage points with 99 per cent probability (by the usual normal approximation, the 99 per cent margin is at most 2.576 × √(0.25/7400), or about 1.5 percentage points). In short, a sample of about 7400 would suffice, irrespective of the size of the state. We first select the required number of constituencies randomly (say one in every 3 or 4). In each selected constituency we then pick, say, 4 to 6 polling booths, and in each chosen booth we choose the required number of voters randomly from the list of electors. The investigators then go door to door and talk to the identified voters.

### Can we predict the number of seats for major parties in the state?

One needs to build a mathematical model for predicting seats based on a sample survey of, say, about 6000 to 10,000 voters for a state poll and 40,000 for the national election. The number of seats a party gets depends on its overall vote share and on how its votes are distributed across the state. We assume that the distribution of votes for a party across the state is the same as it was in the previous election. In other words, we assume that the change in a party's vote share since the last election (also called the swing) is constant across the state.

Once we have estimated vote shares for the major parties in each constituency, we could simply declare the party with the highest share in each constituency the winner and then count the predicted seats for each party. However, we would be more confident in predicting the winner if the gap between the top two candidates is large, say 7 per cent. We have developed a model to do this. For the top three parties in every constituency, we assign a predicted probability of winning (the three adding up to one). We then compute the standard deviation and the modelling error. Once we have the win probabilities for the parties in each constituency, we sum these over the state to predict the seats.

But hold on. On voting day, only about 60 per cent of voters cast their ballot.
Also, if there is a long gap between the survey and the actual voting day, public opinion as a whole can change. These two factors put a question mark on the predictive power of any pre-election opinion poll done a month before the election. This does not mean we should ignore opinion polls. Don't throw the baby out with the bath water. -RK
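
The seat-projection steps described earlier (a uniform swing applied to last election's constituency results, then win probabilities summed over the state) can be sketched roughly as follows. The article does not spell out how the win probabilities are computed, so the softmax weighting here, its `sharpness` parameter, the function name `project_seats` and all the vote shares are stand-in assumptions for illustration only, not the author's model.

```python
from math import exp

def project_seats(prev_results, prev_state, new_state, sharpness=40.0):
    """Expected seats per party under a uniform-swing assumption.

    prev_results: one dict per constituency, party -> previous vote share.
    prev_state / new_state: statewide vote shares, previous election and
    current survey estimate. `sharpness` is a hypothetical tuning knob:
    larger values push win probabilities towards 0 or 1.
    """
    # Uniform swing: the same statewide change is applied everywhere.
    swing = {p: new_state[p] - prev_state[p] for p in new_state}
    seats = {}
    for constituency in prev_results:
        predicted = {p: max(0.0, share + swing.get(p, 0.0))
                     for p, share in constituency.items()}
        # Keep the top three parties in this constituency.
        top3 = sorted(predicted, key=predicted.get, reverse=True)[:3]
        # Stand-in probability model: softmax over predicted shares,
        # so the three probabilities add up to one.
        weights = {p: exp(sharpness * predicted[p]) for p in top3}
        total = sum(weights.values())
        for p in top3:
            seats[p] = seats.get(p, 0.0) + weights[p] / total
    return seats

# Made-up numbers for three constituencies and three parties.
previous = [
    {"A": 0.45, "B": 0.47, "C": 0.08},
    {"A": 0.50, "B": 0.40, "C": 0.10},
    {"A": 0.40, "B": 0.52, "C": 0.08},
]
previous_state = {"A": 0.450, "B": 0.463, "C": 0.087}
survey_state = {"A": 0.490, "B": 0.430, "C": 0.080}
print(project_seats(previous, previous_state, survey_state))
```

Summing probabilities instead of counting outright winners means a constituency with a narrow gap contributes a fraction of a seat to each contender, which reflects the lower confidence in close contests.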