Invited Presentation to the Jubilee Conference of the New Zealand Statistical Society, 5-7 March, 1999, Wellington.
Keywords: Political Economy & History; Statistics
Fabian ‘… unless you do redeem it by some laudable attempt either of valour or policy.’
Sir Andrew Aguecheek ‘An’t be any way, it must be with valour; for policy I hate; …’
Thank you for the honour of being invited to speak to the Jubilee Conference of the New Zealand Statistical Association. My associations go back only two-thirds of its history, but they are experiences and friendships I greatly value. Indeed, had there been an academic career path in social statistics, I may well have ended up pri
Invited today to talk on the relevance of risk in economics, one faces an overwhelming task: risk plays so central a role in so much of economics that I must be selective. Economics divides itself between the positive and the normative. The “positive” is essentially the realm of the scientist, studying the world as it is. It includes the standard statistician’s activities of measuring, estimating and hypothesis testing, together with forecasting, where the purpose is to predict the world. The “normative” is essentially the world of the policy-maker, of the world as it should be – of what could be done to improve it. Normative economics involves statistics insofar as policy makes use of positive economics. In addition, policy-making is about managing risk. That includes private policy decisions such as financial investment, as well as public policy, on which I shall focus.
My approach today is positive, and yet I shall talk about policy-making, for it is a part of human endeavour, and therefore susceptible to positive analysis by the standard tools of a social scientist. As it happens, the average policy-maker’s grasp of statistical methods is limited, and much of what I have to say would normally go over their heads. Even so, I am going to try to keep the presentation as accessible as I dare. In doing so I may tread on a few statistical prejudices. In particular, my approach will be primarily frequentist, although there will be a guest appearance from bayesian analysis.
The positive analysis of policy-making – what I have called “policy as process” – is not a popular activity in this country. Practical New Zealanders want to talk about what policy should be, rather than how it happens. A number of reviewers of my The Commercialisation of New Zealand complained that analysis of the recent reforms was all very well, but they really wanted to hear about the alternatives. As it happened, many chapters discussed policy options, because scientific method necessarily thinks about alternatives, about competing theories – or, in the case of statistics, a null hypothesis against which to evaluate an alternative. In the book’s successor, The Whimpering of the State: Policy After MMP, I have been more explicit, both describing what I am trying to do and including a couple of chapters which discuss alternatives. Today I am going to describe the statistical decision analysis which underpins both books, but which is not presented in them because of the audience. Thank you for giving me an opportunity to do so.
I am going to make some gross simplifications. The first is that I am going to ignore the rhetoric and the theatre – what Fabian and Sir Andrew might have called the “valour” – although it is never very far away from any policy story.
Second, I shall assume that the goal of policy is agreed. Divergences in goals are more common than is usually admitted, or are clothed in a rhetoric of the “national interest”. For instance, the current case for lowering tax rates is really about increasing the share of disposable income of those on the highest incomes, of altering the pattern of output by further reducing the share of publicly supplied goods and services and, even more fundamentally, of changing the balance in the national culture towards less community and more individuality. These are proper matters of political debate. Sadly, they are obscured in tax policy by rhetoric and theatre. (There, I told you we cannot easily leave the valour out of the story.)
The third assumption is that the underlying model is not contested. Typically it is, but I want to look at a situation where, say, a government agency has some agreed empirical and theoretical model underpinning its advice. In fact often there is no attempt to think through a systematic model.
Even suppressing valour, goal disagreement, and contestation, serious issues remain. The discipline of statistical decision analysis is one of the best ways to explore them.
Risks in Policy Advice
Suppose, to keep it simple, there are two policy options, the “no change” policy – keeping the existing policy – and the one being advocated. I will label no-change as “N” (statisticians will automatically think of “Null”) and the advocated one as “A” (as in “Alternative”).
Following tradition, we represent the objective by a parameter, called θ. In summary, the advice favours option A over N because the policy-advocates believe:
θA > θN. … (1).
Now suppose the minister receiving the advice was well trained in inferential statistics. He or she would begin by noticing that the θ were forecasts consequential on the policy option chosen, and observe that the typical policy document only implies Equation 1, without giving specific estimates. The minister might ask for the actual estimates of θA and θN, or just θA-θN. This is likely to cause a bit of a flurry in the advice agency, because it often has no estimate, merely claiming things will get better under A compared to N. But suppose there were one. The minister would then ask whether the forecast difference applied in every situation. Probably the officials would say something like: “the actual outturn depends on the circumstances. The provided figures are an estimate of the average improvement as a result of the adoption of the alternative policy.” So they are not saying Equation 1 at all, but
E(θA) > E(θN). … (2).
The Minister, no doubt smiling to her or himself, would then point out that the forecast of the effects of the advice was a probability distribution, and they were simply reporting the mean of this distribution. Why the mean?
Now I happen to have a client who is debating whether they should forecast the mode, median, or mean. The usual justification for the mean, especially if the underlying distribution is asymmetric, is that the loss function is quadratic. I doubt most politicians have a quadratic loss function – it is probably not even symmetric. If θA-θN is negative, the government will suffer much more grievously than if it is positive. That is the nature of the rhetoric, although it is partly the politicians’ own fault, because they sell their policies as though there will always be a positive gain. When was the last time you heard a policy advocate admitting their policy could go wrong?
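For the statistically minded, the point can be made concrete: under an asymmetric linear loss, the loss-minimising announcement is a quantile, not the mean. A minimal sketch follows; the right-skewed gain distribution and the 4:1 penalty for over-promising are entirely invented:

```python
import random

random.seed(1)
# Hypothetical, right-skewed forecast distribution of the policy gain.
draws = sorted(random.lognormvariate(0, 1) - 1 for _ in range(20_000))
mean = sum(draws) / len(draws)

def expected_loss(a, over=4.0, under=1.0):
    """Asymmetric linear loss: an outturn below the announced figure `a`
    costs `over` per unit of shortfall; one above it costs only `under`."""
    return sum(over * (a - x) if x < a else under * (x - a)
               for x in draws) / len(draws)

# With a 4:1 penalty, the loss-minimising announcement is the
# under/(over + under) = 20th percentile, well below the mean.
quantile = draws[len(draws) // 5]
print(expected_loss(quantile) < expected_loss(mean))  # True
```

A politician with such a loss function should be told a low quantile of the gain, not its average.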
Our statistically trained Minister might accept that elaborating the entire distribution of θN-θA was an excessive challenge, but still ask what was the probability that it could be less than zero, which is important given the penalties politicians carry if they are wrong.(1) In practice such questions are rare, although even the most bumbling minister might ask the likelihood that the proposed policy will result in deterioration from the status quo. That is:
“What is P(θA-θN < 0)”?
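If the forecast gain is treated, purely for illustration, as normally distributed, the minister’s question takes one line to answer. All figures here are invented:

```python
import math

def downside_probability(mean_gain, sd_gain):
    """P(theta_A - theta_N < 0) when the forecast gain is taken to be
    normal(mean_gain, sd_gain) -- an illustrative assumption only."""
    return 0.5 * math.erfc(mean_gain / (sd_gain * math.sqrt(2)))

# A policy sold as a 2-unit average gain, with a forecast sd of 3,
# still carries roughly a one-in-four chance of making things worse.
print(downside_probability(2.0, 3.0))
```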
There are various responses to such a question. Often theory is used to claim there will always be a positive increase. I won’t go through the general analysis, but briefly illustrate the claim with the free trade debate. It is a well established economic theory that under certain production and consumption assumptions – and providing full employment is maintained – the introduction of free trade increases total welfare compared with a regime of trade barriers. The theory does not predict how large the increase is. When New Zealand economists tried to measure the benefits of free trade in the 1970s they found increases in GDP of less than 1 percent.(2) While the mean increases may be small, the theory seems to say the gains from trade are never negative.(3) Advisers relying on this theory often conclude there are never net losses. But even in the case of the trade policy debate, the assumptions of the theory do not always apply. To be certain there will always be gains may require ignoring the possibility that reality does not match the theory.
Risks and the Tight Prior
This ignoring of reality became exceptionally exaggerated in the reforms of the 1980s. The key paper here is by Melvin Reder, a Chicago professor of economics writing about the methodology of the Chicago School of Economics which underpinned much of the thinking of New Zealand policy advisers in the 1980s.
As befits some of Reder’s scientific contributions, I shift from classical to bayesian inference. Recall that bayesian inference begins with a prior distribution which melds with new information to give a posterior distribution. It is not hard to see this happening in policy advice. The advisers start off with some previous view of the situation (a theory in the Popperian sense), which they combine with new information to get the distribution of θA-θN, on which they base their policy advice. Thus the advice is a mixture of prior theory and evidence, and the prior distribution influences the posterior distribution. A parameter of the prior distribution which markedly influences the posterior is its standard deviation.
Let me illustrate this by supposing we are trying to estimate the mean of a distribution. Suppose the mean of the prior distribution is, say, 10, and the new observation is 1. From this information alone we cannot tell what the mean of the posterior distribution will be. Suppose the variance of the prior distribution is such that 95 percent of the distribution lies between -10 and +30. Then the posterior distribution mean may well be close to the 1 observed from the new evidence. On the other hand, if the 95 percent range for the prior distribution is between 9 and 11, the posterior distribution has a mean much nearer 9.
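For those who want the arithmetic, this is the standard normal-normal conjugate update. The example leaves the observation’s precision implicit; here I assume a standard deviation of 1.5 for it:

```python
def posterior_mean(prior_mean, prior_sd, obs, obs_sd):
    """Posterior mean of a normal mean under a normal prior,
    with known observation standard deviation."""
    w_prior = 1.0 / prior_sd ** 2   # precision of the prior
    w_obs = 1.0 / obs_sd ** 2       # precision of the observation
    return (w_prior * prior_mean + w_obs * obs) / (w_prior + w_obs)

# Wide prior: 95 percent of its mass between -10 and +30.
wide = posterior_mean(10, 20 / 1.96, obs=1, obs_sd=1.5)
# Tight prior: 95 percent of its mass between 9 and 11.
tight = posterior_mean(10, 1 / 1.96, obs=1, obs_sd=1.5)
print(round(wide, 1), round(tight, 1))  # about 1.2 versus about 9.1
```

The wide prior lets the evidence dominate; the tight prior all but ignores it.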
Reder argues that the Chicago School method involves a “tight prior”, that is, a prior distribution with a small standard deviation. That means the posterior distribution will be close to the prior distribution, and the theory will dominate the evidence. Transferring the methodology to policy advice, as occurred in the 1980s, a tight prior means that theory rather than evidence will dominate that advice. If the theory assumes that θA-θN will always be positive, then so will the posterior. Even a shrewd minister, asking for the probability of the downside, could be misled and, presumably, astonished later when the policy outcome proves negative. Despite being told
P(θA-θN < 0) = 0,
very often the outturn is
θA < θN.
We may also note that the estimate of E(θA) – E(θN), the mean difference between the policies, is almost certainly biased upwards – a revelation of little comfort to the minister.
We can now understand two features of the post-1980s period. The first is that there was a collapse in the quality of empirical work in economics. Certainly there were some who continued to work away, often maintaining high quality, but their role in economics was downgraded. Instead, extraordinarily thin pieces of statistical work were promoted, because they were consistent with the tight prior. The point is that under a tight prior methodology, empirical work does not change the advisers’ minds much: like the drunk with the lamppost, it gives support rather than sheds light. Indeed, evidence that contradicts the theory is best ignored, since it has no effect on the posterior distribution.
Second, we can see why so many of the policies failed. It was not just that the theory was wrong, for normally a bad theory gets improved by the facts. But under the tight prior, the theory would not be modified by evidence which contradicted it.
This is well illustrated by the health reforms, based on the proposition that commercialisation would be beneficial. Evidence and expertise to the contrary were ruthlessly ignored. As late as 1996, the Treasury Post Election Briefing showed the graph in the appendix and stated “there does not appear to be a close relationship internationally between total spending per capita and … life expectancy …”(4) In fact, with one exception, the graph shows a plausible positive relationship: countries which spend more on health tend to have higher life expectancy (although one may not cause the other). The outlier is the United States, well known to be peculiar for its extensive use of commercial health care delivery, which has proved extremely expensive and inefficient. Americans appear to get little extra benefit from higher health spending. The 1996 Treasury PEB overlooked the obvious, because it was inconsistent with its theory.
So there are two weaknesses of policy based on a tight prior methodology: it leads to badly designed policy advice, and policy failure does not lead to revision of the policy. Basically, as argued in detail in my two policy books, Rogernomics, with its lack of attention to reality and its suppression of dissent, was a high risk strategy. If it succeeded the gains would be great; when it failed the losses were painful.
Risk and Uncertainty and Safety Margins
Not only was the commercialisation policy methodology flawed, its underlying probability assumptions were problematic. Begin with the economists’ distinction between uncertainty and risk. For an economist, risk involves a knowledge of the probability distribution of the event under consideration; uncertainty does not. A strict bayesian may deny that such a distinction exists, but allow me to avoid this dispute by suggesting that uncertainty is where knowledge of the relevant probability distribution is so meagre as to be, in effect, unusable.
An important implication of uncertainty is that the relevant decision rules come from game theory. Sometimes the policy framework designs a system where certain possibilities are totally ruled out. For instance, for reasons of child safety, the parts of very young children’s toys have to be larger than anything they could swallow. Perhaps the decision analysis could be put into a probabilistic framework, but it is not.
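Such rules can be stated very simply. A maximin decision-maker, facing uncertainty with no probabilities to attach to the states of the world, ranks each option by its worst case. The payoffs below are pure invention:

```python
# Maximin choice under uncertainty: with no probability distribution
# over the states of the world, rank each policy by its worst case.
payoffs = {
    "no change": {"boom": 1.0, "slump": 0.0},
    "reform":    {"boom": 3.0, "slump": -2.0},
}
maximin = max(payoffs, key=lambda p: min(payoffs[p].values()))
print(maximin)  # "no change": it has the least-bad worst case
```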
There is an intermediate stage between risk and uncertainty. Normally we think of the tails of distributions as “thin”, converging to zero at least as fast as those of the normal distribution. That means one can use standard probabilistic decision analysis: major deviations from the centre of the distribution are unlikely and the expected costs are small. However Benoit Mandelbrot has pointed to distributions which are thick-tailed, converging more slowly than the tails of the normal distribution. In a recent Scientific American he discusses “10 sigma” storms in financial markets.(5)
“According to portfolio theory, the probability of these large fluctuations would be a few millionths of a millionth of a millionth of a millionth. (The fluctuations are greater than 10 standard deviations.) But in fact, one observes spikes on a regular basis – as often as every month – and their probability amounts to a few hundredths.”
He goes on:
“The discrepancies between the pictures painted by modern portfolio theory and the actual movement of prices is obvious. Prices do not vary continuously, and they oscillate wildly at all time scales. Volatility – far from a static entity to be ignored or easily compensated for – is at the very heart of what goes on in financial markets.”
Modern portfolio theory underpins much economic analysis outside financial markets, and thereby much policy thinking. For instance, the recent electricity market reforms should handle two sigma disturbances adequately, but they may not handle larger ones. If Mandelbrot is correct we may see more system breakdowns than the minister and advisers expect. Surplus capacity is built in throughout the economic system. Mandelbrot’s analysis raises the possibility that there is insufficient capacity when we shift from the traditional rules of thumb to analyses based on portfolio theory. I shall be less ambitious than Mandelbrot and look only at four and five sigma crises which, for daily fluctuations that are independently normally distributed, occur somewhere between once a decade and once a lifetime.
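How much the tails matter can be checked directly. As a simple stand-in for a thick-tailed distribution I use a Student-t with three degrees of freedom; it is not rescaled to unit variance, so the comparison is illustrative of tail thickness only:

```python
import math

def normal_tail(x):
    """P(Z > x) for a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def t3_tail(x):
    """P(T > x) for a Student-t with 3 degrees of freedom, a simple
    thick-tailed stand-in (its CDF has a closed form for nu = 3)."""
    cdf = 0.5 + (math.atan(x / math.sqrt(3))
                 + math.sqrt(3) * x / (3 + x * x)) / math.pi
    return 1.0 - cdf

for sigma in (4, 5):
    p_thin, p_thick = normal_tail(sigma), t3_tail(sigma)
    print(f"{sigma} sigma: thin-tailed once per {1 / p_thin:,.0f} days, "
          f"thick-tailed once per {1 / p_thick:,.0f} days")
```

Under the thin tail a five sigma day is essentially never seen; under the thick tail it arrives every few months.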
Consider an accident and emergency service in a hospital. The number of arrivals in any period cannot be exactly predicted, but it conforms to a statistical distribution which can be analysed. Because most observations lie around the median rather than in the upper tail, the tail has to be estimated under somewhat stronger assumptions than the middle of the distribution. Under financial pressure, a hospital reviewing its A&E service might conclude that the on-duty team can be reduced by, say, one doctor (and associated nurses). There will be a bitter dispute between the staff and management. Suppose management prevails. Most of the time the on-duty team will function adequately, although under greater stress than in the past. However, the safety margin has been reduced, and when the four or five sigma disturbance occurs – perhaps more often than once in a decade or a lifetime – there will be no capacity to cope with every emergency. The possibility is that, literally, someone will bleed to death because the doctors are so busy with other patients they cannot attend her or him. That may have actually happened at the Christchurch Hospital A&E service. Certainly the Stent report leaves the impression that safety margins were cut back excessively, which may have contributed to unnecessary deaths.
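A toy version of the staffing arithmetic: suppose arrivals in a shift are Poisson. None of these numbers come from any actual hospital; the mean of 20 arrivals and the capacity figures are invented:

```python
import math

def poisson_tail(lam, k):
    """P(N > k) when arrivals N are Poisson with mean lam."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i)
              for i in range(k + 1))
    return 1.0 - cdf

# Probability a shift is overwhelmed, for successively trimmed capacity.
for capacity in (35, 30, 25):
    print(capacity, poisson_tail(20, capacity))
```

Trimming the capacity to treat arrivals promptly from 30 to 25 raises the chance of an overwhelmed shift by nearly an order of magnitude.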
The hospital system is riddled with excessive waiting lists. The fire safety reforms would have had the effect of reducing the fire brigade’s service to private dwellings. There are parallels in physical infrastructure such as energy, transport, water and waste-water disposal. Here again the safety margins have been cut back. Generally, capacity is not actively reduced; but as demand rises there is a need to augment capacity. That can be expensive (especially if real interest rates are high) and the margins are squeezed. Under commercialisation we would expect four and five sigma crises to happen more often, especially where demand is growing fastest, for there safety margins are undermined fastest. So we should not be surprised at the energy and water shortages that Auckland city faces. Slower growing urban areas may eventually follow suit, and similar crises may occur in transport, waste-water and other parts of the infrastructure.
The Public Attitude to Risk
What is going on here is that the commercialisers promised higher economic growth, but they were doing so, in part, by trading off the safety margins. They failed on the economic growth front, but the safety margins were cut anyway. It is not clear that the public wanted this tradeoff – they were certainly not consulted. The jargon talks of “public sector risk management”, but in practice it is risk shifting from the public to the private sector. To compensate for the expected public sector failure, the individual has to pay for extra capacity in various ways – additional supplies, insurance, a willingness to tolerate queues. Often it is not even clear what the individual can do privately to ensure an adequate safety margin. Health insurance does not in general purchase A&E capacity, while the dwellers of downtown Auckland were not in a position to put in an extra power cable.
In the past the government covered New Zealanders for uncertainty – for the thick tail. It did so by providing a safety margin, or by taking action when a four or five sigma event occurred. That increasingly no longer applies. Yet most New Zealanders still expect such actions from their government. A recent example of failure was the sluggish response of central authorities to the tropical cyclone which struck the Hokianga earlier this year.
There is a realism behind the public’s assessment: if you are out the back you do not expect as good A&E care as in the middle of Auckland, although the closure of the Taumarunui hospital service left many people dismayed. Those affected by this weekend’s high country snow in the South did not say the government ought to have prevented the outages. But they did expect some improvement in response since the last big snowfall, thirty years ago.
Abraham Maslow constructed a well known “hierarchy of needs”, listed here from the top down:(6)
– need for cognitive understanding;
– need for self actualization;
– esteem needs;
– needs for belongingness and love;
– safety needs;
– physiological needs (the bottom).
The bottom need is the material one that economic policy is primarily about. It is a foundation for, but not the ultimate end of, human existence. As it is adequately satisfied, the higher needs in the hierarchy become increasingly important. The Whimpering of the State argues that government has limited contributions to make to the higher needs, but that in the case of New Zealand commercialisation often undermined them. A good example is the one I have just demonstrated: the undermining of safety needs.(7)
Where does that leave us? I have not wasted your time today by arguing that the reforms have largely failed. With a few minor exceptions – inflation and the way the rich have benefited – that failure is now widely recognised. At issue is why they failed. Today I have not argued that the commercialisation theories of rogernomics were wrong – I have done that elsewhere. Rather, I have argued that the policy process which instituted them was fundamentally flawed, a point elaborated in my two policy books. Today I have used the statistical decision analysis paradigm to show some of the ways in which the policy-making was flawed. But in truth, as the books argue, the fundamental problem was an antagonism to the scientific approach.
I hope this paper has perhaps reaffirmed, perhaps extended, your belief in the usefulness of the statistician’s craft in the policy process. Most of us will not be attending the centennial conference in 2049. But one hopes there will be papers at it which will demonstrate a strong input by the statistics profession into the overall public policy process in the next fifty years, just as it has contributed so successfully at the detailed practical level in the last.
1. It is interesting to speculate what probability of failure would be acceptable to the average politician, not to mention what the public would think if they knew the critical level of this analogue of a Type I error.
3. You may ask why, then, there is such a fierce debate over protection. Despite the rhetoric of increasing output, it is actually about the distribution of output between factors such as labour, capital and land. Factor shares are much more variable under different protection regimes than is aggregate output.
5. See cover feature of February 1999, and letters in June 1999.
6. Maslow (1954:147-150)
7. It is intriguing that there is considerable concern across the political spectrum for the safety needs of law and order and of defence – which must be about a five-plus sigma disturbance – while the economic, health and related safety needs are being undermined.