**Keywords**: Distributional Economics; Statistics

The conventional Engel curve relates a household’s expenditure on (usually) food, $F$, to its aggregate expenditure, $X$. Since the relation is one-to-one, it may be mathematically inverted to

(1) $X = f(F, z)$

where $z$ is a vector of other relevant variables.

Such a function can be used to estimate a poverty line: if we know the food expenditure $F$ of a household on the poverty line, we can estimate its total expenditure $X$ using equation (1).

A widely used functional form is

(2) $\log X = \alpha + \beta \log F + \sum_i \gamma_i z_i$

where the $z_i$ are a set of other effects, such as the location or ethnicity of a household, with coefficients $\gamma_i$.
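Used as a poverty-line calculator, equation (2) turns a food poverty line directly into a total-expenditure poverty line. A minimal Python sketch, in which the function name `total_expenditure`, the argument `gamma_z` (the already-summed $\sum_i \gamma_i z_i$), the parameter values and the food poverty line are all hypothetical placeholders rather than estimates:

```python
import math


def total_expenditure(food, alpha, beta, gamma_z=0.0):
    """Equation (2): log(X) = alpha + beta*log(F) + sum_i gamma_i z_i.

    gamma_z is the already-summed contribution of the other effects z_i
    (hypothetical argument; 0 means a reference household).
    """
    return math.exp(alpha + beta * math.log(food) + gamma_z)


# Hypothetical parameters and food poverty line, for illustration only.
ALPHA, BETA = 0.8, 1.4
FOOD_POVERTY_LINE = 50.0  # food expenditure of a household on the poverty line

poverty_line = total_expenditure(FOOD_POVERTY_LINE, ALPHA, BETA)
print(round(poverty_line, 2))
```

In practice $\alpha$, $\beta$ and the $\gamma_i$ would come from estimating equation (2) on household data.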

It is a standard assumption, indeed an apparently universal empirical finding, that $\beta > 1$: total expenditure rises proportionally faster than food expenditure. (Because we have inverted the equation, the food expenditure elasticity is $1/\beta$, and the finding is more usually stated as $1/\beta < 1$.)
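To see why $\beta$ and the elasticity are reciprocals, the sketch below generates hypothetical, noise-free households whose food spending follows $F = 0.5X^{0.6}$ (an assumed elasticity of 0.6) and fits the inverted form, equation (2) without the $z_i$, by least squares using NumPy’s `polyfit`. The fitted slope is $1/0.6 \approx 1.67 > 1$.

```python
import numpy as np

# Synthetic households: food demand F = k * X**e with a hypothetical
# expenditure elasticity e = 0.6 (< 1, in line with Engel's law).
rng = np.random.default_rng(0)
X = rng.uniform(100.0, 2000.0, size=500)
e_true = 0.6
F = 0.5 * X ** e_true

# Fit the inverted form: log(X) = alpha + beta*log(F).
# polyfit returns coefficients highest degree first: [slope, intercept].
beta_hat, alpha_hat = np.polyfit(np.log(F), np.log(X), deg=1)

# No noise was added, so the fit recovers beta = 1/e exactly (up to rounding).
print(f"beta = {beta_hat:.3f}, implied elasticity 1/beta = {1 / beta_hat:.3f}")
```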

In presenting the model we normally assume that households are of similar composition (say, two adults and two children). Typically, however, the data set contains households of many different compositions (one, two, three or more adults, with various numbers of children of various ages).

A simple way of dealing with this is to calculate $X$ and $F$ on a per capita basis. However, this does not allow for the facts that:

(a) children’s needs differ from adults’;

(b) there are economies of scale, so that a couple may be able to live more cheaply than two adults living separately, for the same standard of living.

Both effects will differ across expenditure groups. For instance, economies of scale are likely to be stronger for housing than for most other items.

One means of allowing for these effects is ‘household expenditure scales’, the simplest of which is the per capita scale. Basically, a scale converts each household’s expenditure to a common equivalent (say, that of a single adult) at the same standard of living. The import of the previous paragraph is that there should be different scales for different expenditure groups.
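A sketch of how a scale converts a household’s expenditure to its single-adult equivalent. Besides the per capita scale it uses the parametric form $(A + \phi K)^\theta$, with $A$ adults and $K$ children, where $\phi < 1$ captures children’s lower needs and $\theta < 1$ economies of scale. This form, the function names and every parameter value are assumptions for illustration; giving food and total expenditure different parameters mirrors the point that scales should differ across expenditure groups.

```python
def per_capita_scale(adults: int, children: int) -> float:
    """The simplest scale: every member counts as one adult equivalent."""
    return float(adults + children)


def parametric_scale(adults: int, children: int,
                     child_weight: float, scale_economy: float) -> float:
    """Illustrative scale (A + phi*K)**theta; equals 1 for a single adult.

    child_weight (phi) < 1: children need less than adults.
    scale_economy (theta) < 1: economies of scale.
    """
    return (adults + child_weight * children) ** scale_economy


def equivalised(expenditure: float, scale: float) -> float:
    """Expenditure of a single adult at the same standard of living."""
    return expenditure / scale


# Couple with two children. Hypothetical separate scales for total
# expenditure (xe) and food (fe): theta is closer to 1 for food, i.e.
# weaker economies of scale than for items such as housing.
a, k = 2, 2
xe = parametric_scale(a, k, child_weight=0.5, scale_economy=0.7)
fe = parametric_scale(a, k, child_weight=0.6, scale_economy=0.9)

print(per_capita_scale(a, k), round(xe, 3), round(fe, 3))
print(equivalised(600.0, per_capita_scale(a, k)), round(equivalised(600.0, xe), 1))
```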

Mathematically, the household equivalence scale is a function of the various elements that make up the household’s composition. We denote the scale for total expenditure by $x_e$ and the scale for food expenditure by $f_e$. Their arguments will become evident below.

Introducing the household equivalence scales modifies equation (2) to

(3) $\log(X/x_e) = \alpha + \beta \log(F/f_e) + \sum_i \gamma_i z_i$

which, after a little manipulation, gives

(4) $\log X = \alpha + \beta \log F + (\log x_e - \beta \log f_e) + \sum_i \gamma_i z_i$
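The manipulation expands the logarithms of the ratios in equation (3) and then moves $\log x_e$ to the right-hand side (writing the other effects as $\sum_i \gamma_i z_i$):

```latex
\begin{aligned}
\log X - \log x_e &= \alpha + \beta \log F - \beta \log f_e + \sum_i \gamma_i z_i \\
\log X &= \alpha + \beta \log F + (\log x_e - \beta \log f_e) + \sum_i \gamma_i z_i
\end{aligned}
```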

In effect, this adds to equation (2) an additional term, $d = \log x_e - \beta \log f_e$, which depends on household composition.

The next question is how to estimate $d$. One procedure is to specify a functional form for the scales, but that typically involves non-linear estimation; moreover, we cannot be sure what the correct form is.

An alternative is to use dummy variables for household composition. A simple example:

set $d_{10} = 1$ if the household consists of one adult and no children, $= 0$ otherwise

set $d_{20} = 1$ if the household consists of two adults and no children, $= 0$ otherwise

and more generally

set $d_{m0} = 1$ if the household consists of $m$ adults and no children, $= 0$ otherwise

set $d_{11} = 1$ if the household consists of one adult and one child, $= 0$ otherwise

and more generally

set $d_{mn} = 1$ if the household consists of $m$ adults and $n$ children, $= 0$ otherwise
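The dummy scheme is mechanical to build from counts of adults and children. A minimal Python sketch; the function name `composition_dummies` and the sample households are hypothetical. It creates one column for each $(m, n)$ cell actually observed, so every household’s row contains exactly one 1:

```python
def composition_dummies(households):
    """Build one 0/1 dummy d_mn per (adults m, children n) cell observed.

    households: list of (adults, children) pairs.
    Returns (cells, rows): the sorted list of (m, n) cells and, for each
    household, a list of 0/1 values aligned with cells.
    """
    cells = sorted(set(households))
    rows = [[1 if (m, n) == cell else 0 for cell in cells]
            for (m, n) in households]
    return cells, rows


# Hypothetical sample: (adults, children) for five households.
sample = [(1, 0), (2, 0), (2, 2), (1, 1), (2, 2)]
cells, rows = composition_dummies(sample)
print(cells)    # [(1, 0), (1, 1), (2, 0), (2, 2)]
print(rows[2])  # [0, 0, 0, 1]
```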

One could use a more elaborate scheme. For instance, the dummies could be extended to distinguish children by age group.

Now, using the data set, estimate the equation

(5) $\log X = \beta \log F + \sum_i \gamma_i z_i + \sum_{m,n} \delta_{mn} d_{mn}$

where the second summation is across all n and m.

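(What has happened to $\alpha$? Each household falls into exactly one composition cell, so $\sum_{m,n} d_{mn} = 1$ for every household. If $\alpha$ and all the $d_{mn}$ were included, the regressors would be perfectly collinear and the model under-identified; the $\delta_{mn}$ absorb $\alpha$ instead.)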
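The collinearity can be checked numerically with NumPy’s `matrix_rank`: a design matrix holding $\log F$ and a full set of composition dummies has full column rank, but adding a constant column for $\alpha$ adds a column without raising the rank. The six households and their food spending are made up for illustration.

```python
import numpy as np

# Hypothetical composition of six households, coded as dummy columns
# d_10, d_20, d_22 (each household belongs to exactly one cell).
D = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)
log_F = np.log(np.array([40.0, 55.0, 60.0, 80.0, 90.0, 120.0]))

without_alpha = np.column_stack([log_F, D])
with_alpha = np.column_stack([np.ones(len(D)), log_F, D])

# The dummy columns sum to the constant column, so adding alpha loses rank.
print(np.linalg.matrix_rank(without_alpha), without_alpha.shape[1])  # 4 4
print(np.linalg.matrix_rank(with_alpha), with_alpha.shape[1])        # 4 5
```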

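The $\delta_{mn}$ carry the equivalence scale information for each $(m, n)$ household composition. Strictly, each $\delta_{mn}$ estimates $\alpha + d$ for that composition; taking the single adult as the reference household ($x_e = f_e = 1$, so $d = 0$), $\delta_{10}$ estimates $\alpha$ and $\delta_{mn} - \delta_{10}$ estimates $d$ for composition $(m, n)$.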

Equation (5) is relatively straightforward to estimate, since it is linear in the unknown parameters, provided the data set is reasonably clean and the $z_i$ are known.
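A sketch of estimating equation (5) by ordinary least squares with NumPy’s `lstsq`, on simulated data so that the answer is known. Everything in it is an assumption for illustration: three compositions, a single location dummy as the only $z_i$, $\beta = 1.5$, $\gamma = 0.1$, and $\delta$ values chosen so that $d$ is 0 for a single adult, $-0.3$ for two adults and $-0.5$ for two adults with two children. Note that no intercept column is included.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3000

# Hypothetical "true" values, used only to simulate data.
beta_true, gamma_true = 1.5, 0.1
cells = [(1, 0), (2, 0), (2, 2)]         # (adults m, children n)
delta_true = np.array([0.5, 0.2, 0.0])   # alpha + d_mn, with alpha = 0.5

cell_idx = rng.integers(0, len(cells), size=n)
D = np.eye(len(cells))[cell_idx]               # one dummy column per cell
z = rng.integers(0, 2, size=n).astype(float)   # hypothetical location dummy
log_F = rng.uniform(np.log(30.0), np.log(300.0), size=n)
log_X = (beta_true * log_F + gamma_true * z + D @ delta_true
         + rng.normal(0.0, 0.05, size=n))

# Equation (5): no intercept, because the dummies absorb alpha.
design = np.column_stack([log_F, z, D])
coef, *_ = np.linalg.lstsq(design, log_X, rcond=None)
beta_hat, gamma_hat, delta_hat = coef[0], coef[1], coef[2:]

# With the single adult (1, 0) as reference (d = 0), delta_mn - delta_10
# estimates d_mn = log(xe) - beta * log(fe) for each composition.
d_hat = delta_hat - delta_hat[0]
print(round(beta_hat, 2), dict(zip(cells, np.round(d_hat, 2))))
```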

There is one disappointment with this method. Neither $\log x_e$ nor $\log f_e$ can be recovered from the $\delta_{mn}$: each composition yields a single estimate of $d = \log x_e - \beta \log f_e$, one equation in two unknowns, so the system is under-identified. One needs to make some external assumption (such as $\log x_e = \log f_e$). That limits one’s ability to assess whether the $\delta_{mn}$ are sensible.
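Under the external assumption $x_e = f_e = s$, the term becomes $d = (1 - \beta)\log s$, so $s = \exp\{d/(1 - \beta)\}$ whenever $\beta \neq 1$. A minimal sketch; the function name `scale_if_equal` and the values $\beta = 1.5$, $d = -0.3$ are hypothetical. With $\beta > 1$, a negative $d$ maps to a scale above 1, as one would expect for a two-adult household.

```python
import math


def scale_if_equal(d: float, beta: float) -> float:
    """Recover the common scale s = xe = fe from d = log(xe) - beta*log(fe).

    Only valid under the external assumption xe = fe, for which
    d = (1 - beta) * log(s), so log(s) = d / (1 - beta). Requires beta != 1.
    """
    if beta == 1.0:
        raise ValueError("d is identically 0 when beta = 1 and xe = fe")
    return math.exp(d / (1.0 - beta))


# Hypothetical estimates: beta = 1.5 and d = -0.3 for a two-adult household.
print(round(scale_if_equal(-0.3, 1.5), 3))  # 1.822
```

A different external assumption (for instance, a fixed ratio between $x_e$ and $f_e$) would give different scales from the same $\delta_{mn}$, which is exactly the limitation described above.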