Gujarati (pages 312-327)

•

IDP

Débora

09/03/2023

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 3, do total de 13 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 6, do total de 13 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 9, do total de 13 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

E aí, curtiu este material?

Ajude a incentivar outros estudantes a melhorar o conteúdo

Gostou desse material? Compartilhe! 🧡

Metodo de Pesquisa de Survey

24 Materiais compartilhados

Baixe o app para aproveitar ainda mais

Leia os materiais offline, sem usar a internet. Além de vários outros recursos!

Prévia do material em texto

Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
312 PART ONE: SINGLE-EQUATION REGRESSION MODELS
12A time series may contain four components: a seasonal, a cyclical, a trend, and one that
is strictly random.
13For the various methods of seasonal adjustment, see, for instance, Francis X. Diebod,
Elements of Forecasting, 2d ed., South-Western Publishers, 2001, Chap. 5.
The preceding example clearly reveals the role of interaction dummies
when two or more qualitative regressors are included in the model. It is
important to note that in the model (9.6.5) we are assuming that the rate of
increase of hourly earnings with respect to education (of about 80 cents per
additional year of schooling) remains constant across gender and race. But
this may not be the case. If you want to test for this, you will have to intro-
duce differential slope coefficients (see exercise 9.25)
9.7 THE USE OF DUMMY VARIABLES IN SEASONAL ANALYSIS
Many economic time series based on monthly or quarterly data exhibit
seasonal patterns (regular oscillatory movements). Examples are sales of
department stores at Christmas and other major holiday times, demand for
money (or cash balances) by households at holiday times, demand for ice
cream and soft drinks during summer, prices of crops right after harvesting
season, demand for air travel, etc. Often it is desirable to remove the sea-
sonal factor, or component, from a time series so that one can concentrate
on the other components, such as the trend.12 The process of removing
the seasonal component from a time series is known as deseasonalization
or seasonal adjustment, and the time series thus obtained is called the
deseasonalized, or seasonally adjusted, time series. Important economic
time series, such as the unemployment rate, the consumer price index (CPI),
the producer’s price index (PPI), and the index of industrial production, are
usually published in seasonally adjusted form.
There are several methods of deseasonalizing a time series, but we will
consider only one of these methods, namely, the method of dummy vari-
ables.13 To illustrate how the dummy variables can be used to deseasonalize
economic time series, consider the data given in Table 9.3. This table gives
quarterly data for the years 1978–1995 on the sale of four major appliances,
dishwashers, garbage disposers, refrigerators, and washing machines, all
data in thousands of units. The table also gives data on durable goods expen-
diture in 1982 billions of dollars.
To illustrate the dummy technique, we will consider only the sales of re-
frigerators over the sample period. But first let us look at the data, which is
shown in Figure 9.4. This figure suggests that perhaps there is a seasonal
pattern in the data associated with the various quarters. To see if this is the
case, consider the following model:
Yt = α1D1t + α2D2t + α3tD3t + α4D4t + ut (9.7.1)
where Yt = sales of refrigerators (in thousands) and the D’s are the dum-
mies, taking a value of 1 in the relevant quarter and 0 otherwise. Note that
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 313
Note: DISH = dishwashers; DISP = garbage disposers; FRIG = refrigerators; WASH = dishwashers; DUR =
durable goods expenditure, billions of 1992 dollars.
Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues).
480 706 943 1036 247.7
530 582 1175 1019 249.1
557 659 1269 1047 251.8
602 837 973 918 262
658 867 1102 1137 263.3
749 860 1344 1167 280
827 918 1641 1230 288.5
858 1017 1225 1081 300.5
808 1063 1429 1326 312.6
840 955 1699 1228 322.5
893 973 1749 1297 324.3
950 1096 1117 1198 333.1
838 1086 1242 1292 344.8
884 990 1684 1342 350.3
905 1028 1764 1323 369.1
909 1003 1328 1274 356.4
to avoid the dummy variable trap, we are assigning a dummy to each quarter
of the year, but omitting the intercept term. If there is any seasonal effect in a
given quarter, that will be indicated by a statistically significant t value of the
dummy coefficient for that quarter.14
78
800
79 80 81 82 83 84 85 86
1000
1200
1400
1600
1800
T
h
o
u
sa
n
d
s
o
f
u
n
it
s
Year
FIGURE 9.4 Sales of refrigerators 1978–1985 (quarterly).
14Note a technical point. This method of assigning a dummy to each quarter assumes that
the seasonal factor, if present, is deterministic and not stochastic. We will revisit this topic
when we discuss time series econometrics in Part V of this book.
TABLE 9.3 QUARTERLY DATA ON APPLIANCE SALES (IN THOUSANDS)
AND EXPENDITURE ON DURABLE GOODS (1978-I TO 1985-IV)
DISH DISP FRIG WASH DUR DISH DISP FRIG WASH DUR
841 798 1317 1271 252.6
957 837 1615 1295 272.4
999 821 1662 1313 270.9
960 858 1295 1150 273.9
894 837 1271 1289 268.9
851 838 1555 1245 262.9
863 832 1639 1270 270.9
878 818 1238 1103 263.4
792 868 1277 1273 260.6
589 623 1258 1031 231.9
657 662 1417 1143 242.7
699 822 1185 1101 248.6
675 871 1196 1181 258.7
652 791 1410 1116 248.4
628 759 1417 1190 255.5
529 734 919 1125 240.4
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
314 PART ONE: SINGLE-EQUATION REGRESSION MODELS
Notice that in (9.7.1) we are regressing Y effectively on an intercept, ex-
cept that we allow for a different intercept in each season (i.e., quarter). As
a result, the dummy coefficient of each quarter will give us the mean refrig-
erator sales in each quarter or season (why?).
EXAMPLE 9.6
SEASONALITY IN REFRIGERATOR SALES
From the data on refrigerator sales given in Table 9.3, we obtain the following regression
results:
Ŷt = 1222.125D1t + 1467.500D2t + 1569.750D3t + 1160.000D4t
t = (20.3720) (24.4622) (26.1666) (19.3364) (9.7.2)
R2 = 0.5317
Note:We have not given the standard errors of the estimated coefficients, as each standard
error is equal to 59.9904, because all the dummies take only a value of 1 or zero.
The estimated α coefficients in (9.7.2) represent the average, or mean, sales of refriger-
ators (in thousands of units) in each season (i.e., quarter). Thus, the average sale of refrig-
erators in the first quarter, in thousands of units, is about 1222, that in the second quarter
about 1468, that in the third quarter about 1570, and that in the fourth quarter about 1160.
(Continued)
TABLE 9.4 U.S. REFRIGERATOR SALES (THOUSANDS),1978–1995 (QUARTERLY)
FRIG DUR D2 D3 D4 FRIG DUR D2 D3 D4
1317 252.6 0 0 0 943 247.7 0 0 0
1615 272.4 1 0 0 1175 249.1 1 0 0
1662 270.9 0 1 0 1269 251.8 0 1 0
1295 273.9 0 0 1 973 262.0 0 0 1
1271 268.9 0 0 0 1102 263.3 0 0 0
1555 262.9 1 0 0 1344 280.0 1 0 0
1639 270.9 0 1 0 1641 288.5 0 1 0
1238 263.4 0 0 1 1225 300.5 0 0 1
1277 260.6 0 0 0 1429 312.6 0 0 0
1258 231.9 1 0 0 1699 322.5 1 0 0
1417 242.7 0 1 0 1749 324.3 0 1 0
1185 248.6 0 0 1 1117 333.1 0 0 1
1196 258.7 0 0 0 1242 344.8 0 0 0
1410 248.4 1 0 0 1684 350.3 1 0 0
1417 255.5 0 1 0 1764 369.1 0 1 0
919 240.4 0 0 1 1328 356.4 0 0 1
Note: FRIG = refrigerator sales, thousands
DUR = durable goods expenditure, billions of 1992 dollars
D2 = 1 in the second quarter, 0 otherwise
D3 = 1 in the third quarter, 0 otherwise
D4 = 1 in the fourth quarter, 0 otherwise
Source: Business Statistics and Survey of Current Business, Department of Commerce (various
issues).
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 315
Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept
term to avoid the dummy variable trap, we could assign only three dummies and include the
intercept term.Suppose we treat the first quarter as the reference quarter and assign dum-
mies to the second, third, and fourth quarters. This produces the following regression results
(see Table 9.4 for the data setup):
Ŷt = 1222.1250 + 245.3750D2t + 347.6250D3t − 62.1250D4t
t = (20.3720)* (2.8922)* (4.0974)* (−0.7322)** (9.7.3)
R2 = 0.5318
where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.
Since we are treating the first quarter as the benchmark, the coefficients attached to the
various dummies are now differential intercepts, showing by how much the average value of
Y in the quarter that receives a dummy value of 1 differs from that of the benchmark quarter.
Put differently, the coefficients on the seasonal dummies will give the seasonal increase or
decrease in the average value of Y relative to the base season. If you add the various differ-
ential intercept values to the benchmark average value of 1222.125, you will get the average
value for the various quarters. Doing so, you will reproduce exactly Eq. (9.7.2), except for the
rounding errors.
But now you will see the value of treating one quarter as the benchmark quarter, for
(9.7.3) shows that the average value of Y for the fourth quarter is not statistically different
from the average value for the first quarter, as the dummy coefficient for the fourth quarter is
not statistically significant. Of course, your answer will change, depending on which quarter
you treat as the benchmark quarter, but the overall conclusion will not change.
How do we obtain the deseasonalized time series of refrigerator sales? This can be done
easily. You estimate the values of Y from model (9.7.2) [or (9.7.3)] for each observation and
subtract them from the actual values of Y, that is, you obtain (Yt −Ŷt ) which are simply the
residuals from the regression (9.7.2). We show them in Table 9.5.15
What do these residuals represent? They represent the remaining components of the re-
frigerator time series, namely, the trend, cycle, and random components (but see the caution
given in footnote 15).
Since models (9.7.2) and (9.7.3) do not contain any covariates, will the picture change if
we bring in a quantitative regressor in the model? Since expenditure on durable goods has
an important factor influence on the demand for refrigerators, let us expand our model (9.7.3)
by bringing in this variable. The data for durable goods expenditure in billions of 1982 dollars
are already given in Table 9.3. This is our (quantitative) X variable in the model. The regres-
sion results are as follows
Ŷt = 456.2440 + 242.4976D2t + 325.2643D3t − 86.0804D4t + 2.7734Xt
t = (2.5593)* (3.6951)* (4.9421)* (−1.3073)** (4.4496)* (9.7.4)
R2 = 0.7298
where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent.
15Of course, this assumes that the dummy variables technique is an appropriate method of
deseasonalizing a time series and that a time series (TS) can be represented as: TS = s + c +
t + u, where s represents the seasonal, t the trend, c the cyclical, and u the random component.
However, if the time series is of the form, TS = (s)(c)(t)(u), where the four components enter
multiplicatively, the preceding method of deseasonalization is inappropriate, for that method
assumes that the four components of a time series are additive. But we will have more to say
about this topic in the chapters on time series econometrics.
(Continued)
EXAMPLE 9.6 (Continued)
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
316 PART ONE: SINGLE-EQUATION REGRESSION MODELS
TABLE 9.5
REFRIGERATOR SALES REGRESSION: ACTUAL, FITTED, AND RESIDUAL
VALUES (EQ. 9.7.3)
Residual graph
Actual Fitted Residuals 0
1978-I 1317 1222.12 94.875 . .
1978-II 1615 1467.50 147.500 . .
1978-III 1662 1569.75 92.250 . .
1978-IV 1295 1160.00 135.000 . .
1979-I 1271 1222.12 48.875 . .
1979-II 1555 1467.50 87.500 . .
1979-III 1639 1569.75 69.250 . .
1979-IV 1238 1160.00 78.000 . .
1980-I 1277 1222.12 54.875 . .
1980-II 1258 1467.50 −209.500 . .
1980-III 1417 1569.75 −152.750 . .
1980-IV 1185 1160.00 25.000 . .
1981-I 1196 1222.12 −26.125 . .
1981-II 1410 1467.50 −57.500 . .
1981-III 1417 1569.75 −152.750 . .
1981-IV 919 1160.00 −241.000 . .
1982-I 943 1222.12 −279.125 . .
1982-II 1175 1467.50 −292.500 . .
1982-III 1269 1569.75 −300.750 . .
1982-IV 973 1160.00 −187.000 . .
1983-I 1102 1222.12 −120.125 . .
1983-II 1344 1467.50 −123.500 . .
1983-III 1641 1569.75 71.250 . .
1983-IV 1225 1160.00 65.000 . .
1984-I 1429 1222.12 206.875 . .
1984-II 1699 1467.50 231.500 . .
1984-III 1749 1569.75 179.250 . .
1984-IV 1117 1160.00 −43.000 . .
1985-I 1242 1222.12 19.875 . .
1985-II 1684 1467.50 216.500 . .
1985-III 1764 1569.75 194.250 . .
1985-IV 1328 1160.00 168.000 .
− 0 +
Again, keep in mind that we are treating the first quarter as our base. As in (9.7.3), we see
that the differential intercept coefficients for the second and third quarters are statistically dif-
ferent from that of the first quarter, but the intercepts of the fourth quarter and the first quar-
ter are statistically about the same. The coefficient of X (durable goods expenditure) of about
2.77 tells us that, allowing for seasonal effects, if expenditure on durable goods goes up
by a dollar, on average, sales of refrigerators go up by about 2.77 units, that is, approximately
3 units; bear in mind that refrigerators are in thousands of units and X is in (1982) billions
of dollars.
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
EXAMPLE 9.6 (Continued)
(Continued)
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 317
16For proof, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Lyme, U.K.,
1995, pp. 150–152.
An interesting question here is: Just as sales of refrigerators exhibit seasonal patterns,
would not expenditure on durable goods also exhibit seasonal patterns? How then do we
take into account seasonality in X? The interesting thing about (9.7.4) is that the dummy
variables in that model not only remove the seasonality in Y but also the seasonality, if any,
in X. (This follows from a well-known theorem in statistics, known as the Frisch–Waugh
theorem.16) So to speak, we kill (deseasonalize) two birds (two series) with one stone (the
dummy technique).
If you want an informal proof of the preceding statement, just follow these steps: (1) Run
the regression of Y on the dummies as in (9.7.2) or (9.7.3) and save the residuals, say, S1;
these residuals represent deseasonalized Y. (2) Run a similar regression for X and obtain
the residuals from this regression, say, S2; these residuals represent deseasonalized X.
(3) Regress S1 on S2. You will find that the slope coefficient in this regression is precisely the
coefficient of X in the regression (9.7.4).
9.8 PIECEWISE LINEAR REGRESSION
To illustrate yet another use of dummy variables, consider Figure 9.5, which
shows how a hypothetical company remunerates its sales representatives.
It pays commissions based on sales in such a manner that up to a certain
level, the target, or threshold, level X*, there is one (stochastic) commission
structure and beyond that level another. (Note: Besides sales, other factors
affect sales commission. Assume that these other factors are represented
EXAMPLE 9.6 (Continued)
I
II
S
a
le
s
c
o
m
m
is
si
o
n
X*
Y
X (sales)
FIGURE 9.5 Hypothetical relationship between sales commission and sales volume. (Note: The intercept on
the Y axis denotes minimum guaranteed commission.)
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy VariableRegression Models
© The McGraw−Hill
Companies, 2004
318 PART ONE: SINGLE-EQUATION REGRESSION MODELS
by the stochastic disturbance term.) More specifically, it is assumed that
sales commission increases linearly with sales until the threshold level X*,
after which also it increases linearly with sales but at a much steeper rate.
Thus, we have a piecewise linear regression consisting of two linear
pieces or segments, which are labeled I and II in Figure 9.5, and the com-
mission function changes its slope at the threshold value. Given the data on
commission, sales, and the value of the threshold level X*, the technique of
dummy variables can be used to estimate the (differing) slopes of the two
segments of the piecewise linear regression shown in Figure 9.5. We pro-
ceed as follows:
Yi = α1 + β1Xi + β2(Xi − X
*)Di + ui (9.8.1)
where Yi = sales commission
Xi = volume of sales generated by the sales person
X* = threshold value of sales also known as a knot (known in
advance)17
D = 1 if Xi > X
*
= 0 if Xi < X
*
Assuming E(ui) = 0, we see at once that
E(Yi | Di = 0, Xi , X
*) = α1 + β1Xi (9.8.2)
which gives the mean sales commission up to the target level X* and
E(Yi | Di = 1, Xi , X
*) = α1 − β2X
* + (β1 + β2)Xi (9.8.3)
which gives the mean sales commission beyond the target level X*.
Thus, β1 gives the slope of the regression line in segment I, and β1 + β2
gives the slope of the regression line in segment II of the piecewise linear
regression shown in Figure 9.5. A test of the hypothesis that there is no
break in the regression at the threshold value X* can be conducted easily by
noting the statistical significance of the estimated differential slope coeffi-
cient β̂2 (see Figure 9.6).
Incidentally, the piecewise linear regression we have just discussed is an
example of a more general class of functions known as spline functions.18
17The threshold value may not always be apparent, however. An ad hoc approach is to plot
the dependent variable against the explanatory variable(s) and observe if there seems to be a
sharp change in the relation after a given value of X (i.e., X*). An analytical approach to finding
the break point can be found in the so-called switching regression models. But this is an
advanced topic and a textbook discussion may be found in Thomas Fomby, R. Carter Hill, and
Stanley Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, Chap. 14.
18For an accessible discussion on splines (i.e., piecewise polynomials of order k), see
Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining, Introduction to Linear
Regression Analysis, John Wiley & Sons, 3d ed., New York, 2001, pp. 228–230.
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 319
Y
a1 – b2X
*α β
a1α
S
a
le
s
co
m
m
is
si
o
n
X*
1
b1β
1
b1+ b2β β
X (sales)
FIGURE 9.6 Parameters of the piecewise linear regression.
EXAMPLE 9.7
TOTAL COST IN RELATION TO OUTPUT
As an example of the application of the piecewise linear
regression, consider the hypothetical total cost–total
output data given in Table 9.6. We are told that the
total cost may change its slope at the output level of
5500 units.
Letting Y in (9.8.4) represent total cost and X total
output, we obtain the following results:
Ŷi = −145.72 + 0.2791Xi + 0.0945(Xi − X *i )Di
t = (−0.8245) (6.0669) (1.1447) (9.8.4)
R2 = 0.9737 X* = 5500
As these results show, the marginal cost of production
is about 28 cents per unit and although it is about
37 cents (28 + 9) for output over 5500 units, the differ-
ence between the two is not statistically significant be-
cause the dummy variable is not significant at, say, the
TABLE 9.6
HYPOTHETICAL DATA ON OUTPUT AND
TOTAL COST
Total cost, dollars Output, units
256 1,000
414 2,000
634 3,000
778 4,000
1,003 5,000
1,839 6,000
2,081 7,000
2,423 8,000
2,734 9,000
2,914 10,000
5 percent level. For all practical purposes, then, one can
regress total cost on total output, dropping the dummy
variable.
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
320 PART ONE: SINGLE-EQUATION REGRESSION MODELS
9.9 PANEL DATA REGRESSION MODELS
Recall that in Chapter 1 we discussed a variety of data that are available for
empirical analysis, such as cross-section, time series, pooled (combination
of time series and cross-section data), and panel data. The technique of
dummy variable can be easily extended to pooled and panel data. Since the
use of panel data is becoming increasingly common in applied work, we will
consider this topic in some detail in Chapter 16.
9.10 SOME TECHNICAL ASPECTS OF
THE DUMMY VARIABLE TECHNIQUE
The Interpretation of Dummy Variables
in Semilogarithmic Regressions
In Chapter 6 we discussed the log–lin models, where the regressand is loga-
rithmic and the regressors are linear. In such a model, the slope coefficients
of the regressors give the semielasticity, that is, the percentage change in the
regressand for a unit change in the regressor. This is only so if the regressor
is quantitative. What happens if a regressor is a dummy variable? To be spe-
cific, consider the following model:
lnYi = β1 + β2Di + ui (9.10.1)
where Y = hourly wage rate ($) and D = 1 for female and 0 for male.
How do we interpret such a model? Assuming E(ui) = 0, we obtain:
Wage function for male workers:
E(lnYi | Di = 0) = β1 (9.10.2)
Wage function for female workers:
E(lnYi | Di = 1) = β1 + β2 (9.10.3)
Therefore, the intercept β1 gives the mean log hourly earnings and the
“slope” coefficient gives the difference in the mean log hourly earnings of
male and females. This is a rather awkward way of stating things. But if we
take the antilog of β1, what we obtain is not the mean hourly wages of male
workers, but their median wages. As you know, mean, median, and mode
are the three measures of central tendency of a random variable. And if
we take the antilog of (β1 + β2), we obtain the median hourly wages of
female workers.
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 321
Dummy Variables and Heteroscedasticity
Let us revisit our savings–income regression for the United States for the
periods 1970–1981 and 1982–1995 and for the entire period 1970–1995. In
testing for structural stability using the dummy technique, we assumed that
the error var (u1i) = var (u2i) = σ
2, that is, the error variances in the two
periods were the same. This was also the assumption underlying the Chow
test. If this assumption is not valid—that is, the error variances in the two
subperiods are different—it is quite possible to draw misleading conclu-
sions. Therefore, one must first check on the equality of variances in the
subperiod, using suitable statistical techniques. Although we will discuss
this topic more thoroughly in the chapter on heteroscedasticity, in
Chapter 8 we showed how the F test can be used for this purpose.20 (See our
discussion of the Chow test in that chapter.) As we showed there, it seems
the error variances in the two periods are not the same. Hence, the results
of both the Chow test and the dummy variable technique presented before
may not be entirely reliable. Of course, our purpose here is to illustrate the
various techniques that one can use to handle a problem (e.g., the problem
of structural stability). In any particular application, these techniques may
not be valid. But that is par for most statistical techniques. Of course, one
can take appropriate remedial actions to resolve the problem, as we will do
in the chapter on heteroscedasticity later (however, see exercise 9.28).
19Robert Halvorsen and RaymondPalmquist, “The Interpretation of Dummy Variables in
Semilogarithmic Equations,” American Economic Review, vol. 70, no. 3, pp. 474–475.
20The Chow test procedure can be performed even in the presence of heteroscedasticity, but
then one will have to use the Wald test. The mathematics involved behind the test is somewhat
involved. But in the chapter on heteroscedasticity, we will revisit this topic.
we obtain 6.8796 ($), which is themedian hourly earnings
of female workers. Thus, the female workers’ median
hourly earnings is lower by about 21.94 percent com-
pared to their male counterparts [(8.8136 − 6.8796)/
8.8136].
Interestingly, we can obtain semielasticity for a
dummy regressor directly by the device suggested by
Halvorsen and Palmquist.19 Take the antilog (to base e)
of the estimated dummy coefficient and subtract 1 from it
and multiply the difference by 100. (For the underlying
logic, see Appendix 9.A.1.) Therefore, if you take the
antilog of−0.2437, you will obtain 0.78366. Subtracting 1
from this gives −0.2163, after multiplying this by 100, we
get −21.63 percent, suggesting that a female worker’s
(D = 1) median salary is lower than that of her male
counterpart by about 21.63 percent, the same as we
obtained previously, save the rounding errors.
EXAMPLE 9.8
LOGARITHM OF HOURLY WAGES
IN RELATION TO GENDER
To illustrate (9.10.1), we use the data that underlie
Example 9.2. The regression results based on 528 ob-
servations are as follows:
̂lnYi = 2.1763 − 0.2437Di
t = (72.2943)* (−5.5048)* (9.10.4)
R2 = 0.0544
where * indicates p values are practically zero.
Taking the antilog of 2.1763, we find 8.8136 ($),
which is the median hourly earnings of male workers,
and taking the antilog of [(2.1763 − 0.2437) = 1.92857],
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
322 PART ONE: SINGLE-EQUATION REGRESSION MODELS
21P. A.V.B. Swamy, Statistical Inference in Random Coefficient Regression Models, Springer-
Verlag, Berlin, 1971.
Dummy Variables and Autocorrelation
Besides homoscedasticity, the classical linear regression model assumes
that the error term in the regression models is uncorrelated. But what hap-
pens if that is not the case, especially in models involving dummy regres-
sors? Since we will discuss the topic of autocorrelation in depth in the chap-
ter on autocorrelation, we will defer the answer to this question until then.
What Happens if the Dependent Variable Is a Dummy Variable?
So far we have considered models in which the regressand is quantitative
and the regressors are quantitative or qualitative or both. But there are
occasions where the regressand can also be qualitative or dummy. Consider,
for example, the decision of a worker to participate in the labor force. The
decision to participate is of the yes or no type, yes if the person decides to
participate and no otherwise. Thus, the labor force participation variable is
a dummy variable. Of course, the decision to participate in the labor force
depends on several factors, such as the starting wage rate, education, and
conditions in the labor market (as measured by the unemployment rate).
Can we still use OLS to estimate regression models where the regressand
is dummy? Yes, mechanically, we can do so. But there are several statistical
problems that one faces in such models. And since there are alternatives
to OLS estimation that do not face these problems, we will discuss this topic
in a later chapter (see Chapter 15 on logit and probit models). In that chap-
ter we will also discuss models in which the regressand has more than two
categories; for example, the decision to travel to work by car, bus, or train,
or the decision to work part-time, full time, or not work at all. Such models
are called polytomous dependent variable models in contrast to dichoto-
mous dependent variable models in which the dependent variable has
only two categories.
9.11 TOPICS FOR FURTHER STUDY
Several topics related to dummy variables are discussed in the literature that
are rather advanced, including (1) random, or varying, parameters mod-
els, (2) switching regression models, and (3) disequilibrium models.
In the regression models considered in this text it is assumed that the
parameters, the β ’s, are unknown but fixed entities. The random coefficient
models—and there are several versions of them—assume the β ’s can be ran-
dom too. A major reference work in this area is by Swamy.21
In the dummy variable model using both differential intercepts and
slopes, it is implicitly assumed that we know the point of break. Thus, in
our savings–income example for 1970–1995, we divided the period into
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. Dummy Variable
Regression Models
© The McGraw−Hill
Companies, 2004
CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 323
22S.Goldfeld andR.Quandt,NonlinearMethods inEconometrics,NorthHolland, Amsterdam,
1972.
23Richard E. Quandt, The Econometrics of Disequilibrium, Basil Blackwell, New York, 1988.
1970–1981 and 1982–1995, the pre- and postrecession periods, under the
belief that the recession in 1982 changed the relation between savings and
income. Sometimes it is not easy to pinpoint when the break took place. The
technique of switching regression models (SRM) is developed for such
situations. SRM treats the breakpoint as a random variable and through an
iterative process determines when the break might have actually taken
place. The seminal work in this area is by Goldfeld and Quandt.22
Special estimation techniques are required to deal with what are known
as disequilibrium situations, that is, situations where markets do not
clear (i.e., demand is not equal to supply). The classic example is that of
demand for and supply of a commodity. The demand for a commodity is a
function of its price and other variables, and the supply of the commodity
is a function of its price and other variables, some of which are different
from those entering the demand function. Now the quantity actually bought
and sold of the commodity may not necessarily be equal to the one obtained
by equating the demand to supply, thus leading to disequilibrium. For a
thorough discussion of disequilibrium models, the reader may refer to
Quandt.23
9.12 SUMMARY AND CONCLUSIONS
1. Dummy variables, taking values of 1 and zero (or their linear trans-
forms), are a means of introducing qualitative regressors in regression
models.
2. Dummy variables are a data-classifying device in that they divide a
sample into various subgroups based on qualities or attributes (gender,
marital status, race, religion, etc. ) and implicitly allow one to run individual
regressions for each subgroup. If there are differences in the response of the
regressand to the variation in the qualitative variables in the various sub-
groups, they will be reflected in the differences in the intercepts or slope
coefficients, or both, of the various subgroup regressions.
3. Although a versatile tool, the dummy variable technique needs to be
handled carefully. First, if the regression contains a constant term, the num-
ber of dummy variables must be one less than the number of classifications
of each qualitative variable. Second, the coefficient attached to the dummy
variables must always be interpreted in relation to the base, or reference,
group—that is, the group that receives the value of zero. The base chosen
will depend on the purpose of research at hand. Finally, if a model has sev-
eral qualitative variables with several classes, introduction of dummy vari-
ables can consume a large number of degrees of freedom. Therefore, one
should always weigh the number of dummy variables to be introduced
against the total number of observations available for analysis.
Gujarati: Basic
Econometrics, Fourth
Edition
I. Single−Equation
Regression Models
9. DummyVariable
Regression Models
© The McGraw−Hill
Companies, 2004
324 PART ONE: SINGLE-EQUATION REGRESSION MODELS
*Jane Leuthold, “The Effect of Taxation on the Hours Worked by Married Women,” Indus-
trial and Labor Relations Review, no. 4, July 1978, pp. 520–526 (notation changed to suit our
format).
4. Among its various applications, this chapter considered but a few.
These included (1) comparing two (or more) regressions, (2) deseasonaliz-
ing time series data, (3) interactive dummies, (4) interpretation of dummies
in semilog models, and (4) piecewise linear regression models.
5. We also sounded cautionary notes in the use of dummy variables in
situations of heteroscedasticity and autocorrelation. But since we will cover
these topics fully in subsequent chapters, we will revisit these topics then.
EXERCISES
Questions
9.1. If you have monthly data over a number of years, how many dummy vari-
ables will you introduce to test the following hypotheses:
a. All the 12 months of the year exhibit seasonal patterns.
b. Only February, April, June, August, October, and December exhibit
seasonal patterns.
9.2. Consider the following regression results (t ratios are in parentheses)*:
Ŷi = 1286 + 104.97X2i − 0.026X3i + 1.20X4i + 0.69X5i
t = (4.67) (3.70) (−3.80) (0.24) (0.08)
−19.47X6i + 266.06X7i − 118.64X8i − 110.61X9i
(−0.40) (6.94) (−3.04) (−6.14)
R2 = 0.383 n = 1543
where Y = wife’s annual desired hours of work, calculated as usual hours
of work per year plus weeks looking for work
X2 = after-tax real average hourly earnings of wife
X3 = husband’s previous year after-tax real annual earnings
X4 = wife’s age in years
X5 = years of schooling completed by wife
X6 = attitude variable, 1 = if respondent felt that it was all right for
a woman to work if she desired and her husband agrees,
0 = otherwise
X7 = attitude variable, 1 = if the respondent’s husband favored his
wife’s working, 0 = otherwise
X8 = number of children less than 6 years of age
X9 = number of children in age groups 6 to 13
a. Do the signs of the coefficients of the various nondummy regressors
make economic sense? Justify your answer.
b. How would you interpret the dummy variables, X6 and X7? Are these
dummies statistically significant? Since the sample is quite large, you
may use the “2-t” rule of thumb to answer the question.