Baixe o app para aproveitar ainda mais
Prévia do material em texto
Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 312 PART ONE: SINGLE-EQUATION REGRESSION MODELS 12A time series may contain four components: a seasonal, a cyclical, a trend, and one that is strictly random. 13For the various methods of seasonal adjustment, see, for instance, Francis X. Diebod, Elements of Forecasting, 2d ed., South-Western Publishers, 2001, Chap. 5. The preceding example clearly reveals the role of interaction dummies when two or more qualitative regressors are included in the model. It is important to note that in the model (9.6.5) we are assuming that the rate of increase of hourly earnings with respect to education (of about 80 cents per additional year of schooling) remains constant across gender and race. But this may not be the case. If you want to test for this, you will have to intro- duce differential slope coefficients (see exercise 9.25) 9.7 THE USE OF DUMMY VARIABLES IN SEASONAL ANALYSIS Many economic time series based on monthly or quarterly data exhibit seasonal patterns (regular oscillatory movements). Examples are sales of department stores at Christmas and other major holiday times, demand for money (or cash balances) by households at holiday times, demand for ice cream and soft drinks during summer, prices of crops right after harvesting season, demand for air travel, etc. Often it is desirable to remove the sea- sonal factor, or component, from a time series so that one can concentrate on the other components, such as the trend.12 The process of removing the seasonal component from a time series is known as deseasonalization or seasonal adjustment, and the time series thus obtained is called the deseasonalized, or seasonally adjusted, time series. Important economic time series, such as the unemployment rate, the consumer price index (CPI), the producer’s price index (PPI), and the index of industrial production, are usually published in seasonally adjusted form. There are several methods of deseasonalizing a time series, but we will consider only one of these methods, namely, the method of dummy vari- ables.13 To illustrate how the dummy variables can be used to deseasonalize economic time series, consider the data given in Table 9.3. This table gives quarterly data for the years 1978–1995 on the sale of four major appliances, dishwashers, garbage disposers, refrigerators, and washing machines, all data in thousands of units. The table also gives data on durable goods expen- diture in 1982 billions of dollars. To illustrate the dummy technique, we will consider only the sales of re- frigerators over the sample period. But first let us look at the data, which is shown in Figure 9.4. This figure suggests that perhaps there is a seasonal pattern in the data associated with the various quarters. To see if this is the case, consider the following model: Yt = α1D1t + α2D2t + α3tD3t + α4D4t + ut (9.7.1) where Yt = sales of refrigerators (in thousands) and the D’s are the dum- mies, taking a value of 1 in the relevant quarter and 0 otherwise. Note that Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 313 Note: DISH = dishwashers; DISP = garbage disposers; FRIG = refrigerators; WASH = dishwashers; DUR = durable goods expenditure, billions of 1992 dollars. Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues). 480 706 943 1036 247.7 530 582 1175 1019 249.1 557 659 1269 1047 251.8 602 837 973 918 262 658 867 1102 1137 263.3 749 860 1344 1167 280 827 918 1641 1230 288.5 858 1017 1225 1081 300.5 808 1063 1429 1326 312.6 840 955 1699 1228 322.5 893 973 1749 1297 324.3 950 1096 1117 1198 333.1 838 1086 1242 1292 344.8 884 990 1684 1342 350.3 905 1028 1764 1323 369.1 909 1003 1328 1274 356.4 to avoid the dummy variable trap, we are assigning a dummy to each quarter of the year, but omitting the intercept term. If there is any seasonal effect in a given quarter, that will be indicated by a statistically significant t value of the dummy coefficient for that quarter.14 78 800 79 80 81 82 83 84 85 86 1000 1200 1400 1600 1800 T h o u sa n d s o f u n it s Year FIGURE 9.4 Sales of refrigerators 1978–1985 (quarterly). 14Note a technical point. This method of assigning a dummy to each quarter assumes that the seasonal factor, if present, is deterministic and not stochastic. We will revisit this topic when we discuss time series econometrics in Part V of this book. TABLE 9.3 QUARTERLY DATA ON APPLIANCE SALES (IN THOUSANDS) AND EXPENDITURE ON DURABLE GOODS (1978-I TO 1985-IV) DISH DISP FRIG WASH DUR DISH DISP FRIG WASH DUR 841 798 1317 1271 252.6 957 837 1615 1295 272.4 999 821 1662 1313 270.9 960 858 1295 1150 273.9 894 837 1271 1289 268.9 851 838 1555 1245 262.9 863 832 1639 1270 270.9 878 818 1238 1103 263.4 792 868 1277 1273 260.6 589 623 1258 1031 231.9 657 662 1417 1143 242.7 699 822 1185 1101 248.6 675 871 1196 1181 258.7 652 791 1410 1116 248.4 628 759 1417 1190 255.5 529 734 919 1125 240.4 Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 314 PART ONE: SINGLE-EQUATION REGRESSION MODELS Notice that in (9.7.1) we are regressing Y effectively on an intercept, ex- cept that we allow for a different intercept in each season (i.e., quarter). As a result, the dummy coefficient of each quarter will give us the mean refrig- erator sales in each quarter or season (why?). EXAMPLE 9.6 SEASONALITY IN REFRIGERATOR SALES From the data on refrigerator sales given in Table 9.3, we obtain the following regression results: Ŷt = 1222.125D1t + 1467.500D2t + 1569.750D3t + 1160.000D4t t = (20.3720) (24.4622) (26.1666) (19.3364) (9.7.2) R2 = 0.5317 Note:We have not given the standard errors of the estimated coefficients, as each standard error is equal to 59.9904, because all the dummies take only a value of 1 or zero. The estimated α coefficients in (9.7.2) represent the average, or mean, sales of refriger- ators (in thousands of units) in each season (i.e., quarter). Thus, the average sale of refrig- erators in the first quarter, in thousands of units, is about 1222, that in the second quarter about 1468, that in the third quarter about 1570, and that in the fourth quarter about 1160. (Continued) TABLE 9.4 U.S. REFRIGERATOR SALES (THOUSANDS),1978–1995 (QUARTERLY) FRIG DUR D2 D3 D4 FRIG DUR D2 D3 D4 1317 252.6 0 0 0 943 247.7 0 0 0 1615 272.4 1 0 0 1175 249.1 1 0 0 1662 270.9 0 1 0 1269 251.8 0 1 0 1295 273.9 0 0 1 973 262.0 0 0 1 1271 268.9 0 0 0 1102 263.3 0 0 0 1555 262.9 1 0 0 1344 280.0 1 0 0 1639 270.9 0 1 0 1641 288.5 0 1 0 1238 263.4 0 0 1 1225 300.5 0 0 1 1277 260.6 0 0 0 1429 312.6 0 0 0 1258 231.9 1 0 0 1699 322.5 1 0 0 1417 242.7 0 1 0 1749 324.3 0 1 0 1185 248.6 0 0 1 1117 333.1 0 0 1 1196 258.7 0 0 0 1242 344.8 0 0 0 1410 248.4 1 0 0 1684 350.3 1 0 0 1417 255.5 0 1 0 1764 369.1 0 1 0 919 240.4 0 0 1 1328 356.4 0 0 1 Note: FRIG = refrigerator sales, thousands DUR = durable goods expenditure, billions of 1992 dollars D2 = 1 in the second quarter, 0 otherwise D3 = 1 in the third quarter, 0 otherwise D4 = 1 in the fourth quarter, 0 otherwise Source: Business Statistics and Survey of Current Business, Department of Commerce (various issues). Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 315 Incidentally, instead of assigning a dummy for each quarter and suppressing the intercept term to avoid the dummy variable trap, we could assign only three dummies and include the intercept term.Suppose we treat the first quarter as the reference quarter and assign dum- mies to the second, third, and fourth quarters. This produces the following regression results (see Table 9.4 for the data setup): Ŷt = 1222.1250 + 245.3750D2t + 347.6250D3t − 62.1250D4t t = (20.3720)* (2.8922)* (4.0974)* (−0.7322)** (9.7.3) R2 = 0.5318 where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent. Since we are treating the first quarter as the benchmark, the coefficients attached to the various dummies are now differential intercepts, showing by how much the average value of Y in the quarter that receives a dummy value of 1 differs from that of the benchmark quarter. Put differently, the coefficients on the seasonal dummies will give the seasonal increase or decrease in the average value of Y relative to the base season. If you add the various differ- ential intercept values to the benchmark average value of 1222.125, you will get the average value for the various quarters. Doing so, you will reproduce exactly Eq. (9.7.2), except for the rounding errors. But now you will see the value of treating one quarter as the benchmark quarter, for (9.7.3) shows that the average value of Y for the fourth quarter is not statistically different from the average value for the first quarter, as the dummy coefficient for the fourth quarter is not statistically significant. Of course, your answer will change, depending on which quarter you treat as the benchmark quarter, but the overall conclusion will not change. How do we obtain the deseasonalized time series of refrigerator sales? This can be done easily. You estimate the values of Y from model (9.7.2) [or (9.7.3)] for each observation and subtract them from the actual values of Y, that is, you obtain (Yt −Ŷt ) which are simply the residuals from the regression (9.7.2). We show them in Table 9.5.15 What do these residuals represent? They represent the remaining components of the re- frigerator time series, namely, the trend, cycle, and random components (but see the caution given in footnote 15). Since models (9.7.2) and (9.7.3) do not contain any covariates, will the picture change if we bring in a quantitative regressor in the model? Since expenditure on durable goods has an important factor influence on the demand for refrigerators, let us expand our model (9.7.3) by bringing in this variable. The data for durable goods expenditure in billions of 1982 dollars are already given in Table 9.3. This is our (quantitative) X variable in the model. The regres- sion results are as follows Ŷt = 456.2440 + 242.4976D2t + 325.2643D3t − 86.0804D4t + 2.7734Xt t = (2.5593)* (3.6951)* (4.9421)* (−1.3073)** (4.4496)* (9.7.4) R2 = 0.7298 where * indicates p values less than 5 percent and ** indicates p values greater than 5 percent. 15Of course, this assumes that the dummy variables technique is an appropriate method of deseasonalizing a time series and that a time series (TS) can be represented as: TS = s + c + t + u, where s represents the seasonal, t the trend, c the cyclical, and u the random component. However, if the time series is of the form, TS = (s)(c)(t)(u), where the four components enter multiplicatively, the preceding method of deseasonalization is inappropriate, for that method assumes that the four components of a time series are additive. But we will have more to say about this topic in the chapters on time series econometrics. (Continued) EXAMPLE 9.6 (Continued) Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 316 PART ONE: SINGLE-EQUATION REGRESSION MODELS TABLE 9.5 REFRIGERATOR SALES REGRESSION: ACTUAL, FITTED, AND RESIDUAL VALUES (EQ. 9.7.3) Residual graph Actual Fitted Residuals 0 1978-I 1317 1222.12 94.875 . . 1978-II 1615 1467.50 147.500 . . 1978-III 1662 1569.75 92.250 . . 1978-IV 1295 1160.00 135.000 . . 1979-I 1271 1222.12 48.875 . . 1979-II 1555 1467.50 87.500 . . 1979-III 1639 1569.75 69.250 . . 1979-IV 1238 1160.00 78.000 . . 1980-I 1277 1222.12 54.875 . . 1980-II 1258 1467.50 −209.500 . . 1980-III 1417 1569.75 −152.750 . . 1980-IV 1185 1160.00 25.000 . . 1981-I 1196 1222.12 −26.125 . . 1981-II 1410 1467.50 −57.500 . . 1981-III 1417 1569.75 −152.750 . . 1981-IV 919 1160.00 −241.000 . . 1982-I 943 1222.12 −279.125 . . 1982-II 1175 1467.50 −292.500 . . 1982-III 1269 1569.75 −300.750 . . 1982-IV 973 1160.00 −187.000 . . 1983-I 1102 1222.12 −120.125 . . 1983-II 1344 1467.50 −123.500 . . 1983-III 1641 1569.75 71.250 . . 1983-IV 1225 1160.00 65.000 . . 1984-I 1429 1222.12 206.875 . . 1984-II 1699 1467.50 231.500 . . 1984-III 1749 1569.75 179.250 . . 1984-IV 1117 1160.00 −43.000 . . 1985-I 1242 1222.12 19.875 . . 1985-II 1684 1467.50 216.500 . . 1985-III 1764 1569.75 194.250 . . 1985-IV 1328 1160.00 168.000 . − 0 + Again, keep in mind that we are treating the first quarter as our base. As in (9.7.3), we see that the differential intercept coefficients for the second and third quarters are statistically dif- ferent from that of the first quarter, but the intercepts of the fourth quarter and the first quar- ter are statistically about the same. The coefficient of X (durable goods expenditure) of about 2.77 tells us that, allowing for seasonal effects, if expenditure on durable goods goes up by a dollar, on average, sales of refrigerators go up by about 2.77 units, that is, approximately 3 units; bear in mind that refrigerators are in thousands of units and X is in (1982) billions of dollars. * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * EXAMPLE 9.6 (Continued) (Continued) Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 317 16For proof, see Adrian C. Darnell, A Dictionary of Econometrics, Edward Elgar, Lyme, U.K., 1995, pp. 150–152. An interesting question here is: Just as sales of refrigerators exhibit seasonal patterns, would not expenditure on durable goods also exhibit seasonal patterns? How then do we take into account seasonality in X? The interesting thing about (9.7.4) is that the dummy variables in that model not only remove the seasonality in Y but also the seasonality, if any, in X. (This follows from a well-known theorem in statistics, known as the Frisch–Waugh theorem.16) So to speak, we kill (deseasonalize) two birds (two series) with one stone (the dummy technique). If you want an informal proof of the preceding statement, just follow these steps: (1) Run the regression of Y on the dummies as in (9.7.2) or (9.7.3) and save the residuals, say, S1; these residuals represent deseasonalized Y. (2) Run a similar regression for X and obtain the residuals from this regression, say, S2; these residuals represent deseasonalized X. (3) Regress S1 on S2. You will find that the slope coefficient in this regression is precisely the coefficient of X in the regression (9.7.4). 9.8 PIECEWISE LINEAR REGRESSION To illustrate yet another use of dummy variables, consider Figure 9.5, which shows how a hypothetical company remunerates its sales representatives. It pays commissions based on sales in such a manner that up to a certain level, the target, or threshold, level X*, there is one (stochastic) commission structure and beyond that level another. (Note: Besides sales, other factors affect sales commission. Assume that these other factors are represented EXAMPLE 9.6 (Continued) I II S a le s c o m m is si o n X* Y X (sales) FIGURE 9.5 Hypothetical relationship between sales commission and sales volume. (Note: The intercept on the Y axis denotes minimum guaranteed commission.) Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy VariableRegression Models © The McGraw−Hill Companies, 2004 318 PART ONE: SINGLE-EQUATION REGRESSION MODELS by the stochastic disturbance term.) More specifically, it is assumed that sales commission increases linearly with sales until the threshold level X*, after which also it increases linearly with sales but at a much steeper rate. Thus, we have a piecewise linear regression consisting of two linear pieces or segments, which are labeled I and II in Figure 9.5, and the com- mission function changes its slope at the threshold value. Given the data on commission, sales, and the value of the threshold level X*, the technique of dummy variables can be used to estimate the (differing) slopes of the two segments of the piecewise linear regression shown in Figure 9.5. We pro- ceed as follows: Yi = α1 + β1Xi + β2(Xi − X *)Di + ui (9.8.1) where Yi = sales commission Xi = volume of sales generated by the sales person X* = threshold value of sales also known as a knot (known in advance)17 D = 1 if Xi > X * = 0 if Xi < X * Assuming E(ui) = 0, we see at once that E(Yi | Di = 0, Xi , X *) = α1 + β1Xi (9.8.2) which gives the mean sales commission up to the target level X* and E(Yi | Di = 1, Xi , X *) = α1 − β2X * + (β1 + β2)Xi (9.8.3) which gives the mean sales commission beyond the target level X*. Thus, β1 gives the slope of the regression line in segment I, and β1 + β2 gives the slope of the regression line in segment II of the piecewise linear regression shown in Figure 9.5. A test of the hypothesis that there is no break in the regression at the threshold value X* can be conducted easily by noting the statistical significance of the estimated differential slope coeffi- cient β̂2 (see Figure 9.6). Incidentally, the piecewise linear regression we have just discussed is an example of a more general class of functions known as spline functions.18 17The threshold value may not always be apparent, however. An ad hoc approach is to plot the dependent variable against the explanatory variable(s) and observe if there seems to be a sharp change in the relation after a given value of X (i.e., X*). An analytical approach to finding the break point can be found in the so-called switching regression models. But this is an advanced topic and a textbook discussion may be found in Thomas Fomby, R. Carter Hill, and Stanley Johnson, Advanced Econometric Methods, Springer-Verlag, New York, 1984, Chap. 14. 18For an accessible discussion on splines (i.e., piecewise polynomials of order k), see Douglas C. Montgomery, Elizabeth A. Peck, and G. Geoffrey Vining, Introduction to Linear Regression Analysis, John Wiley & Sons, 3d ed., New York, 2001, pp. 228–230. Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 319 Y a1 – b2X *α β a1α S a le s co m m is si o n X* 1 b1β 1 b1+ b2β β X (sales) FIGURE 9.6 Parameters of the piecewise linear regression. EXAMPLE 9.7 TOTAL COST IN RELATION TO OUTPUT As an example of the application of the piecewise linear regression, consider the hypothetical total cost–total output data given in Table 9.6. We are told that the total cost may change its slope at the output level of 5500 units. Letting Y in (9.8.4) represent total cost and X total output, we obtain the following results: Ŷi = −145.72 + 0.2791Xi + 0.0945(Xi − X *i )Di t = (−0.8245) (6.0669) (1.1447) (9.8.4) R2 = 0.9737 X* = 5500 As these results show, the marginal cost of production is about 28 cents per unit and although it is about 37 cents (28 + 9) for output over 5500 units, the differ- ence between the two is not statistically significant be- cause the dummy variable is not significant at, say, the TABLE 9.6 HYPOTHETICAL DATA ON OUTPUT AND TOTAL COST Total cost, dollars Output, units 256 1,000 414 2,000 634 3,000 778 4,000 1,003 5,000 1,839 6,000 2,081 7,000 2,423 8,000 2,734 9,000 2,914 10,000 5 percent level. For all practical purposes, then, one can regress total cost on total output, dropping the dummy variable. Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 320 PART ONE: SINGLE-EQUATION REGRESSION MODELS 9.9 PANEL DATA REGRESSION MODELS Recall that in Chapter 1 we discussed a variety of data that are available for empirical analysis, such as cross-section, time series, pooled (combination of time series and cross-section data), and panel data. The technique of dummy variable can be easily extended to pooled and panel data. Since the use of panel data is becoming increasingly common in applied work, we will consider this topic in some detail in Chapter 16. 9.10 SOME TECHNICAL ASPECTS OF THE DUMMY VARIABLE TECHNIQUE The Interpretation of Dummy Variables in Semilogarithmic Regressions In Chapter 6 we discussed the log–lin models, where the regressand is loga- rithmic and the regressors are linear. In such a model, the slope coefficients of the regressors give the semielasticity, that is, the percentage change in the regressand for a unit change in the regressor. This is only so if the regressor is quantitative. What happens if a regressor is a dummy variable? To be spe- cific, consider the following model: lnYi = β1 + β2Di + ui (9.10.1) where Y = hourly wage rate ($) and D = 1 for female and 0 for male. How do we interpret such a model? Assuming E(ui) = 0, we obtain: Wage function for male workers: E(lnYi | Di = 0) = β1 (9.10.2) Wage function for female workers: E(lnYi | Di = 1) = β1 + β2 (9.10.3) Therefore, the intercept β1 gives the mean log hourly earnings and the “slope” coefficient gives the difference in the mean log hourly earnings of male and females. This is a rather awkward way of stating things. But if we take the antilog of β1, what we obtain is not the mean hourly wages of male workers, but their median wages. As you know, mean, median, and mode are the three measures of central tendency of a random variable. And if we take the antilog of (β1 + β2), we obtain the median hourly wages of female workers. Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 321 Dummy Variables and Heteroscedasticity Let us revisit our savings–income regression for the United States for the periods 1970–1981 and 1982–1995 and for the entire period 1970–1995. In testing for structural stability using the dummy technique, we assumed that the error var (u1i) = var (u2i) = σ 2, that is, the error variances in the two periods were the same. This was also the assumption underlying the Chow test. If this assumption is not valid—that is, the error variances in the two subperiods are different—it is quite possible to draw misleading conclu- sions. Therefore, one must first check on the equality of variances in the subperiod, using suitable statistical techniques. Although we will discuss this topic more thoroughly in the chapter on heteroscedasticity, in Chapter 8 we showed how the F test can be used for this purpose.20 (See our discussion of the Chow test in that chapter.) As we showed there, it seems the error variances in the two periods are not the same. Hence, the results of both the Chow test and the dummy variable technique presented before may not be entirely reliable. Of course, our purpose here is to illustrate the various techniques that one can use to handle a problem (e.g., the problem of structural stability). In any particular application, these techniques may not be valid. But that is par for most statistical techniques. Of course, one can take appropriate remedial actions to resolve the problem, as we will do in the chapter on heteroscedasticity later (however, see exercise 9.28). 19Robert Halvorsen and RaymondPalmquist, “The Interpretation of Dummy Variables in Semilogarithmic Equations,” American Economic Review, vol. 70, no. 3, pp. 474–475. 20The Chow test procedure can be performed even in the presence of heteroscedasticity, but then one will have to use the Wald test. The mathematics involved behind the test is somewhat involved. But in the chapter on heteroscedasticity, we will revisit this topic. we obtain 6.8796 ($), which is themedian hourly earnings of female workers. Thus, the female workers’ median hourly earnings is lower by about 21.94 percent com- pared to their male counterparts [(8.8136 − 6.8796)/ 8.8136]. Interestingly, we can obtain semielasticity for a dummy regressor directly by the device suggested by Halvorsen and Palmquist.19 Take the antilog (to base e) of the estimated dummy coefficient and subtract 1 from it and multiply the difference by 100. (For the underlying logic, see Appendix 9.A.1.) Therefore, if you take the antilog of−0.2437, you will obtain 0.78366. Subtracting 1 from this gives −0.2163, after multiplying this by 100, we get −21.63 percent, suggesting that a female worker’s (D = 1) median salary is lower than that of her male counterpart by about 21.63 percent, the same as we obtained previously, save the rounding errors. EXAMPLE 9.8 LOGARITHM OF HOURLY WAGES IN RELATION TO GENDER To illustrate (9.10.1), we use the data that underlie Example 9.2. The regression results based on 528 ob- servations are as follows: ̂lnYi = 2.1763 − 0.2437Di t = (72.2943)* (−5.5048)* (9.10.4) R2 = 0.0544 where * indicates p values are practically zero. Taking the antilog of 2.1763, we find 8.8136 ($), which is the median hourly earnings of male workers, and taking the antilog of [(2.1763 − 0.2437) = 1.92857], Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 322 PART ONE: SINGLE-EQUATION REGRESSION MODELS 21P. A.V.B. Swamy, Statistical Inference in Random Coefficient Regression Models, Springer- Verlag, Berlin, 1971. Dummy Variables and Autocorrelation Besides homoscedasticity, the classical linear regression model assumes that the error term in the regression models is uncorrelated. But what hap- pens if that is not the case, especially in models involving dummy regres- sors? Since we will discuss the topic of autocorrelation in depth in the chap- ter on autocorrelation, we will defer the answer to this question until then. What Happens if the Dependent Variable Is a Dummy Variable? So far we have considered models in which the regressand is quantitative and the regressors are quantitative or qualitative or both. But there are occasions where the regressand can also be qualitative or dummy. Consider, for example, the decision of a worker to participate in the labor force. The decision to participate is of the yes or no type, yes if the person decides to participate and no otherwise. Thus, the labor force participation variable is a dummy variable. Of course, the decision to participate in the labor force depends on several factors, such as the starting wage rate, education, and conditions in the labor market (as measured by the unemployment rate). Can we still use OLS to estimate regression models where the regressand is dummy? Yes, mechanically, we can do so. But there are several statistical problems that one faces in such models. And since there are alternatives to OLS estimation that do not face these problems, we will discuss this topic in a later chapter (see Chapter 15 on logit and probit models). In that chap- ter we will also discuss models in which the regressand has more than two categories; for example, the decision to travel to work by car, bus, or train, or the decision to work part-time, full time, or not work at all. Such models are called polytomous dependent variable models in contrast to dichoto- mous dependent variable models in which the dependent variable has only two categories. 9.11 TOPICS FOR FURTHER STUDY Several topics related to dummy variables are discussed in the literature that are rather advanced, including (1) random, or varying, parameters mod- els, (2) switching regression models, and (3) disequilibrium models. In the regression models considered in this text it is assumed that the parameters, the β ’s, are unknown but fixed entities. The random coefficient models—and there are several versions of them—assume the β ’s can be ran- dom too. A major reference work in this area is by Swamy.21 In the dummy variable model using both differential intercepts and slopes, it is implicitly assumed that we know the point of break. Thus, in our savings–income example for 1970–1995, we divided the period into Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. Dummy Variable Regression Models © The McGraw−Hill Companies, 2004 CHAPTER NINE: DUMMY VARIABLE REGRESSION MODELS 323 22S.Goldfeld andR.Quandt,NonlinearMethods inEconometrics,NorthHolland, Amsterdam, 1972. 23Richard E. Quandt, The Econometrics of Disequilibrium, Basil Blackwell, New York, 1988. 1970–1981 and 1982–1995, the pre- and postrecession periods, under the belief that the recession in 1982 changed the relation between savings and income. Sometimes it is not easy to pinpoint when the break took place. The technique of switching regression models (SRM) is developed for such situations. SRM treats the breakpoint as a random variable and through an iterative process determines when the break might have actually taken place. The seminal work in this area is by Goldfeld and Quandt.22 Special estimation techniques are required to deal with what are known as disequilibrium situations, that is, situations where markets do not clear (i.e., demand is not equal to supply). The classic example is that of demand for and supply of a commodity. The demand for a commodity is a function of its price and other variables, and the supply of the commodity is a function of its price and other variables, some of which are different from those entering the demand function. Now the quantity actually bought and sold of the commodity may not necessarily be equal to the one obtained by equating the demand to supply, thus leading to disequilibrium. For a thorough discussion of disequilibrium models, the reader may refer to Quandt.23 9.12 SUMMARY AND CONCLUSIONS 1. Dummy variables, taking values of 1 and zero (or their linear trans- forms), are a means of introducing qualitative regressors in regression models. 2. Dummy variables are a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes (gender, marital status, race, religion, etc. ) and implicitly allow one to run individual regressions for each subgroup. If there are differences in the response of the regressand to the variation in the qualitative variables in the various sub- groups, they will be reflected in the differences in the intercepts or slope coefficients, or both, of the various subgroup regressions. 3. Although a versatile tool, the dummy variable technique needs to be handled carefully. First, if the regression contains a constant term, the num- ber of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficient attached to the dummy variables must always be interpreted in relation to the base, or reference, group—that is, the group that receives the value of zero. The base chosen will depend on the purpose of research at hand. Finally, if a model has sev- eral qualitative variables with several classes, introduction of dummy vari- ables can consume a large number of degrees of freedom. Therefore, one should always weigh the number of dummy variables to be introduced against the total number of observations available for analysis. Gujarati: Basic Econometrics, Fourth Edition I. Single−Equation Regression Models 9. DummyVariable Regression Models © The McGraw−Hill Companies, 2004 324 PART ONE: SINGLE-EQUATION REGRESSION MODELS *Jane Leuthold, “The Effect of Taxation on the Hours Worked by Married Women,” Indus- trial and Labor Relations Review, no. 4, July 1978, pp. 520–526 (notation changed to suit our format). 4. Among its various applications, this chapter considered but a few. These included (1) comparing two (or more) regressions, (2) deseasonaliz- ing time series data, (3) interactive dummies, (4) interpretation of dummies in semilog models, and (4) piecewise linear regression models. 5. We also sounded cautionary notes in the use of dummy variables in situations of heteroscedasticity and autocorrelation. But since we will cover these topics fully in subsequent chapters, we will revisit these topics then. EXERCISES Questions 9.1. If you have monthly data over a number of years, how many dummy vari- ables will you introduce to test the following hypotheses: a. All the 12 months of the year exhibit seasonal patterns. b. Only February, April, June, August, October, and December exhibit seasonal patterns. 9.2. Consider the following regression results (t ratios are in parentheses)*: Ŷi = 1286 + 104.97X2i − 0.026X3i + 1.20X4i + 0.69X5i t = (4.67) (3.70) (−3.80) (0.24) (0.08) −19.47X6i + 266.06X7i − 118.64X8i − 110.61X9i (−0.40) (6.94) (−3.04) (−6.14) R2 = 0.383 n = 1543 where Y = wife’s annual desired hours of work, calculated as usual hours of work per year plus weeks looking for work X2 = after-tax real average hourly earnings of wife X3 = husband’s previous year after-tax real annual earnings X4 = wife’s age in years X5 = years of schooling completed by wife X6 = attitude variable, 1 = if respondent felt that it was all right for a woman to work if she desired and her husband agrees, 0 = otherwise X7 = attitude variable, 1 = if the respondent’s husband favored his wife’s working, 0 = otherwise X8 = number of children less than 6 years of age X9 = number of children in age groups 6 to 13 a. Do the signs of the coefficients of the various nondummy regressors make economic sense? Justify your answer. b. How would you interpret the dummy variables, X6 and X7? Are these dummies statistically significant? Since the sample is quite large, you may use the “2-t” rule of thumb to answer the question.
Compartilhar