Problem Set 6 Answers
Economics 3125, Fall 2013
Claire S.H. Lim

1. The median starting salary for new law school graduates is determined by

$$\log(\mathit{salary}) = \beta_0 + \beta_1 \mathit{LSAT} + \beta_2 \mathit{GPA} + \beta_3 \log(\mathit{libvol}) + \beta_4 \log(\mathit{cost}) + \beta_5 \mathit{rank} + u$$

where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).

(a) Explain why we expect β5 ≤ 0.

Answer. A larger value of rank is associated with lower perceived law school quality. We would expect graduates of lower-ranked schools to have lower earnings.

(b) What signs do you expect for the other slope parameters? Justify your answers.

Answer. We expect that each of the other slope coefficients, β1 through β4, will be positive. Higher LSAT scores and college GPAs indicate higher student quality. The other two variables, the number of volumes in the law library and the cost of attendance, are indicators of the quality of the law school. For example, schools that hire better law faculty will have a higher cost of attendance.

(c) Using the data in LAWSCH85.DTA, the estimated equation is

$$\widehat{\log(\mathit{salary})} = 8.34 + 0.0047\,\mathit{LSAT} + 0.248\,\mathit{GPA} + 0.095 \log(\mathit{libvol}) + 0.038 \log(\mathit{cost}) - 0.0033\,\mathit{rank}$$
$$n = 136, \quad R^2 = 0.842$$

What is the predicted ceteris paribus difference in salary for schools with a median GPA different by one point? (Report your answer as a percentage.)

Answer. The estimate β̂2 says that a one-point increase in median GPA, holding other explanatory variables constant, is associated with a 0.248 proportional increase in predicted salary, which is approximately a 24.8% increase in predicted salary.

(d) Interpret the coefficient on the variable log(libvol).

Answer. This is an elasticity estimate: for a one percent increase in the number of volumes in the law library, holding other explanatory variables constant, predicted salary increases by 0.095%.
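The 24.8% figure in part (c) uses the usual approximation that 100 times a coefficient in a log-level model gives the percentage change. The exact percentage change implied by a coefficient b in such a model is 100·[exp(b) − 1]. A minimal sketch, assuming only the rounded coefficient 0.248 reported in the estimated equation, compares the two with Stata's display command:

```stata
* Approximate percentage change: 100 times the GPA coefficient
display 100*0.248

* Exact percentage change implied by the log-level form
display 100*(exp(0.248) - 1)
```

The exact figure is somewhat larger (roughly 28%). The approximation is closer for small coefficients, such as the one on rank.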
[Note that we had to multiply the estimate by 100 to express a percentage change in part (c), but did not have to adjust the estimate to report a percentage change in part (d).]

(e) Would you say it is better to attend a higher ranked law school? How much is a difference in ranking of 20 worth in terms of predicted starting salary?

Answer. It does appear that students from better-ranked schools earn higher starting salaries, even after controlling for some important objective measures of student and school quality. The ceteris paribus effect of moving up 20 places in the ranking is (−20)(100)(−0.0033) = 6.6, interpreted as a 6.6% increase in predicted starting salary.

2. Suppose that you are interested in estimating the ceteris paribus relationship between y and x1. For this purpose, you can collect data on a control variable, x2. (For concreteness, you might think of y as final exam score, x1 as class attendance and x2 as GPA up through the previous semester.) Let β̃1 be the simple regression estimate from y on x1 and let β̂1 be the multiple regression estimate from y on x1 and x2.

(a) If x1 is highly correlated with x2 in the sample, and x2 has a large partial effect on y, would you expect β̃1 and β̂1 to be similar or very different? Explain.

Answer. Recall that β̃1 = β̂1 + β̂2δ̃1, where β̂2 is the estimated coefficient on x2 in the multiple regression and δ̃1 is the estimated coefficient from a simple regression of x2 on x1. The problem tells us that δ̃1 is nonzero and that β̂2 is large, so we would expect β̃1 and β̂1 to be very different.

(b) If x1 is almost uncorrelated with x2, but x2 has a large partial effect on y, will β̃1 and β̂1 tend to be similar or very different? Explain.

Answer. In this case we are told that δ̃1 is nearly zero, so even though β̂2 is large, we would expect β̃1 and β̂1 to be similar.

3. Regression analysis can be used to test whether the market efficiently uses information in valuing stocks.
For concreteness, let return be the total return from holding a firm's stock over the four-year period from the end of 1990 to the end of 1994. The efficient markets hypothesis says that these returns should not be systematically related to information known in 1990. If firm characteristics known at the beginning of the period help to predict stock returns, then we could use this information in choosing stocks. For 1990, let dkr be a firm's debt to capital ratio, let eps denote the earnings per share, let netinc denote net income, and let salary denote total compensation for the CEO.

(a) Using the data in RETURN.DTA, the following equation was estimated:

$$\widehat{\mathit{return}} = \underset{(6.89)}{-14.37} + \underset{(0.201)}{0.321}\,\mathit{dkr} + \underset{(0.078)}{0.043}\,\mathit{eps} - \underset{(0.0047)}{0.0051}\,\mathit{netinc} + \underset{(0.0022)}{0.0035}\,\mathit{salary}$$
$$n = 142, \quad R^2 = 0.0395$$

Test whether the explanatory variables are jointly significant at the 5% level. Is any explanatory variable individually significant?

Answer. For the test of joint significance, the null hypothesis is that all the slope coefficients are zero, so H0: β_dkr = β_eps = β_netinc = β_salary = 0. The alternative hypothesis is that at least one slope coefficient is not zero. In the restricted model, there are no explanatory variables, so $R^2_r = 0$. The test statistic is

$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} = \frac{(0.0395 - 0)/4}{(1 - 0.0395)/(142 - 4 - 1)} = 1.41$$

The 5% critical value for an F distribution with 4 numerator degrees of freedom and 137 denominator degrees of freedom is c = 2.45. (It would also be fine to take c = 2.37.) Since F ≤ c, we cannot reject the null hypothesis.

Each individual significance test is of the null hypothesis H0: β_j = 0 against the two-sided alternative hypothesis HA: β_j ≠ 0. The test statistic is

$$t = \frac{\hat\beta_j - 0}{se(\hat\beta_j)}$$

Because the sample is large, this test statistic is distributed approximately standard normal under the null hypothesis, and the 5% critical value for the test is 1.96.
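The quantities used in these tests can be reproduced from the reported numbers alone. The following is a minimal sketch using Stata's display command and the built-in functions invFtail() and Ftail(). It assumes only the rounded R-squared, coefficients, and standard errors printed in the estimated equation, so the results may differ slightly from values based on unrounded output. The scalar name Fstat is chosen here for illustration.

```stata
* Joint F test, R-squared form (the restricted R-squared is 0)
scalar Fstat = (0.0395/4) / ((1 - 0.0395)/(142 - 4 - 1))
display scalar(Fstat)

* Exact 5% critical value and p-value for an F(4, 137) distribution
display invFtail(4, 137, 0.05)
display Ftail(4, 137, scalar(Fstat))

* Individual t ratios: coefficient divided by its standard error
display 0.321/0.201
display 0.043/0.078
display -0.0051/0.0047
display 0.0035/0.0022
```

Computing the critical value with invFtail() avoids having to interpolate between rows of a printed F table.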
The test statistics are:

dkr: t = 1.60
eps: t = 0.55
netinc: t = −1.09
salary: t = 1.59

None of these magnitudes is greater than 1.96, so in each of the four tests we fail to reject the null hypothesis.

(b) Now, reestimate the model using the log form for netinc and salary:

$$\widehat{\mathit{return}} = \underset{(39.37)}{-36.30} + \underset{(0.203)}{0.327}\,\mathit{dkr} + \underset{(0.080)}{0.069}\,\mathit{eps} - \underset{(3.39)}{4.74}\log(\mathit{netinc}) + \underset{(6.31)}{7.24}\log(\mathit{salary})$$
$$n = 142, \quad R^2 = 0.0330$$

Do any of your conclusions from part (a) change?

Answer. All of the reasoning given in part (a) also applies here. For the joint test, the test statistic is F = 1.17, so the null hypothesis is not rejected. For the individual tests, the test statistics are:

dkr: t = 1.61
eps: t = 0.86
log(netinc): t = −1.40
log(salary): t = 1.15

As in part (a), none of these test statistics has a magnitude larger than 1.96, so in each test we fail to reject the null hypothesis.

(c) Interpret the coefficient on log(salary).

Answer. Holding other explanatory variables constant, a 1% increase in CEO compensation is associated with an increase of 7.24/100 = 0.0724 in the predicted total return from holding the stock.

(d) In this sample, some firms have zero debt and others have negative earnings. Should we try to use log(dkr) or log(eps) in the model to see if these improve the fit? Explain.

Answer. No. Using logs for these variables will drop some observations from the regression, because log(x) is undefined for x ≤ 0. This will be represented as a missing value in Stata.

(e) Overall, is the evidence for predictability of stock returns strong or weak?

Answer. The evidence is very weak. No explanatory variables have significant effects at the 5% level, and the null hypothesis that all the effects are zero cannot be rejected in either model. In addition, the firm characteristics explain less than 4% of the variation in return.

4. A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health.
One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account. For example, higher income generally results in access to better prenatal care, as well as better nutrition for the mother. An equation that recognizes this is

$$\mathit{bwght} = \beta_0 + \beta_1 \mathit{cigs} + \beta_2 \mathit{faminc} + u$$

(a) What is the most likely sign for β2?

Answer. As the problem states, higher family income is generally associated with better prenatal care and better maternal nutrition, so we expect β2 > 0.

(b) Do you think that cigs and faminc are likely to be correlated? Explain why the correlation might be positive or negative.

Answer. If cigarettes are a normal good for individuals, smoking will increase with income at the individual level. But this relationship may break down at the aggregate level. For example, individuals with more education tend to have higher incomes and smoke less. In our sample, the correlation between cigs and faminc is −0.173, a mild negative relationship.

(c) Now, estimate the equation with and without faminc, using the data in bwght.dta. Report the results in equation form, including the sample size and R-squared. Discuss your results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.

Answer. The simple and multiple regression results are:

$$\widehat{\mathit{bwght}} = 119.77 - 0.514\,\mathit{cigs}, \qquad n = 1388, \quad R^2 = 0.023$$
$$\widehat{\mathit{bwght}} = 116.97 - 0.463\,\mathit{cigs} + 0.093\,\mathit{faminc}, \qquad n = 1388, \quad R^2 = 0.030$$

Unsurprisingly, smoking during pregnancy is associated with lower birthweight, while higher family income is associated with higher birthweight. The estimated coefficient on cigs increases (becomes less negative) by a small amount in the multiple regression compared to the simple regression, so the estimated effect of smoking is slightly smaller in magnitude once faminc is included.
Recall the relationship between simple and multiple regression coefficients: since the multiple regression coefficient on faminc is positive and cigs and faminc are negatively correlated, the simple regression coefficient on cigs will be lower than the multiple regression coefficient on cigs.

5. Consider the following model of test score "production", which can be used to study the effects of attending (or skipping) class on a student's final exam score:

$$\mathit{final} = \beta_0 + \beta_1 \mathit{ACT} + \beta_2 \mathit{attend} + u$$

where ACT is the student's ACT score, and attend is the fraction of classes attended over the semester. Estimate the equation using the dataset attend.dta, and report the estimated coefficients, standard errors, sample size and R-squared.

Answer. We have:

$$\widehat{\mathit{final}}_i = \underset{(1.45)}{9.41} + \underset{(0.048)}{0.530}\,\mathit{ACT}_i + \underset{(0.031)}{0.174}\,\mathit{attend}_i$$
$$n = 680, \quad R^2 = 0.17$$

(a) Using the estimates and standard errors, show how to construct the 95% confidence interval for β1.

Answer. We construct the 95% confidence interval for β1 as follows:

$$\hat\beta_1 \pm se(\hat\beta_1) \cdot c_{\alpha/2} = 0.530 \pm 0.048 \cdot 1.96 = [0.436,\ 0.624]$$

(b) Can you reject the hypothesis H0: β2 = 0 against the two-sided alternative at the 5% level? Find the p-value for this test. First construct your t-statistic and conduct the test using the estimates and standard errors, then confirm your answer using the test post-estimation command in Stata. (Note that the test command uses an F-test and not a t-test, but this should give you a similar answer.)

Answer. The p-value for this test is constructed as 2Φ(−|t|). The t-statistic, t, is:

$$t = \frac{\hat\beta_2 - 0}{se(\hat\beta_2)} = \frac{0.174 - 0}{0.031} = 5.61$$

Therefore the p-value is 2Φ(−5.61) ≈ 0. Similarly, the test post-estimation command indicates a p-value of 0.0000. Since the p-value is less than 0.05, we reject the null hypothesis that β2 = 0 at the 5% level.

(c) Can you reject the hypothesis H0: β2 = 0.2 against the two-sided alternative at the 5% level? Find the p-value for this test.
First construct your t-statistic and conduct the test using the estimates and standard errors, then confirm your answer using the test post-estimation command in Stata.

Answer. The t-statistic, t, is:

$$t = \frac{\hat\beta_2 - 0.2}{se(\hat\beta_2)} = \frac{0.174 - 0.2}{0.031} = -0.839$$

Therefore the p-value is 2Φ(−0.839) ≈ 2(0.2005) = 0.4010. The test command indicates a p-value of 0.3947. Since the p-value is greater than 0.05, we fail to reject the null hypothesis that β2 = 0.2 at the 5% level.

(d) Compute β2 by running the bivariate regression of final on $\widetilde{\mathit{attend}}$, where $\widetilde{\mathit{attend}}$ is the residual of the regression of attend on ACT. (Hint: to put residuals into a new variable, type predict newvar, resid after your regression command.)

Answer. We find β̃2 = 0.174, which is identical to β̂2 above.

(e) Using the formulas we covered in class, show that the estimate of β2 for the bivariate regression of final on attend is a function of the multivariate estimates and the coefficient of the auxiliary regression of ACT on attend.

Answer. We'll write the coefficients out to more decimal places to reduce roundoff error. In the bivariate regression of final on attend, we have β̃2 = 0.1209031. In the multivariate regression of final on ACT and attend, we have β̂1 = 0.5299129 and β̂2 = 0.1739337. Finally, in the auxiliary regression of ACT on attend, we have δ̃2 = −0.1000742. Then by the formula covered in class, we have

$$0.1209031 = \tilde\beta_2 = \hat\beta_2 + \hat\beta_1 \cdot \tilde\delta_2 = 0.1739337 + 0.5299129 \cdot (-0.1000742)$$

(f) Now, back to the multivariate model. Give one example of an omitted variable that can cause bias in the estimated coefficient on attend in the multivariate regression. Propose a formula for the bias in terms of the true parameter, the variance of attend, and the covariance of the omitted variable and attend. You can assume that ACT is uncorrelated with attend.

Answer.
An omitted variable that may cause bias in the estimated coefficient on attend in the multivariate regression is the number of other classes a student is enrolled in that semester (let's call it courseload). Let β̃2 be the estimated coefficient on attend in the short regression (that omits courseload). Let β̂2 be the estimated coefficient on attend in the long regression, and let β̂3 be the estimated coefficient on courseload in the long regression. Then a formula for the bias is:

$$\tilde\beta_2 = \hat\beta_2 + \hat\beta_3 \cdot \frac{\widehat{Cov}(\mathit{attend}, \mathit{courseload})}{\widehat{Var}(\mathit{attend})}$$

The second term on the right-hand side is the bias term: it is zero if courseload has no partial effect on final or if courseload is uncorrelated with attend.

Do and Log Files for Problems 4 and 5

DO FILE

*Econ 3125, Applied Econometrics
*Program Name: problem_set6.do

set more off
capture log close

local path "C:/Econ 3125/Fall 2013/Problem Sets/Problem Set 6"
log using "`path'/problem_set6.log", replace

/*Question 4*/

use "`path'\bwght.dta"

/*Part (c)*/
reg bwght cigs
reg bwght faminc

clear all

/*Question 5*/

use "`path'\attend.dta"

reg final ACT attend

/*Part (b)*/
test attend

/*Part (c)*/
test attend=0.2

/*Part (d)*/
reg attend ACT
predict attend_tilde, resid
reg final attend_tilde

/*Part (e)*/
reg final ACT attend
reg final attend
reg ACT attend
di 0.1739337 + 0.5299129*-0.1000742

log close
exit

LOG FILE

log: C:/Econ 3125/Fall 2013/Problem Sets/Problem Set 6/problem_set6.log

.
. /*Question 4*/
.
. use "`path'\bwght.dta"

.
. /*Part (c)*/
. reg bwght cigs

      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  1,  1386) =   32.24
       Model |  13060.4194     1  13060.4194           Prob > F      =  0.0000
    Residual |    561551.3  1386  405.159668           R-squared     =  0.0227
-------------+------------------------------           Adj R-squared =  0.0220
       Total |   574611.72  1387  414.283864           Root MSE      =  20.129

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cigs |  -.5137721   .0904909    -5.68   0.000    -.6912861   -.3362581
       _cons |   119.7719   .5723407   209.27   0.000     118.6492    120.8946
------------------------------------------------------------------------------

.
. reg bwght faminc

      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  1,  1386) =   16.65
       Model |   6819.0527     1   6819.0527           Prob > F      =  0.0000
    Residual |  567792.667  1386  409.662819           R-squared     =  0.0119
-------------+------------------------------           Adj R-squared =  0.0112
       Total |   574611.72  1387  414.283864           Root MSE      =   20.24

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      faminc |   .1183234   .0290016     4.08   0.000     .0614317    .1752152
       _cons |    115.265   1.001901   115.05   0.000     113.2996    117.2304
------------------------------------------------------------------------------

.
.
. clear all

.
.
. /*Question 5*/
.
. use "`path'\attend.dta"

.
. reg final ACT attend

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  2,   677) =   69.38
       Model |  2561.91192     2  1280.95596           Prob > F      =  0.0000
    Residual |  12500.0351   677  18.4638628           R-squared     =  0.1701
-------------+------------------------------           Adj R-squared =  0.1676
       Total |  15061.9471   679  22.1825435           Root MSE      =   4.297

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |   .5299129    .047828    11.08   0.000     .4360038    .6238219
      attend |   .1739337   .0306059     5.68   0.000     .1138398    .2340276
       _cons |   9.414827   1.447809     6.50   0.000     6.572091    12.25756
------------------------------------------------------------------------------

.
.
. /*Part (b)*/
.
. test attend

 ( 1)  attend = 0

       F(  1,   677) =   32.30
            Prob > F =    0.0000

.
.
. /*Part (c)*/
.
. test attend=0.2

 ( 1)  attend = .2

       F(  1,   677) =    0.73
            Prob > F =    0.3947

.
.
. /*Part (d)*/
.
. reg attend ACT

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  1,   678) =   17.00
       Model |   494.15501     1   494.15501           Prob > F      =  0.0000
    Residual |  19711.1391   678  29.0724766           R-squared     =  0.0245
-------------+------------------------------           Adj R-squared =  0.0230
       Total |  20205.2941   679  29.7574287           Root MSE      =  5.3919

------------------------------------------------------------------------------
      attend |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |  -.2443857   .0592769    -4.12   0.000    -.3607739   -.1279974
       _cons |   31.64825   1.350265    23.44   0.000     28.99705    34.29946
------------------------------------------------------------------------------

.
. predict attend_tilde, resid

.
. reg final attend_tilde

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  1,   678) =   27.95
       Model |  596.319818     1  596.319818           Prob > F      =  0.0000
    Residual |  14465.6272   678  21.3357334           R-squared     =  0.0396
-------------+------------------------------           Adj R-squared =  0.0382
       Total |  15061.9471   679  22.1825435           Root MSE      =  4.6191

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
attend_tilde |   .1739337   .0329002     5.29   0.000     .1093353    .2385321
       _cons |   25.89118   .1771329   146.17   0.000     25.54338    26.23897
------------------------------------------------------------------------------

.
.
. /*Part (e)*/
.
. reg final ACT attend

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  2,   677) =   69.38
       Model |  2561.91192     2  1280.95596           Prob > F      =  0.0000
    Residual |  12500.0351   677  18.4638628           R-squared     =  0.1701
-------------+------------------------------           Adj R-squared =  0.1676
       Total |  15061.9471   679  22.1825435           Root MSE      =   4.297

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ACT |   .5299129    .047828    11.08   0.000     .4360038    .6238219
      attend |   .1739337   .0306059     5.68   0.000     .1138398    .2340276
       _cons |   9.414827   1.447809     6.50   0.000     6.572091    12.25756
------------------------------------------------------------------------------

.
. reg final attend

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  1,   678) =   13.56
       Model |  295.352008     1  295.352008           Prob > F      =  0.0002
    Residual |  14766.5951   678  21.7796387           R-squared     =  0.0196
-------------+------------------------------           Adj R-squared =  0.0182
       Total |  15061.9471   679  22.1825435           Root MSE      =  4.6669

------------------------------------------------------------------------------
       final |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      attend |   .1209031   .0328317     3.68   0.000     .0564391     .185367
       _cons |   22.72992   .8769078    25.92   0.000     21.00814     24.4517
------------------------------------------------------------------------------

.
. reg ACT attend

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  1,   678) =   17.00
       Model |  202.353053     1  202.353053           Prob > F      =  0.0000
    Residual |  8071.57489   678  11.9049777           R-squared     =  0.0245
-------------+------------------------------           Adj R-squared =  0.0230
       Total |  8273.92794   679  12.1854609           Root MSE      =  3.4504

------------------------------------------------------------------------------
         ACT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      attend |  -.1000742   .0242735    -4.12   0.000    -.1477344    -.052414
       _cons |   25.12694   .6483252    38.76   0.000     23.85397    26.39991
------------------------------------------------------------------------------

.
. di 0.1739337 + 0.5299129*-0.1000742
.12090309

.
.
. log close
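The hand calculations in Problem 5, parts (a) through (c), can be checked without loading any data. The following is a minimal sketch using Stata's display command with the built-in functions normal() and ttail(). It is an illustrative check rather than part of the do file above, and it assumes the rounded estimates from the answer text, except in the last line, which uses the unrounded coefficient and standard error from the log.

```stata
* Part (a): 95% confidence interval for the ACT coefficient
display 0.530 - 1.96*0.048
display 0.530 + 1.96*0.048

* Part (b): t statistic and two-sided normal p-value for H0: beta2 = 0
display 0.174/0.031
display 2*normal(-abs(0.174/0.031))

* Part (c): same calculation for H0: beta2 = 0.2
display (0.174 - 0.2)/0.031
display 2*normal(-abs((0.174 - 0.2)/0.031))

* Part (c) with unrounded estimates and the t distribution with 677
* degrees of freedom, which corresponds to the F(1, 677) test in the log
display 2*ttail(677, abs((0.1739337 - 0.2)/0.0306059))
```

The small gap between the hand-computed p-value of 0.4010 and the 0.3947 reported by the test command in part (c) appears to come mainly from rounding the estimates to three decimals and, to a lesser extent, from using the normal approximation in place of the t distribution.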