Essays on Weak Identification

by Zhaoguo Zhan
B.Eng., Renmin University of China, 2003
MSc, London School of Economics, 2005
M.A., Brown University, 2007

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics at Brown University

PROVIDENCE, RHODE ISLAND
May 2011

© Copyright 2011 by Zhaoguo Zhan

This dissertation by Zhaoguo Zhan is accepted in its present form by the Department of Economics as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Frank Kleibergen, Advisor

Recommended to the Graduate Council
Blaise Melly
Geert Dhaene

Approved by the Graduate Council
Peter M. Weber, Dean of the Graduate School

Vitae

The author was born on November 5th, 1980 in China. He received his Bachelor in Engineering from Renmin University of China in 2003, and his MSc in Econometrics and Mathematical Economics from the London School of Economics in 2005. He entered Brown University in 2006 to pursue his degree in Economics. He received his M.A. in 2007 and Ph.D. in 2011.

Acknowledgements

I am deeply indebted to my advisor Frank Kleibergen for continuous support and encouragement. I thank Blaise Melly and Sophocles Mavroeidis for inspiration and insightful comments. Thanks to Geert Dhaene for reading the manuscript. I also owe much to my colleagues, Toru Kitagawa, Yuya Sasaki, Alexei Abrahams and Philipp Ketz, who spent time discussing econometrics with me throughout my graduate study. Lastly, I am most indebted to my wife Flora for her consideration and support.

Abstract of "Essays on Weak Identification" by Zhaoguo Zhan, Ph.D., Brown University, May 2011

Recent developments in econometrics have revealed weak identification in empirical studies. The linear Instrumental Variable (IV) regression with weak instruments is an example that has received sizable attention: when instruments are weak, conventional asymptotics fail to function, and empirical results based on conventional asymptotics are unreliable. The problem of weak identification, however, is not limited to the linear IV regression or the Generalized Method of Moments (GMM) framework. In a broad range of economic models, the quality of inference depends on the treatment of weak identification.

In my dissertation, I consider three issues related to weak identification. The first chapter proposes a method to detect whether weak identification exists. The objective of this chapter is to develop an intuitive tool which can be universally used in IV and GMM applications. The second chapter investigates the empirical studies of the Capital Asset Pricing Model (CAPM) and Consumption CAPM, in which various risk factors are suggested to explain the variation in asset returns. I find that irrelevant risk factors may still appear useful in explaining the variation, because of the weak identification problem induced by irrelevant factors. The third chapter targets the ongoing debate on whether technology shocks increase the hours worked. The debate derives from the highly persistent time series of hours, which makes the impact of technology shocks on hours weakly identified. I construct confidence intervals for this impact by adopting an approach robust to weak identification.

CONTENTS

Vitae iv
Acknowledgements v

1 Detecting Weak Identification by Bootstrap 1
1.1 Introduction 2
1.2 Motivation and Strategy 5
1.2.1 Weak Identification, Rank Tests 5
1.2.2 Test Strategy by Bootstrap 7
1.2.3 An Alternative 13
1.3 Estimation and Test 15
1.4 IV and Bootstrap 18
1.4.1 Model Setup 19
1.4.2 An Empirical Example: Card (1995) 28
1.5 Simulation 31
1.6 Conclusion 34

2 Weak Identification in (C)CAPM 43
2.1 Introduction 44
2.2 Model and Fama-MacBeth 46
2.2.1 Linear Factor Model 46
2.2.2 Fama-MacBeth 48
2.3 A Rank Test 49
2.4 Distribution of R2 51
2.4.1 Assumption 52
2.4.2 Four Cases 53
2.5 Examples of (C)CAPM 57
2.6 Conclusion 59

3 Does a Technology Shock Increase or Decrease Hours 70
3.1 Introduction 71
3.2 Model 73
3.2.1 VAR 73
3.2.2 Long Run Restriction 74
3.2.3 Model Simplification 76
3.3 Tests 78
3.3.1 AR 78
3.3.2 Wald 80
3.4 Application 84
3.5 Conclusion 87

CHAPTER One

Detecting Weak Identification by Bootstrap

1.1 Introduction

The Instrumental Variable (IV) regression and the Generalized Method of Moments (GMM) have become the standard toolkit of empirical economists, and it is now well known that both IV and GMM applications may suffer from the problem of weak identification. An example that has received sizable attention is the linear IV regression with weak instruments studied in Staiger and Stock (1997). When the strength of identification is weak, the finite sample distribution of IV/GMM estimators is poorly approximated by the normal distribution, which in turn induces the malfunction of conventional inference methods that rely on the property of asymptotic normality.[1]

Two approaches co-exist to handle weak identification: the first is to use the robust methods in Anderson and Rubin (1949), Stock and Wright (2000), Kleibergen (2002, 2005), and Moreira (2003), which produce confidence intervals/sets with correct coverage regardless of the strength of identification; the second approach, which is popular in practice, is to rule out weak identification by pretesting.

[1] See the article by Stock et al. (2002) for a survey.
Although identification-robust methods are available, excluding weak identification in IV/GMM applications has practical importance: if identification is not weak, then the rich set of conventional methods is applicable, making statistical inference and economic decisions much easier. For instance, beyond its confidence interval, the point estimator of a parameter is usually preferred by policy makers, but it is neither consistent nor meaningful unless weak identification is excluded.

In this chapter, I propose a method based on bootstrap resampling to detect whether weak identification exists in IV/GMM applications. This method has the unique feature of providing a graphic view of the strength of identification. In the econometric literature, there exists a group of tests on identification that this chapter is in line with: for example, in the linear IV model, Hahn and Hausman (2002) and Stock and Yogo (2005) provide tests for the null of strong instruments and weak instruments respectively, and the first stage F test with the F > 10 rule of thumb proposed in Stock and Yogo (2005) is widely used; in Hansen (1982)'s GMM framework, which nests the linear IV model, the suggested tests include Wright (2002, 2003), Inoue and Rossi (2008), Bravo et al. (2009), etc.

The proposed method is illustrated by Figure 1.1. The exact finite sample distribution of IV/GMM estimators is generally unknown, but can be approximated either by the limiting normal distribution or by the bootstrap distribution. When the IV/GMM models are strongly identified, these two approximation methods are both valid, i.e. both the normal distribution and the bootstrap distribution are close to the exact distribution, and in practice they can be used interchangeably. Consequently, strong identification implies that the bootstrap distribution is not far from the normal distribution. When the bootstrap distribution is substantially different from the normal distribution, inference based on these two distributions may contradict; in this situation, it is inappropriate to consider the identification strength as strong. As a result, whether or not weak identification exists can be inferred by comparing the bootstrap distribution with the normal distribution.

Since its introduction by Efron (1979), the bootstrap has become a practical tool for statistical inference. The properties of the bootstrap are explained using the theory of the Edgeworth expansion in Hall (1992), and its econometric applications are illustrated in Horowitz (2001). As an alternative to the limiting distribution, the bootstrap approximates the distribution of a targeted statistic by resampling the data, and there is considerable evidence that it performs better than the first-order limiting distribution in finite samples.[2] However, the bootstrap does not always work well. When the IV/GMM models are not well identified, for instance, the bootstrap is known to be problematic when it is used to approximate the commonly used t/Wald statistic, as explained in Hall and Horowitz (1996). Nevertheless, the fact that the bootstrap fails still conveys useful information: as illustrated by Figure 1.1, a substantial difference between the bootstrap distribution and the normal distribution indicates that it is problematic to approximate the exact finite sample distribution of IV/GMM estimators by the normal distribution; in other words, the identification strength is weak.

[2] See, for example, Horowitz (1994).
Given the above introduction, it is tempting to apply normality tests, e.g. the Kolmogorov-Smirnov test, to examine whether the bootstrap distribution is normal in order to investigate the strength of identification. However, this route is not productive, because normality tends to be rejected as the number of bootstrap replications becomes large. This is due to the fact that, in general, the bootstrap distribution coincides with the normal distribution only in the limit of an infinite number of data points. In an IV/GMM application with a finite sample, the bootstrap distribution is not equivalent to the normal distribution; hence the null hypothesis of normality tests, namely that the bootstrap distribution is identical to the normal distribution, does not hold in empirical applications. Although normality tests do not work well, empirical researchers can still eyeball the graph of the bootstrap distribution to evaluate the strength of identification, and the test proposed in this chapter does not impose the equivalence of the bootstrap distribution and the normal distribution; only a substantial difference between these two distributions induces the rejection of strong identification.

The rest of the chapter is organized as follows: weak identification and the bootstrap strategy are illustrated in Section 2; in Section 3, the bootstrap-based method for determining whether weak identification exists is proposed; the linear IV regression model is used as an example in Section 4, with an application to Card (1995); in Section 5, Monte Carlo results are presented; Section 6 concludes. Although the linear IV model is employed in this chapter for expository purposes, the proposed method can be extended to the more general GMM framework.

Throughout the chapter, the following notations are used: for an m by n matrix A, Ai (sometimes ai is used when A is a vector) is its ith row, and PA = A(A'A)^{-1}A', MA = Im - PA, where Im is the m by m identity matrix; vec(A) is the column vector containing the column by column vectorization of the elements of A; for an object O, O* is its bootstrap counterpart; ⇒ stands for weak convergence; →p stands for convergence in probability; N(µ, σ²) is the normal distribution with mean µ and variance σ²; ⊗ is the Kronecker product; ||·|| is the Euclidean norm.

1.2 Motivation and Strategy

1.2.1 Weak Identification, Rank Tests

Let θ denote the population parameter of interest in an IV/GMM application, and θ̂n is the estimator for θ, e.g. the two stage least squares estimator. The subscript n indicates that θ̂n is computed from a finite sample of n observations. Assuming that the rank condition and other regularity conditions are satisfied (cf. Wooldridge (2002)), the conventional first-order asymptotic theory yields that θ̂n is √n-consistent and asymptotically normally distributed:

$$\sqrt{n}(\hat\theta_n - \theta) \Rightarrow N(0, \sigma^2), \quad \text{and there exists } \hat\sigma \xrightarrow{\ p\ } \sigma \tag{1.1}$$

As surveyed by Stock et al. (2002), the exact finite sample distribution of √n(θ̂n − θ) can be substantially different from the normal distribution N(0, σ²), especially when the rank condition is weakly satisfied. The scenario in which (1.1) does not provide a good approximation in finite sample applications is known as weak identification.
It is helpful to distinguish weak identification from identification/noidentification: identification/noidentification refers to whether θ can be identified at the population level, hence it is unrelated to the sample size n; in contrast, weak identification targets that the first-order asymptotic theory may provide a poor approximation in finite samples. When θ is identified in the population, it is possible that the firstorder approximation of (1.1) under a given sample3 does not function well, hence weak identification may exist even though there is identification; the case that θ is unidentified is also nested by weak identification, since the first-order approximation of (1.1) also breaks down under noidentification. Loosely speaking, if the first-order approximation based on (1.1) is poor, then weak identification is concerned. Since the conventional result of (1.1) breaks down under weak identification, it is important to investigate the identification strength in IV/GMM applications. A natural way is to examine the rank condition, as weak identification is more likely to exist when the rank condition is only weakly satisfied. There are rank tests available to serve this purpose. For example, in the linear IV model with a single endogenous regressor, the rank condition corresponds to that the correlation of the endogenous regressor and the instruments is non-zero, and Stock and Yogo (2005) propose the F test to examine this correlation: if the F statistic is greater than the tabled critical values, typically around 10 for small number of instruments, then the rank condition is considered strongly satisfied, and weak identification is excluded. 3 Even when the sample size is very large, weak identification could exist, e.g. in some regressions of Angrist and Krueger (1991), there are 329000 observations, but their instruments are weak. 7 The approach of examining the rank condition, however, has some limitations when it is extended to the more general GMM framework: first of all, the rank statistic in the non-linear GMM may depend on the weakly identified parameters, hence could not be consistently estimated, see, e.g. Wright (2003); secondly, it is not clear how large the rank statistic needs to be in order to decide that identification in GMM is not weak, to the best of my knowledge. In the linear IV model, which can be seen as a special case of GMM, although the tabled critical values of the F test are available, they are derived under the i.i.d and homoscedasticity assumptions, hence it is not appropriate to apply them if heteroscedasticity or non-i.i.d. takes place. Given the importance of detecting weak identification in IV/GMM applications, and the limitations of the rank test, a question naturally arises: is there a tool that does not have these limitations, and is universally applicable to both IV and GMM applications? This chapter suggests that the bootstrap could be the tool. 1.2.2 Test Strategy by Bootstrap Let θ̂n∗ denote the bootstrap estimator of θ, and θ̂n∗ is the counterpart of θ̂n . θ̂n is computed by the data sample of the IV/GMM application, while θ̂n∗ is computed by resampling the data sample or the model estimated from the data. The bootstrap √ √ counterpart of n(θ̂n − θ) is n(θ̂n∗ − θ̂n ). 
When identification is not weak, together with mild regularity conditions, the bootstrap distribution of √n(θ̂n* − θ̂n) asymptotically coincides with the distribution of √n(θ̂n − θ) (cf. Horowitz (2001)):

$$\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \Rightarrow N(0, \sigma^2) \tag{1.2}$$

There are now three objects: the exact distribution of √n(θ̂n − θ), the normal distribution N(0, σ̂²), and the bootstrap distribution of √n(θ̂n* − θ̂n). By (1.1) and (1.2), both of which hold under mild conditions, these three distributions are asymptotically equivalent.

The proposed bootstrap strategy for detecting weak identification is to compare the bootstrap distribution of √n(θ̂n* − θ̂n) with the normal distribution N(0, σ̂²), or equivalently, to compare the standardized bootstrap distribution of (θ̂n* − θ̂n)/(σ̂/√n) with the standard normal distribution. Since the difference between these two distributions is negligible when identification is strong, a substantial difference is evidence against strong identification.

I find this bootstrap strategy appealing for two reasons: (i) it provides a simple and intuitive way to detect weak identification, i.e. empirical researchers can draw the graph of the bootstrap distribution and compare it with the normal distribution to evaluate the strength of identification; (ii) it is universally applicable to IV/GMM applications, i.e. although the data of IV/GMM applications may not be i.i.d. and homoscedastic, there exist various bootstrap methods to construct the bootstrap distribution, e.g. the block bootstrap in Hall and Horowitz (1996) for time series data in GMM, and the pair and wild bootstrap used in Davidson and MacKinnon (2010) for heteroscedasticity in IV. By the bootstrap strategy, as long as it is feasible to construct the bootstrap distribution, detecting weak identification reduces to the comparison of two distributions: the bootstrap distribution and the normal distribution.

Definition 1. Conditional on the sample of the observed data, the standardized bootstrap estimator X = (θ̂n* − θ̂n)/(σ̂/√n) follows a distribution with c.d.f. F(x).

From now on, the bootstrap distribution refers to F(x) in this chapter.

Normality tests appear to be the intuitive choice, since comparing the bootstrap distribution with the normal distribution is proposed to infer the strength of identification. However, applying normality tests here will almost always induce the rejection of normality, and lead to the conclusion of weak identification. Take the classic Kolmogorov-Smirnov (KS) test for example. With B bootstrap replications, let Xi = (θ̂n*i − θ̂n)/(σ̂/√n), i = 1, ..., B, denote the i.i.d. bootstrapped estimators after standardization. To test the hypothesis that Xi, i = 1, ..., B, are B points drawn from the standard normal distribution Φ(x), the KS statistic is the supremum distance between Φ(x) and the empirical c.d.f. F̂(x), scaled by the square root of the number of points. As B → ∞, the KS statistic goes to infinity, instead of converging to the Kolmogorov distribution:

$$KS = \sqrt{B}\,\sup_x |\hat F(x) - \Phi(x)| \to \infty, \quad \text{where } \hat F(x) = \frac{1}{B}\sum_{i=1}^{B} 1(X_i \le x) = \frac{1}{B}\sum_{i=1}^{B} 1\!\left(\frac{\hat\theta_n^{*i} - \hat\theta_n}{\hat\sigma/\sqrt{n}} \le x\right)$$

This is because the hypothesis F(x) = Φ(x) does not hold when n < ∞. In other words, in empirical applications where the sample size is finite, F(x) differs from Φ(x), although the difference may not be substantial; e.g. by the Edgeworth expansion in Horowitz (2001), F(x) − Φ(x) = O(n^{-1/2}). Consequently, even when the difference between F(x) and Φ(x) is minor, as B → ∞, KS → ∞, i.e. the KS test tends to reject normality as the number of bootstrap replications gets large.
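This behavior of the KS statistic is easy to reproduce numerically. The short sketch below is an illustration of the argument rather than part of the chapter's simulations: it draws B points from a distribution that deviates from N(0, 1) only by a term of order n^{-1/2}, in the spirit of the Edgeworth expansion, and the sample size n and the coefficient c are hypothetical choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000   # hypothetical sample size of the underlying IV/GMM application
c = 2.0    # hypothetical coefficient of the O(n^{-1/2}) deviation

def draw_near_normal(B):
    """Draw B points whose distribution differs from N(0,1) only by an
    Edgeworth-type term of order n^{-1/2}."""
    z = rng.standard_normal(B)
    return z + c / np.sqrt(n) * (z ** 2 - 1.0)

for B in (100, 1_000, 10_000, 100_000):
    ks_stat, p_value = stats.kstest(draw_near_normal(B), "norm")
    print(f"B = {B:>6d}:  sqrt(B)*sup|F_hat - Phi| = {np.sqrt(B) * ks_stat:.2f},  p-value = {p_value:.4f}")
# The deviation from N(0,1) is of order n^{-1/2} and does not change with B,
# yet the p-value collapses as B grows: normality tests reject mechanically.
```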
Said differently, the bootstrap distribution is not identical to the normal distribution, although it can be asymptotically equivalent to it. The KS test tests the hypothesis that the bootstrap distribution is identical to the normal distribution; for the purpose of examining whether the bootstrap distribution is close to the normal distribution, it is inappropriate to apply this test.

Instead of verifying the equivalence of the bootstrap distribution and the normal distribution by normality tests, this chapter provides a quantitative measure of the difference, or distance, between these two distributions: if the measure shows that the difference is substantial, then identification is considered weak; on the contrary, if the measure shows that the difference is negligible, then identification is considered strong. The measure results from the comparison of two confidence intervals (C.I.): the conventional C.I. derived by inverting the t/Wald test, and the bootstrap percentile C.I.

Consider the practical task of constructing a confidence interval for θ, the parameter of interest. The 100(1 − α)% C.I. of θ derived by inverting the t/Wald test is written as:

$$C_t \equiv \left(\hat\theta_n - z_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}},\ \hat\theta_n + z_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}}\right) \tag{1.3}$$

where z_{1−α/2} is the 1 − α/2 quantile of Φ(x). For example, when α = 5%, z_{1−α/2} ≈ 1.96, and the 95% C.I. of θ is approximately (θ̂n − 1.96 σ̂/√n, θ̂n + 1.96 σ̂/√n).

Alternatively, a C.I. can also be constructed by the bootstrap percentile method: order the bootstrapped estimators {θ̂n*i, i = 1, ..., B}, and write the ordered sequence as {θ̂n*(i), i = 1, ..., B}, where θ̂n*(i) is the ith smallest of {θ̂n*i, i = 1, ..., B}; define θ̂*_{n,α/2} ≡ θ̂n*(⌈Bα/2⌉) and θ̂*_{n,1−α/2} ≡ θ̂n*(⌈B(1−α/2)⌉), where ⌈x⌉ denotes the integer ceiling of x. The bootstrapped 100(1 − α)% C.I. of θ is:

$$C_b \equiv \left(\hat\theta^*_{n,\alpha/2},\ \hat\theta^*_{n,1-\alpha/2}\right) \tag{1.4}$$

The two intervals in (1.3) and (1.4) are asymptotically equivalent when identification is not weak. To see this, equalizing the boundaries of these two intervals yields:

$$\frac{\hat\theta^*_{n,\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}} = -z_{1-\alpha/2}, \qquad \frac{\hat\theta^*_{n,1-\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}} = z_{1-\alpha/2}$$

These two equalities approximately hold if the distribution of θ̂n* after standardization, i.e. after subtracting the estimate and dividing by the standard error, is close to the standard normal distribution.

In practice, the above two methods of constructing the 100(1 − α)% C.I. for θ are both commonly used. From a practical point of view, no matter which method empirical researchers use, the corresponding intervals should not be substantially different from each other. If these two intervals do substantially differ, then it is difficult to make reliable economic inference: for instance, one interval may include zero while the other does not, hence decisions on whether θ is significantly different from zero based on the two different C.I.'s could contradict.[4]

The difference between the two intervals in (1.3) and (1.4) boils down to the difference between the bootstrap distribution and the normal distribution. If these two intervals are substantially different, this indicates that the bootstrap distribution is substantially different from the normal distribution, and hence that the identification strength is weak.

[4] This problem is encountered in the empirical example of Card (1995).
The idea of comparing alternative C.I.'s to investigate the identification status comes from Wright (2002), who compares the interval derived by inverting the robust tests with the conventional interval derived by inverting the t/Wald test. Different from Wright (2002), I use the bootstrap to construct a C.I. for comparison; in addition, Wright (2002) provides an identification test, while in this chapter, I target weak identification instead of identification.

Based on the comparison of the two intervals in (1.3) and (1.4), a measure of the difference between the bootstrap distribution and the normal distribution, as well as a quantitative definition of weak identification, is provided below.

Definition 2. Define D as the measure of the difference between the bootstrap distribution F(x) and the standard normal distribution Φ(x):

$$D \equiv \frac{q_{1-\alpha/2} - q_{\alpha/2}}{2z_{1-\alpha/2}} - 1$$

where z_{1−α/2} is the 1 − α/2 quantile of Φ(x), and q_{α/2}, q_{1−α/2} are assumed to be the two unique quantiles of the continuous c.d.f. F(x), i.e. F^{-1}(α/2) = q_{α/2}, F^{-1}(1 − α/2) = q_{1−α/2}.

D is the relative difference of the lengths between the α/2 and 1 − α/2 quantiles of the two distributions. D = 0 if F(x) = Φ(x). The deviation of D from 0 indicates the deviation of F(x) from Φ(x), and hence suggests the existence of weak identification.

Definition 3. Suppose there is a cutoff γ > 0. The identification strength is considered weak in this chapter if |D| > γ; otherwise the identification strength is strong.

It is important to set a non-zero γ: if γ = 0, then identification tends to be considered weak, because F(x) is not exactly normal except in the limit of n → ∞. For the same reason, it is inappropriate to apply normality tests, which impose γ = 0 under their null hypothesis.

D can be estimated by the relative difference of (1.3) and (1.4):

$$\hat D = \frac{\hat q_{1-\alpha/2} - \hat q_{\alpha/2}}{2z_{1-\alpha/2}} - 1 = \frac{\dfrac{\hat\theta^*_{n,1-\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}} - \dfrac{\hat\theta^*_{n,\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}}}{2z_{1-\alpha/2}} - 1 = \frac{\hat\theta^*_{n,1-\alpha/2} - \hat\theta^*_{n,\alpha/2}}{2z_{1-\alpha/2}\,\hat\sigma/\sqrt{n}} - 1$$

Empirical researchers may have a certain tolerance level for D, which is the threshold γ. For example, a researcher may consider it acceptable if |D| ≤ γ = 0.25, i.e. the relative difference of the two C.I.'s is less than a quarter. If |D| goes above this tolerance level, then it is not unreasonable to determine that the identification strength is weak.

To summarize, this chapter proposes to use the difference D between the bootstrap distribution F(x) and the standard normal distribution Φ(x) to evaluate the strength of identification. If |D| is greater than a given threshold, identification is weak.
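For concreteness, a minimal sketch of the estimator D̂ is given below; theta_hat, se (= σ̂/√n) and the array of bootstrapped estimators are assumed to come from whatever IV/GMM estimation and bootstrap scheme is in use, and all names are illustrative rather than taken from the chapter.

```python
import numpy as np
from scipy.stats import norm

def d_hat(boot_estimates, theta_hat, se, alpha=0.05):
    """Estimate D: the relative difference between the bootstrap percentile
    C.I. (1.4) and the t/Wald C.I. (1.3).

    boot_estimates : array of B bootstrapped estimators theta*_n
    theta_hat      : estimator computed from the original sample
    se             : its standard error, sigma_hat / sqrt(n)
    """
    x = (np.asarray(boot_estimates) - theta_hat) / se        # standardized draws from F(x)
    q_lo, q_hi = np.quantile(x, [alpha / 2, 1 - alpha / 2])  # q_hat_{alpha/2}, q_hat_{1-alpha/2}
    return (q_hi - q_lo) / (2 * norm.ppf(1 - alpha / 2)) - 1.0

# Synthetic check: when the bootstrap distribution is essentially normal,
# D_hat stays close to 0 and below a tolerance such as gamma = 0.25.
rng = np.random.default_rng(0)
boot = 0.10 + 0.05 * rng.standard_normal(9999)
print(d_hat(boot, theta_hat=0.10, se=0.05))
```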
Although quantitatively different, the definition in this chapter is qualitatively similar to the one in Stock and Yogo (2005): both definitions of weak identification indicate that the distributions of the conventional IV/GMM estimators are poorly approximated by the normal distribution, or said differently, the conventional ap5 A choice of the threshold γ is suggested in the later part of Monte Carlo studies. 14 proximation of (1.1) breaks down. In this chapter, I follow the idea in Wright (2002) to use D, the relative difference of the lengths of two alternative C.I.’s, to measure the strength of identification. Alternatively, we could follow Stock and Yogo (2005) and define D as the difference of coverage: Da ≡ F (z1−α/2 ) − F (zα/2 ) − (1 − α) Note that Da is zero if the bootstrap distribution F (x) coincides with the standard normal distribution Φ(x). Following Stock and Yogo (2005), we can use γ a = 5%, and define identification as weak if |Da | > γ a = 5%. These definitions of (D, γ) and the alternative (Da , γ a ) are two sides of the same coin: both the difference in length and the difference in coverage indicate the bootstrap distribution and the normal distribution are different, and there is no clear advantage if either definition is used. For the rest of this chapter, I use (D, γ)6 . Note that weak identification refers to the severe disparity between the exact distribution and the normal distribution, while its quantitative definition stated above rests on the disparity between the bootstrap distribution and the normal distribution. In essence, the bootstrap strategy for detecting weak identification is to use the difference between the bootstrap distribution and the normal distribution as a proxy for the the difference between the exact distribution and the normal distribution. The so-called bootstrap principle (or the bootstrap analogy) in Hall (1992) can help clarify the proposed strategy. The bootstrap principle states that the mapping from the population to the sample (1st mapping) is similar to the mapping from the sample, which is also the bootstrap population, to the bootstrap resample (2nd 6 The empirical and MC results are found to be similar, when (Da , γ a ) is used. 15 mapping). By this principle, the identification strength in the 2nd mapping is expected to be similar to the identification strength in the 1st mapping. Consequently, the bootstrap strategy for detecting weak identification is to use the identification strength in the 2nd mapping as the proxy for the identification strength in the 1st mapping: the substantial disparity between the bootstrap distribution and the normal distribution corresponds to the weak identification strength in the 2nd mapping, which further suggests the weak identification strength in the 1st mapping, i.e. the disparity between the exact distribution and the normal distribution is also severe. The advantage of the bootstrap strategy is clear: the bootstrap population is the given sample, hence the identification strength in the 2nd mapping is known or recoverable. The only randomness in this mapping comes from the randomness of bootstrap resampling, and if the number of bootstrap replications is sufficiently large, this randomness is negligible. Once the identification strength in the 2nd mapping is recovered, it is used to infer the identification strength in the 1st mapping, since they are expected to be similar by the bootstrap principle. 
1.3 Estimation and Test

With the quantitative definition as well as the advantage of the bootstrap test strategy, detecting weak identification becomes straightforward. By definition, |D| > γ implies weak identification. D, the distance between the bootstrap distribution and the standard normal distribution, needs to be estimated.

Draw B i.i.d. observations from F(x) by the bootstrap: Xi = (θ̂n*i − θ̂n)/(σ̂/√n), i = 1, ..., B. The bootstrap distribution F(x) can be estimated by the empirical c.d.f. F̂(x) almost surely:

$$\hat F(x) = \frac{1}{B}\sum_{i=1}^{B} 1(X_i \le x) = \frac{1}{B}\sum_{i=1}^{B} 1\!\left(\frac{\hat\theta_n^{*i} - \hat\theta_n}{\hat\sigma/\sqrt{n}} \le x\right) \xrightarrow{\ a.s.\ } F(x)$$

Consequently, D can be estimated almost surely, by the continuous mapping theorem:

$$\hat D = \frac{\hat q_{1-\alpha/2} - \hat q_{\alpha/2}}{2z_{1-\alpha/2}} - 1 \xrightarrow{\ a.s.\ } D$$

It would be ideal if B could be made infinite. For the given B realizations from F(x), though B can be arbitrarily large, whether |D| exceeds γ needs to be tested, and a test serving this purpose is presented next.

Assume the following conditions hold for an IV/GMM model with the conventional estimator θ̂n, associated with standard error σ̂/√n:

Assumption 1. There exist {θ̂n*i, i = 1, ..., B}, the i.i.d. draws of the bootstrapped estimator θ̂n*; conditional on the sample, the standardized random variable X = (θ̂n* − θ̂n)/(σ̂/√n) has a continuous density function f(x) that is non-zero in a neighborhood of the two quantiles q_{α/2}, q_{1−α/2}, and can be consistently estimated by non-parametric kernel estimation: f̂(x) →p f(x).

Comments:

1. Under Assumption 1, the joint distribution of the two quantile estimators, namely q̂_{α/2} = (θ̂*_{n,α/2} − θ̂n)/(σ̂/√n) and q̂_{1−α/2} = (θ̂*_{n,1−α/2} − θ̂n)/(σ̂/√n), is asymptotically normal conditional on the sample, as B → ∞ (see David and Nagaraja (2003)):

$$\sqrt{B}\left(\begin{pmatrix} \dfrac{\hat\theta^*_{n,\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}} \\[8pt] \dfrac{\hat\theta^*_{n,1-\alpha/2} - \hat\theta_n}{\hat\sigma/\sqrt{n}} \end{pmatrix} - \begin{pmatrix} q_{\alpha/2} \\ q_{1-\alpha/2} \end{pmatrix}\right) \Rightarrow N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Omega\right)$$

where

$$\Omega = \begin{pmatrix} \dfrac{(1-\alpha/2)(\alpha/2)}{f(q_{\alpha/2})^2} & \dfrac{(\alpha/2)^2}{f(q_{\alpha/2})f(q_{1-\alpha/2})} \\[10pt] \dfrac{(\alpha/2)^2}{f(q_{\alpha/2})f(q_{1-\alpha/2})} & \dfrac{(1-\alpha/2)(\alpha/2)}{f(q_{1-\alpha/2})^2} \end{pmatrix}$$

2. Silverman (1998) provides high level assumptions for the consistency of the non-parametric kernel density estimator, while f̂(x) →p f(x) is directly assumed here for simplicity. As (θ̂*_{n,α/2} − θ̂n)/(σ̂/√n) →p q_{α/2}, f(q_{α/2}) is consistently estimated by f̂((θ̂*_{n,α/2} − θ̂n)/(σ̂/√n)); similarly, f(q_{1−α/2}) is consistently estimable. The covariance matrix Ω is thus consistently estimable: there exists Ω̂ →p Ω. The normal kernel and Silverman's rule of thumb for choosing the bandwidth are used in the empirical application and simulation studies of this chapter.

Theorem. Under Assumption 1, and conditional on the sample, the following result holds as B → ∞:

$$\sqrt{B}(\hat D - D) \Rightarrow N\!\left(0,\ \frac{\Omega_{11} + \Omega_{22} - 2\Omega_{12}}{4z_{1-\alpha/2}^2}\right) \tag{1.5}$$

where Ω_{ij} is the element of Ω at row i, column j.

The quantitative definition of strong and weak identification implies the following decision rule: reject the null of strong identification when |D| > γ. There are two cases that would induce the rejection, D > γ and D < −γ. Consequently, strong identification is rejected when D̂ is significantly greater than γ (Case 1), or significantly less than −γ (Case 2), in the test statistics below.

Case 1: strong identification is rejected at 5% if

$$b_1 = \frac{\sqrt{B}\,(\hat D - \gamma)}{\left[(\hat\Omega_{11} + \hat\Omega_{22} - 2\hat\Omega_{12})/(4z_{1-\alpha/2}^2)\right]^{1/2}} > z_{95\%}$$

Case 2: strong identification is rejected at 5% if

$$b_2 = \frac{\sqrt{B}\,(\hat D + \gamma)}{\left[(\hat\Omega_{11} + \hat\Omega_{22} - 2\hat\Omega_{12})/(4z_{1-\alpha/2}^2)\right]^{1/2}} < -z_{95\%}$$

Combining these two cases, reject strong identification at 5% if

$$|\hat D| > \gamma + z_{95\%}\sqrt{\frac{\hat\Omega_{11} + \hat\Omega_{22} - 2\hat\Omega_{12}}{4Bz_{1-\alpha/2}^2}}$$

As B → ∞, the test above ends up with a rule of thumb: reject strong identification if |D̂| > γ, hence this test can be substituted by the rule of thumb under sufficiently large B. From now on, this test is referred to as the b test, since it is based on the bootstrap.
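A compact sketch of the b test as just described is given below. The density f is estimated with a Gaussian kernel and Silverman's bandwidth rule, in the spirit of Comment 2 (here via scipy's gaussian_kde); the interface and variable names are illustrative rather than the chapter's own code.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def b_test(boot_estimates, theta_hat, se, gamma=0.25, alpha=0.05, level=0.05):
    """b test of the null of strong identification (|D| <= gamma)."""
    x = (np.asarray(boot_estimates) - theta_hat) / se   # standardized bootstrap draws
    B = x.size
    z = norm.ppf(1 - alpha / 2)

    q_lo, q_hi = np.quantile(x, [alpha / 2, 1 - alpha / 2])
    D_hat = (q_hi - q_lo) / (2 * z) - 1.0

    # kernel estimates of f at the two quantiles (Gaussian kernel, Silverman bandwidth)
    kde = gaussian_kde(x, bw_method="silverman")
    f_lo, f_hi = kde(q_lo)[0], kde(q_hi)[0]

    # Omega_hat, the estimated covariance matrix of the two quantile estimators
    a = alpha / 2
    omega11 = (1 - a) * a / f_lo ** 2
    omega22 = (1 - a) * a / f_hi ** 2
    omega12 = a ** 2 / (f_lo * f_hi)
    var_D = (omega11 + omega22 - 2 * omega12) / (4 * z ** 2)   # variance term of (1.5)

    z_crit = norm.ppf(1 - level)                               # z_95% for a 5% test
    b1 = np.sqrt(B) * (D_hat - gamma) / np.sqrt(var_D)         # Case 1
    b2 = np.sqrt(B) * (D_hat + gamma) / np.sqrt(var_D)         # Case 2
    reject_strong_id = bool(b1 > z_crit or b2 < -z_crit)
    return D_hat, b1, b2, reject_strong_id
```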
1.4 IV and Bootstrap

In this section, a linear IV regression model is used as a specific example to further illustrate the bootstrap approach for detecting weak identification, with an application to Card (1995). Most of the analytical results are well known, for example the convergence results in (1.1) and (1.2) under mild conditions, and proofs of the listed results are attached in the appendix. The main objective of this section is to show that the difference between the bootstrap distribution and the normal distribution is a suitable proxy for the difference between the exact distribution and the normal distribution, and hence a reasonable indicator of the identification strength.

1.4.1 Model Setup

$$\begin{cases} Y = X\theta + U \\ X = Z\Pi + V \end{cases}$$

Y = (Y1, ..., Yn)' and X = (X1, ..., Xn)' are n × 1 vectors of endogenous observations, and Z = (Z1, ..., Zn)' is the n × k matrix of instruments, k ≥ 1. U = (u1, ..., un)', V = (v1, ..., vn)', where the error term (ui, vi)', i = 1, ..., n, is assumed to have mean zero, and to be i.i.d. and homoscedastic with covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_u^2 & \rho\sigma_u\sigma_v \\ \rho\sigma_u\sigma_v & \sigma_v^2 \end{pmatrix}$$

The parameter of interest is θ, and θ̂n is the IV estimator of θ:

$$\hat\theta_n = (X'P_Z X)^{-1} X'P_Z Y$$

It is central to derive the distribution of θ̂n for statistical inference. The exact finite sample distribution of θ̂n, however, is unknown without making further distributional assumptions. Instead, two alternative methods are often used to approximate the exact distribution in econometric applications: the limiting normal distribution, and the bootstrap distribution.

Under the conventional asymptotic theory, where the k × 1 vector Π is modeled as non-zero and fixed, the IV estimator θ̂n is asymptotically normally distributed as the sample size n gets large. In contrast, to explore the distribution of θ̂n when the instruments are only weakly related to the endogenous variable, Staiger and Stock (1997) develop weak instrument asymptotics, i.e. Π is modeled as local to zero.

Assumption 2. (a) Π = Π0 ≠ 0, and Π0 is fixed; (a') Π = Πn = C/√n, and C is fixed.

The asymptotics under Assumption 2(a) are called Strong Instrument Asymptotics, and the asymptotics under Assumption 2(a') are called Weak Instrument Asymptotics. (a) and (a') are two alternative rank conditions, and in (a') the rank condition is only weakly satisfied.

The following results and notations are used: Z'Z/n →p Qzz ≡ E(Zi'Zi); (Z'U/√n, Z'V/√n) ⇒ (Ψzu, Ψzv), where (Ψzu, Ψzv)' is distributed N(0, Σ ⊗ Qzz). The validity of these results follows from a law of large numbers and a central limit theorem, after assuming the existence of second moments. By a derivation similar to that in Staiger and Stock (1997), the following two well-known results hold.
Under Strong Instrument Asymptotics:

$$\sqrt{n}(\hat\theta_n - \theta) \Rightarrow (\Pi_0' Q_{zz} \Pi_0)^{-1} \Pi_0' \Psi_{zu} \sim N\!\left(0,\ (\Pi_0' Q_{zz} \Pi_0)^{-1}\sigma_u^2\right) \tag{1.6}$$

Under Weak Instrument Asymptotics:

$$\hat\theta_n - \theta \Rightarrow \left[(Q_{zz}C + \Psi_{zv})' Q_{zz}^{-1} (Q_{zz}C + \Psi_{zv})\right]^{-1} (Q_{zz}C + \Psi_{zv})' Q_{zz}^{-1} \Psi_{zu} \tag{1.7}$$

If k = 1, i.e. the model is exactly identified, then (1.7) reduces to:

$$\hat\theta_n - \theta \Rightarrow (Q_{zz}C + \Psi_{zv})^{-1} \Psi_{zu}$$

The conventional result of (1.6) indicates that when instruments are strong, the IV estimator θ̂n is both consistent and asymptotically normally distributed. In contrast, the result of (1.7) indicates that the IV estimator θ̂n is neither consistent nor asymptotically normally distributed if the rank condition is weak. As the magnitude of Π increases, however, the distribution of θ̂n in (1.7) gets closer to the normal distribution in (1.6). In the extreme case that C = √n Π0, (1.6) and (1.7) coincide.

Π, the vector of nuisance parameters, is thus the driving force of the linear IV regression model: it determines whether θ can be consistently estimated, and whether the estimator θ̂n can be well approximated by the normal distribution. As a function of Π, the concentration parameter µ² is a unit-less measure of the identification strength in the studies of weak instruments:

$$\mu^2 = \frac{\Pi' Z'Z \Pi}{\sigma_v^2}$$

The greater µ², the stronger the identification of the parameter θ, and the closer the distribution of θ̂n gets to the normal distribution, as shown in Rothenberg (1984). In addition, Stock and Yogo (2005) suggest that there is a threshold of the concentration parameter for the set of weak instruments, i.e. instruments as well as identification are considered weak if µ²/k is under the threshold. The first stage F test is suggested in Stock and Yogo (2005) to check whether the threshold is exceeded: if the F statistic is greater than the tabled critical values, typically around 10 for small k, then instruments as well as identification are not weak:

$$F = \frac{\hat\Pi_n' Z'Z \hat\Pi_n / k}{\hat\sigma_v^2}$$

where Π̂n = (Z'Z)^{-1} Z'X and σ̂v² = (X − Z Π̂n)'(X − Z Π̂n)/(n − k).

As an alternative to the limiting normal distribution, the bootstrap provides another way of approximating the distribution of θ̂n. For the linear IV regression model under homoscedasticity, the residual bootstrap is a commonly used bootstrap method; see, for example, Moreira et al. (2009). This bootstrap procedure, or algorithm, is described as follows.

1. Û, V̂ are the residuals induced by θ̂n, Π̂n in the linear IV model:

Û = Y − X θ̂n, V̂ = X − Z Π̂n

2. Re-center Û, V̂ to yield Ũ, Ṽ, by pre-multiplying the constant matrix Me, where Me = In − Pe, and e is the n by 1 vector of ones:

Ũ = Me Û, Ṽ = Me V̂

3. Sample the rows of (Ũ, Ṽ) and of Z independently n times with replacement, and let (U*, V*) and Z* denote the outcome. The dependent variables (X*, Y*) are generated by:

$$\begin{cases} Y^* = X^*\hat\theta_n + U^* \\ X^* = Z^*\hat\Pi_n + V^* \end{cases}$$

4. As the counterpart of the IV estimator θ̂n, the bootstrapped IV estimator θ̂n* is computed from the bootstrap resample (X*, Y*, Z*):

$$\hat\theta_n^* = (X^{*\prime} P_{Z^*} X^*)^{-1} X^{*\prime} P_{Z^*} Y^*$$

5. Re-do Steps 2-4 B times; {θ̂n*i, i = 1, ..., B} are then B i.i.d. bootstrapped estimators.
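As a minimal sketch of Steps 1-5 (assuming a single endogenous regressor, with y and x as length-n arrays and z as an n × k instrument matrix; the function names are illustrative, not the chapter's own programs):

```python
import numpy as np

def iv_estimate(y, x, z):
    """IV estimator (X'PzX)^{-1} X'PzY with a single endogenous regressor."""
    pz_x = z @ np.linalg.solve(z.T @ z, z.T @ x)   # Pz X
    return (pz_x @ y) / (pz_x @ x)

def residual_bootstrap(y, x, z, B=9999, rng=None):
    """Residual bootstrap of the IV estimator, following Steps 1-5."""
    rng = np.random.default_rng() if rng is None else rng
    n = y.shape[0]

    theta_hat = iv_estimate(y, x, z)
    pi_hat = np.linalg.solve(z.T @ z, z.T @ x)     # first-stage estimate
    u_hat = y - x * theta_hat                      # Step 1: residuals
    v_hat = x - z @ pi_hat
    u_tilde = u_hat - u_hat.mean()                 # Step 2: re-center (multiply by Me)
    v_tilde = v_hat - v_hat.mean()

    boot = np.empty(B)
    for b in range(B):                             # Steps 3-5
        rows_err = rng.integers(n, size=n)         # resample residual rows ...
        rows_z = rng.integers(n, size=n)           # ... and instrument rows independently
        z_star = z[rows_z]
        x_star = z_star @ pi_hat + v_tilde[rows_err]
        y_star = x_star * theta_hat + u_tilde[rows_err]
        boot[b] = iv_estimate(y_star, x_star, z_star)
    return theta_hat, boot
```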
The bootstrap data generation process (D.G.P.) above aims to mimic the D.G.P. of the linear IV regression model: when instruments are strong, the relation Π̂n = Π + Op(n^{-1/2}) indicates that Π̂n is not substantially different from Π; in addition, the variance of the bootstrap error term (ui*, vi*) converges to Σ, the variance of (ui, vi). Consequently, it is natural to expect that the mimicking process works well under strong instruments, and that the distributions of θ̂n* and θ̂n are alike. This conjecture on the bootstrapped estimator θ̂n* is confirmed by the result below.

Under Strong Instrument Asymptotics:

$$\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \Rightarrow (\Pi_0' Q_{zz} \Pi_0)^{-1} \Pi_0' \Psi_{zu} \tag{1.8}$$

The result of (1.8) motivates the usage of the bootstrap as a tool to detect the identification strength: under strong identification, the distribution of √n(θ̂n* − θ̂n) is asymptotically identical to the distribution of √n(θ̂n − θ), and the asymptotic distribution is normal; if the distribution of √n(θ̂n* − θ̂n) is found to be substantially different from normal, then this indicates that identification is weak.

The bootstrap mimicking process also helps explain why the bootstrap becomes problematic when instruments are weak: first, if Π is local to zero, then the relation Π̂n = Π + Op(n^{-1/2}) implies that the difference between Π̂n and Π becomes substantial, hence the identification strength in the bootstrap resample is substantially different from the identification strength in the sample; second, when θ̂n does not consistently estimate θ under weak instruments, the residual Û does not converge to the error term U, since Û = U − X(θ̂n − θ). Both facts contribute to the failure of the bootstrap D.G.P. to mimic the D.G.P. of the linear IV model under weak instruments, and consequently, the distributions of θ̂n* and θ̂n are different.

To compare the identification strength in the bootstrap resample (X*, Y*, Z*) with the identification strength in the sample (X, Y, Z), consider the concentration parameter µ²*, the bootstrap counterpart of µ²:

$$\mu^{2*} = \frac{\hat\Pi_n' Z^{*\prime} Z^* \hat\Pi_n}{\sigma_v^{2*}}, \quad \text{where } \sigma_v^{2*} = \frac{V^{*\prime}V^*}{n}$$

Under Strong Instrument Asymptotics:

$$\mu^2 \to \infty, \quad \text{and} \quad \mu^{2*} \to \infty \tag{1.9}$$

Under Weak Instrument Asymptotics:

$$\mu^2 \xrightarrow{\ p\ } \frac{C'Q_{zz}C}{\sigma_v^2}, \quad \text{and} \quad \mu^{2*} \Rightarrow \frac{C'Q_{zz}C + 2C'\Psi_{zv} + \Psi_{zv}'Q_{zz}^{-1}\Psi_{zv}}{\sigma_v^2} \tag{1.10}$$

The result of (1.9) states that when identification is strong, µ² and µ²* go to infinity, implying that the distributions of θ̂n and θ̂n* are both asymptotically normal, as stated in (1.6) and (1.8). On the contrary, the result of (1.10) states that when identification is weak, µ²* and µ² do not go to infinity and are asymptotically different, which implies that the distributions of θ̂n and θ̂n* are both asymptotically non-normal, and that their asymptotic distributions are not identical.

The asymptotic difference between µ² and µ²* is (2C'Ψzv + Ψzv'Q_{zz}^{-1}Ψzv)/σv²: under homoscedasticity, it has mean k. Table 1.1 reports the difference between µ² and µ²* when k = 1 by Monte Carlo studies: the relative difference is found to be substantial when µ² is small, and negligible when µ² is large; overall, µ²* is greater than µ². Another interpretation of (1.10) is that, loosely speaking, the F statistic of the bootstrap resample is above the F statistic of the sample by 1, because of the relation E(F) ≈ µ²/k + 1 in Stock et al. (2002).

The comparison of µ²* and µ² indicates a useful result in the linear IV model, where µ²/k and µ²*/k are measures of the identification strength: on average, the identification strength in the bootstrap resample is similar to, and slightly stronger (plus 1) than, the identification strength in the sample; consequently, if the identification strength in the bootstrap resample is weak, then the identification strength in the sample must also be weak. In this sense, the proposed bootstrap strategy for detecting weak identification is conservative.
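The comparison of µ² and µ²* in Table 1.1 can be made concrete in a few lines. The sketch below uses the same design as the table's notes (n = 1000, k = 1, standard normal instrument and error), but computes only a single residual-bootstrap draw of µ²*, whereas the table reports the average of such draws; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu2 = 1000, 10.0                            # one point of the Table 1.1 grid
z = rng.standard_normal(n)                     # single instrument, k = 1
v = rng.standard_normal(n)                     # sigma_v^2 = 1
pi = np.sqrt(mu2 / (z @ z))                    # Pi chosen so that Pi' Z'Z Pi / sigma_v^2 = mu2
x = z * pi + v

pi_hat = (z @ x) / (z @ z)                     # first-stage estimate
v_tilde = x - z * pi_hat
v_tilde -= v_tilde.mean()                      # re-centered residuals

# one residual-bootstrap counterpart mu^{2*}
rows_err, rows_z = rng.integers(n, size=n), rng.integers(n, size=n)
z_star, v_star = z[rows_z], v_tilde[rows_err]
mu2_star = pi_hat ** 2 * (z_star @ z_star) / ((v_star @ v_star) / n)
print(f"mu2 = {mu2:.1f},  one draw of mu2* = {mu2_star:.1f}")
```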
The Edgeworth expansion provides another look at the bootstrap strategy for detecting weak identification. For the purpose of this chapter, it suffices to consider the two-term expansion of the standardized estimator (θ̂n − θ)/(σ/√n), where σ² = (Π0' Qzz Π0)^{-1} σu². Define Ri = (Zi Xi, Zi Yi, vec(Zi'Zi)')' and µ = E(Ri) = (Qzz Π, Qzz Πθ, vec(Qzz)')'. Rewrite (θ̂n − θ)/(σ/√n) in the form √n A(R̄), where R̄ = (1/n) Σ_{i=1}^{n} Ri and A(µ) = 0. The following result is an application of the smooth function model and Theorem 2.2 in Hall (1992); a similar result is available in Moreira et al. (2009).

Theorem. Under Assumption 2(a), and assuming two conditions: (i) E(||Ri||³) < ∞, (ii) lim sup_{||t||→∞} |E exp(it'Ri)| < 1, the standardized estimator (θ̂n − θ)/(σ/√n) and its bootstrap counterpart admit two-term Edgeworth expansions uniformly in x:

$$P\!\left(\frac{\hat\theta_n - \theta}{\sigma/\sqrt{n}} \le x\right) = \Phi(x) + n^{-1/2}\,p(x)\,\phi(x) + o(n^{-1/2}) \tag{1.11}$$

$$P\!\left(\frac{\hat\theta_n^* - \hat\theta_n}{\hat\sigma/\sqrt{n}} \le x\right) = \Phi(x) + n^{-1/2}\,p^*(x)\,\phi(x) + o(n^{-1/2}) \tag{1.12}$$

where p(x) is a polynomial of degree 2, with coefficients depending on θ, Π, and moments of Ri up to order 3, and p*(x) is the bootstrap counterpart of p(x), with coefficients depending on θ̂n, Π̂n, and moments of Ri*.

Figure 1.2 is drawn based on the results of the Edgeworth expansion. The distance between the exact distribution of (θ̂n − θ)/(σ/√n) and the standard normal distribution has order O(n^{-1/2}), which is the same as the order of the distance between the bootstrap distribution of (θ̂n* − θ̂n)/(σ̂/√n) and the standard normal distribution. Compared with the normal distribution, the bootstrap distribution has the well known property of asymptotic refinement: it is closer to the exact distribution, as that distance has order O(n^{-1}), which results from p*(x) − p(x) = O(n^{-1/2}): θ̂n, Π̂n, and the moments of Ri* approach θ, Π, and the moments of Ri at rate n^{-1/2}.

As discussed above, the bootstrap strategy for detecting weak identification is to use the distance between the bootstrap distribution and the normal distribution as a proxy for the distance between the exact distribution and the normal distribution. This strategy is supported by the Edgeworth expansion in two ways: firstly, these two distances have the same order O(n^{-1/2}); secondly, the price of using the proxy is low, i.e. the proxy error is O(n^{-1}), because

$$\left[P\!\left(\frac{\hat\theta_n - \theta}{\sigma/\sqrt{n}} \le x\right) - \Phi(x)\right] - \left[P\!\left(\frac{\hat\theta_n^* - \hat\theta_n}{\hat\sigma/\sqrt{n}} \le x\right) - \Phi(x)\right] = O(n^{-1})$$

To summarize, the bootstrap distribution is a good proxy for the exact distribution for the purpose of this chapter. Under the null hypothesis of strong identification, the disparity between the exact distribution and the normal distribution is reflected by the disparity between the bootstrap distribution and the normal distribution. If the bootstrap distribution is found to be substantially different from normal, then it is appropriate to conclude that the exact distribution is substantially different from normal as well, and hence that the identification strength is weak.

So far, the discussion has been restricted to the linear IV model under the homoscedasticity assumption. However, it is well understood that this assumption is unlikely to hold in practice: for example, E(ui²|Zi) may not be constant, but may depend on Zi. As a result, it is common that empirical researchers need to take the existence of heteroscedasticity into consideration.
Once the homoscedasticity assumption is loosened, the validity of the popular F test for detecting weak instruments/identification is under doubt: the way in which Stock and Yogo (2005) derive the critical values of the F test crucially depends on the homoscedasticity assumption. Consequently, it is not clear whether these critical values can still be used when heteroscedasticity instead of homoscedasticity takes place. The validity of the bootstrap test approach, on the contrary, stays unaffected, when the pair bootstrap replaces the residual bootstrap. Freedman (1984) shows that the pair bootstrap of the IV estimator remains valid under heteroscedasticity: the IV estimator and its bootstrap counterpart asymptotically have the same normal distribution under strong instruments. The idea of the pair bootstrap is to directly resample the data. In the special case of k = 1, the pair bootstrap is to draw the bootstrap resample (X ∗ , Y ∗ , Z ∗ ) from the empirical distribution of (X, Y, Z). Compared with the residual bootstrap above, the pair bootstrap is non-parametric, and it preserves the possible heteroscedastic relations in the IV model, as proved in Freedman (1984). Consequently, if heteroscedasticity is concerned, the bootstrap test procedure involves two minor modifications: (i) use the pair bootstrap to compute θ̂n∗ ; (ii) use a heteroscedasticity-robust estimator for σ̂. 28 1.4.2 An Empirical Example: Card (1995) In this section, an empirical application is investigated to illustrate the bootstrap approach for detecting weak identification. The same data as in Card (1995) is used. By employing the IV approach, Card (1995) answers the following question: what is the return to education? Or specifically, how much more can an individual earn if he/she completes an extra year of schooling? The dataset is ultimately taken from the National Longitudinal Survey of Young Men between 1966-1981 with 3010 observations, and there are two variables in the dataset that measure college proximity: nearc2 and nearc4, both are dummy variables, and are 1 if there is a 2-year, 4-year college in the local area respectively. See Card (1995) for the detailed description of the data. To identify the return to education, Card (1995) considers a structural wage equation as follows: lwage = α + θedu + W ′ β + u where lwage is the log of wage, edu is the years of schooling, the covariate vector W contains the control variables, and u is the error term. Among the set of parameters (α, θ, β ′ ), θ measuring the return to education is of interest. In the basic specification, Card (1995) uses five control variables: experience, the square of experience, the dummy for race, the dummy for living in the south, and the dummy for living in the standard metropolitan statistical area (SMSA). To bypass the issue that experience is also endogenous, Davidson and MacKinnon (2010) replace experience, the square of experience with age, the square of age. Following Davidson and MacKinnon (2010), I used age, square of age, and the three dummy variables as control variables, hence edu is the only endogenous regressor. In 29 addition, an extra instrument nearc2 ∧ nearc4 is constructed: nearc2 ∧ nearc4 is a dummy variable, and is 1 if there are both a 2-year and a 4-year college in the local area. 
Unlike Davidson and MacKinnon (2010), I use the three instruments, nearc2, nearc2 ∧ nearc4, nearc4, one by one as the single instrument for edu to better illustrate the approach discussed in this chapter, while Davidson and MacKinnon (2010) simultaneously use more than one instrument. The identification strength under the three potential IV’s is examined by the first stage F test in Stock and Yogo (2005): if nearc2 is used as the IV, F ≈ 0.54; if nearc2 ∧ nearc4 is used as the IV, F ≈ 6.98; if nearc4 is used as the IV, F ≈ 10.22. According to the rule of thumb F > 10 suggested in Stock and Yogo (2005), these F statistics suggest that nearc4 is a strong IV, nearc2, nearc2 ∧ nearc4 are not. Table 1.2 reports the point estimate and 95% confidence interval of θ under each of these instruments. If the point estimate of the return to education is of interest, 0.0936 derived by nearc4 is more reliable, compared with the other point estimates. Based on these empirical results, an additional year of education increases the wage by about 9.36%; however, the possibility that this effect is zero can not be rejected at 95%. As proposed in this chapter, the bootstrap can help evaluate the identification strength. To allow for heteroscedasticity, the pair bootstrap is employed, and the IV estimator of θ is computed B = 9999 times, using the three instruments one by one: specifically, the bootstrap resample is directly drawn from the sample with replacement, and the size of the resample is equal to the number of observations; the bootstrapped IV estimator is computed by each bootstrap resample, and this process is replicated B times. To make it comparable to the standard normal variate, the bootstrapped estimator is standardized by subtracting the IV estimate and dividing by the standard error of the IV estimator, where the standard error is computed in 30 the way of White (1980). The p.d.f and quantiles of the bootstrapped IV estimator after standardization are plotted against the standard normal variate in Figure 1.3. Figure 1.3 shows that, the bootstrap distribution is closer to the normal distribution, when the instrument is stronger. The figure of the bootstrap distribution and the Q-Q plot7 hence are useful tools to help detect weak identification, since they provide empirical researchers a graphic evaluation of the identification strength. Table 1.2 reports the 95% C.I. of the return to education by the bootstrap percentile method, in addition, the identification-robust conditional likelihood ratio (CLR)8 test by Moreira (2003) is also applied to construct a C.I. for comparison. Instead of the F test, the proposed b test is applied to determine whether the strength of identification is strong or weak, with the tentative threshold γ = 0.25. nearc2: the bootstrap C.I. of the return to education is (−4.3279, 4.8868), while the C.I. by t/Wald is (−0.8188, 1.8346); the relative difference is significantly larger than the threshold with D̂ ≈ 2.46 > γ, b1 ≈ 12.09 > z95% , hence the null of strong identification under nearc2 is rejected. nearc2 ∧ nearc4: the bootstrap C.I. of the return to education is found to be (0.0019, 0.4664), while the C.I. by t/Wald is (−0.0099, 0.2692); the relative difference is significantly larger than the threshold with D̂ ≈ 0.66 > γ, b1 ≈ 6.36 > z95% , hence the null of strong identification under nearc2 ∧ nearc4 is rejected. nearc4: the bootstrap C.I. of the return to education is found to be (0.0034, 0.2579), while the C.I. 
by t/Wald is (−0.0027, 0.1899); the relative difference is significantly larger than the threshold with D̂ ≈ 0.32 > γ, b1 ≈ 2.71 > z95%, hence the null of strong identification under nearc4 is rejected.

[7] Compared to the p.d.f., the Q-Q plot is known to be a better way of comparing distributions.
[8] The IV model under consideration is just identified, hence CLR is equivalent to the AR test in Anderson and Rubin (1949) and the K test in Kleibergen (2002).

To conclude, both the F test and the bootstrap based b test detect the weak instruments nearc2 and nearc2 ∧ nearc4, and support the view that nearc4 is the strongest among the three potential instruments. The difference is that the b test considers nearc4 as weak, while the F test treats nearc4 as strong (but just across the threshold). It is surprising that, although F > 10 holds under nearc4, the relative difference between the C.I. by t/Wald and the bootstrap C.I. is as large as 0.32. This indicates that the F > 10 rule is not strict enough, i.e. the disparity between the two C.I.'s is still severe, although F > 10 holds in this example.

The b and F tests are proposed on different grounds, i.e. the b test is based on the comparison of the lengths of confidence intervals, while the F test is based on a threshold for the rank condition, hence it is not surprising that these two tests can imply different outcomes. In contrast with the F test, the b test has its advantages: first of all, it provides a graphic view of the identification strength, i.e. the Q-Q plot in Figure 1.3 shows that the bootstrap distribution is not very close to the normal distribution, hence the identification strength appears weak; secondly, unlike the F test, the b test does not rely on the restrictive homoscedasticity assumption, and has the potential of being extended to the general GMM framework.

1.5 Simulation

This section presents Monte Carlo results to evaluate the power of the proposed b test. The disparity between the two intervals, (1.3) and (1.4), is reported. The threshold γ is calibrated to the F > 10 rule.

The linear IV model described in Section 1.4.1 is employed in the D.G.P., with the following choice of parameters:

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim NID\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$

where ρ ∈ {0.99, 0.50, 0.01} to introduce high, moderate and low degrees of endogeneity, respectively; θ = 0; zi ∼ NID(0, 1), i = 1, ..., n, and n = 1000. A sequence of µ², µ² ∈ {2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 40, 60, 80}, is chosen by assigning different values to Π. For each µ², the data (X, Y, Z) are generated by the linear IV model with the parameters specified above, and the bootstrapped IV estimator θ̂n* is computed B = 9999 times by the residual bootstrap. The number of replications equals 1000.

The results of the Monte Carlo studies are reported in Table 1.3.

C0 vs. Ct: C0 denotes the interval derived by taking the 2.5% and 97.5% quantiles of θ̂n. Ct is the 95% C.I. derived by inverting the t test. Table 1.3 reports the relative difference in the lengths of these two intervals: the median absolute difference in the lengths of C0 and Ct, weighted by the length of Ct, is reported. The departure of Ct from C0 is more severe for larger ρ, and as expected, the departure of Ct from C0 shrinks as µ² increases. It is found that the relative difference can be as high as 0.62 when µ² = 10.
This is a bit surprising since the identification strength under µ2 = 10 is generally not considered as very weak: as reported in the table, there is about half chance that the F test will consider the identification under µ2 = 10 as strong. In other words, although the F test and its decision rule may treat the identification strength under µ2 = 10 as strong, the departure of Ct from C0 can still be severe. Cb vs. Ct : the 95% C.I. Cb by the bootstrap is compared with Ct . Table 1.3 reports the median absolute value of the relative difference in the lengths of Cb , Ct . 33 As expected, the difference shrinks as µ2 increases. In particular, when µ2 > 10, the relative difference does not exceed 0.50 in absolute value; when µ2 ≥ 20, the relative difference does not exceed 0.25 in absolute value; when µ2 ≥ 80, the difference is negligible. b test: the proposed b test is applied to test the null of strong identification, and the percentage of rejecting strong identification is reported. Two tentative cutoffs, γ = 0.25, 0.50 are considered. Under the stricter rule of γ = 0.25, the b test rejects the null more often. F test: the F test and its decision rule in Stock and Yogo (2005) are applied to provide a benchmark for the b test. Table 1.3 reports the percentage of concluding weak identification by the F test (the frequency of the F statistic is less than its critical value) for different ρ’s. As discussed above, the F test examines whether µ2 /k exceeds the cutoff, hence it does not depend on ρ, the degree of endogeneity. By comparing the performance of the b test with the F test, γ = 0.25 appears to be a reasonable choice. It corresponds to µ2 around 10 with endogeneity close to zero. With this threshold, the frequency of concluding weak identification by the b test is comparable to the F test, although b rejects strong identification slightly more often. This chapter hence suggests γ = 0.25 as the quantitative threshold for distinguishing strong and weak identification, and this simple rule is not off the mark for practical reasons. 34 1.6 Conclusion This chapter suggests that the bootstrap is a useful tool for detecting weak identification in IV/GMM applications. The distinguishing feature of the bootstrap based approach is that it provides a graphic view of the identification strength. By eyeballing the graph of the bootstrap distribution, and comparing it with the normal distribution through the Q-Q plot, empirical researchers can evaluate whether or not weak identification exists. The underlying reason is simple: strong identification implies the bootstrap distribution is close to, and asymptotically identical to the normal distribution. A quantitative threshold for distinguishing strong and weak identification is suggested based on the comparison of two C.I.’s for the parameter of interest: the C.I. by inverting the t/Wald test and the bootstrap percentile C.I.. The difference of these two C.I.’s boils down to the difference between the bootstrap distribution and the normal distribution, and exceeding the threshold implies that the relative difference of the two C.I.’s is at least as large as a quarter. For practical purposes, the identification strength is considered as weak in this chapter once this threshold is exceeded. Monte Carlo experiments show that this threshold is comparable to and slightly stricter than the F > 10 rule of thumb in Stock and Yogo (2005). Even in the i.i.d. 
and homoscedastic setting, F > 10 is found not to be strict enough when it comes to the comparison of C.I.’s: the relative difference of the two commonly used C.I.’s named above can be very large, even when F > 10 holds; an application to Card (1995) also makes the same point.

Figure 1.1: Three Related Distributions
[Diagram relating the three distributions: the normal distribution, the exact distribution and the bootstrap distribution.]

Figure 1.2: Expressions and Distances
[Diagram of the approximation relations among the three distributions.]
Notes: The normal distribution and the bootstrap distribution are two approximations to the exact distribution of IV/GMM estimators. If the exact distribution of (θ̂n − θ)/(σ/√n) is well approximated by the normal distribution N(0, 1), then the bootstrap distribution of (θ̂n∗ − θ̂n)/(σ̂/√n) is well approximated by N(0, 1) as well, because of the same magnitude O(n−1/2) of the approximation error. A substantial difference between (θ̂n∗ − θ̂n)/(σ̂/√n) and N(0, 1) indicates that the difference between (θ̂n − θ)/(σ/√n) and N(0, 1) is also substantial, hence provides evidence of weak identification.

Table 1.1: A Monte Carlo study of µ2 and µ2∗
µ2 :  2    4    6    8    10    12    14    16    18    20    40    60    80
µ2∗:  3.1  5.1  7.1  9.1  11.2  13.2  15.2  17.2  19.2  21.2  41.4  61.5  81.6
Notes: This table compares the concentration parameter µ2 with µ2∗, the bootstrap counterpart of µ2, by a Monte Carlo study. For each µ2, the data of X, Z, V are generated by xi = ziΠ + vi, where: (i) xi, zi, vi are the ith elements of X, Z, V; (ii) zi ∼ NID(0, 1), vi ∼ NID(0, 1), i = 1, ..., 1000; (iii) Π is determined by the value of µ2. The reported µ2∗ is the sample average of 1000 replications; in each replication, the residual bootstrap is conducted 1000 times.

Table 1.2: Return to Education
IV                F statistic   θ̂n       95% C.I. by t/Wald     by bootstrap          by AR/CLR/K
nearc2            0.54          0.5079   (−0.8188, 1.8346)      (−4.3279, 4.8868)     (−∞, −0.1750] ∪ [0.0867, +∞)
nearc2 ∧ nearc4   6.98          0.1297   (−0.0099, 0.2692)      (0.0019, 0.4664)      [0.0133, 0.5253]
nearc4            10.22         0.0936   (−0.0027, 0.1899)      (0.0034, 0.2579)      [0.0009, 0.2550]
Notes: This table presents the estimate θ̂n and confidence interval for the return to education using the data of Card (1995). The first stage F statistic is reported for the three instrumental variables, nearc2, nearc2 ∧ nearc4, nearc4, which are used one by one for the endogenous years of schooling. The included control variables are age, square of age, and three dummy variables for race, living in the south, and living in the SMSA.

Figure 1.3: The bootstrap distribution, p.d.f. and Q-Q plot
[Three rows of panels, one per instrument; each row shows the p.d.f. of the standardized bootstrapped IV estimator (left) and its Q-Q plot against standard normal quantiles (right).]
Notes: The p.d.f. and Q-Q plot of the bootstrap distribution are presented, under three instrumental variables in the application of Card (1995). 1st row: nearc2 as IV; 2nd row: nearc2 ∧ nearc4 as IV; 3rd row: nearc4 as IV. Left: p.d.f. of the bootstrapped IV estimator after standardization (dotted) against the standard normal (solid); Right: the Q-Q plot. 9999 bootstrap replications are conducted.
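For readers who wish to reproduce the type of output shown in Figure 1.3 and Table 1.2, the following Python sketch outlines the pair bootstrap for a just-identified IV estimator; it is only a minimal illustration under the simplifying assumption that the included controls of Card (1995) have already been partialled out of y, x and z, and the names are mine rather than those of the original code.

import numpy as np

def pair_bootstrap_iv(y, x, z, B=9999, rng=None):
    rng = rng or np.random.default_rng(0)
    n = len(y)
    theta = (z @ y) / (z @ x)                               # just-identified IV estimate
    u = y - x * theta
    se = np.sqrt(np.sum((z * u) ** 2)) / np.abs(z @ x)      # White (1980)-type standard error
    boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)                         # draw (y, x, z) rows with replacement
        boot[b] = (z[idx] @ y[idx]) / (z[idx] @ x[idx])
    std_boot = (boot - theta) / se                          # standardized draws for the p.d.f. and Q-Q plot
    Cb = tuple(np.quantile(boot, [0.025, 0.975]))           # bootstrap percentile C.I.
    Ct = (theta - 1.96 * se, theta + 1.96 * se)             # t/Wald C.I.
    D_hat = (Cb[1] - Cb[0]) / (Ct[1] - Ct[0]) - 1           # relative difference used by the b test
    return theta, se, std_boot, Cb, Ct, D_hat

The standardized draws std_boot are what the Q-Q panels in Figure 1.3 compare with standard normal quantiles, and D_hat corresponds, up to rounding, to the values 2.46, 0.66 and 0.32 reported in the Card (1995) application.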
38 Table 1.3: The performance of b, F for detecting weak identification 2 4 6 8 10 12 ρ = 0.99 5.24 1.64 1.37 0.80 0.62 0.60 0.50 2.62 1.10 0.83 0.62 0.41 0.01 1.90 0.95 0.53 0.39 ρ = 0.99 1.87 1.61 1.10 0.50 1.56 1.06 0.01 1.45 ρ = 0.99 µ2 14 16 18 20 40 60 80 0.50 0.41 0.38 0.37 0.22 0.18 0.14 0.38 0.33 0.29 0.29 0.26 0.15 0.13 0.11 0.32 0.27 0.23 0.21 0.21 0.19 0.12 0.10 0.08 0.86 0.61 0.45 0.37 0.33 0.27 0.23 0.11 0.07 0.05 0.72 0.47 0.37 0.29 0.22 0.19 0.16 0.14 0.07 0.05 0.03 0.97 0.61 0.40 0.29 0.22 0.18 0.15 0.13 0.12 0.05 0.04 0.03 90.1 84.3 75.9 65.4 53.9 40.3 30.4 22.9 14.4 10.1 0.1 0.0 0.0 0.50 83.3 69.8 57.7 43.5 34.0 24.3 16.9 13.6 9.0 5.7 0.0 0.0 0.0 0.01 81.4 68.2 52.3 38.3 27.4 18.2 12.3 7.7 4.7 3.0 0.0 0.0 0.0 ρ = 0.99 96.4 96.0 94.4 89.7 81.7 73.3 63.1 56.6 44.7 35.5 1.6 0.0 0.0 0.50 93.6 87.5 82.6 67.6 60.3 49.5 37.9 29.0 23.8 17.9 0.6 0.0 0.0 0.01 92.2 87.1 77.2 62.2 50.2 39.2 29.5 22.6 15.7 11.5 0.2 0.0 0.0 ρ = 0.99 94.1 82.7 70.5 53.0 43.3 32.1 22.6 16.1 12.5 7.9 0.0 0.0 0.0 0.50 93.9 82.1 66.3 56.9 42.9 32.7 22.4 15.7 12.5 8.2 0.0 0.0 0.0 0.01 93.4 83.7 69.4 54.9 45.7 33.3 22.3 15.2 11.6 7.4 0.0 0.0 0.0 Comparison of C.I. C0 vs. Ct : Cb vs. Ct : Rejection freq. of b γ = 0.50: γ = 0.25: Rejection freq. of F Notes: This table presents the Monte Carlo results of comparing the bootstrap based b test proposed in this chapter with the F test of Stock and Yogo (2005). The frequencies of concluding weak identification by the b test is reported for each µ2 , and the frequencies of concluding weak identification by the F test is also reported. µ2 is the concentration parameter, and the greater µ2 is, the stronger the strength of identification; ρ is the correlation coefficient, and the greater ρ is, the stronger the endogeneity; C0 , Ct and Cb are the C.I. derived by taking the 2.5%, 97.5% quantiles of θ̂n , the 95% C.I. by inverting t, the 95% C.I. by the bootstrap respectively, and their relative difference in length (the median absolute value) is reported; γ is the threshold, identification is considered as weak by the b test if the relative difference between Ct and Cb exceeds γ. 39 Appendix Proof. (1.5) By the joint normal distribution of quantile estimators, the difference of two quantile estimators is also normally distributed: √ [( B ∗ ∗ − θ̂n θ̂n,1−α/2 − θ̂n θ̂n,α/2 √ √ − σ̂/ n σ̂/ n ) ] − (q1−α/2 − qα/2 ) ⇒ N (0, Ω11 + Ω22 − 2Ω12 ) Rewrite the LHS: √ [( B ∗ ∗ θ̂n,1−α/2 − θ̂n,α/2 √ −1 2z1−α/2 σ̂/ n ) ( − )] q1−α/2 − qα/2 − 1 · 2z1−α/2 ⇒ N (0, Ω11 + Ω22 − 2Ω12 ) 2z1−α/2 The result follows after the substitution of D̂ = ∗ ∗ θ̂n,1−α/2 −θ̂n,α/2 √ 2z1−α/2 σ̂/ n − 1, D = Proof. (1.6) X ′ Pz X n = = X ′ Z(Z ′ Z)−1 Z ′ X n Z ′ Z ′ −1 ′ X ′ Z(Z ′ Z)−1 (Z Z) Z X n p → Π′0 Qzz Π0 X ′ Pz U √ n = Z ′U X ′ Z(Z ′ Z)−1 √ n ⇒ Π′0 Ψzu √ n(θ̂n − θ) = ( X ′ Pz X −1 X ′ Pz U √ ) n n ⇒ (Π′0 Qzz Π0 )−1 Π′0 Ψzu q1−α/2 −qα/2 2z1−α/2 − 1. 40 Proof. (1.7) X ′ Pz X = = X ′ Z Z ′ Z −1 Z ′ X √ ( ) √ n n n ′ ′ ZZ Z V Z ′ Z −1 Z ′ Z Z ′V ( C + √ )′ ( ) ( C+ √ ) n n n n n ⇒ (Qzz C + Ψzv )′ Q−1 zz (Qzz C + Ψzv ) X ′ Pz U = X ′ Z Z ′ Z −1 Z ′ U √ ( ) √ n n n ⇒ (Qzz C + Ψzv )′ Q−1 zz Ψzu θ̂n − θ = (X ′ Pz X)−1 X ′ Pz U −1 ′ −1 ⇒ [(Qzz C + Ψzv )′ Q−1 zz (Qzz C + Ψzv )] (Qzz C + Ψzv ) Qzz Ψzu If exactly identified : θ̂n − θ = (Z ′ X)−1 Z ′ U ⇒ (Qzz C + Ψzv )−1 Ψzu Proof. 
(1.8) Π̂∗n ≡ ′ ′ (Z ∗ Z ∗ )−1 Z ∗ X ∗ ′ = = ′ Z ∗ Z ∗ −1 Z ∗ V ∗ ) n n ′ ′ ′ ′ Z ∗ Z ∗ −1 Z ∗ V ∗ Z Z −1 Z V ) +( ) Π0 + ( n n n n Π̂n + ( p → Π0 ′ Z∗ U ∗ √ n ⇒ Ψzu by Lyapunov’s Central Limit Theorem: 41 ′ Let mi = Zi∗ u∗i , rewrite ′ ∗ U∗ Z√ n ∑n = ′ Z ∗ u∗ i=1 √ i i n pendent with mean µi = 0, variance σi2 = = ∑n mi i=1 √ . n Z ′ Z U˜′ Ũ n n , By construction, m′i s are indep and σi2 → Qzz σu2 under Strong Instru- ment Asymptotics. To verify the Lyapunov’s condition: for 1 > δ > 0, the expected values ∑n Op (n) 2+δ ] = lim E[|mi |2+δ ] < ∞, and limn→∞ ∑n 1 2+δ n→∞ O (n+ nδ ) = 0. i=1 E[|mi − µi | ( i=1 σi2 ) p 2 2 Combining the two results above: √ [ n(θ̂n∗ − θ̂n ) ∗′ = ∗ −1 Z X Z (Z Z ) [ = ∗′ ∗ ( ′ Π̂∗n ′ ′ Z∗ Z∗ n n ]−1 ) Π̂∗n ′ ∗′ Z ∗ ⇒ (Π0 Qzz Π0 )−1 Π0 Ψzu ]−1 ∗′ ∗ −1 (Z Z ) ∗′ ∗ ∗Z U Π̂n √ n ∗′ Z X ∗ ′ Z∗ U ∗ ′ ′ X ∗ Z ∗ (Z ∗ Z ∗ )−1 √ n 42 Proof. (1.9) µ2 = Π′ Z ′ ZΠ σv2 = Op (n) → ∞ ′ p p Π̂n → Π, σv2∗ → σv2 , ′ 2∗ µ Z∗ Z∗ p → Qzz n ′ = Π̂n Z ∗ Z ∗ Π̂n σv2∗ = Op (n) → ∞ Proof. (1.10) ′ µ2 = p → C ′ ZnZ C σv2 ′ C Qzz C σv2 ′ 2∗ µ = ′ ′ [C + ( ZnZ )−1 Z√nV ] ′ Z∗ Z∗ n [C 2∗ σv ′ ⇒ ′ ′ + ( ZnZ )−1 Z√nV ] C ′ Qzz C + 2C ′ Ψzv + Ψzv Q−1 zz Ψzv 2 σv CHAPTER Two Weak Identification in (C)CAPM 43 44 2.1 Introduction The Fama-MacBeth (FM) procedure analyzed by Shanken (1992) is widely used in empirical studies of the (C)CAPM. This two-pass procedure involves a first stage time series regression to derive β, the correlation matrix of financial assets with risk factors, and a second stage cross-sectional regression using β estimated from the first stage as regressors. The R2 at the second stage is typically reported by empirical researchers to illustrate how much cross-sectional variation of asset returns can be explained by their proposed factors, and it often increases dramatically when a new factor is added: see, e.g., Jagannathan and Wang (1996), Lettau and Ludvigson (2001), Acharya and Pedersen (2005), Lustig and Van Nieuwerburgh (2005), Li et al. (2006), Santos and Veronesi (2006), Hansen et al. (2008). Although it is common to evaluate how well models fit real data by looking at the R2 , what is surprising is that the empirical (C)CAPM literature heavily depends on the usage of the R2 for model comparison. The various versions of the (C)CAPM with new risk factors are favored, at least partially due to their large R2 . This chapter shows that the R2 of the FM two-pass procedure can be large even when risk factors are irrelevant, hence a large R2 does not imply risk factors are relevant. The underlying reason is, when factors are irrelevant, the β matrix at the first stage does not have full rank, which further induces a spurious distribution of the R2 at the second stage. I call this problem weak identification, because it displays the similar feature as the weak instrument problem surveyed by Stock et al. (2002): the validity of the FM procedure crucially depends on the quality of regressors generated from its first stage; if irrelevant factors are included, the rank condition of the second stage is violated, hence the outcome of the FM procedure is no longer reliable. 45 I start by taking the influential work by Lettau and Ludvigson (2001) for example to illustrate the main point of this chapter. Lettau and Ludvigson (2001) report larger values of R2 after adding a conditioning variable, the log consumptionwealth ratio cay, as a new factor. 
I consider four specifications of the (C)CAPM in Lettau and Ludvigson (2001), and replace cay with caysim , where caysim is randomly drawn from a normal distribution; the other settings remain unchanged. Table 2.1 presents the R2 under cay and caysim , and it shows that the R2 under caysim is comparable to the R2 under cay. In other words, the irrelevant risk factor caysim can also dramatically improve the R2 . In this chapter, I explain why the result in Table 2.1 could happen, by deriving the distribution of the R2 under irrelevant factors; in addition, I use the rank test of Kleibergen and Paap (2006) as a tool to investigate whether irrelevant risk factors commonly exist in empirical studies of the (C)CAPM. Although the formal analysis of the R2 is rare, the inadequacy of the FM twopass procedure has been examined in several aspects. Kan and Zhang (1999) study a single irrelevant factor model, and provide theoretical results and simulation evidence to show the existence of bias in the second stage statistical tests and overflation in the cross-sectional R2 . Employing techniques from the weak identification literature, Kleibergen (2009) shows that the statistical inference based on the FM two-pass estimator is misleading when β is equal or close to zero. Lewellen et al. (2010) find that the R2 of the FM two-pass procedure is not informative and offer suggestions to improve its performance, but their main point is not on the magnitude of β. This article extends this discussion by deriving the distribution of the R2 of a linear multifactor pricing model, and the findings cast doubt on the success of the recent conditional versions of the (C)CAPM: an increase in the R2 could be a byproduct of introducing irrelevant risk factors, hence are not strong evidence to support the conditional (C)CAPM. 46 I evaluate the performances of some recent versions of the (C)CAPM, including the following: Lettau and Ludvigson (2001), Lustig and Van Nieuwerburgh (2005), Yogo (2006), Santos and Veronesi (2006), Li et al. (2006). Previous studies indicate that risk factors proposed in these papers are successful in explaining the crosssectional variation in asset returns. However, I can not rule out these factors are in fact irrelevant. The rest of the chapter is organized as follows: the linear factor model is set up in Section 2; a rank test is suggested in Section 3; the distribution of the crosssectional R2 based on the FM two-pass procedure is derived in Section 4; in section 5, the performances of several versions of the (C)CAPM are evaluated; Section 6 d concludes. The following notations are used throughout this chapter: “ →” indicates p convergence in distribution, and “ →” indicates convergence in probability. For a T × n matrix A = (a1 , ..., an ), PA = A(A′ A)−1 A′ , MA = IT − PA , IT is the T × T identity matrix, ιT is the T ×1 vector of ones, vec(A) = (a′1 , ..., a′n )′ , vecinvn ((a′1 , ..., a′n )′ ) = A, and ⊗ is the Kronecker operator. 2.2 2.2.1 Model and Fama-MacBeth Linear Factor Model The linear factor model is constructed by three equations: E(Rt ) = ιn λ1 + βλF cov(Rt , Ft ) = βvar(Ft ) E(Ft ) = µF (2.1) (2.2) (2.3) 47 where excess asset returns Rt , market risk factors Ft , are n by 1, k by 1 vectors respectively. Equation (2.1) reflects Sharpe (1964) and Lintner (1965)’s view: in equilibrium average returns should be priced by the risk measure β, plus the risk free return λ1 . Equation (2.2) defines β as the n by k correlation matrix of asset returns and risk factors. 
See Cochrane (2001) for a book-length discussion. The unconditional version of CAPM considers the market return as the single factor, hence k = 1; in the unconditional CCAPM, the single factor is nondurable consumption growth. Fama and French (1992)(1993) show that a single factor model could only explain a small fraction of the total cross-sectional variation. The failings of the unconditional (C)CAPM have induced researchers to introduce new risk factors. For instance, a conditional version of the (C)CAPM adds at least a conditioning variable to Ft , hence k ≥ 2. There exists a sizeable empirical literature on the (C)CAPM, suggesting various factors can successfully explain the cross-sectional variation of asset returns, and these factors include: the consumption-to-wealth ratio in Lettau and Ludvigson (2001), the labor income-to-consumption ratio in Santos and Veronesi (2006), the housing-collateral ratio in Lustig and Van Nieuwerburgh (2005), the investment growth rate in Li et al. (2006), the growth in durable consumption in Yogo (2006), the consumption risk in Parker and Julliard (2005), the housing expenditure in Piazzesi et al. (2007), etc. Equations (2.1)(2.2)(2.3) imply the following model: Rt = ιn λ1 + βλF + βvt + ut = ιn λ1 + β(F̄t + λF ) + ϵt Ft = µF + vt 48 where ut , vt are n×1, k×1 vectors of errors, F̄t = Ft − T1 ∑T t=1 Ft , ϵt = ut +β T1 ∑T t=1 vt , and ut , vt are uncorrelated because of Equation (2.2). The data observed by empirical researchers are Rt , Ft , t = 1, ..., T . To make expressions neat and facilitate the econometric analysis of the R2 , define matrices R, F , and a vector R̄: Rn×T = (R1 , R2 , ..., RT ) Fk×T = (F1 , F2 , ..., FT ) T 1∑ Rt R̄n×1 = T t=1 2.2.2 Fama-MacBeth The commonly used FM two-pass procedure in empirical studies of the (C)CAPM is as follows: (i) at the first stage, estimate β in a time series regression, i.e. regressing Rt on Ft with intercepts; (ii) estimate the risk premium λF using β̂ in a second stage cross-sectional regression, i.e. regressing R̄ on β̂ with intercepts. Shanken (1992) provides a detailed econometric analysis of this approach. The expressions of β̂, λ̂F , and the R2 of the FM two-pass procedure are given below, where MιT , Mιn are two projection matrices of constants (see Appendix A): β̂ = RMιT F ′ (F MιT F ′ )−1 λ̂F = (β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R2 = R̄′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R̄′ Mιn R̄ (2.4) (2.5) (2.6) Given the setup of the model, a large value of the cross-sectional R2 is expected 49 in empirical studies of the (C)CAPM. This is because the R2 converges to 1 in large samples (see Appendix B), when the risk factors chosen by empirical researchers coincide with the factors in the above model. Consequently, a small value of the R2 suggests that the chosen risk factors do not coincide with the factors in the model, hence are not useful in explaining the variation of asset returns. 2.3 A Rank Test Note that the validity of the FM two-pass procedure relies on the full rank of the matrix β: in the second stage regression, the n×k matrix β is the matrix of regressors. If the rank of β is less than k, the rank condition of the ordinary least squares estimator is violated, and the validity of the FM two-pass procedure demonstrated by Shanken (1992) collapses. When only a single factor is considered, k = 1, then the rank condition fails if this single factor is irrelevant, which is unlikely to happen. 
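As a computational summary of equations (2.4)–(2.6), the following Python sketch implements the two-pass procedure for an n × T matrix of returns R and a k × T matrix of factors F; it is a minimal illustration with my own variable names, not the code used for the empirical results in this chapter.

import numpy as np

def fama_macbeth(R, F):
    """Two-pass estimates: beta (n x k), lambda_F (k,), and the cross-sectional OLS R^2."""
    n, T = R.shape
    # first pass: time-series regression of each return on the factors, with intercepts
    Fd = F - F.mean(axis=1, keepdims=True)            # F * M_{iota_T}
    Rd = R - R.mean(axis=1, keepdims=True)
    beta = Rd @ Fd.T @ np.linalg.inv(Fd @ Fd.T)       # equation (2.4)
    # second pass: cross-sectional regression of average returns on beta, with an intercept
    Rbar = R.mean(axis=1)
    X = np.column_stack([np.ones(n), beta])
    coef, *_ = np.linalg.lstsq(X, Rbar, rcond=None)
    lam_F = coef[1:]                                  # slope part of equation (2.5)
    resid = Rbar - X @ coef
    R2 = 1 - resid @ resid / np.sum((Rbar - Rbar.mean()) ** 2)   # equation (2.6)
    return beta, lam_F, R2

The second pass is an ordinary least squares regression of R̄ on the generated regressors ιn and β̂, so a (near) rank deficiency in β translates directly into an ill-behaved second pass.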
However, as the empirical literature on the (C)CAPM starts to propose various multifactor models, k ≥ 2, there is a larger chance that β does not have full rank. For example, if one of the k factors is irrelevant, then a column of β is zero, hence its rank is reduced; more generally, if one column of β can be written as a linear combination of the other columns, then the rank condition fails to hold. There exist several rank tests to examine the rank condition, given a consistent estimator β̂ of the matrix β is available by the first-pass of the FM procedure: see Anderson (1951), Cragg and Donald (1996), etc. The rank test of Kleibergen and Paap (2006) is used in this chapter as this novel test overcomes some 50 deficiencies of other tests: it is robust to heteroscedasticity, while homoscedasticity is assumed in Anderson (1951); in addition, it is easier for implementation, while the rank test of Cragg and Donald (1996) involves numerical optimization. If the rank test of Kleibergen and Paap (2006) suggests that β does not have full rank, it indicates the violation of the rank condition, and further casts doubt on the validity of the risk premium estimator and the cross-sectional R2 from the second-pass of the FM procedure. Specifically, if an irrelevant factor is introduced by empirical researchers, then the rank of β in the empirical model is not full, hence the FM two-pass procedure is no longer valid; the rank test of Kleibergen and Paap (2006) helps determine whether this is the case. Despite of its importance, the necessity of a rank test has not been recognized in empirical studies of the (C)CAPM, to the best of my knowledge. I now use the rank test of Kleibergen and Paap (2006) to show it is helpful for excluding irrelevant risk factors. Take a specification of the conditional CAPM from Lettau and Ludvigson (2001) for example: the log consumption-wealth ratio cay, the value weighted return Rvw , and the interaction term cay · Rvw are the three factors. The estimate of β, the associated t statistic and p value using 25 size and book-to-market sorted portfolios are presented in Table 2.2. A feature observed from Table 2.2 that has motivated this chapter is that, Rvw is significantly related to all of the 25 portfolio returns, while cay is insignificant at 5% except for 2 portfolios only. The small magnitude of the estimate of β for cay induces the suspicion that the column of β corresponding to cay is zero, hence β does not have full rank. The result of the Kleibergen and Paap (2006) rank test is in line with Table 2.2: it tests the null that the rank of the β matrix (its dimension is 25 by 3 in this example) is 2, and reports a p value around 51 0.77. The large p value implies the failure of rejecting the null at 5% significance level, indicating that the rank condition is violated. Table 2.3 presents the rank test results for three specifications of the (C)CAPM suggested in Lettau and Ludvigson (2001), together with the unconditional (C)CAPM, and the Fama-French three factor model. I can not reject that the three versions of the conditional (C)CAPM violate the rank condition, as their p values are large. In contrast, p values for the unconditional (C)CAPM and the Fama-French three factor model are all close to zero, indicating that the rank condition is satisfied for these models. 2.4 Distribution of R2 As discussed above, if at least one risk factor is irrelevant, the rank condition is not satisfied. 
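To convey the mechanics of testing the rank of β, the Python sketch below computes a canonical-correlation statistic of the Anderson (1951) type, which assumes homoscedastic errors; it is offered only as an illustrative stand-in for the heteroscedasticity-robust Kleibergen and Paap (2006) test actually used in this chapter, and the χ2 degrees of freedom (n − q)(k − q) follow classical reduced-rank regression theory. R is n × T and F is k × T, as in the notation above.

import numpy as np
from scipy.stats import chi2

def inv_sqrt(S):
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def canonical_rank_test(R, F, q):
    """Test H0: rank(beta) = q in the multivariate regression of returns on factors."""
    n, T = R.shape
    k = F.shape[0]
    Y = (R - R.mean(axis=1, keepdims=True)).T      # T x n, demeaned returns
    X = (F - F.mean(axis=1, keepdims=True)).T      # T x k, demeaned factors
    Syy, Sxx, Sxy = Y.T @ Y / T, X.T @ X / T, X.T @ Y / T
    cc = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy), compute_uv=False)
    stat = -T * np.sum(np.log(1.0 - cc[q:] ** 2))  # the k - q smallest canonical correlations
    df = (n - q) * (k - q)
    return stat, chi2.sf(stat, df)

A large p value from a test of this kind means that a reduced-rank β cannot be rejected.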
In this situation, the FM estimator of risk premium is inconsistent, as shown in Kleibergen (2009); however, it is not clear how the commonly used cross-sectional R2 behaves. The presumption may be that irrelevant risk factors could not dramatically increase the R2 at the second stage. This section shows this presumption is incorrect: the R2 converges to a random variable if β does not have full rank, hence it could be large, even when factors are irrelevant. Kan and Zhang (1999) have analytical results for the distribution of the R2 under k = 1, which is generalized here to consider k ≥ 1, because multifactor models are favored in recent empirical studies of the (C)CAPM. I am interested in answering the following question: if an irrelevant risk factor is used together with some relevant factors, how may the R2 be affected? 52 To visualize the asymptotic distribution of R2 , Monte Carlo simulations are conducted to draw the densities of the risk premium estimator λ̂F and the R2 . In the underlying data generating process (D.G.P.), n = 25 as the 25 Fama-French size and book-to-market sorted portfolios are the focus of many empirical studies. The risk premium λF is fixed to 1 if it exists, and the variance of errors are obtained from Lettau and Ludvigson (2001). 10000 replications are conducted. For simplicity, a three factor model is used in D.G.P. of the Monte Carlo study, unless otherwise stated. 2.4.1 Assumption The same assumption as in Kleibergen (2009) is made. Assumption 1 is a statement of a central limit theorem, which implies that R̄ and β̂ follow two independent normal distributions as stated in Lemma 1. Proof of this lemma is contained in the appendix of Kleibergen (2009).  Assumption 1: √1 T  Lemma 1:     ∑T  1   d  φR  ⊗ (R − ι λ − β( F̄ + λ ))    →  t n 1 t F t=1 Ft φβ    √  R̄ − ιn λ1 − βλF  d  ψR  T  → ψβ vec(β̂ − β) where ψR (n × 1), ψβ (nk × 1) are two independent normally distributed random variables with mean 0, and covariance matrices Ω, VF−1 ⊗Ω. Ω = var(ϵt ), VF F = var(Ft ), F ′     1   1   (φ′R , φ′β )′ ∼ N (0, V ), V = Q ⊗ Ω, Q = E   .  Ft Ft 53 Lemma 1 describes the distributions of R̄ and β̂, the main components of the R2 in Equation (2.6). The asymptotic distribution of the R2 is derived in terms of the distributions of R̄ and β̂. 2.4.2 Four Cases I consider four cases, depending on the values of β. . . Case 1: If (ιn ..β)′ (ιn ..β) has full rank, and risk factors used in the empirical study p coincide with those in the model, then R2 → 1. In this ideal setting, as the sample size T increases to ∞, λ̂F converges to λF , and the R2 goes to 1. See Figure 2.1(a)(b). Figure 2.1(a)(b) present the densities of λ̂F and R2 as the sample size T increases. Figure 2.1(a) shows that λ̂F shrinks to the true value λF = 1, and Figure 2.1(b) shows that the cross-sectional R2 converges to 1. Case 1 illustrates the foundation of reporting the R2 in practice: if the same risk factors Ft as in the underlying model are chosen, then the sample R2 tends to be large. As a result, a large value of the cross-sectional R2 is a positive indicator, since it suggests that the correct risk factors have been discovered. The R2 thus serves as a criterion for model comparison in the empirical (C)CAPM literature: risk factors that yield larger R2 are generally favored. Not surprisingly, the way of using the R2 to compare models has its pitfalls, many of which are documented in Lewellen et al. (2010). 
In this chapter, I focus on the problem induced by the magnitude of β: the second stage regression fails if β is 54 not large enough, and the R2 consequently becomes a misleading measure. Case 2 describes an extreme case: useless factors can yield large values of the R2 . d Case 2: If β = 0 and E(R̄) = c, then R2 → d R2 → ′ P ψR Mιn Ψβ ψR ′ M ψ ψR ιn R c′ PMιn Ψβ c c′ Mιn c . Specially, if c = ιn λ1 , then , where Ψβ = vecinvk (ψβ ), PMιn Ψβ = Mιn Ψβ (Ψ′β Mιn Ψβ )−1 Ψ′β Mιn . This is the extreme setting that all k factors included in the model are irrelevant, in the sense that these factors are uncorrelated with asset returns. Although it is tempting to think that in this setting the R2 is zero as all factors have no power of pricing average asset returns, in fact the sample R2 can be close to 1 even in large samples. See Figure 2.1(c)(d). Figure 2.1(c) shows that λ̂F does not converge to a point when the sample size increases, instead it stays randomly centered around 0. Similarly, Figure 2.1(d) shows that the R2 does not converge to 0 or 1, but stays as a random variable, as Case 2 states. The behaviors of the R2 in Figure 2.1(b) and 2.1(d) are different: in Figure 2.1(d), the R2 stays as a random variable, instead of converging to a fixed point. The randomness of the R2 in the limiting distribution makes it a misleading criterion for model comparison, i.e. irrelevant factors have a positive probability of yielding large R2 even in large samples, hence a model made of irrelevant factors always has a chance of being favored by empirical researchers, if the R2 is the sole criterion of model comparison. In practice, it is unlikely that all of the factors used by empirical researchers are unrelated with asset returns, hence the settings in Case 2 that β is zero for all k factors are very restrictive. What is possible, however, is that β is sizeable for most factors, but close to zero for one factor (or more). Case 3 corresponds to the setting 55 that a factor uncorrelated with asset returns is included in the model, together with several other relevant factors. The question under consideration is, if an irrelevant factor is introduced into the model as the conditioning variable, what will happen to the cross-sectional R2 in the conditional (C)CAPM? . Case 3: Suppose k1 + 1 factors with β̃ = (βk1 ..0) are chosen in the empirical √ . d model, where βk1 is n × k1 , 0 is n × 1, k1 < k, T (β̃ˆ − (βk1 ..0)) → ψ̃β , Ψβ = . vecinvk1 +1 (ψ̃β ) = (Ψk1 ..Ψ1 ), and E(R̄) = c, then: d R2 → c′ Mιn (V1 + V2 + V3 + V4 )Mιn c c′ Mιn c where random variables V1 , V2 , V3 , V4 are defined as follows: V0 ≡ (βk′ 1 Mιn βk1 − βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 Mιn βk1 )−1 V1 ≡ βk1 V0 βk′ 1 V2 ≡ −βk1 V0 βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 V3 ≡ −Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 Mιn βk1 V0 βk′ 1 V4 ≡ Ψ1 ((Ψ′1 Mιn Ψ1 )−1 + (Ψ′1 Mιn Ψ1 )−1 Ψ′1 Mιn βk1 V0 βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 )Ψ′1 This setting corresponds to the scenario that an irrelevant factor is added to an empirical model with other k1 relevant factors. Case 3 states that adding an irrelevant factor causes the limiting distribution of the R2 to be spurious. As shown by simulation, the value of the R2 could be increased even if the added factor is irrelevant. See Figure 2.1(e)(f ). The Fama and French (1993) three factor model is used in the D.G.P. of the simulation for Figure 2.1(e)(f ). A single factor from the D.G.P., and an extra useless factor are chosen to compute the R2 in Figure 2.1(f ). 
56 The useless factor is constructed by a normally distributed variate independent of other variables. Figure 2.1(e) plots the density of λ̂F corresponding to the useless factor, and Figure 2.1(f ) shows that the R2 converges to a random variable after a useless factor is added. The restriction in Case 3 can be loosened to allow the added factor to be weakly related to assets, i.e. β corresponding to this added factor is close, but not strictly equal to zero. This is considered in Case 4. The result is similar: the R2 after including a nearly irrelevant factor is still random in its limiting distribution. . Case 4: Suppose k1 + 1 factors with β̃ = (βk1 .. √bT ) are chosen in the empirical √ . d model, where βk1 is n × k1 , b is n × 1, k1 < k, T (β̃ˆ − (βk1 .. √bT )) → ψ̃β , Ψβ = . vecinvk1 +1 (ψ̃β ) = (Ψk1 ..Ψ1 ), and E(R̄) = c, then: d R2 → c′ Mιn (Ṽ1 + Ṽ2 + Ṽ3 + Ṽ4 )Mιn c c′ Mιn c where random variables Ṽ1 , Ṽ2 , Ṽ3 , Ṽ4 are defined as follows: Ψ̃1 ≡ b + Ψ1 Ṽ0 ≡ (βk′ 1 Mιn βk1 − βk′ 1 Mιn Ψ̃1 (Ψ̃′1 Mιn Ψ̃1 )−1 Ψ̃′1 Mιn βk1 )−1 Ṽ1 ≡ βk1 Ṽ0 βk′ 1 Ṽ2 ≡ −βk1 Ṽ0 βk′ 1 Mιn Ψ̃1 (Ψ̃′1 Mιn Ψ̃1 )−1 Ψ̃′1 Ṽ3 ≡ −Ψ̃1 (Ψ′1 Mιn Ψ̃1 )−1 Ψ̃′1 Mιn βk1 Ṽ0 βk′ 1 Ṽ4 ≡ Ψ̃1 ((Ψ′1 Mιn Ψ̃1 )−1 + (Ψ̃′1 Mιn Ψ̃1 )−1 Ψ̃′1 Mιn βk1 Ṽ0 βk′ 1 Mιn Ψ̃1 (Ψ̃′1 Mιn Ψ̃1 )−1 )Ψ̃′1 Case 3 and 4 help explain the success of the conditional versions of the (C)CAPM. If the conditioning variable has β close to zero, then adding a conditioning variable 57 to the model is similar to adding an irrelevant factor, which causes the R2 to fail to converge, hence it is not surprising to get large R2 ’s in empirical studies, even though risk factors are irrelevant. The motivating example presented in Table 2.1 in the introduction section makes the same point: irrelevant factors can appear useful in explaining the cross-sectional variation of asset returns. To summarize, the analytical results in this section formalize the view that the R2 is a misleading measure for the empirical studies of the (C)CAPM, if the rank condition on β is not satisfied. With irrelevant or nearly irrelevant factors, it is not surprising to find a sizeable value of the R2 , because the limiting distribution of the R2 is spurious. 2.5 Examples of (C)CAPM In the early part of this chapter, I show that for the specifications of the (C)CAPM in Lettau and Ludvigson (2001), the possibility of violating the rank condition can not be ruled out. Does this concern commonly exist in the empirical (C)CAPM literature, or is it just unique in Lettau and Ludvigson (2001)? To answer this question, I evaluate several versions of the (C)CAPM in this section by checking whether the rank condition is satisfied. As discussed above, a large p value of the Kleibergen and Paap (2006) rank test suggests the violation of the rank condition. The 25 Fama-French size and book-to-market sorted portfolios are used as assets. Quarterly returns are compounded by monthly returns obtained from Kenneth French’s web site, together with the Fama-French three factors, Rm-Rf, SMB, and HML. Details of constructing the portfolios and benchmark factors are available on 58 the web site. I use the following factors: the nondurable consumption growth △cN dur and the durable consumption growth △cDur in Yogo (2006), the housing-collateral ratio myf a in Lustig and Van Nieuwerburgh (2005), the labor income-to-consumption ratio sw in Santos and Veronesi (2006), the investment growth rate in the household sector HHOLDS, the nonfinancial corporate business NFINCO, and the financial cooperations FINAN in Li et al. 
(2006). The data of these factors are either directly offered by the authors, or constructed following the descriptions in their paper. 1952Q1-2001Q4 is the time period within which all of the factors have data available. One specification of the (C)CAPM from each paper listed above is used as an example in Table 4. The p value of the suggested rank test by Kleibergen and Paap (2006) is reported, together with estimation results based on the ordinary least squares (OLS), generalized least squares (GLS) (see Lewellen et al. (2010) for the GLS method). The OLS R2 in Table 4 is encouraging: all of the models explain at least 41% of the cross-sectional variation in average portfolio returns, when the FM two-pass procedure is applied. However, the results of the rank test are discouraging: except the benchmark Fama-French three factor model, all the other models have large p values. The evaluation based on the GLS R2 supports the outcome of the rank test: only the Fama-French three factor model has a large GLS R2 , while all the other models have the GLS R2 under 20%. Compared to the OLS R2 , the GLS R2 is much smaller for most models, except for the Fama-French model. These results hence support 59 the view in Lewellen et al. (2010): the GLS R2 appears to be a better criterion than the OLS R2 .1 Overall, the violation of the rank condition can not be ruled out. Since introducing an irrelevant or nearly irrelevant factor into the linear multifactor model would make the R2 spurious, as demonstrated in the last section, the large values of the OLS R2 based on the FM two-pass procedure are not reliable. 2.6 Conclusion This chapter cautions that if irrelevant risk factors are included in the empirical studies of the (C)CAPM, they may appear useful in explaining the cross-sectional variation of asset returns. With one or more irrelevant risk factors in a linear multifactor model, the full rank condition of the β matrix no longer holds, which further induces the spurious limiting behavior of the cross-sectional R2 . Consequently, the value of the sample R2 can be large, even though factors are irrelevant. From the perspective of the empirical (C)CAPM literature, this chapter highlights the necessity of applying a rank test on the β matrix, which helps make empirical results more reliable. The empirical findings are easy to summarize: I can not rule out the possibility that some recent versions of the (C)CAPM violate the rank condition, while the only model that remains trustworthy is the Fama and French (1993) three factor model. 1 Lewellen, Nagel, and Shanken (2008) also use these factors as examples, and they use a different time period, 1963-2004, with 30 extra industry portfolios to expand the test assets. 60 Appendix A. The cross-sectional R2 based on the FM two-pass procedure Equations (2.1)(2.2)(2.3) imply the following model: Rt = ιn λ1 + βλF + βvt + ut Ft = µF + vt where ut , vt are n × 1, k × 1 vectors of errors, and ut , vt are uncorrelated. 
The model implies the following: Rt = ιn λ1 + β(λF + Ft − µF ) + ut = (ιn λ1 + βλF − βµF ) + βFt + ut Use R = (R1 , R2 , ..., RT ), F = (F1 , F2 , ..., FT ), U = (u1 , u2 , ..., uT ): R = ι′T ⊗ (ιn λ1 + βλF − βµF ) + βF + U MιT R′ = MιT F ′ β ′ + MιT U ′ Hence the OLS estimator β̂ is: β̂ = [(F MιT F ′ )−1 F MιT R′ ]′ = RMιT F ′ (F MιT F ′ )−1 61 Similary, λ̂F = (β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄, where R̄ = R2 = (Mιn β̂ λ̂F )′ (Mιn β̂ λ̂F ) (Mιn R̄)′ (Mιn R̄) λ̂′F β̂ ′ Mιn β̂ λ̂F R̄′ Mιn R̄ = R̄′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R̄′ Mιn R̄ = n β̂ By definition: = = where PMι 1 T T Σt=1 Rt . R̄′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R̄′ Mιn R̄ ′ R̄ PMι β̂ R̄ n R̄′ Mιn R̄ = Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn . B. Proof of Case 1 Start from Equation (2.1): E(Rt ) = ιn λ1 + βλF Mιn E(Rt ) = Mιn ιn λ1 + Mιn βλF Mιn E(Rt ) = Mιn βλF (Mιn βλF )′ (Mιn βλF ) (Mιn E(Rt ))′ (Mιn E(Rt )) = 1 p p p In the ideal setting, as T → ∞, β̂ → β, λ̂F → λF and R̄ → E(Rt ), by Slutsky’s theorem: R2 = p → (Mιn β̂ λ̂F )′ (Mιn β̂ λ̂F ) (Mιn R̄)′ Mιn R̄ (Mιn βλF )′ (Mιn βλF ) (Mιn E(Rt ))′ (Mιn E(Rt )) p → 1 C. Proof of Case 2 If β = 0 and E(R̄) = c, Lemma 1 reduces to: 62     √  R̄ − c  d  ψR  T →  ψβ vec(β̂) Hence, the following are true: p Mιn R̄ → Mιn c √ d T β̂ → vecinvk (ψβ ) = Ψβ Apply the continuous mapping theorem: R2 = d → d → R̄′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R̄′ Mιn R̄ c′ Mιn Ψβ (Ψ′β Mιn Ψβ )−1 Ψ′β Mιn c c′ Mιn c c′ PMιn Ψβ c c′ Mιn c As a special case, if c = ιn λ1 , then: √ T Mιn R̄ = √ T Mιn (R̄ − ιn λ1 ) d → M ιn ψ R Apply the continuous mapping theorem: R2 = d → d → R̄′ Mιn β̂(β̂ ′ Mιn β̂)−1 β̂ ′ Mιn R̄ R̄′ Mιn R̄ ′ M Ψ (Ψ′ M Ψ )−1 Ψ′ M ψ ψR ιn β β ιn R β ιn β ′ M ψ ψR ιn R ′ P ψR Mιn Ψβ ψR ′ ψ R M ιn ψ R D. Proof of Case 3 If β̃ = (βk1 : 0) and E(R̄) = c, Lemma 1 reduces to: 63  √  T  R̄ − c ˆ vec(β̃ − (βk1    d  ψR  →  : 0)) ψ̃β Hence, the following are true: p R̄ → c √ d ˆ T (β̃ − (βk1 : 0)) → Ψβ = (Ψk1 : Ψ1 ) √ √ 1 1 ˆ β̃ = (βk1 + Ψk1 / T + o(T − 2 ) : Ψ1 / T + o(T − 2 )) This gives the expression of R2 as: R 2 = = ˆ ˆ ˆ ˆ R̄′ Mιn β̃(β̃ ′ Mιn β̃)−1 β̃ ′ Mιn R̄ R̄′ Mιn R̄  −1 √ − 12 − 12 ′ ′ βk1 Mιn Ψ1 / T + o(T )  ˆ ˆ  βk1 Mιn βk1 + O(T ) R̄′ Mιn β̃   β̃ ′ Mιn R̄ √ 1 −2 ′ −1 ′ Ψ1 Mιn Ψ1 /T + o(T ) Ψ1 Mιn βk1 / T + o(T ) R̄′ Mιn R̄ The formula of inverse of a block matrix states, with SD = A − BD−1 C:  −1  A B    C D   =  −1 SD −1 −SD BD−1 −1 −1 −D−1 CSD D−1 + D−1 CSD BD−1    Introduce new notations to make expressions neat:  −1  A B    C D   =   A∗ B∗ C ∗ D∗   Apply the formula and rearrange terms: R 2 = √ √ 1 R̄′ Mιn (βk1 A∗ βk′ 1 + βk1 B ∗ Ψ′1 / T + Ψ1 C ∗ βk′ 1 / T + Ψ1 D∗ Ψ′1 /T + O(T − 2 ))Mιn R̄ R̄′ Mιn R̄ 64 Derive asymptotic distributions of the four leading terms above: A∗ → (βk′ 1 Mιn βk1 − βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 Mιn βk1 )−1 ≡ V0 d β1 A∗ βk′ 1 √ β1 B ∗ Ψ′1 / T √ Ψ1 C ∗ β1′ / T Ψ1 D∗ Ψ′1 /T → βk1 V0 βk′ 1 ≡ V1 d → −βk1 V0 βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 ≡ V2 d → −Ψ1 (Ψ′1 Mιn Ψ1 )−1 Ψ′1 Mιn βk1 V0 βk′ 1 ≡ V3 d → Ψ1 (Ψ′1 Mιn Ψ1 )−1 (1 + Ψ′1 Mιn βk1 V0 βk′ 1 Mιn Ψ1 (Ψ′1 Mιn Ψ1 )−1 )Ψ′1 ≡ V4 d Apply the continuous mapping theorem: d R2 → c′ Mιn (V1 + V2 + V3 + V4 )Mιn c c′ Mιn c E. 
Proof of Case 4 If β̃ = (βk1 : √b ) T and E(R̄) = c, Lemma 1 reduces to:  √  T  R̄ − c ˆ vec(β̃ − (βk1    d  ψR  →  : √bT )) ψ̃β Hence, the following are true: p R̄ → c √ b d ˆ T (β̃ − (βk1 : √ )) → Ψβ = (Ψk1 : Ψ1 ) T √ √ 1 1 ˆ β̃ = (βk1 + Ψk1 / T + o(T − 2 ) : (b + Ψ1 )/ T + o(T − 2 )) Define Ψ̃1 ≡ b + Ψ1 , and follow the steps in the proof of Case 3. 65 Table 2.1: A motivating example, R2 under cay and caysim Specification (4) (5) (6) (7) cay 0.31 0.31 0.77 0.75 caysim 0.44 0.29 0.72 0.69 Notes: This table presents the values of the cross-section R2 of the Fama-MacBeth two-pass procedure using the versions of the (C)CAPM in Lettau and Ludvigson (2001). R2 ’s under cay are identical to those reported in Table 1 of Lettau and Ludvigson (2001), and the results are replicated using the same data as in Lettau and Ludvigson (2001). R2 ’s under caysim are the average of 10000 simulations: caysim is randomly drawn from a normal distribution, with the same mean and variance as cay. In each simulation, we compute the R2 replacing cay with caysim , and take the average of 10000 simulations. Each row in this table corresponds to a specification of the (C)CAPM in Table 1 of Lettau and Ludvigson (2001). For instance, Specification (6) corresponds to a five-factor version of the (C)CAPM: the log consumption-wealth ratio cay, the value weighted return Rvw , the labor income growth △y, and the interaction terms cay · Rvw , cay · △y; in the simulation, we use caysim , Rvw , △y, caysim · Rvw , caysim · △y to generate the R2 under caysim . This table shows that the irrelevant caysim can increase the R2 , just like cay does. 66 Table 2.2: An example of the β matrix in Lettau and Ludvigson (2001) 25 portfolios 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 βcay 0.02 0.03 0.02 0.02 -0.01 0.03 0.02 0.01 0.02 -0.04 0.04 0.07 0.03 0.03 -0.02 0.06 0.03 -0.01 -0.01 -0.04 -0.05 0.02 0.04 -0.01 -0.04 cay t 0.30 0.47 0.30 0.31 -0.09 0.56 0.35 0.30 0.63 -0.83 1.04 2.21 0.87 0.82 -0.47 1.95 1.05 -0.30 -0.48 -0.95 -2.09 0.89 1.44 -0.36 -0.96 p 0.76 0.64 0.76 0.76 0.93 0.57 0.73 0.76 0.53 0.41 0.30 0.03 0.39 0.41 0.64 0.05 0.29 0.77 0.63 0.34 0.04 0.37 0.15 0.72 0.34 βRvw 1.60 1.43 1.32 1.25 1.28 1.54 1.35 1.24 1.14 1.18 1.43 1.21 1.10 1.03 1.09 1.25 1.16 1.05 1.02 1.11 1.04 0.95 0.77 0.83 0.80 Rvw t 17.17 17.48 17.64 16.93 14.74 21.81 21.97 22.99 21.44 17.49 26.61 27.24 25.72 23.73 17.49 30.45 35.03 31.86 25.52 19.35 30.46 33.89 22.93 24.80 14.81 p 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cay · Rvw βcay·Rvw t 0.21 0.31 0.27 0.45 0.31 0.56 0.08 0.14 0.60 0.92 0.07 0.14 -0.20 -0.44 0.00 0.01 0.29 0.73 0.65 1.29 -0.32 -0.79 0.02 0.05 0.02 0.06 0.47 1.46 -0.01 -0.02 -0.27 -0.89 -0.33 -1.35 0.27 1.10 0.75 2.50 1.05 2.44 0.30 1.17 0.19 0.91 -0.25 -1.01 0.10 0.40 0.16 0.39 p 0.76 0.66 0.58 0.89 0.36 0.89 0.66 1.00 0.47 0.20 0.43 0.96 0.95 0.15 0.99 0.37 0.18 0.27 0.01 0.01 0.24 0.36 0.31 0.69 0.70 Notes: This table presents the estimate of the correlation matrix β = (βcay , βRvw , βcay·Rvw ), and its associated t statistic and p value, from the first stage time series regression of the Fama-MacBeth two-pass procedure. The three risks factors are the log consumption-wealth ratio cay, the value weighted return Rvw , and the interaction term cay · Rvw in Lettau and Ludvigson (2001). The quarterly returns of 25 Fama-French size and book-to-market sorted portfolios are from 1963Q3 to 1998Q3. 
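In the spirit of the caysim exercise of Table 2.1 and of Case 3, the short Python script below generates returns from a three-factor model and computes the cross-sectional R2 of an empirical model that uses one true factor, with and without an extra factor that is pure noise and therefore has β = 0. The parameter values are illustrative only, the script reuses the fama_macbeth sketch given earlier in this chapter, and it is not a replication of the Lettau and Ludvigson (2001) data work.

import numpy as np

rng = np.random.default_rng(1)
n, T = 25, 200
beta_true = rng.uniform(-1.0, 1.0, (n, 3))            # exposures to the three true factors
lam1, lamF = 0.5, np.ones(3)                          # zero-beta rate and risk premia
f = rng.standard_normal((3, T))                       # mean-zero true factors
eps = 0.5 * rng.standard_normal((n, T))
R = lam1 + beta_true @ (f + lamF[:, None]) + eps      # returns implied by (2.1)-(2.3)
noise_factor = rng.standard_normal((1, T))            # a "factor" unrelated to returns

_, _, r2_one = fama_macbeth(R, f[:1, :])              # one true factor only
_, _, r2_two = fama_macbeth(R, np.vstack([f[:1, :], noise_factor]))
print(r2_one, r2_two)                                 # the noise factor raises the sample R^2

The added noise factor mechanically, and often substantially, raises the sample R2, echoing the behavior in Figure 2.1(f) and the pattern documented in Table 2.1.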
67 Table 2.3: p value of the rank test for 6 specifications in Lettau and Ludvigson (2001) cay 1 Rvw -0.32 △y SMB HML cay·Rvw cay·△y △c cay·△c R2 0.01 p value 0.00 0.16 0.00 0.80 0.00 0.31 0.77 0.77 0.55 0.70 0.33 (0.96) 2 0.22 (0.18) 3 4 5 6 1.33 0.47 1.46 (1.59) (0.10) (0.12) -0.52 -0.06 1.14 (3.28) (1.47) (0.46) -0.44 -1.99 0.56 0.34 -0.17 (0.45) (1.58) (0.44) (0.32) (0.12) -0.13 0.02 0.06 (0.40) (0.16) (0.02) Notes: This table presents p values of the rank test in Kleibergen and Paap (2006), for 6 specifications of the (C)CAPM. A large p value indicates that the rank condition is likely violated. These 6 specifications are: the unconditional CAPM is Specification 1, the unconditional CCAPM is Specification 2, the Fama-French three factor model is Specification 3, and the other three specifications are from Lettau and Ludvigson (2001). We use the same data as Lettau and Ludvigson (2001), hence the OLS estimates of the risk premium and the R2 are identical to Lettau and Ludvigson (2001). Standard errors with Shanken (1992) correction are in brackets. The quarterly returns of 25 Fama-French size and book-to-market sorted portfolios are from 1963Q3 to 1998Q3. 68 Table 2.4: p value of the rank test for 5 specifications of (C)CAPM Model Fama-French (1993) OLS GLS Lustig-Nieuwerburgh (2005) OLS GLS Li-Vassalou-Xing (2006) OLS GLS RM -1.06 Factors SMB 0.46 HML 1.34 (1.30) (0.11) (0.15) -1.50 0.59 1.06 (0.98) (0.06) (0.11) myf a 0.02 △c 0.14 myf a△c 0.05 (0.03) (0.33) (0.03) -0.01 -0.05 0.02 (0.01) (0.12) (0.01) HHOLDS 0.03 NFINCO 0.01 FINAN 0.09 (0.02) (0.01) (0.04) 0.01 -0.01 0.01 (0.01) (0.01) (0.02) Santos-Veronesi (2006) OLS RM 0.53 sw -2.38 (1.73) (1.14) GLS -0.99 -0.15 (0.68) (0.29) Yogo (2006) OLS RM 0.06 △cN dur 0.68 (1.70) (0.38) (0.30) GLS -1.03 -0.01 -0.03 (0.69) (0.10) (0.12) R2 p value 0.73 0.00 0.79 0.74 0.54 0.18 0.58 0.57 0.19 0.41 0.96 0.04 △cDur -0.12 0.54 0.78 0.03 Notes: The table presents the OLS, GLS estimates of risk premium, the R2 , and p value from the rank test of Kleibergen and Paap (2006). 5 specifications of the (C)CAPM are considered, including the benchmark Fama-French three factor model. For each specification, the OLS, GLS estimates of the risk premium with the corresponding R2 separated by rows are reported: OLS in the first row, GLS in the second row. Standard errors with Shanken (1992) correction are in brackets. The quarterly returns of 25 Fama-French size and book-to-market sorted portfolios used for this table are from 1952Q1 to 2001Q4. 69 Figure 2.1: Densities of λ̂F and R2 7 25 6 20 5 15 4 3 10 2 5 1 0 −3 −2 −1 0 1 OLS λF 2 3 4 5 0 0 0.2 25 0.2 20 0.15 15 0.1 10 0.05 5 −1 0 OLS λF 0.8 1 0.6 0.8 1 (b) 0.25 −2 0.6 2 OLS R (a) 0 −3 0.4 1 2 3 0 0 0.2 0.4 OLS R2 (c) (d) 0.8 5 4.5 0.7 4 0.6 3.5 0.5 3 0.4 2.5 2 0.3 1.5 0.2 1 0.1 0 −3 0.5 −2 −1 0 OLS λF (e) 1 2 3 0 0 0.2 0.4 0.6 0.8 1 OLS R2 (f) Notes: This figure presents the densities of λ̂F and R2 as the sample size T increases. T = 1400(dotted), 14000(dash-dotted), 70000(dashed), 140000(solid). Case 1 is in Row 1, where factors are correctly chosen; Case 2 is in Row 2, where all factors are irrelevant; Case 3 is in Row 3, where one irrelevant factor is chosen with one relevant factor. CHAPTER Three Does a Technology Shock Increase or Decrease Hours 70 71 3.1 1 Introduction Structural Vector Autoregressions (SVAR) have recently been employed to investi- gate the impact of technology shocks on production inputs, e.g. the hours worked. 
The empirical findings are conflicting, and there is currently a debate on whether a positive technology shock increases or decreases hours: for example, Gali (1999), Shea (1999) and Francis and Ramey (2005) find that a positive technology shock decreases hours at short horizons, while Christiano et al. (2003) report an increase in hours after a positive technology shock. The sign of the impact of technology shocks on hours is of interest because real business cycle models typically imply that the hours worked increase after a positive shock to technology, while the implication of models with sticky prices is often the opposite: see, e.g., Rebelo (2005).

The widely used identification strategy of the aforementioned empirical SVAR studies is to impose the same restriction that only technology shocks have a long run effect on labor productivity, see, e.g., Gali (1999). In addition, the empirical outcome crucially depends on whether the time series of hours is specified in levels or in first differences: as stated in Pesavento and Rossi (2005), the decrease in hours is typically found in the first difference specification of hours, while the increase is often found in the level specification. Since the series of hours is highly persistent, and unit root tests are often not powerful enough to determine which specification should be chosen, Pesavento and Rossi (2005), Gospodinov (2010) and others model hours as local-to-unity, a device that nests both the level and difference specifications. However, it has been shown that identification through the long run restriction becomes weak when the variables that enter the model are highly persistent, and under local-to-unity asymptotics, Gospodinov (2010) proves that the structural parameters as well as the impulse response functions (IRF) of interest cannot be consistently estimated: this failure in estimation is interpreted as a weak identification problem in Gospodinov (2010), by using the framework of the linear instrumental variable (IV) regression in Staiger and Stock (1997).

From the weak identification perspective, although a consistent point estimator may not exist, it is often feasible to construct a confidence interval for the parameter of interest. In this chapter, we propose two tests to construct such confidence intervals, and we consider these intervals as robust, because no matter whether the true specification of hours is in levels or in first differences, these intervals cover the structural parameter of interest with probability at least as high as the nominal coverage rate. Given that robust intervals for the structural parameters are derivable, and that the IRF is a function of the structural parameters, we consequently construct bounds for the IRF to investigate whether a positive technology shock increases or decreases the hours worked. The empirical findings are as follows: the impact of a positive technology shock on hours is likely positive at short horizons, but the 95% confidence interval of the contemporaneous effect also includes a negative region.

The rest of the chapter contains the following parts: the VAR(p + 1) model is described in Section 2; our robust approach of deriving the confidence intervals is presented in Section 3; Section 4 contains the empirical findings; Section 5 concludes.

1 This chapter is based on a joint project with Sophocles Mavroeidis.
73 3.2 Model Following the convention, we use the following notations throughout this chapter: let lt , ht denote labor productivity and the hours worked respectively; structural shocks ϵt = (ϵzt , ϵdt )′ , where ϵzt is the technology shock and ϵdt is the non-technology shock; the object of interest is the IRF ∂ht+j , ∂ϵzt i.e. the response of hours to positive technology shocks, and we would like to explore whether its sign is positive or negative for small j’s; in the special case that j = 0, ∂ht ∂ϵzt is the contemporaneous effect of the technology shock on the hours worked. 3.2.1 VAR We consider the same setup as in Gospodinov  etal. (2009), which is an extension  lt  of the model in Gospodinov (2010). Ỹt =  , t = 1, ..., T , are assumed to be ht generated by a bivariate VAR(p + 1) model of (3.1) with assumptions i − iv: Ψ(L)(I − ΦL)Ỹt = ut  where Ψ(L) = I −   (3.1)    ψ11 (L) ψ12 (L)   1 β(ρ − 1)  i Ψ L = , Φ =    . i i=1 ψ21 (L) ψ22 (L) 0 ρ ∑p    2  σ1 σ12   u1,t  is i.i.d. with covariance Σ = i. ut =   and finite fourth   u 2 σ12 σ2 u2,t moments; ii. the largest roots of the system are contained in Φ with the conditions that lt has a unit-root, while ht follows a local-to-unity process, i.e. ρ = 1 + Tc , 74 and c is a fixed constant: this device nests both the first difference and level specifications of ht , depending on whether c = 0 or c < 0; iii. |Ψ(z)| = 0 has roots outside the unit circle and Ψ(z)−1 is one-summable; iv. there is an off diagonal element β(ρ − 1) in Φ. Gospodinov et al. (2009) and Gospodinov (2010) both emphasize that it is appropriate to use the model of (3.1) to study the impact of technology shocks on hours, and the off diagonal element β(ρ − 1) in Φ is crucial: β ̸= 0 allows the low frequency co-movement between lt and ht in the level specification of ht , and this co-movement is assumed to be removed by the first difference filter, i.e. when c = 0, Φ reduces to an identity matrix. Gospodinov et al. (2009) offer arguments for why β(ρ − 1) needs to be in place. For example, if technology shocks have lasting effects on the labor market, then the low frequency co-movement between lt and ht may be plausible. Following Gospodinov et al. (2009), the lower-left element in Φ is set to zero, to avoid that ht is I(2) when c = 0 or I(1) when c ̸= 0. If β = 0, (3.1) reduces to the model considered in Gospodinov (2010), hence (3.1) is an extension of the setup in Gospodinov (2010). 3.2.2 Long Run Restriction      △lt   1 β(1 − ρ)L  Rewrite (3.1) as A(L)Yt = ut , with A(L) = Ψ(L)  . , Yt =  ht 0 1 − ρL 75 A(L)Yt = ut is thus equivalent to:        ψ11 (L) ψ12 (L)   1 β(1 − ρ)L   △lt   u1,t      =   ψ21 (L) ψ22 (L) 0 1 − ρL ht u2,t   Premultiplying the equation above by B0 =   1 −b21 −b12   yields the SVAR 1 B(L)Yt = ϵt , with B(L) = B0 A(L), ϵt = B0 ut :     1 −b21      −b12   ψ11 (L) ψ12 (L)   1 β(1 − ρ)L   △lt       =  1 ψ21 (L) ψ22 (L) 0 1 − ρL ht ϵdt ϵzt The identification restriction that ϵdt has no long run effect on lt implies two different expressions for the structural parameter b12 , depending on whether c = 0 or not: i. if c ̸= 0:   Let M1 (L) =   1 −b21   −b12   ψ11 (L) ψ12 (L)   1 β(1 − ρ)L    , then 1 ψ21 (L) ψ22 (L) 0 1 − ρL the long run restriction corresponds to that M1 (1) is lower-triangular, i.e. the upper-right element of the 2 by 2 matrix M1 (1) is zero, implying b12 = ψ12 (1)+βψ11 (1) ; ψ22 (1)+βψ21 (1) ii. 
if c = 0:    Let M2 (L) =  1 −b21  −b12   ψ11 (L) ψ12 (L)  , then the long run restric ψ21 (L) ψ22 (L) 1 76 tion corresponds to that the upper-right element in M2 (1) is zero, implying b12 = ψ12 (1) . ψ22 (1) Gospodinov et al. (2009) use the above discontinuity in the solution of b12 to explain why the level specification and the first difference specification of ht produce substantially different IRF’s in empirical studies: the IRF’s are functions of the structural parameters; using a different specification of ht implies a different identification condition for b12 , as described above, hence it is not surprising that IRFs substantially differ when the specifications of ht differ, if β ̸= 0. The condition β ̸= 0 is thus crucial to help reconcile the conflicting empirical results: the discontinuity in the solution of b12 disappears once β = 0 is imposed, i.e. b12 = ψ12 (1) , ψ22 (1) b12 by no matter whether c = 0 or c ̸= 0. Gospodinov (2010) proposes to infer ψ12 (1) , ψ22 (1) after imposing β = 0; without imposing β = 0, Gospodinov (2010)’s approach of deriving b12 through ψ12 (1) ψ22 (1) is no longer applicable in the current setup. In the later part of this chapter, we will propose methods to construct C.I.’s for b12 , without assuming β = 0. 3.2.3 Model Simplification Define Ã(L) = Ψ(L)(I − ΦL), A∗ (L) = L−1 (I − Ã(L)) = A∗0 + A∗1 L + ... + A∗p Lp , ∑p ∗ A∗∗ i = − j=i+1 Aj . The Vector Error Correction (VEC) form of (3.1) is as follows: ∗ △Ỹt = (A (1) − I)Ỹt−1 + p ∑ i=1 A∗∗ i−1 △Ỹt−i + ut 77 More explicitly:    △lt = (ψ12 (1) + βψ11 (1)) c ht−1 + ∑p a∗∗ △lt−i + ∑p a∗∗ △ht−i + u1,t i=1 i−1,11 i=1 i−1,12 T ∑ ∑  p ∗∗  △ht = (ψ22 (1) + βψ21 (1)) c ht−1 + pi=1 a∗∗ i−1,21 △lt−i + i=1 ai−1,22 △ht−i + u2,t T To simplify the VEC form above, project △lt , △ht and ht−1 on the predetermined variables △lt−1 , ..., △lt−p , △ht−1 , ..., △ht−p , and save the residuals from projection as △˜lt , △h̃t and h̃t−1 , i.e. regress △lt , △ht and ht−1 on △lt−1 , ..., △lt−p , △ht−1 , ..., △ht−p , and the residuals are denoted by △˜lt , △h̃t and h̃t−1 . By projection, the VEC form is simplified to (up to op (1) terms):    △˜lt = (ψ12 (1) + βψ11 (1)) c h̃t−1 + u1,t T (3.2)   △h̃t = (ψ22 (1) + βψ21 (1)) c h̃t−1 + u2,t T Let c∗ ≡ (ψ22 (1) + βψ21 (1))c, and use the condition b12 = ψ12 (1)+βψ11 (1) ψ22 (1)+βψ21 (1) implied by the long run restriction, we get:    △˜lt = b12 c∗ h̃t−1 + u1,t T   △h̃t = Note that the condition b12 = c∗ h̃ T t−1 (3.3) + u2,t ψ12 (1)+βψ11 (1) ψ22 (1)+βψ21 (1) holds only when c ̸= 0, hence (3.3) is derived under c ̸= 0. However, this simplified model remains correct when c = 0: c = 0 implies c∗ = 0, hence b12 c∗ = 0, no matter which expression of b12 is used. In other words, no matter whether or not c equals 0, it is valid to simplify (3.1) to (3.3). 78 ∗∗ Premultiplying the VEC form by B0 yields the structural form below, with Bi−1 = B0 A∗∗ i−1 : ∗ B0 △Ỹt = B0 (A (1) − I)Ỹt−1 + p ∑ ∗∗ Bi−1 △Ỹt−i + ϵt i=1 Impose the long run restriction and write the structural form more explicitly:   ∑ ∑p ∗∗  z △lt = b12 △ht + pi=1 b∗∗ i−1,11 △lt−i + i=1 bi−1,12 △ht−i + ϵt  ∑ ∑p ∗∗  d △ht = b21 △lt + b∗22 ht−1 + pi=1 b∗∗ i−1,21 △lt−i + i=1 bi−1,22 △ht−i + ϵt where b∗22 = [(ψ22 (1) + βψ21 (1)) − b21 (ψ12 (1) + βψ11 (1))]c/T . After projecting out the lags, the structural form reduces to:    △˜lt = b12 △h̃t + ϵzt   △h̃t = b21 △˜lt + b∗22 h̃t−1 + ϵdt 3.3 3.3.1 (3.4) Tests AR For a given c, a confidence interval for b12 can be constructed by inverting AR test. 
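Before the formal statistic is introduced, the mechanics of the inversion can be previewed with the Python sketch below: it projects △lt, △ht and ht−1 on the lagged predetermined variables as in Section 3.2.3, computes the tAR statistic on a grid of candidate values b12,0, and keeps the values that are not rejected. It is a schematic illustration only: the names are mine, a constant is included in the projection for convenience, and the critical value cv is left as an input because, as the theorem below shows, the null distribution of tAR depends on the local-to-unity parameter c, so 1.96 corresponds only to the conventional normal benchmark.

import numpy as np

def lag_matrix(x, p):
    """Columns x_{t-1}, ..., x_{t-p}, aligned with x_t for the last len(x) - p observations."""
    return np.column_stack([x[p - j - 1:len(x) - j - 1] for j in range(p)])

def ar_accepted_set(l, h, p, grid, cv=1.96):
    dl, dh = np.diff(l), np.diff(h)
    Z = np.column_stack([np.ones(len(dl) - p),
                         lag_matrix(dl, p), lag_matrix(dh, p)])    # predetermined variables
    proj = lambda y: y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual from projection
    dl_t, dh_t = proj(dl[p:]), proj(dh[p:])
    h_lag = proj(h[p:-1])                                          # h_{t-1}, same alignment
    accepted = []
    for b12 in grid:
        y = dl_t - b12 * dh_t
        theta = (h_lag @ y) / (h_lag @ h_lag)
        resid = y - theta * h_lag
        se = np.sqrt(resid @ resid / len(y)) / np.sqrt(h_lag @ h_lag)
        if abs(theta / se) <= cv:                                  # t statistic for theta = 0
            accepted.append(b12)
    return np.array(accepted)          # e.g. grid = np.linspace(-3, 3, 1201); the set need not be an interval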
3.3 Tests

3.3.1 AR

For a given $c$, a confidence interval for $b_{12}$ can be constructed by inverting an AR-type test. Consider the auxiliary regression below for testing $H_0: b_{12} = b_{12,0}$; under $H_0$, $\theta = 0$:
$$\triangle\tilde{l}_t - b_{12,0}\triangle\tilde{h}_t = \theta\,\tilde{h}_{t-1} + \epsilon^z_t$$
The $t_{AR}$ statistic is the $t$ statistic for testing $\theta = 0$ in the auxiliary regression above:
$$t_{AR}(b_{12,0}) = \frac{\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1}\sum_{t=2}^{T}\tilde{h}_{t-1}\left(\triangle\tilde{l}_t - b_{12,0}\triangle\tilde{h}_t\right)}{\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\hat{\sigma}_\epsilon}$$
where $\hat{\sigma}_\epsilon^2$ is the sample variance of the OLS residual.

Theorem. Under assumptions i--iv and $H_0: b_{12} = b_{12,0}$, $t_{AR}$ has an asymptotic distribution that depends on $c$:
$$t_{AR} \Longrightarrow \rho_{v\epsilon}\tau_c + \sqrt{1-\rho^2_{v\epsilon}}\, z$$
where $\rho_{v\epsilon}$ is the long-run correlation of $v_t$ and $\epsilon^z_t$, $v_t \equiv \sum_{i=1}^{p} a^{**}_{i-1,21}\triangle l_{t-i} + u_{2,t}$, $\tau_c \equiv \left(\int_0^1 J_c(s)^2\, ds\right)^{-1/2}\int_0^1 J_c(s)\, dW(s)$, and $z$ is a standard normal variate.

Proof. Under $H_0$, $\triangle\tilde{l}_t - b_{12,0}\triangle\tilde{h}_t = u_{1,t} - b_{12,0}u_{2,t} = \epsilon^z_t$, so
$$t_{AR}(b_{12,0}) = \frac{\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1}\sum_{t=2}^{T}\tilde{h}_{t-1}\epsilon^z_t}{\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\hat{\sigma}_\epsilon} = \frac{\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}\epsilon^z_t}{\hat{\sigma}_\epsilon}$$
Apply two results in Hansen (1995), with $a(1) \equiv \psi_{22}(1) + \beta\psi_{21}(1)$:
$$\frac{1}{T^2}\sum_{t=2}^{T}\tilde{h}^2_{t-1} \Rightarrow a(1)^{-2}\sigma_v^2\int_0^1 J_c(s)^2\, ds$$
$$\frac{1}{T}\sum_{t=2}^{T}\tilde{h}_{t-1}\epsilon^z_t \Rightarrow a(1)^{-1}\sigma_v\sigma_\epsilon\left(\rho_{v\epsilon}\int_0^1 J_c(s)\, dW(s) + (1-\rho^2_{v\epsilon})^{1/2}\int_0^1 J_c(s)\, dW^{\dagger}(s)\right)$$
Hence
$$t_{AR}(b_{12,0}) \Longrightarrow \left(\int_0^1 J_c(s)^2\, ds\right)^{-1/2}\left(\rho_{v\epsilon}\int_0^1 J_c(s)\, dW(s) + \sqrt{1-\rho^2_{v\epsilon}}\int_0^1 J_c(s)\, dW^{\dagger}(s)\right) \Longrightarrow \rho_{v\epsilon}\tau_c + \sqrt{1-\rho^2_{v\epsilon}}\, z$$
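As an illustration, the $t_{AR}$ statistic defined above can be computed from the projection residuals as in the following Python/NumPy sketch. The function name and inputs are illustrative assumptions, and the critical value must come from the $c$-dependent limiting distribution above, not from the standard normal in general.

import numpy as np

def t_AR(b12_0, dl_res, dh_res, h_res_lag):
    """Sketch of the AR-type t statistic: the t statistic for theta = 0 in the
    auxiliary regression  dl~_t - b12_0 * dh~_t = theta * h~_{t-1} + eps.
    Inputs are the projection residuals (hypothetical arrays)."""
    y = dl_res - b12_0 * dh_res
    x = h_res_lag
    theta_hat = (x @ y) / (x @ x)                     # OLS slope
    eps_hat = y - theta_hat * x
    sigma_eps = np.sqrt(eps_hat @ eps_hat / len(y))   # sample std of the residual
    return theta_hat / (sigma_eps / np.sqrt(x @ x))   # = (x'x)^{-1/2} x'y / sigma

# a confidence interval for b12 at a given c collects all b12_0 whose
# |t_AR(b12_0)| does not exceed the critical value of the c-dependent distribution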
3.3.2 Wald

A joint confidence set for $(b_{12}, c^*)$ can be constructed by inverting a Wald test. Let
$$\phi_T(b_{12,0}, c^*_0) = \begin{pmatrix} \left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1}\sum_{t=2}^{T}\tilde{h}_{t-1}\left(\triangle\tilde{l}_t - b_{12,0}\frac{c^*_0}{T}\tilde{h}_{t-1}\right) \\ \left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1}\sum_{t=2}^{T}\tilde{h}_{t-1}\left(\triangle\tilde{h}_t - \frac{c^*_0}{T}\tilde{h}_{t-1}\right) \end{pmatrix}.$$
The Wald statistic is:
$$W(b_{12,0}, c^*_0) = \phi_T(b_{12,0}, c^*_0)'\,\hat{\Sigma}_u^{-1}\left[\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right]\phi_T(b_{12,0}, c^*_0)$$

Theorem. Under assumptions i--iv and $H_0: b_{12} = b_{12,0},\ c^* = c^*_0$, $W(b_{12,0}, c^*_0)$ has an asymptotic distribution that depends on $c$:
$$W(b_{12,0}, c^*_0) \Longrightarrow \left(\rho_{v2}\tau_c + (1-\rho^2_{v2})^{1/2} z_2\right)^2 + \left(\rho_{v1\text{-}2}\tau_c + (1-\rho^2_{v1\text{-}2})^{1/2} z_1\right)^2$$
where $\rho_{v2}$ is the long-run correlation of $v_t$ and $u_{2,t}$, $\rho_{v1\text{-}2}$ is the long-run correlation of $v_t$ with $u_{1\text{-}2}$, $z_1, z_2$ are independent standard normal variates, and $u_{1\text{-}2}$ results from the decomposition $u_{1,t} = \sigma_1\left(\frac{\rho_u}{\sigma_2}u_{2,t} + (1-\rho_u^2)^{1/2}u_{1\text{-}2}\right)$, with $\rho_u$ the correlation of $u_{1,t}$ and $u_{2,t}$.

Proof. Decompose $u_{1,t}$:
$$u_{1,t} = \sigma_1\left(\frac{\rho_u}{\sigma_2}u_{2,t} + (1-\rho_u^2)^{1/2}u_{1\text{-}2}\right)$$
where $\rho_u$ is the correlation of $u_{1,t}$ and $u_{2,t}$, and $u_{1\text{-}2}$ is uncorrelated with $u_{2,t}$: the subscript indicates that it comes from $u_{1,t}$ after excluding $u_{2,t}$; in addition, the variance of $u_{1\text{-}2}$ is 1.

Let $D_1$ denote the limit of $\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{1\text{-}2}$:
$$\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{1\text{-}2} \Rightarrow D_1, \qquad D_1 = \rho_{v1\text{-}2}\tau_c + (1-\rho^2_{v1\text{-}2})^{1/2} z_1,$$
where $z_1$ is a standard normal variate. Similarly:
$$\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{2,t} \Rightarrow \sigma_2 D_2, \qquad D_2 = \rho_{v2}\tau_c + (1-\rho^2_{v2})^{1/2} z_2,$$
where $z_2$ is a standard normal variate independent of $z_1$.

With the above notation, rewrite $\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{1,t}$ as:
$$\sigma_1\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}\left(\frac{\rho_u}{\sigma_2}u_{2,t} + (1-\rho_u^2)^{1/2}u_{1\text{-}2}\right)$$
$$= \sigma_1\left[\frac{\rho_u}{\sigma_2}\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{2,t} + (1-\rho_u^2)^{1/2}\left(\sum_{t=2}^{T}\tilde{h}^2_{t-1}\right)^{-1/2}\sum_{t=2}^{T}\tilde{h}_{t-1}u_{1\text{-}2}\right]$$
$$\Rightarrow \sigma_1\left[\rho_u D_2 + (1-\rho_u^2)^{1/2} D_1\right]$$
Under $H_0$, the Wald statistic therefore becomes:
$$\begin{pmatrix} \left(\sum\tilde{h}^2_{t-1}\right)^{-1/2}\sum\tilde{h}_{t-1}u_{1,t} \\ \left(\sum\tilde{h}^2_{t-1}\right)^{-1/2}\sum\tilde{h}_{t-1}u_{2,t} \end{pmatrix}'\Sigma_u^{-1}\begin{pmatrix} \left(\sum\tilde{h}^2_{t-1}\right)^{-1/2}\sum\tilde{h}_{t-1}u_{1,t} \\ \left(\sum\tilde{h}^2_{t-1}\right)^{-1/2}\sum\tilde{h}_{t-1}u_{2,t} \end{pmatrix}$$
$$\Rightarrow \begin{pmatrix} \sigma_1[\rho_u D_2 + (1-\rho_u^2)^{1/2} D_1] \\ \sigma_2 D_2 \end{pmatrix}'\Sigma_u^{-1}\begin{pmatrix} \sigma_1[\rho_u D_2 + (1-\rho_u^2)^{1/2} D_1] \\ \sigma_2 D_2 \end{pmatrix}$$
$$= \frac{1}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}\begin{pmatrix} \sigma_1[\rho_u D_2 + (1-\rho_u^2)^{1/2} D_1] \\ \sigma_2 D_2 \end{pmatrix}'\begin{pmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{pmatrix}\begin{pmatrix} \sigma_1[\rho_u D_2 + (1-\rho_u^2)^{1/2} D_1] \\ \sigma_2 D_2 \end{pmatrix} = D_1^2 + D_2^2$$

We are ultimately interested in the IRF $\partial h_{t+j}/\partial\epsilon^z_t$: the effect of a positive technology shock on the hours worked. Once $b_{12}$ is obtained by inverting the AR or Wald test, we derive $\partial h_{t+j}/\partial\epsilon^z_t$ in the following manner. Given $b_{12}$, there is a corresponding $b_{21}$:
$$b_{21} = \frac{b_{12}\sigma_2^2 - \sigma_{12}}{b_{12}\sigma_{12} - \sigma_1^2}$$
This relation holds because the covariance matrix $E(\epsilon_t\epsilon_t')$ of the structural shocks is diagonal, and $B_0 E(u_t u_t')B_0' = E(\epsilon_t\epsilon_t')$. Consequently, $B_0 = \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix}$ can be recovered. Combining $B_0$ with $A_1, \ldots, A_{p+1}$, which are estimated from the VAR($p+1$) of $Y_t = (\triangle l_t, h_t)'$, the IRF $\partial h_{t+j}/\partial\epsilon^z_t$ can be derived through the companion matrix $F$:
$$F = \begin{pmatrix} A_1 & A_2 & \cdots & A_p & A_{p+1} \\ I & & & & \\ & \ddots & & & \\ & & & I & \end{pmatrix}$$
Take the $j$th power of $F$, and denote the upper-left $2\times 2$ block of $F^j$ by $\tilde{F}^j$; then $\partial Y_{t+j}/\partial\epsilon_t' = \tilde{F}^j B_0^{-1}$, and $\partial h_{t+j}/\partial\epsilon^z_t$ is the $(2,1)$ element of $\partial Y_{t+j}/\partial\epsilon_t'$. As an example, when $j = 0$, the contemporaneous effect is:
$$\frac{\partial h_t}{\partial\epsilon^z_t} = \frac{b_{21}}{1 - b_{12}b_{21}} = \frac{b_{12}\sigma_2^2 - \sigma_{12}}{2b_{12}\sigma_{12} - \sigma_1^2 - b_{12}^2\sigma_2^2}$$
Once a robust confidence interval for $b_{12}$ is available, we derive the confidence interval for $\partial h_{t+j}/\partial\epsilon^z_t$ by using the point estimates of $A_1, \ldots, A_{p+1}$ and $\Sigma_u$ together with all values of $b_{12}$ in its confidence interval: the upper/lower bound for $\partial h_{t+j}/\partial\epsilon^z_t$ is its maximum/minimum over those values, respectively.
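The mapping from a value of $b_{12}$ (together with the estimated $A_1,\ldots,A_{p+1}$ and $\Sigma_u$) to the IRF can be sketched as follows in Python/NumPy; the function name and arguments are illustrative assumptions, not the author's code.

import numpy as np

def irf_hours(b12, A_list, Sigma_u, horizons=20):
    """Sketch of the IRF computation described above: given b12, the VAR
    coefficient matrices A_1,...,A_{p+1} (a list of 2x2 arrays) and Sigma_u,
    return dh_{t+j}/d eps^z_t for j = 0,...,horizons. Inputs are hypothetical."""
    s11, s12, s22 = Sigma_u[0, 0], Sigma_u[0, 1], Sigma_u[1, 1]
    b21 = (b12 * s22 - s12) / (b12 * s12 - s11)   # from diagonal E(eps eps')
    B0_inv = np.linalg.inv(np.array([[1.0, -b12], [-b21, 1.0]]))

    k = len(A_list)                               # k = p + 1 lags
    F = np.zeros((2 * k, 2 * k))                  # companion matrix
    F[:2, :] = np.hstack(A_list)
    F[2:, :-2] = np.eye(2 * (k - 1))

    irf = []
    Fj = np.eye(2 * k)
    for j in range(horizons + 1):
        Ftilde = Fj[:2, :2]                       # upper-left 2x2 block of F^j
        irf.append((Ftilde @ B0_inv)[1, 0])       # (2,1) element: dh_{t+j}/d eps^z_t
        Fj = Fj @ F
    return np.array(irf)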
3.4 Application

The same data as in Christiano et al. (2003) and Pesavento and Rossi (2005) are used in our empirical application: the quarterly series of labor productivity and the hours worked (denoted by $l_t$, $h_t$ respectively in this chapter) over 1948Q1--2001Q4 are from the DRI Economics Database; labor productivity is measured by output per hour in the business sector, while hours are measured by hours worked in the business sector divided by population; both labor productivity and hours are in natural logarithms. See Christiano et al. (2003) for further details.

Although Christiano et al. (2003) and Pesavento and Rossi (2005) use the same dataset, their conclusions contradict each other: Christiano et al. (2003) argue that the level specification of hours should be chosen over the difference specification and report that a positive technology shock drives hours up, while Pesavento and Rossi (2005) employ an agnostic procedure that works for both the level and the difference specification of hours and report that a positive productivity shock drives hours down. The recent work by Gospodinov et al. (2009) and Gospodinov (2010) suggests that it is appropriate to use model (3.1) to investigate the impact of technology shocks on hours, and we adopt the robust approach for (3.1) to construct the confidence interval for the response of hours to technology shocks. Since we use the same dataset, it is interesting to compare our results with those of Christiano et al. (2003), Pesavento and Rossi (2005), etc.

As a starting point, we consider the two specifications of hours, in levels and in first differences. The conventional identification procedure (see, e.g., Gali (1999)), under the restriction that only technology shocks have a long-run effect on labor productivity, is applied to the two specifications respectively: following Christiano et al. (2003), a VAR model with a drift and four lags is employed, where $l_t$ is specified in first differences and $h_t$ is specified either in levels or in first differences. The response of the hours worked to positive technology shocks is plotted in Figure 3.1, and it is consistent with the current debate: at short horizons, the effect of a positive technology shock on the hours worked is found to be positive under the level specification of hours and negative under the difference specification. The 95% confidence bounds are constructed by the bootstrap.

For future use, we estimate the following matrices: (i) the coefficient matrices $A_1, A_2, A_3, A_4$ in the VAR(4) model of $(\triangle l_t, h_t)'$; (ii) the covariance matrix $\Sigma_u$ of the reduced-form error $u_t$; (iii) the long-run matrix $\Psi(1)$.

i. $\hat{A}_1, \hat{A}_2, \hat{A}_3, \hat{A}_4$: estimate a VAR(4) model with a drift for $(\triangle l_t, h_t)'$; $\hat{A}_1, \hat{A}_2, \hat{A}_3, \hat{A}_4$ are the OLS estimates of the coefficients on the first, second, third and fourth lag, respectively. The companion matrix
$$F = \begin{pmatrix} A_1 & A_2 & A_3 & A_4 \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \end{pmatrix}$$
is thus estimated by:
$$\hat{F} = \begin{pmatrix} -0.1008 & 0.0698 & 0.0317 & -0.2719 & -0.0616 & 0.0858 & -0.0286 & 0.1344 \\ 0.1611 & 1.4193 & 0.2083 & -0.4265 & 0.1303 & 0.0131 & 0.0503 & -0.0329 \\ 1.0000 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1.0000 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1.0000 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1.0000 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1.0000 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1.0000 & 0 & 0 \end{pmatrix}$$

ii. $\hat{\Sigma}_u$: save the residuals of the VAR(4) model above; the sample covariance matrix of the residuals is $\hat{\Sigma}_u = \begin{pmatrix} 0.8707 & 0.0015 \\ 0.0015 & 0.5390 \end{pmatrix}$.

iii. $\hat{\Psi}(1)$: estimate a VAR(3) model with a drift for $(\triangle l_t, \triangle h_t)'$; $\hat{\Psi}_1, \hat{\Psi}_2, \hat{\Psi}_3$ are the OLS estimates of the coefficients on the first, second and third lag, respectively, and:
$$\hat{\Psi}(1) = I - \hat{\Psi}_1 - \hat{\Psi}_2 - \hat{\Psi}_3 = \begin{pmatrix} 1.0895 & 0.2986 \\ -0.4372 & 0.4970 \end{pmatrix}$$

Using the same dataset, Pesavento and Rossi (2005) report that the CADF (Covariate Augmented Dickey-Fuller) statistic is around $-3.072$ and that the 95% confidence interval of $\rho$ is $(0.897, 0.997)$. With $T = 216$ in the application, this implies that the 95% C.I. of $c$ is $(c_l, c_u) = (-22.2480, -0.6480)$.

Take the following steps to invert the AR test and construct a C.I. for the IRF (a sketch of this procedure follows the list):

1. For each $c \in (c_l, c_u)$, invert $t_{AR}$ to obtain a C.I. for $b_{12}$;
2. Bonferroni: take the union of the C.I.'s of $b_{12}$ from above;
3. For each $b_{12}$ in its C.I., compute $\partial h_{t+j}/\partial\epsilon^z_t$;
4. The upper/lower bounds of the C.I. come from the max/min of $\partial h_{t+j}/\partial\epsilon^z_t$.
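A minimal sketch of this Bonferroni AR inversion in Python/NumPy is given below. It reuses the t_AR and irf_hours sketches shown earlier; the grids over $c$ and $b_{12}$ and the critical-value function crit_value(c) are illustrative placeholders rather than the chapter's actual implementation.

import numpy as np

def ar_irf_bounds(dl_res, dh_res, h_res_lag, A_list, Sigma_u,
                  c_grid, b12_grid, crit_value, horizons=20):
    """Sketch of steps 1-4: invert t_AR over a grid of c, take the Bonferroni
    union of accepted b12 values, then map them into IRF bounds.
    crit_value(c) stands for the critical value of the c-dependent limiting
    distribution (placeholder); t_AR and irf_hours are the sketches above."""
    accepted = []
    for c in c_grid:                                   # step 1
        for b12 in b12_grid:
            if abs(t_AR(b12, dl_res, dh_res, h_res_lag)) <= crit_value(c):
                accepted.append(b12)                   # step 2: union over c
    irfs = np.array([irf_hours(b, A_list, Sigma_u, horizons)
                     for b in sorted(set(accepted))])  # step 3
    return irfs.min(axis=0), irfs.max(axis=0)          # step 4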
Similarly, take the following steps to invert the Wald test and construct a C.I. for the IRF:

1. For each $(b_{12,0}, c^*_0)$, compute the Wald statistic, and save $(b_{12,0}, c^*_0)$ if $W(b_{12,0}, c^*_0)$ does not exceed the critical value;
2. Construct the C.I. of $b_{12,0}$ by projection;
3. For each $b_{12}$ in its C.I., compute $\partial h_{t+j}/\partial\epsilon^z_t$;
4. The upper/lower bounds of the C.I. come from the max/min of $\partial h_{t+j}/\partial\epsilon^z_t$.

The response of the hours worked to positive technology shocks by the AR and Wald approaches is plotted in Figure 3.2. The IRF is found to lie mostly in the positive region at short horizons, although at the 95% level we cannot reject that the contemporaneous effect is negative.

3.5 Conclusion

We contribute to the current debate on whether a positive technology shock brings the hours worked up or down by providing new robust results: the effect on hours is found to be mostly positive at short horizons, although we do not rule out, at the 95% level, the possibility that the contemporaneous effect is negative.

Figure 3.1: IRF under the level and first-difference specifications of hours. [Two panels, "hours: in levels" and "hours: in first differences", plotting the IRF against periods after the shock (0 to 20).]
Notes: this figure reports the IRF $\partial h_{t+j}/\partial\epsilon^z_t$, i.e. the effect of positive technology shocks on the hours worked; the IRF is constructed under the long-run restriction by the same approach as in Christiano et al. (2003). The solid line is the point estimate, with $h_t$ in levels in the upper panel and $h_t$ in first differences in the lower panel; the dashed lines are the 95% upper and lower bounds from the bootstrap.

Figure 3.2: IRF bounds by AR and Wald. [Two panels, "Robust: AR" and "Robust: Wald", plotting the IRF bounds against periods after the shock (0 to 20).]
Notes: this figure reports the 95% bounds of the IRF $\partial h_{t+j}/\partial\epsilon^z_t$, i.e. the effect of positive technology shocks on the hours worked; the bounds are constructed by the AR and Wald approaches proposed in this chapter.

BIBLIOGRAPHY

Acharya, V. and Pedersen, L. (2005). Asset pricing with liquidity risk. Journal of Financial Economics, 77 (2), 375–410.
Anderson, T. (1951). Estimating linear restrictions on regression coefficients for multivariate normal distributions. The Annals of Mathematical Statistics, pp. 327–351.
— and Rubin, H. (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. The Annals of Mathematical Statistics, pp. 46–63.
Angrist, J. and Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, pp. 979–1014.
Bravo, F., Escanciano, J. and Otsu, T. (2009). Testing for Identification in GMM under Conditional Moment Restrictions, working paper.
Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp, ed. L. N. Christofides, E. K. Grant, and R. Swidinsky.
Christiano, L., Eichenbaum, M. and Vigfusson, R. (2003). What happens after a technology shock? NBER working paper.
Cochrane, J. (2001). Asset pricing. Princeton University Press.
Cragg, J. and Donald, S. (1996). On the asymptotic properties of LDU-based tests of the rank of a matrix. Journal of the American Statistical Association, pp. 1301–1309.
David, H. and Nagaraja, H. (2003). Order statistics. Wiley-Interscience.
Davidson, R. and MacKinnon, J. (2010). Wild bootstrap tests for IV regression. Journal of Business and Economic Statistics, 28 (1), 128–144.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, pp. 1–26.
Fama, E. and French, K. (1992). The cross-section of expected stock returns. Journal of Finance, pp. 427–465.
— and — (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33 (1), 3–56.
Francis, N. and Ramey, V. (2005). Is the technology-driven real business cycle hypothesis dead? Shocks and aggregate fluctuations revisited. Journal of Monetary Economics, 52 (8), 1379–1399.
Freedman, D. (1984). On bootstrapping two-stage least-squares estimates in stationary linear models. The Annals of Statistics, 12 (3), 827–842.
Gali, J. (1999). Technology, employment, and the business cycle: do technology shocks explain aggregate fluctuations? The American Economic Review, 89 (1), 249–271.
Gospodinov, N. (2010). Inference in Nearly Nonstationary SVAR Models with Long-Run Identifying Restrictions. Journal of Business and Economic Statistics, 28 (1), 1–12.
—, Maynard, A. and Pesavento, E. (2009). Sensitivity of Impulse Responses to Small Low Frequency Co-Movements: Reconciling the Evidence on the Effects of Technology Shocks. Cahiers de recherche.
Hahn, J. and Hausman, J. (2002). A new specification test for the validity of instrumental variables. Econometrica, 70 (1), 163–189.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer Verlag.
— and Horowitz, J. (1996). Bootstrap critical values for tests based on generalized method of moments estimators. Econometrica, 64 (4), 891–916.
Hansen, B. (1995). Rethinking the univariate approach to unit root testing: Using covariates to increase power. Econometric Theory, 11 (5), 1148–1171.
Hansen, L. (1982). Large sample properties of generalized method of moments estimators. Econometrica, pp. 1029–1054.
—, Heaton, J. and Li, N. (2008). Consumption strikes back? Measuring long-run risk. Journal of Political Economy, 116 (2), 260–302.
Horowitz, J. (1994). Bootstrap-based critical values for the information matrix test. Journal of Econometrics, 61 (2), 395–411.
— (2001). The Bootstrap. Handbook of Econometrics, Vol. 5.
Inoue, A. and Rossi, B. (2008). Testing for weak identification in possibly nonlinear models.
Jagannathan, R. and Wang, Z. (1996). The conditional CAPM and the cross-section of expected returns. Journal of Finance, pp. 3–53.
Kan, R. and Zhang, C. (1999). Two-pass tests of asset pricing models with useless factors. Journal of Finance, pp. 203–235.
Kleibergen, F. (2002). Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica, pp. 1781–1803.
— (2005). Testing parameters in GMM without assuming that they are identified. Econometrica, pp. 1103–1123.
— (2009). Tests of risk premia in linear factor models. Journal of Econometrics, 149 (2), 149–173.
— and Paap, R. (2006). Generalized reduced rank tests using the singular value decomposition. Journal of Econometrics, 133 (1), 97–126.
Lettau, M. and Ludvigson, S. (2001). Resurrecting the (C)CAPM: A cross-sectional test when risk premia are time-varying. Journal of Political Economy, 109 (6), 1238–1287.
Lewellen, J., Nagel, S. and Shanken, J. (2010). A skeptical appraisal of asset pricing tests. Journal of Financial Economics, 96 (2), 175–194.
Li, Q., Vassalou, M. and Xing, Y. (2006). Sector investment growth rates and the cross section of equity returns. The Journal of Business, 79 (3), 1637–1665.
Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. The Review of Economics and Statistics, pp. 13–37.
Lustig, H. and Van Nieuwerburgh, S. (2005). Housing collateral, consumption insurance, and risk premia: An empirical perspective. Journal of Finance, pp. 1167–1219.
Moreira, M. (2003). A conditional likelihood ratio test for structural models. Econometrica, pp. 1027–1048.
—, Porter, J. and Suarez, G. (2009). Bootstrap validity for the score test when instruments may be weak. Journal of Econometrics, 149 (1), 52–64.
Parker, J. and Julliard, C. (2005). Consumption risk and the cross section of expected returns. Journal of Political Economy, 113 (1), 185–222.
Pesavento, E. and Rossi, B. (2005). Do technology shocks drive hours up or down? A little evidence from an agnostic procedure. Macroeconomic Dynamics, 9 (4), 478–488.
Piazzesi, M., Schneider, M. and Tuzel, S. (2007). Housing, consumption and asset pricing. Journal of Financial Economics, 83 (3), 531–569.
Rebelo, S. (2005). Real business cycle models: past, present and future. Scandinavian Journal of Economics, 107 (2), 217–238.
Rothenberg, T. (1984). Approximating the distributions of econometric estimators and test statistics. Handbook of Econometrics, 2, 881–935.
Santos, T. and Veronesi, P. (2006). Labor income and predictable stock returns. Review of Financial Studies, 19 (1), 1–44.
Shanken, J. (1992). On the estimation of beta-pricing models. Review of Financial Studies, pp. 1–33.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, pp. 425–442.
Shea, J. (1999). What do technology shocks do? NBER Macroeconomics Annual.
Silverman, B. (1998). Density estimation for statistics and data analysis. Chapman & Hall/CRC.
Staiger, D. and Stock, J. (1997). Instrumental variables regression with weak instruments. Econometrica, pp. 557–586.
Stock, J. and Wright, J. (2000). GMM with weak identification. Econometrica, pp. 1055–1096.
—, — and Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics, 20 (4), 518–529.
— and Yogo, M. (2005). Testing for weak instruments in linear IV regression. Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. D. W. Andrews and J. H. Stock, pp. 80–108.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48 (4), 817–838.
Wooldridge, J. (2002). Econometric analysis of cross section and panel data. MIT Press.
Wright, J. (2002). Testing the Null of Identification in GMM. International Finance Discussion Papers, 732.
— (2003). Detecting lack of identification in GMM. Econometric Theory, 19 (2), 322–330.
Yogo, M. (2006). A consumption-based explanation of expected stock returns. The Journal of Finance, 61 (2), 539–580.