The issue of sample selection can even hinge on the presence or absence of a single industry. For example, as mentioned earlier, FH (1997) omit the computer industry (SIC revision 2 industry #3573) from their sample because they report that results with computers are not plausible. SSh do not literally drop computers from the sample, but they effectively do so with a dummy variable for this industry. As discussed earlier, SSh treat computers differently because of concerns that reported computer prices do not adequately reflect the extent of this industry’s quality upgrading.
How much do computers really matter for the SSh results? SSh do not report results for their equation excluding the computer dummy, but this can easily be done using the NBER’s Productivity Data Base. SSh use three-digit data; I use the four-digit data on the assumption that more-disaggregated data are better. SSh report unweighted regressions; for robustness I also use value of shipments and employment to weight industries.
|Industry Sample||Estimation Method||Coefficient on Prod’n Share||Coefficient on Computer Dummy|
|No Computer Dummy||(-0.594)|
|(value of shipments)||(0.813)|
|(value of shipments)||(-2.518)||(-52.504)|
Table 2 reports the results. The key message is that “computers matter.” Without a computer-industry dummy, no strong relationship appears between product-price changes and the share of production workers in total industry employment. But, as was reported for SSh earlier, once a computer-industry dummy effectively removes this industry, a strong negative relationship appears among the non-computer industries.
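To see why a dummy for one industry "effectively removes" it, the following minimal sketch may help. It uses purely synthetic data, not the NBER Productivity Data Base, and every name in it (`share`, `dprice`, `weights`, the helper `wls`) is hypothetical. Under those assumptions it illustrates a standard least-squares property: adding a dummy for a single observation gives the same slope on the remaining regressor as dropping that observation, whether or not the regression is weighted.

```python
import numpy as np

def wls(X, y, w=None):
    """Least-squares coefficients; w holds optional positive weights."""
    if w is None:
        w = np.ones(len(y))
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into OLS.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic data only: 50 hypothetical industries.
rng = np.random.default_rng(0)
n = 50
share = rng.uniform(0.4, 0.9, n)                  # production-worker share
dprice = 5.0 - 4.0 * share + rng.normal(0, 1, n)  # price change, negative slope
weights = rng.uniform(1.0, 10.0, n)               # stand-in for shipments

# Industry 0 is a made-up outlier standing in for computers.
share[0] = 0.4
dprice[0] = -50.0

const = np.ones(n)
dummy = np.zeros(n)
dummy[0] = 1.0

X_plain = np.column_stack([const, share])
X_dummy = np.column_stack([const, share, dummy])

# Unweighted: no dummy, with dummy, and outlier dropped.
slope_plain = wls(X_plain, dprice)[1]
slope_dummy = wls(X_dummy, dprice)[1]
slope_drop = wls(X_plain[1:], dprice[1:])[1]

# Weighted versions of the dummy and dropped-outlier regressions.
slope_dummy_w = wls(X_dummy, dprice, weights)[1]
slope_drop_w = wls(X_plain[1:], dprice[1:], weights[1:])[1]

print("no dummy:        ", slope_plain)
print("with dummy:      ", slope_dummy)
print("outlier dropped: ", slope_drop)
```

In this sketch the slopes with the dummy and with the outlier dropped coincide up to floating-point error, while the slope without the dummy differs. The dummy's own coefficient simply absorbs the outlier's deviation from the line fitted to the other industries.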
Given that computers (and perhaps other single industries?) can play such an important role, when can industries be excluded from analysis? Lack of data seems to be one justifiable reason. Examples include LS and BC using all tradables price data that exist and FH excluding three industries which did not have materials prices. LS explicitly state their assumption that their smaller samples are representative of overall manufacturing (fn. 55, p. 195 and fn. 63, p. 202).
Selectively excluding data which do exist seems to be a trickier issue. FH invoke the reasonable criterion of excluding data that drive nonsensical results. SSh invoke the criterion of bad data quality. They do not elaborate on this point, however, either in terms of why computer-price data are so bad in absolute terms or, more importantly, relative to other industries. Presumably other industries also had quality improvements which need to be accounted for in constructing “true” price changes.