************************************************************************
* Set main directory
************************************************************************
cd ""

************************************************************************
* IMPORTANT: always run this section in order to use the
* command density2
************************************************************************
*! version 2.0 28Apr2010 B. Sianesi
cap program drop density2
program define density2
	version 10.0
	syntax varname [if] [in], Group(varname) [Matched(varname) *]
	marksample touse
	tempvar w fw fw0 fw1
	kdensity `varlist' if `touse', nograph gen(`w' `fw')
	if `"`matched'"' != `""' {
		kdensity `varlist' if `touse' & `group'==0 [fw=`matched'], nograph gen(`fw0') at(`w')
		label var `fw0' "Matched `group'=0"
		kdensity `varlist' if `touse' & `group'==1 [fw=`matched'], nograph gen(`fw1') at(`w')
		label var `fw1' "Matched `group'=1"
	}
	if `"`matched'"' == `""' {
		kdensity `varlist' if `touse' & `group'==0, nograph gen(`fw0') at(`w')
		label var `fw0' "`group'=0"
		kdensity `varlist' if `touse' & `group'==1, nograph gen(`fw1') at(`w')
		label var `fw1' "`group'=1"
	}
	twoway line `fw1' `fw0' `w', lw(thick medium) `options'
end

************************************************************************
* Open the dataset and analyze the content
************************************************************************
use nsw-dataset, clear
describe
more
tabulate treated
more
summarize
more

************************************************************************
*** RANDOMISED EXPERIMENTS
************************************************************************
* NSW control group (experimental)
use nsw-dataset, clear
keep if randomized == 1   /* use only NSW sample */

* 1. Check randomization across control and treatment groups.
* (a) Perform a t-test on each of the variables.
ttest age, by(treat) unequal
more
ttest educ, by(treat) unequal
more
ttest black, by(treat) unequal
more
ttest hisp, by(treat) unequal
more
ttest marr, by(treat) unequal
more
ttest nodeg, by(treat) unequal
more
ttest re75, by(treat) unequal
more
* (b) Perform a Hotelling T-squared test of the hypothesis that the vector of means of all
*     variables is equal across groups.
hotelling age educ black hispanic married nodegree re75, by(treat)
more
// Alternatively, you can replicate the hotelling command with the following commands:
reg treat age educ black hispanic married nodegree re75
test age educ black hispanic married nodegree re75
more
/* Random assignment in the NSW overall balances the X's.
   We cannot reject H0 of equal means for all the variables except nodegree.
   The treated group seems to be statistically significantly more educated in terms of higher education.
   Overall, from Hotelling's T-squared test we cannot reject that the vectors of all means are equal
   between the two groups. */

* 2. Derive the experimental impact estimate on post-programme earnings.
* (a) Compare treatment and control groups by computing within-group means.
more
summarize re78 if treated==1
more
summarize re78 if treated==0
more
display 5976.352-5090.048
more
* (b) Derive the estimate using regression, controlling for the treatment variable alone.
regress re78 treated
more
/* The average NSW impact is 886, significant only at the 10% level. */
* (c) Derive the same estimate controlling for other demographic variables.
more
regress re78 treated age age2 educ black hispanic nodegree
more
/* Controlling for additional X's provides consistent estimates too.
   Advantages: the X's may soak up some of the residual variance in Y, increasing precision,
   and one could additionally control for possibly still unbalanced X's.
   Neither of these, however, seems to matter much in these data. */
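* A minimal sketch, not part of the original exercise: the difference in means typed by hand in 2(a)
* above can also be obtained from stored results, avoiding hard-coded numbers; it should coincide
* with the coefficient on treated in the bivariate regression in 2(b).
* The scalar names y1_exp and y0_exp are introduced here for illustration only.
quietly summarize re78 if treated==1
scalar y1_exp = r(mean)
quietly summarize re78 if treated==0
scalar y0_exp = r(mean)
display "Experimental difference in means: " y1_exp - y0_exp
more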
* 3. Check for heterogeneous effects.
* (a) Check the effect among young (24 years old or younger) and adult males (older than 24).
more
regress re78 treated if age > 24
more
regress re78 treated if age <= 24
more
/* There is quite a bit of heterogeneity in impacts, at least in terms of age.
   NSW does not seem to be very useful for young people.
   We would need to formally test the significance of the difference, though. */
more
gen treated_age = treated*(age>24)
more
tab age treated_age if treated==1
more
regress re78 treated treated_age
more
* (b) Check the effect across different ethnic backgrounds.
more
regress re78 treated if black == 1
more
regress re78 treated if hispanic == 1
more
regress re78 treated if (black == 0 & hispanic == 0)
more

* PSID control group (non-experimental)
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Compare the treatment group with the PSID comparison group.
* (a) Compare the means of the main variables for the two groups.
more
ttest age, by(treat) unequal
more
ttest educ, by(treat) unequal
more
ttest black, by(treat) unequal
more
ttest hisp, by(treat) unequal
more
ttest marr, by(treat) unequal
more
ttest nodeg, by(treat) unequal
more
ttest re75, by(treat) unequal
more
* (b) Compare the distribution of pre-programme earnings (re75) for both groups.
ttest re75, by(treat) unequal
more
twoway (kdensity re75 if re75<40000 & treated==1) (kdensity re75 if re75<40000 & treated==0), graphregion(color(white))
more
/* The PSID sample is:
   - older
   - better educated (both in years and in % without HE)
   - less likely to be black or Hispanic
   - more likely to be married
   - with higher pre-treatment Y
   Overall, the unadjusted non-experimental comparison group is not very comparable
   to the experimental treated group. */

* 2. Derive the naive non-experimental estimator.
regress re78 treated
more
regress re78 treated age age2 educ black hispanic nodegree
more
/* Very poor performance! We find large, significant and negative impact estimates.
   These are, however, consistent with the different distributions of observed characteristics
   between the NSW and PSID samples. In particular, the comparison group has much more
   "labour-market-friendly" characteristics. */
more

************************************************************************
*** MATCHING
************************************************************************
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Estimate the propensity score of being treated by the programme.
* (a) Run a probit model (estimate the probability of being treated on different demographic variables).
more
dprobit treated age black hispanic married educ nodegree re75
more
* (b) Analyze which variables determine selection into treatment.
/* Trainees are more likely to be:
   - younger
   - from ethnic minorities
   - without HE
   - with lower pre-training earnings
   - unmarried
   This is in line with our earlier findings on the differences between the two groups. */
more
* (c) Generate the predicted probability of being treated and name the variable "score".
more
cap drop score
predict double score
more
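* A minimal sketch, not part of the original exercise (assumes Stata 11 or later, where margins is
* available): dprobit is an older command; the same propensity score can also be obtained from
* probit, with average marginal effects from margins in place of dprobit's effects at the means
* (the two sets of marginal effects are similar but not identical).
* The variable name score_alt is introduced here for illustration only.
probit treated age black hispanic married educ nodegree re75
margins, dydx(*)
predict double score_alt, pr
more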
* 2. Compare the distribution of the propensity score across the two groups, using the command
*    "psgraph, treated(treated) pscore(score) bin(50)", and analyze the result in relation to the
*    common support issue.
more
summarize score if treated==1, detail
more
summarize score if treated==0, detail
more
psgraph, treated(treated) pscore(score) bin(50) graphregion(color(white))
more
/* A VERY small proportion of the PSID sample seems to be comparable to our treated.
   But note we do have A LOT of PSID individuals. */

* 3. Perform one-to-one matching.
* (a) Perform nearest-neighbour matching with replacement.
psmatch2 treated, pscore(score) outcome(re78)
more
summarize re78 if treated==1 [fw=_weight]
scalar y1 = r(mean)
summarize re78 if treated==0 [fw=_weight]
scalar y0 = r(mean)
display y1-y0
more
tab _weight if treated==0
more
/* Given what we have seen in terms of comparability before matching, we may want to impose
   some form of common support. Hence we turn to caliper matching... */
* (b) Perform nearest-neighbour caliper matching with replacement. To introduce a caliper just add
*     the option "caliper(real)", where real is a number between 0 and 1.
* i. Decide what is the best caliper to introduce by looking at the average difference between
*    treated and matched controls (this can be done by looking at _pdif stored by psmatch2).
summarize _pdif, detail
/* A caliper of 2% is not really binding in these data. Let's impose 1%. */
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
more
summarize _support if treated == 1
display 1 - r(mean)
/* We lose 26 treated. The point estimate is reduced in size. */
more
psgraph
more
* ii. Compare the results for binding and non-binding calipers.
psmatch2 treated, pscore(score) outcome(re78) caliper(0.05)
psgraph
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
psgraph
more

* 4. Assess matching quality.
* (a) Plot the density distribution for treated versus non-treated and for treated versus matched non-treated.
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
more
density2 score, g(treated) graphregion(color(white))
more
density2 score, g(treated) m(_weight) graphregion(color(white))
more
/* We have aligned the propensity scores very well.
   These are pretty obvious results, mechanically depending only on the strictness of the caliper.
   But more important than checking whether the probabilities used for matching are balanced,
   what really matters is whether matching on these probabilities balances our regressors. */
more
pstest age black hispanic married educ nodegree re75, both graph
more
/* Matching has decreased bias extremely well; overall the two groups are jointly balanced.
   The percentage biases look at means; here we look at the two densities of a given X. */
density2 re75 if re75<40000, g(treated)
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
/* The distributions of earnings have been realigned fairly well too. */
more
* (b) Check whether the propensity score is summarizing all the control variables well.
* (c) Run a pstest for nearest-neighbour matching without replacement and without caliper.
psmatch2 treated, pscore(score) outcome(re78) noreplacement
more
/* The estimates get worse. This is likely due to a worsening of the balancing of the X's.
   We are going to check this now. */
more
summarize _pdif, detail
more
pstest age black hispanic married educ nodegree re75, both graph nodist
more
/* Indeed, bias remains high; the two groups are not balanced at all. */
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
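* A minimal sketch, not part of the original exercise: the standardised % bias that pstest reports
* for a covariate is commonly defined as 100*(mean_T - mean_C)/sqrt((var_T + var_C)/2).
* Here it is computed by hand for age on the current matched sample, using the psmatch2 variables
* _support and _weight; pstest's exact convention for which variances enter the denominator may
* differ slightly. The scalar names mT, vT, mC, vC are introduced here for illustration only.
quietly summarize age if treated==1 & _support==1
scalar mT = r(mean)
scalar vT = r(Var)
quietly summarize age if treated==0 [fw=_weight]
scalar mC = r(mean)
scalar vC = r(Var)
display "Standardised % bias for age: " 100*(mT-mC)/sqrt((vT+vC)/2)
more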
* (d) Run a pstest for nearest-neighbour matching without replacement but with caliper.
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01) noreplacement
more
summarize _support if treated==1
display 1-r(mean)
more
psgraph
more
pstest age black hispanic married educ nodegree re75, both graph nodist
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
/* A MUCH better balancing of the X's... at the cost of losing more than half of the treated group. */

* 5. Try generating the results using other types of matching.
* (a) Nearest neighbours
psmatch2 treated, pscore(score) outcome(re78) neigh(10)
more
psmatch2 treated, pscore(score) outcome(re78) neigh(20)
more
psmatch2 treated, pscore(score) outcome(re78) neigh(10) cal(0.01)
more
tab _nn if treated==1
more
psmatch2 treated, pscore(score) outcome(re78) neigh(20) cal(0.01)
more
tab _nn if treated==1
more
* (b) Kernel matching
psmatch2 treated, pscore(score) outcome(re78) kernel
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal)
more
psmatch2 treated, pscore(score) outcome(re78) kernel common
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) common
more
psmatch2 treated, pscore(score) outcome(re78) kernel common bw(0.01)
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) common bw(0.01)
/* For the two types of kernel and with/without common support, the estimates are negative and large;
   they are especially bad for the normal kernel.
   Results are, however, very sensitive to the choice of bandwidth, with bw=0.01 producing much
   better results than the default bw=0.06.
   They are not particularly sensitive to the imposition of the common support at the boundaries. */
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) bw(2)
/* Note that imposing a large bandwidth (covering the whole range of the pscore) amounts to weighting
   all the comparisons equally -- hence the estimate basically coincides with the naive comparison
   of mean outcomes. */

************************************************************************
*** INSTRUMENTAL VARIABLES
************************************************************************
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Compute the Wald estimator.
more
summarize re78 if married==1
scalar m1=r(mean)
more
summarize re78 if married==0
scalar m0=r(mean)
more
summarize treated if married==1
scalar p1=r(mean)
more
summarize treated if married==0
scalar p0=r(mean)
more
display (m1-m0)/(p1-p0)
more
ivreg re78 (treated = married), first
more
/* Large negative impact estimate, nowhere near the experimental benchmark. */

* 2. Compute the 2SLS estimate using demographic control variables.
regress treated age educ black hisp nodeg re75 married
more
predict double trhat
more
regress re78 trhat age educ black hisp nodeg re75
more
ivreg re78 age educ black hisp nodeg re75 (treated = married), first
more
/* The impact estimate is reduced in (absolute) size, but still nowhere near the experimental
   benchmark for the ATT.
   If we believe our exclusion restriction, the IV estimate thus has to be interpreted as a LATE.
   The LATE associated with the married instrument = the mean effect of NSW for those who would
   participate in it *only* if they are single. Not of great policy interest.
   This instrument is a strong one -- i.e. it predicts treatment well, with or without regressors --
   but the exclusion restriction may not be too convincing, especially when not controlling for any X. */
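* A minimal sketch, not part of the original exercise (assumes ivregress is available, i.e. Stata 10
* or later): the same 2SLS estimate via the newer command, with estat firststage documenting the
* claim that married is a strong instrument (first-stage F statistic).
ivregress 2sls re78 age educ black hisp nodeg re75 (treated = married), first
estat firststage
more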
************************************************************************
*** LONGITUDINAL METHODS
************************************************************************
* 1. Using the cross-section data in the NON-EXPERIMENTAL (PSID) dataset:
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */
* (a) Expand the dataset in order to obtain panel data (duplicate the observations and generate
*     a dependent variable that is time dependent).
generate id=_n
expand 2
more
bysort id: generate time=1 if _n==1
bysort id: replace time=0 if _n==_N
more
generate re=re78 if time==1
replace re=re75 if time==0
more
list id treated time re re75 re78 age in 1/20, noobs sep(2)
more
* (b) Run a DiD estimate for the programme controlling for the covariates.
generate DT = treated*time
more
reg re DT treated time age educ black hisp nodeg, cluster(id)
more
* (c) Estimate the effect using an FE model instead.
xtset id time
xtreg re DT treated time, fe cluster(id)
more
/* Notice the results are the same as in DiD, since DiD is applied to a panel. */

* 3. Use the cross-section data in the EXPERIMENTAL dataset to compare with the social experiment.
use NSW-dataset, clear
keep if randomized == 1
* (a) Expand the dataset in order to obtain panel data (duplicate the observations and generate
*     a dependent variable that is time dependent).
generate id=_n
expand 2
more
bysort id: generate time=1 if _n==1
bysort id: replace time=0 if _n==_N
more
generate re=re78 if time==1
replace re=re75 if time==0
more
* (b) Run a DiD estimate for the programme controlling for the covariates.
generate DT = treated*time
more
reg re DT treated time age educ black hisp nodeg, cluster(id)
more
* (c) Estimate the effect using an FE model instead.
xtset id time
xtreg re DT treated time, fe cluster(id)
more
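* A minimal sketch, not part of the original exercise: in the specification without covariates
* (e.g. the FE model above), the coefficient on DT equals the classic 2x2 difference-in-differences
* of group-by-period means, computed here by hand from stored results.
* The scalar names yT1, yT0, yC1, yC0 are introduced here for illustration only.
quietly summarize re if treated==1 & time==1
scalar yT1 = r(mean)
quietly summarize re if treated==1 & time==0
scalar yT0 = r(mean)
quietly summarize re if treated==0 & time==1
scalar yC1 = r(mean)
quietly summarize re if treated==0 & time==0
scalar yC0 = r(mean)
display "2x2 DiD of means: " (yT1-yT0) - (yC1-yC0)
more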