************************************************************************
* Set main directory
************************************************************************
cd ""

************************************************************************
* IMPORTANT: always run this section in order to use the
* command density2
************************************************************************
*! version 2.0 28Apr2010 B. Sianesi
cap program drop density2
program define density2
	version 10.0
	syntax varname [if] [in], Group(varname) [Matched(varname) *]
	marksample touse
	tempvar w fw fw0 fw1
	kdensity `varlist' if `touse', nograph gen(`w' `fw')
	if `"`matched'"' != `""' {
		kdensity `varlist' if `touse' & `group'==0 [fw=`matched'], nograph gen(`fw0') at(`w')
		label var `fw0' "Matched `group'=0"
		kdensity `varlist' if `touse' & `group'==1 [fw=`matched'], nograph gen(`fw1') at(`w')
		label var `fw1' "Matched `group'=1"
	}
	if `"`matched'"' == `""' {
		kdensity `varlist' if `touse' & `group'==0, nograph gen(`fw0') at(`w')
		label var `fw0' "`group'=0"
		kdensity `varlist' if `touse' & `group'==1, nograph gen(`fw1') at(`w')
		label var `fw1' "`group'=1"
	}
	twoway line `fw1' `fw0' `w', lw(thick medium) `options'
end

************************************************************************
* Open the dataset and analyze the content
************************************************************************
use nsw-dataset, clear
describe
more
tabulate treated
more
summarize
more

************************************************************************
*** RANDOMISED EXPERIMENTS
************************************************************************
* NSW control group (experimental)
use nsw-dataset, clear
keep if randomized == 1   /* use only NSW sample */

* 1. Check randomization across control and treatment groups.
* (a) Perform a t-test on each of the variables.
ttest age, by(treat) unequal
more
ttest educ, by(treat) unequal
more
ttest black, by(treat) unequal
more
ttest hisp, by(treat) unequal
more
ttest marr, by(treat) unequal
more
ttest nodeg, by(treat) unequal
more
ttest re75, by(treat) unequal
more
* (b) Perform a Hotelling T-squared test of the hypothesis that the vector of means of all
*     variables is equal across groups.
hotelling age educ black hispanic married nodegree re75, by(treat)
more
// Alternatively, you can replicate the hotelling command with the following commands:
reg treat age educ black hispanic married nodegree re75
test age educ black hispanic married nodegree re75
more
/* Random assignment in the NSW overall balances the X's.
   We cannot reject H0 of equal means for all the variables except nodegree.
   The treated group seems to be statistically significantly more educated in terms of higher education.
   Overall, from Hotelling's T-squared test we cannot reject that the vectors of all means are equal
   between the two groups. */

* 2. Derive the experimental impact estimate on post-programme earnings.
* (a) Compare treatment and control groups by computing within-group means.
more
summarize re78 if treated==1
more
summarize re78 if treated==0
more
display 5976.352-5090.048
more
* (b) Derive the estimate using regression, controlling for the treatment variable alone.
regress re78 treated
more
/* The average NSW impact is 886, significant only at the 10% level. */
* (c) Derive the same estimate controlling for other demographic variables.
more
regress re78 treated age age2 educ black hispanic nodegree
more
/* Controlling for additional X's provides consistent estimates too.
   Advantages: the X's may soak up some of the residual variance in Y, increasing precision,
   and one could additionally control for possibly still unbalanced X's.
   Neither of these, however, seems to matter much in these data. */
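* A minimal sketch, not part of the original exercise: the difference in means typed by hand in 2(a)
* above can also be obtained from stored results, avoiding hard-coded numbers; it should coincide
* with the coefficient on treated in the bivariate regression in 2(b).
* The scalar names y1_exp and y0_exp are introduced here for illustration only.
quietly summarize re78 if treated==1
scalar y1_exp = r(mean)
quietly summarize re78 if treated==0
scalar y0_exp = r(mean)
display "Experimental difference in means: " y1_exp - y0_exp
more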
* 3. Check for heterogeneous effects.
* (a) Check the effect among young (24 years old or younger) and adult males (older than 24).
more
regress re78 treated if age > 24
more
regress re78 treated if age <= 24
more
/* There is quite a bit of heterogeneity in impacts, at least in terms of age.
   NSW does not seem to be very useful for young people.
   We would need to formally test the significance of the difference, though. */
more
gen treated_age = treated*(age>24)
more
tab age treated_age if treated==1
more
regress re78 treated treated_age
more
* (b) Check the effect across different ethnic backgrounds.
more
regress re78 treated if black == 1
more
regress re78 treated if hispanic == 1
more
regress re78 treated if (black == 0 & hispanic == 0)
more

* PSID control group (non-experimental)
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Compare the treatment group with the PSID comparison group.
* (a) Compare the means of the main variables for the two groups.
more
ttest age, by(treat) unequal
more
ttest educ, by(treat) unequal
more
ttest black, by(treat) unequal
more
ttest hisp, by(treat) unequal
more
ttest marr, by(treat) unequal
more
ttest nodeg, by(treat) unequal
more
ttest re75, by(treat) unequal
more
* (b) Compare the distribution of pre-programme earnings (re75) for both groups.
ttest re75, by(treat) unequal
more
twoway (kdensity re75 if re75<40000 & treated==1) (kdensity re75 if re75<40000 & treated==0), graphregion(color(white))
more
/* The PSID sample is:
   - older
   - better educated (both in years and in % without HE)
   - less likely to be black or Hispanic
   - more likely to be married
   - with higher pre-treatment Y
   Overall, the unadjusted non-experimental comparison group is not very comparable
   to the experimental treated group. */

* 2. Derive the naive non-experimental estimator.
regress re78 treated
more
regress re78 treated age age2 educ black hispanic nodegree
more
/* Very poor performance! We find large, significant and negative impact estimates.
   These are, however, consistent with the different distributions of observed characteristics
   between the NSW and PSID samples. In particular, the comparison group has much more
   "labour-market-friendly" characteristics. */
more

************************************************************************
*** MATCHING
************************************************************************
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Estimate the propensity score of being treated by the programme.
* (a) Run a probit model (estimate the probability of being treated on different demographic variables).
more
dprobit treated age black hispanic married educ nodegree re75
more
* (b) Analyze which variables determine selection into treatment.
/* Trainees are more likely to be:
   - younger
   - from ethnic minorities
   - without HE
   - with lower pre-training earnings
   - unmarried
   This is in line with our earlier findings on the differences between the two groups. */
more
* (c) Generate the predicted probability of being treated and name the variable "score".
more
cap drop score
predict double score
more
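* A minimal sketch, not part of the original exercise (assumes Stata 11 or later, where margins is
* available): dprobit is an older command; the same propensity score can also be obtained from
* probit, with average marginal effects from margins in place of dprobit's effects at the means
* (the two sets of marginal effects are similar but not identical).
* The variable name score_alt is introduced here for illustration only.
probit treated age black hispanic married educ nodegree re75
margins, dydx(*)
predict double score_alt, pr
more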
* 2. Compare the distribution of the propensity score across the two groups, using the command
*    "psgraph, treated(treated) pscore(score) bin(50)", and analyze the result in relation to the
*    common support issue.
more
summarize score if treated==1, detail
more
summarize score if treated==0, detail
more
psgraph, treated(treated) pscore(score) bin(50) graphregion(color(white))
more
/* A VERY small proportion of the PSID sample seems to be comparable to our treated.
   But note we do have A LOT of PSID individuals. */

* 3. Perform one-to-one matching.
* (a) Perform nearest-neighbour matching with replacement.
psmatch2 treated, pscore(score) outcome(re78)
more
summarize re78 if treated==1 [fw=_weight]
scalar y1 = r(mean)
summarize re78 if treated==0 [fw=_weight]
scalar y0 = r(mean)
display y1-y0
more
tab _weight if treated==0
more
/* Given what we have seen in terms of comparability before matching, we may want to impose
   some form of common support. Hence we turn to caliper matching... */
* (b) Perform nearest-neighbour caliper matching with replacement. To introduce a caliper just add
*     the option "caliper(real)", where real is a number between 0 and 1.
* i. Decide what is the best caliper to introduce by looking at the average difference between
*    treated and matched controls (this can be done by looking at _pdif stored by psmatch2).
summarize _pdif, detail
/* A caliper of 2% is not really binding in these data. Let's impose 1%. */
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
more
summarize _support if treated == 1
display 1 - r(mean)
/* We lose 26 treated. The point estimate is reduced in size. */
more
psgraph
more
* ii. Compare the results for binding and non-binding calipers.
psmatch2 treated, pscore(score) outcome(re78) caliper(0.05)
psgraph
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
psgraph
more

* 4. Assess matching quality.
* (a) Plot the density distribution for treated versus non-treated and for treated versus matched non-treated.
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01)
more
density2 score, g(treated) graphregion(color(white))
more
density2 score, g(treated) m(_weight) graphregion(color(white))
more
/* We have aligned the propensity scores very well.
   These are pretty obvious results, mechanically depending only on the strictness of the caliper.
   But more important than checking whether the probabilities used for matching are balanced,
   what really matters is whether matching on these probabilities balances our regressors. */
more
pstest age black hispanic married educ nodegree re75, both graph
more
/* Matching has decreased bias extremely well; overall the two groups are jointly balanced.
   The percentage biases look at means; here we look at the two densities of a given X. */
density2 re75 if re75<40000, g(treated)
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
/* The distributions of earnings have been realigned fairly well too. */
more
* (b) Check whether the propensity score is summarizing all the control variables well.
* (c) Run a pstest for nearest-neighbour matching without replacement and without caliper.
psmatch2 treated, pscore(score) outcome(re78) noreplacement
more
/* The estimates get worse. This is likely due to a worsening of the balancing of the X's.
   We are going to check this now. */
more
summarize _pdif, detail
more
pstest age black hispanic married educ nodegree re75, both graph nodist
more
/* Indeed, bias remains high; the two groups are not balanced at all. */
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
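* A minimal sketch, not part of the original exercise: the standardised % bias that pstest reports
* for a covariate is commonly defined as 100*(mean_T - mean_C)/sqrt((var_T + var_C)/2).
* Here it is computed by hand for age on the current matched sample, using the psmatch2 variables
* _support and _weight; pstest's exact convention for which variances enter the denominator may
* differ slightly. The scalar names mT, vT, mC, vC are introduced here for illustration only.
quietly summarize age if treated==1 & _support==1
scalar mT = r(mean)
scalar vT = r(Var)
quietly summarize age if treated==0 [fw=_weight]
scalar mC = r(mean)
scalar vC = r(Var)
display "Standardised % bias for age: " 100*(mT-mC)/sqrt((vT+vC)/2)
more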
* (d) Run a pstest for nearest-neighbour matching without replacement but with caliper.
more
psmatch2 treated, pscore(score) outcome(re78) caliper(0.01) noreplacement
more
summarize _support if treated==1
display 1-r(mean)
more
psgraph
more
pstest age black hispanic married educ nodegree re75, both graph nodist
more
density2 re75 if re75<40000, g(treated) m(_weight)
more
/* A MUCH better balancing of the X's... at the cost of losing more than half of the treated group. */

* 5. Try generating the results using other types of matching.
* (a) Nearest neighbours
psmatch2 treated, pscore(score) outcome(re78) neigh(10)
more
psmatch2 treated, pscore(score) outcome(re78) neigh(20)
more
psmatch2 treated, pscore(score) outcome(re78) neigh(10) cal(0.01)
more
tab _nn if treated==1
more
psmatch2 treated, pscore(score) outcome(re78) neigh(20) cal(0.01)
more
tab _nn if treated==1
more
* (b) Kernel matching
psmatch2 treated, pscore(score) outcome(re78) kernel
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal)
more
psmatch2 treated, pscore(score) outcome(re78) kernel common
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) common
more
psmatch2 treated, pscore(score) outcome(re78) kernel common bw(0.01)
more
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) common bw(0.01)
/* For the two types of kernel and with/without common support, the estimates are negative and large;
   they are especially bad for the normal kernel.
   Results are, however, very sensitive to the choice of bandwidth, with bw=0.01 producing much
   better results than the default bw=0.06.
   They are not particularly sensitive to the imposition of the common support at the boundaries. */
psmatch2 treated, pscore(score) outcome(re78) kernel k(normal) bw(2)
/* Note that imposing a large bandwidth (covering the whole range of the pscore) amounts to weighting
   all the comparisons equally -- hence the estimate basically coincides with the naive comparison
   of mean outcomes. */

************************************************************************
*** INSTRUMENTAL VARIABLES
************************************************************************
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */

* 1. Compute the Wald estimator.
more
summarize re78 if married==1
scalar m1=r(mean)
more
summarize re78 if married==0
scalar m0=r(mean)
more
summarize treated if married==1
scalar p1=r(mean)
more
summarize treated if married==0
scalar p0=r(mean)
more
display (m1-m0)/(p1-p0)
more
ivreg re78 (treated = married), first
more
/* Large negative impact estimate, nowhere near the experimental benchmark. */

* 2. Compute the 2SLS estimate using demographic control variables.
regress treated age educ black hisp nodeg re75 married
more
predict double trhat
more
regress re78 trhat age educ black hisp nodeg re75
more
ivreg re78 age educ black hisp nodeg re75 (treated = married), first
more
/* The impact estimate is reduced in (absolute) size, but still nowhere near the experimental
   benchmark for the ATT.
   If we believe our exclusion restriction, the IV estimate thus has to be interpreted as a LATE.
   The LATE associated with the married instrument = the mean effect of NSW for those who would
   participate in it *only* if they are single. Not of great policy interest.
   This instrument is a strong one -- i.e. it predicts treatment well, with or without regressors --
   but the exclusion restriction may not be too convincing, especially when not controlling for any X. */
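* A minimal sketch, not part of the original exercise (assumes ivregress is available, i.e. Stata 10
* or later): the same 2SLS estimate via the newer command, with estat firststage documenting the
* claim that married is a strong instrument (first-stage F statistic).
ivregress 2sls re78 age educ black hisp nodeg re75 (treated = married), first
estat firststage
more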
************************************************************************
*** LONGITUDINAL METHODS
************************************************************************
* 1. Using the cross-section data in the NON-EXPERIMENTAL (PSID) dataset:
use NSW-dataset, clear
keep if randomized == 0   /* use only PSID sample */
* (a) Expand the dataset in order to obtain panel data (duplicate the observations and generate
*     a dependent variable that is time dependent).
generate id=_n
expand 2
more
bysort id: generate time=1 if _n==1
bysort id: replace time=0 if _n==_N
more
generate re=re78 if time==1
replace re=re75 if time==0
more
list id treated time re re75 re78 age in 1/20, noobs sep(2)
more
* (b) Run a DiD estimate for the programme controlling for the covariates.
generate DT = treated*time
more
reg re DT treated time age educ black hisp nodeg, cluster(id)
more
* (c) Estimate the effect using an FE model instead.
xtset id time
xtreg re DT treated time, fe cluster(id)
more
/* Notice the results are the same as in DiD, since DiD is applied to a panel. */

* 3. Use the cross-section data in the EXPERIMENTAL dataset to compare with the social experiment.
use NSW-dataset, clear
keep if randomized == 1
* (a) Expand the dataset in order to obtain panel data (duplicate the observations and generate
*     a dependent variable that is time dependent).
generate id=_n
expand 2
more
bysort id: generate time=1 if _n==1
bysort id: replace time=0 if _n==_N
more
generate re=re78 if time==1
replace re=re75 if time==0
more
* (b) Run a DiD estimate for the programme controlling for the covariates.
generate DT = treated*time
more
reg re DT treated time age educ black hisp nodeg, cluster(id)
more
* (c) Estimate the effect using an FE model instead.
xtset id time
xtreg re DT treated time, fe cluster(id)
more
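* A minimal sketch, not part of the original exercise: in the specification without covariates
* (e.g. the FE model above), the coefficient on DT equals the classic 2x2 difference-in-differences
* of group-by-period means, computed here by hand from stored results.
* The scalar names yT1, yT0, yC1, yC0 are introduced here for illustration only.
quietly summarize re if treated==1 & time==1
scalar yT1 = r(mean)
quietly summarize re if treated==1 & time==0
scalar yT0 = r(mean)
quietly summarize re if treated==0 & time==1
scalar yC1 = r(mean)
quietly summarize re if treated==0 & time==0
scalar yC0 = r(mean)
display "2x2 DiD of means: " (yT1-yT0) - (yC1-yC0)
more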