* Set working directory *

cd "C:\Users\dhughe10\Dropbox\AUM\courses\fall_2021\puad_7130\slides"

* Run the regression *

reg realrinc female educ age

* Get the residuals *

predict res, res

* Get a histogram * 

hist res

* Quantile plot *

qnorm res

* Shapiro-Wilk Test *

swilk res

* Kernel density of the predictors, with a Student's t density (1 df) overlaid *

kdensity educ, student(1)
kdensity age, student(1)

* Transform response variable *
* capture keeps the script from stopping when these variables do not exist yet *
capture drop rev_inc res2
gen rev_inc=(realrinc)^(1/4)

* New regression *

reg rev_inc female educ age

* Predict errors again * 

predict res2, res

* Assess the new errors *

qnorm res2

* Assess heteroskedasticity *

reg realrinc female educ age

predict yhat, xb
predict res3, res

* Plot and fit the same residuals (res3) from the untransformed model *
twoway scatter res3 yhat || qfit res3 yhat
twoway scatter res3 age || qfit res3 age
twoway scatter res3 educ || qfit res3 educ

estat hettest

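reg rev_inc female educ age

* Fitted values from the transformed model (yhat comes from the untransformed one) *
predict yhat2, xb

twoway scatter res2 yhat2 || qfit res2 yhat2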
twoway scatter res2 age || qfit res2 age
twoway scatter res2 educ || qfit res2 educ

estat hettest

quietly reg realrinc female educ age
estat hettest
quietly reg rev_inc female educ age
estat hettest
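
* Optional sketch, not part of the original script: White's general test as a *
* second check on the transformed model. Unlike the default estat hettest, it *
* does not assume the variance is a linear function of the fitted values. *
quietly reg rev_inc female educ age
estat imtest, white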

* Robust standard errors *

reg rev_inc female educ age
reg rev_inc female educ age, robust
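
* Optional sketch, not part of the original script: store both fits and show *
* them side by side. The names ols and rob are arbitrary labels chosen here. *
* The coefficients should match; only the standard errors should differ. *
quietly reg rev_inc female educ age
estimates store ols
quietly reg rev_inc female educ age, robust
estimates store rob
estimates table ols rob, b(%9.4f) se(%9.4f)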

* Template for cluster-robust standard errors (placeholder names, not runnable as written): *
* reg y x1 x2 x3 ... xk, cluster(group)

* Finding outliers *

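* Keep a random 10% of observations so that labeled plots stay readable. *
* This drops the rest for everything below; the seed value is arbitrary. *
set seed 7130
sample 10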
gen id=_n
reg realrinc female educ age

* extremes is user-written; locate and install it with: findit extremes *

extremes realrinc educ age

twoway scatter realrinc educ, mlabel(id)
twoway scatter realrinc age, mlabel(id)
lvr2plot, mlabel(id)

* Cook's d * 

predict cooksd, cooksd
extremes cooksd 
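
* Optional sketch, not part of the original script: list observations whose *
* Cook's d exceeds 4/n. That cutoff is a common rule of thumb and an assumption *
* here, not a value taken from the course materials. e(N) is the estimation *
* sample size from the most recent regression. *
list id realrinc educ age cooksd if cooksd > 4/e(N) & !missing(cooksd)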

* Calculate VIF * 

quietly reg realrinc female educ age
vif

* Model specification *

quietly reg rev_inc female educ age
estat ovtest

* Mean imputation *

reg realrinc female educ age
gen realrinc2=realrinc
sum realrinc
* Fill missing values with the sample mean, r(mean), rather than a hard-coded 24994.19 *
replace realrinc2=r(mean) if realrinc2==.
sum realrinc realrinc2
pwcorr realrinc realrinc2 age educ, sig
reg realrinc2 female educ age

twoway scatter realrinc educ
twoway scatter realrinc2 educ

* Regression imputation *

reg realrinc female educ age
* yhat already exists from the heteroskedasticity section, so drop it first *
capture drop yhat
predict yhat, xb
gen realrinc3=realrinc
replace realrinc3=yhat if realrinc==.
sum realrinc realrinc2 realrinc3

twoway scatter realrinc3 educ

reg realrinc female educ age
reg realrinc3 female educ age

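* Stochastic regression imputation *

gen realrinc4=realrinc
* Add random noise to the prediction, for missing values only. *
* The hard-coded SD, 28881.51, should match the Root MSE of the realrinc regression. *
replace realrinc4=yhat + rnormal(0, 28881.51) if realrinc4==.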
sum realrinc realrinc2 realrinc3 realrinc4

twoway scatter realrinc4 educ

reg realrinc female educ age
reg realrinc3 female educ age
reg realrinc4 female educ age

* Multiple imputation *

* mdesc is user-written; install it before first use *
findit mdesc
mdesc realrinc female educ age

* Declare data to be multiple-imputation *

mi set mlong 

* Register the variables in the imputation model whose missing values will be imputed *

mi register imputed realrinc female educ age

* Specify the imputation model to be used and the number of imputations *

mi impute mvn realrinc female educ age, add(10)

* Run the model with the imputed values of interest * 

mi estimate: reg realrinc female educ age
