
Heckman Sample Selection

  • 12-07-2010 6:27pm
    #1
    Closed Accounts Posts: 784 ✭✭✭


    Hi,

    I'm running a Heckman sample selection model, but the errors of the selection equation do not seem to be correlated with those of the second stage with any statistical significance (i.e. the LR test of rho = 0 is not statistically significant) when they really should be. I'm using the same variables in both the selection and the second stage, except for one extra variable in the selection stage, which should not make a lot of difference (it's pretty standard in the literature). My selection stage is a binary probit (dummy = 0 or 1) with logged variables, and my second stage is a linear specification in logged variables.

    This does not make any sense to me, and the literature seems to show that my variables are pretty standard for these types of Heckman models. Is there any reason why this should happen? I suspect my specification is wrong, but the models in the literature use the same specification as I do. I know this is a bit general, but any help or suggestions on what might be going wrong would be greatly appreciated.
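
For reference, here is a standard textbook statement of the Heckman selection model (the notation is mine, not the poster's). Here y_i plays the role of the logged FDI outcome and z_i the FDI dummy. When rho = 0 the correction term vanishes and the model separates into a probit plus a regression on the selected sample, which is exactly the hypothesis the "LR test of rho = 0" checks.

```latex
\begin{aligned}
z_i^* &= \mathbf{w}_i\boldsymbol{\gamma} + u_i, \qquad z_i = \mathbf{1}[z_i^* > 0] && \text{(selection, probit)}\\
y_i &= \mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i, \qquad y_i \text{ observed only if } z_i = 1 && \text{(outcome)}\\
(u_i, \varepsilon_i) &\sim \mathcal{N}\!\left(\mathbf{0}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right)\\
\mathrm{E}[y_i \mid \mathbf{x}_i, z_i = 1] &= \mathbf{x}_i\boldsymbol{\beta} + \rho\sigma\,\lambda(\mathbf{w}_i\boldsymbol{\gamma}), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```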


Comments

  • Registered Users, Registered Users 2 Posts: 8,452 ✭✭✭Time Magazine


    First things first: it's probably a programming error. Can you post your relevant Stata commands?


  • Closed Accounts Posts: 784 ✭✭✭Anonymous1987


    Sure
    heckman ln_fdiout_oecd ln_spop ln_scgdp ln_hpop ln_hcgdp com_lng ln_km edu ln_exptime ln_taxtime ln_starttime, select(fdi_oecd=ln_spop ln_scgdp ln_hpop ln_hcgdp com_lng ln_km edu ln_exptime ln_taxtime ln_starttime)
    All the variables are logged except com_lng, which is a dummy variable (0 or 1), and fdi_nonoecd, which is also a dummy variable (0 or 1).
    I also leave out the twostep option because I want the model estimated by maximum likelihood rather than by the two-step method. Another thing to note is that I run two samples, one for OECD countries and the other for non-OECD countries. When I run heckman on the OECD sample I get a very large standard error for com_lng (greater than 1), which seems really strange. A similar thing happens when I add another dummy variable (0 or 1) for robustness tests; I'm not sure if it's related, though.
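
The opening post mentions one extra variable in the selection equation, but the two variable lists in the command as posted appear to be identical. Without at least one variable that enters the selection equation but not the outcome equation (an exclusion restriction), identification of the selection correction rests only on the nonlinearity of the inverse Mills ratio, which is generally regarded as fragile. The sketch below is a hypothetical helper, not part of Stata, that transcribes both lists from the posted command and reports any excluded variables.

```python
# Hypothetical helper: compare the regressor lists of the two equations in the
# posted heckman command and report variables that appear only in the
# selection equation (candidate exclusion restrictions).

# Outcome (level) equation regressors, copied from the posted command.
OUTCOME_VARS = (
    "ln_spop ln_scgdp ln_hpop ln_hcgdp com_lng ln_km edu "
    "ln_exptime ln_taxtime ln_starttime"
).split()

# Selection equation regressors, copied from select(fdi_oecd = ...).
SELECTION_VARS = (
    "ln_spop ln_scgdp ln_hpop ln_hcgdp com_lng ln_km edu "
    "ln_exptime ln_taxtime ln_starttime"
).split()


def excluded_from_outcome(selection_vars, outcome_vars):
    """Return, sorted, the variables in the selection equation but not the outcome equation."""
    return sorted(set(selection_vars) - set(outcome_vars))


if __name__ == "__main__":
    excluded = excluded_from_outcome(SELECTION_VARS, OUTCOME_VARS)
    if excluded:
        print("Excluded from outcome equation:", ", ".join(excluded))
    else:
        print("No exclusion restriction: both equations use the same regressors.")
```

If the extra variable was meant to be in the select() list, it seems to have been dropped from the command as posted.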


  • Closed Accounts Posts: 784 ✭✭✭Anonymous1987


    No ideas anyone?


  • Registered Users, Registered Users 2 Posts: 8,452 ✭✭✭Time Magazine


    Did you get this sorted/semi-bump?

    It's not something that's obviously apparent, so it would take a bit of work to sort out. It's a bad time to be asking, as lots of people (myself included) have been taking holidays and aren't going to spend that time helping you.

    (If I get a spare hour next week (not very likely though) I'll have a look.)


  • Closed Accounts Posts: 784 ✭✭✭Anonymous1987


    Thanks for the interest, kinda sorted it.

    A few things are kinda strange in the data. I'm basically running a standard gravity model for FDI (controls for source and host GDP and population, the difference in education, and common language). I found that when I do not log the variables, rho is statistically significant, but not when they are logged. I use a probit on the FDI dummy in the first stage, so I don't think it's anything to do with not picking up the zero values. This shouldn't really be happening.
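
Stata's maximum-likelihood heckman output reports an "LR test of indep. eqns. (rho = 0)". As I understand it, this compares the joint log-likelihood with the sum of the log-likelihoods from fitting the probit and the outcome regression separately, with the statistic referred to a chi-squared distribution with 1 degree of freedom. A minimal sketch of that arithmetic follows; the function name and the log-likelihood values are placeholders, not the poster's results. Note that the logged and unlogged models have different dependent variables, so their overall log-likelihoods are not directly comparable; only the LR statistic within each specification is meaningful.

```python
import math


def lr_test_rho_zero(ll_joint, ll_probit, ll_regression):
    """Likelihood-ratio test of H0: rho = 0 for an ML Heckman model (sketch).

    Compares the joint (selection + outcome) log-likelihood with the sum of
    the separately fitted probit and outcome-regression log-likelihoods,
    which is the model implied by rho = 0. Returns (statistic, p_value)
    using a chi-squared distribution with 1 degree of freedom.
    """
    stat = 2.0 * (ll_joint - (ll_probit + ll_regression))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihood values, for illustration only.
    stat, p = lr_test_rho_zero(ll_joint=-250.0, ll_probit=-120.0, ll_regression=-131.0)
    print(f"LR chi2(1) = {stat:.3f}, p = {p:.4f}")
```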

    Anyway, after messing around with different specifications I found that when I drop population or GDP from one of the equations (i.e. either the selection equation or the level equation), rho becomes statistically significant. I have no idea why this happens, so rather than drop variables without stating a reason, I decided to try redefining them slightly instead.
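
One hypothesis worth checking (not a diagnosis) is that logged GDP and logged population for the same country are strongly correlated in the sample, in which case dropping one of them can shift the estimates noticeably. In Stata itself, correlate gives this directly; the sketch below only shows the computation, assuming the two columns have been exported as plain lists (the function name and toy inputs are hypothetical).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy inputs only; substitute e.g. the ln_spop and ln_scgdp columns.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```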

    I decided that the difference between the source and host country is what matters, so I used the absolute difference between them for population and GDP. However, rho was still just short of significance, so I had another look at the data, spotted some outliers and decided to use robust standard errors. Now rho is statistically significant, but I don't have much faith in the results.
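
The post doesn't say whether the absolute difference was taken on levels or on logged values, and the two give different variables. On levels, |source - host| is in the units of the variable. On logs, |ln(source) - ln(host)| equals |ln(source / host)|, a scale-free absolute log ratio. A small sketch of both, with hypothetical function names:

```python
import math


def abs_difference(source, host):
    """Element-wise |source - host| for paired source/host country values (levels)."""
    if len(source) != len(host):
        raise ValueError("source and host must have the same length")
    return [abs(s - h) for s, h in zip(source, host)]


def abs_log_difference(source, host):
    """Element-wise |ln(source) - ln(host)|, i.e. the absolute log ratio.

    Requires strictly positive values, since the log of zero or a negative
    number is undefined.
    """
    if len(source) != len(host):
        raise ValueError("source and host must have the same length")
    return [abs(math.log(s) - math.log(h)) for s, h in zip(source, host)]
```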

    I'm concerned that I may have constructed the data incorrectly, or that something else is causing this in the first place. The data are cross-sectional, but other papers have used cross-sectional data with a standard gravity model of FDI without seeming to have any problems.

