I have two heavily skewed samples and am trying to use bootstrapping to compare their means using a t-statistic.
What is the correct procedure to do it?
The process I am using
I am concerned about the appropriateness of using the standard error of the original/observed data in the final step when I know that this is not normally distributed.
Here are my steps:
 Bootstrap: randomly sample with replacement (N=1000)
 Calculate the t-statistic for each bootstrap sample to create a t-distribution:
$$
T(b) = \frac{(\overline{X}_{b1} - \overline{X}_{b2}) - (\overline{X}_1 - \overline{X}_2)}{\sqrt{\sigma^2_{xb1}/n + \sigma^2_{xb2}/n}}
$$
 Estimate t confidence intervals by getting the $\alpha/2$ and $1-\alpha/2$ percentiles of the t-distribution
 Get confidence intervals via:
$$
CI_L = (\overline{X}_1 - \overline{X}_2) - T_{CI_L} \cdot SE_{original}
$$
$$
CI_U = (\overline{X}_1 - \overline{X}_2) + T_{CI_U} \cdot SE_{original}
$$
where
$$
SE = \sqrt{\sigma^2_{X1}/n + \sigma^2_{X2}/n}
$$
 Look where the confidence intervals fall to determine if there is a significant difference in means (i.e. nonzero)
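For concreteness, the steps above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names (`se_diff`, `bootstrap_t_ci`), the use of NumPy, and the sample variance with `ddof=1` are illustrative choices, not part of the original procedure. It also follows the usual studentized (bootstrap-t) convention of pairing the upper percentile with the lower limit, which differs from the sign convention as written in the formulas above.

```python
import numpy as np


def se_diff(x, y):
    """Standard error of the difference in means, with unpooled variances."""
    return np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))


def bootstrap_t_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap-t confidence interval for mean(x) - mean(y).

    Hypothetical helper for illustration. Assumes no resample is constant
    (a constant resample would give a zero standard error).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    diff = x.mean() - y.mean()      # observed difference in means
    se_original = se_diff(x, y)     # SE computed from the observed data

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each group separately, with replacement.
        xb = rng.choice(x, size=len(x), replace=True)
        yb = rng.choice(y, size=len(y), replace=True)
        # Studentized statistic, centred on the observed difference.
        t_boot[b] = ((xb.mean() - yb.mean()) - diff) / se_diff(xb, yb)

    t_lo, t_hi = np.quantile(t_boot, [alpha / 2, 1 - alpha / 2])

    # Usual bootstrap-t convention: upper percentile gives the lower limit.
    ci_lower = diff - t_hi * se_original
    ci_upper = diff - t_lo * se_original
    return float(ci_lower), float(ci_upper)
```

Calling `bootstrap_t_ci(x, y)` on the two samples returns the lower and upper limits; an interval that excludes zero would be read as evidence of a difference in means.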
I have also looked at the Wilcoxon rank-sum test, but it is not giving very reasonable results due to the very heavily skewed distribution (e.g. the 75th == 95th percentile). For this reason I would like to explore the bootstrapped t-test further.
So my questions are:
 Is this an appropriate methodology?
 Is it appropriate to use the SE of observed data when I know it is heavily skewed?
Possible duplicate: What method is preferred, a bootstrapping test or a nonparametric rank-based test?
Answer
I would just do a regular bootstrap test:
 Compute the t-statistic in your data and store it.
 Change the data such that the null hypothesis is true. In this case, subtract the group 1 mean from each observation in group 1 and add the overall mean, and do the same for group 2; that way the mean in both groups will be the overall mean.
 Take bootstrap samples from this dataset, probably on the order of 20,000.
 Compute the t-statistic in each of these bootstrap samples. The distribution of these t-statistics is the bootstrap estimate of the sampling distribution of the t-statistic in your skewed data if the null hypothesis is true.
 The proportion of bootstrap t-statistics that are larger than or equal to your observed t-statistic is your estimate of the $p$-value. You can do a bit better by looking at $($the number of bootstrap t-statistics that are larger than or equal to the observed t-statistic $+1)$ divided by $($the number of bootstrap samples $+1)$. However, the difference is going to be small when the number of bootstrap samples is large.
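The steps above could be sketched in Python along these lines. Several details are assumptions rather than part of the procedure as stated: the function names (`t_stat`, `bootstrap_t_test`), the unpooled-variance form of the t-statistic, resampling within each shifted group, and the optional `two_sided` flag, which compares absolute values instead of the one-sided "larger than or equal to" comparison described above.

```python
import numpy as np


def t_stat(x, y):
    """Two-sample t-statistic with unpooled variances (an assumed choice)."""
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return (x.mean() - y.mean()) / se


def bootstrap_t_test(x, y, n_boot=20000, two_sided=False, seed=0):
    """Bootstrap p-value for a difference in means.

    Hypothetical helper for illustration. Assumes no resample is constant
    (a constant resample would give a zero standard error).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: t-statistic in the observed data.
    t_obs = t_stat(x, y)

    # Step 2: shift each group so both have the overall mean (null is true).
    overall_mean = np.concatenate([x, y]).mean()
    x0 = x - x.mean() + overall_mean
    y0 = y - y.mean() + overall_mean

    # Steps 3 and 4: t-statistics in bootstrap samples of the shifted data.
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x0, size=len(x0), replace=True)
        yb = rng.choice(y0, size=len(y0), replace=True)
        t_boot[b] = t_stat(xb, yb)

    # Step 5: (count + 1) / (n_boot + 1) version of the p-value.
    if two_sided:
        count = np.sum(np.abs(t_boot) >= abs(t_obs))
    else:
        count = np.sum(t_boot >= t_obs)
    return float((count + 1) / (n_boot + 1))
```

With the default `two_sided=False` this tests whether the first group has the larger mean; passing `two_sided=True` is one way to test for a nonzero difference in either direction.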
You can read more on that in:

Chapter 4 of A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge: Cambridge University Press.

Chapter 16 of Bradley Efron and Robert J. Tibshirani (1993) An Introduction to the Bootstrap. Boca Raton: Chapman & Hall/CRC.
Attribution
Source: Link, Question Author: CatsLoveJazz, Answer Author: dfrankow