I have weights of SNP variation (output through Eigenstrat program) for each SNP for the three main PCs. I wish to reduce my list of SNPs to those that show maximum differentiation between the three PCs. Can anyone help me with which statistical method to use to do this.

say, if each PC describes the magnitude of variation, how can I find mutually exclusive rows or rows that define the strongest differenciator of each PC. e.g:

SNPNam PC1-wt PC 2-wt PC3-wt

SNP_1 -1.489 -0.029 -0.507

SNP_2 -1.446 -0.816 0.661

SNP_3 -1.416 0.338 1.631

SNP_4 -1.392 -1.452 0.062

SNP_5 -1.278 0.362 -1.006

SNP_6 -1.21 0.514 0.144

SNP_7 -1.119 -0.633 0.163

SNP_8 -1.112 -0.193 -0.256

SNP_9 -1.054 1.081 -1.519

SNP_10 -0.936 -1.052 -0.419

SNP_11 -0.861 0.381 -0.207

SNP_12 -0.662 0.852 -0.211

SNP_13 -0.503 -1.602 0.585

SNP_14 -0.417 0.529 -1.003

SNP_15 0.101 -0.85 -0.258

SNP_16 0.198 -0.435 -1.599

SNP_17 0.588 -1.292 -1.257

SNP_18 1.167 0.891 1.106

SNP_19 1.35 0.036 0.729

SNP_20 1.532 1.599 0.499

Any help regarding which test to perform and how to (which package) is very much appreciated.

It looks like you are referring to eigenanalysis for SNPs data and the article from Nick Patterson, Population Structure and Eigenanalysis (PLoS Genetics 2006), where the first component explains the largest variance on allele frequency wrt. potential stratification in the sample (due to ethnicity or, more generally, ancestry). So I wonder why you want to consider all three first components, unless they appear to be significant from their expected distribution according to TW distribution. Anyway, in R you can isolate the most informative SNPs (i.e. those that are at the extreme of the successive principal axes) with the `apply()`

function, working on row, e.g.

```
apply(snp.df, 1, function(x) any(abs(x)>threshold))
```

where `snp.df`

stands for the data you show and which is stored either as a `data.frame`

or `matrix`

under R, and `threshold`

is the value you want to consider (this can be Mean $\pm$ 6 SD, as in Price et al. *Nature Genetics* 2007 38(8): 904, or whatever value you want).
You may also implement the iterative PCA yourself.

Finally, the TW test can be implemented as follows:

```
##' Test for the largest eigenvalue in a gaussian covariance matrix
##'
##' This function computes the test statistic and associated p-value
##' for a Wishart matrix focused on individuals (Tracy-Widom distribution).
##'
##' @param C a rectangular matrix of bi-allelic markers (columns) indexed
##' on m individuals. Caution: genotype should be in {0,1,2}.
##' @return test statistic and tabulated p-value
##' @reference \cite{Johnstone:2001}
##' @seealso The RMTstat package provides other interesting functions to
##' deal with Wishart matrices.
##' @example
##' X <- replicate(100,sample(0:2,20,rep=T))
tw.test <- function(C) {
m <- nrow(C) # individuals
n <- ncol(C) # markers
# compute M
C <- scale(C, scale=F)
pj <- attr(C,"scaled:center")/2
M <- C/sqrt(pj*(1-pj))
# compute X=MM'
X <- M %*% t(M)
ev <- sort(svd(X)$d, decr=T)[1:(m-1)]
nprime <- ((m+1)*sum(ev)^2)/(((m-1)*sum(ev^2))-sum(ev)^2)
l <- (m-1)*ev[1]/sum(ev)
# normalize l and compute test statistic
num <- (sqrt(nprime-1)+sqrt(m))
mu <- num^2/nprime
sigma <- num/nprime*(1/sqrt(nprime-1)+1/sqrt(m))^(1/3)
l <- (l-mu)/sigma
# estimate associated p-value
if (require(RMTstat)) pv <- ptw(l, lower.tail=F)
else pv <- NA
return(list(stat=l, pval=pv))
}
```