[R] Pairing and
Dalthorp, Daniel
ddalthorp at usgs.gov
Fri Feb 12 00:58:34 CET 2016
Hi Val,
There are probably more elegant ways to do it, but the following is fairly
transparent:
# input data arranged as an array:
indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
indat
# output data has the same number of rows and half as many columns
outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
for (i in 1:dim(outdat)[2]){
  # each column of output = sum(two columns of input)
  outdat[,i]<-apply(indat[,(i-1)*2+1:2],MARGIN=1,FUN=sum)
}
outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
# allele pairs that sum to 3 are already genotype 3, so they need no change
outdat
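
The same pairing can probably also be done without the loop by adding the odd-numbered and even-numbered columns directly. The following is only a sketch: it assumes the alleles are coded 1/2, that the two alleles of each SNP sit in adjacent columns, and that any ID column has already been dropped. The names odd, even, allele_sum, geno_code and outdat2 are made up for illustration.

```r
# same toy input as above: 4 individuals, 5 SNPs (10 allele columns)
indat <- cbind(c(1,2,2,1), c(1,2,1,1), c(2,2,2,2), c(2,2,2,2), c(2,2,2,1),
               c(2,2,2,2), c(2,2,2,1), c(2,2,2,2), c(1,2,1,1), c(1,2,1,2))

odd  <- seq(1, ncol(indat), by = 2)  # column of the first allele of each SNP
even <- odd + 1                      # column of the second allele of each SNP

# element-wise sum of the two allele columns of every SNP (values 2, 3 or 4)
allele_sum <- indat[, odd, drop = FALSE] + indat[, even, drop = FALSE]

# lookup vector indexed by (sum - 1): sum 2 -> 0, sum 3 -> 3, sum 4 -> 1
geno_code <- c(0, 3, 1)
outdat2 <- matrix(geno_code[allele_sum - 1], nrow = nrow(allele_sum))
outdat2
```

Using a lookup vector does the recoding in one step, so the order of the two replacement statements no longer matters.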
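# faster but a little more difficult to see what is going on:
# post-multiply by a 0/1 matrix whose column j has 1s in rows 2j-1 and 2j,
# so that column j of the product is the sum of input columns 2j-1 and 2j
outdat<-indat %*%
array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))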
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat
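
If the construction of the 0/1 matrix is hard to follow, one possibly more readable way to build it is with kronecker(). This is a sketch rather than a recommendation, and the names n_snp, pairing and outdat3 are made up for illustration.

```r
# same toy input as above: 4 individuals, 5 SNPs (10 allele columns)
indat <- cbind(c(1,2,2,1), c(1,2,1,1), c(2,2,2,2), c(2,2,2,2), c(2,2,2,1),
               c(2,2,2,2), c(2,2,2,1), c(2,2,2,2), c(1,2,1,1), c(1,2,1,2))

n_snp <- ncol(indat) / 2

# identity matrix with every 1 expanded to a 2 x 1 block of ones:
# column j of 'pairing' has 1s in rows 2j-1 and 2j and 0s elsewhere
pairing <- kronecker(diag(n_snp), matrix(1, nrow = 2, ncol = 1))

outdat3 <- indat %*% pairing  # each column = sum of one pair of allele columns
outdat3[outdat3 == 2] <- 0    # 1 1 -> genotype 0
outdat3[outdat3 == 4] <- 1    # 2 2 -> genotype 1 (sums of 3 stay 3)
outdat3
```

One caveat: the pairing matrix is dense, so with 26,000 allele columns it would have 26,000 x 13,000 entries (roughly 2.7 GB as doubles). At that size the loop or plain column indexing may be lighter on memory.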
-Dan
On Thu, Feb 11, 2016 at 2:52 PM, Val <valkremk at gmail.com> wrote:
> Hi all,
>
> I have a SNP data set: the first column is the ID and the
> subsequent pairs of columns are the alleles for
> SNP1, SNP2 and so on. Each SNP has two columns. Based on the alleles
> I want to make phenotype
>
> if the alleles are 1 1 then genotype is 0
> 2 2 then genotype is 1
> and if it is 1 2 or 2 1 then genotype is 3
>
> This is a sample data set, but the actual one has 13,000 SNPs (26,000 columns)
>
>
> Geno data
> AB95 1 1 2 2 2 2 2 2 1 1
> AB82 2 2 2 2 2 2 2 2 2 2
> AB95 2 1 2 2 2 2 2 2 1 1
> AB59 1 1 2 2 1 2 1 2 1 2
> AB32 2 1 2 2 2 2 2 2 1 2
> AB46 2 1 2 2 1 2 1 1 2 2
> AB61 1 1 2 2 1 2 1 2 1 2
> AB32 2 2 1 2 2 2 2 2 1 2
> AB35 2 2 1 2 2 2 2 2 2 2
> AB43 2 2 1 2 2 2 2 2 2 2
>
> Desired output
> AB95 0 1 1 1 0
> AB82 1 1 1 1 1
> AB95 3 1 1 1 0
> AB59 0 1 3 3 3
> AB32 3 1 1 1 3
> AB46 3 1 3 0 1
> AB61 0 1 3 3 3
> AB32 1 3 1 1 3
> AB35 1 3 1 1 1
> AB43 1 3 1 1 1
>
> I would appreciate it if you could help me out here.
> Thank you in advance
--
Dan Dalthorp, PhD
USGS Forest and Rangeland Ecosystem Science Center
Forest Sciences Lab, Rm 189
3200 SW Jefferson Way
Corvallis, OR 97331
ph: 541-750-0953
ddalthorp at usgs.gov