# [R] Pairing and

Dalthorp, Daniel ddalthorp at usgs.gov
Fri Feb 12 00:58:34 CET 2016

```
Hi Val,
There are probably more elegant ways to do it, but the following is fairly
transparent:

# input data arranged as an array:
indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
indat

# output data has the same number of rows and half as many columns
outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
for (i in 1:dim(outdat)[2]){
  # each column of output = sum of two adjacent columns of input
  outdat[,i]<-apply(indat[,(i-1)*2+1:2],FUN=sum,MARGIN=1)
}
outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
# allele pairs that sum to 3 are genotype 3, so no need to change anything with them
outdat

# faster but a little more difficult to see what is going on:
outdat<-indat %*%
array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat

-Dan

On Thu, Feb 11, 2016 at 2:52 PM, Val <valkremk at gmail.com> wrote:

> Hi  all,
>
> I have a SNP data set: the first column is the ID and the
> subsequent pairs of columns are the alleles for
> SNP1, SNP2 and so on. Each SNP has two columns.  Based on the alleles
> I want to make phenotype
>
> if the alleles  are 1 1        then  genotype is 0
>                   2 2        then  genotype is 1
> and if it is          1 2 or 2 1 then  genotype is 3
>
> This is a sample data set but the actual has 13,000 SNPs (26,000 columns)
>
>
> Geno data
> AB95 1 1 2 2 2 2 2 2 1 1
> AB82 2 2 2 2 2 2 2 2 2 2
> AB95 2 1 2 2 2 2 2 2 1 1
> AB59 1 1 2 2 1 2 1 2 1 2
> AB32 2 1 2 2 2 2 2 2 1 2
> AB46 2 1 2 2 1 2 1 1 2 2
> AB61 1 1 2 2 1 2 1 2 1 2
> AB32 2 2 1 2 2 2 2 2 1 2
> AB35 2 2 1 2 2 2 2 2 2 2
> AB43 2 2 1 2 2 2 2 2 2 2
>
> Desired output
> AB95  0   1   1   1   0
> AB82  1   1   1   1   1
> AB95  3   1   1   1   0
> AB59  0   1   3   3   3
> AB32  3   1   1   1   3
> AB46  3   1   3   0   1
> AB61  0   1   3   3   3
> AB32  1   3   1   1   3
> AB35  1   3   1   1   1
> AB43  1   3   1   1   1
>
> I would appreciate if you help me out here.

--
Dan Dalthorp, PhD
USGS Forest and Rangeland Ecosystem Science Center
Forest Sciences Lab, Rm 189
3200 SW Jefferson Way
Corvallis, OR 97331
ph: 541-750-0953
ddalthorp at usgs.gov


```
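A vectorized variant of the column-pairing idea in the reply can be sketched as follows. It adds the odd-numbered and even-numbered allele columns directly, then applies the same recoding (sum 2 becomes 0, sum 4 becomes 1, sum 3 stays 3). The helper name `pair_to_genotype` and the separate `ids` vector are illustrative assumptions, not part of the original thread, and the sketch assumes alleles are coded only as 1 or 2 with the ID column already split off.

```r
# First four rows of the sample data from the question; IDs are kept separately
# (assumption: the ID column has already been removed from the allele matrix).
ids <- c("AB95", "AB82", "AB95", "AB59")
alleles <- rbind(
  c(1, 1, 2, 2, 2, 2, 2, 2, 1, 1),
  c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  c(2, 1, 2, 2, 2, 2, 2, 2, 1, 1),
  c(1, 1, 2, 2, 1, 2, 1, 2, 1, 2)
)

# Hypothetical helper (not from the thread): collapse allele pairs to genotypes.
pair_to_genotype <- function(alleles) {
  stopifnot(ncol(alleles) %% 2 == 0)      # each SNP needs exactly two columns
  odd  <- seq(1, ncol(alleles), by = 2)   # first allele of each SNP
  even <- odd + 1                         # second allele of each SNP
  geno <- alleles[, odd, drop = FALSE] + alleles[, even, drop = FALSE]
  geno[geno == 2] <- 0                    # 1 1 -> genotype 0
  geno[geno == 4] <- 1                    # 2 2 -> genotype 1
  geno                                    # sums of 3 (1 2 or 2 1) stay 3
}

geno <- pair_to_genotype(alleles)
rownames(geno) <- ids
geno
```

Because this does two matrix subsets and one addition rather than a loop over SNPs, it may scale more comfortably to the 26,000-column case mentioned in the question, though that has not been benchmarked here.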