[R] Pairing and

Fri Feb 12 01:05:39 CET 2016

Thank you very much Dan!

I would like to go with the second one, because the data set is very large
(> 25,000 columns and > 3,000 rows).
The data is loaded as "testdat".

Could you please help me adapt the following code to it?

# faster but a little more difficult to see what is going on:
outdat<-indat %*%
array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat

Thank you!
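
For reference, a minimal sketch of how a data frame like "testdat" might be prepared and recoded. It assumes the first column holds the ID and the remaining columns are the numeric allele pairs; the small `testdat` below is a hypothetical stand-in built from the first four rows of the sample data in this thread. Note that this is not the matrix-product code quoted above: it swaps in a plain sum of the odd- and even-numbered columns, which gives the same pairwise sums without building the indicator matrix. With 26,000 allele columns that matrix would be 26,000 x 13,000, or roughly 2.7 GB as doubles.

```r
# Hypothetical stand-in for `testdat`: ID column followed by two allele
# columns per SNP (first four rows of the sample data in this thread).
testdat <- data.frame(
  ID = c("AB95", "AB82", "AB95", "AB59"),
  matrix(c(1, 1, 2, 2, 2, 2, 2, 2, 1, 1,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 1, 2, 2, 2, 2, 2, 2, 1, 1,
           1, 1, 2, 2, 1, 2, 1, 2, 1, 2),
         nrow = 4, byrow = TRUE)
)

# Drop the ID column and convert the allele columns to a numeric matrix.
indat <- as.matrix(testdat[, -1])
stopifnot(is.numeric(indat), ncol(indat) %% 2 == 0)

# Sum each pair of adjacent columns (1+2, 3+4, ...) by adding the
# odd-numbered columns to the even-numbered ones.
odd  <- seq(1, ncol(indat), by = 2)
even <- seq(2, ncol(indat), by = 2)
outdat <- indat[, odd, drop = FALSE] + indat[, even, drop = FALSE]

# Recode the sums: 2 -> genotype 0, 4 -> genotype 1; 3 stays 3.
outdat[outdat == 2] <- 0
outdat[outdat == 4] <- 1

# Reattach the IDs to the recoded genotypes.
result <- data.frame(ID = testdat$ID, outdat)
result
```

If the matrix-product version is preferred anyway, the only change it should need is the same first step: setting `indat <- as.matrix(testdat[, -1])` before running it.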

On Thu, Feb 11, 2016 at 5:58 PM, Dalthorp, Daniel <ddalthorp at usgs.gov>
wrote:

> Hi Val,
> There are probably more elegant ways to do it, but the following is fairly
> transparent:
>
> # input data arranged as an array:
>
> indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
> indat
>
> # output data has the same number of rows and half as many columns
> outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
> for (i in 1:dim(outdat)[2]){
>   # each column of output = sum(two columns of input)
>   outdat[,i]<-apply(indat[,(i-1)*2+1:2],MARGIN=1,FUN=sum)
> }
> outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
> outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
> # allele pairs that sum to 3 are genotype 3, so no need to change anything with them
> outdat
>
> # faster but a little more difficult to see what is going on:
> outdat<-indat %*%
> array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
> outdat[outdat==2]<-0
> outdat[outdat==4]<-1
> outdat
>
> -Dan
>
>
>
> On Thu, Feb 11, 2016 at 2:52 PM, Val <valkremk at gmail.com> wrote:
>
>> Hi  all,
>>
>> I have a SNP data set: the first column is the ID, and the
>> subsequent pairs of columns are the alleles for
>> SNP1, SNP2, and so on. Each SNP has two columns. Based on the alleles,
>> I want to derive the genotype:
>>
>> if the alleles are 1 1        then genotype is 0
>>                    2 2        then genotype is 1
>> and if they are    1 2 or 2 1 then genotype is 3
>>
>> This is a sample data set; the actual one has 13,000 SNPs (26,000 columns).
>>
>>
>> Geno data
>> AB95 1 1 2 2 2 2 2 2 1 1
>> AB82 2 2 2 2 2 2 2 2 2 2
>> AB95 2 1 2 2 2 2 2 2 1 1
>> AB59 1 1 2 2 1 2 1 2 1 2
>> AB32 2 1 2 2 2 2 2 2 1 2
>> AB46 2 1 2 2 1 2 1 1 2 2
>> AB61 1 1 2 2 1 2 1 2 1 2
>> AB32 2 2 1 2 2 2 2 2 1 2
>> AB35 2 2 1 2 2 2 2 2 2 2
>> AB43 2 2 1 2 2 2 2 2 2 2
>>
>> Desired output
>> AB95  0   1   1   1   0
>> AB82  1   1   1   1   1
>> AB95  3   1   1   1   0
>> AB59  0   1   3   3   3
>> AB32  3   1   1   1   3
>> AB46  3   1   3   0   1
>> AB61  0   1   3   3   3
>> AB32  1   3   1   1   3
>> AB35  1   3   1   1   1
>> AB43  1   3   1   1   1
>>
>> I would appreciate it if you could help me out here.
>> Thank you in advance
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Dan Dalthorp, PhD
> USGS Forest and Rangeland Ecosystem Science Center
> Forest Sciences Lab, Rm 189
> 3200 SW Jefferson Way
> Corvallis, OR 97331
> ph: 541-750-0953
> ddalthorp at usgs.gov
>
>
