[R] Pairing and
Val
valkremk at gmail.com
Fri Feb 12 01:05:39 CET 2016
Thank you very much Dan!
I want to go with the second one, because the data set is very large
(>25,000 columns and >3,000 rows).
The data are loaded as "testdat".
Could you please help me fit the following code to it?
# faster but a little more difficult to see what is going on:
outdat<-indat %*%
array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat
Thank you!
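
One possible way to apply the idea to "testdat" is sketched below. It assumes that testdat is a data frame with the ID in the first column and the allele pairs in the remaining columns; that layout, and the small stand-in data, are assumptions for illustration. The sketch adds the odd and even allele columns directly rather than using the matrix product from the thread. The sums are the same, but for 26,000 allele columns a dense 26,000 x 13,000 pairing matrix would hold 338 million entries (roughly 2.7 GB as doubles), which this avoids.

```r
# Stand-in for the real data: three rows from the sample in the original post.
# The layout (ID first, then allele pairs) is an assumption.
testdat <- data.frame(
  ID = c("AB95", "AB82", "AB59"),
  rbind(c(1, 1, 2, 2, 2, 2, 2, 2, 1, 1),
        c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
        c(1, 1, 2, 2, 1, 2, 1, 2, 1, 2))
)

alleles <- as.matrix(testdat[, -1])      # drop the ID column
odd  <- seq(1, ncol(alleles), by = 2)    # first allele of each SNP
even <- seq(2, ncol(alleles), by = 2)    # second allele of each SNP

# Pairwise sums of the two alleles of each SNP (values 2, 3 or 4)
allele_sum <- alleles[, odd, drop = FALSE] + alleles[, even, drop = FALSE]

geno <- allele_sum                       # sums of 3 stay as genotype 3
geno[allele_sum == 2] <- 0               # alleles 1 1 -> genotype 0
geno[allele_sum == 4] <- 1               # alleles 2 2 -> genotype 1
colnames(geno) <- paste0("SNP", seq_along(odd))

outdat <- data.frame(ID = testdat$ID, geno)
outdat
```

If the real file has no ID column, or has extra leading columns, the `testdat[, -1]` step would need to change accordingly.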
On Thu, Feb 11, 2016 at 5:58 PM, Dalthorp, Daniel <ddalthorp at usgs.gov>
wrote:
> Hi Val,
> There are probably more elegant ways to do it, but the following is fairly
> transparent:
>
> # input data arranged as an array:
>
> indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
> indat
>
> # output data has the same number of rows and half as many columns
> outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
> for (i in 1:dim(outdat)[2]){
>   # each column of output = sum(two columns of input)
>   outdat[,i]<-apply(indat[,(i-1)*2+1:2],MARGIN=1,FUN=sum)
> }
> outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
> outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
> # allele pairs that sum to 3 are genotype 3, so no need to change anything with them
> outdat
>
> # faster but a little more difficult to see what is going on:
> outdat<-indat %*%
> array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
> outdat[outdat==2]<-0
> outdat[outdat==4]<-1
> outdat
>
> -Dan
>
>
>
> On Thu, Feb 11, 2016 at 2:52 PM, Val <valkremk at gmail.com> wrote:
>
>> Hi all,
>>
>> I have a SNP data set: the first column is the ID and the
>> subsequent pairs of columns are the alleles for
>> SNP1, SNP2 and so on. Each SNP has two columns. Based on the alleles
>> I want to make the genotype:
>>
>> if the alleles are 1 1 then the genotype is 0
>> 2 2 then the genotype is 1
>> and if they are 1 2 or 2 1 then the genotype is 3
>>
>> This is a sample data set, but the actual one has 13,000 SNPs (26,000 columns).
>>
>>
>> Geno data
>> AB95 1 1 2 2 2 2 2 2 1 1
>> AB82 2 2 2 2 2 2 2 2 2 2
>> AB95 2 1 2 2 2 2 2 2 1 1
>> AB59 1 1 2 2 1 2 1 2 1 2
>> AB32 2 1 2 2 2 2 2 2 1 2
>> AB46 2 1 2 2 1 2 1 1 2 2
>> AB61 1 1 2 2 1 2 1 2 1 2
>> AB32 2 2 1 2 2 2 2 2 1 2
>> AB35 2 2 1 2 2 2 2 2 2 2
>> AB43 2 2 1 2 2 2 2 2 2 2
>>
>> Desired output
>> AB95 0 1 1 1 0
>> AB82 1 1 1 1 1
>> AB95 3 1 1 1 0
>> AB59 0 1 3 3 3
>> AB32 3 1 1 1 3
>> AB46 3 1 3 0 1
>> AB61 0 1 3 3 3
>> AB32 1 3 1 1 3
>> AB35 1 3 1 1 1
>> AB43 1 3 1 1 1
>>
>> I would appreciate it if you could help me out here.
>> Thank you in advance.
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Dan Dalthorp, PhD
> USGS Forest and Rangeland Ecosystem Science Center
> Forest Sciences Lab, Rm 189
> 3200 SW Jefferson Way
> Corvallis, OR 97331
> ph: 541-750-0953
> ddalthorp at usgs.gov
>
>
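
The matrix product in the faster version quoted above may be easier to follow with the pairing matrix built explicitly. The sketch below rebuilds that matrix for a 10-column example and compares it with an equivalent construction using kronecker(); the names pair_dan and pair_kron, and the two-row example input, are introduced here only for illustration.

```r
n <- 10                                  # number of allele columns (5 SNPs)

# The pairing matrix from the thread: the repeated pattern has length n + 2,
# so filling columns of length n shifts the pair of 1s down by two rows in
# each successive column. Only the first n * n/2 values are used.
pair_dan <- array(c(rep(c(rep(1, 2), rep(0, n)), n / 2), 1, 1),
                  dim = c(n, n / 2))

# An equivalent construction: one block of two 1s per SNP
pair_kron <- kronecker(diag(n / 2), matrix(1, nrow = 2, ncol = 1))

all.equal(pair_dan, pair_kron)           # checks that the two matrices agree

# Multiplying by this matrix adds columns (1,2), (3,4), ... of the input
indat <- rbind(c(1, 1, 2, 2, 2, 2, 2, 2, 1, 1),
               c(2, 1, 2, 2, 1, 2, 1, 1, 2, 2))
pair_sums <- indat %*% pair_kron
pair_sums
```

Each column of the pairing matrix has 1s in exactly the two rows belonging to one SNP, so each output column is the sum of that SNP's two allele columns; the recoding of 2 to 0 and 4 to 1 then proceeds as in the thread.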