[BioC] (CRLMM) an issue needs to be clarified

Thu Mar 5 00:46:38 CET 2009

Dear Benilton,

Thanks for your email.
First, I would like to thank you for designing and implementing the great CRLMM algorithm and package.
It gives us better genotyping results, which benefits our downstream analyses.

Yes, I did find the source of the problem after reading the source code of CRLMM() and getCrlmmSummaries() and stepping through it in R's debugging mode. Not an easy task, though... :)

The problem arises when a CEL file name is not a single word, i.e. when the file name contains white space.

getCrlmmSummaries() calls readSummaries() to gather the summary statistics for alleleA and alleleB. In readSummaries(),
the column names "Colnames" are read with read.table() from the CRLMM output file "crlmm-calls.txt". If any CEL file name contains white space, read.table() by default splits that name at the white space (sep=""), which generates redundant, incorrect columns. The length of tmp[[2]] is therefore shorter than the length of the output of read.table(), and R prints the dimnames error to the screen.
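
To illustrate the mechanism, here is a minimal sketch. It is not the actual readSummaries() code; the file contents, the CEL file names and the variable names are made up for illustration, and I am assuming the names are written tab-separated on one line.

```r
# Hypothetical tab-separated header line; the CEL file names are made up.
tmpfile <- tempfile(fileext = ".txt")
writeLines("sample one.CEL\tsample_two.CEL\tsample three.CEL", tmpfile)

# Default sep = "" splits on any white space, so the spaces inside
# the file names are treated as field separators too.
by_space <- read.table(tmpfile, header = FALSE, nrows = 1,
                       stringsAsFactors = FALSE)
n_names <- ncol(by_space)  # more "names" than the 3 real samples

# A matrix with one column per real sample cannot take the longer name vector.
m <- matrix(0, nrow = 2, ncol = 3)
err_msg <- tryCatch({
  colnames(m) <- unlist(by_space)
  NA_character_
}, error = function(e) conditionMessage(e))

print(n_names)
print(err_msg)
unlink(tmpfile)
```

The assignment fails with the same kind of dimnames error as in the traceback, because the number of names no longer matches the number of columns.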

I believe a possible way to deal with this is simply to replace read.table() with read.delim(), since the default separator for read.delim() is "\t", which rarely appears in file names. Alternatively, adding a note to the CRLMM vignette would be another easy way to address this issue.
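
A minimal sketch of the suggested change, under the same assumption that the names are tab-separated on one line. Again this is not the package code, and the file contents and variable names are hypothetical.

```r
# Hypothetical tab-separated header line; the CEL file names are made up.
tmpfile <- tempfile(fileext = ".txt")
writeLines("sample one.CEL\tsample_two.CEL\tsample three.CEL", tmpfile)

# read.delim() uses sep = "\t" by default, so spaces inside a name are kept.
# header = FALSE because the line is read as data, not as a header.
by_tab <- read.delim(tmpfile, header = FALSE, nrows = 1,
                     stringsAsFactors = FALSE)
cel_names <- unname(unlist(by_tab))

# The name vector now matches a matrix with one column per sample.
m <- matrix(0, nrow = 2, ncol = length(cel_names))
colnames(m) <- cel_names

print(cel_names)
unlink(tmpfile)
```

Calling read.table() with sep = "\t" set explicitly should behave the same way for this purpose.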

Thanks,
Ping-Hsun Hsieh

-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu] 
Sent: Wednesday, March 04, 2009 8:00 AM
To: Ping-Hsun Hsieh
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] (CRLMM) an issue needs to be clarified

Dear Ping-Hsun,

Unfortunately, I cannot reproduce the problem you report.

I do need to upgrade to 2.8.1, but I doubt this is the source
of the problem.

I genotyped 9 SNP 6.0 samples and ran getCrlmmSummaries(), and this is  
what I've got (below).

Did you have any success in the meantime?

benilton

--

 > y = getCrlmmSummaries("test")
 > sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] pd.genomewidesnp.6_0.4.2 oligo_1.6.0              oligoClasses_1.5.1
[4] affxparser_1.14.1        AnnotationDbi_1.4.2      preprocessCore_1.4.0
[7] RSQLite_0.7-1            DBI_0.2-4                Biobase_2.2.1
 > y
SnpCnvCallSetPlus (storageMode: lockedEnvironment)
assayData: 906600 features, 9 samples
   element names: calls, callsConfidence, thetaA, thetaB
phenoData
   sampleNames: NA06985_GW6_C.CEL, NA06991_GW6_C.CEL, ..., NA07034_GW6_C.CEL  (9 total)
   varLabels and varMetadata description: none
featureData
   featureNames: SNP_A-4270094, SNP_A-8282305, ..., SNP_A-8433021  (906600 total)
   fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: pd.genomewidesnp.6
 >

b

On Mar 2, 2009, at 3:33 PM, Ping-Hsun Hsieh wrote:

> Dear all,
>
> I got the following error message when I was trying to use the  
> function “getCrlmmSummaries()” to retrieve results generated by  
> running CRLMM genotyping algorithm successfully over 9 Affy SNP 6.0  
> chips.
>
> ####################
>> outObj_crlmm<-getCrlmmSummaries(outDir)
> Error in dimnames(x) <- dn :
>  length of 'dimnames' [2] not equal to array extent
>
> Enter a frame number, or 0 to exit
>
> 1: getCrlmmSummaries(outDir)
> 2: new("SnpCnvCallSetPlus", calls = readSummaries("calls", tmpdir),  
> callsConfi
> 3: initialize(value, ...)
> 4: initialize(value, ...)
> 5: .local(.Object, ...)
> 6: assayDataNew("lockedEnvironment", calls = calls, callsConfidence  
> = callsCon
> 7: readSummaries("alleleA", tmpdir)
> 8: `colnames<-`(`*tmp*`, value = c("MDSNP",  
> "02_00004758_CN_080925.CEL", "MDSN
> #####################
>
> My system:
> Linux x86_64 with 16 GB memory.
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
> [1] oligo_1.6.0          oligoClasses_1.4.0   affxparser_1.14.2
> [4] AnnotationDbi_1.4.3  preprocessCore_1.4.0 RSQLite_0.7-1
> [7] DBI_0.2-4            Biobase_2.2.2
>
>
> Any comments are welcome.
> Thanks in advance!
>
> PingHsun Hsieh