[BioC] problem with impute.knn in the impute package

He, Yiwen (NIH/CIT) heyiwen at mail.nih.gov
Mon May 2 18:42:46 CEST 2005


Thank you Marcus. I'm glad to know that I'm not the only one using that
library. However, I tested:

> exists(".Random.seed")
[1] FALSE

So the .Random.seed was never there. 

Come to think of it, since I'm using all the default settings when calling
impute.knn(myData), the seed defaults to 362436069 and .Random.seed is not
even involved.
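
For anyone repeating the check above, here is a minimal sketch in base R only. It
makes no claim about what impute.knn does internally; it only shows that
.Random.seed lives in the global environment and does not exist until the
random number generator is first used, so exists() returning FALSE in a fresh
session is expected. The variable names are made up for illustration.

```r
# Base R only; nothing here calls the impute package.
# Start from a clean state, as in a fresh session with no saved workspace.
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
  rm(list = ".Random.seed", envir = .GlobalEnv)
}
seed_before <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)

# Drawing a random number creates .Random.seed as a side effect.
invisible(runif(1))
seed_after <- exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)

# The cleanup suggested in the quoted reply now has something to remove.
if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
  rm(list = ".Random.seed", envir = .GlobalEnv)
}
seed_removed <- !exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
```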

Any other suggestions?

Thanks, Yiwen


-----Original Message-----
From: marcus [mailto:marcusb at biotech.kth.se] 
Sent: Monday, May 02, 2005 2:39 AM
To: He, Yiwen (NIH/CIT); bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] problem with impute.knn in the impute package


Hello.

You have to remove the random seed using:

if(exists(".Random.seed")) rm(.Random.seed)

before you run the impute.knn function if you are using a Windows machine.

Regards

Marcus





Marcus Gry Björklund
 
Royal Institute of Technology
AlbaNova University Center 
Department of Molecular Biotechnology
106 91 Stockholm, Sweden

Phone (office): +46 8 553 783 44
Fax: + 46 8 553 784 81
Visiting address: Roslagstullsbacken 21, Floor 3
Delivery address: Roslagsvägen 30B                          
Web: http://www.biotech.kth.se/molbio/microarray/index.html
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of He, Yiwen
(NIH/CIT)
Sent: Friday, April 29, 2005 19:22
To: 'bioconductor at stat.math.ethz.ch'
Cc: Powell, John (NIH/CIT); Asaki, Esther (NIH/CIT)
Subject: [BioC] problem with impute.knn in the impute package

Hi,

I have R version 2.0.1 and Bioconductor 1.5 on both a PC and a Unix machine.
I was trying to use the impute.knn function of the impute package on a
dataset of 7332 genes and 3 arrays:

> library(impute)
> dim(dd) 
[1] 7332    3
> is.matrix(dd)
[1] TRUE
> dd.imputed <- impute.knn(dd)

When run on the PC (Windows XP), R crashes after a few seconds. When run on
a Unix box, I see the following output:
Cluster size 7332 broken into 5667 1665
Cluster size 5667 broken into 4141 1526
Cluster size 4141 broken into 1796 2345
Cluster size 1796 broken into 840 956
Done cluster 840
Done cluster 956
Done cluster 1796

Then the R session was closed. So the clustering started but aborted
somewhere in the middle.

I searched the archive and found another report of the same problem, for a
dataset of 30000 x 2, but it received no answers.

I found some interesting behavior while playing around with the parameters
and data size:

1). 
> impute.knn(dd, k=3) works, but for k bigger than 3, R crashes as
described.

2).
> dd2 <- cbind(dd,dd)
> dim(dd2)
[1] 7332    6
> impute.knn(dd2, k=8) works, but for k bigger than 8, R crashes.

3).
> dd3 <- cbind(dd, dd, dd)
> dim(dd3)
[1] 7332    9
> impute.knn(dd3) works. (k defaults to 10)
> impute.knn(dd3, k=17) R crashes.

I also played around with other parameters but they didn't help.

My conclusion is that the number of neighbors (k) is critical here. However,
it's not clear how to choose it based on the data size.
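
Because each failing call takes the whole session down, one way to map out
which values of k are safe is to run every call in a throwaway R process and
look only at its exit status. The following is a minimal sketch, not a fix:
survives() and probe_k() are hypothetical helpers (not part of impute), the
file name dd.rds is an assumption, and system2() and saveRDS() assume a more
recent R than 2.0.1.

```r
# Hypothetical helper (not part of impute): run a snippet of R code in a
# separate Rscript process and report whether that process exited cleanly.
survives <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  quote_type <- if (.Platform$OS.type == "windows") "cmd" else "sh"
  status <- system2(rscript, c("-e", shQuote(code, type = quote_type)),
                    stdout = FALSE, stderr = FALSE)
  # A crash or error in the child process shows up as a non-zero status.
  identical(as.integer(status), 0L)
}

# Assumed setup: dd was saved beforehand with saveRDS(dd, "dd.rds").
# probe_k() is also hypothetical; it only builds the call and runs it.
probe_k <- function(k, file = "dd.rds") {
  code <- sprintf(
    "library(impute); dd <- readRDS('%s'); invisible(impute.knn(dd, k = %d))",
    file, as.integer(k))
  survives(code)
}

# Example (needs impute and dd.rds, so it is left commented out):
# sapply(1:10, probe_k)
```

A FALSE from probe_k() only says that the child process did not finish
cleanly; it does not say why, so it narrows down the failing range of k
without explaining the crash.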

Can anybody help, or at least point me to the maintainer of the impute
package?

Thanks, Yiwen

Yiwen He
Contractor
Center for Information Technology
National Institutes of Health

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor


