[R] SnpMatrix super slow to access cells when large

nickDIL njc72 at cam.ac.uk
Thu Jun 6 13:32:12 CEST 2013


Dear snpStats users,

I'm working with a large SnpMatrix object (roughly 5000 samples x 200K SNPs)
and I've noticed that numeric indexing is extremely slow. As the timings
below show, retrieving a single cell [1,1] from the SnpMatrix takes over 1.5
seconds, versus effectively 0 seconds for the same cell in the raw matrix.
Retrieving an entire row [1,] or column [,1] takes no longer (still about
1.5 s), so the cost appears to be a fixed overhead per call.

Is snpStats::SnpMatrix doing something unnecessary prior to returning the
matrix entry? [NB: 'chopsticks' seems to give the same slow result]

Is there any way around this delay other than copying the entire SnpMatrix
into raw format? I want to access specific cell ranges many times in an
algorithm I'm writing, and at 1.5 s per access this would be far too slow.

Code to show this below.

Many thanks,

N.

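# generate a 5000 x 200000 raw matrix of random codes 0..3 (about 1 GB)
rawd <- as.raw(sample(0:3, 10^9, replace=TRUE)); dim(rawd) <- c(5000, 200000)

# copy to a SnpMatrix object (the class comes from the snpStats package)
library(snpStats)
snpd <- new("SnpMatrix", rawd)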


# show class details
> class(snpd)
[1] "SnpMatrix"
attr(,"package")
[1] "snpStats"


# access times in SnpMatrix format

> system.time(snpd[1,])
   user  system elapsed 
  0.876   0.681   1.554 
> system.time(snpd[1,1])
   user  system elapsed 
  0.872   0.668   1.538 
> system.time(snpd[,1])
   user  system elapsed 
  0.896   0.644   1.540 



# access times in raw format

> system.time(rawd[1,])
   user  system elapsed 
  0.012   0.004   0.011 
> system.time(rawd[,1])
   user  system elapsed 
      0       0       0 
> system.time(rawd[1,1])
   user  system elapsed 
      0       0       0 
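
Since a whole row or column costs no more than a single cell in the timings
above, one possible workaround is to pay the accessor overhead once per block
rather than once per cell. What follows is a minimal sketch, not a description
of how snpStats works internally. It assumes that SnpMatrix extends the base
matrix class, so that the .Data slot exposes the raw matrix; the matrix size,
seed, dimnames and index ranges are made up for illustration.

```r
# Sketch only: a small stand-in for the 5000 x 200000 matrix above.
set.seed(1)
n_samples <- 50
n_snps <- 200
rawd <- matrix(as.raw(sample(0:3, n_samples * n_snps, replace = TRUE)),
               nrow = n_samples, ncol = n_snps,
               dimnames = list(paste0("sample", seq_len(n_samples)),
                               paste0("snp", seq_len(n_snps))))

if (requireNamespace("snpStats", quietly = TRUE)) {
  suppressPackageStartupMessages(library(snpStats))
  snpd <- new("SnpMatrix", rawd)

  # Pay the SnpMatrix accessor overhead once for the whole block of interest.
  cols <- 101:150            # hypothetical column range the algorithm needs
  block <- snpd[, cols]      # still a SnpMatrix

  # Drop to the plain raw matrix (assumption: the .Data slot holds it).
  block_raw <- block@.Data

  # Repeated cell and range lookups now use base R raw-matrix indexing.
  cell <- block_raw[1, 1]
  window <- block_raw[1:10, 1:5]
}
```

Whether this helps on the full-size object would need checking with
system.time(). Taking .Data from the entire matrix instead of a block would
probably duplicate the roughly 1 GB of data, which is the copy the question
hopes to avoid.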







