[R] data storage/cubes and pointers in R
Martin Morgan
mtmorgan at fhcrc.org
Thu Nov 9 23:26:24 CET 2006
In case the other replies aren't to your liking, and you want to write
something yourself...
Piet van Remortel <piet.vanremortel at gmail.com> writes:
[snip]
> Also considering implementing a similar setup myself, I started
> wondering about the possibility of using references (or "pointers",
> aargh) to dataframes and storing them in a list, etc. Separate lists
In my own experiments with this, I create an S4 'View' class that
indexes, subsets, and accesses small parts of the 'big' data, with the
actual data treated as essentially read-only or otherwise abstracted
out of memory. Something along the lines of:
setClass("ViewSet",
         representation=representation(
             data="environment", # environments are reference-like
             idx="list"          # 1 element per dimension, or something more clever
         ))
setMethod("initialize",
          signature(.Object="ViewSet"),
          function(.Object, ...) {
              env <- new.env()
              ## get the big data: arguments to "new" / SQL query / ???
              ## assign big data to env (e.g., see below) then
              .Object@data <- env
              ## set up idx
              ## ...
              .Object
          })
setMethod("[",
          signature(x="ViewSet"),
          function (x, i, j, ..., drop = TRUE) {
              ## adjust x@idx, maybe querying x@data for help, then
              ## return x, a ViewSet -- the big data are not copied
              x
          })
setReplaceMethod("[",
                 signature(x="ViewSet"),
                 function (x, i, j, ..., value) {
                     ## adjust x@idx according to i, j, ...
                     ## return x, i.e., a ViewSet -- bigData not changed / copied
                     x
                 })
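A sketch of how that skeleton might be filled in, assuming the 'big'
data is a single in-memory matrix (a real ViewSet would get it from
arguments to new, an SQL query, etc.). The class MatrixView, its
constructor MatrixView() and the generic viewData() are hypothetical
names, chosen so they don't collide with the ViewSet initialize method
above. What it shows: '[' returns a new object whose idx differs, while
its data slot still refers to the same environment.

```r
library(methods)

## Hypothetical class: data in an environment (shared by reference),
## idx holds the rows/columns this particular view selects
setClass("MatrixView",
         representation(data = "environment", idx = "list"))

## Hypothetical constructor: store the matrix once, select everything
MatrixView <- function(m) {
    env <- new.env()
    assign("m", m, envir = env)
    new("MatrixView", data = env,
        idx = list(rows = seq_len(nrow(m)), cols = seq_len(ncol(m))))
}

## Subsetting rewrites only the indices; x@data is not copied
setMethod("[", signature(x = "MatrixView"),
          function(x, i, j, ..., drop = TRUE) {
              if (!missing(i)) x@idx$rows <- x@idx$rows[i]
              if (!missing(j)) x@idx$cols <- x@idx$cols[j]
              x
          })

## Materialize just the selected (small) part of the data
setGeneric("viewData", function(x) standardGeneric("viewData"))
setMethod("viewData", "MatrixView", function(x)
    get("m", envir = x@data)[x@idx$rows, x@idx$cols, drop = FALSE])

big <- MatrixView(matrix(1:200, ncol = 2))
small <- big[5:1, ]
viewData(small)                  # a 5 x 2 matrix
identical(big@data, small@data)  # TRUE: both views share one environment
```

Only viewData() copies anything, and then only the selected rows and
columns.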
> can then represent different 'views' on the shared dataframe
> instances, etc. I don't know whether that is even possible in R,
> or whether it is even the smart way to do it. If someone could
> provide some help, that would be great.
>
> The other option is, of course, to link to MySQL and do all the data
> handling there. I'm also considering that.
Or do both, e.g., write a ViewSqlSet class that 'contains' (i.e.,
extends) ViewSet, etc.
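A minimal sketch of the 'contains' idea, under stated assumptions:
ViewBase is a hypothetical stand-in for ViewSet (a separate name so it
doesn't pick up the ViewSet initialize method above), this ViewSqlSet
only records a table name rather than talking to MySQL, and the
describe() generic is hypothetical too. It shows that the subclass
inherits the idx slot and the base-class methods, and can refine them
with callNextMethod().

```r
library(methods)

## Hypothetical base class standing in for ViewSet: just the index
setClass("ViewBase", representation(idx = "list"))

## SQL-backed view: inherits 'idx' (and all ViewBase methods) and adds
## the name of the table the rows would come from; no database is used
setClass("ViewSqlSet", representation(table = "character"),
         contains = "ViewBase")

setGeneric("describe", function(x) standardGeneric("describe"))
setMethod("describe", "ViewBase",
          function(x) sprintf("view of %d rows", length(x@idx$rows)))
## The subclass extends, rather than replaces, the inherited method
setMethod("describe", "ViewSqlSet",
          function(x) paste(callNextMethod(), "from table", x@table))

v <- new("ViewSqlSet", table = "bigTable", idx = list(rows = 1:10))
is(v, "ViewBase")   # TRUE
describe(v)         # "view of 10 rows from table bigTable"
```

A real ViewSqlSet would presumably override only the methods that
fetch data, and inherit the index bookkeeping unchanged.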
> Any thoughts/hints would be appreciated !
You could probably implement the same ideas in the less intimidating
S3 way, e.g., using a list with a class attribute:
makeView <- function(data) {
    ## e.g., 'data' a named list of commonly-sized elements, in or out
    ## of memory -- details depend on needs
    env <- new.env()
    for (elt in names(data)) env[[elt]] <- data[[elt]]
    ## initialize index
    idx <- list(rows=1:nrow(data[[1]]), cols=1:ncol(data[[1]]))
    lst <- list(env=env, idx=idx)
    class(lst) <- "View"
    lst
}
"[.View" <- function (x, i, j, ..., drop = TRUE) {
    ## x will be like lst from above; use i, j to subset the index
    ## only, then return the adjusted x (the data in x$env are untouched)
    if (!missing(i)) x$idx$rows <- x$idx$rows[i]
    if (!missing(j)) x$idx$cols <- x$idx$cols[j]
    x
}
getData <- function(x, ...) UseMethod("getData")
getData.View <- function(x, ...) {
    ## return list of subsetted elements; drop=FALSE keeps a one-row
    ## or one-column view a matrix / data.frame rather than a vector
    res <- with(x,
                lapply(ls(env), function(elt)
                    env[[elt]][idx$rows, idx$cols, drop=FALSE]))
    names(res) <- ls(x$env)
    res
}
and then...
> bigView <- makeView(list(df=data.frame(x=1:100, y=100:1),
+                          m=matrix(1:200, ncol=2)))
> smallView <- bigView[5:1,]
> getData(smallView) ## copies, but only the 'small' data
$df
  x   y
5 5  96
4 4  97
3 3  98
2 2  99
1 1 100

$m
     [,1] [,2]
[1,]    5  105
[2,]    4  104
[3,]    3  103
[4,]    2  102
[5,]    1  101
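Why subsetting a View is cheap: '[.View' returns a modified copy of
the list x, but an environment stored in a list is copied as a
reference, not as data. A short demonstration (the names lst1, view1,
etc. are purely illustrative):

```r
## A plain list element is copied when one copy is modified ...
lst1 <- list(v = 1:3)
lst2 <- lst1
lst2$v[1] <- 99L
lst1$v                          # still 1 2 3

## ... but a list holding an environment copies only the reference,
## so both lists keep pointing at the same underlying data
e <- new.env()
assign("v", 1:3, envir = e)
view1 <- list(env = e)
view2 <- view1                  # like '[.View': copy x, keep env
view2$env$v[1] <- 99L           # modifies the shared environment
view1$env$v                     # 99 2 3
identical(view1$env, view2$env) # TRUE
```

The flip side is that any code writing into env changes the data seen
by every view, which is why treating it as read-only matters.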
Obviously a hack, but perhaps it gets you going...
> thanks,
>
> Piet
> --
> Dr. P. van Remortel
> Intelligent Systems Lab
> Dept. of Mathematics and Computer Science
> University of Antwerp
> Belgium
> http://www.islab.ua.ac.be
> +32 3 265 33 57 (secr.)
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org
More information about the R-help
mailing list