[R-sig-hpc] Trouble with subset.ff
christian.kamenik at astra.admin.ch
Wed May 7 16:58:57 CEST 2014
Jens,
I tested and implemented your code - works like a charm!
Many thanks
Christian
-----Original Message-----
From: "Dr. Jens Oehlschlägel" [mailto:joehl at web.de]
Sent: Thursday, 1 May 2014 00:03
To: Kamenik Christian ASTRA
Cc: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Trouble with subset.ff
Christian,
1. Package ff has no subset method for ff or ffdf objects. ffbase has one, but it creates a physical copy of the subsetted object on disk, which is expensive. Is that what you want?
2. Indexing with a logical ff is neither implemented nor documented as implemented; only indexing with an integer ff is supported.
3. The recommended way of doing logical indexing of ff or ffdf objects is to use bit vectors. To convert your logical ff to bit you can use
as.bit(logical_ff[]) or a chunked version thereof. Consider:
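library(ff)                            # also attaches package bit
l <- sample(c(FALSE,TRUE), 100, TRUE)  # example logical vector in RAM
f <- as.ff(l)                          # the same values as a logical ff
b <- bit(length(f))                    # empty bit vector of the same length
for (i in chunk(f)){                   # copy chunk by chunk to limit RAM use
  b[i] <- f[i]
}
identical(as.logical(b), l)            # check that the conversion is lossless
f[b]                                   # index the ff with the bit vector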
Jens
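
The same pattern can be carried over to the original question, selecting the rows of an ffdf whose index is FALSE. The following is only a minimal sketch under stated assumptions: `dat` and `idx` are small made-up stand-ins for data.SAS and index.SAS, and it relies on the statement above that bit vectors can be used for logical indexing of ffdf objects as well as ff vectors.

```r
library(ff)  # also attaches package bit

# Hypothetical toy data standing in for data.SAS and index.SAS
set.seed(1)
n    <- 100
flag <- sample(c(FALSE, TRUE), n, TRUE)
dat  <- as.ffdf(data.frame(id = 1:n, value = rnorm(n)))
idx  <- as.ff(flag)              # logical ff index, one entry per row

# Copy the logical ff into a bit vector chunk by chunk
b <- bit(length(idx))
for (i in chunk(idx)) {
  b[i] <- idx[i]
}

# Negate the bit vector and use it as the row index
nb <- !b
missing_rows <- dat[nb, ]
```

Negating the bit vector before indexing keeps the selection in compact bit form, so no full-length logical or integer index has to be built in RAM first.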
On 23.04.2014 14:54, christian.kamenik at astra.admin.ch wrote:
> Dear all
>
> I am having trouble with subsetting an ffdf object, hopefully somebody can help...
>
> I have an index, which is a ff object of vmode "logical":
>
>> index.SAS
> ff (open) logical length=4977231 (4977231)
> [1] [2] [3] [4] [5] [6] [7] [8] [4977224] [4977225] [4977226] [4977227] [4977228] [4977229]
> TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE : TRUE TRUE TRUE TRUE TRUE TRUE
> [4977230] [4977231]
> TRUE TRUE
>
> I would like to use this index to subset the ffdf object "data.SAS".
> The number of rows in data.SAS equals the length of index.SAS.
> However, the command
>
>
>> Missing.data <- subset(data.SAS, !index.SAS)
>
>
> gives me the following error:
>
> Error in ffdf(x = x) : ffdf components must be atomic ff objects
>
> A similar command also results in an error:
>
>> Missing.data <- data.SAS[!index.SAS,]
>
> Error: vmode(index) == "integer" is not TRUE
>
> I do not want to use "index.SAS[]" (which works in many cases, but sometimes crashes), because - as far as I understand - this will cause trouble with really large index vectors (I would prefer using ff objects).
>
> So I came up with the following syntax, which seems to work:
>
>> Missing.data <- data.SAS[ffwhich(index.SAS,index.SAS==FALSE),]
>
> ...I am just not sure if this is the right approach.
>
> I am running
>
>
> platform       i386-w64-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          3
> minor          0.3
> year           2014
> month          03
> day            06
> svn rev        65126
> language       R
> version.string R version 3.0.3 (2014-03-06)
> nickname       Warm Puppy
>
>
> with ffbase_0.11.3 and ff_2.2-12
>
>
> Many thanks in advance
> Christian