[R-sig-hpc] Trouble with subset.ff

christian.kamenik at astra.admin.ch christian.kamenik at astra.admin.ch
Wed May 7 16:58:57 CEST 2014


Jens,

I tested and implemented your code - works like a charm!

Many thanks
Christian


-----Original Message-----
From: "Dr. Jens Oehlschlägel" [mailto:joehl at web.de]
Sent: Thursday, 1 May 2014 00:03
To: Kamenik Christian ASTRA
Cc: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] Trouble with subset.ff

Christian,

1. Package ff has no subset method for ff or ffdf. ffbase has one, but it creates a physical copy of the subsetted object on disk, which is expensive. Is that what you want?

2. Indexing with a logical ff is neither implemented nor documented as implemented; only indexing with an integer ff is.

3. The recommended way of doing logical indexing of ff or ffdf is to use bit vectors. To convert your logical ff to bit you can use
as.bit(logical_ff[]) or a chunked version thereof. Consider:

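library(ff)  # also attaches the bit package

l <- sample(c(FALSE,TRUE), 100, TRUE)  # example logical vector in RAM
f <- as.ff(l)                          # the same data as a logical ff
b <- bit(length(f))                    # bit vector of the same length, all FALSE
for (i in chunk(f)){                   # copy chunk by chunk to limit RAM use
  b[i] <- f[i]
}
identical(as.logical(b), l)            # check that the conversion is lossless
f[b]                                   # index the ff with the bit vector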
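
To connect this with the negated index from the original question, here is a minimal sketch that wraps the chunked conversion above in a helper and then indexes with the inverted bit vector. The helper name ff_to_bit and the toy objects index and values are assumptions made for illustration; they are not part of ff or ffbase. Only an ff vector is indexed here; row-indexing an ffdf with a bit vector is not shown.

```r
library(ff)  # also attaches the bit package

# Hypothetical helper (not part of ff): copy a logical ff into a bit
# vector one chunk at a time, so the whole index is never held in RAM
# as an ordinary logical vector.
ff_to_bit <- function(f) {
  b <- bit(length(f))
  for (i in chunk(f)) {
    b[i] <- f[i]
  }
  b
}

set.seed(1)
l <- sample(c(FALSE, TRUE), 100, TRUE)
index  <- as.ff(l)      # toy stand-in for a logical ff index
values <- as.ff(1:100)  # toy stand-in for one column of data

keep <- ff_to_bit(index)
missing_values <- values[!keep]  # "!" on a bit vector returns a bit vector
```

Note that values[!keep] returns the selected elements as an ordinary in-memory vector, so the selection itself has to fit in RAM.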


Jens

On 23.04.2014 14:54, christian.kamenik at astra.admin.ch wrote:
> Dear all
>
> I am having trouble with subsetting an ffdf object, hopefully somebody can help...
>
> I have an index, which is an ff object of vmode "logical":
>
>> index.SAS
> ff (open) logical length=4977231 (4977231)
>        [1]       [2]       [3]       [4]       [5]       [6]       [7]       [8]           [4977224] [4977225] [4977226] [4977227] [4977228] [4977229]
>       TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE         :      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE
> [4977230] [4977231]
>       TRUE      TRUE
>
> I would like to use this index to subset the ffdf object "data.SAS". 
> The number of rows in data.SAS equals the length of index.SAS. 
> However, the command
>
>
>> Missing.data <- subset(data.SAS, !index.SAS)
>
>
> gives me the following error:
>
> Error in ffdf(x = x) : ffdf components must be atomic ff objects
>
> A similar command also results in an error:
>
>> Missing.data <- data.SAS[!index.SAS,]
>
> Error: vmode(index) == "integer" is not TRUE
>
> I do not want to use "index.SAS[]" (which works in many cases, but sometimes crashes), because - as far as I understand - this will cause trouble with really large index vectors (I would prefer using ff objects).
>
> So I came up with the following syntax, which seems to work:
>
>> Missing.data <- data.SAS[ffwhich(index.SAS,index.SAS==FALSE),]
>
> ...I am just not sure if this is the right approach.
>
> I am running
>
> platform       i386-w64-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          3
> minor          0.3
> year           2014
> month          03
> day            06
> svn rev        65126
> language       R
> version.string R version 3.0.3 (2014-03-06)
> nickname       Warm Puppy
>
> with ffbase_0.11.3 and ff_2.2-12
>
>
> Many thanks in advance
> Christian


