[R] a function more appropriate than 'sapply'?

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Jan 26 21:41:48 CET 2013



On 26.01.2013 21:23, Berend Hasselman wrote:
>
> On 26-01-2013, at 21:09, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
>
>>
>>
>> On 26.01.2013 20:46, Berend Hasselman wrote:
>>>
>>> On 26-01-2013, at 19:43, emorway <emorway at usgs.gov> wrote:
>>>
>>>> I'm wondering if I need to use a function other than sapply as the following
>>>> line of code runs indefinitely (or > 30 min so far) and uses up all 16Gb of
>>>> memory on my machine for what seems like a very small dataset (data attached
>>>> in a txt file  wells.txt
>>>> <http://r.789695.n4.nabble.com/file/n4656723/wells.txt>  ).  The R code is:
>>>>
>>>> wells<-read.table("c:/temp/wells.txt",col.names=c("name","plc_hldr"))
>>>> wells2<-wells[sapply(wells[,1],function(x)length(strsplit(as.character(x),
>>>> "_")[[1]])==2),]
>>>>
>>>> The 2nd line of R code above gets bogged down and takes all my RAM with it:
>>>> <http://r.789695.n4.nabble.com/file/n4656723/memory_loss.png>
>>>>
>>>> I'm simply trying to extract all of the lines of data that have a single "_"
>>>> in the first column and place them into a dataset called "wells2".  If that
>>>> were to work, I then want to extract the lines of data that have two "_" and
>>>> put them into a separate dataset, say "wells3".  Is there a better way to do
>>>> this than the one-liner above?
>>>
>>>
>>> Read your file with
>>>
>>> 	wells<-read.table("wells.txt",col.names=c("name","plc_hldr"), stringsAsFactors=FALSE)
>>>
>>> Remove all non-underscore characters with
>>>
>>> 	w.sub <- gsub("[^_]+","",wells[,1])
>>>
>>> then find the elements of w.sub that consist of two underscores and of a single underscore with
>>>
>>> 	u.2 <- which(w.sub=="__")
>>> 	u.1 <- which(w.sub=="_")
>>>
>>> and use u.1 and u.2 to select the appropriate rows of wells.
>>
>> With grep:
>>
>> wells1 <- wells[grep("^[^\\_]*_[^\\_]*$", wells[,1]),]
>> wells2 <- wells[grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[,1]),]
>>
>
> Are the \\ necessary?
> I tried without the \\ and that gives identical results.

Ah, I was not sure and then I forgot to look into the docs. Let's leave
it as an exercise for the reader.
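
For anyone who wants to try that exercise, here is a minimal sketch of the comparison. The well names are made up for illustration (they are not taken from wells.txt) and none of them contains a backslash, so this only checks the case Berend describes; it says nothing about how a backslash inside a bracket expression is treated in general.

```r
# Hypothetical stand-in for the first column of wells.txt (names are made up)
name <- c("A_1", "B_2_3", "C", "D_4", "E_5_6", "F_7_8_9")
wells <- data.frame(name = name, plc_hldr = seq_along(name),
                    stringsAsFactors = FALSE)

# Patterns as posted, with the backslash inside the bracket expression
one_esc <- grep("^[^\\_]*_[^\\_]*$", wells[, 1])
two_esc <- grep("^[^\\_]*_[^\\_]*_[^\\_]*$", wells[, 1])

# The same patterns without the backslash
one_plain <- grep("^[^_]*_[^_]*$", wells[, 1])
two_plain <- grep("^[^_]*_[^_]*_[^_]*$", wells[, 1])

# Compare the row indices selected by each variant
print(identical(one_esc, one_plain))
print(identical(two_esc, two_plain))

# Rows with exactly one and exactly two underscores
wells1 <- wells[one_plain, ]
wells2 <- wells[two_plain, ]
```

On these sample names both variants pick the same rows, which matches what Berend reports for the real file.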

Best,
Uwe



>
> Berend
>


