[R] Large Dataset
Edwin Sendjaja
edwin7 at web.de
Tue Jan 6 18:48:06 CET 2009
Below you can see the R code.
But it already gets stuck at the first line (read.table).
I don't know how to calculate this and write the result into a new table.
Edwin
data <- read.table("test.data")
data <- subset(data, (data$Zusatz!="60") & (data$Zusatz!="0"))
# Incomplete fragment of an earlier grouping call (not runnable as posted):
# list(EndpointKeepAliveTimeOutIntervalLimit, "", PE_ID, "", Registrar, Region,
#      RelTime)))
split.data <- with(data, split(Zusatz,
list(EndpointKeepAliveTimeOutIntervalLimit, PE_ID, Run)))
# Find the min, max, standard deviation and mean for each element in the resulting list
mins  <- sapply(split.data, min)
maxs  <- sapply(split.data, max)
devs  <- sapply(split.data, sd)
means <- sapply(split.data, mean)   # named 'means' so base::mean is not masked
name.list <- strsplit(names(split.data), "\\.")
endpointkeepalivetimeoutintervallimit <- as.numeric(sapply(name.list,
function(x) x[[1]]))
pe_id <- sapply(name.list, function(x) x[[2]])
run <- sapply(name.list, function(x) x[[3]])
#Now construct a new data frame from these values
output <- data.frame(
    EndpointKeepAliveTimeOutIntervalLimit = endpointkeepalivetimeoutintervallimit,
    PE_ID = pe_id, Run = run, Min = mins, Max = maxs,
    Standardabweichung = devs, Mean = means)
# Drop empty groups (min() returns Inf for combinations with no rows)
output <- subset(output, Min != Inf)
# Sort by EndpointKeepAliveTimeOutIntervalLimit, breaking ties by PE_ID
output <- output[order(output$EndpointKeepAliveTimeOutIntervalLimit, output$PE_ID), ]
rownames(output) <- seq_len(nrow(output))
write.table(output,file=Sys.getenv("filepdf"), quote = FALSE)
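For reference, the same per-group summaries can be computed more compactly with
aggregate(), which also drops group combinations that have no rows, so the
Min != Inf step is not needed. A sketch, assuming Zusatz,
EndpointKeepAliveTimeOutIntervalLimit, PE_ID and Run are all columns of data:

# One aggregate() call per statistic; rows come back in the same group order
grp <- data[c("EndpointKeepAliveTimeOutIntervalLimit", "PE_ID", "Run")]
stats <- aggregate(data$Zusatz, by = grp, FUN = min)
names(stats)[4] <- "Min"
stats$Max <- aggregate(data$Zusatz, by = grp, FUN = max)$x
stats$Standardabweichung <- aggregate(data$Zusatz, by = grp, FUN = sd)$x
stats$Mean <- aggregate(data$Zusatz, by = grp, FUN = mean)$x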
> For the mean, min, max and standard deviance (deviation, I suppose) you
> don't need to store all the data in memory; you can calculate them
> incrementally. Read the file line by line (if it is a text file).
>
> G.
>
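A minimal sketch of the incremental approach described above: read the file in
fixed-size chunks and keep running totals, so the whole file never has to fit
in memory. The column position of Zusatz and the chunk size are assumptions.

# Running accumulators for mean, min, max and standard deviation of Zusatz
con <- file("test.data", open = "r")
n <- 0; s <- 0; s2 <- 0; lo <- Inf; hi <- -Inf
repeat {
    chunk <- tryCatch(read.table(con, nrows = 10000, header = FALSE),
                      error = function(e) NULL)   # NULL once the file is exhausted
    if (is.null(chunk)) break
    x <- chunk[[5]]                   # assumed column position of Zusatz (numeric)
    x <- x[x != 60 & x != 0]          # same filter as in the script above
    n  <- n + length(x)
    s  <- s + sum(x)
    s2 <- s2 + sum(x^2)
    lo <- min(lo, x)
    hi <- max(hi, x)
}
close(con)
c(Mean = s / n, Min = lo, Max = hi,
  Standardabweichung = sqrt((s2 - s^2 / n) / (n - 1)))

The same idea extends to the per-group statistics: keep one set of accumulators
per EndpointKeepAliveTimeOutIntervalLimit/PE_ID/Run combination, for example in
an environment used as a lookup table.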
> On Tue, Jan 6, 2009 at 6:10 PM, Edwin Sendjaja <edwin7 at web.de> wrote:
> > Hi Ben,
> >
> > Using colClasses doesn't improve the performance much.
> >
> > With the data, I will calculate the mean, min, max, and standard
> > deviance.
> >
> > I have also failed to import the data into a MySQL database. I don't have
> > much knowledge of MySQL.
> >
> > Edwin
> >
> >> Edwin Sendjaja <edwin7 <at> web.de> writes:
> >> > Hi Simon,
> >> >
> >> > My RAM is only 3.2 GB (actually it should be 4 GB, but my motherboard
> >> > doesn't support it).
> >> >
> >> > R uses almost all of my RAM and half of my swap. I think memory.limit
> >> > will not solve my problem. It seems that I need more RAM.
> >> >
> >> > Unfortunately, I can't buy more RAM.
> >> >
> >> > Why is R so slow at reading a big data set?
> >> >
> >> > Edwin
> >>
> >> Start with FAQ 7.28,
> >> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-is-read_002etable_0028_0029-so-inefficient_003f
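In practice, the advice in that FAQ entry amounts to telling read.table() as
much as possible up front instead of letting it guess; a sketch with made-up
column types and row count (adjust both to the real file):

# Hints that let read.table() skip type guessing and over-allocation
data <- read.table("test.data", header = FALSE,
                   colClasses = c("numeric", "character", "character",
                                  "character", "numeric"),  # assumed types
                   nrows = 5000000,      # a modest overestimate is fine
                   comment.char = "")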
> >>
> >> However, I think you're going to have much bigger problems
> >> if you have a 3.1G data set and a total of 3.2G of RAM: what do
> >> you expect to be able to do with this data set once you've read
> >> it in? Have you considered storing it in a database and accessing
> >> just the bits you need at any one time?
> >>
> >> Ben Bolker
> >>
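One way to follow the database suggestion without running a MySQL server is
SQLite through DBI/RSQLite; a rough sketch, where the table name and the column
position of Zusatz (V5) are assumptions and the file is loaded in chunks so
memory use stays bounded:

library(DBI)
library(RSQLite)

db  <- dbConnect(SQLite(), "test.db")
con <- file("test.data", open = "r")
repeat {
    chunk <- tryCatch(read.table(con, nrows = 50000, header = FALSE),
                      error = function(e) NULL)
    if (is.null(chunk)) break
    dbWriteTable(db, "testdata", chunk, append = TRUE)   # columns become V1, V2, ...
}
close(con)

# Fetch only the per-group summaries, not the raw rows
res <- dbGetQuery(db, "SELECT V1, V2, MIN(V5), MAX(V5), AVG(V5)
                       FROM testdata GROUP BY V1, V2")
dbDisconnect(db)

SQLite has no built-in standard deviation aggregate, so sd() would still be
computed in R, but only on the much smaller per-group result.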