[R] Vector allocation problem while trying to plot 6 MB data file

Uwe Ligges ligges at statistik.tu-dortmund.de
Wed May 8 14:25:56 CEST 2013



On 08.05.2013 13:38, Ramon Hofer wrote:
> Thanks for your answer Uwe
>
>
> On Fri, 03 May 2013 23:36:24 +0200
> Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:
>
>> On 02.05.2013 14:37, Ramon Hofer wrote:
>>>
>>> I'm trying to analyse the network speed and used iperf to create a
>>> csv file containing the link test data. It's only about 6 MB big but
>>> contains about 40'000 samples.
>>>
>>> I can do boxplots (apart from printing the number of samples but I
>>> ask separately for that).
>>>
>>> To find the behaviour over time I wanted to plot the throughput. So
>>> I have this command:
>>>
>>> plot(A$Timestamp, A$Bandwidth.bit.sec., xlab = "Timestamp", ylab =
>>> "Bandwidth [bit/s]", ylim = quantile(A$Bandwidth.bit.sec.,
>>> c(0, .99), na.rm = TRUE))
>>>
>>> Unfortunately I get this:
>>> Error: cannot allocate vector of size 12.5 Gb
>>
>> 40000 samples and 6 MB can't be the issue unless this is not a regular
>> plot because the classes of A$Timestamp or A$Bandwidth.bit.sec. are
>> rather special.
>
> This should be a standard plot. I can't tell exactly as I'm new to R...
>
>
>> What do
>> str(A$Timestamp)
>
> Factor w/ 40886 levels "2013-04-29_10:31:47.189194629",..: 1 2 3 4 5 6
> 7 8 9 10 ...



That's the problem: Timestamp is a factor, so plot() produces 40886
parallel boxplots, one per level, each consisting of only a median line:

plot(factor(1:2), 1:2)
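
A quick way to confirm this is to check the class before plotting. The
sketch below uses a small made-up data frame (the three rows are invented
for illustration and only mimic the structure that str() reported).
read.csv() in R versions before 4.0.0 turns character columns into
factors by default, which is how a timestamp column ends up like this.

```r
# Toy stand-in for the real data; values are invented for illustration.
A <- data.frame(
  Timestamp = factor(c("2013-04-29_10:31:47.189194629",
                       "2013-04-29_10:31:48.189201123",
                       "2013-04-29_10:31:49.189188456")),
  Bandwidth.bit.sec. = c(NA, 79106L, 86086L)
)

is.factor(A$Timestamp)   # TRUE: plot() will draw one box per level
nlevels(A$Timestamp)     # number of boxes plot() would try to draw
```

Reading the file with read.csv('data.csv', stringsAsFactors = FALSE)
would keep the column as character in the first place.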

You have to convert that to a time object, e.g. to POSIXlt via:

A$Timestamp <- strptime(as.character(A$Timestamp), "%Y-%m-%d_%H:%M:%S")

Then it should work rather quickly.
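
As a sketch of the conversion on a toy vector (the two timestamps are
made up, in the format shown above): "%S" reads whole seconds only and
strptime() ignores the trailing fraction, whereas "%OS" also reads the
fractional seconds on input. The as.POSIXct() step and tz = "UTC" are
assumptions added here, not part of the command above; POSIXct is
usually the more convenient class for a data frame column.

```r
# Made-up timestamps in the same format as the data.
ts <- factor(c("2013-04-29_10:31:47.189194629",
               "2013-04-29_10:31:48.689201123"))

# "%OS" reads the seconds including the fractional part.
# tz = "UTC" is an assumption; use the zone the data was recorded in.
lt <- strptime(as.character(ts), "%Y-%m-%d_%H:%M:%OS", tz = "UTC")
ct <- as.POSIXct(lt)

class(ct)   # "POSIXct" "POSIXt"
diff(ct)    # time difference between the two samples
```

With the column converted like this, the original plot() call draws an
ordinary scatter plot with a time axis instead of one box per level.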

Best,
Uwe Ligges




>
>> str(A$Bandwidth.bit.sec.)
>
> int [1:40886] NA 79106 86086 49918 96353 97268 98027 99369 98049
> 97280 ...
>
>
>> Can you make a reproducible example available?
>
> You can find the data here:
> http://people.ee.ethz.ch/~hoferr/download/data.csv
>
> The script I have so far:
> http://people.ee.ethz.ch/~hoferr/download/draw_graph.R
>
> The same error occurs if I start a new R session and enter these two
> commands:
>
>   A <- read.csv('data.csv')
>   plot(A$Timestamp, A$Bandwidth.bit.sec., xlab = "Timestamp", ylab =
>   "Bandwidth [bit/s]", ylim = quantile(A$Bandwidth.bit.sec., c(0, .99),
>   na.rm = TRUE))
>
>
> Best regards
> Ramon
>


