[R] First time r user

Steve Lianoglou lianoglou.steve at gene.com
Sun Aug 18 20:17:53 CEST 2013


Yes, please do some reading first and take a crack at your data yourself.

This will only be a fruitful endeavor for you after you get some
working knowledge of R.

Hadley is compiling a nice book online that I think is very helpful to
read through:
https://github.com/hadley/devtools/wiki/Introduction

The section on "functional looping patterns" will be immediately
useful (once you have a bit more background working with R):
http://github.com/hadley/devtools/wiki/functionals#looping-patterns

It's really a great resource and you should spend the time to read
through it. Once you read and understand the looping-patterns section,
you'll be able to handle your data like a pro and you can move on to
asking more interesting questions ;-)

If something is unclear there, though, please do raise that issue.
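
To give a flavour of what those looping patterns look like on your data,
here is a minimal sketch. It assumes a data.frame called `reviews` (a
hypothetical name) holding the columns from your head() output; the six
rows are retyped from your post, and matching pilsners by looking for
"Pils" in beer_style is only my guess at how the styles are labelled.
Treat it as a starting point, not the one right way to do it.

```r
# Hypothetical miniature of the posted data (values retyped from the head() output).
reviews <- data.frame(
  brewery_name = c("Vecchio Birraio", "Vecchio Birraio", "Vecchio Birraio",
                   "Vecchio Birraio", "Caldera Brewing Company",
                   "Caldera Brewing Company"),
  beer_style = c("Hefeweizen", "English Strong Ale", "Foreign / Export Stout",
                 "German Pilsener", "American Double / Imperial IPA",
                 "Herbed / Spiced Beer"),
  beer_name = c("Sausa Weizen", "Red Moon", "Black Horse Black Beer",
                "Sausa Pils", "Cauldron DIPA", "Caldera Ginger Beer"),
  review_overall = c(1.5, 3.0, 3.0, 3.0, 4.0, 3.0),
  review_aroma = c(2.0, 2.5, 2.5, 3.0, 4.5, 3.5),
  review_appearance = c(2.5, 3.0, 3.0, 3.5, 4.0, 3.5),
  beer_abv = c(5.0, 6.2, 6.5, 5.0, 7.7, 4.7),
  stringsAsFactors = FALSE
)

# Brewer with the highest average ABV:
# tapply() splits beer_abv by brewery_name and applies mean() to each group.
abv_by_brewery <- tapply(reviews$beer_abv, reviews$brewery_name, mean)
top_brewery <- names(which.max(abv_by_brewery))

# Top 10 pilsners: keep the pilsner rows (assumed to contain "Pils" in the
# style name), average the overall score per beer, sort, take the first 10.
is_pils <- grepl("Pils", reviews$beer_style, ignore.case = TRUE)
pils <- reviews[is_pils, ]
pils_scores <- sort(tapply(pils$review_overall, pils$beer_name, mean),
                    decreasing = TRUE)
top_pils <- head(pils_scores, 10)

# Best style on two factors: mean aroma and mean appearance per style,
# combined here with a plain average (an arbitrary choice of weighting).
style_scores <- aggregate(cbind(review_aroma, review_appearance) ~ beer_style,
                          data = reviews, FUN = mean)
style_scores$combined <-
  (style_scores$review_aroma + style_scores$review_appearance) / 2
best_style <- as.character(style_scores$beer_style[which.max(style_scores$combined)])
```

On the full data set you would drop the hand-typed data.frame and run the
same three steps on whatever object you imported. You would probably also
want to require a minimum number of reviews per beer before ranking.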

HTH,
-steve


On Sun, Aug 18, 2013 at 7:22 AM, Bert Gunter <gunter.berton at gene.com> wrote:
> This is ridiculous!
>
> Please read "An Introduction to R" (ships with R) or another online R
> tutorial. There are many good ones. There are also probably online
> courses. Please make an effort to learn the basics before posting
> further here.
>
> -- Bert
>
>
>
> On Sun, Aug 18, 2013 at 7:13 AM, Dylan Doyle <ddoyle.dub at gmail.com> wrote:
>> Hello all, thank you for your speedy replies,
>>
>> Here are the first few lines from the head function:
>>
>>   brewery_id            brewery_name review_time review_overall review_aroma review_appearance review_profilename
>> 1      10325         Vecchio Birraio  1234817823            1.5          2.0               2.5            stcules
>> 2      10325         Vecchio Birraio  1235915097            3.0          2.5               3.0            stcules
>> 3      10325         Vecchio Birraio  1235916604            3.0          2.5               3.0            stcules
>> 4      10325         Vecchio Birraio  1234725145            3.0          3.0               3.5            stcules
>> 5       1075 Caldera Brewing Company  1293735206            4.0          4.5               4.0     johnmichaelsen
>> 6       1075 Caldera Brewing Company  1325524659            3.0          3.5               3.5            oline73
>>
>>                       beer_style review_palate review_taste              beer_name beer_abv beer_beerid
>> 1                     Hefeweizen           1.5          1.5           Sausa Weizen      5.0       47986
>> 2             English Strong Ale           3.0          3.0               Red Moon      6.2       48213
>> 3         Foreign / Export Stout           3.0          3.0 Black Horse Black Beer      6.5       48215
>> 4                German Pilsener           2.5          3.0             Sausa Pils      5.0       47969
>> 5 American Double / Imperial IPA           4.0          4.5          Cauldron DIPA      7.7       64883
>> 6           Herbed / Spiced Beer           3.0          3.5    Caldera Ginger Beer      4.7       52159
>>
>> I have only discovered how to import the data set and run some basic R
>> functions on it. My goal is to be able to answer questions like: what are
>> the top 10 pilsners, or which brewer has the highest average ABV? Also,
>> using two factors such as beer aroma and appearance, which beer style
>> should I try? Let me know if I can give you any more information you might
>> need to help me.
>>
>> Thanks again,
>>
>> Dylan
>>
>>>
>>
>>
>>
>> On Sun, Aug 18, 2013 at 4:16 AM, Paul Bernal <paulbernal07 at gmail.com> wrote:
>>
>>> Thank you so much Steve.
>>>
>>> The computer I'm currently working with runs a 32-bit Windows 7 OS, and RAM
>>> is only 4GB, so I guess that's a big limitation.
>>> On 18/08/2013 03:11, "Steve Lianoglou" <lianoglou.steve at gene.com> wrote:
>>>
>>> > Hi Paul,
>>> >
>>> > On Sun, Aug 18, 2013 at 12:56 AM, Paul Bernal <paulbernal07 at gmail.com>
>>> > wrote:
>>> > > Thanks a lot for the valuable information.
>>> > >
>>> > > Now my question would be: how many columns can R handle, provided
>>> > > that I have millions of rows and, in general, what is the maximum
>>> > > number of rows and columns that R can effortlessly handle?
>>> >
>>> > This is all determined by your RAM.
>>> >
>>> > Prior to R-3.0, R could only handle vectors of length up to 2^31 - 1.
>>> > If you were working with a matrix, that meant that you could only have
>>> > that many elements in the entire matrix.
>>> >
>>> > If you were working with a data.frame, you could have data.frames with
>>> > 2^31 - 1 rows, and I guess as many columns. A data.frame is really a
>>> > list of vectors, so the entire thing doesn't have to be in one
>>> > contiguous block (and addressable that way).
>>> >
>>> > R-3.0 introduced "Long Vectors" (search for that section in the release
>>> > notes):
>>> >
>>> > https://stat.ethz.ch/pipermail/r-announce/2013/000561.html
>>> >
>>> > It raises that limit far beyond 2^31 - 1 elements (up to 2^52 in
>>> > theory, assuming you are running 64-bit). So, if you've got the RAM,
>>> > you can have a data.frame/data.table w/ billion(s) of rows, in theory.
>>> >
>>> > To figure out how much data you can handle on your machine, you need
>>> > to know the size of real/integer/whatever and the number of elements
>>> > of those you will have so you can calculate the amount of RAM you need
>>> > to load it all up.
>>> >
>>> > Lastly, I should mention there are packages that let you work with
>>> > "out of memory" data, like bigmemory, biglm, ff. Look at the HPC Task
>>> > view for more info along those lines:
>>> >
>>> > http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>> >
>>> >
>>> > >
>>> > > Best regards and again thank you for the help,
>>> > >
>>> > > Paul
>>> > > On 18/08/2013 02:35, "Steve Lianoglou" <lianoglou.steve at gene.com> wrote:
>>> > >
>>> > >> Hi Paul,
>>> > >>
>>> > >> First: please keep your replies on list (use reply-all when replying
>>> > >> to R-help lists) so that others can help but also the lists can be
>>> > >> used as a resource for others.
>>> > >>
>>> > >> Now:
>>> > >>
>>> > >> On Aug 18, 2013, at 12:20 AM, Paul Bernal <paulbernal07 at gmail.com> wrote:
>>> > >>
>>> > >> > Can R really handle millions of rows of data?
>>> > >>
>>> > >> Yup.
>>> > >>
>>> > >> > I thought it was not possible.
>>> > >>
>>> > >> Surprise :-)
>>> > >>
>>> > >> As I type, I'm working with a ~5.5 million row data.table pretty
>>> > >> effortlessly.
>>> > >>
>>> > >> Columns matter too, of course -- RAM is RAM, after all and you've got
>>> > >> to be able to fit the whole thing into it if you want to use
>>> > >> data.table. Once loaded, though, data.table enables one to do
>>> > >> split/apply/combine calculations over these data quite efficiently.
>>> > >> The first time I used it, I was honestly blown away.
>>> > >>
>>> > >> If you find yourself wanting to work with such data, you could do
>>> > >> worse than read through data.table's vignette and FAQ and give it a
>>> > >> spin.
>>> > >>
>>> > >> HTH,
>>> > >>
>>> > >> -steve
>>> > >>
>>> > >> --
>>> > >> Steve Lianoglou
>>> > >> Computational Biologist
>>> > >> Bioinformatics and Computational Biology
>>> > >> Genentech
>>> > >>
>>> > >
>>> > >         [[alternative HTML version deleted]]
>>> > >
>>> > >
>>> > > ______________________________________________
>>> > > R-help at r-project.org mailing list
>>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > > and provide commented, minimal, self-contained, reproducible code.
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Steve Lianoglou
>>> > Computational Biologist
>>> > Bioinformatics and Computational Biology
>>> > Genentech
>>> >
>>>
>>>
>>>
>>
>>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


