[R] Two last questions: about output

Gabor Grothendieck ggrothendieck at gmail.com
Thu Oct 16 12:30:19 CEST 2008


Please read the last line of every message to r-help and provide
reproducible code.
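
In the meantime, here is a minimal sketch of one way to fill the two new
columns from inside the loop. It assumes simulated samples in place of the
files named in V4, and the names samples, est and sdv are made up for
illustration; only fitdistr() and the testframe columns come from the thread.

```r
library(MASS)  # provides fitdistr()

set.seed(1)
# Assumption: simulated data stands in for read.csv(as.character(V4[[i]]), ...)
samples <- list(rexp(100, rate = 0.07),
                rexp(150, rate = 0.06),
                rexp(120, rate = 0.09))

# First three columns, kept as integers (note the L suffix)
testframe <- data.frame(mid  = c(251L, 251L, 251L),
                        year = c(2008L, 2008L, 2008L),
                        week = c(18L, 19L, 20L))

n   <- nrow(testframe)
est <- numeric(n)  # preallocate one slot per row instead of growing a vector
sdv <- numeric(n)

for (i in seq_len(n)) {
  fp     <- fitdistr(samples[[i]], "exponential")
  est[i] <- fp$estimate[["rate"]]  # fitted rate, name dropped
  sdv[i] <- fp$sd[["rate"]]        # its standard error, name dropped
}

# Attach the results as two new columns
testframe$rate <- est
testframe$sd   <- sdv
print(testframe)
```

Filling plain vectors and attaching them once after the loop avoids assigning
into a data frame column that does not exist yet. If the frame is then written
with dbWriteTable(..., append = TRUE), passing row.names = FALSE should keep
the row names from being sent as an extra column, but check that against the
documentation of the driver you use.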

On Thu, Oct 16, 2008 at 12:44 AM, Ted Byers <r.ted.byers at gmail.com> wrote:
> Thanks Gabor,
>
> To be clear, would something like testframe$est[[i]] <- fp$estimate be
> valid within my loop, as in (assuming I created testframe before the
> loop):
>
> for (i in 1:length(V4) ) {
>   x = read.csv(as.character(V4[[i]]), header = FALSE, na.strings="");
>   y = x[,1];
>   fp = fitdistr(y,"exponential");
>   print(c(V1[[i]],V2[[i]],V3[[i]],fp$estimate,fp$sd))
>   testframe$est[[i]] <- fp$estimate
>   testframe$sd[[i]] <- fp$sd
> }
>
> Thanks
>
> Ted
>
> On Thu, Oct 16, 2008 at 12:08 AM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> testframe$newvar <- ...whatever...
>> (or see ?transform for another way)
>> adds a new column to the data frame.   The table does not
>> have to pre-exist in your MySQL database and you don't need
>> a create statement; however, if the table does pre-exist the columns
>> of your data frame and those of the database table should have the
>> same names in the same order and use dbWriteTable(..., append = TRUE)
>>
>>
>> On Wed, Oct 15, 2008 at 11:54 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
>>> Thanks Gabor,
>>>
>>> I get how to make a frame using existing vectors.  In my example, the
>>> following puts my first three columns into a frame (and displays it):
>>>
>>>> testframe <- data.frame(mid=V1,year=V2,week=V3)
>>>> testframe
>>>   mid year week
>>> 1  251 2008   18
>>> 2  251 2008   19
>>> 3  251 2008   20
>>> 4  251 2008   22
>>> 5  251 2008   23
>>> 6  251 2008   24
>>> 7  251 2008   25
>>>
>>> I show the first few of about 60 rows, and I am pleased that these values
>>> appear as integers.
>>>
>>> But what I don't see is how to add the fp$estimate,fp$sd values
>>> obtained from my analyses to vectors to form the last two columns in
>>> the data frame.  Is there something like a vector type, analogous to
>>> the vector class std::vector from C++, that has a push_back function
>>> allowing a vector to grow as new values are generated?
>>>
>>> And suppose I have the following table in MySQL (ignoring for the
>>> moment keys and indexes):
>>>
>>> CREATE TABLE (
>>>  id INTEGER  UNSIGNED NOT NULL auto_increment,
>>>  mid INTEGER NOT NULL,
>>>  y  INTEGER NOT NULL,
>>>  w INTEGER NOT NULL,
>>>  rate DOUBLE NOT NULL,
>>>  sd DOUBLE NOT NULL,
>>>  process_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
>>> ) ENGINE=InnoDB;
>>>
>>> How would I tell dbWriteTable() that my frame's five columns
>>> correspond to mid,y,w,rate and sd in that order, and that the fields
>>> id and process_date will take the appropriate default values?  Or do I
>>> need a temporary table, in memory, that has only the five columns, and
>>> use a stored procedure to move the data to its final home?
>>>
>>> Thanks again,
>>>
>>> Ted
>>>
>>>
>>> On Wed, Oct 15, 2008 at 9:57 PM, Gabor Grothendieck
>>> <ggrothendieck at gmail.com> wrote:
>>>> Put the data in an R data frame and use dbWriteTable() to
>>>> write it to your MySQL database directly.
>>>>
>>>> On Wed, Oct 15, 2008 at 9:34 PM, Ted Byers <r.ted.byers at gmail.com> wrote:
>>>>>
>>>>> Here is my little scriptlet:
>>>>>
>>>>> optdata =
>>>>> read.csv("K:\\MerchantData\\RiskModel\\AutomatedRiskModel\\soptions.dat",
>>>>> header = FALSE, na.strings="")
>>>>> attach(optdata)
>>>>> library(MASS)
>>>>> setwd("K:\\MerchantData\\RiskModel\\AutomatedRiskModel")
>>>>> for (i in 1:length(V4) ) {
>>>>>   x = read.csv(as.character(V4[[i]]), header = FALSE, na.strings="");
>>>>>   y = x[,1];
>>>>>   fp = fitdistr(y,"exponential");
>>>>>   print(c(V1[[i]],V2[[i]],V3[[i]],fp$estimate,fp$sd))
>>>>> }
>>>>>
>>>>>
>>>>> And here are the first few lines of output:
>>>>>
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 1.800000e+01 6.869301e-02 6.462095e-03
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 1.900000e+01 5.958023e-02 4.491029e-03
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 2.000000e+01 8.631714e-02 7.428996e-03
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 2.200000e+01 1.261538e-01 1.137491e-02
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 2.300000e+01 1.339523e-01 1.332875e-02
>>>>>                                               rate         rate
>>>>> 2.510000e+02 2.008000e+03 2.400000e+01 8.916084e-02 1.248501e-02
>>>>>
>>>>> There are only two things wrong, here.
>>>>>
>>>>> 1) the first three columns are integers, and are output variously as
>>>>> integers, floating point numbers and, as shown here, in scientific notation.
>>>>> 2) this output isn't going to a file or to my DB.  This second issue isn't
>>>>> much of a problem, as I think I know now how to deal with it.
>>>>>
>>>>> This output data is, in one sense, perfectly organized, and there is a table
>>>>> with a nearly identical structure: these five columns, plus one to hold the
>>>>> date on which the analysis is performed (which, of course, has a default
>>>>> value of the current timestamp, handled in MySQL).  If I can get
>>>>> the data written to a CSV file, with the first three columns provided as
>>>>> integers, I can use the DB's bulk load utility to get the data into the DB,
>>>>> and this may be faster than having this scriptlet connecting directly to the
>>>>> DB to insert the data (unless the DBI has a function for a bulk load that
>>>>> helps here).
>>>>>
>>>>> Any idea how best to handle my formatting problem here?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ted
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>
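
On the formatting question in the first message of the thread: c() coerces
the three integers and the two doubles into a single double vector, and
print() then chooses one format for the whole vector, which is why the
integers show up in scientific notation. A data frame keeps each column's
own type. The sketch below is only an illustration under that assumption;
the values are the first two rows of the output quoted above, and the name
results and the temporary file are made up for the example.

```r
# Integer columns stay integer when they are built with the L suffix
results <- data.frame(mid  = c(251L, 251L),
                      year = c(2008L, 2008L),
                      week = c(18L, 19L),
                      rate = c(6.869301e-02, 5.958023e-02),
                      sd   = c(6.462095e-03, 4.491029e-03))

# Each column is formatted on its own, so the integers print as integers
print(results)

# Write a CSV that a bulk loader could read; row.names = FALSE omits the
# row-number column
out <- tempfile(fileext = ".csv")
write.csv(results, out, row.names = FALSE)

# Read the file back to inspect what was written
lines <- readLines(out)
print(lines)
```

The same data frame could instead be handed to dbWriteTable() as suggested
earlier in the thread, so the CSV step is only needed if the database's bulk
loader is preferred.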


