[R-sig-DB] Data manipulation: transforming SQL query in an appropriate dataframe

M@rk_Otto m@iii@g oii iws@gov M@rk_Otto m@iii@g oii iws@gov
Fri Sep 5 18:26:06 CEST 2008


Here are three ways of handling the problem of taking values of a variable 
and making them columns of another table.  Let the original data be 
OldTable.  Note I had to clean up your example a little bit.

fh<-textConnection("Id            Fund                  Return  Period
1             Fund01                0,5                         may/08
2             Fund01                0,4                         june/08
3             Fund02               -0,3                        may/08
4             Fund02                -0,4                        june/08
")

OldTable<-read.table(fh,header=T,dec=",",sep="",row.names="Id")


1. Use reshape in R.  This is used with longitudinal data, so you have to 
look closely at the manual page to get what you need out of it.

 
NewTable<-reshape(OldTable,idvar="Period",v.names='Return',timevar='Fund',direction="wide")

2. I you can to the data rearranging in your database.  SQL is not great 
at converting values of variables to columns.  It is great at doing lots 
of manipulations on large amounts of data better than R and without the 
memory worries.  For this reshaping, you can use pivot tables to do this 
in a straight forward way in Excel or MS Access. 

Here we let OldTable be a table in the database too. So, make a view in 
the database:

  create view WhatIWantMyRTableToBeView as 
  select Period,Fund01,Fund02
   from (
    select Period,Return as Fund01
     from OldTable
     where Fund='Fund01'
   ) as f1 inner join (
    select Period,Return as Fund02
     from OldTable
     where Fund=in('Fund02','Fund 02')
   ) as f1 on(f1.Period=f2=Period)
   order by Period;
 
Then, you can then read in the data to R using the RODBC or other database 
package such as RSQLite

  library(RODBC)
  dbh<-odbcConnect("<path>/<database name>")
  NewTable<-sqlFetch(dbh,"WhatIWantMyRTableToBeView")

or with an SQL query in R. 

 NewTable<-sqlQuery(dbh,"
  select Period,Fund01,Fund02
   from (
    select Period,Return as Fund01
     from OldTable
     where Fund='Fund01'
   ) as f1 inner join (
    select Period,Return as Fund02
     from OldTable
     where Fund=in('Fund02','Fund 02')
   ) as f1 on(f1.Period=f2=Period)
   order by Period;
 ")

3. These could be done all in R with a merge:

 fund1<-OldTable[OldTable$Fund=="Fund01",]
 fund1<-dataframe(Period=fund1$Period,Fund01=fund1$Fund)
 fund2<-OldTable[OldTable$Fund%in%c("Fund02","Fund 02"),]
 fund2<-dataframe(Period=fund2$Period,Fund02=fund2$Fund)
 NewTable<-merge(fund1,fund2)  # merge keys on the variables common to 
both datasets.

I haven't run the last two parts, so you may need to tweak the code. 
Others on this list are more knowledgeable on these matters. 

Mark
 


Mark Otto, Biometrician 
U. S. Fish and Wildlife Service 
Patuxent Wildlife Research Center
11510 American Holly Dr
Laurel MD  20708-4002




"Gabor Grothendieck" <ggrothendieck using gmail.com> 
Sent by: r-sig-db-bounces using stat.math.ethz.ch
09/05/2008 10:39 AM

To
"Hebbertt de Farias Soares" <hebbertt.soares using rosasoffice.com>
cc
"r-sig-db using stat.math.ethz.ch" <r-sig-db using stat.math.ethz.ch>
Subject
Re: [R-sig-DB] Data manipulation: transforming SQL query in an appropriate 
dataframe






Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):

DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF

tapply(DF$Return, DF[c(2, 4)], c)


On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares using rosasoffice.com> wrote:
> Dear list,
>
> I am a beginner with R and I have  a question which may sound stupid: 
How can I transform my SQL query into an appropriate dataframe?
>
> I've looking around, and haven't found any solution.
>
> My return data is stored in a SQL database. When I query it, it  gives 
me the following output:
> Id            Fund                     Return                 Period
> 1             Fund01                0,5                         may/08
> 2             Fund01                0,4                         june/08
> 3             Fund 02               -0,3                        may/08
> 4             Fund02                -0,4 june/2008
>
> However this data is not suitable for analysis. I would like to 
transform it to a format like:

> Period                  Fund01                Fund02
> May/08                0,5                         0,4
> June/08               -0,3                        -0,4
>
> Is it possible to do this with R? Could you please recommend me 
references or packages which are useful for handling this kind of data?
>
> Thank you very much for your time.
>
> Regards,
>
> Hebbertt
>
>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>

_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB using stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-db


	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list