[R-sig-DB] Data manipulation: transforming SQL query in an appropriate dataframe
M@rk_Otto m@iii@g oii iws@gov
M@rk_Otto m@iii@g oii iws@gov
Fri Sep 5 18:26:06 CEST 2008
Here are three ways of handling the problem of taking values of a variable
and making them columns of another table. Let the original data be
OldTable. Note I had to clean up your example a little bit.
fh<-textConnection("Id Fund Return Period
1 Fund01 0,5 may/08
2 Fund01 0,4 june/08
3 Fund02 -0,3 may/08
4 Fund02 -0,4 june/08
")
OldTable<-read.table(fh,header=T,dec=",",sep="",row.names="Id")
1. Use reshape in R. This is used with longitudinal data, so you have to
look closely at the manual page to get what you need out of it.
NewTable<-reshape(OldTable,idvar="Period",v.names='Return',timevar='Fund',direction="wide")
2. I you can to the data rearranging in your database. SQL is not great
at converting values of variables to columns. It is great at doing lots
of manipulations on large amounts of data better than R and without the
memory worries. For this reshaping, you can use pivot tables to do this
in a straight forward way in Excel or MS Access.
Here we let OldTable be a table in the database too. So, make a view in
the database:
create view WhatIWantMyRTableToBeView as
select Period,Fund01,Fund02
from (
select Period,Return as Fund01
from OldTable
where Fund='Fund01'
) as f1 inner join (
select Period,Return as Fund02
from OldTable
where Fund=in('Fund02','Fund 02')
) as f1 on(f1.Period=f2=Period)
order by Period;
Then, you can then read in the data to R using the RODBC or other database
package such as RSQLite
library(RODBC)
dbh<-odbcConnect("<path>/<database name>")
NewTable<-sqlFetch(dbh,"WhatIWantMyRTableToBeView")
or with an SQL query in R.
NewTable<-sqlQuery(dbh,"
select Period,Fund01,Fund02
from (
select Period,Return as Fund01
from OldTable
where Fund='Fund01'
) as f1 inner join (
select Period,Return as Fund02
from OldTable
where Fund=in('Fund02','Fund 02')
) as f1 on(f1.Period=f2=Period)
order by Period;
")
3. These could be done all in R with a merge:
fund1<-OldTable[OldTable$Fund=="Fund01",]
fund1<-dataframe(Period=fund1$Period,Fund01=fund1$Fund)
fund2<-OldTable[OldTable$Fund%in%c("Fund02","Fund 02"),]
fund2<-dataframe(Period=fund2$Period,Fund02=fund2$Fund)
NewTable<-merge(fund1,fund2) # merge keys on the variables common to
both datasets.
I haven't run the last two parts, so you may need to tweak the code.
Others on this list are more knowledgeable on these matters.
Mark
Mark Otto, Biometrician
U. S. Fish and Wildlife Service
Patuxent Wildlife Research Center
11510 American Holly Dr
Laurel MD 20708-4002
"Gabor Grothendieck" <ggrothendieck using gmail.com>
Sent by: r-sig-db-bounces using stat.math.ethz.ch
09/05/2008 10:39 AM
To
"Hebbertt de Farias Soares" <hebbertt.soares using rosasoffice.com>
cc
"r-sig-db using stat.math.ethz.ch" <r-sig-db using stat.math.ethz.ch>
Subject
Re: [R-sig-DB] Data manipulation: transforming SQL query in an appropriate
dataframe
Correcting the inconsistencies in the data, try this
(and also see ?reshape and the reshape package for
other approaches):
DF <- structure(list(Id = 1:4, Fund = structure(c(1L, 1L, 2L, 2L),
.Label = c("Fund01",
"Fund02"), class = "factor"), Return = c(0.5, 0.4, -0.3, -0.4
), Period = structure(c(2L, 1L, 2L, 1L), .Label = c("june/08",
"may/08"), class = "factor")), .Names = c("Id", "Fund", "Return",
"Period"), class = "data.frame", row.names = c(NA, -4L))
DF
tapply(DF$Return, DF[c(2, 4)], c)
On Fri, Sep 5, 2008 at 10:13 AM, Hebbertt de Farias Soares
<hebbertt.soares using rosasoffice.com> wrote:
> Dear list,
>
> I am a beginner with R and I have a question which may sound stupid:
How can I transform my SQL query into an appropriate dataframe?
>
> I've looking around, and haven't found any solution.
>
> My return data is stored in a SQL database. When I query it, it gives
me the following output:
> Id Fund Return Period
> 1 Fund01 0,5 may/08
> 2 Fund01 0,4 june/08
> 3 Fund 02 -0,3 may/08
> 4 Fund02 -0,4 june/2008
>
> However this data is not suitable for analysis. I would like to
transform it to a format like:
> Period Fund01 Fund02
> May/08 0,5 0,4
> June/08 -0,3 -0,4
>
> Is it possible to do this with R? Could you please recommend me
references or packages which are useful for handling this kind of data?
>
> Thank you very much for your time.
>
> Regards,
>
> Hebbertt
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
_______________________________________________
R-sig-DB mailing list -- R Special Interest Group
R-sig-DB using stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-db
[[alternative HTML version deleted]]
More information about the R-sig-DB
mailing list