[R] Replicate an SAS/SQL request

Guibert TCHINDE gtchinde at aid.fr
Thu Aug 1 18:51:33 CEST 2013


I am very confuse,
I send with a wrong subject. 
All my Apologizes

-----
________________________________________
De : Guibert TCHINDE
Date d'envoi : jeudi 1 août 2013 18:51
À : r-help at r-project.org
Cc : R help
Objet : RE : [R] Conversion of matrix in r to integer

Dear List,

I am trying to replicate an SQL request run wih SAS using R
The request is :

PROC SQL;
        CREATE VIEW myview AS
        SELECT a.*,
                   b.semaine
                   b.date_debut
                   b.date_fin
FROM myfirsttable. a
INNER JOIN
                mycalendar
                ON (  a.DEBUT1    <=  b.date_fin  AND a.FIN>=  b.date_debut   AND (a.MIG = 'O' AND a.DATE_MIG <= b. date_fin)
QUIT;

Like we see, ther's not join key between the two datasets and the job run by sas is to : make cartesian product and subsetting according to the conditions


So when i try to replicate this in R. I do this :

> system.time(extract<-merge(myfirsttable,mycalendar))
utilisateur     système      écoulé
    6509.51       47.61     6792.12
> system.time(VueAnalyse<-subset(extracts,
+                                (DEBUT1<=date_fin & FIN>=date_debut) &
+                                ((MIG == 'O' & DATE_MIG <= date_fin))
utilisateur     système      écoulé
     178.59       14.07      211.11

The first datasets contains 40 millions rows and the second just 5.
So the first cartesian product take ~2 hours.

Somebody have any idea to do this in R with less time?

Thanks in advance

Regards,

GT



More information about the R-help mailing list