[R] R issue with unequal large data frames with multiple columns

Jim Holtman jholtman at gmail.com
Thu May 2 11:43:49 CEST 2013


?duplicated
?intersect
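
A minimal sketch of how `duplicated` and `intersect` could be combined for step 2 of the question below. It assumes the 12 data frames are held in one list and that each has `X.DATE` and `X.TIME` columns. The helper `align_frames` is a hypothetical name. The helper builds one date/time key per row, drops repeated keys with `duplicated`, keeps only the keys common to every frame via `Reduce(intersect, ...)`, and then uses `match` so every frame ends up with the same rows in the same order.

```r
# Hypothetical helper: keep only rows whose X.DATE/X.TIME key occurs once
# per frame and in every frame, so all frames end up with identical keys.
align_frames <- function(dfs) {
  # One character key per row; works for integer or character columns.
  key_of <- function(df) paste(df$X.DATE, df$X.TIME, sep = "_")

  # a) drop rows whose key already appeared earlier in the same frame
  dfs <- lapply(dfs, function(df) df[!duplicated(key_of(df)), , drop = FALSE])

  # b) keys present in every frame, in the order of the first frame
  common <- Reduce(intersect, lapply(dfs, key_of))

  # Keys are now unique, so match() picks exactly one row per common key
  lapply(dfs, function(df) df[match(common, key_of(df)), , drop = FALSE])
}

# Toy example: frame `a` has a repeated 01:00 row and an extra 01:01 row.
a <- data.frame(X.DATE = "05022013",
                X.TIME = c("0100", "0100", "0101", "0102"),
                value  = 1:4)
b <- data.frame(X.DATE = "05022013",
                X.TIME = c("0100", "0102"),
                value  = c(10, 20))
out <- align_frames(list(a, b))
out[[1]]$value  # [1] 1 4
out[[2]]$value  # [1] 10 20
```

With the real data, the list could be built with something like `lapply(files, read.csv)`, where `files` is a hypothetical vector of the 12 CSV paths. The rows follow the order of the first frame rather than being re-sorted. If the goal is one combined table instead of 12 aligned ones, `merge()` on the two columns is an alternative.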


On May 2, 2013, at 2:28, Adeel Amin <adeel.amin at gmail.com> wrote:

> I'm a bit of an amateur R programmer. I can handle simple R scenarios, but
> my grasp of more complex syntax isn't steady.
> 
> I have 12 CSV files that I've read into data frames. Each has 8 columns and
> over 2,000,000 rows. In each data frame, every row is identified by a date
> component and a time component, stored in the columns:
> 
> X.DATE and then X.TIME
> 
> X.DATE is in the format MMDDYYYY and X.TIME is in the format HHMM. The issue
> is that even though each data frame begins and ends with the same X.DATE and
> X.TIME values, the data frames have different numbers of rows. One may have
> as many as 100,000 more rows than another.
> 
> I want to do two things:
> 
> 1) I want to extract a certain portion of data depending on date and time
> (easy)
> 
> 2) In lock step with number 1, I want to eliminate rows from each data
> frame that are a) redundant or b) do not appear in the other data sets.
> 
> When step 2 is done, the date/time values in all 12 data frames will be
> identical.
> 
> Suggestions?  Thanks R Community --
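
For step 1 of the question above, one option is to combine X.DATE and X.TIME into a single POSIXct timestamp and subset on it. This is a sketch under an assumption the thread does not state: if `read.csv` parsed the columns as integers, leading zeros were dropped (01022013 becomes 1022013, 0005 becomes 5), so the sketch zero-pads them with `sprintf` before parsing. The helper `add_timestamp`, the column name `ts`, and the UTC time zone are hypothetical choices.

```r
# Hypothetical helper: add a POSIXct column `ts` built from X.DATE (MMDDYYYY)
# and X.TIME (HHMM). If the columns were read as numbers, leading zeros were
# dropped (01022013 -> 1022013, 0005 -> 5), so they are zero-padded first.
add_timestamp <- function(df, tz = "UTC") {
  d_str <- df$X.DATE
  t_str <- df$X.TIME
  if (is.numeric(d_str)) d_str <- sprintf("%08d", as.integer(d_str))
  if (is.numeric(t_str)) t_str <- sprintf("%04d", as.integer(t_str))
  df$ts <- as.POSIXct(paste(d_str, t_str), format = "%m%d%Y %H%M", tz = tz)
  df
}

# Toy example with integer columns, as read.csv would produce them
df <- data.frame(X.DATE = c(5022013L, 5022013L, 5032013L),
                 X.TIME = c(5L, 1430L, 900L),
                 value  = 1:3)
df <- add_timestamp(df)

# Step 1: keep only rows from 2 May 2013
may2 <- subset(df, ts >= as.POSIXct("2013-05-02 00:00", tz = "UTC") &
                   ts <  as.POSIXct("2013-05-03 00:00", tz = "UTC"))
may2$value  # [1] 1 2
```

Applied to each frame with `lapply`, the same `subset` call extracts the same window from all 12 frames. UTC is used here only to sidestep daylight-saving gaps; the actual time zone of the data is not given in the thread.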


