[R] Looping Through DataFrames with Differing Lengths

Paul Bernal paulbernal07 at gmail.com
Tue Mar 28 16:40:43 CEST 2017


Dear friend David,

Thank you for your valuable suggestion. So here is the file in .txt format.

Best of regards,

Paul

2017-03-28 9:35 GMT-05:00 David L Carlson <dcarlson at tamu.edu>:

> We did not get the file on the list. You need to rename your file to
> "Container.txt" or the mailing list will strip it from your message. The
> read.csv() function returns a data frame so Data is already a data frame.
> The command DataFrame<-data.frame(Data) just makes a copy of Data.
>
> Without the file, it is difficult to be certain, but your dates are
> probably stored as character strings and read.csv() will turn those to
> factors unless you tell it not to do that. Try
>
> Data<-read.csv("Container.csv", stringsAsFactors=FALSE)
> str(Data) # To see how the dates are stored
>
> and see if things work better. If not, rename the file or use dput(Data)
> and copy the result into your email message. If the data is very long, use
> dput(head(Data, 15)).
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Paul Bernal
> Sent: Tuesday, March 28, 2017 9:12 AM
> To: Ng Bo Lin <ngbolin91 at gmail.com>
> Cc: r-help at r-project.org
> Subject: Re: [R] Looping Through DataFrames with Differing Lengths
>
> Dear friends Ng Bo Lin, Mark and Ulrik, thank you all for your kind and
> valuable replies,
>
> I am trying to reformat a date as follows:
>
> Data<-read.csv("Container.csv")
>
> DataFrame<-data.frame(Data)
>
> DataFrame$TransitDate<-as.Date(DataFrame$TransitDate, "%Y-%m-%d")
>
> #trying to put it in YYYY-MM-DD format
>
> However, when I do this, I get a bunch of NAs for the dates.
>
> I am providing a sample dataset as a reference.
>
> Any help will be greatly appreciated,
>
> Best regards,
>
> Paul
>
> 2017-03-28 8:15 GMT-05:00 Ng Bo Lin <ngbolin91 at gmail.com>:
>
> > Hi Paul,
> >
> > Using the example provided by Ulrik, where
> >
> > > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
> > > Transits = c(15, 20))
> >
> > You could also try the following loop:
> >
> > for (i in seq_len(nrow(exdf1))) {
> >         # append the row only if its Date is not already in exdf2
> >         if (!exdf1[i, 1] %in% exdf2[, 1]) {
> >                 exdf2 <- rbind(exdf2, exdf1[i, ])
> >         }
> > }
> >
> > The loop runs through the rows of exdf1 and checks whether the Date of
> > each row already exists in the Date column of exdf2. If so, it skips the
> > row. Otherwise, it binds the row to exdf2.
> >
> > Hope this helps!
> >
> >
> > Side note: in terms of computational efficiency, I think Ulrik's answer
> > is probably better. Its presentation is also much clearer.
> >
> > Regards,
> > Bo Lin
> >
> > > On 28 Mar 2017, at 5:22 PM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> > >
> > > Hi Paul,
> > >
> > > does this do what you want?
> > >
> > > > exdf1 <- data.frame(Date = c("1985-10-01", "1985-11-01", "1985-12-01",
> > > > "1986-01-01"), Transits = c(NA, NA, NA, NA))
> > > > exdf2 <- data.frame(Date = c("1985-10-01", "1986-01-01"),
> > > > Transits = c(15, 20))
> > >
> > > tmpdf <- subset(exdf1, !Date %in% exdf2$Date)
> > >
> > > rbind(exdf2, tmpdf)
> > >
> > > HTH,
> > > Ulrik
> > >
> > > > On Tue, 28 Mar 2017 at 10:50 Paul Bernal <paulbernal07 at gmail.com> wrote:
> > >
> > > Dear friend Mark,
> > >
> > > Great suggestion! Thank you for replying.
> > >
> > > I have two dataframes, dataframe1 and dataframe2.
> > >
> > > > dataframe1 has two columns, one with the dates in YYYY-MM-DD format and
> > > > the other column with the number of transits (all of which were set to
> > > > NA). dataframe1 starts on 1985-10-01 (October 1, 1985) and ends on
> > > > 2017-03-01 (March 1, 2017).
> > > >
> > > > dataframe2 has the same two columns, one with the dates in YYYY-MM-DD
> > > > format and the other with the number of transits. dataframe2 has the
> > > > same start and end dates; however, it has missing dates between them,
> > > > so it has fewer observations.
> > > >
> > > > dataframe1 has a total of 378 observations and dataframe2 has a total
> > > > of 362 observations.
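
The date range described above can be checked directly: a monthly sequence from 1985-10-01 to 2017-03-01 has 378 elements, which agrees with the size reported for dataframe1. The following is only a sketch; the vector `present` is a hypothetical stand-in for the dates that actually occur in dataframe2, and `all_months` and `missing_months` are illustrative names.

```r
# Every first-of-month date in the range stated above.
all_months <- seq(as.Date("1985-10-01"), as.Date("2017-03-01"), by = "month")
length(all_months)  # 378, matching the size reported for dataframe1

# Hypothetical stand-in for the dates actually present in dataframe2.
present <- as.Date(c("1985-10-01", "1986-01-01", "1986-02-01"))

# Months in the full sequence that have no record.
missing_months <- all_months[!all_months %in% present]
head(missing_months)
```

Building the full sequence with seq() avoids depending on a second, pre-filled data frame, at the cost of assuming that every record falls on the first day of a month.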
> > >
> > > > I would like to come up with code that does the following:
> > > >
> > > > Get the dates of dataframe1 that are missing in dataframe2 and add them
> > > > as records to dataframe2, but with NA values.
> > >
> > > > <dataframe1                              <dataframe2
> > > >
> > > > Date          Transits                   Date          Transits
> > > > 1985-10-01    NA                         1985-10-01    15
> > > > 1985-11-01    NA                         1986-01-01    20
> > > > 1985-12-01    NA                         1986-02-01    5
> > > > 1986-01-01    NA
> > > > 1986-02-01    NA
> > > > 2017-03-01    NA
> > >
> > > > I would like to fill in the missing dates in dataframe2, with NA as
> > > > value for the missing transits, so that I could end up with a
> > > > dataframe3 looking as follows:
> > > >
> > > > <dataframe3
> > > > Date          Transits
> > > > 1985-10-01    15
> > > > 1985-11-01    NA
> > > > 1985-12-01    NA
> > > > 1986-01-01    20
> > > > 1986-02-01    5
> > > > 2017-03-01    NA
> > >
> > > This is what I want to accomplish.
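
One possible way to reach the dataframe3 layout described above is a left join with base R's merge(). This is a sketch, not the method settled on in the thread, and it assumes both Date columns hold character strings in YYYY-MM-DD form. The data frames below reproduce the small example tables from this message.

```r
# Example tables from the message, with dates kept as character strings.
dataframe1 <- data.frame(
  Date = c("1985-10-01", "1985-11-01", "1985-12-01",
           "1986-01-01", "1986-02-01", "2017-03-01"),
  Transits = NA,
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Date = c("1985-10-01", "1986-01-01", "1986-02-01"),
  Transits = c(15, 20, 5),
  stringsAsFactors = FALSE
)

# Use only the Date column of dataframe1 so that Transits is not duplicated.
# all.x = TRUE keeps every date from dataframe1; dates without a match in
# dataframe2 receive NA in Transits.
dataframe3 <- merge(dataframe1["Date"], dataframe2, by = "Date", all.x = TRUE)
dataframe3
```

merge() sorts the result by the key column by default, so the rows come back in date order, which an rbind() of the missing rows would not guarantee.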
> > >
> > > > Thanks in advance for your help,
> > >
> > > Best regards,
> > >
> > > Paul
> > >
> > >
> > > 2017-03-27 15:15 GMT-05:00 Mark Sharp <msharp at txbiomed.org>:
> > >
> > > >> Make some small dataframes of just a few rows that illustrate the
> > > >> problem structure. Make a third that has the result you want. You will
> > > >> get an answer very quickly. Without a self-contained reproducible
> > > >> problem, results vary.
> > >>
> > >> Mark
> > >> R. Mark Sharp, Ph.D.
> > >> msharp at TxBiomed.org
> > >>
> > >>
> > >>
> > >>
> > >>
> > > >>> On Mar 27, 2017, at 3:09 PM, Paul Bernal <paulbernal07 at gmail.com> wrote:
> > >>>
> > >>> Dear friends,
> > >>>
> > > >>> I have one dataframe which contains 378 observations, and another
> > > >>> one containing 362 observations.
> > > >>>
> > > >>> Both dataframes have two columns, one date column and another one
> > > >>> with the number of transits.
> > > >>>
> > > >>> I wanted to come up with code to fill in the dates that are missing
> > > >>> in one of the dataframes and set the transits column to NA for those
> > > >>> dates.
> > > >>>
> > > >>> I have tried several things, but R complains that the lengths of the
> > > >>> dataframes are different.
> > >>>
> > >>> How can I solve this?
> > >>>
> > >>> Any guidance will be greatly appreciated,
> > >>>
> > >>> Best regards,
> > >>>
> > >>> Paul
> > >>>
> > >>> [[alternative HTML version deleted]]
> > >>>
> > >>> ______________________________________________
> > >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/
> > posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-------------- next part --------------
TransitDate	Transits
1-Oct-85	4
1-Nov-85	4
1-Dec-85	5
1-Jan-86	4
1-Feb-86	3
1-Mar-86	6
1-Apr-86	4
1-May-86	3
1-Jun-86	4
1-Jul-86	5
1-Aug-86	5
1-Sep-86	4
1-Oct-86	4
1-Nov-86	5
1-Dec-86	2
1-Feb-88	1
1-Mar-88	1
1-Apr-88	2
1-May-88	2
1-Jul-88	1
1-Aug-88	1
1-Sep-88	1
1-Oct-88	2
1-Dec-88	2
1-Jan-89	3
1-Mar-89	2
1-Apr-89	3
1-May-89	4
1-Jun-89	3
1-Jul-89	3
1-Aug-89	2
1-Sep-89	5
1-Oct-89	3
1-Nov-89	3
1-Dec-89	4
1-Jan-90	6
1-Feb-90	4
1-Mar-90	6
1-Apr-90	3
1-May-90	7
1-Jun-90	7
1-Jul-90	3
1-Aug-90	6
1-Sep-90	5
1-Oct-90	6
1-Nov-90	7
1-Dec-90	6
1-Jan-91	5
1-Feb-91	7
1-Mar-91	7
1-Apr-91	7
1-May-91	8
1-Jun-91	7
1-Jul-91	7
1-Aug-91	8
1-Sep-91	9
1-Oct-91	8
1-Nov-91	8
1-Dec-91	9
1-Jan-92	10
1-Feb-92	8
1-Mar-92	8
1-Apr-92	7
1-May-92	9
1-Jun-92	8
1-Jul-92	12
1-Aug-92	12
1-Sep-92	11
1-Oct-92	12
1-Nov-92	12
1-Dec-92	11
1-Jan-93	13
1-Feb-93	10
1-Mar-93	11
1-Apr-93	12
1-May-93	15
1-Jun-93	14
1-Jul-93	12
1-Aug-93	14
1-Sep-93	11
1-Oct-93	16
1-Nov-93	10
1-Dec-93	14
1-Jan-94	12
1-Feb-94	14
1-Mar-94	14
1-Apr-94	16
1-May-94	15
1-Jun-94	14
1-Jul-94	16
1-Aug-94	16
1-Sep-94	14
1-Oct-94	17
1-Nov-94	14
1-Dec-94	14
1-Jan-95	16
1-Feb-95	18
1-Mar-95	15
1-Apr-95	17
1-May-95	19
1-Jun-95	21
1-Jul-95	23
1-Aug-95	24
1-Sep-95	21
1-Oct-95	24
1-Nov-95	20
1-Dec-95	26
1-Jan-96	22
1-Feb-96	21
1-Mar-96	25
1-Apr-96	23
1-May-96	24
1-Jun-96	24
1-Jul-96	22
1-Aug-96	25
1-Sep-96	24
1-Oct-96	24
1-Nov-96	25
1-Dec-96	25
1-Jan-97	25
1-Feb-97	20
1-Mar-97	26
1-Apr-97	22
1-May-97	26
1-Jun-97	24
1-Jul-97	21
1-Aug-97	27
1-Sep-97	23
1-Oct-97	25
1-Nov-97	25
1-Dec-97	26
1-Jan-98	25
1-Feb-98	20
1-Mar-98	25
1-Apr-98	19
1-May-98	28
1-Jun-98	24
1-Jul-98	25
1-Aug-98	25
1-Sep-98	26
1-Oct-98	28
1-Nov-98	25
1-Dec-98	26
1-Jan-99	28
1-Feb-99	24
1-Mar-99	26
1-Apr-99	26
1-May-99	30
1-Jun-99	24
1-Jul-99	28
1-Aug-99	26
1-Sep-99	24
1-Oct-99	29
1-Nov-99	27
1-Dec-99	25
1-Jan-00	29
1-Feb-00	25
1-Mar-00	29
1-Apr-00	25
1-May-00	31
1-Jun-00	24
1-Jul-00	36
1-Aug-00	29
1-Sep-00	30
1-Oct-00	37
1-Nov-00	34
1-Dec-00	42
1-Jan-01	41
1-Feb-01	37
1-Mar-01	42
1-Apr-01	43
1-May-01	46
1-Jun-01	49
1-Jul-01	41
1-Aug-01	50
1-Sep-01	46
1-Oct-01	47
1-Nov-01	49
1-Dec-01	56
1-Jan-02	55
1-Feb-02	54
1-Mar-02	55
1-Apr-02	59
1-May-02	60
1-Jun-02	58
1-Jul-02	66
1-Aug-02	68
1-Sep-02	66
1-Oct-02	68
1-Nov-02	67
1-Dec-02	79
1-Jan-03	73
1-Feb-03	71
1-Mar-03	85
1-Apr-03	79
1-May-03	80
1-Jun-03	82
1-Jul-03	86
1-Aug-03	78
1-Sep-03	86
1-Oct-03	81
1-Nov-03	90
1-Dec-03	93
1-Jan-04	95
1-Feb-04	84
1-Mar-04	93
1-Apr-04	88
1-May-04	92
1-Jun-04	99
1-Jul-04	90
1-Aug-04	105
1-Sep-04	99
1-Oct-04	103
1-Nov-04	97
1-Dec-04	97
1-Jan-05	106
1-Feb-05	95
1-Mar-05	102
1-Apr-05	98
1-May-05	117
1-Jun-05	100
1-Jul-05	111
1-Aug-05	115
1-Sep-05	111
1-Oct-05	116
1-Nov-05	120
1-Dec-05	118
1-Jan-06	126
1-Feb-06	107
1-Mar-06	128
1-Apr-06	123
1-May-06	140
1-Jun-06	135
1-Jul-06	142
1-Aug-06	138
1-Sep-06	147
1-Oct-06	149
1-Nov-06	146
1-Dec-06	153
1-Jan-07	143
1-Feb-07	131
1-Mar-07	134
1-Apr-07	132
1-May-07	143
1-Jun-07	137
1-Jul-07	152
1-Aug-07	146
1-Sep-07	152
1-Oct-07	153
1-Nov-07	141
1-Dec-07	142
1-Jan-08	130
1-Feb-08	122
1-Mar-08	124
1-Apr-08	127
1-May-08	138
1-Jun-08	126
1-Jul-08	138
1-Aug-08	142
1-Sep-08	137
1-Oct-08	137
1-Nov-08	139
1-Dec-08	130
1-Jan-09	134
1-Feb-09	115
1-Mar-09	122
1-Apr-09	129
1-May-09	130
1-Jun-09	122
1-Jul-09	117
1-Aug-09	114
1-Sep-09	119
1-Oct-09	112
1-Nov-09	102
1-Dec-09	98
1-Jan-10	92
1-Feb-10	86
1-Mar-10	108
1-Apr-10	95
1-May-10	109
1-Jun-10	110
1-Jul-10	109
1-Aug-10	118
1-Sep-10	115
1-Oct-10	123
1-Nov-10	110
1-Dec-10	117
1-Jan-11	114
1-Feb-11	110
1-Mar-11	114
1-Apr-11	120
1-May-11	131
1-Jun-11	122
1-Jul-11	124
1-Aug-11	133
1-Sep-11	129
1-Oct-11	133
1-Nov-11	133
1-Dec-11	126
1-Jan-12	137
1-Feb-12	110
1-Mar-12	128
1-Apr-12	127
1-May-12	132
1-Jun-12	127
1-Jul-12	150
1-Aug-12	136
1-Sep-12	135
1-Oct-12	140
1-Nov-12	124
1-Dec-12	136
1-Jan-13	136
1-Feb-13	127
1-Mar-13	128
1-Apr-13	130
1-May-13	132
1-Jun-13	128
1-Jul-13	122
1-Aug-13	130
1-Sep-13	124
1-Oct-13	129
1-Nov-13	117
1-Dec-13	108
1-Jan-14	115
1-Feb-14	104
1-Mar-14	120
1-Apr-14	117
1-May-14	122
1-Jun-14	109
1-Jul-14	111
1-Aug-14	116
1-Sep-14	117
1-Oct-14	115
1-Nov-14	110
1-Dec-14	106
1-Jan-15	109
1-Feb-15	93
1-Mar-15	111
1-Apr-15	107
1-May-15	120
1-Jun-15	113
1-Jul-15	131
1-Aug-15	127
1-Sep-15	120
1-Oct-15	124
1-Nov-15	123
1-Dec-15	117
1-Jan-16	132
1-Feb-16	117
1-Mar-16	124
1-Apr-16	121
1-May-16	122
1-Jun-16	114
1-Jul-16	99
1-Aug-16	76
1-Sep-16	60
1-Oct-16	64
1-Nov-16	47
1-Dec-16	54
1-Jan-17	48
1-Feb-17	41
1-Mar-17	30
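
The attached dates are written as day, abbreviated month name and two-digit year (for example 1-Oct-85), so the "%Y-%m-%d" format used in the quoted message does not match them, which would explain the NAs. The following is a minimal sketch of one way to parse them. It assumes the file is tab-separated and that month abbreviations are in English; the inline sample `txt` stands in for the attached file, and forcing the "C" time locale is an assumption added for illustration.

```r
# Assumption: use English month abbreviations regardless of system locale.
Sys.setlocale("LC_TIME", "C")

# A few rows copied from the attachment above, tab-separated.
txt <- "TransitDate\tTransits\n1-Oct-85\t4\n1-Nov-85\t4\n1-Jan-00\t29\n1-Mar-17\t30"

# Keep the dates as character strings rather than factors.
Data <- read.delim(text = txt, stringsAsFactors = FALSE)

# %d = day of month, %b = abbreviated month name, %y = two-digit year.
Data$TransitDate <- as.Date(Data$TransitDate, format = "%d-%b-%y")

# Date objects print as YYYY-MM-DD; format() returns the same as text.
format(Data$TransitDate, "%Y-%m-%d")
```

Note that the format string passed to as.Date() describes how the input is written, not the desired output. With %y, R maps years 00 to 68 to 20xx and 69 to 99 to 19xx, which suits a series running from 1985 to 2017.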