[R] read txt file - date - no space

Diego Avesani diego.avesani at gmail.com
Thu Aug 2 08:55:34 CEST 2018


Dear all,

I have checked the lines that give me problems, I mean, the ones that give
NA after R processing. I think they are similar to the others:

10/12/1998 10:00,0,0,0
10/12/1998 11:00,0,0,0
10/12/1998 12:00,0,0,0
10/12/1998 13:00,0,0,0
10/12/1998 14:00,0,0,0
10/12/1998 15:00,0,0,0
10/12/1998 16:00,0,0,0
10/12/1998 17:00,0,0,0

@Jim: It seems that your suggestion focuses on reading data from the
terminal. Is it possible to apply it to a *.csv file?

@Pikal: Could there be some date conversion error?

Thanks again,
Diego
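
A minimal base-R sketch of the same idea applied to a *.csv file, for anyone without lubridate/tidyverse. It assumes the header is date,str1,str2,str3, that timestamps are month/day/year hour:minute, and that the times have no daylight-saving shifts (hence tz = "UTC", an assumption). read.csv(text = ...) stands in for read.csv(file = "obs_prec.csv") so the example runs on its own; the last step lists the rows whose timestamp fails to parse.

```r
# Stand-in for read.csv(file = "obs_prec.csv"): same header, a few of the lines
# quoted above, read from inline text so the example runs without the file.
csv_text <- "date,str1,str2,str3
10/12/1998 10:00,0,0,0
10/12/1998 11:00,0,0,0
10/12/1998 12:00,0,0,0
10/12/1998 13:00,0,0,0"

MyData <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Month/day/year hour:minute; single-digit days and hours (10/1/1998 0:00) also parse.
# tz = "UTC" is an assumption: it avoids daylight-saving gaps in local time.
MyData$datetime <- as.POSIXct(MyData$date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Rows whose timestamp could not be converted (they would show up as NA).
bad_rows <- MyData[is.na(MyData$datetime), ]
print(bad_rows)
```

With these lines nothing is flagged, so if the full file still yields NA, printing bad_rows for the whole file should point at the lines that differ (a different layout, stray characters, or a timezone issue as Jeff describes below).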




On 1 August 2018 at 17:01, jim holtman <jholtman using gmail.com> wrote:

>
> Try this:
>
> > library(lubridate)
> > library(tidyverse)
> > input <- read.csv(text = "date,str1,str2,str3
> + 10/1/1998 0:00,0.6,0,0
> +                   10/1/1998 1:00,0.2,0.2,0.2
> +                   10/1/1998 2:00,0.6,0.2,0.4
> +                   10/1/1998 3:00,0,0,0.6
> +                   10/1/1998 4:00,0,0,0
> +                   10/1/1998 5:00,0,0,0
> +                   10/1/1998 6:00,0,0,0
> +                   10/1/1998 7:00,0.2,0,0", as.is = TRUE)
> > # convert the date and add the "day" column to summarise by
> > input <- input %>%
> +   mutate(date = mdy_hm(date),
> +          day = floor_date(date, unit = 'day')
> +   )
> >
> > by_day <- input %>%
> +   group_by(day) %>%
> +   summarise(m_s1 = mean(str1),
> +             m_s2 = mean(str2),
> +             m_s3 = mean(str3)
> +   )
> >
> > by_day
> # A tibble: 1 x 4
>   day                  m_s1   m_s2  m_s3
>   <dttm>              <dbl>  <dbl> <dbl>
> 1 1998-10-01 00:00:00 0.200 0.0500 0.150
>
> Jim Holtman
> *Data Munger Guru*
>
>
> *What is the problem that you are trying to solve? Tell me what you want to
> do, not how you want to do it.*
>
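For comparison, a base-R sketch of the same daily means without lubridate or tidyverse (a swapped-in technique, not Jim's code): as.POSIXct() plays the role of mdy_hm(), as.Date() the role of floor_date(..., "day"), and aggregate() the role of group_by()/summarise(). tz = "UTC" is an assumption.

```r
input <- read.csv(text = "date,str1,str2,str3
10/1/1998 0:00,0.6,0,0
10/1/1998 1:00,0.2,0.2,0.2
10/1/1998 2:00,0.6,0.2,0.4
10/1/1998 3:00,0,0,0.6
10/1/1998 4:00,0,0,0
10/1/1998 5:00,0,0,0
10/1/1998 6:00,0,0,0
10/1/1998 7:00,0.2,0,0", stringsAsFactors = FALSE)

# Base-R counterpart of mdy_hm(); tz = "UTC" is an assumption.
input$date <- as.POSIXct(input$date, format = "%m/%d/%Y %H:%M", tz = "UTC")
# Calendar day of each observation, used as the grouping variable.
input$day <- as.Date(input$date, tz = "UTC")

# Daily mean of every station column in one call.
by_day <- aggregate(cbind(str1, str2, str3) ~ day, data = input, FUN = mean)
print(by_day)
```

The formula interface of aggregate() drops any row containing NA by default (na.action = na.omit), so one missing station value removes that hour from all three means; passing na.action = na.pass together with na.rm = TRUE avoids that.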
>
> On Tue, Jul 31, 2018 at 11:54 PM Diego Avesani <diego.avesani using gmail.com>
> wrote:
>
>> Dear all,
>> I am sorry, I caused a lot of confusion. I have to relax and start
>> all over again in order to understand.
>> If possible, I would like to start again, without mixing strategies,
>> and wait for your advice.
>>
>> I really appreciate your help, really really.
>> Here is my new file, a *.csv file (by the way, is it possible to attach it
>> to a message on the mailing list?)
>>
>> date,str1,str2,str3
>> 10/1/1998 0:00,0.6,0,0
>> 10/1/1998 1:00,0.2,0.2,0.2
>> 10/1/1998 2:00,0.6,0.2,0.4
>> 10/1/1998 3:00,0,0,0.6
>> 10/1/1998 4:00,0,0,0
>> 10/1/1998 5:00,0,0,0
>> 10/1/1998 6:00,0,0,0
>> 10/1/1998 7:00,0.2,0,0
>>
>>
>> I read it as:
>> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
>>
>> At this point I would like to compute the daily mean.
>> What would you suggest?
>>
>> Really Really thanks,
>> You are my lifesaver
>>
>> Thanks
>>
>>
>>
>> Diego
>>
>>
>> On 1 August 2018 at 01:01, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
>> wrote:
>>
>> > ... and the most common source of NA values in time data is wrong
>> > timezones. You really need to make sure the timezone that is assumed when
>> > the character data are converted to POSIXt agrees with the data. In most
>> > cases the easiest way to ensure this is to use
>> >
>> > Sys.setenv(TZ="US/Pacific")
>> >
>> > or whatever timezone from
>> >
>> > OlsonNames()
>> >
>> > corresponds with your data. Execute this setenv function before the
>> > strptime or as.POSIXct() function call.
>> >
>> > You can use
>> >
>> > MyData[ is.na(MyData$datetime), ]
>> >
>> > to see which records are failing to convert time.
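A small sketch of both suggestions above, with hypothetical data: the timezone is passed explicitly through the tz argument (an alternative to Sys.setenv(), which changes it for the whole session), and the last row is made deliberately invalid (month 13) so the is.na() subset has something to show. "UTC" and "Europe/Rome" are only example zones.

```r
# Option 1 (session-wide), as suggested above; pick the zone matching the data:
# Sys.setenv(TZ = "Europe/Rome")   # example zone only; see OlsonNames()
# Option 2 (per call): pass tz explicitly, leaving the session untouched.

MyData <- data.frame(
  stamp = c("1998-10-01 00:00:00",
            "1998-10-01 01:00:00",
            "1998-13-01 02:00:00"),   # hypothetical bad row: month 13
  st1 = c(0.6, 0.2, 0.6),
  stringsAsFactors = FALSE
)

MyData$datetime <- as.POSIXct(MyData$stamp, format = "%Y-%m-%d %H:%M:%S",
                              tz = "UTC")

# Records that failed to convert, as in MyData[ is.na(MyData$datetime), ]
failed <- MyData[is.na(MyData$datetime), ]
print(failed)
```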
>> >
>> > [1] https://github.com/jdnewmil/eci298sp2016/blob/master/QuickHowto1
>> >
>> > On July 31, 2018 3:04:05 PM PDT, Jim Lemon <drjimlemon using gmail.com> wrote:
>> > >Hi Diego,
>> > >I think the error is due to NA values in your data file. If I extend
>> > >your example and run it, I get no errors:
>> > >
>> > >MyData<-read.table(text="103001930 103001580 103001530
>> > >1998-10-01 00:00:00 0.6 0 0
>> > >1998-10-01 01:00:00 0.2 0.2 0.2
>> > >1998-10-01 02:00:00 0.6 0.2 0.4
>> > >1998-10-01 03:00:00 0 0 0.6
>> > >1998-10-01 04:00:00 0 0 0
>> > >1998-10-01 05:00:00 0 0 0
>> > >1998-10-01 06:00:00 0 0 0
>> > >1998-10-01 07:00:00 0.2 0 0
>> > >1998-10-01 08:00:00 0.6 0 0
>> > >1998-10-01 09:00:00 0.2 0.2 0.2
>> > >1998-10-01 10:00:00 0.6 0.2 0.4
>> > >1998-10-01 11:00:00 0 0 0.6
>> > >1998-10-01 12:00:00 0 0 0
>> > >1998-10-01 13:00:00 0 0 0
>> > >1998-10-01 14:00:00 0 0 0
>> > >1998-10-01 15:00:00 0.2 0 0
>> > >1998-10-01 16:00:00 0.6 0 0
>> > >1998-10-01 17:00:00 0.2 0.2 0.2
>> > >1998-10-01 18:00:00 0.6 0.2 0.4
>> > >1998-10-01 19:00:00 0 0 0.6
>> > >1998-10-01 20:00:00 0 0 0
>> > >1998-10-01 21:00:00 0 0 0
>> > >1998-10-01 22:00:00 0 0 0
>> > >1998-10-01 23:00:00 0.2 0 0
>> > >1998-10-02 00:00:00 0.6 0 0
>> > >1998-10-02 01:00:00 0.2 0.2 0.2
>> > >1998-10-02 02:00:00 0.6 0.2 0.4
>> > >1998-10-02 03:00:00 0 0 0.6
>> > >1998-10-02 04:00:00 0 0 0
>> > >1998-10-02 05:00:00 0 0 0
>> > >1998-10-02 06:00:00 0 0 0
>> > >1998-10-02 07:00:00 0.2 0 0
>> > >1998-10-02 08:00:00 0.6 0 0
>> > >1998-10-02 09:00:00 0.2 0.2 0.2
>> > >1998-10-02 10:00:00 0.6 0.2 0.4
>> > >1998-10-02 11:00:00 0 0 0.6
>> > >1998-10-02 12:00:00 0 0 0
>> > >1998-10-02 13:00:00 0 0 0
>> > >1998-10-02 14:00:00 0 0 0
>> > >1998-10-02 15:00:00 0.2 0 0
>> > >1998-10-02 16:00:00 0.6 0 0
>> > >1998-10-02 17:00:00 0.2 0.2 0.2
>> > >1998-10-02 18:00:00 0.6 0.2 0.4
>> > >1998-10-02 19:00:00 0 0 0.6
>> > >1998-10-02 20:00:00 0 0 0
>> > >1998-10-02 21:00:00 0 0 0
>> > >1998-10-02 22:00:00 0 0 0
>> > >1998-10-02 23:00:00 0.2 0 0",
>> > >skip=1,stringsAsFactors=FALSE)
>> > >names(MyData)<-c("date","time","st1","st2","st3")
>> > >MyData$datetime<-strptime(paste(MyData$date,MyData$time),
>> > > format="%Y-%m-%d %H:%M:%S")
>> > >MyData$datetime
>> > >st1_daily<-by(MyData$st1,MyData$date,mean)
>> > >st2_daily<-by(MyData$st2,MyData$date,mean)
>> > >st3_daily<-by(MyData$st3,MyData$date,mean)
>> > >st1_daily
>> > >st2_daily
>> > >st3_daily
>> > >
>> > >Try adding na.rm=TRUE to the "by" calls:
>> > >
>> > >st1_daily<-by(MyData$st1,MyData$date,mean,na.rm=TRUE)
>> > >st2_daily<-by(MyData$st2,MyData$date,mean,na.rm=TRUE)
>> > >st3_daily<-by(MyData$st3,MyData$date,mean,na.rm=TRUE)
>> > >
>> > >Jim
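To see what na.rm = TRUE changes, here is a sketch with tapply(), the function by() calls internally (it appears in the error message quoted later in this thread). The values are made up; one hour on the first day is missing.

```r
# Made-up hourly values for two days; one reading on the first day is missing.
day <- c("1998-10-01", "1998-10-01", "1998-10-02", "1998-10-02")
st1 <- c(0.6, NA, 0.2, 0.4)

plain   <- tapply(st1, day, mean)                # NA propagates: day 1 becomes NA
skip_na <- tapply(st1, day, mean, na.rm = TRUE)  # na.rm is passed on to mean()
print(plain)
print(skip_na)
```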
>> > >
>> > >On Tue, Jul 31, 2018 at 11:11 PM, Diego Avesani
>> > ><diego.avesani using gmail.com> wrote:
>> > >> Dear all,
>> > >>
>> > >> I still have problems with dates.
>> > >> Could you please tell me how to use POSIXct?
>> > >> I have found the timeAverage command, but I am not able to convert
>> > >> MyData to a proper date.
>> > >>
>> > >> Thanks a lot.
>> > >> I hope I am not bothering you, at least not too much.
>> > >>
>> > >>
>> > >> Diego
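A short sketch of the two date-time classes that come up in this thread: strptime() returns POSIXlt, while as.POSIXct() returns POSIXct (the class Don says he prefers further down). The timestamp comes from the data above; tz = "UTC" is an assumption. As far as I know, timeAverage() belongs to the openair package, which expects a POSIXct column named date, so converting to POSIXct is the step it needs; its help page should confirm the details.

```r
stamp <- "1998-10-01 07:00:00"

# POSIXlt: a list-like record of year, month, day, hour, ...
lt <- strptime(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
# POSIXct: a single number of seconds since 1970-01-01 UTC
ct <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(class(lt))
print(class(ct))

# Converting POSIXlt to POSIXct keeps the same instant.
print(as.POSIXct(lt) == ct)

# Pieces that are handy for daily summaries.
print(format(ct, "%Y-%m-%d"))   # calendar day as text
print(as.numeric(ct))           # underlying seconds
```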
>> > >>
>> > >>
>> > >> On 31 July 2018 at 11:12, Diego Avesani <diego.avesani using gmail.com> wrote:
>> > >>>
>> > >>> Dear Jim, Dear all,
>> > >>>
>> > >>> thanks a lot.
>> > >>>
>> > >>> Unfortunately, I get the following error:
>> > >>>
>> > >>>
>> > >>>  st1_daily<-by(MyData$st1,MyData$date,mean)
>> > >>> Error in tapply(seq_len(0L), list(`MyData$date` = c(913L, 914L, 925L,  :
>> > >>>   arguments must have same length
>> > >>>
>> > >>>
>> > >>> This is particularly strange. Indeed, if I apply
>> > >>>
>> > >>>
>> > >>> mean(MyData$str1,na.rm=TRUE)
>> > >>>
>> > >>>
>> > >>> it works
>> > >>>
>> > >>>
>> > >>> Sorry, I have a lot to learn.
>> > >>> You are really helping me.
>> > >>>
>> > >>> Diego
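One possible cause, offered as a guess: seq_len(0L) in the error means by() received a zero-length first argument, which is what MyData$st1 gives when no such column exists. The call that works uses MyData$str1, so the columns may be named str1..str3. A sketch of checking the names first, with a hypothetical two-line file:

```r
# Hypothetical reproduction: the columns are called str1..str3, not st1..st3.
MyData <- read.csv(text = "date,str1,str2,str3
10/1/1998 0:00,0.6,0,0
10/1/1998 1:00,0.2,0.2,0.2", stringsAsFactors = FALSE)

print(names(MyData))
print(is.null(MyData$st1))   # TRUE: `$` returns NULL for a missing column

# Fail early with a clear message instead of a length error further down.
needed <- c("date", "st1", "st2", "st3")
missing_cols <- setdiff(needed, names(MyData))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```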
>> > >>>
>> > >>>
>> > >>> On 31 July 2018 at 11:02, Jim Lemon <drjimlemon using gmail.com> wrote:
>> > >>>>
>> > >>>> Hi Diego,
>> > >>>> One way you can get daily means is:
>> > >>>>
>> > >>>> st1_daily<-by(MyData$st1,MyData$date,mean)
>> > >>>> st2_daily<-by(MyData$st2,MyData$date,mean)
>> > >>>> st3_daily<-by(MyData$st3,MyData$date,mean)
>> > >>>>
>> > >>>> Jim
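A variation on the three by() calls above, sketched with made-up rows in the same layout: sapply() runs tapply() over each station column and collects the daily means in a single matrix, one row per day and one column per station.

```r
# Made-up rows: three hours on each of two days.
MyData <- data.frame(
  date = rep(c("1998-10-01", "1998-10-02"), each = 3),
  st1  = c(0.6, 0.2, 0.6, 0.0, 0.0, 0.2),
  st2  = c(0.0, 0.2, 0.2, 0.0, 0.0, 0.0),
  st3  = c(0.0, 0.2, 0.4, 0.6, 0.0, 0.0),
  stringsAsFactors = FALSE
)

# For each station column: tapply(column, MyData$date, mean).
daily <- sapply(MyData[c("st1", "st2", "st3")], tapply, MyData$date, mean)
print(daily)
```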
>> > >>>>
>> > >>>> On Tue, Jul 31, 2018 at 6:51 PM, Diego Avesani
>> > ><diego.avesani using gmail.com>
>> > >>>> wrote:
>> > >>>> > Dear all,
>> > >>>> > I have found the error, my fault. Sorry.
>> > >>>> > There was an extra comma in the header line.
>> > >>>> > Thanks again.
>> > >>>> >
>> > >>>> > If I may, I would like to ask you another question about the
>> > >>>> > imported data.
>> > >>>> > I would like to compute the daily average for each date. Basically,
>> > >>>> > I have hourly data and I would like to have their daily mean.
>> > >>>> >
>> > >>>> > Is there a special command for this?
>> > >>>> >
>> > >>>> > Thanks a lot.
>> > >>>> >
>> > >>>> >
>> > >>>> > Diego
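Another base-R option for hourly-to-daily means, sketched with hypothetical readings: cut() with breaks = "day" assigns each POSIXct timestamp to its calendar day, and tapply() averages within each day. tz = "UTC" is an assumption.

```r
# Hypothetical hourly readings spanning two days.
stamps <- as.POSIXct(c("1998-10-01 00:00", "1998-10-01 12:00",
                       "1998-10-01 23:00", "1998-10-02 00:00",
                       "1998-10-02 01:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
rain <- c(0.6, 0.2, 0.4, 0.0, 0.2)

# cut() with breaks = "day" turns each timestamp into a factor level for its day.
day <- cut(stamps, breaks = "day")
print(levels(day))

daily_mean <- tapply(rain, day, mean)
print(daily_mean)
```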
>> > >>>> >
>> > >>>> >
>> > >>>> > On 31 July 2018 at 10:40, Diego Avesani <diego.avesani using gmail.com>
>> > >>>> > wrote:
>> > >>>> >>
>> > >>>> >> Dear all,
>> > >>>> >> I moved to a csv file because the data were originally in a csv
>> > >>>> >> file. In addition, since, as you told me, read.csv is a special
>> > >>>> >> case of read.table, I prefer to start learning with the simplest
>> > >>>> >> one. After that, I will also try the *.txt format.
>> > >>>> >>
>> > >>>> >> With read.csv, something strange happened:
>> > >>>> >>
>> > >>>> >> This is now the file:
>> > >>>> >>
>> > >>>> >> date,st1,st2,st3,
>> > >>>> >> 10/1/1998 0:00,0.6,0,0
>> > >>>> >> 10/1/1998 1:00,0.2,0.2,0.2
>> > >>>> >> 10/1/1998 2:00,0.6,0.2,0.4
>> > >>>> >> 10/1/1998 3:00,0,0,0.6
>> > >>>> >> 10/1/1998 4:00,0,0,0
>> > >>>> >> 10/1/1998 5:00,0,0,0
>> > >>>> >> 10/1/1998 6:00,0,0,0
>> > >>>> >> 10/1/1998 7:00,0.2,0,0
>> > >>>> >> 10/1/1998 8:00,0.6,0.2,0
>> > >>>> >> 10/1/1998 9:00,0.2,0.4,0.4
>> > >>>> >> 10/1/1998 10:00,0,0.4,0.2
>> > >>>> >>
>> > >>>> >> When I apply:
>> > >>>> >> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
>> > >>>> >>
>> > >>>> >> this is the results:
>> > >>>> >>
>> > >>>> >> 1        10/1/1998 0:00    0.6    0.00    0.0 NA
>> > >>>> >> 2        10/1/1998 1:00    0.2    0.20    0.2 NA
>> > >>>> >> 3        10/1/1998 2:00    0.6    0.20    0.4 NA
>> > >>>> >> 4        10/1/1998 3:00    0.0    0.00    0.6 NA
>> > >>>> >> 5        10/1/1998 4:00    0.0    0.00    0.0 NA
>> > >>>> >> 6        10/1/1998 5:00    0.0    0.00    0.0 NA
>> > >>>> >> 7        10/1/1998 6:00    0.0    0.00    0.0 NA
>> > >>>> >> 8        10/1/1998 7:00    0.2    0.00    0.0 NA
>> > >>>> >>
>> > >>>> >> I do not understand why.
>> > >>>> >> Is something wrong with the dates?
>> > >>>> >>
>> > >>>> >> Really, really thanks,
>> > >>>> >> I appreciate all your help a lot.
>> > >>>> >>
>> > >>>> >> Diego
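Why the extra NA column appears: the trailing comma in the header line (the extra comma Diego mentions in a later message) gives read.csv five column names but only four values per row, so it names the empty fifth column X and fills it with NA. A sketch reproducing this with the first rows of the file above, then dropping the all-NA column (fixing the header line in the file is the simpler cure):

```r
# Header with a trailing comma, as in the file above: five names, four values per row.
csv_text <- "date,st1,st2,st3,
10/1/1998 0:00,0.6,0,0
10/1/1998 1:00,0.2,0.2,0.2
10/1/1998 2:00,0.6,0.2,0.4"

MyData <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(names(MyData))   # the empty fifth name becomes "X"
print(MyData)          # the fifth column is filled with NA

# Drop columns that contain nothing but NA.
all_na <- vapply(MyData, function(col) all(is.na(col)), logical(1))
MyData <- MyData[, !all_na]
print(names(MyData))
```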
>> > >>>> >>
>> > >>>> >>
>> > >>>> >> On 31 July 2018 at 01:25, MacQueen, Don <macqueen1 using llnl.gov> wrote:
>> > >>>> >>>
>> > >>>> >>> Or, without removing the first line
>> > >>>> >>>   dadf <- read.table("xxx.txt", stringsAsFactors=FALSE, skip=1)
>> > >>>> >>>
>> > >>>> >>> Another alternative,
>> > >>>> >>>    dadf$datetime <- as.POSIXct(paste(dadf$V1,dadf$V2))
>> > >>>> >>> since the dates appear to be in the default format.
>> > >>>> >>> (I generally prefer to work with datetimes in POSIXct class rather
>> > >>>> >>> than POSIXlt class.)
>> > >>>> >>>
>> > >>>> >>> -Don
>> > >>>> >>>
>> > >>>> >>> --
>> > >>>> >>> Don MacQueen
>> > >>>> >>> Lawrence Livermore National Laboratory
>> > >>>> >>> 7000 East Ave., L-627
>> > >>>> >>> Livermore, CA 94550
>> > >>>> >>> 925-423-1062
>> > >>>> >>> Lab cell 925-724-7509
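A sketch of Don's suggestion end to end, using three sample lines from the original message plus one line for the next day taken from Jim's extended example: skip = 1 drops the station-ID line, as.POSIXct() parses the pasted date and time in its default layout, and tapply() gives the daily mean of the first station (V3). tz = "UTC" is an assumption.

```r
txt <- "103001930 103001580 103001530
1998-10-01 00:00:00 0.6 0 0
1998-10-01 01:00:00 0.2 0.2 0.2
1998-10-01 02:00:00 0.6 0.2 0.4
1998-10-02 00:00:00 0.6 0 0"

# skip = 1 drops the three-field station-ID line; whitespace is the default separator.
dadf <- read.table(text = txt, stringsAsFactors = FALSE, skip = 1)

# V1 holds the date and V2 the time; "%Y-%m-%d %H:%M:%S" is tried by default.
dadf$datetime <- as.POSIXct(paste(dadf$V1, dadf$V2), tz = "UTC")

# Daily mean of the first station column (V3).
daily_V3 <- tapply(dadf$V3, as.Date(dadf$datetime, tz = "UTC"), mean)
print(daily_V3)
```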
>> > >>>> >>>
>> > >>>> >>>
>> > >>>> >>>
>> > >>>> >>> On 7/30/18, 4:03 PM, "R-help on behalf of Jim Lemon"
>> > >>>> >>> <r-help-bounces using r-project.org on behalf of drjimlemon using gmail.com>
>> > >>>> >>> wrote:
>> > >>>> >>>
>> > >>>> >>>     Hi Diego,
>> > >>>> >>>     You may have to do some conversion as you have three fields in
>> > >>>> >>>     the first line using the default space separator and five
>> > >>>> >>>     fields in subsequent lines. If the first line doesn't contain
>> > >>>> >>>     any important data you can just delete it or replace it with a
>> > >>>> >>>     meaningful header line with five fields and save the file under
>> > >>>> >>>     another name.
>> > >>>> >>>
>> > >>>> >>>     It looks as though you have date-time as two fields. If so, you
>> > >>>> >>>     can just read the first field if you only want the date:
>> > >>>> >>>
>> > >>>> >>>     # assume you have removed the first line
>> > >>>> >>>     dadf<-read.table("xxx.txt",stringsAsFactors=FALSE)
>> > >>>> >>>     dadf$date<-as.Date(dadf$V1,format="%Y-%m-%d")
>> > >>>> >>>
>> > >>>> >>>     If you want the date/time:
>> > >>>> >>>
>> > >>>> >>>
>> > >>>> >>>     dadf$datetime<-strptime(paste(dadf$V1,dadf$V2),
>> > >>>> >>>      format="%Y-%m-%d %H:%M:%S")
>> > >>>> >>>
>> > >>>> >>>     Jim
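Instead of editing the header line in the file, a sketch that skips it and supplies five names through read.table()'s col.names argument, then converts the date and the date-time. The names st1..st3 follow Jim's later example; as.POSIXct() is used here in place of strptime() so the column is POSIXct (a swap, not Jim's code), and tz = "UTC" is an assumption.

```r
txt <- "103001930 103001580 103001530
1998-10-01 00:00:00 0.6 0 0
1998-10-01 01:00:00 0.2 0.2 0.2
1998-10-01 02:00:00 0.6 0.2 0.4"

# Skip the three-field first line and name the five fields of each data line.
dadf <- read.table(text = txt, skip = 1, stringsAsFactors = FALSE,
                   col.names = c("date", "time", "st1", "st2", "st3"))

# Date only, as in the as.Date() call above.
dadf$date <- as.Date(dadf$date, format = "%Y-%m-%d")

# Full date-time, stored as POSIXct.
dadf$datetime <- as.POSIXct(paste(dadf$date, dadf$time),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
str(dadf)
```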
>> > >>>> >>>
>> > >>>> >>>     On Tue, Jul 31, 2018 at 12:29 AM, Diego Avesani
>> > >>>> >>> <diego.avesani using gmail.com> wrote:
>> > >>>> >>>     > Dear all,
>> > >>>> >>>     >
>> > >>>> >>>     > I am trying to read a *.txt file.
>> > >>>> >>>     > The txt file has the following shape:
>> > >>>> >>>     >
>> > >>>> >>>     > 103001930 103001580 103001530
>> > >>>> >>>     > 1998-10-01 00:00:00 0.6 0 0
>> > >>>> >>>     > 1998-10-01 01:00:00 0.2 0.2 0.2
>> > >>>> >>>     > 1998-10-01 02:00:00 0.6 0.2 0.4
>> > >>>> >>>     > 1998-10-01 03:00:00 0 0 0.6
>> > >>>> >>>     > 1998-10-01 04:00:00 0 0 0
>> > >>>> >>>     > 1998-10-01 05:00:00 0 0 0
>> > >>>> >>>     > 1998-10-01 06:00:00 0 0 0
>> > >>>> >>>     > 1998-10-01 07:00:00 0.2 0 0
>> > >>>> >>>     >
>> > >>>> >>>     > If possible, I have a couple of questions, which may sound
>> > >>>> >>>     > stupid but are important to me in order to understand how R
>> > >>>> >>>     > deals with files and dates.
>> > >>>> >>>     >
>> > >>>> >>>     > 1) Do I have to convert it to a *.csv file?
>> > >>>> >>>     > 2) Can I deal with spaces instead of ","?
>> > >>>> >>>     > 3) How can I read dates?
>> > >>>> >>>     >
>> > >>>> >>>     > thanks a lot to all of you,
>> > >>>> >>>     > Thanks
>> > >>>> >>>     >
>> > >>>> >>>     >
>> > >>>> >>>     > Diego
>> > >>>> >>>     >
>> > >>>> >>>     > ______________________________________________
>> > >>>> >>>     > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >>>> >>>     > https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>>> >>>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > >>>> >>>     > and provide commented, minimal, self-contained, reproducible code.
>> > >>>> >>>
>> > >>>> >>>
>> > >>>> >>>
>> > >>>> >>
>> > >>>> >
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>> > --
>> > Sent from my phone. Please excuse my brevity.
>> >
>>
>>
>
