[R] Importing fixed-width data
Dennis Murphy
djmuser at gmail.com
Wed May 25 21:03:11 CEST 2011
I get a data frame on my end:
lines <- "2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
2011-05-13 00:00:05 EONBHS229 mia13001621NON"
df = read.fwf(textConnection(lines), widths=c(19,-4,7,3,8,2,1,3,1),
col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
> df
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
3 2011-05-13 00:00:05 BHS229   mia 13001621      NO    N   <NA>   <NA>
> str(df)
'data.frame': 3 obs. of 8 variables:
$ DateTime: POSIXct, format: "2011-05-13 00:00:00" "2011-05-13 00:00:01" ...
$ Flight : Factor w/ 3 levels "AAL223 ","AAL330 ",..: 2 1 3
$ Dest : Factor w/ 3 levels "dfa","laa","mia": 1 2 3
$ ArrTime : Factor w/ 3 levels "13001621","13002516",..: 2 3 1
$ MsgType : chr "PS" "AS" "NO"
$ Conf : Factor w/ 3 levels ".","C","N": 2 1 3
$ Runway : Factor w/ 1 level "NON": 1 1 NA
$ Source : Factor w/ 2 levels "A","M": 1 2 NA
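One thing to note in the output above: row 3 is shifted. The short line has NON immediately after the arrival time, so the fixed widths put "NO" in MsgType and "N" in Conf, and Runway and Source come out NA. A possible workaround, sketched under the assumption that every short line omits exactly the three characters before NON and the one after it, is to pad those lines to full width before calling read.fwf. The helper name pad_short and its arguments are made up for illustration, and colClasses is simplified to "character" to keep the example short.

```r
# Two of the sample lines from this thread: one full (48 chars), one short (44 chars)
raw <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
         "2011-05-13 00:00:05 EONBHS229 mia13001621NON")

# Hypothetical helper: pad short lines so NON lands in the Runway columns.
# head = 19 + 4 + 7 + 3 + 8 = 41 characters precede the MsgType field.
# ASSUMPTION: short lines lack exactly MsgType (2), Conf (1) and Source (1).
pad_short <- function(x, full = 48L, head = 41L) {
  short <- nchar(x) < full
  x[short] <- paste0(substr(x[short], 1L, head),
                     "   ",                                   # blank MsgType + Conf
                     substr(x[short], head + 1L, nchar(x[short])),
                     " ")                                     # blank Source
  x
}

fixed <- pad_short(raw)

con <- textConnection(fixed)
df2 <- read.fwf(con, widths = c(19, -4, 7, 3, 8, 2, 1, 3, 1),
                col.names = c("DateTime", "Flight", "Dest", "ArrTime",
                              "MsgType", "Conf", "Runway", "Source"),
                colClasses = "character",   # simplified for the sketch
                strip.white = TRUE,         # drop padding blanks around fields
                na.strings = "")            # treat empty fields as missing
close(con)

df2
```

With strip.white = TRUE the trailing blank seen in the Flight levels above should also disappear, and the blank MsgType, Conf and Source fields of the padded line should come back as NA rather than as shifted pieces of NON.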
> sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets grid methods
[8] base
other attached packages:
[1] gplots_2.8.0 caTools_1.12 bitops_1.0-4.1 gdata_2.8.2
[5] gtools_2.6.2 sos_1.3-0 brew_1.0-6 lattice_0.19-26
[9] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2
loaded via a namespace (and not attached):
[1] tools_2.13.0
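As for the original question: mode() reports how an object is stored, not its class, so "list" for a data frame and "numeric" for factor and POSIXct columns is what one would expect. class(), is.data.frame() or str() is the better check. A small sketch with a made-up data frame (the column names and values are stand-ins, not the real file):

```r
# Hypothetical stand-in data frame with one POSIXct, one factor, one character column
d <- data.frame(
  when = as.POSIXct("2011-05-13 00:00:00", tz = "UTC"),
  dest = factor("dfa"),
  msg  = "PS",
  stringsAsFactors = FALSE
)

mode(d)            # storage mode: a data frame is a list of columns
class(d)           # the class is what says "data.frame"
is.data.frame(d)   # direct test

# Storage mode per column: factors and POSIXct are numbers underneath
col_modes <- sapply(d, mode)

# First class per column: this reflects the requested conversions
col_classes <- sapply(d, function(x) class(x)[1])

col_modes
col_classes
```

So the "numeric" entries in the sapply(df, mode) output do not mean the colClasses were ignored; sapply(df, class) on the real data would be the thing to look at.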
Dennis
On Wed, May 25, 2011 at 8:42 AM, James Rome <jamesrome at gmail.com> wrote:
> I have a data set where the lines look like:
> 2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
> 2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
> Some lines are missing the field before and after the NON:
> 2011-05-13 00:00:05 EONBHS229 mia13001621NON
>
> I read them into R using
> df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
>     col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
>     colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
>
> The documentation for read.fwf says that the data are read into a
> dataframe. Yet, I get a list, and the conversions I specified do not
> seem to have been obeyed:
>> df[1:20,]
>              DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
> 1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
> 2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
> . . .
>> sapply(df, mode)
>    DateTime      Flight        Dest     ArrTime     MsgType        Conf
>   "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric"
>      Runway      Source
>   "numeric"   "numeric"
>> dfn = df[!is.na(df$Source),]
>> mode(df)
> [1] "list"
>
> What am I doing wrong?
>
> Thanks,
> Jim Rome
>