[R] Still can't find missing data
Farley, Robert
FarleyR at metro.net
Fri May 29 02:27:09 CEST 2009
That seems to work for the toy data. How do I apply this change to my real data, which are read from very large Stata and SPSS files, while keeping the factor definitions? Won't I lose information (and create a larger dataset) by not using the factor levels?
How do I recover the factor values? I read my data file with read.spss (use.value.labels = FALSE) and got this:
               connector
Mode_orig_only            1            9
1                 17.814338     0.000000
3                 49.128982     0.000000
4                525.978899     0.000000
5                913.295370     0.000000
6                114.302764     0.000000
7                298.151438     0.000000
8                 93.088049     0.000000
9                233.794168     0.000000
10                20.764539     0.000000
11               424.120506     0.000000
12                 8.054528     0.000000
13                 6.010790     0.000000
14              1832.748525     0.000000
15             10191.284139     0.000000
16              2099.771923     0.000000
17              1630.148576     0.000000
<NA>               0.000000  9491.013249
which does have the "NA" row, but not the factor labels. If I read the file with use.value.labels = TRUE, I can see what I'm summarizing, but not the NAs. Can't I have both?
The summary above also (of course) omits every level of the summarized variable that has no observations.
The same summary using factors:
                                                             connector
Mode_orig_only                                               OD Passenger    Connector
Walked/Biked                                                    17.814338     0.000000
I flew in from another a place/connected                         0.000000     0.000000
Amtrak                                                          49.128982     0.000000
Bus - Chartered bus or van                                     525.978899     0.000000
Bus - Hotel Courtesy van                                       913.295370     0.000000
Bus - MTA (Metro) or other public transit bus                  114.302764     0.000000
Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438     0.000000
Bus - Union Station Flyaway                                     93.088049     0.000000
Bus - Van Nuys Flyaway                                         233.794168     0.000000
Green line/light rail                                           20.764539     0.000000
Limousine/town car                                             424.120506     0.000000
Metrolink                                                        8.054528     0.000000
Motorcycle                                                       6.010790     0.000000
On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525     0.000000
Car/truck/van - Private                                      10191.284139     0.000000
Car/truck/van - Rental                                        2099.771923     0.000000
Taxi                                                          1630.148576     0.000000
..Refused                                                        0.000000     0.000000
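Here is a minimal sketch of one way to have both. It assumes the variables are read with `use.value.labels = TRUE`, so labelled codes arrive as factors and missing values arrive as `NA`. Base R's `addNA()` then turns `NA` into an explicit factor level, which `xtabs()` tabulates like any other level. The data frame `survey` and its columns `mode`, `connector` and `wt` are hypothetical stand-ins for the real SPSS variables, not names from the actual file.

```r
# Hypothetical stand-in for a data frame read with value labels, e.g.
# foreign::read.spss(file, use.value.labels = TRUE, to.data.frame = TRUE)
survey <- data.frame(
  mode = factor(c("Taxi", "Amtrak", NA, "Taxi", NA),
                levels = c("Walked/Biked", "Amtrak", "Taxi", "..Refused")),
  connector = factor(c("OD Passenger", "OD Passenger", "OD Passenger",
                       "Connector", "OD Passenger")),
  wt = c(1.5, 2, 3, 4, 0.5)
)

# Add an explicit NA level to every factor column that contains NAs;
# the existing value labels (levels) are left unchanged
survey[] <- lapply(survey, function(x) {
  if (is.factor(x)) addNA(x, ifany = TRUE) else x
})

# The NA level is now an ordinary level, so it gets its own row, and unused
# labels such as "..Refused" stay because drop.unused.levels defaults to FALSE
tab <- xtabs(wt ~ mode + connector, data = survey)
print(tab)
```

Recent versions of R also accept `addNA = TRUE` directly in `xtabs()`, which should have the same effect without modifying the data frame.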
Robert Farley
Metro
www.Metro.net
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data
Try reading it in with read.table's argument stringsAsFactors=FALSE.
I think the underlying problem is that exclude= is used only if
the classifying variables are not already factors. I haven't studied
the help file well enough to see whether that is what it is documented
to do, but it seems misleading.
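Here is a minimal sketch of this suggestion, run against the toy data re-typed from the printout further down the thread. The string `toy_csv` is a hypothetical stand-in for the file C:/Data/R/Toy.csv, and the sketch assumes a current R, where `read.table()` accepts a `text` argument. With `stringsAsFactors = FALSE` the classifiers stay character vectors, so `xtabs()` builds the factors itself, and `exclude = NULL` keeps `NA` as a level.

```r
# Toy data re-typed from the printout in this thread; toy_csv is a
# hypothetical stand-in for the file C:/Data/R/Toy.csv
toy_csv <- "ID_Num,Data1,Data2,Data3,Weight
101,Sam,Red,Banana,1
102,Sam,Green,Banana,2
103,Sam,Blue,Orange,2
104,Fred,Red,Orange,2
105,Fred,Green,Guava,2
106,Fred,Blue,Guava,2
107,NA,Red,Pear,50
108,NA,Green,Pear,50
109,NA,Blue,NA,1000"

# stringsAsFactors = FALSE keeps Data1..Data3 as character vectors
ToyData <- read.table(text = toy_csv, header = TRUE, sep = ",",
                      na.strings = "NA", row.names = "ID_Num",
                      stringsAsFactors = FALSE)

# na.pass keeps rows with a missing Data1 in the model frame;
# exclude = NULL lets xtabs() turn NA into a level of the character classifier
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyData,
             exclude = NULL, na.action = na.pass)
print(tab)
sum(tab)  # should equal sum(ToyData$Weight)
```

The NA row then carries the weights of IDs 107 to 109 (50, 50 and 1000), so the table total matches the total of Weight.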
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 4:10 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> In this toy data, each of the tables should sum to 1111 (the total of Weight).
> None of the tables shows NA columns or rows.
>
>
> > ################################
> > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="ID_Num")
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana      1
> 102   Sam Green Banana      2
> 103   Sam  Blue Orange      2
> 104  Fred   Red Orange      2
> 105  Fred Green  Guava      2
> 106  Fred  Blue  Guava      2
> 107  <NA>   Red   Pear     50
> 108  <NA> Green   Pear     50
> 109  <NA>  Blue   <NA>   1000
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data3, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>       Data3
> Data1  Banana Guava Orange Pear
>   Fred      0     4      2    0
>   Sam       3     0      2    0
> >
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
> Sent: Thursday, May 28, 2009 05:46
> To: r-help at r-project.org
> Subject: Re: [R] Still can't find missing data
>
> Farley, Robert wrote:
> >
> > I can't get the syntax that will make xtabs show NA values (as rows).
> >
> > lengthy non-reproducible example removed
> >
>
> If you want a reproducible answer, prepare a reproducible example. And check
> that the syntax is
>
> na.action=na.pass
>
> Dieter
>
> --
> View this message in context:
> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>