[R] Still can't find missing data

Farley, Robert FarleyR at metro.net
Fri May 29 02:27:09 CEST 2009


That seems to work for the toy data.  How do I implement this change with my real data, which are read from very large Stata and SPSS files and keep the factor definitions?  Won't I be losing information (and creating a larger dataset) by not using the factor levels?


How do I recover the factor values?  I read my data file (read.spss with use.value.labels = FALSE) and got this:

              connector
Mode_orig_only            1            9
          1       17.814338     0.000000
          3       49.128982     0.000000
          4      525.978899     0.000000
          5      913.295370     0.000000
          6      114.302764     0.000000
          7      298.151438     0.000000
          8       93.088049     0.000000
          9      233.794168     0.000000
          10      20.764539     0.000000
          11     424.120506     0.000000
          12       8.054528     0.000000
          13       6.010790     0.000000
          14    1832.748525     0.000000
          15   10191.284139     0.000000
          16    2099.771923     0.000000
          17    1630.148576     0.000000
          <NA>     0.000000  9491.013249

which does have the "NA" row, but not the factor labels.  If I read the file with use.value.labels=TRUE I can see what I'm summarizing, but not the NAs.  Can't I have both?
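One possible way to get both, sketched under assumptions rather than tested on the real file: rebuild the labelled factor from the numeric codes, then promote NA to a level of its own with addNA() before tabulating. The vector below is a made-up stand-in for one column as read.spss might return it with use.value.labels = FALSE; the "value.labels" attribute, the codes, the weights and the names mode_codes, mode_f and survey are all hypothetical.

```r
# Hypothetical stand-in for one column read with use.value.labels = FALSE:
# numeric codes plus a "value.labels" attribute (assumed layout, made-up values).
mode_codes <- c(1, 3, 4, NA, 3, NA)
attr(mode_codes, "value.labels") <- c("Walked/Biked" = 1,
                                      "Amtrak" = 3,
                                      "Bus - Chartered bus or van" = 4,
                                      "..Refused" = 99)
weight <- c(10, 20, 30, 40, 50, 60)   # made-up weights

labs <- attr(mode_codes, "value.labels")

# Rebuild the labelled factor; codes that never occur remain as empty levels.
mode_f <- factor(mode_codes, levels = labs, labels = names(labs))

# Make NA an explicit level so the missing rows are not dropped by xtabs().
mode_f <- addNA(mode_f)

survey <- data.frame(Mode = mode_f, Weight = weight)
tab <- xtabs(Weight ~ Mode, data = survey)
print(tab)
```

Because NA is now an ordinary level, the default na.action no longer removes those rows, and the unused "..Refused" level is still listed with a zero total.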

The top summary also omits (of course) any factor level with zero observations in the variable summarized.


The same summary using factors:
                                                             connector
Mode_orig_only                                                 OD Passenger    Connector
  Walked/Biked                                                    17.814338     0.000000
   I flew in from another a place/connected                        0.000000     0.000000
  Amtrak                                                          49.128982     0.000000
  Bus - Chartered bus or van                                     525.978899     0.000000
  Bus - Hotel Courtesy van                                       913.295370     0.000000
  Bus - MTA (Metro) or other public transit bus                  114.302764     0.000000
  Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438     0.000000
  Bus - Union Station Flyaway                                     93.088049     0.000000
  Bus - Van Nuys Flyaway                                         233.794168     0.000000
  Green line/light rail                                           20.764539     0.000000
  Limousine/town car                                             424.120506     0.000000
  Metrolink                                                        8.054528     0.000000
  Motorcycle                                                       6.010790     0.000000
  On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525     0.000000
  Car/truck/van - Private                                      10191.284139     0.000000
  Car/truck/van - Rental                                        2099.771923     0.000000
  Taxi                                                          1630.148576     0.000000
  ..Refused                                                        0.000000     0.000000

Robert Farley
Metro
www.Metro.net


-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

Try reading it in with read.table's argument stringsAsFactors=FALSE.

I think the underlying problem is that exclude= is used only if
the classifying variables are not already factors.  I haven't studied
the help file well enough to see if that is what it is documented
to do, but it seems misleading.
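
A minimal sketch of that suggestion, using the toy data quoted below. The data frame is rebuilt inline with character columns instead of being read from Toy.csv, which is an assumption made only to keep the example self-contained; Data3 is left out for brevity.

```r
# Toy data from this thread, rebuilt inline (assumption: stands in for
# read.table(..., stringsAsFactors = FALSE) on Toy.csv).
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = c("Red", "Green", "Blue", "Red", "Green", "Blue",
             "Red", "Green", "Blue"),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = 101:109,
  stringsAsFactors = FALSE
)

# With character classifiers, exclude = NULL keeps NA as a level, and
# na.action = na.pass stops the NA rows being dropped before tabulation.
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyData,
             exclude = NULL, na.action = na.pass)
print(tab)

# The thread expects each table to total 1111.
sum(tab)
```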

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 4:10 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> In this toy data, each of the tables should sum to 1111
> None of the tables shows NA columns or rows.
>
>
> > ################################
> > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
> sep=",", na.strings="NA", dec=".", row.names="ID_Num")
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana      1
> 102   Sam Green Banana      2
> 103   Sam  Blue Orange      2
> 104  Fred   Red Orange      2
> 105  Fred Green  Guava      2
> 106  Fred  Blue  Guava      2
> 107  <NA>   Red   Pear     50
> 108  <NA> Green   Pear     50
> 109  <NA>  Blue   <NA>   1000
> > xtabs(Weight ~  Data1 + Data2, exclude=NULL,
> na.action=na.pass, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~  Data1 + Data2, exclude=NULL,
> na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~  Data1 + Data3, exclude=NULL,
> na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>       Data3
> Data1  Banana Guava Orange Pear
>   Fred      0     4      2    0
>   Sam       3     0      2    0
> >
>
>
>
>
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
> Sent: Thursday, May 28, 2009 05:46
> To: r-help at r-project.org
> Subject: Re: [R] Still can't find missing data
>
>
>
>
> Farley, Robert wrote:
> >
> > I can't get the syntax that will allow me to show NA values
> (rows) in the
> > xtabs.
> >
> > lengthy non-reproducible example removed
> >
>
> If you want a reproducible answer, prepare a reproducible result.
> And check that the syntax is
>
> na.action=na.pass
>
> Dieter
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



