[R] Still can't find missing data - How do I get NA in xtabs with factors?

Farley, Robert FarleyR at metro.net
Wed Jun 3 02:03:24 CEST 2009


The problem here is that table() doesn't seem to have a way to weight the data.

> ToyData
    Data1 Data2  Data3 Weight
101   Sam   Red Banana    1.1
102   Sam Green Banana    2.1
103   Sam  Blue Orange    2.1
104  Fred   Red Orange    2.1
105  Fred Green  Guava    2.1
106  Fred  Blue  Guava    2.1
107  <NA>   Red   Pear   50.1
108  <NA> Green   Pear   50.1
109  <NA>  Blue   <NA> 1000.2
> with(ToyData,table(Data1, Data3, useNA =  "ifany"))
      Data3
Data1  Banana Guava Orange Pear <NA>
  Fred      0     2      1    0    0
  Sam       2     0      1    0    0
  <NA>      0     0      0    2    1
> xtabs(Weight ~  Data1 + Data3, exclude=NULL, na.action=na.pass, ToyData)
      Data3
Data1  Banana Guava Orange Pear
  Fred    0.0   4.2    2.1  0.0
  Sam     3.2   0.0    2.1  0.0




Robert Farley
Metro
www.Metro.net
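The weighted NA row and column that table(useNA = "ifany") shows for counts can be had from xtabs() by turning NA into an explicit factor level first. A minimal sketch, assuming the toy data above (rebuilt inline) and base R's addNA(); the names ToyDataNA and tab are illustrative only:

```r
# Toy data rebuilt from the message above; stringsAsFactors = TRUE mimics
# the factor columns that read.table() produced by default at the time.
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = c("Red", "Green", "Blue", "Red", "Green", "Blue",
             "Red", "Green", "Blue"),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1.1, 2.1, 2.1, 2.1, 2.1, 2.1, 50.1, 50.1, 1000.2),
  row.names = 101:109,
  stringsAsFactors = TRUE
)

# Make NA an explicit level of each factor that actually contains NAs.
ToyDataNA <- ToyData
ToyDataNA$Data1 <- addNA(ToyDataNA$Data1, ifany = TRUE)
ToyDataNA$Data3 <- addNA(ToyDataNA$Data3, ifany = TRUE)

# na.pass keeps every row in the model frame; the weights are then summed
# per cell, including the new <NA> row and column.
tab <- xtabs(Weight ~ Data1 + Data3, data = ToyDataNA, na.action = na.pass)
print(tab)
```

Recent R releases also give xtabs() an addNA argument, so xtabs(Weight ~ Data1 + Data3, data = ToyData, addNA = TRUE) may do the same in one call; check ?xtabs for your version before relying on it.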


-----Original Message-----
From: 3.14david at gmail.com [mailto:3.14david at gmail.com]
Sent: Sunday, May 31, 2009 14:27
To: Farley, Robert
Subject: Re: Still can't find missing data - How do I get NA in xtabs with factors?

You might want to try 'table' with the exclude option, rather than 'xtabs': with(data, table(a, b, exclude=NULL))

I *think* that the problem is that xtabs excludes NAs before it makes factors from the values

david freedman



Farley, Robert wrote:
>
> Let's see if I understand this.  Do I iterate through
>     x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> for each of the few hundred variables (x) in my data frame?
>
>
> I tried to do this all at once and failed:
>> ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana    1.1
> 102   Sam Green Banana    2.1
> 103   Sam  Blue Orange    2.1
> 104  Fred   Red Orange    2.1
> 105  Fred Green  Guava    2.1
> 106  Fred  Blue  Guava    2.1
> 107  <NA>   Red   Pear   50.1
> 108  <NA> Green   Pear   50.1
> 109  <NA>  Blue   <NA> 1000.2
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL,
>> na.action=na.pass))
> Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action =
> na.pass) :
>   unused argument(s) (exclude = NULL, na.action = function (object, ...)
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
>> ToyData
>  Data1  Data2  Data3 Weight
>   <NA>   <NA>   <NA>   <NA>
> Levels:
>>
> But it didn't work.  Don't I need to do this separately for each variable?
>
>
>
> Is there a way to get read.spss to insert "NA" levels for each variable
> when I create the data frame?  Is this because SPSS (and Stata) allow "NA"
> as an "undeclared level" and R does not?
>
>
> Will this be a problem with read.dta as well?
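factor() works on one vector at a time, so it cannot be applied to the whole data frame as in the attempt above. A minimal sketch of converting every factor column in one pass, assuming base R's addNA(); add_na_levels() is a hypothetical helper, and whether it suits the labelled factors produced by read.spss(..., use.value.labels = TRUE, to.data.frame = TRUE) or read.dta() is an untested assumption:

```r
# Hypothetical helper (not part of base R or foreign): give every factor
# column an explicit NA level, but only where NAs actually occur.
add_na_levels <- function(df) {
  # df[] <- ... replaces the columns while keeping the data.frame structure.
  df[] <- lapply(df, function(x) {
    if (is.factor(x)) addNA(x, ifany = TRUE) else x
  })
  df
}

# Small stand-in for a data frame read from SPSS or Stata.
df <- data.frame(
  mode   = c("Taxi", NA, "Amtrak"),
  weight = c(1.5, 2.0, NA),
  flag   = c("yes", "no", "no"),
  stringsAsFactors = TRUE
)

out <- add_na_levels(df)
str(out)  # numeric columns are left alone; only 'mode' gains an NA level
```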
>
>
>
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Thursday, May 28, 2009 20:39
> To: Farley, Robert
> Subject: RE: [R] Still can't find missing data
>
> In R factors don't save space over character vectors - only
> one copy of any given string is kept in memory in either case.
> Factors do let you order the levels in the way you want and
> that is often important in presentations.
>
> You can add NA to the list of levels of a factor by doing
>     x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> where 'x' represents each factor in your dataset.  After
> doing that is.na(x) will be all FALSE and you may not
> want that for other situations.
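A short illustration of the trade-off described above, using a hypothetical three-element factor: once NA is a level, is.na() no longer flags it, but the missing value can still be found through as.character(). Base R's addNA() is assumed here to produce the same levels and codes as the factor() recipe:

```r
# A factor with one missing value.
x <- factor(c("a", "b", NA))

# Add NA to the levels and stop factor() from excluding it.
y <- factor(x, levels = c(levels(x), NA), exclude = NULL)

levels(y)               # "a" "b" NA
is.na(y)                # FALSE FALSE FALSE: NA is now an ordinary level
is.na(as.character(y))  # FALSE FALSE TRUE: the missing value is recoverable

# Base R's addNA() gives the same levels and integer codes.
z <- addNA(x)
```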
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> Sent: Thursday, May 28, 2009 5:27 PM
>> To: R-help
>> Subject: Re: [R] Still can't find missing data
>>
>> That seems to work for the toy data.  How do I implement this
>> change with my real data, which are read from very large
>> Stata and SPSS files and keep the factor definitions?  Won't
>> I be losing information (and creating a larger dataset) by
>> not using the factor levels?
>>
>>
>> How do I recover the factor values?  I read my data file
>> (read.spss with use.value.labels = FALSE) and got this:
>>
>>               connector
>> Mode_orig_only            1            9
>>           1       17.814338     0.000000
>>           3       49.128982     0.000000
>>           4      525.978899     0.000000
>>           5      913.295370     0.000000
>>           6      114.302764     0.000000
>>           7      298.151438     0.000000
>>           8       93.088049     0.000000
>>           9      233.794168     0.000000
>>           10      20.764539     0.000000
>>           11     424.120506     0.000000
>>           12       8.054528     0.000000
>>           13       6.010790     0.000000
>>           14    1832.748525     0.000000
>>           15   10191.284139     0.000000
>>           16    2099.771923     0.000000
>>           17    1630.148576     0.000000
>>           <NA>     0.000000  9491.013249
>>
>> which does have the "NA" row, but not the factor labels.  If
>> I read the file with use.value.labels=TRUE I can see what I'm
>> summarizing, but not the NAs.  Can't I have both?
>>
>> The top summary will also omit, of course, any factor level with a
>> zero total in the variable summarized.
>>
>>
>> The same summary using factors:
>>                                                                 connector
>> Mode_orig_only                                                  OD Passenger   Connector
>>   Walked/Biked                                                     17.814338    0.000000
>>   I flew in from another a place/connected                          0.000000    0.000000
>>   Amtrak                                                           49.128982    0.000000
>>   Bus - Chartered bus or van                                      525.978899    0.000000
>>   Bus - Hotel Courtesy van                                        913.295370    0.000000
>>   Bus - MTA (Metro) or other public transit bus                   114.302764    0.000000
>>   Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438    0.000000
>>   Bus - Union Station Flyaway                                      93.088049    0.000000
>>   Bus - Van Nuys Flyaway                                          233.794168    0.000000
>>   Green line/light rail                                            20.764539    0.000000
>>   Limousine/town car                                              424.120506    0.000000
>>   Metrolink                                                         8.054528    0.000000
>>   Motorcycle                                                        6.010790    0.000000
>>   On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525    0.000000
>>   Car/truck/van - Private                                       10191.284139    0.000000
>>   Car/truck/van - Rental                                         2099.771923    0.000000
>>   Taxi                                                           1630.148576    0.000000
>>   ..Refused                                                         0.000000    0.000000
>>
>>
>>
>>
>>
>>
>>
>> Robert Farley
>> Metro
>> www.Metro.net
>>
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdunlap at tibco.com]
>> Sent: Thursday, May 28, 2009 16:26
>> To: Farley, Robert
>> Subject: RE: [R] Still can't find missing data
>>
>> Try reading it in with read.table's argument stringsAsFactors=FALSE.
>>
>> I think the underlying problem is that exclude= is used only if
>> the classifying variables are not already factors.  I haven't studied
>> the help file well enough to see if that is what it is documented
>> to do, but it seems misleading.
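A minimal sketch of the stringsAsFactors = FALSE route, assuming the integer-weight toy data quoted below (rebuilt inline with data.frame() rather than read from the CSV file). With character columns, xtabs() builds the factors itself, so exclude = NULL takes effect:

```r
# Integer-weight toy data, with the classifying columns kept as character.
ToyChr <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  stringsAsFactors = FALSE
)

# For non-factor columns xtabs() calls factor(u, exclude = exclude), so
# exclude = NULL keeps NA as a level; na.pass stops model.frame() from
# dropping the rows that contain NA.
tab <- xtabs(Weight ~ Data1 + Data3, data = ToyChr,
             exclude = NULL, na.action = na.pass)
print(tab)
sum(tab)  # 1111, the total the toy data should produce
```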
>>
>> Bill Dunlap
>> TIBCO Software Inc - Spotfire Division
>> wdunlap tibco.com
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> > Sent: Thursday, May 28, 2009 4:10 PM
>> > To: R-help
>> > Subject: Re: [R] Still can't find missing data
>> >
>> > In this toy data, each of the tables should sum to 1111.
>> > None of the tables shows NA columns or rows.
>> >
>> >
>> > > ################################
>> > > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="ID_Num")
>> > > ToyData
>> >     Data1 Data2  Data3 Weight
>> > 101   Sam   Red Banana      1
>> > 102   Sam Green Banana      2
>> > 103   Sam  Blue Orange      2
>> > 104  Fred   Red Orange      2
>> > 105  Fred Green  Guava      2
>> > 106  Fred  Blue  Guava      2
>> > 107  <NA>   Red   Pear     50
>> > 108  <NA> Green   Pear     50
>> > 109  <NA>  Blue   <NA>   1000
>> > > xtabs(Weight ~  Data1 + Data2, exclude=NULL, na.action=na.pass, ToyData)
>> >       Data2
>> > Data1  Blue Green Red
>> >   Fred    2     2   2
>> >   Sam     2     2   1
>> > > xtabs(Weight ~  Data1 + Data2, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>> >       Data2
>> > Data1  Blue Green Red
>> >   Fred    2     2   2
>> >   Sam     2     2   1
>> > > xtabs(Weight ~  Data1 + Data3, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>> >       Data3
>> > Data1  Banana Guava Orange Pear
>> >   Fred      0     4      2    0
>> >   Sam       3     0      2    0
>> > >
>> >
>> >
>> >
>> >
>> >
>> > Robert Farley
>> > Metro
>> > www.Metro.net
>> >
>> >
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
>> > Sent: Thursday, May 28, 2009 05:46
>> > To: r-help at r-project.org
>> > Subject: Re: [R] Still can't find missing data
>> >
>> >
>> >
>> >
>> > Farley, Robert wrote:
>> > >
>> > > I can't get the syntax that will allow me to show NA values (rows)
>> > > in the xtabs.
>> > >
>> > > lengthy non-reproducible example removed
>> > >
>> >
>> > If you want a reproducible answer, prepare a reproducible result.
>> > And check that the syntax is
>> >
>> > na.action=na.pass
>> >
>> > Dieter
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
>> > Sent from the R help mailing list archive at Nabble.com.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>
>
Quoted from:
http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23784989.html



