[R] Still can't find missing data - How do I get NA in xtabs with factors?

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat May 30 15:41:12 CEST 2009



Farley, Robert wrote:
> Let's see if I understand this.  Do I iterate through
>     x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> for each of the few hundred variables (x) in my data frame?


Yes, for all variables that are factors.
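A minimal sketch of that iteration, assuming a data frame like the ToyData quoted in this thread. The data frame construction and the helper name add_na_level are illustrative only and do not come from the original messages:

```r
# Illustrative copy of the ToyData frame quoted in this thread.
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
                    "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = 101:109
)

# Hypothetical helper: re-create a factor with NA as an explicit level.
add_na_level <- function(x) factor(x, levels = c(levels(x), NA), exclude = NULL)

# Apply it to the factor columns only; Weight stays numeric.
is_fac <- vapply(ToyData, is.factor, logical(1))
ToyData[is_fac] <- lapply(ToyData[is_fac], add_na_level)

# No exclude= argument here, so xtabs leaves the factor levels alone.
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyData)
tab
```

Calling factor() on the whole data frame, as in the attempt quoted below, treats the data frame as a single vector, which is why the conversion has to go column by column.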

Best,
Uwe Ligges


> 
> I tried to do this all at once and failed:
>> ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana    1.1
> 102   Sam Green Banana    2.1
> 103   Sam  Blue Orange    2.1
> 104  Fred   Red Orange    2.1
> 105  Fred Green  Guava    2.1
> 106  Fred  Blue  Guava    2.1
> 107  <NA>   Red   Pear   50.1
> 108  <NA> Green   Pear   50.1
> 109  <NA>  Blue   <NA> 1000.2
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
> Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
>   unused argument(s) (exclude = NULL, na.action = function (object, ...)
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
>> ToyData
>  Data1  Data2  Data3 Weight
>   <NA>   <NA>   <NA>   <NA>
> Levels:
> But it didn't work.  Don't I need to do this separately for each variable?
> 
> 
> 
> Is there a way to get read.spss to insert "NA" levels for each variable when I create the data frame?  Is this because SPSS (and STATA) allow "NA" as an "undeclared level" and R does not?
> 
> 
> Will this be a problem with read.dta as well?
> 
> 
> 
> 
> Robert Farley
> Metro
> www.Metro.net
> 
> 
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Thursday, May 28, 2009 20:39
> To: Farley, Robert
> Subject: RE: [R] Still can't find missing data
> 
> In R, factors don't save space over character vectors - only
> one copy of any given string is kept in memory in either case.
> Factors do let you order the levels in the way you want and
> that is often important in presentations.
> 
> You can add NA to the list of levels of a factor by doing
>     x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> where 'x' represents each factor in your dataset.  After
> doing that is.na(x) will be all FALSE and you may not
> want that for other situations.
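The side effect on is.na() can be seen on a small vector. This is only a sketch; the vector x and the as.character() workaround are assumptions added for illustration, not part of the original advice:

```r
x <- factor(c("Sam", "Fred", NA, "Sam"))
is.na(x)                  # FALSE FALSE  TRUE FALSE

# Re-create the factor with NA as an explicit level.
x2 <- factor(x, levels = c(levels(x), NA), exclude = NULL)
levels(x2)                # "Fred" "Sam"  NA
is.na(x2)                 # all FALSE: every element now has a valid level code

# One way to still detect the missing entries is to test the labels.
is.na(as.character(x2))   # FALSE FALSE  TRUE FALSE
```

Base R also provides addNA(x), which performs the same conversion.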
> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> Sent: Thursday, May 28, 2009 5:27 PM
>> To: R-help
>> Subject: Re: [R] Still can't find missing data
>>
>> That seems to work for the toy data.  How do I implement this
>> change with my real data, which are read from very large
>> Stata and SPSS files and keep the factor definitions?  Won't
>> I be losing information (and creating a larger dataset) by
>> not using the factor levels?
>>
>>
>> How do I recover the factor values?  I read my datafile
>> (read.spss with use.value.labels = FALSE) and got this:
>>
>>               connector
>> Mode_orig_only            1            9
>>           1       17.814338     0.000000
>>           3       49.128982     0.000000
>>           4      525.978899     0.000000
>>           5      913.295370     0.000000
>>           6      114.302764     0.000000
>>           7      298.151438     0.000000
>>           8       93.088049     0.000000
>>           9      233.794168     0.000000
>>           10      20.764539     0.000000
>>           11     424.120506     0.000000
>>           12       8.054528     0.000000
>>           13       6.010790     0.000000
>>           14    1832.748525     0.000000
>>           15   10191.284139     0.000000
>>           16    2099.771923     0.000000
>>           17    1630.148576     0.000000
>>           <NA>     0.000000  9491.013249
>>
>> which does have the "NA" row, but not the factor labels.  If
>> I read the file with use.value.labels=TRUE I can see what I'm
>> summarizing, but not the NAs.  Can't I have both?
>>
>> The top summary will also omit all 0 value factors (of
>> course) in the variable summarized.
>>
>>
>> The same summary using factors:
>>                                                                     connector
>> Mode_orig_only                                                  OD Passenger    Connector
>>   Walked/Biked                                                     17.814338     0.000000
>>   I flew in from another a place/connected                          0.000000     0.000000
>>   Amtrak                                                           49.128982     0.000000
>>   Bus - Chartered bus or van                                      525.978899     0.000000
>>   Bus - Hotel Courtesy van                                        913.295370     0.000000
>>   Bus - MTA (Metro) or other public transit bus                   114.302764     0.000000
>>   Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438     0.000000
>>   Bus - Union Station Flyaway                                      93.088049     0.000000
>>   Bus - Van Nuys Flyaway                                          233.794168     0.000000
>>   Green line/light rail                                            20.764539     0.000000
>>   Limousine/town car                                              424.120506     0.000000
>>   Metrolink                                                         8.054528     0.000000
>>   Motorcycle                                                        6.010790     0.000000
>>   On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525     0.000000
>>   Car/truck/van - Private                                       10191.284139     0.000000
>>   Car/truck/van - Rental                                         2099.771923     0.000000
>>   Taxi                                                           1630.148576     0.000000
>>   ..Refused                                                         0.000000     0.000000
>>
>>
>>
>>
>>
>>
>>
>> Robert Farley
>> Metro
>> www.Metro.net
>>
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdunlap at tibco.com]
>> Sent: Thursday, May 28, 2009 16:26
>> To: Farley, Robert
>> Subject: RE: [R] Still can't find missing data
>>
>> Try reading it in with read.table's argument stringsAsFactors=FALSE.
>>
>> I think the underlying problem is that exclude= is used only if
>> the classifying variables are not already factors.  I haven't studied
>> the help file well enough to see if that is what it is documented
>> to do, but it seems misleading.
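A sketch of the character-column route suggested above. The inline CSV text is an assumption standing in for C:/Data/R/Toy.csv, reconstructed from the ToyData printout in this thread, and the name ToyChar is illustrative:

```r
# Inline stand-in for Toy.csv (contents assumed from the printed ToyData).
csv <- "ID_Num,Data1,Data2,Data3,Weight
101,Sam,Red,Banana,1
102,Sam,Green,Banana,2
103,Sam,Blue,Orange,2
104,Fred,Red,Orange,2
105,Fred,Green,Guava,2
106,Fred,Blue,Guava,2
107,NA,Red,Pear,50
108,NA,Green,Pear,50
109,NA,Blue,NA,1000"

# Keep the classifying variables as character vectors, not factors.
ToyChar <- read.table(text = csv, header = TRUE, sep = ",", na.strings = "NA",
                      row.names = "ID_Num", stringsAsFactors = FALSE)

# With character columns, exclude = NULL is applied when xtabs builds the
# factors, and na.action = na.pass keeps the rows that contain NA.
tab <- xtabs(Weight ~ Data1 + Data2, data = ToyChar,
             exclude = NULL, na.action = na.pass)
tab
sum(tab)
```

If the NA rows are kept, the table total should match the sum of the Weight column.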
>>
>> Bill Dunlap
>> TIBCO Software Inc - Spotfire Division
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>>> Sent: Thursday, May 28, 2009 4:10 PM
>>> To: R-help
>>> Subject: Re: [R] Still can't find missing data
>>>
>>> In this toy data, each of the tables should sum to 1111, but
>>> none of the tables shows NA columns or rows.
>>>
>>>
>>>> ################################
>>>> ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="ID_Num")
>>>> ToyData
>>>     Data1 Data2  Data3 Weight
>>> 101   Sam   Red Banana      1
>>> 102   Sam Green Banana      2
>>> 103   Sam  Blue Orange      2
>>> 104  Fred   Red Orange      2
>>> 105  Fred Green  Guava      2
>>> 106  Fred  Blue  Guava      2
>>> 107  <NA>   Red   Pear     50
>>> 108  <NA> Green   Pear     50
>>> 109  <NA>  Blue   <NA>   1000
>>>> xtabs(Weight ~  Data1 + Data2, exclude=NULL, na.action=na.pass, ToyData)
>>>       Data2
>>> Data1  Blue Green Red
>>>   Fred    2     2   2
>>>   Sam     2     2   1
>>>> xtabs(Weight ~  Data1 + Data2, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>>>       Data2
>>> Data1  Blue Green Red
>>>   Fred    2     2   2
>>>   Sam     2     2   1
>>>> xtabs(Weight ~  Data1 + Data3, exclude=NULL, na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>>>       Data3
>>> Data1  Banana Guava Orange Pear
>>>   Fred      0     4      2    0
>>>   Sam       3     0      2    0
>>>
>>>
>>>
>>>
>>> Robert Farley
>>> Metro
>>> www.Metro.net
>>>
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
>>> Sent: Thursday, May 28, 2009 05:46
>>> To: r-help at r-project.org
>>> Subject: Re: [R] Still can't find missing data
>>>
>>>
>>>
>>>
>>> Farley, Robert wrote:
>>>> I can't get the syntax that will allow me to show NA values (rows) in the
>>>> xtabs.
>>>>
>>>> lengthy non-reproducible example removed
>>>>
>>> If you want a reproducible answer, prepare a reproducible result. And check
>>> that the syntax is
>>>
>>> na.action=na.pass
>>>
>>> Dieter
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.



