[R] Still can't find missing data - How do I get NA in xtabs with factors?
Uwe Ligges
ligges at statistik.tu-dortmund.de
Sat May 30 15:41:12 CEST 2009
Farley, Robert wrote:
> Let's see if I understand this. Do I iterate through
> x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> for each of the few hundred variables (x) in my data frame?
Yes, for every variable that is a factor.
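
A minimal sketch of that loop, assuming a small made-up data frame in place of the real one. The names dat and add_na_level are hypothetical and only serve the illustration:

```r
# Hypothetical stand-in for a large data frame read from SPSS or Stata.
dat <- data.frame(
  who    = factor(c("Sam", "Fred", NA, "Sam")),
  colour = factor(c("Red", NA, "Blue", "Red")),
  weight = c(1.5, 2, 50, 1000)   # numeric columns are left untouched
)

# Rebuild one factor so that NA becomes an explicit level.
add_na_level <- function(x) {
  factor(x, levels = c(levels(x), NA), exclude = NULL)
}

# Find the factor columns and convert only those.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], add_na_level)

str(dat)
```

Calling factor() on the whole data frame does not do this; the conversion has to be applied column by column, which is what lapply() does here.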
Best,
Uwe Ligges
>
> I tried to do this all at once and failed:
>> ToyData
> Data1 Data2 Data3 Weight
> 101 Sam Red Banana 1.1
> 102 Sam Green Banana 2.1
> 103 Sam Blue Orange 2.1
> 104 Fred Red Orange 2.1
> 105 Fred Green Guava 2.1
> 106 Fred Blue Guava 2.1
> 107 <NA> Red Pear 50.1
> 108 <NA> Green Pear 50.1
> 109 <NA> Blue <NA> 1000.2
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
> Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
> unused argument(s) (exclude = NULL, na.action = function (object, ...)
>> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
>> ToyData
> Data1 Data2 Data3 Weight
> <NA> <NA> <NA> <NA>
> Levels:
> But it didn't work. Don't I need to do this separately for each variable?
>
>
>
> Is there a way to get read.spss to insert "NA" levels for each variable when I create the data frame? Is this because SPSS (and STATA) allow "NA" as an "undeclared level" and R does not?
>
>
> Will this be a problem with read.dta as well?
>
>
>
>
> Robert Farley
> Metro
> www.Metro.net
>
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Thursday, May 28, 2009 20:39
> To: Farley, Robert
> Subject: RE: [R] Still can't find missing data
>
> In R factors don't save space over character vectors - only
> one copy of any given string is kept in memory in either case.
> Factors do let you order the levels in the way you want and
> that is often important in presentations.
>
> You can add NA to the list of levels of a factor by doing
> x <- factor(x, levels=c(levels(x), NA), exclude=NULL)
> where 'x' represents each factor in your dataset. After
> doing that is.na(x) will be all FALSE and you may not
> want that for other situations.
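
The is.na() caveat can be seen on a single small factor. This is only a sketch with made-up values; x_na and was_missing are names chosen for the illustration:

```r
x <- factor(c("Sam", "Fred", NA, "Sam"))

# Same factor, but with NA promoted to an explicit level.
x_na <- factor(x, levels = c(levels(x), NA), exclude = NULL)

levels(x_na)   # the level set now contains NA
is.na(x)       # the third element is flagged as missing
is.na(x_na)    # nothing is flagged any more: NA is an ordinary level here
table(x_na)    # the NA level gets its own cell

# The originally missing entries can still be found through the level labels.
was_missing <- is.na(levels(x_na)[x_na])
```

If other code relies on is.na(x), it may be safer to keep the original factor and use the converted copy only for tabulation.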
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> Sent: Thursday, May 28, 2009 5:27 PM
>> To: R-help
>> Subject: Re: [R] Still can't find missing data
>>
>> That seems to work for the toy data. How do I implement this
>> change with my real data, which are read from very large
>> Stata and SPSS files and keep the factor definitions? Won't
>> I be losing information (and creating a larger dataset) by
>> not using the factor levels?
>>
>>
>> How do I recover the factor values? I read my datafile
>> (read.spss using use.value.labels = FALSE,) and got this:
>>
>> connector
>> Mode_orig_only 1 9
>> 1 17.814338 0.000000
>> 3 49.128982 0.000000
>> 4 525.978899 0.000000
>> 5 913.295370 0.000000
>> 6 114.302764 0.000000
>> 7 298.151438 0.000000
>> 8 93.088049 0.000000
>> 9 233.794168 0.000000
>> 10 20.764539 0.000000
>> 11 424.120506 0.000000
>> 12 8.054528 0.000000
>> 13 6.010790 0.000000
>> 14 1832.748525 0.000000
>> 15 10191.284139 0.000000
>> 16 2099.771923 0.000000
>> 17 1630.148576 0.000000
>> <NA> 0.000000 9491.013249
>>
>> which does have the "NA" row, but not the factor labels. If
>> I read the file with use.value.labels=TRUE I can see what I'm
>> summarizing, but not the NAs. Can't I have both?
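
One way to get both, sketched under the assumption that the file is read with use.value.labels = FALSE and that the code-to-label mapping is available as a named vector. The codes and the two labels below are made up for illustration. If I remember correctly, read.spss keeps the real mapping in a "value.labels" attribute on each variable, but that should be checked on the actual data:

```r
# Hypothetical numeric codes as they might come back from read.spss().
codes <- c(1, 3, 3, NA, 1, NA)

# Hypothetical mapping from label to code.
value_labels <- c("Walked/Biked" = 1, "Amtrak" = 3)

# Attach the labels, then turn the remaining NAs into a level of their own.
mode <- factor(codes,
               levels = unname(value_labels),
               labels = names(value_labels))
mode <- addNA(mode)

tab <- table(mode)
print(tab)
```

The result is a labelled factor whose tables also carry an NA cell.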
>>
>> The top summary will also omit all 0 value factors (of
>> course) in the variable summarized.
>>
>>
>> The same summary using factors:
>>                                                                connector
>> Mode_orig_only                                                OD Passenger  Connector
>> Walked/Biked                                                     17.814338  0.000000
>> I flew in from another a place/connected                          0.000000  0.000000
>> Amtrak                                                           49.128982  0.000000
>> Bus - Chartered bus or van                                      525.978899  0.000000
>> Bus - Hotel Courtesy van                                        913.295370  0.000000
>> Bus - MTA (Metro) or other public transit bus                   114.302764  0.000000
>> Bus - Scheduled airport bus or van (e.g. Airport bus or Disn    298.151438  0.000000
>> Bus - Union Station Flyaway                                      93.088049  0.000000
>> Bus - Van Nuys Flyaway                                          233.794168  0.000000
>> Green line/light rail                                            20.764539  0.000000
>> Limousine/town car                                              424.120506  0.000000
>> Metrolink                                                         8.054528  0.000000
>> Motorcycle                                                        6.010790  0.000000
>> On-call shuttle/van (e.g. Super Shuttle, Prime Time)           1832.748525  0.000000
>> Car/truck/van - Private                                       10191.284139  0.000000
>> Car/truck/van - Rental                                         2099.771923  0.000000
>> Taxi                                                           1630.148576  0.000000
>> ..Refused                                                         0.000000  0.000000
>>
>>
>>
>>
>>
>>
>>
>> Robert Farley
>> Metro
>> www.Metro.net
>>
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdunlap at tibco.com]
>> Sent: Thursday, May 28, 2009 16:26
>> To: Farley, Robert
>> Subject: RE: [R] Still can't find missing data
>>
>> Try reading it in with read.table's argument stringsAsFactors=FALSE.
>>
>> I think the underlying problem is that exclude= is used only if
>> the classifying variables are not already factors. I haven't studied
>> the help file well enough to see if that is what it is documented
>> to do, but it seems misleading.
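
For columns that are already factors, later versions of R offer another route. As far as I know xtabs() has had an addNA argument since R 3.4.0, so the sketch below assumes at least that version. The data frame rebuilds the toy data from this thread by hand rather than reading Toy.csv:

```r
# Toy data from the thread, typed in directly (Weight sums to 1111).
ToyData <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange",
                    "Guava", "Guava", "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = 101:109
)

# addNA = TRUE gives NA its own row or column wherever a factor has missing values.
tab12 <- xtabs(Weight ~ Data1 + Data2, data = ToyData, addNA = TRUE)
tab13 <- xtabs(Weight ~ Data1 + Data3, data = ToyData, addNA = TRUE)

print(tab12)
print(tab13)
```

With the missing values kept, both tables should account for the full weight of 1111.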
>>
>> Bill Dunlap
>> TIBCO Software Inc - Spotfire Division
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>>> Sent: Thursday, May 28, 2009 4:10 PM
>>> To: R-help
>>> Subject: Re: [R] Still can't find missing data
>>>
>>> In this toy data, each of the tables should sum to 1111
>>> None of the tables shows NA columns or rows.
>>>
>>>
>>>> ################################
>>>> ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
>>> sep=",", na.strings="NA", dec=".", row.names="ID_Num")
>>>> ToyData
>>> Data1 Data2 Data3 Weight
>>> 101 Sam Red Banana 1
>>> 102 Sam Green Banana 2
>>> 103 Sam Blue Orange 2
>>> 104 Fred Red Orange 2
>>> 105 Fred Green Guava 2
>>> 106 Fred Blue Guava 2
>>> 107 <NA> Red Pear 50
>>> 108 <NA> Green Pear 50
>>> 109 <NA> Blue <NA> 1000
>>>> xtabs(Weight ~ Data1 + Data2, exclude=NULL,
>>> na.action=na.pass, ToyData)
>>> Data2
>>> Data1 Blue Green Red
>>> Fred 2 2 2
>>> Sam 2 2 1
>>>> xtabs(Weight ~ Data1 + Data2, exclude=NULL,
>>> na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>>> Data2
>>> Data1 Blue Green Red
>>> Fred 2 2 2
>>> Sam 2 2 1
>>>> xtabs(Weight ~ Data1 + Data3, exclude=NULL,
>>> na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>>> Data3
>>> Data1 Banana Guava Orange Pear
>>> Fred 0 4 2 0
>>> Sam 3 0 2 0
>>>
>>>
>>>
>>>
>>> Robert Farley
>>> Metro
>>> www.Metro.net
>>>
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
>>> Sent: Thursday, May 28, 2009 05:46
>>> To: r-help at r-project.org
>>> Subject: Re: [R] Still can't find missing data
>>>
>>>
>>>
>>>
>>> Farley, Robert wrote:
>>>> I can't get the syntax that will allow me to show NA values (rows) in the
>>>> xtabs.
>>>>
>>>> lengthy non-reproducible example removed
>>>>
>>> If you want a reproducible answer, prepare a reproducible result. And check
>>> that the syntax is
>>>
>>> na.action=na.pass
>>>
>>> Dieter
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>