[R] svyglm error message

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Wed Feb 12 03:37:03 CET 2014


This is probably a case of incorrect import format conversion. You may be looking at the visual representation of the data within R rather than the structure of the data within R. Use the str function to find out more about your data as R understands it. If you actually have empty strings mixed in with data, then your numeric columns are probably factors now rather than the numbers you think they are. (Read about factors in the "An Introduction to R" document that comes with the software.) Most input functions have parameters that prevent conversion to factor and that treat certain strings as NA instead of as character data. But you don't give the import code in your question, nor a dput of the first few rows of data in memory, so it is hard to say how you should correct it.
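
As a rough illustration of the point about str and import parameters, here is a minimal sketch. The CSV text, the ID and GPA columns, and the object names raw and clean are made up for illustration, since the real import code and data were not posted; only HSCRDANY and read.csv come from the thread. It assumes blank fields really are empty strings in the file.

```r
# Hypothetical CSV text standing in for the real (restricted) file;
# blank fields mark missing data, as described in the thread.
csv_text <- "ID,HSCRDANY,GPA
1,yes,3.2
2,,
3,no,2.8
4,yes,"

# Import with factor conversion switched on (the default in R at the
# time of this thread): the blank string is kept as an ordinary ""
# level of the factor rather than being treated as missing.
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(raw)
levels(raw$HSCRDANY)

# Import again, treating blank strings as NA and keeping text as character.
clean <- read.csv(text = csv_text, na.strings = c("", "NA"),
                  stringsAsFactors = FALSE)
str(clean)
table(clean$HSCRDANY, useNA = "ifany")
```

Comparing the two str outputs shows whether a column holds a "" level or a true NA; whether this is what happened to the original data cannot be known without the actual import code.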

Your inclusion of code snippets is a good start toward communicating what you are dealing with, but referring to objects that we don't have access to means that information is hidden from us... not "reproducible". [1] This leads to much guessing that is rather unsatisfying for everyone.
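
For example, something along these lines would let readers rebuild a few rows exactly as R stores them. This is only a sketch: the data frame below is a made-up stand-in (the values, and the choice of three columns, are assumptions), since the real data are restricted.

```r
# Hypothetical stand-in for the poster's data; the real data are restricted.
dataset <- data.frame(
  DISTEDUC = c(1, 0, 1, 0),
  HSCRDANY = c("yes", "", "no", "yes"),
  WTA000   = c(120.5, 98.2, 143.0, 110.7),
  stringsAsFactors = TRUE
)

# dput() prints R code that rebuilds the object, including column
# types and factor levels, so it can be pasted into a plain-text email.
sample_rows <- head(dataset, 3)
dput(sample_rows)

# Round trip: capture the dput() text and rebuild the object from it,
# which is what a reader of the email would do by pasting it into R.
rebuilt <- eval(parse(text = capture.output(dput(sample_rows))))
all.equal(rebuilt, sample_rows)
```

With restricted data, a small fabricated data frame that triggers the same error serves the same purpose as a dput of the real rows.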

Your use of HTML formatted email will lead to corruption of your sample code on this mailing list... you should (re)read the Posting Guide mentioned in the footer of this email.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On February 11, 2014 2:19:48 PM PST, Claire Wladis <cwladis at gmail.com> wrote:
>Yes, when I say that the cells are blank in the data frames I do mean that
>the contents of the cells are blank characters "".
>I have put in a lot of time trying to understand R, but I have no formal
>programming background, so I do not necessarily always know the correct
>terminology for something, and this can be hard to look up in reverse (i.e.
>if someone uses a term I don't know, I can look it up, but I find it hard
>to know how to figure out what something is called).  Thank you for helping
>me to understand how to describe this particular concept using the correct
>terminology.
>
>
>On Tue, Feb 11, 2014 at 5:11 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>
>> Disclaimer:
>>
>> I have not followed this thread and claim no statistical expertise. I
>> just wanted to point out a couple of misconceptions that may be
>> relevant. Inline below.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> H. Gilbert Welch
>>
>>
>>
>>
>> On Tue, Feb 11, 2014 at 1:56 PM, Claire Wladis <cwladis at gmail.com> wrote:
>> > Thanks for your reply, Thomas.
>> >
>> > Yes, this is NCES data.
>> >
>> > There are no negative or missing weights.
>> >
>> > I am not a programmer and so I'm afraid I don't understand what you mean by
>> > not being able to have blank cells in a data.frame object
>>
>> (In my opinion) This claim does not absolve you of the responsibility
>> of learning how to properly use R. If you do not wish to put in the
>> requisite effort, then you should not use R. Find something else.
>>
>> >  -  What I mean specifically is that in the csv file which I imported into R
>> > to create the dataset (using read.csv) there were blank cells for any missing
>> > data.  This has never given me problems with R in the past using glm or
>> > related functions.
>> >
>> > Traceback gives the following:
>> > 3: glm.fit(XX, YY, weights = wi/sum(wi), start = beta0, offset = offs,
>> >        family = fam, control = contrl, intercept = incpt)
>> > 2: svyglm.svyrep.design(model, design = surveydatastructure,
>> >        family = quasibinomial())
>> > 1: svyglm(model, design = surveydatastructure, family = quasibinomial())
>> >
>> > As for the model: I need to run this code on a number of different models.
>> > By playing around with this a lot, I have found that I get the error if I
>> > include one particular variable (HSCRDANY) in the model.  I have checked
>> > all the values for that variable, and there are only three: "yes", "no" and
>> > empty cells for missing data (or however one correctly phrases that for a
>> > dataframe in R).
>>
>> There is no such thing as "empty cells" -- R is **not** Excel (thank
>> heaven!). **Blank** values are **not** missing in character vectors:
>> they are blank characters, "" (if that is, in fact, what your data
>> input did -- I'm never sure with .csv files). "An Introduction to R"
>> or, if you prefer,  various good R web tutorials explain this. If you
>> do not care to put in the effort to learn about it, as I said above,
>> you probably shouldn't be using R.
>>
>>
>>
>> > Another variable, HSGPA, which has empty cells for all the same
>> > individuals, and which is also a categorical variable, does not
>> > have this problem.
>> >
>> > So, for example, this model works fine:
>> >
>> > DISTEDUC~1+RACE+GENDER+RISKINDX+GPA+REMEVER+HSGPA+PELLAMT+FEDBEND+CAGI+PAREDUC+PRIMLANG+CITIZEN2
>> >
>> >
>> > But this model returns the error message listed above:
>> > DISTEDUC~1+RACE+GENDER+RISKINDX+GPA+REMEVER+HSGPA+HSCRDANY+PELLAMT+FEDBEND+CAGI+PAREDUC+PRIMLANG+CITIZEN2
>> >
>> > I don't understand what it is about the specific variable HSCRDANY which
>> > would prompt this error message?  I'm not sure what else to look for to try
>> > to figure out what the issue with this particular variable may be?
>> >
>> > Thanks again for your time!
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Feb 11, 2014 at 1:05 PM, Thomas Lumley <tlumley at uw.edu> wrote:
>> >
>> >> This is some sort of NCES data, right?
>> >>
>> >> I can't see any way to get that particular error (which happens inside
>> >> glm.fit()) for a logistic model.
>> >>   Are there any negative or missing weights?
>> >>   What do you mean 'represented by blank cells' -- you can't have blank
>> >> cells in a data.frame object?
>> >>   What does traceback() give after the error?
>> >>   What is the model?
>> >>
>> >>    -thomas
>> >>
>> >>
>> >>
>> >> On Mon, Feb 10, 2014 at 4:10 PM, Claire Wladis <cwladis at gmail.com> wrote:
>> >>
>> >>> Hello,
>> >>> I am using the survey package for the first time to analyze a dataset that
>> >>> has both weights and 200 BRR replication weights.  When I try to run svyglm
>> >>> on the output from svrepdesign, I get an error message that I do not know
>> >>> how to interpret, and an extended period of time searching for this error
>> >>> on the web hasn't returned any results that seem relevant to my situation.
>> >>> I have no idea how to proceed with my analysis at this point, so I am
>> >>> hoping that someone with more experience with this package and with R in
>> >>> general would be willing to help me figure out what the problem is.
>> >>>
>> >>> Here is my code:
>> >>> surveydatastructure <- svrepdesign(repweights=dataset[, 29:228],
>> >>>     data=dataset, weights=dataset$WTA000)
>> >>>
>> >>> modeloutput <- svyglm(model, design=surveydatastructure,
>> >>>     family=quasibinomial())
>> >>>
>> >>> The model is defined in an earlier line of code, but for the sake of
>> >>> readability here, I have not included it.  The dataset has a binary
>> >>> dependent variable and a combination of categorical and continuous
>> >>> variables as independent variables.  There is missing data in the dataset,
>> >>> represented by blank "cells" in the data frame.  The data itself is
>> >>> restricted but I can describe any part of it as necessary.
>> >>>
>> >>>
>> >>> Here is the error message that R returns when I enter the svyglm function
>> >>> line of code:
>> >>>
>> >>> Error in if (!(validmu(mu) && valideta(eta))) stop("cannot find valid starting values: please specify some",  :
>> >>>   missing value where TRUE/FALSE needed
>> >>>
>> >>> Thanks for reading my post, and thanks in advance for any help!
>> >>> Sincerely,
>> >>> Claire
>> >>>
>> >>>         [[alternative HTML version deleted]]
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Thomas Lumley
>> >> Professor of Biostatistics
>> >> University of Auckland
>> >>
>> >
>>
>
