[R] ecdf

gj gawesh at gmail.com
Mon Oct 17 11:48:33 CEST 2011


Hi Sarah,
Thanks for your very lucid explanations.
Thanks also to David and Dennis.

I get it completely now. I have some nice ggplot plots of a couple of
ECDFs in my paper :-)
Now on to do some matrix plots of correlation matrices and some lm().
I'm like a child in a candy shop. :-)

I'm learning something about R every day.

Regards
Gawesh

On Mon, Oct 17, 2011 at 2:11 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi,
>
> On Sun, Oct 16, 2011 at 8:48 PM, gj <gawesh at gmail.com> wrote:
>> David is right. I am looking for the ecdf for fs$numstudents. The
>> other column is just an id.
>>
>> I guess I don't know how to read the R documentation when it comes to functions.
>>
>> Looking at the documentation, I now notice that it says "Compute an
>> empirical cumulative distribution function" and not a vector.
>>
>> But still I would have assumed that in ecdf(x) ... the x is the argument.
>
> ecdf() is the function you're calling.
> x is your vector, for which you want the ECDF.
>
> num.ecdf <- ecdf(fs$numstudents)
>
> There. That's the ECDF.
>
> But the ECDF is a *function* - that's what the F stands for, after all.
>
> If you're looking for the percentiles for your data, you might try:
>
> num.ecdf(fs$numstudents)
>
> You might also try working the examples given in ?ecdf yourself, so
> that you can see exactly what's going on before you try it with your
> own data.
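
A minimal sketch of the two-step use described above. It assumes only
the ten sample rows from the original post, typed in by hand as a data
frame named fs; the real CSV has more rows, so its numbers will differ.

```r
# Sample rows from the original post, typed in by hand (assumption:
# this stands in for the CSV, which is not available here).
fs <- data.frame(
  courseid    = c(101, 141, 246, 263, 321, 361, 364, 365, 366, 367),
  numstudents = c(209, 13, 140, 8, 10, 10, 28, 25, 23, 34)
)

# Step 1: ecdf() returns a function, not a vector of numbers.
num.ecdf <- ecdf(fs$numstudents)
is.function(num.ecdf)      # TRUE
class(num.ecdf)            # "ecdf" "stepfun" "function"

# Step 2: call that function on the values of interest.
num.ecdf(10)               # proportion of courses with <= 10 students: 0.3
num.ecdf(fs$numstudents)   # cumulative proportion for every row
```

The value 0.3 follows from three of the ten sample courses (8, 10, 10)
having at most 10 students.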
>
>
>> So ecdf(fs$numstudents)(unique(fs$numstudents))
>>    =================== ========================
>>         function              arguments
>>
>> Yes? But I can't read that from the documentation? I suspect it has
>> something to do with those dots .... in the arguments, which I don't
>> understand.
>
> Yes.
>
> That's the condensed version of what I just proposed, done in
> one step, instead of two. The two-step version is definitely in
> the help. It doesn't have anything to do with the ..., which simply allow
> for other arguments to be passed.
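
As a hedged illustration that the one-step and two-step forms give the
same result, here is a small sketch. The vector x is just the
numstudents column from the sample rows in the original post, typed in
by hand, and the names Fn, one.step and two.step are made up for the
example.

```r
# numstudents from the ten sample rows (assumption: stands in for fs$numstudents)
x <- c(209, 13, 140, 8, 10, 10, 28, 25, 23, 34)

# Two steps: build the function, then call it.
Fn <- ecdf(x)
two.step <- Fn(unique(x))

# One step: call the function returned by ecdf(x) immediately.
one.step <- ecdf(x)(unique(x))

identical(one.step, two.step)   # TRUE

# Pair each distinct value with its cumulative proportion, in sorted order.
vals <- sort(unique(x))
data.frame(numstudents = vals, cum.prop = Fn(vals))
```

Keeping the function in a variable such as Fn is convenient when it is
needed more than once, for example for both a table and a plot.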
>
>> Why does it say the usage is ecdf(x) when that's clearly not the case?
>>
>> I don't get it.
>
> Clearly that is the case. ecdf(x) returns the empirical cumulative
> distribution *function* of the vector of data x.
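
A short sketch of what the returned object looks like and how it can be
inspected, again assuming only the ten sample values from the original
post (x and Fn are illustrative names). Printing the object gives the
"Call: ..." summary seen in the original question rather than a vector
of proportions.

```r
# numstudents from the ten sample rows (assumption: not the full CSV)
x <- c(209, 13, 140, 8, 10, 10, 28, 25, 23, 34)
Fn <- ecdf(x)

print(Fn)        # shows the call and a summary of the jump points
knots(Fn)        # the distinct data values where the step function jumps
Fn(knots(Fn))    # cumulative proportion at each of those values

# Base-graphics plot of the step function.
plot(Fn, main = "ECDF of numstudents (sample rows)")
```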
>
> I'm not entirely sure what you think you should be getting. Perhaps
> if you explained your expectations, the list would be able to help
> you achieve them.
>
> Sarah
>
>> Gawesh
>>
>>
>> On Sun, Oct 16, 2011 at 11:02 PM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>>>
>>> On Oct 16, 2011, at 3:53 PM, Dennis Murphy wrote:
>>>
>>>> Hi:
>>>>
>>>> I don't understand what you're attempting to do. Wouldn't courseid be
>>>> a categorical variable with a numeric label? If that is so, why are
>>>> you trying to compute an EDF? An EDF computes cumulative relative
>>>> frequency of a random variable, which by definition is numeric. If we
>>>> were talking about EDFs for a distribution of student course grades on
>>>> a numeric point system by course, that would make some sense, but I
>>>> don't see how the course IDs themselves qualify as being on an
>>>> interval scale of measurement. Could you clarify your intent?
>>>
>>> Huh? Gawesh asked for the ecdf of numstudents (not courseid) ... pretty
>>> clearly a numeric value for which an ECDF should make sense.
>>>
>>> --
>>> David.
>>>
>>> --
>>>>
>>>> Dennis
>>>>
>>>> On Sun, Oct 16, 2011 at 8:31 AM, gj <gawesh at gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> Newbie here. I read "R for Beginners" but I still don't get this.
>>>>>
>>>>> I have the following data (this is just an example) in a CSV file:
>>>>>
>>>>>   courseid numstudents
>>>>>       101         209
>>>>>       141          13
>>>>>       246         140
>>>>>       263           8
>>>>>       321          10
>>>>>       361          10
>>>>>       364          28
>>>>>       365          25
>>>>>       366          23
>>>>>       367          34
>>>>>
>>>>> I load my data using:
>>>>>
>>>>> fs <- read.csv(file = "C:\\num_students_inallmodules.csv", header = TRUE, sep = ",")
>>>>>
>>>>> I want to get the ecdf. So, I looked at the ?ecdf which says
>>>>> usage:ecdf(x)
>>>>>
>>>>> So I expected ecdf(fs$numstudents) to work
>>>>>
>>>>> Instead it just returned:
>>>>> Call: ecdf(fs$numstudents)
>>>>>  x[1:210] =      1,      2,      3,  ...,   3717,   4538
>>>>>
>>>>> After Googling, got this to work:
>>>>> ecdf(fs$numstudents)(unique(fs$numstudents))
>>>>>
>>>>> But I don't understand why if the ?ecdf says usage is ecdf(x) ... I
>>>>> need to use ecdf(fs$numstudents)(unique(fs$numstudents)) to get this
>>>>> to work?
>>>>>
>>>>> Can somebody explain this to me?
>>>>>
>>>>> Regards
>>>>> Gawesh
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>>
>>
>>
>
>
>
> --
> Sarah Goslee
> http://www.stringpage.com
> http://www.sarahgoslee.com
> http://www.functionaldiversity.org
>
