[R] Problem with data distribution
John Fox
j|ox @end|ng |rom mcm@@ter@c@
Thu Feb 17 20:27:25 CET 2022
Dear Neha Gupta,
In the last point, I meant to say, "Finally, it's better to post to the
list in plain-text email, rather than html (as the posting guide
suggests)." (I accidentally inserted a "not" in this sentence.)
Sorry,
John
On 2022-02-17 2:21 p.m., John Fox wrote:
> Dear Neha Gupta,
>
> On 2022-02-17 1:54 p.m., Neha gupta wrote:
>> Hello everyone
>>
>> I have a dataset with output variable "bug" having the following values
>> (at the bottom of this email). My advisor asked me to provide the data
>> distribution of bugs with 0 values and bugs with more than 0 values.
>>
>> data = readARFF("synapse.arff")
>> data2 = readARFF("synapse.arff")
>> data$bug
>> library(tidyverse)
>> data %>%
>> filter(bug == 0)
>> data2 %>%
>> filter(bug >= 1)
>> boxplot(data2$bug, data$bug, range=0)
>>
>> But both graphs are exactly the same. How is that possible? What am I
>> doing wrong?
>
> As it turns out, you're doing several things wrong.
>
> First, you're not using pipes and filter() correctly. That is, you don't
> do anything with the filtered versions of the data sets. You're
> apparently under the incorrect impression that filtering modifies the
> original data set.
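
To illustrate this first point, here is a minimal sketch. The small data frame `d` and the names `zero_bugs` and `some_bugs` are made up for illustration and are not the poster's data. It assumes the dplyr package is installed. filter() returns a new data frame, so the result has to be assigned if it is to be used later.

```r
library(dplyr)

# Hypothetical stand-in for the real data set
d <- data.frame(bug = c(0, 1, 0, 2, 0, 3))

# filter() returns a new data frame; d itself is left unchanged
zero_bugs <- d %>% filter(bug == 0)
some_bugs <- d %>% filter(bug >= 1)

nrow(d)          # d still has all of its rows
nrow(zero_bugs)  # only the rows with bug == 0
nrow(some_bugs)  # only the rows with bug >= 1
```
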
>
> Second, you're greatly complicating a simple problem. You don't need to
> read the data twice and keep two versions of the data set. As well,
> processing the data with pipes and filter() is entirely unnecessary. The
> following code works:
>
> with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
>
> Third, and most fundamentally, the parallel boxplots you're apparently
> trying to construct don't really make sense. The first "boxplot" is just
> a horizontal line at 0 and so conveys no information. Why not just plot
> the nonzero values if that's what you're interested in?
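
One possible way to summarize zero versus nonzero values, sketched here in base R. The short vector `bug` is a made-up stand-in for data$bug, and a bar plot of counts is only one reasonable choice for a small-count variable, not the only one.

```r
# Hypothetical stand-in for data$bug
bug <- c(0, 1, 0, 0, 2, 0, 1, 4, 0, 1)

# How many observations have no bugs, and how many have at least one
zero_vs_nonzero <- table(ifelse(bug == 0, "bug == 0", "bug >= 1"))
zero_vs_nonzero

# Frequency of each nonzero value, drawn as a bar plot
nonzero_counts <- table(bug[bug >= 1])
barplot(nonzero_counts, xlab = "Number of bugs", ylab = "Frequency")
```
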
>
> Fourth, you didn't share your data in a convenient form. I was able to
> reconstruct them via
>
> bug <- scan()
> 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>
> data <- data.frame(bug)
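
On sharing data in a convenient form: one common approach is dput(), which prints an R expression that recreates the object and can be pasted into an email. A minimal sketch, using a short made-up vector `bug` rather than the real data:

```r
# Hypothetical stand-in for data$bug
bug <- c(0, 1, 0, 2)

# dput() prints R code that recreates the object when pasted into a session
dput(bug)

# Round trip: capture the printed code and evaluate it again
txt <- capture.output(dput(bug))
restored <- eval(parse(text = txt))
identical(restored, bug)
```
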
>
> Finally, it's better not to post to the list in plain-text email, rather
> than html (as the posting guide suggests).
>
> I hope this helps,
> John
>
>>
>>
>> data$bug
>> [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
>> [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
>> [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
>> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
>> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
>> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/