[R] Problem with data distribution

Rui Barradas
Thu Feb 17 20:28:31 CET 2022


Hello,

In your original post you read the same file "synapse.arff" twice, 
apparently so that each copy could be filtered by its own criterion. 
You don't need to do that: read the file once and filter that one data 
set by the different criteria.
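
A minimal sketch of that idea, using base R's subset() so it runs without extra packages. The small data frame and the names zero_bugs and some_bugs are made up for illustration; in practice the data frame would be the single result of reading the file.

```r
# Hypothetical stand-in for the data read once from the file
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 4, 1, 0))

# One object, two criteria; each result is assigned to its own name
zero_bugs <- subset(data, bug == 0)
some_bugs <- subset(data, bug > 0)

# Number of rows in each subset
nrow(zero_bugs)
nrow(some_bugs)
```

The original data frame is left untouched, so further criteria can be applied to it later without reading the file again.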

As for the data as posted, I have read it in with the following code:


x <- "
0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 
4 1 0
0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 
0 0 0
1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 
0 0 1
0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 
1 0 0
0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 
0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
"
bug <- scan(text = x)
data <- data.frame(bug)


This is not the right way to post data. The posting guide asks you to 
post the output of dput(), like this:


dput(data)
structure(list(bug = c(0, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 4, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 0,
3, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 1, 1, 2, 1, 0, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 5, 0, 0, 0, 0, 0, 0, 7, 0, 0, 1, 0,
1, 1, 0, 2, 0, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0,
0, 1, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 3, 0, 0, 1, 0, 1, 3, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 4, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0)),
class = "data.frame", row.names = c(NA, -222L))



This can be copied into an R session and the data set recreated with

data <- structure(etc)
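
To see that round trip on something small, here is a sketch with a made-up four-row data frame (the names small, code and restored are only for illustration). It captures the dput() output as text and evaluates it, which is what a reader does when pasting it into a session.

```r
# Hypothetical small data frame standing in for the posted data
small <- data.frame(bug = c(0, 1, 0, 2))

# dput() writes R code that rebuilds the object; capture that code as text
code <- capture.output(dput(small))

# Evaluating the text recreates the data frame
restored <- eval(parse(text = code))

# Compare the original with the recreated copy
all.equal(small, restored)
```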


Now the boxplots.

(Why would you want to plot a vector of all zeros, btw?)



library(dplyr)

boxplot(filter(data, bug == 0))    # nonsense
boxplot(filter(data, bug > 0), range = 0)

# Another way
data %>%
   filter(bug > 0) %>%
   boxplot(range = 0)
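
As Tim noted, the two graphs in the original post were identical because the filtered results were never saved, so data and data2 still held all the rows. Here is a sketch of the difference, with a made-up data frame and base subset() swapped in for dplyr's filter() so that it runs without extra packages; the same reasoning applies to the piped filter().

```r
# Hypothetical stand-in for the posted data
data <- data.frame(bug = c(0, 1, 0, 0, 2, 0, 4, 1, 0))
before <- data

# The filtered rows are returned (and printed at top level), but not stored
subset(data, bug > 0)
identical(data, before)   # checks that `data` was not modified

# Store each subset under its own name
positive <- subset(data, bug > 0)
zero <- subset(data, bug == 0)

# The two boxes are now drawn from different subsets
boxplot(list(positive = positive$bug, zero = zero$bug), range = 0)
```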


Hope this helps,

Rui Barradas


On 17/02/2022 at 19:03, Neha gupta wrote:
> That is all the code I have. How can I provide reproducible code?
> 
> How can I save this result?
> 
> On Thu, Feb 17, 2022 at 8:00 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
> 
>> You pipe the filter but do not save the result. A reproducible example
>> might help.
>> Tim
>>
>> -----Original Message-----
>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Neha gupta
>> Sent: Thursday, February 17, 2022 1:55 PM
>> To: r-help mailing list <r-help using r-project.org>
>> Subject: [R] Problem with data distribution
>>
>> Hello everyone
>>
>> I have a dataset with output variable "bug" having the following values
>> (at the bottom of this email). My advisor asked me to provide data
>> distribution of bugs with 0 values and bugs with more than 0 values.
>>
>> data = readARFF("synapse.arff")
>> data2 = readARFF("synapse.arff")
>> data$bug
>> library(tidyverse)
>> data %>%
>>    filter(bug == 0)
>> data2 %>%
>>    filter(bug >= 1)
>> boxplot(data2$bug, data$bug, range=0)
>>
>> But both graphs are exactly the same. How is that possible? What am I
>> doing wrong?
>>
>>
>> data$bug
>>    [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
>> 0 4 1 0
>>   [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
>> 0 0 0 0
>>   [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
>> 7 0 0 1
>> [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
>> 0 1 0 0
>> [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
>> 0 0 0 1
>> [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 


