[R] R how to find outliers and zero mean columns?
Norman Pat
normanmath1 at gmail.com
Thu Mar 31 04:30:12 CEST 2016
Hi Jim,
Thanks for your reply. I already know these basics in R.
But let's say you have a data frame X with 300 features.
From those 300 features I need to pull out the names of every feature
that is zero for all the observations in the sample.
I am looking for a package or a function that does that.
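
For concreteness, base R may already be enough for this. The following is only a minimal sketch: the small data frame X and the helper is_all_zero are made up for illustration, and ignoring missing values via na.rm = TRUE is an assumption about how NAs should be treated.

```r
# Hypothetical example data; the real X would have 300 features.
X <- data.frame(A = c(0, 1:10),
                B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11),
                D = rep(0, 11),
                E = rep(0, 11))

# TRUE for a numeric column whose non-missing values are all zero.
# Assumption: NAs are ignored, so a column of only NAs is flagged too.
is_all_zero <- function(col) is.numeric(col) && all(col == 0, na.rm = TRUE)

# vapply() returns a named logical vector, one element per column.
zero_flags <- vapply(X, is_all_zero, logical(1))

# Names of the all-zero features.
zero_names <- names(X)[zero_flags]
print(zero_names)

# The data frame without those features; drop = FALSE keeps a data frame
# even if only one column remains.
X_clean <- X[, !zero_flags, drop = FALSE]
```

Because vapply() keeps the column names, the names come straight out of the logical vector without an extra unlist() step.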
And how can I tell whether a feature contains abnormal values? Say I have
300 features and 100000 observations: it is hard to inspect everything
in the Excel file, so I am looking for a package that does the work.
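
As a rough illustration of the kind of per-feature check meant here, the sketch below uses base R only. The data frame X, the vector of suspect codes (99999 and -1) and all object names are assumptions made for the example; which values count as abnormal depends on the data.

```r
# Hypothetical example data standing in for the real data frame.
X <- data.frame(A = c(0, 1:10),
                B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11),
                D = rep(0, 11))

# Assumed codes that should be treated as abnormal.
suspect <- c(99999, -1)

# Number of suspect values in each column.
n_suspect <- vapply(X, function(col) sum(col %in% suspect), integer(1))

# Names of the features that contain at least one suspect value.
suspect_names <- names(n_suspect)[n_suspect > 0]
print(suspect_names)

# Minimum and maximum of every column, one row per feature; extreme
# values stand out here even when the codes are not known in advance.
ranges <- t(vapply(X, range, numeric(2), na.rm = TRUE))
colnames(ranges) <- c("min", "max")
print(ranges)

# Copy of X with the suspect values replaced by NA, column by column.
X_na <- X
X_na[] <- lapply(X, function(col) replace(col, col %in% suspect, NA))
```

With many features, scanning the ranges table is usually quicker than opening the data in a spreadsheet.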
I hope that makes sense.
Thanks a lot
Cheers
On Thu, Mar 31, 2016 at 1:13 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Norman,
> To check whether all values of an object (say "x") fulfill a certain
> condition (==0):
>
> all(x==0)
>
> If your object (X) is indeed a data frame, you can only do this by
> column, so if you want to get the results:
>
> X<-data.frame(A=c(0,1:10),B=c(0,2:10,99999),
> C=c(0,-1,3:11),D=rep(0,11))
> all_zeros<-function(x) return(all(x==0))
> which_cols<-unlist(lapply(X,all_zeros))
>
> If your data frame (or a subset) contains all numeric values, you can
> finesse the problem like this:
>
> which_rows<-apply(as.matrix(X),1,all_zeros)
>
> What you get is a list of logical (TRUE/FALSE) values from lapply, so
> it has to be unlisted to get a vector of logical values like you get
> with "apply".
>
> You can then use that vector to index (subset) the original data frame
> by logically inverting it with ! (NOT):
>
> X[,!which_cols]
> X[!which_rows,]
>
> Your "outliers" look suspiciously like missing values from certain
> statistical packages. If you know the values you are looking for, you
> can do something like:
>
> NA99999<-X==99999
>
> and then "remove" them by replacing those values with NA:
>
> X[NA99999]<-NA
>
> Be aware that all these hackles (diminutive of hacks) are pretty
> specific to this example. Also remember that if this is homework, your
> karma has just gone down the cosmic sinkhole.
>
> Jim
>
>
> On Thu, Mar 31, 2016 at 9:56 AM, Norman Pat <normanmath1 at gmail.com> wrote:
> > Hi team
> >
> > I am new to R so please help me to do this task.
> >
> > Please find the attached data sample. But in the original data frame I
> > have 350 features and 400000 observations.
> >
> > I need to carry out these tasks.
> >
> > 1. How to identify features (names) that have all zeros?
> >
> > 2. How to remove features that have all zeros from the dataset?
> >
> > 3. How to identify features (names) that have outliers such as 99999, -1
> > in the data frame.
> >
> > 4. How to remove outliers?
> >
> >
> > Many thanks