[R] R how to find outliers and zero mean columns?

Jim Lemon drjimlemon at gmail.com
Thu Mar 31 04:13:55 CEST 2016


Hi Norman,
To check whether all values of an object (say "x") fulfill a certain
condition (==0):

all(x==0)
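
One caveat, as a minimal sketch with made-up vectors (x1, x2 and x3 are not from your data): if the object contains missing values, the test can return NA rather than TRUE or FALSE.

```r
x1 <- c(0, 0, 0)
x2 <- c(0, NA, 0)
x3 <- c(0, NA, 5)

all(x1 == 0)                # TRUE
all(x2 == 0)                # NA: the missing value leaves the answer undecided
all(x3 == 0)                # FALSE: one non-zero value settles it
all(x2 == 0, na.rm = TRUE)  # TRUE: missing values are ignored
```

Whether na.rm=TRUE is the right choice depends on whether a column of zeros and NAs should count as "all zeros" for your purposes.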

If your object (X) is indeed a data frame, you will want to apply the
test to each column separately to get one result per column:

X<-data.frame(A=c(0,1:10),B=c(0,2:10,99999),
 C=c(0,-1,3:11),D=rep(0,11))
all_zeros<-function(x) return(all(x==0))
which_cols<-unlist(lapply(X,all_zeros))
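
Since you asked for the names, here is a sketch of one way to pull them out of that logical vector. It rebuilds the example data so that it runs on its own; zero_names is a name made up for illustration, and vapply is just an alternative to unlist(lapply()), not a correction.

```r
X <- data.frame(A = c(0, 1:10), B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11), D = rep(0, 11))
all_zeros <- function(x) all(x == 0)

# vapply returns a named logical vector directly, one element per column
which_cols <- vapply(X, all_zeros, logical(1))

# Names of the columns in which every value is zero
zero_names <- names(X)[which_cols]
zero_names  # "D"
```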

If your data frame (or a subset of it) contains only numeric values,
you can run the same test on rows by converting it to a matrix:

which_rows<-apply(as.matrix(X),1,all_zeros)

lapply returns a list of logical (TRUE/FALSE) values, so it has to be
unlisted to get a vector of logical values like the one you get from
"apply".

You can then use that vector to index (subset) the original data frame
by logically inverting it with ! (NOT):

X[,!which_cols]
X[!which_rows,]
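
A sketch of the whole subsetting step on the example data, rebuilt here so it runs on its own. Note that the result has to be assigned to be kept, and that drop=FALSE guards against R returning a plain vector if only one column survives. X_clean is a name made up for illustration.

```r
X <- data.frame(A = c(0, 1:10), B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11), D = rep(0, 11))
all_zeros <- function(x) all(x == 0)
which_cols <- unlist(lapply(X, all_zeros))
which_rows <- apply(as.matrix(X), 1, all_zeros)

# Drop all-zero rows and all-zero columns in one step, keeping a data frame
X_clean <- X[!which_rows, !which_cols, drop = FALSE]
dim(X_clean)    # 10 3
names(X_clean)  # "A" "B" "C"
```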

Your "outliers" look suspiciously like the codes that certain
statistical packages use for missing values. If you know the values
you are looking for, you can do something like:

NA99999<-X==99999

and then "remove" them by replacing those values with NA:

X[NA99999]<-NA
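
To get the names of the columns that contain such values, one possible sketch follows. It assumes the codes of interest are 99999 and -1, as in your question; whether -1 really is a missing value code in your data is something only you can check. bad_values and has_bad are names introduced here.

```r
X <- data.frame(A = c(0, 1:10), B = c(0, 2:10, 99999),
                C = c(0, -1, 3:11), D = rep(0, 11))
bad_values <- c(99999, -1)  # assumed codes, taken from the question

# TRUE for each column that contains at least one of the codes
has_bad <- vapply(X, function(x) any(x %in% bad_values), logical(1))
names(X)[has_bad]  # "B" "C"

# Replace the codes with NA, column by column, keeping X a data frame
X[] <- lapply(X, function(x) replace(x, x %in% bad_values, NA))
colSums(is.na(X))  # number of values replaced in each column
```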

Be aware that all these hackles (diminutive of hacks) are pretty
specific to this example. Also remember that if this is homework, your
karma has just gone down the cosmic sinkhole.

Jim


On Thu, Mar 31, 2016 at 9:56 AM, Norman Pat <normanmath1 at gmail.com> wrote:
> Hi team
>
> I am new to R so please help me to do this task.
>
> Please find the attached data sample. But in the original data frame I
> have 350 features and 400000 observations.
>
> I need to carry out these tasks.
>
> 1. How to identify features (names) that have all zeros?
>
> 2. How to remove features that have all zeros from the dataset?
>
> 3. How to identify features (names) that have outliers such as 99999,-1 in
> the data frame.
>
> 4. How to remove outliers?
>
>
> Many thanks