[R] Outliers Help
arun
smartpink111 at yahoo.com
Fri Aug 30 14:43:23 CEST 2013
Hi,
Also,
dd1<-matrix(cbind(D[,1],(D[-c(1:2)]/D[,2]>4)*1),dimnames=NULL,ncol=7)
identical(dd,dd1)
#[1] TRUE
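For readers without the original D, here is a small self-contained sketch of the same comparison. The data frame D_toy and its values are made up for illustration; only the "more than 4 times the row mean" rule comes from the thread. It builds the indicator matrix once with the nested loop quoted below and once in vectorised form, then checks that the two agree.

```r
# Toy data (hypothetical values): ID, row mean, then three yearly columns.
D_toy <- data.frame(id    = c(1, 2, 3),
                    media = c(10, 20, 30),
                    y1    = c(12, NA, 31),
                    y2    = c(50, 19, 29),
                    y3    = c(9, 100, 150))

# Loop version: fill the indicator matrix one cell at a time.
flag_loop <- matrix(0, nrow = nrow(D_toy), ncol = ncol(D_toy) - 1)
flag_loop[, 1] <- D_toy[, 1]
for (i in seq_len(nrow(D_toy))) {
  for (j in 2:ncol(flag_loop)) {
    flag_loop[i, j] <- D_toy[i, j + 1] / D_toy[i, 2] > 4
  }
}

# Vectorised version: dividing a matrix by a vector recycles down the
# columns, so every row i is divided by media[i]; "* 1" turns TRUE/FALSE
# into 1/0 and keeps NA as NA.
flag_vec <- cbind(D_toy[, 1],
                  (as.matrix(D_toy[, -(1:2)]) / D_toy[, 2] > 4) * 1)
dimnames(flag_vec) <- NULL

identical(flag_loop, flag_vec)
```

The vectorised form relies on the year columns all being numeric; with mixed column types as.matrix() would return a character matrix and the division would fail.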
A.K.
----- Original Message -----
From: Jose Iparraguirre <Jose.Iparraguirre at ageuk.org.uk>
To: Mª Teresa Martinez Soriano <teresamarso at hotmail.com>; "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Friday, August 30, 2013 5:39 AM
Subject: Re: [R] Outliers Help
Hi Ma Teresa,
Sorry, but I can't understand what you're trying to achieve.
On a statistical note, I'd tend to think more in terms of medians and would think hard before replacing any outliers, but that's another matter.
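As a rough illustration of the point about medians, the sketch below takes the yearly values of rows 6 and 2 from the data quoted further down and flags cells against the row median and against a row mean computed from the same cells. The threshold of 4 is the one used in this thread; recomputing the mean here (rather than using the posted media column) is an assumption made only to show how an outlier pulls the mean towards itself.

```r
# Yearly values IE.2005 ... IE.2010 for X. = 3872 and X. = 1479.
vals <- rbind(c(288.6, 113, 116,   90,    94, 12036.6),
              c(   NA,  NA,  53, 1166, 344.8,   110))

# Row-wise reference values, ignoring NAs.
row_med  <- apply(vals, 1, median, na.rm = TRUE)
row_mean <- rowMeans(vals, na.rm = TRUE)

# A cell is flagged when it exceeds 4 times the reference value.
flag_median <- vals / row_med  > 4
flag_mean   <- vals / row_mean > 4

flag_median
flag_mean
```

In the second row the value 1166 is flagged against the median but not against the mean of the same cells, because the outlier itself inflates that mean.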
Here I created the matrix dd with the first column of D (the X. identifier) in its first column, and then populated the remaining columns with a 1 whenever the value of D for that cell was greater than 4 times the mean for that row (your definition of 'outlier').
> dd <- rep(0,15*7)
> dim(dd) <- c(15,7)
> dd[,1]<- D[,1]
> for (i in 1:15){
+ for (j in 2:7){
+ dd[i,j] <- D[i,(j+1)]/D[i,2]>4
+ }
+ }
> dd
       [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]  1108    0    0    0    0    0    0
 [2,]  1479   NA   NA    0    1    0    0
 [3,]  1591    0    0    0    0    0    0
 [4,]  3408    0    0    0    0    0    0
 [5,]  3423   NA   NA   NA    1    0    0
 [6,]  3872    0    0    0    0    0    1
 [7,]  5823    0    0    0    0    0    0
 [8,]  6051   NA   NA   NA   NA    0    0
 [9,]  8099    0    0    0    0    0    0
[10,]  8100   NA   NA   NA   NA    0    0
[11,] 10640    1    1    1    0    0    0
[12,] 12600    0    0    0    0    0    0
[13,] 14680    0    0    1    0    0    1
[14,] 14698    0    0    0    0    0    0
[15,] 17143    0    0    0    0    0    0
So, you encounter four situations:
a) as in row 2, you have an outlier preceded and followed by values
b) as in row 5, you have an outlier preceded by an NA
c) as in row 6, there is an outlier in the last column
d) as in row 11, there are two or more consecutive outliers
The replacement rule you described would only apply to situations a) (ie replacing the outlier by the mean of the preceding and subsequent values), and b) (replacing it by the mean for the row).
But what of situations c) and d)?
And, because this is just a chunk of a bigger dataset, you can also get an outlier in the first column, followed by a number. Again, your rule has not accounted for this situation either.
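To make the open cases concrete, here is one possible way to write the rule down as a function. It is only a sketch: the neighbour mean for situation a) and the row mean for situation b) follow the description in the original question, while falling back to the row mean for c), d) and a first-column outlier is purely an assumption, since the question does not say what should happen there. The function name replace_outliers and its arguments are hypothetical.

```r
# Replace flagged cells in one row of yearly values.
# x:        numeric vector of yearly values (may contain NA)
# row_mean: reference mean for the row (the "media" column)
# ratio:    a cell is an outlier when x / row_mean exceeds this value
replace_outliers <- function(x, row_mean, ratio = 4) {
  is_out <- !is.na(x) & x / row_mean > ratio
  out <- x
  for (j in which(is_out)) {
    # A neighbour is usable only if it exists, is not NA and is not
    # itself an outlier.
    has_left  <- j > 1 && !is.na(x[j - 1]) && !is_out[j - 1]
    has_right <- j < length(x) && !is.na(x[j + 1]) && !is_out[j + 1]
    out[j] <- if (has_left && has_right) {
      (x[j - 1] + x[j + 1]) / 2   # situation a): mean of the neighbours
    } else {
      row_mean                    # b) as described; c), d), first column: assumed
    }
  }
  out
}

# Rows 2, 5, 6 and 11 of the posted data.
replace_outliers(c(NA, NA, 53, 1166, 344.8, 110), 110)
replace_outliers(c(NA, NA, NA, 410.8, 7, 11), 9)
replace_outliers(c(288.6, 113, 116, 90, 94, 12036.6), 103.25)
replace_outliers(c(1256.6, 1152, 664.2, 74, 77, 51), 67.33333)
```

Neighbours are always read from the original vector x, not from the partly corrected out, so an earlier replacement never feeds into a later one.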
Hope this helps,
José
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mª Teresa Martinez Soriano
Sent: 30 August 2013 09:13
To: r-help at r-project.org
Subject: [R] Outliers Help
This is a part of my data set
> D[1:15,c(1,5:10)]
      X.      media IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
1   1108   22.00000    60.0      39     4.0     8.0    16.0     5.0
2   1479  110.00000      NA      NA    53.0  1166.0   344.8   110.0
3   1591   86.60000   247.0      87    95.0    94.0    81.0    76.0
4   3408  807.00000   302.0     322   621.0  1071.0  1301.0  1225.0
5   3423    9.00000      NA      NA      NA   410.8     7.0    11.0
6   3872  103.25000   288.6     113   116.0    90.0    94.0 12036.6
7   5823   73.00000   117.0      70    80.0    74.0    69.0    72.0
8   6051   73.00000      NA      NA      NA      NA    60.0    86.0
9   8099  125.16667   196.0     161   150.0    94.0    72.0    78.0
10  8100   70.00000      NA      NA      NA      NA    48.0    92.0
11 10640   67.33333  1256.6    1152   664.2    74.0    77.0    51.0
12 12600 2417.00000  1960.0    2383  2453.0  2506.0  2758.0  2442.0
13 14680   38.00000    30.0      61   373.6    42.0    19.0   220.8
14 14698  698.16667   553.0     664   847.0   800.0   679.0   646.0
15 17143  392.16667   323.0     322   434.0   383.0   459.0   432.0
I have done multiple imputation, and now I have some outliers that I would like to replace with the mean of the row or, if possible, with the mean of the previous and next values in that row. For instance:
value 1 - Outlier - Value 2
I would like to replace the outlier with the mean of value 1 and value 2. The problem is that these values could be NA (NA after the imputation because they don't exist); in that case I would like to replace the outlier with the mean of the row.
Another problem I have is detecting outlier values correctly. For instance, in this example data set there is an outlier for X=3872 and IE.2010. I have thought of comparing the values with the mean (column media).
I have tried this code:
D<-datos[, c(1,16:24)]
m<-as.matrix(D)
for (i in 1:nrow(D))
{
  for (j in 5:(ncol(D)-1)) # I would change this in the new data set, because I will have more years than 2010
  {
    # Here I would like to find the values that are much bigger than the mean of this row,
    # and replace them by the mean of the previous and the next values of the same row.
    if (!is.na(m[i,j]) && !is.na(m[i,j+1]) && !is.na(m[i,j-1]) && !is.na(m[i,2]) && ((m[i,j]/m[i,2]) > 4)) {
      m[m[i,j]] <- (m[i,j-1]+m[i,j+1])/2
      #if( !is.na(m[i,j])
    }
  }
}
D<-as.data.frame(m)
But I get the same data.frame that I had previously; nothing changes.
Any ideas are welcome.
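One likely reason, offered as a guess since datos itself is not shown: the assignment m[m[i,j]] <- ... uses the value stored in the cell as a single index into the matrix, instead of addressing the cell by its position [i, j]. The sketch below shows the loop with position-based indexing on rows 2 and 6 of the data posted above (ID, media, then IE.2005 to IE.2010). The column range 4:(ncol(m)-1) is chosen to fit this small example and is an assumption, as is writing into a separate copy named fixed.

```r
# Rows 2 and 6 of the posted data: ID, media, IE.2005 ... IE.2010.
m <- rbind(c(1479, 110,       NA,  NA,  53, 1166, 344.8,     110),
           c(3872, 103.25, 288.6, 113, 116,   90,    94, 12036.6))

# Write into a copy so that every test reads the original values.
fixed <- m

for (i in seq_len(nrow(m))) {
  for (j in 4:(ncol(m) - 1)) {      # interior year columns only
    ok <- !is.na(m[i, j]) && !is.na(m[i, j - 1]) &&
          !is.na(m[i, j + 1]) && !is.na(m[i, 2])
    if (ok && m[i, j] / m[i, 2] > 4) {
      # Address the cell by position [i, j], not by its value.
      fixed[i, j] <- (m[i, j - 1] + m[i, j + 1]) / 2
    }
  }
}

fixed
```

With this range the value 1166 in the first row is replaced by the mean of its neighbours, while 12036.6 in the last column of the second row is never visited and stays as it is, so outliers in the first or last year column still need their own rule.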
Thanks a lot, Teresa
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.