[R] Extracting data from a file containing data
jim holtman
jholtman at gmail.com
Thu Jul 2 02:49:20 CEST 2015
Here is a way to do case I. It uses the 'tidyr' package and produces
results like:
> case1[[1]]
     YR JF-R_NINO1.2 MAM-R_NINO1.2 JJA-R_NINO1.2 OND-R_NINO1.2
1  1982           ML            ML            ME            SE
2  1983           SE            SE            SE            ME
3  1984           ML            ML            ML            ML
4  1985           SL            SL            SL            ML
5  1986           ME            ML            ML            ME
6  1987           ME            SE            SE            SE
7  1988           ML            ML            SL            SL
8  1989           ML            ML            ML            ML
9  1990           ML            ML            ML            ML
10 1991           ML            ML            ME            ME
11 1992           ME            SE            ME            ML
12 1993           ME            ME            ME            ME
13 1994           ML            SL            ML            ME
14 1995           ME            ML            ML            ML
15 1996           ML            SL            SL            SL
16 1997           ML            SE            SE            SE
17 1998           SE            SE            SE            ML
18 1999           ML            ML            SL            ML
19 2000           ML            ML            ML            ML
20 2001           ML            ME            ML            SL
21 2002           ML            ME            ML            ME
22 2003           ML            SL            ML            ME
23 2004           ML            ML            SL            ME
24 2005           ML            ML            ML            ML
25 2006           ME            ML            ME            SE
26 2007           ME            SL            SL            SL
27 2008           ML            ME            ME            ML
28 2009           ML            ME            ME            ME
29 2010           ME            ME            SL            SL
30 2011           ML            ME            ME            ML
31 2012           ML            ME            ME            ML
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Fri, Jun 26, 2015 at 5:27 AM, Peter Tuju <peterenos at ymail.com> wrote:
> Dear Jim Holtman,
>
> Thank you very much for your help.
>
> The problem I'm trying to solve is “To determine whether the evolution of
> ENSO can influence rainfall over Tanzania”. In this study I have two types
> of data, i.e. rainfall data (for 23 stations) and Nino indices data, both
> spanning a period of 31 years (1982-2012).
>
> *CASE I:*
> 1. In the “*Nino.indices.txt*” data, for all columns of the Nino regions
> (both anomalies and SST), calculate the seasonal means for "January &
> February (JF)", "March, April & May (MAM)", "June, July & August (JJA)" and
> "October, November & December (OND)" for each year, and have the output in
> table form as:
>
> *Nino indices Mean*
> Years   | JF       | JF        | MAM      | MAM       | JJA      | JJA       | OND      | OND
>         | SST Mean | ANOM Mean | SST Mean | ANOM Mean | SST Mean | ANOM Mean | SST Mean | SST Mean
>         | NINO1+2  | NINO1+2   | NINO3    | NINO3     | NINO4    | NINO4     | NINO3.4  | NINO3.4
> 1982    |          |           |          |           |          |           |          |
> 1983    |          |           |          |           |          |           |          |
> - - - - |          |           |          |           |          |           |          |
> - - - - |          |           |          |           |          |           |          |
> 2012    |          |           |          |           |          |           |          |
>
> 2. To use the yearly anomalies for each column of the Nino regions to
> classify the events as follows:
> (i). If ANOM Mean > 1, assign “SE” (Strong El Niño)
> (ii). If 0 < ANOM Mean <= 1, assign “ME” (Moderate El Niño)
> (iii). If ANOM Mean == 0, assign “NT” (Neutral condition)
> (iv). If ANOM Mean < -1, assign “SL” (Strong La Niña)
> (v). If -1 <= ANOM Mean < 0, assign “ML” (Moderate La Niña)
> The output should be in table form as:
>
> *FOR NINO1+2*
> Years   | JF        | MAM       | JJA       | OND
>         | ANOM Mean | ANOM Mean | ANOM Mean | SST Mean
>         | NINO1+2   | NINO1+2   | NINO1+2   | NINO1+2
> 1982    | SE        |           |           |
> 1983    |           |           | SL        |
> - - - - |           |           |           | ML
> - - - - |           | ME        |           |
> 2012    |           |           |           | SL
>
> *FOR NINO3*
> Years   | JF        | MAM       | JJA       | OND
>         | ANOM Mean | ANOM Mean | ANOM Mean | SST Mean
>         | NINO3     | NINO3     | NINO3     | NINO3
> 1982    | SE        |           |           |
> 1983    |           |           |           |
> - - - - |           |           |           |
> - - - - |           | ME        |           |
> 2012    |           |           |           | SL
>
> *FOR NINO4*
> Years   | JF        | MAM       | JJA       | OND
>         | ANOM Mean | ANOM Mean | ANOM Mean | SST Mean
>         | NINO4     | NINO4     | NINO4     | NINO4
> 1982    | SE        |           |           |
> 1983    |           |           |           |
> - - - - | ML        |           |           | SL
> - - - - |           | ME        |           |
> 2012    |           |           |           | SL
>
>
> *FOR NINO3.4*
> Years   | JF        | MAM       | JJA       | OND
>         | ANOM Mean | ANOM Mean | ANOM Mean | SST Mean
>         | NINO3.4   | NINO3.4   | NINO3.4   | NINO3.4
> 1982    | SE        | SL        |           |
> 1983    |           |           |           |
> - - - - |           |           | ML        |
> - - - - |           | ME        |           |
> 2012    |           |           |           | SL
>
>
> 3. To plot a time series graph for each Nino region using the yearly
> anomalies.
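
For item 3, a minimal base-graphics sketch might look like the following. The data frame `toy` and its values are invented stand-ins for one region's columns in the Nino indices file; only the YR/MON/ANOM column names are borrowed from the script attached to this reply, and the real file may be laid out differently.

```r
# Toy stand-in for one Nino region (values are made up for illustration)
set.seed(1)
toy <- data.frame(
  YR   = rep(1982:1986, each = 12),
  MON  = rep(1:12, times = 5),
  ANOM = round(rnorm(60), 2)
)

# yearly mean anomaly for this region
yearly <- aggregate(ANOM ~ YR, data = toy, FUN = mean)

# draw to a null PDF device so the sketch also runs non-interactively;
# drop the pdf()/dev.off() pair to see the plot on screen
pdf(NULL)
plot(yearly$YR, yearly$ANOM, type = "b",
     xlab = "Year", ylab = "Mean anomaly",
     main = "Yearly anomaly (toy data)")
abline(h = 0, lty = 2)   # zero line separates positive from negative years
dev.off()
```

Repeating the aggregate/plot pair for each ANOM column would give one graph per region.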
>
>
> *CASE II:*
> Consider the rainfall station data:
> 1. In some of the data files, missing values are labelled “m”. I want to
> substitute these missing values with the long-term mean.
> 2. Find the rowSum and anomalies of each data file.
> 3. Find the cumsum of the rowSum of each data file.
> 4. Plot the single mass curves, i.e. plot(Year, cumsum), for each file, and
> use the corresponding file name as the title.
> 5. Plot the time series graphs for seasons JF, MAM, JJA and OND for each
> file, titled “Time series graph for “name of the file””.
> 6. Find the seasonal correlations for JF, MAM, JJA and OND using the
> anomalies of the rainfall station data and those of each Nino region's
> indices, and have the results in table form as:
>
> *CORRELATIONS OF RAINFALL AND NINO1+2 ANOMALIES*
> Years   | JF | MAM | JJA | OND
> 1982    |    |     |     |
> 1983    |    |     |     |
> - - - - |    |     |     |
> - - - - |    |     |     |
> 2012    |    |     |     |
>
> *CORRELATIONS OF RAINFALL AND NINO3 ANOMALIES*
> Years   | JF | MAM | JJA | OND
> 1982    |    |     |     |
> 1983    |    |     |     |
> - - - - |    |     |     |
> - - - - |    |     |     |
> 2012    |    |     |     |
>
> *CORRELATIONS OF RAINFALL AND NINO4 ANOMALIES*
> Years   | JF | MAM | JJA | OND
> 1982
>
> ...
>
> [Message clipped]
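
The attached script only covers case I. For case II, items 1 to 3 and 6, a rough sketch on made-up data might look like this. The table layout (one row per year, one column per month), the values, and the vector `nino_anom` are all assumptions for illustration; the real station files may be organised differently.

```r
# Hypothetical station table with missing values written as "m"
rain_txt <- "Year Jan Feb Mar
1982 10 20 30
1983 m 40 50
1984 30 m 10"
rain <- read.table(text = rain_txt, header = TRUE, na.strings = "m")

months <- setdiff(names(rain), "Year")

# 1. replace each missing value with that month's long-term mean
for (m in months) {
  miss <- is.na(rain[[m]])
  rain[[m]][miss] <- mean(rain[[m]], na.rm = TRUE)
}

# 2. row totals and their anomalies (departure from the mean total)
rain$total <- rowSums(rain[months])
rain$anom  <- rain$total - mean(rain$total)

# 3. cumulative totals, as used for a single mass curve
rain$cum <- cumsum(rain$total)

# 6. correlation with a (made-up) Nino anomaly series for the same years
nino_anom <- c(-0.4, 1.1, -0.2)
r <- cor(rain$anom, nino_anom)
```

The mass curve for item 4 would then be something like `plot(rain$Year, rain$cum, type = "l", main = "name of the file")`. Note that `cor()` returns one coefficient per pair of series, so this approach gives one value per season across all years rather than one value per year.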
-------------- next part --------------
input <- read.table("C:\\Users\\jh52822\\Downloads\\Nino_indices.txt"
                    , header = TRUE
                    , as.is = TRUE
                    )

# create the season factor: months 1-2 = JF, 3-5 = MAM, 6-8 = JJA,
# 9 = no season (NA), 10-12 = OND
input$season <- factor(c(rep("JF", 2), rep("MAM", 3), rep("JJA", 3)
                         , NA, rep("OND", 3)
                         )[input$MON], levels = c("JF", "MAM", "JJA", "OND"))

# leave off the MON (-2) column from the data; rows with an NA season
# (September) are dropped by the formula method's default na.action
res <- aggregate(. ~ season + YR, data = input[, -2], FUN = mean)
head(res, 10)

# this function determines the labels to apply
f_labels <- function(x)
{
    # cut() returns a factor with right-closed intervals by default,
    # so convert to character and patch the boundary cases below
    result <- as.character(cut(x
                               , breaks = c(-Inf, -1, 0, 1, Inf)
                               , labels = c("SL", "ML", "ME", "SE")
                               ))
    # exactly -1 belongs to "ML" under rule (v), not "SL"
    result[x == -1] <- "ML"
    # check for zero
    result[x == 0] <- "NT"
    result
}

# get the "ANOM" columns for processing since these are the ones that
# we want to test for values.
anom <- which(grepl("^ANOM", names(res)))

# for each ANOM column iterate and compute the labels based on values
for (i in anom){
    # use the variable in previous column for the name
    res[[paste0("R_", names(res[i - 1L]))]] <- f_labels(res[[i]])
}

# the tidyr package helps to format the results
library(tidyr)

# columns to use as summary -- added above
sum_cols <- paste0("R_", names(res[anom - 1L]))

case1 <- lapply(sum_cols, function(.col){
    # need to restrict what data we want
    # (spread_() is deprecated in newer tidyr releases in favour of
    # pivot_wider())
    x <- spread_(res[, c("YR", "season", .col)], "season", .col)
    # append the name of the data to the season
    names(x)[-1] <- paste(names(x[-1]), .col, sep = '-')
    x  # return value
})
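
The labelling thresholds are easy to get wrong at the boundaries, because cut() closes its intervals on the right by default, which would put a value of exactly -1 in "SL" and exactly 0 in "ML". A stand-alone check of the rules from the question could look like this; `label_anom` is a name made up for this sketch and the input values are invented.

```r
# Stand-alone restatement of the classification rules (i)-(v)
label_anom <- function(x) {
  out <- as.character(cut(x,
                          breaks = c(-Inf, -1, 0, 1, Inf),
                          labels = c("SL", "ML", "ME", "SE")))
  out[x == -1] <- "ML"   # rule (v) includes -1 in "ML"
  out[x == 0]  <- "NT"   # rule (iii)
  out
}

label_anom(c(-1.5, -1, -0.3, 0, 0.4, 1, 2.1))
# [1] "SL" "ML" "ML" "NT" "ME" "ME" "SE"
```

Since the inputs are seasonal means of real-valued anomalies, exact hits on -1 or 0 are probably rare, but handling them keeps the labels consistent with the stated rules.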