[R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Alexander Ilich
@|||ch @end|ng |rom u@|@edu
Tue May 16 17:42:18 CEST 2023
The only spot you'll need to change the names for is when putting all of your dataframes in a list as that is based on the names you gave them in your script when reading in the data. In the function, you don't need to change the input to "dataframe1", and naming it that way could be confusing since you are applying the function to more than just dataframe1 (you're applying it to all 10 of your dataframes). I named the argument df to indicate that you should supply your dataframe as the input to the function, but you could name it anything you want. For example, you could call it "mydata" and define the function this way if you wanted to.
ExtractFirstMin<- function(mydata){
mydata$abs_diff<- abs(mydata$x-1)
min_rate<- mydata$y[which.min(mydata$abs_diff)]
return(min_rate)
}
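As a quick check that the argument name really is arbitrary, here is a minimal sketch that defines the same logic twice with different argument names and compares the results. The toy data frame and the names ExtractA and ExtractB are made up purely for illustration.

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(x = c(0.90, 0.99, 1.10), y = c(10, 20, 30))

# Same logic, two different argument names
ExtractA <- function(df) {
  df$y[which.min(abs(df$x - 1))]
}
ExtractB <- function(mydata) {
  mydata$y[which.min(abs(mydata$x - 1))]
}

res_a <- ExtractA(toy)
res_b <- ExtractB(toy)
identical(res_a, res_b)
#> [1] TRUE
```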
#The function has its own environment of variables that is separate from the global environment of variables you've defined in your script.
#When we supply one of your dataframes to the function, we are assigning that information to a variable in the function's environment called "mydata". Functions allow you to generalize your code so that you're not required to name your variables a certain way. Note that we do still assume that "mydata" has columns named "x" and "y" (accessed as mydata$x and mydata$y).
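To illustrate the separate environment, the sketch below (with a made-up toy data frame) shows that the abs_diff column created inside the function exists only in the function's local copy. The data frame you passed in is left unchanged.

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(x = c(0.8, 1.01, 1.3), y = c(5, 6, 7))

ExtractFirstMin <- function(mydata) {
  mydata$abs_diff <- abs(mydata$x - 1)  # modifies the function's local copy only
  min_rate <- mydata$y[which.min(mydata$abs_diff)]
  return(min_rate)
}

result <- ExtractFirstMin(toy)
result
#> [1] 6
names(toy)  # the original data frame still has no abs_diff column
#> [1] "x" "y"
```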
#Without generalizing the code using a function, we'd need to copy and paste the code over and over again and make sure to change the name of the dataframe each time. This is very time consuming and error prone. Here's an example for the first 3 dataframes.
min_rate<- rep(NA_real_, 10) #initialize empty vector
df1$abs_diff<- abs(df1$x-1)
min_rate[1]<- df1$y[which.min(df1$abs_diff)]
df2$abs_diff<- abs(df2$x-1)
min_rate[2]<- df2$y[which.min(df2$abs_diff)]
df3$abs_diff<- abs(df3$x-1)
min_rate[3]<- df3$y[which.min(df3$abs_diff)]
print(min_rate)
#> [1] 29.40269 32.21546 30.75330 NA NA NA NA NA
#> [9] NA NA
#With the function defined, we can run it on each individual dataframe, which is less error prone than copying and pasting but still fairly repetitive.
ExtractFirstMin(mydata = df1) # You can explicitly say "mydata ="
#> [1] 29.40269
ExtractFirstMin(df2) # Or, equivalently, leave the name out and arguments are matched by the order in which you defined them in the function. Since there is just one argument, whatever you supply is assigned to "mydata"
#> [1] 32.21546
ExtractFirstMin(df3)
#> [1] 30.7533
# Rather than manually typing out the call to run the function on each dataframe and then bringing the results together, we can instead use sapply.
# sapply takes a list of inputs and a function as arguments. It then applies the function to every element in the list and returns a vector (i.e. it goes through each dataframe in your list, applies the function to each one individually, and then records the result for each one in a single variable).
sapply(df_list, ExtractFirstMin)
#> [1] 29.40269 32.21546 30.75330 30.12109 30.38361 28.64928 30.45568 29.66190
#> [9] 31.57229 31.33907
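Since the original goal was to find which dataframe has the highest and lowest value near x = 1, one possible follow-up is to give the list elements names so that the sapply result is labelled. This is only a sketch: the three data frames and the names modelA, modelB, and modelC are hypothetical stand-ins for your real data.

```r
# Hypothetical stand-ins for the real data frames
df_list <- list(
  modelA = data.frame(x = c(0.90, 1.01, 1.20), y = c(10, 12, 15)),
  modelB = data.frame(x = c(0.95, 0.99, 1.10), y = c(11, 18, 20)),
  modelC = data.frame(x = c(0.80, 1.02, 1.30), y = c(9, 14, 16))
)

ExtractFirstMin <- function(df) {
  df$y[which.min(abs(df$x - 1))]
}

rates <- sapply(df_list, ExtractFirstMin)  # named numeric vector
rates
#> modelA modelB modelC 
#>     12     18     14 
names(rates)[which.max(rates)]  # data frame with the highest y near x = 1
#> [1] "modelB"
names(rates)[which.min(rates)]  # data frame with the lowest y near x = 1
#> [1] "modelA"
diff(range(rates))              # spread between highest and lowest
#> [1] 6
```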
________________________________
From: rain1290 using aim.com <rain1290 using aim.com>
Sent: Monday, May 15, 2023 4:44 PM
To: Alexander Ilich <ailich using usf.edu>; r-sig-geo using r-project.org <r-sig-geo using r-project.org>
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Hi Alexander and everyone,
I hope that all is well! Just to follow up with this, I recently was able to try the following code that you had kindly previously shared:
ExtractFirstMin<- function(df){
df$abs_diff<- abs(df$x-1)
min_rate<- df$y[which.min(df$abs_diff)]
return(min_rate)
} #Get first y value of closest to x=1
Just to be clear, do I simply replace the "df" in that code with the name of my individual dataframes? For example, here is the name of my 10 dataframes, which are successfully placed in a list (i.e. df_list), as you showed previously:
dataframe1
dataframe2
dataframe3
dataframe4
dataframe5
dataframe6
dataframe7
dataframe8
dataframe9
dataframe10
Thus, using your example above, using the first dataframe listed there, would this become:
ExtractFirstMin<- function(dataframe1){
dataframe1$abs_diff<- abs(dataframe1$x-1)
min_rate<- dataframe1$y[which.min(dataframe1$abs_diff)]
return(min_rate)
} #Get first y value of closest to x=1
df_list<- list(dataframe1, dataframe2, dataframe3, dataframe4, dataframe5, dataframe6, dataframe7, dataframe8, dataframe9, dataframe10)
# Apply function across list
sapply(df_list, ExtractFirstMin)
Am I doing this correctly?
Thanks, again!
-----Original Message-----
From: Alexander Ilich <ailich using usf.edu>
To: rain1290 using aim.com <rain1290 using aim.com>
Sent: Thu, May 11, 2023 1:48 am
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Sure thing. Glad I could help!
________________________________
From: rain1290 using aim.com <rain1290 using aim.com>
Sent: Thursday, May 11, 2023 12:17:12 AM
To: Alexander Ilich <ailich using usf.edu>
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Hi Alexander,
Many thanks for sharing this! It was really helpful!
-----Original Message-----
From: Alexander Ilich <ailich using usf.edu>
To: rain1290 using aim.com <rain1290 using aim.com>
Sent: Wed, May 10, 2023 2:05 pm
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
One way to do this would be to put all your dataframes in a list, turn one of the code implementations I posted earlier into a function, and then use sapply to apply it across all the dataframes.
#Generate data
set.seed(5)
for (i in 1:10) {
assign(x = paste0("df", i),
value = data.frame(x = sort(rnorm(n = 10, mean = 1, sd = 0.1)),
y= rnorm(n = 10, mean = 30, sd = 1)))
} # Create 10 Data Frames
# Define Functions (two versions based on how you want to deal with ties)
ExtractFirstMin<- function(df){
df$abs_diff<- abs(df$x-1)
min_rate<- df$y[which.min(df$abs_diff)]
return(min_rate)
} #Get first y value of closest to x=1
ExtractAvgMin<- function(df){
df$abs_diff<- abs(df$x-1)
min_rate<- mean(df$y[df$abs_diff==min(df$abs_diff)])
return(min_rate)
} #Average all y values that are closest to x=1
# Put all dataframes into a list
df_list<- list(df1,df2,df3,df4,df5,df6,df7,df8,df9,df10)
# Apply function across list
sapply(df_list, ExtractFirstMin)
#> [1] 29.40269 32.21546 30.75330 30.12109 30.38361 28.64928 30.45568 29.66190
#> [9] 31.57229 31.33907
sapply(df_list, ExtractAvgMin)
#> [1] 29.40269 32.21546 30.75330 30.12109 30.38361 28.64928 30.45568 29.66190
#> [9] 31.57229 31.33907
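The two functions give identical results above because the randomly generated data contain no ties. As a small sketch of when they would differ, here is a made-up data frame in which x = 0.5 and x = 1.5 are exactly equally far from 1.

```r
ExtractFirstMin <- function(df) {
  df$abs_diff <- abs(df$x - 1)
  df$y[which.min(df$abs_diff)]
}
ExtractAvgMin <- function(df) {
  df$abs_diff <- abs(df$x - 1)
  mean(df$y[df$abs_diff == min(df$abs_diff)])
}

# Hypothetical data with an exact tie: 0.5 and 1.5 are both 0.5 away from 1
tied <- data.frame(x = c(0, 0.5, 1.5, 2), y = c(1, 10, 20, 30))

first_val <- ExtractFirstMin(tied)  # takes the first of the tied rows
avg_val   <- ExtractAvgMin(tied)    # averages the tied rows
first_val
#> [1] 10
avg_val
#> [1] 15
```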
________________________________
From: rain1290 using aim.com <rain1290 using aim.com>
Sent: Wednesday, May 10, 2023 1:40 PM
To: Alexander Ilich <ailich using usf.edu>; r-sig-geo using r-project.org <r-sig-geo using r-project.org>
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Hi Alexander,
Thank you so much for taking the time to outline these suggestions!
What if I wanted to only isolate the y-value at x = 1.0 across all of my 10 dataframes? That way, I could quickly see what the highest and lowest y-value is at x = 1.0? That said, in reality, not all x values are precisely 1.0 (it can be something like 0.99 to 1.02), but the idea is to target the y-value at x = ~1.0. Is that at all possible?
Thanks, again!
-----Original Message-----
From: Alexander Ilich <ailich using usf.edu>
To: r-sig-geo using r-project.org <r-sig-geo using r-project.org>; rain1290 using aim.com <rain1290 using aim.com>
Sent: Wed, May 10, 2023 10:31 am
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
So using your data but removing x=1, 0.8 and 1.2 would be equally close. Two potential options are to choose the y value corresponding to the first minimum difference (in this case x=0.8, y=39), or to average the y values for all that are equally close (in this case average the y values for x=0.8 and x=1.2). I think the easiest way to do that would be to first calculate a column of the absolute value of the differences between x and 1, and then subset the dataframe to the minimum of that column to extract the y values. Here's a base R and a tidyverse implementation to do that.
#Base R
df<- data.frame(x=c(0,0.2,0.4,0.6,0.8,1.2,1.4),
y= c(0,27,31,32,39,34,25))
df$abs_diff<- abs(df$x-1)
df$y[which.min(df$abs_diff)] #Get first y value of closest to x=1
#> [1] 39
mean(df$y[df$abs_diff==min(df$abs_diff)]) #Average all y values that are closest to x=1
#> [1] 36.5
#tidyverse
rm(list=ls())
library(dplyr)
df<- data.frame(x=c(0,0.2,0.4,0.6,0.8,1.2,1.4),
y= c(0,27,31,32,39,34,25))
df<- df %>% mutate(abs_diff = abs(x-1))
df %>% filter(abs_diff==min(abs_diff)) %>% pull(y) %>% head(1) #Get first y value of closest to x=1
#> [1] 39
df %>% filter(abs_diff==min(abs_diff)) %>% pull(y) %>% mean() #Average all y values that are closest to x=1
#> [1] 36.5
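One caveat worth sketching: comparing distances with == relies on exact floating-point equality. That happens to work for 0.8 and 1.2 here, but other pairs that are mathematically equidistant from 1 (for example 0.9 and 1.1) can differ by rounding error, so the tie is missed. Below is a possible tolerance-based variant; the data values and the tolerance of 1e-8 are assumptions chosen for illustration, not a recommendation for your data.

```r
# Hypothetical values: 0.9 and 1.1 are mathematically equidistant from 1
df <- data.frame(x = c(0.9, 1.1), y = c(39, 34))
abs_diff <- abs(df$x - 1)

abs_diff[1] == abs_diff[2]  # rounding error makes the two distances unequal
#> [1] FALSE

# Exact comparison treats only one row as closest
exact_avg <- mean(df$y[abs_diff == min(abs_diff)])
exact_avg
#> [1] 39

# Tolerance-based comparison treats both rows as tied
tol <- 1e-8  # assumed tolerance; adjust to the precision of your data
near_avg <- mean(df$y[abs(abs_diff - min(abs_diff)) < tol])
near_avg
#> [1] 36.5
```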
________________________________
From: rain1290 using aim.com <rain1290 using aim.com>
Sent: Wednesday, May 10, 2023 8:13 AM
To: Alexander Ilich <ailich using usf.edu>; r-sig-geo using r-project.org <r-sig-geo using r-project.org>
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
Hi Alex and everyone,
My apologies for the confusion and this double message (I just noticed that the example dataset appeared distorted)! Let me try to simplify here again.
My dataframes are structured in the following way: an x column and y column, like this:
[X]
Now, let's say that I want to determine the rate of increase at about x = 1.0, relative to the beginning of the period (i.e. 0 at the beginning). We can see clearly here that the answer would be y = 43. My question is would it be possible to quickly determine the value at around x = 1.0 across the 10 dataframes that I have like this without having to manually check them? The idea is to determine the range of values for y at around x = 1.0 across all dataframes. Note that it's not perfectly x = 1.0 in all dataframes - some could be 0.99 or 1.01.
I hope that this is clearer!
Thanks,
-----Original Message-----
From: Alexander Ilich <ailich using usf.edu>
To: r-sig-geo using r-project.org <r-sig-geo using r-project.org>; rain1290 using aim.com <rain1290 using aim.com>
Sent: Tue, May 9, 2023 2:23 pm
Subject: Re: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
I'm currently having a bit of difficulty following. Rather than using your actual data, perhaps you could include code to generate a smaller dataset with the same structure, with clear definitions of what is contained within each (r faq - How to make a great R reproducible example - Stack Overflow<https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example>). You can design that dataset to be small with a known answer and then describe how you got to that answer, and then others could help determine some code to accomplish that task.
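As a rough sketch of what such a reproducible example might look like (the values and the names small and expected are made up), you could build a tiny data frame where the answer is obvious by eye, state that answer, and share the data with dput() so others can paste it straight into R.

```r
# Tiny hypothetical dataset with the same structure: an x and a y column
small <- data.frame(x = c(0.5, 0.99, 1.5), y = c(10, 20, 30))

# Known answer, worked out by hand: x = 0.99 is closest to 1, so y = 20
expected <- 20

# Candidate code that others can check against the known answer
got <- small$y[which.min(abs(small$x - 1))]
got == expected
#> [1] TRUE

# Prints a text representation of the data that can be pasted into an email
dput(small)
```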
Best Regards,
Alex
________________________________
From: R-sig-Geo <r-sig-geo-bounces using r-project.org> on behalf of rain1290--- via R-sig-Geo <r-sig-geo using r-project.org>
Sent: Tuesday, May 9, 2023 1:01 PM
To: r-sig-geo using r-project.org <r-sig-geo using r-project.org>
Subject: [R-sig-Geo] Finding the highest and lowest rates of increase at specific x value across several time series in R
I would like to attempt to determine the difference between the highest and lowest rates of increase across a series of dataframes at a specified x value. As shown below, the dataframes have basic x and y columns, with emissions values in the x column, and precipitation values in the y column. Among the dataframes, the idea would be to determine the highest and lowest rates of precipitation increase at "approximately" 1 teraton of emissions (TtC) relative to the first value of each time series. For example, I want to figure out which dataframe has the highest increase at 1 TtC, and which dataframe has the lowest increase at 1 TtC. However, I am not sure if there is a way to quickly achieve this. Here are the dataframes that I created, followed by an example of how each dataframe is structured:
#Dataframe objects created:
CanESMRCP8.5PL<-data.frame(get3.teratons, pland20)
IPSLLRRCP8.5PL<-data.frame(get6.teratons, pland21)
IPSLMRRCP8.5PL<-data.frame(get9.teratons, pland22)
IPSLLRBRCP8.5PL<-data.frame(get12.teratons, pland23)
MIROCRCP8.5PL<-data.frame(get15.teratons, pland24)
HadGEMRCP8.5PL<-data.frame(get18.teratons, pland25)
MPILRRCP8.5PL<-data.frame(get21.teratons, pland26)
GFDLGRCP8.5PL<-data.frame(get27.teratons, pland27)
GFDLMRCP8.5PL<-data.frame(get30.teratons, pland28)
#Example of what each of these look like:
>CanESMRCP8.5PL
    get3.teratons   pland20
X1      0.4542249 13.252426
X2      0.4626662  3.766658
X3      0.4715780  2.220986
X4      0.4809204  8.495072
X5      0.4901427 10.206458
X6      0.4993126 10.942797
X7      0.5088599  6.592956
X8      0.5187588  2.435796
X9      0.5286758  2.275836
X10     0.5389284  5.051706
X11     0.5496212  8.313389
X12     0.5600628  9.007722
X13     0.5708608 11.905644
X14     0.5819234  6.126022
X15     0.5926283  9.883264
X16     0.6042306  7.699696
X17     0.6159752  5.614193
X18     0.6274483  6.681527
X19     0.6394011 10.112812
X20     0.6519496  8.721810
X21     0.6646344 10.315931
X22     0.6773436 11.372490
X23     0.6903203  8.662169
X24     0.7036479 10.106109
X25     0.7180955 10.990867
X26     0.7322746 13.491778
X27     0.7459771 17.256650
X28     0.7604589 12.040960
X29     0.7753096 10.638796
X30     0.7898374  7.889500
X31     0.8047258 11.757174
X32     0.8204160 15.060151
X33     0.8359387  9.822078
X34     0.8510721 11.388695
X35     0.8661237 10.271567
X36     0.8815913 13.224285
X37     0.8984146 15.584782
X38     0.9154501  9.320024
X39     0.9324529  9.187128
X40     0.9497379 12.919805
X41     0.9672824 15.190318
X42     0.9854439 12.098606
X43     1.0041460 16.758629
X44     1.0241779 17.435182
X45     1.0451656 15.323428
X46     1.0663605 18.292109
X47     1.0868977 12.625429
X48     1.1079376 17.318583
X49     1.1295719 14.056624
X50     1.1516720 18.239445
X51     1.1736696 16.312087
X52     1.1963065 18.683315
X53     1.2195753 20.364835
X54     1.2425277 14.337167
X55     1.2653873 16.072449
X56     1.2888002 14.870248
X57     1.3126799 18.431717
X58     1.3362459 19.873449
X59     1.3593610 17.278361
X60     1.3833589 18.532887
X61     1.4083234 16.178170
X62     1.4328881 17.689810
X63     1.4572568 21.395131
X64     1.4821021 20.154886
X65     1.5072721 15.655971
X66     1.5325393 21.692028
X67     1.5581797 23.258303
X68     1.5842384 23.802459
X69     1.6108635 15.824673
X70     1.6365393 19.016228
X71     1.6618322 20.957593
X72     1.6876948 19.105363
X73     1.7134712 19.759288
X74     1.7392598 27.315595
X75     1.7652725 24.882263
X76     1.7913807 25.813408
X77     1.8173818 23.658997
X78     1.8434211 24.223432
X79     1.8695911 23.560818
X80     1.8960611 28.057708
X81     1.9228969 26.996265
X82     1.9493552 26.659719
X83     1.9759324 22.723687
X84     2.0026666 30.977267
X85     2.0290137 29.384326
X86     2.0549359 24.840383
X87     2.0811679 26.952620
X88     2.1081763 29.894790
X89     2.1349227 25.224040
X90     2.1613017 27.722623
>IPSLLRRCP8.5PL
    get6.teratons   pland21
X1      0.5300411  8.128827
X2      0.5401701  6.683660
X3      0.5503503 12.344974
X4      0.5607762 11.322411
X5      0.5714146 14.250646
X6      0.5825357 10.013592
X7      0.5937966  9.437394
X8      0.6051673  8.138396
X9      0.6168960  9.767765
X10     0.6290367  8.166579
X11     0.6413864 12.307348
X12     0.6539184 12.623931
X13     0.6667360 11.182448
X14     0.6800060 12.585040
X15     0.6935350 13.408614
X16     0.7071757  9.352335
X17     0.7211951 12.743725
X18     0.7356089 11.625612
X19     0.7502665 10.240418
X20     0.7650959 12.394282
X21     0.7800845 16.963066
X22     0.7953119 16.380090
X23     0.8107459 10.510501
X24     0.8260236 12.645911
X25     0.8414439 14.134851
X26     0.8572960 18.924963
X27     0.8732313 17.849050
X28     0.8892344 10.941533
X29     0.9057380 12.034925
X30     0.9223530 15.897904
X31     0.9391578 19.707692
X32     0.9563358 16.690375
X33     0.9738711 18.098571
X34     0.9916517 16.588447
X35     1.0096934 16.125172
X36     1.0279473 19.108647
X37     1.0463864 16.972994
X38     1.0653421 22.869403
X39     1.0842487 21.228874
X40     1.1035309 25.509754
X41     1.1230403 15.579367
X42     1.1426743 21.259726
X43     1.1626806 26.061262
X44     1.1833831 21.918530
X45     1.2045888 22.369094
X46     1.2262981 21.480456
X47     1.2481395 20.503543
X48     1.2703019 27.717028
X49     1.2929382 26.295449
X50     1.3157745 28.271455
X51     1.3390449 31.595651
X52     1.3626052 26.188018
X53     1.3863833 26.326999
X54     1.4102701 26.902272
X55     1.4343871 25.308764
X56     1.4584666 23.789699
X57     1.4831504 26.916504
X58     1.5080384 32.921638
X59     1.5331210 29.753267
X60     1.5582794 29.567720
X61     1.5832585 31.454097
X62     1.6085002 26.602191
X63     1.6339502 35.873728
X64     1.6594560 34.222654
X65     1.6851070 36.290959
X66     1.7109757 31.623912
X67     1.7368503 31.965520
X68     1.7626750 41.490310
X69     1.7883216 35.645934
X70     1.8141292 35.639422
X71     1.8405670 37.085608
X72     1.8672313 44.812777
X73     1.8939987 40.044602
X74     1.9208222 37.834526
X75     1.9478806 44.497335
X76     1.9750195 39.839740
X77     2.0024118 38.300529
X78     2.0302205 52.116649
X79     2.0581589 59.189047
X80     2.0861536 51.559857
X81     2.1141780 43.305779
X82     2.1421791 47.950074
X83     2.1703249 46.252149
X84     2.1985953 47.536605
X85     2.2266540 49.422466
X86     2.2547762 44.577399
X87     2.2827062 49.720523
X88     2.3102098 47.138244
X89     2.3379090 51.882832
X90     2.3656370 51.413472
Etc...
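One practical detail: the helper functions elsewhere in this thread assume columns named x and y, whereas the dataframes here have columns such as get3.teratons and pland20. A possible way to handle that, sketched below, is to rename the columns by position before applying a function. The miniature data frames, the list names, and the helper IncreaseAtOne are hypothetical, and subtracting the first y value is only needed if the y column is not already expressed relative to the start of the series.

```r
# Hypothetical miniature versions of two of the data frames above
CanESM <- data.frame(get3.teratons = c(0.45, 0.98, 1.004, 1.50),
                     pland20       = c(13, 12, 17, 16))
IPSLLR <- data.frame(get6.teratons = c(0.53, 0.992, 1.01, 1.40),
                     pland21       = c(8, 17, 16, 27))

models <- list(CanESM = CanESM, IPSLLR = IPSLLR)

# Rename columns by position: first = emissions (x), second = precipitation (y)
models <- lapply(models, setNames, c("x", "y"))

# y at the row nearest x = 1, minus the first y of the series
IncreaseAtOne <- function(df) {
  df$y[which.min(abs(df$x - 1))] - df$y[1]
}

increase <- sapply(models, IncreaseAtOne)
increase
#> CanESM IPSLLR 
#>      4      9 
```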
Any help with this would be greatly appreciated!
Thanks,
[[alternative HTML version deleted]]
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo using r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
[EXTERNAL EMAIL] DO NOT CLICK links or attachments unless you recognize the sender and know the content is safe.