[R] New reshape2 question
Jeff Newmiller
jdnewmil at dcn.davis.ca.us
Thu Aug 14 08:19:47 CEST 2014
See below.
On Wed, 13 Aug 2014, Neotropical bat risk assessments wrote:
> Hi all,
>
> Thanks go out to those who provided helpful suggestions last year with a
> similar issue.
>
> I am working with a new data set and trying what I assumed was a simple
> aggregation in reshape2 but is not working. I have a large number of similar
> data sets to run so getting the code correct is important.
>
> I have tried this code line in bold (both plyr and reshape2 are loaded):
>
>> ChenaPond <- read.table("C:/Bat papers in prep/Chile/Data &
> analyses/ChenaPond.txt",header=T,sep="\t",quote="")
I find it most efficient to use the "stringsAsFactors=FALSE" option and
only convert to factor those columns that I know I want to be factors.
In particular, dates and times can be challenging to read in directly as
date/times... I find it most clear to read them in as strings and
convert them using specific conversion statements.
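As a minimal sketch of that approach (the inline text is a made-up two-row stand-in for the real file, and the column names are assumed from the data shown further down):

```r
# Hypothetical stand-in for the tab-delimited file; not the real data
txt <- paste(
  "Species\tLocation\tDate\tTime",
  "Myochi\tChena pond\t5/26/09\t18:38",
  "Lascin\tChena pond\t5/26/09\t18:51",
  sep = "\n"
)

# Read everything as plain strings first
dat <- read.table( text = txt, header = TRUE, sep = "\t", quote = ""
                 , stringsAsFactors = FALSE )
str( dat )

# Convert only the columns that should be factors
dat$Species <- factor( dat$Species )

# Convert date and time strings with an explicit format and time zone
# (the time zone here is an assumption; use the one your data were recorded in)
dat$Dtm <- as.POSIXct( paste( dat$Date, dat$Time )
                     , format = "%m/%d/%y %H:%M", tz = "Etc/GMT+5" )
str( dat )
```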
> dat1<-ChenaPond
>> *res2<-ddply(dat1,.(Location,Species),summarize, Time=sum(Time))*
>
> *Error in Summary.factor(c(3L, 4L, 5L, 15L, 39L, 45L, 18L, 24L, 25L, 26L, :
> sum not meaningful for factors*
>
> Attached is the data. Not sure why it is all factors and when I tried
> changing to double precision the times were corrupted. I recall that R does
> not do well with time values. Do I need a line using chron as well
> beforehand?
R is actually much more specific about time values than, say, Excel. This
may make it appear to be a hassle, but it is capable of considerably more
than Excel with regard to dates and times, with a minimum of additional
work. The hardest part is understanding how our calendar and timezones
actually work.
chron is certainly an option, but I typically use POSIXt so that is what I
am more familiar with. You can read [1] and decide what you would prefer.
Word to the wise: you will probably get into trouble if you convert POSIXt
types to numeric... chron may be more forgiving.
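One way to read that caution, as a sketch with made-up timestamps: converting to numeric leaves you with a bare count of seconds that no longer knows it is a time, whereas difftime keeps the units explicit.

```r
# Two illustrative timestamps (values chosen only for this example)
t1 <- as.POSIXct( "2009-10-15 20:51", tz = "UTC" )
t2 <- as.POSIXct( "2009-10-15 21:00", tz = "UTC" )

# as.numeric() drops the class and time zone, leaving seconds since 1970-01-01
n <- as.numeric( t1 )
class( n )

# difftime() keeps the result as a time interval with stated units
d <- difftime( t2, t1, units = "mins" )
d
```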
> I even tried for several hours looking at the ReshapeGUI package to see what
> I may have been doing incorrectly to no avail.
I am completely baffled why you chose to focus on the reshaping method
rather than following the lead of your error above which pointed to
factors as the problem.
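A small sketch of how that error arises and how to spot it (the values are made up; only the column classes matter here):

```r
# Times read in as a factor, as happens when read.table converts strings
tm <- factor( c( "18:38", "18:51", "19:38" ) )

# sum() refuses to work on a factor; capture the message instead of stopping
msg <- tryCatch( sum( tm ), error = function( e ) conditionMessage( e ) )
print( msg )

# Checking column classes shows which columns are factors
cols <- data.frame( Species = c( "Myochi", "Lascin" )
                  , Time = factor( c( "18:38", "18:51" ) )
                  , stringsAsFactors = FALSE )
classes <- sapply( cols, class )
print( classes )
```

Running str() on the data frame gives the same information in more detail.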
>
> 1. What I need to do to analyze all the data in another program is to
> reformat it so that I have a Species by Time matrix summarized in 5
> minute time blocks. The result needs to be Species as rows, and
> time intervals are arranged chronologically in columns.
Below is one way to proceed. I suggest you step through it one piece at a
time interspersed with appropriate use of the str() function to clarify
what the data looks like at each step. You probably ought to read
?DateTimeClasses and follow links from there as well.
The following statement is the output of the "dput" function, which is
recommended in [2].
dta <- structure(list(
Species = c("Myochi", "Lascin", "Lascin", "Lascin",
"Tadbra", "Lasvar", "Lasvar", "Lasvar", "Lascin", "Tadbra", "Lascin",
"Lascin", "Lasvar", "Lasvar", "Lasvar", "Lasvar", "Lasvar", "Myochi",
"Myochi", "Lascin", "Lasvar", "Lasvar", "Lasvar", "Myochi", "Lasvar",
"Lascin", "Lascin", "Lascin", "Myochi", "Myochi", "Myochi", "Myochi",
"Lascin", "Lasvar", "Lasvar", "Myochi", "Lasvar", "Lasvar", "Lascin",
"Lasvar", "Lasvar", "Lascin", "Lascin", "Tadbra", "Lascin", "Lascin",
"Lascin", "Lascin", "Lascin", "Lasvar", "Lasvar", "Lascin", "Lasvar",
"Tadbra", "Myochi", "Myochi", "Lasvar", "Myochi", "Myochi", "Myochi",
"Lasvar", "Lasvar", "Tadbra", "Lasvar", "Lasvar", "Lasvar", "Tadbra"
), Location = c("Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond",
"Chena pond", "Chena pond", "Chena pond", "Chena pond"),
Date = c("5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09",
"5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09", "5/26/09",
"5/26/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/15/09", "10/15/09", "10/15/09", "10/15/09", "10/15/09",
"10/16/09"),
Time = c("18:38", "18:51", "19:38", "19:39",
"19:47", "20:12", "20:16", "20:56", "21:19", "21:20", "22:47",
"22:56", "20:51", "20:55", "20:56", "20:57", "20:59", "21:00",
"21:26", "21:29", "21:33", "21:34", "21:35", "21:55", "21:56",
"21:59", "22:00", "22:01", "22:03", "22:08", "22:08", "22:09",
"22:17", "22:23", "22:24", "22:26", "22:30", "22:31", "22:42",
"22:42", "22:44", "22:46", "22:49", "22:49", "22:50", "22:51",
"22:53", "22:54", "22:57", "23:01", "23:06", "23:08", "23:09",
"23:14", "23:30", "23:31", "23:33", "23:35", "23:35", "23:38",
"23:39", "23:44", "23:45", "23:47", "23:52", "23:59", "0:00"
)),
.Names = c("Species", "Location", "Date", "Time")
, class = "data.frame", row.names = c(NA, -67L))
library(lubridate)
library(reshape2)
# set time zone to something that doesn't use daylight savings
# this may not be how your data are actually recorded... look
# up ?timezones ... the short answer is you may need to look
# at the names of some files on your system or in your R install
# directory to find out what labels correspond to your data's timezone.
Sys.setenv( TZ="Etc/GMT+5" )
dta$Dtm <- mdy_hm( paste( dta$Date, dta$Time ) )
floor5 <- function( dtm ) {
  # break up the POSIXct (number of seconds since 1/1/1970 GMT)
  dtmlt <- as.POSIXlt( dtm )
  # floor the minutes and seconds to the next lower 5 minutes
  dtmlt$sec <- 0
  dtmlt$min <- 5 * ( dtmlt$min %/% 5 )
  as.POSIXct( dtmlt )
}
dta$Dtm5 <- floor5( dta$Dtm )
# can be done with table
#table( dta$Dtm5, dta$Species )
# I prefer data frames, so reshape2 helps out
dtat <- dcast( dta, Dtm5~Species, fun.aggregate = length, value.var="Dtm5" )
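If you want to check the idea without lubridate or reshape2, here is a base-R sketch on a four-row toy data set (made up for illustration, with UTC assumed as the time zone). It uses the same flooring approach as floor5 above and puts Species in rows and 5-minute intervals in columns.

```r
# Toy data (hypothetical), same column layout as dta above
toy <- data.frame(
    Species = c( "Myochi", "Lascin", "Lascin", "Tadbra" )
  , Date = c( "5/26/09", "5/26/09", "5/26/09", "5/26/09" )
  , Time = c( "18:38", "18:51", "18:54", "19:47" )
  , stringsAsFactors = FALSE
)

# Parse date and time strings with an explicit format
toy$Dtm <- as.POSIXct( paste( toy$Date, toy$Time )
                     , format = "%m/%d/%y %H:%M", tz = "UTC" )

# Floor to the next lower 5 minutes via POSIXlt
lt <- as.POSIXlt( toy$Dtm )
lt$sec <- 0
lt$min <- 5 * ( lt$min %/% 5 )
toy$Dtm5 <- as.POSIXct( lt )

# Count detections: Species in rows, 5-minute intervals in columns
tab <- table( Species = toy$Species
            , Interval = format( toy$Dtm5, "%Y-%m-%d %H:%M" ) )
print( tab )
```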
> 2. Then I need the matrix converted such that each unique Species will
> have proportional abundances of time (0 to 100) so totals for each
> species should be the same (or 100%).
I don't like to do all the work for other people. Is dividing some vectors
by their sums something you need help with?
> What do folks suggest?
> Plyr, Reshape2 or try tables?
Any of these... depending on your preference. You just need to get a grip
on how R handles time. [1]
> Thanks,
>
> Bruce
>
> --
> Bruce W. Miller, PhD.
> Neotropical bat risk assessments
>
> If we lose the bats, we may lose much of the tropical vegetation and the
> lungs of the planet
>
> Using acoustic sampling to map species distributions for >15 years.
>
> Providing Interactive identification keys to the vocal signatures of New
> World Bats
>
> For various project details see:
>
> https://sites.google.com/site/batsoundservices/
>
>
[1] http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf starting on page 29
[2] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k