[R] expanding a presence only dataset into presence/absence

arun smartpink111 at yahoo.com
Mon Apr 29 22:38:41 CEST 2013



Hi,
Check whether this is what you wanted.  I am not sure that the "Hopeful outcome" includes all the possible combinations.

dat1<- read.csv("Matthewdat.csv",sep=",",header=TRUE,stringsAsFactors=FALSE)
dat1
#                  Species CallingIndex Site      Date
#1     Pseudacris crucifer            2 3608 3/31/2001
#2        Anaxyrus fowleri            2 3638 4/13/2001
#3     Pseudacris crucifer            3 3641 3/23/2001
#4        Pseudacris kalmi            1 3641 3/23/2001
#5 Lithobates catesbeianus            1 3641 4/27/2001
#6     Pseudacris crucifer            2 3641 4/27/2001
#7     Pseudacris crucifer            3 3663  4/5/2001
#8     Pseudacris crucifer            2 3663  5/2/2001
#9    Lithobates clamitans            1 3663  6/6/2001



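# Flag every observed row as present.
dat1New<-do.call(rbind,lapply(split(dat1,dat1$Site), function(x) {x$Present<-1; x}))
row.names(dat1New)<-1:nrow(dat1New)
# For each site, cross only the species and dates recorded at that site.
dat2<-do.call(rbind,lapply(split(dat1,dat1$Site),function(x) expand.grid(unique(x$Species),unique(x$Site),unique(x$Date))))
row.names(dat2)<- 1:nrow(dat2)
colnames(dat2)<- colnames(dat1)[c(1,3,4)]
# Full outer join; combinations that were never observed come back as NA.
res<-merge(dat1New,dat2,by=c("Species","Site","Date"),all=TRUE)
# Absences get 0 for both CallingIndex and Present.
res[is.na(res)]<-0
res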
#                   Species Site      Date CallingIndex Present
#1         Anaxyrus fowleri 3638 4/13/2001            2       1
#2  Lithobates catesbeianus 3641 3/23/2001            0       0
#3  Lithobates catesbeianus 3641 4/27/2001            1       1
#4     Lithobates clamitans 3663  4/5/2001            0       0
#5     Lithobates clamitans 3663  5/2/2001            0       0
#6     Lithobates clamitans 3663  6/6/2001            1       1
#7      Pseudacris crucifer 3608 3/31/2001            2       1
#8      Pseudacris crucifer 3641 3/23/2001            3       1
#9      Pseudacris crucifer 3641 4/27/2001            2       1
#10     Pseudacris crucifer 3663  4/5/2001            3       1
#11     Pseudacris crucifer 3663  5/2/2001            2       1
#12     Pseudacris crucifer 3663  6/6/2001            0       0
#13        Pseudacris kalmi 3641 3/23/2001            1       1
#14        Pseudacris kalmi 3641 4/27/2001            0       0
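
For what it's worth, the same per-site expansion can be sketched a little more compactly. This is only a sketch under a few assumptions: the data frame retypes the nine rows printed above instead of reading Matthewdat.csv; `expand_site`, `combos`, and `res2` are names made up here for illustration; each Species/Site/Date combination is assumed to occur at most once in the input; and an observed CallingIndex is assumed to be at least 1 (as described in the message below), so Present can be derived from it. The dates are also converted with as.Date() so that rows sort chronologically instead of as text.

```r
# Example data: the nine rows printed above, typed in by hand (assumption:
# this stands in for the attached CSV).
dat1 <- data.frame(
  Species = c("Pseudacris crucifer", "Anaxyrus fowleri", "Pseudacris crucifer",
              "Pseudacris kalmi", "Lithobates catesbeianus", "Pseudacris crucifer",
              "Pseudacris crucifer", "Pseudacris crucifer", "Lithobates clamitans"),
  CallingIndex = c(2, 2, 3, 1, 1, 2, 3, 2, 1),
  Site = c(3608, 3638, 3641, 3641, 3641, 3641, 3663, 3663, 3663),
  Date = c("3/31/2001", "4/13/2001", "3/23/2001", "3/23/2001", "4/27/2001",
           "4/27/2001", "4/5/2001", "5/2/2001", "6/6/2001"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cross the species and dates seen at ONE site.
expand_site <- function(x) {
  expand.grid(Species = unique(x$Species),
              Site    = unique(x$Site),
              Date    = unique(x$Date),
              stringsAsFactors = FALSE)
}

# Build one grid per site and stack them.
combos <- do.call(rbind, lapply(split(dat1, dat1$Site), expand_site))

# Left join the observations onto the grid; unobserved combinations get NA.
res2 <- merge(combos, dat1, by = c("Species", "Site", "Date"), all.x = TRUE)

# Absences: no calling index was recorded, so set it to 0 and derive Present.
res2$CallingIndex[is.na(res2$CallingIndex)] <- 0
res2$Present <- as.integer(res2$CallingIndex > 0)

# Sort chronologically within site (dates are month/day/year strings).
res2$Date <- as.Date(res2$Date, format = "%m/%d/%Y")
res2 <- res2[order(res2$Site, res2$Date, res2$Species), ]
row.names(res2) <- NULL
res2
```

Because the grid is built separately for each site, a species never recorded at a site (Anaxyrus fowleri at 3608, say) is never added there. The same pattern should carry over to the full dataset, although I have not tried it on 250,000+ rows.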


A.K.



________________________________
From: Matthew Venesky <mvenesky at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Monday, April 29, 2013 3:54 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence



Arun,

Thanks again for your time on this. We are getting very close but are not quite there. The problem is that I only gave you a very simple example, because I didn't want to bog down any of the readers of the blog. If you have any interest or time, I was wondering if you could consider the full example and some actual data (attached CSV).

As you'll see, there is an additional column titled "CallingIndex", which is an estimate of the species abundance (range of 1-3). If they were present, they were given a value that ranged from 1-3; if they were absent, they were not given any value. Editing your code to reflect this wasn't a problem.

However, what I didn't explain in enough detail to you is the specific contexts when we want to add zeros to the data. Essentially, we want to nest species within site and date and add zeros accordingly. If a species is never found at a site, we do not want to make any adjustments to the data. For example, Anaxyrus fowleri was not found at site 3608, so we do not want the code to add a row with Anaxyrus fowleri to site 3608 (in your code, it would add this). What we do want, however, is to add a zero for a species that was found on one date at a site but never found again on other dates. For example, Lithobates clamitans was found at site 3663 on 6/6/2001 but not observed on the other 2 sampling dates, so we want to assign a calling index of 0 for Lithobates clamitans on sampling date 4/5/2001 and 5/2/2001 for site 3663 (and also make the appropriate addition for Pseudacris crucifer on the appropriate sampling dates).

You should be able to visualize what I am looking to do in the CSV file attached to this email. 

Does this make sense? Do you know of any code to do this task? 








--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 2:11 PM, arun <smartpink111 at yahoo.com> wrote:


>
>Hi Matthew,
>No problem.
>Regards,
>
>Arun
>________________________________
>From: Matthew Venesky <mvenesky at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, April 29, 2013 2:09 PM
>
>Subject: Re: [R] expanding a presence only dataset into presence/absence
>
>
>
>This, my friend, is a stroke of genius.
>
>I'll give it a try on the real data and I will keep you posted.
>
>Many, many, thanks.
>
>On Mon, Apr 29, 2013 at 2:05 PM, arun <smartpink111 at yahoo.com> wrote:
>
>
>>
>>
>>I am sorry.  I forgot to update the code:
>>
>>dat1<- read.table(text="
>>
>>Species Site Date
>>a 1 1
>>b 1 1
>>b 1 2
>>c 1 3
>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>dat1$Present<- 1
>>dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
>>colnames(dat2)<- colnames(dat1)[-4] #changed here
>>res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
>>res[is.na(res)]<- 0
>>res<-res[order(res$Date),]
>>row.names(res)<- 1:nrow(res)
>>res
>>#  Species Site Date Present
>>#1       a    1    1       1
>>#2       b    1    1       1
>>#3       c    1    1       0
>>#4       a    1    2       0
>>#5       b    1    2       1
>>#6       c    1    2       0
>>#7       a    1    3       0
>>#8       b    1    3       0
>>#9       c    1    3       1
>>
>>A.K.
>>
>>
>>
>>________________________________
>>From: Matthew Venesky <mvenesky at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Sent: Monday, April 29, 2013 1:58 PM
>>
>>Subject: Re: [R] expanding a presence only dataset into presence/absence
>>
>>
>>
>>The output that you prepared (for Site 1) looks good... however, I can't get that code to work. I get the following error:
>>
>>> dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames(dat2)<- colnames(dat1)
>>Error: unexpected symbol in "dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames"
>>
>>On Mon, Apr 29, 2013 at 1:44 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>>Hi Matthew,
>>>
>>>So, do you think the output I gave is different from what you expected?
>>>Thanks,
>>>Arun
>>>
>>>
>>>
>>>
>>>
>>>
>>>________________________________
>>>From: Matthew Venesky <mvenesky at gmail.com>
>>>To: arun <smartpink111 at yahoo.com>
>>>Sent: Monday, April 29, 2013 1:15 PM
>>>Subject: Re: [R] expanding a presence only dataset into presence/absence
>>>
>>>
>>>
>>>
>>>I see what you are confused about. 
>>>
>>>I'm sorry. I gave extra sites as examples in my table called "Desired Data" such that there are 3 sites in the "Desired Data" and only 1 site in the "My current data". Ignore sites 2 and 3; you should see what I am trying to do using only site 1.
>>>
>>>On Mon, Apr 29, 2013 at 1:11 PM, Matthew Venesky <mvenesky at gmail.com> wrote:
>>>
>>>>That is part of the difficulty. If Species C was present only on Date 3, we need to have the code manually add Species C as absent (i.e., assign it a value of 0) at that site on the previous sampling dates.
>>>>
>>>>
>>>>Or, is there something else that is confusing you that I am not explaining?
>>>>
>>>>On Mon, Apr 29, 2013 at 12:47 PM, arun <smartpink111 at yahoo.com> wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>Your output dataset is a bit confusing, as it contains Sites that were not in the input.
>>>>>Using your input dataset, I am getting this:
>>>>>
>>>>>
>>>>>dat1<- read.table(text="
>>>>>
>>>>>Species Site Date
>>>>>a 1 1
>>>>>b 1 1
>>>>>b 1 2
>>>>>c 1 3
>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>>>dat1$Present<- 1
>>>>>dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
>>>>> colnames(dat2)<- colnames(dat1)
>>>>>res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
>>>>>res[is.na(res)]<- 0
>>>>> res<-res[order(res$Date),]
>>>>> res
>>>>>#  Species Site Date Present
>>>>>#1       a    1    1       1
>>>>>#4       b    1    1       1
>>>>>#7       c    1    1       0
>>>>>#2       a    1    2       0
>>>>>#5       b    1    2       1
>>>>>#8       c    1    2       0
>>>>>#3       a    1    3       0
>>>>>#6       b    1    3       0
>>>>>#9       c    1    3       1
>>>>>A.K.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>----- Original Message -----
>>>>>From: Matthew Venesky <mvenesky at gmail.com>
>>>>>To: r-help at r-project.org
>>>>>Cc:
>>>>>Sent: Monday, April 29, 2013 11:12 AM
>>>>>Subject: [R] expanding a presence only dataset into presence/absence
>>>>>
>>>>>Hello,
>>>>>
>>>>>I'm working with a very large dataset (250,000+ lines in its current form)
>>>>>that includes presence only data on various species (which is nested within
>>>>>different sites and sampling dates). I need to convert this into a dataset
>>>>>with presence/absence for each species. For example, I would like to expand
>>>>>"My current data" to "Desired data":
>>>>>
>>>>>My current data
>>>>>
>>>>>Species Site Date
>>>>>a 1 1
>>>>>b 1 1
>>>>>b 1 2
>>>>>c 1 3
>>>>>
>>>>>Desired data
>>>>>
>>>>>Species Present Site Date
>>>>>a 1 1 1
>>>>>b 1 1 1
>>>>>c 0 1 1
>>>>>a 0 2 2
>>>>>b 1 2 2
>>>>>c 0 2 2
>>>>>a 0 3 3
>>>>>b 0 3 3
>>>>>c 1 3 3
>>>>>
>>>>>I've scoured the web, including Rseek and haven't found a resolution (and
>>>>>note that a similar question was asked sometime in 2011 without an answer).
>>>>>Does anyone have any thoughts? Thank you in advance.