[R] reshaping data
Mia Bengtsson
miamynta at gmail.com
Fri May 21 19:48:21 CEST 2010
Yes, that works beautifully on both the test dataset and my real dataset. This was exactly what I was looking for. Thank you!
/ Mia
On May 21, 2010, at 6:10 PM, William Dunlap wrote:
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Mia Bengtsson
>> Sent: Friday, May 21, 2010 3:39 AM
>> To: Dennis Murphy; Henrique Dallazuanna
>> Cc: r-help at r-project.org
>> Subject: Re: [R] reshaping data
>>
>> Thank you Dennis and Henrique for your help!
>>
>> Both solutions work! I just need to find a way of removing
>> the empty "cells" from the final "long" dataframe since they
>> are not NAs.
>>
>> Maybe there is an easier way of doing this if the data is not
>> treated as a dataframe? The original data file that is
>> derived from another program (mothur) is a textfile with the
>> following format:
>>
>> red \t A,B,C
>> green \t D
>> blue \t E,F
>>
>> The first column "species" is separated from the
>> "sequences"(A, B, C...) with tab, and then the "sequences"
>> are separated from each other with commas.
>>
>> I imported into R as what I thought was a dataframe using:
>>
>> test1<-readLines("path/test")
>> test2<-gsub(pattern="\t", replacement=",", x=test1)
>> test3<-textConnection(test2)
>> test.df<-read.csv(test3, header=F)
>>
>> Should I rather have imported it as something else if I want
>> to reshape it into a list as described previously?
>
> Does the following do what you want, where my "txt" should
> resemble the output of your test1, the output of
> readLines("path/test")?
>
> txt <- c("red \t A,B,C", "green \t D", "blue \t E,F")
> f <- function(textLines) {
>     # Split each line at the tab into species and comma-separated letters
>     tmp <- strsplit(textLines, " *\t *")
>     letters <- strsplit(vapply(tmp, FUN = `[`, 2, FUN.VALUE = ""), ",")
>     # Number of letters per species, used to repeat each species name
>     numLetters <- vapply(letters, FUN = length, FUN.VALUE = 0L)
>     data.frame(Species = rep(vapply(tmp, FUN = `[`, 1, FUN.VALUE = ""),
>                              numLetters),
>                Letter = unlist(letters))
> }
> f(txt)
>   Species Letter
> 1     red      A
> 2     red      B
> 3     red      C
> 4   green      D
> 5    blue      E
> 6    blue      F
>
> vapply() is new in R 2.11.? and is like sapply but lets
> you specify what the return value of FUN is expected to
> be. Thus it gives you some error checking, saves some
> time over sapply, and works nicely when the length of the
> input is 0. If you don't have 2.11, replace vapply with sapply
> and remove the FUN.VALUE argument.
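
The difference on zero-length input can be seen in a small sketch. The helper name second_field is hypothetical and only wraps the same vapply() call used in f() above; this is an illustration, not part of Bill's solution.

```r
# Hypothetical helper (not from the thread): second element of each split line.
second_field <- function(parts) vapply(parts, FUN = `[`, 2, FUN.VALUE = "")

empty <- list()                   # e.g. what strsplit() gives for an empty file
s <- sapply(empty, `[`, 2)        # sapply falls back to an empty list
v <- second_field(empty)          # vapply keeps the declared type: character(0)

# vapply also checks the type of each result; length() returns an integer,
# so declaring a character FUN.VALUE raises an error instead of passing silently.
type_check <- tryCatch(vapply(list(1:3), length, FUN.VALUE = ""),
                       error = function(e) "error")
```

Here s is a list while v is a character vector, so code downstream of vapply() does not need a special case for empty input.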
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>>
>> Thanks a million!
>>
>> / Mia Bengtsson
>>
>>
>> On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:
>>
>>> Hi:
>>>
>>>
>>> On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson
>> <mia.bengtsson at bio.uib.no> wrote:
>>> Hello,
>>>
>>> I am a relatively new R-user who has a lot to learn. I have
>> a large dataset that is in the following dataframe format:
>>>
>>> red A B C
>>> green D
>>> blue E F
>>>
>>> This isn't a data frame in R - if it were, it would have NA
>> (or at least ""/" " padding) at the end of each row.
>>> Data frames are not ragged arrays. To have this type of
>> structure in R, the data would have to be in a list.
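
As a rough sketch of the list representation described here, assuming the example data from the original post (the object names obs and long are made up for illustration), a named list can be turned into the long layout with base R's stack():

```r
# Ragged data held as a named list: one character vector per species.
obs <- list(red = c("A", "B", "C"), green = "D", blue = c("E", "F"))

# stack() returns a data frame with columns 'values' and 'ind'.
long <- stack(obs)

# Rename the columns to match the layout asked for in the original post.
names(long) <- c("observation", "species")
long
```

Because no padding is ever introduced, there are no empty cells to remove afterwards.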
>>>
>>> This matters because Henrique's solution with reshape()
>> assumes a data frame as input. A similar solution
>>> would be to use melt() in the reshape package, something like
>>>
>>> library(reshape)
>>> longdf <- melt(yourdf, id.vars = 'species')
>>> longdf
>>>
>>> If you have NA padding, the way to get rid of them in the
>> reshaped data frame is (with the above approach)
>>>
>>> longdf[!is.na(longdf$value), names(longdf) != "variable"]
>>>
>>> If the padding is with blanks, then Henrique's solution
>> works here, too.
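
For padding with empty strings rather than NA, a base-R sketch along these lines may help. It assumes a wide data frame like the one read.csv(header = FALSE) would produce; the column names species, V2, V3 and V4 are assumptions for illustration, and the reshaping is done by hand instead of with melt() to avoid a package dependency.

```r
# Assumed wide layout with "" padding (column names are illustrative only).
wide <- data.frame(species = c("red", "green", "blue"),
                   V2 = c("A", "D", "E"),
                   V3 = c("B", "",  "F"),
                   V4 = c("C", "",  ""),
                   stringsAsFactors = FALSE)

# Go long: unlist() walks the value columns one after another, so the
# species vector is repeated once per value column.
long <- data.frame(species = rep(wide$species, times = ncol(wide) - 1),
                   value = unlist(wide[-1], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Drop cells that are empty or contain only whitespace.
long <- long[nzchar(trimws(long$value)), ]

# Restore the original species order and reset the row names.
long <- long[order(match(long$species, wide$species)), ]
rownames(long) <- NULL
long
```

The same nzchar() filter could be applied to the value column of a melted data frame.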
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> Where red, green and blue are "species" names and A, B and
>> C are observations (corresponding to DNA sequences). Each
>> observation can only belong to one species. I would like to
>> list the observations in one column, with the species they
>> belong to in the next. Like this:
>>>
>>> A red
>>> B red
>>> C red
>>> D green
>>> E blue
>>> F blue
>>>
>>> I have tried using reshape() and stack() but I cannot get
>> my head around it. Any help is highly appreciated!
>>>
>>> Thanks in advance,
>>> __________________________________
>>>
>>> Mia Bengtsson, PhD-student
>>> Department of Biology
>>> University of Bergen
>>> +47 55584715
>>> +47 97413634
>>> mia.bengtsson at bio.uib.no
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>