[R] flexible approach to subsetting data
David Winsemius
dwinsemius at comcast.net
Tue Jul 23 20:12:21 CEST 2013
On Jul 23, 2013, at 10:49 AM, David Winsemius wrote:
>
> On Jul 23, 2013, at 10:01 AM, Adams, Jean wrote:
>
>> Check out the reshape() function of the reshape package. Here's one of the
>> examples from ?reshape.
>>
>> Jean
>>
>>
>> library(reshape) # No, at least not for the reshape-function
>
> The reshape function is from the 'stats' package, which ships with base R. The 'reshape' and 'reshape2' packages were written (at least in part) because the 'reshape'-function was so difficult to understand.
>
> If you do choose to use the reshape2 package, which is well-respected and often extremely helpful, the function you will want to start with is 'melt'.
>
>
>> long <- reshape(wide, direction="long")
>
> I don't think this example will be particularly helpful since the initial direction is "long" (from "wide") and more input would be needed.
Here's a dataset to experiment with:
df5 <- data.frame(dose.0 = c(40, 50, 60, 50), resp.0 = c(40, 50, 60, 50),
                  dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
                  dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
                  dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)
Notice that with your data you would need to add the ".0" suffix to the first set of column names (the ones that have no numeric suffix).
reshape(df5, direction = "long",
        v.names = c("dose", "resp"),
        varying = list(dose = c(1, 3, 5, 7), resp = c(2, 4, 6, 8))
        )  # succeeds
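
For anyone who wants to check what that call returns, here is a small sketch using the same df5. The second call is my own addition and not part of the suggestion above: it assumes the columns follow the regular "name.number" pattern and lets reshape() split the names at "." itself, so the time values come from the suffixes. The object names long_idx and long_sep are only for illustration.

```r
# Same example data as above
df5 <- data.frame(dose.0 = c(40, 50, 60, 50), resp.0 = c(40, 50, 60, 50),
                  dose.1 = c(1, 2, 1, 2),     resp.1 = c(1, 2, 1, 2) + 3,
                  dose.2 = c(2, 1, 2, 1),     resp.2 = c(1, 2, 1, 2) + 3,
                  dose.3 = c(3, 3, 3, 3),     resp.3 = c(1, 2, 1, 2) + 3)

# Explicit column positions, as in the call above
long_idx <- reshape(df5, direction = "long",
                    v.names = c("dose", "resp"),
                    varying = list(dose = c(1, 3, 5, 7), resp = c(2, 4, 6, 8)))

# Alternative: pass all column names and let reshape() split them at "."
long_sep <- reshape(df5, direction = "long",
                    varying = names(df5), sep = ".")

nrow(long_idx)   # 4 original rows stacked 4 times
str(long_sep)    # inspect the 'time', 'dose', 'resp' and 'id' columns
```

Both versions should stack the dose and resp values in the same order; they differ only in how the time column is labelled.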
So perhaps you could use a similar call (after appending the ".0"s) with:
varying=list(sim=seq(1,810,by=4),
X1= seq(2,810,by=4),
X2= seq(3,810,by=4),
X3= seq(4,810,by=4)
)
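
A sketch of how that might look end to end, on made-up data since the real data frame has not been posted. The names wide, n_imp and k, and the random values, are assumptions for illustration only. The positions are computed from ncol() rather than hard-coded, partly because reshape() needs every vector in 'varying' to have the same length, which requires the column count to be a multiple of the block size (810 is not a multiple of 4, so the real count would need checking).

```r
# Hypothetical stand-in for the real data: 2 simulations x 3 rows each,
# with n_imp imputations stored side by side as sim, X1, X2, X3 blocks
set.seed(1)
n_imp <- 3
k <- 4                                   # columns per imputation block
blocks <- lapply(seq_len(n_imp), function(i)
  data.frame(sim = rep(1:2, each = 3),
             X1 = rnorm(6), X2 = rnorm(6), X3 = rnorm(6)))
wide <- do.call(cbind, blocks)
names(wide) <- make.unique(names(wide))  # sim, X1, X2, X3, sim.1, X1.1, ...

nc <- ncol(wide)
stopifnot(nc %% k == 0)                  # every block must be complete

# Append ".0" to the first (unsuffixed) block of names
names(wide)[1:k] <- paste0(names(wide)[1:k], ".0")

long <- reshape(wide, direction = "long",
                v.names = c("sim", "X1", "X2", "X3"),
                varying = list(sim = seq(1, nc, by = k),
                               X1  = seq(2, nc, by = k),
                               X2  = seq(3, nc, by = k),
                               X3  = seq(4, nc, by = k)),
                timevar = "m")           # m = imputation number, 1..n_imp

long$diff <- long$X3 - long$X1           # the kind of calculation asked about
head(long)
```

Changing the number of imputations then only changes ncol(wide); nothing in the call needs to be re-typed.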
>
>
>> wide
>> long
>>
>>
>>
>> On Tue, Jul 23, 2013 at 9:35 AM, Andrea Lamont <alamont082 at gmail.com> wrote:
>>
>>> Hello:
>>>
>>> I am running a simulation study and am stuck with a subsetting problem.
>>>
>>> Here is the basic issue:
>>> I generated data and am running a simulation that uses multiple imputation.
>>> For each generated dataset, I used multiple imputation. The resultant
>>> dataset is in wide form where each imputation is recorded as a separate
>>> column (though the different simulations are stacked). Here is an example
>>> of what it looks like:
>>>
>>> sim X1 X2 X3 sim.1 X1.1 X2.1 X3.1
>
>>> 1 # # # # # # #
>>> 1 # # # # # # #
>>> 1 # # # # # # #
>>> 2 # # # # # # #
>>> 2 # # # # # # #
>>> 2 # # # # # # #
>>>
>>> sim refers to the simulated/generated dataset. X1-X3 are the values for the
>>> first imputed dataset, X1.1-X3.1 are the values for the second imputed
>>> dataset.
>>>
>>> The problem is that I want the data to be in long format, like this:
>>>
>>> sim m X1 X2 X3
>>> 1 1 # # #
>>> 1 2 # # #
>>> 2 1 # # #
>>> 2 2 # # #
>>>
>>> where m is the imputation number.
>>> This will allow me to do cleaner calculations (e.g. X3-X1).
>>>
>>> I know I can subset the data manually - e.g. [,1:10] and save this to
>>> separate datasets then rbind; however, I'm looking for a more flexible
>>> approach to do this. This manual approach would be quite tedious as the number
>>> of imputations (and therefore the number of columns) increases (with only 10
>>> imputations, there are roughly 810 columns). Also, I would like to
>>> avoid having to recode each time I change the number of imputations.
>>>
>>> The same is true for the reshape function, which would require naming
>>> a huge number of columns and edits each time 'm' changes.
>
> If the columns are named regularly, then 'reshape' will attempt to split properly without an explicit naming. Details and a better description of the problem might allow more specific answers to emerge. The fact that the first instances have no numeric indicators may be a problem for the algorithm.
>
> Why not post dput(head( dfrm[ ,1:12]))
>
> --
> David.
>
>>>
>>>
>>> Is there a flexible way to approach this? I'm inclined to use a for loop,
>>> but I know that 1) this is generally inefficient and 2) I am having trouble
>>> with the coding regardless.
>>>
>>> Any suggestions are appreciated.
>>>
>>> Thanks,
>>> Andrea
>>>
David Winsemius
Alameda, CA, USA