[R] help subsetting data frame
Ista Zahn
istazahn at gmail.com
Fri Oct 23 00:24:25 CEST 2009
Is this what you want?
df = data.frame('id'=c(1:100),'res'=c(1001:1100))
dfb=df[1:10,]
dfc = df[df$id %in% dfb$id,]
Still not sure, but that's my best guess. Going back to your original
data you can try
dfb = chkPd[chkPd$PN %in% df$PN,]
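A self-contained sketch of the `%in%` approach on a small, hypothetical stand-in for chkPd. The data frames `chk` and `small` and their values are made up, and only the PN column matters. `chk$PN %in% small$PN` is a logical vector as long as `chk$PN`, so it can go straight into the row slot of `[`. The last lines go beyond what the thread asked for and assume you might want the rows back in the order of `small$PN`, which `match()` provides:

```r
# Hypothetical stand-in for chkPd: character IDs, row names not 1..n
chk <- data.frame(PN  = c("1001", "1001_1", "1002_1", "1003", "1003_1"),
                  SEX = c("F", "F", "F", "F", "F"),
                  stringsAsFactors = FALSE)
rownames(chk) <- c("601", "969", "986", "667", "1361")

# Smaller (hypothetical) data frame whose PN values select the rows to keep
small <- data.frame(PN = c("1003", "1001_1", "9999"), stringsAsFactors = FALSE)

# Logical index: TRUE where chk$PN occurs anywhere in small$PN.
# Rows keep chk's order; "9999" simply matches nothing.
kept <- chk[chk$PN %in% small$PN, ]
kept$PN                 # "1001_1" "1003"

# Assumed extra need: rows in small's order. match() returns positions,
# with NA for IDs not found, so drop those before indexing.
pos <- match(small$PN, chk$PN)
in_small_order <- chk[pos[!is.na(pos)], ]
in_small_order$PN       # "1003" "1001_1"
```

Because `%in%` tests membership rather than position, the two data frames can have any number of rows, and IDs with no match are simply dropped.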
Hope it helps,
Ista
On Thu, Oct 22, 2009 at 6:10 PM, Sean MacEachern <sean.maceach at gmail.com> wrote:
> Hi Ista,
>
> I think I'm suffering long dayitis myself. You are probably right. I
> don't use subset that often. I typically use brackets to subset
> dataframes. Essentially what I am trying to do is take my original
> dataframe (chkPd) and subset it using a smaller dataframe with some
> matching PN IDs. They differ in size by only a few hundred rows, so
> subset wouldn't be appropriate here. I'm just struggling to figure out
> what's going wrong in my first example.
> for instance if I try:
>> df = data.frame('id'=c(1,2,3,4),'res'=c(10,10,20,20))
>> dfb=df[1:2]
>> dfc = df[dfb$id,]
>
> I get something along the lines of what I'd expect: my new
> dataframe is a subset of the original based on the matching ids I
> specified in dfb$id. So what is going wrong in my first example?
>
> Cheers,
>
> Sean
>
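Why the second example seems to work: a numeric row index selects rows by position, and in this toy data id happens to equal each row's position. The sketch below reruns the toy example, taking rows with `df[1:2, ]` (without the comma, `df[1:2]` selects the first two columns), and then reorders the rows into a hypothetical `df2`, where positional indexing and matching on id give different answers:

```r
# Toy data from the message above: id happens to equal the row position 1..4
df <- data.frame(id = c(1, 2, 3, 4), res = c(10, 10, 20, 20))
dfb <- df[1:2, ]                    # first two ROWS; df[1:2] would be two columns

by_pos <- df[dfb$id, ]              # numeric index: row positions 1 and 2
by_key <- df[df$id %in% dfb$id, ]   # rows whose id value is 1 or 2
identical(by_pos$id, by_key$id)     # TRUE, but only because id == position

# Hypothetical reordering so that id no longer equals position
df2 <- df[c(4, 3, 2, 1), ]
df2[dfb$id, "id"]                   # 4 3: positions 1 and 2, the wrong rows
df2[df2$id %in% dfb$id, "id"]       # 2 1: the rows whose id is 1 or 2
```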
> On Thu, Oct 22, 2009 at 4:55 PM, Ista Zahn <istazahn at gmail.com> wrote:
>> Hi Sean,
>> Comment in line below.
>>
>> On Thu, Oct 22, 2009 at 5:39 PM, Sean MacEachern <sean.maceach at gmail.com> wrote:
>>> Hi,
>>>
>>> I'm running into a problem subsetting a data frame that I have never
>>> encountered before:
>>>
>>>> dim(chkPd)
>>> [1] 3213 6
>>>
>>>> df = head(chkPd)
>>>> df
>>>          PN     WB   Sire    Dam  MG SEX
>>> 601    1001 715349  61710  61702  67   F
>>> 969  1001_1 511092 616253 615037 168   F
>>> 986  1002_1 511082 616253 623905 168   F
>>> 667    1003 715617  61817  61441  67   F
>>> 1361 1003_1 510711 635246 627321 168   F
>>> 754    1004 715272  62356  61380  67   F
>>>
>>>
>>>> dfb = chkPd[df$PN,]
>>>> dfb
>>>             PN     WB   Sire    Dam  MG  SEX
>>> 1001    2114_1 510944 616294 614865 168    M
>>> NA        <NA>     NA   <NA>   <NA>  NA <NA>
>>> NA.1      <NA>     NA   <NA>   <NA>  NA <NA>
>>> 1003    1130_1 510950 616294 619694 168    F
>>> NA.2      <NA>     NA   <NA>   <NA>  NA <NA>
>>> 1004 2221-SHR2 510952 616294 619694 168    M
>>>
>>>
>>> I'm not sure why I'm getting this behaviour. By subsetting the
>>> original data frame by PN, I seem to be pulling out row numbers,
>>> so I only get results where PN is less than the number of rows
>>> in the original data frame, and of course nothing where PN
>>> has an _ in the id. I have also tried using subset but haven't had any
>>> luck with that either.
>>
>> That is the documented behavior as far as I can tell. See
>>
>> ?"[.data.frame"
>>
>> Maybe my brain is going soft at the end of a long day, but I can't
>> tell what you're trying to do. Can you clarify?
>>
>> -Ista
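The documented behaviour in question: when the row index of `[` on a data frame is a character vector, its values are looked up in the row names, and values with no matching row name give rows of NA (labelled NA, NA.1, NA.2, as in the output above). Since chkPd has row names such as 601 and 969 rather than 1..n, a PN of "1001" fetches the row named 1001, whatever its PN happens to be. A minimal reproduction on a hypothetical four-row data frame, assuming PN is stored as character:

```r
# Hypothetical mini version of chkPd: character PN, non-sequential row names
chk <- data.frame(PN  = c("1001", "1001_1", "1003", "2114_1"),
                  SEX = c("F", "F", "F", "M"),
                  stringsAsFactors = FALSE)
rownames(chk) <- c("601", "969", "1003", "1001")

# A character row index is matched against the row NAMES, not chk$PN
by_rowname <- chk[c("1001", "1001_1", "1003"), ]
by_rowname$PN   # "2114_1" NA "1003": row named 1001, no match, row named 1003

# Matching against the PN column instead
by_pn <- chk[chk$PN %in% c("1001", "1001_1", "1003"), ]
by_pn$PN        # "1001" "1001_1" "1003"
```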
>>
>>>
>>>
>>>>dfb = subset(chkPd, PN==df$PN)
>>> Warning message:
>>> In PN == df$PN :
>>> longer object length is not a multiple of shorter object length
>>>
>>> I wasn't aware that the larger data frame had to be a multiple of
>>> the object I was subsetting by. In any case, I would appreciate any
>>> insight into what I may be doing wrong.
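On the warning: PN == df$PN compares element by element, so R recycles the shorter vector (the six values from head()) along the longer one, and it warns because 3213 is not a multiple of 6. Nothing requires the data frame to be any particular size; the comparison itself is the wrong tool, since it checks position against position rather than membership. A sketch with made-up data (chk has 13 rows, keep holds six IDs in a shuffled order) contrasting == and %in% inside subset():

```r
# Hypothetical stand-in for chkPd with 13 rows and character IDs
chk  <- data.frame(PN = as.character(1001:1013), stringsAsFactors = FALSE)
# Six IDs to keep, deliberately not in the same order as chk
keep <- c("1003", "1001", "1006", "1002", "1005", "1004")

# `==` compares position by position and recycles `keep`; 13 is not a
# multiple of 6, so R warns: "longer object length is not a multiple of
# shorter object length"
eq_rows <- subset(chk, PN == keep)
nrow(eq_rows)    # 1

# `%in%` tests membership: is each PN anywhere in `keep`?
in_rows <- subset(chk, PN %in% keep)
nrow(in_rows)    # 6
```

With ==, only row 5 survives, because it is the one position where the recycled ID happens to line up; with %in%, all six IDs are found regardless of order.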
>>>
>>> Cheers,
>>>
>>> Sean
>>>
>>>
>>>> sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-apple-darwin8.11.1
>>>
>>> locale:
>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] splines stats graphics grDevices utils datasets methods base
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>