[R] Correct subsetting in R
peter dalgaard
pdalgd at gmail.com
Thu Nov 2 12:08:31 CET 2017
> On 1 Nov 2017, at 18:03 , Elahe chalabi via R-help <r-help at r-project.org> wrote:
>
> But row.names() cannot give me the IDs
>
Is "training" extracted from "data" using standard data frame indexing? If so, data[row.names(training), "ID"] should give you the relevant values.
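A minimal sketch of that first case, using a small made-up data frame (the toy values and the choice of rows are assumptions for illustration, not the poster's data). Subsetting with `[` keeps the original row names, so they can be used to look the IDs up again:

```r
# Toy stand-in for the poster's "data" (values invented for illustration)
data <- data.frame(ID = 101:106,
                   alright = c(1, 0, 0, 0, 0, 2),
                   bad     = c(1, 0, 0, 1, 0, 0))

# Assumed extraction: standard indexing by row, dropping the ID column
rows <- c(2, 5, 6)
training <- data[rows, -1]

# Row names survive the subsetting, so they index back into data
row.names(training)                      # "2" "5" "6"
ids <- data[row.names(training), "ID"]
ids                                      # 102 105 106
```

This only works while the row names of training are the ones inherited from data; if they were reset at any point (for example with row.names(training) <- NULL), the lookup no longer applies.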
If not, then you are in trouble because you cannot tell the difference between two IDs that have identical responses in columns 2:608. You might proceed with something like
signature1 <- do.call("paste", data[-1]) # one string per row, ID column excluded
any(duplicated(signature1)) # if TRUE you're not quite happy because two or more IDs are indistinguishable.
signature2 <- do.call("paste", training) # same kind of signature for the rows of training
m <- match(signature2, signature1)
any(duplicated(m)) # ouch if TRUE... will require more thought
any(is.na(m)) # even more ouch, if TRUE...
data$ID[m]
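
A small worked sketch of the signature idea on invented data, assuming that training holds a subset of the rows of data without the ID column and that the signatures are built from the shared non-ID columns only (the toy values below are not the poster's):

```r
# Toy data (invented): rows 2 and 4 have identical responses
data <- data.frame(ID = 1:5,
                   alright = c(1L, 0L, 2L, 0L, 1L),
                   bad     = c(1L, 0L, 0L, 0L, 0L))

# Assumed: training holds some rows of data without ID, stored as numeric
training <- data.frame(alright = c(2, 1, 1),
                       bad     = c(0, 0, 1))

# One string per row, built from the non-ID columns only
signature1 <- do.call("paste", data[-1])
signature2 <- do.call("paste", training)

any(duplicated(signature1))   # TRUE here: IDs 2 and 4 are indistinguishable
m <- match(signature2, signature1)
any(is.na(m))                 # FALSE: every training row was found in data
any(duplicated(m))            # FALSE: no two training rows hit the same data row
ids <- data$ID[m]
ids                           # 3 5 1
```

Note that match() returns only the first matching position, so a training row whose signature is duplicated in data would silently get the first of the candidate IDs. Because the columns are passed to paste() as named arguments, a column called sep or collapse would also need renaming first.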
-pd
>
> On Wednesday, November 1, 2017 9:45 AM, David Wolfskill <r at catwhisker.org> wrote:
>
>
>
> On Wed, Nov 01, 2017 at 04:13:42PM +0000, Elahe chalabi via R-help wrote:
>
>> Hi all,
>> I have two data frames, one of which does not have the ID column:
>>
>>> str(data)
>> 'data.frame': 499 obs. of 608 variables:
>> $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
>> $ alright : int 1 0 0 0 0 0 0 1 2 1 ...
>> $ bad : int 1 0 0 0 0 0 0 0 0 0 ...
>> $ boy : int 1 2 1 1 0 2 2 4 2 1 ...
>> $ cooki : int 1 2 2 1 0 1 1 4 2 3 ...
>> $ curtain : int 1 0 0 0 0 2 0 2 0 0 ...
>> $ dish : int 2 1 0 1 0 0 1 2 2 2 ...
>> $ doesnt : int 1 0 0 0 0 0 0 0 1 0 ...
>> $ dont : int 2 1 4 2 0 0 2 1 2 0 ...
>> $ fall : int 3 1 0 0 1 0 1 2 3 2 ...
>> $ fell : int 1 0 0 0 0 0 0 0 0 0 ...
>>
>> and the other one is:
>>
>>> str(training)
>> 'data.frame': 375 obs. of 607 variables:
>> $ alright : num 1 0 0 0 1 2 1 0 0 0 ...
>> $ bad : num 1 0 0 0 0 0 0 0 0 0 ...
>> $ boy : num 1 1 2 2 4 2 1 0 1 0 ...
>> $ cooki : num 1 1 1 1 4 2 3 1 2 2 ...
>> $ curtain : num 1 0 2 0 2 0 0 0 0 0 ...
>> $ dish : num 2 1 0 1 2 2 2 1 4 1 ...
>> $ doesnt : num 1 0 0 0 0 1 0 0 0 0 ...
>> $ dont : num 2 2 0 2 1 2 0 0 1 0 ...
>> $ fall : num 3 0 0 1 2 3 2 0 2 0 ...
>> $ fell : num 1 0 0 0 0 0 0 0 0 0 ...
>> Does anyone know how I should get the IDs of training from data?
>> Thanks for any help!
>> Elahe
>> ....
>
> row.names() appears to be what is wanted.
>
> Peace,
> david
> --
> David H. Wolfskill r at catwhisker.org
> Unsubstantiated claims of "Fake News" are evidence that the claimant lies again.
>
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com