[R] how to find number of unique rows for combination of r columns
Gerrit Eichner
gerrit.eichner using math.uni-giessen.de
Fri Nov 8 15:50:59 CET 2019
Hi, Ana,
doesn't
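# Keep only the three key columns, then drop duplicate rows
udt <- unique(dt[, c("chr", "pos", "gene_id")])
# Number of distinct (chr, pos, gene_id) combinations
nrow(udt)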
get close to what you want?
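For a table of this size, and assuming dt is a data.table (the "1:" row labels in the quoted output suggest so, but that is a guess on my part), a sketch like the following might avoid building the intermediate three-column copy. The toy data and the names key_cols, n_unique and n_unique_alt are made up for illustration; uniqueN() and the by argument of unique() come from the data.table package.

```r
library(data.table)

# Toy stand-in for the poster's table (values are illustrative only)
dt <- data.table(
  chr     = c("chr1", "chr1", "chr1", "chr2"),
  pos     = c(54490L, 54490L, 58814L, 54490L),
  gene_id = c("ENSG00000227232", "ENSG00000227232",
              "ENSG00000227232", "ENSG00000227232"),
  pval_nominal = c(0.61, 0.30, 0.44, 0.32)
)

# Columns whose combinations should be counted
key_cols <- c("chr", "pos", "gene_id")

# Count distinct combinations directly, without pasting the columns together
n_unique <- uniqueN(dt, by = key_cols)

# Same count, but this variant materialises the de-duplicated rows first
n_unique_alt <- nrow(unique(dt, by = key_cols))
```

In the toy data the first two rows share the same chr, pos and gene_id, so both counts treat them as one combination.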
Hth -- Gerrit
---------------------------------------------------------------------
Dr. Gerrit Eichner                         Mathematical Institute, Room 212
gerrit.eichner using math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104                   Arndtstr. 2, 35392 Giessen, Germany
http://www.uni-giessen.de/eichner
---------------------------------------------------------------------
On 08.11.2019 at 15:38, Ana Marija wrote:
> Hello,
>
> I have a data frame like this:
>
>> head(dt,20)
>      chr    pos         gene_id pval_nominal  pval_ret       wl      wr
>  1: chr1  54490 ENSG00000227232    0.6084950 0.7837780 31.62278 21.2838
>  2: chr1  58814 ENSG00000227232    0.2952110 0.8975820 31.62278 21.2838
>  3: chr1  60351 ENSG00000227232    0.4397880 0.8679590 31.62278 21.2838
>  4: chr1  61920 ENSG00000227232    0.3195280 0.6018090 31.62278 21.2838
>  5: chr1  63671 ENSG00000227232    0.2377390 0.9880390 31.62278 21.2838
>  6: chr1  64931 ENSG00000227232    0.2766790 0.9070370 31.62278 21.2838
>  7: chr1  81587 ENSG00000227232    0.6057930 0.6167630 31.62278 21.2838
>  8: chr1 115746 ENSG00000227232    0.4078770 0.7799110 31.62278 21.2838
>  9: chr1 135203 ENSG00000227232    0.4078770 0.9299130 31.62278 21.2838
> 10: chr1 138593 ENSG00000227232    0.8464560 0.5696060 31.62278 21.2838
>
> it is very big,
>> dim(dt)
> [1] 73719122 8
>
> To count the number of unique rows for all 3 columns (chr, pos and gene_id),
> I could just join those 3 columns and then count. But how would I find
> the number of unique rows for these 3 columns without joining them?
>
> Thanks
> Ana