[BioC] Problem with removing duplicated probes of datasets without annotation
Wolfgang Huber
whuber at embl.de
Sun Jun 22 10:49:31 CEST 2014
Kaj
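this is a general R question rather than a Bioconductor-specific one. Code like the following would do the job:
# x: a data.frame with columns ‘probeid’ and ‘pvalue’
s = split( seq_len(nrow(x)), x$probeid)
uniqueids = sapply( s, function(i) i[which.min(x$pvalue[i])] )
Despite its name, ‘uniqueids’ holds row indices: for each probe ID, the index of the row of x with the smallest p-value. So x[uniqueids, ] is the de-duplicated table.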
You can replace the ‘which.min(…)’ expression with any other selection criterion, for example ‘which.max’ applied to a per-probe variance.
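
As a minimal sketch of both selection rules asked about (lowest p-value, or highest variance), here is the same idea on a toy table. The probe IDs, p-values and the ‘variance’ column are invented for illustration, and ‘vapply’ is used in place of ‘sapply’ only to make the return type explicit.

```r
# Toy data: all values below are made up for illustration.
x <- data.frame(
  probeid  = c("A", "A", "B", "C", "C", "C"),
  pvalue   = c(0.04, 0.01, 0.03, 0.20, 0.05, 0.10),
  variance = c(1.2, 0.8, 2.0, 0.5, 0.7, 3.1),
  stringsAsFactors = FALSE
)

# Row indices of x, grouped by probe ID.
s <- split(seq_len(nrow(x)), x$probeid)

# For each probe ID, the index of the row with the smallest p-value.
keep_p <- vapply(s, function(i) i[which.min(x$pvalue[i])], integer(1))

# For each probe ID, the index of the row with the largest variance.
keep_v <- vapply(s, function(i) i[which.max(x$variance[i])], integer(1))

# De-duplicated tables under each rule.
x_by_p <- x[keep_p, ]
x_by_v <- x[keep_v, ]
```

Ties go to the first matching row, since ‘which.min’ and ‘which.max’ return the first extreme value. If all values for a probe ID are NA, ‘which.min’ returns a zero-length result and the ‘vapply’ call above stops with an error, so such rows would need to be dropped beforehand.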
There are plenty of places in vignettes etc. where this type of operation is done. One I happen to be aware of right now is inside the function ‘myHeatmap’ of the ‘Hiiragi2013’ package.
Wolfgang Huber
On 22 Jun 2014, at 10:02, Kaj Chokeshaiusaha [guest] <guest at bioconductor.org> wrote:
> Dear R helpers,
>
> I'm working with a goat dataset for which no annotation db is available. For this reason, I use the 'genefilter' function with ANOVA (p<0.05) instead of 'nsFilter' (available in the 'genefilter' package). The problem is that the filtered data still contain 500 duplicated probes, which I want to remove.
>
> Due to my limited ability, I cannot figure out how to do this. It would be great if I could select, from each set of duplicates, the probe with either the lowest p-value or the highest variance.
>
> Would you please help me with some examples?
>
> Best Regards,
> Kaj
>
> -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Biobase_2.24.0 BiocGenerics_0.10.0 genefilter_1.46.1
>
> loaded via a namespace (and not attached):
> [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
> [4] GenomeInfoDb_1.0.2 IRanges_1.22.9 RSQLite_0.11.4
> [7] splines_3.1.0 stats4_3.1.0 survival_2.37-7
> [10] tcltk_3.1.0 tools_3.1.0 XML_3.98-1.1
> [13] xtable_1.7-3
>
> --
> Sent via the guest posting facility at bioconductor.org.
>