[R] Merging and extracting data from list
Matthew Dowle
mdowle at mdowle.plus.com
Fri Jan 22 10:09:11 CET 2010
?merge
plyr
data.table
sqldf
crantastic
"Dr. Viviana Menzel" <vivianamenzel at gmx.de> wrote in message
news:4B58A0E9.3050806 at gmx.de...
Hello R-help group,
I have a question about merging lists. I have two lists:
Genes list (hSgenes)
name chr strand start end transStart transEnd symbol description feature
ENSG00000223972 1 1 11874 14412 11874 14412 DEAD/H box polypeptide 11 like 1DEAD/H box polypeptide 11 like 3DEAD/H box polypeptide 11 like 9 ;; [Source:UniProtKB/TrEMBL;Acc:B7ZGX0] gene
ENSG00000227232 1 -1 14363 29570 17551 29343 WASH5P WAS protein family homolog 5 pseudogene (WASH5P), non-coding RNA [Source:RefSeq DNA;Acc:NR_024540] gene
.....
Chers list (chersList)
name chr start end cellType antibody features maxLevel score
chr1.cher1 1 859132 859732 human AB ENSG00000223764 ENSG00000231958 ENSG00000187634 1.25736038968316 0.664381383074449
chr1.cher2 1 889564 890464 human AB ENSG00000188976 1.47884233632064 2.88839131446868
chr1.cher3 1 1106364 1106864 human AB ENSG00000162571 1.83795654418115 3.58404359147275
....
In the second list, I want to add a column with the gene description
(obtained from the first list). I used the following method:
chersMergeGenes <- data.frame(chersList,
    description = hSgenes$description[match(chersList$features, hSgenes$name)],
    symbol = hSgenes$symbol[match(chersList$features, hSgenes$name)])
write.table(chersMergeGenes, row.names = FALSE, quote = FALSE, sep = "\t",
    file = "chersMergeGenes.txt")
and it works only partially. When chersList$features contains more than
one feature (e.g. ENSG00000223764 ENSG00000231958 ENSG00000187634), the
lookup fails and returns NA.
I don't know how to split the features to obtain all descriptions.
Can someone give me a hint on how to do this?
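One possible way to handle rows with several features is sketched below. It assumes that multiple IDs in chersList$features are separated by whitespace, which is how they appear in the printed data but is not confirmed. The small hSgenes and chersList data frames are toy stand-ins with made-up symbols and descriptions, and lookup() and featureIds are hypothetical names introduced for illustration.

```r
# Toy stand-ins for the poster's data; symbols and descriptions are invented.
hSgenes <- data.frame(
  name = c("ENSG00000223764", "ENSG00000231958", "ENSG00000188976"),
  symbol = c("symA", "symB", "symC"),
  description = c("desc A", "desc B", "desc C"),
  stringsAsFactors = FALSE
)
chersList <- data.frame(
  name = c("chr1.cher1", "chr1.cher2"),
  features = c("ENSG00000223764 ENSG00000231958", "ENSG00000188976"),
  stringsAsFactors = FALSE
)

# Assumption: multiple feature IDs in one cell are separated by whitespace.
featureIds <- strsplit(as.character(chersList$features), "[[:space:]]+")

# Hypothetical helper: look up one column of hSgenes for each ID of a row
# and collapse the hits into a single string. IDs without a match yield "NA".
lookup <- function(ids, column) {
  hits <- as.character(hSgenes[[column]][match(ids, hSgenes$name)])
  paste(hits, collapse = "; ")
}

chersMergeGenes <- data.frame(
  chersList,
  description = sapply(featureIds, lookup, column = "description"),
  symbol = sapply(featureIds, lookup, column = "symbol"),
  stringsAsFactors = FALSE
)
```

Each row keeps its original position, and rows with several features get one combined description and symbol string, so the result can be passed to write.table() as before.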
Another problem:
I have the following data:
$ENSG00000000003
[1] "GO:0043123" "GO:0004871"
$ENSG00000000419
[1] "GO:0018406" "GO:0035269" "GO:0006506" "GO:0019348" "GO:0005789"
[6] "GO:0005624" "GO:0005783" "GO:0033185" "GO:0004582" "GO:0004169"
[11] "GO:0005515"
$ENSG00000000457
[1] "GO:0005737" "GO:0030027" "GO:0005794" "GO:0005515"
I want to extract the names ($ENSG00000?????) of all elements that
contain GO:0005515. How can I do this?
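A minimal sketch of one way to do this, assuming the data is a named list of character vectors as the printed output suggests. The name goList is hypothetical; the values are copied from the output above.

```r
# Named list rebuilt from the printed output; goList is a hypothetical name.
goList <- list(
  ENSG00000000003 = c("GO:0043123", "GO:0004871"),
  ENSG00000000419 = c("GO:0018406", "GO:0035269", "GO:0006506", "GO:0019348",
                      "GO:0005789", "GO:0005624", "GO:0005783", "GO:0033185",
                      "GO:0004582", "GO:0004169", "GO:0005515"),
  ENSG00000000457 = c("GO:0005737", "GO:0030027", "GO:0005794", "GO:0005515")
)

# TRUE for every list element whose vector contains the term of interest.
hasTerm <- vapply(goList, function(terms) "GO:0005515" %in% terms, logical(1))

# Names of the matching elements.
genesWithTerm <- names(goList)[hasTerm]
```

For the three elements shown, genesWithTerm contains ENSG00000000419 and ENSG00000000457.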
Thanks in advance
Viviana
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Viviana Menzel
Rottweg 34
35428 Langgöns
Tel.: +49 6403 7748550
Mobil: +49 177 5126092
E-Mail: vivianamenzel at gmx.de
Web: www.dres-menzel.de