[R] Merging and extracting data from list

Fri Jan 22 10:09:11 CET 2010

?merge
plyr
data.table
sqldf
crantastic

"Dr. Viviana Menzel" <vivianamenzel at gmx.de> wrote in message 
news:4B58A0E9.3050806 at gmx.de...
Hello R-help group,

I have a question about merging lists. I have two lists:

Genes list (hSgenes)
name    chr    strand    start    end    transStart    transEnd    symbol    description    feature
ENSG00000223972    1    1    11874    14412    11874    14412    DEAD/H box polypeptide 11 like 1DEAD/H box polypeptide 11 like 3DEAD/H box polypeptide 11 like 9 ;; [Source:UniProtKB/TrEMBL;Acc:B7ZGX0]    gene
ENSG00000227232    1    -1    14363    29570    17551    29343    WASH5P    WAS protein family homolog 5 pseudogene (WASH5P), non-coding RNA [Source:RefSeq DNA;Acc:NR_024540]    gene
.....

Chers list (chersList)
name    chr    start    end    cellType    antibody    features    maxLevel    score
chr1.cher1    1    859132    859732    human    AB    ENSG00000223764 ENSG00000231958 ENSG00000187634    1.25736038968316    0.664381383074449
chr1.cher2    1    889564    890464    human    AB    ENSG00000188976    1.47884233632064    2.88839131446868
chr1.cher3    1    1106364    1106864    human    AB    ENSG00000162571    1.83795654418115    3.58404359147275
....

In the second list, I want to add a column with the gene description
(obtained from the first list). I used the following method:

# Look up each features entry in hSgenes$name once and reuse the index.
idx <- match(chersList$features, hSgenes$name)
chersMergeGenes <- data.frame(chersList,
                              description = hSgenes$description[idx],
                              symbol = hSgenes$symbol[idx])
write.table(chersMergeGenes, row.names = FALSE, quote = FALSE, sep = "\t",
            file = "chersMergeGenes.txt")

This works only partially. When chersList$features contains more than
one feature (e.g. ENSG00000223764 ENSG00000231958 ENSG00000187634), the
lookup fails and the result is NA.
I don't know how to split the features to obtain all descriptions.

Can someone give me a hint on how to do this?
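
One possible way to handle rows with several features, as a minimal sketch
only: split each features string with strsplit(), look up every ID with
match(), and paste the hits back into one string per row. The toy data
frames below are made-up stand-ins for hSgenes and chersList, the helper
lookupAll is a hypothetical name, and the sketch assumes the IDs are
separated by spaces.

```r
# Toy stand-ins for the two tables (values are made up for illustration).
hSgenes <- data.frame(
  name        = c("ENSG1", "ENSG2", "ENSG3"),
  symbol      = c("A", "B", "C"),
  description = c("gene one", "gene two", "gene three"),
  stringsAsFactors = FALSE
)
chersList <- data.frame(
  name     = c("chr1.cher1", "chr1.cher2", "chr1.cher3"),
  features = c("ENSG1 ENSG3", "ENSG2", "ENSG9"),
  stringsAsFactors = FALSE
)

# Split each features string on spaces: one character vector per row.
featList <- strsplit(chersList$features, " +")

# Hypothetical helper: look up one column of hSgenes for every ID of a row
# and paste the results into a single string. An ID without a match ends up
# as the text "NA" inside that string.
lookupAll <- function(ids, column) {
  paste(hSgenes[[column]][match(ids, hSgenes$name)], collapse = "; ")
}

chersMergeGenes <- data.frame(
  chersList,
  description = vapply(featList, lookupAll, character(1), column = "description"),
  symbol      = vapply(featList, lookupAll, character(1), column = "symbol"),
  stringsAsFactors = FALSE
)
```

Pasting the hits with "; " keeps one row per cher, so the result can still
be written with write.table() as before. If one row per feature is
preferred, the split IDs could instead be expanded into a long table and
joined with merge().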

Another problem:

I have the following data:

$ENSG00000000003
[1] "GO:0043123" "GO:0004871"

$ENSG00000000419
 [1] "GO:0018406" "GO:0035269" "GO:0006506" "GO:0019348" "GO:0005789"
 [6] "GO:0005624" "GO:0005783" "GO:0033185" "GO:0004582" "GO:0004169"
[11] "GO:0005515"

$ENSG00000000457
[1] "GO:0005737" "GO:0030027" "GO:0005794" "GO:0005515"

I want to extract the names ($ENSG00000?????) of all elements that
contain the GO term GO:0005515. How can I do it?
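
A minimal sketch of one way to do this, assuming the data is a named list
of character vectors: test each element with %in% and keep the names where
the test is TRUE. The object name goList is an assumption (the list is not
named in the post), and the toy list holds only a few of the GO IDs shown
above.

```r
# Toy list in the same shape as the data above (shortened for illustration).
goList <- list(
  ENSG00000000003 = c("GO:0043123", "GO:0004871"),
  ENSG00000000419 = c("GO:0018406", "GO:0005515"),
  ENSG00000000457 = c("GO:0005737", "GO:0005515")
)

# TRUE for every element whose vector contains the wanted GO term.
hasTerm <- vapply(goList, function(go) "GO:0005515" %in% go, logical(1))

# Names of the matching elements.
hits <- names(goList)[hasTerm]
```

vapply() is used instead of sapply() so that the result is guaranteed to
be a logical vector, even for an empty list.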

Thanks in advance

Viviana
