[BioC] Convert KEGGpathway to nuID in lumi
Martin Morgan
mtmorgan at fhcrc.org
Tue Mar 3 06:33:49 CET 2009
Hi Allen --
ss <affysnp at gmail.com> writes:
> Dear all,
> I'd appreciate your input if you could.
> I first normalized data, and it is called john2. We are interested in Notch
> signaling
> pathway and want to get the expression values for all genes on this pathway.
> So
> I tried the below (it does not look neat...sorry that I am new to R coding):
Here's a simple alternative...
library(lumi)
data(example.lumi)
annotation(example.lumi) <- "lumiHumanAll"
the last line above is a hack, the annotation package 'lumiHumanAllV1"
is not available to me, and the rest of the steps require a valid
annotation package. It's worth getting the LumiBatch right up front,
so you can rely on the software to do things like subsetting and
keeping track of the annotation information for you.
Now get the relevant probe IDs, select just those that are in the
LumiBatch, and subset
pid <- toTable(lumiHumanAllPATH2PROBE["04330"])$probe_id
ok_pid <- pid[pid %in% featureNames(example.lumi)]
notchSet <- example.lumi[ok_pid,]
You now have a mini-LumiBatch that you can manipulate any way you
like, e.g.,
> exprs(notchSet)
A01 A02 B01 B02
NqbdUs8V6OMoPh1puQ 1854.2 2004.0 1787.4 1728.9
ElXciyeulKjgX5Ulus 506.6 522.9 428.4 502.7
KUXd9XWh3ASgfgx6iI 185.0 288.8 164.6 258.6
Npzx7j5NWp6CpNB5TM 115.2 123.8 108.2 104.3
B.VJf357yqcU6hSd6U 1175.6 1560.4 1449.6 1362.0
NrAl0kElLoh6BEFMl4 253.1 275.1 212.7 201.9
0TVfhL5Jd_F2jBDCIk 3892.0 5297.3 4867.7 4043.8
Teee5.43efFvQim7ko 1222.2 1908.2 1713.1 1485.6
fooKDKRK3dQLg4HfFQ 1753.2 1911.4 1751.7 1732.7
0niXHt19Ue1.dSldI4 2280.6 3742.3 2651.0 3072.9
ZkeX1uG2W_RzoS_d8I 204.8 241.2 189.8 211.4
3CuC6Kue64npR1Inks 1082.0 1379.7 1068.1 1212.2
Hylf6kIO1txWn1FFd4 155.7 175.0 153.6 164.0
iNFN9SNxaVZ6rlfqhw 1132.2 1380.5 1227.5 1184.4
KRUFAvUqf9CgeFEkuE 185.2 215.0 179.5 188.3
xaiVLOqoKt8XSS8VR8 294.9 320.5 325.3 245.0
cXu3S3W0t7f3Wd93hU 1873.4 2201.1 2042.0 1692.0
> syms <- getSYMBOL(featureNames(notchSet), annotation(notchSet))
> data.frame(exprs(notchSet), GeneSymbol=syms)
A01 A02 B01 B02 GeneSymbol
NqbdUs8V6OMoPh1puQ 1854.2 2004.0 1787.4 1728.9 CREBBP
ElXciyeulKjgX5Ulus 506.6 522.9 428.4 502.7 KAT2A
KUXd9XWh3ASgfgx6iI 185.0 288.8 164.6 258.6 HES1
Npzx7j5NWp6CpNB5TM 115.2 123.8 108.2 104.3 JAG2
B.VJf357yqcU6hSd6U 1175.6 1560.4 1449.6 1362.0 MFNG
NrAl0kElLoh6BEFMl4 253.1 275.1 212.7 201.9 PSEN1
0TVfhL5Jd_F2jBDCIk 3892.0 5297.3 4867.7 4043.8 NUMB
Teee5.43efFvQim7ko 1222.2 1908.2 1713.1 1485.6 NCOR2
fooKDKRK3dQLg4HfFQ 1753.2 1911.4 1751.7 1732.7 SNW1
0niXHt19Ue1.dSldI4 2280.6 3742.3 2651.0 3072.9 NCSTN
ZkeX1uG2W_RzoS_d8I 204.8 241.2 189.8 211.4 DLL1
3CuC6Kue64npR1Inks 1082.0 1379.7 1068.1 1212.2 APH1A
Hylf6kIO1txWn1FFd4 155.7 175.0 153.6 164.0 DLL4
iNFN9SNxaVZ6rlfqhw 1132.2 1380.5 1227.5 1184.4 PSENEN
KRUFAvUqf9CgeFEkuE 185.2 215.0 179.5 188.3 MAML2
xaiVLOqoKt8XSS8VR8 294.9 320.5 325.3 245.0 PTCRA
cXu3S3W0t7f3Wd93hU 1873.4 2201.1 2042.0 1692.0 DTX3
And another way, a little twisted... Load some packages and a sample
data set
library(GSEABase)
Then create a GeneSetCollection, in general one for each KEGG id, but
for you a collection of one
gsc <- GeneSetCollection(example.lumi, setType=KEGGCollection("04330"))
And then use the first (and only!) gene set in the collection to
subset example.lumi
notchSet <- example.lumi[gsc[[1]],]
Martin
> ######KEGG:Notch#####################
> ######http://www.genome.ad.jp/kegg/pathway/hsa/hsa04330.html######
> Notch<-mget("04330",lumiHumanAllPATH2PROBE,ifnotfound = NA)
> write.table(Notch, file="Notch.txt",quote=FALSE,row.names=FALSE,col.names =
> FALSE)
> Notch1<-read.table('Notch.txt',header=FALSE)
> Notch_geneSymbol <- getSYMBOL(as.vector(Notch1[,1]), 'lumiHumanAll.db')
>
> ########Retrieve the normalized data#########
> dataMatrix <- exprs(john2)
> probeList <- rownames(dataMatrix)
> geneSymbol <- getSYMBOL(probeList, 'lumiHumanAll.db')
> anno<-data.frame(dataMatrix,geneSymbol=geneSymbol)
>
> ########Subset expression data for Notch genes#######
> b<-anno[as.vector(Notch1[,1]),]
> dim(b)
>
>
> You will see many empty rows in b filled with "NA". I wonder what
> the reason could be.
>
> Thank you so much!
>
>
> Allen
>
>
>> b[1:10,1:3]
> X451Lu X451Lu885 geneSymbol
> NA NA NA <NA>
> QeUV54t73rDarMMp3k 9.220161 9.577290 JAG1
> NqbdUs8V6OMoPh1puQ 10.386946 10.960459 CREBBP
> Te17_XV.gA5S4QmAUE 8.250515 8.325535 CREBBP
> KghpNGCSdQCgCfqa5U 10.038218 9.930820 CTBP1
> TnYhDwobqOKdYhKGa8 9.509115 9.382045 CTBP1
> NA.1 NA NA <NA>
> iAW0JWKaMiFGUuFL7I 8.724987 8.597907 CTBP1
> NA.2 NA NA <NA>
> NA.3 NA NA <NA>
>
>> Notch_geneSymbol[1:10]
> QTei90CF6p57gnoHmo QeUV54t73rDarMMp3k NqbdUs8V6OMoPh1puQ Te17_XV.gA5S4QmAUE
> "JAG1" "JAG1" "CREBBP" "CREBBP"
> KghpNGCSdQCgCfqa5U TnYhDwobqOKdYhKGa8 f2IQ8KG6jinWIShmvc iAW0JWKaMiFGUuFL7I
> "CTBP1" "CTBP1" "CTBP1" "CTBP1"
> NpkSqRU027UU5_RSBQ K3FQQYgKWgrnmnnTRo
> "CTBP2" "DTX1"
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
More information about the Bioconductor
mailing list