[BioC] Convert KEGGpathway to nuID in lumi

Tue Mar 3 06:33:49 CET 2009

Hi Allen --

ss <affysnp at gmail.com> writes:

> Dear all,
> I'd appreciate your input if you could.
> I first normalized data, and it is called john2. We are interested in Notch
> signaling
> pathway and want to get the expression values for all genes on this pathway.
> So
> I tried the below (it does not look neat...sorry that I am new to R coding):

Here's a simple alternative...

  library(lumi)
  data(example.lumi)
  annotation(example.lumi) <- "lumiHumanAll"

the last line above is a hack, the annotation package 'lumiHumanAllV1"
is not available to me, and the rest of the steps require a valid
annotation package. It's worth getting the LumiBatch right up front,
so you can rely on the software to do things like subsetting and
keeping track of the annotation information for you.

Now get the relevant probe IDs, select just those that are in the
LumiBatch, and subset

  pid <- toTable(lumiHumanAllPATH2PROBE["04330"])$probe_id
  ok_pid <- pid[pid %in% featureNames(example.lumi)]
  notchSet <- example.lumi[ok_pid,]

You now have a mini-LumiBatch that you can manipulate any way you
like, e.g.,

> exprs(notchSet)
                      A01    A02    B01    B02
NqbdUs8V6OMoPh1puQ 1854.2 2004.0 1787.4 1728.9
ElXciyeulKjgX5Ulus  506.6  522.9  428.4  502.7
KUXd9XWh3ASgfgx6iI  185.0  288.8  164.6  258.6
Npzx7j5NWp6CpNB5TM  115.2  123.8  108.2  104.3
B.VJf357yqcU6hSd6U 1175.6 1560.4 1449.6 1362.0
NrAl0kElLoh6BEFMl4  253.1  275.1  212.7  201.9
0TVfhL5Jd_F2jBDCIk 3892.0 5297.3 4867.7 4043.8
Teee5.43efFvQim7ko 1222.2 1908.2 1713.1 1485.6
fooKDKRK3dQLg4HfFQ 1753.2 1911.4 1751.7 1732.7
0niXHt19Ue1.dSldI4 2280.6 3742.3 2651.0 3072.9
ZkeX1uG2W_RzoS_d8I  204.8  241.2  189.8  211.4
3CuC6Kue64npR1Inks 1082.0 1379.7 1068.1 1212.2
Hylf6kIO1txWn1FFd4  155.7  175.0  153.6  164.0
iNFN9SNxaVZ6rlfqhw 1132.2 1380.5 1227.5 1184.4
KRUFAvUqf9CgeFEkuE  185.2  215.0  179.5  188.3
xaiVLOqoKt8XSS8VR8  294.9  320.5  325.3  245.0
cXu3S3W0t7f3Wd93hU 1873.4 2201.1 2042.0 1692.0

> syms <- getSYMBOL(featureNames(notchSet), annotation(notchSet))
> data.frame(exprs(notchSet), GeneSymbol=syms)
                      A01    A02    B01    B02 GeneSymbol
NqbdUs8V6OMoPh1puQ 1854.2 2004.0 1787.4 1728.9     CREBBP
ElXciyeulKjgX5Ulus  506.6  522.9  428.4  502.7      KAT2A
KUXd9XWh3ASgfgx6iI  185.0  288.8  164.6  258.6       HES1
Npzx7j5NWp6CpNB5TM  115.2  123.8  108.2  104.3       JAG2
B.VJf357yqcU6hSd6U 1175.6 1560.4 1449.6 1362.0       MFNG
NrAl0kElLoh6BEFMl4  253.1  275.1  212.7  201.9      PSEN1
0TVfhL5Jd_F2jBDCIk 3892.0 5297.3 4867.7 4043.8       NUMB
Teee5.43efFvQim7ko 1222.2 1908.2 1713.1 1485.6      NCOR2
fooKDKRK3dQLg4HfFQ 1753.2 1911.4 1751.7 1732.7       SNW1
0niXHt19Ue1.dSldI4 2280.6 3742.3 2651.0 3072.9      NCSTN
ZkeX1uG2W_RzoS_d8I  204.8  241.2  189.8  211.4       DLL1
3CuC6Kue64npR1Inks 1082.0 1379.7 1068.1 1212.2      APH1A
Hylf6kIO1txWn1FFd4  155.7  175.0  153.6  164.0       DLL4
iNFN9SNxaVZ6rlfqhw 1132.2 1380.5 1227.5 1184.4     PSENEN
KRUFAvUqf9CgeFEkuE  185.2  215.0  179.5  188.3      MAML2
xaiVLOqoKt8XSS8VR8  294.9  320.5  325.3  245.0      PTCRA
cXu3S3W0t7f3Wd93hU 1873.4 2201.1 2042.0 1692.0       DTX3

And another way, a little twisted... Load some packages and a sample
data set

  library(GSEABase)

Then create a GeneSetCollection, in general one for each KEGG id, but
for you a collection of one

  gsc <- GeneSetCollection(example.lumi, setType=KEGGCollection("04330"))

And then use the first (and only!) gene set in the collection to
subset example.lumi

  notchSet <- example.lumi[gsc[[1]],]

Martin

> ######KEGG:Notch#####################
> ######http://www.genome.ad.jp/kegg/pathway/hsa/hsa04330.html######
> Notch<-mget("04330",lumiHumanAllPATH2PROBE,ifnotfound = NA)
> write.table(Notch, file="Notch.txt",quote=FALSE,row.names=FALSE,col.names =
> FALSE)
> Notch1<-read.table('Notch.txt',header=FALSE)
> Notch_geneSymbol <- getSYMBOL(as.vector(Notch1[,1]), 'lumiHumanAll.db')
>
> ########Retrieve the normalized data#########
> dataMatrix <- exprs(john2)
> probeList <- rownames(dataMatrix)
> geneSymbol <- getSYMBOL(probeList, 'lumiHumanAll.db')
> anno<-data.frame(dataMatrix,geneSymbol=geneSymbol)
>
> ########Subset expression data for Notch genes#######
> b<-anno[as.vector(Notch1[,1]),]
> dim(b)
>
>
> You will see many empty rows in b filled with "NA". I wonder what
> the reason could be.
>
> Thank you so much!
>
>
> Allen
>
>
>> b[1:10,1:3]
>                       X451Lu X451Lu885 geneSymbol
> NA                        NA        NA       <NA>
> QeUV54t73rDarMMp3k  9.220161  9.577290       JAG1
> NqbdUs8V6OMoPh1puQ 10.386946 10.960459     CREBBP
> Te17_XV.gA5S4QmAUE  8.250515  8.325535     CREBBP
> KghpNGCSdQCgCfqa5U 10.038218  9.930820      CTBP1
> TnYhDwobqOKdYhKGa8  9.509115  9.382045      CTBP1
> NA.1                      NA        NA       <NA>
> iAW0JWKaMiFGUuFL7I  8.724987  8.597907      CTBP1
> NA.2                      NA        NA       <NA>
> NA.3                      NA        NA       <NA>
>
>> Notch_geneSymbol[1:10]
> QTei90CF6p57gnoHmo QeUV54t73rDarMMp3k NqbdUs8V6OMoPh1puQ Te17_XV.gA5S4QmAUE
>             "JAG1"             "JAG1"           "CREBBP"           "CREBBP"
> KghpNGCSdQCgCfqa5U TnYhDwobqOKdYhKGa8 f2IQ8KG6jinWIShmvc iAW0JWKaMiFGUuFL7I
>            "CTBP1"            "CTBP1"            "CTBP1"            "CTBP1"
> NpkSqRU027UU5_RSBQ K3FQQYgKWgrnmnnTRo
>            "CTBP2"             "DTX1"
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793