[R] R help for creating expression data of Differentially expressed genes
Vivek Das
vd4mmind at gmail.com
Wed May 8 00:07:26 CEST 2013
Hi Arun,
My data sets look like the attached files; I am providing sample files.
I hope they give a better idea of the kind of work I want to do
with the two files and the kind of script I am trying to write. I hope you can
give me some suggestions. I am new to R, so I am having trouble
working out which functions to use for this.
Help from anyone on the list would be greatly appreciated.
----------------------------------------------------------
Vivek Das
PhD Student in Computational Biology
Giuseppe Testa's Lab
European School of Molecular Medicine
IFOM-IEO Campus
Via Adamello, 16
Milan, Italy
emails: vivek.das at ieo.eu
vchris_05 at yahoo.co.in
vd4mmind at gmail.com
On Tue, May 7, 2013 at 10:36 PM, arun <smartpink111 at yahoo.com> wrote:
> Hi Vivek,
>
> Maybe this helps:
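> # Toy data: dat1 has an ID column plus 7 value columns (V1..V7)
> set.seed(35)
> dat1<- cbind(ID=1:8,
> as.data.frame(matrix(sample(1:50,8*7,replace=TRUE),ncol=7)))
>
> # dat2 has 8 IDs drawn from 1:20 plus 33 value columns, renamed X1..X33
> set.seed(38)
> dat2<- cbind(ID= sample(1:20,8,replace=FALSE),
> as.data.frame(matrix(sample(1:50,8*33,replace=TRUE),ncol=33)))
> colnames(dat2)[-1]<-gsub("V","X",colnames(dat2)[-1])
> # Join the first two columns of dat1 to the first 31 columns of dat2 on ID;
> # only IDs present in both tables are returned
> merge(dat1[,1:2],dat2[,1:31],by="ID")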
> #   ID V1 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
> #1   1 43 44  4 33 47 29 43 31 15  2  34  42   5  18  22  36  34  44   3  45   9
> #2   3 28  4 18 45 24  5 20 30 16 49  34  33   5  24  49  31  10  45  21  26  20
> #3   6  5 16  1  5  2 26  6 40 16 15  50  26  37  22  25  39  16  24  29  50  42
> #4   7 25 26 39 16 29  5 40 15 27 46  16  38  36  42   8   3  29   7  13  18  38
> #5   8 30  3 41 25 38 24 41 44 23  2  45  33  10  18  20  49  19  23  42  25   5
> #   X21 X22 X23 X24 X25 X26 X27 X28 X29 X30
> #1   14  27   3  21   6  44  33  42  10  29
> #2   48  13   8  47  18   9  23   9  44   3
> #3   25  14  31  19  14   6  26  13   6  49
> #4   43  28  15   6   9  19  43  21  41  21
> #5    1  27  18   3  42   5  16  39  46  47
> A.K.
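
Note that merge() performs an inner join by default, which is why only the five IDs shared by dat1 and dat2 appear in the output above. The following is a minimal sketch, assuming the goal is to keep every row of the first table (as the original question describes). The data frames `genes` and `expr` and their columns are made up for illustration.

```r
# Hypothetical toy tables: 'genes' stands in for the first file,
# 'expr' for the second file.
genes <- data.frame(ID = c("g1", "g2", "g3"),
                    gene = c("A", "B", "C"),
                    stringsAsFactors = FALSE)
expr <- data.frame(ID = c("g3", "g1"),
                   s1 = c(3.5, 1.2),
                   s2 = c(0.7, 4.1),
                   stringsAsFactors = FALSE)

# Default merge(): inner join, keeps only IDs present in both tables.
inner <- merge(genes, expr, by = "ID")

# all.x = TRUE: left join, keeps every row of 'genes';
# IDs missing from 'expr' get NA in the expression columns.
left <- merge(genes, expr, by = "ID", all.x = TRUE)

nrow(inner)  # 2
nrow(left)   # 3
```

With all.x = TRUE the result has one row per ID in the first table, so a 500-gene input should yield 500 rows as long as IDs are not duplicated in the second table.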
>
>
>
> ----- Original Message -----
> From: Vivek Das <vd4mmind at gmail.com>
> To: arun <smartpink111 at yahoo.com>
> Cc:
> Sent: Tuesday, May 7, 2013 3:45 PM
> Subject: R help for creating expression data of Differentially expressed
> genes
>
> Hi Arun,
>
> I need some help with R scripting. I have two data files, one
> containing seven columns and the other containing 33. Both files have
> a unique identifier column, ID. I want to create another file that has
> the first two columns of the first file and the 31 columns of the
> second file, matched on ID. The first file has the gene IDs
> and gene names of around 500 genes, and I want the output file to contain all
> of those along with the other attributes. That is, the output should have
> all attributes matched to the IDs of the first file, so that I get
> 500 rows with all the attributes of the second file. I am new to R
> and am having trouble with the merge function. If you can help, that would be
> great.
>
> Regards,
> Vivek
>
> Sent from my iPad
>
> On 07/May/2013, at 21:13, arun <smartpink111 at yahoo.com> wrote:
>
> > Hi Ye,
> >
> > For the NA in the ID column:
> >
> > dat1<- read.table(text="
> > ObsNumber ID Weight
> > 1 0001 12
> > 2 0001 13
> > 3 0001 14
> > 4 0002 16
> > 5 0002 17
> > 6 N/A 18
> > ",sep="",header=TRUE,colClasses=c("numeric","character","numeric"),na.strings="N/A")
> > # Number the rows within each ID group; split() drops the row whose ID is NA
> > unlist(lapply(split(dat1,dat1$ID),function(x)
> >   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> > #[1] "0001_1" "0001_2" "0001_3" "0002_1" "0002_2"
> > A.K.
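
The call above omits the row with a missing ID, because split() drops NA groups. One way to keep such rows is sketched here, under the assumption that missing IDs should be labelled "N/A" and numbered like any other group. It recodes NA first and then counts within groups using ave(), a substitute for the split()/unlist() construction.

```r
dat1 <- data.frame(ObsNumber = 1:6,
                   ID = c("0001", "0001", "0001", "0002", "0002", NA),
                   Weight = c(12, 13, 14, 16, 17, 18),
                   stringsAsFactors = FALSE)

# Recode missing IDs to a literal label so they form their own group.
# (Using "N/A" as the label is an assumption, following the request in the thread.)
id <- ifelse(is.na(dat1$ID), "N/A", dat1$ID)

# ave() applies seq_along() within each group and returns the result
# in the original row order, so no sorting or unlist() is needed.
counter <- ave(seq_along(id), id, FUN = seq_along)

dat1$UniqueID <- paste(id, counter, sep = "_")
dat1$UniqueID
```

Because the result has one element per row, the assignment to dat1$UniqueID no longer fails with a length mismatch when some IDs are missing.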
> > ________________________________
> > From: Ye Lin <yelin at lbl.gov>
> > To: arun <smartpink111 at yahoo.com>
> > Cc: R help <r-help at r-project.org>
> > Sent: Tuesday, May 7, 2013 2:54 PM
> > Subject: Re: [R] create unique ID for each group
> >
> >
> >
> > Thanks, A.K. But I have "NA" in the ID column, so when I apply the code, it
> > gives me an error saying the replacement has fewer rows than the data.
> > Is there any way, for ID=N/A, to return something like "N/A_1" in order as well?
> >
> >
> >
> >
> >
> >
> > On Tue, May 7, 2013 at 11:17 AM, arun <smartpink111 at yahoo.com> wrote:
> >
> >> Hi,
> >> Sorry, there was a mistake. Use this instead:
> >> dat1$UniqueID<-unlist(lapply(split(dat1,dat1$ID),function(x)
> >>   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >> dat1
> >> # ObsNumber ID Weight UniqueID
> >> #1 1 0001 12 0001_1
> >> #2 2 0001 13 0001_2
> >> #3 3 0001 14 0001_3
> >> #4 4 0002 16 0002_1
> >> #5 5 0002 17 0002_2
> >>
> >> dat2$UniqueID<-unlist(lapply(split(dat2,dat2$ID),function(x)
> >>   with(x,as.character(interaction(ID,seq_len(nrow(x)),sep="_")))),use.names=FALSE)
> >>
> >> A.K.
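
With a within-group counter in both tables, the two data sets can be merged row by row within each ID. The following is a minimal sketch using the example data from this thread. It swaps in ave() for the split()/unlist() construction so that the counter stays aligned with the rows even when they are not sorted by ID; the column name `idx` is chosen here for illustration.

```r
dat1 <- data.frame(ObsNumber = 1:5,
                   ID = c("0001", "0001", "0001", "0002", "0002"),
                   Weight = c(12, 13, 14, 16, 17),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(ID = c("0001", "0001", "0001", "0002", "0002"),
                   Height = c(3.2, 2.6, 3.2, 2.2, 2.6),
                   stringsAsFactors = FALSE)

# Position of each row within its ID group (1, 2, 3, ... per ID).
dat1$idx <- ave(seq_along(dat1$ID), dat1$ID, FUN = seq_along)
dat2$idx <- ave(seq_along(dat2$ID), dat2$ID, FUN = seq_along)

# Merge on the ID and the within-group position together, so the
# first "0001" row of dat1 pairs with the first "0001" row of dat2, etc.
merged <- merge(dat1, dat2, by = c("ID", "idx"))
merged
```

Merging on the pair of columns avoids building a pasted string key, and the result has one row per matched pair rather than every combination of rows that share an ID.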
> >>
> >>
> >>
> >>
> >>
> >> ----- Original Message -----
> >>
> >> From: arun <smartpink111 at yahoo.com>
> >> To: Ye Lin <yelin at lbl.gov>
> >> Cc: R help <r-help at r-project.org>
> >> Sent: Tuesday, May 7, 2013 2:10 PM
> >> Subject: Re: [R] create unique ID for each group
> >>
> >>
> >>
> >> Hi,
> >>
> >> Try this:
> >> dat1<- read.table(text="
> >> ObsNumber ID Weight
> >> 1 0001 12
> >> 2 0001 13
> >> 3 0001 14
> >> 4 0002 16
> >> 5 0002 17
> >> ",sep="",header=TRUE,colClass=c("numeric","character","numeric"))
> >> dat2<- read.table(text="
> >> ID Height
> >> 0001 3.2
> >> 0001 2.6
> >> 0001 3.2
> >> 0002 2.2
> >> 0002 2.6
> >> ",sep="",header=TRUE,colClass=c("character","numeric"))
> >> dat1$UniqueID<-with(dat1,as.character(interaction(ID,ObsNumber,sep="_")))
> >> dat2$UniqueID<-with(dat2,as.character(interaction(ID,rownames(dat2),sep="_")))
> >> dat2
> >> # ID Height UniqueID
> >> #1 0001 3.2 0001_1
> >> #2 0001 2.6 0001_2
> >> #3 0001 3.2 0001_3
> >> #4 0002 2.2 0002_4
> >> #5 0002 2.6 0002_5
> >> A.K.
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: Ye Lin <yelin at lbl.gov>
> >> To: R help <r-help at r-project.org>
> >> Cc:
> >> Sent: Tuesday, May 7, 2013 1:54 PM
> >> Subject: [R] create unique ID for each group
> >>
> >> Hey All,
> >>
> >> I have a dataset(dat1) like this:
> >>
> >> ObsNumber ID Weight
> >> 1 0001 12
> >> 2 0001 13
> >> 3 0001 14
> >> 4 0002 16
> >> 5 0002 17
> >>
> >> And another dataset(dat2) like this:
> >>
> >> ID Height
> >> 0001 3.2
> >> 0001 2.6
> >> 0001 3.2
> >> 0002 2.2
> >> 0002 2.6
> >>
> >> I want to merge dat1 and dat2 on "ID", in order. I know "match" only
> >> returns the first match it finds, so I am thinking of creating a unique ID
> >> column in dat1 and dat2 and then merging. But I don't know how to do that
> >> so that it looks like this:
> >>
> >> dat1:
> >>
> >> ObsNumber ID Weight UniqueID
> >> 1 0001 12 0001_1
> >> 2 0001 13 0001_2
> >> 3 0001 14 0001_3
> >> 4 0002 16 0002_1
> >> 5 0002 17 0002_2
> >>
> >> dat2:
> >>
> >> ID Height UniqueID
> >> 0001 3.2 0001_1
> >> 0001 2.6 0001_2
> >> 0001 3.2 0001_3
> >> 0002 2.2 0002_1
> >> 0002 2.6 0002_2
> >>
> >> Or, if it is possible to merge dat1 and dat2 by matching "ID" but return
> >> the matches in order, that would be great!
> >>
> >> Thanks for your help!
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
>
-------------- next part --------------
ID test_ID gene locus Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_000009 XLOC_025681 NEFL chr8:24808468-24814131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000010 XLOC_025681 NEFL chr8:24808468-24814131 0 0 0.29217 0.270976 0.126338 0 0 0.464747 0.596984 0.199851 0.892021 0.863341 2.91729 0 0.226087 0 0 2.1632 0.356073 0.655415 0 1.1598 0.385098 0.718336 0.187613 0.34955 0.498937
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613 3.59279 9.09855 2.57678 1.59323 16.9363 4.47379 6.8702 6.92243 21.7622 7.46156 4.42057 3.34178 15.4373 5.21231 3.85498 2.53136 6.18972 4.83315 6.90879 12.5242 5.96035 3.40959 8.60407 15.9087 8.16287 9.35126 6.01379
XLOC_000012 XLOC_003321 CCDC3 chr10:12938624-13043704 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.581209 0.455395 0 0
XLOC_000013 XLOC_005027 CD248 chr11:66081957-66084515 0.248183 0.234721 0.145036 0.0538057 0.288489 0.120182 0.138705 0.138422 0.474156 0.297623 0.177122 0.149999 0.537889 0.0951497 0.112231 0.0610627 0.134862 0.257719 0.212109 0.325353 0.0387095 0.191911 0.229399 0.332815 0.0745058 0.225575 0.198141
XLOC_000014 XLOC_021040 STC2 chr5:172741725-172756506 0 0 0 0 0 0 0 0.0364255 0.0701849 0 0 0 0.0979922 0 0 0 0 0.101727 0 0 0 0 0 0 0 0.0410951 0.0586578
-------------- next part --------------
ID test_ID gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_000009 XLOC_025681 NEFL chr8:24808468-24814131 Sample_118p Sample_118rp3 OK 0.14678 84.3686 9.1669 -4.83529 1.33E-06 0.0261296 yes
XLOC_000010 XLOC_025681 NEFL chr8:24808468-24814131 Sample_118p Sample_118z OK 0.14678 64.1788 8.77229 -4.63808 3.52E-06 0.0401193 yes
XLOC_000011 XLOC_022130 "HLA-DRB1,HLA-DRB5" chr6:32441213-32557613 Sample_118rz Sample_118z OK 3.18746 9.29E+06 21.4749 -5.75217 8.81E-09 0.00280103 yes
XLOC_000012 XLOC_003321 CCDC3 chr10:12938624-13043704 Sample_118p Sample_132p1 OK 0.0184144 83.7839 12.1516 -4.77738 1.78E-06 0.0288706 yes
XLOC_000013 XLOC_005027 CD248 chr11:66081957-66084515 Sample_118p Sample_132p1 OK 0.280334 216.614 9.59377 -5.10742 3.27E-07 0.0159446 yes
XLOC_000014 XLOC_021040 STC2 chr5:172741725-172756506 Sample_118p Sample_132p1 OK 0.187273 69.3633 8.53289 -4.73246 2.22E-06 0.0320926 yes
-------------- next part --------------
ID Sample_118p_0 Sample_118rp3_0 Sample_118rz_0 Sample_118z_0 Sample_132p1_0 Sample_132p2_0 Sample_132p3_0 Sample_132rp1_0 Sample_132rp3_0 Sample_132rp4_0 Sample_132rz1_0 Sample_132rz2_0 Sample_132z_0 Sample_141p1_0 Sample_141p2_0 Sample_141p3_0 Sample_141p4_0 Sample_141z_0 Sample_183p1_0 Sample_183p2_0 Sample_183p3_0 Sample_183z_0 Sample_91p_0 Sample_91rp1_0 Sample_91rp3_0 Sample_91rp4_0 Sample_91rz_0
XLOC_000001 112.474 166.179 81.5227 44.7787 301.154 118.827 144.47 170.407 406.899 189.131 97.1834 72.739 386.81 86.966 85.7031 53.01 158.314 145.843 219.667 240.231 127.42 78.5814 179.324 297.395 203.55 251.538 110.898
XLOC_000002 13.7609 17.7673 11.911 6.2906 39.1648 14.8832 30.0239 42.7172 88.8146 23.3105 15.4408 7.47508 40.3511 12.6166 12.7373 10.9697 28.2655 22.6594 27.2177 27.8328 18.213 7.8803 22.6769 28.9456 18.7493 22.7607 15.679
XLOC_000003 62.1301 102.162 748.313 273.52 242.685 94.2888 161.228 225.243 497.011 160.376 896.121 465.496 2330.57 72.3527 73.9626 71.3686 203.201 1048.81 172.241 183.26 98.1168 473.464 117.368 174.073 119.605 122.661 754.735
XLOC_000004 4.16261 5.71899 4.55739 2.48634 9.11917 3.49082 3.49611 4.97502 12.5986 6.38753 4.94983 4.81898 18.2275 3.22435 2.07446 1.97518 4.05074 8.86568 5.11854 6.4147 4.65076 4.37495 6.36026 9.22755 6.65625 8.8201 7.17221
XLOC_000005 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000006 0 0.103125 0 0 0 0.0829754 0 0 0 0 0 0 0 0 0 0 0 0 0 0.15724 0 0 0 0.11489 0.0900197 0 0
XLOC_000007 0.0282754 0.0218796 0 0 0.0385837 0 0.0129295 0.0315409 0.0303866 0 0 0 0 0 0 0 0 0 0 0.0333607 0.0396915 0 0.0392031 0 0 0 0
XLOC_000008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000009 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XLOC_000010 0 0 0.29217 0.270976 0.126338 0 0 0.464747 0.596984 0.199851 0.892021 0.863341 2.91729 0 0.226087 0 0 2.1632 0.356073 0.655415 0 1.1598 0.385098 0.718336 0.187613 0.34955 0.498937
XLOC_000011 3.59279 9.09855 2.57678 1.59323 16.9363 4.47379 6.8702 6.92243 21.7622 7.46156 4.42057 3.34178 15.4373 5.21231 3.85498 2.53136 6.18972 4.83315 6.90879 12.5242 5.96035 3.40959 8.60407 15.9087 8.16287 9.35126 6.01379
XLOC_000012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.581209 0.455395 0 0
XLOC_000013 0.248183 0.234721 0.145036 0.0538057 0.288489 0.120182 0.138705 0.138422 0.474156 0.297623 0.177122 0.149999 0.537889 0.0951497 0.112231 0.0610627 0.134862 0.257719 0.212109 0.325353 0.0387095 0.191911 0.229399 0.332815 0.0745058 0.225575 0.198141
XLOC_000014 0 0 0 0 0 0 0 0.0364255 0.0701849 0 0 0 0.0979922 0 0 0 0 0.101727 0 0 0 0 0 0 0 0.0410951 0.0586578
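
For sample files like the ones above, the requested table could be built along the following lines. This is a sketch under stated assumptions rather than a tested pipeline: it inlines a few rows and columns copied from the attachments, and the object names `de` and `expr` and the output file name are placeholders. With the real files one would presumably pass the file path to read.delim() or read.table() instead of `text=`, after checking the actual delimiter.

```r
# A few rows/columns copied from the attachments above.
de <- read.table(text = "
ID gene locus
XLOC_000010 NEFL chr8:24808468-24814131
XLOC_000012 CCDC3 chr10:12938624-13043704
XLOC_000014 STC2 chr5:172741725-172756506
", header = TRUE, stringsAsFactors = FALSE)

expr <- read.table(text = "
ID Sample_118p_0 Sample_118rp3_0 Sample_118rz_0
XLOC_000009 0 0 0
XLOC_000010 0 0 0.29217
XLOC_000011 3.59279 9.09855 2.57678
XLOC_000012 0 0 0
XLOC_000013 0.248183 0.234721 0.145036
", header = TRUE, stringsAsFactors = FALSE)

# Keep the first two columns of the gene table and attach every
# expression column by ID. all.x = TRUE keeps genes that have no
# expression row in this excerpt (XLOC_000014), filling them with NA.
out <- merge(de[, 1:2], expr, by = "ID", all.x = TRUE)
out

# Write a tab-delimited file; the file name is a placeholder.
# write.table(out, "de_expression.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

Checking nrow(out) against nrow(de) afterwards is a quick way to confirm that no gene was dropped or duplicated by the merge.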