[R] Help with DNA Methylation Analysis
Eric Berger
er|cjberger @end|ng |rom gm@||@com
Mon Aug 27 08:32:40 CEST 2018
Your problem is that the command you entered
> the_data<-read.csv(file=“c:/file_name.csv,header=TRUE,sep=“,”)
is missing a double quote after the .csv. The statement should be
> the_data<-read.csv(file="c:/file_name.csv",header=TRUE,sep=",")
The '+' sign is a prompt from R that indicates it has not yet seen the end
of a statement, and it is expecting you to continue from the previous line.
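
If you want to check whether a statement is syntactically complete without actually running it, here is a minimal sketch. It assumes plain straight double quotes, and parses_ok() is a helper name I made up for illustration. parse() only checks the syntax; it never calls read.csv() and never touches the file.

```r
# The command as originally typed: the closing quote after .csv is missing.
broken <- 'the_data<-read.csv(file="c:/file_name.csv,header=TRUE,sep=",")'
# The corrected command.
fixed <- 'the_data<-read.csv(file="c:/file_name.csv",header=TRUE,sep=",")'

# Hypothetical helper: TRUE if R can parse the text as complete code,
# FALSE if parsing fails (for example because a string is never closed).
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses_ok(broken)  # FALSE: the last double quote opens a string that never ends
parses_ok(fixed)   # TRUE
```

At the interactive console the same unfinished string is what produces the '+' prompt; pressing Esc (or Ctrl-C in a terminal) cancels the pending input.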
The explanation: you are supplying the read.csv() function three arguments,
one each for the parameters 'file', 'header' and 'sep'.
The parameters 'file' and 'sep' expect strings as arguments. For 'file' that
is a path such as "c:/file_name.csv" or "c:/myspecialdata.csv".
The parameter 'header' expects a logical value (TRUE or FALSE).
The argument "," for the parameter 'sep' (for separator) indicates that the
separator is a comma.
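
To see the three arguments in action without depending on a file on your C: drive, here is a small sketch that writes a throwaway CSV to a temporary file and reads it back. The column names and values are made up for illustration.

```r
# Write a tiny comma-separated file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("sample,value", "a,1", "b,2"), tmp)

# 'file' is the path (a string), 'header' says the first line holds the
# column names, and 'sep' names the character between fields.
the_data <- read.csv(file = tmp, header = TRUE, sep = ",")

names(the_data)  # "sample" "value"
nrow(the_data)   # 2

# Clean up the temporary file.
unlink(tmp)
```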
Note that you could also have written
> the_data<-read.csv(file="c:/file_name.csv")
as the default value for the parameter 'header' is TRUE, and the default
for the parameter 'sep' is a comma.
You can confirm this by looking at the help via
> ?read.csv
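
Besides reading the help page, you can inspect the defaults from code. A minimal sketch using formals() from base R, plus a check on a made-up temporary file that the short and the long form of the call give the same result:

```r
# formals() returns the argument list of a function, including defaults.
defaults <- formals(read.csv)
defaults$header  # TRUE
defaults$sep     # ","

# Illustration on a throwaway file: both calls return the same data frame.
tmp <- tempfile(fileext = ".csv")
writeLines(c("sample,value", "a,1", "b,2"), tmp)
short_form <- read.csv(file = tmp)
long_form <- read.csv(file = tmp, header = TRUE, sep = ",")
same_result <- identical(short_form, long_form)
same_result  # TRUE
unlink(tmp)
```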
HTH,
Eric
On Mon, Aug 27, 2018 at 6:49 AM, Spencer Brackett <
spbrackett20 using saintjosephhs.com> wrote:
> Hello all,
>
> To begin my analysis, I downloaded two TCGA datasets (GBM and LGG), both
> csv files, into an R script after loading the cBioLite package. Following
> this, I entered the following command...
>
> > the_data<-read.csv(file=“c:/file_name.csv,header=TRUE,sep=“,”)
>
> Upon running the line I received this...
>
> +
>
> If I continue to press Enter, the + sign continues to appear on every
> subsequent/new line.
>
> Does anyone know what this is indicative of and how I may continue on with
> my analysis?
>
> My next step after this would have been the following (the numbers before
> each command are line markers, not part of the code)...
>
> 1 library(TCGAbiolinks)
> 2
> 3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
> 4 path <- "."
>
> Best wishes,
>
> Spencer Brackett
>
> On Sun, Aug 26, 2018 at 9:13 PM Caitlin <bioprogrammer using gmail.com> wrote:
>
> > You're welcome Spencer :)
> >
> > I hope I was able to help you. If this problem persists, or a new one
> > appears, feel free to post or email. You might also like:
> >
> > https://www.biostars.org/
> >
> > It is quite similar to StackOverflow but with a biological sciences focus.
> >
> > Hope this helps!
> >
> > ~Caitlin
> >
> >
> >
> > On Sun, Aug 26, 2018 at 6:02 PM Spencer Brackett <
> > spbrackett20 using saintjosephhs.com> wrote:
> >
> >> Caitlin,
> >>
> >> Thanks again! I already have the data stored in two CSV files on my
> >> desktop, but if running those with this function does not work, then I
> >> will try it with a flash drive.
> >>
> >> Best,
> >>
> >> Spencer Brackett
> >>
> >> On Sun, Aug 26, 2018 at 8:56 PM Caitlin <bioprogrammer using gmail.com> wrote:
> >>
> >>> Hmm...could you store each in its own file (a flash drive would be fine)
> >>> then use:
> >>>
> >>> the_data <- read.csv(file="c:/file_name.csv", header=TRUE, sep=",")
> >>>
> >>> to read each into your script? The data would then exist as a data frame
> >>> object that you could then work with.
> >>>
> >>>
> >>> On Sun, Aug 26, 2018 at 5:50 PM Spencer Brackett <
> >>> spbrackett20 using saintjosephhs.com> wrote:
> >>>
> >>>> Caitlin,
> >>>>
> >>>> Perhaps that is the problem. To be more specific, the data was
> >>>> transferred from the TCGA database to a CSV file... there are technically
> >>>> two separate files (CSV) for this analysis.... one for GBM and one for LGG.
> >>>> Both CSV files were then individually downloaded onto my open R console.
> >>>> Upon arranging them with the summary() function, the data expanded and
> >>>> took up the whole console page... even seemingly abrogating the arguments
> >>>> which allowed for the data to be downloaded onto R in the first place. Are
> >>>> you suggesting that I would need to utilize a flash drive to successfully
> >>>> utilize the function you suggested? Or could I perhaps do so with the CSV
> >>>> files I mentioned? If so, how?
> >>>>
> >>>> -Spencer B
> >>>>
> >>>> On Sun, Aug 26, 2018 at 8:42 PM Caitlin <bioprogrammer using gmail.com>
> >>>> wrote:
> >>>>
> >>>>> No worries Spencer. There is no downloaded data? Nothing is physically
> >>>>> stored on your hard drive? The dot in the path would be interpreted (no pun
> >>>>> intended!) as something like the following:
> >>>>>
> >>>>> If the TCGA data was stored in a file named "tcga_data.dat" and it was
> >>>>> in a directory named "C:\spencer", the 4th line of that script would set
> >>>>> the path to "C:\spencer" (so the file would be looked for at
> >>>>> "C:\spencer\tcga_data.dat") if you ran the script from that same
> >>>>> folder. If your TCGA data is not stored in the same folder from which the
> >>>>> script is being run, it won't find any data to work with. Does this help?
> >>>>>
> >>>>>
> >>>>> On Sun, Aug 26, 2018 at 5:34 PM Spencer Brackett <
> >>>>> spbrackett20 using saintjosephhs.com> wrote:
> >>>>>
> >>>>>> Caitlin,
> >>>>>>
> >>>>>> Forgive me, but I’m not quite sure exactly what your question is
> >>>>>> asking. The data is originally from the TCGA and I have it downloaded onto
> >>>>>> another R script. I opened a new script to perform the functions I posted
> >>>>>> to this forum because I was unable to input any other commands into the
> >>>>>> console.... due to the fact that the translated data filled the entirety of
> >>>>>> said console. Perhaps overloaded it? Regardless, I was unable to input any
> >>>>>> further commands.
> >>>>>>
> >>>>>> -Spencer Brackett
> >>>>>>
> >>>>>>
> >>>>>> On Sun, Aug 26, 2018 at 8:27 PM Caitlin <bioprogrammer using gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> You're welcome Spencer :)
> >>>>>>>
> >>>>>>> The 4th line:
> >>>>>>>
> >>>>>>> path <- "."
> >>>>>>>
> >>>>>>> refers to the current directory (the dot in other words). Is the
> >>>>>>> data stored in the same directory where the code is being run?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Sun, Aug 26, 2018 at 5:22 PM Spencer Brackett <
> >>>>>>> spbrackett20 using saintjosephhs.com> wrote:
> >>>>>>>
> >>>>>>>> Thank you! I will make note of that. Unfortunately, lines 1 and 4
> >>>>>>>> of the first portion of this analysis appear to be where the error
> >>>>>>>> begins, and several subsequent lines are also reported as errors.
> >>>>>>>> Perhaps this is an issue of the capitalization and/or spacing (something
> >>>>>>>> within the text)? The proposed method for methylation data extraction is
> >>>>>>>> based on the first third of the following TCGA workflow:
> >>>>>>>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5302158/#!po=0.0715308
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Spencer Brackett
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sun, Aug 26, 2018 at 8:07 PM Caitlin <bioprogrammer using gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Spencer.
> >>>>>>>>>
> >>>>>>>>> Should you capitalize the following library import?
> >>>>>>>>>
> >>>>>>>>> library(summarizedExperiment)
> >>>>>>>>>
> >>>>>>>>> In other words, I think that line should be:
> >>>>>>>>>
> >>>>>>>>> library(SummarizedExperiment)
> >>>>>>>>>
> >>>>>>>>> Hope this helps.
> >>>>>>>>>
> >>>>>>>>> ~Caitlin
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Aug 26, 2018 at 2:09 PM Spencer Brackett <
> >>>>>>>>> spbrackett20 using saintjosephhs.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Good evening,
> >>>>>>>>>>
> >>>>>>>>>> I am attempting to run the following analysis on TCGA data,
> >>>>>>>>>> however
> >>>>>>>>>> something is being reported as an error in my arguments... any
> >>>>>>>>>> ideas as to
> >>>>>>>>>> what is incorrect in the following? Thanks!
> >>>>>>>>>>
> >>>>>>>>>> 1 library(TCGAbiolinks)
> >>>>>>>>>> 2
> >>>>>>>>>> 3 # Download the DNA methylation data: HumanMethylation450 LGG and GBM.
> >>>>>>>>>> 4 path <- "."
> >>>>>>>>>> 5
> >>>>>>>>>> 6 query.met <- TCGAquery(tumor = c("LGG","GBM"),"HumanMethylation450", level = 3)
> >>>>>>>>>> 7 TCGAdownload(query.met, path = path )
> >>>>>>>>>> 8 met <- TCGAprepare(query = query.met,dir = path,
> >>>>>>>>>> 9                    add.subtype = TRUE, add.clinical = TRUE,
> >>>>>>>>>> 10                   summarizedExperiment = TRUE,
> >>>>>>>>>> 11                   save = TRUE, filename = "lgg_gbm_met.rda")
> >>>>>>>>>> 12
> >>>>>>>>>> 13 # Download the expression data: IlluminaHiSeq_RNASeqV2 LGG and GBM.
> >>>>>>>>>> 14 query.exp <- TCGAquery(tumor = c("lgg","gbm"), platform = "IlluminaHiSeq_RNASeqV2",level = 3)
> >>>>>>>>>> 15
> >>>>>>>>>> 16 TCGAdownload(query.exp,path = path, type = "rsem.genes.normalized_results")
> >>>>>>>>>> 17
> >>>>>>>>>> 18 exp <- TCGAprepare(query = query.exp, dir = path,
> >>>>>>>>>> 19                    summarizedExperiment = TRUE,
> >>>>>>>>>> 20                    add.subtype = TRUE, add.clinical = TRUE,
> >>>>>>>>>> 21                    type = "rsem.genes.normalized_results",
> >>>>>>>>>> 22                    save = TRUE,filename = "lgg_gbm_exp.rda")
> >>>>>>>>>>
> >>>>>>>>>> To download data on DNA methylation and gene expression…
> >>>>>>>>>>
> >>>>>>>>>> library(summarizedExperiment)
> >>>>>>>>>> # get expression matrix
> >>>>>>>>>> data <- assay(exp)
> >>>>>>>>>>
> >>>>>>>>>> # get sample information
> >>>>>>>>>> sample.info <- colData(exp)
> >>>>>>>>>>
> >>>>>>>>>> # get genes information
> >>>>>>>>>> genes.info <- rowRanges(exp)
> >>>>>>>>>>
> >>>>>>>>>> Following stepwise procedure for obtaining GBM and LGG clinical
> >>>>>>>>>> data…
> >>>>>>>>>>
> >>>>>>>>>> # get clinical patient data for GBM samples
> >>>>>>>>>> gbm_clin <- TCGAquery_clinic("gbm","clinical_patient")
> >>>>>>>>>>
> >>>>>>>>>> # get clinical patient data for LGG samples
> >>>>>>>>>> lgg_clin <- TCGAquery_clinic("lgg","clinical_patient")
> >>>>>>>>>>
> >>>>>>>>>> # Bind the results, as the columns might not be the same,
> >>>>>>>>>> # we will use plyr rbind.fill, to have all columns from both files
> >>>>>>>>>> clinical <- plyr::rbind.fill(gbm_clin ,lgg_clin)
> >>>>>>>>>>
> >>>>>>>>>> # Other clinical files can be downloaded,
> >>>>>>>>>> # Use ?TCGAquery_clinic for more information
> >>>>>>>>>> clin_radiation <- TCGAquery_clinic("lgg","clinical_radiation")
> >>>>>>>>>>
> >>>>>>>>>> # Also, you can get clinical information from different tumor types.
> >>>>>>>>>> # For example sample 1 is GBM, sample 2 and 3 are TGCT
> >>>>>>>>>> data <- TCGAquery_clinic(clinical_data_type = "clinical_patient",
> >>>>>>>>>>                          samples = c("TCGA-06-5416-01A-01D-1481-05",
> >>>>>>>>>>                                      "TCGA-2G-AAEW-01A-11D-A42Z-05",
> >>>>>>>>>>                                      "TCGA-2G-AAEX-01A-11D-A42Z-05"))
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> # Searching idat file for DNA methylation
> >>>>>>>>>> query <- GDCquery(project = "TCGA-GBM",
> >>>>>>>>>> data.category = "Raw microarray data",
> >>>>>>>>>> data.type = "Raw intensities",
> >>>>>>>>>> experimental.strategy = "Methylation array",
> >>>>>>>>>> legacy = TRUE,
> >>>>>>>>>> file.type = ".idat",
> >>>>>>>>>> platform = "Illumina Human Methylation 450")
> >>>>>>>>>>
> >>>>>>>>>> **Repeat for LGG**
> >>>>>>>>>>
> >>>>>>>>>> To access mutational information concerning TMZ methylation…
> >>>>>>>>>>
> >>>>>>>>>> > mutation <- TCGAquery_maf(tumor = "lgg")
> >>>>>>>>>> Getting maf tables
> >>>>>>>>>> Source: https://wiki.nci.nih.gov/display/TCGA/TCGA+MAF+Files
> >>>>>>>>>> We found these maf files below:
> >>>>>>>>>>   MAF.File.Name
> >>>>>>>>>> 2 hgsc.bcm.edu_LGG.IlluminaGA_DNASeq.1.somatic.maf
> >>>>>>>>>> 3 LGG_FINAL_ANALYSIS.aggregated.capture.tcga.uuid.curated.somatic.maf
> >>>>>>>>>>
> >>>>>>>>>>   Archive.Name                                               Deploy.Date
> >>>>>>>>>> 2 hgsc.bcm.edu_LGG.IlluminaGA_DNASeq_automated.Level_2.1.0.0 10-DEC-13
> >>>>>>>>>> 3 broad.mit.edu_LGG.IlluminaGA_DNASeq_curated.Level_2.1.3.0  24-DEC-14
> >>>>>>>>>>
> >>>>>>>>>> Please, select the line that you want to download: 3
> >>>>>>>>>>
> >>>>>>>>>> **Repeat this for GBM***
> >>>>>>>>>>
> >>>>>>>>>> Selecting specified lines to download…
> >>>>>>>>>>
> >>>>>>>>>> gbm.subtypes <- TCGAquery_subtype(tumor = "gbm")
> >>>>>>>>>> lgg.subtypes <- TCGAquery_subtype(tumor = "lgg")
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Downloading data via the Bioconductor package RTCGAToolbox…
> >>>>>>>>>>
> >>>>>>>>>> library(RTCGAToolbox)
> >>>>>>>>>>
> >>>>>>>>>> # Get the last run dates
> >>>>>>>>>> lastRunDate <- getFirehoseRunningDates()[1]
> >>>>>>>>>> lastAnalyseDate <- getFirehoseAnalyzeDates(1)
> >>>>>>>>>>
> >>>>>>>>>> # get DNA methylation data, RNAseq2 and clinical data for LGG
> >>>>>>>>>> lgg.data <- getFirehoseData(dataset = "LGG",
> >>>>>>>>>>     gistic2_Date = getFirehoseAnalyzeDates(1), runDate = lastRunDate,
> >>>>>>>>>>     Methylation = TRUE, RNAseq2_Gene_Norm = TRUE, Clinic = TRUE,
> >>>>>>>>>>     Mutation = TRUE,
> >>>>>>>>>>     fileSizeLimit = 10000)
> >>>>>>>>>>
> >>>>>>>>>> # get DNA methylation data, RNAseq2 and clinical data for GBM
> >>>>>>>>>> gbm.data <- getFirehoseData(dataset = "GBM",
> >>>>>>>>>>     runDate = lastRunDate, gistic2_Date = getFirehoseAnalyzeDates(1),
> >>>>>>>>>>     Methylation = TRUE, Clinic = TRUE, RNAseq2_Gene_Norm = TRUE,
> >>>>>>>>>>     fileSizeLimit = 10000)
> >>>>>>>>>>
> >>>>>>>>>> # To access the data you should use the getData function
> >>>>>>>>>> # or simply access with @ (for example gbm.data@Clinical)
> >>>>>>>>>> gbm.mut <- getData(gbm.data,"Mutations")
> >>>>>>>>>> gbm.clin <- getData(gbm.data,"Clinical")
> >>>>>>>>>> gbm.gistic <- getData(gbm.data,"GISTIC")
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Genomic Analysis/Final data extraction:
> >>>>>>>>>>
> >>>>>>>>>> Enable “getData” to access the data
> >>>>>>>>>>
> >>>>>>>>>> Obtaining GISTIC results…
> >>>>>>>>>>
> >>>>>>>>>> # Download GISTIC results
> >>>>>>>>>> gistic <- getFirehoseData("GBM",gistic2_Date ="20141017" )
> >>>>>>>>>>
> >>>>>>>>>> # get GISTIC results
> >>>>>>>>>> gistic.allbygene <- gistic@GISTIC@AllByGene
> >>>>>>>>>> gistic.thresholedbygene <- gistic@GISTIC@ThresholedByGene
> >>>>>>>>>>
> >>>>>>>>>> Repeat this procedure to obtain LGG GISTIC results.
> >>>>>>>>>>
> >>>>>>>>>> ***Please ignore the 'non-coded' text as they are procedural
> >>>>>>>>>> steps/classifications***
> >>>>>>>>>>
> >>>>>>>>>> [[alternative HTML version deleted]]
> >>>>>>>>>>
> >>>>>>>>>> ______________________________________________
> >>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>>> and provide commented, minimal, self-contained, reproducible
> code.
> >>>>>>>>>>
> >>>>>>>>>
>