[R] Fwd: UPDATE

Caitlin bioprogrammer using gmail.com
Thu Dec 27 06:51:46 CET 2018


Is the file being saved as .xls, .xlsx, .csv, .tsv, or .txt?
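
In case it is useful, here is a minimal sketch of why the answer matters: base R should be pointed at the file's actual format, not at whatever Excel is willing to open. The two-row file below is made up for illustration (it is not the ICGC download), and tsv_path, via_delim, and via_csv are hypothetical names.

```r
# Write a tiny tab-separated file to a temporary location (made-up example data).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("gene\texpressed", "EGFR\tTRUE", "TP53\tFALSE"), tsv_path)

# read.delim() uses sep = "\t" by default; read.csv() needs it spelled out.
via_delim <- read.delim(tsv_path)
via_csv   <- read.csv(tsv_path, sep = "\t")

# Both calls should yield the same data frame for a tab-separated file.
print(identical(via_delim, via_csv))
str(via_delim)
```

For a genuinely comma-separated .csv, read.csv(path) with its default separator applies; read_excel() from readxl is only for real .xls/.xlsx workbooks.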


On Wed, Dec 26, 2018 at 10:14 PM Spencer Brackett <
spbrackett20 using saintjosephhs.com> wrote:

> Follow up,
>
> Would read.txt also work, as I am certain that I have both datasets in
> .txt files? As to a previous user's question concerning the .csv nature of
> the supposed Excel file, I am uncertain as to how this was translated as such.
> The file is most certainly in Excel.
>
>
> On Thu, Dec 27, 2018 at 12:10 AM Spencer Brackett <
> spbrackett20 using saintjosephhs.com> wrote:
>
>> Caitlin,
>>
>>   I tried your command in both RGui and RStudio but both came up as
>> errors. I believe I made a mistake somewhere in labeling/downloading the
>> files, which is the source of the confusion in R. I will re-examine the
>> files saved on my desktop to determine the error. Regardless, would it be
>> better to use a read.table or read.csv function when attempting to download
>> my datasets? I tried using read.xl on RStudio as this process seemed much
>> easier, however, it would seem that my proclivity to error prevents such.
>>
>> Best,
>>
>> Spencer
>>
>> On Wed, Dec 26, 2018 at 11:55 PM Caitlin Gibbons <bioprogrammer using gmail.com>
>> wrote:
>>
>>> Does this help Spencer? The read.delim() function assumes a tab
>>> character by default, but I specifically included it using the read.csv
>>> function. The downloaded file is NOT an Excel file so this should help.
>>>
>>> GBM_protein_expression <- read.csv(
>>>   "C:/Users/Spencer/Desktop/GBM protein_expression.tsv", sep = "\t")
>>>
>>> Sent from my iPhone
>>>
>>> > On Dec 26, 2018, at 9:23 PM, Richard M. Heiberger <rmh using temple.edu> wrote:
>>> >
>>> > this is wrong because the file is a csv file.  read_excel is designed
>>> > for xls files.
>>> > GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >
>>> > How did you get a csv? it downloads as tsv.
>>> >
>>> > the statement you should use is in base, no library() statement is needed.
>>> >
>>> > GBM_protein_expression <- read.delim("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >
>>> > read.delim is the same as read.csv except that it sets the sep
>>> > argument to "\t".
>>> >
>>> >
>>> >
>>> > On Wed, Dec 26, 2018 at 11:11 PM Spencer Brackett
>>> > <spbrackett20 using saintjosephhs.com> wrote:
>>> >>
>>> >> Sorry, my mistake.
>>> >>
>>> >> So I could still use read.table, and should I try using a .txt version of
>>> >> the file to avoid the silent changes you described?
>>> >>
>>> >> Also, when I tried to simplify this process by downloading the dataset into
>>> >> RStudio as opposed to R (Gui), I received the following...
>>> >> library(readxl)
>>> >>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM
>>> >> protein_expression.csv")
>>> >> Error: Can't establish that the input is either xls or xlsx.
>>> >>> View(GBM_protein_expression)
>>> >> Error in View : object 'GBM_protein_expression' not found
>>> >> Error in gzfile(file, mode) : cannot open the connection
>>> >> In addition: Warning message:
>>> >> In gzfile(file, mode) :
>>> >>  cannot open compressed file
>>> >> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
>>> >> probable reason 'No such file or directory'
>>> >>> library(readxl)
>>> >>> GBM_protein_expression <-
>>> >> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
>>> >> readxl works best with a newer version of the tibble package.
>>> >> You currently have tibble v1.4.2.
>>> >> Falling back to column name repair from tibble <= v1.4.2.
>>> >> Message displays once per session.
>>> >>> View(GBM_protein_expression)
>>> >>
>>> >>
>>> >> Is this perhaps the result of lack of preview (which I did not complete at
>>> >> the time I hit import as the preview failed to load), or the fact that the
>>> >> Excel file itself contains no numerical data, but only TRUE or FALSE
>>> >> entries?
>>> >>
>>> >> On Wed, Dec 26, 2018 at 10:59 PM Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
>>> >> wrote:
>>> >>
>>> >>> Please always reply-all to keep the list involved.
>>> >>>
>>> >>> If you used Save As to change the data format to Excel AND the file
>>> >>> extension to xlsx, then yes, you should be able to read with readxl. I
>>> >>> don't recommend it, though... Excel often changes data silently and in
>>> >>> irregularly located places in your file.
>>> >>>
>>> >>> On December 26, 2018 7:38:16 PM PST, Spencer Brackett <
>>> >>> spbrackett20 using saintjosephhs.com> wrote:
>>> >>>> So even if I imported the file from ICGC to my desktop as an Excel
>>> >>>> file, and can view and save the data as such, it is still a TSV?
>>> >>>>
>>> >>>> On Wed, Dec 26, 2018 at 10:35 PM Jeff Newmiller
>>> >>>> <jdnewmil using dcn.davis.ca.us>
>>> >>>> wrote:
>>> >>>>
>>> >>>>> CSV and TSV are not Excel files. Yes, I know Excel will open them, but
>>> >>>>> that does not make them Excel files.
>>> >>>>>
>>> >>>>> Read a TSV file with read.table or read.csv, setting the sep argument to
>>> >>>>> "\t".
>>> >>>>>
>>> >>>>> On December 26, 2018 7:26:35 PM PST, Spencer Brackett <
>>> >>>>> spbrackett20 using saintjosephhs.com> wrote:
>>> >>>>>> I tried importing the file without preview and received the
>>> >>>>>> following....
>>> >>>>>>
>>> >>>>>> library(readxl)
>>> >>>>>>> GBM_protein_expression <- read_excel("C:/Users/Spencer/Desktop/GBM protein_expression.csv")
>>> >>>>>> Error: Can't establish that the input is either xls or xlsx.
>>> >>>>>>> View(GBM_protein_expression)
>>> >>>>>> Error in View : object 'GBM_protein_expression' not found
>>> >>>>>> Error in gzfile(file, mode) : cannot open the connection
>>> >>>>>> In addition: Warning message:
>>> >>>>>> In gzfile(file, mode) :
>>> >>>>>> cannot open compressed file
>>> >>>>>> 'C:/Users/Spencer/AppData/Local/Temp/RtmpQNQrMh/input147c61fc5b52.rds',
>>> >>>>>> probable reason 'No such file or directory'
>>> >>>>>>> library(readxl)
>>> >>>>>>> GBM_protein_expression <-
>>> >>>>>> read_excel("C:/Users/Spencer/Desktop/GBM_protein_ expression.xlsx")
>>> >>>>>> readxl works best with a newer version of the tibble package.
>>> >>>>>> You currently have tibble v1.4.2.
>>> >>>>>> Falling back to column name repair from tibble <= v1.4.2.
>>> >>>>>> Message displays once per session.
>>> >>>>>>> View(GBM_protein_expression)
>>> >>>>>>
>>> >>>>>> Also, the area above my console says that no data is available in the
>>> >>>>>> table. Is this perhaps the result of lack of preview or the fact that the
>>> >>>>>> Excel file itself contains no numerical data, but only TRUE or FALSE
>>> >>>>>> entries?
>>> >>>>>>
>>> >>>>>> On Wed, Dec 26, 2018 at 9:57 PM Spencer Brackett <
>>> >>>>>> spbrackett20 using saintjosephhs.com> wrote:
>>> >>>>>>
>>> >>>>>>> Hello again,
>>> >>>>>>>
>>> >>>>>>> I worked on directly downloading the file into R as was suggested, but
>>> >>>>>>> have thus far been unsuccessful. This is what I generated on my second
>>> >>>>>>> attempt...
>>> >>>>>>>
>>> >>>>>>> GBM protein_expression<-(file.choose(), header=TRUE, sep="\t")
>>> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
>>> >>>>>>>> GBM protein_expression<-(file.choose(GBM_protein_expression.xlsx),header=TRUE,
>>> >>>>>>> sep="\t")
>>> >>>>>>> Error: unexpected symbol in "GBM protein_expression"
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> What part of the argument is in error?
>>> >>>>>>>
>>> >>>>>>> Also, I tried importing the dataset as an Excel file in RStudio to see if I
>>> >>>>>>> could solve my problem that way. However, my imported Excel file has been
>>> >>>>>>> stuck at 'retrieving preview data' and no data is appearing. Is the
>>> >>>>>>> data file perhaps too large or in the wrong format?
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Dec 26, 2018 at 6:42 PM Spencer Brackett <
>>> >>>>>>> spbrackett20 using saintjosephhs.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> Mr. Heiberger,
>>> >>>>>>>>
>>> >>>>>>>> Thank you for the insight! I will try out your suggestion.
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>>
>>> >>>>>>>> Spencer Brackett
>>> >>>>>>>>
>>> >>>>>>>> On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger <rmh using temple.edu>
>>> >>>>>>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> I looked at the first file.  It gives an option to download as TSV
>>> >>>>>>>>> (tab separated values).
>>> >>>>>>>>> That is the same as CSV except with tabs instead of commas.
>>> >>>>>>>>> You do not need any external software to read it.  Read the downloaded
>>> >>>>>>>>> file directly into R.
>>> >>>>>>>>>
>>> >>>>>>>>> read.delim looks as if it would work directly on the downloaded file.
>>> >>>>>>>>> ?read.delim
>>> >>>>>>>>> The notation "\t" means the tab character.
>>> >>>>>>>>>
>>> >>>>>>>>> As an aside, stay away from Notepad. It is too naive for almost
>>> >>>>>>>>> anything interesting.
>>> >>>>>>>>> The specific case I often see is people reading Linux-style text files
>>> >>>>>>>>> with Notepad, which doesn't understand NL-terminated lines. Nicely
>>> >>>>>>>>> formatted text files become illegible.
>>> >>>>>>>>>
>>> >>>>>>>>> On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett
>>> >>>>>>>>> <spbrackett20 using saintjosephhs.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> Good evening,
>>> >>>>>>>>>>
>>> >>>>>>>>>> I am attempting to analyze the protein expression data contained within
>>> >>>>>>>>>> these two ICGC, TCGA datasets (one for GBM and the other for LGG)
>>> >>>>>>>>>>
>>> >>>>>>>>>> *File for GBM  protein expression*:
>>> >>>>>>>>>>
>>> >>>>>>>>>> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>>> >>>>>>>>>>
>>> >>>>>>>>>> *File for LGG protein expression:*
>>> >>>>>>>>>>
>>> >>>>>>>>>> https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D
>>> >>>>>>>>>>
>>> >>>>>>>>>>  When I tried to transfer the files from .txt (via Notepad) to .csv (via
>>> >>>>>>>>>> Excel), the data appeared in the columns as unorganized and random
>>> >>>>>>>>>> script... not like how a typical csv should be arranged at all. I need the
>>> >>>>>>>>>> dataset to be converted into .csv in order to analyze it in R, which is why
>>> >>>>>>>>>> I am hoping someone here might help me in doing that. If not, is there
>>> >>>>>>>>>> perhaps some other way that I could analyze the datasets in R, which again
>>> >>>>>>>>>> are downloaded from the ICGC data portal?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Best,
>>> >>>>>>>>>>
>>> >>>>>>>>>> Spencer Brackett
>>> >>>>>>>>>>
>>> >>>>>>>>>>        [[alternative HTML version deleted]]
>>> >>>>>>>>>>
>>> >>>>>>>>>> ______________________________________________
>>> >>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>> >>>> see
>>> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>>>>>>>>> PLEASE do read the posting guide
>>> >>>>>>>>> http://www.R-project.org/posting-guide.html
>>> >>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>> >>>>>> code.
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> Sent from my phone. Please excuse my brevity.
>>> >>>>>
>>> >>>
>>> >>> --
>>> >>> Sent from my phone. Please excuse my brevity.
>>> >>>
>>> >>
>>> >
>>>
>>

	[[alternative HTML version deleted]]


