[R] Reading in a transcript-like file
David Winsemius
dwinsemius at comcast.net
Wed Jun 30 15:54:04 CEST 2010
On Jun 30, 2010, at 12:21 AM, ARRRRRR wrote:
>
> http://r.789695.n4.nabble.com/file/n2272669/FT20100626_%2420_%2B_%242_Sit_%26_Go_-_%28169112900%29_-_Summary.txt
>
> I have a lot of experience with Stata, but I'm new to R. I'm trying to read
> the attached file into R on my Mac. My goal is to have it as a list, with
> each element a string; from there I can parse out the data I need and add
> it as an observation in a data frame.
>
> I've tried scan, readLines, etc., but I'm stumped. I've been adding
> encoding="UTF-16", but that doesn't seem to help much.
> The closest I've come is:
>
> test <- scan(file="FT20100626 $20 + $2 Sit & Go - (169112900) - Summary.txt",
>              what=list(""), flush=FALSE, skip=0, encoding="UTF-16", quote="\n")
>
> which gives me a list in which each element is the first letter of a row.
>
>> test
> [[1]]
> [1] "\xff\xfeF" "T" "P" "T" "S" "$" "+" "$" "S"
I believe you are being bitten by an encoding issue, and that it is
described in this section of the help page at ?connections:
"The encoding "UCS-2LE" is treated specially, as it is the appropriate
value for Windows ‘Unicode’ text files. If the first two bytes are the
Byte Order Mark 0xFFFE then these are removed as most implementations
of iconv do not accept BOMs. Note that some implementations will
handle BOMs using encoding "UCS-2" but many will not."
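A minimal sketch of what that passage implies, assuming the file really is UTF-16 little-endian with a leading BOM (I have not run this on your file). Note that in both scan() and readLines() the encoding= argument only declares how the resulting strings are marked; it does not convert the file. Re-encoding happens through fileEncoding= in scan() or through the encoding= argument of file(). The file written here, its path, and its two lines of text are invented so the example is self-contained.

```r
# Sketch only: write a tiny made-up UTF-16LE file with a BOM (FF FE) so the
# example is self-contained; the real input would be the summary .txt file
path <- tempfile(fileext = ".txt")
sample_text <- "Line one\nLine two\n"   # invented content, not the real file
utf16 <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), utf16), path)

# Re-encode while reading: with encoding = "UCS-2LE" the connection drops the
# BOM and converts the text, so readLines() returns one string per line
con <- file(path, encoding = "UCS-2LE")
lines <- readLines(con)
close(con)

print(lines)
```

Unlike scan(what=list("")), readLines() returns a character vector with one element per line, which is closer to the "each element a string" goal.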
Notice that your first entry begins with \xff\xfe, which I believe is the
two-byte Byte Order Mark that the help page writes as 0xFFFE. When you
look at that page with Firefox and request encoding information, you are
told it is UTF-16. I am not sufficiently educated on encoding issues, even
though we share a platform. I tried a few different encoding
specifications, including "UTF-16", "UCS-2" and "UCS-2LE", with scan and
readLines, but failed to work through to a solution. Another possibility
might be to subscribe to the R SIG-Mac mailing list and post the question
there.
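One way to confirm the \xff\xfe observation before choosing an encoding is to look at the raw bytes. This is a sketch, assuming a little-endian UTF-16 file; has_utf16le_bom() is a hypothetical helper, not a base R function, and the demo file is invented.

```r
# Hypothetical helper (not part of base R): TRUE if a file starts with the
# UTF-16 little-endian Byte Order Mark, i.e. the bytes FF FE
has_utf16le_bom <- function(path) {
  first_two <- readBin(path, what = "raw", n = 2L)   # raw bytes, no decoding
  identical(first_two, as.raw(c(0xff, 0xfe)))
}

# Example with an invented file: BOM followed by "A" in UTF-16LE (41 00)
demo <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xff, 0xfe, 0x41, 0x00)), demo)
if (has_utf16le_bom(demo)) {
  con <- file(demo, encoding = "UCS-2LE")
  print(readLines(con, warn = FALSE))   # no trailing newline, so silence the warning
  close(con)
}
```

Reading the bytes with readBin() avoids any decoding, so what you see is exactly what is on disk.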
--
David.
> [10] "&" "G" "(" "H" "N" "L" "B" "u" "$"
> [19] "+" "$" "B" "u" "C" "1" "6" "E" "T"
> [28] "o" "P" "P" "$" "T" "o" "s" "2" "0"
> [37] "E" "T" "o" "f" "2" "1" "E" "\n" "1"
> [46] "B" "$" "2" ":" "J" "$" "3" ":" "b"
> [55] "4" ":" "s" "c" "2" "5" ":" "R" "6"
> [64] ":" "S" "B" "o" "f" "i" "1" "p"
>
> Any help would be greatly appreciated.
David Winsemius, MD
West Hartford, CT