[R] convert list to Dataframe
David Winsemius
dwinsemius at comcast.net
Sun Nov 1 14:05:13 CET 2009
Three suggestions:
-- drop the idea of using a dataframe. It's only appropriate when the
data is rectangular.
-- look at strsplit for separating at "@" characters.
-- post the output of dput() on your sample, since email is probably
not capable of rendering this data without creating distortions.
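A minimal sketch of the strsplit() idea, assuming each tweet is stored as one character string. The original post says the fields are separated by semicolons, so this splits at ";". The same call works for "@" or any other separator. The two records are copied from the post. The variable names and the expected count of 14 fields are assumptions and do not come from tm. Counting the fields in each record shows why a data frame only fits once the ragged lines are set aside:

```r
# Two records copied from the post (assumption: in the corpus, each tweet is
# one character string). The second contains "\;" sequences inside the text.
records <- c(
  "547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296",
  "547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\\;Someone's Daughter>\\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051"
)

# strsplit() returns a list with one character vector per record;
# fixed = TRUE matches ";" literally instead of as a regular expression.
fields <- strsplit(records, ";", fixed = TRUE)

# Records of different lengths mean the data are not rectangular.
n_fields <- lengths(fields)
print(n_fields)

# Build a data frame only from records with the expected number of fields
# (14 is an assumption taken from the well-formed sample lines).
expected <- 14
good <- fields[n_fields == expected]
tweets_df <- as.data.frame(do.call(rbind, good), stringsAsFactors = FALSE)

# dput() prints a text form of an object that others can paste back into R.
dput(records[1])
```

On these two lines the field counts are 14 and 16. The "\;" sequences inside the second tweet add two extra fields, so any semicolon inside tweet text breaks the column alignment.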
--
David
On Nov 1, 2009, at 7:43 AM, onyourmark wrote:
>
> Hi. I have a huge list called twitter:
>
>> dim(twitter)
> NULL
>> str(twitter)
This looks to have been converted into an R object through some process
on some unspecified input. You should describe that process, and the
only unambiguous way of doing so is to include the code.
> List of 1
> $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic [1:35575]
> 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
> 12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise worries EU ties?;London, England;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362
> 12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324
> 12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891 ...
> .. ..- attr(*, "Author")= chr(0)
> .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
> .. ..- attr(*, "Description")= chr(0)
> .. ..- attr(*, "Heading")= chr(0)
> .. ..- attr(*, "ID")= chr "1"
> .. ..- attr(*, "Language")= chr "en"
> .. ..- attr(*, "LocalMetaData")= list()
> .. ..- attr(*, "Origin")= chr(0)
> - attr(*, "CMetaData")=List of 3
> ..$ NodeID : num 0
> ..$ MetaData:List of 2
> .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
> .. ..$ creator : Named chr ""
> .. .. ..- attr(*, "names")= chr "LOGNAME"
> ..$ Children: NULL
> ..- attr(*, "class")= chr "MetaDataNode"
> - attr(*, "DMetaData")='data.frame': 1 obs. of 1 variable:
> ..$ MetaID: num 0
> - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
>
> It contains tweets, but in many languages. The "columns" are separated
> by semicolons. I am using the tm package, and it is a "corpus".
>
> It looks like this:
It is difficult to see any connection with what you have above.
>
> 547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
> 547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226
> 547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ" опубликовали список стран со свободой слова, из 173 Грузия на 81 месте опережая Украину. Успехи,успехи...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
> 547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's Daughter>\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051
> Error: invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½
> ,国家需è¦ä»–
> 们,但是国家的未æ
> ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunc
>
> I want to split it into "fields" or columns, so I thought I should
> convert it to a data frame. I tried
>
>> twitterDF<-as.data.frame(twitter)
> Error in sort.list(y) :
> invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½
> ,国家需è¦ä»–
> 们,但是国家的未æ
> ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunchao
> 一点都ä¸ä¹è§‚。真æ
> £çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å…
> ³æˆ‘åˆæ€Žä¹ˆæ ·ï¼Œåæ £æ”¿æ²»æ–
> —争ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€å
> 出æ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€
> ‚北风还是èˆä¸å¾—*霸地ä½ã
> €è‚‰ã€ä¹¦ã€å
> ¥³äººå’Œç½‘络的,ä¸è¿‡ç‰
> ¢é‡Œä¸ä¼šæ供这些。å¦â
> €¦;山西,浙江;China;Zhejiang;;;
> 28.695035;119.751054'
> in 'utf8towcs'
>>
>
> Can anyone suggest what I can do?
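The 'utf8towcs' message indicates that at least one string in the corpus is not valid UTF-8, and as.data.frame() stops when it reaches that string. Below is a sketch of two ways to handle such strings. It uses hypothetical records, with the byte \xe6 inserted by hand to imitate a damaged character. It assumes R 3.3.0 or later, which is needed for validUTF8():

```r
# Hypothetical records: the second contains the byte \xe6 followed by ";",
# which is not a valid UTF-8 sequence (like the tweet that triggered the error).
records <- c("547285;Y_T_;Playing: Beth Orton;Japan",
             "547286;Atogey;\xe6;China")

# validUTF8() reports, per element, whether the bytes form valid UTF-8.
print(validUTF8(records))

# Option 1: drop the offending records entirely.
kept <- records[validUTF8(records)]

# Option 2: keep every record but delete the invalid bytes; with sub = "",
# each byte that cannot be converted is replaced by an empty string.
cleaned <- iconv(records, from = "UTF-8", to = "UTF-8", sub = "")
```

Dropping records is the safer choice if the damaged tweets are not needed. The iconv() route keeps the rest of each record but silently removes the damaged characters.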
>
> P.S. Actually, I would love to remove all the non-English tweets, but I
> have no clue how to do that.
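For the P.S., here is a crude sketch that keeps only records made entirely of ASCII characters, as a stand-in for "English". It is a heuristic, not language detection. English tweets containing characters such as "…" would be dropped, and other-language tweets typed without accents would be kept. The records are shortened versions of lines from the post:

```r
# Shortened versions of three records from the post; the \u escapes keep
# this source ASCII-only while producing the accented/Cyrillic characters.
records <- c(
  "547282;dani_jade18;@Laura_Whyte1 day :p;United Kingdom",
  "547283;fabiomafra;algu\u00e9m traz mais lenha pro computador da facool? BOM DIA.;Brazil",
  "547284;romanotr;\u0412\u0430\u0443, ...;Portugal"
)

# iconv() returns NA for any element that cannot be converted to ASCII,
# i.e. any record containing at least one non-ASCII character.
ascii_only <- !is.na(iconv(records, from = "UTF-8", to = "ASCII"))

# Crude heuristic: treat pure-ASCII records as "probably English".
probably_english <- records[ascii_only]
```

Anything more reliable needs a real language identifier, for example the n-gram based textcat package on CRAN.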
>
> --
David Winsemius, MD
Heritage Laboratories
West Hartford, CT