[R] convert list to Dataframe
onyourmark
william108 at gmail.com
Sun Nov 1 14:24:54 CET 2009
Hello. The "fields" are separated by ';'. I think the data is
"rectangular" in the sense that each row has about 15 fields, some of
which are empty. In the dput() output below, each row appears as one
double-quoted string.
Any ideas based on this?
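One possible approach, as a rough sketch rather than something tested on the full corpus: pull the character vector out of the corpus and hand it to read.table() with sep = ";". The two sample rows are copied from the dput() output below (with the archive's line wraps joined by a space). The names `lines` and `tweets` are only illustrative, and on the real object the vector would presumably come from something like as.character(twitter[[1]]), which is an assumption. quote = "" and comment.char = "" are set because tweets contain apostrophes and '#' characters, which read.table() would otherwise treat as quote and comment markers.

```r
# Two rows copied from the dput() output; on the real data this would
# presumably be: lines <- as.character(twitter[[1]])   (assumption)
lines <- c(
  paste0("4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;",
         "Sydney, Australia;Australia;NSW;;;-33.867139;151.207114"),
  paste0("4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success bored ",
         "attentions people formerly snubbed you. -Mary Wilson Little #quote;",
         "UK;United Kingdom;;;;55.378051;-3.435973")
)

# Split each line on ';' into columns V1, V2, ...
# quote = ""        : do not treat ' or " inside tweets as quoting
# comment.char = "" : do not treat '#' (hashtags) as the start of a comment
# fill = TRUE       : pad rows that have fewer fields than the widest row
tweets <- read.table(text = lines, sep = ";", quote = "", comment.char = "",
                     header = FALSE, fill = TRUE, stringsAsFactors = FALSE)

str(tweets)
```

A tweet whose text itself contains ';' would still produce extra fields, so it may be worth checking count.fields(textConnection(lines), sep = ";", quote = "", comment.char = "") for rows with an unexpected field count before trusting the result.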
Here is the end of the output of dput(twitter):
"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel master chef cook
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success bored
attentions people formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich
jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care
reform rural America: By Christopher Smart The health-care crisis ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters Studio O+A: San
Francisco based interior design firm Studio O+A designed ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec =
56.4049999713898,
min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt",
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading =
character(0), ID = "1", Language = "en", LocalMetaData = list(), Origin =
character(0), class = c("PlainTextDocument",
"TextDocument", "character"))), CMetaData = structure(list(NodeID = 0,
MetaData = structure(list(create_date = structure(list(sec =
56.4059998989105,
min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), creator =
structure("", .Names = "LOGNAME")), .Names = c("create_date",
"creator")), Children = NULL), .Names = c("NodeID", "MetaData",
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class =
"data.frame"), class = c("VCorpus",
"Corpus", "list"))
onyourmark wrote:
>
> Hi. I have a huge list called twitter:
>
>> dim(twitter)
> NULL
>> str(twitter)
> List of 1
> $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic
> [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons
> For Governance From Campaigner-in-chief: President obama jumps campaign
> 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
> 12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington
> meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise
> worries EU ties?;London, England;United Kingdom;Greater
> London;Westminster;;51.5001524;-0.1262362
> 12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses
> wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So.
> California;USA;CA;;;36.778261;-119.4179324
> 12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan
> troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
> #video;USA;USA;;;;37.09024;-95.712891 ...
> .. ..- attr(*, "Author")= chr(0)
> .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31
> 04:46:56"
> .. ..- attr(*, "Description")= chr(0)
> .. ..- attr(*, "Heading")= chr(0)
> .. ..- attr(*, "ID")= chr "1"
> .. ..- attr(*, "Language")= chr "en"
> .. ..- attr(*, "LocalMetaData")= list()
> .. ..- attr(*, "Origin")= chr(0)
> - attr(*, "CMetaData")=List of 3
> ..$ NodeID : num 0
> ..$ MetaData:List of 2
> .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
> .. ..$ creator : Named chr ""
> .. .. ..- attr(*, "names")= chr "LOGNAME"
> ..$ Children: NULL
> ..- attr(*, "class")= chr "MetaDataNode"
> - attr(*, "DMetaData")='data.frame': 1 obs. of 1 variable:
> ..$ MetaID: num 0
> - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
>
> It contains tweets but in many languages. The "columns" are separated by
> semi-colons. I am using the tm package and it is a "corpus".
>
> It looks like this:
>
> 547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day
> :p;Huddersfield/Lincoln;United
> Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
> 547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro
> computador da facool? BOM DIA.;Belo Horizonte - MG -
> BR;Brazil;MG;;;-19.8157306;-43.9542226
> 547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ"
> опубликовали список стран со свободой слова, из 173 Грузия на 81 месте
> опережая Украину. Успехи,успехи...;Portugal
> Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
> 547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's
> Daughter>\;;Kanazawa, Japan;Japan;Ishikawa
> Prefecture;;;36.5613254;136.6562051
> Error: invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½ ,国家需è¦ä»–们,但是国家的未æ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunc
>
> I want to convert it to "fields" or columns and so I thought I should
> convert it to a dataframe. I tried
>
>> twitterDF<-as.data.frame(twitter)
> Error in sort.list(y) :
> invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½ ,国家需è¦ä»–们,但是国家的未æ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunchao
> 一点都ä¸ä¹è§‚。真æ£çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å…³æˆ‘åˆæ€Žä¹ˆæ ·ï¼Œåæ£æ”¿æ²»æ–—争ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€å出æ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€‚北风还是èˆä¸å¾—*霸地ä½ã€è‚‰ã€ä¹¦ã€å¥³äººå’Œç½‘络的,ä¸è¿‡ç‰¢é‡Œä¸ä¼šæ供这些。å¦â€¦;山西,浙江;China;Zhejiang;;;28.695035;119.751054'
> in 'utf8towcs'
>>
>
> Can anyone suggest what I can do?
>
> P.S. Actually, I would love to remove all the non-English tweets but I
> have no clue about how to do that.
>
>
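Regarding the "invalid input ... in 'utf8towcs'" error and the P.S. in the quoted message, here is a hedged sketch of two separate steps: finding or stripping bytes that are not valid UTF-8, and keeping only lines made up entirely of ASCII bytes. The sample lines are hypothetical and shortened, not taken from the corpus. The ASCII test is only a crude stand-in for "English": a German or Portuguese tweet without accented letters would pass it, and an English tweet containing a non-ASCII character would be dropped. Filtering on the country field would be another option.

```r
# Hypothetical sample lines: plain ASCII, valid UTF-8 with an accented
# letter, and a line containing a stray byte that is not valid UTF-8.
lines <- c(
  "1;padden;Rachel master chef cook anytime!",
  "2;fabiomafra;algu\u00e9m traz mais lenha",
  "3;Atogey;broken \xe6 byte"
)

# Which elements are well-formed UTF-8? (validUTF8() needs R >= 3.3.0)
ok_utf8 <- validUTF8(lines)

# Remove bytes that cannot be interpreted as UTF-8, so that later string
# functions are less likely to stop with "invalid input".
cleaned <- iconv(lines, from = "UTF-8", to = "UTF-8", sub = "")

# Crude "English" proxy (an assumption, see above): keep only lines whose
# bytes are all 7-bit ASCII.
is_ascii <- vapply(
  lines,
  function(s) all(as.integer(charToRaw(s)) < 128L),
  logical(1),
  USE.NAMES = FALSE
)
ascii_only <- lines[is_ascii]
```

Working on bytes via charToRaw() avoids regular expressions on strings that are themselves invalid in the current locale, which is where this kind of error tends to come from.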