[R] clean Email format data

climmi kewang.2009 at hotmail.com
Tue Jun 26 12:23:39 CEST 2012


Dear all 

I am going to do some text analysis using R.
However, the data is so noisy that I need to clean it first.
I don't have much experience with text cleaning. Would anyone be able to
help with this?
If you could share some similar code that you have written before, it would be
greatly appreciated.

My content is mainly feedback data received through:
*Phone call record*:  (usually the structure looks like the example below)
*Email:*   ordinary email correspondence, which usually contains a lot of
quoted history and also footers such as "if you are not the intended
recipient...", etc.
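
For the email part, the cleaning meant here could be sketched roughly as
follows. This is only a minimal sketch in base R, not a tested solution. The
function name clean_email, the history markers ("Original Message", "On ...
wrote:", "From:") and the exact disclaimer wording are assumptions and would
need adjusting to the real data. It also assumes each email body is held in a
single character string and that the disclaimer phrase is not broken across
two lines.

```r
# Hypothetical helper: keep only the newest message text of one email body.
clean_email <- function(body) {
  lines <- strsplit(body, "\r?\n")[[1]]

  # Assumed markers for the start of quoted history; cut everything from there
  history_pat <- "^(-+ ?Original Message ?-+|On .* wrote:|From:\\s)"
  cut <- grep(history_pat, lines, ignore.case = TRUE)
  if (length(cut) > 0) lines <- lines[seq_len(cut[1] - 1)]

  # Drop lines quoted with a leading ">"
  lines <- lines[!grepl("^\\s*>", lines)]

  # Assumed disclaimer wording; cut from the first line that contains it
  disc <- grep("if you are not the intended recipient", lines,
               ignore.case = TRUE)
  if (length(disc) > 0) lines <- lines[seq_len(disc[1] - 1)]

  # Trim whitespace and drop empty lines
  lines <- trimws(lines)
  paste(lines[nzchar(lines)], collapse = "\n")
}

# Example call on a made-up email body
msg <- paste("Hi team,", "The bus was late again.", "",
             "If you are not the intended recipient, please delete.",
             "-----Original Message-----", "From: someone", "> old text",
             sep = "\n")
cat(clean_email(msg), "\n")
```

Cutting at the first marker is deliberately crude: it loses anything written
below the quoted history, so it only suits mail where the new text comes first.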

I know it's quite a complex problem that cannot be solved by a single
answer, so some tips would also be very welcome, I will ...... 


One example of the data: 



#########################################
Fyna.       

<g-ccdfa at adfae.com>    
 24/06/2012 09:15 AM              
To        <g-ccdfa at adfae.com>          
cc         <g-ccdfa at adfae.com>          
Subject         ase Mewrr asdffID:dde_20120624_15988015_11653024 *  (keep
this part)*


CUSTOMER DETAILS Name      : Mr dffa  
Company     :  da
Address     :  ff
Home No.     :  
Office No.     : 
Payphone Ext     :  
Mobile No.     :  
Fax No.     :  
Email     :  
CASE DETAILS Division     : * dsaf (RIM) (keep this part)*
Category 1     : * dsaf (RIM) (keep this part)*
Category 2     : * dsaf (RIM) (keep this part)*
Category 3     :   
Veh Reg Num     :   
COMMENTS  24/06/2012 09:15:23 AM (Name) -  Location @Ddaferdsdaf Rd   



Caller feedback Content.. ("*This part I need to keep*")


NFORMANT STATES 
Date & Time : 24/06/2012 09:15:31 AM  
CSO ID : dasf  


https://MSCCasdfEB/LsdfA/Madsf.htm?pardsnDc?0pAsdoE9.=cS0eiIcp9m
############################################################
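
To make the target concrete, the extraction wanted from a record like the one
above might look roughly like this. It is a minimal sketch in base R under
several assumptions: the function name extract_record is made up, each record
is held in one character string, the field labels appear exactly as in the
example, the "(keep this part)" notes are only annotations and not in the real
data, and the caller feedback sits between the line starting with "COMMENTS"
and the line containing "NFORMANT STATES".

```r
# Hypothetical helper: pull the fields to keep out of one phone call record.
extract_record <- function(record) {
  lines <- trimws(strsplit(record, "\r?\n")[[1]])

  # Text left on the first line matching `pattern` once the match is removed
  value_after <- function(pattern) {
    hit <- grep(pattern, lines, value = TRUE)
    if (length(hit) == 0) return(NA_character_)
    trimws(sub(pattern, "", hit[1]))
  }

  # Feedback text: non-empty lines between the COMMENTS line and the
  # "NFORMANT STATES" line (assumed delimiters, taken from the example)
  start <- grep("^COMMENTS", lines)[1]
  end <- grep("NFORMANT STATES", lines)[1]
  feedback <- NA_character_
  if (!is.na(start) && !is.na(end) && end > start + 1) {
    body <- lines[(start + 1):(end - 1)]
    feedback <- paste(body[nzchar(body)], collapse = " ")
  }

  data.frame(
    subject   = value_after("^Subject\\s+"),
    division  = value_after("^.*Division\\s*:\\s*"),
    category1 = value_after("^Category 1\\s*:\\s*"),
    category2 = value_after("^Category 2\\s*:\\s*"),
    feedback  = feedback,
    stringsAsFactors = FALSE
  )
}

# Example call on a shortened, made-up record
rec <- paste("Subject    ase Mewrr asdffID:dde_20120624_15988015_11653024",
             "CASE DETAILS Division     : dsaf (RIM)",
             "Category 1     : dsaf (RIM)",
             "Category 2     : dsaf (RIM)",
             "COMMENTS  24/06/2012 09:15:23 AM (Name) -  Location",
             "", "Caller feedback Content..", "",
             "NFORMANT STATES", sep = "\n")
print(extract_record(rec))
```

Applied over a list of records with lapply() and combined with
do.call(rbind, ...), this would give one row per record.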

--
View this message in context: http://r.789695.n4.nabble.com/clean-Email-format-data-tp4634491.html
Sent from the R help mailing list archive at Nabble.com.


