[R] Urgent - R help - Multivariate - Naive Bayes code for R
Athmakuru Prasad
@thm@kuru @ending from gm@il@com
Fri May 25 06:08:16 CEST 2018
Friends,
I am classifying URLs according to whether the page contains executive
information, based on certain keywords. I have already gone through 50K
URLs, identified the keywords, and coded each one as 0 or 1 (0 = the URL
does not have the keyword, 1 = it has the keyword). The target is coded the
same way: 0 = does not contain executive information, 1 = contains
executive information.
A sample set of data is shown below.
DomainID Domain LinkID Raw_Link Cleansed_Link Biz_name Address1 City State PostalCode Address_Page_flag Executive_page_flag collections other_keywords conditions policy story history brand login job career who company people staff Board management team terms privacy shop gallery News location site Sitemap page Content Event blog categories Services Index Product Reviews Testimonials about contact Home_page Link_Len LinkCount ExecWordCount ExecWordRatio Category
250842730 www.aaronwomenscenterhouston.com 250842730-1 http://www.aaronwomenscenterhouston.com aaronwomenscenterhouston.com AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 28 9 1 0.48% Clinic
250842730 www.aaronwomenscenterhouston.com 250842730-2 http://www.aaronwomenscenterhouston.com/surgical-termination aaronwomenscenterhouston.com/surgical-termination AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 49 9 1 0.65% Clinic
250842730 www.aaronwomenscenterhouston.com 250842730-3 http://www.aaronwomenscenterhouston.com/non-surgical-termination aaronwomenscenterhouston.com/non-surgical-termination AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 53 9 1 0.79% Clinic
250842730 www.aaronwomenscenterhouston.com 250842730-4 http://www.aaronwomenscenterhouston.com/birth-control aaronwomenscenterhouston.com/birth-control AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 9 1 0.59% Clinic
250842730 www.aaronwomenscenterhouston.com 250842730-5 http://www.aaronwomenscenterhouston.com/late-term-termination aaronwomenscenterhouston.com/late-term-termination AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 9 2 0.71% Clinic
250842730 www.aaronwomenscenterhouston.com 250842730-6 http://www.aaronwomenscenterhouston.com/patient-forms aaronwomenscenterhouston.com/patient-forms AARON WOMEN’S CENTER 2505 North Shepherd Dr Houston TX 77008 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 9 1 0.78% Clinic
I understand that I need to use a multivariate Bernoulli naive Bayes
classifier to separate the URLs, but I am struggling to find appropriate R
code for it.
Any help in providing R code for this would be greatly appreciated.
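
For reference, a minimal base R sketch of the kind of multivariate Bernoulli
naive Bayes classifier described above might look like the following. It is
only an illustration under stated assumptions: the function names
train_bernoulli_nb and predict_bernoulli_nb, the smoothing parameter alpha,
and the four-row toy matrix are hypothetical and not taken from the real
data; only the column names management, team and shop come from the sample
header.

```r
# Illustrative sketch of multivariate Bernoulli naive Bayes in base R.
# X: 0/1 matrix of keyword indicators (one row per URL).
# y: 0/1 vector, e.g. an executive-page flag.
# alpha: Laplace smoothing constant (an assumption, default 1).
train_bernoulli_nb <- function(X, y, alpha = 1) {
  X <- as.matrix(X)
  classes <- sort(unique(y))
  # Class priors P(y = c)
  prior <- vapply(classes, function(cl) mean(y == cl), numeric(1))
  # Smoothed P(x_j = 1 | y = c): one row per class, one column per keyword
  theta <- do.call(rbind, lapply(classes, function(cl) {
    Xc <- X[y == cl, , drop = FALSE]
    (colSums(Xc) + alpha) / (nrow(Xc) + 2 * alpha)
  }))
  list(classes = classes, prior = prior, theta = theta)
}

predict_bernoulli_nb <- function(model, X) {
  X <- as.matrix(X)
  # log P(y = c) + sum_j [ x_j log(theta_cj) + (1 - x_j) log(1 - theta_cj) ]
  log_post <- X %*% t(log(model$theta)) +
    (1 - X) %*% t(log(1 - model$theta))
  log_post <- sweep(log_post, 2, log(model$prior), "+")
  # Pick the class with the largest log posterior for each row
  model$classes[max.col(log_post, ties.method = "first")]
}

# Toy data (made up for illustration only)
X <- rbind(c(1, 1, 0),
           c(1, 0, 0),
           c(0, 0, 1),
           c(0, 1, 1))
colnames(X) <- c("management", "team", "shop")
y <- c(1, 1, 0, 0)

model <- train_bernoulli_nb(X, y)
predict_bernoulli_nb(model, rbind(c(1, 0, 0), c(0, 0, 1)))
```

The Bernoulli form uses the absence of a keyword as evidence as well as its
presence, which is why the (1 - X) term appears in the log posterior. On the
real data, X would presumably be the block of 0/1 keyword columns and y the
Executive_page_flag column. The e1071 package also provides a naiveBayes
function, which fits a categorical model when the 0/1 columns are converted
to factors.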
Cheers
ALN