EXPLANATORY VARIABLES
---------------------

| 48 continuous real [0,100] attributes of type word_freq_WORD = percentage
| of words in the e-mail that match WORD, i.e. 100 * (number of times the
| WORD appears in the e-mail) / total number of words in e-mail. A "word"
| in this case is any string of alphanumeric characters bounded by
| non-alphanumeric characters or end-of-string.

01 word_freq_make: continuous.
02 word_freq_address: continuous.
03 word_freq_all: continuous.
04 word_freq_3d: continuous.
05 word_freq_our: continuous.
06 word_freq_over: continuous.
07 word_freq_remove: continuous.
08 word_freq_internet: continuous.
09 word_freq_order: continuous.
10 word_freq_mail: continuous.
11 word_freq_receive: continuous.
12 word_freq_will: continuous.
13 word_freq_people: continuous.
14 word_freq_report: continuous.
15 word_freq_addresses: continuous.
16 word_freq_free: continuous.
17 word_freq_business: continuous.
18 word_freq_email: continuous.
19 word_freq_you: continuous.
20 word_freq_credit: continuous.
21 word_freq_your: continuous.
22 word_freq_font: continuous.
23 word_freq_000: continuous.
24 word_freq_money: continuous.
25 word_freq_hp: continuous.
26 word_freq_hpl: continuous.
27 word_freq_george: continuous.
28 word_freq_650: continuous.
29 word_freq_lab: continuous.
30 word_freq_labs: continuous.
31 word_freq_telnet: continuous.
32 word_freq_857: continuous.
33 word_freq_data: continuous.
34 word_freq_415: continuous.
35 word_freq_85: continuous.
36 word_freq_technology: continuous.
37 word_freq_1999: continuous.
38 word_freq_parts: continuous.
39 word_freq_pm: continuous.
40 word_freq_direct: continuous.
41 word_freq_cs: continuous.
42 word_freq_meeting: continuous.
43 word_freq_original: continuous.
44 word_freq_project: continuous.
45 word_freq_re: continuous.
46 word_freq_edu: continuous.
47 word_freq_table: continuous.
48 word_freq_conference: continuous.
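
The word_freq_WORD definition can be sketched in Python as follows. This is a minimal illustration, not the original extraction code: the function name `word_freq` is hypothetical, and two behaviours are assumptions because the description does not state them (matching ignores case, and an e-mail with no words scores 0).

```python
import re


def word_freq(text: str, word: str) -> float:
    """Percentage (0..100) of words in `text` that match `word`."""
    # A "word" is a maximal run of alphanumeric characters, i.e. a string
    # bounded by non-alphanumeric characters or end-of-string.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    if not tokens:
        # Assumption: an e-mail containing no words yields 0.
        return 0.0
    # Assumption: case-insensitive matching (not specified in the source).
    target = word.lower()
    hits = sum(1 for token in tokens if token.lower() == target)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # Four words, two of which match "free".
    print(word_freq("Free money, free offer!", "free"))
```

Numeric attributes such as word_freq_000 or word_freq_650 fall out of the same tokenisation, since digits count as alphanumeric characters.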

| 6 continuous real [0,100] attributes of type char_freq_CHAR = percentage
| of characters in the e-mail that match CHAR, i.e. 100 * (number of CHAR
| occurrences) / total characters in e-mail

49 char_freq_;: continuous.
50 char_freq_(: continuous.
51 char_freq_[: continuous.
52 char_freq_!: continuous.
53 char_freq_$: continuous.
54 char_freq_#: continuous.

| 1 continuous real [1,...] attribute of type capital_run_length_average
| = average length of uninterrupted sequences of capital letters

55 capital_run_length_average: continuous.

| 1 continuous integer [1,...] attribute of type capital_run_length_longest
| = length of longest uninterrupted sequence of capital letters

56 capital_run_length_longest: continuous.

| 1 continuous integer [1,...] attribute of type capital_run_length_total =
| sum of length of uninterrupted sequences of capital letters = total
| number of capital letters in the e-mail

57 capital_run_length_total: continuous.

RESPONSE VARIABLE
-----------------

| 1 nominal {0,1} class attribute of type spam = denotes whether the e-mail
| was considered spam (1), i.e. unsolicited commercial e-mail, or not (0).
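
The char_freq_CHAR and capital_run_length_* definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original extraction code: the function names are hypothetical, "capital letters" is taken to mean ASCII A-Z, and the handling of empty input or of e-mails without capitals is a guess (the stated range [1,...] suggests the original treated that case differently, but the convention is not documented here).

```python
import re
from typing import Tuple


def char_freq(text: str, char: str) -> float:
    """Percentage (0..100) of characters in `text` equal to `char`."""
    if not text:
        # Assumption: an empty e-mail yields 0.
        return 0.0
    return 100.0 * text.count(char) / len(text)


def capital_run_lengths(text: str) -> Tuple[float, int, int]:
    """Return (average, longest, total) length of capital-letter runs."""
    # Assumption: only ASCII capitals A-Z form a run.
    runs = [len(run) for run in re.findall(r"[A-Z]+", text)]
    if not runs:
        # Assumption: no capitals gives zeros; the documented minimum of 1
        # implies the original data used some other convention.
        return 0.0, 0, 0
    total = sum(runs)  # equals the total number of capital letters
    return total / len(runs), max(runs), total


if __name__ == "__main__":
    print(char_freq("Hi!!", "!"))
    print(capital_run_lengths("BUY now, FREE"))
```

Because every capital letter belongs to exactly one run, the sum of run lengths equals the count of capital letters, which is why capital_run_length_total is described both ways.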