E-mail Spam Filtering by A New Hybrid Feature Selection Method Using Chi2 as Filter and Random Tree as Wrapper
Keywords: Feature Extraction, Feature Selection, Classification, Spam Filtering, Machine Learning.
The purpose of this research is to present a machine learning approach that improves the accuracy of automatic spam detection and filtering, separating spam from legitimate messages. To reduce the error rate and increase efficiency, a hybrid feature selection architecture is used. The features used in this system are extracted from the body text of the messages. The proposed system combines two feature selection models, a filter and a wrapper, using the Chi Squared (Chi2) filter and the Random Tree wrapper as feature selectors. In addition, Multinomial Naïve Bayes (MNB), Discriminative Multinomial Naïve Bayes (DMNB), Support Vector Machine (SVM) and Random Forest classifiers are used for classification. Finally, the results of these classifiers and feature selection methods are examined, the best design is selected, and it is compared with other similar works with respect to several evaluation parameters. The best accuracy achieved by the proposed system is 99%.
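The filter-then-wrapper idea can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The chi-squared filter scores each term's presence against the spam/ham label with the usual 2×2 contingency-table statistic and keeps the top `top_k` terms; the wrapper then runs greedy forward selection over those candidates, scoring each subset by accuracy on a hold-out set. Assumptions, all hypothetical: the paper's wrapper evaluates subsets with a Random Tree classifier, whereas this sketch swaps in a Bernoulli Naive Bayes evaluator to stay self-contained; the forward-search strategy, the hold-out split, the function names (`chi2_term_scores`, `train_bnb`, `hybrid_select`) and the toy messages are illustrative only.

```python
import math
from collections import Counter


def chi2_term_scores(docs, labels):
    """Chi-squared score of each term's presence vs. the class (1 = spam, 0 = ham)."""
    n, n_spam = len(docs), sum(labels)
    present, present_spam = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            present[t] += 1
            present_spam[t] += y
    scores = {}
    for t, n_t in present.items():
        a = present_spam[t]      # term present, spam
        c = n_t - a              # term present, ham
        b = n_spam - a           # term absent, spam
        d = (n - n_spam) - c     # term absent, ham
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def train_bnb(docs, labels, vocab):
    """Bernoulli Naive Bayes on term presence with add-one smoothing.
    Hypothetical stand-in for the paper's Random Tree wrapper evaluator."""
    model = {}
    for y in (0, 1):
        class_docs = [set(d) for d, lab in zip(docs, labels) if lab == y]
        n_y = len(class_docs)
        log_prior = math.log((n_y + 1) / (len(docs) + 2))
        p = {t: (sum(t in d for d in class_docs) + 1) / (n_y + 2) for t in vocab}
        model[y] = (log_prior, p)
    return model


def predict_bnb(model, tokens):
    present = set(tokens)
    best, best_score = None, -math.inf
    for y, (log_prior, p) in model.items():
        score = log_prior + sum(math.log(p[t] if t in present else 1 - p[t]) for t in p)
        if score > best_score:
            best, best_score = y, score
    return best


def accuracy(model, docs, labels):
    return sum(predict_bnb(model, d) == y for d, y in zip(docs, labels)) / len(labels)


def hybrid_select(train, val, top_k=50, max_features=10):
    (tr_docs, tr_y), (va_docs, va_y) = train, val
    # Filter stage: rank terms by chi-squared score and keep the top_k
    scores = chi2_term_scores(tr_docs, tr_y)
    candidates = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    # Wrapper stage: greedy forward selection scored by hold-out accuracy
    selected = []
    best_acc = accuracy(train_bnb(tr_docs, tr_y, set()), va_docs, va_y)
    while len(selected) < max_features:
        best_term = None
        for t in candidates:
            if t in selected:
                continue
            model = train_bnb(tr_docs, tr_y, set(selected) | {t})
            acc = accuracy(model, va_docs, va_y)
            if acc > best_acc:
                best_term, best_acc = t, acc
        if best_term is None:  # no remaining candidate improves accuracy: stop
            break
        selected.append(best_term)
    return selected, best_acc


# Toy usage (hypothetical messages, not the paper's dataset)
train_docs = [["win", "money", "now"], ["win", "prize", "now"], ["free", "money"],
              ["free", "prize", "win"], ["meeting", "tomorrow"],
              ["project", "meeting", "notes"], ["lunch", "tomorrow"],
              ["project", "notes", "now"]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
val_docs = [["win", "free"], ["money", "prize"], ["meeting", "notes"], ["lunch", "project"]]
val_y = [1, 1, 0, 0]
selected, acc = hybrid_select((train_docs, train_y), (val_docs, val_y), top_k=5)
print(selected, acc)  # ['win', 'money'] 1.0
```

The cheap filter stage shrinks the candidate pool before the wrapper, which must retrain its evaluator once per candidate per step; this cost trade-off is the usual motivation for hybrid filter-wrapper designs.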
Authors who publish with Engineering Journal agree to transfer all copyright rights in and to the above work to the Engineering Journal (EJ)'s Editorial Board so that EJ's Editorial Board shall have the right to publish the work for nonprofit use in any media or form. In return, authors retain: (1) all proprietary rights other than copyright; (2) re-use of all or part of the above paper in their other work; (3) right to reproduce or authorize others to reproduce the above paper for authors' personal use or for company use if the source and EJ's copyright notice is indicated, and if the reproduction is not made for the purpose of sale.