Supervised Learning for Binary Classification on US Adult Income
Abstract
In this project, we apply several binary classification methods to predict US adult income level from social factors including age, gender, education, and marital status. We first explore descriptive statistics for the dataset and handle missing values. We then examine widely used classification methods, including logistic regression, discriminant analysis, support vector machines, random forests, and boosting, and provide suitable R functions to demonstrate their application. Metrics such as ROC curves, accuracy, recall, and F-measure are calculated to compare the performance of these models. We find that boosting is the best method in our data analysis because it attains the highest AUC value and the highest prediction accuracy. In addition, among all predictor variables, we identify three that have the largest impact on US adult income level.
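As a rough illustration of the comparison metrics named in the abstract, the sketch below computes accuracy, recall, precision, F-measure, and AUC for a single binary classifier. It is not the paper's code: the paper works in R, while this sketch uses plain Python, and the labels, scores, the 0.5 threshold, and all function names are hypothetical choices made only for the example.

```python
# Minimal sketch of the evaluation metrics named in the abstract.
# All data and names below are hypothetical, not taken from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and F-measure (F1) from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

def auc(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = high income, 0 = low income.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

Accuracy, recall, and F-measure depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are typically reported together when comparing models.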
Copyright (c) 2021 Li-Pang Chen
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright of their work, with first publication rights granted to Tech Reviews Ltd.