Credit Card Fraud

Description

The Credit Card Fraud Detection dataset contains transactions made with credit cards by European cardholders in September 2013. It covers transactions that occurred over two days, of which 492 out of 284,807 are fraudulent. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.
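The project itself is implemented in R. As a language-neutral illustration, the short Python snippet below works through the arithmetic behind "highly unbalanced" using only the counts quoted above. The variable names are illustrative.

```python
# Class balance of the dataset, using the counts quoted in the description
n_total = 284_807          # all transactions
n_fraud = 492              # positive class (fraud)
n_legit = n_total - n_fraud

# Share of fraudulent transactions
fraud_rate = n_fraud / n_total
print(f"fraud rate: {fraud_rate:.4%}")

# Accuracy of a trivial "model" that labels every transaction as legitimate
baseline_accuracy = n_legit / n_total
print(f"all-legitimate baseline accuracy: {baseline_accuracy:.4%}")

# How many legitimate transactions there are for each fraud
print(f"legitimate transactions per fraud: {n_legit / n_fraud:.0f}")
```

The exact fraud rate is about 0.1727%, which the description truncates to 0.172%. A model that never predicts fraud is already about 99.83% accurate while catching no fraud at all, which is why plain accuracy is a weak yardstick for this dataset.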

The dataset was collected and analyzed during a research collaboration between Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

Implementation Steps in R

Explore the dataset

Perform exploratory data analysis (EDA) using univariate, bivariate and multivariate analysis

Visualize and interpret feature plots and correlation plots

Create pairwise plots for each attribute

Create density plots for each attribute

Learn to handle imbalanced data using oversampling, undersampling and mixed sampling

Learn to remove redundant features

Rank features by importance using a Learning Vector Quantization (LVQ) model

Select features using Recursive Feature Elimination (RFE)

Learn to preprocess the data using Linear Discriminant Analysis (LDA)

Apply linear algorithms such as logistic regression

Apply non-linear algorithms such as Support Vector Machine (SVM), k-Nearest Neighbours (KNN) and Naive Bayes

Apply non-linear algorithms such as Classification and Regression Trees (CART)

Apply ensemble algorithms such as Random Forest, bagged CART and Gradient Boosting

Fit a regularized (lasso/elastic-net) generalized linear model with GLMNet

Apply a neural network model

Compare the results of different models

Select the best model

Visualize the results using box-and-whisker plots
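The resampling step can be illustrated with a small plain-Python sketch of random oversampling, undersampling and mixed sampling. This is not the project's R code: the helper name `resample` and the rule used for mixed sampling (both classes end up with the mean of the two class sizes) are assumptions chosen for clarity. Whatever the tool, resampling should be applied to the training data only, so that the test set keeps the real class distribution.

```python
import random
from collections import Counter

def resample(rows, labels, method, seed=0):
    """Randomly rebalance a two-class dataset (hypothetical helper, not the project's code).

    method="over":  duplicate random minority rows until both classes have the majority's size
    method="under": keep a random subset of majority rows the size of the minority class
    method="both":  mixed sampling; both classes end up with the mean of the two class sizes
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    # Order the two classes by size: smaller (minority) first
    (min_label, min_rows), (maj_label, maj_rows) = sorted(
        by_class.items(), key=lambda item: len(item[1]))

    targets = {
        "over": len(maj_rows),
        "under": len(min_rows),
        "both": (len(min_rows) + len(maj_rows)) // 2,
    }
    if method not in targets:
        raise ValueError(f"unknown method: {method!r}")
    n = targets[method]

    def to_size(pool):
        if n <= len(pool):
            return rng.sample(pool, n)  # shrink: draw without replacement
        # grow: keep every row, then add random duplicates (with replacement)
        return pool + rng.choices(pool, k=n - len(pool))

    new_rows, new_labels = [], []
    for label, pool in ((min_label, min_rows), (maj_label, maj_rows)):
        new_rows.extend(to_size(pool))
        new_labels.extend([label] * n)
    return new_rows, new_labels

# Toy example: 5 frauds (label 1) among 100 transactions
rows = [[i] for i in range(100)]
labels = [1 if i < 5 else 0 for i in range(100)]
for method in ("over", "under", "both"):
    _, y = resample(rows, labels, method)
    print(method, dict(Counter(y)))
```

On the toy data this yields 95/95, 5/5 and 50/50 rows per class respectively. Oversampling keeps all information but repeats minority rows (risking overfitting to them), while undersampling is fast but discards most legitimate transactions; mixed sampling trades off between the two.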
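One common way to remove redundant features is to drop one member of every highly correlated pair; caret's `findCorrelation()` in R works along these lines, removing the variable with the larger mean absolute correlation. The stdlib Python sketch below is a simplified stand-in, not the project's code: the function names, the 0.9 cutoff and the toy feature names are assumptions, and constant columns are assumed absent (they would make the correlation undefined).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, cutoff=0.9):
    """Greedily remove highly correlated features; returns the names to keep.

    While some pair has |r| > cutoff, drop whichever member of the most correlated
    pair has the larger mean |r| with the other remaining features.
    """
    names = list(features)
    abs_r = {}
    for a, b in combinations(names, 2):
        abs_r[a, b] = abs_r[b, a] = abs(pearson(features[a], features[b]))
    keep = set(names)

    def mean_abs_r(v):
        others = [abs_r[v, w] for w in keep if w != v]
        return sum(others) / len(others)

    while True:
        high = [(abs_r[a, b], a, b) for a, b in combinations(sorted(keep), 2)
                if abs_r[a, b] > cutoff]
        if not high:
            return [name for name in names if name in keep]
        _, a, b = max(high)  # the most correlated remaining pair
        keep.discard(a if mean_abs_r(a) >= mean_abs_r(b) else b)

features = {
    "amount":    [10, 20, 30, 40, 50],
    "amount_x2": [21, 39, 62, 80, 99],  # nearly a rescaled copy of "amount"
    "hour":      [3, 1, 4, 1, 5],
}
print(drop_redundant(features))
```

Here `amount_x2` is dropped because it is almost perfectly correlated with `amount` and slightly more correlated with `hour` than `amount` is, so the call keeps `["amount", "hour"]`.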

Code for this project: https://github.com/michaelkaya/

Graphic Reference:

https://www.eastwestbank.com/ReachFurther/en/News/Article/Credit-Card-Fraud-The-Three-Words-You-Never-Want-to-Hear