Fraud Detection


Fraud Detection Using Machine Learning deploys a machine learning (ML) model and an example dataset of credit card transactions to train the model to recognize fraud patterns.

  1. Gradient Boosting Machine for credit card fraud detection
    1. Input: a credit card dataset with 30,000 instances, taken from the UCI ML Repository.
    2. The hyperparameters are chosen: number of iterations, loss function, weak learner, and sampling ratio.
    3. The LightGBM model is optimized with the following steps:
      1. The number of estimators (boosted trees) influences the performance of the LGBM. Models with varying numbers of trees are constructed and evaluated to find the optimal number n_opt.
      2. On small and medium-sized datasets, overfitting is the most common problem; therefore, the maximum depth D_max of the trees should be limited.
      3. Set the number of tree leaves N_leaves = 2^D_max to get the same number of leaves as a depth-wise tree. An appropriate value of this parameter moderates the complexity of the LGBM tree: because unconstrained depth can induce overfitting, N_leaves should in practice be kept smaller than 2^D_max.
      4. Build multiple LGBM models with varying D_max and N_leaves parameters using 10-fold cross-validation.
      5. Validate the model on dynamic credit transaction input and predict whether the transaction is fraudulent or legitimate.
      6. The performance of the model is evaluated using the precision, recall, and accuracy metrics.
  2. Long Short-Term Memory (LSTM)
    1. The LSTM is a special type of artificial Recurrent Neural Network (RNN) architecture used to model time-series information in deep learning.
    2. An LSTM unit consists of a memory cell that stores information, updated by three special gates: the input gate, the forget gate, and the output gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
    3. A pattern-recognition LSTM network with 9 input neurons is used, since each input feature in the dataset is represented by its own input neuron. The feature ‘Fraud status’ is used as the output neuron. One hidden layer with 15 neurons was used to analyze the structure of the network.
    4. The implementation steps of the model are detailed below:
      1. Reshape the dataset into a three-dimensional tensor (samples, number of timesteps, number of features).
      2. Define the learning parameters (memory size, learning rate, batch size, and epochs).
      3. Define the LSTM cell.
      4. Set tensor variables for the weight and bias vectors.
      5. Divide the dataset into training, validation, and testing sets.
      6. Compute the output using the softmax activation function.
      7. Define the cross-entropy loss function.
      8. Add the Adam optimizer to minimize the cross-entropy loss.
      9. Repeat:
        -> Compute the training error.
        -> Compute the validation error.
        -> Update the weights and biases using backpropagation.
      10. Predict on the testing dataset using the trained LSTM.
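The gated cell update and the softmax/cross-entropy output described above can be sketched in NumPy. The sizes (9 input features, 15 hidden units, binary output) follow the network in the text; the weights and sample data are random placeholders, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 9, 15, 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM update: the input, forget, and output gates plus a
    candidate value regulate what the memory cell stores and exposes."""
    z = W @ x + U @ h + b                    # all four gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)           # forget old info, admit new
    h_new = o * np.tanh(c_new)               # filtered view of the cell
    return h_new, c_new

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Dataset reshaped into (samples, timesteps, features); one sample here.
timesteps = 5
sample = rng.normal(size=(timesteps, n_features))

# Weight and bias tensors (would be trained with Adam + backprop).
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_features))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, n_hidden))

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in sample:                             # run the sequence through the cell
    h, c = lstm_step(x, h, c, W, U, b)

probs = softmax(W_out @ h)                   # legitimate vs. fraud probabilities
loss = -np.log(probs[1])                     # cross-entropy if true class is 1
print("class probabilities:", probs, "loss:", loss)
```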
  3. Autoencoder
    1. An autoencoder is a feed-forward multilayer neural network that reproduces the input data on the output layer.
    2. Use oversampling to transform the imbalanced dataset into a balanced one. Then use a denoising autoencoder to obtain a denoised dataset. Finally, use a deep fully connected neural network for the final classification.
    3. Oversampling is a technique for dealing with an imbalanced dataset: it creates additional samples of a specific class so that the class distribution of the original dataset becomes balanced.
    4. A denoising autoencoder is a variation of the traditional autoencoder that learns to remove noise and reconstruct the undisturbed input as faithfully as possible.
    5. Entropy is a measure of information content and can be defined as the unpredictability of an event. Cross-entropy can be used in multi-class classification problems in combination with softmax, giving better training performance on neural networks.
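The pipeline above can be sketched as follows. This is a minimal illustration, assuming random duplication of minority-class rows for oversampling and scikit-learn MLPs as stand-ins for the denoising autoencoder and the final fully connected classifier; the data and layer sizes are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# 1) Oversample: duplicate minority-class rows until the classes balance.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=(y == 0).sum() - minority.size)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# 2) Denoising autoencoder: learn to reconstruct clean rows from
#    noise-corrupted copies, then denoise the balanced dataset.
noisy = X_bal + rng.normal(scale=0.1, size=X_bal.shape)
dae = MLPRegressor(hidden_layer_sizes=(6,), max_iter=500,
                   random_state=0).fit(noisy, X_bal)
X_denoised = dae.predict(X_bal + rng.normal(scale=0.1, size=X_bal.shape))

# 3) Final fully connected classifier on the denoised data.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=0).fit(X_denoised, y_bal)
print("train accuracy:", accuracy_score(y_bal, clf.predict(X_denoised)))
```

Random duplication is the simplest oversampling scheme; synthetic methods such as SMOTE would slot into the same step.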
  4. Contrastive learning
    1. Contrastive learning is a self-supervised learning approach that focuses on learning useful representations by distinguishing between similar and dissimilar data points.
    2. By maximizing the similarity between positive pairs (augmented views of the same data point) and minimizing the similarity between negative pairs (augmented views of different data points), contrastive learning helps models understand the inherent structure of the data.
    3. The implementation steps are detailed below:
      1. Collect and preprocess the dataset, performing data augmentation to create positive and negative pairs.
      2. Define a suitable neural network architecture as an encoder that maps input data to a lower-dimensional latent space.
      3. Choose a contrastive loss function, such as InfoNCE, to train the model.
      4. Initialize the encoder network parameters; for each epoch, apply augmentations to the images, compute their latent representations, and calculate the contrastive loss.
      5. Backpropagate the loss and update the network parameters using an optimizer.
      6. After training, evaluate the encoder’s performance on downstream tasks, and fine-tune if necessary.
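The InfoNCE loss at the heart of this loop can be sketched in NumPy: each row of z1 is one augmented view, the matching row of z2 is its positive pair, and every other row in the batch acts as a negative. The batch size, latent dimension, and temperature are illustrative, and the encoder itself is omitted.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE: cross-entropy over cosine similarities, where the
    correct 'class' for row i of z1 is row i of z2."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature           # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()            # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                     # latent codes from an encoder

# Nearly identical views (a good encoder on positive pairs) should score
# a lower loss than unrelated codes (mismatched pairs).
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce(z, rng.normal(size=z.shape))
print("aligned-pairs loss:", aligned, " mismatched-pairs loss:", mismatched)
```

Minimizing this loss pulls positive pairs together and pushes negatives apart in the latent space, which is exactly the objective of step 4 above.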