Using machine learning to solve data imbalance in AML L1 alerts



June 15, 2018


Is your data well-balanced to train the machine learning model?


Data in the banking and financial services sectors has grown exponentially with the rise in money laundering and other financial crimes across the globe. Anti-money laundering (AML) data, in particular, has evolved dramatically and grown in volume due to the complexity of existing alerts as well as the generation of new types of alerts.

Understanding customer-level and transaction data is important in model development activities, which are vital to AML programs.

Based on various studies of financial crime compliance (FCC), researchers have found a growing data imbalance problem between the minority class and the majority class (the minority class being true matches or true alerts, and the majority class being false matches or false positives).

Classical or traditional models favour the majority class and usually show inferior performance on the minority class. Presenting imbalanced data to a classifier will produce undesirable results, such as much lower performance on test data than on training data.
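As a minimal sketch of why favouring the majority class is a problem, the snippet below scores a degenerate model that labels every alert as a false positive. The alert counts (20 true alerts among 1,000) and the helper name `accuracy_and_recall` are assumptions made for illustration, not figures or code from the report.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (overall accuracy, recall on the positive class)."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_pos = sum(1 for t in y_true if t == positive)
    accuracy = correct / len(y_true)
    # Recall is undefined with no positives; report 0.0 in that case.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical alert set: 20 true alerts (1) among 1,000 alerts (0 = false positive).
labels = [1] * 20 + [0] * 980

# A degenerate model that always predicts the majority class.
always_negative = [0] * len(labels)

accuracy, recall = accuracy_and_recall(labels, always_negative)
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

Under these assumed counts the model looks accurate overall while detecting none of the true alerts, which is why accuracy alone is a poor yardstick on imbalanced alert data.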

However, a good AML model should perform equally well on both minority and majority classes.

Cost-sensitive learning methods address this imbalance by assigning a higher cost to misclassifying observations in the minority class. However, using a cost-sensitive learning method requires knowledge of the cost of misclassification, which is often unknown and therefore has to be assumed.
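One common way to make a classifier cost-sensitive is to move its decision threshold according to the misclassification costs. The sketch below assumes a model that already outputs a probability that an alert is a true match. The function names and the cost values are hypothetical, chosen to show how strongly the outcome depends on the assumed costs.

```python
def decision_threshold(cost_fp, cost_fn):
    """Probability above which flagging an alert minimises expected cost.

    Flagging costs (1 - p) * cost_fp in expectation; dismissing costs
    p * cost_fn. Flagging is cheaper when p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_true_alert, cost_fp, cost_fn):
    """Return 1 (escalate) or 0 (dismiss) for a predicted probability."""
    return 1 if prob_true_alert >= decision_threshold(cost_fp, cost_fn) else 0


# Assumed costs: missing a true alert is 9x as costly as a needless review.
print(decision_threshold(cost_fp=1, cost_fn=9))
# The same 20% probability is escalated under 1:9 costs but dismissed under 1:1.
print(classify(0.2, cost_fp=1, cost_fn=9), classify(0.2, cost_fp=1, cost_fn=1))
```

Because the threshold is driven entirely by the cost ratio, an assumed ratio that is badly off will shift decisions just as much as a modelling error would.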

Machine learning algorithms and data mining solutions have provided an opportunity to understand the nature of imbalanced data. Machine learning techniques attempt to resolve class imbalance problems using sampling techniques, optimisation of model structure and learning algorithms. For imbalanced datasets, applying traditional methodologies such as K-Nearest Neighbors and Naive Bayes results in inferior performance.
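To illustrate what a sampling technique does, here is a minimal sketch of random oversampling, one conventional approach that duplicates minority-class rows until the two classes are the same size. It assumes binary labels; the function name, the seed and the toy data are hypothetical, and the report itself does not specify which sampling method it uses.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes match in size.

    Assumes binary labels. Returns new lists; the inputs are not modified.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    if not minority_idx:
        raise ValueError("no minority-class rows to sample from")
    majority_count = len(labels) - len(minority_idx)
    # Number of duplicates needed to reach the majority-class count.
    extra = max(0, majority_count - len(minority_idx))
    picks = [rng.choice(minority_idx) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [minority] * extra
    return new_rows, new_labels


# Toy data: 3 true alerts and 12 false positives, one feature per row.
rows = [[float(i)] for i in range(15)]
labels = [1] * 3 + [0] * 12

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(1), balanced_labels.count(0))
```

Resampling of this kind is normally applied to the training split only, so that the test data keeps the class mix the model will see in production.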

In this paper, we focus on the current challenges faced in using traditional methods for classification with imbalanced datasets, which rely on conventional sampling techniques to balance datasets. Additionally, we discuss alternative data balancing techniques and a few machine learning classification algorithms that adapt themselves to detect minority-class data.
