Automated Data Lineage Generation for Risk Models for a Multinational Banking and Financial Services Corporation

Client : A multinational banking and financial services corporation

 

Objective

 

  • Analyze model execution scripts to understand and map data flow and lineage across various systems 
  • Identify inconsistencies such as circular referencing of feeds, sourcing from non-authorized data, white paper non-compliance, etc.

 

CRISIL's solution

 

Model Scripts

  • SQL
  • SAS

 

Automated Parser for SQL Scripts

 

Manual Parsing for SAS Scripts

 

Template to Record Data Flow

  • Version
  • Script Name
  • Source Database, Table and Column
  • Target Database, Table and Column
  • Transformation applied (For e.g., multiplication by defined rates, categorical transformations, etc.)

 

Data Lineage Tool

  • Ability to query on model and version
  • Map data control checks such as circular referencing of feeds, sourcing and white paper inconsistencies, etc.
  • Central repository for data dictionary

 

Data Flow Diagram

  • Visual flow diagram representation of how data is sourced, consumed and processed in a model
  • Directed graph representation enables quick audit and validation of the model process
  • Technology Stack:
  • Parser for SQL build using Python
  • Data flow diagram build using NetworkX library in Python

 

Client impact

 

  • Accurately map data lineage and version control across model scripts
  • Ability to map and tag data control checks
  • Enable quick data audits by visual representation
  • Automated data dictionary generation and audits

Request for services

Error Msg
Error Msg
Error Msg

Questions



Looking for high-end research and risk services? Reach out to us at:

United States
1-855-595-2100/
+1 646 292 3520

 

United Kingdom
+44 (0) 870 333 6336

India
+91 22 33 42 3000 /
+91 22 61 72 3000