Credit risk analysis models: construction and application

The Covid-19 health crisis has led to two situations:

  • An increase in the number of loan applications related to the creation of state guarantees.
  • An increase in uncertainty about the probability of default.

To mitigate this risk, banks and lenders must accelerate their transformation and adopt a reliable and fast risk scoring model. At October, we have developed our own risk scoring models. They are now available to third parties with the launch of October Connect, our neolending technology for corporate finance.

To talk about our risk analysis models, we met Tejas Sherkar, our Head of Data at October. Read his interview now.


What is a credit risk analysis model? What is the purpose of these models?

A credit risk model is basically a set of rules to quantify the risk involved in extending credit to a borrower. These rules, and the data fed to them, determine the nature, complexity and performance of the model.

Rating models mostly focus on assessing the creditworthiness of the borrower, whereas scoring models can predict both creditworthiness and potential default.

At October, we focus on building scoring models by means of predictive analytics.

How do you build a scoring model? Focus on machine learning: assessment, implementation and validation

Once we have a clear vision of the question to be solved, we can start building a model. In the case of October, the question was: how can we process loan applications in a fast, scalable and secure manner in order to help as many borrowers as possible while keeping our default risk low?

Here, we are dealing with a binary (default vs non-default) classification problem.
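To illustrate what a binary default classifier produces, here is a minimal logistic regression sketch in plain Python. The two features (a leverage ratio and a share of late payments), the toy data and the training settings are hypothetical; the interview does not say which features or algorithm October uses. The model outputs a probability of default between 0 and 1.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_pd(w, b, x):
    """Probability of default (label 1) for one borrower."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: [leverage ratio, share of late payments]; 1 = defaulted
X = [[0.2, 0.0], [0.5, 0.1], [0.3, 0.0], [1.5, 0.8], [2.0, 0.9], [1.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(predict_pd(w, b, [0.4, 0.05]), predict_pd(w, b, [1.7, 0.7]))
```

A linear model like this is easy to read directly from its weights; the trade-off is that it cannot capture interactions between features unless they are engineered explicitly.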

We start by gathering data from our data lake (a data store built in-house with enforced ACID properties). It includes the existing companies in our portfolio and their repayment behaviour, all historical loan requests and their associated financials, bank transaction data and default flags.

This is typically followed by a data cleaning step, in which we look at the distribution of all data points related to historical loan requests in order to treat outliers and missing values. The purpose of this exercise is to understand our population and build a representative dataset on which to train our model.
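The interview does not specify how outliers and missing values are treated. As one common approach, the sketch below fills missing values with the median of the observed values and clips outliers to Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR). Both choices, and the example column, are assumptions for illustration only.

```python
import statistics

def clean_column(values, k=1.5):
    """Impute missing values (None) with the median and clip outliers to Tukey fences."""
    present = [v for v in values if v is not None]
    # Quartiles of the observed values (statistics' default 'exclusive' method)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(present)
    return [median if v is None else min(max(v, lower), upper) for v in values]

# Hypothetical column, e.g. days of cash on hand, with gaps and one extreme value
values = [10, 12, None, 11, 13, 1000, 9, None, 12, 11]
cleaned = clean_column(values)
print(cleaned)
```

Clipping rather than dropping outliers keeps every loan request in the training set, which matters when defaults are rare.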

At October, we use both linear and non-linear models trained on this representative dataset. Non-linear models are often considered black boxes, but we use SHAP to make them fully explainable.
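SHAP explanations rest on Shapley values from cooperative game theory. To show the idea rather than the shap library itself, the sketch below computes exact Shapley values by enumerating every subset of features for a small hypothetical non-linear model, taking "absent" features from a single baseline borrower (for instance, a portfolio average). The model, its coefficients, the feature names and the baseline are all assumptions. Exact enumeration needs a number of model calls that grows exponentially with the number of features, which is why the shap package relies on faster algorithms for real models.

```python
import math
from itertools import combinations

def score(features):
    """Hypothetical non-linear PD model: logistic of a score with an interaction term."""
    leverage, late_ratio, revenue_growth = features
    z = -2.0 + 1.5 * leverage + 2.0 * late_ratio - 1.0 * revenue_growth + 1.0 * leverage * late_ratio
    return 1.0 / (1.0 + math.exp(-z))

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature of x relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the applicant's value, the others the baseline value
        return model([x[j] if j in subset else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

applicant = [1.2, 0.4, -0.1]  # hypothetical borrower
baseline = [0.5, 0.05, 0.1]   # hypothetical portfolio average
phi = shapley_values(score, applicant, baseline)
print(phi, sum(phi), score(applicant) - score(baseline))
```

The contributions always add up to the difference between the applicant's score and the baseline's score, which is what lets a credit analyst read a non-linear score as a sum of per-feature effects.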

What is the lifecycle of a risk scoring model?

After the model is trained and deployed in production, we monitor the data points it uses to score new loan requests over a period of time (usually 3 to 6 months).

If the statistical properties of these new data points have changed significantly compared with the data used in the last training, we will likely retrain the model and deploy an improved iteration of it in production. But this is not something to be done lightly: we need to understand what changed in the population and which biases were introduced.
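The interview does not name the statistic used to detect such changes. One common choice in credit scoring is the Population Stability Index, PSI = Σ (aᵢ − eᵢ)·ln(aᵢ / eᵢ), which compares the share of a feature's values falling in each bin at training time (eᵢ) with the share among new applications (aᵢ). The sketch below uses bins cut at the training-sample deciles; the bin count, the floor for empty bins and the often-quoted 0.25 alert level are rule-of-thumb assumptions, not October's settings.

```python
import bisect
import math

def psi(expected, actual, n_bins=10, eps=1e-4):
    """Population Stability Index between a training sample and a recent sample."""
    ref = sorted(expected)
    # Interior bin edges at the training-sample quantiles (deciles for n_bins=10)
    edges = [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]

    def proportions(sample):
        counts = [0] * n_bins
        for v in sample:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays finite
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Hypothetical feature values at training time vs. in recent loan requests
train = [float(v) for v in range(1000)]
recent = [v + 300 for v in train]
drift = psi(train, recent)
if drift > 0.25:  # rule-of-thumb threshold for a significant shift
    print(f"PSI={drift:.2f}: investigate before retraining")
```

A high PSI only flags that the population moved; as the answer above stresses, deciding whether to retrain still requires understanding why.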

We are also on the lookout for new data points, either newly engineered from existing data or from suppliers, that could improve the performance of our model.

October introduced Magpie in 2020. What is Magpie?

Magpie is an instant credit-risk scoring model that looks at the financial (balance sheet and income statement) and behavioural information of an SME and assigns a score from 1 to 5, where a higher score means a higher probability of default.
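A score "from 1 to 5 in order of increasing probability of default" can be obtained by banding a model's PD output. The sketch below shows that mapping; the cut-off values are hypothetical, since Magpie's actual boundaries are not given in the interview.

```python
import bisect

# Hypothetical PD cut-offs between consecutive grades (not Magpie's real values)
PD_CUTOFFS = [0.01, 0.03, 0.07, 0.15]

def pd_to_score(pd: float) -> int:
    """Map a probability of default to a 1-5 grade; higher grade = higher PD."""
    if not 0.0 <= pd <= 1.0:
        raise ValueError("probability of default must be in [0, 1]")
    # bisect_right counts how many cut-offs are <= pd, giving a grade from 1 to 5
    return bisect.bisect_right(PD_CUTOFFS, pd) + 1

print([pd_to_score(p) for p in (0.0, 0.02, 0.05, 0.1, 0.5)])
```

With `bisect_right`, a PD exactly on a cut-off falls into the riskier grade, which is the conservative choice for a lender.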

Kea was launched earlier this month. What is the difference between Kea and Magpie scoring?

Under the hood, Magpie and Kea are built using the same class of machine learning models. However, they differ in the type of information they analyse and the category of companies they target.

Magpie looks at the borrower's financial and behavioural data to assess the probability of default (PD).
Kea, by contrast, analyses bank transactions and behavioural data to calculate the borrower's probability of default.

What does Kea analyse in the bank transactions of the company?

Bank transactions provide a unique insight into the day-to-day operations of a company and Kea engineers many attributes to analyse the borrower’s ability and willingness to repay the prospective loan.

The attributes range from whether costs are paid regularly, to existing loan repayment schedules, to late payments and bank balance trends over time.
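To make these attributes concrete, the sketch below computes three hypothetical transaction features of the kind described: the trend of the bank balance (least-squares slope), the regularity of a recurring cost (spread of the gaps between payments) and the number of late loan repayments. The function names, definitions and grace period are illustrative assumptions, not Kea's actual feature set.

```python
import statistics
from datetime import date

def balance_trend(daily_balances):
    """Least-squares slope of the balance against the day index (EUR per day)."""
    n = len(daily_balances)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(daily_balances)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_balances))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

def payment_regularity(payment_dates):
    """Population std. dev. of the gaps (days) between consecutive payments of one cost."""
    dates = sorted(payment_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return statistics.pstdev(gaps)

def late_payments(due_dates, paid_dates, grace_days=0):
    """Number of repayments made more than `grace_days` after their due date."""
    return sum(1 for due, paid in zip(due_dates, paid_dates) if (paid - due).days > grace_days)

print(balance_trend([100, 110, 120, 130]))
print(payment_regularity([date(2021, 1, 5), date(2021, 2, 5), date(2021, 3, 5)]))
print(late_payments([date(2021, 1, 10), date(2021, 2, 10)],
                    [date(2021, 1, 10), date(2021, 2, 15)]))
```

A low regularity value means a cost (rent, payroll) is paid on a steady schedule; a negative balance trend combined with late repayments would point towards a higher probability of default.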

What was the impact of the new DSP2 (PSD2) open banking regulation on the creation of this new risk scoring model?

The DSP2 regulation enables clients to securely share their company's bank data with a lender (like October) via an API within seconds.

A model like Kea (based entirely on bank transactions) can therefore analyse this data instantly, allowing for a safer and faster credit decision.

What impact does this risk scoring model have on the credit process?

Risk scoring models like Magpie and Kea reduce the time to arrive at a credit decision and help in developing a scalable business.

They also bring a certain predictability to the whole product offering: we can let our partners and borrowers know early in the process which steps to follow and which documents to have handy.

Will there be any tasks performed manually?

While credit scoring is done automatically, we rely on the expertise of our Operations team to perform customer identification, some anti-fraud checks and due diligence before funding the borrower.

What types of companies does Kea address?

At the moment, Kea scores micro-companies in France 🇫🇷 and Italy 🇮🇹. The loan amount can be up to €30k, with or without a state guarantee.