Big Data for Credit Scoring: Opportunities and Challenges


Over the past few decades, banks have gathered plenty of information describing the default behavior of their customers. Examples include historical information about a customer's date of birth, gender, income, and employment status. All this data has been stored in large (e.g. relational) databases or data warehouses.

Credit: Pixabay

On top of this, banks have accumulated a lot of business experience with their credit products. For example, many credit experts do a good job of discriminating between low-risk and high-risk mortgages using their business expertise alone. The aim of credit scoring is to analyze both sources of data in more detail and come up with a statistically based decision model that can score future credit applications and ultimately decide which ones to accept or reject.
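To make the idea of a statistically based decision model concrete, here is a minimal sketch in Python. It assumes a logistic model, which is one common choice for scorecards but is not specified in this article. The coefficients, the feature names (income_k, years_employed, past_defaults), and the 10% cutoff are all hypothetical values chosen for illustration; in practice they would be estimated from historical default data.

```python
import math

# Hypothetical coefficients for illustration only; a real scorecard would
# estimate these from historical default data (e.g. via logistic regression).
INTERCEPT = -2.0
WEIGHTS = {
    "income_k": -0.02,        # income in thousands: higher income, lower risk
    "years_employed": -0.10,  # longer employment, lower risk
    "past_defaults": 0.90,    # each earlier default raises risk
}


def default_probability(applicant):
    """Return the estimated probability of default from a logistic model."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant, cutoff=0.10):
    """Accept the application when the estimated default risk is below the cutoff."""
    return "accept" if default_probability(applicant) < cutoff else "reject"


low_risk = {"income_k": 60, "years_employed": 8, "past_defaults": 0}
high_risk = {"income_k": 20, "years_employed": 1, "past_defaults": 2}
print(decide(low_risk), decide(high_risk))  # accept reject
```

The cutoff is a business decision: lowering it rejects more applicants but reduces expected defaults.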

The emergence of big data has created both opportunities and challenges for credit scoring. Big data is often characterized in terms of its four V's: Volume, Variety, Velocity, and Veracity. To illustrate this, let's briefly zoom in on some key sources or processes generating big data.

Traditional sources are large-scale transactional enterprise systems such as OLTP (On-Line Transaction Processing), ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) applications. Classical credit scorecards are typically constructed using data extracted from these traditional transactional systems.

The online social graph is a more recent example. Think about the major social networks such as Facebook, Twitter, LinkedIn, Weibo, and WeChat. Altogether, these networks capture information about close to two billion people relating to their friends, preferences, and other behavior, thereby leaving a massive digital trail of data. Also think about the Internet of Things (IoT), the emerging sensor-enabled ecosystem that is going to connect various objects (e.g. homes, cars, etc.) with each other and with humans. Finally, we see more and more open or public data, such as data about weather, traffic, maps, and the macroeconomy. It is clear that all these new data sources offer tremendous potential for building better credit scoring models.


Some challenges
All the above data-generating processes can be characterized in terms of the sheer volume of data that is being generated. Clearly, this poses serious challenges in terms of setting up scalable storage architectures, combined with a distributed approach to data manipulation and querying.

Big data usually comes in a great variety of formats. Traditional data types, or structured data, such as customer name and customer birth date, are increasingly complemented with unstructured data such as images, fingerprints, tweets, emails, Facebook pages, sensor data, and GPS data. Although the former can be easily stored in traditional (e.g. relational) databases, the latter needs to be accommodated using appropriate database technology that facilitates the storage, querying, and manipulation of each of these types of unstructured data. This requires substantial effort, since it is claimed that at least 80% of all data is unstructured.

Velocity refers to the speed at which the data is generated and needs to be stored and analyzed. Think about streaming applications such as on-line trading platforms, YouTube, SMS messages, credit card swipes, and phone calls, which are all examples where high velocity is a key concern.

Veracity indicates the quality or trustworthiness of the data. Unfortunately, more data does not automatically imply better data, so the quality of the data generating process must be closely monitored and guaranteed.
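As an illustration of what monitoring data quality can look like in practice, the following Python sketch counts missing values per field and flags implausible birth dates. The record layout, the field names, and the plausibility rule (born in or after 1900 and not in the future) are assumptions made for this example, not rules taken from the article.

```python
from datetime import date


def quality_report(records, required_fields):
    """Count missing values per field and list records with implausible birth dates."""
    missing = {field: 0 for field in required_fields}
    implausible = []
    for index, record in enumerate(records):
        for field in required_fields:
            if record.get(field) in (None, ""):
                missing[field] += 1
        birth = record.get("birth_date")
        # Assumed plausibility rule: born in or after 1900 and not in the future.
        if birth is not None and not (date(1900, 1, 1) <= birth <= date.today()):
            implausible.append(index)
    return missing, implausible


# Hypothetical sample records for illustration.
records = [
    {"name": "A", "birth_date": date(1985, 4, 2), "income": 42000},
    {"name": "B", "birth_date": None, "income": 38000},
    {"name": "C", "birth_date": date(1885, 1, 1), "income": ""},
]
missing, implausible = quality_report(records, ["name", "birth_date", "income"])
print(missing)      # {'name': 0, 'birth_date': 1, 'income': 1}
print(implausible)  # [2]
```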

As the volume, variety, velocity, and veracity of data continue to grow, so do the new opportunities for building better credit scoring models. Think about Facebook or Twitter as an example. It is quite obvious that knowing a credit applicant's hobbies, followers, friends, likes, education, and workplace could be very beneficial to better quantify his/her creditworthiness. In other words, a customer's social standing, on-line reputation and professional connections are likely to be related to his/her credit quality. Another useful data source concerns call detail records or CDR data which capture the mobile phone usage of an applicant. Also surfing behavior could be a nice add-on.


Opportunities
Clearly, the availability of these big data sources creates both opportunities and challenges for credit scoring. For example, the availability of social network and CDR data may be beneficial in various settings. First, it may be useful to score customers who lack borrowing experience (e.g. because it's their first loan or they recently moved to a new country) and would be automatically perceived as risky according to traditional credit scoring models, which rely on historical information. By using these alternative data sources, a better assessment of the credit risk can be made, which can then be translated into a more favorable interest rate. This obviously gives the customer an incentive to disclose his/her social network, CDR, or other relevant data to the bank. Developing countries are another example. In these countries, banks often lack historical credit information, and local credit bureaus may not be available. Hence, other data sources should be used to optimize access to credit. Given the widespread use of social networks and/or mobile phones (even in developing countries!), the data gathered might be an interesting alternative basis for credit scoring.

Obviously, using the above-mentioned data sources also comes with various challenges. The first one concerns privacy. It is important that customers are properly informed about what data is used to calculate their credit score. An opt-out option should always be provided. Furthermore, using social network data for credit scoring can trigger new fraud behavior whereby customers strategically construct their social network to artificially and maliciously brush up their credit quality. One example is that customers can easily buy Twitter followers to boost their credit scores. Finally, regulatory compliance might become an important issue. Many countries prohibit the use of gender, age, marital status, national origin, ethnicity, and beliefs for credit scoring. Much of this information can be easily scraped from social networks. Hence, it may be harder to oversee regulatory compliance when using social network or other data for credit scoring.
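One simple safeguard for the compliance concern above is to strip prohibited attributes from scraped data before it reaches the scoring model, and to log what was removed. The Python sketch below assumes a flat dictionary of features. The attribute list mirrors the examples named in the text, but the attributes actually prohibited depend on the jurisdiction, and the field names are hypothetical.

```python
# Assumed list for illustration; the attributes actually prohibited depend on
# the jurisdiction and must come from the applicable regulation.
PROHIBITED = {"gender", "age", "marital_status", "national_origin", "ethnicity", "beliefs"}


def permitted_features(features):
    """Return a copy of the feature dict without prohibited attributes."""
    return {name: value for name, value in features.items() if name not in PROHIBITED}


def dropped_features(features):
    """List the prohibited attributes that were present, for an audit log."""
    return sorted(name for name in features if name in PROHIBITED)


# Hypothetical scraped profile for illustration.
scraped = {"followers": 310, "gender": "F", "workplace": "Acme", "age": 34}
print(permitted_features(scraped))  # {'followers': 310, 'workplace': 'Acme'}
print(dropped_features(scraped))    # ['age', 'gender']
```

Note that dropping these fields does not remove proxies: permitted fields that correlate with a prohibited attribute can still leak it into the score, which is part of why oversight becomes harder.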

For more information, refer to Bart's On-Line Self-Paced E-learning course on Credit Risk Modeling.

Bart Baesens, KU Leuven

Bart Baesens is a professor at KU Leuven (Belgium), and a lecturer at the University of Southampton (United Kingdom). He has done extensive research on big data and analytics, customer relationship management, web analytics, fraud detection, and credit risk management. His findings have been published in well-known international journals (e.g. Machine Learning, Management Science, IEEE Transactions on Neural Networks, IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Evolutionary Computation, Journal of Machine Learning Research) and presented at international top conferences. He is author of the books Credit Risk Management: Basic Concepts, Analytics in a Big Data World, and Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection and teaches E-learning courses on Advanced Analytics in a Big Data World and Credit Risk Modeling. His research is summarized at www.dataminingapps.com. He also regularly tutors, advises, and provides consulting support to international firms with respect to their big data, analytics, and credit risk management strategy.



Re: Pros and Cons
  • 10/31/2016 7:40:13 PM

That'd be a good use of social media data, rbaz. I'd hope it wouldn't influence any more than 10% of the score.

Re: Pros and Cons
  • 10/18/2016 8:41:08 AM

@Lyndon. I think those people who post a constant stream of kitten photos might actually be poor credit risks. After all, if they spend that much time on Facebook they probably don't have real jobs.

Joking aside, social media activity for credit evaluations and other personal business, like insurance, would only be one of many factors in an evaluation. I can't imagine it having a great impact on the final score unless social posts reflected something like a disproportionate interest in gambling or driving 120 mph.

Re: Pros and Cons
  • 10/17/2016 10:14:37 PM


Kq4ym writes


It may become a bit sticky trying to figure out what many social media venues tell about a prospect or customer.

And in his blog article Bart Baesens writes that "customers can easily buy Twitter followers to boost their credit scores."

I had no idea that bulking up my Twitter following would help to boost my credit score. Makes me wonder how many Twitterati out there have been posting cute kitten photos and other fluff just to try to attract more followers so they could qualify for a mortgage on a bigger house ...

 

Re: Pros and Cons
  • 10/17/2016 2:34:23 PM

It may become a bit sticky trying to figure out what many social media venues tell about a prospect or customer. And as pointed out, "Many countries prohibit the use of gender, age, marital status, national origin" as identifiers, which may pose benefits and hindrances to the customers and business interests.

Re: Pros and Cons
  • 10/17/2016 11:06:31 AM

@tinym, I see social media data as an enhancer. It gives a source of validation and affirmation of a profile, not a definite characterization of the subject. A bit iffy and discomforting as far as I am concerned.

Re: Pros and Cons
  • 10/16/2016 5:22:38 PM

I see an issue using social media data in this way - How do they account for accounts that use a persona rather than represent an actual person online? Many people present a narrow version of the true self online so as not to reveal much about them in real life.

Pros and Cons
  • 10/12/2016 4:14:06 PM

@Bart

Nice job laying out the benefits and challenges, really for the lender and the consumer. For the consumer the benefit is that they aren't being judged on just one or two data points. The irony is that the additional data points could have a negative impact on them if, for example, their social media activity reflects recklessness.
