PI: Anders Løland; co-PI: Ingrid Hobæk Haff
Fraud is expensive and affects shared resources and prices, so it is important to detect and prevent. Soft fraud, the exaggeration of legitimate claims, is diffuse and difficult to spot. A sustainable welfare system and efficient insurance operations require effective measures to limit fraud. Tax avoidance and tax evasion are other important types of fraud. We are also interested in money laundering detection. We develop adaptive tools that use “all data”, including payment logs, relational networks and other available digital records, while complying with strict privacy protection regulations.
A further objective is to combine the multitude of fraud detection models in an optimal way, taking advantage of the strengths of each predictor while compensating for its weaknesses, and still obtaining coherent quantifications of the uncertainty in the fraud prediction. A related objective is the development of new individualised anti-money laundering solutions. So far, the detection of suspicious transactions has been based on labour-intensive, semi-manual approaches and restricted to customers who differ significantly from the norm. Since the volume of banking transactions is steadily increasing, automated, intelligent tools are needed. The aim is to significantly increase the number of correctly identified money laundering transactions.
Ensemble methods for fraud detection
Fraud detection can be seen as a regression/forecasting problem, where fraud (true/false) is the response, possibly together with a potential economic loss, and there are very many covariates. Once interactions are included, the number of covariates is huge. Generally, few fraud cases are investigated, and a great number of cases remain undetected. The objective is to produce a trustworthy probability of fraud for each case. Many statistical and machine learning methods already exist, and combining their predictions typically produces better results than relying on a single method. We construct a toolbox for combining fraud forecasting models, exploiting both the time series aspect of the data and the covariates, in addition to the probabilities stemming from each individual model.
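As an illustration of combining the probabilities from individual models, the sketch below averages them on the log-odds scale with optional weights. This is one simple combination rule chosen for illustration, not the toolbox's actual method; the function names, the weights and the example probabilities are all hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1 / (1 + math.exp(-z))


def combine_probabilities(probs, weights=None):
    """Weighted average of model probabilities on the log-odds scale.

    probs   : fraud probabilities for one case, one per model
    weights : non-negative model weights (assumed given, e.g. from validation)
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    z = sum(w * logit(p) for w, p in zip(weights, probs)) / total
    return sigmoid(z)


# Hypothetical outputs from three fraud models for a single case.
case_probs = [0.10, 0.30, 0.20]
combined = combine_probabilities(case_probs, weights=[1.0, 2.0, 1.0])
print(round(combined, 3))
```

Averaging on the log-odds scale keeps the combined value inside the range spanned by the individual models. In practice the weights would have to be estimated from data, and this rule alone does not quantify the uncertainty of the combined prediction.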
Text-mining for fraud detection
In addition to ordinary variables (age, demography, background, behaviour, etc.), a potential fraud case can be accompanied by a variable amount of text, for example the policy holder’s description of a claim or an officer’s summary of a case. These texts are informative for a human eye searching for fraud. The objective is to exploit recent advances in text mining to produce text-related features that can be used in the statistical models, and to investigate the added value of these.
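To make the idea of text-related features concrete, the sketch below turns short claim descriptions into TF-IDF vectors that could be appended to the ordinary covariates. TF-IDF is a standard baseline used here only as an example, not necessarily the technique the project applies; the function names and the example claim texts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens (including æ, ø, å)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def tfidf_features(documents):
    """Return (vocabulary, rows), one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocab = sorted(doc_freq)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        n = len(tokens) or 1  # avoid division by zero for empty texts
        rows.append([counts[t] / n * idf[t] for t in vocab])
    return vocab, rows


# Invented claim descriptions, for illustration only.
claims = [
    "Phone stolen from car during holiday",
    "Water damage in kitchen after pipe burst",
    "Phone lost during holiday abroad",
]
vocab, rows = tfidf_features(claims)
print(len(vocab), "text features per claim")
```

Each row can be joined to the other variables of the corresponding case, so that the added value of the text can be assessed by comparing models with and without these columns.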
Network analysis for fraud detection
Fraud can be viral, spreading directly or indirectly from one fraudster to others. Exploiting knowledge about social relations can be useful. Understanding how such networks of users look and evolve over time is expected to significantly improve fraud detection models. We build these networks and extract useful characteristics to produce better fraud forecasts and provide additional insight into how fraud spreads.
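A minimal sketch of extracting network characteristics is given below: it builds an undirected graph from pairwise relations and computes, for each node, its degree and how many of its neighbours are known fraud cases. The choice of features, the function names and the example relations are assumptions made for illustration only.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) relations."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def network_features(graph, known_fraud):
    """Per-node features: degree and exposure to known fraud cases."""
    features = {}
    for node, neighbours in graph.items():
        fraud_neighbours = len(neighbours & known_fraud)
        features[node] = {
            "degree": len(neighbours),
            "fraud_neighbours": fraud_neighbours,
            "fraud_neighbour_share": fraud_neighbours / len(neighbours),
        }
    return features


# Hypothetical relations, e.g. a shared address or a shared bank account.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
known_fraud = {"D"}

features = network_features(build_graph(edges), known_fraud)
for node in sorted(features):
    print(node, features[node])
```

Features of this kind can be used as additional covariates in a fraud model. Recomputing them on snapshots of the network at different times would be one way to capture how the network evolves.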
- Text mining seminar held, but text mining analysis postponed due to lack of data
- Prototype toolbox for fraud detection
- Initial work on misreporting of VAT, and preparation of the project on risk for future misreporting of VAT
- Prototype detection of money laundering and detection of insurance fraud
- Continuous work on research applications for access to various data
- Seminar on network analysis for fraud detection
- Methods and tools for variable selection
- Exploit network relations between individuals and businesses in statistical models
- New, local discriminant methods
- Fraud detection toolbox improved, with both ensemble and network models
- Misreporting of VAT and risk for future misreporting of VAT
- Detection of money laundering, insurance fraud and social security fraud