Based on our recent research and learnings, here are some of the most common use cases of Artificial Intelligence and Machine Learning techniques in finance. This link is a good introduction to machine learning. The article is a good read for both technical and non-technical readers.
1. FRAUD DETECTION AND PREVENTION
Fraud detection is a topic that applies to many sectors, such as banking, finance, insurance and government. With the recent rise in digitization, fraud has become a major problem for finance and banking institutions. Fraudulent transactions are very rare compared to legitimate ones, which is why detecting fraud in a massive amount of online transaction data is not something humans can do easily.
In the traditional approach, logical rules and a weighted sum of points were used to assign a fraud index to a payment. For example,
If a vendor is 6 months old, add 5 points. If 12 months old, add 2 points.
If the transaction amount is twice last year's amount, add 4 points. If it is thrice last year's amount, add 6 points.
Similar rules for size of transactions, mode of payments etc.
These points are finally summed up, and the magnitude of the total determines the risk score. Although this method works, it generates a lot of false positives for new, legitimate vendors and false negatives when an established company commits fraud once in a while.
To overcome the issues with the traditional approach, various data analysis techniques are used these days. They fall into two categories:
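A minimal Python sketch of such a rule-based score is shown here. The function name, the reading of "6 months old" as "up to 6 months", and the sample values are illustrative assumptions, not a real scoring system.

```python
def fraud_score(vendor_age_months, amount, last_year_amount):
    """Sum rule-based points; thresholds follow the example rules in the text."""
    score = 0
    # Assumption: "6 months old" means up to 6 months, "12 months old" up to 12.
    if vendor_age_months <= 6:
        score += 5
    elif vendor_age_months <= 12:
        score += 2
    # Larger jumps over last year's amount add more points.
    if last_year_amount > 0:
        ratio = amount / last_year_amount
        if ratio >= 3:
            score += 6
        elif ratio >= 2:
            score += 4
    return score

# A new vendor with a tripled amount versus a long-standing vendor with no change.
new_vendor = fraud_score(vendor_age_months=3, amount=900, last_year_amount=300)
old_vendor = fraud_score(vendor_age_months=60, amount=100, last_year_amount=100)
print(new_vendor, old_vendor)
```

The new vendor gets a high score purely because of its age and growth, which is exactly how such rules end up flagging legitimate newcomers.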
Statistical Techniques (for Data Preprocessing)
Machine Learning (for Data Evaluation)
Statistical techniques include computing user profiles and calculating various averages (e.g., time of call, delay in transaction). These produce the features that the machine learning algorithms use for data evaluation.
A supervised machine learning approach is commonly used for fraud detection. To use this approach, we must have quality data. The data must contain the features on which the final output depends. In fraud detection these can be the name of the vendor and details of the transaction such as date, time, location, bank name or source name, and so on. The output, or label, is a true or false value that indicates whether a transaction is fraudulent. Sample data is used to train the machine learning model. After training, the model can be used for future predictions.
Detecting fraud is a classification task, so we can use a classification model for it. There are plenty of machine learning models for classification. Some that show promising results are listed below:
Logistic regression
Support Vector Machines
Neural Networks
Naive Bayes
There are also different variants of the above models that can perform really well.
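To make the supervised setup concrete, here is a small sketch of logistic regression trained with plain stochastic gradient descent, using only the Python standard library. The two features (amount relative to last year, and a new-vendor flag), the toy data and the hyperparameters are all hypothetical; a real project would more likely use a library implementation and far more data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            predicted = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = predicted - y  # gradient of the log loss w.r.t. the logit
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict_fraud(weights, bias, x):
    """Return True when the predicted fraud probability is at least 0.5."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5

# Hypothetical toy data: [amount relative to last year, is_new_vendor]
features = [[1.0, 0], [1.1, 0], [0.9, 1], [1.2, 0],
            [3.5, 1], [4.0, 1], [3.8, 0], [3.2, 1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = fraud, 0 = legitimate

weights, bias = train_logistic_regression(features, labels)
print(predict_fraud(weights, bias, [1.0, 0]), predict_fraud(weights, bias, [4.0, 1]))
```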
This link was made for a fraud detection challenge on Kaggle. It employs three methods for the task: k-means, logistic regression and a deep neural network. Overall, the neural network was highly accurate but produced the most false negatives. K-means was the least accurate, with 54% accuracy. Logistic regression was the most accurate, with 99.88% accuracy on the Kaggle credit card fraud dataset.
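Because fraud cases are so rare, accuracy alone can be misleading, which is why false negatives are worth reporting alongside it. The sketch below uses made-up labels (2 frauds in 1,000 transactions, not the Kaggle data) and a hypothetical `evaluate` helper to show how a model that never flags fraud still scores very high accuracy.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and the false-negative count."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_negatives": fn,
    }

# Made-up, heavily imbalanced labels: 2 frauds among 1,000 transactions.
y_true = [1] * 2 + [0] * 998
always_legit = [0] * 1000  # a "model" that never predicts fraud

report = evaluate(y_true, always_legit)
print(report)
```

Here the accuracy is 99.8% while the recall is zero, so every fraud is missed. Precision and recall on the fraud class are usually more informative than accuracy for this kind of data.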
2. STOCK MARKET PREDICTION
With the rise of technology, everyone wants to trade smart, especially in the stock market. The stock market is regarded as one of the best investment options of the 21st century.
There are three methodologies for stock market prediction:
Fundamental analysis: This focuses on the company itself, its past performance, total revenue, profits per year etc. As this is the classical method of stock prediction, machine learning techniques are not often found in this methodology. However, if we look at 20-25 fundamental features to predict prices, it is impossible for a human to remember and interpret all of them, and here models like neural networks come to the rescue.
Technical analysis: This doesn’t focus on the company itself. It predicts future prices of a particular stock solely based on past trends. Various statistical methods are employed for this, e.g., time series analysis and the exponential moving average. This is a case of supervised learning, and KNN (K nearest neighbours) and Decision Trees are frequently used for this type of analysis. This link describes neural networks for time series forecasting.
Technological methods: The most prominent algorithms used are artificial neural networks (trained with backpropagation), together with RNNs (recurrent neural networks) and genetic algorithms. Unlike the other techniques, text mining is used extensively here, together with other numerical features, for market prediction. The most prominent text sources are news from Google Finance and Twitter. We can use NLP techniques like sentiment analysis to score the words found by text mining, learn how these scores are connected with the rise and fall of stock prices, and then predict future prices based on today’s news. This link contains a code repository showing prediction based on sentiment analysis. The efficient-market hypothesis states that stock prices already reflect all available information and adjust almost immediately to any new information, such as a decision by the company. Under this hypothesis, past price data alone cannot reliably predict the market. This is one reason why techniques like text mining and sentiment analysis, which bring in new information, are considered useful here.
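The exponential moving average mentioned under technical analysis can be sketched in a few lines of Python. The function below uses the common smoothing factor alpha = 2 / (span + 1) and seeds the average with the first price; both are conventions assumed here, and the prices are made up.

```python
def exponential_moving_average(prices, span):
    """Return the EMA series for `prices` using alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    ema = [prices[0]]  # seed the average with the first observed price
    for price in prices[1:]:
        # Blend the new price with the previous average.
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema

closing_prices = [10, 11, 12, 13]  # made-up closing prices
smoothed = exponential_moving_average(closing_prices, span=3)
print(smoothed)  # [10, 10.5, 11.25, 12.125]
```

Recent prices get more weight than older ones, so the average reacts to a trend faster than a simple moving average of the same length.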
In general, in order to predict stock prices, the data must contain all the features on which the prices depend, such as the company's current status and revenue, past trends in prices, the current economic situation of the company etc. It must also contain the prices, or labels, so that the algorithm can learn from them. Some of the common models used are:
Neural Networks
K-Nearest Neighbours Regressor
Decision Tree Regressor
AdaBoost Regressor
Bagging Regressor
Gradient Descent Regression
Different variants of the above models can also be used.
Here is a GitHub repository in which the author has used the above-mentioned algorithms to predict stock prices and tested them on the task. Of these, the Bagging Regressor performs well. Bagging (bootstrap aggregating) relies on the fact that combining many base learners reduces the error, mainly by reducing variance. Therefore we want to produce many base learners. Each base learner is trained on a sample drawn from the original data set with replacement.
This is a Kaggle kernel link for stock market prediction. The author has used LSTM networks to predict future stock prices.
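The bagging idea can be sketched with the standard library alone. In this illustration the base learner is a 1-nearest-neighbour regressor, the data are made-up (day, price) pairs, and the function names are hypothetical; the repository mentioned above may use different base learners and settings.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]

def nearest_neighbour_predict(sample, x):
    """Base learner: return the target of the training point closest to x."""
    return min(sample, key=lambda pair: abs(pair[0] - x))[1]

def bagging_predict(data, x, n_learners=25, seed=0):
    """Average the predictions of base learners fit on bootstrap samples."""
    rng = random.Random(seed)
    predictions = [
        nearest_neighbour_predict(bootstrap_sample(data, rng), x)
        for _ in range(n_learners)
    ]
    return sum(predictions) / len(predictions)

# Made-up (day, price) pairs.
data = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 13.0), (5, 12.5)]
prediction = bagging_predict(data, x=3.5)
print(prediction)
```

Each bootstrap sample leaves out some points and repeats others, so the base learners disagree slightly, and averaging them smooths out the individual errors.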
3. ALGORITHMIC TRADING
This technique has been used in finance since the 1970s. Algorithmic trading is used by many financial companies to automate their trading decisions and trades. Because computers can perform thousands of calculations in a very short time, it is an efficient and robust way to trade. Algorithmic systems often make thousands or millions of trades in a day, hence the term “high-frequency trading” (HFT), which is considered a subset of algorithmic trading. Such a system takes many inputs at a time, so that it can learn from them and predict future trades.
All the Algorithmic Trading Strategies that are being used today can be classified broadly into the following categories:
Momentum/Trend Following
Arbitrage
Statistical Arbitrage
Market Making
In short, there are a few well-established methods that can be used. With the recent rise of AI, many companies now use AI for algorithmic trading, though few of them talk about it openly.
JPMorgan, Bank of America, and Morgan Stanley are developing automated investment advisors, powered by machine learning technology. Other fintech companies will likewise follow suit.
Common machine learning and deep learning models used today:
Recurrent Neural Networks (RNN)
Long Short Term Memory Networks (LSTM)
Ensemble algorithms
Support Vector Machines
Here is a code example from GitHub in which the author has used some of the above algorithms to predict trades in finance. The results show that SVM (Support Vector Machines) works best, with LR second and ahead of Random Forests. Of the algorithmic strategies tested, a hybrid of momentum and RSI (relative strength index) works best on the dataset in the repository. Which model works better, however, depends on the dataset.
Here is another example on algorithmic trading. The dataset is published on Kaggle and contains data on Japan's trade trends from 1988 to 2015. The solution to the problem is here. The author has used Python to find patterns in the data by plotting it.
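As an illustration of what a momentum and RSI hybrid could look like, here is a small sketch. It uses the simple-average form of RSI (not Wilder's smoothed version), and the lookback, the 70/30 thresholds and the way the two signals are combined are assumptions made for this example, not the repository's actual strategy.

```python
def momentum(prices, lookback):
    """Price change over the last `lookback` steps."""
    return prices[-1] - prices[-1 - lookback]

def rsi(prices, period):
    """Simple-average RSI over the last `period` price changes (0 to 100)."""
    changes = [b - a for a, b in zip(prices[-period - 1:-1], prices[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    return 100 - 100 / (1 + avg_gain / avg_loss)

def hybrid_signal(prices, lookback=4, period=4, overbought=70, oversold=30):
    """Follow the trend unless RSI says the move is already stretched."""
    trend = momentum(prices, lookback)
    strength = rsi(prices, period)
    if trend > 0 and strength < overbought:
        return "buy"
    if trend < 0 and strength > oversold:
        return "sell"
    return "hold"

prices = [10, 9, 10, 9, 11]  # made-up prices
print(rsi(prices, 4), hybrid_signal(prices))
```

Momentum decides the direction, and RSI acts as a filter that avoids buying into an overbought market or selling into an oversold one.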
4. CUSTOMER SERVICE AND RECOMMENDATION
Customer service and recommendation are key contributors to the growth of financial companies and banks. With a rapidly changing market, it is important to recommend the best investment strategy to each customer. Some banks use recommendation systems to suggest what kind of investment a customer should make, what kind of saving strategy they should use, which credit card plan suits them best, and so on.
Virtual assistants and chatbots, like Siri, Cortana and Google Assistant, are very common these days. They are not just a customer support tool for the tech giants; they can also be used in the finance domain. A few companies already use chatbots to provide good customer service. Customers can ask them questions about investments, trends, savings, loans, insurance plans etc., so that they can keep good track of their financial situation.
The data for virtual assistants and chatbots is the conversation that happens between customers and customer care executives. It can also be a formal or informal talk between a banking advisor and a customer. These conversations can be fed to neural networks, which eventually learn to give good answers to the questions asked. Recurrent neural nets are very common for this purpose.
Here is the link to the GitHub repository of LightFM, a Python library which can be used for developing hybrid as well as end-to-end recommendation systems.
In this repository we can find resources like research papers, blogs and tutorials on how deep learning is being used in recommendation systems. There are many techniques, and different companies use different ones according to their convenience and requirements.
5. RISK MANAGEMENT IN BANKS AND FINANCIAL INSTITUTIONS
Risk management is another field where AI can do great things. By using information not just from inside the company but also from outside, such as from customers, AI can help with better risk management and the prevention of future losses. Research on the future of risk management in banks from McKinsey states that “risk management is crucial for success and financial health of the banks and other institutions; ML is the key element, allowing to evaluate all the factors to make a well-grounded decision, and every new piece of information processed by the algorithm makes its predictions more accurate.”
By applying predictive analysis to huge amounts of data in real time, AI technology can detect rogue investors working in unison across multiple accounts, something that would be nearly impossible for a human investment manager. Efficiency is another major plus point of AI, as computers are much faster than humans at such tasks. Past data on the company and its customers can provide really good information for an AI to learn risk management. Some of the machine learning models that can be used:
Recurrent Neural Network (RNN)
Long Short Term Memory Network (LSTM)
Logistic Regression
Ensemble Techniques
Here is a repository from GitHub which shows how credit risk management can be done using machine learning. In the repository we can see Logistic Regression, Gradient Boosting and ensemble algorithms showing good results. Tech giants like IBM are developing their own risk management apps to help finance companies and banks with better risk analysis. Here is the GitHub repository of a project in which they demonstrate how companies can build their own apps using services like IBM Bluemix, IBM Watson etc.
6. NETWORK SECURITY
Network security is a big issue not just for tech companies but also for financial institutions. Modern cyber attacks are so powerful that even reputed institutions like the U.S. Department of Justice, the U.S. Democratic National Committee (DNC), the Internal Revenue Service, Snapchat, LinkedIn, and Oracle have struggled to defend against them. This is where AI can outshine conventional techniques and methods.
Finding malware and security threats can be much more effective using modern AI techniques. Anomalous patterns in data can be detected and interpreted as threats by AI much more efficiently than by any human.
Tech giants like Microsoft, IBM, Amazon etc. are investing heavily in AI that can help prevent cyber attacks. The data needed for this is past attacks and the patterns related to them. Any other information related to malware and cyber attacks can also be helpful. Some of the algorithms that can be used are listed below:
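A very simple form of anomaly detection is to flag values that lie far from the average. The sketch below uses a z-score on made-up requests-per-minute counts; the function name, the data and the threshold of 2.5 are assumptions for illustration, and production systems typically use richer features and models such as those listed in this section.

```python
import statistics

def find_anomalies(values, threshold=2.5):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std_dev > threshold]

# Made-up requests per minute; the last reading is a sudden burst.
requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
suspicious = find_anomalies(requests_per_minute)
print(suspicious)  # [9]
```

With only a handful of points, a single outlier inflates the standard deviation and caps its own z-score just under 3, which is why the threshold here is set below the usual 3.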
XGBoost
Neural Networks
Microsoft has launched a malware detection challenge using machine learning on Kaggle. Here is one good solution to that challenge on GitHub. The author has used an XGBoost classifier and has also done good work on feature selection and feature engineering.
Watch This : https://www.youtube.com/watch?v=Z5vxRC8dMvs