Chief Data Scientist at Reorg, a global provider of credit intelligence, data and analytics, and Adjunct at UVA’s School of Data Science.
Finance is an evergreen field with an abundance of data. There are countless ways to create business opportunities by extracting meaning from financial text documents with novel data science methods.
Data science is a fast-growing field with ever-advancing methods and tools. Applying it to finance can be highly rewarding: it can identify lucrative opportunities, flag financial or credit risks and deliver insights to users in a timely manner to maximize the value of the information.
In this article, I will highlight five applications of data science in finance that we have discovered at Reorg.
1. Stop sweating the small stuff.
Data science does not need large, complex models to have an impact in the financial sector. Identifying bottlenecks in workflow processes and using simple models that help internal stakeholders do their jobs more quickly and effectively reduces fatigue and increases the value that can be generated per hour. For example, financial analysts look at data every day, and part of that work involves repetitive tasks such as locating fundamentals and converting them into the appropriate currencies and units. Such tasks can be automated by building information retrieval (IR) models using natural language processing (NLP) techniques.
At Reorg, we process large text documents such as bond and loan documentation to identify data of interest and convert that text from unstructured into structured data. This helps streamline our analysts’ workflows by reducing the manual keyword searching required to sift through the vast number of documents that arrive every minute.
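As a minimal sketch of the kind of extraction described above, and not Reorg’s actual pipeline, the Python below uses regular expressions to pull monetary amounts such as “$1.2 billion” or “EUR 500 million” out of raw text and normalize them to USD millions. The currency list, the fixed exchange rates and the function name extract_amounts_usd_mm are hypothetical placeholders; a production system would likely rely on a trained NLP model and live FX data rather than a hand-written pattern.

```python
import re

# Hypothetical, fixed exchange rates to USD; a real system would pull live rates.
FX_TO_USD = {"$": 1.0, "USD": 1.0, "EUR": 1.10, "GBP": 1.25}

# Multipliers that convert a stated unit into millions.
UNIT_TO_MILLIONS = {"thousand": 0.001, "million": 1.0, "billion": 1000.0}

# Matches amounts like "$1.2 billion", "EUR 500 million" or "GBP 2,000 thousand".
AMOUNT_RE = re.compile(
    r"(?P<cur>\$|USD|EUR|GBP)\s?"
    r"(?P<num>\d+(?:,\d{3})*(?:\.\d+)?)\s+"
    r"(?P<unit>thousand|million|billion)",
    re.IGNORECASE,
)

def extract_amounts_usd_mm(text):
    """Return every monetary amount found in `text`, converted to USD millions."""
    results = []
    for m in AMOUNT_RE.finditer(text):
        currency = m.group("cur").upper()
        value = float(m.group("num").replace(",", ""))
        millions = value * UNIT_TO_MILLIONS[m.group("unit").lower()]
        results.append(round(millions * FX_TO_USD[currency], 2))
    return results

sample = ("The issuer entered into a $1.2 billion term loan and "
          "issued EUR 500 million of senior notes.")
print(extract_amounts_usd_mm(sample))  # [1200.0, 550.0]
```

Turning each match into a single number in a common unit is what makes the output structured: the values can go straight into a table or database instead of being re-read by an analyst.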
2. Bring order to chaos.
The legal, financial and editorial teams at my company who generate credit intelligence are constantly on the lookout for the latest scoop. The challenge is the volume and frequency of financial reporting data, which comes in multiple forms and from multiple sources. The teams synthesize, organize and process this data, drawing inferences and publishing pertinent intelligence and analysis for our subscribers. It is valuable to work with these stakeholders to build decision support systems, training data science models that learn to perform the recurring actions in these processes.
Imagine that tens of thousands of documents come in daily, but only about 10% of them are useful. Typically, the team must diligently open every document to find the important ones. Borderline cases that might contain valuable information can require further analysis, and this extra decision-making can become a bottleneck. A machine learning model can read incoming documents in real time and classify them into buckets that establish an order of flow: “ignore,” “review” and “important.” This saves the team time because they don’t have to worry about the “ignore” bucket. They can focus on the “important” documents first and turn to the “review” documents, which need more attention, later.
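To make the three-bucket idea concrete, here is a minimal sketch that assumes a binary text classifier scores each document’s probability of being useful and that two thresholds split that probability into “ignore,” “review” and “important.” The classifier is a tiny multinomial naive Bayes model written from scratch, standing in for whatever model a team actually uses. The training examples, the class names and the thresholds (0.3 and 0.8) are all invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def prob(self, doc, target):
        """Posterior probability that `doc` belongs to class `target`."""
        log_scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(doc):
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + self.vocab_size))
            log_scores[c] = score
        # Convert log-scores to probabilities, subtracting the max for numerical stability.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return math.exp(log_scores[target] - top) / norm

def bucket(p_useful, review_at=0.3, important_at=0.8):
    """Map a probability to a triage bucket (thresholds are hypothetical)."""
    if p_useful >= important_at:
        return "important"
    if p_useful >= review_at:
        return "review"
    return "ignore"

# Tiny, invented training set: "useful" vs. "noise".
train_docs = [
    "chapter 11 bankruptcy petition filed",
    "missed interest payment on senior notes",
    "forbearance agreement with lenders",
    "routine change of registered office address",
    "annual meeting date announced",
    "updated investor relations contact",
]
train_labels = ["useful", "useful", "useful", "noise", "noise", "noise"]
model = NaiveBayes().fit(train_docs, train_labels)

for doc in ["lenders sign forbearance on missed payment",
            "meeting with lenders",
            "office address updated"]:
    p = model.prob(doc, "useful")
    print(f"{bucket(p):<9} p={p:.2f}  {doc}")
```

With this toy data, the three sample documents land in “important,” “review” and “ignore” respectively. The middle bucket is what absorbs the borderline cases: instead of forcing a yes/no decision, documents the model is unsure about are routed to a human.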
3. Cast a wider net.
Data science models can increase the scalability of existing business processes. During earnings season, an influx of data can push teams beyond their capacity. This can force them to narrow their financial coverage at a time when information is particularly valuable to subscribers. Machine learning models work tirelessly and can be especially helpful during these busy periods.
Continuing the example above, the teams can focus on the most important parts of the queue, the “important” and “review” buckets, while the model continues to examine every document. Without this support, the teams might have to limit the documents they examine to make the best use of their limited time.
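One way to picture how the buckets keep the queue moving is a priority queue: documents in the “important” bucket come out first, then “review,” with higher model scores breaking ties, while “ignore” documents are logged but never enqueued. The sketch below uses Python’s standard heapq module; the TriageQueue class, the bucket ranks, the document IDs and the scores are hypothetical.

```python
import heapq
import itertools

# Lower rank is served first; "ignore" has no rank because it is never queued.
BUCKET_RANK = {"important": 0, "review": 1}

class TriageQueue:
    """Hypothetical work queue ordered by bucket, then by model score."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # keeps arrival order stable on exact ties
        self.ignored = []

    def push(self, doc_id, bucket, score):
        if bucket == "ignore":
            self.ignored.append(doc_id)
            return
        # heapq is a min-heap, so negate the score to pop higher scores first.
        heapq.heappush(self._heap, (BUCKET_RANK[bucket], -score, next(self._arrival), doc_id))

    def pop(self):
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)

q = TriageQueue()
for doc_id, bucket, score in [
    ("8-K-001", "review", 0.55),
    ("10-Q-002", "ignore", 0.05),
    ("8-K-003", "important", 0.91),
    ("10-K-004", "review", 0.72),
    ("8-K-005", "important", 0.97),
]:
    q.push(doc_id, bucket, score)

order = [q.pop() for _ in range(len(q))]
print(order)  # ['8-K-005', '8-K-003', '10-K-004', '8-K-001']
```

Because the model keeps scoring every new arrival, the queue stays ordered even during an earnings-season surge; analysts simply keep popping from the front for as long as they have time.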
4. Discover untapped opportunities.
Automating clerical tasks and organizing data inputs cleanly in real time creates an opportunity for deeper analysis. Such analysis has the potential to identify previously unrecognized patterns in financial data, predict risk and detect high-yield credit prospects in new ways.
At Reorg, as part of identifying which SEC filings are “important,” it became crucial to identify the credit risk factors noted in those documents. Apart from adding value to our intel and highlighting credit risks, the model also builds a historical record of this data, which can be used to create a timeline of changes in a company’s credit risk. This timeline can provide additional insight into a company’s performance over time and allow further examination of its overall credit risk, painting a bigger picture.
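A minimal sketch of the timeline idea, assuming an upstream model has already tagged each filing with a set of risk-factor labels: sort a company’s filings by date and report which labels appear or disappear from one filing to the next. The filings, dates and label names below are invented for illustration, and the risk_timeline function is a hypothetical helper rather than part of any real product.

```python
from datetime import date

# Hypothetical output of an upstream risk-factor classifier: one entry per filing.
filings = [
    {"date": date(2023, 5, 10), "form": "10-Q", "risks": {"liquidity"}},
    {"date": date(2023, 2, 28), "form": "10-K", "risks": {"liquidity", "customer concentration"}},
    {"date": date(2023, 8, 9), "form": "10-Q", "risks": {"liquidity", "covenant breach", "going concern"}},
]

def risk_timeline(filings):
    """Yield (date, form, added, removed) for each filing in date order."""
    previous = set()
    for filing in sorted(filings, key=lambda x: x["date"]):
        added = sorted(filing["risks"] - previous)
        removed = sorted(previous - filing["risks"])
        yield filing["date"], filing["form"], added, removed
        previous = filing["risks"]

for when, form, added, removed in risk_timeline(filings):
    print(when, form, "added:", added, "removed:", removed)
```

The differences between consecutive filings, rather than the raw lists, are what make the timeline useful: a newly added “going concern” label is a far stronger signal than one that has been present for years.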
5. Predict the unpredictable.
Some problems would be lucrative to solve but are nearly impossible to solve completely. It is not necessary to fully solve such a problem to unlock valuable opportunities; a middle ground that takes a step toward a possible solution can be significant. Even attempting to build a model that predicts a highly uncertain outcome can open up other possibilities.
One approach to a complex problem is to split it into smaller components and build a sub-model for each. If I am trying to predict bankruptcy, for example, there could be a series of sub-models that analyze the sentiment of earnings call transcripts, previously identified risk factors and language related to staff changes. Their outputs can then be combined and reported as the likelihood of a company filing for bankruptcy. Here, the intermediate outputs can give more insight than the overall output.
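The decomposition above can be sketched as a simple stacking step: each sub-model emits a signal between 0 and 1, and a logistic function over a weighted sum turns those signals into one overall probability, while each sub-model’s contribution stays available for inspection. The sub-model names, weights and bias below are hypothetical stand-ins; in practice they would be learned from historical filings, for example by fitting a logistic regression on the sub-model outputs.

```python
import math

# Hypothetical weights a combiner might learn; positive values raise bankruptcy risk.
WEIGHTS = {
    "negative_call_sentiment": 2.0,
    "risk_factor_severity": 3.0,
    "executive_turnover": 1.5,
}
BIAS = -4.0

def combine(signals):
    """Return the overall probability and each sub-model's contribution to the log-odds."""
    contributions = {name: WEIGHTS[name] * value for name, value in signals.items()}
    log_odds = BIAS + sum(contributions.values())
    return 1 / (1 + math.exp(-log_odds)), contributions

# Invented sub-model outputs for one company.
signals = {
    "negative_call_sentiment": 0.8,
    "risk_factor_severity": 0.6,
    "executive_turnover": 1.0,
}
p, parts = combine(signals)
print(f"overall probability: {p:.2f}")
for name, c in sorted(parts.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {c:+.2f} to log-odds")
```

Returning the contributions alongside the probability is the design point: the single number answers “how worried should we be,” while the breakdown answers “why,” which is often the more actionable output.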
The final output may contain false positives, but assuming the model is trained correctly and tuned sufficiently, those false positives are there for a reason, and they can reveal information that catches us by surprise. For example, when predicting bankruptcy, a false positive could mean that a company did not actually file for bankruptcy despite strong signals that it could be in the process of doing so.
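Treating false positives as leads rather than noise can be as simple as filtering for them. Assuming a table of past predictions joined with what actually happened, the sketch below lists companies the model scored above a hypothetical threshold that did not file, along with the sub-model signal that was strongest for each, so analysts can investigate what was going on. All company names, scores, signal names and the threshold are made up.

```python
THRESHOLD = 0.5  # hypothetical decision threshold

# Hypothetical historical predictions joined with actual outcomes.
history = [
    {"company": "Alpha Corp", "p": 0.82, "filed": True,
     "signals": {"call_sentiment": 0.9, "risk_factors": 0.7}},
    {"company": "Beta Inc", "p": 0.74, "filed": False,
     "signals": {"call_sentiment": 0.3, "risk_factors": 0.95}},
    {"company": "Gamma LLC", "p": 0.12, "filed": False,
     "signals": {"call_sentiment": 0.1, "risk_factors": 0.2}},
]

def false_positives(rows, threshold=THRESHOLD):
    """Return (company, score, strongest signal) for flagged companies that did not file."""
    out = []
    for r in rows:
        if r["p"] >= threshold and not r["filed"]:
            strongest = max(r["signals"], key=r["signals"].get)
            out.append((r["company"], r["p"], strongest))
    return out

for company, p, strongest in false_positives(history):
    print(f"{company}: p={p:.2f}, strongest signal = {strongest}")
```

Here only Beta Inc is returned: it was flagged mainly on risk-factor language but never filed, which is exactly the kind of case worth a closer look.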
In sum, data science has a rich variety of useful applications in finance. These range from simple automation of clerical tasks, to intermediate decision-making tools, to complete products with high accuracy.