

  • Data volume and quality are key factors in training any machine learning model.
  • Although banks have a lot of data, they face the daunting task of making data useful for risk management and trading.

We have built and integrated various machine learning (ML) models over the years within financial services. The success of these projects generally lies in tight requirements and good data quality. What follows are some takeaways from NLP work we did for an investment bank’s trading desk in NYC (used with permission, of course). The reason to delve into this case study is to illustrate real-life challenges and to explain why it will take time for machine learning and AI to be highly impactful on Wall Street.

Effect of sentiment from news sources on prices of trading instruments

Sentiment Analysis is defined as the “process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.”

We built a Sentiment Analysis engine to highlight companies experiencing meaningful shifts in sentiment, based on newsfeed and Twitter data. This was then used to build a trading algorithm intended to capture alpha against human traders.

A change (delta) in liquid markets (such as stocks, on-the-run corporate bonds, options, commodity futures, etc.) is an excellent place to measure sentiment. In theory, Twitter, Bloomberg Breaking News, and Reuters news feeds should be the canary in the coal mine for market-affecting input. A sentiment engine should be able to process these ‘fast-moving’ news sources and give an early indication of an upcoming change in the price of a stock or a company’s bonds. Ideally, the engine should predict the volatility change before the market catches on to it.
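To make the scoring step concrete (this is an illustration, not the bank’s actual engine), a toy lexicon-based headline scorer might look like the following; the word lists and function name are made up for the example:

```python
import re

# Toy word lists: a real engine would use a trained NLP model,
# not a hand-built lexicon. These sets are illustrative only.
POSITIVE = {"beat", "upgrade", "surge", "record", "rally"}
NEGATIVE = {"miss", "downgrade", "plunge", "lawsuit", "default"}

def headline_sentiment(headline: str) -> float:
    """Score a headline in [-1, 1]: >0 positive, <0 negative, 0 neutral."""
    words = re.findall(r"[a-z]+", headline.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits
```

The hard part, as the results below show, is not producing a score but deciding which scores are clean and tradeable.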

The volatility surface is the Wall Street version of sentiment analysis

The beauty of trading equity volatility is that one can dig below the stock price movement to decompose what is actually going on within a given company’s risk profile. Without going into the gory details of the ‘greeks’, it’s important to understand that vega is the trading-desk equivalent of sentiment. Using options, one can bet not only on the absolute size of a movement in the instrument (bond, stock, etc.) but also on when the movement is expected to happen.
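For readers who do want a taste of the greeks: under the standard Black-Scholes model (textbook math, not the desk’s proprietary model), vega can be computed directly. A minimal sketch:

```python
from math import log, sqrt, exp, pi

def bs_vega(spot: float, strike: float, t: float, r: float, sigma: float) -> float:
    """Black-Scholes vega: option price sensitivity to implied volatility
    (per 1.00 change in vol; divide by 100 for a per-vol-point figure)."""
    d1 = (log(spot / strike) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    pdf_d1 = exp(-0.5 * d1**2) / sqrt(2 * pi)  # standard normal density
    return spot * pdf_d1 * sqrt(t)

# An at-the-money one-year option carries vega of roughly 0.4 * spot
v = bs_vega(spot=100.0, strike=100.0, t=1.0, r=0.0, sigma=0.2)
```

Note that vega grows with time to expiry, which is why longer-dated options are the natural instrument for a pure volatility view.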

‘Bot’ to trade various ETF Gammas using Twitter sentiment analysis

We trained a proprietary Sentiment Analysis engine on Twitter news feeds and the one-year volatility surface for major ETFs, and developed an algorithm to buy and sell gamma. In historical backtesting, the results were worth taking a punt on.

Some thoughts on our Trading Bot’s performance

Over the course of a few months, we measured the PnL (profit/loss) of the ‘Bot’ against the human market-makers. I cannot disclose actual numbers here, but I can generically say the following:

1)     False positives and incorrect signals became too annoying for the human traders. Although we were able to eliminate more than 70% of these relatively quickly, the process of cleaning the noise was too onerous to fully resolve. Successful quant hedge funds have solved these issues, but doing so requires a large team, and frankly not many trading desks have the wherewithal or expertise.

2)     Non-tradeable signals: roughly one-third of the correct signals were too small relative to the bid/offer spread to be worth acting on.

3)     The risk models were more in line for small price movements, because PnL was marked on mid-to-mid prices. For large movements (where news was released publicly, such as earnings), the sentiment engine created more head-fakes than we could tolerate.

4)     Early morning/pre-open signals, such as news from emerging markets (which were open while US markets were closed), were not useful. In most cases the emerging markets were taking their lead from the S&P’s move the day before, and figuring out which moved first was a case of deciding whether the tail wags the dog or the other way around.


Generally speaking, data volume and quality are key factors in training any ML model. Although banks have a lot of data, they face the daunting task of making that data useful for risk management and trading. The difficulty of implementing even a simple model with large datasets such as news and market prices is only a microcosm of the wider challenge of finding good data to train ML models in finance.

Polymer is a human-centric data loss prevention (DLP) platform that holistically reduces the risk of data exposure in your SaaS apps and AI tools. In addition to automatically detecting and remediating violations, Polymer coaches your employees to become better data stewards. Try Polymer for free.

