While machine learning (ML) is a popular buzzword, it is not a replacement for traditional analytics. The two are complementary technologies. For example, an organization might use traditional analytics to look at sales by region or by customer demographics such as age or gender. As it moves up the analytics maturity curve, it might use machine learning techniques such as clustering to find categories of similar customers, such as women with children under 12 who live in the suburbs. The organization can then market more effectively to the identified clusters.
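To make the clustering idea concrete, here is a minimal sketch of k-means, one common clustering technique, written with only the Python standard library. The customer records (age, number of children under 12), the `kmeans` helper, and the choice of starting centroids are all hypothetical and exist only for illustration; real work would typically scale the features first and use a library such as scikit-learn.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its assigned points, and repeat until stable."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (age, number of children under 12).
customers = [(34, 2), (36, 3), (33, 2), (61, 0), (64, 0), (59, 0)]

# Start from two of the customers as initial centroids (an arbitrary choice).
labels, centroids = kmeans(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

With this toy data the first three customers end up in one cluster and the last three in the other. The output is only a grouping; deciding what each cluster means for marketing is still up to the analyst.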
A great example of machine learning is a recently published study from China in which blood test results, symptoms, and other electronic health record data were used to train a machine learning model to accurately diagnose illnesses in pediatric patients.
However, business is complex and is affected by countless factors including its own actions, competitors, demographic trends, the economy and more. This makes it impractical for everything to be quantified to allow a machine learning model to spit out an answer. For example, if an executive sees market share is lower in one region than others, sometimes the old-fashioned approach of talking to people in the field will still be the best way to determine why.
The volume of data will inevitably keep growing, but managing and analyzing it will become easier. Analyzing and cleansing large data sets has traditionally required complex scripts, and new and enhanced tools will reduce that effort. For example, Microsoft is enhancing Power BI to better support big data scenarios, which will allow non-programmers to cleanse and analyze huge data volumes with just a few clicks.
In order for machine learning to be successful, there are a number of challenges to consider:
- Common mistakes from the past that continue to be made:
  - Over-fitting (fitting to the sample data). Result: machine learning models fail to perform in real-world operation despite showing great promise during construction and simulation.
  - Using the wrong machine learning technique for the problem space. Result: the modeling exercise fails and machine learning in general is deemed unsuitable for the problem.
  - Failing to de-normalize data based on known offsets and scaling relationships. Result: the modeling exercise fails and machine learning in general is deemed unsuitable for the problem.
  - Confusing correlation with causation. Result: the model fails to perform in a real-world context.
  - Failure to use bi-temporal data models when appropriate. Result: machine learning models show great promise but end up relying on data that is not actually available in real time ahead of the event being predicted.
  - Ignoring the possibility of “black swan” events. Result: models appear to work in the real world and may even perform well – until some dramatic out-of-sample shift occurs in the system that could not have been predicted solely by a machine learning algorithm trained on the data. This can result in disaster, because the algorithm may have become “trusted” and already been implemented at large economic scale.
- More subtle problems that are emerging:
  - Basing machine learning strategies on open-loop assumptions when the implementation of the methodology itself has the potential to create a feedback effect. This is particularly significant for systems involving human decision-making, which is subject to behavioral change. Result: the implementation of the machine learning system causes people to alter their behavior – and the altered behavior “breaks the model”.
  - Failure to recognize (for some micro-economic domains) that machine learning is an arms race within the context of a zero-sum game. Result: competitors implement machine learning techniques within the same time frame, each one predicting beneficial outcomes based on the previous status quo, i.e. they expect sales, profits, or market share to increase by some amount. Instead they end up where they started, possibly deeming machine learning a failure, when in fact it has enabled them to retain revenue, profit, or market share that they would otherwise have lost to competitors.
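The over-fitting item in the list above can be illustrated with a small sketch. It compares a "model" that simply memorizes the training sample against a straight-line fit, scoring both on the training data and on held-out data. The data set, the helper names (`fit_line`, `fit_memorizer`, `mse`), and the assumed true relationship y = 2x are all hypothetical and chosen only to make the effect visible.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Over-fitted 'model': returns the y value of the nearest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model's predictions on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: the true relationship is y = 2x.
# The training sample carries noise of +/-3; the held-out points do not.
train_x = [0, 2, 4, 6, 8]
train_y = [3, 1, 11, 9, 19]
test_x = [1, 3, 5, 7]
test_y = [2, 6, 10, 14]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "held-out MSE:", round(mse(model, test_x, test_y), 2))
```

On this toy data the memorizer scores a perfect zero error on the training sample but a held-out error of 13, while the line has a visibly worse training error yet a held-out error of roughly 0.36. Evaluating on data the model has never seen is what exposes the gap between promise during construction and performance in operation.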