Seduced by inflated promises, organisations have started to mine their data with state-of-the-art algorithms, expecting it to be turned into gold instantly. This expectation that technology will act as a philosopher’s stone makes data science comparable to alchemy: it looks like science, but it isn’t. Most of the algorithms fail to deliver value because they can’t explain why things are happening, nor can they provide actionable insights or guidance for influencing the phenomena being investigated. To illustrate, take the London riots in 2011. Since the 2009 G20 summit, the UK police had been gathering and analysing large amounts of social media data, yet they were still unable to prevent the 2011 riots from happening, or to track and arrest the rioters. Did the police have too little data, or a lack of computing or algorithmic power? No: millions had been spent. Despite all the available technology, the police were unable to make sense of it all. I see other organisations struggling with the same problem as they try to make sense of their data. Although I’m a strong proponent of using data and mathematics (and as such data science) to answer business questions, I do believe that technology alone can never be sufficient to provide an answer. Neither can the amount, diversity and speed of the data.
Inference vs Prediction

Let’s investigate the disconnect between business goals and data science efforts as mentioned in the HBR article. Many of today’s data science initiatives result in predictive models. In a B2C context these models are used to predict whether you’re going to click on an ad, buy a suggested product, churn, commit fraud or default on a loan. Although a lot of effort goes into creating highly accurate predictions, the question is whether these predictions really create business value. Most organisations need a way to influence the phenomenon being predicted, not just the prediction itself; this allows them to decide on the appropriate actions to take. Therefore, understanding what makes you click, buy, churn, default or commit fraud is the real objective. Understanding what influences human behaviour requires a different approach than creating predictions: it requires inference. Inference is a statistical, hypothesis-driven approach to modelling that focusses on understanding the causality of a relationship. Computer science, the core of most data science methods, focusses on finding the best model to fit the data and doesn’t ask why. Inferential models give the decision maker guidance on how to influence customer behaviour, and that is where value is created. This might better explain the disconnect between business goals and analytics efforts as reported in the HBR article. For example, knowing that a call positively influences customer experience and prevents churn for a specific type of customer gives the decision maker the opportunity to plan such a call. Prediction models can’t provide these insights; they will only provide the expected number of churners, or who is most likely to churn. How to react to these predictions is left to the decision maker.
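To make the retention-call example concrete, here is a minimal sketch of what an inferential question looks like in code: estimating the odds ratio of churn for customers who did versus did not receive a call. All counts are invented for illustration; a real analysis would use a proper experimental design and a statistical package.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
        a = called & churned,   b = called & stayed
        c = no call & churned,  d = no call & stayed
    """
    return (a / b) / (c / d)

# Invented campaign counts (hypothetical data)
called_churned, called_stayed = 30, 470
nocall_churned, nocall_stayed = 90, 410

or_ = odds_ratio(called_churned, called_stayed, nocall_churned, nocall_stayed)

# Approximate 95% confidence interval on the log odds ratio
se = math.sqrt(1/called_churned + 1/called_stayed + 1/nocall_churned + 1/nocall_stayed)
ci_low = math.exp(math.log(or_) - 1.96 * se)
ci_high = math.exp(math.log(or_) + 1.96 * se)

print(round(or_, 2), round(ci_low, 2), round(ci_high, 2))
```

An odds ratio below 1, with a confidence interval that excludes 1, would suggest the call is associated with less churn. That is an actionable insight for the decision maker, which a prediction score alone does not provide.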
Keep it simple!

A second reason for failure mentioned in the HBR article is that data scientists put a lot of effort into improving the predictive accuracy of their models instead of taking on new business questions. The reason given for this behaviour is the huge effort required to get the data ready for analysis and modelling. A consequence of this tendency is increased model complexity. Is this complexity really required? From a user’s perspective, complex models are more difficult to understand, and therefore also more difficult to adopt, trust and use. For easy acceptance and deployment, it is better to have understandable models. Sometimes this is even a legal requirement, for example in credit scoring. A best practice I apply in my work as a consultant is to balance model accuracy against the accuracy required for the decision to be made, the analytics maturity of the decision maker and the accuracy of the data. This also applies to data science projects. For example, targeting the recipients of your next marketing campaign requires less accuracy than having a self-driving car find its way to its destination. Also, you can’t make more accurate predictions than the accuracy of your data allows. Most data are uncertain, biased and incomplete, and contain errors; with large amounts of data this only gets worse. This negatively influences the quality and applicability of any model based on that data. In addition, research shows that the added value of more complex methods is marginal compared to what can be achieved with simple methods. Simple models already catch most of the signal in the data, which in most practical situations is enough to base a decision on. So, instead of creating a very complex and highly accurate model, it is better to test various simple ones. They will capture the essence of what is in the data and speed up the analysis.
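The claim that data accuracy caps model accuracy can be illustrated with a small simulation. Assuming, purely for the sake of argument, that 10% of the recorded labels are wrong, then even a model that recovers the true outcome perfectly cannot score above roughly 90% against the recorded data.

```python
import random

# Hypothetical simulation: noisy labels put a ceiling on measured accuracy.
random.seed(42)

n = 10_000
noise_rate = 0.10  # assume 10% of outcomes were recorded incorrectly

true_labels = [random.randint(0, 1) for _ in range(n)]
# Observed labels: each true label is flipped with probability noise_rate
observed = [1 - y if random.random() < noise_rate else y for y in true_labels]

# A "perfect" model predicts the true label every time,
# yet its measured accuracy against the observed data is capped
perfect_accuracy = sum(t == o for t, o in zip(true_labels, observed)) / n
print(round(perfect_accuracy, 2))
```

No amount of extra model complexity moves this ceiling; only better data does.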
From a business perspective, this is exactly what you should ask your data scientists to do: come up with simple models fast and, if the decision requires it, use the insights from these simple models to direct the construction of more advanced ones.
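As a sketch of how far a simple model can get you, consider a one-rule churn model on invented data. The data-generating process below (short-tenure customers churn far more often) is an assumption made purely for illustration.

```python
import random

# Hypothetical churn data: short-tenure customers churn more often.
random.seed(7)

n = 5_000
rows = []
for _ in range(n):
    tenure = random.randint(1, 60)          # months as a customer
    p_churn = 0.8 if tenure < 12 else 0.15  # invented behaviour
    rows.append((tenure, random.random() < p_churn))

# One-rule model: predict churn when tenure is under one year
rule_accuracy = sum((tenure < 12) == churned for tenure, churned in rows) / n
print(round(rule_accuracy, 2))
```

A single interpretable rule already captures most of the signal here, and it is the kind of model a decision maker can understand, trust and act on.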
The question “How to get value from your data science initiative?” has no simple answer. There are many reasons why data science projects succeed or fail; the HBR article mentions only a few. I’m confident, though, that the considerations and recommendations above will increase the chances that your next data science initiative is successful. I can’t promise you gold, however; I’m no alchemist.