You’ve developed a platform that’s gaining significant customer traction and enabling you to collect vast amounts of transaction and user data. Word gets out about your software, you acquire more users and feature requests start rolling in. As you develop and deliver those new features, you engage more users and collect even more data!
There’s tremendous value in that data, but a narrow view of what it can do may be limiting your ability to mine it for the insights you need to improve your product, or even to develop new ones that better meet the needs of your user base. Perhaps you’ve only gotten as far as creating simple plots and histograms around events, fault detection and other simple rules-based alerting and reporting. You know there’s so much more you can do with all that data, but how?
The good news is that advanced machine learning techniques are helping organizations unlock value from user data that, until recently, yielded only limited insights. Here are five actionable steps to take to nurture a data science culture and derive more value from your data:
Step 1: Embrace intellectual humility
You can’t get out of basic mode if you don’t embrace the “unknown unknowns,” Donald Rumsfeld’s often-cited phrase for the things you don’t know you don’t know. In other words, it’s OK not to know everything. Acknowledge that, and trust that the data can help you figure out what might be of value.
In the age of big data, artificial intelligence (AI) and easy access to powerful computing resources, there’s no need to place artificial limits on your analysis because of your own biases. Such bias can lead to non-representative datasets, poorly engineered features and algorithmic outputs that produce highly misleading, sometimes catastrophic results.
Step 2: Learn from experts to develop a sound strategy
A typical engineering organization is neither designed for nor adept at data science; rather, it’s built for creating and executing on product roadmaps. By contrast, data science involves research and experimentation. Key aspects of a sound strategy include:
- Data collection, acquisition and roadmap creation
- Machine learning training pipeline
- Testing and verification methodologies
- A model roadmap
- A resourcing plan
Leading these activities requires a specialized skill set beyond product management, and you may need to bring in third-party experts. They understand best practices and have real-world experience in solving data science challenges, so they can minimize risk and frustration.
Once you have an understanding of the dos and don’ts, you can decide whether or not to continue with third-party expertise or start bringing the skill sets in-house. You may want to leverage your trusted partner for interviewing the lead data scientist and the first few hires.
Step 3: Embrace experimentation
Our ability to collect all kinds of data, coupled with advances in machine learning and the availability of powerful computing resources, creates ideal conditions for advanced data modeling. Supervised and unsupervised machine learning techniques are being used with increasing frequency to provide interesting insights and foresight.
Imagine a scenario in which you have no idea what your dataset is good for. A machine learning engineer can experiment with using various hierarchical clustering techniques to see if any interesting patterns emerge. Once they uncover patterns, they can use the clusters to develop supervised learning models and make predictions based on labeled data within the clusters. This can be done with no previous knowledge or hypothesis about the dataset’s potential value.
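To make that workflow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the per-user features, the choice of average linkage and the nearest-centroid classifier standing in for the supervised model are not prescribed by any particular project. In practice a team would more likely reach for a library such as SciPy or scikit-learn than hand-roll the clustering.

```python
from math import dist

def agglomerative_clusters(points, n_clusters):
    """Bottom-up hierarchical clustering with average linkage."""
    # Start with every point in its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def fit_centroids(points, clusters):
    """Treat each cluster as a label and learn one centroid per label."""
    centroids = {}
    for label, members in enumerate(clusters):
        dims = zip(*(points[i] for i in members))
        centroids[label] = tuple(sum(d) / len(members) for d in dims)
    return centroids

def predict(centroids, point):
    """Nearest-centroid classifier: return the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical per-user features: (sessions per week, average order value).
users = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (11, 78)]

clusters = agglomerative_clusters(users, n_clusters=2)  # unsupervised step
centroids = fit_centroids(users, clusters)              # clusters become labels
new_user_label = predict(centroids, (10, 82))           # supervised prediction
print(clusters, new_user_label)
```

The unsupervised step needs no hypothesis about what the groups mean. Only once the clusters exist does someone decide whether they are worth naming and predicting.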
If the results are not satisfactory, you can continue to refine the models, figure out which gaps in your datasets are producing those results (a “data gap” analysis, if you will), or both. This can help you augment your data acquisition strategy and create better training datasets through an iterative process that leads to better model accuracy over time.
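One simple way to approach a data gap analysis is to break evaluation results down by segment and look for segments that are thinly covered or where the model is often wrong. The sketch below assumes hypothetical segment names, made-up evaluation results and arbitrary thresholds; the right cutoffs depend on your data.

```python
from collections import defaultdict

def error_rate_by_segment(results):
    """results: iterable of (segment, was_correct) pairs from model evaluation."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for segment, was_correct in results:
        totals[segment] += 1
        if not was_correct:
            errors[segment] += 1
    return {seg: (totals[seg], errors[seg] / totals[seg]) for seg in totals}

def find_data_gaps(results, min_examples=5, max_error_rate=0.2):
    """Flag segments that are thinly covered or where the model errs often.

    Both thresholds are illustrative assumptions, not recommendations.
    """
    gaps = {}
    for seg, (count, rate) in error_rate_by_segment(results).items():
        reasons = []
        if count < min_examples:
            reasons.append("too few examples")
        if rate > max_error_rate:
            reasons.append("high error rate")
        if reasons:
            gaps[seg] = reasons
    return gaps

# Made-up evaluation results, one (segment, was_correct) pair per prediction.
results = (
    [("web", True)] * 9 + [("web", False)]
    + [("mobile", True)] * 3 + [("mobile", False)] * 3
    + [("kiosk", True)] * 2
)
gaps = find_data_gaps(results)
print(gaps)
```

Segments flagged for too few examples point to data you may need to acquire; segments flagged for a high error rate point to where features or models may need more work.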
Step 4: Adopt ‘productization’ as a forcing function
As your confidence grows around your “experiment,” you can begin to move the proofs-of-concept into engineering mode. Consider “productizing” your project for either an internal or external customer. Doing so pushes the team out of research mode and ultimately helps it make meaningful decisions that drive quality, velocity or quantifiable business metrics such as revenue, ROI or IRR hurdle rates.
Examples of productized models include:
- Integration into the current (internal) workflow
- Integration into an internal-facing tool that is used for decision making
- Integration of the model into an external-facing tool or product sold to customers
Productization involves feature development, which requires quality assurance (QA), A/B testing, support, bug fixes and so on. Consider developing a holistic productization strategy and roadmap to align teams across the organization and encourage support for the project.
It’s worth noting that model testing can be difficult and requires a skill set beyond that of the typical product QA engineer. The simplest method is a black-box approach in which QA engineers use datasets that the models were not trained on to check the input-to-output mapping; that is, given these particular never-seen inputs, how accurate is the output?
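A black-box check of this kind can be sketched in a few lines of Python. The transaction data here is synthetic, and the threshold model is a hypothetical stand-in for whatever model is actually under test; the only part that matters is that the holdout records are set aside before training and the model is scored solely on its outputs.

```python
import random

def split_holdout(records, holdout_fraction=0.25, seed=0):
    """Shuffle records and reserve a fraction the model never sees in training."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold_model(training):
    """Hypothetical stand-in model: flag a transaction if its amount exceeds
    the midpoint between the mean normal and mean flagged training amounts."""
    normal = [amount for amount, flagged in training if not flagged]
    flagged = [amount for amount, flagged in training if flagged]
    threshold = (sum(normal) / len(normal) + sum(flagged) / len(flagged)) / 2
    return lambda amount: amount > threshold

def black_box_accuracy(model, holdout):
    """Score the model purely on its input-to-output behavior."""
    correct = sum(model(amount) == flagged for amount, flagged in holdout)
    return correct / len(holdout)

# Synthetic labeled data: (transaction amount, was it flagged?).
records = [(amount, amount > 500) for amount in range(50, 1000, 50)]

training, holdout = split_holdout(records)
model = train_threshold_model(training)
accuracy = black_box_accuracy(model, holdout)
print(f"Accuracy on {len(holdout)} never-seen records: {accuracy:.0%}")
```

The tester needs no knowledge of how the model works internally, which is what makes this approachable for a QA team. The harder part in practice is making sure the holdout data is representative of what the model will see in production.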
Step 5: Think long-term
As with developing new products, developing a machine learning model that delivers increasingly meaningful insights and foresight from your data requires ongoing refinement. It is an iterative process in which better datasets, combined with algorithmic and model refinements, steadily improve performance.
Data science can be a powerful and transformative addition to your business’s arsenal. A culture that supports the experimental nature of data science is not easy for many organizations to embrace. Having a longer-term vision, developing an understanding of the discipline and being purposeful about the strategy can have a significant impact on decision-making velocity and quality, and can provide a significant competitive advantage.