Product Manager & Machine Learning

mukund kannan
8 min read · Jun 5, 2024


All successful products are alike; every unsuccessful product is unsuccessful in its own way.

With pulls and pushes from various stakeholders, and day-to-day priorities to juggle without losing sight of the larger product strategy, a product manager's job is tough. It gets tougher when it involves the latest technologies, which seem to be stuck eternally in the narrow band between 'mass media hype begins' and 'supplier proliferation' on the Gartner Hype Cycle.

Well, now that I have managed to win some brownie points from you, Mr. Product Manager, let me offer some pointers for a new PM starting off on an ML-based product journey. The primary difference between the traditional software approach and the ML approach is this: in traditional software, the specific requirements and constraints are all provided upfront, clearly, by a business subject matter expert; in ML, they are buried inside the data and have to be prised out. Working in this opaque world is the single biggest challenge for an uninitiated PM.

Before anything else, probably the most important advice I can give a PM is to get over the mental barrier of learning AI/ML. The fancy terms, the hyperbolic claims that make it all feel like magic, the heavy math, the probability & statistics: it can surely be daunting. My experience has been to internalize the concepts through intuition-based understanding. There is plenty of material on the net where these ideas are explained in extremely simple terms. An aspiring learner should back themselves and get started.

Let us break the PM's responsibilities into three broad segments and discuss the important aspects a PM needs to pay attention to.

Identifying & defining viable products & features

Here, the PM’s focus is to identify the products and/or features that will pretty much define the outcome of the entire journey. This involves maintaining product portfolios & feature backlogs and prioritizing them to maximize the value.


Typically, prioritization involves a relative comparison of business value, risk of failure, cost of development and skill-set availability. Depending on the company's business context & current focus, some of these get dialed up or down. Of this list, assessing the risk of failure and the cost of development requires the PM to have at least some ML background.

Failure risk analysis: Data dependency, the constant flux of underlying technologies, and the variety of development tools & approaches make risk analysis in ML projects different. Most ML product builds start off with hypotheses that require a POC as a safeguard against certain types of risk. This poses a planning challenge to the PM: how to time/value-box the POC, how fast is fast in the 'fail fast' decisions, which interim milestones to set on slightly larger POCs, which validation & evaluation frameworks to put in place, and so on. A level of understanding of ML goes a long way in making these decisions in an informed way.

Cost of development is an important element in the assessment. The initial POC might give some idea of the effort required for some of the moving parts in the project. It will also bring out elements in the ML journey that need additional experiments.

ML is extremely experiment driven, and not just in the model building phase; it is experimental throughout the lifecycle. Even data identification & gathering can be iterative, because in use cases with a large number of features (or in unstructured data projects) it is very difficult to confidently predict data sufficiency beforehand. These things tend to surface only during the hyper-iterative model building cycle. So, with an experienced data scientist's input, a PM should make allowance for this in the planning stages.
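
One way to take some of the guesswork out of data sufficiency is to plot a learning curve: train on growing slices of the available data and watch whether validation performance is still improving. Below is a minimal sketch using scikit-learn; the synthetic dataset and the logistic regression model are stand-in assumptions for a real project's data and model.

```python
# Minimal sketch: use a learning curve to judge data sufficiency.
# The synthetic data stands in for the project's real dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Score the model on growing fractions of the training data.
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} samples -> validation accuracy {score:.3f}")
# If the curve is still climbing at the largest size, more data will
# likely help; if it has flattened, more data probably won't.
```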

A PM should focus on the following during the initial discussions with the data science and engineering teams.

Technology choices

  • Larger and more complicated models tend to be more accurate, but they also tend to require more compute. Some of the larger NLP models (Google's BERT, for example) might work out very well for classification use cases, but each Kubernetes pod might take a long time (2–3 minutes) to spin up, which may not be acceptable in time-sensitive cases. Most LLM-based models as of now require GPUs during inference, making them a non-option for highly cost-sensitive business scenarios.
  • Accuracy is typically reported in statistical terms, and a PM has to be comfortable with these accuracy metrics. Most of them are very simple to understand if we approach them through intuition rather than in pure mathematical terms.
  • While data scientists report accuracy only from the model's point of view, the PM has to think in terms of the end-to-end process flow. For example, a model with 95% accuracy might sound nice, but that metric is reported on historical data. At runtime, unless there is a way to clearly identify whether the prediction for a given input is a success or a failure, no individual output can be trusted, bringing the effective accuracy down to 0! So the PM needs to ensure that the design separates high-confidence predictions from those that have to be routed through a fall-out management workflow, such as a human-in-the-loop process (a small sketch follows this list). Metric reporting then has to cover both the success scenarios (as models can never predict with 100% accuracy) and the likely effort required to handle the fall-out cases.
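
To make the 'effective accuracy' point concrete, here is a minimal sketch of confidence-based routing. The threshold value and the toy arrays are illustrative assumptions; in practice the threshold is a business decision informed by the cost of failure.

```python
import numpy as np

# Illustrative outputs from a binary classifier (assumed values).
confidence = np.array([0.99, 0.97, 0.62, 0.88, 0.55, 0.95])
predicted  = np.array([1,    0,    1,    1,    0,    1])
actual     = np.array([1,    0,    0,    1,    0,    1])

THRESHOLD = 0.90  # a business decision, not a purely statistical one

auto = confidence >= THRESHOLD   # auto-accepted predictions
fallout = ~auto                  # routed to human review

auto_accuracy = (predicted[auto] == actual[auto]).mean()
print(f"auto-accepted: {auto.sum()} of {len(auto)} "
      f"(accuracy {auto_accuracy:.0%})")
print(f"human review queue: {fallout.sum()} cases")
```

Reported accuracy then applies only to the auto-accepted slice, while the size of the review queue drives the fall-out handling effort.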

Maintainability

  • Traditional installed executables tend to perform consistently over time, but model performance tends to vary due to concept drift (where the concept-level assumptions made at model build time change) and data drift (where sub-populations shift through time). What is the strategy in place to measure and handle these drifts? (A drift-measurement sketch follows this list.)
  • In cases where multiple prediction pipelines (or hierarchical models) are used for different scenarios, average accuracy calculations can get very confusing. The PM needs to be sure that the evaluation is in line with the complexity of the model serving layer.
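
One common way to quantify data drift is the Population Stability Index (PSI), which compares the distribution of a feature at serving time against its training-time baseline. A minimal sketch follows; the bin count, the thresholds in the docstring and the simulated data are conventions and assumptions, not hard rules.

```python
import numpy as np

def psi(baseline, live, bins=10):
    """Population Stability Index between a baseline and a live sample
    of one feature. Common rule of thumb: < 0.1 stable, 0.1-0.25 worth
    watching, > 0.25 significant drift (conventions, not laws)."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    l_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Floor the proportions to avoid log(0) on empty bins.
    b_pct = np.clip(b_pct, 1e-6, None)
    l_pct = np.clip(l_pct, 1e-6, None)
    return np.sum((l_pct - b_pct) * np.log(l_pct / b_pct))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)  # training-time distribution
live = rng.normal(0.4, 1.2, 10_000)      # shifted live traffic (assumed)
print(f"PSI = {psi(baseline, live):.3f}")
```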

Overall cost of ownership

  • One of the biggest follies while using the cloud is not factoring in the cost of development itself. One needs to plan for what development & testing are expected to consume (where measurement is typically pretty relaxed).
  • Similarly, at serving time there should be sufficient monitoring & comparison of the live volumes and load profiles against the original assumptions; any large deviation will increase the burn rate (see the sizing sketch after this list).
  • Unless it is designed for the right load profiles and has monitoring in place, a serverless approach can cause unforeseen overruns.
  • Again, better observability systems will play a huge part in estimating and auditing the cost.
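
As a concrete illustration of the assumptions-vs-reality comparison, here is a back-of-envelope sizing sketch. Every number in it is an assumption to be replaced with real prices and observed load profiles.

```python
import math

# Back-of-envelope serving cost check. Every number below is an
# illustrative assumption, not a real price or a real load profile.
PEAK_QPS = 50                # assumed peak requests per second
AVG_LATENCY_S = 0.8          # assumed model latency per request
CONCURRENT_WORKERS = 8       # assumed parallel requests per instance
INSTANCE_COST_PER_HR = 1.20  # assumed hourly price of one GPU instance

capacity_qps = CONCURRENT_WORKERS / AVG_LATENCY_S  # ~10 QPS per instance
instances = math.ceil(PEAK_QPS / capacity_qps)
monthly_cost = instances * INSTANCE_COST_PER_HR * 24 * 30

print(f"{instances} instances, ~${monthly_cost:,.0f}/month at peak sizing")
# Re-running this against observed live QPS and latency is exactly the
# deviation check that keeps the burn rate honest.
```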

Observability

  • As mentioned above, in scenarios where inference pipeline orchestration is in place, a diagnostic audit can prove to be a challenge. To handle such issues, a proper observability system should be in place.
  • In MLOps implementations, observability integration significantly increases the reliability of the feedback loop (a minimal logging sketch follows).
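
What such integration might look like in its simplest form: one structured log record per prediction, so that drift checks, cost audits and diagnostics can all replay the same event stream. The field names and values below are assumptions for illustration.

```python
import json, logging, time, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("inference")

def log_prediction(model_version, pipeline, confidence, latency_ms):
    """Emit one structured record per prediction so that downstream
    drift checks, cost audits and diagnostics share one event stream."""
    log.info(json.dumps({
        "event": "prediction",
        "request_id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_version": model_version,  # which model in the hierarchy fired
        "pipeline": pipeline,            # which serving pipeline handled it
        "confidence": confidence,
        "latency_ms": latency_ms,
    }))

log_prediction("fraud-v3.2", "high-value", 0.93, 412.0)
```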

Defining success criteria

The PM has to define the requirements and lay down the success criteria for the product. A clear definition of acceptance criteria across the innovation value chain is a must for a successful overall outcome. That last statement should not be read as a documentation diktat; it is a call for the PM to be part of the team through the lifecycle and to help it iterate quickly and clearly towards successful completion. Note that acceptance criteria at every level also help with the fail-fast strategies that are a must in fast-moving product development cycles.

Data scientists tend to be too focused on the data and the statistical interpretation of model performance. A PM should be able to act as the bridge that translates these into business-outcome terms.

A large laminate manufacturer in the US came to us with a problem they were hoping would make a difference in their customer retention strategy. Using a mobile app, a customer would take a picture of a pattern they liked; the app would send it to a model, which had to identify and return similar patterns available in the manufacturer's inventory. The team came up with a Siamese-network-based approach. It was well received in initial demos with the sales teams, but the manufacturer's styling team subsequently rejected it because it did not meet their definition of 'similarity'. Ideally, the team should have consulted the styling team upfront and agreed on the evaluation process & success criteria.

Remember, ML systems work on averages, and reported metrics are gross metrics based on the data scientist's understanding of the relative importance of the features. A PM needs to be very careful here, as the cost of failure may not be well understood by the data science team.

We had a project where the requirement was to take a hospital discharge summary as input, extract the relevant medical entities from the document and provide standardized medical coding as output. The team developed a multi-modal ML-model-based system that produced very impressive initial metrics. But when the averages were weighted by the business value of the different classes, it turned out that the tail where the model failed more often carried the higher business value, and that sent the team back to the drawing board.
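
The lesson generalizes: weight per-class accuracy by business value before declaring victory. A minimal sketch with made-up classes, volumes and values shows how different the two views can look.

```python
import numpy as np

# Per-class accuracy as it might be reported (all numbers assumed).
classes        = ["common code A", "common code B", "rare high-value code"]
class_accuracy = np.array([0.97, 0.95, 0.60])
class_volume   = np.array([5000, 3000, 200])   # cases per month (assumed)
value_per_case = np.array([1.0, 1.0, 50.0])    # business value (assumed)

plain_avg = np.average(class_accuracy, weights=class_volume)
value_weighted = np.average(class_accuracy,
                            weights=class_volume * value_per_case)

print(f"volume-weighted accuracy: {plain_avg:.1%}")       # ~95%, looks great
print(f"value-weighted accuracy:  {value_weighted:.1%}")  # ~76%, the truth
```

The volume-weighted figure looks like a success; the value-weighted one is what sends a team back to the drawing board.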

Product build & execution

The PM needs to be very closely engaged in the development and execution of the product. Their contribution to decision making will guide the teams through the uncertainties that crop up during the execution stages.

  • ML teams have to make choices on thresholds and on the additional validations to be performed to arrive at a proper prediction-confidence strategy. Thresholding decisions are based on factors like the cost of failure, the cost of manual fall-out management, the business importance of the sub-population, etc. These are typically business decisions, not purely data-oriented ones, and the PM has a large role in guiding the data science teams here (a threshold-selection sketch follows this list).
  • Given that training data tends to cover a small fraction of possible real-world scenarios, the introduction of a model into production is typically done in a hyper-vigilant manner. A PM needs to be part of the deployment decisions: A/B, canary and blue-green deployments, control/treatment group proportions, scaling strategies, etc. While these tend to be similar to deployments of modern web applications, the decisions have to be taken with a clear understanding of the data the model was trained on. Without this, models might underperform at scale.
  • Monitoring a live ML model is very different from monitoring traditional software executables. Model behavior can change over time (drift, as explained earlier). Data scientists have to establish a 'baseline behavior' and measure drift against it at regular intervals. A PM needs to add the business context to this very important step.
  • To ensure that models stay within acceptable behavior boundaries, they need to be refreshed regularly through retraining cycles. While this is a highly data-science-oriented effort, the PM should be part of the discussions on the process to be followed.
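
To illustrate the thresholding point from the first bullet above, here is a sketch that picks a confidence threshold by minimizing expected cost rather than by maximizing accuracy. The costs and the simulated validation scores are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated validation set: model confidence and whether it was correct
# (higher confidence made more likely to be correct, by construction).
confidence = rng.uniform(0.5, 1.0, 5000)
correct = rng.random(5000) < confidence

COST_WRONG_AUTO = 25.0  # cost of shipping a wrong prediction (assumed)
COST_MANUAL = 2.0       # cost of one human review (assumed)

best = None
for t in np.arange(0.50, 1.00, 0.01):
    auto = confidence >= t
    # Wrong auto-accepted predictions cost dearly; everything below the
    # threshold pays the flat manual-review cost.
    cost = (COST_WRONG_AUTO * (~correct[auto]).sum()
            + COST_MANUAL * (~auto).sum())
    if best is None or cost < best[1]:
        best = (t, cost)

print(f"cheapest threshold: {best[0]:.2f} at total cost {best[1]:,.0f}")
```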

Social and psychological acceptance barriers for an Artificial Intelligence based system are typically much higher, which makes change management planning for ML systems that much harder. One way to gain the business stakeholders' confidence is to let them interact with the model early in the cycle and gain some insight into its decision boundaries. This makes the models feel less of a threat. A PM needs to include this in the overall planning.

In the current world, any product of some value is going to have an ML model embedded in it. The PM driving the product needs to be the one in charge of the critical decisions that maximize the value for the product's consumer. May the ML knowledge set you free.
