Solving process failure mysteries with ML & Data Visualization.

Whatever the industry (banking, healthcare, manufacturing, logistics and others), there is always one core performance metric that drives customer retention and strong repeat business. Sometimes there are repeated incidents where process performance suffers and it is difficult to track down the exact reasons. These repeated process failures cause customer complaints and hurt repeat business. In such cases it becomes very important to track down the root causes that make this metric suffer.

Examples of core performance metrics

  • Healthcare – On-time surgery / procedure (Y/N)
  • Aviation – On-time flight departure (Y/N)
  • Manufacturing – Daily quality target met (Y/N)
  • E-commerce / Logistics – On-time delivery (Y/N)

Whatever the type of business, a leader or manager can devise a simple two-phase project to solve such process mysteries.

Phase 1: Data science. The objective of this phase is to model the failure rate and the parameters responsible for it.

Phase 2: Data visualization. Show the exact correlations so that leadership can clearly see the relationships and their impact.

Phases in the Data Science project.

  • Identification: List the candidate independent parameters that influence the metric above. Eliminate parameters that are very closely correlated with one another, keeping only one from each such group.
  • Mapping: Put together an input/output map that connects the input parameters to the output metric, as shown below. The most important factors to consider are:
    • Alternatives – Parameters you can change to improve the metric, such as the specific crew or driver assigned to a flight or delivery.
    • Baseline – Parameters you can’t change but which help explain where the problem happens, such as hospital department, origin airport or bank location.
    • Category – The product or service category where the problem occurs.
    • Date – When the problem happens, to expose time patterns.
    • The actual core metric – The output, which has a Y/N (success/failure) value.
  • Collection: Collect as much historical data as possible, ideally spanning years and thousands of transactions, with the inputs and output above arranged in one table.
  • Modelling: Use two-class classification tools such as logistic regression, boosted decision trees, two-class neural networks or others to check whether you can arrive at a model that predicts the output with a solid success rate.
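To make the Collection and Modelling steps concrete, here is a minimal sketch of a two-class logistic regression written with only the Python standard library. Everything in it is an assumption made for illustration: the column names (crew, origin, category, weekday), the synthetic data, the rule that generates failures, and the training settings. A real project would use your own historical table and a proper ML toolkit.

```python
import math
import random

# Hypothetical columns: crew (Alternative), origin (Baseline),
# category (Category), weekday (Date). All names and data are made up.
def make_records(n=1000, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        feats = (
            rng.choice(["crew_1", "crew_2", "crew_3"]),
            rng.choice(["north", "south"]),
            rng.choice(["standard", "express"]),
            rng.choice(["Mon", "Tue", "Wed", "Thu", "Fri"]),
        )
        # Synthetic ground truth: crew_3 and Tuesdays raise the failure odds.
        p_fail = 0.05
        p_fail += 0.5 if feats[0] == "crew_3" else 0.0
        p_fail += 0.3 if feats[3] == "Tue" else 0.0
        rows.append((feats, 1 if rng.random() < p_fail else 0))
    return rows

def build_index(rows):
    # One-hot vocabulary: one model weight per (column, value) pair.
    keys = sorted({(i, v) for feats, _ in rows for i, v in enumerate(feats)})
    return {key: j for j, key in enumerate(keys)}

def active(feats, index):
    # Positions of the one-hot columns that equal 1 for this row.
    return [index[(i, v)] for i, v in enumerate(feats) if (i, v) in index]

def predict(feats, index, w, b):
    # Predicted probability that the transaction fails.
    z = b + sum(w[j] for j in active(feats, index))
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, index, lr=0.5, epochs=300):
    # Plain batch gradient descent on the logistic (log-loss) objective.
    w, b, n = [0.0] * len(index), 0.0, len(rows)
    encoded = [(active(f, index), y) for f, y in rows]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for cols, y in encoded:
            p = 1.0 / (1.0 + math.exp(-(b + sum(w[j] for j in cols))))
            grad_b += p - y
            for j in cols:
                grad_w[j] += p - y
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

rows = make_records()
train, test = rows[:800], rows[800:]
index = build_index(train)
w, b = fit(train, index)

# Share of held-out transactions whose Y/N outcome is predicted correctly.
accuracy = sum((predict(f, index, w, b) >= 0.5) == bool(y) for f, y in test) / len(test)
p_bad = predict(("crew_3", "north", "express", "Tue"), index, w, b)
p_good = predict(("crew_1", "north", "express", "Mon"), index, w, b)
print(f"holdout accuracy={accuracy:.2f}  "
      f"P(fail | crew_3, Tue)={p_bad:.2f}  P(fail | crew_1, Mon)={p_good:.2f}")
```

The held-out 200 rows play the role of the "solid success rate" check: if the model cannot predict failures on data it has not seen, the chosen parameters probably do not explain the problem.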

It's not too difficult to execute the plan above. Your technical team can follow the links below to learn more from Microsoft, and my other blog post provides more detail.

Guide and link:

  • Microsoft cheat sheet to pick the best two-class prediction model: https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
  • AI/ML steps to run the prediction project: https://bigideas688018338.wordpress.com/2020/05/23/making-smart-decisions-jump-start-using-azure-machine-learning/

TARGET of DATA SCIENCE PHASE: Once a prediction model with good accuracy is established with the right parameters, the business leaders should identify and filter the parameters which have the maximum impact on the result. These are the possible root causes. An example is shown below.
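One simple way to shortlist the high-impact parameters can be sketched as follows. For each parameter it computes the failure rate of every value and ranks parameters by the spread between the worst and best value. This is a deliberately basic stand-in, not the feature-importance output of a trained model, and the field names and twelve transactions are invented for illustration.

```python
from collections import defaultdict

FIELDS = ("crew", "origin", "weekday", "failed")
# Hypothetical transactions; failed = 1 means the core metric was missed.
RECORDS = [dict(zip(FIELDS, row)) for row in [
    ("crew_1", "north", "Mon", 0), ("crew_1", "south", "Tue", 0),
    ("crew_1", "north", "Tue", 0), ("crew_1", "south", "Mon", 0),
    ("crew_2", "north", "Mon", 0), ("crew_2", "south", "Tue", 1),
    ("crew_2", "north", "Tue", 0), ("crew_2", "south", "Mon", 0),
    ("crew_3", "north", "Mon", 1), ("crew_3", "south", "Tue", 1),
    ("crew_3", "north", "Tue", 1), ("crew_3", "south", "Mon", 0),
]]

def failure_rates(records, param):
    # Failure rate for each distinct value of one parameter.
    totals, fails = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[param]] += 1
        fails[r[param]] += r["failed"]
    return {value: fails[value] / totals[value] for value in totals}

def rank_parameters(records, params):
    # Spread = highest minus lowest failure rate across a parameter's values.
    spreads = []
    for p in params:
        rates = failure_rates(records, p)
        spreads.append((p, max(rates.values()) - min(rates.values())))
    return sorted(spreads, key=lambda item: item[1], reverse=True)

ranked = rank_parameters(RECORDS, ["crew", "origin", "weekday"])
for name, spread in ranked:
    print(f"{name:<8} spread in failure rate = {spread:.2f}")
```

A parameter whose values all fail at about the same rate (origin in this toy data) is unlikely to be a root cause, while one with a large spread (crew here) deserves a closer look in the visualization phase.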

  • PHASE II – Data Visualization: This is the moment-of-truth phase, which shows how the result varies with the actual values of the parameters above. Follow the steps below.
    • Collect: Collect data for the specific maximum-impact parameters above together with the output metric.
    • Visualize: Connect the data to a data visualization tool such as Tableau and, using time (the day-of-the-week parameter), see how the variations occur, i.e. how many transactions actually fail for the parameters above.
    • Analyse: The sample visualization above shows 2 graphs depicting the number of transactions (Y axis) which fail for specific product/service categories on specific days (Tuesdays in this case) when the value of one of the Alternatives >= ‘a specific value’.
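Before building a dashboard, the same cut of the data can be sketched in a few lines of Python. The example counts failed transactions per weekday and category, keeping only rows where a numeric Alternative is at or above a threshold, and prints a crude text bar chart. The transactions, the threshold of 15 and the meaning of the Alternative value are all hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
THRESHOLD = 15  # hypothetical 'specific value' for the Alternative

# Hypothetical rows: (ISO date, category, alternative value, failed as 1/0)
TRANSACTIONS = [
    ("2020-05-04", "express", 12, 0),
    ("2020-05-04", "standard", 11, 1),  # failed, but below the threshold
    ("2020-05-05", "express", 18, 1),
    ("2020-05-05", "standard", 17, 1),
    ("2020-05-06", "express", 9, 0),
    ("2020-05-06", "standard", 16, 1),
    ("2020-05-12", "express", 19, 1),
    ("2020-05-12", "standard", 8, 0),
    ("2020-05-19", "express", 20, 1),
    ("2020-05-19", "standard", 15, 0),  # at the threshold, but succeeded
]

def failures_by_day_and_category(rows, threshold):
    # Count failures per (weekday, category) where the Alternative >= threshold.
    counts = Counter()
    for iso_day, category, alt_value, failed in rows:
        if failed and alt_value >= threshold:
            day = WEEKDAYS[date.fromisoformat(iso_day).weekday()]
            counts[(day, category)] += 1
    return counts

failures = failures_by_day_and_category(TRANSACTIONS, THRESHOLD)

# Text bar chart, ordered by weekday and then category.
ordered = sorted(failures.items(),
                 key=lambda kv: (WEEKDAYS.index(kv[0][0]), kv[0][1]))
for (day, category), n in ordered:
    print(f"{day} {category:<9} {'#' * n} {n}")
```

The same grouped counts are what a tool like Tableau would plot as bars per category and day, so this is a quick way to confirm the pattern exists before investing in the visualization.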

TARGET of DATA VISUALIZATION PHASE: Once the specific impact parameters and the associated values that cause the issues are visible (for example, why the impact parameter exceeds certain limits only on Tuesdays and causes the performance metric to fail), the root cause and the process fixes become evident, solving the mystery behind the process failures.
