The Fourth Way to Practice Data Science – Purpose Built Analytic Modules

Article originally posted on Data Science Central.

Summary: Purpose Built Analytic Modules (PBAMs) such as those for Fraud Detection represent a fourth way to practice data science, a new model for the good use of Citizen Data Scientists, and a new market for AI-first companies.

It appears that data science has exited its age of exploration and entered its age of consolidation and refinement. That doesn’t mean we aren’t making improvements, but increasingly these tend to be incremental and not the big, exciting breakthroughs we made through about 2016. More like a mission on the International Space Station and less like those original walks on the moon.

Nothing wrong with that. In fact, in our maturity we’re doing two things at once: becoming more specialized, and at the same time setting a larger place at the table for new members, reflecting the team sport that data science has become. No more unicorns. And at the same time we’re achieving what we wanted all along: much wider and deeper adoption of advanced analytics by companies of all sizes and types.

There seem to be four distinct schools emerging for how to practice data science. No, they’re not mutually exclusive. There’s plenty of crossover. But see if you don’t recognize these as four fairly distinct tribes of practitioners.

Write the code: Do it from the ground up in Python, R, Scala or whatever you like. Make it just the way you like it.
Drag-and-Drop: Plenty of platforms offer the efficiency and simplicity of drag-and-drop environments (e.g. SAS, SPSS, Alteryx). For many use cases they reach the same level of accuracy much more quickly, with fewer resources and greater standardization.
Automated Machine Learning (AML): Skip the drag-and-drop altogether and let ML tune and select the champion models (e.g. DataRobot, Tazi). Some now even include data cleaning and feature engineering.

And now a fourth category that’s existed in plain sight for some time.

Purpose Built Analytic Modules (PBAMs): Highly tuned special purpose modules such as those for fraud detection. These are practically plug-and-play in the industries and applications for which they’re targeted. And they allow Citizen Data Scientists (aka business analysts and some LOB managers) to operate advanced ML without the need to extensively configure the underlying DS techniques.

Fraud Detection as the Paradigm

PBAMs for fraud detection have been in the market for several years but it was the most recent Forrester report on Enterprise Fraud Management applications that caused me to stop and reflect on this.

As we all recognize, you can build your own fraud detection program from any number of platforms and techniques, including coding it up from scratch. Graph databases, various anomaly detection routines, good old-fashioned supervised models, and now even deep learning techniques can all be harnessed for this.

If you identify as a data scientist you may scoff a little at this. Pushing the buttons on a highly simplified UI and reading preformatted reports and dashboards may not seem much like the data science we’re used to. I get it. But frankly we’re going to see more and more of this.

If you’re old enough to remember business in the 90s, this is the ERPification of the reengineering movement that was originally built entirely on custom code. Sorry for the made-up word, but the continued packaging of DS into simplified analytic modules is a trend we will not be able to resist.

Market Drivers of Enterprise Fraud Management

This is a product that appeals to banks and financial services companies that provide credit, especially consumer credit. If you issue a credit card, or for that matter if you accept credit cards, the whole profitability of your business can turn on these losses.

In financial services the benchmark is 3 to 6 basis points for card fraud and 10 basis points for non-card fraud. Those are extremely small percentages, and to stay within them you want to make sure you’ve got absolutely the best detection system there is.
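
To put those benchmarks in dollar terms, here is a minimal Python sketch. The $1 billion annual volume is a hypothetical figure chosen only for illustration; the basis-point rates are the ones quoted above, and the function name is my own.

```python
def expected_loss(volume: float, basis_points: float) -> float:
    """Loss implied by a fraud rate in basis points (1 bp = 1/10,000)."""
    return volume * basis_points / 10_000


# Hypothetical annual transaction volume, for illustration only.
annual_volume = 1_000_000_000  # $1 billion

card_low = expected_loss(annual_volume, 3)    # 3 bp card fraud benchmark
card_high = expected_loss(annual_volume, 6)   # 6 bp card fraud benchmark
non_card = expected_loss(annual_volume, 10)   # 10 bp non-card benchmark

print(f"Card fraud: ${card_low:,.0f} to ${card_high:,.0f}")
# Card fraud: $300,000 to $600,000
print(f"Non-card fraud: ${non_card:,.0f}")
# Non-card fraud: $1,000,000
```

Even at benchmark levels, the absolute losses scale directly with volume, which is why a fraction of a basis point matters to a large issuer.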

Those basis points of actual loss are only the tip of the iceberg. There are very significant costs associated with maintaining a staff of fraud investigators who review and approve pretty much every item flagged by the system. High false positive rates balloon these costs and also chase away your good customers when their legitimate transactions are falsely flagged as fraud.

In other words, you want world-class data scientists to build these detection systems for you and keep them updated. The question is, are your internal data scientists up to that level, and is do-it-yourself really the way to go?

PBAMs Require Highly Standardized Processes Across Similar Businesses

The good news is that the business processes of banks and non-bank card lenders and acceptors are extremely similar, allowing for significant standardization of the module.

Yes, the data inputs are going to be somewhat different, and in fact users of these PBAM fraud modules report that implementation adds 20% to 35% to the total project cost. But hey, if you’ve ever implemented an ERP, that ratio is more like 100%, so these numbers look pretty attractive.

The standardization is critical since these modules have to be extremely easy to use. That’s because the users, Security & Risk (S&R) professionals, are not data scientists. But they are the very definition of the Citizen Data Scientist.

Forrester and Enterprise Fraud Management (EFM)

Here’s a little of what Forrester had to say.

Old rules-based detection is dead and gone. ML is where it’s at.

Although the S&R professionals who are the primary users are not data scientists, they are quite tuned into the techniques used to identify potential fraud. They need the ability to further tune the sensitivity of the system, which means understanding, if not building, the underlying data science.

This means EFM systems can’t be just black boxes. They have to provide for some user adjustment. They have to have sufficient transparency to explain themselves and also to report trends. Many of these institutions are subject to regulatory audits in which they have to not only ‘show their work’ but also show that they’re getting better over time.

Here’s the Forrester Wave chart for Enterprise Fraud Management Q3 2018.

What This Tells Us About the PBAM Market Segment

Several things are interesting about this result. First of all, only seven vendors met the minimum requirements for inclusion. They range from specialty analytics companies like NICE Actimize, a smaller company focused on fighting financial crime, to the bigs like SAS and IBM, which promote themselves as one-stop shops for everything data science.

Second, it’s interesting that the competitors are both small specialty companies and just a few of the large providers of comprehensive analytic platforms. I would have expected to see more of the large competitors, and especially the drag-and-drops like Alteryx or the AMLs like DataRobot enter here. And where are Amazon, Microsoft, and Google that should be able to bring significant AI resources to bear on this?

I think what this is telling us is that the movement to PBAMs is fairly new. I would have expected a much deeper bench of competitors, both large and small. I see this as an emerging trend and an opportunity for AI-first startups to establish a competitive position.

Isn’t This Just AI?

Using the broader consumer definition, AI is any predictive analytic function that has been fully automated to provide a valuable action that replaces or augments a human action. Some of these do indeed involve our new deep learning capabilities of image and speech classification, but many can be created from classic predictive analytics, now allowed to take an automatic action.

The easiest way to understand this is to see AI as a logical progression that started with predictive analytics (what is likely to happen), moved to prescriptive analytics (what should happen), and has now arrived at AI (the automation of the optimized prescriptive analytic).

The key to the difference between AI and PBAMs is the automation. AI as we understand it is fully automated. Our deep learning AI looks at a part on an assembly line, detects that it is defective, and removes it. Blue River, the company that has automated ‘see and spray’ optimization technology in agriculture, uses image classification to inspect a lettuce plant and instantly take action to fertilize it or kill it.

PBAMs, however, allow the user some control over the underlying analytics and provide output that is typically evaluated with human judgement, not fully automated. In fraud, that control allows for adjusting the sensitivity applied to certain cases. Although specific transactions may be automatically held up, almost all are reviewed by a human S&R professional before any action to reject is taken.
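
A rough Python sketch of what that kind of sensitivity control might look like. Everything here is hypothetical: the `Transaction` class, the `route` function, the scores, and the thresholds are my own illustration and are not taken from any vendor’s product. The point is only that the model scores each transaction, the user sets the cutoff, and anything over the cutoff goes to a human reviewer rather than being rejected automatically.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    fraud_score: float  # assumed model output in [0, 1]; higher = more suspicious


def route(transactions, threshold):
    """Split transactions into auto-approved and held-for-human-review lists."""
    approved, review_queue = [], []
    for tx in transactions:
        if tx.fraud_score >= threshold:
            review_queue.append(tx)  # held for an S&R reviewer, not rejected
        else:
            approved.append(tx)
    return approved, review_queue


# Made-up scores, for illustration only.
sample = [
    Transaction("t1", 0.05),
    Transaction("t2", 0.40),
    Transaction("t3", 0.72),
    Transaction("t4", 0.95),
]

# Lowering the threshold raises sensitivity: more items land in the review queue.
for threshold in (0.9, 0.5):
    _, review_queue = route(sample, threshold)
    print(threshold, [tx.tx_id for tx in review_queue])
# 0.9 ['t4']
# 0.5 ['t3', 't4']
```

The trade-off is the one described earlier: a lower threshold catches more potential fraud but sends more legitimate transactions to the investigators.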

Tools for the Citizen Data Scientist (CDS)

Ever since the phrase ‘Citizen Data Scientist’ entered our lexicon we’ve been gnashing our teeth over what it means to give non-data scientists access to data science tools. Originally it was almost unthinkable because of the experience and expertise required to prevent amateurs from making really serious mistakes with big financial consequences.

The analytic platform vendors have continued to embrace the CDS market. Their motive, however, is that there are far more CDSs than data scientists and therefore many more seats to sell. The drag-and-drop and AML markets have at least in part been driven by this, in addition to legitimate goals of efficiency and standardization.

It’s old news that there are not enough data scientists to go around. That’s one part of it. The other part, however, is that the cadres of business analysts have been looking for a legitimate role in the advanced analytics world, beyond their traditional BI domain.

Businesses themselves have been embracing this. Just this week I hosted a DataScienceCentral webinar in which the centerpiece was a case study of Shell Oil, which has gone out of its way to leverage the CDS capabilities of analysts and LOB managers based on a sophisticated structure of support from its data science COE. They are not alone.

Data scientists need to do what we do best. I’m thinking that PBAMs, purpose built analytic modules, are the foundation for expanding the role of citizen data scientists surrounding the rare and more expensive data science core.

Fraud is perhaps the most obvious specialty area. It’s likely, though, that other PBAM process targets will arise, providing companies with new and safe ways to leverage their citizen data scientists and giving both large and small AI-first vendors new markets to dominate.

Other articles on AI Strategy

From Strategy to Implementation – Planning an AI-First Company

Comparing the Four Major AI Strategies

Comparing AI Strategies – Systems of Intelligence

Comparing AI Strategies – Vertical versus Horizontal

What Makes a Successful AI Company – Data Dominance

AI Strategies – Incremental and Fundamental Improvements

Other articles by Bill Vorhies.

About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001. He can be reached at:

Bill@Data-Magnum.com or Bill@DataScienceCentral.com