Predictive Analytics: Models, Models Everywhere

The race is on to make data-driven function points as easy to access as traditional function points.

Machine learning and other statistical techniques for predictive analytics have grown up in a silo. That silo is both cultural (requiring a cognitive and semantic phase-shift) and technical (requiring different languages, platforms and runtime environments).

ML Coming to a Stack Near You

Tearing Down the Walls

These barriers are beginning to break down.

Tools like Azure Machine Learning Studio and Amazon Machine Learning are bringing this capability into the mainstream development community. That process will continue and will require an ongoing training effort.

But to date, the cost of bringing the results of those efforts into the software stack is high. To be sure, you can call an Azure web service or an Amazon API, but for many teams there are business and technical reasons this is a “no-go”.
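For concreteness, here is roughly what that web-service call looks like from application code. This is only a sketch: the endpoint URL, API key, and input schema are placeholders for whatever your published experiment actually defines.

    # Rough sketch: scoring one row against a published Azure ML Studio
    # request-response web service. The URL, key, and column names are
    # placeholders; the real values come from your own experiment.
    import json
    import urllib.request

    ENDPOINT = ("https://<region>.services.azureml.net/workspaces/"
                "<workspace-id>/services/<service-id>/execute?api-version=2.0")
    API_KEY = "<your-api-key>"

    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": ["age", "income"],
                "Values": [["34", "52000"]],
            }
        },
        "GlobalParameters": {},
    }

    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read()))  # scored labels / probabilities

Note what this implies: every single prediction is a network round trip to someone else’s cloud, which is exactly the coupling some teams cannot accept.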

Microsoft’s recent move to bring R models into SQL Server was a huge step. Stored procedures have a long and glorious tradition within the software stack – particularly in enterprise solutions.
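To see why that tradition matters, assume a hypothetical scoring procedure, dbo.PredictChurn, wrapping an R model via SQL Server’s sp_execute_external_script. From the application’s side, scoring becomes ordinary database code:

    # Sketch: calling a hypothetical stored procedure (dbo.PredictChurn)
    # that wraps an R model inside SQL Server. Connection details are
    # placeholders for your own environment.
    import pyodbc

    connection = pyodbc.connect(
        "DRIVER={ODBC Driver 13 for SQL Server};"
        "SERVER=<server>;DATABASE=<database>;Trusted_Connection=yes;"
    )
    cursor = connection.cursor()
    cursor.execute("EXEC dbo.PredictChurn @CustomerId = ?", 42)
    print(cursor.fetchone().ChurnProbability)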

But the move to SQL Server can only be seen as a first step. There are two big problems here:

  1. It supports R models only. Models created in Azure Machine Learning Studio cannot be ported into SQL Server. Certainly, you can build, train and evaluate your model in Azure ML Studio until you get a great model and then re-build it in R – but that is exactly the kind of extra step that hinders adoption.
  2. SQL Server is often not the database being used.

It’s Not Enough

So why can’t I have my model where I want it?

From a technical viewpoint, there is nothing about a trained model that restricts where it can run. (I may be wrong here, and would be happy to be enlightened.) The data is required to train the model, but not to execute it.

Stored procedures are great – but that is not actually what models are. Models do not require proximity to large data sets or a specialized execution engine.

It’s just a function!

What is the problem here? We know how to create portable software.
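To make that concrete: once a model is trained, scoring it is just arithmetic. Here is a toy logistic regression, with made-up coefficients standing in for what training would produce:

    # Toy illustration: a trained logistic regression reduced to a plain
    # function. The coefficients are invented stand-ins for trained values;
    # scoring needs no data set and no specialized execution engine.
    import math

    WEIGHTS = [0.8, -1.2]   # hypothetical trained coefficients
    BIAS = 0.3              # hypothetical trained intercept

    def predict(features):
        """Return the probability of the positive class."""
        z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
        return 1.0 / (1.0 + math.exp(-z))

    print(predict([2.0, 0.5]))  # runs anywhere Python runs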

Fun Times Ahead

IMHO, this is a short-term engineering issue and we will soon see model availability change rapidly.

Microsoft’s announcement of Azure IoT Edge last week at Build 2017 is, again, a stop-gap solution. An early look shows some crazy-cool functionality, but it is a niche solution that is quite heavy and has significant platform dependencies.

Models need to be portable in the same way that other functions are portable.

In the end, we will just have two ways of building functions (a sketch follows the list):

  1. We can write high-quality functions by deriving algorithms from the heads of experts (or becoming experts ourselves)
  2. We can derive models from data (likely with the help of experts) and build functions to interrogate the models
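Here is the sketch, with the rule thresholds and model weights invented for illustration. To the caller, the two are interchangeable:

    # Sketch: two ways of arriving at the "same" function.
    # Thresholds and weights are invented for illustration.

    def approve_loan_expert(income, debt):
        """Rule derived from the head of a domain expert."""
        return income > 50_000 and debt / income < 0.4

    def approve_loan_model(income, debt, weights=(0.00004, -3.5), bias=-1.0):
        """Rule derived from data: a model interrogated at runtime."""
        score = bias + weights[0] * income + weights[1] * (debt / income)
        return score > 0.0

    # Same signature, same role in the stack; only the provenance differs.
    print(approve_loan_expert(60_000, 12_000), approve_loan_model(60_000, 12_000))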

Models Everywhere

Once we can do that on any platform, within any stack, and in any layer, we will be in a position to realize the vision of “intelligent apps” that bring predictive analytics directly to the point of decision.

Models, models everywhere… I think we’re on the brink.

The Analytics Layer

Application Architects love layers, and I think it is time for a new one – the analytics layer.

“You know what else everybody likes? Parfaits! Everybody likes a parfait.” – Donkey

Data analytics have become too compelling to ignore any longer – even at the application level, and not just within specialized applications but within the standard application software stack. Architects have always been about taming complexity, and the techniques coming out of the analytics world bring powerful tools to the table. We’ve always had a data layer. It’s time for the analytics layer.
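As a sketch of the distinction (the class and method names are invented): the data layer answers “what are the options?”, while the analytics layer answers “which option is the user most likely to want?”

    # Sketch of the layering idea; names and data are invented.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Product:
        sku: str
        price: float
        avg_rating: float

    class DataLayer:
        """Answers: what are the options?"""
        def products_in_category(self, category: str) -> List[Product]:
            # Stand-in for a real storage query.
            return [Product("A-1", 19.99, 4.1), Product("A-2", 24.99, 4.7)]

    class AnalyticsLayer:
        """Answers: which option is the user most likely to want?"""
        def __init__(self, data: DataLayer):
            self.data = data

        def ranked_products(self, category: str, user_id: int) -> List[Product]:
            products = self.data.products_in_category(category)
            # Anything from a simple sort to a trained model can rank here.
            return sorted(products, key=lambda p: p.avg_rating, reverse=True)

    layer = AnalyticsLayer(DataLayer())
    print([p.sku for p in layer.ranked_products("audio", user_id=42)])  # ['A-2', 'A-1']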

Early trends in data warehousing and BI had the unfortunate side effect of creating silos within organizations and within software applications. Analytics was a separate entity. Star schemas, denormalized data and batch processing took significant processing power and were therefore expected to be “off-loaded” from the standard operational system. Dashboards and pivot tables were bolted on as a way to see into that state. And though over time the delay was minimized to the point of “near-real-time”, the silos remain. Today, analytics is not a layer; it’s a module.

But the true power of analytics lies in their ability to provide effective simplicity. Google became one of the most profitable companies in the world not by creating a simple search (other companies had the idea of a single text-entry box) but by making that simple search nearly always find what the user was looking for, on the strength of the most effective ranking algorithm. Google didn’t just build a data layer; they built a high-powered analytics layer and stacked their software on top of it.

An analytics layer holds the promise of delivering intelligence directly to the point of decision within the application. Up to now we have offered users choices but expected them to bring the intelligence to the table. We need to go beyond explaining the options available to bringing related real-time information and statistics to bear on the decisions being made.

Consider two simple examples: Amazon and Stack Overflow. Both of these websites changed the way decisions are made by bringing highly reliable information to the point of decision. The approach is different – simple ratings vs. stack ranking – but each generated incredible gravity for its site because of the power of aggregated, targeted, reliable information delivered right at the point of decision. Both applications rely on a powerful analytics layer.

Interestingly, the analytics delivered in these examples did not require advanced algorithms, neural networks or machine learning – but rather the answer to one question: “What information would give the user the best chance to make the right decision?”

Sometimes the answer to that question is extremely simple – a basic comparison, information about what other users decided, etc. In other cases, more sophisticated algorithms and statistical methods are required. But in all cases, the question needs to be asked. Having an Analytics Layer forces the question to be asked. When designing the Data Layer, the Data Architect asks what options need to be provided to the user. When designing the Analytics Layer, the Analytics Architect asks what intelligence gives the best chance for a successful choice. And when designing the UI Layer, the UX Architect asks how best to bring those two things together.
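At the simple end of that spectrum, the entire answer can be one aggregate over what other users decided. The decision log below is invented:

    # Sketch: sometimes the intelligence is a single aggregate over
    # what other users decided. The log below stands in for real data.
    from collections import Counter

    decision_log = ["plan_a", "plan_b", "plan_a", "plan_a", "plan_c"]

    def most_chosen(decisions):
        """Return the most popular choice and the share of users who made it."""
        choice, count = Counter(decisions).most_common(1)[0]
        return choice, count / len(decisions)

    choice, share = most_chosen(decision_log)
    print(f"{share:.0%} of users chose {choice}")  # 60% of users chose plan_a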

Vendors and platforms are touting data analytics products and tools, but until application architects begin to think in terms of an analytics layer the true potential of these techniques will remain largely untapped.