Oh man! Is my machine learning or am I learning from my machine?


When I decided to major in Statistics, my dad was concerned: he worried I wouldn’t be able to find a job other than building mortality tables for the town’s demographic services.

When I finished my master’s degree 10 years ago and told prospective employers that I liked working with data and numbers to find patterns, I would hear that “business is not about finding patterns”. That was the job of a few data analysts in the company, and the way they said it made it sound as if those people had been locked in a room and the key thrown away.

Today, when I tell our customers, or people I meet at events, that I have a degree in Statistics and have actually played with Neural Networks and open source tools, their eyes light up and they get all excited.

You know what? Throughout all this time, SAS was actually doing machine learning. In fact, SAS has been doing Machine Learning for about 41 years now.

Today, however, Machine Learning is shifting from an algorithm-based concept (decision trees, neural nets, random forests,…) to a process-based concept (getting actual decisions out of implemented machine learning), and this shift is driving the notion of artificial intelligence. In summary, we can identify three key notions for implementing a successful “analytics 4.0” strategy in a company:

Figure 1 – key pillars

Let’s think of a practical example: the approval of a loan, which is normally based on a combination of analytics and business strategies. A small disclaimer: this post won’t cover regulatory and privacy issues, which remain key aspects to address when implementing ANY of the above!
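To make the combination of analytics and business strategy more concrete, here is a minimal Python sketch of how a loan decision might blend a model’s predicted probability of default (PD) with policy rules and risk-based pricing. Everything in it is hypothetical: the function `decide`, the `Decision` record, and the parameters `MAX_PD`, `BASE_RATE`, `RISK_SPREAD` and `MAX_LOAN_TO_INCOME` are illustrative assumptions. They do not represent SAS functionality or any real credit policy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical business parameters, for illustration only.
MAX_PD = 0.20              # risk appetite: decline if predicted probability of default > 20%
BASE_RATE = 0.05           # hypothetical base annual interest rate
RISK_SPREAD = 0.50         # risk-based pricing: add 0.5 x PD to the base rate
MAX_LOAN_TO_INCOME = 5.0   # hypothetical affordability rule


@dataclass
class Decision:
    approved: bool
    reason: str
    rate: Optional[float] = None  # only set for approved applications


def decide(age: int, amount: float, annual_income: float, prob_default: float) -> Decision:
    """Combine hard policy rules (business strategy) with a model score (analytics)."""
    # Policy rules come first: they apply regardless of what the model says.
    if age < 18:
        return Decision(False, "policy: applicant is under 18")
    if amount > MAX_LOAN_TO_INCOME * annual_income:
        return Decision(False, f"policy: amount exceeds {MAX_LOAN_TO_INCOME:g}x annual income")
    # The model score then drives both the approval cut-off and the price.
    if prob_default > MAX_PD:
        return Decision(False, f"model: PD {prob_default:.0%} is above the cut-off")
    return Decision(True, "approved", rate=round(BASE_RATE + RISK_SPREAD * prob_default, 4))
```

For example, `decide(30, 20_000, 50_000, 0.10)` approves the loan at a rate of about 0.10. The same request with a PD of 0.30 is declined by the model cut-off. Checking the policy rules before the model score is just one reasonable ordering: it keeps non-negotiable rules independent of any model retraining.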

The offline modeling works like this:

Figure 2 – Machine Learning logical flow

The above process happens offline. The learning-algorithms step can take advantage of supervised algorithms such as regressions, decision trees, random forests, gradient boosting, neural nets, and others that SAS provides. Once a model is selected, SAS allows you to generate an API of the overall process, including steps like variable transformation, missing-value imputation, …, and deploy it as an object that a front-end application can call in real time to score new incoming data. When model performance decays, or on a periodically defined schedule, the data scientists restart the process and select a new model. Is the machine learning? Oh yes, it is, because of the way the learning algorithms work: automatically and iteratively.
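The key idea here is that the deployed object bundles the preprocessing steps together with the selected model. The minimal, standard-library Python sketch below illustrates that flow. Several parts are assumptions:

- The two candidate learners (a nearest-centroid classifier and a one-split decision stump) are simple stand-ins for the regressions, trees, forests and boosting models mentioned above.
- Model selection uses accuracy on a validation set.
- All names are hypothetical. This is not the SAS API.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Tuple

Row = List[Optional[float]]   # raw features for one applicant; None marks a missing value
Vector = List[float]
Model = Callable[[Vector], int]


def fit_preprocessing(rows: List[Row]) -> Callable[[Row], Vector]:
    """Learn mean imputation and standardization from the training rows only."""
    means, stds = [], []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(mean(observed))
        stds.append(pstdev(observed) or 1.0)  # guard against constant columns

    def transform(row: Row) -> Vector:
        # Impute missing values with the training mean, then standardize.
        return [((m if x is None else x) - m) / s for x, m, s in zip(row, means, stds)]

    return transform


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Candidate 1: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def train_stump(X: List[Vector], y: List[int]) -> Model:
    """Candidate 2: a one-split decision tree (decision stump) for 0/1 labels."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for high in (0, 1):
                hits = sum((high if x[j] > thr else 1 - high) == t for x, t in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, j, thr, high)
    _, j, thr, high = best
    return lambda x: high if x[j] > thr else 1 - high


def accuracy(model: Model, X: List[Vector], y: List[int]) -> float:
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def build_scoring_service(train_rows: List[Row], train_y: List[int],
                          valid_rows: List[Row], valid_y: List[int]) -> Tuple[str, Callable[[Row], int]]:
    transform = fit_preprocessing(train_rows)
    X_tr = [transform(r) for r in train_rows]
    X_va = [transform(r) for r in valid_rows]
    candidates: Dict[str, Model] = {
        "nearest_centroid": train_nearest_centroid(X_tr, train_y),
        "stump": train_stump(X_tr, train_y),
    }
    champion = max(candidates, key=lambda name: accuracy(candidates[name], X_va, valid_y))
    model = candidates[champion]

    def score(raw_row: Row) -> int:
        # The deployable object: raw input in, decision out, using the same
        # imputation and scaling that were learned at training time.
        return model(transform(raw_row))

    return champion, score
```

As a usage example with synthetic data, the features are income in thousands and debt ratio, and the label is 1 for default. Calling `build_scoring_service([[30, 0.6], [35, None], [80, 0.2], [90, 0.1], [40, 0.7], [None, 0.15]], [1, 1, 0, 0, 1, 0], [[32, 0.65], [85, None]], [1, 0])` returns a champion name and a `score` function. That function accepts raw rows, including rows with missing values.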

But is this artificial intelligence? Not yet. Let’s try to bring some in:

Figure 3 – Automatic Retraining

In the above scenario, the model-selection process is automated: it repeats the initial training process, incorporating the new data that became available within a given time window. As an outcome of this process, a different algorithm might be selected, possibly with different variables than the first time. This scenario suits cases where the outcome of the label we’re investigating “makes itself known” only after a certain period (e.g. I never know right away that my customer will default!). This cannot be “continuous learning”, because we must wait until the label value actually becomes available before we update the models. However, we can make retraining as frequent and automatic as possible, so that the data scientist only has to make sure the accuracy holds.
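The scheduled retraining described here has two essential parts. First, it trains only on loans whose outcome period has fully elapsed. Second, it promotes a newly trained challenger only if it beats the current champion. The Python sketch below shows both parts under several assumptions:

- A default is taken to become observable within 12 months (`OUTCOME_DELAY`).
- Training uses a two-year window (`TRAINING_WINDOW`).
- The "model" is a toy debt-ratio threshold.
- All names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical settings: defaults become observable within 12 months,
# and retraining uses the last two years of matured loans.
OUTCOME_DELAY = timedelta(days=365)
TRAINING_WINDOW = timedelta(days=730)


@dataclass
class Application:
    applied_on: date
    debt_ratio: float
    defaulted: Optional[bool]  # None until the outcome "makes itself known"


def matured(apps: List[Application], today: date) -> List[Application]:
    """Labelled loans whose outcome period has fully elapsed, inside the window.

    Recent loans are excluded even if they already defaulted, so that early
    defaults do not bias the training sample."""
    newest = today - OUTCOME_DELAY
    oldest = newest - TRAINING_WINDOW
    return [a for a in apps if a.defaulted is not None and oldest <= a.applied_on <= newest]


def accuracy(threshold: float, apps: List[Application]) -> float:
    # Toy model: predict default when debt_ratio exceeds the threshold.
    return sum((a.debt_ratio > threshold) == a.defaulted for a in apps) / len(apps)


def fit_threshold(apps: List[Application]) -> float:
    """Stand-in learner: the debt-ratio cut-off with the best training accuracy."""
    return max(sorted({a.debt_ratio for a in apps}), key=lambda t: accuracy(t, apps))


def scheduled_retrain(champion: float, apps: List[Application], today: date,
                      holdout_fraction: float = 0.3) -> float:
    """One automated retraining run: train a challenger, keep the better model."""
    data = sorted(matured(apps, today), key=lambda a: a.applied_on)
    n_holdout = max(1, int(len(data) * holdout_fraction))
    train, holdout = data[:len(data) - n_holdout], data[len(data) - n_holdout:]
    if not train or not holdout:
        return champion  # not enough labelled data yet: keep the current model
    challenger = fit_threshold(train)
    # Promote the challenger only if it beats the champion on the most recent matured loans.
    return challenger if accuracy(challenger, holdout) > accuracy(champion, holdout) else champion
```

The holdout is made of the most recent matured loans, which loosely mimics how the model will be used on future applicants. A scheduler (not shown) would call `scheduled_retrain` periodically. The data scientist would then only review the accuracy figures.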

Continuous learning requires that the information we want the machine to learn from is available immediately, and that we have a window of data to learn from. The notion of a window calls for a different type of real-time execution. Instead of a triggered, one-in-one-out observation, it needs a streaming, window-listening scenario, where the engine waits and listens for events as they arrive. Obviously, the two approaches are not mutually exclusive; they complement each other.
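The difference between the two execution styles can be shown in a few lines of Python. `score_single` is a triggered, one-in-one-out call that looks only at the current event. `SlidingWindow` is a minimal window-listening engine: it keeps the events from the last `width` seconds and updates an aggregate every time a new event arrives. The event fields, the 1000 threshold and all names are hypothetical. The sketch also assumes events arrive in timestamp order and that `width` is positive.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass
class Event:
    timestamp: float  # seconds; events are assumed to arrive in time order
    amount: float


def score_single(event: Event) -> bool:
    """Triggered, one-in-one-out scoring: the answer depends on this event only."""
    return event.amount > 1000.0  # hypothetical rule


class SlidingWindow:
    """Keeps the events in the interval (now - width, now] and updates
    the window's count and mean amount on every arrival."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.events: Deque[Event] = deque()
        self.total = 0.0

    def on_event(self, event: Event) -> Tuple[int, float]:
        self.events.append(event)
        self.total += event.amount
        # Evict events that have slid out of the window.
        while self.events and self.events[0].timestamp <= event.timestamp - self.width:
            self.total -= self.events.popleft().amount
        return len(self.events), self.total / len(self.events)
```

Maintaining a running total means each arrival costs amortized constant time, no matter how many events the window holds. A continuously learning model could be refreshed from the window contents in the same callback.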

One example of continuous learning is constant unsupervised learning, where the clusters that separate customer groups (or transactions) keep adjusting as transactions come in. A far more sophisticated example is relevant when we try to learn constantly from unstructured data of any kind, simulating how humans learn. Think of how we adjust a conversation to our audience when we talk to someone. In the loan-approval scenario, we can imagine an artificial agent talking with the customer on the phone about their loan application. The agent would adjust its decisions on approval, price, loan amount, etc. based on models, rules, and continuous learning from the ongoing dialogue. Graphically:
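The first example, clusters that keep adjusting as transactions arrive, corresponds to online (sequential, MacQueen-style) k-means. Each incoming point is assigned to its nearest centroid, and that centroid moves a step toward the point. The Python sketch below assumes numeric feature vectors and hand-picked seed centroids, and it uses `math.dist` (Python 3.8+). The class name and seeds are hypothetical.

```python
from math import dist
from typing import List, Sequence


class OnlineKMeans:
    """Sequential k-means: centroids are updated one observation at a time."""

    def __init__(self, initial_centroids: Sequence[Sequence[float]]) -> None:
        self.centroids: List[List[float]] = [list(c) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)  # treat each seed as one observation

    def update(self, point: Sequence[float]) -> int:
        """Assign the point to its nearest cluster and move that centroid toward it."""
        k = min(range(len(self.centroids)), key=lambda i: dist(point, self.centroids[i]))
        self.counts[k] += 1
        eta = 1.0 / self.counts[k]  # step size 1/n keeps the centroid equal to a running mean
        self.centroids[k] = [c + eta * (p - c) for c, p in zip(self.centroids[k], point)]
        return k
```

With the step size 1/n, each centroid is the running mean of the points assigned to it. Replacing it with a small constant step would make the clusters gradually forget older transactions, which is one way to track customer behavior that drifts over time.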

Figure 4 – Continuous Learning

Does this mean we don’t need data scientists anymore? Not at all. We need them more than ever, as they’re the ones who can set up this scenario and ensure that the process is efficient and sound. In addition, as we mentioned earlier, not all data and not all analyses are suitable for continuous learning.

Does this look fun? That’s because it is! SAS can help your organization embark on the journey of powering your analytics economy. If you’d like to hear more, you can contact me directly via email or social networks, or visit us at www.sas.com and at dedicated pages such as communities.sas.com, blog.sas.com, http://developer.sas.com and support.sas.com. And play nice if you want your loan to be approved!

About Cristina Conti

Cristina has been passionate about Analytics ever since her first-grade teacher talked about odds and probability when tossing a coin! Since then, she has focused on building strong analytics knowledge and experience aimed at solving everyday business problems. Much of her work revolves around modernizing and automating the modeling lifecycle in all its depth, including the use of Big Data and the introduction of new modeling techniques. Her experience spans different industries: she started out as a biostatistician and then moved into the financial sector, particularly banking risk and marketing departments.