Turning data into gold: that is the ultimate promise of Analytics and Data Science. It should be obvious, then, that you’ll need a Data Alchemist. Now that would be a fancy title, at least for the job ads, but the important thing is to understand the skills behind it and, moreover, the characteristics of the “gold”.
Øyvind W. Remme
Partner, NextBridge Advisory
It’s not only about using analytics and data science as decision support or to improve business processes; the real gold lies in developing (more) intelligent products, services and customer interactions. One of the thought-leaders in the field, Tom Davenport, defines this new era of embedding data smartness into customer offerings as “Analytics 3.0”.
Data Product – The Gold
In general, we are talking about developing data-driven products, or just “Data Products”. If we hang on to our analogy about turning data into gold, then data products would be like automated alchemical processes. A data product takes different types of data as input, applies analytical algorithms to them (the formula) and produces an intelligent result. Now, there are debates about the proper definition of the term. Some definitions are so wide that they include basically any type of data processing, but I prefer a definition offered by DJ Patil, named the first U.S. Chief Data Scientist by the White House, who says “a data product is a product that facilitates an end goal through the use of data.” Moreover, in my opinion, great data products move beyond “helping management make decisions” to actually automating decisions and taking actions. Like it or not, everything that can be automated will be automated. That includes decision-making. Join the ride or die.
You are probably using several data products already without thinking of them as such, e.g. data-driven apps on your smartphone. Google Translate is a data product that was built on statistical machine translation. Actually, just a few months ago (November 2016) Google Translate was switched to use Google Neural Machine Translation (GNMT), significantly improving the translation quality. Some exciting data products that will affect most of us in the future are self-driving cars, buses and other types of transportation. These are highly data-driven products that use Artificial Intelligence (AI) to navigate and drive among regular traffic.
Analytics tools are not data products. They are only tools. You’ll need humans to utilize them and apply the logic, at least for now and in the near future. Of course, if you look far enough into the future, AI will probably also be able to replace (many) human coders, but we are not there yet.
The Age of AI
Tom Davenport and Julia Kirby take a rather positive view of our adaptation to the future of automation, e.g. in their book “Only Humans Need Apply: Winners and Losers in the Age of Smart Machines”, where they argue that humans and machines together will be able to achieve more favorable outcomes than either could have done alone. They also provide different types of career strategies to avoid becoming obsolete. The most interesting strategy, in my opinion (as a data nerd), is “stepping forward”, which means being involved in developing the next generation of AI. Who knows, maybe data scientists will be the last bastion of human labor? Someone along the management career path would maybe go for the “stepping up” strategy, meaning to head for “more big-picture thinking and a higher level of abstraction than computers are”. (You can also read about it in their Harvard Business Review article, Beyond Automation.)
If you are the “stepping forward” type, a data scientist is what you should become. If you are the “stepping up” type, on the other hand, you should definitely invest in data scientists, because they are the ones to realize your visions.
The Data Alchemist
Unfortunately, good data scientists are hard to come by, and you will need some very skilled people to develop the data products your company needs to be properly intelligent. Moreover, many different qualities are needed, which reduces the list of candidates drastically. The most logical way to handle this is to put together a team whose members complement each other, but in my experience, in order to achieve real excellence, you also need this one particular person. The one I call the “Data Alchemist”.
If I were to describe the data alchemist in one sentence, it would be this: “The data alchemist is an innovator of valuable data products.” In the following paragraphs, I will describe the key skills of the data alchemist. These capabilities are of course the same whether you are looking for “the one” or planning to build a data science team.
First, let’s be done with the obvious: she has to be clever in math and statistics, because that’s the core of “the magic” (the secret ingredients). It’s not as difficult as “non-quants” seem to believe; you just have to learn it, like any other “hard skill”.
The data alchemist is a nerd (and I mean that as a positive and admirable quality), but still great with people. No matter how much we love Sheldon Cooper on the screen, he would be disastrous in our team. One of the most important tasks in all development projects is to interact with people, to elicit requirements and needs and, most of all, to be able to interpret and “read between the lines”. Another side to this is often referred to as “storytelling skills”, which means being able to communicate ideas and concepts in a way that engages people. These are “soft skills” and much more difficult to learn, especially for hard-core quants, but they will also be harder to replace by AI than “hard skills” (something for nerds to reflect upon).
A funny anecdote in this context is that IBM has developed a data product, based on the Watson Cognitive Computing Platform, called “Teacher Advisor” to help math teachers develop personalized lesson plans. Teacher Advisor analyzes student data and tailors instructional material for students based on their individual skill levels.
Another issue with many analysts and data scientists is actually connected to their love for “solving puzzles”. They typically dive too quickly into the problem solving, and spend too little time on the bigger picture. This may result in solutions that require too much effort to build and maintain, take too long to run or require too expensive hardware. As DJ Patil puts it in his report, Data Jujitsu: The Art of Turning Data into Product: “Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small.”
The data alchemist must have in-depth knowledge of the business area and industry. How can she build and innovate for your business otherwise? It’s OK to have a strong academic background, but she must not be stuck in the theoretical world. To quote a previous colleague of mine, while I was part of a team developing software for a large grocery chain: “How can you design a good warehouse inventory system when you’ve never set foot in a warehouse?” You’ll have to drag your butt out of the lab!
One interesting capability often mentioned as something that separates data scientists from (regular) analysts is usually referred to as “hacker skills”. Obviously, it doesn’t mean to be capable of breaking into other networks or computers. It’s just another term for being highly skilled in IT and programming. A data scientist must master the art of coding and be able to adapt to new programming languages when needed. In my opinion, that’s where the “science” part really applies. It’s also hard to imagine developing data products without those skills.
A data product can be divided into the following four modules or steps: 1) data collection, 2) data preparation, 3) the analytical algorithms and 4) the output (the action/visible part). Each of those parts requires coding, but often with different programming tools. A data science team must master all of those parts. I have seen several places where analysts or data scientists are cut off from several of those steps (willingly or unwillingly), especially steps 1 and 2, because it’s someone else’s job (e.g. IT or Data Warehouse department). This is not optimal. It’s a major bottleneck, and in my opinion a main reason for lack of innovation. In these cases, it’s important to distinguish between prototyping and development for production. It’s OK that IT takes responsibility for developing the data product to be put in production, but the data science team must take responsibility for the experimentation and prototyping. Hence, they must master all parts of data product development.
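To make the four steps concrete, here is a minimal prototype sketch in Python. Everything in it is a hypothetical illustration rather than a real system: the sales figures are hard-coded, the function names (collect, prepare, analyze, act) are made up for this example, and the “algorithm” is just an average of past sales. The point is only to show how one small team can own all four steps in a prototype, including an output that is an automated decision rather than a report.

```python
from statistics import mean


def collect():
    """Step 1, data collection. Hypothetical hard-coded rows stand in for a real source."""
    return [
        {"item": "milk", "units_sold": "12", "stock": "40"},
        {"item": "milk", "units_sold": "15", "stock": "25"},
        {"item": "bread", "units_sold": "", "stock": "30"},  # missing value
        {"item": "bread", "units_sold": "4", "stock": "26"},
    ]


def prepare(raw_rows):
    """Step 2, data preparation. Drop incomplete rows and convert text to numbers."""
    clean = []
    for row in raw_rows:
        if row["units_sold"] == "" or row["stock"] == "":
            continue
        clean.append({
            "item": row["item"],
            "units_sold": int(row["units_sold"]),
            "stock": int(row["stock"]),
        })
    return clean


def analyze(rows, horizon_days=3):
    """Step 3, the analytical algorithm. A naive forecast: mean daily sales times the horizon."""
    forecasts = {}
    for item in {r["item"] for r in rows}:
        item_rows = [r for r in rows if r["item"] == item]
        average_sales = mean(r["units_sold"] for r in item_rows)
        forecasts[item] = {
            "expected_demand": average_sales * horizon_days,
            "stock": item_rows[-1]["stock"],  # most recent stock level
        }
    return forecasts


def act(forecasts):
    """Step 4, the output. An automated decision per item instead of a report."""
    return {
        item: "reorder" if f["expected_demand"] > f["stock"] else "hold"
        for item, f in forecasts.items()
    }


def run_data_product():
    """Chain the four steps into one automated flow."""
    return act(analyze(prepare(collect())))


if __name__ == "__main__":
    print(run_data_product())
```

In a production version, IT might replace collect and prepare with proper data pipelines, but a prototype like this lets the data science team experiment with every step without waiting for anyone.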
Finally, a real scientist is always passionate about her job, but an alchemist equally loves the intimacy of the lab and the exposure to the rest of the organization. You’ll be lucky to find someone with all these qualities, and to keep her, so start grooming talent now and create a workplace where data scientists can thrive, evolve and stay with you.