
Rethinking Data Science Teams: Part 2

When it comes to data science teams, you really need generalists, who are better positioned to innovate and learn than narrow specialists.

It would seem that every business should be dabbling in data science. In part 1, we touched on the benefits that data science can bring to your business. Uptake and adoption have increased over the years, but still, only about half of companies are using data science.

We also pointed out that data science teams are often structured like an assembly line, when they should not be. It is tempting to let specialists each own their narrow slice of the work and have a project manager coordinate everything, but data science does not reward assembly line thinking.

For one thing, data science is not about execution; it is about learning. You need a team that is constantly improvising, learning, and adapting, because you do not have the luxury of knowing what results to expect, or even what the final product will be.

How specialization can hinder data science

In an assembly line, every worker is an expert and a specialist. That's not bad in itself. In fact, it leads to shorter production times, higher product quality, and greater efficiency. But when your product is still evolving, specialization becomes a liability. Here's why:

1. Higher coordination costs.

Businesses spend time and resources communicating, justifying, and discussing the work to be done. The work also has to be prioritized so the most important items get done first. And the more people involved, the higher the coordination costs.

When specialists work on your data science projects, they tend to be organized by function, and each project needs several of them. Every step, every change, and every handoff pulls in a different set of specialists. For instance, a modeling specialist who wants to experiment with something new must first coordinate with data engineers, then coordinate with yet another specialist before the new solution can be deployed.

In short, the cost of coordination needed to explore new things is too high. You are more likely to give up on exploring and experimenting with your data, and in the process you will have effectively killed learning.

2. You will face days, weeks, or months of wait time.

Coordination is painful in itself, but in a data science team built on specialists it also means long waits before the actual work can begin. Between scheduling meetings, holding discussions, and evaluating designs, days or even months can pass before anyone gets to start working.

You will be working with several specialists, and everyone is busy. Finding a slot that suits everyone on short notice can be hell. And getting that meeting is only the first step.

Because those specialists are also busy with other projects, your new project can stall at any point. Even if you get the design down pat and you're ready to build, the work may sit unfinished because the next specialist in line is tied up elsewhere.

They can only pick it up when they have time, and that can take weeks. Meanwhile, the learning stalls.

3. It limits what you can do. And learn.

Division of labor can buy you incremental gains in productivity and efficiency, but under that setup you only learn what you were hired to do. A research scientist will stick to research rather than experimenting with algorithms, changing conditions, and trying new approaches. You do not experiment because you're not paid to, and you are incentivized to stay within the scope of your job.

A data analyst will only look at the data instead of thinking about how to gather better data. Simple blockers become all too common: a project manager sees status updates such as "waiting on data engineer for resources" or "waiting on data to become available." You cannot do anything until the other person finishes their job, and you cannot learn anything outside your own job scope.

Say no to specialists

The simplest way to change things for your data science team is to say no to specialists. Instead, hire generalists: people who can take on diverse roles across the data science workflow. They should be able to do everything from conception and modeling right down to implementation and measurement.

This way, each person on your team can do what the others do. If they need to experiment with something, they no longer have to wait. Each member of your team can work and learn independently.

Your coordination costs also drop because fewer people own any single capability. There are no bottlenecks or waiting: a generalist can move across functions, add data to the pipeline, try out new features, deploy new versions, and go back and forth as new ideas come up.

In this scenario, your generalists admittedly will not have deep expertise in any single function of your data science projects. That is okay. You are optimizing for learning, not for incremental efficiency gains. In short, you do not need experts who are good at only one capability or function. You need a team that is neither afraid nor prevented from experimenting, trying things, failing, and learning.

* * *

When it comes to data science teams, you really do need generalists, who are better at innovating and learning. They do not have to wait on anyone to do their jobs and learn. What's more, because they are not stuck doing repetitive tasks, your team stays far more motivated. There is no dulling of talent, so to speak, which is a genuine threat for assembly line workers.

Photo courtesy of Boegh.
