When it comes to data science, nobody asks when AI will replace the data scientist. A common complaint is that demand for data scientists and AI engineers far outstrips supply. But have you ever considered that much of a data scientist’s work is highly repetitive, and that machine learning algorithms could eventually learn from data scientists what to do with the data?

So, is a data scientist’s work repetitive? It is, and these are the steps (which are often done iteratively):

  1. Data integration
  2. Data cleansing
  3. Feature engineering
  4. Model definition and training
  5. Model evaluation and hyperparameter tuning
  6. Model deployment
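The first few steps above can be sketched in a few lines with pandas and scikit-learn. The tiny tables, the churn label, and the derived feature below are made up purely for illustration:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Data integration: combine two toy sources on a shared key.
customers = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8],
                          "age": [25, 40, 31, 58, 22, 45, 37, 50]})
purchases = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8],
                          "spend": [120.0, 300.0, None, 80.0, 60.0, 400.0, 150.0, 220.0],
                          "churned": [0, 0, 1, 1, 0, 0, 1, 0]})
df = customers.merge(purchases, on="id")

# 2. Data cleansing: impute the missing spend value.
df["spend"] = df["spend"].fillna(df["spend"].median())

# 3. Feature engineering: add a derived feature.
df["spend_per_year"] = df["spend"] / df["age"]

# 4.-5. Model definition, training, and evaluation.
X = df[["age", "spend", "spend_per_year"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Every one of these steps follows a recognizable pattern, which is exactly what makes them candidates for automation.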

Especially in deep learning, though not only there, much of the work involves trial-and-error tweaking of the many knobs you can tune.
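That trial-and-error loop is precisely what automated hyperparameter search mechanizes. A minimal sketch with scikit-learn's GridSearchCV, using a synthetic dataset and a small, arbitrary grid of knobs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A synthetic stand-in for real training data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The "knobs": a small grid over two hyperparameters.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 5, None]}

# Cross-validated search tries every combination automatically.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print("best knobs:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

A machine patiently iterating over a grid (or a smarter search strategy) does in minutes what a human would do over days of manual tweaking.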

All of this can be done by a machine, sometimes more effectively than by a human. So, let’s start with black-box models. I can’t introduce all of the IBM Watson service offerings here, so I’ll give just one example: IBM Watson Visual Recognition. This is a REST service that you can train with your own images to make it recognize whatever you can imagine. No math or data science skills are required. And there is a Watson service for nearly every data science task.
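To give a feel for what calling such a service looks like, here is a sketch that builds (but does not send) a classify request. The endpoint URL, version string, and header name below are assumptions modeled on Watson Visual Recognition's v3 REST API, not verified values; check IBM's documentation for the real ones:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: substitute the URL from IBM's current docs.
BASE = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify"

def build_classify_request(api_key: str, image_url: str,
                           version: str = "2018-03-19") -> Request:
    """Build (but do not send) a GET request that classifies a remote image."""
    query = urlencode({"url": image_url, "version": version})
    # Header name is illustrative; real authentication may differ.
    return Request(f"{BASE}?{query}", headers={"X-API-Key": api_key}, method="GET")

req = build_classify_request("my-api-key", "https://example.com/cat.jpg")
print(req.get_method(), req.full_url)
```

The point is not the exact URL but the shape of the interaction: one HTTP call, no model code on your side at all.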

Another example is the IBM Model Asset Exchange (MAX), where the math and AI complexities are encapsulated in a Docker image that can be deployed on Kubernetes or on IBM Watson Machine Learning with a single click.

If you need to get your hands dirty with AI, IBM Deep Learning as a Service takes care of parallelized model training on GPUs and of hyperparameter tuning, so you can concentrate on your specific business task.

Last but not least, the IBM AI Fairness 360 toolkit addresses one of the most urgent needs of enterprise AI: model bias. The AI Fairness 360 open source toolkit not only automatically detects bias but also quantifies it, and it provides a set of algorithms to mitigate and remove that bias. I expect it to become part of every ML and AI DevOps pipeline soon, much as unit and integration testing are in traditional software development.
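To make “quantifying bias” concrete, here is the idea behind one of the simplest fairness metrics, disparate impact: the ratio of favorable-outcome rates between an unprivileged and a privileged group. This is a hand-rolled illustration of the metric on made-up data, not the AI Fairness 360 API itself:

```python
def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of favorable-outcome rates between two groups. Values far
    below 1.0 signal bias against the unprivileged group (a common
    rule of thumb flags anything below 0.8)."""
    def rate(g):
        selected = [o for o, grp in zip(outcomes, groups) if grp == g]
        return sum(selected) / len(selected)
    return rate(unprivileged) / rate(privileged)

# Toy hiring decisions: 1 = hired, 0 = rejected.
outcomes = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups   = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print("disparate impact:", disparate_impact(outcomes, groups, "A", "B"))  # prints 0.5
```

Automating a check like this in a pipeline is exactly the kind of guardrail the toolkit generalizes, alongside the mitigation algorithms that go beyond mere detection.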

So, in the future we will see the work of data scientists move to a higher level of abstraction, where the dirty work is automated and the data scientist can concentrate on key tasks: communicating results to business stakeholders, identifying opportunities for AI in business processes, and taking care of ethical and bias issues in the models.