Exploring and modeling COVID data
Join Damiaan Zwietering as he explores the world of COVID-19 data in data science and where the opportunities and pitfalls lie.
“If your prediction proves to be very good, then it’s probably too good to be true,” says IBM developer advocate and data scientist Damiaan Zwietering.
Damiaan loves his profession, which he has been practicing for almost 25 years, and by now he has come across most of the pitfalls. He likes to share his knowledge and experience with others, from developers to people in the business, so he has a prominent role at the Code @ Think digital event on June 12, 2020. To register for this event, click here.
In two sessions, he’ll introduce anyone who wants to know more about data science to the world of COVID-19 data and show where the opportunities and pitfalls lie.
Exploring COVID-19 data
In the first session, Damiaan will delve deeper into his experiences investigating raw data, in this case raw COVID-19 data from the ECDC, the European Centre for Disease Prevention and Control. He will not just present: participants can actively work with the live data sets he shows. Familiar topics will also come up, such as the graphs you see in the press. How difficult is it to reproduce them? What does it take to make good use of the raw data and turn it into something usable?
Damiaan: “The pitfalls are also an important part. What do you come across with such a data set? You see anomalies, and how do you treat them as an analyst and data scientist, and also as a developer? How do you unravel the underlying processes to learn how the data came about?” In the past, there was the idea that you could “release an AI on something” and that a solution would come out of it. He hopes that by now everyone is convinced that it’s not that simple. Think of your own assumptions, which can be reflected in the way you visualize data. “You always have to keep a very close eye on that.”
Then, you inevitably end up with the question of how the data was collected: how did the measurement come about? These are questions you should always keep in mind. “Suppose you see a clear deviation in certain sensor data. Is the sensor broken or is it really true?” Because of this, conclusions often turn out to be about the way of measuring rather than about the actual process. This applies strongly to COVID-19 data, precisely because there are no real precedents and ways of reporting can change from one day to the next. One example is the huge increase in cases in China after a change in the way of reporting.
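To make the "broken sensor or real signal?" question concrete, here is a minimal Python sketch of one way an analyst might flag suspicious days in a series of daily case counts. It is not taken from Damiaan's sessions. The function name flag_reporting_jumps, the 7-day window, the factor of 3, and the numbers themselves are all assumptions chosen for illustration; the counts are made up and are not real case figures.

```python
from statistics import median


def flag_reporting_jumps(daily_counts, window=7, factor=3.0):
    """Return the indices of days whose count exceeds `factor` times
    the median of the preceding `window` days.

    Hypothetical helper: the window and factor are arbitrary defaults.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        # Skip days with a zero baseline to avoid flagging everything.
        if baseline > 0 and daily_counts[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Synthetic, made-up daily counts with one sudden jump on day 7.
synthetic_counts = [100, 110, 105, 120, 115, 125, 130, 900, 135, 140]

if __name__ == "__main__":
    print(flag_reporting_jumps(synthetic_counts))
```

A flag like this only says that a day looks unusual. Whether it reflects a real outbreak or a change in how cases are counted still has to be checked against how the data was collected.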
Hands-on workshop: Modeling COVID data
After the data exploration, Damiaan will continue with a hands-on workshop. “This will be very practical,” he says. “We just dive into the code we’ve been working on for the first half hour. We don’t just look at the numbers, but see if we can make a model using them.”
Damiaan then expects to run into the limits of machine learning and AI. “There’s not an AI or machine learning off-the-shelf where you use your data and then know the number of cases next week. That doesn’t exist at all. You still have to build that yourself. It takes a lot of expertise and human intelligence to make that work.”
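As a rough illustration of those limits, the Python sketch below fits a naive exponential growth curve to a handful of case counts using a least-squares line through the logarithms. This is an assumed, simplified stand-in for a model, not the one built in the workshop. The function names fit_exponential and predict and the synthetic counts are hypothetical.

```python
import math


def fit_exponential(counts):
    """Least-squares fit of log(count) = a + b * day.

    Returns (a, b). Assumes every count is positive and that there
    are at least two data points.
    """
    days = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, day):
    """Extrapolate the fitted curve to the given day index."""
    return math.exp(a + b * day)


# Synthetic, made-up counts that double every day.
synthetic_counts = [10, 20, 40, 80, 160]
a, b = fit_exponential(synthetic_counts)

if __name__ == "__main__":
    print(round(predict(a, b, 5)))   # one day ahead
    print(round(predict(a, b, 30)))  # far beyond the fitted range
```

The fit matches the five synthetic points exactly, yet extrapolating it to day 30 gives more than ten billion cases, which exceeds the world population. A model that looks perfect on the data it was fitted to can still be useless as a forecast.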
Ethics in data science
The moment you use data, you touch on ethics. Damiaan likes to be involved in the discussion around this ever-sensitive subject. “The discussion is often about whether it is ethical to use algorithms to look at data and draw conclusions from it. We’re way past that step because we have been doing that for a long time. Now the question is, how do you look at data ethically with algorithms?”
He gives as an example automated analyses of fraud where you apply one rule, for example, that people under a certain age commit more fraud in a certain way. Technically, this is already a very simple algorithm. “That’s one if-then-rule, but that’s just one. Now, imagine a very complex set. Was it done in an ethical way, and is it applied in the same way?” he says.
According to Damiaan, things often go wrong at organizations that think they are doing well because they say they are careful not to discriminate, and so on, while the actual consequences for the people affected by the algorithmic decisions remain underexposed.
“In the case of COVID, we look at infections, such as taking a social measure in response to infection data in a country, based on infected citizens. It is often said far too easily that everything is neatly ticked off the list of points, while not looking at the consequences. But, of course, this is true everywhere. Think of the withdrawal of a benefit or when authorization for home improvement is automated.”
Damiaan will briefly touch on the ethics of data science in his sessions at Code @ Think and will host an interactive session on the topic later this year.