The Spotlight introduces a different Data Science Centre Affiliate Member every month. This month: Shubha Guha, Data Engineer and Data Steward at the Informatics Institute, Faculty of Science.

How do you apply data science to your projects?

'My role spans research in data management for data science, research software engineering, and data stewardship for the Informatics Institute. Every year, I also help teach Big Data, a mandatory course in the data science master’s track.'

Is there a project from this past year that you are most proud of? 

'I gave a conference talk on my paper “Automated Data Cleaning Can Hurt Fairness in Machine Learning-based Decision Making”. It shows that the standard methods data science practitioners use to clean and prepare data before machine learning (a large part of how data scientists spend their time) can have unintended consequences for the fairness of the trained ML model if they are not applied carefully.'

What do you like most about being a DSC member?

'At the Informatics Institute, there is so much innovation (and also hype) around AI and machine learning that it can be easy to lose sight of all the other meaningful and pioneering applications of data science. I value the occasional change of perspective from talking to DSC members in other disciplines, and it’s always satisfying to zoom out and get the more global perspective of inter- and transdisciplinary work.'

What is your favourite data science method?

'We see a lot of big, beefy deep learning models making the news and winning AI competitions, so I really love to see an elegant old-school solution, such as nearest-neighbor methods for recommending retail or supermarket products. It feels like I’m rooting for the underdogs, and it proves that sometimes smaller and simpler is better than expensive hardware and weeks of model training.'

Are you camp Python, R, or something else?

'I often use (and enjoy) Python, but I am also keeping a close eye on the growing popularity of Rust. It’s not the most approachable language for beginners, but I think there is a lot of potential for making more performant and precise systems.'

Contact Shubha Guha about her work

S. (Shubha) Guha MSc

Faculty of Science

Informatics Institute