Transcript
What are the questions to ask if you get a new project?
[00:00:00] Paolo: Welcome to the Effective Data Scientist Podcast. The podcast designed to help improve your skills, stay focused, manage successful projects, and have fun at work. Be an Effective Data Scientist now.
[00:00:34] Alexander: Welcome to another episode, this time together with Paolo. Hey Paolo, how you doing?
[00:00:40] Paolo: Very well, Alexander. How are you doing?
[00:00:43] Alexander: Very good. Summer has finally arrived here, and so I’ve opened the windows. So you may hear a little bit of birds and background noise and things like that. That’s due to the nice weather here.
But we are not here to talk about the weather. We are here to talk about different changes, especially new projects. Taking on a new project is always a really, really exciting and interesting time. But it’s also a time where you can make a lot of mistakes. And these mistakes then bite you later.
So that’s why we want to talk about how you can make sure that you succeed right from the beginning. So Paolo, if you get a new project and, you know, someone you maybe haven’t even worked with before comes to you and says, oh, I would like to do X, Y, Z, what are your thoughts?
[00:01:51] Paolo: Yeah, I think there are a lot of challenges that should be addressed upfront, because you could be tempted to plan a lot of analyses, I don’t know, try new things maybe, or just reuse code or ideas from previous projects, stuff like that. You can have more ideas and new things to try out.
But in the end, I think that it’s quite important to align with the stakeholders, customers and set up a kickoff meeting before starting this new project and trying to understand what are the real needs behind this new project request.
[00:02:50] Alexander: Yeah. Yeah. That is a very good point. I think we love statistics, we love data. So we are prone to directly jump into these kinds of topics. And I think we also very often want to serve very, very fast and be quite helpful. And maybe we are also a little bit uncomfortable asking questions. And so the easy way might be to directly jump to action. But yeah, setting up a kickoff meeting where you talk about lots of different things is quite helpful. And the first question that I usually ask is something like, what do you want to achieve here? Why is that important for you? Because very often in the past I would get requests like, can you give me a table that looks like this, X, Y, Z? Or, can you run a logistic regression here? And well, you wouldn’t drive your car into the garage and tell the technician to use this tool and apply it here and so on. Likewise, someone shouldn’t come to you and say, do a logistic regression.
[00:04:19] Paolo: Yeah. Yeah. It’s also important to align on the methodology. For example, I worked with some stakeholders and customers that were actually quite keen on trying out the agile methodology. So you can ask, is it okay for you if we try something? And then we think about it, we have maybe one meeting weekly, and then we discuss.
So we can go back and forth, and then after a few cycles we can maybe deploy the final outputs. But it’s not for everyone. You can find maybe some customers with, I don’t know, a hundred percent everything clear in their minds, so they just need to explain the project fully and have things done in the most efficient way. Because everything is already in their minds, they don’t need to refine the ideas or stuff like that. So it’s quite important to understand: who is your customer? How do you want to work together?
[00:05:42] Alexander: Yeah. Yeah. To kind of agree on the mode of how you work together. When things get delivered, is it, you know, more of an informal delivery, or is there some kind of, you know, highly QC’d version that is delivered?
Yeah. And that of course also always depends on the project scope. So maybe it’s, you know, just rerunning certain stuff because you just want to get an updated data set into it, something like this. And really, the purpose of the project hasn’t changed. It’s just, yeah, we need to provide this update.
Yeah. Maybe that’s straightforward. But sometimes it’s kind of, yeah, we want to have this update, but we also want to explore this and that. Or, yeah, we want this update, but there were some changes in terms of how we collected the data, or there are some, you know, new stakeholders that want to look into the data, and they might have a different perception, different understanding, different needs. So even if it’s just a rerun of something, there could be changes that you don’t foresee.
[00:07:02] Paolo: And maybe there are some further changes or new ideas from customers that can maybe be foreseen upfront. I don’t know, maybe if they ask for one more subgroup, then you can have the idea that it could maybe be important to look at another subgroup as well.
And then it’s important to ask, are you really sure that you need just this one more subgroup? Because, you know, we are on time, so if we want to put in a few more subgroups, it’s fine. And it’s not much more, you know, workload for the programmers.
[00:07:44] Alexander: Yep. That’s the other point. So it might also be an opportunity to improve things. Maybe in the past it was done quick and dirty, and now maybe there’s the opportunity to improve it, because you see now that, you know, you do it more regularly. Or someone else worked on it before and, you know, you have new ideas how to improve it, how to make it better.
So that’s why it’s really important to understand: what are the real goals here? How will that have an impact on decisions? Who will be the stakeholders that will look into this data? How do people maybe even want to interact with the data? Will they, you know, take it and put it into PowerPoint slides, or will they need to further process it with maybe something in Excel or something like this?
Yeah, you know, I had situations where we were always providing PDF tables, and later I found out that people were taking these numbers and putting them into Excel so that they could do their own visualizations there. And well, if I had known that, I could have, you know, directly given them an Excel spreadsheet, or even better, directly the visualization.
[00:09:21] Paolo: Yeah. And also saved a lot of programming resources.
[00:09:25] Alexander: Yeah. Yeah. And, you know, it improves the overall quality. So I think it’s always important to understand what happens with things downstream. And then of course, are there any critical timelines? Is it just a preference, because you say, well, two weeks would be nice?
Or is there some kind of hard deadline because, you know, there are some deliverables that depend on it, or there’s some kind of, I don’t know, external deadline that you need to adhere to?
[00:10:06] Paolo: Yeah. And then it’s important to understand the data, if it’s the first time we approach this kind of data set for this project. So what are the meanings of the variables? How is the data being collected? What’s the process behind the data collection? That applies if we are working in a clinical study, for example, and it’s the same if we work on a data science project where we want to investigate traffic on our website or the patterns of our customers in online shopping. It’s the same: what kind of features do the customers have, and how is the data collected in the end?
[00:11:07] Alexander: Yeah, it’s really important, because that will tell you a lot about the quality of the data, the quantity of the data, and the structure of the data.
Yeah. What are the areas that depend on each other? Are there any kind of logical connections in the data? Kind of like, only if this box was clicked, then this other question pops up. Or only for that specific country have we also collected that data. Or this type of data was collected in this way up to that time point and differently thereafter.
Yeah. So there can be all kinds of different things that will have an impact on the analysis. It’s like, how is missing data coded, and is it coded consistently? Or, you know, do you have consistently the same language in the data, or something like this? Where is free text, and what is maybe categorized text?
Are there any kind of labels and things like that? So there are lots of thoughts from a data management point of view, so to say, to have a look into. And it becomes even more important, of course, if you have connected databases, so that you need to say, okay, what actually are the variables here that help me to match one to the other?
Yeah. And is there anything that I should pay attention to? Or is there someone I should talk to in order to understand more about it?
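The checks described in this part of the conversation (how missing values are coded, and which variables match one table to another) can be sketched in a few lines of Python. This is only an illustrative sketch and not something shown in the episode: the function names, the list of missing-value codes, and the example records are all assumptions made up for the illustration.

```python
from collections import Counter

# Hypothetical placeholder codes for "missing"; the real list has to come
# from whoever collected the data.
MISSING_CODES = {"", "NA", "N/A", "-99", "unknown"}


def missing_code_counts(rows, column):
    """Count how often each missing-value code appears in one column."""
    counts = Counter()
    for row in rows:
        value = str(row.get(column, "")).strip()
        if value in MISSING_CODES:
            counts[value] += 1
    return dict(counts)


def unmatched_keys(left_rows, right_rows, key):
    """Return the key values that appear in only one of the two tables."""
    left = {row[key] for row in left_rows}
    right = {row[key] for row in right_rows}
    return sorted(left - right), sorted(right - left)


# Made-up example records, not data from the episode.
visits = [
    {"patient_id": "P1", "weight": "70"},
    {"patient_id": "P2", "weight": "NA"},
    {"patient_id": "P3", "weight": "-99"},
    {"patient_id": "P4", "weight": ""},
]
demographics = [
    {"patient_id": "P1", "country": "DE"},
    {"patient_id": "P2", "country": "IT"},
    {"patient_id": "P5", "country": "FR"},
]

weight_missing = missing_code_counts(visits, "weight")
only_in_visits, only_in_demographics = unmatched_keys(
    visits, demographics, "patient_id"
)
print(weight_missing)
print(only_in_visits, only_in_demographics)
```

If one column turns out to use several different missing codes, or some keys exist in only one table, that is a concrete question to take back to the people who collected the data.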
[00:12:52] Paolo: And it’s important to ask questions. You know, as long as you have doubts and stuff like that, it’s important to ask questions and not wait, or waste time just trying to reinvent the wheel or figuring out these solutions yourself. So it’s better to go to the client, the customers, and ask for help in understanding the data you have.
[00:13:23] Alexander: Yep. Is there any related project? Is there any related data set, maybe internally or externally, that we should have a look into? Where is this idea coming from? Did you take it from a report that you have seen, that you want to show me? Can you show me, you know, where that idea is coming from, so that I can match that?
[00:13:47] Paolo: Can I contact the relevant people involved in previous processes?
[00:13:53] Alexander: Yep. Yeah, because then you can also have a discussion about what good looks like. How precise does our answer need to be? How much time do you want to invest in it? Because if we first need to merge all this data together and, you know, have lots of data cleaning and lots of other stuff before we can even start to have a look into the data, that may take quite a lot of time. Is the other person willing to invest that or not?
[00:14:36] Paolo: Yeah. And it’s not only the client; sometimes also ask yourself, how much time do you want to spend on this project? Because there are many ways to accomplish, you know, the same goal. And, you know, sometimes it’s just useless to invest a lot of time in something that is clearly not going very far, according to your perception.
So maybe it’s better to just do what was requested, and then, you know, if everything is more or less fine, jump to another big task or project.
[00:15:21] Alexander: Yeah, I think it is also a good discussion to have once you are more familiar with the data, to have a look into how feasible things are. Maybe you see that, well, actually, you know, the data is not collected in a quality that lets you provide a reasonable, sufficient answer. Or maybe there are certain systematic problems in the data so that you say, I can give you an answer, but it will have these, you know, biases in it.
Yeah. Can we, you know, understand how big the bias might be? Sometimes, you know, having an approximate answer fast is better than having a precise answer much later at much higher cost. So have a little bit of a discussion about, what do you think we should invest here?
Is it one hour, one week, or is it two weeks, or is it a three-month project? Or should we make some kind of staggered decision, where we first do some kind of feasibility analysis, where we just invest, you know, an hour or two or maybe a day, and then see, hmm, do we move forward, yes or no? Because if you get a data set for the first time, maybe you don’t really know what you get. And then I think it’s also good to set expectations, so that you say, well, first I need to look into the data, and then I can tell you what’s feasible and what’s not.
[00:17:10] Paolo: And sometimes you can have clear ideas on what could be wrong with proceeding in some directions. For example, I was asked to do some AI stuff in a clinical project in which you have 3000 patients, for example, and 15 clinical variables. And it’s quite easy to kill this idea, because it doesn’t make any sense to use AI methodology when you have such a huge number of subjects and so few items or variables. Then you can have a well-educated guess about the right direction for the project.
But it’s always quite important to convey the message to the customer in the right way. So showing that this is based on maybe some scientific paper, or some previous experience, stuff like that. So always providing the right argument for closing this idea or maybe opening the project to new ideas.
[00:18:29] Alexander: Yeah. And if he says, well, please apply artificial intelligence here, but he doesn’t really understand why, and you know why that is important and what’s behind that. Hmm. Yeah. So, maybe he doesn’t really care so much about the output, but just needs to have an AI project in his portfolio.
[00:18:57] Paolo: Yeah, maybe.
[00:19:01] Alexander: So these are a couple of different thoughts to have before you jump on a new project, because that’s an area where you can easily avoid lots of pain later on. So, get a really good sense of: why is that important? What does good look like? What are the investments that the stakeholder wants to make?
Yeah. Very often they might, you know, vastly underestimate the resources needed to get to this answer, because they think it’s just a press-the-button exercise. And then, you know, have the names of the people that you can talk to, and dig into the data, really getting a deep understanding of all the variables, but also of how all the data came about in the first place. Where is this coming from? So any final thoughts on that topic from you, Paolo?
[00:20:10] Paolo: Yeah, I think that’s it. We will link some interesting resources, like other blogs and podcasts covering these questions to ask before starting a data science project.
[00:20:23] Alexander: And if you have any other comments, please let us know. We are both on LinkedIn and on other social media, so contact us there. Until next week.
[00:20:36] Paolo: Bye.