Data Visualization Part 2 (Episode 18)


[00:00:00] Alexander: Welcome to another episode, and today we are speaking again about data visualization. Paolo is on the show as well. Hi, Paolo. How are you doing?

[00:00:11] Paolo: I’m doing very well, Alexander. How are you doing?

[00:00:15] Alexander: Very good. I love this topic so much. It’s one of my favorite topics, also because if you work on data visualization, you can directly see the impact. And if you want to show others your results, if you want to have an impact, if you want to convince others of the value that you bring, data visualizations are, I think, the number one tool to go for. You can say, oh, we do this, it’s AI and machine learning and whatever kind of sophisticated stuff that most people don’t understand.

But if you show them a really good data visualization, they directly get what you’re working on. And so that’s why I think data visualization is really important. It certainly opened many, many doors for me. So let’s dive into this next episode about data visualization. And if you haven’t listened to the previous one, well, just scroll back a little bit in your podcast player and have a listen to that one as well.

Let’s start with one key insight. Not everything in your figure has the same importance.

[00:01:33] Paolo: Very important topic, Alexander, because we make data visualizations to focus attention on what’s important, right? We don’t want to confuse our readers, our audience, about the key messages, whether the figure is exploratory or explanatory.

This is really important. So I think it’s important to have in mind what the main information is. Usually, for example, if you have a scatterplot, the X and Y axes are way more important than other information, and you want your reader to focus on this XY space instead of confusing them with legends, color, grid lines and everything else that could be seen as less important and confusing.

[00:02:47] Alexander: And I think one of the most important aspects of a good data visualization is very often the most overlooked one, and that is the title. I think it is so important to spend time crafting a really, really good title.

Yeah. Of course you can apply certain kinds of blogging templates, yeah, that really make people stop and think. But at least make it either a statement or a question, yeah? Then either the statement is backed up by the figure that you show, or you have the question and the answer to the question is in the figure.

So think about that. Your footnote is surely less important, you know, and that’s where it starts. The most important thing is on the top. Actually, in the Western world, it’s on the top left, because that’s where we start reading. And so the least important things are on the bottom right.

So just by positioning things, you can give them different importance. Headline, top left; footnote, bottom right. Because we read in this kind of Z-like shape across the figure.

[00:04:19] Paolo: Yeah, of course. For example, when we do visualizations in our pharma work, the footnotes are always really important for our audience. As you know, people want to see the abbreviations. I mean, abbreviations for everything: even mg has to be explained as the abbreviation for milligrams, stuff like that. So we always have a few lines of abbreviations and notes in our legends and footnotes, and usually I put them in a light, light gray. You want the information to be there so that you can read it, but you also need to get that this is not the main part, and that you don’t need it to get the key messages of the data visualization.

[00:05:17] Alexander: I love that you mentioned gray, because that is my favorite color in data visualization. Usually we have white backgrounds and then something dark on it, black. And so the lighter the gray you use for your visual elements, the more they fade into the background. If you have a very, very light gray, you can hardly distinguish it from white. And so readers will automatically focus more on the areas with a big contrast.

So it’s the heavy, dark, red, black things, of course, and even more so with some different hues. But yeah, it’s a really important thing. The other thing that you mentioned is size. Yeah. So the font size of a title should be much bigger than the font size of the footnotes. And you can use different sizes of your font, of your visual elements, to make them stand out. Thickness is another one. If you really want to have grid lines, well, they shouldn’t be as thick as the axes, for example, or the visual elements within the chart. Let’s say you have a line chart: then the lines that actually encode the information should probably be the thickest part. Or maybe you have a certain line that is most important, and then that is the thickest line and the darkest line, and everything else is lighter and thinner.
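
The points about gray, font size and line thickness can be sketched in code. The following is a minimal illustration, assuming Python with matplotlib (a tool choice made for this sketch, not one named in the episode); the data, the arm names and the title are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up example data: a score over four visits for three arms.
weeks = [0, 4, 8, 12]
arms = {
    "New therapy": [50, 42, 35, 30],
    "Standard": [50, 46, 43, 41],
    "Placebo": [50, 49, 48, 48],
}
highlight = "New therapy"  # the one line that carries the key message

fig, ax = plt.subplots(figsize=(7, 4.5))
lines = {}
for name, values in arms.items():
    is_key = name == highlight
    # The key line is the darkest and thickest; the others are lighter and thinner.
    (lines[name],) = ax.plot(
        weeks, values,
        color="black" if is_key else "0.6",
        linewidth=3.0 if is_key else 1.2,
    )

# Grid lines, if wanted at all, stay very light and thin, behind the data.
ax.grid(True, color="0.9", linewidth=0.5)
ax.set_axisbelow(True)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

ax.set_xlabel("Week")
ax.set_ylabel("Score")

# Title: top left, large. Footnote: bottom right, small and light gray.
title = ax.set_title("Does the new therapy lower the score fastest?",
                     loc="left", fontsize=16, fontweight="bold")
footnote = fig.text(0.98, 0.01, "Illustrative data only.",
                    ha="right", va="bottom", fontsize=7, color="0.5")

# fig.savefig("hierarchy.png", dpi=150)  # uncomment to write the image
```

Matplotlib accepts a string such as `"0.6"` as a gray level between black (`"0"`) and white (`"1"`), which makes it easy to push less important elements toward the background step by step.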

[00:07:03] Paolo: For example, if you plot a simple cumulative distribution function, maybe the horizontal 50 percent line is important, because you can see in the different groups at what level we reach half of the sample. But again, you can use light gray, and if you use other grid lines, they should be lighter for sure, because important things should stand out compared to other, less impactful elements. So grid lines could be important if you want to have an idea of the values, but this is not the key message, because you want to see the difference between, for example, the groups. And so you need to have a clear hierarchy of the different levels of information and use this hierarchy efficiently.
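
As a rough illustration of this cumulative distribution example, here is a small sketch, again assuming Python with matplotlib and made-up data. The `ecdf` helper and the group names are hypothetical and only serve the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def ecdf(values):
    """Return the sorted values and the cumulative proportion at each of them."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


# Made-up scores for two groups.
groups = {
    "Treatment": [3, 5, 6, 8, 9, 11, 12, 14],
    "Control": [6, 8, 10, 11, 13, 15, 16, 18],
}
colors = {"Treatment": "black", "Control": "0.45"}

fig, ax = plt.subplots(figsize=(6, 4))

# Lowest level of the hierarchy: very light grid lines behind everything.
ax.grid(True, color="0.93", linewidth=0.5)
ax.set_axisbelow(True)

# Middle level: the 50 percent reference line, light gray but visible.
ref_line = ax.axhline(0.5, color="0.75", linewidth=1.0)

# Top level: the curves themselves, darker and thicker than anything else.
curves = {}
for name, values in groups.items():
    xs, ys = ecdf(values)
    (curves[name],) = ax.step(xs, ys, where="post",
                              color=colors[name], linewidth=2.0, label=name)

ax.set_ylim(0, 1)
ax.set_xlabel("Score")
ax.set_ylabel("Cumulative proportion")
ax.legend(frameon=False)
```

The three gray levels and line widths encode the hierarchy: the eye lands on the curves first, then on the 50 percent line, and only then on the grid.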

[00:08:12] Alexander: And you can use even more elements, yeah, to put some hierarchy in there. So if, for example, you have an electronic way to display your information, like on a homepage or something like this, then you can maybe also have a hover-over function, yeah, where you put additional information.

Yeah, you could, for example, put all your footnotes or whatsoever in a “read more here” or “background information here” kind of area where, if you hover over it, you get all the other things. Or if you have a scatterplot and you hover over certain points, you get maybe the exact position, or maybe you get additional background information about this specific data point. So that’s another layer that you can think about.

[00:09:09] Paolo: Yeah, for example, speaking of scatter plots, I recently used a simple scatter plot for visualizing before and after treatment data points. So on the x axis you have the baseline, and on the y axis you have the score at follow-up. And then you can see from the XY positioning how the observations moved from baseline to follow-up.

And then you can include so many levels of information. For example, with colors you can show the groups into which you categorize each individual data point. Then you can have a 45-degree line showing no change between baseline and follow-up. Then you can mark the lower triangle as the area of improvement or worsening with a shaded area, maybe in gray. You can also have another line that marks a prespecified level of improvement or worsening. And by playing with the colors and the lightness of these elements, you can have different levels of information.

So at the end you have a clear hierarchy in the figure without confusing the reader. Maybe some readers are only able to get the message in terms of the XY positioning, but more experienced readers can get more messages, and you are enabling different readers to get different messages from the same plot. Again, it’s important to have a clear hierarchy of the information.
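
The baseline versus follow-up plot described here could look roughly like the sketch below, assuming Python with matplotlib. All numbers are invented, and the sketch assumes that a lower score means improvement and that the prespecified level is a drop of 5 points; neither assumption comes from the episode.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up scores; assumption for this sketch: lower score = improvement.
baseline = [20, 25, 30, 35, 40, 28, 33, 38]
followup = [12, 24, 18, 36, 22, 27, 20, 30]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]
colors = {"A": "tab:blue", "B": "tab:orange"}
threshold = 5  # hypothetical prespecified improvement in score points

lo, hi = 0, 45
fig, ax = plt.subplots(figsize=(5, 5))

# Background layer: lower triangle (follow-up below baseline) shaded in light gray.
ax.fill_between([lo, hi], [lo, lo], [lo, hi], color="0.93", zorder=0)

# Reference layer: 45-degree "no change" line, plus a lighter dashed line
# marking the prespecified improvement (follow-up = baseline - threshold).
ax.plot([lo, hi], [lo, hi], color="0.5", linewidth=1.0, zorder=1)
ax.plot([lo + threshold, hi], [lo, hi - threshold],
        color="0.7", linewidth=0.8, linestyle="--", zorder=1)

# Data layer: the points, colored by group, drawn on top of everything else.
for g, c in colors.items():
    xs = [b for b, k in zip(baseline, group) if k == g]
    ys = [f for f, k in zip(followup, group) if k == g]
    ax.scatter(xs, ys, color=c, s=40, zorder=2, label=f"Group {g}")

ax.set_xlim(lo, hi)
ax.set_ylim(lo, hi)
ax.set_aspect("equal")
ax.set_xlabel("Score at baseline")
ax.set_ylabel("Score at follow-up")
ax.set_title("Most patients improved after treatment", loc="left")
ax.legend(frameon=False)

# Number of patients who improved by at least the prespecified amount.
n_improved = sum(b - f >= threshold for b, f in zip(baseline, followup))
```

The `zorder` values make the layering explicit: shading at the bottom, reference lines in the middle, data points on top, which mirrors the hierarchy of information in the figure.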

[00:11:19] Alexander: Yeah, hierarchy is one point. The other thing is how you want to present things sequentially. Yeah, so if you, for example, come from a, as you say, kind of bigger perspective, you can also organize your presentation or your figure in a way that you add more and more information over time.

So imagine you’re showing a figure in a presentation. Maybe you first show certain elements, and then you add further elements and further elements. Or maybe, if you have it online, you use this kind of scrollytelling technique, where you have a figure and, while you scroll, more and more elements come into the figure.

So maybe you start with, oh, this is how our kind of standard therapy works. Now, this is how placebo works. This is how the new therapy works. And then you can say, and here’s a reference line for where the target is according to the guidelines. And then, in the next slide, you can maybe split it into subgroups and speak about this.

So you can add more and more information in a kind of sequential way, but of course you usually start with the most important one.

So that leads us to a little bit of another side of it. Yeah, of course you have important information and you have less important information, but you also have information that you actually don’t need to show. Yeah, I would say you can probably improve by making things simpler, and the key word that I learned from Cole Nussbaumer Knaflic here is clutter.

You need to declutter your data visualization. A great data visualization only has those elements in it that it needs to convey the message, not more. Yeah, so here the phrase “less is more” really applies.

[00:13:40] Paolo: This is a nice perspective, though. I’m often using ggplot, for example, in R for our work, and you know, there is a sequential workflow in ggplot.

So you start with initializing the plot, and then you add the first element. For a scatter plot, for example, that could be the standard geom_point line, and then you can sequentially add other elements. So for me, sometimes I go in the other direction: I start really, really basic.

And then I add the information that I see is missing, basically. Of course I can add this line; of course I can change the title to reflect the colors, the different groups. But sometimes you also have room for decluttering, because when you start with your basic ggplot theme, for example, you have a gray background and you have the major and minor grid lines in the plot.

So maybe it’s a combined approach: at the same time I add the information which is missing, and I also remove something which is redundant and not informative, which is really, really important.
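
The combined approach of starting basic, adding what is missing and removing what is redundant can be sketched as three steps. Paolo describes this in ggplot, which is an R package; the sketch below swaps in Python with matplotlib as an analogue and does not reproduce ggplot’s API. The data and the target value are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration.
dose = [1, 2, 3, 4, 5, 6]
response = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
target = 5.0  # hypothetical reference value

# Step 1: initialize the plot and add the first, really basic layer.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(dose, response, color="black")

# Step 2: add the information that is missing.
ax.axhline(target, color="0.6", linewidth=1.0)
ax.text(dose[0], target + 0.1, "Target", color="0.4", va="bottom")
ax.set_xlabel("Dose")
ax.set_ylabel("Response")
ax.set_title("Response reaches the target from dose 5 upwards", loc="left")

# Step 3: remove what is redundant and not informative in the defaults.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)
ax.tick_params(length=0)
```

Keeping the three steps visibly separate in the code makes the decluttering a conscious decision rather than something left to the default theme.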

Of course, I think decluttering is so important for people working with Excel, which is also a great tool for data visualization. There, decluttering is way more important because, you know, I think that in the Excel plot you have a lot of information which is not...

[00:15:48] Alexander: And Excel has actually improved over the years, is my perception. In the early versions of Excel, there was much more clutter in the standard templates than there is now. The newer templates are much cleaner, for sure. But if you work with any form of template, usually there is more in it than what you actually need, which is in a way kind of okay for a template, because you want to show all the different options to the person.

If you don’t display them in your graphs, people may not recognize that these kinds of options are there. But then you need to make a conscious decision to remove the visual elements that are not that helpful. So that’s a really, really important point.

[00:16:42] Paolo: Yeah, because, you know, you need to be aware of decluttering. I mean, decluttering is important if you have clutter in the figure, so I think it’s important to convey the message that you don’t need to add clutter. Yeah. Of course, we have more minimalistic people, for example, and people who want to add more details.

It’s also a matter of preference, because maybe introducing some extra information or clutter could sometimes make the visualization experience more joyful, for example. It’s a matter of preference, but in general you need to be minimalistic. Yeah.

[00:17:36] Alexander: I think, coming from a statistics background, people in our kind of data space are usually interested in lots of details. I think that’s kind of a requirement of our job. Yeah. We need to have a focus on the details to actually do our job well. But the people we talk to are not necessarily the same. Yeah. And so recognizing that there might be very different preferences in terms of wanting to see the details is a really, really important thing.

Who is the audience? Is it really other data scientists, other statisticians, or is the audience upper management, for example, or lay people that look at it? That is really, really important to take into account. There’s a couple of ways you can reduce clutter.

One thing that I always try to remove is the legend, and in the previous episodes we talked about how you can get rid of a legend by using, for example, coloring. Another way to get rid of a legend is by putting the labels directly into the chart. Yeah. So if you, for example, have a line chart with three lines in it, of course you can put in a legend saying the red line is, I don’t know, category A, orange is B, and green is C. Yeah.

Or you can put these labels directly into the plot, next to the visual elements. Then it’s much easier for the reader to see: okay, this line corresponds to that one. What we are using here is a Gestalt principle, proximity. Just by putting things closer to each other, we know they belong to each other.

And so if you put a label next to a line, that means this label belongs to this line. Of course, it needs to be positioned in such a way...

[00:19:54] Paolo: ...that it’s not confusing. So you really know that this line belongs to that category, but...

[00:20:00] Alexander: If it’s a little bit difficult, you can use another Gestalt principle, and that is connectivity, yeah? You can basically put in a little bit of a connector, a little line or an arrow or whatsoever, that connects the label with the respective line or bar, yeah? So there’s a couple of different ways you can make sure that people understand it. But that is a really, really nice way to get rid of a legend.
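
Replacing a legend with direct labels, first by proximity and then with a connector, might look like the following sketch. It assumes Python with matplotlib; the three categories and their colors follow the example in the conversation, while the values and label positions are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values for the three categories from the example.
weeks = [0, 4, 8, 12]
series = {
    "A": [10, 14, 19, 25],
    "B": [10, 12, 13, 15],
    "C": [10, 11, 12, 14],
}
colors = {"A": "red", "B": "orange", "C": "green"}

fig, ax = plt.subplots(figsize=(6, 4))
for name, values in series.items():
    ax.plot(weeks, values, color=colors[name], linewidth=2.0)

# Proximity: line A ends well apart from the others, so its label can sit
# right next to the line end, in the same color.
ax.text(weeks[-1] + 0.3, series["A"][-1], "A", color=colors["A"], va="center")

# Connectivity: lines B and C end close together, so their labels are moved
# apart and tied back to the line ends with a thin connector.
label_heights = {"B": 17.5, "C": 11.5}
for name, height in label_heights.items():
    ax.annotate(
        name,
        xy=(weeks[-1], series[name][-1]),   # point on the line
        xytext=(weeks[-1] + 1.0, height),   # where the label is drawn
        color=colors[name], va="center",
        arrowprops=dict(arrowstyle="-", color=colors[name], linewidth=0.8),
    )

ax.set_xlim(0, 14.5)  # leave room on the right for the labels
ax.set_xlabel("Week")
ax.set_ylabel("Value")
```

No `ax.legend()` call is needed: the reader never has to look back and forth between a legend box and the lines.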

Okay. Very good. So we talked a lot about designing your data visualization in this episode and about decluttering it, making it as minimalistic as possible. Yeah. Of course, taking the audience into account, because everything that is on your data visualization increases the cognitive burden on your audience.

And the goal is to have a good data-to-ink ratio, a high data-to-ink ratio, as Paolo mentioned. I think this term was coined by Tufte?

[00:21:14] Paolo: Tufte. Yeah.

[00:21:16] Alexander: Yeah. So have that in mind: data-to-ink ratio, data-to-ink ratio. Make minimalistic graphs that only show the key things to your audience, and walk your audience through them. We talked about how not everything has the same importance. You can guide your audience so that they see the important things first and then get into the details later if they have time. Any final thoughts on that, Paolo?

[00:21:48] Paolo: Yeah, I think that in general it’s always important to have the why in mind. So, declutter your figure and make sure that your figure answers the question.

[00:21:59] Alexander: Okay, very good. Awesome. If you loved this episode, tell your friends about it. And make sure to subscribe, and see you then again for the next episode, or hopefully you’ll listen to it.