Data Visualization Part 1 (Episode 17)

Transcript

[00:00:00] Alexander: Welcome. We are talking today about data visualization and tips around data visualization. And together with me is Paolo. Hi Paolo, how are you doing?

[00:00:12] Paolo: I’m doing very well, Alexander. How are you doing?

[00:00:14] Alexander: Very good. So, we will go through a series of tips about data visualization in this episode and also in the next episodes. And I think each tip is quite actionable. But before we get started, what is your general perspective on data visualization, Paolo?

[00:00:38] Paolo: First of all, it is one of my favorite topics because I really enjoy reading data visualizations. And I think that for data scientists and statisticians, it’s a way of doing the last mile in communicating the message, and it’s really important.

Yeah, and of course you can have many different objectives in mind, like doing exploratory work or presenting results, or just letting the user enjoy the data and the results. So it’s a way of presenting your results effectively, and a way of enabling the user to navigate your results and your data.

[00:01:38] Alexander: Yeah, completely agree. You speak about these two different areas of data visualization that are really important. One is explanatory data visualization: how do you get a message across that you already have? The other is exploratory data visualization, where we want to dig into data and understand it.

And of course, there’s sometimes a bit of a mix of both: you have a little interactivity and can explore the data a bit, but that’s just a means to get the specific message across. So let’s start with a couple of topics.

Well, one of my pet peeves is long labels. There’s always this problem when you have long labels in your data visualizations. What I very often see, and maybe that’s because it is a standard setting in some software like Excel, is that people have, let’s say, a vertical bar chart and then put the long labels below the bars.

And then people see, oh, I don’t have enough space there, so maybe I need to have the labels a little bit twisted, at some kind of angle, so that I can generate more space. But I don’t like that very much.

[00:03:10] Paolo: Like the 90-degree angle was the default, right?

[00:03:15] Alexander: Yeah, yeah. A 90-degree angle, a 45-degree angle. And then it’s really, really hard to read. You look at it like this. Well, you can’t see that when you listen to the podcast, but if you watch the YouTube video, you can. You need to twist your head to read them.

Yeah? Or you need to shorten them so much that they become hard to understand. Now, there’s a very, very easy way to get around it. And that is, Paolo, what would you do?

[00:03:51] Paolo: Go horizontal. So I would change the orientation of the graph, like the bar chart, to horizontal. Then you can have nice long labels written horizontally in the space on the left, and this also has the benefit that you can read it as is natural for humans, from left to right. You have the label and then you see the information in terms of the height, or width now, of the bar. So you can easily get all the information at once.

[00:04:40] Alexander: Yeah. And then you can also arrange that in a nice way. If you want, you can have, let’s say, spaces between them to group them. So maybe there are categories and subcategories, and you can have overarching labels next to them. And if you have a couple of different columns of labels, you can even add further information, like how many subjects are in each line. If you present a bar chart, you can give additional numbers in there, such as how many subjects are in this category, if you show, say, the proportion of response in the bar chart.

And then you can also align this really nicely. If you think about a horizontal bar chart where you have all the text on the left side of the bars, then you can right-align the labels, and that creates a nice visual break. You may even get rid of the y-axis altogether, since that declutters a little bit more, or at least put the y-axis very much into the background by making it light gray, for example.
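A minimal sketch of the layout described above, in Python using only the standard library: a text-only horizontal bar chart with right-aligned labels and an extra column for the number of subjects. The function name, the group labels, and all numbers are hypothetical and only illustrate the idea; a plotting library would be the usual choice for real figures.

```python
def horizontal_bars(data, width=40):
    """Render a text-only horizontal bar chart with right-aligned labels.

    `data` maps a label to (value, n), where value is a proportion in [0, 1]
    and n is the number of subjects in that row.
    """
    label_width = max(len(label) for label in data)
    lines = []
    for label, (value, n) in data.items():
        bar = "#" * round(value * width)
        # Right-align the label so every label ends where its bar begins.
        lines.append(f"{label:>{label_width}} | {bar} {value:.0%} (n={n})")
    return "\n".join(lines)


# Hypothetical example data: response rates by treatment group.
example = {
    "Placebo": (0.25, 120),
    "Low dose, once daily": (0.50, 118),
    "High dose, twice daily": (0.75, 121),
}
chart = horizontal_bars(example)
print(chart)
```

Because the labels are right-aligned, the long labels stay fully readable and all bars start at the same column, which gives the visual break mentioned in the conversation.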

[00:06:06] Paolo: Yes, you can have multiple layers for the labels. And maybe not all labels are equally important: maybe just one bar, referring to one category, matters, and then you can have a single annotation for it. And I think it’s important to declutter your figure and...

[00:06:42] Alexander: Yeah, that’s a good example. You can even put the labels into the bars. If the bars are long enough, you can have the labels in the bars versus next to the bars. That’s another interesting way to do it. Or you can put further information within the bars: numbers, means, confidence intervals, all kinds of other things.

Okay. So that’s tip number one. Let’s go to tip number two, and that is about color. Color is probably one of my favorite tools in data visualization because it is so powerful. When I think about color, I always think about this red Beetle that stands in front of a big mall where you have lots and lots of cars.

Most of them are gray, blue, white, or dark, yeah? And then you have this one red Beetle standing in there. I think it’s pretty easy to find it, because color is so powerful in drawing your attention.

[00:08:02] Paolo: It’s a way of encoding information, of course. You can position data in the x-y space, and then there’s another option for encoding information, and that’s color. With colors you can highlight different categories or individual observations among all the observations in your data visualization. And it could also be a powerful way of encoding quantitative information, for example.

Because maybe you can have a blue color for improving and a red color for worsening, for example. It’s a nice way. Or you can use color for highlighting one specific year in a line chart, when you analyze a series of data, compared to all the other years, which can be in the background or colored gray.
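The highlighting idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the two hex colors, the function name, and the list of years are made up, and the resulting mapping would be passed to whatever plotting tool is in use.

```python
HIGHLIGHT = "#d62728"   # one saturated accent color (an assumed choice)
BACKGROUND = "#bbbbbb"  # light gray for all context series


def highlight_colors(series_names, focus):
    """Return a color per series: the accent for `focus`, gray for the rest."""
    return {
        name: (HIGHLIGHT if name == focus else BACKGROUND)
        for name in series_names
    }


years = [2019, 2020, 2021, 2022, 2023]  # hypothetical series, one line per year
colors = highlight_colors(years, focus=2021)
print(colors)
```

Only one series gets a hue; everything else recedes into the background, which is what makes the highlighted year easy to find.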

[00:09:11] Alexander: Yeah, I think there’s a couple of different things here that you talked about. First, I think it’s important to understand that when we talk about color, there are actually different dimensions of color. The first one that jumps to mind is the hue: whether it’s red, or blue, or green.

But that’s just one dimension of color. The other dimension is the saturation: how red is it? How green is it? And the third is the lightness that you can also bring in. So you can play with all three dimensions to come up with different things, and you can use them to showcase different variables or different types of variables.

So, for example, if you have a categorical variable that is not ordered in some way, like, I don’t know, race, or brands, or whatsoever, then you can have different hues for these, like green and blue and gray. If, however, you have a continuous variable, let’s say a variable that goes from 0 to 100, then you can encode it with the same hue, let’s say blue, but with a different saturation, so that it’s very light for zero and a very heavy blue for 100.
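A minimal sketch of such a single-hue scale, using Python’s standard `colorsys` module. The function name and the choice of blue (hue 2/3 in HSV) are assumptions for illustration; real palettes from a design tool are usually tuned more carefully than this linear ramp.

```python
import colorsys


def sequential_color(value, lo=0.0, hi=100.0, hue=2 / 3):
    """Map a value in [lo, hi] to a hex color: fixed hue, growing saturation."""
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    # Saturation 0 gives white, saturation 1 gives the full hue.
    r, g, b = colorsys.hsv_to_rgb(hue, t, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


for v in (0, 25, 50, 75, 100):
    print(v, sequential_color(v))
```

The hue never changes, so the reader only has to judge "how blue" a mark is, which matches an ordered quantity.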

[00:11:02] Paolo: Or, speaking of a continuous variable or an ordered categorical variable, you can play with both hue and saturation. Maybe you choose two different hues for good and bad, say, again, blue for good and red for bad. But then you have different degrees of good and bad, and you can play with the saturation. And then you have maybe white in the middle.

[00:11:37] Alexander: A diverging color scheme. Yeah, completely agree. So, basically, let’s say you go from minus 100 to 100: minus 100 would be dark red, zero would be completely light, and then it becomes more and more blue, for example. By the way, we talk about blue and red here, and that probably has a reason, doesn’t it?
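A rough sketch of a diverging scale in Python, again with the standard `colorsys` module. The function name and defaults are assumptions; note that this simple version ends in fully saturated red and blue rather than a darkened red, so it only approximates the scheme described.

```python
import colorsys


def diverging_color(value, limit=100.0, neg_hue=0.0, pos_hue=2 / 3):
    """Map a value in [-limit, limit] to a hex color.

    Negative values use the red hue, positive values the blue hue, and the
    saturation grows with the distance from zero, so zero itself is white.
    """
    t = max(-1.0, min(1.0, value / limit))  # clamp to [-1, 1]
    hue = pos_hue if t >= 0 else neg_hue
    r, g, b = colorsys.hsv_to_rgb(hue, abs(t), 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


for v in (-100, -50, 0, 50, 100):
    print(v, diverging_color(v))
```

The two hues carry the direction (good or bad) and the saturation carries the degree, with a neutral midpoint between them.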

[00:12:04] Paolo: I think there’s a cultural reason, and color blindness also. I was thinking about cultural biases when you choose your color palette, because maybe in other cultures red is not the natural choice for that category; maybe it’s blue, or I don’t know, it depends on the cultural environment. Of course, red and blue work really, really well in terms of how you perceive these colors regardless of color blindness. Orange and blue, for example, also work pretty well.

[00:13:01] Alexander: So you can basically use that to encode information, and test it with your audience: does this hue encoding, so to say, make sense? Sometimes there are some natural choices. For example, if you ask, do you prefer this party or that party?

Then you would probably choose the colors of these different parties. Or if you look at preference for this sports club versus that sports club, there’s usually some color coding that naturally comes to mind, some kind of brand colors. So you can use these.

[00:13:41] Paolo: And I also think that thinking about colors up front is very important, because sometimes, especially in companies, many people use defaults, or maybe colors from the company logo, stuff like that. And in the end this limits your possibilities. It’s really important to think about colors up front and make thoughtful decisions without relying on default options, because maybe you need more levels or different hues, or maybe the colors of your brand don’t work very well; maybe they are not color-blindness friendly in the way they are used.

And there are many tools available online to test these things, for example to test whether your green text works on a blue background or not. There are many different tools. You can use them for improving the…
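One check that such tools commonly perform is the WCAG contrast ratio between a text color and its background. The sketch below implements that published formula in plain Python; it is not the method of any specific tool mentioned in the episode, and the example colors are made up.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with 0-255 channels."""
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


green_text = (0, 128, 0)  # hypothetical text color
blue_bg = (0, 0, 255)     # hypothetical background color
print(f"green on blue: {contrast_ratio(green_text, blue_bg):.2f}")
print(f"black on white: {contrast_ratio((0, 0, 0), (255, 255, 255)):.2f}")
```

WCAG level AA asks for a ratio of at least 4.5 for normal-size text, so a pair that falls well below that is a candidate for rework. Contrast is only one aspect; it does not by itself show how a palette looks under color blindness.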

[00:14:57] Alexander: I usually refer to the Adobe color wheel. If you search for that on Google, you can find it easily. That’s a nice tool not only to test color schemes, but also to create some. So if you want to create something where different colors work well together, you will see that in the Adobe color wheel.

You can play with different schemes: you see this wheel of color with the selected colors within it, and you can move them around and see what looks good. And especially artistically handicapped people, like myself, get really nice suggestions that work very, very nicely.

You can also, of course, combine these different dimensions. So, imagine you have a categorical variable with, let’s say, Europe, North America, and Asia as three categories, and you choose three different hues for that. And now you want to show another variable within that, let’s say proportion of response, from 0 to 100. Then you can encode the proportion of response using, for example, the saturation of the different colors, and keep the different hues for the different geographies.
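A small sketch of this two-variable encoding in Python with the standard `colorsys` module. The hue values per region, the saturation floor, and the function name are all assumptions chosen for illustration.

```python
import colorsys

# Assumed hues (0-1 scale in HSV), one per region.
REGION_HUES = {"Europe": 0.60, "North America": 0.08, "Asia": 0.33}


def region_color(region, proportion, floor=0.2):
    """Hue encodes the region; saturation encodes the response proportion.

    `proportion` is in [0, 1]. A small saturation floor keeps the regions
    distinguishable even when the proportion is zero.
    """
    p = min(1.0, max(0.0, proportion))
    saturation = floor + (1.0 - floor) * p
    r, g, b = colorsys.hsv_to_rgb(REGION_HUES[region], saturation, 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


for region in REGION_HUES:
    print(region, [region_color(region, p) for p in (0.0, 0.5, 1.0)])
```

The floor is a design choice: without it, a proportion of zero would turn every region white and the hue information would be lost.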

And that way you can combine two variables into one coding of the colors. And there are many more examples like that which you can use to help people understand things. Okay. So color is also a really good way to focus attention, but only if you use it sparingly. If you have these rainbow charts with 10 different hues, green and orange and blue and red and black and brown and whatsoever, all the colors of the rainbow, then you only see many, many different colors. Nothing stands out anymore. So using it sparingly is really important. Two, maybe three colors is usually the maximum.

[00:17:47] Paolo: Yeah. And it’s also important to use them consistently across different visualizations. Because you can see, for example, red and blue used for two categories, and then in the next plot you have two completely different colors. That surprises the audience and misguides them through the visualizations. So in general, using them consistently is very important. Also using them in the text, for example, or in the legends, could be a nice way of guiding the attention of the reader.

[00:18:42] Alexander: Yep. That is really important: using it consistently, but also distinctly. I once saw a graph where the color red was used in four different parts. It was actually a graph that consisted of four subgraphs, so to say. But red was used in three of the subgraphs for one thing, and for something different in the fourth.

And that is really confusing to the reader, because they think, oh, red is treatment A, red is treatment A, red is treatment A, and now red stands for females. Okay, now I’m confused. So that is really important. And here’s another trick.

Using color, you can get rid of legends really nicely. Imagine you have a line graph with two different lines: one is for therapy A and one is for therapy B. By putting the names of therapy A and therapy B into the title, with the font color the same as the line color, you don’t need a legend anymore. If you read therapy A in red and the line is also red, then you naturally know that therapy A is the red line. Very, very nice and easy.
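To make the trick concrete, here is a minimal sketch that builds a small SVG line chart as a string, using only Python’s standard library. Each therapy name in the title is drawn in the color of its line, so no legend is needed. The function name, colors, sizes, and data are hypothetical; in practice the same idea would be applied in whatever plotting tool is in use.

```python
from xml.sax.saxutils import escape


def legend_free_svg(series, width=400, height=240):
    """Build an SVG line chart whose title words are colored like their lines.

    `series` maps a name to (color, list of y-values scaled to [0, 1]).
    """
    top, pad = 40, 20
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    # Title: each series name takes the color of its line, replacing a legend.
    spans = " vs. ".join(
        f'<tspan fill="{color}">{escape(name)}</tspan>'
        for name, (color, _) in series.items()
    )
    parts.append(
        f'<text x="{pad}" y="24" font-size="16" fill="#444444">{spans}</text>'
    )
    for color, ys in series.values():
        step = (width - 2 * pad) / (len(ys) - 1)
        points = " ".join(
            f"{pad + i * step:.1f},{top + (1 - y) * (height - top - pad):.1f}"
            for i, y in enumerate(ys)
        )
        parts.append(
            f'<polyline points="{points}" fill="none" '
            f'stroke="{color}" stroke-width="2"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = legend_free_svg({
    "Therapy A": ("#d62728", [0.2, 0.4, 0.6, 0.7]),   # made-up values
    "Therapy B": ("#1f77b4", [0.2, 0.3, 0.35, 0.4]),  # made-up values
})
print(svg)
```

Saving the printed string to a `.svg` file and opening it in a browser shows two lines and a title in matching colors.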

[00:20:30] Paolo: It’s a way of improving the information-to-ink ratio in your plot. It’s a really nice way of saving space, so you can use the space in a more efficient way to convey the same message.

[00:20:47] Alexander: And also think about how you process a figure. You first read the title. Then you look into the figure. And last, you usually look at the legend. If you don’t have it encoded in the title, you basically go to the title, then the figure, then the legend, then back to the figure. And you can get rid of this loop at the end by color coding correctly.

[00:21:23] Paolo: Yeah, I think that in this way we use our brain in the most efficient way. We use visualization, not simplification, because our brain is able to process more information visually at once. So it’s better to have all the information in one place, so to say, and let the brain process it, instead of creating a flow in which you go from the legend to the plot, to the title, and back to the plot. Because now you have the information in the title, you can read the plot in a different way. So everything that goes in this direction of having multiple things at once in the visualization is a good way to go.

[00:22:32] Alexander: Yep. Very good. So, in this episode, we talked quite a lot about color. We also talked a little bit about long labels as a kind of starter. Color is so important. And we’ll put links to the tools that we talked about and a couple of examples into the show notes of this episode. If you love this episode, please tell your friends and colleagues about it and ask them to listen as well. Paolo, any final thoughts on what we discussed today?

[00:23:14] Paolo: Just enjoy the show, have a look at the show notes, and I’m looking forward to the next episode on data visualization.