A Picture Says More Than 1000 Tables (Episode 14)



Alexander: [00:00:00] This is Alexander Schacht, and today I’m again with my co-host Benjamin. Hi Benjamin.

Benjamin: Hello Alexander.

Alexander: And we have a guest here. Hi Zach. How are you doing?

Zach: Hello. Thank you.

Alexander: Okay. Zach is actually also at Lilly like myself, and he has a very special role. He is research advisor for visual analytics. Can you explain a little bit what that is, and how you came to this position?

Zach: Sure. I’ll explain what it is first. Visual analytics combines automated analysis techniques with interactive data visualizations for effective understanding, reasoning, and decision making.

It offloads the cognitive memorization burden to the visual cortex to allow users to focus on the task at hand: making decisions based on data. Data visualization is viewed by many disciplines as a modern [00:01:00] equivalent of visual communication. It involves encoding information using shapes like dots, lines, or bars, colors, and movement to visually communicate a quantitative message.

Effective visualization helps users analyze and reason about data and evidence, and makes complex data more accessible, understandable, and usable. There are two general applications of visual analytics, which I call exploration and explanation. Exploration is when you don’t really know the answer, you don’t know what the signal is, and you’re trying to identify it, you’re trying to learn about it.

And so it might involve a lot more interactivity. You might not pay as much attention to the graphical display, for instance. Explanation is when you already have the answer; you have a message that you want to communicate. And there you might be a little bit more concerned about providing a pixel [00:02:00] perfect visualization.

You might have less interactivity or more guardrails on the interactivity, but the two areas still apply the same basic principles of data visualization. Now you asked earlier about my interest Alexander?

Alexander: Yeah. What’s your career up to now that led you to this visual analytics position?

Zach: Sure. I guess I’d have to trace it back to my undergrad school at Cornell University, where I studied under Professor Velleman in statistics. He studied under Tukey, who was one of the founders of exploratory data analysis. In fact, that’s the name of one of his well-known books.

And he used interactive data visualization in his methods. In fact, I did an internship with him where I contributed to a software package called Data Desk. And now I’m dating myself a bit; we’re talking about the early 1990s. [00:03:00] And so at the time it was cutting edge.

Right now there’s a lot of software that can do what it does, but it was one of the first software packages that could do brushing, 3D visualizations (3D visualizations were in vogue at the time), et cetera. Then I went to grad school, where I did some additional work with Data Desk and did a dissertation in linkage analysis, where we were studying associations between phenotypes and genotypes. We’d look at pedigree data, visualize it using a tree type of structure, and see how it was associated with different markers for the genome. When I graduated, I got a job at Eli Lilly and I worked in early drug development, in the biomarker space in endocrinology.

And I’ve used visualization heavily in that work. A lot of times that’s all I use to present the data. But when I’m presenting the data, I’m still taking statistical concepts into account. If there’s a confounding factor, you want to condition on the confounding factor. Otherwise you might [00:04:00] obfuscate the message.

It might be misleading. After a few years I worked in late-phase drug development, and I leveraged data visualization heavily when I was in charge of the efficacy submission, and it was quite effective. If the drug really works, you should be able to show it easily. And that segued my career into leading the visual analytics efforts at Eli Lilly.

Alexander: So basically in all your different career steps, you have been relying heavily on visual analytics from undergraduate through the different phases of development up to now where you basically apply it across the complete company.

Zach: It affects all different areas of building and understanding your data, from prescriptive to predictive. I use data visualization throughout all the different phases of understanding the data.

Alexander: Let’s dig a little [00:05:00] bit back into what you explained about visual analytics. You talked about two different concepts on the one hand, exploring and learning and on the other hand, messaging or communicating something that you already know. Can you explain, maybe you can give some examples for these and what are the key differences between these different phases?

Zach: Sure. Certainly. An example of exploratory is when you’re studying a drug that’s first introduced into human beings. You might have some ideas about how that drug’s gonna affect humans based on your preclinical data, but these are just hypotheses, and you can’t assume that’s gonna translate perfectly. It never translates perfectly, or hardly ever, I should say.

And so you have to be wary and be ready to explore all the different possible associations of the drug with either untoward side effects or even positive [00:06:00] benefits. There might be benefits that you didn’t anticipate. And often when we’re trying to understand the performance of a drug in human beings, we’ll try to identify patterns. Even outliers could be a pattern.

And we try to find out what’s unique about those patients that might make them different from the other patients in how they reacted to the drug. We invariably will want to see different domains of interest for a group of patients.

Depending on the indication, you might want to look at certain lab values. Maybe demographics are of interest. Maybe you’re interested in their age; maybe that could affect your interpretation of the results. To assess all these different possible explanations, interactive data visualization is a very effective tool.

With presentation, you’ve already identified some issues and you want to communicate that [00:07:00] clearly. You communicate that with whatever the appropriate vehicle is for your data, depending on whether you’re dealing with continuous data, discrete data, categorical data, et cetera.

And there you’re focused on the message that you’re trying to communicate and make that clear. You’re not trying to deceive anybody. You’re trying to communicate the message in an unbiased fashion.

Alexander: So in the first case, you have a really big data set and you are looking to gather insights from it. You can condition on different variables; you’re looking at different associations over time, whatsoever. So you explore all these different dimensions in the data. And in the messaging, you have just one specific finding that you wanna communicate, and communicate effectively. Is that right?

Zach: That’s a fair way to characterize it.

Alexander: So how is it different in how you do this? Is there a [00:08:00] different kind of workload involved? Are there different tools that you are involving there?

Zach: Okay. And so when you say different, you mean different versus just producing tables, figures, and listings? Is that what you’re comparing to?

Alexander: No, the difference between the exploring and learning...

Zach: Oh!

Alexander: ...and the messaging part.

Zach: Okay. So, the difference between exploring and learning versus the messaging part, in terms of the tools that you use: for exploring and learning, you might use interactive software like Spotfire.

That’s really useful if you want to compare data from different domains for the same patients, and it has really good drill-down capabilities. We might use JMP: regular JMP, JMP Pro, JMP Clinical. They are all great software programs that help with exploring and with prototyping your analysis.

Other tools that don’t have inherent statistical capabilities, but that you can include [00:09:00] because they integrate with R, are tools like Tableau, and even Power BI has great interactivity with R, so you can include inference in your exploratory analysis now. And of course you can use R and R Shiny, but then you have to build the interactivity from scratch, and that building takes a lot of work. So you can achieve a lot of the interactivity you get in Spotfire or JMP or Tableau or Power BI with software like R, and R Shiny in particular.

But then you have to build it all, and that’s a big project. It’s a software project, basically, and you might as well just start your own company. Whereas once you have the message that you want to convey, if you can convey it in a static form, then you might use R to produce the data visualization.

And if you wanna have some interactivity, but you wanna put guardrails on it and you want to control the interactivity, then you might make an R Shiny app out of it that you can share with people.

Alexander: Yeah. So [00:10:00] these are all tools that give you lots of flexibility to look into the data, but they don’t produce very nice targeted figures for the individual case, which I guess is more needed for the messaging, where you would put much more detail into what the exact color is for these different things, what exactly the different visual cues are that you are using, what the font size of the titles and the legends is, and all these kinds of things. So that would be more with the messaging, wouldn’t it?

Zach: Correct. Yes. And you could achieve that with software like R. If you want to have pixel-perfect positioning and a high level of interactivity, and you can’t get everything you need out of R Shiny, then you might break into JavaScript. There’s the great D3 library, full of examples of data visualizations that are interactive and can be pixel perfect at the same time.

Benjamin: Yeah, I was [00:11:00] just trying to take a step back again. We talked a lot about the opportunities and what you are able to support, show, and visualize. But obviously there are also some limitations that we can maybe talk a little bit about. Just a simple example:

using visual analytics in a podcast might be questionable. So where do you see limitations, problems, or challenges, where you find the natural end of the involvement of visual analytics? What are the challenges that you face daily?

Zach: Oh, I don’t know about the challenges I face daily. I was thinking you were gonna ask me when you don’t want to use data visualization, but let’s see. The biggest challenge with visual analytics is really the same challenge I think we have in statistics in general.

And that’s just data wrangling, data munging, getting the data in the right format so you can consume it in a way that meets your visualization needs. That’s not unique to visual analytics, though, but that [00:12:00] is, I’d say, 80% of the work: often just munging your data, getting it in the right format, obtaining your data, de-identifying it if you have to, things like that.

That’s the biggest challenge. If you work in a company with access to plenty of software, then that removes a lot of the challenges. But some people might not have that convenience, and so they might be limited in their software. That, of course, would be a challenge, because without the interactive visualization software, you just can’t do it. But that’s probably not your question, Benjamin. I’m sorry.

Benjamin: No, it’s part of it. Whatever you said before sounds very convenient and convincing in terms of how to use it and what the benefits are. That’s why my question is also where we see the limitations, and obviously data is a limitation.

Alexander: I think the limitation is more in setting up the data so that you can visualize it easily. And...

Zach: but it’s true.

Alexander: My perception is that you can probably [00:13:00] also standardize it for certain things.

Zach: Yeah.

Alexander: So let’s say you have a trial-level safety review. You’ll look at similar data for many studies over and over again. And that is, of course, something that is very nice to explore in a visual way, because you’re looking for outliers, you’re looking for trends, you’re looking for these kinds of patterns in the data.

And very often, especially for studies with small sample sizes, you wanna see individual patient-level data. So how is the AE happening with respect to co-medications that are starting, or with respect to when the doses are given? Things like that. These kinds of data you can probably visualize and standardize very easily.

Zach: Absolutely.

Alexander: And that, I think, takes off lots of the workload, but you need to [00:14:00] find these situations where you need to look into the data again and again. It’s similar, I guess, with the kind of dashboards that are used more in the business analytics case, where you set it up once and then you can look at your business data on a weekly basis or something.

Benjamin: Yeah. And it’s similar to the usual stats programming and statistics tasks that we see. If we have repeated business or repeated outputs and questions coming, then it’s easier to standardize it.

Zach: And I guess the one thing we’re not saying is that standardization can often be a big challenge in a company or an organization. That can be hard to achieve, but it is the key to making visualization routine. If you had standardized data input, that would make it much easier.

Benjamin: Another question, Zach, regarding your day-to-day work. How can I imagine you working on a day-to-day basis? Are you involved on a study level, or is it that you [00:15:00] are more involved in designing and creating the software around it to visualize the data? Just for me to understand: what’s your input into the visual analytics at Lilly?

Zach: Yep. The key for me to be effective, and this is true in general for anyone who’s trying to create innovation in a company, is that I need to have hands-on examples of real problems that people are trying to solve, real workflows that people are going through in their daily or weekly activity.

I work with these people or teams, and I work on projects with them, along with a team of people. We won’t do tool development right away. We’ll write scripts from scratch and we’ll program from scratch, just trying to meet their needs. And if we realize that this is a common need, that they’re gonna need it over and over again, and that other teams studying similar phenomena are gonna need it over and over again, then [00:16:00] we segue into tool development and develop a tool to automate a lot of the things that we might have done on the project when we were just learning.

So I’d divide my work life up into about 50% supporting projects, where we’re actually embedded in teams and working on individual deliverables. And the rest of the time is external focus and tool development. And this podcast, even: I’d consider that to be part of my job.

Alexander: So basically 50% you’re working, so to say, in the company on the projects and 50% you’re working on the company to overall improve the company. Yeah.

Zach: Correct? Yes.

Alexander: So speaking about clinical studies, in a setting where you have randomized, controlled, nice, clean data, where do you see the benefits of using visual analytics there?

Zach: The beautiful thing about clinical data is that, by design, we can infer causality. It’s pristine data. And so it’s a really perfect environment [00:17:00] for using visual analytics. We always have to worry about strong control of type I error, but that’s usually included in a formal testing strategy for the primary and secondary analyses. The other thing that study teams need to understand is the relationships; they need to contextualize their data.

And so when we have a clinical database lock, we have data from various domains. We have a domain for labs, a domain for adverse events, a domain for disposition, et cetera, and invariably, when you’re trying to interpret a certain result, you want to contextualize that result. What I mean is, for instance, if you see someone who’s above three times the upper limit of normal for aminotransferase and above two times the upper limit of normal for bilirubin, then that might trigger a flag for being concerned about liver toxicity.

Okay. And so if you have an interactive [00:18:00] visualization, you could imagine a scatter plot where you’ve got reference lines at three times the upper limit of normal for the one lab and two times the upper limit of normal for the other lab. Those are the criteria for Hy’s law for liver toxicity.

Yes. And then you can identify the patients in that quadrant. And if it’s an interactive data visualization, you can select those patients and then automatically, seamlessly, within seconds or even less, see a visualization of other relevant data for those particular patients, for instance adverse events, time of onset, and things like that.
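The select-and-link step Zach describes can be sketched in a few lines of Python. This is only an illustration of the logic behind such a linked view, not any particular tool: the record layout, the function names `upper_right_quadrant` and `linked_adverse_events`, and the patient data are all made up, and the cutoffs are assumptions set to the commonly quoted Hy’s law screening values (aminotransferase above 3x the upper limit of normal, total bilirubin above 2x).

```python
from dataclasses import dataclass

# Assumed screening cutoffs, expressed as multiples of the upper limit of
# normal (ULN). They are parameters, so other criteria can be plugged in.
ALT_MULTIPLE = 3.0
BILI_MULTIPLE = 2.0


@dataclass
class LabPeak:
    patient_id: str
    alt_x_uln: float   # peak aminotransferase divided by its ULN
    bili_x_uln: float  # peak total bilirubin divided by its ULN


def upper_right_quadrant(labs, alt_cut=ALT_MULTIPLE, bili_cut=BILI_MULTIPLE):
    """Return the IDs of patients above both reference lines."""
    return [
        p.patient_id
        for p in labs
        if p.alt_x_uln > alt_cut and p.bili_x_uln > bili_cut
    ]


def linked_adverse_events(selected_ids, adverse_events):
    """Mimic a linked view: keep only AE records for the selected patients."""
    wanted = set(selected_ids)
    return [ae for ae in adverse_events if ae["patient_id"] in wanted]


# Made-up example data, for illustration only.
labs = [
    LabPeak("P01", 0.8, 0.6),
    LabPeak("P02", 4.2, 2.5),
    LabPeak("P03", 3.5, 1.1),
]
adverse_events = [
    {"patient_id": "P02", "term": "Jaundice", "onset_day": 43},
    {"patient_id": "P03", "term": "Nausea", "onset_day": 5},
]

selected = upper_right_quadrant(labs)
linked = linked_adverse_events(selected, adverse_events)
print(selected)
print(linked)
```

In an interactive tool the selection would come from brushing the quadrant with the mouse rather than from a function call, but the join between the lab domain and the adverse event domain is the same idea.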

Alexander: So going back into that setting, imagine we have this scatter plot where we have these two lab tests on the vertical and the horizontal axes. Now we are getting into the challenge of having a visual analytics discussion in a podcast. But imagine you have this scatter plot there.

What I think is also really nice is that you could have this scatter [00:19:00] plot animated over time. You could see the lab values go up and down and see how the cloud of patients with their lab data actually behaves over time. You can see whether there’s an overall trend over time that moves into this corner that is dangerous, and then you can also use techniques to hover over these patients to see what their co-medication is, whether they’re pretreated, whether they get certain concomitant medications, whether they have AEs, or other things. You can find out by just hovering over the dots in the scatter plot.

So I think this is, for example, a very nice thing that you could potentially use across multiple studies because for certain indications you’ll need to check for these kind of things over and over again.

Zach: Absolutely. Absolutely. Yes.

Benjamin: [00:20:00] Nicely described.

Zach: Yes, I can visualize that.

Alexander: Yeah. We are trying to create pictures in your head at the moment. Besides lab tests, can you come up with other examples, for example in the efficacy area, where you have used visual analytics quite successfully?

Zach: Sure. For efficacy, once you have your database lock and you have your primary and key secondary analyses evaluated, invariably it’s time to show your results to decision makers. At least in my experience, every time I was on a study team and we had to show database lock results to decision makers, they wanted data visualization. Maybe that was just my world, but I don’t think it was that unique.

And that was the way to convince them. There was the little asterisk for a significant p-value, but they wanted to see the longitudinal data [00:21:00] over time. Data visualization is an effective means of communication, and with efficacy it really makes sense, if your drug is efficacious, to demonstrate that visually.

One area of data visualization that I’ve been making a lot of progress in is animating clinical data. Animating clinical data really gives a nice, concise story of the data. And when I talk about animating clinical data, I’m thinking of continuous data.

You animate it over time. You have fixed visits at fixed points in time, and you basically do tweening, which is the IT term, or interpolating, which is the stats term, but they’re mathematically equivalent, to interpolate the data over time. The advantage of that is that, first of all, you’re showing the raw data, and people get to see the raw data. [00:22:00] When we’ve presented to advisors and thought leaders, they’re really impressed by being shown the raw data, not just a bar plot. And by seeing the raw data, you can see relationships. For instance, you could plot post-baseline versus baseline. That’s important because baseline often will affect how much efficacy you observe in the trial.

When you have these higher baselines, you can often see a greater change from baseline. And one of your concerns is: do you get to the same level of improvement? Say there’s a threshold of improvement. With diabetes, the primary surrogate endpoint is a mean change from baseline in A1c, and often an A1c cutoff of 7% is desirable.

And so you wanna make sure that everyone got below 7%, even if they had a very high baseline. If they had a baseline of 7.1%, you’re not that impressed. But if they had a baseline of 9%, you might be impressed that they got below seven. So anyway, by visualizing over time and visualizing using baseline, you can [00:23:00] provide a very rich explanation of what happened in the data.

And by visualizing over time, you can also evaluate how early the efficacy was: how early did you get below that 7% threshold in HbA1c for diabetes, for instance. You could have a panel for each treatment, and you could hopefully show that your experimental drug has earlier onset of efficacy than the control drug, if that’s the case.
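The tweening idea can be sketched as follows. This is a minimal illustration assuming plain linear interpolation between scheduled visits (animation libraries often offer other easing functions as well); the function names `tween` and `first_time_below`, the visit schedule, and the HbA1c values are all made up for the example.

```python
def tween(visits, values, frames_per_interval=4):
    """Linearly interpolate one patient's values between scheduled visits.

    visits: increasing visit times (for example, weeks).
    values: the measurement observed at each visit.
    Returns a list of (time, value) pairs, one per animation frame.
    """
    frames = []
    for i in range(len(visits) - 1):
        t0, t1 = visits[i], visits[i + 1]
        v0, v1 = values[i], values[i + 1]
        for k in range(frames_per_interval):
            w = k / frames_per_interval  # fraction of the way to the next visit
            frames.append((t0 + w * (t1 - t0), v0 + w * (v1 - v0)))
    frames.append((visits[-1], values[-1]))  # end exactly on the last visit
    return frames


def first_time_below(frames, threshold):
    """Earliest frame time at which the value is below threshold, or None."""
    for t, v in frames:
        if v < threshold:
            return t
    return None


# Made-up HbA1c values (%) for one hypothetical patient at weeks 0, 4, 8, 12.
weeks = [0, 4, 8, 12]
a1c = [9.0, 8.0, 7.5, 6.5]

frames = tween(weeks, a1c)
print(frames[:5])
print(first_time_below(frames, 7.0))
```

Each frame would be one redraw of the scatter plot. Note that the frames between visits are interpolated, not observed, so a time-to-threshold read off them is a visual aid rather than an analysis result.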

Alexander: Yeah, I really like that visualization. So I’m coming more from the neuroscience areas and there’s lots of endpoints where you have the total score and you wanna see 30, 50 whatsoever percent improvement from this total score from baseline. But of course, you also wanna see where patients end up.

So do they meet remission? Do they come below a certain threshold that is perceived as clinically meaningful, or where you can’t differentiate the symptoms from the normal population anymore? And so both the [00:24:00] percent change and the absolute value are very important over time.

And then I think, if you have a scatter plot like you described, where the horizontal axis is the baseline value and the vertical axis is the post-baseline value, patients move up and down over time. With each patient in the scatter plot, you see how this cloud actually develops over time: whether it goes down, whether it goes up, whether it goes down only for certain parts of the baseline variable, and where the differentiation between different groups is happening.

And also, like you said, if you have it animated, you can see how fast the drop over time happens. Is it happening directly at the beginning, or [00:25:00] is it happening slowly over time, or is it pretty stable for quite some time and then it drops, maybe just before the study end or something like this? What I’ve also seen in some neuroscience studies is some kind of end-of-study effect, where just before the study closes there’s a placebo effect dropping in, or something like this. You can also then look into subgroups of the patients.

That is something that I found very helpful, especially for negative studies, where you couldn’t separate between the two treatment arms. Then very often the question comes: why is this happening? Are there certain subgroups where there is actually differentiation? And then of course, if you have an interactive visualization, you can very easily go into that and see what’s happening, [00:26:00] without producing hundreds of tables to look into all the different kinds of combinations of the data.

Benjamin: You already see that there’s a huge advantage in having figures rather than tables. And I fully agree that producing hundreds of figures is quite a waste of resources if you could see this animated over time and get the answers to your questions right away. So you already see the big advantage of visualization in the sense of having designed figures, and now having this animated over time, even with interactive access to the individual patient data, that’s an enormous advantage.

But now, again comparing it to just normal figures or even tables, where do you see the future of visual analytics in the pharma [00:27:00] industry? You can probably talk about Lilly a little bit more. Do you see that there’s an increasing need? Do you foresee the end of normal figures and tables? Obviously there’s a future, otherwise you wouldn’t work there. But what is your personal opinion about the future of visual analytics?

Zach: I’d love to expand on that. Let’s look back in time a little bit, at drug development in the last century, when we did submissions. I’m gonna focus on submissions to the FDA, but I’m sure the same thing happened with the EMEA and PMDA also. I know for a fact that when we did submissions to the FDA, we would provide them piles of paper. And the piles of paper would be so high and so massive that we actually delivered them in a semi-truck, an 18-wheeler; a lorry, I think they say in the UK.

This is a very large truck. In fact, I moved my entire household a few years [00:28:00] ago. I moved from one state to another in the United States, and the moving company put four households in this one truck. And I was just one fourth of that whole household, so these are my beds.

I just find it phenomenal to think a whole truck was filled with paper from floor to ceiling. And I’ve heard in some cases they’ve done more than one truck. Now this is really a disservice to regulatory agencies and a disservice to us, these piles of paper.

Where we’re at now is we basically do what’s called electronic submissions. With electronic submissions, we’re giving the regulators the equivalent of a truckload of electronic paper, and the biggest advancement that we have is hyperlinks. We can click on a hyperlink in the table of contents and go directly to a certain page of interest.

And that is useful, but it’s not, I don’t think, where we’re going to be in the future. There’s no reason [00:29:00] why we should even have a facsimile of a piece of paper that we work with. What I mean is, I’m referring to Word documents, which is proprietary, but I think that’s a common form; or, in general, whatever word processing software you’re using. All submissions, to my knowledge, are given in terms of some sort of word processing document, maybe converted to PDF or whatever.

But the point is that you could print out all these documents on an eight-and-a-half-by-11 sheet of paper and you would not lose any integrity except for the hyperlinks. That’s what I mean. Now that, I don’t think, is the future. I just can’t imagine it. I might be retired by the time we go beyond Word documents or word processing documents.

It might not be in my future. I hope I do get to see it. But I can’t imagine that we’re gonna be working with this facsimile of paper in electronic form forever. I can’t imagine [00:30:00] even making it to the end of the century doing this. It’s changing over time. I still know physicians and other people, who tend to be older than me, who will print out their Word documents, because they just can’t work otherwise.

They can’t mentally handle working with the computer. But I think that generation is going away and the new generation’s coming up: millennials, and even the generations coming after them, are very comfortable doing work on the computer. What I mean is they don’t need to print out a document and have a piece of paper to work with.

And why that’s important is, now let’s look at the work environment. They’re working with these monitors that are landscape oriented, right? They’re wider than they are tall. I think most follow the 16-by-9 format; that’s typical. We should fill that space up.

And if you’re using a facsimile of eight and a half by 11, a portrait facsimile, even if you rotate it, you’re not filling up that whole space. [00:31:00] And that’s a very precious space, because that space is our canvas that we can paint the picture of our data on.

And so I foresee an interactive data visualization format for providing information to regulatory agencies. And I don’t even see why we have to transfer anything. Why can’t we just host this interactive data visualization on top of our database at a third party? It doesn’t have to be a third party, but that’s the route that seems most plausible.

Maybe a third-party server, some sort of quote-unquote cloud, just a bunch of CPUs attached together, right? And then the company can work on it. They can develop their messaging. I’m not saying that you’re not gonna have numbers and analyses. You can have that, but it could be part of this interface.

It doesn’t have to be in the shape of a page. And when the company’s ready for submission, what do they do? They could just give the password to the FDA or the password to the EMEA, and they could access the exact same interactive visualization software, with [00:32:00] the primary and key secondary analyses pre-calculated, already done.

And that can be frozen; you can’t manipulate that, obviously. There are guards on it. So when I say interactive visualization, I mean only where it makes sense. I’m not talking about p-hacking and facilitating that type of unguided analysis.

I’m just talking about facilitating what we do all the time when the sponsor provides submissions, and also what the FDA, the PMDA, the EMEA, and other regulatory agencies have to do when they analyze your submissions. We could facilitate that with interactive data visualization software.

Alexander: I was just thinking about this. In terms of these benefits, risk analysis, so you wanna look whether all your efficacy endpoints, all your safety endpoints, whether they are consistent across certain subgroups of clinical interests. Where, these have been shown to be subjects of interest in the past maybe for other drugs that have a similar method of [00:33:00] action.

They play a role. So you wanna explore whether your benefit-risk profile is the same across these different subgroups. And currently, very often you would have these data spread over lots of different tables, maybe even in different modules of your submission.

And so to gather these data, you need to spend an enormous amount of time digging into the data and then manually bringing it all together. And if you have, let’s say, 20 different endpoints across safety and efficacy and quality of life and whatsoever, and you are looking into, I don’t know, 20 different subgroups, you end up searching for 400 different p-values. And then you want to have not only the p-values, but also [00:34:00] the treatment effects, confidence intervals and whatever.

Zach: Yep, that’s right.

Alexander: You search forever just to understand the data. Where I think it becomes really important is to get a sense of these summary statistics across all the different submissions and have them visualized very well. I think we had this topic already at the PSI conference last year, where we talked about this results data set topic.

If you can visualize that, you can better understand your data and, probably even better, communicate your data very easily using visualizations.
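To make the arithmetic in this exchange concrete, here is a minimal Python sketch of the endpoint-by-subgroup grid. Only the counts (20 endpoints, 20 subgroups) come from the conversation; the placeholder names and the list of statistics per cell are assumptions for illustration.

```python
from itertools import product

# Hypothetical placeholder names; the episode only gives the counts (20 and 20).
endpoints = [f"endpoint_{i:02d}" for i in range(1, 21)]
subgroups = [f"subgroup_{j:02d}" for j in range(1, 21)]

# Each (endpoint, subgroup) pair is one result a reviewer would have to look up.
cells = list(product(endpoints, subgroups))
n_cells = len(cells)

# Assumed set of statistics wanted per cell: p-value, treatment effect,
# and the two confidence limits.
stats_per_cell = ["p_value", "estimate", "ci_lower", "ci_upper"]
n_values = n_cells * len(stats_per_cell)

print(f"{n_cells} endpoint/subgroup cells, {n_values} individual numbers")
```

With four statistics per cell, the 400 p-values grow to 1,600 individual numbers, which gives a sense of why collecting them by hand from separate tables takes so long.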

Zach: Yep. I will point out too that you refer to the results data, and again, the key is getting the data, the analysis results data sets, right?

You need access to that data. You need it in a consumable format. That’s [00:35:00] often the biggest challenge: the data itself. The visualization is fairly trivial with the software we have these days. You can do a good visualization pretty easily once you get the data in the right format. Picking out the right visualization is part of the skill also, but there are all sorts of references and forums to help you do that.

Alexander: Yeah. As I’m talking now about data, I think I need to clarify that I’m talking about summary statistics as data. So for example, means or percentages within different treatments, over different time points. And then of course, for these types of summary statistics you also need the relevant metadata.

Zach: Yep, that’s right.

Alexander: So what study you’re looking into, what time points you’re looking into, what statistic you’re looking into. What is the sample size? All these kinds of other things. Whether you’re looking into the mean or the percentage, whether you’re looking into the lower or upper bound of the confidence [00:36:00] interval, the p-value, whatsoever. And if you have that, then of course you can very easily visualize, for example, forest plots showing the treatment effect across different subgroups, or within a subgroup across different endpoints.

I think that is one thing that will come much more in the future, that we have the summary statistics together with metadata and can look into them interactively rather than, yeah, just having them as files.
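The idea of summary statistics carried together with their metadata can be sketched in Python roughly as follows. This is only an illustration under assumptions: the field names, the study and subgroup labels, and all numbers are made up, and the text rendering stands in for a real plotting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRow:
    """One summary statistic plus the metadata needed to interpret it."""
    study: str
    endpoint: str
    subgroup: str
    timepoint: str
    statistic: str  # what the estimate is, e.g. "mean difference"
    n: int          # sample size behind the estimate
    estimate: float
    ci_lower: float
    ci_upper: float
    p_value: float


# Made-up example values, purely for illustration.
results = [
    ResultRow("STUDY-1", "efficacy_score", "age < 65", "week 12",
              "mean difference", 120, -2.1, -3.4, -0.8, 0.002),
    ResultRow("STUDY-1", "efficacy_score", "age >= 65", "week 12",
              "mean difference", 80, -1.5, -3.3, 0.3, 0.100),
    ResultRow("STUDY-1", "quality_of_life", "age < 65", "week 12",
              "mean difference", 118, 1.0, -0.2, 2.2, 0.100),
]


def forest_rows(rows, *, endpoint=None, subgroup=None):
    """Fix an endpoint to compare subgroups, or fix a subgroup to compare endpoints."""
    selected = [
        r for r in rows
        if (endpoint is None or r.endpoint == endpoint)
        and (subgroup is None or r.subgroup == subgroup)
    ]
    # Label each row by the dimension that varies in the plot.
    return [
        (r.subgroup if endpoint is not None else r.endpoint,
         r.estimate, r.ci_lower, r.ci_upper)
        for r in selected
    ]


def text_forest(plot_rows, lo=-4.0, hi=4.0, width=41):
    """Render a crude text forest plot: '-' interval, '*' estimate, '|' zero line."""
    def col(x):
        x = min(max(x, lo), hi)  # clip values to the plotted range
        return round((x - lo) / (hi - lo) * (width - 1))

    lines = []
    for label, est, lower, upper in plot_rows:
        axis = [" "] * width
        for i in range(col(lower), col(upper) + 1):
            axis[i] = "-"
        axis[col(0.0)] = "|"
        axis[col(est)] = "*"
        lines.append(f"{label:<16}" + "".join(axis))
    return lines


by_subgroup = forest_rows(results, endpoint="efficacy_score")
by_endpoint = forest_rows(results, subgroup="age < 65")
for line in text_forest(by_subgroup) + text_forest(by_endpoint):
    print(line)
```

Because every row carries its own metadata, switching between "one endpoint across subgroups" and "one subgroup across endpoints" is just a different filter on the same data set, which is the kind of interactive look-up the conversation describes.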

Zach: Absolutely. Yeah.

Alexander: Now we talked a lot about the future and the theory. Let’s go back a little bit to the tools that you can use. We talked a lot about different tools, but I think the problem with tools is very often that simple is nice: it’s [00:37:00] not as flexible, but it’s probably good for a starter. Complex is very flexible, but probably not so nice for a starter.

Coming from this kind of hierarchy of tools, what would be tools that you would recommend for, let’s say starters, intermediate, advanced people?

Zach: Okay. So for starters, I’d recommend using JMP. JMP is a good tool for statisticians. The other starter type of software is, I’d say, Tableau, but it can be cost prohibitive. Power BI is actually much, much more affordable. That’s also starter software, but both Power BI and Tableau are really catered towards marketing and business people. And it’s really reflected in how you do data visualization. So it took me a little bit of time, as a statistician, to understand the mindset behind it.

Whereas JMP I found very intuitive, and I just felt like it was speaking to the statistician. But again, I think I mentioned this earlier, [00:38:00] if you want to connect multiple domains and have really great drill-down capabilities, I would recommend Spotfire for that case. Then as you advance, R offers some great data visualization capabilities, and R combined with Shiny, or even just Plotly, offers great data visualization capabilities, but you’re gonna have to build them from the ground up, and you’re gonna have to specify every single interactivity that you want.

And so it takes a little bit more scripting, a little bit more work, whereas to do the same thing in a software like JMP or Spotfire, it’s all GUI driven. With R Shiny, you have to write a script to be able to create the GUI. So I’d say R Shiny is more advanced, and then a lot of the R Shiny apps are really built on top of JavaScript.

And often from the D3 library. And so all the Sankey plots in R Shiny, that’s all from JavaScript. The forest plot in Shiny, that’s all from JavaScript. And you were talking earlier about the trade-offs between complexity and [00:39:00] power, and the trade-offs apply here also when you are using R Shiny.

It’s actually less complex than programming JavaScript from scratch. The trade-off, though, is that you don’t have quite as much flexibility, and you can do a lot more fine tuning if you go right to JavaScript. Now, of course, you could even host JavaScript right from R Shiny.

Which is a whole different twist. But the point is, I’m talking about how you actually program the visualization, whether you’re using R Shiny to host it or you have it hosted on your own webpage that just has JavaScript embedded in it. The point is you can get to the most granular detail by programming in JavaScript. And Python is quite powerful too, actually.

And I’d say Python, R and JavaScript are more advanced. And then with software like JMP and Spotfire you can jump right in. Literally. I think that’s probably the reason why JMP got its name, to imply that. I’ve always guessed that. I’m not sure, actually. [00:40:00] I’m not sure.

Benjamin: We will put the names in the outline on our website, because I think nobody, certainly not a starter in this area, will be able to remember all the names that you just mentioned. But anyway, what is your advice for people out there who are interested in visual analytics? Where are places to learn about it? Is there any recommendation you can give?

Zach: I would recommend FlowingData. FlowingData is a website, and it provides examples of great data visualizations. It provides tutorials and courses and has discussions, all about data visualization. So I recommend FlowingData. I’d also recommend Perceptual Edge, but that’s more business analytics focused. It still is quite valuable, though. Perceptual Edge also has some really good information. They talk about color schemes and things like [00:41:00] that and get into the practical things as well as the theoretical. So those are two areas. That’s correct.

Alexander: It is, yeah. And that’s also a webpage.

Benjamin: And do you see that there’s an increasing community of people working in the area of visual analytics? Any congresses or conferences that are happening?

Zach: Sure. The IEEE Visualization conference consists of three basic groups or organizations, all dedicated to data visualization, or visual analytics, I should say. There’s InfoVis, and InfoVis is like infographics. I know we have a broad audience, so this might not be fair, but the New York Times often has really good infographics on their website. And they have interactive data visualizations to explore and understand data on different topics.

And then there’s SciVis, which is also part of IEEE VIS, and that would probably be of less interest to statisticians. There they might use data visualization applied to science. [00:42:00] Like you might have a very pixel-perfect data visualization of the mechanism of action of a drug.

You might show the different organs and show how, when a drug binds a receptor, it has a cascade of effects, and you visualize that in a 3D image. That’s often used to explain science, but it’s also used in scientific endeavors.

Scientists have actually used this type of imaging to diagnose patients; for instance, even MRI scans are an example of SciVis. And then there’s VAST, V-A-S-T, and that’s more in line with what I was calling visual analytics. There’s a problem and a workflow, and you have to apply visual analytics to find your answer. And so those are, I’d say, the cutting edge in visual analytics. And frankly, in the past there was Tukey, who carved out a spot in exploratory visual analytics, exploratory data analysis.

And then there’s Tufte from Yale, who also made his name in visual analytics. And that was [00:43:00] all in the past century. And a lot of their methods were applied to static visualizations. Not all, but a lot. And since the 1990s a lot of the cutting-edge data visualization has been coming from, I’d say, computer science.

And it’s often a collaboration with neuroscience and psychology. People like Mike Bostock, who developed the D3 library for JavaScript; he has a computer science background also. But that’s where I would go for the cutting-edge visual analytics.

Alexander: Speaking about cutting-edge visual analytics and conferences, you gave a great presentation about this topic at last year’s PSI conference, so in 2017 in London. Now you are also coming to this year’s PSI conference in Amsterdam. And we have a session there. Just by chance. And it has a really nice title. Of course it has a nice title: A Picture Says More Than 1000 Tables: Interactive Data Review Using Visual [00:44:00] Analytics. And Zach, you are actually giving the first presentation in this session. What can we expect from your presentation?

Zach: In my presentation, I’m gonna be laying out where I see visual analytics applying in pharmaceutical drug development and where it’s going to lead us. And then we have a group of presenters that are gonna delve into specific applications. They’re really proofs of concept, where they’re actually applying visual analytics already to work going on today. But the idea is that it also should inspire people to see the potential of where this can go.

Alexander: Yeah, and then of course we don’t have the limitations of a podcast. So we can actually see what we are talking about.

Zach: Sounds good.

Alexander: See you all in Amsterdam, hopefully.

Zach: Thank you. Bye now.