Communicating data is so important! Quarto is a fantastic tool for writing reproducible reports
using literate programming. Literate programming allows us to incorporate documentation and
code in the same program. The data science community has embraced this idea by adopting
R Markdown and Jupyter Notebooks. Using Quarto efficiently, you can create parameterized
reports, write scientific publications, and build data-driven slides.
Enjoy this super-interesting conversation with Thomas Neitmann, and be an effective data
scientist!
Transcript
Writing Reproducible Reports using Quarto
[00:00:00] Paolo: Thomas, how are you doing today?
[00:00:02] Thomas: I’m doing very well. How are you Paolo?
[00:00:05] Paolo: Good, good. So we are here today to discuss a really interesting topic: writing reproducible reports using Quarto. First of all, while reading about Quarto and its predecessors, I stumbled upon the concept of literate programming. What is it?
[00:00:27] Thomas: Yeah, so this is a concept which was introduced by a very famous computer scientist called Donald Knuth, and the book he wrote is literally called Literate Programming. The idea behind it, and I found a nice quote from a Wikipedia article which I’ll just read out loud here, is to write a computer program as an explanation of how it works in a natural language such as English, interspersed or embedded with snippets of traditional source code, from which you can then compile the ultimate source code.
So what that means at the end of the day is that you have one document where you have a chunk of natural language text, so it could be English, German or whatever. And then you have some code, and then again some natural language, and then again some source code. What is different about literate programming compared with the traditional approach is that the documentation and the code itself live in the same document, whereas traditionally you would have your source files and then your accompanying documentation somewhere else.
And to be honest, I’m not sure how much this concept has been taken up in the software engineering part of the world, but I think in data science this concept has really been embraced. And there are tools out there such as R Markdown in R, or Jupyter notebooks for Python, or Quarto, which we’ll talk about here today, which make it super easy to do this literate programming practice. And you can write quite rich documents, which intersperse text that explains something with code that actually implements a certain logic.
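To make the structure Thomas describes concrete, here is a minimal Python sketch that assembles the text of a small Quarto-style document: a heading, a paragraph of prose, an executable code chunk, and more prose. The helper names (`code_chunk`, `build_document`) and the sales numbers are hypothetical and only serve the illustration.

```python
# Assemble the text of a tiny literate document: prose, a code chunk, more prose.
# The helper names and the document contents are illustrative assumptions.

FENCE = "`" * 3  # three backticks, built at runtime so this example nests cleanly


def code_chunk(language: str, source: str) -> str:
    """Wrap source code in an executable chunk, for example a {python} chunk."""
    return f"{FENCE}{{{language}}}\n{source}\n{FENCE}"


def build_document() -> str:
    """Return the document text with prose and code interleaved."""
    parts = [
        "# Monthly sales",
        "This paragraph explains, in plain English, what the analysis does.",
        code_chunk("python", "total = sum([120, 95, 143])\nprint(total)"),
        "The chunk above computes the total; this text interprets the result.",
    ]
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    print(build_document())
```

Saving the printed text to a file with a `.qmd` extension would give a document that keeps the explanation and the code in one place, which is the core of the literate programming idea.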
[00:02:06] Paolo: Which makes me think that although we are passionate about statistics, mathematics, programming, and code, we’re human beings. And we are always interested in stories, which at the end of the day is what this literate programming is about?
[00:02:22] Thomas: Yeah, I think especially if you’re doing data science, at the end of the day, of course you write a lot of code and you maybe do some very fancy math behind the scenes, but you need to distill that into something that you can tell someone, to make a story out of it and hopefully then get the action out of it that you want.
I like this quote where people say that if you write code, you don’t just write it for the computer, you also write it for other human beings. And that perspective is generally about writing code that other people can understand. But you could also view it from the point of view that, at the end of the day, the thing that really matters is the output that gets displayed to your fellow humans, so that they can make a sensible decision based on your data.
[00:03:02] Paolo: Okay. Speaking of Quarto, what is Quarto? Is it a system, a language, a piece of software?
[00:03:09] Thomas: Yeah, so maybe it’s good to start off in the R space, where this thing called R Markdown has been around for a lot of years. And if you are someone who has used R Markdown and you then look at a Quarto document, you would probably feel that it looks 90 to 95% the same.
So the concept, again, is the same. It’s this concept of literate programming where you have a mix of natural language text and source code. But what is different between Quarto and R Markdown is that R Markdown, as the name suggests, is specifically implemented within R. And even though you could get, for example, Python code inside your R Markdown document, you still needed a running R installation to make that work.
Which, for people who do all their workflows in Python, is maybe something they don’t actually want to do. The difference with Quarto is that Quarto is basically a standalone piece of software. And I actually looked at the GitHub repository and I was surprised to see that it’s written in something called TypeScript, which is a variant of JavaScript, but I think that really doesn’t matter at the end of the day.
The important thing is you can install it as a standalone command line tool, and then if you are working in Python, you could use it without an R installation, because it has multiple of what are called engines. So one engine is still basically knitr, which powered R Markdown, to execute R code.
But then another engine is Jupyter, and a lot of people in the Python space will know Jupyter Notebooks. So basically you can use that within Quarto, but you can also use Quarto with Julia and also something called Observable, which comes out of the JavaScript space. And so in that sense, it’s really a language-agnostic tool: regardless of which data science language you use, at the end of the day you can produce really rich reports with it.
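As a rough sketch of the engine idea, the Python snippet below builds the YAML header of a Quarto document and selects an engine based on the analysis language. The `engine` key with the values `knitr` and `jupyter` is part of Quarto's document options as far as I know; the mapping itself (including running Julia through Jupyter) and the function name `front_matter` are assumptions made for illustration, and Observable is left out.

```python
# Illustrative mapping from analysis language to the Quarto engine that runs it.
# Assumption: Julia is executed through the Jupyter engine in this sketch.
ENGINE_FOR_LANGUAGE = {
    "r": "knitr",
    "python": "jupyter",
    "julia": "jupyter",
}


def front_matter(title: str, language: str) -> str:
    """Return a YAML header that names the engine for the given language."""
    engine = ENGINE_FOR_LANGUAGE[language.lower()]
    lines = ["---", f'title: "{title}"', f"engine: {engine}", "---"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(front_matter("Weekly report", "python"))
```

In practice Quarto usually picks the engine on its own from the chunks in the document, so an explicit header like this is only needed when you want to override that choice.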
[00:04:57] Paolo: And what are the capabilities of Quarto? Why should we use it?
[00:05:03] Thomas: Yeah, so in general it comes back again to this concept of literate programming: you put in a bit of text and then you have what is called a chunk of source code. And then inside that chunk you maybe read in some data, or you generate some plots and fit a model, or whatever.
And at the end of the day, you compile that document, which in the case of Quarto is basically a Markdown document. So it’s a completely text-based thing where, for example, a heading is marked with one hashtag symbol, and if you wanna bold something, you put it into asterisks, and then that gets compiled into a certain output format.
And there’s actually quite a rich set of output formats. So you can compile it to a Word document, to an HTML document, to an EPUB, even to presentations and whatnot. And in the process this Markdown gets translated into whatever output format you want. And the source code actually gets executed, and if it produces certain output, such as a table or a plot, that will then get rendered into the final report.
Depending on how you choose to set up your document, you could have something at the end of the day which only has regular text and no source code, and then a bunch of tables and plots that were generated inside the document, but you don’t actually display the code. And oftentimes that’s what people do, because
these reports go to stakeholders who are not necessarily interested in seeing the code behind your analysis; they just want to see the results. But in other cases, you do actually display all the source code within the final document as well, especially if you have a more technical audience.
And in that case, people can actually read all the conclusions that you draw from a certain analysis, but they can also see the actual code that was run to produce that particular result.
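The rendering step described above can be sketched from Python as follows. The sketch only builds the `quarto render` commands for a few output formats and does not run them; the file name `report.qmd` is hypothetical. The `--to` flag and the `execute: echo: false` option (which hides the code in the output) are the Quarto options I would expect to use here.

```python
import shlex

# Output formats mentioned in the conversation, using Quarto's format names.
FORMATS = ["html", "docx", "epub", "revealjs"]

# YAML header that keeps results (tables, plots) but hides the source code.
HIDE_CODE_HEADER = "\n".join(["---", "execute:", "  echo: false", "---"])


def render_command(source: str, output_format: str) -> list[str]:
    """Build the command line that renders one source file to one format."""
    return ["quarto", "render", source, "--to", output_format]


if __name__ == "__main__":
    for fmt in FORMATS:
        # Print the command instead of running it, so Quarto need not be installed.
        print(shlex.join(render_command("report.qmd", fmt)))
    print(HIDE_CODE_HEADER)
```

To actually render, each command list could be passed to `subprocess.run(command, check=True)` on a machine where the Quarto command line tool is installed.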
[00:06:53] Paolo: And that’s really interesting. I have a maybe naive question, because I use R Markdown. For example, for a paper I published, I wrote the appendix with R Markdown. So I was able to integrate a bit of the methods, including formulas, so readers can figure out how to program them themselves, and also, of course, the final results. Everything was integrated and nice to read, also with the references at the end of the document.
So honestly, my first thought when I read about Quarto was: why should I try out this new system instead of sticking with R Markdown?
[00:07:52] Thomas: Yeah, I think I had a similar reaction when I first learned about it. So Quarto hasn’t been around that long yet. I think it was only at the end of last year that it was first released. And as I already alluded to before, if you actually look at Quarto, it looks for the most part the same as R Markdown. And so I did a little bit of digging: what is the advantage of making the switch? And actually, if you go to the Quarto website, there’s a quote that says, if you like using R Markdown, there’s no need to switch.
Because R Markdown will continue to be supported. And in that way, if you are familiar with it, you can just continue using it. I think Quarto is especially interesting to people who come from other languages, because they probably didn’t have exposure to R Markdown. So if you work in Python or Julia, or if you’re someone who is only learning R right now, I would probably say give Quarto a go and don’t learn R Markdown first.
And my hunch is that over time there will be new functionalities added to Quarto which have not been part of R Markdown in the past and probably won’t be in the future either. So if you wanna stay at the cutting edge of things and always have access to the latest capabilities, switching over to Quarto at some point probably makes sense.
But if you have something up and running which uses R Markdown now and which generates something like weekly reports for your stakeholders, there is really no immediate need at all to make the switch.
[00:09:17] Paolo: So one thing I noticed is that if you work with Quarto, at least in the RStudio IDE, you also have a what-you-see-is-what-you-get system when you write the text part of the document. Of course, this doesn’t apply to code, because there’s no way to do that. But for the textual part of the document, it’s quite a convenient functionality, because you can easily include tables, for example, which is not so easy in R Markdown. Especially if you have a lot of columns or rows, if your table is big, then it’s difficult to code it in Markdown. So this is quite a convenient functionality and capability within Quarto.
[00:10:17] Thomas: Yeah, absolutely. So that is called the visual editor for Quarto in the newest releases of RStudio. And as you said, it’s really convenient, because with traditional Markdown you basically have plain text which you then need to render into the output format.
And only then do you see what it will look like. The visual editor is more of this Word-type feeling where what you see is what you get, and you can actually toggle between the two modes. But for example, you mentioned creating tables. I always very much struggled with creating Markdown tables because I think the syntax is not very intuitive.
In the visual editor, you just click, just like you do in Word: you say table, and then you tell it, okay, maybe I want a three by six table. And then you already have it and you see it, and you just put stuff in the cells. And then if you switch to the text editor mode, you see the underlying Markdown.
But certainly that’s not something ideally that you wanna write by hand. So I think that’s indeed a huge argument for Quarto that this visual editor is now part of it. And I think for most people, that’s probably much more intuitive to work with, and especially for newcomers. I think this concept of having a plain text document, which then gets rendered into the output format and you don’t necessarily see how it will look at the end can feel quite foreign. Whereas with this visual editor as you said, they already see what they will get in the end and it can be really, really nice.
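For readers who have not seen the underlying syntax, the following Python sketch generates a Markdown pipe table from a header and some rows. It shows why typing larger tables by hand gets tedious. The function name `markdown_table` and the example values are hypothetical.

```python
def markdown_table(header: list[str], rows: list[list[object]]) -> str:
    """Return a Markdown pipe table: header row, separator row, data rows."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative data only.
    print(markdown_table(["Country", "Sales"], [["Italy", 120], ["Germany", 95]]))
```

Every cell has to be delimited by pipe characters and the separator row has to match the number of columns, which is the bookkeeping the visual editor takes off your hands.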
[00:11:39] Paolo: So what are the typical situations in business or research for using Quarto? I’ve already mentioned one case for research, appendices for example, but maybe there are other examples?
[00:11:56] Thomas: I think a really common use case is if you have something like a recurring report, whether it’s a weekly report, monthly, quarterly or whatever, so you run a particular analysis repeatedly over a certain time period. So over the span of the next week you get new data, and then every Monday you want to generate that new report.
And R Markdown, or Quarto for that matter, is really well suited for that. You can just always point to the latest data and then run the analysis and put it in a nice report to send over to your stakeholders. And it can become even more fancy in a way, because these reports don’t have to be static.
You can actually produce what is called a parameterized report where you, for example, say I want to create a report for a particular country, and maybe your business is doing sales in 10 different countries. So instead of having 10 different reports written in Quarto, you can actually write one master report, and then the country becomes a parameter of the report.
And in that sense, with one source document you can actually generate 10 country-specific reports at the end of the day. So that can become really powerful. And then another use case, which in my experience in pharma has been quite popular, is creating data-driven slide decks.
So instead of creating some plots and tables in something like R, exporting them, and importing them into PowerPoint, you can actually use Quarto to generate slides directly from within R. And then whenever you need to change something, the time it takes to update the slides obviously becomes much, much shorter, because it’s just clicking the render button and regenerating the slides directly from within R, without the export step and then the manual import step in PowerPoint. That’s yet another use case. But I can also still remember when I was in academia, I think it was actually for my master’s thesis.
At that point in time Quarto was not around yet, but I did actually write it in R Markdown, which I’m not sure I would do again, maybe because there was a lot of text and not that much output. But for scientific or technical writing it can certainly be a very valuable tool because, similar to the slides, you have both the text and the analysis with its outputs all in one place.
And that makes it very easy to organize and you don’t have to jump from system A to system B and system C.
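The parameterized report idea from earlier in this answer can be sketched in Python as well. The snippet builds one `quarto render` command per country from a single source document, using the `-P name:value` flag for parameters and `--output` for the output file name. The source file `sales-report.qmd`, the parameter name `country`, and the country list are hypothetical, and the source document would have to declare a `country` parameter for this to work.

```python
import shlex

# Hypothetical list; the conversation mentions a business with 10 countries.
COUNTRIES = ["Germany", "Italy", "Switzerland"]

SOURCE = "sales-report.qmd"  # assumed single master report


def render_command(country: str) -> list[str]:
    """Build the command that renders the master report for one country."""
    output_name = f"sales-report-{country.lower()}.html"
    return [
        "quarto", "render", SOURCE,
        "-P", f"country:{country}",
        "--output", output_name,
    ]


if __name__ == "__main__":
    for country in COUNTRIES:
        # Print rather than execute, so the sketch runs without Quarto installed.
        print(shlex.join(render_command(country)))
```

Scheduling a loop like this, with each command passed to `subprocess.run(command, check=True)`, would cover the "every Monday" scenario: one source document, fresh data, and one rendered report per country.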
[00:14:17] Paolo: Maybe it’s only my viewpoint, but for creating slides, for example, I think that these kinds of systems force you to simplify your presentation. I think they’re really good at that, because sometimes when you work with PowerPoint you have a lot of options and you are tempted to play with all of them. With these kinds of systems, I think everything becomes cleaner and you have fewer options, but of course you have all the options you need. And you can also create awesome stuff, in the sense that you can have interactivity or animations in your presentations. It’s really another world and it’s much easier. After you have the first draft, you maybe do some tweaks, and you can update the presentation quickly with new data, stuff like that, and render it in a few seconds,
instead of spending a lot of time doing things with click interfaces like PowerPoint. It’s really nice. Thanks a lot. I think that we covered a lot about Quarto and, again, I think that what is really important is the community. So Quarto has been around for something like a year or so, but there are already a lot of examples and a lot of bugs and issues fixed, because the community is helping a lot with this.
So thanks a lot, Thomas. It was nice speaking with you.
[00:16:03] Thomas: And it was a pleasure as always. Thank you Paolo.