Unlocking Data Storytelling [Part 2/3]
In the belly of the data beast
This article covers practical techniques to tell stories with data. It includes tips and tricks to approach data storytelling in a structured and meaningful way.
Not all good advice for storytelling is necessarily good advice for storytelling with data. However, there’s much we can learn from methods meant for creative and technical writing, especially if they’re adapted to the particular needs of data storytelling. The techniques I recommend in this article have been tested and used many times over more than ten years of telling data stories for businesses. I’ve made my way through part of the jungle and now invite you to hike some of those trails. But in the same way that I adapted what came before me, it’s up to you to evaluate how the techniques described here best fit your needs, and to adjust accordingly.
It’s time to focus on the message of the story. To tell both simple and complex stories in a simple way, we should learn how to structure a story with data. But just telling a story isn’t enough; we must also learn how to make it relevant and meaningful.
How to structure a story with data
Down to the bare bones, a story should have a beginning, a middle and an end. That framework might not sound very helpful for telling a story with data, though. There are three main tricks I use to ensure my data storytelling has a neat structure: the 5W1H method, identifying linear vs. branched structures, and an adaptation of the Pixar narrative method.
THE 5W1H METHOD
The 5W1H method is a well-known problem-solving and questioning approach that belongs in the data storyteller’s toolkit. Used in journalism to make sure all the essential elements of a news story are covered, it consists of asking six questions: what, who, where, when, why and how.
In a data story, there isn’t a single correct answer to each of these questions. First we should know “what” data the story is about, which usually refers to a metric or a key performance indicator. The subjects of the story may be “who” needs to tell or be told the story, or “who” the story is about (for example, the sample of a survey or the target audience of a marketing campaign). It’s always relevant to know “where” the story happens, “where” the data belongs (is it about a specific country or city?), and whether the data has a geographical breakdown (can it be split per region or state?). Asking “where” the data came from also reminds us to indicate the data source, which is essential to lend credibility to the story.
Then it’s imperative to know “when” the story takes place, that is, which period the data covers. Too many times I’ve seen data quality being questioned because the audience of a data story was comparing data from different periods. Besides, time can be a dimension of the data, and specifying its granularity (months, days, hours) is usually relevant to the story. On occasion, it can also be pertinent to ask “when” the story will be told.
“Why” can be the trickiest question in the 5W1H framework, because many people feel discouraged from asking it, as if we were all born knowing the answers. But to create meaningful stories, it is imperative to shed light on “why” a story is being told and “why” the data is behaving a certain way. Finally, “how” the conclusions behind the story were reached and “how” the story is told must not be overlooked.
The 5W1H isn’t a one-size-fits-all approach where the same questions apply to every case. It should be treated as a framework and a thought-starter that helps data storytellers ask smarter questions.
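One way to keep the six questions in view while planning a story is a small checklist that flags whatever is still unanswered. The sketch below is only an illustration: the `StoryBrief` class, its field comments and the sample answers are assumptions made for this example, not part of the 5W1H method itself.

```python
from dataclasses import dataclass, fields


@dataclass
class StoryBrief:
    """One slot per 5W1H question; an empty string means 'not answered yet'."""
    what: str = ""   # metric or KPI the story is about
    who: str = ""    # audience, or the subjects of the data
    where: str = ""  # market or region, and the data source
    when: str = ""   # period covered and time granularity
    why: str = ""    # reason for telling the story
    how: str = ""    # how the conclusions were reached

    def open_questions(self) -> list[str]:
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical brief with only three of the six questions answered.
brief = StoryBrief(
    what="share of media spend per channel",
    who="brand managers",
    when="weekly",
)
print(brief.open_questions())  # ['where', 'why', 'how']
```

The point is not the code but the habit: a brief with open questions is a prompt to go back to the stakeholders before building anything.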
LINEAR VS. BRANCHED NARRATIVES
One of the moments when I felt the power of data storytelling being unlocked in my work was when I started approaching linear and branched narratives differently.
Linear stories are welcome when we’re in explanatory mode, for example when presenting a report. The story should feel like reading a novel, with a single defined line that takes us from the beginning through the middle to the end. Branched stories belong in exploratory mode, for example when examining a dataset or a dashboard and the multiple narratives that can be built from it. They are more similar to a “choose your own adventure” book or a role-playing game.
For linear data storytelling, there’s usually a need to level the complexity to the average member of the audience. The goals are set, and might not have the same relevance for everyone who receives the story. With branched stories, that gets trickier, but also way more interesting. In the brilliant article “Storytelling in Dashboards”, Susie Lu points to two ways of approaching branched narratives in data storytelling:
- Branching the narrative based on the type of goal.
- Different users will be interested in different types of insights.
If data visualization resources are available, for example an interactive dashboard, the visualization can be optimized for the different data scenarios and the different types of stories that may be told.
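To make the difference between linear and branched narratives concrete, a story can be sketched as a small graph of views, where each path through the graph is one possible narrative. The view names below are hypothetical, and the sketch assumes the graph has no loops.

```python
# Each key is a view in a hypothetical dashboard; each value lists the views
# a reader can move to next. All view names are made up for illustration.
story = {
    "overview": ["by_channel", "by_market"],
    "by_channel": ["weekly_trend"],
    "by_market": ["weekly_trend"],
    "weekly_trend": [],
}


def narrative_paths(graph, start):
    """List every route from `start` to a view with no onward links.

    Assumes the graph is acyclic; a loop would recurse forever.
    """
    next_views = graph[start]
    if not next_views:
        return [[start]]
    return [[start] + rest
            for nxt in next_views
            for rest in narrative_paths(graph, nxt)]


paths = narrative_paths(story, "overview")
for path in paths:
    print(" -> ".join(path))

# A linear story is the special case with exactly one path.
is_linear = len(paths) == 1
```

Listing the paths up front is a cheap way to check that every route a reader might take still ends somewhere meaningful.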
THE PIXAR STORY STRUCTURE
Years ago, when I was looking into narrative techniques that could make data storytelling more engaging, I became so intrigued by Pixar’s story structure that I decided to adapt it. It was particularly important for me to adjust the structure so that it could be used to anticipate branched narratives for a data visualization project, where many different professionals were going to have access to the data.
This is the original story structure used by Pixar Animation Studios to create engaging narrative arcs:
Once upon a time…
Every day…
Until one day…
Because of that…
Because of that…
Until finally…
Here’s how I rephrased it to work for data storytelling:
Currently, there is…
Every day/month…
One day…
Because of that…
With the consequence of…
And the conclusion is…
It works just as well for linear and branched storylines, as shown in the examples below.
Example 1:
Example 2:
The story structure created by Pixar and adapted to data storytelling encourages us to define the context, identify possible incidents, and propose hypotheses and explanations for the data behavior, concluding with clear action points. It can serve as a framework to plan complex branched narratives with data, as in some interactive dashboards or wherever data exploration is needed. While the original Pixar structure was meant to emphasize character development throughout the narrative arc, the adaptation for data storytelling aims to encourage insights and data-driven decisions.
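The adapted structure can also be treated as a fill-in template. The sketch below pairs the six prompts with six story beats and shows two branches that share the same context and incident but diverge in their explanation. The scenario, the figures and the `tell_story` helper are all invented for illustration.

```python
# The six prompts of the adapted structure, in order.
PROMPTS = [
    "Currently, there is",
    "Every day/month",
    "One day",
    "Because of that",
    "With the consequence of",
    "And the conclusion is",
]


def tell_story(beats):
    """Pair each prompt with its beat; refuse stories with missing beats."""
    if len(beats) != len(PROMPTS):
        raise ValueError(f"expected {len(PROMPTS)} beats, got {len(beats)}")
    return "\n".join(f"{prompt} {beat}" for prompt, beat in zip(PROMPTS, beats))


# Hypothetical beats; none of these figures come from real data.
shared = [
    "an online store with stable weekly sales.",
    "around 1,000 orders are placed.",
    "orders dropped by 30%.",
]
# Two branches: same context and incident, different explanations.
branches = {
    "checkout bug": shared + [
        "we checked the checkout funnel and found a payment error.",
        "many carts being abandoned at the last step.",
        "that the payment integration should be fixed first.",
    ],
    "seasonality": shared + [
        "we compared the drop with the same week last year.",
        "a similar dip showing up every year.",
        "that no action is needed beyond adjusting the forecast.",
    ],
}

for name, beats in branches.items():
    print(f"--- {name} ---")
    print(tell_story(beats))
```

Because the first three beats are shared, each branch only has to supply the explanation, the consequence and the conclusion, which is where branched data stories usually diverge.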
How to tell a meaningful story with data
There’s an emblematic example that I often use when talking about data storytelling, to show that not every data story that seems simple and straightforward at first glance is truly meaningful. I was working as a data visualization specialist, and a project stakeholder asked me to build a dashboard with only two elements: a pie chart and a table. It should look something like this:
The stakeholder was sure that just the pie chart and the table would suffice and that it was a simple enough brief. I’ll admit that I’m not the biggest fan of pie charts and that I generally avoid adding tables to interactive dashboards, but in that case I just wanted to make sure that the project needs were really going to be met with that deliverable. After I asked him a series of questions, we realized that they wouldn’t.
Here are some of the questions and answers we exchanged: Was the share of spends only relevant per channel, or also per brand? They actually needed the share per channel to be on brand level, so a pie chart wouldn’t suit it (because there were two categories to break down the data, and not only one). Was the data referring to only one or multiple markets? We needed to include a ranking of media spends per market to know what to prioritize. How often would people make decisions based on the data? It would be weekly, so we decided to add a weekly overview of the spends.
With these and many more questions, the draft that we ended up with was wildly different from the “pie chart and a table” requested at the start, and we were much more confident that it would help them make smarter business decisions based on that data. It looked similar to this:
That’s not to say that having more charts is always best. Especially in data visualization, there are plenty of examples of how “less is more” (the wisdom behind the “data-ink ratio” concept from Edward Tufte). The moral of this anecdote is: to tell meaningful stories with data, we must learn to ask more effective questions.
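The first question in the anecdote, share per channel versus share per channel on brand level, is easy to underestimate, so here is a small numeric sketch of the difference. The brands, channels and spend figures are invented for illustration and have nothing to do with the actual project.

```python
from collections import defaultdict

# Hypothetical media spend rows: (brand, channel, spend). Not real figures.
rows = [
    ("Brand A", "TV", 60.0),
    ("Brand A", "Social", 40.0),
    ("Brand B", "TV", 10.0),
    ("Brand B", "Social", 30.0),
]

# Total spend per brand, used as the denominator for each brand's shares.
brand_totals = defaultdict(float)
for brand, _channel, spend in rows:
    brand_totals[brand] += spend

# Share of each channel within its brand: effectively one pie per brand.
share_by_brand = {
    (brand, channel): spend / brand_totals[brand]
    for brand, channel, spend in rows
}

# Share of each channel across all brands: the single pie originally requested.
grand_total = sum(spend for _, _, spend in rows)
overall_share = defaultdict(float)
for _brand, channel, spend in rows:
    overall_share[channel] += spend / grand_total

for (brand, channel), share in share_by_brand.items():
    print(f"{brand:8} {channel:7} {share:.0%}")
print({channel: f"{share:.0%}" for channel, share in overall_share.items()})
```

With these made-up numbers the single pie shows an even split between the two channels, while the brand-level view shows two brands with opposite channel mixes. That hidden difference is exactly what a follow-up question can surface.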
ASKING BETTER QUESTIONS
The 5W1H framework is already a handy tool for the moments when we aren’t sure about what questions we should be asking, but it’s important to go beyond that.
Here are a few premises we should keep in mind. It’s not enough to have data available: data only has value when it’s understood. Stories help us understand and engage with data. And data is only as valuable as the questions it helps answer and the decisions it enables.
Therefore, we should ask: What is our goal with the story? What are the business questions that need to be answered? What decisions do we expect people to make based on the data? What is the data that can support those answers and decisions?
In business that may be more evident, but in other contexts too, the goals, questions and decisions should be made clear when telling a story with data. The audience will see relevance and significance in a data story when it understands not only the message, but also the causes, the reasoning and the implications behind it.
SIMPLICITY IS KEY
The neatest and most impactful example from recent years of how simplicity can make a story more powerful is likely the case of “flattening the COVID-19 curve”.
The “COVID-19 curve” chart that went viral in 2020 was created by data visualization journalist Rosamund Pearce for an article published by The Economist, with data from the Centers for Disease Control and Prevention (CDC), the leading national public health institute of the United States. Although the idea that pandemics behave in waves that can be represented in a chart dates back to the early 20th century, at the time of the Spanish flu, Pearce got her inspiration from a paper about pandemic prevention released by the CDC in 2007 [1]. The chart shows the importance of slowing down contagion, so that the healthcare system can cope with the active cases while vaccines and cures are researched. Many versions of Pearce’s chart were shared on social media and helped raise awareness in the fight against the coronavirus pandemic of 2020, including a brilliant animated version created by Alexander Radtke that helped make it viral.
A key element was added a day later: a line representing healthcare system capacity. That version was created by Drew Harris, an assistant professor at the Thomas Jefferson University College of Population Health, who saw Pearce’s version in The Economist and had previously used a similar chart, inspired by the same CDC paper, in his work as a pandemic preparedness trainer. Even though the position of the line on the y-axis is speculative, it made clear the impact that prevention measures [2] could have. The idea is simple: slow down the spread of the illness so that the healthcare system can keep up.
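The logic of the chart can be reproduced with a few lines of simulation. The sketch below uses a textbook SIR (susceptible, infected, recovered) model in daily steps. That model is an assumption chosen for this example, not the method behind Pearce’s or Harris’s charts, and every parameter, including the capacity line, is illustrative rather than an estimate for COVID-19.

```python
def simulate_peak(contact_rate, recovery_rate=0.1, days=365, initial_infected=0.001):
    """Run a discrete-time SIR model; return the peak share infected at once."""
    susceptible, infected = 1.0 - initial_infected, initial_infected
    peak = infected
    for _ in range(days):
        new_infections = contact_rate * susceptible * infected
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return peak


# All numbers are illustrative assumptions, not estimates for COVID-19.
CAPACITY = 0.10  # share of the population the health system can treat at once

peak_without_measures = simulate_peak(contact_rate=0.30)
peak_with_measures = simulate_peak(contact_rate=0.15)

for label, peak in [("without measures", peak_without_measures),
                    ("with measures", peak_with_measures)]:
    status = "over" if peak > CAPACITY else "within"
    print(f"{label}: peak {peak:.1%} of population, {status} capacity")
```

Halving the assumed contact rate lowers and delays the peak, which is the whole story of the chart: the same outbreak, spread over a longer time, stays under the capacity line.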
We may compare the “COVID-19 curve” to other charts with a huge impact on healthcare decisions, such as Snow’s cholera map or Nightingale’s rose diagram, both from the 19th century, but it is arguably even more impressive in some respects. The “COVID-19 curve” communicates the “flatten the curve” message with simplicity and is easy to understand even for people with only basic visual literacy, which is notable for a chart built on a mathematical concept: the epidemic curve, often loosely described as a parabola [3]. Although Snow’s and Nightingale’s visualizations aimed to tackle local challenges, both had a broader impact in the long term: Snow’s map had a huge role in the modern understanding of germs and how they can spread diseases, and Nightingale’s rose diagram influenced better and long-lasting sanitation practices across the whole world. However, the “COVID-19 curve” chart had a clear worldwide short-term impact on people’s behavior during a global pandemic, with widespread usage and sharing of versions of this chart by scientists, governments, newspapers, and even ordinary citizens on social media.
The long-term impact is yet to be seen. But thanks to its simplicity, that chart supported data storytelling that encouraged life-saving behavior worldwide. We should all strive to find such simple and effective twists in our day-to-day data storytelling.
Unlocking Data Storytelling
There are days when you stare at the dataset and it doesn’t seem willing to reveal its secret stories. The ideas discussed in this article on how to structure data stories and how to make them meaningful can hopefully serve as a framework and inspiration to unlock data narratives.
The next article in this series will address how to expand and deepen the connection with the audience, and how to create a culture of data stories.
[1] Authors were not individually identified in this publication, and interviewed staff from the CDC couldn’t remember who made it (Wilson, 2020).
[2] Examples of preventive pandemic measures are: cleaning hands often with soap and water or an alcohol-based hand rub; maintaining a safe distance from other people, especially when coughing or sneezing; wearing an appropriate mask when physical distancing is not possible; staying at home when feeling unwell and seeking medical attention if any symptoms are identified.
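[3] A parabola is a conic section: a mirror-symmetrical plane curve that is approximately U-shaped. Strictly speaking, an epidemic curve is not a parabola. The number of active cases in an outbreak rises, peaks and declines in a roughly bell-shaped curve, and “parabola” is only a loose description of that shape. This curve is used to explain and predict pandemic behavior.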