SQL: PostgreSQL

Aggregate Functions

Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.

As an example, we can find the highest low-temperature reading anywhere with:

    SELECT max(temp_lo) FROM weather;

     max
    -----
      46
    (1 row)

If we wanted to know what city (or cities) that reading occurred in, we might try:

    SELECT city FROM weather WHERE temp_lo = max(temp_lo);     -- WRONG

but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is o...
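To make the restriction concrete, here is a minimal sketch that runs the same queries through Python's built-in `sqlite3` module. Assumptions: SQLite stands in for PostgreSQL, the `weather` table is trimmed to three columns, and the rows are illustrative values chosen so that `max(temp_lo)` is 46, as in the output above. The last query uses a scalar subquery, one standard way to compute the aggregate first and then filter on it; PostgreSQL accepts the same query.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER)")
# Illustrative rows, chosen so that max(temp_lo) is 46.
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("San Francisco", 46, 50), ("San Francisco", 43, 57), ("Hayward", 37, 54)],
)

# The aggregate on its own is fine.
highest_lo = conn.execute("SELECT max(temp_lo) FROM weather").fetchone()[0]

# Using the aggregate directly in WHERE is rejected when the query is prepared.
try:
    conn.execute("SELECT city FROM weather WHERE temp_lo = max(temp_lo)")
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# A scalar subquery computes max(temp_lo) first; WHERE then compares each row to it.
cities = conn.execute(
    "SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather)"
).fetchall()

print(highest_lo)  # 46
print(where_error)
print(cities)      # [('San Francisco',)]
```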

Data visualization, from the Google Data Analytics course

 

Effective data visualizations

A data visualization, sometimes referred to as a “data viz,” allows analysts to properly interpret data. A good way to think of data visualization is that it can be the difference between utter confusion and really grasping an issue. Creating effective data visualizations is a complex task; there is a lot of advice out there, and it can be difficult to grasp it all. In this reading, you are going to learn some tips and tricks for creating effective data visualizations. First, you'll review two frameworks that are useful for thinking about how you can organize the information in your visualization. Second, you'll explore pre-attentive attributes and how they can be used to affect the way people think about your visualizations. From there, you'll do a quick review of the design principles that you should keep in mind when creating your visualization. You will end the reading by reviewing some practices that you can use to avoid creating misleading or inaccurate visualizations. 

Frameworks for organizing your thoughts about visualization

Frameworks can help you organize your thoughts about data visualization and give you a useful checklist to reference. Here are two frameworks that may be useful for you as you create your own data viz: 

1) The McCandless Method

You learned about the David McCandless method in the first lesson on effective data visualizations, but as a refresher, the McCandless Method lists four elements of good data visualization: 

  1. Information: the data you are working with

  2. Story: a clear and compelling narrative or concept

  3. Goal: a specific objective or function for the visual

  4. Visual form: an effective use of metaphor or visual expression

Note: One useful way of approaching this framework is to notice the parts of the graphic where there is incomplete overlap between all four elements. For example, visual form without a goal, story, or data could be a sketch or even art. Data plus visual form without a goal or function is eye candy. Data with a goal but no story or visual form is boring. All four elements need to be at work to create an effective visual.

2) Kaiser Fung’s Junk Charts Trifecta Checkup

This approach is a useful set of questions that can help consumers of data visualization critique what they are consuming and determine how effective it is. The Checkup has three questions:

  1. What is the practical question? 

  2. What does the data say?

  3. What does the visual say? 

Note: This checklist helps you think about your data viz from the perspective of your audience and decide if your visual is communicating your data effectively to them or not. In addition to these frameworks, there are some other building blocks that can help you construct your data visualizations. 

Pre-attentive attributes: marks and channels

Creating effective visuals means leveraging what we know about how the brain works, and then using specific visual elements to communicate the information effectively. Pre-attentive attributes are the elements of a data visualization that people recognize automatically without conscious effort. The essential, basic building blocks that make visuals immediately understandable are called marks and channels. 

Marks

Marks are basic visual objects like points, lines, and shapes. Every mark can be broken down into four qualities:

  1. Position - Where a specific mark is in space in relation to a scale or to other marks

simple line chart with two lines. One is red and one is blue, and there is obvious space between them.

2. Size - How big, small, long, or tall a mark is

a plot with points that are different sizes

3. Shape - Whether a specific object is given a shape that communicates something about it

horizontal bar chart, but bars are made of icons shaped like people

4. Color - What color the mark is

a bar chart with a red, green, yellow, grey, and blue bar

Channels

Channels are visual aspects or variables that represent characteristics of the data. Channels are basically marks that have been used to visualize data. Channels vary in how effectively they communicate data, based on three elements:

1. Accuracy - Are the channels helpful in accurately estimating the values being represented?

For example, color is very accurate when communicating categorical differences, like apples and oranges. But it is much less effective when distinguishing quantitative data like 5 from 5.5.

a plot with apples and oranges representing the data points

2. Popout - How easy is it to distinguish certain values from others?

There are many ways of drawing attention to specific parts of a visual, and many of them leverage pre-attentive attributes like line length, size, line width, shape, enclosure, hue, and intensity.

a line chart with three blue lines and one red line

3. Grouping - How good is a channel at communicating groups that exist in the data?

Consider the proximity, similarity, enclosure, connectedness, and continuity of the channel.

 bar chart with four groups of bars. In each group, there is a red bar and a blue bar

But, remember: the more you emphasize different things, the less that emphasis counts. The more you emphasize one single thing, the more that counts. 

Design principles

Once you understand the pre-attentive attributes of data visualization, you can go on to design principles for creating effective visuals. These design principles are important to your work as a data analyst because they help you make sure that you are creating visualizations that communicate your data effectively to your audience. By keeping these rules in mind, you can plan and evaluate your data visualizations to decide if they are working for you and your goals. And, if they aren’t, you can adjust them! 

Choose the right visual

One of the first things you have to decide is which visual will be the most effective for your audience. Sometimes, a simple table is the best visualization. Other times, you need a more complex visualization to illustrate your point. 

Optimize the data-ink ratio

The data-ink ratio is the share of a visual's ink that is devoted to displaying the data itself. Optimizing it means focusing on the parts of the visual that are essential to understanding the point of the chart: try to minimize non-data ink, like boxes around legends or shadows.

Use orientation effectively

Make sure the written components of the visual, like the labels on a bar chart, are easy to read. You can change the orientation of your visual to make it easier to read and understand. 

Color

There are a lot of important considerations when thinking about using color in your visuals. These include using color consciously and meaningfully, staying consistent throughout your visuals, being considerate of what colors mean to different people, and using inclusive color scales that make sense for everyone viewing them.

Numbers of things

Think about how many elements you include in any visual. If your visualization uses lines, try to plot five or fewer. If that isn’t possible, use color or hue to emphasize the most important lines. Also, when using visuals like pie charts, try to keep the number of segments to fewer than seven, since too many elements can be distracting.
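One common way to stay under the segment limit, offered here as an assumption rather than as part of the course material, is to keep the largest categories and fold the rest into an “Other” slice. Here is a minimal Python sketch with a hypothetical `limit_segments` helper and made-up category values:

```python
def limit_segments(shares, max_segments=6):
    """Keep the largest (max_segments - 1) categories and fold the rest into 'Other'.

    `shares` maps category -> value. Hypothetical helper for illustration only.
    """
    if len(shares) <= max_segments:
        return dict(shares)
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[: max_segments - 1])
    folded = sum(value for _, value in ranked[max_segments - 1 :])
    # Add to an existing "Other" category instead of overwriting it.
    kept["Other"] = kept.get("Other", 0) + folded
    return kept


# Made-up shares for eight categories (total 100).
category_shares = {"A": 30, "B": 20, "C": 15, "D": 10, "E": 8, "F": 7, "G": 6, "H": 4}
pie_slices = limit_segments(category_shares)
print(pie_slices)  # {'A': 30, 'B': 20, 'C': 15, 'D': 10, 'E': 8, 'Other': 17}
```

The total is unchanged, so the chart still represents the whole; the trade-off is that the folded categories can no longer be compared individually.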

Avoiding misleading or deceptive charts 

A line graph with several colorful lines going in different directions. It is intentionally difficult to read

As you are considering what kind of visualization to create and how to design it, you will want to be sure that you are not creating misleading or deceptive charts. As you have been learning, data analysis provides people with insights and knowledge they can use to make decisions. So, it is important that the visualizations you create are communicating your data accurately and truthfully. Here are some common errors to avoid so that your visualizations aren’t accidentally misleading: 

Cutting off the y-axis

Cutting off the y-axis, or otherwise changing its scale, can make the differences between groups in your data seem more dramatic, even if the actual difference is quite small.
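A quick bit of arithmetic shows how much a cut-off axis can exaggerate a gap. In a bar chart, each bar's drawn height is its value minus the value where the axis starts, so the hypothetical `apparent_ratio` helper below compares two bars under different baselines. The values 50 and 52 are made up.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start.

    With axis_start=0 this equals the true ratio larger / smaller.
    Hypothetical helper for illustration only.
    """
    if not axis_start < smaller <= larger:
        raise ValueError("expected axis_start < smaller <= larger")
    return (larger - axis_start) / (smaller - axis_start)


true_ratio = apparent_ratio(50, 52)                # about 1.04: a 4% difference
truncated = apparent_ratio(50, 52, axis_start=49)  # 3.0: the larger bar looks 3x as tall
```

With the axis starting at 0, the larger bar is drawn 4% taller, which matches the data; starting the axis at 49 makes it look three times as tall.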

Misleading use of a dual y-axis

Using a dual y-axis without clearly labeling it in your data visualization can create extremely misleading charts. 

Artificially limiting the scope of the data

If you only consider the part of the data that confirms your analysis, your visualizations will be misleading because they don’t take all of the data into account. 

Problematic choices in how data is binned or grouped

It is important to make sure that the way you group data doesn’t misrepresent it or disguise important trends and insights.
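To see how binning choices change the picture, here is a small sketch with a hypothetical `bin_counts` helper and made-up ages. With 10-year bins, the data shows most values in the 20s and a second group in the 40s. A single 30-year bin collapses everything into one bar and hides that structure.

```python
from collections import Counter


def bin_counts(values, width, start):
    """Count values into half-open bins [start + k*width, start + (k+1)*width).

    Values below `start` are ignored. Empty bins between the first and last
    non-empty bin are kept with a count of 0, so gaps stay visible.
    Hypothetical helper for illustration only.
    """
    counts = Counter((v - start) // width for v in values if v >= start)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {start + k * width: counts[k] for k in range(first, last + 1)}


ages = [21, 22, 24, 26, 28, 31, 33, 45, 47, 49]  # made-up data
fine = bin_counts(ages, width=10, start=20)      # {20: 5, 30: 2, 40: 3}
coarse = bin_counts(ages, width=30, start=20)    # {20: 10}
```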

Using part-to-whole visuals when the totals do not sum up appropriately 

If you are using a part-to-whole visual like a pie chart to explain your data, the individual parts should add up to 100%. If they don’t, your data visualization will be misleading.
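A simple guard before drawing a pie chart is to check that the parts add up to 100%. This sketch uses a hypothetical `parts_sum_to_whole` helper. The 0.5-percentage-point tolerance for rounding is an assumption, and the survey numbers are made up. Totals above 100% often come from questions where respondents can choose more than one answer, which a pie chart cannot represent.

```python
import math


def parts_sum_to_whole(percentages, tolerance=0.5):
    """Return True if the parts add up to 100% within `tolerance` percentage points.

    The tolerance allows for rounding in published percentages (an assumption).
    """
    return math.isclose(sum(percentages), 100.0, abs_tol=tolerance)


survey_a = [45, 30, 25]  # made up: sums to 100
survey_b = [60, 35, 30]  # made up: sums to 125, e.g. a pick-all-that-apply question

print(parts_sum_to_whole(survey_a))  # True
print(parts_sum_to_whole(survey_b))  # False
```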

Hiding trends in cumulative charts

Creating a cumulative chart can disguise more insightful trends by making the scale of the visualization too large to track any changes over time. 
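The sketch below uses made-up monthly sales to show how a cumulative series can hide a decline: the running total keeps rising even though sales fall in every month after the second.

```python
from itertools import accumulate

monthly_sales = [120, 130, 90, 60, 40]        # made up: sales fall after month 2
cumulative = list(accumulate(monthly_sales))  # [120, 250, 340, 400, 440]


def month_over_month(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


# Every step of the running total is positive, so a cumulative chart only ever rises...
cumulative_always_rises = all(d > 0 for d in month_over_month(cumulative))
# ...while the month-over-month changes reveal the decline.
monthly_changes = month_over_month(monthly_sales)  # [10, -40, -30, -20]
```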

Artificially smoothing trends

Adding a smooth trend line between points in a scatter plot can make the plot easier to read, but replacing the points with just the line can make the data appear more connected over time than it actually was.
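To illustrate the same risk, here is a trailing moving average applied to made-up readings that contain one sharp drop. Note that a moving average is a different smoothing technique from the trend line described above. The smoothed series turns a single drop to 2 into a shallow dip that never goes below about 7.7, so a chart showing only the smoothed line would understate what happened.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points have no smoothed value."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


readings = [10, 11, 10, 2, 11, 10, 11]  # made up: one sharp drop at index 3
smoothed = moving_average(readings)      # about [10.33, 7.67, 7.67, 7.67, 10.67]
```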

Finally, keep in mind that data visualization is an art form, and it takes time to develop these skills. Over your career as a data analyst, you will not only learn how to design good data visualizations, but you will also learn how to evaluate good data visualizations. Use these tips to think critically about data visualization—both as a creator and as an audience member.

Further reading

  • The beauty of data visualization: In this video, David McCandless explains the need for design to not just be beautiful, but for it to be meaningful as well. Data visualization must be able to balance function and form for it to be relevant to your audience. 

  • ‘The McCandless Method’ of data presentation: At first glance, this blog appears to be written by a David McCandless fan, and it is. However, it contains very useful information and provides an in-depth look at the 5-step process that McCandless uses to present his data.

  • Information is beautiful: Founded by McCandless himself, this site serves as a hub of sample visualizations that make use of the McCandless method. Explore data from the news, science, the economy, and so much more and learn how to make visual decisions based on facts from all kinds of sources. 

  • Beautiful daily news: In this McCandless collection, explore uplifting trends and statistics that are beautifully visualized for your creative enjoyment. A new chart is released every day so be sure to visit often to absorb the amazing things happening all over the world.

The beauty of visualizing

You will find that organizing your data and communicating your results are significant parts of a data analyst’s role. In this reading, you are going to navigate different resources for effective data visualization that will allow you to choose the best model to present your data. 

A collection of different types of charts, such as a bar chart, a pie chart, and a distribution graph

Inspiration is in the air

Data visualization is the graphical representation of data. But why should data analysts care about data visualization? Well, your audience won’t always be able to interpret or understand the complex information that you relay to them, so your job is to present your analysis in a way that is meaningful, engaging, and easy to understand. Part of why data visualization is so effective is that people’s eyes are drawn to colors, shapes, and patterns, which makes those visual elements perfect for telling a story that goes beyond just the numbers.

Of course, one of the best ways to understand the importance of data visualization is to go through different examples of it. As a junior data analyst, you want to have several visualization options available for your creative process whenever you need them. Below is a list of resources that can inspire your next data-driven decisions, as well as teach you how to make your data more accessible to your audience:

  • The data visualization catalogue: Not sure where to start with data visualization? This catalogue features a range of different diagrams, charts, and graphs to help you find the best fit for your project. As you navigate each category, you will get a detailed description of each visualization as well as its function and a list of similar visuals. 

  • The 25 best data visualizations: In this collection of images, explore the best examples of data that gets made into a stunning visual. Simply click on the link below each image to get an in-depth view of each project, and learn why making data visually appealing is so important.

  • 10 data visualization blogs: Each link will lead you to a blog that is a fountain of information on everything from data storytelling to graphic data. Get your next great idea or just browse through some visual inspiration.  

  • Information is beautiful: Founded by David McCandless, this gallery is dedicated to helping you make clearer, more informed visual decisions based on facts and data. These projects are made by students, designers, and even data analysts to help you gain insight into how they have taken their own data and turned it into visual storytelling.

  • Data studio gallery: Information is vital, but information presented in a digestible way is even more useful. Browse through this interactive gallery and find examples of different types of data communicated visually. You can even use the data studio tool to create your own data-driven visual.

Engage your audience

Remember: an important component of being a data analyst is the ability to communicate your findings in a way that will appeal to your audience. Data visualization can make complex (and even monotonous) information easy to understand, and knowing how to use data visualization is a valuable skill. Your goal is always to help the audience have a conversation with the data, so your visuals should draw them into that conversation. This is especially true when you have to help your audience engage with a large amount of data, such as the flow of goods from one country to other parts of the world.

The wonderful world of visualizations

As a data analyst, you will often be tasked with relaying information and data that your audience might not readily understand. Presenting your data visually is an effective way to communicate complex information and engage your stakeholders. One question to ask yourself is: “What is the best way to tell the story within my data?” This reading includes several options for you to choose from (although there are many more).

Line chart 

A line chart is used to track changes over short and long periods of time. When smaller changes exist, line charts are better to use than bar graphs. Line charts can also be used to compare changes over the same period of time for more than one group. 

Let’s say you want to present the graduation rate for a particular high school from 2008 to 2012. You would input your data in a table like this:

Year    Graduation rate (%)
2008    87
2009    89
2010    92
2011    92
2012    96

From this table, you are able to present your data in a line chart like this:

This is a line graph depicting the percent rate of high school graduation over the years from 2008 to 2012

Maybe your data is more detailed than in the example above. For example, let’s say you are tasked with presenting the difference in graduation rates between male and female students. Then your chart would resemble something like this:

This is a line graph of percent rate of high school graduation for male and female students over the years from 2008 to 2012
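A line chart suits this data because the year-to-year changes are small. This short Python sketch computes those changes, in percentage points, from the graduation rates in the table above:

```python
# Graduation rates (percent) from the example table.
graduation_rate = {2008: 87, 2009: 89, 2010: 92, 2011: 92, 2012: 96}

years = sorted(graduation_rate)
# Change from each year to the next, keyed by the later year.
changes = {
    later: graduation_rate[later] - graduation_rate[earlier]
    for earlier, later in zip(years, years[1:])
}
print(changes)  # {2009: 2, 2010: 3, 2011: 0, 2012: 4}
```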

Column chart 

Column charts use size to contrast and compare two or more values, using height or lengths to represent the specific values.  

Below is example data on vehicle sales over five months:

Month       Vehicles sold
August      2,800
September   3,700
October     3,750
November    4,300
December    4,600

Visually, it would resemble something like this:

A bar graph of monthly car sales from August to December

What would this column chart look like if we added the sales data for a competing car brand?

A double bar graph of monthly car sales for two different car brands from August to December
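For the comparison to be honest, column heights must be proportional to the values. As a rough, text-only stand-in for a column chart (drawn sideways as bars), this sketch scales the vehicle sales from the table above so that the largest month gets 40 characters. The `bar_lengths` helper and the 40-character width are assumptions made for illustration.

```python
vehicles_sold = {
    "August": 2800,
    "September": 3700,
    "October": 3750,
    "November": 4300,
    "December": 4600,
}


def bar_lengths(data, width=40):
    """Scale each value so the largest maps to `width` characters (length proportional to value)."""
    largest = max(data.values())
    return {label: round(value / largest * width) for label, value in data.items()}


for month, length in bar_lengths(vehicles_sold).items():
    print(f"{month:<10} {'#' * length} {vehicles_sold[month]:,}")
```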

Heatmap 

Like bar charts, heatmaps compare categories in a data set, but they use color to do it. They are mainly used to show relationships between two variables, with a system of color-coding representing different values. The following heatmap plots temperature changes for each city during the hottest and coldest months of the year.

A heatmap of varying climates for different cities around the world from June to January
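At its core, a heatmap maps each value onto a color scale. The sketch below imitates that in plain text by mapping values to progressively denser characters. The `shade` helper, the character palette, and the two cities' temperatures are all hypothetical.

```python
def shade(value, low, high, palette=" .:-=+*#%@"):
    """Map a value in [low, high] to a character; denser characters mean higher values."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp to the range, then scale to 0..1.
    fraction = (min(max(value, low), high) - low) / (high - low)
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


# Hypothetical average temperatures (degrees C) for June and January.
temps = {"City A": [25, 5], "City B": [15, 12]}
low = min(min(row) for row in temps.values())
high = max(max(row) for row in temps.values())
for city, row in temps.items():
    print(f"{city:<7}", " ".join(shade(t, low, high) for t in row))
```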

Pie chart

A pie chart is a circular graph divided into segments, each sized in proportion to the quantity it represents. Pie charts are especially useful when dealing with parts of a whole.

For example, let’s say you are determining favorite movie categories among avid movie watchers. You have gathered the following data:

Movie category   Preference
Comedy           41%
Drama            11%
Sci-fi           3%
Romance          17%
Action           28%

Visually, it would resemble something like this:

A pie chart of five movie categories and percentage of audience preference
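Each slice's angle is its share of the full circle, so a slice of p% spans p × 3.6 degrees. Using the movie data above, this sketch first confirms that the percentages sum to 100 and then computes the angles:

```python
import math

# Preference shares (percent) from the example table.
preferences = {"Comedy": 41, "Drama": 11, "Sci-fi": 3, "Romance": 17, "Action": 28}

total = sum(preferences.values())
assert total == 100, "a pie chart's slices must account for the whole"

# Each slice's angle is its share of the 360-degree circle.
angles = {category: share / total * 360 for category, share in preferences.items()}
# Approximately: Comedy 147.6, Drama 39.6, Sci-fi 10.8, Romance 61.2, Action 100.8
assert math.isclose(sum(angles.values()), 360.0)
```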

Scatter plot

Scatter plots show relationships between different variables. Scatter plots are typically used for two variables for a set of data, although additional variables can be displayed.

For example, you might want to show the relationship between temperature changes and ice cream sales. It would resemble something like this:

A scatterplot showing the rising sales of ice cream as the temperature rises

As you may notice, the higher the temperature got, the more demand there was for ice cream – so the scatter plot is great for showing the relationship between the two variables.
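The strength of the relationship a scatter plot suggests can be summarized with the Pearson correlation coefficient, which ranges from -1 to 1. Here is a minimal sketch using made-up temperature and sales figures (Python 3.10 and later also provide `statistics.correlation`):

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (sx * sy)


temperature_c = [14, 16, 20, 23, 26, 30]          # made-up daily highs
ice_cream_sales = [210, 240, 330, 390, 450, 540]  # made-up units sold
r = pearson_r(temperature_c, ice_cream_sales)     # close to 1
```

A value near 1 means the points lie close to a rising straight line. On its own, it says nothing about cause and effect.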

Distribution graph

A distribution graph displays the spread of various outcomes in a dataset.

Let’s apply this to an example. To plan for supplies, the owner of a brand-new coffee shop wants to measure how many cups of coffee their customers consume, and whether that number depends on the day and time of the week. That distribution graph would resemble something like this:

A histogram showing cups of coffee purchased across all 7 days of the week

From this distribution graph, you may notice that the amount of coffee sales steadily increases from the beginning of the week, reaching the highest point mid-week, and then decreases towards the end of the week.

If outcomes are categorized on the x-axis by distinct numeric values (or ranges of numeric values), the distribution graph becomes a histogram. For example, if the shop collects data from a customer rewards program, the owner could count how many customers drink each number of cups per week, from one to ten. The histogram would have ten columns, one for each number of cups, and the height of each column would indicate how many customers drink that many cups of coffee per week.
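The rewards-program histogram can be sketched by counting customers per cup value with `collections.Counter`. The cup counts below are made up. Values with no customers still get a column of height 0, so the x-axis stays complete.

```python
from collections import Counter

# Made-up weekly cup counts, one entry per rewards-program customer.
cups_per_week = [1, 3, 3, 5, 2, 3, 7, 5, 10, 3, 2, 5]

counts = Counter(cups_per_week)
# One column per possible value 1..10; the height is the number of customers.
histogram = {cups: counts.get(cups, 0) for cups in range(1, 11)}

for cups, customers in histogram.items():
    print(f"{cups:>2} cups | {'#' * customers}")
```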

Reviewing each of these visual examples, where do you notice that they fit in relation to your type of data? One way to answer this is by evaluating patterns in data. Meaningful patterns can take many forms, such as:

  • Change: This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart.

  • Clustering: A collection of data points with similar or different values. This is best represented through a distribution graph.

  • Relativity: These are observations considered in relation or in proportion to something else. You have probably seen examples of relativity data in a pie chart.

  • Ranking: This is a position in a scale of achievement or status. Data that requires ranking is best represented by a column chart.

  • Correlation: This shows a mutual relationship or connection between two or more things. A scatter plot is an excellent way to represent this type of data pattern.

Studying your data

Data analysts are tasked with collecting and interpreting data as well as displaying data in a meaningful and digestible way. Determining how to visualize your data will require studying your data’s patterns and translating them into visual cues. Feel free to practice your own charts and data in spreadsheets. Simply input your data in the spreadsheet, highlight it, then insert any chart type and view how your data can be visualized based on what you choose.

Principles of design

In this reading, you are going to learn more about using the elements of art and principles of design to create effective visualizations. So far, we have learned that communicating data visually is a form of art. Now, it's time to explore the nine design principles for creating beautiful and effective data visualizations that can be informative and appeal to all audiences.

After we go through the various design principles, spend some time examining the visual examples to ensure that you have a thorough understanding of how the principle is put into practice. Let’s get into it! 

Nine basic principles of design 

There are nine basic principles of design that data analysts should think about when building their visualizations.  

The 9 principles of design icons: Balance, emphasis, movement, pattern, repetition, proportion, rhythm, variety, unity

1. Balance: The design of a data visualization is balanced when the key visual elements, like color and shape, are distributed evenly. This doesn’t mean that you need complete symmetry, but your visualization shouldn’t have one side distracting from the other. If your data visualization is balanced, this could mean that the lines used to create the graphics are similar in length on both sides, or that the space between objects is equal. For example, this column chart (also shown below) is balanced; even though the columns are different heights and the chart isn’t symmetrical, the colors, width, and spacing of the columns keep this data visualization balanced. The colors provide sufficient contrast to each other so that you can pay attention to both the motivation level and the energy level displayed.

bar chart measuring motivation and energy levels throughout the day

2. Emphasis: Your data visualization should have a focal point, so that your audience knows where to concentrate. In other words, your visualizations should emphasize the most important data so that users recognize it first. Using color and value is one effective way to make this happen. By using contrasting colors, you can make certain that graphic elements—and the data shown in those elements—stand out. 

For example, you will notice a heat map data visualization below from The Pudding’s “Where Slang Comes From" article. This heat map uses colors and value intensity to emphasize the states where search interest is highest. You can visually identify the increase in search interest over time, from low to high. This way, you are able to quickly grasp the key idea being presented without knowing the specific data values.

heat map graphic measuring search interest

3. Movement: Movement can refer to the path the viewer’s eye travels as they look at a data visualization, or literal movement created by animations. Movement in data visualization should mimic the way people usually read. You can use lines and colors to pull the viewer’s attention across the page. 

For example, notice how the average line in this combo chart (also shown below) draws your attention from left to right. Even though this example isn’t moving, it still uses the movement principle to guide viewers’ understanding of the data. 

bar chart of monthly coffee production by country

4. Pattern: You can use similar shapes and colors to create patterns in your data visualization. This can be useful in a lot of different ways. For example, you can use patterns to highlight similarities between different data sets, or break up a pattern with a unique shape, color, or line to create more emphasis.

In the example below, the different colored categories of this stacked column chart (also shown below) are a consistent pattern that makes it easier to compare book sales by genre in each column. Notice in the chart that the Fantasy & Sci Fi category (royal blue) is increasing over time even as the general category (green) is staying about the same. 

Bar chart measuring book sales by genre: fantasy & sci-fi, general, western, romance, literature, mystery/crime

5. Repetition: Repeating chart types, shapes, or colors adds to the effectiveness of your visualization. Think about the book sales chart from the previous example: the repetition of the colors helps the audience understand that there are distinct sets of data. You may notice this repetition in all of the examples we have reviewed so far. Take some time to review each of the previous examples and notice the elements that are repeated to create a meaningful visual story.

6. Proportion: Proportion is another way that you can demonstrate the importance of certain data. Using various colors and sizes helps demonstrate that you are calling attention to a specific visual over others. If you make one chart in a dashboard larger than the others, then you are calling attention to it. It is important to make sure that each chart accurately reflects and visualizes the relationship among the values in it. In this dashboard (also shown below), the slice sizes and colors of the pie chart compared to the data in the table help make the number of donuts eaten by each person the focal point. 

dashboard with a pie chart and table

These first six principles of design are key considerations that you can make while you are creating your data visualization. These next three principles are useful checks once your data visualization is finished. If you have applied the initial six principles thoughtfully, then you will probably recognize these next three principles within your visualizations already. 

7. Rhythm: This refers to creating a sense of movement or flow in your visualization. Rhythm is closely tied to the movement principle. If your finished design doesn’t successfully create a flow, you might want to rearrange some of the elements to improve the rhythm.

8. Variety: Your visualizations should have some variety in the chart types, lines, shapes, colors, and values you use. Variety keeps the audience engaged. But it is good to find balance since too much variety can confuse people. The variety you include should make your dashboards and other visualizations feel interesting and unified.

9. Unity: The last principle is unity. This means that your final data visualization should be cohesive. If the visual is disjointed or not well organized, it will be confusing and overwhelming. 

Being a data analyst means learning to think in a lot of different ways. These nine principles of design can help guide you as you create effective and interesting visualizations. 

Data is beautiful

At this point, you might be asking yourself: What makes a good visualization? Is it the data you use? Or maybe it is the story that it tells? In this reading, you are going to learn more about what makes data visualizations successful by exploring David McCandless’ elements of successful data visualization and evaluating three examples based on those elements. Data visualization can change our perspective and allow us to notice data in new, beautiful ways. A picture is worth a thousand words—that’s true in data too! You will have the option to save all of the data visualization examples that are used throughout this reading; these are great examples of successful data visualization that you can use for future inspiration.

Four overlapping ovals outlining the four different parts of data visualization: information, story, goal, and visual form

You can also access a PDF version of this visualization and save it for your own reference by clicking the file below: 

WEB_What-Makes-a-Good-Infoviz.pdf (PDF file)

Four elements of successful visualizations

The Venn diagram by David McCandless identifies four elements of successful visualizations: 

  • Information (data): The information or data that you are trying to convey is a key building block for your data visualization. Without information or data, you cannot communicate your findings successfully.

  • Story (concept): Story allows you to share your data in meaningful and interesting ways. Without a story, your visualization is informative, but not really inspiring. 

  • Goal (function): The goal of your data visualization makes the data useful and usable. This is what you are trying to achieve with your visualization. Without a goal, your visualization might still be informative, but can’t generate actionable insights.

  • Visual form (metaphor): The visual form element is what gives your data visualization structure and makes it beautiful. Without visual form, your data is not visualized yet. 

All four of these elements are important on their own, but a successful data visualization balances all four. For example, if your data visualization has only two elements, like the information and story, you have a rough outline. This can be really useful in your early planning stages, but it is not polished or informative enough to share. Even three elements are not quite enough; you need to consider all four to create a successful data visualization.

In the next part of this reading, you will use these elements to examine two data visualization examples and evaluate why they are successful. 

Example 1: Visualization of dog breed comparison

Data visualization titled “Best in Show: The Ultimate Data Dog” with dog breeds measured by data score and popularity.

Save this data visualization as a PDF by clicking the file below:

IIB-LICENSED_Best-in-Show.pdf (PDF file)

View the data

The Best in Show visualization uses data about different dog breeds from the American Kennel Club. The data has been compiled in a spreadsheet. Click the link below and select "Use Template" to view the data.

Link to the template: KIB - Best in Show

Or, if you don't have a Google account, download the file below.

KIB - Best in Show (public) (XLSX file)

Examine the four elements

This visualization compares the popularity of different dog breeds to a more objective data score. Consider how it uses the elements of successful data visualization:

  • Information (data): If you view the data, you can explore the metrics being illustrated in the visualization. 

  • Story (concept): The visualization shows which dogs are overrated, which are rightly ignored, and those that are really hot dogs! And, the visualization reveals some overlooked treasures you may not have known about previously.

  • Goal (function): The visualization explores the relationship between popularity and the objective data scores for different dog breeds. By comparing these data points, you can learn more about how different dog breeds are perceived.

  • Visual form (metaphor): In addition to the actual four-square structure of this visualization, other visual cues are used to communicate information about the dataset. The most obvious is that the data points are represented as dog symbols. Further, the size of a dog symbol and the direction the dog symbol faces communicate other details about the data.  

Example 2: Visualization of rising sea levels

Visualization titled “When Sea Levels Attack!”

Save this data visualization as a PDF by clicking the file below:

IIB-LICENSED_Sea-Levels.pdf (PDF file)

Examine the four elements

The “When Sea Levels Attack!” visualization illustrates how much sea levels are projected to rise over the course of 8,000 years. The silhouettes of different cities with different sea levels, rising from right to left, help to drive home how much of the world will be affected as sea levels continue to rise. Here is how this data visualization stacks up against the four elements of successful visualization:

  • Information (data): This visualization uses climate data on rising sea levels from a variety of sources, including NASA and the Intergovernmental Panel on Climate Change. In addition to that data, it also uses recorded sea levels from around the world to help illustrate how much rising sea levels will affect the world. 

  • Story (concept): The visualization tells a very clear story: Over the course of 8,000 years, much of the world as we know it will be underwater. 

  • Goal (function): The goal of this project is to demonstrate how soon rising sea levels are going to affect us on a global scale. Using both data and the visual form, this visualization makes rising sea levels feel more real to the audience. 

  • Visual form (metaphor): The city silhouettes in this visualization are a beautiful way to drive home its point. They give the audience a metaphor for how rising sea levels will affect the world around them, in a way that raw numbers alone can’t. For a more global perspective, the visualization also uses inset maps.

Key takeaways

Notice how each of these visualizations balances all four elements of successful visualization. They clearly incorporate data, use storytelling to make that data meaningful, focus on a specific goal, and structure the data with visual forms to make it beautiful and communicative. The more you practice thinking about these elements, the more you will be able to include them in your own data visualizations.

Design thinking for visualization improvement

Design thinking for data visualization involves five phases:

  1. Empathize: Thinking about the emotions and needs of the target audience for the data visualization 

  2. Define: Figuring out exactly what your audience needs from the data

  3. Ideate: Generating ideas for data visualization

  4. Prototype: Putting visualizations together for testing and feedback

  5. Test: Showing prototype visualizations to people before stakeholders see them

As interactive dashboards become more popular for data visualization, new importance has been placed on efficiency and user-friendliness. In this reading, you will learn how design thinking can improve an interactive dashboard. As a junior analyst, you wouldn’t be expected to create an interactive dashboard on your own, but you can use design thinking to suggest ways that developers can improve data visualizations and dashboards.

An example: online banking dashboard

Suppose you are an analyst at a bank that has just released a new dashboard in its online banking application. This section describes how you might explore this dashboard like a new user would, consider a user’s needs, and come up with ideas to improve data visualization in the dashboard. The dashboard in the banking application has the following data visualization elements:

  • Monthly spending is shown as a donut chart that reflects different categories like utilities, housing, transportation, education, and groceries. 

  • When customers set a budget for a category, the donut chart shows filled and unfilled portions in the same view.

  • Customers can also set an overall spending limit, and the dashboard will automatically assign the budgeted amounts (unfilled areas of the donut chart) to each category based on past spending trends.

This illustration shows a dashboard for online banking that has a donut chart to track spending versus budget.
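The dashboard automatically splits an overall spending limit into per-category budgets and shows filled and unfilled portions of the donut. The sketch below is a minimal Python illustration of that idea. It rests on one assumption: the budget is split in proportion to each category's past spending. The reading only says the split is "based on past spending trends", and the real app's rule is not described. The function names `allocate_budget` and `filled_fraction` are hypothetical.

```python
def allocate_budget(spending_limit, past_spending):
    """Split an overall spending limit across categories in proportion to past spending.

    ASSUMPTION: a simple proportional rule, used only for illustration.
    Amounts are rounded to cents, so they may not add up to the limit exactly.
    """
    total = sum(past_spending.values())
    if total <= 0:
        raise ValueError("past spending must add up to a positive amount")
    return {category: round(spending_limit * amount / total, 2)
            for category, amount in past_spending.items()}


def filled_fraction(spent, budgeted):
    """Share of a category's donut segment to draw as filled (capped at 1.0)."""
    if budgeted <= 0:
        raise ValueError("budgeted amount must be positive")
    return min(spent / budgeted, 1.0)


# Hypothetical past spending for one customer.
budget = allocate_budget(1800, {"Housing": 1200, "Groceries": 400,
                                "Utilities": 200, "Transportation": 200})
print(budget)
print(filled_fraction(270, budget["Groceries"]))
```

Capping the filled fraction at 1.0 keeps an overspent category from drawing past its segment. A real dashboard might instead highlight overspending in a different color.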

Empathize

First, empathize by putting yourself in the shoes of a customer who has a checking account with the bank. 

  • Do the colors and labels make sense in the visualization? 

  • How easy is it to set or change a budget? 

  • When you click on a spending category in the donut chart, are the transactions in the category displayed?

What is the main purpose of the data visualization? If you answered that it was to help customers stay within budget or to save money, you are right! Saving money was a top customer need for the dashboard. 

Define

Now, imagine that you are helping dashboard designers define other things that customers might want to achieve besides saving money.

What other data visualizations might be needed? 

  • Track income (in addition to spending)

  • Track other spending that doesn’t neatly fit into the set categories (this is sometimes called discretionary spending)

  • Pay off debt

Can you think of anything else?

Ideate

Next, ideate additional features for the dashboard and share them with the software development team. 

  • What new data visualizations would help customers?

  • Would you recommend bar charts or line charts in addition to the standard donut chart?

  • Would you recommend allowing users to create their own (custom) categories?

Can you think of anything else?

Prototype

Finally, developers can prototype the next version of the dashboard with new and improved data visualizations.

Test

Developers can close the cycle by having you (and others) test the prototype before it is sent to stakeholders for review and approval.

Key takeaways

This design thinking example showed how important it is to:

  • Understand the needs of users

  • Generate new ideas for data visualizations

  • Make incremental improvements to data visualizations over time


Pro tips for highlighting key information

Headlines, subtitles, labels, and annotations help you turn your data visualizations into more meaningful displays. After all, you want to invite your audience into your presentation and keep them engaged. When you present a visualization, they should be able to process and understand the information you are trying to share in the first five seconds. This reading will teach you what you can do to engage your audience immediately. 

If you already know what headlines, subtitles, labels and annotations do, go to the guidelines and style checks at the end of this reading. If you don’t, these next sections are for you. 

Headlines that pop

A headline is a line of words printed in large letters at the top of a visualization to communicate what data is being presented. It is the attention grabber that makes your audience want to read more.

Check out the chart below. Can you identify what type of data is being represented? Without a headline, it can be hard to figure out what data is being presented. A graph like the one below could be anything from average rents in the tri-city area, to sales of competing products, or daily absences at the local elementary, middle, and high schools. 

 This illustration is of an unfinished stacked line chart that has no headline or other labels.

As it turns out, this illustration shows average rents in the tri-city area. So, let’s add a headline to make that clear to the audience. Adding the headline “Average Rents in the Tri-City Area” above the line chart instantly tells the audience what it is comparing.

This illustration is an unfinished stacked line chart with a headline added that reads “Average Rents in the Tri-City Area.”

Subtitles that clarify

A subtitle supports the headline by adding more context and description. Adding a subtitle will help the audience better understand the details associated with your chart. Typically, the text for subtitles has a smaller font size than the headline. 

In the average rents chart, the headline “Average Rents in the Tri-City Area” doesn’t make clear which cities are being described. There are tri-cities near San Diego, California (Oceanside, Vista, and Carlsbad), tri-cities in the San Francisco Bay Area (Fremont, Newark, and Union City), tri-cities in North Carolina (Raleigh, Durham, and Chapel Hill), and tri-cities in the United Arab Emirates (Dubai, Ajman, and Sharjah).

We are actually reporting the data for the tri-city area near San Diego. So adding “Oceanside, Vista, and Carlsbad” becomes the subtitle in this case. This subtitle enables the audience to quickly identify which cities the data reflects.

This is an unfinished stacked line chart with a headline and a newly added subtitle that reads “Oceanside, Vista, and Carlsbad.”

Labels that identify

A label in a visualization identifies data in relation to other data. Most commonly, labels in a chart identify what the x-axis and y-axis show. Always make sure you label your axes. We can add “Months (January - June 2020)” for the x-axis and “Average Monthly Rents ($)” for the y-axis in the average rents chart. 

This is an unfinished stacked line chart, headline, subtitle, and newly added labels for the x and y axes.

Data can also be labeled directly in a chart instead of through a chart legend. This makes it easier for the audience to understand data points without having to look up symbols or interpret the color coding in a legend. 

We can add direct labels in the average rents chart. The audience can then identify the data for Oceanside in yellow, the data for Carlsbad in green, and the data for Vista in blue. 

This is an unfinished stacked line chart, headline, subtitle, and newly added labels for the individual data lines.

Annotations that focus

An annotation briefly explains data or helps focus the audience on a particular aspect of the data in a visualization. 

Suppose we want the audience to pay attention to the highest rents in the average rents chart. Annotating the data points that represent the highest average rents will help people focus on those values for each city.

This is a finished chart with headline, subtitle, labels, and newly added annotations for the highest rents in each city.

Guidelines and pro tips

Refer to the following table for recommended guidelines and style checks for headlines, subtitles, labels, and annotations in your data visualizations. Think of these guidelines as guardrails. Sometimes data visualizations can become too crowded or busy. When this happens, the audience can get confused or distracted by elements that aren’t really necessary. The guidelines will help keep your data visualizations simple, and the style checks will help make your data visualizations more elegant.

Headlines

  • Guidelines: Content: briefly describe the data. Length: usually the width of the data frame. Position: above the data.

  • Style checks: Use brief language. Don’t use all caps, italics, acronyms, abbreviations, humor, or sarcasm.

Subtitles

  • Guidelines: Content: clarify context for the data. Length: same as or shorter than the headline. Position: directly below the headline.

  • Style checks: Use a smaller font size than the headline. Don’t use undefined words. Don’t use all caps, bold, italics, acronyms, or abbreviations.

Labels

  • Guidelines: Content: replace the need for legends. Length: usually fewer than 30 characters. Position: next to the data, or below or beside the axes.

  • Style checks: Use only a few words. Use thoughtful color-coding. Use callouts to point to the data. Don’t use all caps, bold, or italics.

Annotations

  • Guidelines: Content: draw attention to certain data. Length: varies, limited by open space. Position: immediately next to the data being annotated.

  • Style checks: Don’t use all caps, bold, italics, or rotated text. Don’t distract viewers from the data.
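A couple of the style checks for labels are mechanical enough to automate. The sketch below flags two of them: text written in all caps, and text that is 30 characters or longer. The guideline says labels are *usually* fewer than 30 characters, so treat the length check as a warning rather than a hard rule. The function name and warning messages are hypothetical.

```python
def label_style_warnings(label):
    """Return style warnings for a chart label, based on two of the label checks."""
    warnings = []
    # Guideline: labels are usually fewer than 30 characters.
    if len(label) >= 30:
        warnings.append("label is 30 characters or longer; consider shortening it")
    # Style check: don't use all caps (isupper() ignores digits and symbols).
    if label.isupper():
        warnings.append("label is in all caps")
    return warnings


for label in ["Average Monthly Rents ($)", "AVERAGE MONTHLY RENTS IN DOLLARS PER CITY"]:
    print(label, label_style_warnings(label))
```

Checks like bold, italics, or "thoughtful color-coding" depend on how the chart is rendered, so they still need a human eye.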

You want to be informative without getting too detailed. To meaningfully communicate the results of your data analysis, use the right visualization components with the right style. In other words, let simplicity and elegance work together to help your audience process the data you are sharing in five seconds or less.

Logging in to Tableau Public

Tableau Public is a free platform to publicly share and explore data visualizations online. Anyone can create visualizations using either Tableau Desktop Professional Edition or the free Tableau Public Edition. With millions of inspiring data visualizations (or “vizzes” as we affectionately call them), anyone can check out vizzes about an array of public data topics, encouraging growth within the community. 

In this reading, we will discuss how you can create a profile for using Tableau Public. We will also introduce you to some of the existing public data galleries available to you. Finally, we will end the reading with a list of resources that you can use to continue to learn about Tableau on your own. 

Creating a Tableau Public profile

Coming up, you are going to be using Tableau Public to explore data visualizations yourself. But first, you are going to learn how to sign up for a Tableau Public profile and how to access the Google Career Certificates Gallery. This will give you access to the data visualizations created in the lesson videos. Keep in mind that once you create a profile, you can use it to access both Tableau Public and Tableau Desktop.

To get started, go to the Tableau Public home page at public.tableau.com. Once you navigate to that page, you can create your account by clicking the Sign Up button in the top-right corner of the screen. A pop-up dialog box will appear asking you for basic profile information. Enter the requested information and click Create My Profile once the button becomes available. If the button doesn’t become available, you may have missed a field that needs to be filled out.

screenshot of the Create a Profile menu on Tableau Public

Once you have created your account, you will be able to explore public datasets and check out other creators’ work. 

Visualization galleries 

One of the coolest features of Tableau Public is the public gallery, where you can explore what visualizations other people have created. In addition, you have the option to explore the data behind the visualizations, as well as download visualizations that you may want to explore in detail later on. You can find the gallery from the header on the home page, or use the search function, which appears as a magnifying glass icon, to explore data and vizzes about particular topics.

screenshot of the Tableau Public menu

Here are a few useful links within Tableau Public:

  • Public Gallery: These are data visualizations created by other users that you can scroll through. 

  • Featured Gallery: This is a collection of featured data visualizations created by other users. This is a great source of inspiration.

  • Viz of the Day: Tableau Public features a new data viz every day; check back for new visualizations daily!

  • Google Career Certificates page on Tableau Public: This gallery contains all of the visualizations created in the video lessons; you can explore these examples more here. 

  • Tableau Public resources page: This links to the resources page, including some how-to videos and sample data.

  • Tableau user forum: Search for answers and connect with other users in the community on the forum page.

Designing a chart in 60 minutes

By now, you understand the principles of design and how to think like a designer. Among the many options of data visualization is creating a chart, which is a graphical representation of data. 

Representing your data with a chart is usually the simplest and most efficient method. Let’s go through the entire process of creating any type of chart in 60 minutes. The goal here is to develop a prototype or mock-up of your chart that you can quickly present to an audience. This will also give you a sense of whether the chart is communicating the information that you want.

Image of pie chart divided into 4 timed sections which equal 60 minutes

Follow this high-level 60-minute chart to guide your thinking whenever you begin working on a data visualization.

Prep (5 min): Create the mental and physical space necessary for an environment of comprehensive thinking. This means allowing yourself room to brainstorm how you want your data to appear while considering the amount and type of data that you have.

Talk and listen (15 min): Identify the objective of your work by getting to the “ask behind the ask” and establishing expectations. Ask questions and concentrate on stakeholders’ feedback about your project to help you decide how to lay out your data.

Sketch and design (20 min): Draft your approach to the problem. Define the timing and output of your work to get a clear and concise idea of what you are crafting.

Prototype and improve (20 min): Generate a visual solution and gauge how accurately it communicates your data. Take your time and repeat the process until a final visual is produced. It is all right if you go through several visuals before you find the right fit.

Key takeaway

This is a great overview you can use when you need to create a visualization in a short amount of time. As you become more experienced in data visualization, you will find yourself creating your own process. You will get a more detailed description of different visualization options in the next reading, including line charts, bar charts, scatter plots, and more. No matter what you choose, always remember to take the time to prep, identify your objective, take in feedback, design, and create.


The wonderful world of visualizations

As a data analyst, you will often be tasked with relaying information and data that your audience might not readily understand. Presenting your data visually is an effective way to communicate complex information and engage your stakeholders. One question to ask yourself is: “What is the best way to tell the story within my data?” This reading includes several options for you to choose from (although there are many more).

Line chart 

A line chart is used to track changes over short and long periods of time. When smaller changes exist, line charts are better to use than bar graphs. Line charts can also be used to compare changes over the same period of time for more than one group. 

Let’s say you want to present the graduation rate for a particular high school from 2008 to 2012. You would input your data in a table like this:

Year    Graduation rate (%)
2008    87
2009    89
2010    92
2011    92
2012    96

From this table, you are able to present your data in a line chart like this:

This is a line graph depicting the percent rate of high school graduation over the years from 2008 to 2012

Maybe your data is more specific than above. For example, let’s say you are tasked with presenting the difference in graduation rates between male and female students. Then your chart would resemble something like this:

This is a line graph of percent rate of high school graduation for male and female students over the years from 2008 to 2012
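The year-to-year changes in the graduation-rate table are small. That is the situation where this reading recommends a line chart over a bar graph. As a minimal sketch in Python (standard library only; the function name `year_over_year_changes` is hypothetical), you can compute those changes directly from the table's values:

```python
def year_over_year_changes(rates):
    """Return the change in graduation rate between consecutive years.

    rates: dict mapping year -> graduation rate (percent).
    Result: dict mapping each later year -> change from the previous year.
    """
    years = sorted(rates)
    return {later: rates[later] - rates[earlier]
            for earlier, later in zip(years, years[1:])}


# Values copied from the graduation-rate table in this reading.
graduation_rate = {2008: 87, 2009: 89, 2010: 92, 2011: 92, 2012: 96}

for year, change in year_over_year_changes(graduation_rate).items():
    print(f"{year}: {change:+d} percentage points")
```

Every change falls between 0 and 4 percentage points. A line chart makes small changes like these easy to follow.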

Column chart 

Column charts use size to contrast and compare two or more values, using height or length to represent the specific values.

The following example data shows vehicle sales over five months:

Month        Vehicles sold
August       2,800
September    3,700
October      3,750
November     4,300
December     4,600

Visually, it would resemble something like this:

A bar graph of monthly car sales from August to December

What would this column chart look like if we wanted to add the sales data for a competing car brand?

A double bar graph of monthly car sales for two different car brands from August to December

Heatmap 

Like bar charts, heatmaps compare categories in a dataset, but they represent values with color rather than with length or height. They are mainly used to show relationships between two variables, using a system of color-coding to represent different values. The following heatmap plots temperature changes for each city during the hottest and coldest months of the year.

A heatmap of varying climates for different cities around the world between June to January
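A heatmap's color-coding is essentially a rule that turns a number into a color. The sketch below shows one simple rule: a linear scale from blue (lowest value) to red (highest value). The function name, the blue and red endpoints, and the sample temperatures are illustrative assumptions. They do not describe how the heatmap above, or any particular charting tool, was built.

```python
def value_to_rgb(value, vmin, vmax):
    """Map a value to an (R, G, B) color on a linear blue-to-red scale.

    vmin maps to pure blue (0, 0, 255) and vmax to pure red (255, 0, 0);
    values outside [vmin, vmax] are clamped to the nearest end of the scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)  # position on the scale, 0.0 to 1.0
    t = min(max(t, 0.0), 1.0)
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical temperatures (degrees C) for one city, on a -10 to 40 scale.
for month, temp in [("January", -5), ("June", 30)]:
    print(month, temp, value_to_rgb(temp, vmin=-10, vmax=40))
```

Using the same `vmin` and `vmax` for every cell keeps colors comparable across cities and months, which is what lets the eye read the heatmap as a whole.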

Pie chart

A pie chart is a circular graph divided into segments, each sized in proportion to the quantity it represents. Pie charts are especially useful for showing parts of a whole.

For example, let’s say you are determining favorite movie categories among avid movie watchers. You have gathered the following data:

Movie category    Preference
Comedy            41%
Drama             11%
Sci-fi            3%
Romance           17%
Action            28%

Visually, it would resemble something like this:

A pie chart of five movie categories and percentage of audience preference
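Each slice of a pie chart gets a share of the full 360-degree circle equal to its share of the total. As a short Python sketch (the function name `pie_slice_angles` is hypothetical), here is that arithmetic applied to the movie-preference data from the table:

```python
def pie_slice_angles(shares):
    """Convert each category's share into the angle of its pie slice, in degrees.

    shares: mapping of category -> share. Any positive numbers work;
    each slice gets share / total * 360, so percentages need not sum to 100.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must add up to a positive number")
    return {name: value / total * 360 for name, value in shares.items()}


# Values copied from the movie-preference table in this reading.
movie_preferences = {"Comedy": 41, "Drama": 11, "Sci-fi": 3, "Romance": 17, "Action": 28}

for category, angle in pie_slice_angles(movie_preferences).items():
    print(f"{category:<8} {angle:6.1f} degrees")
```

Comedy's 41% becomes a 147.6-degree slice and Sci-fi's 3% becomes a 10.8-degree slice. Very small shares turn into thin slivers, which is one reason pie charts work best with a handful of categories.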

Scatter plot

Scatter plots show relationships between different variables. Scatter plots are typically used for two variables for a set of data, although additional variables can be displayed.

For example, you might want to show the relationship between temperature changes and ice cream sales. It would resemble something like this:

A scatterplot showing the rising sales of ice cream as the temperature rises

As you may notice, the higher the temperature got, the more demand there was for ice cream – so the scatter plot is great for showing the relationship between the two variables.
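A scatter plot shows a relationship visually. If you also want a single number for how strongly two variables move together in a straight-line pattern, a common choice is the Pearson correlation coefficient. This reading doesn't cover it; it is added here as a complement to the chart. Below is a minimal Python sketch. The function name and the temperature and sales figures are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Returns a value from -1 (perfect negative linear relationship)
    to +1 (perfect positive linear relationship).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


temps_c = [18, 21, 24, 27, 30, 33]               # hypothetical daily highs
cones_sold = [120, 135, 160, 190, 205, 240]      # hypothetical daily sales
print(round(pearson_r(temps_c, cones_sold), 3))
```

A value close to +1 means the points rise together, as in the ice cream chart. A value close to -1 means one variable falls as the other rises. A value near 0 means there is no straight-line relationship, though a curved pattern might still show up on the scatter plot.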

Distribution graph

A distribution graph displays the spread of various outcomes in a dataset.

Let’s apply this to an example. To plan for supplies, a brand-new coffee shop owner wants to measure how many cups of coffee their customers consume, and whether that amount depends on the day and time of the week. That distribution graph would resemble something like this:

A histogram showing cups of coffee purchased across all 7 days of the week

From this distribution graph, you may notice that the amount of coffee sales steadily increases from the beginning of the week, reaching the highest point mid-week, and then decreases towards the end of the week.

If outcomes are categorized on the x-axis by distinct numeric values (or ranges of numeric values), the distribution graph becomes a histogram. For example, using data from a customer rewards program, the owner could count how many customers drink between one and ten cups of coffee per week. The histogram would have ten columns, one for each number of cups, and the height of each column would indicate the number of customers who drink that many cups per week.
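The counting step behind that coffee histogram can be sketched in a few lines of Python. The sketch assumes you have one number per customer: the cups that customer bought in a week. The function name and the sample data are hypothetical.

```python
from collections import Counter


def cups_histogram(weekly_cups, low=1, high=10):
    """Count how many customers fall into each cups-per-week column.

    weekly_cups: one integer per customer (cups consumed in a week).
    Returns a dict mapping each column value low..high to its customer count;
    values outside the range are ignored.
    """
    counts = Counter(c for c in weekly_cups if low <= c <= high)
    # Include empty columns so all ten appear, even with height 0.
    return {cups: counts.get(cups, 0) for cups in range(low, high + 1)}


weekly_cups = [2, 5, 5, 7, 3, 5, 10, 1, 7, 5]  # hypothetical rewards-program data
for cups, customers in cups_histogram(weekly_cups).items():
    print(f"{cups:>2} cups | {'#' * customers}")
```

Each printed row is one column of the histogram turned on its side, and the number of `#` marks is the column's height.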

As you review these visual examples, consider which ones fit your type of data. One way to answer this is by evaluating patterns in your data. Meaningful patterns can take many forms, such as:

  • Change: This is a trend or instance of observations that become different over time. A great way to measure change in data is through a line or column chart.

  • Clustering: A collection of data points with similar or different values. This is best represented through a distribution graph.

  • Relativity: These are observations considered in relation or in proportion to something else. You have probably seen examples of relativity data in a pie chart.

  • Ranking: This is a position in a scale of achievement or status. Data that requires ranking is best represented by a column chart.

  • Correlation: This shows a mutual relationship or connection between two or more things. A scatter plot is an excellent way to represent this type of data pattern.

Studying your data

Data analysts are tasked with collecting and interpreting data as well as displaying data in a meaningful and digestible way. Determining how to visualize your data will require studying your data’s patterns and converting it using visual cues. Feel free to practice your own charts and data in spreadsheets. Simply input your data in the spreadsheet, highlight it, then insert any chart type and view how your data can be visualized based on what you choose.

Data grows on decision trees

With so many visualization options out there for you to choose from, how do you decide what is the best way to represent your data? 

A decision tree is a decision-making tool that allows you, the data analyst, to make decisions based on key questions that you can ask yourself. Each question in the visualization decision tree will help you make a decision about critical features for your visualization. Below is an example of a basic decision tree to guide you towards making a data-driven decision about which visualization is the best way to tell your story. Please note that there are many different types of decision trees that vary in complexity, and can provide more in-depth decisions. 

A decision tree leading to the best chart

Begin with your story

Start off by evaluating the type of data you have, and go through a series of questions to determine the best visual:

  • Does your data have only one numeric variable? If your data has one continuous numerical variable, then a histogram or density plot is the best way to plot it. Depending on your type of data, a bar chart can also be appropriate. For example, if you have data on the height of a group of students, you will want to use a histogram to visualize how many students fall into each height range:

A histogram measuring the height of students and how many students are in each height grouping

  • Are there multiple datasets? For cases dealing with more than one set of data, consider a line or pie chart for accurate representation of your data. A line chart draws each dataset as its own continuous line, showing how the numbers have changed over time. A pie chart is good for dividing a whole into multiple categories or parts. An example of this is when you are measuring quarterly sales figures of your company. Below are examples of this data plotted on both a line and pie chart.

A line graph measuring quarterly sales figures throughout the 1st, 2nd, 3rd, and 4th quarters

A pie chart measuring quarterly sales figures for the 1st quarter (55.6%), 2nd quarter (30.1%), 3rd quarter (7.8%), and 4th quarter (6.5%)

  • Are you measuring changes over time? A line chart is usually adequate for plotting trends over time. However, when the changes are larger, a bar chart is the better option. If, for example, you are measuring the number of visitors to NYC over the past 6 months, the data would look like this:

A bar graph measuring number of visitors over the months of June to November

  • Do relationships between the data need to be shown? When you have two variables for one set of data, it is important to point out how one affects the other. Variables that pair well together are best plotted on a scatter plot. However, if there are too many data points, the relationship between variables can be obscured, so a heatmap can be a better representation in that case. For example, if you are measuring the population of people across all 50 states in the United States, your data points would number in the millions, so you would use a heatmap. If you are simply trying to show the relationship between the time spent studying and its effect on grades, your data would look like this:

A scatterplot measuring the rise in test scores corresponding with the increase of minutes spent studying
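The four questions above can be written as a small function that returns the first matching suggestion. This Python sketch is a simplification of the basic tree in this reading, not one of the larger decision trees mentioned below. The function name, its yes/no parameters, and the choice to ask the questions in the order listed are all assumptions. "Many points" is left as a judgment call rather than a fixed threshold.

```python
def choose_chart(
    one_numeric_variable=False,
    multiple_datasets=False,
    parts_of_whole=False,
    change_over_time=False,
    large_changes=False,
    show_relationship=False,
    many_points=False,
):
    """Walk the four decision-tree questions in order.

    Each argument is a yes/no answer. The first question answered "yes"
    decides the chart; the follow-up flags choose between its two options.
    """
    if one_numeric_variable:
        return "histogram"  # a density plot is the other option named
    if multiple_datasets:
        return "pie chart" if parts_of_whole else "line chart"
    if change_over_time:
        return "bar chart" if large_changes else "line chart"
    if show_relationship:
        return "heatmap" if many_points else "scatter plot"
    raise ValueError("none of the decision-tree questions applied")


print(choose_chart(change_over_time=True, large_changes=True))
print(choose_chart(show_relationship=True))
```

Real datasets often answer "yes" to more than one question, so treat the result as a starting point for the prototype-and-improve step rather than a final answer.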

Additional resources

The decision tree example used in this reading is one of many. There are multiple decision trees out there with varying levels of details that you can use to help guide your visual decisions. If you want more in-depth insight into more visual options, explore the following resources:

  • From data to visualization: This is an excellent analysis of a larger decision tree. With this comprehensive selection, you can search based on the kind of data you have or click on each graphic example for a definition and proper usage.

  • Selecting the best chart: This two-part YouTube video can help take the guesswork out of data chart selection. Depending on the type of data you are aiming to illustrate, you will be guided through when to use, when to avoid, and several examples of best practices. Part 2 of this video provides even more examples of different charts, ensuring that there is a chart for every type of data out there. 

