Data analysis processes
From Coursera: Data Analytics by Google
Origins of the data analysis process
When you decided to join this program, you proved that you are a curious person. So let’s tap into your curiosity and talk about the origins of data analysis. We don’t fully know when or why the first person decided to record data about people and things. But we do know it was useful because the idea is still around today!
We also know that data analysis is rooted in statistics, which has a pretty long history itself. Archaeologists mark the start of statistics in ancient Egypt with the building of the pyramids. The ancient Egyptians were masters of organizing data. They documented their calculations and theories on papyri (paper-like materials), which are now viewed as the earliest examples of spreadsheets and checklists. Today’s data analysts owe a lot to those brilliant scribes, who helped create a more technical and efficient process.
It is time to enter the data analysis life cycle: the process of going from data to decision. Data goes through several phases as it gets created, consumed, tested, processed, and reused. With a life cycle model, all key team members can drive success by planning work both up front and at the end of the data analysis process. The data analysis life cycle is well known among experts, but no single structure of phases is followed by every one of them. Even so, every data analysis process shares some fundamentals. This reading provides an overview of several life cycle models, starting with the process that forms the foundation of the Google Data Analytics Certificate.
The process presented as part of the Google Data Analytics Certificate is one that will be valuable to you as you keep moving forward in your career:
Ask: Business Challenge/Objective/Question
Prepare: Data generation, collection, storage, and data management
Process: Data cleaning/data integrity
Analyze: Data exploration, visualization, and analysis
Share: Communicating and interpreting results
Act: Putting your insights to work to solve the problem
Understanding this process—and all of the iterations that helped make it popular—will be a big part of guiding your own analysis and your work in this program. Let’s go over a few other variations of the data analysis life cycle.
EMC's data analysis life cycle
EMC Corporation's data analytics life cycle is cyclical with six steps:
Discovery
Pre-processing data
Model planning
Model building
Communicate results
Operationalize
EMC Corporation is now Dell EMC. This model, created by David Dietrich, reflects the cyclical nature of real-world projects. The phases aren’t static milestones; each step connects and leads to the next, and eventually repeats. Key questions help analysts test whether they have accomplished enough to move forward, so that teams spend enough time on each phase and don’t start modeling before the data is ready. It is a little different from the data analysis life cycle this program is based on, but it has some core ideas in common: the first phase focuses on discovering and asking questions; data has to be prepared before it can be analyzed and used; and findings should then be shared and acted on.
For more information, refer to The Genesis of EMC's Data Analytics Lifecycle.
SAS's iterative life cycle
An iterative life cycle was created by a company called SAS, a leading data analytics solutions provider. It can be used to produce repeatable, reliable, and predictive results:
Ask
Prepare
Explore
Model
Implement
Act
Evaluate
SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol. The life cycle has seven steps, many of which we have seen in the other models, like Ask, Prepare, Model, and Act. But this life cycle is also a little different: it includes a step after the Act phase, Evaluate, designed to help analysts evaluate their solutions and potentially return to the Ask phase again.
For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.
Project-based data analytics life cycle
A project-based data analytics life cycle has five simple steps:
Identifying the problem
Designing data requirements
Pre-processing data
Performing data analysis
Visualizing data
This data analytics project life cycle was developed by Vignesh Prajapati. It doesn’t include the sixth phase, or what we have been referring to as the Act phase. However, it still covers a lot of the same steps as the life cycles we have already described. It begins with identifying the problem, continues with preparing and processing data before analysis, and ends with data visualization.
For more information, refer to Understanding the data analytics project life cycle.
Big data analytics life cycle
Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle is divided into nine steps:
Business case evaluation
Data identification
Data acquisition and filtering
Data extraction
Data validation and cleaning
Data aggregation and representation
Data analysis
Data visualization
Utilization of analysis results
This life cycle has two to four more steps than the previous life cycle models. But in reality, the authors have just broken down what we have been referring to as Prepare and Process into smaller steps. Their model emphasizes the individual tasks required for gathering, preparing, and cleaning data before the analysis phase.
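To see how the life cycles compare in length, the step counts from the lists in this reading can be put side by side. This is only a minimal sketch: the dictionary keys are shorthand labels chosen for illustration, and the counts come straight from the lists above.

```python
# Step counts taken from the lists in this reading.
# The keys are shorthand labels chosen for this illustration.
step_counts = {
    "Google Data Analytics Certificate": 6,
    "EMC": 6,
    "SAS": 7,
    "Project-based": 5,
    "Big data": 9,
}

big_data_steps = step_counts["Big data"]

# How many more steps the big data life cycle has than each other model.
extra_steps = {
    name: big_data_steps - count
    for name, count in step_counts.items()
    if name != "Big data"
}

for name, extra in extra_steps.items():
    print(f"Big data has {extra} more steps than {name}")
```

The differences are small, which fits the point above: the longer model mostly splits the preparation and processing work into finer steps.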
From issue to action: The six data analysis phases
There are six data analysis phases that will help you make seamless decisions: ask, prepare, process, analyze, share, and act. Keep in mind, these are different from the data life cycle, which describes the changes data goes through over its lifetime. Let’s walk through the steps to see how they can help you solve problems you might face on the job.

Step 1: Ask
It’s impossible to solve a problem if you don’t know what it is. These are some things to consider:
Define the problem you’re trying to solve
Make sure you fully understand the stakeholder’s expectations
Focus on the actual problem and avoid any distractions
Collaborate with stakeholders and keep an open line of communication
Take a step back and see the whole situation in context
Questions to ask yourself in this step:
What are my stakeholders saying their problems are?
Now that I’ve identified the issues, how can I help the stakeholders resolve their questions?

Step 2: Prepare
You will decide what data you need to collect in order to answer your questions and how to organize it so that it is useful. You might use your business task to decide:
What metrics to measure
Where to locate data in your database
What security measures to create to protect that data
Questions to ask yourself in this step:
What do I need in order to figure out how to solve this problem?
What research do I need to do?

Step 3: Process
Clean data is the best data, and you will need to clean up your data to get rid of any possible errors, inaccuracies, or inconsistencies. This might mean:
Using spreadsheet functions to find incorrectly entered data
Using SQL functions to check for extra spaces
Removing repeated entries
Checking as much as possible for bias in the data
Questions to ask yourself in this step:
What data errors or inaccuracies might get in my way of getting the best possible answer to the problem I am trying to solve?
How can I clean my data so the information I have is more consistent?
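Two of the cleaning tasks listed in this step, checking for extra spaces with SQL functions and removing repeated entries, can be sketched in a few lines. This is a minimal illustration, not a complete cleaning workflow: it assumes Python's built-in sqlite3 module, and the customers table and all of its values are invented for the example.

```python
import sqlite3

# Hypothetical sample data, invented for illustration.
rows = [
    ("Ana", "Lisbon"),
    ("  Ana ", "Lisbon"),  # same entry with extra spaces
    ("Ben", "Oslo"),
    ("Ben", "Oslo"),       # exact repeat
    ("Chen ", " Taipei"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)

# Find rows whose values change when trimmed, i.e. rows with extra spaces.
padded = conn.execute(
    "SELECT name, city FROM customers "
    "WHERE name <> TRIM(name) OR city <> TRIM(city)"
).fetchall()

# Trim every value, then keep one copy of each distinct row.
cleaned = conn.execute(
    "SELECT DISTINCT TRIM(name), TRIM(city) FROM customers ORDER BY 1"
).fetchall()
conn.close()

print(len(padded), "rows had extra spaces")
print(cleaned)
```

Trimming before removing repeats matters here: "  Ana " and "Ana" only count as the same entry once the extra spaces are gone.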

Step 4: Analyze
You will want to think analytically about your data. At this stage, you might sort and format your data to make it easier to:
Perform calculations
Combine data from multiple sources
Create tables with your results
Questions to ask yourself in this step:
What story is my data telling me?
How will my data help me solve this problem?
Who needs my company’s product or service? What type of person is most likely to use it?
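The three tasks listed in this step (performing calculations, combining data from multiple sources, and creating tables with the results) often happen together. The sketch below shows one possible way to do that with plain Python. The orders and regions data are hypothetical and stand in for two separate sources.

```python
from collections import defaultdict

# Hypothetical data from two sources (all values invented for illustration).
orders = [  # source 1: an orders export
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 35.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 3, "amount": 50.0},
]
regions = {1: "North", 2: "South", 3: "North"}  # source 2: customer lookup

# Combine the two sources and calculate the total and count per region.
totals = defaultdict(float)
counts = defaultdict(int)
for order in orders:
    region = regions[order["customer_id"]]
    totals[region] += order["amount"]
    counts[region] += 1

# Build a small results table, sorted by region name.
table = [
    (region, counts[region], totals[region], totals[region] / counts[region])
    for region in sorted(totals)
]

print(f"{'Region':<8}{'Orders':>8}{'Total':>8}{'Average':>9}")
for region, n, total, avg in table:
    print(f"{region:<8}{n:>8}{total:>8.2f}{avg:>9.2f}")
```

In a spreadsheet, the same idea would typically be a lookup to bring in the region followed by a pivot table.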

Step 5: Share
Everyone shares their results differently, so be sure to summarize your results with clear and enticing visuals of your analysis, using data visualization tools to build graphs or dashboards. This is your chance to show the stakeholders you have solved their problem and how you got there. Sharing will help your team:
Make better decisions
Make more informed decisions
Achieve stronger outcomes
Successfully communicate your findings
Questions to ask yourself in this step:
How can I make what I present to the stakeholders engaging and easy to understand?
What would help me understand this if I were the listener?

Step 6: Act
Now it’s time to act on your data. You will take everything you have learned from your data analysis and put it to use. This could mean providing your stakeholders with recommendations based on your findings so they can make data-driven decisions.
Questions to ask yourself in this step:
How can I use the feedback I received during the share phase (step 5) to actually meet the stakeholder’s needs and expectations?
These six steps can help you to break the data analysis process into smaller, manageable parts, which is called structured thinking. This process involves four basic activities:
Recognizing the current problem or situation
Organizing available information
Revealing gaps and opportunities
Identifying your options
When you are starting out in your career as a data analyst, it is normal to feel pulled in a few different directions with your role and expectations. Following processes like the ones outlined here and using structured thinking skills can help you get back on track, fill in any gaps, and work out exactly what you need.