More on R

More about tibbles

In this reading, you will learn about tibbles, which are a super useful tool for organizing data in R. You will get a review of what tibbles are, how they differ from standard data frames, and how to create them in R. 
Tibbles 
Tibbles are a little different from standard data frames. A data frame is a collection of columns, like a spreadsheet or a SQL table. Tibbles are like streamlined data frames that are automatically set to pull up only the first 10 rows of a dataset, and only as many columns as can fit on the screen. This is really useful when you’re working with large sets of data. Unlike data frames, tibbles never change the names of your variables, or the data types of your inputs. Overall, you can make more changes to data frames, but tibbles are easier to use. The tibble package is part of the core tidyverse. So, if you’ve already installed the tidyverse, you have what you need to start working with tibbles. 
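The point about variable names can be checked with a small comparison. This is a minimal sketch that assumes the tibble package is installed; the column name `my var` is made up for illustration.

```r
library(tibble)

# data.frame() repairs non-syntactic names by default (check.names = TRUE)
df <- data.frame(`my var` = 1:3)
names(df)   # "my.var"

# tibble() keeps the name exactly as it was written
tb <- tibble(`my var` = 1:3)
names(tb)   # "my var"
```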
Creating tibbles
Now, let’s go through an example of how to create a tibble in R. You can use the pre-loaded diamonds dataset that you’re familiar with from earlier videos. As a reminder, the diamonds dataset includes information about different diamond qualities, like carat, cut, color, clarity, and more.
You can load the dataset with the data() function using the following code:
library(tidyverse) 
data(diamonds)
Then, let’s add the data frame to our data viewer in RStudio with the View() function. 
View(diamonds)
The dataset has 10 columns and thousands of rows. This image displays part of the data frame:
Now let’s create a tibble from the same dataset. You can create a tibble from existing data with the as_tibble() function. Indicate what data you’d like to use in the parentheses of the function. In this case, you will put the word “diamonds.”
as_tibble(diamonds)
Results
When you run the function, you get a tibble of the diamonds dataset.
While RStudio’s built-in data frame tool returns thousands of rows in the diamonds dataset, the tibble only returns the first 10 rows in a neatly organized table. That makes it easier to view and print. 
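As a rough illustration of the printing behavior, the sketch below converts the built-in mtcars data frame to a tibble. It uses mtcars instead of diamonds only so that the tibble package is the single dependency, and it assumes that package is installed.

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)   # note: as_tibble() drops row names by default

class(cars_tbl)                 # includes "tbl_df"
print(cars_tbl)                 # shows the first 10 rows plus column types
print(cars_tbl, n = 5)          # ask for a different number of rows explicitly
```

The n argument is handy when 10 rows is too many or too few for what you want to check.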
Additional resources
For more information on tibbles, check out the following resources:
The entry for Tibble in the tidyverse documentation summarizes what a tibble is and how it works in R code. If you want a quick overview of the essentials, this is the place to go. 
The Tidy chapter in "A Tidyverse Cookbook" is a great resource if you want to learn more about how to work with tibbles using R code. The chapter explores a variety of R functions that can help you create and transform tibbles to organize and tidy your data. 
Data-import basics
The data() function 
The default installation of R comes with a number of preloaded datasets that you can practice with. This is a great way to develop your R skills and learn about some important data analysis functions. Plus, many online resources and tutorials use these sample datasets to teach coding concepts in R. 
You can use the data() function to load these datasets in R. If you run the data function without an argument, R will display a list of the available datasets. 
data()
This includes the list of preloaded datasets from the datasets package.
If you want to load a specific dataset, just enter its name in the parentheses of the data() function. For example, let’s load the mtcars dataset, which has information about cars that have been featured in past issues of Motor Trend magazine. 
data(mtcars)
When you run the function, R will load the dataset. The dataset will also appear in the Environment pane of your RStudio. The Environment pane displays the names of the data objects, such as data frames and variables, that you have in your current workspace. In this image, mtcars appears in the fifth row of the pane. R tells us that it contains 32 observations and 11 variables. 
Now that the dataset is loaded, you can get a preview of it in the R console pane. Just type its name...
mtcars
...and then press Ctrl (or Cmd) and Enter.
You can also display the dataset by clicking directly on the name of the dataset in the Environment pane. So, if you click on mtcars in the Environment pane, R automatically runs the View() function and displays the dataset in the RStudio data viewer. 
Try experimenting with other datasets in the list if you want some more practice. 
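A few base R functions give a quick summary of a loaded dataset without opening the data viewer. The sketch below uses mtcars, but the same calls should work for any data frame in the list.

```r
data(mtcars)      # load the built-in dataset into the workspace

dim(mtcars)       # 32 rows (observations) and 11 columns (variables)
head(mtcars, 3)   # first three rows
str(mtcars)       # column names and data types
```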
The readr package
In addition to using R’s built-in datasets, it is also helpful to import data from other sources to use for practice or analysis. The readr package in R is a great tool for reading rectangular data. Rectangular data is data that fits nicely inside a rectangle of rows and columns, with each column referring to a single variable and each row referring to a single observation.
Here are some examples of file types that store rectangular data:
.csv (comma separated values): a .csv file is a plain text file that contains a list of data. These files mostly use commas to separate (or delimit) data, but sometimes they use other characters, like semicolons.
.tsv (tab separated values): a .tsv file stores a data table in which the columns of data are separated by tabs, for example, data from a database table or a spreadsheet.
.fwf (fixed width files): a .fwf file has a specific format that allows for the saving of textual data in an organized fashion. 
.log: a .log file is a computer-generated file that records events from operating systems and other software programs.
Base R also has functions for reading files, but the equivalent functions in readr are typically much faster. They also produce tibbles, which are easy to use and read. 
The readr package is part of the core tidyverse. So, if you’ve already installed the tidyverse, you have what you need to start working with readr. If not, you can install the tidyverse now. 
readr functions
The goal of readr is to provide a fast and friendly way to read rectangular data. readr supports several read_ functions. Each function refers to a specific file format.
read_csv(): comma-separated values (.csv) files
read_tsv(): tab-separated values files
read_delim(): general delimited files
read_fwf(): fixed-width files
read_table(): tabular files where columns are separated by white-space
read_log(): web log files
These functions all have similar syntax, so once you learn how to use one of them, you can apply your knowledge to the others. This reading will focus on the read_csv() function, since .csv files are one of the most common forms of data storage and you will work with them frequently.
In most cases, these functions will work automatically: you supply the path to a file, run the function, and you get a tibble that displays the data in the file. Behind the scenes, readr parses the overall file and specifies how each column should be converted from a character vector to the most appropriate data type. 
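To see the type guessing at work, here is a minimal sketch that reads a few lines of made-up CSV text instead of a file. It assumes readr 2.0 or later, where I() marks a string as literal data; the column names and values are invented for illustration.

```r
library(readr)

# I() tells readr the string is the data itself, not a file path
csv_text <- I("id,name,score\n1,Ana,9.5\n2,Ben,7.25\n")

scores <- read_csv(csv_text, show_col_types = FALSE)

spec(scores)            # the column specification readr guessed
sapply(scores, class)   # id and score are numeric, name is character
```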
Reading a .csv file with readr
The readr package comes with some sample files from built-in datasets that you can use for example code. To list the sample files, you can run the readr_example() function with no arguments.
readr_example()
[1] "challenge.csv"     "epa78.txt"         "example.log"      
[4] "fwf-sample.txt"    "massey-rating.txt" "mtcars.csv"       
[7] "mtcars.csv.bz2"    "mtcars.csv.zip"
The “mtcars.csv” file refers to the mtcars dataset that was mentioned earlier. Let’s use the read_csv() function to read the “mtcars.csv” file, as an example. In the parentheses, you need to supply the path to the file. In this case, it’s readr_example("mtcars.csv").
read_csv(readr_example("mtcars.csv"))
When you run the function, R prints out a column specification that gives the name and type of each column. 
R also prints a tibble. 
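If the guessed types are not what you want, you can state them yourself. The sketch below is one possible approach, using the col_types argument to read the cyl column as an integer while readr guesses the rest; show_col_types = FALSE simply hides the printed column specification.

```r
library(readr)

path <- readr_example("mtcars.csv")

# Let readr guess every column type, without printing the specification
cars <- read_csv(path, show_col_types = FALSE)

# State one column type explicitly and let readr guess the others
cars_int <- read_csv(path, col_types = cols(cyl = col_integer()))

class(cars$cyl)       # "numeric"
class(cars_int$cyl)   # "integer"
```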
------------------------------------------------------------------------------------------------------
Optional: the readxl package
To import spreadsheet data into R, you can use the readxl package. The readxl package makes it easy to transfer data from Excel into R. It supports both the legacy .xls file format and the modern XML-based .xlsx file format.
The readxl package is part of the tidyverse but is not a core tidyverse package, so you need to load readxl in R by using the library() function.  
library(readxl)
Reading an Excel file with readxl
Like the readr package, readxl comes with some sample files from built-in datasets that you can use for practice. You can run the code readxl_example() to see the list.
You can use the read_excel() function to read a spreadsheet file just like you used the read_csv() function to read a .csv file. The code for reading the example file “type-me.xlsx” includes the path to the file in the parentheses of the function.
read_excel(readxl_example("type-me.xlsx"))
You can use the excel_sheets() function to list the names of the individual sheets. 
excel_sheets(readxl_example("type-me.xlsx"))
[1] "logical_coercion" "numeric_coercion" "date_coercion" "text_coercion"
You can also specify a sheet by name or number.  Just type “sheet =” followed by the name or number of the sheet. For example, you can use the sheet named “numeric_coercion” from the list above. 
read_excel(readxl_example("type-me.xlsx"), sheet = "numeric_coercion")
When you run the function, R returns a tibble of the sheet. 
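Selecting a sheet by number works the same way as selecting it by name. As a small sketch, assuming the readxl package is installed, the second sheet of the example file is the one named “numeric_coercion”, so both calls below read the same data.

```r
library(readxl)

path <- readxl_example("type-me.xlsx")

by_name   <- read_excel(path, sheet = "numeric_coercion")
by_number <- read_excel(path, sheet = 2)   # second entry in excel_sheets(path)

all.equal(by_name, by_number)
```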
Additional resources
If you want to learn how to use readr functions to work with more complex files, check out the Data Import chapter of the R for Data Science book. It explores some of the common issues you might encounter when reading files, and how to use readr to manage those issues.
The readxl entry in the tidyverse documentation gives a good overview of the basic functions in readxl, provides a detailed explanation of how the package operates and the coding concepts behind it, and offers links to other useful resources.
The R "datasets" package contains lots of useful preloaded datasets. Check out The R Datasets Package for a list. The list includes links to detailed descriptions of each dataset.
              
tidyverse.org/packages

More on R operators
You might remember that an operator is a symbol that identifies the type of operation or calculation to be performed in a formula. In an earlier video, you learned how to use the assignment and arithmetic operators to assign variables and perform calculations. In this reading, you will review a detailed summary of the main types of operators in R, and learn how to use specific operators in R code. 
Operators
In R, there are four main types of operators:
Arithmetic
Relational 
Logical
Assignment 
Review the specific operators in each category and check out some examples of how to use them in R code.
Arithmetic operators
Arithmetic operators let you perform basic math operations like addition, subtraction, multiplication, and division.
The table below summarizes the different arithmetic operators in R. The examples used in the table are based on the creation of two variables: x equals 2 and y equals 5. Note that you use the assignment operator to store these values:
x <- 2
y <- 5 
Operator  Description                                                 Example code  Result/Output
+         Addition                                                    x + y         [1] 7
-         Subtraction                                                 x - y         [1] -3
*         Multiplication                                              x * y         [1] 10
/         Division                                                    x / y         [1] 0.4
%%        Modulus (returns the remainder after division)              y %% x        [1] 1
%/%       Integer division (returns an integer value after division)  y %/% x       [1] 2
^         Exponent                                                    y ^ x         [1] 25
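Integer division and modulus are two halves of the same calculation, which the short sketch below illustrates with the same x and y values used in the table.

```r
x <- 2
y <- 5

quotient  <- y %/% x   # 2: how many whole times x fits into y
remainder <- y %% x    # 1: what is left over

# Multiplying back and adding the remainder recovers the original value
quotient * x + remainder == y   # TRUE
```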

Relational operators
Relational operators, also known as comparators, allow you to compare values. Relational operators identify how one R object relates to another—like whether an object is less than, equal to, or greater than another object. The output for relational operators is either TRUE or FALSE (which is a logical data type, or boolean).
The table below summarizes the six relational operators in R. The examples used in the table are based on the creation of two variables: x equals 2 and y equals 5. Note that you use the assignment operator to store these values.
x <- 2
y <- 5
If you perform calculations with each operator, you get the following results. In this case, the output is boolean: TRUE or FALSE. Note that the [1] that appears before each output is used to represent how output is displayed in RStudio.
Operator  Description               Example code  Result/Output
<         Less than                 x < y         [1] TRUE
>         Greater than              x > y         [1] FALSE
<=        Less than or equal to     x <= 2        [1] TRUE
>=        Greater than or equal to  y >= 10       [1] FALSE
==        Equal to                  y == 5        [1] TRUE
!=        Not equal to              x != 2        [1] FALSE
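Relational operators also work on whole vectors, comparing each element in turn. The sketch below uses a made-up vector of temperatures to show how the TRUE and FALSE results can be counted or used to pick out values.

```r
temps <- c(58, 72, 81, 66)   # hypothetical values for illustration

temps > 70          # FALSE  TRUE  TRUE FALSE
sum(temps > 70)     # 2, because TRUE counts as 1 and FALSE as 0
temps[temps > 70]   # 72 81
```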

Logical operators
Logical operators allow you to combine logical values. Logical operators return a logical data type or boolean (TRUE or FALSE). You encountered logical operators in an earlier reading, Logical operators and conditional statements, but here is a quick refresher. 
The table below summarizes the logical operators in R.
Operator  Description
&         Element-wise logical AND
&&        Logical AND
|         Element-wise logical OR
||        Logical OR
!         Logical NOT

Next, check out some examples of how logical operators work in R code. 
Element-wise logical AND (&) and OR (|)
You can illustrate logical AND (&) and OR (|) by comparing numerical values. Create a variable x that is equal to 10. 
x <- 10
The AND operator returns TRUE only if both individual values are TRUE. 
x > 2 & x < 12
[1] TRUE
10 is greater than 2 and 10 is less than 12. So, the operation evaluates to TRUE. 
The OR operator (|) works in a similar way to the AND operator (&). The main difference is that just one of the values of the OR operation needs to be TRUE for the entire OR operation to evaluate to TRUE. Only if both values are FALSE will the entire OR operation evaluate to FALSE.
Now try an example with the same variable (x <- 10): 
x > 2 | x < 8
[1] TRUE
10 is greater than 2, but 10 is not less than 8. But since at least one of the values (10>2) is TRUE, the OR operation evaluates to TRUE. 
Logical AND (&&)  and OR (||)
The main difference between element-wise logical operators (&, |) and logical operators (&&, ||) is the way they apply to operations with vectors. The operations with single signs, AND (&) and OR (|), examine all the elements of each vector. The operations with double signs, AND (&&) and OR (||), are meant for single logical values: older versions of R examined only the first element of each vector, and since R 4.3.0 supplying a vector with more than one element is an error.
For example, imagine you are working with two vectors that each contain three elements: c(3, 5, 7) and c(2, 4, 6). The element-wise logical AND (&) will compare the first element of the first vector with the first element of the second vector (3&2), the second element with the second element (5&4), and the third element with the third element (7&6).
Now check out this example in R code. 
First, create two variables, x and y, to store the two vectors:
x <- c(3, 5, 7)
y <- c(2, 4, 6)
Then run the code with a single ampersand (&). The output is boolean (TRUE or FALSE).
x < 5 & y < 5
[1]  TRUE FALSE FALSE
When you compare each element of the two vectors, the output is TRUE, FALSE, FALSE. The first element of both x (3) and y (2) is less than 5, so this is TRUE. The second element of x is not less than 5 (it’s equal to 5) but the second element of y is less than 5, so this is FALSE (because you used AND). The third element of both x and y is not less than 5, so this is also FALSE.
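Now, try the double ampersand (&&). This operator expects a single logical value on each side. In versions of R before 4.3.0, running x < 5 && y < 5 examined only the first element of each vector (3 and 2) and returned TRUE. Since R 4.3.0, the same code produces an error because each side has three elements. To compare only the first elements, select them explicitly:
x[1] < 5 && y[1] < 5
[1] TRUE
The output is TRUE because 3 and 2 are both less than 5.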
Depending on the type of work you do, you might make use of single sign operators more often than double sign operators. But it is helpful to know how all of the operators work regardless. 
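When you need a single TRUE or FALSE from a vector comparison, any() and all() are a common way to collapse the element-wise result. The sketch below reuses the x and y vectors from this section; it is one possible pattern, not the only one.

```r
x <- c(3, 5, 7)
y <- c(2, 4, 6)

both <- x < 5 & y < 5   # element-wise: TRUE FALSE FALSE

any(both)               # TRUE: at least one pair passes
all(both)               # FALSE: not every pair passes

# && works on single values, so pick the elements explicitly
x[1] < 5 && y[1] < 5    # TRUE
```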
Logical NOT (!)
The NOT operator simply negates the logical value, and evaluates to its opposite. In R, zero is considered FALSE and all non-zero numbers are considered TRUE. 
For example, apply the NOT operator to your variable (x <- 10):  
!(x < 15)
[1] FALSE
The NOT operation evaluates to FALSE because it takes the opposite logical value of the statement x < 15, which is TRUE (10 is less than 15).
Assignment operators
Assignment operators let you assign values to variables.
In many programming languages you can just use the equal sign (=) to assign a variable. For R, the best practice is to use the arrow assignment (<-). Technically, the single arrow assignment can be used in the left or right direction. But the rightward assignment is not generally used in R code.
You can also use the double arrow assignment, known as a scoping assignment. But the scoping assignment is for advanced R users, so you won’t learn about it in this reading. 
The table below summarizes the assignment operators and example code in R. Notice that the output for each variable is its assigned value.
After running each line of example code, typing x generates the output shown in the last column.
Operator  Description                      Example code  Result/Output
<-        Leftwards assignment             x <- 2        [1] 2
<<-       Leftwards assignment (scoping)   x <<- 7       [1] 7
=         Leftwards assignment             x = 9         [1] 9
->        Rightwards assignment            11 -> x       [1] 11
->>       Rightwards assignment (scoping)  21 ->> x      [1] 21
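At the top level of a script, the three single-sign forms store a value in the same way. The sketch below uses made-up variable names to show that the results are identical.

```r
first  <- 2    # leftwards arrow (the recommended style)
second = 2     # equal sign
2 -> third     # rightwards arrow

identical(first, second) && identical(second, third)   # TRUE
```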
The operators you learned about in this reading are a great foundation for using operators in R. 
Additional resource
Check out the article about R Operators on the R Coder website for a comprehensive guide to the different types of operators in R. The article includes lots of useful coding examples, and information about miscellaneous operators, the infix operator, and the pipe operator.
Logical operators and conditional statements
Tip: To help you understand how logical operators work, you may want to review the concepts presented in Understanding Boolean logic.
Earlier, you learned that an operator is a symbol that identifies the type of operation or calculation to be performed in a formula. In this reading, you will learn about the main types of logical operators and how they can be used to create conditional statements in R code. 
Logical operators
Logical operators return a logical data type such as TRUE or FALSE.
There are three primary types of logical operators:
AND (sometimes represented as & or && in R)
OR (sometimes represented as | or || in R)
NOT (!)
Review the summarized logical operators below.
AND operator “&”
The AND operator takes two logical values. It returns TRUE only if both individual values are TRUE. This means that TRUE & TRUE evaluates to TRUE. However, FALSE & TRUE, TRUE & FALSE, and FALSE & FALSE all evaluate to FALSE.
If you run the corresponding code in R, you get the following results:

> TRUE & TRUE
[1] TRUE
> TRUE & FALSE
[1] FALSE
> FALSE & TRUE
[1] FALSE
> FALSE & FALSE
[1] FALSE

You can illustrate this using the results of our comparisons. Imagine you create a variable x that is equal to 10. 

x <- 10

To check if x is greater than 3 but less than 12, you can use x > 3 and x < 12 as the values of an “AND” expression. 

x > 3 & x < 12

When you run the code, R returns the result TRUE.

[1] TRUE

The first part, x > 3 will evaluate to TRUE since 10 is greater than 3. The second part, x < 12 will also evaluate to TRUE since 10 is less than 12. So, since both values are TRUE, the result of the AND expression is TRUE. The number 10 lies between the numbers 3 and 12. 

However, if you make x equal to 20, the expression x > 3 & x < 12 will return a different result. 

x <- 20
x > 3 & x < 12
[1] FALSE

Although x > 3 is TRUE (20 > 3), x < 12 is FALSE (20 < 12). If one part of an AND expression is FALSE, the entire expression is FALSE (TRUE & FALSE = FALSE). So, R returns the result FALSE. 
OR operator “|”
The OR operator (|) works in a similar way to the AND operator (&). The main difference is that at least one of the values of the OR operation must be TRUE for the entire OR operation to evaluate to TRUE. This means that TRUE | TRUE, TRUE | FALSE, and FALSE | TRUE all evaluate to TRUE. When both values are FALSE, the result is FALSE.
If you write out the code, you get the following results: 

> TRUE | TRUE
[1] TRUE
> TRUE | FALSE
[1] TRUE
> FALSE | TRUE
[1] TRUE
> FALSE | FALSE
[1] FALSE

For example, suppose you create a variable y equal to 7. To check if y is less than 8 or greater than 16, you can use the following expression:

y <- 7
y < 8 | y > 16

The comparison result is TRUE (7 is less than 8) | FALSE (7 is not greater than 16). Since only one value of an OR expression needs to be TRUE for the entire expression to be TRUE, R returns a result of TRUE. 

[1] TRUE

Now, suppose y is 12. The expression y < 8 | y > 16 now evaluates to FALSE (12 < 8) | FALSE (12 > 16). Both comparisons are FALSE, so the result is FALSE.

y <- 12
y < 8 | y > 16
[1] FALSE
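The AND and OR rules described above can be laid out side by side as a truth table. The sketch below builds one with base R’s expand.grid(); the column names a, b, and, and or are chosen only for this example.

```r
# Every combination of two logical inputs
tt <- expand.grid(a = c(TRUE, FALSE), b = c(TRUE, FALSE))

tt$and <- tt$a & tt$b   # TRUE only when both inputs are TRUE
tt$or  <- tt$a | tt$b   # FALSE only when both inputs are FALSE

tt
```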
NOT operator “!”
The NOT operator (!) simply negates the logical value it applies to. In other words, !TRUE evaluates to FALSE, and !FALSE evaluates to TRUE.
When you run the code, you get the following results: 

> !TRUE
[1] FALSE
> !FALSE
[1] TRUE

Just like the OR and AND operators, you can use the NOT operator in combination with logical operators. Zero is considered FALSE and non-zero numbers are taken as TRUE. The NOT operator evaluates to the opposite logical value. 

Let’s imagine you have a variable x that equals 2: 

x <- 2

The NOT operation evaluates to FALSE because it takes the opposite logical value of a non-zero number (TRUE). 

> !x
[1] FALSE
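A few more cases show how numbers are treated when NOT is applied to them. This is a small sketch; as.logical() is included only to make the underlying conversion visible.

```r
!0                        # TRUE: zero is treated as FALSE, so NOT gives TRUE
!2                        # FALSE: any non-zero number is treated as TRUE
!c(0, 1, -3)              # TRUE FALSE FALSE
as.logical(c(0, 1, -3))   # FALSE TRUE TRUE: the conversion NOT relies on
```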
-----------------
Let’s check out an example of how you might use logical operators to analyze data. Imagine you are working with the airquality dataset that is preloaded in R. It contains data on daily air quality measurements in New York from May to September of 1973.
The data frame has six columns: Ozone (the ozone measurement), Solar.R (the solar measurement), Wind (the wind measurement), Temp (the temperature in Fahrenheit), and the Month and Day of these measurements (each row represents a specific month and day combination). 
Let’s go through how the AND, OR, and NOT operators might be helpful in this situation.
AND example
Imagine you want to specify rows that are extremely sunny and windy, which you define as having a Solar measurement of over 150 and a Wind measurement of over 10.
In R, you can express this logical statement as Solar.R > 150 & Wind > 10.
Only the rows where both of these conditions are true fulfill the criteria: 
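One way to pull out those rows is base R’s subset() function. This is a sketch of one possible approach; the name sunny_and_windy is made up for the example.

```r
# subset() keeps rows where the condition is TRUE and drops rows where it is NA
sunny_and_windy <- subset(airquality, Solar.R > 150 & Wind > 10)

head(sunny_and_windy)   # preview the matching rows
nrow(sunny_and_windy)   # how many rows matched
```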
OR example
Next, imagine you want to specify rows where it’s extremely sunny or it’s extremely windy, which you define as having a Solar measurement of over 150 or a Wind measurement of over 10.
In R, you can express this logical statement as Solar.R > 150 | Wind > 10.
All the rows where either of these conditions are true fulfill the criteria:
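The same subset() pattern works for the OR condition. In this sketch, sunny_or_windy is a made-up name, and rows are kept whenever at least one of the two comparisons is TRUE.

```r
sunny_or_windy <- subset(airquality, Solar.R > 150 | Wind > 10)

head(sunny_or_windy)   # preview the matching rows
nrow(sunny_or_windy)   # how many rows matched
```

Because only one condition has to hold, this result should contain at least as many rows as the AND version.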
NOT example
Now, imagine you just want to focus on the weather measurements for days that aren't the first day of the month.
In R, you can express this logical statement as Day != 1.
The rows where this condition is true fulfill the criteria:
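As a sketch, the condition can be applied with subset() and then checked by counting rows. The name not_first_day is made up for the example.

```r
not_first_day <- subset(airquality, Day != 1)

nrow(airquality)              # 153 rows in the full dataset
nrow(not_first_day)           # 148: one first-of-month row removed for each of the five months
any(not_first_day$Day == 1)   # FALSE
```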
Finally, imagine you want to focus on days that are neither extremely sunny nor extremely windy, based on your previous definitions of extremely sunny and extremely windy. In other words, the following statement should not be true: either a Solar measurement greater than 150 or a Wind measurement greater than 10.
Notice that this statement is the opposite of the OR statement used above. To express this statement in R, you can put an exclamation point (!) in front of the previous OR statement: !(Solar.R > 150 | Wind > 10). R will apply the NOT operator to everything within the parentheses. 
In this case, only one row fulfills the criteria:
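Negating a whole OR statement gives the same result as negating each part and joining them with AND. The sketch below checks this on the full airquality dataset; the variable names are made up for illustration.

```r
sunny <- airquality$Solar.R > 150
windy <- airquality$Wind > 10

neither_a <- !(sunny | windy)   # NOT applied to the whole OR statement
neither_b <- !sunny & !windy    # equivalent form: NOT each part, joined by AND

identical(neither_a, neither_b)   # TRUE

# which() skips rows where the result is NA (for example, a missing Solar.R value)
calm_dull_days <- airquality[which(neither_a), ]
head(calm_dull_days)
```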
----------------------------------------------------------------------------------------------------------------------------------------
Optional: Conditional statements
A conditional statement is a declaration that if a certain condition holds, then a certain event must take place. For example, “If the temperature is above freezing, then I will go outside for a walk.” If the first condition is true (the temperature is above freezing), then the second condition will occur (I will go for a walk). Conditional statements in R code have a similar logic.
Let’s discuss how to create conditional statements in R using three related statements: 
if
else
else if
if statement
The if statement sets a condition, and if the condition evaluates to TRUE, the R code associated with the if statement is executed.
In R, you place the code for the condition inside the parentheses of the if statement. The code that has to be executed if the condition is TRUE follows in curly braces (expr). Note that in this case, the second curly brace is placed on its own line of code and identifies the end of the code that you want to execute. 
if (condition) {
 expr
}
For example, let’s create a variable x equal to 4.
x <- 4
Next, let’s create a conditional statement: if x is greater than 0, then R will print out the string “x is a positive number”.
if (x > 0) {
  print("x is a positive number")
}
Since x = 4, the condition is true (4 > 0). Therefore, when you run the code, R prints out the string “x is a positive number”.
[1] "x is a positive number"
But if you change x to a negative number, like -4, then the condition will be FALSE (-4 > 0). If you run the code, R will not execute the print statement, so no output appears.
else statement
The else statement is used in combination with an if statement. This is how the code is structured in R:
if (condition) {
  expr1
} else {
  expr2
}
The code associated with the else statement gets executed whenever the condition of the if statement is not TRUE. In other words, if the condition is TRUE, then R will execute the code in the if statement (expr1); if the condition is not TRUE, then R will execute the code in the else statement (expr2). 
Let’s try an example. First, create a variable x equal to 7.  
x <- 7
Next, let’s set up the following conditions: 
If x is greater than 0, R will print “x is a positive number”.
If x is less than or equal to 0, R will print “x is either a negative number or zero”.
In our code, the first condition (x > 0) will be part of the if statement. The second condition of x less than or equal to 0 is implied in the else statement. If x > 0, then R will print “x is a positive number”. Otherwise, R will print “x is either a negative number or zero”. 
x <- 7
if (x > 0) {
  print("x is a positive number")
} else {
  print("x is either a negative number or zero")
}
Since 7 is greater than 0, the condition of the if statement is true. So, when you run the code, R prints out “x is a positive number”.
[1] "x is a positive number"
But if you make x equal to -7, the condition of the if statement is not true (-7 is not greater than 0). Therefore, R will execute the code in the else statement. When you run the code, R prints out “x is either a negative number or zero”. 
x <- -7
if (x > 0) {
  print("x is a positive number")
} else {
  print("x is either a negative number or zero")
}
[1] "x is either a negative number or zero"
else if statement
In some cases, you might want to customize your conditional statement even further by adding the else if statement. The else if statement comes in between the if statement and the else statement. This is the code structure:
if (condition1) {
 expr1
} else if (condition2) {
 expr2
} else {
 expr3
}
If the if condition (condition1) is met, then R executes the code in the first expression (expr1). If the if condition is not met, and the else if condition (condition2) is met, then R executes the code in the second expression (expr2). If neither of the two conditions is met, R executes the code in the third expression (expr3).
In our previous example, using only the if and else statements, R can only print “x is either a negative number or zero” if x equals 0 or x is less than zero. Imagine you want R to print the string “x is zero” if x equals 0. You need to add another condition using the else if statement.
Let’s try an example. First, create a variable x equal to negative 1 (“-1”).  
x <- -1
Now, you want to set up the following conditions:
If x is less than 0, print “x is a negative number”
If x equals 0, print “x is zero”
Otherwise, print “x is a positive number”
In the code, the first condition will be part of the if statement, the second condition will be part of the else if statement, and the third condition will be part of the else statement. If x < 0, then R will print “x is a negative number”. If x = 0, then R will print “x is zero”. Otherwise, R will print “x is a positive number”. 
x <- -1
if (x < 0) {
 print("x is a negative number")
} else if (x == 0) {
 print("x is zero")
} else {
 print("x is a positive number")
}
Since -1 is less than 0,  the condition for the if statement evaluates to TRUE, and R prints “x is a negative number”. 
[1] "x is a negative number"
If you make x equal to 0, R will first check the if condition (x < 0), and determine that it is FALSE. Then, R will evaluate the else if condition. This condition, x==0, is TRUE. So, in this case, R prints “x is zero”. 
If you make x equal to 1, both the if condition and the else if condition evaluate to FALSE. So, R will execute the else statement and print “x is a positive number”.
As soon as R discovers a condition that evaluates to TRUE, R executes the corresponding code and ignores the rest. 
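To try all three branches without editing x each time, the same logic can be wrapped in a function. This is a minimal sketch; describe_sign() is a hypothetical helper name, and it returns the string instead of printing it.

```r
# describe_sign() is a made-up helper used only for this sketch
describe_sign <- function(x) {
  if (x < 0) {
    "x is a negative number"
  } else if (x == 0) {
    "x is zero"
  } else {
    "x is a positive number"
  }
}

describe_sign(-1)   # "x is a negative number"
describe_sign(0)    # "x is zero"
describe_sign(1)    # "x is a positive number"
```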
Additional resource
To learn more about logical operators and conditional statements, check out DataCamp's tutorial Conditionals and Control Flow in R. DataCamp is a popular resource for people learning about computer programming. The tutorial is filled with useful examples of coding applications for logical operators and conditional statements (and relational operators), and offers a helpful overview of each topic and the connections between them.
Understanding Boolean logic
In this reading, you will explore the basics of Boolean logic and learn how to use multiple conditions in a Boolean statement. These conditions are created with Boolean operators, including AND, OR, and NOT. These operators are similar to mathematical operators and can be used to create logical statements that filter your results. Data analysts use Boolean statements to do a wide range of data analysis tasks, such as creating queries for searches and checking for conditions when writing programming code. 
Boolean logic example
Imagine you are shopping for shoes, and are considering certain preferences:
You will buy the shoes only if they are pink and grey
You will buy the shoes if they are entirely pink or entirely grey, or if they are pink and grey
You will buy the shoes if they are grey, but not if they have any pink
Below are Venn diagrams that illustrate these preferences. AND is the center of the Venn diagram, where two conditions overlap. OR includes either condition. NOT includes only the part of the Venn diagram that doesn't contain the exception.  
The AND operator
Your condition is “If the color of the shoe has any combination of grey and pink, you will buy them.” The Boolean statement would break down the logic of that statement to filter your results by both colors. It would say “IF (Color="Grey") AND (Color="Pink") then buy them.” The AND operator lets you stack multiple conditions.
Below is a simple truth table that outlines the Boolean logic at work in this statement. In the Color is Grey column, there are two pairs of shoes that meet the color condition. And in the Color is Pink column, there are two pairs that meet that condition. But in the If Grey AND Pink column, there is only one pair of shoes that meets both conditions. So, according to the Boolean logic of the statement, there is only one pair marked true. In other words, there is one pair of shoes that you can buy.
Color is Grey  Color is Pink  If Grey AND Pink, then Buy  Boolean Logic
Grey/True      Pink/True      True/Buy                    True AND True = True
Grey/True      Black/False    False/Don't buy             True AND False = False
Red/False      Pink/True      False/Don't buy             False AND True = False
Red/False      Green/False    False/Don't buy             False AND False = False
The OR operator
The OR operator lets you move forward if either one of your two conditions is met. Your condition is “If the shoes are grey or pink, you will buy them.” The Boolean statement would be “IF (Color="Grey") OR (Color="Pink") then buy them.” Notice that any shoe that meets either the Color is Grey or the Color is Pink condition is marked as true by the Boolean logic. According to the truth table below, there are three pairs of shoes that you can buy.
Color is Grey	Color is Pink	If Grey OR Pink, then Buy	Boolean Logic
Red/False	Black/False	False/Don't buy	False OR False = False
Black/False	Pink/True	True/Buy	False OR True = True
Grey/True	Green/False	True/Buy	True OR False = True
Grey/True	Pink/True	True/Buy	True OR True = True
The NOT operator
Finally, the NOT operator lets you filter by subtracting specific conditions from the results. Your condition is “You will buy any grey shoe except for those with any traces of pink in them.” Your Boolean statement would be “IF (Color="Grey") AND (Color=NOT "Pink") then buy them.” Now, all of the grey shoes that aren't pink are marked true by the Boolean logic for the NOT Pink condition. The pink shoes are marked false by the Boolean logic for the NOT Pink condition. Only one pair of shoes is excluded in the truth table below.
Color is Grey	Color is Pink	Boolean Logic for NOT Pink	If Grey AND (NOT Pink), then Buy	Boolean Logic
Grey/True	Red/False	Not False = True	True/Buy	True AND True = True
Grey/True	Black/False	Not False = True	True/Buy	True AND True = True
Grey/True	Green/False	Not False = True	True/Buy	True AND True = True
Grey/True	Pink/True	Not True = False	False/Don't buy	True AND False = False
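As a minimal sketch, the AND, OR, and NOT logic from the truth tables above can be reproduced in R with logical vectors. The vector names is_grey and is_pink are hypothetical, chosen only for illustration; each position stands for one pair of shoes.

```r
# Hypothetical shoes: each position is one pair, flagged by the colors it contains
is_grey <- c(TRUE, TRUE, FALSE, FALSE)
is_pink <- c(TRUE, FALSE, TRUE, FALSE)

buy_and <- is_grey & is_pink    # grey AND pink: TRUE only when both are TRUE
buy_or  <- is_grey | is_pink    # grey OR pink: TRUE when at least one is TRUE
buy_not <- is_grey & !is_pink   # grey AND (NOT pink): grey shoes without any pink

# Show the inputs and the three results side by side, like a truth table
data.frame(is_grey, is_pink, buy_and, buy_or, buy_not)
```

Each result column lines up with the "then Buy" column of the corresponding truth table.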
The power of multiple conditions
For data analysts, the real power of Boolean logic comes from being able to combine multiple conditions in a single statement. For example, if you wanted to filter for shoes that were grey or pink, and waterproof, you could construct a Boolean statement such as: “IF ((Color="Grey") OR (Color="Pink")) AND (Waterproof="True").” Notice that you can use parentheses to group your conditions together.
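The combined statement above could be sketched in R as follows. The shoes data frame and its color and waterproof columns are hypothetical, made up for this example. The second filter shows why the parentheses matter: in R, & is evaluated before |, so dropping the parentheses changes which rows match.

```r
# Hypothetical data: one row per pair of shoes
shoes <- data.frame(
  color = c("Grey", "Pink", "Black", "Grey", "Pink"),
  waterproof = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

# (grey OR pink) AND waterproof
grouped <- (shoes$color == "Grey" | shoes$color == "Pink") & shoes$waterproof

# Without parentheses this reads as: grey OR (pink AND waterproof)
ungrouped <- shoes$color == "Grey" | shoes$color == "Pink" & shoes$waterproof

shoes[grouped, ]    # waterproof grey or pink shoes only
shoes[ungrouped, ]  # also keeps the grey pair that is not waterproof
```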
Whether you are doing a search for new shoes or applying this logic to your database queries, Boolean logic lets you create multiple conditions to filter your results. And now that you know a little more about how Boolean logic is used, you can start using it!
Additional Reading/Resources
Learn about who pioneered Boolean logic in this historical article: Origins of Boolean Algebra in the Logic of Classes.
Find more information about using AND, OR, and NOT from these tips for searching with Boolean operators.

File-naming conventions
An important part of cleaning data is making sure that all of your files are accurately named. Although individual preferences will vary a bit, most analysts generally agree that file names should be accurate, consistent, and easy to read. This reading provides some general guidelines for you to follow when naming or renaming your data files. 
What’s in a (file)name?
When you first start working with R (or any other programming language, analysis tool, or platform, for that matter), you or your company should establish naming conventions for your files. This helps ensure that anyone reviewing your analysis, yourself included, can quickly and easily find what they need. The following are some helpful “do’s” and “don’ts” to keep in mind when naming your files.
Do
Keep your filenames to a reasonable length
Use underscores and hyphens for readability
Start or end your filename with a letter or number
Use a standard date format when applicable; example: YYYY-MM-DD
Use filenames for related files that work well with default ordering; example: in chronological order, or logical order using numbers first
Examples of good filenames
2020-04-10_march-attendance.R
2021_03_20_new_customer_ids.csv
01_data-sales.html
02_data-sales.html
Don't
Use unnecessary additional characters in filenames
Use spaces or “illegal” characters; examples: &, %, #, <, or >
Start or end your filename with a symbol
Use incomplete or inconsistent date formats; example: M-D-YY
Use filenames for related files that do not work well with default ordering; examples: a random system of numbers or date formats, or using letters first
Examples of filenames to avoid
4102020marchattendance<workinprogress>.R
_20210320*newcustomeridsforfebonly.csv
firstfile_for_datasales/1-25-2020.html
secondfile_for_datasales/2-5-2020.html
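A few of these guidelines can be checked in code. The sketch below is only one possible approach: make_filename() and is_clean_filename() are hypothetical helpers written for this example, not functions from any package, and the pattern covers only the rules about allowed characters and about starting and ending with a letter or number.

```r
# Hypothetical helper: build a filename that starts with a YYYY-MM-DD date
make_filename <- function(date, description, ext) {
  paste0(format(as.Date(date), "%Y-%m-%d"), "_", description, ".", ext)
}

# Hypothetical check: the name (without its extension) may contain only
# letters, numbers, underscores, and hyphens, and must start and end
# with a letter or number
is_clean_filename <- function(filename) {
  stem <- tools::file_path_sans_ext(basename(filename))
  grepl("^[A-Za-z0-9]([A-Za-z0-9_-]*[A-Za-z0-9])?$", stem)
}

make_filename("2020-04-10", "march-attendance", "R")
is_clean_filename("2020-04-10_march-attendance.R")           # passes the check
is_clean_filename("_20210320*newcustomeridsforfebonly.csv")  # fails the check
```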
Additional resources
These resources include more info about some of the file naming standards discussed here, and provide additional insights into best practices.
How to name files: this resource from Speaker Deck is a playful take on file naming. It includes several slides with tips and examples for how to accurately name lots of different types of files. You will learn why filenames should be both machine readable and human readable. 
File naming and structure: this resource from the Princeton University Library provides an easy-to-scan list of best practices, considerations, and examples for developing file naming conventions. 
Optional: Manually create a data frame
Coming up in the next video, you are going to learn how to transform data in R. The video will be using manually entered data instead of a data set from an R package.
If you would like to follow along with the video in your own RStudio console, you can copy and paste the following code to enter the data and create a data frame: 
# Create three vectors of equal length, one for each column
id <- c(1:10)
name <- c("John Mendes", "Rob Stewart", "Rachel Abrahamson", "Christy Hickman", "Johnson Harper", "Candace Miller", "Carlson Landy", "Pansy Jordan", "Darius Berry", "Claudia Garcia")
job_title <- c("Professional", "Programmer", "Management", "Clerical", "Developer", "Programmer", "Management", "Clerical", "Developer", "Programmer")
# Combine the vectors into a data frame with one row per employee
employee <- data.frame(id, name, job_title)
Then, you can perform the functions from the video in your own console to practice transforming and cleaning data in R! Practicing along with the video will help you explore how these functions are supposed to work while also executing them yourself. You can also use this data frame to practice more after the video.  
Wide to long with tidyr
When organizing or tidying your data using R, you might need to convert wide data to long data or long to wide. Recall that this is what data in a wide format looks like in a spreadsheet:
Wide data has observations across several columns. Each column contains data from a different condition of the variable. In this example, the conditions are different years.
Now check out the same data in a long format:
And, to review what you already learned about the difference, long data has all the observations in a single column, and variables in separate columns. 
The pivot_longer and pivot_wider functions
There are compelling reasons to use both formats. But as an analyst, it is important to know how to tidy data when you need to. In R, you may have a data frame in a wide format that has several variables and conditions for each variable. It might feel a bit messy.
That’s where pivot_longer() comes in. This function, part of the tidyr package, lengthens the data in a data frame by increasing the number of rows and decreasing the number of columns. Similarly, if you want to convert your data to have more columns and fewer rows, you would use the pivot_wider() function.
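Here is a minimal sketch of both functions, assuming the tidyr package is installed. The data frame and its column names (country, y2019, y2020) are hypothetical and were made up for this example.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  y2019 = c(10, 20),
  y2020 = c(11, 21)
)

# Wide to long: the year columns become rows (more rows, fewer columns)
long <- pivot_longer(
  wide,
  cols = c(y2019, y2020),
  names_to = "year",
  values_to = "value"
)

# Long to wide: spread the year values back out into columns
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

In this sketch, long has one row per country and year, and wide_again has the same shape as the original wide data.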
Additional resources
To learn more about these two functions and how to apply them in your R programming, check out these resources:
Pivoting: Consider this a starting point for tidying data through wide and long conversions. This web page is taken directly from tidyr package information at tidyverse.org. It explores the components of the pivot_longer and pivot_wider functions using specific details, examples, and definitions. 
CleanItUp 5: R-Ladies Sydney: Wide to Long to Wide to…PIVOT: This resource gives you additional details about the pivot_longer and pivot_wider functions. The examples provided use interesting datasets to illustrate how to convert data from wide to long and back to wide. 
Plotting multiple variables: This resource explains how to visualize wide and long data, with ggplot2 to help tidy it. The focus is on using pivot_longer to restructure data and make similar plots of a number of variables at once. You can apply what you learn from the other resources here for a broader understanding of the pivot functions.

Working with biased data
Every data analyst will encounter an element of bias at some point in the data analysis process. That’s why it’s so important to understand how to identify and manage biased data whenever possible. You might recall we explored bias in detail in Course 3 of this program. In this reading, you will read a real-life example of an analyst who discovered bias in their data, and learn how they used R to address it.  
Addressing biased data with R
This scenario was shared by a quantitative analyst who collects data from people all over the world. They explain how they discovered bias in their data, and how they used R to address it:
“I work on a team that collects survey-like data. One of the tasks my team does is called a side-by-side comparison. For example, we might show users two ads side-by-side at the same time. In our survey, we ask which of the two ads they prefer. In one case, after many iterations, we were seeing consistent bias in favor of the first item. There was also a measurable decrease in the preference for an item if we swapped its position to second.
So we decided to add randomization to the position of the ads using R. We wanted to make sure that the items appeared in the first and second positions with similar frequencies. We used sample() to inject a randomization element into our R programming. In R, the sample() function allows you to take a random sample of elements from a data set. Adding this piece of code shuffled the rows in our data set randomly. So when we presented the ads to users, the positions of the ads were now random and controlled for bias. This made the survey more effective and the data more reliable.”
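The analyst's actual code is not shown, so the following is only a sketch of how sample() could be used this way. The ads data frame and its columns are hypothetical, and the per-pair position swap is an assumption about one way to randomize which ad appears first.

```r
set.seed(42)  # fixed seed only so this sketch gives the same result each run

# Hypothetical data: each row is one pair of ads to compare side by side
ads <- data.frame(
  pair_id = 1:6,
  ad_a = paste0("ad_", 1:6, "a"),
  ad_b = paste0("ad_", 1:6, "b")
)

# Shuffle the rows: sample(n) returns the numbers 1..n in random order
shuffled <- ads[sample(nrow(ads)), ]

# Assumption: randomly decide, for each pair, whether to swap the two positions
swap <- sample(c(TRUE, FALSE), nrow(shuffled), replace = TRUE)
shuffled$first  <- ifelse(swap, shuffled$ad_b, shuffled$ad_a)
shuffled$second <- ifelse(swap, shuffled$ad_a, shuffled$ad_b)
```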
Key takeaways
The sample() function is just one of many functions and methods in R that you can use to address bias in your data. Depending on the kind of analysis you are conducting, you might need to incorporate some advanced processes in your programming. Although this program won’t cover those kinds of processes in detail, you will likely learn more about them as you get more experience in the data analytics field.
To learn more about bias and data ethics, check out these resources: 
Bias function: This web page is a good starting point to learn about how the bias function in R can help you identify and manage bias in your analysis.
Data Science Ethics: This online course provides slides, videos, and exercises to help you learn more about ethics in the world of data analytics. It includes information about data privacy, misrepresentation in data, and applying ethics to your visualizations.
https://tidyr.tidyverse.org/articles/pivot.html
https://rladiessydney.org/courses/ryouwithme/02-cleanitup-5/
https://scc.ms.unimelb.edu.au/resources-list/simple-r-scripts-for-analysis/r-scripts
Operator	Description	Example Code	Result/Output
+	Addition	x + y	[1] 7
-	Subtraction	x - y	[1] -3
*	Multiplication	x * y	[1] 10
/	Division	x / y	[1] 0.4
%%	Modulus (returns the remainder after division)	y %% x	[1] 1
%/%	Integer division (returns an integer value after division)	y %/% x	[1] 2
^	Exponent	y ^ x	[1] 25
Operator	Description	Example Code	Result/Output
<	Less than	x < y	[1] TRUE
>	Greater than	x > y	[1] FALSE
<=	Less than or equal to	x <= 2	[1] TRUE
>=	Greater than or equal to	y >= 10	[1] FALSE
==	Equal to	y == 5	[1] TRUE
!=	Not equal to	x != 2	[1] FALSE
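The two tables above do not state the values of x and y next to them. The results shown are consistent with x being 2 and y being 5, so this sketch assumes those values to reproduce the examples.

```r
# Assumed values, inferred from the results in the tables
x <- 2
y <- 5

# Arithmetic operators
x + y    # 7
x - y    # -3
x * y    # 10
x / y    # 0.4
y %% x   # 1 (remainder of 5 divided by 2)
y %/% x  # 2 (whole-number part of 5 divided by 2)
y ^ x    # 25

# Relational operators
x < y    # TRUE
x > y    # FALSE
x <= 2   # TRUE
y >= 10  # FALSE
y == 5   # TRUE
x != 2   # FALSE
```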
Operator	Description
&	Element-wise logical AND
&&	Logical AND
|	Element-wise logical OR
||	Logical OR
!	Logical NOT
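The difference between the element-wise and single-value forms can be sketched as follows. The vectors a and b are hypothetical. The && and || operators are shown with length-one inputs, since recent versions of R raise an error when they are given longer vectors.

```r
# Hypothetical logical vectors
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# Element-wise operators compare the vectors position by position
a & b   # TRUE FALSE FALSE
a | b   # TRUE TRUE FALSE
!a      # FALSE FALSE TRUE

# && and || work on single TRUE/FALSE values, as in an if statement
a[1] && b[1]   # TRUE
a[2] && b[2]   # FALSE
a[3] || b[3]   # FALSE
```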
Operator	Description	Example Code (after running the example code, typing x generates the output in the next column)	Result/Output
<-	Leftwards assignment	x <- 2	[1] 2
<<-	Leftwards assignment	x <<- 7	[1] 7
=	Leftwards assignment	x = 9	[1] 9
->	Rightwards assignment	11 -> x	[1] 11
->>	Rightwards assignment	21 ->> x	[1] 21
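The table describes <- and <<- the same way, but they behave differently inside a function. As a rough sketch: <- creates or changes a variable in the current function only, while <<- looks for an existing variable in the enclosing environments and changes it there. The names total, subtotal, and add_to_total are hypothetical.

```r
# The basic forms all assign a value to x
x <- 2
x = 9
11 -> x

total <- 0

add_to_total <- function(amount) {
  subtotal <- amount          # local: exists only while the function runs
  total <<- total + subtotal  # updates the existing 'total' outside the function
  invisible(NULL)
}

add_to_total(5)
add_to_total(7)
total   # 12
```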