SQL : PostgreSQL

Aggregate Functions

Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.

As an example, we can find the highest low-temperature reading anywhere with:

    SELECT max(temp_lo) FROM weather;

     max
    -----
      46
    (1 row)

If we wanted to know what city (or cities) that reading occurred in, we might try:

    SELECT city FROM weather WHERE temp_lo = max(temp_lo);     WRONG

but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is o...
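
Here is a minimal sketch of one way around the WHERE restriction, written in Python with the standard-library sqlite3 module as a stand-in for PostgreSQL. The three sample rows are assumptions made up for illustration; only the maximum of 46 comes from the output above. The idea is to compute the aggregate in a subquery first, so that WHERE compares each row against an ordinary value.

```python
import sqlite3

# Hypothetical sample rows invented for illustration;
# only the maximum temp_lo of 46 comes from the text above.
rows = [
    ("San Francisco", 46, 50),
    ("San Francisco", 43, 57),
    ("Hayward", 37, 54),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)

# An aggregate in the SELECT list collapses all rows into one result.
highest_low = conn.execute("SELECT max(temp_lo) FROM weather").fetchone()[0]

# Computing the aggregate in a subquery lets WHERE compare
# each row against a single, already-computed value.
cities = [
    city
    for (city,) in conn.execute(
        "SELECT city FROM weather "
        "WHERE temp_lo = (SELECT max(temp_lo) FROM weather)"
    )
]
conn.close()

print(highest_low)
print(cities)
```

With these assumed rows, the subquery version returns the single city whose low temperature matches the maximum.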

The R-versus-Python debate

 



People often wonder which programming language they should learn first. You might be wondering about this, too. This certificate teaches the open-source programming language, R. R is a great starting point for foundational data analysis, and it has helpful packages that beginners can apply to projects. Python isn’t covered in the curriculum, but we encourage you to explore Python after completing the certificate. If you are curious about other programming languages, make every effort to continue learning.

Any language a beginner starts to learn will have some advantages and challenges. Let’s put this into context by looking at R and Python. The following table is a high-level overview based on a sampling of articles and opinions of those in the field. You can review the information without necessarily picking a side in the R vs. Python debate. In fact, if you check out RStudio’s blog article in the Additional resources section, it’s actually more about working together than winning a debate. 

| Languages | R | Python |
| --- | --- | --- |
| Common features | Open-source; data stored in data frames; formulas and functions readily available; community for code development and support | Open-source; data stored in data frames; formulas and functions readily available; community for code development and support |
| Unique advantages | Data manipulation, data visualization, and statistics packages; "scalpel" approach to data: find packages to do what you want with the data | Easy syntax for machine learning needs; integrates with cloud platforms like Google Cloud, Amazon Web Services, and Azure |
| Unique challenges | Inconsistent naming conventions make it harder for beginners to select the right functions; methods for handling variables may be a little complex for beginners to understand | Many more decisions for beginners to make about data input/output, structure, variables, packages, and objects; "Swiss army knife" approach to data: figure out a way to do what you want with the data |

Additional resources 

For more information on comparing R and Python, refer to these resources:

Key takeaways

Certain aspects make some programming languages easier to learn than others, but that doesn’t make the harder languages impossible for beginners to learn. On the flip side, a programming language’s popularity doesn’t always make it the best language for beginners either.

R has been used by professionals who have a statistical or research-oriented approach to solving problems; among them are scientists, statisticians, and engineers. Python has been used by professionals looking for solutions in the data itself, those who must heavily mine data for answers; among them are data scientists, machine learning specialists, and software developers.

As you grow as a data analytics professional, you may need to learn additional programming languages. The skills and competencies you learn from your first programming experience are a good foundation. That's why this course focuses on the basics of R. It also helps you develop the right perspective: programming languages play an important part in the data analysis process no matter what job title you have.

The good news is that many of the concepts and coding principles that you will learn from using R in this course are transferable to other programming languages. You will also learn how to write R code in an Integrated Development Environment (IDE) called RStudio. RStudio allows you to manage projects that use R or Python, or even a combination of the two. Refer to RStudio: A Single Home for R & Python for more information. So, after you have worked with R and RStudio, learning Python or another programming language in the future will be more intuitive. 

For a better idea of popular programming languages by job role, refer to Ways to learn about programming. The programming languages most commonly used by data analysts, web designers, mobile and web application developers, and game developers are listed, along with links to resources to help you start learning more about those languages. 

From spreadsheets to SQL to R

Although the programming language R might be new to you, it actually has a lot of similarities to the other tools you have explored in this program. In this reading, you will compare spreadsheet programs, SQL, and R to have a better sense of how to use each moving forward.

[Image: a person thinking, with three speech bubbles containing a bar chart, a spreadsheet, and the word "function"]

Spreadsheets, SQL, and R: a comparison

As a data analyst, there is a good chance you will work with SQL, R, and spreadsheets at some point in your career. Each tool has its own strengths and weaknesses, but they all make the data analysis process smoother and more efficient. There are two main things that all three have in common:

  • They all use filters: for example, you can easily filter a dataset using any of these tools. In R, you can use the filter function. This performs the same task as a basic SELECT-FROM-WHERE SQL query. In a spreadsheet, you can create a filter using the menu options.

  • They all use functions: In spreadsheets, you use functions in formulas, and in SQL, you include them in queries. In R, you will use functions in the code that is part of your analysis.
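
To make the two points above concrete, here is a small sketch in Python, used only as a runnable stand-in (in R, the filter step would typically use the filter function mentioned above). The table name, column names, and sales figures are all hypothetical. It applies the same filter and the same sum function twice: once in code and once in a SELECT-FROM-WHERE query.

```python
import sqlite3

# Hypothetical dataset: (city, amount) pairs invented for illustration.
sales = [("Lima", 120), ("Oslo", 80), ("Cairo", 200)]

# Filter in code: keep only rows whose amount exceeds 100.
filtered_in_code = [row for row in sales if row[1] > 100]

# Function in code: add up every amount.
total_in_code = sum(amount for _, amount in sales)

# Load the same rows into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)

# The same filter as a SELECT-FROM-WHERE query.
filtered_in_sql = conn.execute(
    "SELECT city, amount FROM sales WHERE amount > 100 ORDER BY rowid"
).fetchall()

# The same function, included in a query.
total_in_sql = conn.execute("SELECT sum(amount) FROM sales").fetchone()[0]
conn.close()

print(filtered_in_code == filtered_in_sql)
print(total_in_code, total_in_sql)
```

Both approaches keep the same rows and produce the same total; the difference is only where the filter and the function are written.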

The table below presents key questions to explore a few more ways that these tools compare to each other. You can use this as a general guide as you begin to navigate R. 

| Key question | Spreadsheets | SQL | R |
| --- | --- | --- | --- |
| What is it? | A program that uses rows and columns to organize data and allows for analysis and manipulation through formulas, functions, and built-in features | A database programming language used to communicate with databases to conduct an analysis of data | A general-purpose programming language used for statistical analysis, visualization, and other data analysis |
| What is a primary advantage? | Includes a variety of visualization tools and features | Allows users to manipulate and reorganize data as needed to aid analysis | Provides an accessible language to organize, modify, and clean data frames, and create insightful data visualizations |
| Which datasets does it work best with? | Smaller datasets | Larger datasets | Larger datasets |
| What is the source of the data? | Entered manually or imported from an external source | Accessed from an external database | Loaded with R when installed, imported from your computer, or loaded from external sources |
| Where is the data from my analysis usually stored? | In a spreadsheet file on your computer | Inside tables in the accessed database | In an R file on your computer |
| Do I use formulas and functions? | Yes | Yes | Yes |
| Can I create visualizations? | Yes | Yes, by using an additional tool like a database management system (DBMS) or a business intelligence (BI) tool | Yes |

CRAN Mirrors

The Comprehensive R Archive Network is available at the following URLs; please choose a location close to you. Some statistics on the status of the mirrors can be found here: main page, windows release, windows old release.

If you want to host a new mirror at your institution, please have a look at the CRAN Mirror HOWTO.

0-Cloud
https://cloud.r-project.org/ Automatic redirection to servers worldwide, currently sponsored by RStudio
Algeria
https://cran.usthb.dz/ University of Science and Technology Houari Boumediene
Argentina
http://mirror.fcaglp.unlp.edu.ar/CRAN/ Universidad Nacional de La Plata
Australia
https://cran.csiro.au/ CSIRO
https://mirror.aarnet.edu.au/pub/CRAN/ AARNET
https://cran.ms.unimelb.edu.au/ School of Mathematics and Statistics, University of Melbourne
https://cran.curtin.edu.au/ Curtin University
Austria
https://cran.wu.ac.at/ Wirtschaftsuniversität Wien
Belgium
https://www.freestatistics.org/cran/ Patrick Wessa
https://ftp.belnet.be/mirror/CRAN/ Belnet, the Belgian research and education network
Brazil
https://nbcgib.uesc.br/mirrors/cran/ Computational Biology Center at Universidade Estadual de Santa Cruz
https://cran-r.c3sl.ufpr.br/ Universidade Federal do Parana
https://cran.fiocruz.br/ Oswaldo Cruz Foundation, Rio de Janeiro
https://vps.fmvz.usp.br/CRAN/ University of Sao Paulo, Sao Paulo
https://brieger.esalq.usp.br/CRAN/ University of Sao Paulo, Piracicaba
Bulgaria
https://ftp.uni-sofia.bg/CRAN/ Sofia University
Canada
https://mirror.rcg.sfu.ca/mirror/CRAN/ Simon Fraser University, Burnaby
https://muug.ca/mirror/cran/ Manitoba Unix User Group
https://utstat.toronto.edu/cran/ University of Toronto
https://cran.pacha.dev/ DigitalOcean
https://mirror.csclub.uwaterloo.ca/CRAN/ University of Waterloo
Chile
https://cran.dcc.uchile.cl/ Departamento de Ciencias de la Computación, Universidad de Chile
China
https://mirrors.tuna.tsinghua.edu.cn/CRAN/ TUNA Team, Tsinghua University
https://mirrors.bfsu.edu.cn/CRAN/ Beijing Foreign Studies University
https://mirrors.ustc.edu.cn/CRAN/ University of Science and Technology of China
https://mirror-hk.koddos.net/CRAN/ KoDDoS in Hong Kong
https://mirrors.e-ducation.cn/CRAN/ Elite Education
https://mirror.lzu.edu.cn/CRAN/ Lanzhou University Open Source Society
https://mirrors.nju.edu.cn/CRAN/ eScience Center, Nanjing University
https://mirrors.tongji.edu.cn/CRAN/ Tongji University
https://mirrors.sjtug.sjtu.edu.cn/cran/ Shanghai Jiao Tong University
https://mirrors.sustech.edu.cn/CRAN/ Southern University of Science and Technology (SUSTech)
Colombia
https://www.icesi.edu.co/CRAN/ Icesi University
Costa Rica
https://mirror.uned.ac.cr/cran/ Distance State University (UNED)
Cyprus
https://mirror.library.ucy.ac.cy/cran/ University of Cyprus
Czech Republic
https://mirrors.nic.cz/R/ CZ.NIC, Prague
Denmark
https://mirrors.dotsrc.org/cran/ Aalborg University
0-Cloud-East-Asia
https://cran.asia/ Personnel Psychology Laboratory, Kwangwoon University (sites: Seoul, Tokyo, Singapore, Manila, Bangalore)
Ecuador
https://mirror.cedia.org.ec/CRAN/ CEDIA
https://mirror.epn.edu.ec/CRAN/ Escuela Politécnica Nacional
El Salvador
http://cran.salud.gob.sv/ Ministry of Health (Ministerio de Salud)
Estonia
https://ftp.eenet.ee/pub/cran/ EENet
France
https://pbil.univ-lyon1.fr/CRAN/ Dept. of Biometry & Evol. Biology, University of Lyon
https://mirror.ibcp.fr/pub/CRAN/ CNRS IBCP, Lyon
https://cran.biotools.fr/ IBDM, Marseille
https://ftp.igh.cnrs.fr/pub/CRAN/ Institut de Genetique Humaine, Montpellier
https://cran.irsn.fr/ French Nuclear Safety Institute, Paris
Germany
https://ftp.fau.de/cran/ Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
https://mirror.dogado.de/cran/ dogado GmbH
https://ftp.gwdg.de/pub/misc/cran/ GWDG Göttingen
https://cran.uni-muenster.de/ University of Münster, Germany
https://mirror.clientvps.com/CRAN/ ClientVPS
https://packages.othr.de/cran/ OTH Regensburg
Greece
https://ftp.cc.uoc.gr/mirrors/CRAN/ University of Crete
Hungary
https://cran.rapporter.net/ Rapporter.net, Budapest
Iceland
https://cran.hafro.is/ Marine Research Institute
India
https://mirror.niser.ac.in/cran/ National Institute of Science Education and Research (NISER)
Indonesia
https://repo.bppt.go.id/cran/ Agency for The Application and Assessment of Technology
Iran
https://cran.um.ac.ir/ Ferdowsi University of Mashhad
Italy
https://cran.mirror.garr.it/CRAN/ Garr Mirror, Milano
https://cran.stat.unipd.it/ University of Padua
Japan
https://cran.ism.ac.jp/ The Institute of Statistical Mathematics, Tokyo
https://ftp.yz.yamagata-u.ac.jp/pub/cran/ Yamagata University
Korea
https://ftp.harukasan.org/CRAN/ Information and Database Systems Laboratory, Pukyong National University
https://cran.yu.ac.kr/ Yeungnam University
https://cran.seoul.go.kr/ Bigdata Campus, Seoul Metropolitan Government
https://cran.biodisk.org/ The Genome Institute of UNIST (Ulsan National Institute of Science and Technology)
Malaysia
https://mirrors.upm.edu.my/CRAN/ Universiti Putra Malaysia
Mexico
https://cran.itam.mx/ Instituto Tecnologico Autonomo de Mexico
https://www.est.colpos.mx/ Colegio de Postgraduados, Texcoco
Morocco
https://mirror.marwan.ma/cran/ MARWAN
Netherlands
https://mirror.lyrahosting.com/CRAN/ Lyra Hosting
New Zealand
https://cran.stat.auckland.ac.nz/ University of Auckland
Norway
https://cran.uib.no/ University of Bergen
Portugal
https://cran.radicaldevelop.com/ RadicalDevelop, Lda
Russia
https://cran.cmm.msu.ru/ Department of Biokinetics, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University
https://mirror.truenetwork.ru/CRAN/ Truenetwork
South Africa
https://cran.mirror.ac.za/ TENET, Johannesburg
Spain
https://ftp.cixug.es/CRAN/ Oficina de software libre (CIXUG)
https://cran.rediris.es/ Spanish National Research Network, Madrid
Sweden
https://ftpmirror1.infania.net/mirror/CRAN/ Infania Networks
https://ftp.acc.umu.se/mirror/CRAN/ Academic Computer Club, Umeå University
Switzerland
https://stat.ethz.ch/CRAN/ ETH Zürich
Taiwan
https://cran.csie.ntu.edu.tw/ National Taiwan University, Taipei
Thailand
http://mirrors.psu.ac.th/pub/cran/ Prince of Songkla University, Hatyai
Turkey
https://cran.pau.edu.tr/ Pamukkale University, Denizli
https://cran.gedik.edu.tr/ Istanbul Gedik University
https://cran.ncc.metu.edu.tr/ Middle East Technical University Northern Cyprus Campus, Mersin
UK
https://www.stats.bris.ac.uk/R/ University of Bristol
https://cran.ma.imperial.ac.uk/ Imperial College London
USA
https://mirror.las.iastate.edu/CRAN/ Iowa State University, Ames, IA
http://ftp.ussg.iu.edu/CRAN/ Indiana University
https://rweb.crmda.ku.edu/cran/ University of Kansas, Lawrence, KS
https://repo.miserver.it.umich.edu/cran/ MBNI, University of Michigan, Ann Arbor, MI
http://cran.wustl.edu/ Washington University, St. Louis, MO
https://archive.linux.duke.edu/cran/ Duke University, Durham, NC
https://cran.case.edu/ Case Western Reserve University, Cleveland, OH
https://ftp.osuosl.org/pub/cran/ Oregon State University
http://lib.stat.cmu.edu/R/CRAN/ Statlib, Carnegie Mellon University, Pittsburgh, PA
https://cran.mirrors.hoobly.com/ Hoobly Classifieds, Pittsburgh, PA
https://mirrors.nics.utk.edu/cran/ National Institute for Computational Sciences, Oak Ridge, TN
https://cran.microsoft.com/ Revolution Analytics, Dallas, TX
Uruguay
https://espejito.fder.edu.uy/cran/ Facultad de Derecho, Universidad de la República



When to use RStudio

As a data analyst, you will have plenty of tools to work with in each phase of your analysis. Sometimes, you will be able to meet your objectives by working in a spreadsheet program or using SQL with a database. In this reading, you will go through some examples of when working in R and RStudio might be your better option instead. 

[Image: a maintenance worker handing a wrench to an office worker sitting at their desk]

Why RStudio?

One of your core tasks as an analyst will be converting raw data into insights that are accurate, useful, and interesting. That can be tricky to do when the raw data is complex. R and RStudio are designed to handle large datasets, which spreadsheets might not be able to handle as well. RStudio also makes it easy to reproduce your work on different datasets. Once your code is written, it's simple to load a new dataset and run your scripts again. You can also create more detailed visualizations using RStudio.

When RStudio truly shines

When the data is spread across multiple categories or groups, it can be challenging to manage your analysis, visualize trends, and build graphics. And the more groups of data that you need to work with, the harder those tasks become. That’s where RStudio comes in.

For example, imagine you are analyzing sales data for every city across an entire country. That is a lot of data from a lot of different groups–in this case, each city has its own group of data. 

Here are a few ways RStudio could help in this situation:

  • Using RStudio makes it easy to take a specific analysis step and perform it for each group using basic code. In this example, you could calculate the yearly average sales data for every city. 

  • RStudio also allows for flexible data visualization. You can visualize differences across the cities effectively using plotting features like facets–which you’ll learn more about later on.

  • You can also use RStudio to automatically create an output of summary stats—or even your visualized plots—for each group.
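
The first point above, performing one analysis step for each group, can be sketched in a few lines. The example below uses Python purely as a runnable stand-in; in R, this kind of step is commonly written with group_by() and summarise() from the dplyr package. The cities, years, and sales figures are hypothetical values invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (city, year, sales) values invented for illustration.
records = [
    ("Austin", 2021, 100.0),
    ("Austin", 2021, 140.0),
    ("Austin", 2022, 90.0),
    ("Boston", 2021, 200.0),
    ("Boston", 2022, 180.0),
    ("Boston", 2022, 220.0),
]

# Collect sales into one group per (city, year) pair.
groups = defaultdict(list)
for city, year, sales in records:
    groups[(city, year)].append(sales)

# Apply the same analysis step (an average) to every group.
yearly_average = {key: mean(values) for key, values in groups.items()}

# Print one summary line per group, sorted by city and then year.
for (city, year), average in sorted(yearly_average.items()):
    print(f"{city} {year}: {average:.1f}")
```

The grouping code stays the same no matter how many cities are in the data, which is the main appeal of doing this kind of work in code rather than by hand.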

As you learn more about R and RStudio moving forward in this program, you’ll get a better understanding of when RStudio should be your data analysis tool of choice.

For more information

  • The Advantages of RStudio: This web page explains some of the reasons why RStudio is many analysts’ preferred choice for interfacing with R. You’ll learn about the advantages of using RStudio for data analysis, from ease of use to accessibility of graphics and more. 

  • Data analysis and R programming: This online introduction to data analysis and R programming is a good starting point for R and RStudio users. It also includes a list of detailed explanations about the advantages of using R and RStudio. You’ll also find a helpful guide for getting set up with RStudio.


Connecting with other analysts in the R community

R is a powerful tool in your data analysis toolkit–and it also has a powerful community of users who are excited to share, collaborate, and connect with others. This reading will give you a few places where you can start to connect, online and in-person, with other analysts in the R community.

[Image: a group of people standing in a conference room, greeting each other and shaking hands]

Online communities

Online communities allow you to connect with other R users no matter where you live. This list includes forums and discussion channels where you can join the conversation. It also includes social media tags you can use on your existing social media platforms to connect with other data analysts. 

  • RStudio Community: The RStudio Community forum is a great place to get help and find solutions to challenges you have with R–and maybe help someone else out, too!

  • r/RLanguage: The R language subreddit is an active online community on the social media platform Reddit, where R users go to discuss R, ask questions, and share tips. 

  • rOpenSci: rOpenSci has a community forum where R users can ask questions and search for solutions. It also includes links to their Best Practices guide and support pages. 

  • R4DS Online Learning Community and Slack channel: This community has a Slack channel where R learners and mentors can gather and connect. It is a great place to chat about using R for data science. 

  • Twitter #rstats: If you use Twitter, you can connect with other R users using the hashtag #rstats; a lot of R developers and analysts are active on Twitter. 

Meetups

Many organizations host both in-person and online meetups for R users. Always use caution and stay safe when attending meetups in person. 

  • Local Data Analytics meetups: These meetups are a great way to meet other people who are interested in data analytics and build your network. These meetups are location-based, so you can connect with other data analysts in your area. 

  • R User Groups: This list contains links to regional R communities, including subreddits and meetup groups. This is a useful resource if you are interested in finding R users in your area. 

  • RLadies Meetups: These are in-person and virtual meetups specifically for R enthusiasts who identify as underrepresented or marginalized. These meetups are also location-based and can help you connect with other data analysts in your area. 

R can be tricky to learn, but luckily there is a strong community of R users who are interested in working together and helping each other out. These resources are a good starting point if you want to begin connecting with the larger data analyst community, so take advantage of them! 

