SQL : PostgreSQL

  Aggregate Functions

  Like most other relational database products, PostgreSQL supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the count, sum, avg (average), max (maximum) and min (minimum) over a set of rows.

  As an example, we can find the highest low-temperature reading anywhere with:

      SELECT max(temp_lo) FROM weather;

       max
      -----
        46
      (1 row)

  If we wanted to know what city (or cities) that reading occurred in, we might try:

      SELECT city FROM weather WHERE temp_lo = max(temp_lo);     WRONG

  but this will not work since the aggregate max cannot be used in the WHERE clause. (This restriction exists because the WHERE clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is o...
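The restriction on aggregates in WHERE can be reproduced in a small sketch. It rests on assumptions: it uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the three weather rows are illustrative values chosen so that the highest temp_lo is 46. A subquery is one common way to get the city, because it computes the aggregate separately from the outer WHERE.

```python
import sqlite3

# Illustrative rows only (an assumption, not data from the post), chosen so
# that the highest temp_lo is 46 as in the example output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("San Francisco", 46, 50), ("San Francisco", 43, 57), ("Hayward", 37, 54)],
)

# An aggregate on its own collapses all input rows into a single result.
highest_low = conn.execute("SELECT max(temp_lo) FROM weather").fetchone()[0]

# Using the aggregate directly in WHERE is rejected by the database.
try:
    conn.execute("SELECT city FROM weather WHERE temp_lo = max(temp_lo)")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# A subquery computes the aggregate first, so the outer WHERE compares
# each row against an ordinary value.
cities = [
    row[0]
    for row in conn.execute(
        "SELECT city FROM weather "
        "WHERE temp_lo = (SELECT max(temp_lo) FROM weather)"
    )
]

print(highest_low, where_error, cities)
```

The exact error text differs between database engines, so the sketch only records that an error occurred.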

IBM data - basic knowledge

 A data analyst ecosystem includes the infrastructure, software, tools, frameworks, and processes used to gather, clean, analyze, mine, and visualize data.  

Based on how well-defined the structure of the data is, data can be categorized as:

  • Structured Data: data that is well organized in formats that can be stored in databases.

  • Semi-Structured Data: data that is partially organized and partially free-form.

  • Unstructured Data: data that cannot be organized conventionally into rows and columns.

Data comes in a wide-ranging variety of file formats, such as delimited text files, spreadsheets, XML, PDF, and JSON, each with its own list of benefits and limitations of use.  
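To make the differences between formats concrete, here is a minimal sketch that reads the same two records from delimited text, JSON, and XML using only Python's standard library. The sample data, the field names (city, temp_lo), and the helper functions are hypothetical, made up for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two readings in three formats.
csv_text = "city,temp_lo\nHayward,37\nSan Francisco,46\n"
json_text = (
    '[{"city": "Hayward", "temp_lo": 37},'
    ' {"city": "San Francisco", "temp_lo": 46}]'
)
xml_text = (
    "<readings>"
    "<reading><city>Hayward</city><temp_lo>37</temp_lo></reading>"
    "<reading><city>San Francisco</city><temp_lo>46</temp_lo></reading>"
    "</readings>"
)


def from_csv(text):
    # Delimited text: every value arrives as a string, so numbers are
    # converted by hand.
    reader = csv.DictReader(io.StringIO(text))
    return [{"city": r["city"], "temp_lo": int(r["temp_lo"])} for r in reader]


def from_json(text):
    # JSON: numbers are already typed after parsing.
    return [{"city": r["city"], "temp_lo": r["temp_lo"]} for r in json.loads(text)]


def from_xml(text):
    # XML: element text is a string, so numbers are converted by hand.
    root = ET.fromstring(text)
    return [
        {"city": el.findtext("city"), "temp_lo": int(el.findtext("temp_lo"))}
        for el in root.findall("reading")
    ]


records = {
    "csv": from_csv(csv_text),
    "json": from_json(json_text),
    "xml": from_xml(xml_text),
}
print(records)
```

All three parsers end up with the same list of dictionaries. The formats differ mainly in how much typing and structure they carry on their own.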

Data is extracted from multiple data sources, ranging from relational and non-relational databases to APIs, web services, data streams, social platforms, and sensor devices. 

Once the data is identified and gathered from different sources, it needs to be staged in a data repository so that it can be prepared for analysis. The type, format, and sources of data influence the type of data repository that can be used. 
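The staging step can be sketched as follows. Everything here is assumed for illustration: an in-memory SQLite database stands in for a real data repository, and the table name staging_weather, the source labels, and the records are hypothetical.

```python
import sqlite3

# Hypothetical records gathered from two different sources.
from_api = [{"city": "Hayward", "temp_lo": 37}]
from_file = [
    {"city": "San Francisco", "temp_lo": 46},
    {"city": "San Francisco", "temp_lo": 43},
]

# An in-memory SQLite database stands in for the data repository.
repo = sqlite3.connect(":memory:")
repo.execute(
    "CREATE TABLE staging_weather (source TEXT, city TEXT, temp_lo INTEGER)"
)


def stage(source, records):
    # Tag each row with its source so it can be traced during preparation.
    repo.executemany(
        "INSERT INTO staging_weather VALUES (?, ?, ?)",
        [(source, r["city"], r["temp_lo"]) for r in records],
    )


stage("api", from_api)
stage("file", from_file)

# Once staged, the combined data can be inspected with ordinary queries.
staged_rows = repo.execute("SELECT count(*) FROM staging_weather").fetchone()[0]
per_source = dict(
    repo.execute("SELECT source, count(*) FROM staging_weather GROUP BY source")
)
print(staged_rows, per_source)
```

A relational table suits these records because they are already structured. Semi-structured or unstructured sources would likely call for a different kind of repository.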

Data professionals need a host of languages that can help them extract, prepare, and analyze data. These can be classified as:  

  • Querying languages, such as SQL, used for accessing and manipulating data in databases.

  • Programming languages, such as Python, R, and Java, used for developing applications and controlling application behavior.

  • Shell and scripting languages, such as Unix/Linux shell and PowerShell, used for automating repetitive operational tasks.
