SQL : PostgreSQL

  Aggregate Functions Like most other relational database products,  PostgreSQL  supports  aggregate functions . An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the  count ,  sum ,  avg  (average),  max  (maximum) and  min  (minimum) over a set of rows. As an example, we can find the highest low-temperature reading anywhere with: SELECT max(temp_lo) FROM weather; max ----- 46 (1 row) If we wanted to know what city (or cities) that reading occurred in, we might try: SELECT city FROM weather WHERE temp_lo = max(temp_lo); WRONG but this will not work since the aggregate  max  cannot be used in the  WHERE  clause. (This restriction exists because the  WHERE  clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.) However, as is o...

Data model : from IBM

 

Data model families: The Fab 5

Now that you’ve got a handle on licenses, let’s talk about another critical consideration when selecting your database—data models.

When I first started at IBM, I needed to get up to speed fast, so I turned to Martin Fowler’s NoSQL Distilled.

In his writing, and in the industry at large, people tend to categorize databases into five "data model" families: document, key-value, graph, relational, and wide columnar. Here’s a quick overview of each one, including use cases and database-specific examples. This will help you determine, based on your data sets and business needs, which database you need.

1. Document

In this case, data is modeled in JSON-like documents, rather than rows and columns. These databases, by nature, value availability over transactional consistency. Document databases lend themselves to simplicity and scalability, as well as fast iteration in development.

Business use cases:

  • Mobile apps that require fast iterations

  • Event logging, online shopping, content management and in-depth analytical processing

  • Retail catalogs with product attributes

Examples:

2. Key-Value 

This type of model represents the most basic type of non-relational database, where each item in the database is stored as an attribute name (referred to as a key) with its corresponding value.

Business uses cases:

  • User preference and profile stores

  • Product recommendations based on browsing data

  • Shopping carts

Examples:

  • DynamoDB

  • Redis

  • etcd

3. Graph

Data here is modeled as vertices and edges (values and connections). Similarly to how people think and process information, graph databases recall the relationships between discrete units of data. These databases make the persistence, exploration, and visualization of data and relationships more intuitive.

Business uses cases:

  • Fraud detection

  • Real-time recommendation engines

  • Master data management

  • Network and IT operations

  • Identity and access management

Examples:

4. Relational 

The relational model, introduced by R.F. Codd while here at IBM, is the titan of the industry. Data is stored in tables as rows and columns and often have sophisticated query engines for analytics and exploration. Relational databases support transactional guarantees and ACID (atomicity, consistency, isolation, and durability) compliance, whereas most databases in the other four families are eventually consistent.

Business uses cases:

  • E-commerce

  • Enterprise resource planning

  • Customer relationship management

Examples:

5. Wide Columnar

Column family stores enable very quick data access using a row key, column name, and cell timestamp. The flexible schema of these types of databases means that the columns don’t have to be consistent across records, and you can add a column to specific rows without having to add them to every single record. Wide columnar stores are derived from Google's BigTable paper. These data models shouldn't be confused with Column-Oriented storage models, which is more relevant to data warehousing technologies and analytical access patterns due to improved compression of data on disk and more efficient use of CPU.

Business use cases:

  • Security and stock market analytics

  • Click stream analytics

  • IoT and telemetry

Examples:

  • Apache Cassandra

  • DataStax Enterprise

  • Google Cloud BigTable  

The long and short of it is this—there are advantages and disadvantages to each primary data model (and we barely scratched the surface here). But when in doubt, go with something battle-tested and ubiquitous like PostgreSQL. To learn more about Data Model Families archetype, check out Martin Fowler’s book NoSQL Distilled, particularly chapters 8-11.

Ready to learn more about databases?

Phew! I covered a bit of ground here, but if you are itching to learn more, here are some suggestions based on time investment:

Comments

Popular posts from this blog

Using BigQuery / MySQL / other SQL

SQL : PostgreSQL

About spreadsheet basics