Data model : from IBM

November 09, 2021

Data model families: The Fab 5

Now that you’ve got a handle on licenses, let’s talk about another critical consideration when selecting your database—data models.

When I first started at IBM, I needed to get up to speed fast, so I turned to Martin Fowler’s NoSQL Distilled.

In his writing, and in the industry at large, people tend to categorize databases into five "data model" families: document, key-value, graph, relational, and wide columnar. Here’s a quick overview of each one, including use cases and database-specific examples. This will help you determine, based on your data sets and business needs, which database you need.

1. Document

In this case, data is modeled in JSON-like documents, rather than rows and columns. These databases, by nature, value availability over transactional consistency. Document databases lend themselves to simplicity and scalability, as well as fast iteration in development.

Business use cases:

Mobile apps that require fast iterations
Event logging, online shopping, content management and in-depth analytical processing
Retail catalogs with product attributes

Examples:

2. Key-Value

This type of model represents the most basic type of non-relational database, where each item in the database is stored as an attribute name (referred to as a key) with its corresponding value.

Business uses cases:

User preference and profile stores
Product recommendations based on browsing data
Shopping carts

Examples:

DynamoDB
Redis
etcd

3. Graph

Data here is modeled as vertices and edges (values and connections). Similarly to how people think and process information, graph databases recall the relationships between discrete units of data. These databases make the persistence, exploration, and visualization of data and relationships more intuitive.

Business uses cases:

Fraud detection
Real-time recommendation engines
Master data management
Network and IT operations
Identity and access management

Examples:

Neo4j
JanusGraph
AWS Neptune

4. Relational

The relational model, introduced by R.F. Codd while here at IBM, is the titan of the industry. Data is stored in tables as rows and columns and often have sophisticated query engines for analytics and exploration. Relational databases support transactional guarantees and ACID (atomicity, consistency, isolation, and durability) compliance, whereas most databases in the other four families are eventually consistent.

Business uses cases:

E-commerce
Enterprise resource planning
Customer relationship management

Examples:

PostgreSQL
IBM Db2

5. Wide Columnar

Column family stores enable very quick data access using a row key, column name, and cell timestamp. The flexible schema of these types of databases means that the columns don’t have to be consistent across records, and you can add a column to specific rows without having to add them to every single record. Wide columnar stores are derived from Google's BigTable paper. These data models shouldn't be confused with Column-Oriented storage models, which is more relevant to data warehousing technologies and analytical access patterns due to improved compression of data on disk and more efficient use of CPU.

Business use cases:

Security and stock market analytics
Click stream analytics
IoT and telemetry

Examples:

Apache Cassandra
DataStax Enterprise
Google Cloud BigTable

The long and short of it is this—there are advantages and disadvantages to each primary data model (and we barely scratched the surface here). But when in doubt, go with something battle-tested and ubiquitous like PostgreSQL. To learn more about Data Model Families archetype, check out Martin Fowler’s book NoSQL Distilled, particularly chapters 8-11.

Ready to learn more about databases?

Phew! I covered a bit of ground here, but if you are itching to learn more, here are some suggestions based on time investment:

15 minutes: “How to Choose a Database” by Ben Anderson
15 minutes: "SQL vs. NoSQL Databases: What's the Difference?" by Ben Anderson and Brad Nicholson
45 minutes: "How to Choose a Database on IBM Cloud" webinar
Three hours: Jepsen analyses of distributed systems safety
One week: Designing Data-Intensive Applications by Martin Kleppman.

" DATA-mine " by Jang

SQL : PostgreSQL