Data model: from IBM
Data model families: The Fab 5
Now that you’ve got a handle on licenses, let’s talk about another critical consideration when selecting your database—data models.
When I first started at IBM, I needed to get up to speed fast, so I turned to Martin Fowler’s NoSQL Distilled.
Fowler, like the industry at large, groups databases into five "data model" families: document, key-value, graph, relational, and wide columnar. Here’s a quick overview of each one, with use cases and example databases, to help you decide which database fits your data sets and business needs.
1. Document
Here, data is modeled as JSON-like documents rather than as rows and columns. Many document databases favor availability over strict transactional consistency. They lend themselves to simplicity, scalability, and fast iteration during development.
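To make the document model concrete, here is a minimal sketch in plain Python. It assumes an in-memory list of dicts standing in for a collection; the `find` helper and the product fields are hypothetical and are not the API of Firebase or any other document database. It shows documents in one collection carrying different attributes, and a new field appearing without any schema change.

```python
import json

# Hypothetical in-memory "collection": a list of JSON-like documents.
# This illustrates the data model only; it is not any database's API.
catalog = [
    {"_id": "p1", "name": "Desk lamp", "price": 24.99,
     "attributes": {"color": "black", "wattage": 40}},
    {"_id": "p2", "name": "T-shirt", "price": 12.00,
     "attributes": {"sizes": ["S", "M", "L"], "material": "cotton"}},
]

def find(collection, **criteria):
    """Return the documents whose top-level fields equal every given value."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

# A document with a brand-new field needs no schema migration.
catalog.append({"_id": "p3", "name": "E-book", "price": 5.00,
                "download_url": "https://example.com/ebook"})

print(json.dumps(find(catalog, name="T-shirt"), indent=2))
```

Because each document is self-describing, the lamp and the T-shirt can store entirely different `attributes`, which is why this model suits retail catalogs.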
Business use cases:
Mobile apps that require fast iterations
Event logging, online shopping, content management and in-depth analytical processing
Retail catalogs with product attributes
Examples:
Firebase
2. Key-Value
This is the simplest kind of non-relational database: each item is stored as an attribute name (the key) together with its corresponding value.
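A minimal sketch of the key-value model, using a hypothetical `KeyValueStore` class backed by a Python dict (this is not the API of DynamoDB, Redis, or etcd). The point is that the store treats each value as opaque and can only look it up by its key, so updating a shopping cart means reading the whole value, changing it in the application, and writing it back. The `cart:<user id>` key naming is an assumption for illustration.

```python
from typing import Any, Optional

class KeyValueStore:
    """Toy in-memory key-value store (hypothetical; illustrates the model only).

    Values are opaque to the store: the only way to reach one is by its key.
    """

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
# Hypothetical key naming convention: "<kind>:<user id>".
store.put("cart:user42", {"sku-123": 2, "sku-456": 1})
store.put("prefs:user42", {"theme": "dark", "language": "en"})

cart = store.get("cart:user42", {})
cart["sku-789"] = 3              # modify the value in the application...
store.put("cart:user42", cart)   # ...then write the whole value back under its key
print(store.get("cart:user42"))
```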
Business use cases:
User preference and profile stores
Product recommendations based on browsing data
Shopping carts
Examples:
DynamoDB
Redis
etcd
3. Graph
Data here is modeled as vertices and edges (entities and the connections between them). Much like the way people think about information, graph databases store the relationships between discrete units of data as first-class data. This makes persisting, exploring, and visualizing data and its relationships more intuitive.
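A minimal sketch of how a graph traversal follows relationships, written in plain Python rather than in a graph query language such as Neo4j's Cypher. The vertices, edge labels, and the `within_hops` helper are hypothetical; the traversal is a standard breadth-first search that ignores edge labels for simplicity. The example mirrors the fraud-detection use case: two accounts connected through a shared device.

```python
from collections import defaultdict, deque

# Hypothetical sample data: (vertex, relationship label, vertex).
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "dave"),
    ("alice", "USES_DEVICE", "device-1"),
    ("eve", "USES_DEVICE", "device-1"),
]

# Undirected adjacency list, so relationships can be followed in either direction.
adjacency: dict[str, set[str]] = defaultdict(set)
for src, _label, dst in edges:
    adjacency[src].add(dst)
    adjacency[dst].add(src)

def within_hops(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: every vertex reachable in <= max_hops, with its distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# eve reaches alice in two hops through the shared device.
print(within_hops("eve", 2))
```

In a relational schema the same question would need one self-join per hop; a graph store walks stored edges directly.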
Business use cases:
Fraud detection
Real-time recommendation engines
Master data management
Network and IT operations
Identity and access management
Examples:
Neo4j
Amazon Neptune
4. Relational
The relational model, introduced by E.F. Codd while he was here at IBM, is the titan of the industry. Data is stored in tables as rows and columns, and relational databases often have sophisticated query engines for analytics and exploration. They support transactional guarantees and ACID (atomicity, consistency, isolation, and durability) compliance, whereas many databases in the other four families are eventually consistent by default.
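A minimal sketch of a relational schema and an ACID transaction, using Python's built-in `sqlite3` module as a stand-in for a relational database such as Db2. The tables, the `transfer` function, and the balances are hypothetical. The `CHECK` constraint makes the second transfer fail partway through, and the transaction rolls back the update that had already run, which is the atomicity guarantee described above.

```python
import sqlite3

# In-memory SQLite stands in for a relational database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO accounts VALUES (10, 1, 100), (20, 2, 50);
""")

def transfer(src: int, dst: int, amount: int) -> bool:
    """Move money between accounts atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well

transfer(10, 20, 30)   # succeeds
transfer(20, 10, 500)  # fails: account 20 would go negative

rows = conn.execute("""
    SELECT c.name, a.balance
    FROM customers c JOIN accounts a ON a.customer_id = c.id
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 70), ('Grace', 80)]
```

The closing `JOIN` is the kind of declarative query that relational engines plan and optimize for you.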
Business use cases:
E-commerce
Enterprise resource planning
Customer relationship management
Examples:
IBM Db2
5. Wide Columnar
Column-family stores enable very fast data access using a row key, column name, and cell timestamp. Because their schema is flexible, columns don’t have to be consistent across records: you can add a column to specific rows without adding it to every record. Wide-column stores derive from Google's Bigtable paper. Don't confuse this data model with column-oriented storage, which matters more for data warehousing technologies and analytical access patterns because it compresses data on disk better and uses the CPU more efficiently.
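A minimal sketch of the wide-column data model, assuming a toy in-memory layout of row key, then column name, then timestamped cell versions, in the spirit of Bigtable's versioned cells. The `put`/`get` helpers and the telemetry data are hypothetical and do not reflect the Cassandra or Bigtable APIs; real systems also sort rows by key and distribute them across nodes, which this sketch omits. It shows that rows can hold different columns and that a read can return the latest version or the version as of a given timestamp.

```python
import time
from collections import defaultdict
from typing import Any, Optional

# Toy layout: row key -> column name -> list of (timestamp, value), newest first.
Table = dict[str, dict[str, list[tuple[float, Any]]]]
table: Table = defaultdict(lambda: defaultdict(list))

def put(row_key: str, column: str, value: Any, ts: Optional[float] = None) -> None:
    """Write one cell version; only this row gains the column."""
    versions = table[row_key][column]
    versions.append((time.time() if ts is None else ts, value))
    versions.sort(key=lambda tv: tv[0], reverse=True)  # keep newest first

def get(row_key: str, column: str, as_of: Optional[float] = None) -> Any:
    """Return the newest value at or before `as_of` (or the newest overall), else None."""
    for ts, value in table.get(row_key, {}).get(column, []):
        if as_of is None or ts <= as_of:
            return value
    return None

# Telemetry rows keyed by device; each row carries only the columns it needs.
put("sensor-1", "temp_c", 21.5, ts=100.0)
put("sensor-1", "temp_c", 22.1, ts=160.0)
put("sensor-2", "humidity", 0.43, ts=100.0)  # sensor-2 has no temp_c column

print(get("sensor-1", "temp_c"))               # 22.1 (latest version)
print(get("sensor-1", "temp_c", as_of=120.0))  # 21.5
print(sorted(table["sensor-2"]))               # ['humidity']
```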
Business use cases:
Security and stock market analytics
Click stream analytics
IoT and telemetry
Examples:
Apache Cassandra
DataStax Enterprise
Google Cloud Bigtable
The long and short of it: each primary data model has advantages and disadvantages (and we barely scratched the surface here). But when in doubt, go with something battle-tested and ubiquitous like PostgreSQL. To learn more about the data model families, check out Martin Fowler’s book NoSQL Distilled, particularly chapters 8-11.
Ready to learn more about databases?
Phew! I covered a bit of ground here, but if you are itching to learn more, here are some suggestions based on time investment:
15 minutes: “How to Choose a Database” by Ben Anderson
15 minutes: "SQL vs. NoSQL Databases: What's the Difference?" by Ben Anderson and Brad Nicholson
45 minutes: "How to Choose a Database on IBM Cloud" webinar
Three hours: Jepsen analyses of distributed systems safety
One week: Designing Data-Intensive Applications by Martin Kleppmann