The structure of data

Data is everywhere and it can be stored in lots of ways. Two general categories of data are: 
- Structured data: Organized in a certain format, such as rows and columns.
- Unstructured data: Not organized in any easy-to-identify way.
For example, when you rate your favorite restaurant online, you're creating structured data. But when you use Google Earth to check out a satellite image of a restaurant location, you're using unstructured data. 
Here's a refresher on the characteristics of structured and unstructured data:
Structured data: 
- Defined data types
- Most often quantitative data
- Easy to organize
- Easy to search
- Easy to analyze
- Stored in relational databases
- Contained in rows and columns
- Examples: Excel, Google Sheets, SQL, customer data, phone records, transaction history

Unstructured data:
- Varied data types
- Most often qualitative data
- Difficult to search
- Provides more freedom for analysis
- Stored in data lakes and NoSQL databases
- Can't be put in rows and columns
- Examples: Text messages, social media comments, phone call transcriptions, various log files, images, audio, video

Structured data

As we described earlier, structured data is organized in a certain format. This makes it easier to store and query for business needs. If the data is exported, the structure goes along with the data.

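As a rough illustration of what "easy to query" and "the structure goes along with the data" can look like in practice, here is a minimal Python sketch. The table name, column names, and restaurant ratings are all made up for this example; it only assumes Python's built-in sqlite3 and csv modules.

```python
import csv
import io
import sqlite3

# Hypothetical restaurant ratings: every row has the same typed columns.
ratings = [
    ("Luigi's", 5, "2024-01-03"),
    ("Luigi's", 4, "2024-01-09"),
    ("Taco Hut", 3, "2024-01-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (restaurant TEXT, stars INTEGER, rated_on TEXT)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ratings)

# Because the columns are defined, a query can filter and aggregate directly.
cursor = conn.execute(
    "SELECT restaurant, AVG(stars) FROM ratings "
    "GROUP BY restaurant ORDER BY restaurant"
)
averages = dict(cursor.fetchall())

# Exporting keeps the structure: the header row carries the column names.
cursor = conn.execute("SELECT restaurant, stars, rated_on FROM ratings")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([column[0] for column in cursor.description])
writer.writerows(cursor)
exported = buffer.getvalue()

print(averages)
print(exported)
```

The exported text starts with a header line naming each column, so another tool can read the rows back without guessing what each value means.
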
Unstructured data

Unstructured data can’t be organized in any easily identifiable manner. And there is much more unstructured than structured data in the world. Video and audio files, text files, social media content, satellite imagery, presentations, PDF files, open-ended survey responses, and websites all qualify as types of unstructured data.

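To see why unstructured data is harder to search, consider a small Python sketch using free-text comments. The comments and the helper function `mentions` are invented for illustration; the point is only that there is no column to query, so the search falls back on matching words in raw text.

```python
import re

# Hypothetical open-ended comments: no columns, no defined data types.
comments = [
    "Loved the pasta, will definitely come back!",
    "Service was slow but the tacos were great.",
    "Not great. Waited 40 minutes for a table.",
]


def mentions(word, texts):
    """Return the texts that contain `word` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [text for text in texts if pattern.search(text)]


great = mentions("great", comments)
for text in great:
    print(text)
```

The keyword search returns both the happy taco comment and the unhappy "Not great" comment, because a word match says nothing about meaning. Getting a reliable answer out of text like this usually takes more processing than a simple query on a ratings column.
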
The fairness issue

The lack of structure makes unstructured data difficult to search, manage, and analyze. But recent advancements in artificial intelligence and machine learning algorithms are beginning to change that. Now, the new challenge facing data scientists is making sure these tools are inclusive and unbiased. Otherwise, certain elements of a dataset will be more heavily weighted and/or represented than others. And as you're learning, an unfair dataset does not accurately represent the population, causing skewed outcomes, low accuracy levels, and unreliable analysis.
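
One simple way to check whether a dataset represents the population is to compare each group's share in the sample with its share in the population. The sketch below is only an illustration: the group names, the population shares, the sample, and the 10-point threshold are all assumptions made up for this example, not figures from a real dataset.

```python
from collections import Counter

# Assumed population shares (hypothetical numbers for illustration).
population_share = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}

# Hypothetical sample of 100 records, labeled by group.
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

counts = Counter(sample)
total = len(sample)

# Positive gap: over-represented in the sample. Negative gap: under-represented.
gaps = {
    group: counts[group] / total - share
    for group, share in population_share.items()
}

# Flag groups whose sample share falls more than 10 points below the population.
threshold = 0.10  # arbitrary cutoff chosen for this sketch
under_represented = [group for group, gap in gaps.items() if gap < -threshold]

print(gaps)
print(under_represented)
```

Here the rural group makes up 20% of the assumed population but only 5% of the sample, so any analysis built on this sample would lean toward urban records.
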

"DATA-mine" by Jang
