Cloudera
Data Extraction from Digital Images
The examples involving the movie poster, and the information you obtain by reading it, invite the question: If I have a digital photograph of a movie poster, couldn't reading the photo be a form of data analysis?
The answer is certainly yes, but that image processing would be a kind of data analysis that is very different from the focus of this course and specialization. This reading covers such image processing and related topics briefly.
To have a computer with a digital camera "see" a movie poster, and then interpret the image in a useful way, is in the general domain of computer vision. Identifying objects or letters represented in a digital image is a kind of classification, a type of machine learning in increasing use today.
A classifier is a type of computer program that takes a record with potentially many data points and infers one or more simple categorical values that fit the record. For example, given a picture of an object (a record with many pixel values), it names the object (say, "cat," "house," or "table"). The program performs the computational task of reducing all the pixel values to that simple label. Modern classifiers can "learn" how to classify pictures by first being presented with a large number of pictures that are already correctly labeled. Even after "seeing" millions of examples, a classifier may still mislabel a new picture with different lighting, a new style of house or table, or an image taken from a different camera angle. For this reason, users of such systems typically measure accuracy as the percentage of labels produced by the program that are correct: computer systems are not perfect at classifying images. Even the latest, profoundly compute-intensive algorithms, such as deep learning systems, report a percentage of successful classifications, not perfect accuracy.
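The accuracy figure described above is simply the share of predicted labels that match the known correct labels. The short Python sketch below computes it; the function name and the labels are made-up placeholders for illustration, not output from any real classifier.

```python
def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Return the fraction of predicted labels that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five test pictures (illustrative only).
truth = ["cat", "house", "table", "cat", "house"]
guesses = ["cat", "house", "cat", "cat", "table"]

print(f"accuracy: {accuracy(guesses, truth):.0%}")  # accuracy: 60%
```

Three of the five guesses match, so the classifier would be reported as 60% accurate on this (tiny, hypothetical) test set.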
When you think about it, this isn't just a problem with computers: people are not always 100% accurate in identifying what they see, either. However, considering the number of neurons in your brain, and the amount of sensory input you have processed in your life, it's no surprise that you can far outperform a computer at looking across the street at a movie poster and extracting its information from your visual field. Autonomous vehicles represent a great technical accomplishment in computer vision (and other forms of signal processing), but in the near term you can expect these applications to require more restricted settings than an average human driver can manage; human drivers routinely cope with unpaved roads, pedestrians, animals, and unexpected changes in terrain.
By the way, one narrow form of computer vision is already in wide, successful use: optical character recognition, or OCR. If you have an image—a record of pixels—and you know you are looking for letters or numeric digits, it's a relatively simple matter to scan the image for those characters and signal when and where they appear. The success of OCR systems today is evident when you deposit a check in an automatic teller machine or scan it using a mobile check deposit app. The software finds the images of digits for the check amount and interprets each digit as one of the ten printed numerals, 0, 1, 2, up to 9. Still, the ATM or app will typically require you to verify the amount, to confirm the accuracy of the OCR software!
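To make "interprets each digit as one of the ten printed numerals" concrete, here is a deliberately tiny Python sketch of template matching. It assumes hypothetical 3-by-5 bitmaps for each digit and assigns an unknown bitmap to the digit whose template differs in the fewest pixels. Real OCR systems use far more robust methods; this only illustrates the "pick one of ten labels" step, and the pixel distance it returns is a crude signal (with an arbitrary, hypothetical threshold) for when a person should verify the result.

```python
# Hypothetical 3-by-5 bitmaps for the digits 0-9 ('#' = ink, '.' = blank).
TEMPLATES = {
    0: ("###", "#.#", "#.#", "#.#", "###"),
    1: (".#.", "##.", ".#.", ".#.", "###"),
    2: ("###", "..#", "###", "#..", "###"),
    3: ("###", "..#", "###", "..#", "###"),
    4: ("#.#", "#.#", "###", "..#", "..#"),
    5: ("###", "#..", "###", "..#", "###"),
    6: ("###", "#..", "###", "#.#", "###"),
    7: ("###", "..#", "..#", "..#", "..#"),
    8: ("###", "#.#", "###", "#.#", "###"),
    9: ("###", "#.#", "###", "..#", "###"),
}


def pixel_distance(a, b):
    """Count the pixels that differ between two bitmaps of the same shape."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def classify_digit(bitmap):
    """Return (best_digit, distance) for the closest template."""
    best = min(TEMPLATES, key=lambda d: pixel_distance(bitmap, TEMPLATES[d]))
    return best, pixel_distance(bitmap, TEMPLATES[best])


def read_amount(bitmaps, max_distance=1):
    """Read a sequence of digit bitmaps; flag the result if any match is poor."""
    digits, needs_review = [], False
    for bitmap in bitmaps:
        digit, dist = classify_digit(bitmap)
        digits.append(str(digit))
        needs_review = needs_review or dist > max_distance
    return "".join(digits), needs_review


# A smudged 7: one stray pixel in the bottom row.
smudged_7 = ("###", "..#", "..#", "..#", ".##")
print(read_amount([TEMPLATES[4], smudged_7, TEMPLATES[0]]))  # ('470', False)
```

A badly damaged character would still be forced into one of the ten labels, but with a large distance, which is where a confirmation step like the ATM's "verify the amount" prompt earns its keep.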
This reading began with the question: if I have a digital photograph of a movie poster, couldn't reading the photo be a form of data analysis? Working with the compute-intensive, somewhat fuzzy problems of image recognition can be called a form of data analysis, but the work of someone with the job title of data analyst is likely to be rather different. Big Data Analysis with SQL, the subject of this specialization, focuses on data records that are already well organized, with clearly defined features: records such as customer orders, airline flights, or store items. A data scientist working with machine learning algorithms may extract clear features from data such as images. The accuracy of those features must be treated as less than perfect, but once they are extracted, you can go on to analyze them using the SQL tools you will learn here. Indeed, modern enterprises gain useful insights from the mass of data they have today through an interplay of the data analysis skills you will develop here and statistical techniques such as machine learning.
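The hand-off between machine learning and SQL might look something like the following Python sketch, which uses the standard-library sqlite3 module. The table name `image_labels`, its columns, the 0.9 confidence cutoff, and the rows are all hypothetical. The point is that once a classifier's labels are stored as ordinary records, including a confidence score that acknowledges imperfect accuracy, plain SQL can filter and aggregate them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE image_labels (image_id INTEGER, label TEXT, confidence REAL)"
)

# Hypothetical classifier output: one row per image.
conn.executemany(
    "INSERT INTO image_labels VALUES (?, ?, ?)",
    [
        (1, "movie poster", 0.97),
        (2, "movie poster", 0.62),
        (3, "street sign", 0.91),
        (4, "movie poster", 0.93),
        (5, "street sign", 0.40),
    ],
)

# Count only the labels the classifier was fairly sure about.
rows = conn.execute(
    """
    SELECT label, COUNT(*) AS n
    FROM image_labels
    WHERE confidence >= 0.9
    GROUP BY label
    ORDER BY label
    """
).fetchall()
print(rows)  # [('movie poster', 2), ('street sign', 1)]
conn.close()
```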
Three Notes about SQL
In the video "Relational Databases and SQL," I presented some basic SQL commands, which fall into four well-known categories. The following three notes provide some detail and warnings about those categories.
Rare or (sometimes) muddy terms: DQL, DML, query
The terms on my list are all commonly used in discussions about relational databases and SQL, with one exception: DQL (Data Query Language). Because DQL is a category with only one statement in it, some people may never learn or say "DQL"; they just say "SELECT."
In fact, some people include the SELECT statement in the category of DML. I don't use that classification (and I think I'm in the majority), but when you're talking to someone new, be aware that you may need to clear things up with them when they say "DML": Do they mean INSERT-UPDATE-DELETE statements, or do they also include SELECT in the DML category?
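To make the distinction concrete, the Python sketch below runs one statement of each kind against a throwaway SQLite database through the standard-library sqlite3 module. Following the classification above, INSERT, UPDATE, and DELETE are grouped as DML, and SELECT is kept separate; the `orders` table and its rows are hypothetical. The DML statements change stored data, while SELECT only reads it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

# DML (INSERT, UPDATE, DELETE): these statements change the stored data.
dml = [
    "INSERT INTO orders (item, qty) VALUES ('ticket', 2), ('popcorn', 1)",
    "UPDATE orders SET qty = 3 WHERE item = 'ticket'",
    "DELETE FROM orders WHERE item = 'popcorn'",
]
for statement in dml:
    cur = conn.execute(statement)
    print(f"{statement.split()[0]}: {cur.rowcount} row(s) affected")

# SELECT (what some people call DQL): reads data without changing it.
rows = conn.execute("SELECT item, qty FROM orders").fetchall()
print(rows)  # [('ticket', 3)]
conn.close()
```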
There's one more muddy word: query. For most people, the word query is another name for the SELECT statement. This is reasonable, because the English word query is another word for question, and whenever you ask the database a question, you do so by issuing a SELECT statement. But for some people, query can refer to any SQL statement, of any kind at all! This doesn't make good sense to me, but I've heard the words "database query" used in this way many times through the years.
Other commands, in and out of SQL
I've started here with just the foundational concepts of SQL. These are not the only commands in SQL, and you'll quickly learn others. For example, you'll probably like the DESCRIBE command (available in many, though not all, SQL systems), which reminds you of the details of how you designed a table. Or there's UPSERT, a funny kind of statement that can act as either an INSERT or an UPDATE depending on the circumstances.
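The exact syntax for both of these varies by system. As one example, SQLite (used here through Python's standard-library sqlite3 module) has no DESCRIBE command; `PRAGMA table_info` plays a similar role. SQLite also has no UPSERT keyword; it expresses the same idea with `INSERT ... ON CONFLICT ... DO UPDATE`, which requires SQLite 3.24 or later. The `inventory` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")

# SQLite's stand-in for DESCRIBE: one row per column
# (cid, name, type, notnull, dflt_value, pk); keep just name and type.
info = conn.execute("PRAGMA table_info(inventory)").fetchall()
columns = [(row[1], row[2]) for row in info]
print(columns)  # [('sku', 'TEXT'), ('qty', 'INTEGER')]

# UPSERT-style statement: insert a new row, or update the existing one on a key clash.
upsert = """
    INSERT INTO inventory (sku, qty) VALUES (?, ?)
    ON CONFLICT(sku) DO UPDATE SET qty = excluded.qty
"""
conn.execute(upsert, ("poster-01", 5))  # no row yet, so this inserts
conn.execute(upsert, ("poster-01", 8))  # key exists, so this updates
final = conn.execute("SELECT sku, qty FROM inventory").fetchall()
print(final)  # [('poster-01', 8)]
conn.close()
```

In another system the same two ideas may be spelled quite differently, which is a preview of the dialect issue discussed below.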
There are even some commands that are provided with most database systems, but that are not SQL statements, like commands for backing up a database, or importing data from some other data store into your database. If there's no regular SQL command for something you need to do, that's okay; you'll just need to learn the particular command you need in the particular program you're using.
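As one illustration, backing up an SQLite database is not done with a SQL statement in the sqlite3 command-line shell; it uses the non-SQL `.backup` dot-command. Python's sqlite3 module exposes the same capability as the `Connection.backup()` method (Python 3.7 and later). The sketch below copies a small, hypothetical in-memory database into a second one.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE flights (code TEXT, origin TEXT, dest TEXT)")
source.execute("INSERT INTO flights VALUES ('XY123', 'SFO', 'JFK')")
source.commit()  # commit the row before copying

# The backup goes through the library API, not through a SQL statement.
target = sqlite3.connect(":memory:")
source.backup(target)

copied = target.execute("SELECT * FROM flights").fetchall()
print(copied)  # [('XY123', 'SFO', 'JFK')]
source.close()
target.close()
```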
Standards and SQL dialects
Speaking of different commands, there's a twist on SQL itself. The engineering community has developed an international standard for SQL, published by ANSI and ISO. SQL has evolved and grown over the years, and so the "official standard" has been revised and republished several times. However, almost no commercial vendor actually implements the SQL standard exactly, 100 percent.
(In fact, in the late 1980s, competing companies lobbied to get features into the standard that their competitors didn't have, and so the standard became inconsistent and no one program could possibly be 100 percent compliant with the standard at that time!)
So, you really have different SQL dialects for different software programs. Although most SQL systems are at least 90 or 95 percent alike, every system will have its own peculiar dialect, with some small differences in the exact commands that are available, or some optional details in the commands. I'm giving you fundamental concepts of SQL in these early weeks. My suggestion is that whenever you actually use a new SQL-based program, you should familiarize yourself with the particular SQL dialect that is special to that program. Then when you use a different program, learn its SQL dialect. You'll quickly deepen your understanding of both SQL in general and the particular use of SQL in the various programs you use. That is exactly the approach in the later courses of this specialization.
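A small example of a dialect difference: to return only the first row of a result, SQLite (like MySQL and PostgreSQL) uses `LIMIT`, while Microsoft SQL Server uses `SELECT TOP`. The Python sketch below, using the standard-library sqlite3 module and a hypothetical `movies` table, shows SQLite accepting one form and rejecting the other as a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Metropolis", 1927), ("Casablanca", 1942), ("Vertigo", 1958)],
)

# The same request written in two dialects.
candidates = {
    "LIMIT": "SELECT title FROM movies ORDER BY year LIMIT 1",
    "TOP": "SELECT TOP 1 title FROM movies ORDER BY year",
}

results = {}
for dialect, sql in candidates.items():
    try:
        results[dialect] = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:  # SQLite reports a syntax error for TOP
        results[dialect] = f"error: {exc}"

for dialect, outcome in results.items():
    print(f"{dialect}: {outcome}")
conn.close()
```

Run the same two statements against SQL Server and the outcome would be reversed, which is why learning each program's dialect pays off.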