Excerpt: No matter how talented you are at developing complex models or building fancy visualizations as a data analyst or data scientist, at the end of the day you need data to do any of it. In a large firm, that data is usually kept in a database so that everyone can quickly access and search for the information they need to do their jobs. So what's holding you back from mastering the most popular SQL statements for data extraction and speedy analysis?
Introduction:
Structured Query Language, or SQL as it is more commonly known, is a standardized language used to query and retrieve information from a database. It can also be used to work with the data in many ways, including sorting rows and basic data manipulation. One of the first things I noticed when I started working as a data scientist was that absolutely everyone knew how to use SQL.
This is a skill that just about everyone needs, whether you are a senior data analyst who has worked for a while or an analyst who is just an intern. That shows how important SQL is in the field of data science.
For this reason, this blog article highlights the top 10 SQL commands to help you get started with SQL. More significantly, I will demonstrate each command using real-world scenarios to replicate the experience of working with a database. If you are interested in SQL, you can join an SQL Online Training course and improve your skills in this field.
1. SELECT and FROM
SELECT and FROM are the building blocks of every SQL query. These two commands appear in even the simplest query, and as a query becomes more sophisticated, other commands are layered on top of them.
SELECT indicates which columns to return, whereas FROM defines which table to query. Let us now look at a few scenarios using the transaction table.
For a complete look at the transaction table's columns:
QUERY:
SELECT * FROM transaction;
Suppose we want to select only the transaction_id, purchase_date, and sales columns from the transaction table:
QUERY:
SELECT transaction_id, purchase_date, sales FROM transaction;
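To try these two queries without access to a company database, one option is an in-memory SQLite database driven from Python's built-in sqlite3 module. The sketch below is only an illustration: the sample rows and the store_location values are made up, and the table name is double-quoted because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

# Hypothetical sample rows; the column names follow the article's queries.
rows = [
    (1, "2021-10-15", "Melbourne CBD", 15.0),
    (2, "2021-10-15", "Richmond", 42.5),
    (3, "2021-10-16", "Melbourne CBD", 20.0),
]

conn = sqlite3.connect(":memory:")
# TRANSACTION is a keyword in SQLite, so the table name is double-quoted.
conn.execute(
    """CREATE TABLE "transaction" (
           transaction_id INTEGER PRIMARY KEY,
           purchase_date  TEXT,
           store_location TEXT,
           sales          REAL
       );"""
)
conn.executemany("""INSERT INTO "transaction" VALUES (?, ?, ?, ?);""", rows)

# SELECT * returns every column of every row.
all_columns = conn.execute("""SELECT * FROM "transaction";""").fetchall()

# Naming columns returns only those columns, in the order listed.
some_columns = conn.execute(
    """SELECT transaction_id, purchase_date, sales FROM "transaction";"""
).fetchall()

print(all_columns)
print(some_columns)
```

Each row comes back as a Python tuple, so the first query yields four values per row and the second yields three.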
2. DISTINCT
The SQL DISTINCT clause eliminates duplicate rows from the result set. The DISTINCT keyword is used together with the SELECT keyword, and it is helpful when you want to avoid duplicate values in certain columns. When we use DISTINCT, only the unique values are retrieved. For instance, let's say we wanted to view the unique dates associated with transactions:
QUERY:
SELECT DISTINCT purchase_date FROM transaction;
RESULT:
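As a rough illustration of how DISTINCT behaves, the sketch below runs it against a small in-memory SQLite table from Python. The rows are invented for the example, and the second query is an addition meant to show that with several columns DISTINCT applies to the combination of values, not to each column separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (
           transaction_id INTEGER, purchase_date TEXT, store_location TEXT
       );"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?, ?);""",
    [
        (1, "2021-10-15", "Melbourne CBD"),
        (2, "2021-10-15", "Melbourne CBD"),
        (3, "2021-10-15", "Richmond"),
        (4, "2021-10-16", "Richmond"),
    ],
)

# One column: each date appears once, however many transactions share it.
unique_dates = conn.execute(
    """SELECT DISTINCT purchase_date FROM "transaction"
       ORDER BY purchase_date;"""
).fetchall()

# Two columns: DISTINCT keeps each (date, location) combination once.
unique_pairs = conn.execute(
    """SELECT DISTINCT purchase_date, store_location FROM "transaction"
       ORDER BY purchase_date, store_location;"""
).fetchall()

print(unique_dates)
print(unique_pairs)
```

With this sample data, four transactions collapse to two distinct dates and three distinct date and location pairs.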
3. WHERE
WHERE is used to select rows according to a specific criterion. It is frequently used together with operators such as AND, OR, BETWEEN, IN, and LIKE to express more complex criteria.
Whether you are retrieving data from a single table or from several joined tables, the SQL WHERE clause specifies the condition that rows must meet. A row is returned only if it satisfies that condition, so use WHERE to filter the entries and retrieve only those you need.
QUERY:
SELECT * FROM transaction WHERE purchase_date = '2021-10-15';
RESULT:
Source:https://towardsdatascience.com/10-most-important-sql-commands-every-data-analyst-needs-to-know-f0f568914b98
QUERY:
SELECT * FROM transaction WHERE purchase_date = '2021-10-15' AND store_location = 'Melbourne CBD';
RESULT:
QUERY:
SELECT * FROM transaction WHERE purchase_date = '2021-10-15' OR store_location = 'Melbourne CBD';
RESULT:
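BETWEEN and IN are mentioned above but not demonstrated, so here is a minimal sketch of both, run from Python against an in-memory SQLite table. The rows and the extra store names are hypothetical, and the date comparison relies on the dates being stored as ISO-formatted text (YYYY-MM-DD), which sorts correctly as plain strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (
           transaction_id INTEGER, purchase_date TEXT,
           store_location TEXT, sales REAL
       );"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?, ?, ?);""",
    [
        (1, "2021-10-14", "Melbourne CBD", 15.0),
        (2, "2021-10-15", "Richmond", 42.5),
        (3, "2021-10-16", "Geelong", 20.0),
        (4, "2021-10-20", "Melbourne CBD", 8.0),
    ],
)

# BETWEEN is inclusive at both ends of the range.
in_date_range = conn.execute(
    """SELECT transaction_id FROM "transaction"
       WHERE purchase_date BETWEEN '2021-10-15' AND '2021-10-16'
       ORDER BY transaction_id;"""
).fetchall()

# IN matches any value in the list; it is shorthand for chained ORs.
in_store_list = conn.execute(
    """SELECT transaction_id FROM "transaction"
       WHERE store_location IN ('Melbourne CBD', 'Richmond')
       ORDER BY transaction_id;"""
).fetchall()

print(in_date_range)
print(in_store_list)
```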
4. % WILDCARD
A wildcard character stands in for other characters in a string. Wildcards are used with the LIKE operator, which you place in a WHERE clause to look for a specific pattern in a column.
With LIKE, the wildcard % matches any sequence of zero or more characters.
Before we look at how this wildcard works, let's first take a look at the customer profile data. This table provides information about each customer's life stage and premium status.
QUERY:
SELECT * FROM customers;
RESULT:
Let's say we now wish to retrieve all rows from the customers table whose customer life stage begins with the word Young.
QUERY:
SELECT * FROM customers WHERE customer_lifestage LIKE 'Young%';
RESULT:
Furthermore, suppose we want to see the rows where the customer life stage ends with the word "families":
QUERY:
SELECT * FROM customers WHERE customer_lifestage LIKE '%families';
RESULT:
As you can see, SQL provides a quick and simple way to match string patterns, which is useful when filtering entries in many situations.
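The two patterns above can be tried on a small in-memory SQLite table from Python, as in the sketch below. The life-stage values are invented for illustration. The third query is an addition that uses the underscore wildcard, which matches exactly one character. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which is not true of every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, customer_lifestage TEXT);"
)
# Hypothetical life-stage values, used only for this example.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?);",
    [
        (1, "Young families"),
        (2, "Young singles/couples"),
        (3, "Older families"),
        (4, "Retirees"),
    ],
)


def matching_ids(pattern):
    """Return the customer_id values whose life stage matches the pattern."""
    cursor = conn.execute(
        "SELECT customer_id FROM customers "
        "WHERE customer_lifestage LIKE ? ORDER BY customer_id;",
        (pattern,),
    )
    return [row[0] for row in cursor]


starts_with_young = matching_ids("Young%")   # % after: prefix match
ends_with_families = matching_ids("%families")  # % before: suffix match
one_char_then_lder = matching_ids("_lder%")  # _ matches exactly one character

print(starts_with_young, ends_with_families, one_char_then_lder)
```

Passing the pattern as a bound parameter (the ? placeholder) keeps the SQL text fixed while the pattern varies.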
5. ORDER BY
The ORDER BY clause sorts the query result by a specific column, either alphabetically or numerically. There are two ordering options: ASC for ascending order and DESC for descending order. Because ascending order is the default, you'll notice that most users don't include ASC in their queries.
To illustrate this, let's say we want to arrange the transactions according to the size of the sales in ascending order.
QUERY:
SELECT store_location, sales FROM transaction ORDER BY sales;
RESULT:
As an alternative, we can also arrange the transactions in descending order according to the value of the sales.
QUERY:
SELECT store_location, sales FROM transaction ORDER BY sales DESC;
RESULT:
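To see both sort directions side by side, the following sketch runs ORDER BY against a small in-memory SQLite table from Python. The rows are hypothetical, and the last query is an addition showing that several sort keys can be combined, each with its own direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (store_location TEXT, sales REAL);"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?);""",
    [("Melbourne CBD", 15.0), ("Richmond", 42.5), ("Melbourne CBD", 20.0)],
)

# Ascending is the default, so ASC can be omitted.
ascending = conn.execute(
    """SELECT sales FROM "transaction" ORDER BY sales;"""
).fetchall()

descending = conn.execute(
    """SELECT sales FROM "transaction" ORDER BY sales DESC;"""
).fetchall()

# Sort by store name first, then by largest sale within each store.
by_store_then_sales = conn.execute(
    """SELECT store_location, sales FROM "transaction"
       ORDER BY store_location ASC, sales DESC;"""
).fetchall()

print(ascending)
print(descending)
print(by_store_then_sales)
```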
6. AS
We can give a table or column a temporary name (an alias) using AS. Note that this does not change the names of the original columns or tables.
The following query retrieves the purchase_date column from the transaction table and displays it under the name date.
QUERY:
SELECT purchase_date AS date FROM transaction;
RESULT:
7. CASE WHEN, THEN, and ELSE
If you've used other programming languages before, this is a lot like an if-else statement. In plain English, the instruction reads something like this: if a condition is met, do this; otherwise, do something else.
To make this clear, let's look at an example. Let's say we want to add a new column that indicates whether a transaction's sales value is below $20.
QUERY:
SELECT transaction_id, sales,
CASE WHEN sales < 20
THEN 'Sales amount is less than $20'
ELSE 'Sales amount is $20 or more' END AS sales_threshold FROM transaction;
RESULT:
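A CASE expression with a single WHEN sends everything that fails the condition to the ELSE branch, including a sale of exactly $20. The sketch below, run from Python against an in-memory SQLite table with made-up rows, adds a second WHEN branch to handle that boundary explicitly. The shortened label texts are my own, not the article's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (transaction_id INTEGER, sales REAL);"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?);""",
    [(1, 15.0), (2, 20.0), (3, 42.5)],
)

# WHEN branches are checked in order; the first one that matches wins.
labelled = conn.execute(
    """SELECT transaction_id,
              sales,
              CASE
                  WHEN sales < 20 THEN 'less than $20'
                  WHEN sales = 20 THEN 'exactly $20'
                  ELSE 'greater than $20'
              END AS sales_threshold
       FROM "transaction"
       ORDER BY transaction_id;"""
).fetchall()

for row in labelled:
    print(row)
```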
8. GROUP BY and aggregate functions
GROUP BY groups rows that share the same values in one or more columns. It is typically used together with aggregate functions to describe the characteristics of each group.
Aggregate functions, in turn, carry out a calculation on a set of values and return a single value. A few examples of aggregate functions are:
COUNT: returns the number of rows.
SUM: returns the total sum of the values.
MAX: returns the highest value.
MIN: returns the lowest value.
AVG: returns the average value.
Now let's look at some illustrations. Let's say we want to know how many rows there are in the transaction table.
QUERY:
SELECT COUNT(*) FROM transaction;
What about the transaction dataset's biggest sales amount?
QUERY:
SELECT MAX(sales) as max_sales FROM transaction;
And finally, what if we want to know the total daily sales, rounded to the closest dollar?
QUERY:
SELECT purchase_date, ROUND(SUM(sales)) FROM transaction GROUP BY purchase_date;
RESULT:
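The daily-total question can be sketched as follows, using Python and an in-memory SQLite table with invented rows. The aliases num_transactions and total_sales are hypothetical names chosen for this example. GROUP BY produces one output row per purchase_date, and each aggregate is computed within that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (purchase_date TEXT, sales REAL);"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?);""",
    [("2021-10-15", 15.0), ("2021-10-15", 42.6), ("2021-10-16", 20.0)],
)

# One row per day: how many transactions, and their rounded total.
daily = conn.execute(
    """SELECT purchase_date,
              COUNT(*)          AS num_transactions,
              ROUND(SUM(sales)) AS total_sales
       FROM "transaction"
       GROUP BY purchase_date
       ORDER BY purchase_date;"""
).fetchall()

for row in daily:
    print(row)
```

Every column in the SELECT list is either the grouping column or wrapped in an aggregate function, which is the pattern most databases require.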
9. JOIN
Before talking about joins, I believe it is crucial to first clarify the distinction between a primary key and a foreign key. In a relational database, a primary key uniquely identifies each row within a table. For instance, the customer_id column serves as the primary key for the customers table, while the transaction_id column serves as the primary key for the transaction table.
In contrast, a foreign key creates a connection between the information contained in two tables. Specifically, a foreign key in one table refers to the primary key in another table. For instance, the customer_id column is a primary key in the customers table but a foreign key in the transaction table.
Because the two tables are related through this primary key and foreign key, we can use a LEFT JOIN in this situation. I won't go into detail here about the other join types in SQL, such as INNER JOIN, RIGHT JOIN, and FULL JOIN. If you want to know more, read this blog article for additional information.
For now, let's say we want to LEFT JOIN the customers table onto the transaction table using the customer_id column.
QUERY:
SELECT * FROM transaction LEFT JOIN customers ON transaction.customer_id = customers.customer_id;
RESULT:
After performing a LEFT JOIN, it is also a good idea to verify that the result contains the same number of rows as the left table (in this case, the transaction table) had before the join.
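The row-count check described above can be sketched like this, using Python and an in-memory SQLite database. Both tables hold invented rows, and the aliases t and c are just shorthand for this example. One transaction deliberately refers to a customer with no profile, to show that a LEFT JOIN keeps it and fills the missing columns with NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id        INTEGER PRIMARY KEY,
        customer_lifestage TEXT
    );
    CREATE TABLE "transaction" (
        transaction_id INTEGER PRIMARY KEY,
        customer_id    INTEGER,
        sales          REAL
    );
    INSERT INTO customers VALUES (1, 'Young families'), (2, 'Retirees');
    -- Customer 3 has no row in customers.
    INSERT INTO "transaction" VALUES (1, 1, 15.0), (2, 1, 42.5), (3, 3, 20.0);
    """
)

rows_before = conn.execute(
    """SELECT COUNT(*) FROM "transaction";"""
).fetchone()[0]

joined = conn.execute(
    """SELECT t.transaction_id, t.customer_id, c.customer_lifestage
       FROM "transaction" AS t
       LEFT JOIN customers AS c ON t.customer_id = c.customer_id
       ORDER BY t.transaction_id;"""
).fetchall()

# The join should neither drop nor duplicate rows of the left table.
same_row_count = len(joined) == rows_before

print(joined)
print(same_row_count)
```

If the key column on the right-hand table contained duplicates, each matching transaction would appear more than once and this check would fail, which is exactly the situation it is meant to catch.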
10. UNION
Last but not least, UNION combines the results of several SELECT statements. Keep in mind that the queries you combine must return the same number of columns and, more importantly, compatible data types in the corresponding columns.
I must admit that the tables I created for this exercise may not be the best examples of the power of UNION, but I will still provide one here for completeness.
Suppose we wish to combine the customer_id and quantity_purchased columns of the transaction table.
QUERY:
SELECT customer_id AS sample_union FROM transaction
UNION
SELECT quantity_purchased FROM transaction;
RESULT:
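One detail worth seeing in action is that UNION removes duplicate rows from the combined result, whereas UNION ALL keeps them. The sketch below compares the two from Python on an in-memory SQLite table, with rows made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; TRANSACTION is a SQLite keyword, hence the quotes.
conn.execute(
    """CREATE TABLE "transaction" (
           customer_id INTEGER, quantity_purchased INTEGER
       );"""
)
conn.executemany(
    """INSERT INTO "transaction" VALUES (?, ?);""",
    [(1, 2), (2, 2), (1, 5)],
)

# UNION: the combined values are de-duplicated.
union_rows = conn.execute(
    """SELECT customer_id AS sample_union FROM "transaction"
       UNION
       SELECT quantity_purchased FROM "transaction"
       ORDER BY 1;"""
).fetchall()

# UNION ALL: every row from both queries is kept.
union_all_rows = conn.execute(
    """SELECT customer_id AS sample_union FROM "transaction"
       UNION ALL
       SELECT quantity_purchased FROM "transaction"
       ORDER BY 1;"""
).fetchall()

print(union_rows)
print(len(union_all_rows))
```

The column name of the combined result comes from the first SELECT, which is why only that query carries the sample_union alias.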
Conclusion
This concludes the ten most important SQL commands you need to get started with SQL. I hope this blog post has helped you better understand SQL and its role in data science and analytics. More significantly, I hope it has helped you realize how simple SQL actually is to learn once you grasp the foundations.
Since SQL is here to stay, this is unquestionably one of the abilities that any prospective data scientist or data analyst should think about adding to their toolkit.
Author Bio
Meravath Raju is a digital marketer and a passionate writer who works with MindMajix, a top global online training provider. He also has in-depth knowledge of IT and in-demand technologies such as Business Intelligence, Salesforce, Cybersecurity, Software Testing, QA, Data Analytics, Project Management, and ERP tools.