A city dweller who loves to travel, interested in outdoor activity most on road trips also, A data science professional with a liking for deep learning.
Well, data science is making its impact on every industry. You must have heard its applications and achievements. Many organizations are now actively using data science to solve business problems like sales, revenue improvements, identify suspicious transactions, task priority, etc. Have you ever thought about how these big organizations handle such big scale data?
The answer is database!
To handle such data, organisations save data in different databases for procurement, sales, revenue, customer data, etc. to manage these databases we use SQL.
SQL and Python/java are not in competition. Python and Java have achieved excel in different domains, but both are general-purpose languages. SQL is working with relational databases.
What are Relational databases?
A relational database is a Digital Database that is dependent on a relational model of data.
Now What is Relational model?
This model arranges data into tables (or relations) of rows & columns. To identify each data row, relational databases use a unique key. Normally, each table keeps one unit of information (a table) that contains customer data, address, previous customer purchases, etc., respectively.
A Key is a single or combination of multiple fields in a table. Keys are an attribute of database management systems. This plays an important role to connect and create relationships between two or more tables Well, this is the definition, in simpler terms to communicate with different columns of different tables in databases we use keys and its property.
Primarily, we have Seven types of keys as follow.
Data science is a multidisciplinary field that uses scientific techniques, processes, algorithms to draw out information from data. Sources of data grow with innovations like 5G, which will create a new standard in wireless technology in data collection and its application in different devices. Recent development like this increase potential of data science. Hence, building in-demand skills in Data Science is important. To equip yourself with these career-building blocks can be followed at Science Bootcamp Review.
Data can be noisy, structured, or unstructured. Here, we are concentrating on Structured data which is normally stored in databases. So, to work around the data we use SQL queries to get the desired data.
Let’s have a look why we need SQL?
Now we will take a deep dive and look into the factors that can help you become a SQL expert. Besides the primary qualification, you must have proper education. On that part you need Industry oriented and globally acknowledge courses like Best Data Science Courses, which are helpful in this case.
Get to know more about Data Science Career path.
In this section we will focus on basic SQL syntax, which will be useful in almost all your future SQL queries.
A query is a request for details from a database using a merger of statements. In general, we will focus on the syntax for constructing a SQL query.
For syntax reference we will follow following figure:
Example syntax for SELECT statement:
1. SELECT RollNo, Name FROM Table1 2. SELECT * FROM Table1
In general, it is not a good practice to use an asterisk (*) in the SELECT statement if you don’t really need all columns. It will impatiently query everything, which increases congestion between the database server and the application, which can slow down the reclaiming of results. If you need certain columns, only query those columns.
The DISTINCT keyword applies on column, syntax as follow:
3. SELECT DISTINCT column FROM table 4. SELECT DISTINCT (Address) FROM Table2
Code line 6 will give output like this-
5. SELECT COUNT (Name) FROM Table1
Code line 7 can simply return the number of rows in the table, it is same regardless of the column.
6. SELECT COUNT (Name) FROM Table1 7. SELECT COUNT (ID) FROM Table1 8. SELECT COUNT * FROM Table1
Code lines 8,9,10 returns the same thing, since original table had 6 rows.
To apply the condition, we will first study comparison Operators & Logical operators.
Logical Operators allow us to combine multiple comparison operations
9. SELECT COUNT (Address) FROM Table2 10. WHERE Address = ‘Delhi’
Code lines 11, 12 will return following output:
11. SELECT COUNT (Address) FROM Table2 12. WHERE Address = ‘Delhi’ AND EnrollNo = ‘AX105’
Code lines 13,14 will return following output:
13. SELECT column_1, column_2 14. FROM table 15. ORDER BY column_1 ASC/DESC 16. SELECT Address 17. FROM Table2 18. ORDER BY Address DESC
Code lines 18, 19, 20 will return following output:
Note: We use ORDER BY towards the end of the query, as we want to do any selection or filtering first, before doing the sorting. ASC is used for ascending order and DESC is used for descending order, by default ORDER BY uses ASC.
19. SELECT column_1, column_2 20. FROM table 21. ORDER BY column_1 DESC 22. LIMIT (number) 23. SELECT Address 24. FROM Table2 25. ORDER BY Address DESC 26. LIMIT 4
Code lines 25, 26, 27, 28 will return following output:
27. value BETWEEN low AND high
The BETWEEN operator is same as,
28. value >= low AND value <= high 29. value BETWEEN low AND high
We can also use BETWEEN with NOT logical operator:
30. value NOT BETWEEN low AND high
The NOT BETWEEN operator is the same as,
31. value < low OR value > high 32. value NOT BETWEEN low AND high
The BETWEEN operator can also be used with time and dates.
Note: The requirement of date format to be the same as ISO 8601 standard format which is YYYY-MM-DD.
33. date BETWEEN ‘2007-01-01’ AND ‘2007-02-01’
We can use the IN operator to define condition to see if a value in present in a column or many options. Basic syntax for IN,
34. value IN (option1, option2, …, option_n) 35. SELECT Address 36. FROM Table2 37. WHERE Address IN (‘Delhi’, ‘Noida’)
The Like operator helps us to perform matching operation with string data with the use of wildcard characters:
For example, all email ending with ‘@gmail.com’ or names that begins with ‘S’. Basic Syntax for LIKE and ILIKE,
38. WHERE name LIKE ‘A%’ 39. WHERE name LIKE ‘%a’
Note: LIKE is case-sensitive and ILIKE is case-insensitive
Most of the time after objective definition data scientists start to e selecting columns that are important for solving a business problem from an enterprise database. Here to connect SQL databases to applications defined in python we use various packages. These packages make simple for data selection and operation in databases using python and SQL combined.
Let’s check similar connection with MySQL database and python application.
40. import mysql.connector 41. # Create the connection object 42. myconn = mysql.connector.connect(host = "localhost", user = "root", passwd = "google", database = "mydb") 43. # printing the connection object 44. print(myconn) 45. # Creating the cursor object 46. Cur = myconn.cursor() 47. print(Cur)
<mysql.connector.connection.MySQLConnection object at Ox7ytSS42edd870> MySQLCursor: (Nothing executed yet)
Wait it's not done yet, if you want to learn more do follow KnowledgeHut Data Science Bootcamp Review. For in-depth and complete learning.
In This article, we talk about general SQL introduction, fundamental commands used to create queries. This article is ideal for a beginner. In future articles, we will discuss advanced commands and cloud-based database connection, and query implementation.
Normally we only use Unique key, Primary key and foreign key. The constraints are applied while the functionality of keys is defined in the table. Like Primary keys must contain UNIQUE values and cannot contain NULL values.
A Database Management Systems (Or DBMS) are software systems used to store, manipulate and run queries on data. A DBMS is bridge between end-user and database, allowing users to perform SETL operations.
A data lake is a repository of data stored on its natural/raw format. Data lakes are used to simplify data management, speed up analysis and improved security and governance.
Data lake and warehouse are used for storing big data. A data lake is storage for raw data, whose purpose is yet to conclude. A data warehouse is structured, filtered data designed considering a purpose.
04 Oct 2022
04 Oct 2022
03 Oct 2022
03 Oct 2022
03 Oct 2022
03 Oct 2022