How you write your queries will be tested in a technical interview, and so be prepared. I do test sequel queries during technical interviews.
In short, you would have to be good enough that it becomes muscle memory within you.
It is not covered in our course, but query optimization is also important. There are several ways to get to the same result, but there is usually the most optimal way to write queries. Your technical interviewer might ask you to optimize your query during an interview.
Even at work, we do query optimization and log slow queries so that we can run our operations better.
Import Data
Let’s open up the file(s) in the 01-Evr_Import_Data
folder to get started.
There is a blend of SQL within the school’s resources that is MySQL based. It advises us to use the CSV import method rather than SQL based if we meet problems.
This is the reason why even though we generically refer SQL for relational databases, yet in different engines such as Microsoft SQL vs Postgres, they can be totally different ways of the same thing.
Example:
- To limit the top 5 entries during a SELECT statement, Microsoft SQL uses TOP.
- MySQL and Postgres will use LIMIT.
Thus, we always recommend not to blindly copy code but to understand what it does, because semantically they refer to the same thing. However, due to different engines that you might work with, you may have to write them differently. Being flexible is key.
In general, import SQL files is generally preferred if your company uses a homogenous system for database (if every system uses Postgres).
- You will incur less errors in importing because each statement is a SQL query. CSVs can have mismatched or type errors when you ingest.
However, there are cases where the landscape is very heterogenous, we may prefer CSVs because it is more universally accepted in interpreting the file.
Aggregate Functions, Aliases, and Grouping
Let’s open up the file(s) in the 02-Ins_Aggregates
folder to get started.
You can apply statistical and mathematical formulas when you query.
In general, they are used for quick checks to highlight issues. However, it is not good practice to over complicate the queries; we typically use Python or Jupyter Notebooks for complex work.
- The more complex it is, the more we prefer to version control the work to ensure quality.
Students Do: Gregarious Aggregates
Let’s open up the file in the 03-Stu_GregariousAggregates
folder to get started.
Order By Aggregates
Let’s open up the file in the 04-Ins_Order_By
folder to get started.
Students Do: Movies Ordered By
Let’s open up the file(s) in the 05-Stu_Order_By
folder to get started.
Introduction to Subqueries
Let’s open up the file(s) in the 06-Ins_Subqueries
folder to get started.
Complex subqueries is only good for technical interviews, where interviewers want to understand your core understanding of queries.
In reality, I discourage the team for over-doing sub queries for the sake of maintainability.
Students Do: Subqueries
Let’s open up the file(s) in the 07-Stu_Subqueries
folder to get started.
Create Views
Let’s open up the file(s) in the 08-Ins_Create_Views
folder to get started.
Views is a very important feature within relational database. We use views to protect the actual tables from modification, and we typically only allow SELECT statements on views.
Student Do: A View with a Roomful of Queries
Let’s open up the file(s) in the 09-Stu_View_Room_Queries
folder to get started.
Revisit Subqueries
Let’s open up the file(s) in the 10-Ins_Revist_Subquery
folder to get started.
Entity-Relationship Diagram (ERD)
When we build a system to collect data from scratch, we need to model it so that we collect data.
Ref: https://www.postgresqltutorial.com/postgresql-getting-started/postgresql-sample-database/
We will learn about data modeling in the next lesson. For now, do get used to the idea that it is essential to design your data storage well so that it will not impact your analysis.
Students Do: Mine the Subquery
Let’s open up the file(s) in the 11-Stu_Mine_the_Subquery
folder to get started.