We are wrapping up the techniques for preparing data for analysis.

This list is far from exhaustive. As you gain skills, knowledge, and experience, your repertoire will grow with practice.

Merging DataFrames

Let’s open up the file in the 01-Ins_Merging folder to get started.

Merging datasets is a staple activity, especially when we want to get insights across multiple datasets.

Data modeling will be crucial because we want to ensure data integrity when we merge data, but that will be part of your future coursework.
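
To make merging concrete, here is a minimal sketch using pd.merge. The two DataFrames and their column names (customer_id, name, amount) are made up for illustration and are not from the class files. It shows how the how parameter decides which rows survive the merge.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [20.0, 35.5, 12.0, 99.0],
})

# Inner join: keep only customer_ids that appear in both DataFrames.
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Outer join: keep every row from both sides.
# indicator=True adds a _merge column that records where each row came from.
outer = pd.merge(customers, orders, on="customer_id", how="outer", indicator=True)

print(inner)
print(outer)
```

Checking the _merge column after an outer join is one simple way to spot rows that have no match on the other side, which is where data integrity problems usually show up.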

Students Do: Census Merging

Let’s open up the file in the 02-Stu_Census_Merging folder to get started.

Binning Data

Let’s open up the file in the 03-Ins_Binning folder to get started.

Categorizing data with specific conditions is necessary for all types of data analysis. We used to do it manually, but as Pandas has evolved, it has lowered the barrier to entry for doing it programmatically.
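
As a rough sketch of what binning looks like in Pandas, the example below uses pd.cut to sort numeric scores into labeled ranges. The scores, bin edges, and grade labels are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical scores; the bin edges and labels are illustrative only.
scores = pd.DataFrame({"score": [55, 68, 72, 85, 91, 100]})

# Each bin includes its right edge by default: (0, 59], (59, 69], and so on.
bins = [0, 59, 69, 79, 89, 100]
labels = ["F", "D", "C", "B", "A"]

# pd.cut needs one fewer label than bin edges.
scores["grade"] = pd.cut(scores["score"], bins=bins, labels=labels)

print(scores)
```

Once the values are binned, the new column can be used like any other category, for example in a groupby.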

Students Do: Binning Movies

Let’s open up the README.md file in the 04-Stu_MovieRatings_Binning folder to get started.

Mapping

Let’s open up the file in the 05-Ins_Mapping folder to get started.

According to the official documentation, map is a function that applies transformation logic to each value in a column (a Series): https://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.Series.map.html
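
Here is a small sketch of map in action, assuming a made-up DataFrame with state and price columns. It shows two common uses: looking up each value in a dictionary, and passing each value through a formatting function.

```python
import pandas as pd

# Hypothetical data; the column names and values are illustrative only.
df = pd.DataFrame({
    "state": ["CA", "TX", "NY"],
    "price": [1234.5, 99.999, 10.0],
})

# map with a dictionary: each value is replaced by its lookup result.
state_names = {"CA": "California", "TX": "Texas", "NY": "New York"}
df["state_name"] = df["state"].map(state_names)

# map with a function: each value is passed to str.format and returned as text.
df["price_label"] = df["price"].map("${:,.2f}".format)

print(df)
```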

Notes

Notice that when we apply formatting to NaN values, they become strings (the object dtype) within Pandas.
- You will want to remove NaN values before applying formatting.
${:.2f} means we want to prefix the value with a dollar sign and round it to 2 decimal places.
This is not the only way to round data. I typically use the numpy library, which we will cover in future coursework.
- If your work requires high-precision values, such as in architecture and buildings, you will use numpy to ensure accuracy.
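
The notes above can be sketched as follows. The prices Series is made up for illustration, and using Series.round as the numeric alternative is my own choice here, not necessarily what the class file does.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one missing value.
prices = pd.Series([1.256, np.nan, 2.5])

# Formatting turns every value into text, including NaN (which becomes "$nan").
formatted = prices.map("${:.2f}".format)

# Dropping NaN first avoids the "$nan" text.
cleaned = prices.dropna().map("${:.2f}".format)

# Rounding keeps the column numeric, so you can still do math on it.
rounded = prices.round(2)

print(formatted.tolist())
print(cleaned.tolist())
print(rounded.tolist())
```

The design choice is mostly about timing: format for display as the last step, and round numerically while you still need to calculate with the values.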

What you have learned is not exhaustive

There is more depth to column-level transformation than this class can cover. However, it is useful to know about so that you can research it later.

Using lambda x with map to run a function over each value when your transformation logic is too complex for a simple lookup: https://pandas.pydata.org/pandas-docs/version/0.19/generated/pandas.Series.map.html
- Example: file_df['INCOME_BOOL'] = file_df['INCOME'].map(lambda x: False if x == 0 else True)
  - We are creating a new column based on each value within the ‘INCOME’ column.
  - x is the variable that contains the value of each row as it iterates.
Using apply to access other columns’ values for transformation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
- Example: file_df['NET_INCOME'] = file_df.apply(lambda x: x['INCOME'] - x['COSTS'], axis=1)
  - With apply and axis=1, x is a whole row, so you can look up any column’s value by name as though it were a dictionary.
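
To try the two examples above end to end, here is a runnable sketch. The contents of file_df are invented stand-ins for illustration; only the INCOME and COSTS column names come from the examples. The last line is an extra I added to show that simple arithmetic between columns does not need apply at all.

```python
import pandas as pd

# Hypothetical stand-in for file_df; the values are made up.
file_df = pd.DataFrame({
    "INCOME": [0, 1500, 320],
    "COSTS": [0, 400, 500],
})

# map + lambda: x is one value from the INCOME column at a time.
file_df["INCOME_BOOL"] = file_df["INCOME"].map(lambda x: False if x == 0 else True)

# apply + axis=1: x is one row at a time, so other columns are reachable by name.
file_df["NET_INCOME"] = file_df.apply(lambda x: x["INCOME"] - x["COSTS"], axis=1)

# For plain arithmetic, subtracting the columns directly gives the same result.
file_df["NET_INCOME_VEC"] = file_df["INCOME"] - file_df["COSTS"]

print(file_df)
```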

Crowdfunding Cleaning

Let’s open up the file in the 06-Evr_Crowdfunding_Cleaning folder to get started.

Introduction to Bug Fixing

Look at the file(s) in the 07-Ins_Intro_to_Bugfixing folder.

Bug fixing is something that is caught, not only taught. It is like riding a bicycle. You can’t learn to ride a bicycle just by reading about it; you actually have to do it to get better at it.

As you do more bug fixing, your troubleshooting skills will grow as well.

Being able to debug code is key to excellence, especially when you’re working in a team. You will need to check the quality of your teammates’ work as well as your own in order to produce good products.

Bug Fixing Bonanza

Let’s open up the file in the 08-Evr_Bugfixing_Bonanza folder to get started.

4.3 Merging and Data Cleaning

About The Author

Jonathan Moo
