Python Beginner/Noob Tutorial - How to merge/join/combine datasets using the Pandas module

Published: 14 December 2024
on channel: Too Long; Didn't Watch Tutorials

This is a very basic guide to merging datasets in Python using the Pandas module. I start by opening a new Python file and using Pip to install Pandas (and Openpyxl if you are working with MS Excel data), then merge two datasets. I also show a left and a right join, and finally how to export your data.
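If you'd like to see the difference between a left and a right join without needing any Excel files, here is a minimal sketch. The tiny tables, the `name` and `department` columns, and the sample values are made up for illustration; only the `employee_id` key matches the code further down.

```python
import pandas as pd

# Hypothetical sample data standing in for the two Excel files
left = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cleo"],
})
right = pd.DataFrame({
    "employee_id": [2, 3, 4],
    "department": ["Sales", "IT", "HR"],
})

# Left join: keeps every row from `left`.
# employee_id 1 has no match in `right`, so its department is NaN.
left_join = pd.merge(left, right, on="employee_id", how="left")

# Right join: keeps every row from `right`.
# employee_id 4 has no match in `left`, so its name is NaN.
right_join = pd.merge(left, right, on="employee_id", how="right")

print(left_join)
print(right_join)
```

The only thing that changes between the two calls is the `how` argument, which decides which table's rows are always kept.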

As I mention in the video, joins can get much more complicated with other join types such as inner and outer joins. Since this tutorial is intended for beginners, I didn't want to get into that too much. However, I've included a link to the Pandas documentation below.
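For the curious, here is a rough sketch of what inner and outer joins do, again with made-up sample tables (the `employees` and `departments` names and their values are assumptions, not data from the video). An inner join keeps only the keys found in both tables, while an outer join keeps the keys found in either one.

```python
import pandas as pd

# Hypothetical sample data; only ids 2 and 3 appear in both tables
employees = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cleo"],
})
departments = pd.DataFrame({
    "employee_id": [2, 3, 4],
    "department": ["Sales", "IT", "HR"],
})

# Inner join: only employee_ids present in BOTH tables survive
inner_join = pd.merge(employees, departments, on="employee_id", how="inner")

# Outer join: employee_ids present in EITHER table survive;
# missing values on either side are filled with NaN
outer_join = pd.merge(employees, departments, on="employee_id", how="outer")

print(inner_join)
print(outer_join)
```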

Pandas Documentation: https://pandas.pydata.org/docs/user_g...

Finally, if you'd like to copy and paste the Python code, I've provided it below:

import pandas as pd

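# Left dataset. read_excel already returns a DataFrame,
# so wrapping it in pd.DataFrame() is not needed.
# Reading .xlsx files requires Openpyxl to be installed.
df1 = pd.read_excel('dataset1.xlsx')

# Right dataset
df2 = pd.read_excel('dataset2.xlsx')

# Right join on the shared 'employee_id' column: keeps every row from df2.
# Change how='right' to how='left' to keep every row from df1 instead.
df = pd.merge(df1, df2, on='employee_id', how='right')

# Export the merged data to a new Excel file.
# Add index=False if you don't want the row numbers written as a column.
df.to_excel('mergedresults.xlsx')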