5 Python Libraries For Data Science Beginners | Pandas | NumPy | Matplotlib | Seaborn | SciPy

Published: 24 October 2024
on channel: Data Science Lovers

In this video, we cover the most useful and common Python libraries for data science, with their uses and some basic commands.

1. Pandas
Pandas is one of the most popular libraries for data analysis.
Pandas is an open-source library that allows us to perform data manipulation in Python.
Pandas provides an easy way to create, manipulate and wrangle the data.

Some Basic Pandas Commands:
import pandas as pd - To import the Pandas library.
pd.read_csv("filename") - To read a CSV file into a dataframe.
df.head() / df.tail() - To check the first/last 5 rows of the dataframe (table).
df.describe() - To get summary statistics of all numeric columns.
df.isnull() - To detect missing values in the dataframe.
df.dropna() - To drop rows that contain any missing values (pass how="all" to drop only rows where every value is missing).
df.to_excel("filename") - To save the dataframe in Excel format.
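
As a rough illustration of the Pandas commands above, here is a minimal sketch. The CSV text and the column names Year and Sales are made up for this example, and the data is read from a string so that no real file is needed.

```python
import io
import pandas as pd

# Hypothetical CSV contents (assumption: not from the video).
csv_text = """Year,Sales
2020,100
2021,
2022,150
2023,180
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))      # first 2 rows
print(df.describe())   # summary statistics of numeric columns

# Count missing values per column, then drop rows with any missing value.
missing_per_column = df.isnull().sum()
clean = df.dropna()

print(missing_per_column)
print(clean)
```

To save the result, clean.to_excel("filename.xlsx") should work as long as an Excel writer such as openpyxl is installed.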

2. NumPy
NumPy – Numerical Python
NumPy is an open-source library that is used for performing mathematical operations on arrays.
NumPy is a general-purpose array processing package.
NumPy provides a high-performance multidimensional array object and tools for working with these arrays.
NumPy's core data structures are multidimensional arrays and matrices.

Some Basic NumPy Commands:
import numpy as np - To import the NumPy library.
np.array([1, 2, 3, 4, 5]) - To create a one-dimensional array.
A.reshape(3, 4) - To reshape an array to 3x4 (the array must have exactly 12 elements).
np.random.random((2, 3)) - To create a 2x3 array of random values in [0, 1); with no argument it returns a single float.
np.ones((2, 4)) - To create an array of size 2x4 filled with 1s.
A[1:2, 1:2, 1:2] - Array slicing along each axis of a three-dimensional array.
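
The NumPy commands above can be tried out with a short sketch like the one below. The array sizes and variable names are chosen only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])        # one-dimensional array
b = np.arange(12).reshape(3, 4)      # 12 elements reshaped to 3 rows x 4 columns
r = np.random.random((2, 3))         # 2x3 array of random floats in [0, 1)
ones = np.ones((2, 4))               # 2x4 array filled with 1.0

# Slicing a three-dimensional array: each 1:2 keeps index 1 on that axis,
# so the result is still three-dimensional with shape (1, 1, 1).
c = np.arange(24).reshape(2, 3, 4)
sub = c[1:2, 1:2, 1:2]

print(a.shape, b.shape, r.shape, ones.shape, sub.shape)
```

Note that slicing with 1:2 keeps the dimension, while indexing with a plain 1 would drop it.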

3. Matplotlib
Matplotlib is a powerful library for creating graphs and charts.
Matplotlib in Python is used for visualization purposes.
Matplotlib allows us to draw many different types of plots, such as:
Line Plot
Bar Plot
Scatter Plot
Pie Chart
Heat Map

Some Basic Matplotlib Commands:
import matplotlib.pyplot as plt - To import the Matplotlib plotting interface.
plt.title('Title_Name', fontsize=24) - To set the title of the graph.
plt.plot(df['Year'], df['Sales']) - To draw a line plot of the Sales column against the Year column.
plt.scatter(x_values, y_values, color='r') - To draw a scatter plot in red.
plt.pcolor(df, cmap='RdBu') - To draw a heatmap.
Select and convert the cell to Markdown -- Edit tab -- Insert Image -- Run - To insert an image in Jupyter Notebook.
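
Here is a minimal sketch that puts a few of the Matplotlib commands above together. The Year and Sales values and the output filename are invented for this example, and the Agg backend is selected only so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, not from the video.
df = pd.DataFrame({"Year": [2020, 2021, 2022, 2023],
                   "Sales": [100, 120, 150, 180]})

plt.figure()
plt.plot(df["Year"], df["Sales"])                  # line plot
plt.scatter(df["Year"], df["Sales"], color="r")    # red points on top of the line
plt.title("Sales by Year", fontsize=24)
plt.xlabel("Year")
plt.ylabel("Sales")

# Hypothetical output filename; in a notebook, plt.show() would display it instead.
plt.savefig("sales_plot.png")
```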

4. Seaborn
Seaborn is an open-source library based on matplotlib.
Seaborn provides a variety of visualization patterns with less code than plain Matplotlib.
Seaborn is preferred for drawing attractive and informative statistical graphics.

Some Basic Seaborn Commands:
import seaborn as sns - To import the Seaborn library.
sns.regplot(x=df.Col_x, y=df.Col_y) - To draw a linear regression plot.
sns.distplot(df.Col_y, color='r') - To draw a distribution plot (distplot is deprecated since seaborn 0.11; sns.histplot or sns.displot are the replacements).
sns.relplot(x='Col_1', y='Col_2', data=df_name) - To check the relationship between two columns.
sns.heatmap(df.corr(), vmin=-1, vmax=1, center=0) - To draw a Pearson correlation heatmap.
sns.catplot(x='Col_1', y='Col_2', data=df_name) - To draw a plot with categorical data.

5. SciPy
SciPy – Scientific Python
SciPy is an open-source library built on NumPy.
SciPy is used to perform scientific and mathematical operations.
SciPy is used in the fields of mathematics, science, and engineering.
SciPy has many sub-packages, such as cluster, integrate, and optimize.
import scipy - To import the SciPy library (sub-packages are usually imported explicitly, e.g. from scipy import optimize).
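
As a small illustration of the integrate and optimize sub-packages mentioned above, here is a minimal sketch. The functions being integrated and minimized are chosen only for this example.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Find the minimum of (x - 3)^2; the exact minimizer is x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(area, abs_error)
print(result.x)
```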

