Python Pandas Cheat Sheet

Nov 2, 2020

Everyday use of pandas functions as a data scientist

Pandas is a Python library used for data manipulation (creating, deleting, and updating data).

It is one of the most commonly used libraries for data analysis in Python. Pandas offers data structures and operations for manipulating numerical and time-series data.

Pandas First Steps

Install and import

Pandas is easy to install. Open your terminal (on Mac) or command prompt (on Windows) and install it with either of the following commands:

conda install pandas

pip install pandas

Alternatively, if you're working in a Jupyter notebook, you can run this cell:

!pip install pandas

The ! at the beginning runs the command as if it were typed in a terminal.

To import pandas, we usually give it a shorter name, since it's used so much:

import pandas as pd

Here pd is the conventional alias for pandas.

For this exercise, we use the Loan Prediction dataset, which you can download from: https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/#ProblemStatement

1. Import necessary Libraries

2. Load dataset (Test & Train)
import pandas as pd

# Read the train and test datasets
train = pd.read_csv("train_ctrUa4K.csv")
test = pd.read_csv("test_lAUu6dG.csv")
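If you want to try read_csv before downloading the files, here is a minimal sketch that parses a few made-up rows from an in-memory string instead of a file. The CSV text and its column names are hypothetical stand-ins, not the real contents of train_ctrUa4K.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the real training file
csv_text = """Loan_ID,Gender,ApplicantIncome,Loan_Status
LP001,Male,5849,Y
LP002,,4583,N
LP003,Female,3000,Y
"""

# read_csv accepts any file-like object, so StringIO works like a file on disk
train = pd.read_csv(io.StringIO(csv_text))
print(train)  # the empty Gender field is read as NaN (missing)
```

Wrapping the text in io.StringIO keeps the example self-contained; with the real dataset you would pass the file path, as in the code above.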
3. head()

Viewing your data:

a) The first thing to do when opening a new dataset is to print out a few rows to keep as a visual reference. We do this with .head().

b) .head() outputs the first five rows of your DataFrame by default, including the column headers and the contents of each row.

c) You can also pass a number: .head(3) outputs the first 3 rows.
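A minimal sketch of head() with and without an argument, using a small made-up DataFrame (the column names only loosely mimic the loan data):

```python
import pandas as pd

# Made-up rows with illustrative column names (not the real loan data)
df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006", "LP007"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333],
})

first_five = df.head()     # default: the first 5 rows
first_three = df.head(3)   # explicit n: the first 3 rows
print(first_three)
```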

4. tail()

.tail() outputs the last five rows of your DataFrame by default, including the column headers and the contents of each row.

You can also pass a number: .tail(2) outputs the last 2 rows.
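The same made-up DataFrame can illustrate tail(); note that the rows keep their original index labels, which makes it easy to see they come from the end of the data:

```python
import pandas as pd

# Made-up rows with illustrative column names (not the real loan data)
df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006", "LP007"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333],
})

last_five = df.tail()    # default: the last 5 rows (index 2 to 6)
last_two = df.tail(2)    # the last 2 rows (index 5 and 6)
print(last_two)
```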

5. shape

shape is an attribute (not a method, so no parentheses) that gives the size of the DataFrame as a tuple in the format (rows, columns).
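Because shape is a plain tuple, it can be unpacked directly into row and column counts. A short sketch on a made-up DataFrame:

```python
import pandas as pd

# Made-up DataFrame with 4 rows and 3 columns
df = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Gender": ["Male", "Female", "Male", "Male"],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
})

n_rows, n_cols = df.shape   # attribute access, no parentheses
print(f"{n_rows} rows, {n_cols} columns")
```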

6. info()

info() prints a concise summary of the DataFrame: each column's name and data type, the number of non-null values in each column, and the memory the data takes.

Comparing each column's non-null count with the total number of rows shows which features have missing values.
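Note that info() prints its summary and returns None. As a sketch, the buf parameter can capture that text instead, for example to inspect it programmatically; the DataFrame below is made up and contains deliberate gaps:

```python
import io

import numpy as np
import pandas as pd

# Made-up data with one missing value in each column
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male"],
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
})

buffer = io.StringIO()
df.info(buf=buffer)          # write the summary into the buffer instead of stdout
summary = buffer.getvalue()
print(summary)               # each column shows "3 non-null" out of 4 entries
```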

7. dtypes

The DataFrame.dtypes attribute (no parentheses) returns the data type (dtype) of each column in the DataFrame, as a Series indexed by column name.
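A minimal sketch of dtypes on a made-up DataFrame with one text, one integer, and one decimal column. The exact name shown for text columns depends on the pandas version, so the check below uses pandas' type-inspection helper rather than a hard-coded name:

```python
import pandas as pd

# Made-up columns of different types
df = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate"],
    "ApplicantIncome": [5849, 4583],
    "LoanAmount": [128.0, 66.0],
})

print(df.dtypes)
# Text columns typically show up as object (newer pandas versions may use a
# dedicated string dtype), whole numbers as an integer dtype, decimals as float64.
income_is_int = pd.api.types.is_integer_dtype(df["ApplicantIncome"])
```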

8. count()

DataFrame.count() counts the non-NA/null values along the given axis; by default it returns one count per column. It works with non-floating-point data as well, treating both None and NaN as missing.
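A short sketch of count() on a made-up DataFrame with gaps in both a text and a numeric column, counting per column (the default) and per row with axis=1:

```python
import numpy as np
import pandas as pd

# Made-up data: None in a text column, NaN in a numeric column
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [128.0, np.nan, np.nan, 120.0],
})

per_column = df.count()       # non-NA values in each column (axis=0)
per_row = df.count(axis=1)    # non-NA values in each row
print(per_column)
print(per_row)
```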

9. value_counts()

Series.value_counts() returns a Series containing the counts of unique values. The result is in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.

Out of 614 data points, Y occurs 422 times and N occurs 192 times.
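A minimal sketch on a small, made-up Y/N column (not the real 614 rows). It also shows two real value_counts() options: dropna=False to include missing values in the counts, and normalize=True to get proportions instead of counts:

```python
import pandas as pd

# Hypothetical target column with Y/N labels and one missing value
status = pd.Series(["Y", "N", "Y", "Y", None, "N", "Y"], name="Loan_Status")

counts = status.value_counts()                      # NA excluded, most frequent first
counts_with_na = status.value_counts(dropna=False)  # also counts the missing value
share = status.value_counts(normalize=True)         # proportions of non-missing values
print(counts)
print(share)
```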

10. unique()

a) Returns each distinct value of a particular column once.

b) Series.unique() returns the unique values in the feature. They are returned in order of appearance and are NOT sorted.

For the feature “Education”, this shows its distinct values.
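The order-of-appearance behaviour is easy to see on a made-up column whose first value would not come first alphabetically. The related nunique() method returns just the number of distinct values:

```python
import pandas as pd

# Made-up Education column; "Not Graduate" appears first
education = pd.Series(
    ["Not Graduate", "Graduate", "Graduate", "Not Graduate", "Graduate"],
    name="Education",
)

distinct = education.unique()     # order of first appearance, not sorted
n_distinct = education.nunique()  # how many distinct values there are
print(list(distinct), n_distinct)
```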

11. Printing Column Names

df.columns returns the column headers of a pandas DataFrame as an Index object. df.columns.values returns them as an array rather than a list; to get a plain Python list, use df.columns.tolist() or list(df.columns).
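A short sketch comparing the three ways of reading column names, using an empty DataFrame with made-up headers:

```python
import pandas as pd

# Empty DataFrame with illustrative column headers
df = pd.DataFrame(columns=["Loan_ID", "Gender", "Married", "Education"])

as_index = df.columns           # pandas Index object
as_array = df.columns.values    # array of names, not a list
as_list = df.columns.tolist()   # plain Python list
print(as_list)
```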

12. describe()

describe() shows basic summary statistics for the numeric columns of your dataset: count, mean, standard deviation, minimum, maximum, and the 25th, 50th (median), and 75th percentiles.
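A minimal sketch on made-up data with one numeric and one text column. By default only the numeric column is summarized; include="all" adds the text column as well (with count, unique, top, and freq for it):

```python
import pandas as pd

# Made-up data: one numeric and one text column
df = pd.DataFrame({
    "ApplicantIncome": [2000, 4000, 6000, 8000],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate"],
})

stats = df.describe()                  # numeric columns only
stats_all = df.describe(include="all") # numeric and non-numeric columns
print(stats)
# For [2000, 4000, 6000, 8000]: mean and median (50%) are both 5000,
# 25% is 3500 and 75% is 6500 (linear interpolation between values).
```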

13. Missing values

In pandas, missing data is represented by two values:

None: None is a Python singleton object that is often used for missing data in Python code.
NaN: NaN (an acronym for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.

To check for missing values in a pandas DataFrame, use the isnull() and notnull() functions. Both check whether each value is NaN (or None): isnull() returns True where a value is missing, and notnull() returns True where it is present. They can also be used on a pandas Series to find null values in it.
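A short sketch of both functions on a made-up DataFrame containing a None and a NaN. In current pandas, isna() and notna() are aliases with the same behaviour:

```python
import numpy as np
import pandas as pd

# Made-up data: None in a text column, NaN in a numeric column
df = pd.DataFrame({
    "Gender": ["Male", None, "Female"],
    "LoanAmount": [128.0, 66.0, np.nan],
})

missing_mask = df.isnull()               # True where a value is None/NaN
present_mask = df.notnull()              # the element-wise opposite
series_mask = df["LoanAmount"].isnull()  # also works on a single Series
print(missing_mask)
```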

Number of missing values in each feature and the corresponding percentage. The missing-value percentage can help you decide whether to drop a feature or not.
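A minimal sketch of building such a missing-value report on made-up data. Summing the boolean mask from isnull() gives counts, and its mean gives the fraction of missing rows. The 40% cut-off used to flag columns is a hypothetical rule of thumb for illustration, not a recommendation from this article:

```python
import numpy as np
import pandas as pd

# Made-up data with some missing values
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Credit_History": [1.0, np.nan, np.nan, 0.0],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
})

missing_count = df.isnull().sum()          # missing values per column
missing_pct = df.isnull().mean() * 100     # percentage of rows missing per column
report = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(report.sort_values("percent", ascending=False))

# Hypothetical threshold: flag columns with more than 40% missing values
THRESHOLD_PCT = 40
to_drop = missing_pct[missing_pct > THRESHOLD_PCT].index.tolist()
reduced = df.drop(columns=to_drop)
```

Whether to drop, impute, or keep a feature depends on the data and the model; the threshold here only shows how the percentage can feed that decision.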

That’s It!

Thanks for reading!

Found this article useful? Follow me (Anuganti Suresh) on Medium and check out my most popular articles!

