Everyday use of pandas functions as a data scientist
Pandas is a Python library used for data manipulation (creating, deleting, and updating data).
It is one of the most commonly used libraries for data analysis in Python. Pandas offers data structures and operations for manipulating numerical and time-series data.
Pandas First Steps
Install and import
Pandas is an easy package to install. Open up your terminal program (for Mac users) or command line (for PC users) and install it using either of the following commands:
conda install pandas
pip install pandas
Alternatively, if you’re currently viewing this article in a Jupyter notebook you can run this cell:
!pip install pandas
The ! at the beginning runs the rest of the line as if it were typed in a terminal.
We usually import pandas under the shorter alias pd (import pandas as pd) since it’s used so much.
For this exercise we use the Loan Prediction dataset, which can be downloaded from: https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii/#ProblemStatement
1. Import necessary Libraries
import pandas as pd
2. Load dataset (Test & Train)
# Read train and test dataset
train = pd.read_csv("train_ctrUa4K.csv")
test = pd.read_csv("test_lAUu6dG.csv")
3. Viewing your data:
a) The first thing to do when opening a new dataset is to print out a few rows to keep as a visual reference. We accomplish this with .head().
b) .head() outputs the first five rows of your DataFrame by default, including the column headers and the content of each row.
c) We can also pass a number: .head(3) outputs the first 3 rows.
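As a minimal sketch of .head(), assuming a small made-up DataFrame that stands in for the real train data (the column names mimic the loan dataset, but the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],
})

first_five = train.head()    # default: first 5 rows
first_three = train.head(3)  # first 3 rows only

print(first_five)
print(first_three)
```

With the real dataset, the same calls work directly on the train and test DataFrames.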
4. tail()
.tail() outputs the last five rows of your DataFrame by default, including the column headers and the content of each row.
We can also pass a number: .tail(2) outputs the last 2 rows.
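A similar sketch for .tail(), again on a small made-up DataFrame rather than the real loan data:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],
})

last_five = train.tail()   # default: last 5 rows
last_two = train.tail(2)   # last 2 rows only

print(last_five)
print(last_two)
```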
5. shape
.shape gives the size of the DataFrame in the format (rows, columns).
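For example, on a small made-up DataFrame (an assumption for illustration, not the real dataset), the shape attribute can be read and unpacked like this:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005", "LP006"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],
})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple
rows, columns = train.shape
print(train.shape)
print(f"{rows} rows, {columns} columns")
```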
6. info()
.info() prints the column headers and the data type stored in each column. It also gives the number of non-null values per column and the memory the data takes up.
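A short sketch of .info(), assuming a made-up DataFrame with one missing LoanAmount so that the non-null counts differ between columns:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [128.0, None, 66.0, 120.0],  # None becomes NaN
})

# Prints the summary to the screen; the call itself returns None
train.info()
```

Because LoanAmount has one missing entry here, its non-null count in the printed summary is one lower than the number of rows.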
7. dtypes
The DataFrame.dtypes attribute gives the data type (dtype) of each column in the DataFrame.
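As an illustration on made-up data (not the real loan dataset), dtypes returns one entry per column, and a single column exposes its type through .dtype:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "ApplicantIncome": [5849, 4583, 3000],
    "LoanAmount": [128.0, 66.0, 120.0],
})

column_types = train.dtypes          # one dtype per column
amount_type = train["LoanAmount"].dtype  # dtype of a single column

print(column_types)
print(amount_type)
```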
8. count()
DataFrame.count() counts the number of non-NA/null observations along the given axis. It works with non-floating-point data as well.
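To make the axis argument concrete, here is a small sketch on made-up data with a few missing entries (the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004"],
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [128.0, None, 66.0, 120.0],
})

per_column = train.count()      # non-null values in each column (default axis=0)
per_row = train.count(axis=1)   # non-null values in each row

print(per_column)
print(per_row)
```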
9. value_counts()
.value_counts() returns a Series containing counts of unique values. The result is in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.
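A minimal sketch on made-up data, showing the default behaviour and the dropna=False option that keeps missing values in the tally:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Male", "Male", "Female"],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "N"],
})

status_counts = train["Loan_Status"].value_counts()          # most frequent first
gender_counts = train["Gender"].value_counts()               # missing values excluded
gender_with_na = train["Gender"].value_counts(dropna=False)  # missing values included

print(status_counts)
print(gender_counts)
print(gender_with_na)
```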
10. unique()
.unique() shows all the distinct values of a particular column. They are returned in order of appearance; the result is NOT sorted.
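For instance, with a made-up Property_Area column (values invented for illustration), the distinct values come back in the order they first appear:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Urban", "Semiurban", "Rural"],
})

areas = train["Property_Area"].unique()  # distinct values, in order of appearance
print(areas)
```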
11. Printing Column Names
df.columns returns the column headers of a pandas DataFrame as an Index.
The df.columns.values attribute returns them as an array rather than a Python list; to get an actual list, use df.columns.tolist() or list(df.columns).
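A short sketch of the different ways to read the column names, using a made-up DataFrame in place of the real data:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002"],
    "ApplicantIncome": [5849, 4583],
    "Loan_Status": ["Y", "N"],
})

column_index = train.columns            # a pandas Index object
column_values = train.columns.values    # an array, not a Python list
column_list = train.columns.tolist()    # a plain Python list

print(column_index)
print(type(column_values))
print(column_list)
```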
12. describe()
.describe() is used to view basic statistical details such as the count, mean, standard deviation, minimum, maximum, and percentiles (including the median, shown as 50%) of the numerical columns in your dataset.
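As a sketch on made-up numbers (chosen so the results are easy to check by hand), describe() summarizes only the numeric columns by default and skips the text column:

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003", "LP004", "LP005"],
    "ApplicantIncome": [1000, 2000, 3000, 4000, 5000],
    "LoanAmount": [100.0, 120.0, None, 140.0, 160.0],
})

stats = train.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
print(stats)

# Individual statistics can be looked up by row label and column name
median_income = stats.loc["50%", "ApplicantIncome"]
print(median_income)
```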
13. Missing Values
In pandas, missing data is represented by two values:
- None: a Python singleton object that is often used for missing data in Python code.
- NaN: NaN (an acronym for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.
To check for missing values in a pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not. They can also be used on a pandas Series to find null values in the series.
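A minimal sketch of both checks, assuming a made-up DataFrame that contains None as well as NaN (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the train DataFrame (made-up values)
train = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", None, "Female"],
    "LoanAmount": [128.0, float("nan"), None],
})

null_mask = train.isnull()                 # True where a value is missing
missing_per_column = train.isnull().sum()  # number of missing values per column
has_amount = train["LoanAmount"].notnull() # Series version: True where present

print(null_mask)
print(missing_per_column)
print(has_amount)
```

Summing the boolean mask is a common way to get a quick per-column count of missing values, since True counts as 1.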
Thanks for reading!
Found this article useful? Follow me (Anuganti Suresh) on Medium and check out my most popular articles! Please 👏 this article to share it!