
Python Pandas 
About the Tutorial 
Pandas is an open-source, BSD-licensed Python library providing high-performance, 
easy-to-use data structures and data analysis tools for the Python programming language. 
Python with Pandas is used in a wide range of academic and commercial domains, including 
finance, economics, statistics, and analytics. 
In this tutorial, we will learn the various features of Python Pandas and how to use them 
in practice. 
Audience 
This tutorial has been prepared for those who seek to learn the basics and various functions 
of Pandas. It will be especially useful for people working with data cleansing and analysis. 
After completing this tutorial, you will have a moderate level of expertise from which you 
can advance to higher levels. 
Prerequisites 
You should have a basic understanding of computer programming terminology. A basic 
understanding of any programming language is a plus. 
Pandas uses most of the functionality of NumPy. It is suggested that you go through our 
NumPy Tutorial before proceeding with this tutorial. 
Disclaimer & Copyright 
© Copyright 2017 by Tutorials Point (I) Pvt. Ltd. 
All the content and graphics published in this e-book are the property of Tutorials Point (I) 
Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish 
any contents or a part of contents of this e-book in any manner without written consent 
of the publisher. 
We strive to update the contents of our website and tutorials as timely and as precisely as 
possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. 
Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our 
website or its contents including this tutorial. If you discover any errors on our website or 
in this tutorial, please notify us at contact@tutorialspoint.com. 
Table of Contents 
About the Tutorial 
Audience 
Prerequisites 
Disclaimer & Copyright 
Table of Contents 
1. Pandas – Introduction 
2. Pandas – Environment Setup 
3. Pandas – Introduction to Data Structures 
    Dimension & Description 
    Series 
    DataFrame 
    Data Type of Columns 
    Panel 
4. Pandas – Series 
    pandas.Series 
    Create an Empty Series 
    Create a Series from ndarray 
    Create a Series from dict 
    Create a Series from Scalar 
    Accessing Data from Series with Position 
    Retrieve Data Using Label (Index) 
5. Pandas – DataFrame 
    pandas.DataFrame 
    Create DataFrame 
    Create an Empty DataFrame 
    Create a DataFrame from Lists 
    Create a DataFrame from Dict of ndarrays / Lists 
    Create a DataFrame from List of Dicts 
    Create a DataFrame from Dict of Series 
    Column Selection 
    Column Addition 
    Column Deletion 
    Row Selection, Addition, and Deletion 
6. Pandas – Panel 
    pandas.Panel() 
    Create Panel 
    Selecting the Data from Panel 
7. Pandas – Basic Functionality 
    DataFrame Basic Functionality 
8. Pandas – Descriptive Statistics 
    Functions & Description 
    Summarizing Data 
9. Pandas – Function Application 
    Table-wise Function Application 
    Row or Column Wise Function Application 
    Element Wise Function Application 
10. Pandas – Reindexing 
    Reindex to Align with Other Objects 
    Filling while ReIndexing 
    Limits on Filling while Reindexing 
    Renaming 
11. Pandas – Iteration 
    Iterating a DataFrame 
    iteritems() 
    iterrows() 
    itertuples() 
12. Pandas – Sorting 
    By Label 
    Sorting Algorithm 
13. Pandas – Working with Text Data 
14. Pandas – Options and Customization 
    get_option(param) 
    set_option(param,value) 
    reset_option(param) 
    describe_option(param) 
    option_context() 
15. Pandas – Indexing and Selecting Data 
    .loc() 
    .iloc() 
    .ix() 
    Use of Notations 
16. Pandas – Statistical Functions 
    Percent_change 
    Covariance 
    Correlation 
    Data Ranking 
17. Pandas – Window Functions 
    .rolling() Function 
    .expanding() Function 
    .ewm() Function 
18. Pandas – Aggregations 
    Applying Aggregations on DataFrame 
19. Pandas – Missing Data 
    Cleaning / Filling Missing Data 
    Replace NaN with a Scalar Value 
    Fill NA Forward and Backward 
    Drop Missing Values 
    Replace Missing (or) Generic Values 
20. Pandas – GroupBy 
    Split Data into Groups 
    View Groups 
    Iterating through Groups 
    Select a Group 
    Aggregations 
    Transformations 
    Filtration 
21. Pandas – Merging/Joining 
    Merge Using 'how' Argument 
22. Pandas – Concatenation 
    Concatenating Objects 
    Time Series 
23. Pandas – Date Functionality 
24. Pandas – Timedelta 
25. Pandas – Categorical Data 
    Object Creation 
26. Pandas – Visualization 
    Bar Plot 
    Histograms 
    Box Plots 
    Area Plot 
    Scatter Plot 
    Pie Chart 
27. Pandas – IO Tools 
    read.csv 
28. Pandas – Sparse Data 
29. Pandas – Caveats & Gotchas 
30. Pandas – Comparison with SQL 
1. Pandas – Introduction 
Pandas is an open-source Python library providing high-performance data manipulation 
and analysis tools built on its powerful data structures. The name Pandas is derived from 
"panel data", an econometrics term for multidimensional data sets. 
In 2008, developer Wes McKinney started developing Pandas when he needed a 
high-performance, flexible tool for data analysis. 
Prior to Pandas, Python was mostly used for data munging and preparation and contributed 
little to data analysis itself. Pandas solved this problem. Using Pandas, we can 
accomplish five typical steps in the processing and analysis of data, regardless of the origin 
of the data: load, prepare, manipulate, model, and analyze. 
Python with Pandas is used in a wide range of academic and commercial domains, including 
finance, economics, statistics, and analytics. 
Key Features of Pandas 
• Fast and efficient DataFrame object with default and customized indexing. 
• Tools for loading data into in-memory data objects from different file formats. 
• Data alignment and integrated handling of missing data. 
• Reshaping and pivoting of data sets. 
• Label-based slicing, indexing and subsetting of large data sets. 
• Columns from a data structure can be deleted or inserted. 
• Group by data for aggregation and transformations. 
• High-performance merging and joining of data. 
• Time Series functionality. 
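A few of the key features listed for Pandas can be tried in a handful of lines. The following is only a minimal sketch: the sales figures and the column names region, units, and revenue are invented for illustration and do not come from the tutorial.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 4, 7, 9],
})

# Columns can be inserted and deleted.
df["revenue"] = df["units"] * 2.5
del df["units"]

# Group by a key column and aggregate each group.
totals = df.groupby("region")["revenue"].sum()
print(totals)

# Data alignment: values are matched by label, and a label that is
# missing on either side produces NaN in the result.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b
print(aligned)
```

Only the label "y" exists in both Series, so it is the only position with a numeric result.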
2. Pandas – Environment Setup 
The standard Python distribution does not come bundled with the Pandas module. A lightweight 
option is to install Pandas using the popular Python package installer, pip. 
pip install pandas 
If you install the Anaconda Python distribution, Pandas is installed by default. The 
following distributions bundle the SciPy stack: 
• Anaconda (from https://www.continuum.io) is a free Python distribution for the SciPy 
stack. It is also available for Linux and Mac. 
• Canopy (https://www.enthought.com/products/canopy/) is available as a free as 
well as a commercial distribution with the full SciPy stack for Windows, Linux and Mac. 
• Python (x,y) is a free Python distribution with the SciPy stack and Spyder IDE for 
Windows OS. (Downloadable from http://python-xy.github.io/) 
Package managers of the respective Linux distributions can be used to install one or more 
packages of the SciPy stack. 
For Ubuntu Users 
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose 
For Fedora Users 
sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel 
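Once one of the installation routes above has been followed, a quick check from Python confirms that the libraries can be imported. This is a small sketch, not part of the original tutorial; the version numbers printed depend on your installation.

```python
import numpy as np
import pandas as pd

# If both imports succeed, the installation is usable.
# The printed version strings depend on what was installed.
pandas_version = pd.__version__
numpy_version = np.__version__
print("pandas:", pandas_version)
print("numpy:", numpy_version)
```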
3. Pandas – Introduction to Data Structures 
Pandas deals with the following three data structures: 
• Series 
• DataFrame 
• Panel 
These data structures are built on top of NumPy arrays, which means they are fast. 
Dimension & Description 
The best way to think of these data structures is that a higher-dimensional data structure 
is a container of its lower-dimensional data structure. For example, a DataFrame is a 
container of Series, and a Panel is a container of DataFrames. 
Data Structure    Dimensions    Description 
Series            1             1D labeled homogeneous array, size-immutable. 
DataFrame         2             General 2D labeled, size-mutable tabular structure with 
                                potentially heterogeneously-typed columns. 
Panel             3             General 3D labeled, size-mutable array. 
Building and handling arrays of two or more dimensions is tedious: the burden is placed 
on the user to consider the orientation of the data set when writing functions. Using 
Pandas data structures reduces this mental effort. 
For example, with tabular data (DataFrame) it is more semantically helpful to think of 
the index (the rows) and the columns rather than axis 0 and axis 1. 
All Pandas data structures are value-mutable (their values can be changed), and all except 
Series are size-mutable. Series is size-immutable. 
Note: DataFrame is widely used and is one of the most important data structures. Panel is 
rarely used (it was removed from the library in pandas 0.25). 
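The container idea and the index/columns vocabulary described above can be seen on a tiny DataFrame. This is a sketch with made-up column names (a and b), intended only to illustrate the points in the text.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

# A DataFrame is a container of Series: each column is a Series.
col = df["a"]
print(type(col).__name__)

# Axes can be named rather than numbered.
per_column = df.sum(axis="index")    # same as axis=0: one total per column
per_row = df.sum(axis="columns")     # same as axis=1: one total per row
print(per_column)
print(per_row)

# Values are mutable in place.
df.loc[0, "a"] = 100
print(df)
```

Using axis="index" and axis="columns" is equivalent to axis=0 and axis=1, but it spares the reader from remembering which number means which direction.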
Series 
A Series is a one-dimensional array-like structure with homogeneous data. For example, the 
following series is a collection of integers 10, 23, 56, … 
10 23 56 17 52 61 73 90 26 72 
Key Points 
• Homogeneous data 
• Size immutable 
• Values of data mutable 
DataFrame 
A DataFrame is a two-dimensional array with heterogeneous data. For example, 
Name     Age    Gender    Rating 
Steve    32     Male      3.45 
Lia      28     Female    4.6 
Vin      45     Male      3.9 
Katie    38     Female    2.78 
The table represents the data of a sales team of an organization with their overall 
performance rating. The data is represented in rows and columns. Each column represents 
an attribute and each row represents a person. 
Data Type of Columns 
The data types of the four columns are as follows: 
Column    Type 
Name      String 
Age       Integer 
Gender    String 
Rating    Float 
Key Points 
• Heterogeneous data 
• Size mutable 
• Data mutable 
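The sales-team table above can be built directly as a DataFrame to see the per-column data types. This is a minimal sketch; the extra Senior column is a hypothetical addition used only to show that a DataFrame is size-mutable.

```python
import pandas as pd

# The sales-team table from the text, one list per column.
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})
print(df)

# Each column carries its own data type (heterogeneous data).
print(df.dtypes)

# Size-mutable: a new column can be added after creation.
# "Senior" is a hypothetical column, not part of the original table.
df["Senior"] = df["Age"] > 40
print(df.shape)
```

The exact dtype names printed for the text columns depend on the pandas version, which is why no expected output is shown here.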
Panel 
A Panel is a three-dimensional data structure with heterogeneous data. It is hard to 
represent a panel graphically, but it can be thought of as a container of DataFrames. 
Key Points 
• Heterogeneous data 
• Size mutable 
• Data mutable 
4. Pandas – Series 
A Series is a one-dimensional labeled array capable of holding data of any type (integer, 
string, float, Python objects, etc.). The axis labels are collectively called the index. 
pandas.Series 
A pandas Series can be created using the following constructor: 
pandas.Series(data, index, dtype, copy) 
The parameters of the constructor are as follows: 
S.No    Parameter & Description 
1       data: takes various forms such as ndarray, list, or constants. 
2       index: values must be hashable and the same length as data. 
        Defaults to np.arange(n) if no index is passed. 
3       dtype: the data type. If None, the data type is inferred. 
4       copy: whether to copy the data. Defaults to False. 
A series can be created using various inputs like: 
• Array 
• Dict 
• Scalar value or constant 
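The constructor parameters described above can be exercised one at a time. The following is a small sketch under the assumption of a reasonably recent pandas; the labels x, y, z and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# dtype forces a data type instead of letting pandas infer one.
s = pd.Series(values, index=["x", "y", "z"], dtype="float64")
print(s)

# With no index, the labels default to 0 .. n-1.
t = pd.Series([10, 20, 30])
print(t.index.tolist())

# copy=True gives the Series its own data, so a later change to the
# source array does not show through in the Series.
c = pd.Series(values, copy=True)
values[0] = 99
print(c.iloc[0])
```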
Create an Empty Series 
The most basic series that can be created is an empty Series. 
# import the pandas library, aliased as pd 
import pandas as pd 
# dtype is given explicitly; recent pandas versions otherwise default an empty Series to object 
s = pd.Series(dtype='float64') 
print(s) 
Its output is as follows: 
Series([], dtype: float64) 
Create a Series from ndarray 
If data is an ndarray, then the index passed must be of the same length. If no index is passed, 
the default index is range(n), where n is the array length, i.e., [0, 1, 2, 3, …]. 
Example 1 
# import the pandas library, aliased as pd 
import pandas as pd 
import numpy as np 
data = np.array(['a','b','c','d']) 
s = pd.Series(data) 
print(s) 
Its output is as follows: 
0 a 
1 b 
2 c 
3 d 
dtype: object 
We did not pass any index, so by default, it assigned the indexes ranging from 0 to 
len(data)-1, i.e., 0 to 3. 
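The rule that a passed index must match the length of the data can be checked directly. This sketch reuses the array from Example 1; the flag mismatch_rejected is a hypothetical helper variable used only to record what happened.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

# With no index, the labels run from 0 to len(data) - 1.
s = pd.Series(data)
print(list(s.index))

# An index whose length differs from the data is rejected.
mismatch_rejected = False
try:
    pd.Series(data, index=[100, 101])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```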
Example 2 
# import the pandas library, aliased as pd 
import pandas as pd 
import numpy as np 
data = np.array(['a','b','c','d']) 
s = pd.Series(data, index=[100,101,102,103]) 
print(s) 
Its output is as follows: 
100 a 
101 b 
102 c 
103 d 
dtype: object 
We passed the index values here. Now we can see the customized index values in the 
output. 
Create a Series from dict 
A dict can be passed as input. If no index is specified, the dictionary keys are used to 
construct the index (older pandas versions sorted the keys; current versions keep their 
insertion order). If an index is passed, the values in data corresponding to the labels in 
the index are pulled out. 
Example 1 
# import the pandas library, aliased as pd 
import pandas as pd 
import numpy as np 
data = {'a' : 0., 'b' : 1., 'c' : 2.} 
s = pd.Series(data) 
print(s) 
Its output is as follows: 
a 0.0 
b 1.0 
c 2.0 
dtype: float64 
Observe: Dictionary keys are used to construct index. 
Example 2 
# import the pandas library, aliased as pd 
import pandas as pd 
import numpy as np 
data = {'a' : 0., 'b' : 1., 'c' : 2.} 
# pass the dict itself (data) together with the desired label order 
s = pd.Series(data, index=['b', 'c', 'd', 'a']) 
print(s) 
Its output is as follows: 
b 1.0 
c 2.0 
d NaN 
a 0.0 
dtype: float64 
Observe: Index order is persisted and the missing element is filled with NaN (Not a 
Number). 
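The NaN entry produced for a label with no matching dictionary key can be detected and handled afterwards. The sketch below reuses the data from Example 2; the fill value -1.0 is an arbitrary choice made for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# isna() flags the labels that had no matching dictionary key.
missing = s.isna()
print(missing)

# dropna() removes those entries; fillna() replaces them instead.
dropped = s.dropna()
filled = s.fillna(-1.0)
print(dropped)
print(filled)
```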
Create a Series from Scalar 
If data is a scalar value, an index must be provided. The value will be repeated to match
the length of the index.
#import the pandas library and aliasing as pd 
import pandas as pd 
import numpy as np 
s = pd.Series(5, index=[0, 1, 2, 3]) 
print(s)
Its output is as follows: 
0 5 
1 5 
2 5 
3 5 
dtype: int64 
Accessing Data from Series with Position 
Data in the series can be accessed similar to that in an ndarray. 
Example 1 
Retrieve the first element. As we already know, the counting starts from zero for the array, 
which means the first element is stored at zeroth position and so on. 
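import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#retrieve the first element; iloc selects by position regardless of the labels
print(s.iloc[0])
Its output is as follows:
1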
Example 2 
Retrieve the first three elements in the Series. If a : is placed before an index, all items
before that position are extracted. If two indexes are given (with : between them),
the items between the two positions (not including the stop position) are extracted.
import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#retrieve the first three elements
print(s[:3])
Its output is as follows: 
a 1 
b 2 
c 3 
dtype: int64 
Example 3 
Retrieve the last three elements. 
import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#retrieve the last three elements
print(s[-3:])
Its output is as follows: 
c 3 
d 4 
e 5 
dtype: int64 
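The positional lookups above can also be written with the explicit iloc indexer, which always selects by position even when the labels are not integers. This is a small sketch using the same Series; the result variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]          # element at position 0
first_three = s.iloc[:3]   # positions 0, 1, 2 (stop position excluded)
last_three = s.iloc[-3:]   # the final three positions

print(first)
print(first_three)
print(last_three)
```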
Retrieve Data Using Label (Index) 
A Series is like a fixed-size dict in that you can get and set values by index label. 
Example 1 
Retrieve a single element using index label value. 
import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#retrieve a single element
print(s['a'])
Its output is as follows:
1
Example 2 
Retrieve multiple elements using a list of index label values. 
import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#retrieve multiple elements
print(s[['a','c','d']])
Its output is as follows: 
a 1 
c 3 
d 4 
dtype: int64 
Example 3 
If a label is not contained, an exception is raised. 
import pandas as pd
s = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
#try to retrieve a label that does not exist
print(s['f'])
Its output is as follows: 
KeyError: 'f' 
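When a label may be absent, the exception can be avoided by testing membership first or by using the dict-style get method. The sketch below assumes the same Series as in the examples above; the fallback value 0 is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# The in operator checks the index labels, not the values
has_f = 'f' in s
print(has_f)

# get returns the supplied default instead of raising KeyError
value = s.get('f', default=0)
print(value)
```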
5. Pandas – DataFrame
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion
in rows and columns.
Features of DataFrame
• Columns can be of different types
• Size is mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
Let us assume that we are creating a DataFrame with students' data.
You can think of it as an SQL table or a spreadsheet data representation.
A pandas DataFrame can be created using the following constructor: 
pandas.DataFrame( data, index, columns, dtype, copy) 
The parameters of the constructor are as follows: 
S.No.  Parameter  Description
1  data  Takes various forms such as ndarray, Series, map, list, dict, constants, and also another DataFrame.
2  index  Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
3  columns  Column labels for the resulting frame. Optional; defaults to np.arange(n) if no column labels are passed.
4  dtype  Data type of each column.
5  copy  Controls whether the input data is copied.
Create DataFrame 
A pandas DataFrame can be created using various inputs like: 
• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
In the subsequent sections of this chapter, we will see how to create a DataFrame using 
these inputs. 
Create an Empty DataFrame 
The most basic DataFrame that can be created is an empty DataFrame.
#import the pandas library and alias it as pd
import pandas as pd
df = pd.DataFrame()
print(df)
Its output is as follows: 
Empty DataFrame 
Columns: [] 
Index: [] 
Create a DataFrame from Lists 
The DataFrame can be created using a single list or a list of lists. 
Example 1 
import pandas as pd 
data = [1,2,3,4,5] 
df = pd.DataFrame(data) 
print(df)
Its output is as follows:
 0
0 1
1 2
2 3
3 4
4 5
Example 2 
import pandas as pd 
data = [['Alex',10],['Bob',12],['Clarke',13]] 
df = pd.DataFrame(data,columns=['Name','Age']) 
print(df)
Its output is as follows: 
 Name Age 
0 Alex 10 
1 Bob 12 
2 Clarke 13 
Example 3 
import pandas as pd 
data = [['Alex',10],['Bob',12],['Clarke',13]] 
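df = pd.DataFrame(data,columns=['Name','Age'])
#cast only the numeric column; recent pandas versions may raise an error when
#dtype=float is passed for data that also contains strings
df = df.astype({'Age': float})
print(df)
Its output is as follows:
 Name Age 
0 Alex 10.0 
1 Bob 12.0 
2 Clarke 13.0 
Note: Observe that casting the Age column changes its type to floating point.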
Create a DataFrame from Dict of ndarrays / Lists 
All the ndarrays must be of the same length. If an index is passed, then the length of the index
should be equal to the length of the arrays.
If no index is passed, then by default, index will be range(n), where n is the array length. 
Example 1 
import pandas as pd 
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],
 'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Its output is as follows: 
 Age Name 
0 28 Tom 
1 34 Jack 
2 29 Steve 
3 42 Ricky 
Note: Observe the values 0,1,2,3. They are the default index assigned to each using the 
function range(n). 
Example 2 
Let us now create an indexed DataFrame using arrays. 
import pandas as pd 
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],
 'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Its output is as follows: 
 Age Name 
rank1 28 Tom 
rank2 34 Jack 
rank3 29 Steve 
rank4 42 Ricky 
Note: Observe, the index parameter assigns an index to each row. 
Create a DataFrame from List of Dicts 
List of Dictionaries can be passed as input data to create a DataFrame. The dictionary keys 
are by default taken as column names. 
Example 1 
The following example shows how to create a DataFrame by passing a list of dictionaries. 
import pandas as pd 
data = [{'a': 1, 'b': 2},
 {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Its output is as follows: 
 a b c 
0 1 2 NaN 
1 5 10 20.0 
Note: Observe, NaN (Not a Number) is appended in missing areas. 
Example 2 
The following example shows how to create a DataFrame by passing a list of dictionaries 
and the row indices. 
import pandas as pd 
data = [{'a': 1, 'b': 2}, 
 {'a': 5, 'b': 10, 'c': 20}] 
df = pd.DataFrame(data, index=['first', 'second']) 
print(df)
Its output is as follows: 
 a b c 
first 1 2 NaN 
second 5 10 20.0 
Example 3 
The following example shows how to create a DataFrame with a list of dictionaries, row 
indices, and column indices. 
import pandas as pd 
data = [{'a': 1, 'b': 2}, 
 {'a': 5, 'b': 10, 'c': 20}] 
#With two column labels, the same as the dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
#With two column labels, one of which is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Its output is as follows: 
#df1 output 
 a b 
first 1 2 
second 5 10 
#df2 output 
 a b1 
first 1 NaN 
second 5 NaN 
Note: Observe that df2 is created with a column label (b1) that is not a dictionary
key, so that column is filled with NaN. In contrast, df1 is created with column labels that
match the dictionary keys, so no NaN values appear.
Create a DataFrame from Dict of Series 
Dictionary of Series can be passed to form a DataFrame. The resultant index is the union 
of all the series indexes passed. 
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
print(df)
Its output is as follows: 
 one two 
a 1.0 1
b 2.0 2
c 3.0 3 
d NaN 4 
Note: Observe that series one has no label 'd', so in the result the value for label d
in column one is NaN.
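The union-of-indexes rule described above can be confirmed directly. The following sketch rebuilds the same DataFrame and inspects the resulting row labels and column dtypes; the variable row_labels is illustrative. Column one becomes float64 because an integer column cannot hold NaN.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row index is the union of both Series indexes
row_labels = list(df.index)
print(row_labels)

# 'one' is upcast to float to hold the NaN; 'two' stays integer
print(df.dtypes)
print(pd.isna(df.loc['d', 'one']))
```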
Let us now understand column selection, addition, and deletion through examples. 
Column Selection 
We will understand this by selecting a column from the DataFrame. 
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
print(df['one'])
Its output is as follows: 
a 1.0 
b 2.0 
c 3.0 
d NaN 
Name: one, dtype: float64 
Column Addition 
We will understand this by adding a new column to an existing data frame. 
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
# Adding a new column to an existing DataFrame object with a column label
# by passing a new Series
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10,20,30],index=['a','b','c'])
print(df)
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
Its output is as follows: 
Adding a new column by passing as Series: 
 one two three 
a 1.0 1 10.0 
b 2.0 2 20.0 
c 3.0 3 30.0 
d NaN 4 NaN 
Adding a new column using the existing columns in DataFrame: 
 one two three four 
a 1.0 1 10.0 11.0 
b 2.0 2 20.0 22.0 
c 3.0 3 30.0 33.0 
d NaN 4 NaN NaN 
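The column additions above modify df in place. As an alternative sketch, the assign method returns a new DataFrame and leaves the original untouched; the name df2 is illustrative, and the column values are the ones visible in the output above.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# assign returns a copy with the extra column; df itself is unchanged
df2 = df.assign(three=pd.Series([10, 20, 30], index=['a', 'b', 'c']))

# Derive another column from existing columns of the new frame
df2 = df2.assign(four=df2['one'] + df2['three'])

print(df.columns.tolist())
print(df2)
```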
Column Deletion 
Columns can be deleted or popped; let us take an example to understand how. 
# Using the previous DataFrame, we will delete a column 
# using del function 
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
 'three' : pd.Series([10,20,30], index=['a','b','c'])} 
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# using del function
print("Deleting the first column using DEL function:")
del df['one']
print(df)
# using pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)
Its output is as follows: 
Our dataframe is: 
 one three two 
a 1.0 10.0 1 
b 2.0 20.0 2 
c 3.0 30.0 3 
d NaN NaN 4 
Deleting the first column using DEL function: 
 three two 
a 10.0 1 
b 20.0 2 
c 30.0 3 
d NaN 4 
Deleting another column using POP function: 
a 10.0 
b 20.0 
c 30.0 
d NaN 
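Both del and pop change the DataFrame in place. The sketch below contrasts pop, which also hands back the removed column, with drop(columns=...), which returns a new DataFrame instead; the names popped and trimmed are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# pop removes the column from df and returns it as a Series
popped = df.pop('two')
print(popped)

# drop returns a new frame without the column; df keeps 'one'
trimmed = df.drop(columns=['one'])
print(df.columns.tolist())
print(trimmed.columns.tolist())
```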
Row Selection, Addition, and Deletion 
We will now understand row selection, addition and deletion through examples. Let us 
begin with the concept of selection. 
Selection by Label 
Rows can be selected by passing a row label to the loc indexer.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
print(df.loc['b'])
Its output is as follows: 
one 2.0 
two 2.0 
Name: b, dtype: float64 
The result is a series with labels as column names of the DataFrame. And, the Name of 
the series is the label with which it is retrieved. 
Selection by integer location 
Rows can be selected by passing integer location to an iloc function. 
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
print(df.iloc[2])
Its output is as follows: 
one 3.0 
two 3.0 
Name: c, dtype: float64 
Slice Rows 
Multiple rows can be selected using the ':' (slice) operator.
import pandas as pd 
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d) 
print(df[2:4])
Its output is as follows: 
 one two 
c 3.0 3 
d NaN 4 
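Row slices can also be written with loc and iloc. One difference worth noting in this sketch: a label slice with loc includes the stop label, while a positional slice with iloc excludes the stop position. The variable names are illustrative.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

by_label = df.loc['b':'c']    # label slice: 'c' is included
by_position = df.iloc[1:3]    # positional slice: position 3 is excluded

print(by_label)
print(by_position)
```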
Addition of Rows 
Add new rows to a DataFrame by concatenating it with another DataFrame using pd.concat. The
rows are added at the end. (The older DataFrame.append method was removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])
print(df)
Its output is as follows: 
 a b 
0 1 2 
1 3 4 
0 5 6 
1 7 8 
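The output above keeps the original row labels, so 0 and 1 each appear twice. If fresh labels are preferred, one option is to pass ignore_index=True to pd.concat, as in this sketch; the name combined is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```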
Deletion of Rows 
Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, then
multiple rows will be dropped.
Notice that in the above example the labels are duplicated. Let us drop a label and
see how many rows get dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a','b'])
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Its output is as follows: 
 a b 
1 3 4 
1 7 8 
In the above example, two rows were dropped because those two contain the same label 0. 
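If only one of the duplicated rows should go, one possible approach is to make the labels unique first with reset_index(drop=True) and then drop by the new label. This is a sketch of that idea, not the only way to do it; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df1, df2])           # row labels are 0, 1, 0, 1

# Dropping label 0 removes both rows that carry it
dropped_by_label = df.drop(0)

# After renumbering, label 0 refers to a single row
unique = df.reset_index(drop=True)   # row labels are 0, 1, 2, 3
dropped_one = unique.drop(0)

print(dropped_by_label)
print(dropped_one)
```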
6. Pandas – Panel
A panel is a 3D container of data. The term panel data is derived from econometrics and
is partially responsible for the name pandas: pan(el)-da(ta)-s.
Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the examples
in this chapter run only on older pandas versions.
The names for the 3 axes are intended to give some semantic meaning to describing
operations involving panel data. They are:
• items: axis 0; each item corresponds to a DataFrame contained inside.
• major_axis: axis 1; it is the index (rows) of each of the DataFrames.
• minor_axis: axis 2; it is the columns of each of the DataFrames.
A Panel can be created using the following constructor:
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
The parameters of the constructor are as follows:
Parameter  Description
data  Takes various forms such as ndarray, Series, map, list, dict, constants, and also another DataFrame
items  axis=0
major_axis  axis=1
minor_axis  axis=2
dtype  Data type of each column
copy  Copy data. Default: False
Create Panel
A Panel can be created in multiple ways:
• From ndarrays
• From dict of DataFrames
From 3D ndarray 
# creating a panel from a 3D ndarray
import pandas as pd
import numpy as np
data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
Its output is as follows: 
<class 'pandas.core.panel.Panel'> 
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis) 
Items axis: 0 to 1 
Major_axis axis: 0 to 3 
Minor_axis axis: 0 to 4 
Note: Observe the dimensions of this panel; they differ from those of the empty panel
created later in this chapter.
From dict of DataFrame Objects 
#creating a panel from a dict of DataFrames
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)
Its output is as follows: 
<class 'pandas.core.panel.Panel'> 
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis) 
Items axis: Item1 to Item2 
Major_axis axis: 0 to 3 
Minor_axis axis: 0 to 2 
Create an Empty Panel 
An empty panel can be created using the Panel constructor as follows: 
#creating an empty panel 
import pandas as pd 
p = pd.Panel() 
print(p)
Its output is as follows: 
<class 'pandas.core.panel.Panel'> 
Dimensions: 0 (items) x 0 (major_axis) x 0 (minor_axis) 
Items axis: None 
Major_axis axis: None 
Minor_axis axis: None 
Selecting the Data from Panel 
Select the data from the panel using: 
• Items
• Major_axis
• Minor_axis
Using Items 
# selecting an item from a panel
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p['Item1'])
Its output is as follows: 
0 1 2 
0 0.488224 -0.128637 0.930817 
1 0.417497 0.896681 0.576657 
2 -2.775266 0.571668 0.290082 
3 -0.400538 -0.144234 1.110535 
We have two items, and we retrieved Item1. The result is a DataFrame with 4 rows and 3
columns, which are the Major_axis and Minor_axis dimensions.
Using major_axis 
Data can be accessed using the method panel.major_xs(index).
# selecting a cross-section along the major axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.major_xs(1))
Its output is as follows: 
Item1 Item2 
0 0.417497 0.748412 
1 0.896681 -0.557322 
2 0.576657 NaN 
Using minor_axis 
Data can be accessed using the method panel.minor_xs(index).
# selecting a cross-section along the minor axis
import pandas as pd
import numpy as np
data = {'Item1' : pd.DataFrame(np.random.randn(4, 3)),
 'Item2' : pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p.minor_xs(1))
Its output is as follows: 
Item1 Item2 
0 -0.128637 -1.047032 
1 0.896681 -0.557322 
2 0.571668 0.431953 
3 -0.144234 1.302466 
Note: Observe the changes in the dimensions. 
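On pandas versions that no longer ship Panel, similar three-dimensional selections can be approximated with a DataFrame that has a MultiIndex on its rows. The sketch below is one possible substitute, not an equivalent API: the level names item and major are made up for illustration, and the results come back as DataFrame or Series objects whose orientation differs from what the Panel methods returned.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = {'Item1': pd.DataFrame(rng.standard_normal((4, 3))),
        'Item2': pd.DataFrame(rng.standard_normal((4, 2)))}

# Stack the frames; the dict keys become the outer level of the row index
stacked = pd.concat(data, names=['item', 'major'])

item1 = stacked.loc['Item1']            # roughly p['Item1']
major1 = stacked.xs(1, level='major')   # row 1 of every item (one row per item)
minor1 = stacked[1]                     # column 1 of every item, as a Series

print(item1)
print(major1)
print(minor1)
```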
7. Pandas – Basic Functionality
By now, we have learned about the three pandas data structures and how to create them. We
will mainly focus on DataFrame objects because of their importance in real-time
data processing, and also discuss a few other data structures.
Series Basic Functionality
S.No.  Attribute or Method  Description
1  axes  Returns a list of the row axis labels.
2  dtype  Returns the dtype of the object.
3  empty  Returns True if the Series is empty.
4  ndim  Returns the number of dimensions of the underlying data, by definition 1.
5  size  Returns the number of elements in the underlying data.
6  values  Returns the Series as an ndarray.
7  head()  Returns the first n rows.
8  tail()  Returns the last n rows.
Let us now create a Series and see how the attributes tabulated above operate.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
Its output is as follows:
0 0.967853
1 -0.148368
2 -1.395906
3 -1.758394
dtype: float64
axes
Returns the list of the labels of the series.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The axes are:")
print(s.axes)
Its output is as follows: 
The axes are: 
[RangeIndex(start=0, stop=4, step=1)] 
The above result is a compact format of a list of values from 0 to 3, i.e., [0,1,2,3].
empty
Returns a Boolean value indicating whether the object is empty. True indicates that
the object is empty.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the Object empty?")
print(s.empty)
Its output is as follows:
Is the Object empty?
False
ndim
Returns the number of dimensions of the object. By definition, a Series is a 1D data
structure, so it returns 1.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The dimensions of the object:")
print(s.ndim)
Its output is as follows: 
0 0.175898 
1 0.166197 
2 -0.609712 
3 -1.377000 
dtype: float64 
The dimensions of the object:
1
size
Returns the size (length) of the series.
import pandas as pd
import numpy as np
#Create a series with 2 random numbers
s = pd.Series(np.random.randn(2))
print(s)
print("The size of the object:")
print(s.size)
Its output is as follows: 
0 3.078058 
1 -1.207803 
dtype: float64 
The size of the object:
2
values
Returns the actual data in the series as an array.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The actual data series is:")
print(s.values)
Its output is as follows: 
0 1.787373 
1 -0.605159 
2 0.180477 
3 -0.140922 
dtype: float64 
The actual data series is: 
[ 1.78737302 -0.60515881 0.18047664 -0.1409218 ] 
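The attributes covered so far can be gathered in one pass. This is a small sketch that collects them into a plain dictionary (the name summary is illustrative) and also shows to_numpy(), an alternative to values for getting the underlying array.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(4))

# Collect the attributes discussed above in one place
summary = {
    'axes': s.axes,
    'dtype': s.dtype,
    'empty': s.empty,
    'ndim': s.ndim,
    'size': s.size,
}
for name, value in summary.items():
    print(name, '->', value)

# to_numpy() also returns the data as a NumPy array
arr = s.to_numpy()
print(type(arr), arr.shape)
```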
Head & Tail 
To view a small sample of a Series or the DataFrame object, use the head() and the 
tail() methods. 
head() returns the first n rows (observe the index values). The default number of elements
to display is five, but you may pass a custom number.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The first two rows of the data series:")
print(s.head(2))
Its output is as follows: 
The original series is: 
0 0.720876 
1 -0.765898 
2 0.479221 
3 -0.139547 
dtype: float64 
The first two rows of the data series: 
0 0.720876 
1 -0.765898
dtype: float64
tail() returns the last n rows (observe the index values). The default number of elements
to display is five, but you may pass a custom number.
import pandas as pd
import numpy as np
#Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The last two rows of the data series:")
print(s.tail(2))
Its output is as follows: 
The original series is: 
0 -0.655091 
1 -0.881407 
2 -0.608592 
3 -2.341413 
dtype: float64 
The last two rows of the data series: 
2 -0.608592 
3 -2.341413 
dtype: float64 
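The examples above use a Series with only four elements, so the default of five is never visible. The sketch below uses a ten-element Series (an arbitrary choice) to show the default, and also a negative argument, which head() interprets as "all rows except the last |n|".

```python
import pandas as pd

s = pd.Series(range(10))

first_five = s.head()          # no argument: the first five rows
all_but_last_two = s.head(-2)  # negative n: everything except the last two rows
last_two = s.tail(2)           # the final two rows

print(first_five)
print(all_but_last_two)
print(last_two)
```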
DataFrame Basic Functionality 
Let us now understand what DataFrame Basic Functionality is. The following table lists
the important attributes and methods that help in DataFrame Basic Functionality.
S.No.  Attribute or Method  Description
1  T  Transposes rows and columns.
2  axes  Returns a list with the row axis labels and column axis labels as the only members.
3  dtypes  Returns the dtypes in this object.
4  empty  True if the NDFrame is entirely empty (no items), i.e., if any of the axes are of length 0.
5  ndim  Number of axes / array dimensions.
6  shape  Returns a tuple representing the dimensionality of the DataFrame.
7  size  Number of elements in the NDFrame.
8  values  NumPy representation of the NDFrame.
9  head()  Returns the first n rows.
10  tail()  Returns the last n rows.
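Let us now create a DataFrame and see how the above-mentioned attributes operate.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Our data series is:")
print(df)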
Its output is as follows: 
Our data series is: 
 Age Name Rating 
0 25 Tom 4.23 
1 26 James 3.24 
2 25 Ricky 3.98 
3 23 Vin 2.56 
4 30 Steve 3.20 
5 29 Smith 4.60 
6 23 Jack 3.80 
T (Transpose) 
Returns the transpose of the DataFrame. The rows and columns will interchange. 
import pandas as pd 
import numpy as np 
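# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("The transpose of the data series is:")
print(df.T)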
Its output is as follows: 
The transpose of the data series is: 
 0 1 2 3 4 5 6 
Age 25 26 25 23 30 29 23 
Name Tom James Ricky Vin Steve Smith Jack 
Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8 
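axes
Returns the list of row axis labels and column axis labels.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Row axis labels and column axis labels are:")
print(df.axes)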
Its output is as follows: 
Row axis labels and column axis labels are: 
[RangeIndex(start=0, stop=7, step=1), Index([u'Age', u'Name', u'Rating'],
dtype='object')]
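dtypes
Returns the data type of each column.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("The data types of each column are:")
print(df.dtypes)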
Its output is as follows: 
The data types of each column are: 
Age int64
Name object
Rating float64 
dtype: object 
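empty
Returns a Boolean value indicating whether the object is empty; True indicates that
the object is empty.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Is the object empty?")
print(df.empty)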
Its output is as follows: 
Is the object empty?
False
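ndim
Returns the number of dimensions of the object. By definition, a DataFrame is a 2D object.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The dimension of the object is:")
print(df.ndim)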
Its output is as follows: 
Our object is: 
 Age Name Rating 
0 25 Tom 4.23 
1 26 James 3.24 
2 25 Ricky 3.98 
3 23 Vin 2.56 
4 30 Steve 3.20 
5 29 Smith 4.60 
6 23 Jack 3.80 
The dimension of the object is:
2
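shape
Returns a tuple representing the dimensionality of the DataFrame. The tuple is (a,b), where a
represents the number of rows and b represents the number of columns.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The shape of the object is:")
print(df.shape)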
Its output is as follows: 
Our object is: 
 Age Name Rating 
0 25 Tom 4.23 
1 26 James 3.24 
2 25 Ricky 3.98 
3 23 Vin 2.56 
4 30 Steve 3.20 
5 29 Smith 4.60 
6 23 Jack 3.80 
The shape of the object is: 
(7, 3) 
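The attributes shape, size, and ndim are related: size is the product of the entries of shape, and ndim is the length of shape. The sketch below checks this on the same seven-row DataFrame, with the data values taken from the output shown above; it also shows that transposing swaps the two entries of shape.

```python
import pandas as pd

d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
df = pd.DataFrame(d)

rows, cols = df.shape
print(rows, cols)

# size counts every cell, so it equals rows * cols
print(df.size == rows * cols)

# ndim is the number of axes, i.e., the length of the shape tuple
print(df.ndim == len(df.shape))

# Transposing swaps rows and columns
transposed_shape = df.T.shape
print(transposed_shape)
```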
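size
Returns the number of elements in the DataFrame.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
 'Age':pd.Series([25,26,25,23,30,29,23]),
 'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The total number of elements in our object is:")
print(df.size)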
Its output is as follows: 
Our object is: 
 Age Name Rating 
0 25 Tom 4.23 
1 26 James 3.24 
2 25 Ricky 3.98 
3 23 Vin 2.56 
4 30 Steve 3.20 
5 29 Smith 4.60
