Beginners Pandas Getting Started¶

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is the DataFrame. DataFrames let you store and manipulate tabular data in rows of observations and columns of variables.

If you're new to this, first set up your environment by following our previous posts:
- Getting Started with Jupyter [Part -1] http://www.androidxu.com/2017/04/guide-On-Jupyter-Notebook.html
- Getting Started with Jupyter [Part -2] http://www.androidxu.com/2017/04/the-ultimate-guide-on-jupyter-ipython-mardown.html#.WPJOBYVOL4g

pandas is well suited for:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet.
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels.
- Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure.

Key features:

- Easy handling of missing data
- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
- Intuitive merging and joining data sets
- Flexible reshaping and pivoting of data sets
- Hierarchical labeling of axes
- Robust IO tools for loading data from flat files, Excel files, databases, and HDF5
- Time series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

We’ll start with a quick, non-comprehensive overview of the fundamental data structures in pandas. The fundamental behavior of data types, indexing, and axis labeling / alignment applies across all of the objects. To get started, import NumPy and pandas into your namespace (see the first code cell in the Series section).
documentation: http://pandas.pydata.org/pandas-docs/stable/10min.html

Series¶

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

In [38]:

# import the pandas and NumPy libraries
import pandas as pd
import numpy as np

Create series from NumPy array¶

Create a basic Series from a NumPy array.
The number of labels in 'index' must equal the number of elements in the array.

In [39]:

my_simple_series = pd.Series(np.random.randn(7), index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
my_simple_series

Out[39]:

a    0.623720
b    0.397227
c    0.470759
d    0.323920
e   -1.186631
f   -1.175695
g    0.744503
dtype: float64

In [40]:

my_simple_series.index

Out[40]:

Index([u'a', u'b', u'c', u'd', u'e', u'f', u'g'], dtype='object')
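The length rule stated above can be checked with a small sketch. The variable names here (values, ok_series, mismatch_error) are made up for illustration; the point is only that pandas raises a ValueError when the index has a different length from the data.

```python
import numpy as np
import pandas as pd

values = np.random.randn(3)

# Three values with three labels: the lengths match, so this works.
ok_series = pd.Series(values, index=['a', 'b', 'c'])

# Three values with only two labels: pandas refuses to build the Series.
try:
    pd.Series(values, index=['a', 'b'])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(mismatch_error)
```

The exact wording of the error message differs between pandas versions, so it is printed here rather than compared against a fixed string.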

Create series from NumPy array, without explicit index¶

In [41]:

my_simple_series = pd.Series(np.random.randn(5))
my_simple_series

Out[41]:

0    1.285379
1   -0.672387
2   -0.720461
3   -0.263968
4    0.547311
dtype: float64

Access a series like a NumPy array

In [42]:

my_simple_series[:3]

Out[42]:

0    1.285379
1   -0.672387
2   -0.720461
dtype: float64
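Slicing is not the only ndarray-style access that works. The sketch below uses fixed values instead of random ones (an assumption made only so the results are reproducible) and shows a positional slice, a boolean mask, and conversion to a plain NumPy array.

```python
import pandas as pd

# Fixed values (instead of random ones) so the results are reproducible.
numbers = pd.Series([1.5, -0.5, 2.0, -1.0, 0.25])

first_three = numbers.iloc[:3]      # explicit positional slice
positives = numbers[numbers > 0]    # boolean mask, as with an ndarray
as_array = numbers.to_numpy()       # the underlying NumPy array

print(first_three)
print(positives)
```

Note that the boolean mask keeps the original index labels (0, 2 and 4 here) rather than renumbering the result.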

Create series from Python dictionary¶

In [43]:

my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_second_series = pd.Series(my_dictionary)
my_second_series

Out[43]:

a      45.0
b     -19.5
c    4444.0
dtype: float64

Access a series like a dictionary

In [44]:

my_second_series['b']

Out[44]:

-19.5

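Note that the display order matches the order of the labels passed as index, and that the label 'd', which has no entry in the dictionary, is filled with NaN.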

In [45]:

pd.Series(my_dictionary, index=['b', 'c', 'd', 'a'])

Out[45]:

b     -19.5
c    4444.0
d       NaN
a      45.0
dtype: float64
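Once a Series contains NaN, a few standard methods help to find or replace the gaps. This is a minimal sketch; the names with_gap, missing_mask, without_gap and filled are illustrative, and filling with 0.0 is just one possible choice.

```python
import pandas as pd

my_dictionary = {'a': 45., 'b': -19.5, 'c': 4444}
with_gap = pd.Series(my_dictionary, index=['b', 'c', 'd', 'a'])

missing_mask = with_gap.isna()      # True only where the value is NaN
without_gap = with_gap.dropna()     # drops the 'd' entry
filled = with_gap.fillna(0.0)       # replaces NaN with 0.0

print(missing_mask)
print(filled)
```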

In [46]:

my_second_series.get('a')

Out[46]:

45.0

In [47]:

unknown = my_second_series.get('f')
type(unknown)

Out[47]:

NoneType
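As the cell above shows, get returns None for a label that is not in the index. The following sketch shows three related options: testing membership first, passing a default to get, and the KeyError that square-bracket access raises. The fallback value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

my_second_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

has_f = 'f' in my_second_series               # membership test on the index
fallback = my_second_series.get('f', 0.0)     # default instead of None

# Square-bracket access to a missing label raises KeyError.
try:
    my_second_series['f']
    raised = False
except KeyError:
    raised = True

print(has_f, fallback, raised)
```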

Create series from scalar¶

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [48]:

pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])

Out[48]:

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Vectorized Operations¶

It is not necessary to write loops for element-by-element operations.
Series objects can be passed to most NumPy functions.

documentation: http://pandas.pydata.org/pandas-docs/stable/basics.html

In [49]:

my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_series = pd.Series(my_dictionary)
my_series

Out[49]:

a      45.0
b     -19.5
c    4444.0
dtype: float64

Add Series without loop¶

In [50]:

my_series + my_series

Out[50]:

a      90.0
b     -39.0
c    8888.0
dtype: float64

In [51]:

my_series

Out[51]:

a      45.0
b     -19.5
c    4444.0
dtype: float64

Series within arithmetic expression¶

In [52]:

# add a scalar to every element of the Series
my_series + 5

Out[52]:

a      50.0
b     -14.5
c    4449.0
dtype: float64

Series used as argument to NumPy function¶

In [53]:

np.exp(my_series)

Out[53]:

a    3.493427e+19
b    3.398268e-09
c             inf
dtype: float64
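Two details of the output above are worth a short sketch. First, a NumPy function applied to a Series returns a Series with the same index. Second, the inf for label 'c' appears because exp(4444) is larger than the biggest value a float64 can hold. np.abs is used below only as a second example function; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

magnitudes = np.abs(my_series)     # NumPy ufunc applied element-wise
print(type(magnitudes))            # still a pandas Series
print(magnitudes)

# exp(4444) is larger than the biggest float64, so it overflows to inf.
largest_float = np.finfo(np.float64).max
print(largest_float)
```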

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels. A label that is present in only one of the operands produces NaN in the result.

In [54]:

my_series[1:]

Out[54]:

b     -19.5
c    4444.0
dtype: float64

In [55]:

my_series[:-1]

Out[55]:

a    45.0
b   -19.5
dtype: float64

In [56]:

my_series[1:] + my_series[:-1]

Out[56]:

a     NaN
b   -39.0
c     NaN
dtype: float64
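If NaN for unmatched labels is not what you want, the add method accepts a fill_value that stands in for the missing side. The sketch below repeats the computation above with explicit positional slices (iloc) and compares both results; treating a missing value as 0 is an assumption that only makes sense for some data.

```python
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

tail = my_series.iloc[1:]     # labels b, c
head = my_series.iloc[:-1]    # labels a, b

plain_sum = tail + head                    # NaN where a label is missing on one side
filled_sum = tail.add(head, fill_value=0)  # treat the missing side as 0

print(plain_sum)
print(filled_sum)
```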

Apply Python functions on an element-by-element basis¶

In [57]:

def multiply_by_ten(input_element):
    return input_element * 10.0

In [58]:

my_series.map(multiply_by_ten)

Out[58]:

a      450.0
b     -195.0
c    44440.0
dtype: float64
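For a simple arithmetic function like this one, map and a vectorized expression give the same result. The sketch below compares the two; map is still the tool to reach for when the per-element logic cannot be written as an arithmetic expression.

```python
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

def multiply_by_ten(input_element):
    return input_element * 10.0

mapped = my_series.map(multiply_by_ten)   # calls the function once per element
vectorized = my_series * 10.0             # one vectorized operation

print(mapped.equals(vectorized))
```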

Vectorized string methods¶

Series is equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically.

In [59]:

series_of_strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [60]:

series_of_strings.str.lower()

Out[60]:

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
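Other methods under .str treat missing values the same way. The sketch below uses str.len and str.contains on the same data. Passing na=False to str.contains is a choice made here so that the missing entry counts as "no match" and the result can be used directly as a filter.

```python
import numpy as np
import pandas as pd

series_of_strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

lengths = series_of_strings.str.len()                    # NaN stays NaN
has_a = series_of_strings.str.contains('a', na=False)    # missing counts as no match
matches = series_of_strings[has_a]                       # keep only matching rows

print(lengths)
print(matches)
```

The match is case-sensitive, so 'A' and 'CABA' are not selected.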

Please subscribe and share with fellow developers!

Reference resources:
- documentation: http://pandas.pydata.org/pandas-docs/stable/10min.html
- documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

Nintyzeros

Pandas in Python for Data Analysis with Example(Step-by-Step guide)

Venkat

In the next post we will continue with arithmetic operations, so subscribe and stay tuned!
