Sunday, April 23, 2017

Data analysis with Python and Pandas

Python 

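  • A programming language that lets us work quickly and integrate systems more effectively
  • The reference implementation (CPython) is written in C
  • Pure Python code is generally slower than C, but libraries such as NumPy and pandas run their heavy computations in compiled code, so numeric work can get close to C speed
  • The syntax is easy to read and learn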


Installation

  • Download Python from https://www.python.org/downloads/
  • pandas home page and documentation: http://pandas.pydata.org/

pip

  • pip is a package management system used to install and manage software packages written in Python
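
After installing packages, a quick way to confirm that Python can find them is to ask the import system. The following is a minimal sketch; the helper name is_installed is made up for this example and is not part of pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`.

    `is_installed` is a hypothetical helper written for this example.
    """
    return importlib.util.find_spec(module_name) is not None


# Report which of the packages used in this post are available
for name in ["numpy", "pandas", "jupyter"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Any package reported as missing can be installed with pip install followed by the package name.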

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
  • A powerful N-dimensional array object
  • Sophisticated (broadcasting) functions
  • Tools for integrating C/C++ and FORTRAN code
  • Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. 
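
The features listed above can be tried in a few lines. This is a small sketch with made-up numbers, assuming NumPy is already installed; it touches the array object, broadcasting, linear algebra, and random numbers.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)

# Broadcasting: the 1-D row is added to every row of `a`
row = np.array([10, 20, 30])
b = a + row
print(b)

# Linear algebra: matrix product of `a` (2x3) with its transpose (3x2)
gram = a @ a.T
print(gram)

# Random numbers from a seeded generator, so repeated runs give the same values
rng = np.random.default_rng(0)
samples = rng.random(3)
print(samples)
```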

Installing pandas with pip also installs NumPy as a dependency:

C:\Users\Prath>pip install pandas

Pandas

  • Built on top of NumPy, with its performance-critical parts written in C and Cython behind a Python interface
  • An open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
  • pandas is a NumFOCUS sponsored project

Reasons for using the pandas module

  • It works mainly with DataFrames. A DataFrame is a table of rows and named columns, much like an Excel spreadsheet
  • It is built to make everyday data tasks easier: loading, cleaning, filtering, and summarizing data
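
To make the spreadsheet comparison concrete, here is a small sketch of a DataFrame. The sales figures and column names are invented for illustration only.

```python
import pandas as pd

# Hypothetical sales data, made up for this example
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "units": [10, 4, 7, 3],
    "price": [2.5, 4.0, 2.5, 6.0],
})

# Add a computed column, like a spreadsheet formula filled down a column
df["revenue"] = df["units"] * df["price"]

# Keep only some rows, like a spreadsheet filter
north = df[df["region"] == "North"]

# Group and sum, like a simple pivot table
totals = df.groupby("region")["revenue"].sum()

print(df)
print(north)
print(totals)
```

Each step is one line, and the same lines work unchanged whether the table has four rows or four million.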

Why Python over Excel?

  • Excel is slow on large data. With more than about 2 GB of data, Excel can stop responding, or take as long as 30 minutes to finish
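
One way pandas copes with files too large to load at once is to read them in chunks. The sketch below uses a tiny in-memory CSV as a stand-in for a large file; the column names and the chunk size of 2 are assumptions chosen to keep the example short.

```python
import io

import pandas as pd

# Stand-in for a large file on disk; in practice, pass a file path to read_csv
csv_data = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")

total = 0
rows = 0

# With chunksize set, read_csv yields DataFrames of at most 2 rows each,
# so only one small piece of the file is held in memory at a time
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["amount"].sum()
    rows += len(chunk)

print(rows, total)
```

For a real multi-gigabyte file, a much larger chunk size (for example, 100000 rows) would likely be a more sensible starting point.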

The Jupyter Notebook
(formerly known as the IPython Notebook)

pip3 install jupyter

Anaconda 

  • Anaconda is the leading open data science platform powered by Python. 
  • The open source version of Anaconda is a high-performance distribution of Python and R that includes over 100 of the most popular Python, R and Scala packages for data science.

Jupyter Notebook

  • The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser
  • The Jupyter Notebook App can be executed on a local desktop requiring no internet access (as described in this document) or can be installed on a remote server and accessed through the internet.

Open notebook

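  • The easiest way to get Jupyter Notebook is to install Anaconda, which includes it; it can also be installed with pip, as in the pip3 command above
  • Once installed, start it from a terminal with: jupyter notebook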

Sample report 


