Sunday, April 23, 2017

Data analysis with Python and Pandas

Python 

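  • A programming language that lets us work quickly and integrate systems more effectively
  • The reference implementation (CPython) is written in C
  • Pure Python code is generally slower than C, but libraries such as NumPy and pandas do their heavy work in compiled code, so data analysis still runs quickly
  • The syntax is easy to learn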


Installation

  • Download Python from https://www.python.org/downloads/
  • pandas home page: http://pandas.pydata.org/

PIP

  • A package management system used to install and manage software packages written in Python
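
Once packages have been installed with pip, it can be handy to confirm from Python which versions are present. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the packages mentioned in this post; None means "not installed".
for name in ("numpy", "pandas", "jupyter"):
    print(name, installed_version(name))
```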

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
  • A powerful N-dimensional array object
  • Sophisticated (broadcasting) functions
  • Tools for integrating C/C++ and FORTRAN code
  • Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. 
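
A few of the features listed above can be sketched in a handful of lines. This is only an illustration with made-up sample values: it builds a 2-D array, uses broadcasting to subtract the column means from every row, and solves a small linear system.

```python
import numpy as np

# A 2-D array (3 rows, 2 columns) holding made-up sample data.
readings = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Broadcasting: the 1-D array of column means is applied to every row.
column_means = readings.mean(axis=0)
centered = readings - column_means

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

print(readings.shape)   # (3, 2)
print(column_means)     # [3. 4.]
print(centered[0])      # [-2. -2.]
print(x)                # [1. 2.]
```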

Install pandas from a command prompt with pip (NumPy is installed along with it as a dependency):

C:\Users\Prath>pip install pandas

Pandas

  • Its performance-critical parts are written in Cython and C, behind a Python interface
  • An open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
  • pandas is a NumFOCUS sponsored project

Reasons for using the pandas module

  • It works mainly with data frames. A data frame is a table of rows and named columns, much like an Excel spreadsheet
  • Built to make working with data easier
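
To make the spreadsheet comparison concrete, here is a small sketch of a data frame. The column names and numbers are invented for illustration; the computed column plays the role of a formula filled down a column, and the grouped sum resembles a simple pivot table.

```python
import pandas as pd

# Made-up sample data, laid out like a small spreadsheet.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [10, 4, 6, 8],
    "unit_price": [2.5, 3.0, 2.5, 3.0],
})

# Add a computed column, like a spreadsheet formula filled down.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Total revenue per region, similar to a pivot table.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(sales)
print(revenue_by_region)
```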

Why Python over Excel?

  • Excel is slow with large data sets. If we are dealing with more than 2 GB of data, Excel may stop responding (or take as long as 30 minutes)
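
One way pandas copes with files too large to load comfortably is to read them in pieces. The sketch below assumes a CSV with region and units columns (invented for this example) and uses an in-memory string as a stand-in for a large file; in practice a file path would be passed to read_csv instead.

```python
import io

import pandas as pd

# Stand-in for a large CSV file; the columns and values are made up.
csv_data = io.StringIO("region,units\nNorth,10\nSouth,4\nNorth,6\nSouth,8\n")

totals = {}
# chunksize makes read_csv yield DataFrames of at most 2 rows each,
# so only one small piece is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=2):
    for region, units in chunk.groupby("region")["units"].sum().items():
        totals[region] = totals.get(region, 0) + int(units)

print(totals)  # {'North': 16, 'South': 12}
```

A real file would use a much larger chunksize, for example 100000 rows, chosen to fit the available memory.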

The Jupyter Notebook
(Formerly known as the IPython Notebook)

pip3 install jupyter

Anaconda 

  • Anaconda is the leading open data science platform powered by Python. 
  • The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science.

Jupyter Notebook

  • The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser
  • The Jupyter Notebook App can be executed on a local desktop requiring no internet access (as described in this document) or can be installed on a remote server and accessed through the internet.

Open notebook

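  • Installing Anaconda is the simplest way to get Jupyter Notebook, since it is bundled with the distribution; the pip3 command shown earlier also works without Anaconda
  • Start it by running jupyter notebook from a command prompt; the notebook then opens in the web browser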

Sample report 


