Sunday, April 23, 2017

Data analysis with Python and Pandas

Python 

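  • A programming language that lets us work quickly and integrate systems more efficiently
  • The reference implementation, CPython, is written in C
  • Pure Python code is generally slower than C, but libraries such as NumPy and pandas run their heavy computation in compiled code, so typical data analysis work can come close to C speed
  • The syntax is easy to read and learn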


Installation

  • Download Python from https://www.python.org/downloads/
  • The pandas home page and documentation are at http://pandas.pydata.org/

PIP

  • pip is a package management system used to install and manage software packages written in Python
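
A quick way to see what pip has already installed is to ask Python itself. This is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name installed_version is my own, not part of pip.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Check the packages used later in this post.
for name in ("numpy", "pandas", "jupyter"):
    version = installed_version(name)
    if version is None:
        print(f"{name}: not installed (try: pip install {name})")
    else:
        print(f"{name}: {version}")
```

Any package reported as missing can be installed from the command line with pip install followed by the package name.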

NumPy

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
  • A powerful N-dimensional array object
  • Sophisticated (broadcasting) functions
  • Tools for integrating C/C++ and FORTRAN code
  • Useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data.

Installing pandas with pip also installs NumPy as a dependency:

C:\Users\Prath>pip install pandas
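
Once the install finishes, the NumPy features listed above can be tried out. The snippet below is a small sketch with made-up numbers that touches each one: an N-dimensional array, broadcasting, linear algebra, a Fourier transform, and random numbers.

```python
import numpy as np

# N-dimensional array: a 2 x 3 matrix of integers.
a = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

# Broadcasting: the 1-D row is stretched across both rows of `a`.
row = np.array([10, 20, 30])
b = a + row                          # [[10, 21, 32], [13, 24, 35]]

# Linear algebra: solve the system m @ x = v.
m = np.array([[2.0, 0.0], [0.0, 4.0]])
v = np.array([2.0, 8.0])
x = np.linalg.solve(m, v)            # [1.0, 2.0]

# Fourier transform of a constant signal: everything lands in the first bin.
spectrum = np.fft.fft(np.ones(4))    # [4, 0, 0, 0]

# Random numbers from a seeded generator, so repeated runs give the same values.
rng = np.random.default_rng(seed=0)
samples = rng.random(3)              # three floats in [0, 1)

print(b)
print(x)
print(spectrum.real)
print(samples.shape)
```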

Pandas

  • A Python library built on top of NumPy, with its performance-critical parts written in C and Cython
  • An open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
  • Pandas is a NumFOCUS-sponsored project

Reason for using pandas module

  • It mostly works with data frames. A data frame is a table of rows and named columns, much like an Excel spreadsheet
  • Built to make everyday data tasks such as loading, filtering and summarising easier
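
To make the spreadsheet comparison concrete, here is a minimal sketch of a data frame. The sales figures and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data: each key is a column header, each list is a column of cells.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "units": [10, 4, 6, 8],
    "unit_price": [2.5, 3.0, 2.5, 4.0],
})

# A computed column, like a spreadsheet formula filled down a column.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Keep only some rows, like a spreadsheet filter.
north = sales[sales["region"] == "North"]

# Summarise per group, like a pivot table.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(sales)
print(north)
print(revenue_by_region)
```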

Why Python over Excel?

  • Excel is slow with large files. If we are dealing with more than about 2 GB of data, Excel can stop responding (an operation might take 30 minutes)
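
One way pandas copes with files too large to load comfortably is to read them in pieces. The sketch below assumes a CSV source and uses a tiny in-memory stand-in so it runs anywhere; the column names and the chunk size of 4 are arbitrary choices for the example.

```python
import io

import pandas as pd

# Stand-in for a large file on disk; in practice a file path would go here.
csv_data = io.StringIO(
    "id,amount\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
)

total = 0
rows_seen = 0

# With chunksize set, read_csv yields DataFrames of at most 4 rows each,
# so only one small piece is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["amount"].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

For a real file the chunk size would be much larger, for example tens or hundreds of thousands of rows, depending on available memory.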

The Jupyter Notebook
(formerly known as the IPython Notebook)

pip3 install jupyter

Anaconda 

  • Anaconda describes itself as the leading open data science platform powered by Python
  • The open source version of Anaconda is a high-performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science

Jupyter Notebook

  • The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser
  • The Jupyter Notebook App can be run on a local desktop with no internet access, or it can be installed on a remote server and accessed through the internet
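
The notebook documents mentioned above are plain JSON files with an .ipynb extension, which is what the server reads and writes while the browser does the editing. Below is a minimal sketch of one built by hand. The field names follow my understanding of version 4 of the notebook format and should be treated as an assumption; the nbformat documentation is the authority.

```python
import json

# Minimal notebook (format version 4, as assumed here): one code cell, no outputs yet.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from a notebook')"],
        }
    ],
}

# Serialise the way it would be stored in an .ipynb file.
text = json.dumps(notebook, indent=1)

# Reading it back gives the same structure the notebook server would load.
loaded = json.loads(text)
print(loaded["cells"][0]["source"])
```

Writing the text variable to a file with an .ipynb extension should give something the Notebook App can open, although in normal use the app creates these files for us.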

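Open notebook

  • Installing Anaconda is the easiest way to get Jupyter Notebook, since it comes bundled with the distribution. It can also be installed on its own with pip, as shown above
  • To open it, run jupyter notebook from a terminal. The notebook then opens in the web browser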

Sample report 
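
A minimal sketch of the kind of report a notebook cell might produce with pandas. The exam scores and column names are made up for illustration and are not data from this post.

```python
import pandas as pd

# Hypothetical exam scores.
scores = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "subject": ["math", "math", "physics", "physics"],
    "score": [80, 90, 70, 60],
})

# Per-subject summary table: how many scores, and their mean, min and max.
report = scores.groupby("subject")["score"].agg(["count", "mean", "min", "max"])

print(report)
```

In a notebook, leaving report as the last line of a cell (instead of printing it) renders it as a formatted table.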


