Wednesday, February 5, 2020

Machine Learning workflow



A machine learning workflow consists of three components:

  • Explore and process data
  • Modeling
  • Deployment

Wednesday, January 29, 2020

Error functions




  • In most learning networks, error is calculated as the difference between the actual output and the predicted output.
  • The error function tells us how far we are from the solution.
  • The function used to compute this error is known as the loss function.
  • Different loss functions give different errors for the same prediction, and thus have a considerable effect on the performance of the model.
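The last point can be seen in a minimal sketch: two common loss functions applied to the same prediction give different error values (the function names and data here are illustrative).

```python
# Two common loss functions; the same prediction yields different errors.

def mse(actual, predicted):
    """Mean squared error: penalizes large deviations heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error: penalizes all deviations linearly."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 6.0]  # one large miss

print(mse(actual, predicted))  # 3.0
print(mae(actual, predicted))  # 1.0
```

Note how the single large miss dominates MSE but not MAE; that difference is exactly why the choice of loss function affects model behavior.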

Canny edge detector



The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.



Grayscale to RGB conversion


We will convert a color image into a grayscale image. There are several methods to do this, each with its own merits and demerits:
  • Average method
  • Weighted method or luminosity method
  • using skimage

Average method

The average method is the simplest one: just take the average of the three color channels. Since it is an RGB image, add R, G, and B and divide by 3 to get the grayscale value.
It is done this way:
Grayscale = (R + G + B) / 3
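A minimal per-pixel sketch of the average method (real code would vectorize this over a whole image with NumPy):

```python
# Average method: grayscale = (R + G + B) / 3, applied to one RGB pixel.

def to_gray_average(r, g, b):
    """Return the grayscale value of one pixel by averaging its channels."""
    return (r + g + b) / 3

print(to_gray_average(200, 100, 30))  # 110.0
```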

pyplot

Provides a MATLAB-like plotting framework.
pylab combines pyplot with numpy into a single namespace. This is convenient for interactive work, but for programming it is recommended that the namespaces be kept separate, e.g.:
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 5, 0.1)
y = np.sin(x)
plt.plot(x, y)

Thursday, January 23, 2020

Transfer learning




TRANSFER LEARNING: Using a pre-trained network on images that were not in its training set is known as transfer learning.

MNIST GAN


Here we have two neural networks: a GENERATOR and a DISCRIMINATOR.

DISCRIMINATOR:

  • A simple classifier that tries to classify images as real (from the training set) or fake (generated images).

Glob module in python


Programmers often need to traverse a list of files at some location, usually matching a specific pattern. Python's glob module has several functions that can help in listing files under a specified folder. We may filter them based on extensions, or with a particular string as a portion of the filename.
All the methods of Glob module follow the Unix-style pattern matching mechanism and rules. However, it doesn’t allow expanding the tilde (~) and environment variables.
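A short sketch of that pattern matching on a throwaway directory (the file names are made up for illustration):

```python
import glob
import os
import tempfile

# Create a scratch folder with a few files to match against.
tmpdir = tempfile.mkdtemp()
for name in ["a.txt", "b.txt", "notes.md"]:
    open(os.path.join(tmpdir, name), "w").close()

# Unix-style pattern: list only the .txt files under the folder.
txt_files = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))
print([os.path.basename(p) for p in txt_files])  # ['a.txt', 'b.txt']
```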

Wednesday, January 22, 2020

Flattening in neural networks




  • Instead of representing this 4*4 matrix, we can construct a vector with 16 entries.
  • The first 4 entries correspond to the first row; similarly for the 2nd, 3rd, and 4th rows.

Note: After converting our image to a vector, it can be fed to the input layer of a neural network.
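The row-by-row flattening described above can be sketched in a few lines:

```python
# Flatten a 4x4 matrix row by row into a 16-entry vector.

matrix = [
    [ 1,  2,  3,  4],
    [ 5,  6,  7,  8],
    [ 9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Row-major flatten: the first 4 entries are the first row, and so on.
vector = [value for row in matrix for value in row]
print(len(vector))  # 16
print(vector[:4])   # [1, 2, 3, 4]
```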




MoviePy




MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF.

Gaussian Blur operation




In the Gaussian blur operation, the image is convolved with a Gaussian filter instead of the box filter. The Gaussian filter is a low-pass filter that reduces high-frequency components.
You can perform this operation on an image using the GaussianBlur() method of the Imgproc class. Following is the syntax of this method −
GaussianBlur(src, dst, ksize, sigmaX)
This method accepts the following parameters −
  • src − A Mat object representing the source (input image) for this operation.
  • dst − A Mat object representing the destination (output image) for this operation.
  • ksize − A Size object representing the size of the kernel.
  • sigmaX − A variable of the type double representing the Gaussian kernel standard deviation in X direction.
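As a stdlib-only sketch of the underlying idea (not the OpenCV API itself): build a normalized Gaussian kernel and convolve it with a signal. OpenCV's GaussianBlur does the 2-D equivalent far more efficiently; the sigma below plays the role of sigmaX, and all the data is illustrative.

```python
import math

def gaussian_kernel(ksize, sigma):
    """Return a normalized 1-D Gaussian kernel of odd length ksize."""
    half = ksize // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # weights sum to 1

def blur(signal, kernel):
    """Convolve signal with kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), len(signal) - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

kernel = gaussian_kernel(5, 1.0)
smoothed = blur([0, 0, 0, 10, 10, 10], kernel)
print(smoothed)  # the sharp step edge becomes a smooth ramp
```

The high-frequency content (the abrupt step) is spread across neighboring samples, which is exactly the low-pass behavior described above.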

Geometric Image Transformation





The functions in this section perform various geometrical transformations of 2D images. They do not change the image content but deform the pixel grid and map this deformed grid to the destination image. In fact, to avoid sampling artifacts, the mapping is done in the reverse order, from destination to the source. That is, for each pixel (x, y) of the destination image, the functions compute coordinates of the corresponding “donor” pixel in the source image and copy the pixel value:
dst(x, y) = src(f_x(x, y), f_y(x, y))
When you specify the forward mapping ⟨g_x, g_y⟩: src → dst instead, the OpenCV functions first compute the corresponding inverse mapping ⟨f_x, f_y⟩: dst → src and then use the formula above.
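The reverse-mapping idea can be sketched with a tiny example: a horizontal flip, where each destination pixel fetches its "donor" from the source via dst(x, y) = src(width - 1 - x, y).

```python
# Reverse mapping: for each destination pixel (x, y), look up the donor
# pixel in the source. Here the mapping is a horizontal flip.

src = [
    [1, 2, 3],
    [4, 5, 6],
]
height, width = len(src), len(src[0])

dst = [[src[y][width - 1 - x] for x in range(width)] for y in range(height)]
print(dst)  # [[3, 2, 1], [6, 5, 4]]
```

Iterating over the destination guarantees every output pixel gets exactly one value, which is why OpenCV inverts forward mappings before resampling.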

Tuesday, January 21, 2020

Deep learning car behavioral cloning



Git hub repo for this project :  https://github.com/PRkudupu/Car_Behavioral_Cloning

Softmax function



  • A wonderful activation function that turns numbers (a.k.a. logits) into probabilities that sum to one.
  • Outputs a vector that represents the probability distribution over a list of potential outcomes.
  • A core element used in deep learning classification tasks.
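A minimal softmax sketch; subtracting the maximum logit first keeps the exponentials numerically stable.

```python
import math

def softmax(logits):
    """Turn logits into probabilities that sum to one."""
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # the largest logit gets the largest probability
print(sum(probs))  # 1.0 (up to floating-point error)
```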

matplotlib and pyplot

Startup commands

First, let’s start IPython. It is a most excellent enhancement to the standard Python prompt, and it ties in especially well with Matplotlib. Start IPython now, either at a shell or in an IPython Notebook.
With IPython started, we now need to connect to a GUI event loop. This tells IPython where (and how) to display plots. To connect to a GUI loop, execute the %matplotlib magic at your IPython prompt. There’s more detail on exactly what this does in IPython’s documentation on GUI event loops.
If you’re using the IPython Notebook, the same commands are available, but people commonly pass a specific argument to the %matplotlib magic, typically %matplotlib inline.

Rolling average





[Graph: moving average over the preceding 7 days.]
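A minimal sketch of a 7-day preceding moving average: each output value averages the current day and up to six preceding days (the daily figures are made up).

```python
def rolling_average(values, window=7):
    """Moving average over the current value and the preceding window-1 values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

daily = [10, 20, 30, 40, 50, 60, 70, 80]
print(rolling_average(daily))
# Day 7 averages days 1-7 (40.0); day 8 averages days 2-8 (50.0).
```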


Difference between AWS glue and Hive warehouse




Apache Hive vs AWS Glue: What are the differences?
Apache Hive: data warehouse software for reading, writing, and managing large datasets. Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL; structure can be projected onto data already in storage. AWS Glue: a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.
Apache Hive and AWS Glue can be primarily classified as "Big Data" tools.
Some of the features offered by Apache Hive are:
  • Built on top of Apache Hadoop
  • Tools to enable easy access to data via SQL
  • Support for extract/transform/load (ETL), reporting, and data analysis

Lag and lead in sql




LEAD() and LAG() Function


The LEAD() and LAG() functions in MySQL are used to get the value of a succeeding and a preceding row, respectively, within the row's partition. These functions are termed nonaggregate window functions.
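The note describes MySQL, but SQLite (3.25+, bundled with recent Python) supports the same LAG()/LEAD() window functions, so here is a runnable sketch with made-up sales figures:

```python
import sqlite3

# In-memory table with three days of (illustrative) sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 100), (2, 150), (3, 120)])

# LAG() pulls the preceding row's amount, LEAD() the succeeding row's;
# rows with no such neighbor get NULL (None in Python).
rows = conn.execute("""
    SELECT day,
           amount,
           LAG(amount)  OVER (ORDER BY day) AS prev_amount,
           LEAD(amount) OVER (ORDER BY day) AS next_amount
    FROM sales
    ORDER BY day
""").fetchall()

print(rows)  # [(1, 100, None, 150), (2, 150, 100, 120), (3, 120, 150, None)]
```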

AZURE BLOB STORAGE


  • Service for storing large amounts of unstructured object data, such as text or binary data
  • Expose data publicly to the world, or store application data privately
  • Common uses of blob storage
    • Serving images or documents directly to a browser
    • Storing files for distributed access
    • Storing data for backup and restore, disaster recovery, and archiving
STORAGE ACCOUNT
  • All access to storage is done through a storage account.
  • This account can be a General purpose or a blob storage.

Spark Executors and Drivers




SPARK CONSISTS OF A DRIVER PROGRAM
  • Our code that defines which data sources to consume from and which to write to, and which drives all our execution
  • Responsible for creating the Spark context
  • It understands how to communicate with the cluster manager (YARN, Mesos, or standalone)

Loading facts in ETL


LOADING FACTS
The most important thing about loading fact tables is that you first need to load the dimension tables and then, according to the specification, the fact tables.


The fact table is often located in the center of a star schema, surrounded by dimension tables. It has two types of columns: those containing facts and those containing foreign keys to dimension tables.

Installing spark on windows




 https://www.youtube.com/watch?v=2CvtwKTjI4Q&vl=en

1) Download specific version of spark
 http://spark.apache.org/downloads.html
2) Unzip and create a directory for spark

Wasb



What is HDFS?
The Hadoop Distributed File System (HDFS) is one of the core Hadoop components; it is how Hadoop manages data and storage. At a high level, when you load a file into Hadoop the "name node" uses HDFS to chunk the file into blocks, and it spreads those blocks of data across the worker nodes within the cluster. Each chunk of data is stored on multiple nodes (assuming the replication factor is set to > 1) for higher availability. The name node knows where each chunk of data is stored, and that information is used by the job manager to allocate tasks and resources appropriately across nodes.

DStream vs RDD