Wednesday, January 29, 2020

Error functions

In most learning networks, the error is calculated as the difference between the actual output and the predicted output.
The error function tells us how far we are from the solution.
The function used to compute this error is known as the loss function.
Different loss functions give different errors for the same prediction, so the choice of loss function has a considerable effect on the performance of the model.
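
As a minimal sketch of that last point, the following compares two common loss functions, mean squared error and mean absolute error, on the same prediction. The function names and the sample values are illustrative assumptions and are not taken from any particular library.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only.
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]

mse = mean_squared_error(actual, predicted)   # (1 + 0 + 4) / 3
mae = mean_absolute_error(actual, predicted)  # (1 + 0 + 2) / 3
print("MSE:", mse, "MAE:", mae)
```

The squared loss penalizes the larger miss (2.0 vs 4.0) more heavily than the absolute loss does, which is one reason the two can steer training differently.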

Canny edge detector

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.

RGB to grayscale conversion

We will convert a color image into a grayscale image. There are a few ways to do this, and each has its own merits and demerits. The methods are:

Average method
Weighted method or luminosity method
Using skimage

Average method

The average method is the simplest one. You just take the average of the three color channels. Since it is an RGB image, you add R, G, and B and then divide the sum by 3 to get the grayscale value.

It is done this way:

Grayscale = (R + G + B) / 3
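
A short sketch of the average method on a single pixel, next to the weighted (luminosity) method for comparison. The luminosity weights 0.299, 0.587, and 0.114 are an assumption here (they are the commonly used ITU-R BT.601 values and are not given in these notes), and the function names are illustrative.

```python
def gray_average(r, g, b):
    """Average method: the plain mean of the three channels."""
    return (r + g + b) / 3


def gray_luminosity(r, g, b):
    """Weighted method. The weights are assumed (BT.601), not from the notes."""
    return 0.299 * r + 0.587 * g + 0.114 * b


pixel = (255, 0, 0)  # pure red, illustrative
avg = gray_average(*pixel)
lum = gray_luminosity(*pixel)
print("average:", avg, "luminosity:", lum)
```

The two methods disagree on the same pixel because the weighted method treats green as contributing more to perceived brightness than red or blue.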

matplotlib.pyplot

Provides a MATLAB-like plotting framework.

pylab combines pyplot with numpy into a single namespace. This is convenient for interactive work, but for programming it is recommended that the namespaces be kept separate, e.g.:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 5, 0.1)
y = np.sin(x)
plt.plot(x, y)

Thursday, January 23, 2020

Transfer learning

Using a pre-trained network on images that are not in its training set is known as transfer learning.

MNIST GAN

Here we have two neural networks, a GENERATOR and a DISCRIMINATOR.

DISCRIMINATOR:

A simple classifier that tries to classify images as either real (from the training set) or fake (generated images).

Glob module in Python

Programmers often need to traverse a list of files at some location, usually files whose names follow a specific pattern. Python’s glob module has several functions that help in listing files under a specified folder. We can filter them by extension or by a particular string that forms part of the filename.

All the functions of the glob module follow Unix-style pattern matching rules. However, the module does not expand the tilde (~) or environment variables.
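
A minimal sketch of both filters described above, run against a temporary folder so it does not depend on any existing files. The file names are made up for illustration. Because glob does not expand the tilde itself, the last lines expand it explicitly with os.path.expanduser before building a pattern.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files to match against (illustrative names).
    for name in ("a.txt", "b.txt", "report_2020.csv"):
        open(os.path.join(folder, name), "w").close()

    # Filter by extension.
    txt_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(folder, "*.txt"))
    )
    # Filter by a string that appears anywhere in the filename.
    report_files = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(folder, "*report*"))
    )

print(txt_files, report_files)

# glob leaves "~" untouched, so expand it first when a home-relative
# pattern is needed.
home_pattern = os.path.join(os.path.expanduser("~"), "*.txt")
```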

Wednesday, January 22, 2020

Flattening in NLP

Instead of representing the image as a 4*4 matrix, we can construct a vector with 16 entries.
The first 4 entries correspond to the first row, and the following groups of 4 entries correspond to the 2nd, 3rd, and 4th rows.

Note: After converting our image to a vector, it can be fed to an input layer in NLP.
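
The row-by-row flattening described above can be sketched as follows. The matrix values are placeholders chosen so the row order is easy to follow.

```python
# A 4*4 "image" with placeholder values.
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Walk the rows in order and append every value: row 1 first, then row 2, ...
vector = [value for row in matrix for value in row]
print(vector)
```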

MoviePy

MoviePy is a Python module for video editing, which can be used for basic operations (like cuts, concatenations, title insertions), video compositing (a.k.a. non-linear editing), video processing, or to create advanced effects. It can read and write the most common video formats, including GIF.


Gaussian Blur operation

In the Gaussian blur operation, the image is convolved with a Gaussian filter instead of a box filter. The Gaussian filter is a low-pass filter that reduces the high-frequency components.

You can perform this operation on an image using the GaussianBlur() method of the Imgproc class. Following is the syntax of this method −

GaussianBlur(src, dst, ksize, sigmaX)

This method accepts the following parameters −

src − A Mat object representing the source (input image) for this operation.
dst − A Mat object representing the destination (output image) for this operation.
ksize − A Size object representing the size of the kernel.
sigmaX − A variable of the type double representing the Gaussian kernel standard deviation in X direction.
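
To illustrate what the kernel size and sigma control, here is a small pure-Python sketch of the underlying idea in one dimension. It is not OpenCV's implementation: the function names, the edge-replication border handling, and the sample signal are all assumptions made for the example.

```python
import math


def gaussian_kernel(ksize, sigma):
    """Return a normalized 1-D Gaussian kernel of odd length ksize."""
    center = ksize // 2
    weights = [
        math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(ksize)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(signal, kernel):
    """Convolve signal with kernel, replicating the edge values at borders."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), last)  # clamp index to the signal
            acc += w * signal[j]
        out.append(acc)
    return out


kernel = gaussian_kernel(5, 1.0)
step = [0, 0, 0, 0, 10, 10, 10, 10]  # a sharp edge (high-frequency content)
blurred = blur_1d(step, kernel)
print([round(v, 2) for v in blurred])
```

The sharp jump from 0 to 10 becomes a gradual ramp, which is the low-pass behavior described above. A 2-D Gaussian blur can be built by applying the same 1-D pass along rows and then along columns.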

Geometric Image Transformations

The functions in this section perform various geometrical transformations of 2D images. They do not change the image content but deform the pixel grid and map this deformed grid to the destination image. In fact, to avoid sampling artifacts, the mapping is done in the reverse order, from destination to the source. That is, for each pixel $(x, y)$ of the destination image, the functions compute coordinates of the corresponding “donor” pixel in the source image and copy the pixel value:

$\texttt{dst} (x,y)= \texttt{src} (f_x(x,y), f_y(x,y))$

When you specify the forward mapping $\left<g_x, g_y\right>: \texttt{src} \rightarrow \texttt{dst}$, the OpenCV functions first compute the corresponding inverse mapping $\left<f_x, f_y\right>: \texttt{dst} \rightarrow \texttt{src}$ and then use the above formula.
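
The destination-to-source lookup can be sketched in a few lines of pure Python. This is only an illustration of the formula, not OpenCV code: the function name, the floor-based donor lookup, and the fill value for out-of-range donors are assumptions.

```python
import math


def remap_by_inverse(src, width, height, inverse_map, fill=0):
    """Build dst by asking, for each dst pixel, which src pixel donates its value."""
    src_h, src_w = len(src), len(src[0])
    dst = []
    for y in range(height):
        row = []
        for x in range(width):
            fx, fy = inverse_map(x, y)            # (f_x(x, y), f_y(x, y))
            sx, sy = math.floor(fx), math.floor(fy)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                row.append(src[sy][sx])           # copy the donor pixel
            else:
                row.append(fill)                  # donor lies outside src
        dst.append(row)
    return dst


src = [[1, 2],
       [3, 4]]

# Forward mapping: scale by 2. Its inverse divides the dst coordinates by 2.
dst = remap_by_inverse(src, 4, 4, lambda x, y: (x / 2, y / 2))
for row in dst:
    print(row)
```

Because every destination pixel looks up exactly one donor, the output has no holes, which would not be guaranteed if source pixels were pushed forward instead.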

Softmax function

A wonderful activation function that turns numbers (also known as logits) into probabilities that sum to one.
Outputs a vector that represents a probability distribution over a list of potential outcomes.
A core element used in deep learning tasks.
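
A minimal softmax sketch using only the standard library. Subtracting the largest logit before exponentiating is a common numerical-stability step and does not change the result. The sample logits are illustrative.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to one."""
    shift = max(logits)                          # for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```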

Startup commands

First, let’s start IPython. It is a most excellent enhancement to the standard Python prompt, and it ties in especially well with Matplotlib. Start IPython either at a shell, or the IPython Notebook now.

With IPython started, we now need to connect to a GUI event loop. This tells IPython where (and how) to display plots. To connect to a GUI loop, execute the %matplotlib magic at your IPython prompt. There’s more detail on exactly what this does at IPython’s documentation on GUI event loops.

If you’re using IPython Notebook, the same commands are available, but people commonly use a specific argument to the %matplotlib magic:

Figure: graph showing the 7-day preceding moving average.
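
The data behind that graph is not available here, so the following is only a sketch of how a 7-day preceding moving average could be computed. It assumes the window covers the current day plus the six days before it and that days without a full window are left empty. The function name and the sample values are made up.

```python
def trailing_moving_average(values, window=7):
    """Mean of each value and the (window - 1) values before it.

    Positions that do not yet have a full window yield None.
    """
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


daily = [10, 12, 11, 13, 15, 14, 16, 18, 17]  # illustrative daily values
smoothed = trailing_moving_average(daily)
print(smoothed)
```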

Apache Hive vs AWS Glue: What are the differences?

Apache Hive: data warehouse software for reading, writing, and managing large datasets. Hive facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage.

AWS Glue: a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

Apache Hive and AWS Glue can be primarily classified as "Big Data" tools.

Some of the features offered by Apache Hive are:

Built on top of Apache Hadoop
Tools to enable easy access to data via SQL
Support for extract/transform/load (ETL), reporting, and data analysis

LEAD() and LAG() Function

The LAG() and LEAD() functions in MySQL are used to get the preceding and succeeding values of a row within its partition, respectively. These functions are nonaggregate window functions.
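
A small sketch of both functions. To keep it self-contained it runs the query through Python's built-in sqlite3 module instead of a MySQL server. This is a stand-in: SQLite (version 3.25 or later) accepts the same LAG()/LEAD() ... OVER (PARTITION BY ... ORDER BY ...) form. The table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100), ("east", 2, 120), ("east", 3, 90),
        ("west", 1, 200), ("west", 2, 210),
    ],
)

# LAG looks at the previous row and LEAD at the next row, both within
# the same region; rows at the partition edges get NULL (None in Python).
rows = conn.execute(
    """
    SELECT region, day, amount,
           LAG(amount)  OVER (PARTITION BY region ORDER BY day) AS previous_amount,
           LEAD(amount) OVER (PARTITION BY region ORDER BY day) AS next_amount
    FROM sales
    ORDER BY region, day
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```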

Service for storing large amounts of unstructured object data, such as text or binary data
Expose data publicly to the world, or store application data privately
Common uses of blob storage:

Serving images or documents directly to a browser
Storing files for distributed access
Storing data for backup and restore, disaster recovery, and archiving

STORAGE ACCOUNT

All access to storage is done through a storage account.
This account can be a general-purpose storage account or a Blob storage account.

SPARK CONSISTS OF DRIVER PROGRAM

Our code, which defines which data sources to read from and which to write to, and which drives all of the execution
Responsible for getting the Spark context
Knows how to communicate with the cluster manager (YARN, Mesos, or standalone)

LOADING FACTS
The most important thing about loading fact tables is that you first need to load the dimension tables, and only then load the fact tables according to the specification.

The fact table is often located in the center of a star schema, surrounded by dimension tables. It has two types of columns: those containing facts and others containing foreign keys to dimension tables.
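
The load order matters because of those foreign keys. The sketch below shows this with an in-memory SQLite database. The table and column names are hypothetical, and a real warehouse load would normally go through an ETL tool, not hand-written inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# One dimension table and one fact table (hypothetical names).
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE fact_sales (
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity   INTEGER,  -- fact column
        revenue    REAL      -- fact column
    )
    """
)

# Loading a fact row before its dimension row violates the foreign key.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 3, 29.97)")
    fact_first_failed = False
except sqlite3.IntegrityError:
    fact_first_failed = True

# Correct order: dimension first, then the fact that references it.
conn.execute("INSERT INTO dim_product VALUES (1, 'widget')")
conn.execute("INSERT INTO fact_sales VALUES (1, 3, 29.97)")
fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
conn.close()

print(fact_first_failed, fact_rows)
```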

https://www.youtube.com/watch?v=2CvtwKTjI4Q&vl=en

1) Download a specific version of Spark
http://spark.apache.org/downloads.html
2) Unzip it and create a directory for Spark

What is HDFS?

The Hadoop Distributed File System (HDFS) is one of the core Hadoop components; it is how Hadoop manages data and storage. At a high level, when you load a file into Hadoop, the "name node" uses HDFS to chunk the file into blocks and spreads those blocks of data across the worker nodes within the cluster. Each chunk of data is stored on multiple nodes (assuming the replication factor is set to > 1) for higher availability. The name node knows where each chunk of data is stored, and that information is used by the job manager to allocate tasks and resources appropriately across nodes.
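
As a toy model of the chunk-and-replicate idea only, the sketch below splits some bytes into fixed-size blocks and assigns each block to several nodes in round-robin order. Real HDFS placement is rack-aware and far more involved. The function names, node names, tiny block size, and round-robin policy are all assumptions made for illustration.

```python
def split_into_blocks(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` nodes, round-robin.

    The nodes chosen for a block are distinct as long as
    replication <= len(nodes).
    """
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


data = b"x" * 10                      # a 10-byte "file"
blocks = split_into_blocks(data, 4)   # block sizes: 4, 4, 2
placement = place_replicas(len(blocks), ["node1", "node2", "node3"], 2)
print(placement)
```

The placement dictionary plays the role of the name node's bookkeeping: given a block id, it says which nodes hold a copy.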

Web Snippets
