Sunday, June 24, 2018

Create UDFs in Hive



HIVE UDF FUNCTIONS
Hive functions each perform a specific operation (mathematical, arithmetic, logical, or relational) on the columns of a table. When no built-in function does what you need, you can write your own user-defined function (UDF).

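A UDF can be written in Java, as shown below.
In this example, we mask characters that should not be shown to the user by replacing them with "*". For strings longer than two characters, the first and last characters are kept and everything in between is masked; shorter strings are masked completely. Spaces are left as they are, and integers are masked digit by digit.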



package HiveUDF;
import org.apache.hadoop.hive.ql.exec.UDF;

public class Masking extends UDF {

    // Masks a string. Called when MASK() receives a STRING argument.
    public String evaluate(String s) {
        // A NULL column value arrives as null; return it unchanged
        // instead of failing with a NullPointerException.
        if (s == null) {
            return null;
        }
        int len = s.length();
        char[] input = s.toCharArray();
        if (len > 2) {
            // Keep the first and last characters, mask the rest (except spaces).
            for (int counter = 1; counter < len - 1; counter++) {
                if (input[counter] != ' ') {
                    input[counter] = '*';
                }
            }
        } else {
            // One or two characters: mask everything (except spaces).
            for (int counter = 0; counter < len; counter++) {
                if (input[counter] != ' ') {
                    input[counter] = '*';
                }
            }
        }
        return String.valueOf(input);
    }

    // Masks every character of an integer. Called when MASK() receives an INT argument.
    public String evaluate(int i) {
        char[] input = Integer.toString(i).toCharArray();
        for (int counter = 0; counter < input.length; counter++) {
            input[counter] = '*';
        }
        return String.valueOf(input);
    }
}
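Before packaging the class, it can help to check the masking rules outside Hive. The following is a minimal sketch, not part of the UDF itself: the class name MaskingSketch, the method name mask, and the sample values are made up for illustration, and the Hive dependency is left out so the class compiles on its own. It applies the same rules as the two evaluate methods.

```java
import java.util.Arrays;

// Hypothetical stand-alone copy of the masking rules, for local testing only.
class MaskingSketch {

    // Same rule as evaluate(String): for strings longer than two characters
    // keep the first and last character; never mask spaces.
    static String mask(String s) {
        if (s == null) {
            return null;
        }
        char[] input = s.toCharArray();
        int len = input.length;
        int start = (len > 2) ? 1 : 0;
        int end = (len > 2) ? len - 1 : len;
        for (int i = start; i < end; i++) {
            if (input[i] != ' ') {
                input[i] = '*';
            }
        }
        return String.valueOf(input);
    }

    // Same rule as evaluate(int): every character of the number is masked.
    static String mask(int i) {
        char[] input = Integer.toString(i).toCharArray();
        Arrays.fill(input, '*');
        return String.valueOf(input);
    }

    public static void main(String[] args) {
        // Sample values are invented for this sketch.
        System.out.println(mask("Titanic"));   // T*****c
        System.out.println(mask("Toy Story")); // T** ****y
        System.out.println(mask("Up"));        // **
        System.out.println(mask(12345));       // *****
    }
}
```

Java picks the overload from the argument type here, which is the same idea Hive relies on when it chooses between the evaluate methods.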

Once the class is compiled and packaged as a JAR, let us use it in a Hive session by adding the JAR file and creating a temporary function called MASK.

hive> ADD JAR /home/cloudera/MaskingData.jar ;
hive> CREATE TEMPORARY FUNCTION MASK AS 'HiveUDF.Masking';

To test it, let us apply this function to the id and name columns of the movies table, which are of type INT and STRING respectively.

First, let's have a look at our input data.

hive> SELECT id, name FROM movies;



Now let's apply our UDF to the same columns and observe the output.

hive> SELECT MASK(id), MASK(name) FROM movies;


As expected, Hive picks the evaluate method whose parameter type matches the input: MASK(id) calls evaluate(int) and MASK(name) calls evaluate(String).

Hope you find this useful. In the next article, we will see how to create permanent functions in Hive.
