Every time we want to use the function in a new session, we need to add the JAR and create the temporary function again:
hive> ADD JAR /home/cloudera/mask.jar;
Added [/home/cloudera/mask.jar] to class path
Added resources: [/home/cloudera/mask.jar]
hive> CREATE TEMPORARY FUNCTION MASK AS 'hiveudf.PImask';
HIVE PERMANENT FUNCTION
Note: If we have already created a temporary function, we need to use a new function name when creating the permanent function.
The problem with a temporary function is that it is valid only for the
session in which it was created and is lost as soon as we log off.
We often need functions to be permanent so that they can be used across
sessions and from different edge nodes. Let us create a permanent function
from the same JAR file and test it.
1. Store the JAR file in an HDFS location instead of the local file system.
This ensures that all nodes always have access to the JAR file.
$> hadoop fs -put MaskingData.jar ;
2. Next, we create the permanent function, including the HDFS path of the JAR.
hive> CREATE FUNCTION MASK AS 'HiveUDF.Masking' using JAR 'hdfs://localhost:8020/user/cloudera/MaskingData.jar';
Note in the command output that when the function is created, Hive copies
the JAR from HDFS to the local file system (under /tmp/), but the resource
is added with its hdfs:// location reference.
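To double-check the registration, a few commands can be run at the hive> prompt. This is a minimal sketch that assumes the function was created under the name MASK in the current database and that the JAR sits at the HDFS path used above; the exact output differs between Hive versions, so none is shown here.

```sql
-- Confirm the JAR is present at the HDFS path given in CREATE FUNCTION
dfs -ls hdfs://localhost:8020/user/cloudera/MaskingData.jar;

-- Show what Hive has registered for the function
DESCRIBE FUNCTION EXTENDED MASK;

-- List the JAR resources currently added to this session
LIST JARS;
```

A permanent function belongs to the database in which it was created. If that was the default database, for example, it can be called from other databases as default.MASK.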
First let’s run it in the same session in which we created the function:
hive> SELECT category_id, category_name, MASK(category_name) FROM categories LIMIT 10;
The function is working as expected and the category_name field is masked.
Next, we log out of the session and log back in (or log in from any
other edge node if possible) and perform the same test. If this were a
temporary function, it would have been lost as soon as we logged out of
the session.
Let's see how Permanent Functions work
hive> exit;
$> hive
hive> SELECT category_id, category_name, MASK(category_name) FROM categories LIMIT 10;
Note that even in the new session, as soon as we use the MASK function,
Hive automatically fetches and adds the required JAR file.
This means that the JAR file must not be moved from the location that was specified when the function was created.
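If the JAR does have to move, one option is to drop the function and register it again against the new location. A minimal sketch, assuming a hypothetical new directory /user/cloudera/udfs/ (not part of the original setup) and the same class name as before:

```sql
-- Remove the existing registration (the JAR file itself is not deleted)
DROP FUNCTION IF EXISTS MASK;

-- Re-create the function against the new JAR location.
-- NOTE: '/user/cloudera/udfs/' is a hypothetical path used for illustration.
CREATE FUNCTION MASK AS 'HiveUDF.Masking'
  USING JAR 'hdfs://localhost:8020/user/cloudera/udfs/MaskingData.jar';
```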