Wednesday, January 31, 2018

Masking PII data using Hive


Hive table creation

Create a table to hold the imported account CSV data, with comma-delimited fields
hive> create table Account(id int,name string,phone string)
row format delimited
fields terminated by ',';

Create a table for the secured account data, where the PII column will be masked
hive> create table Accountmasked(id int,name string,phone string)
row format delimited
fields terminated by ',';

Create contact table 
hive> create table contact(id int,accountid int,firstname string,lastName string,
phone string,email string) 
row format delimited 
fields terminated by ','; 



Create a secured contact table where the PII columns will be masked
hive> create table securedcontacts(id int,accountid int,firstname string,
lastName string,phone string,email string)
row format delimited
fields terminated by ',';

MySQL table creation

Create the account and contact tables in MySQL
create table accounts(id int not null,name varchar(255) not null,phone varchar(255) 
not null,primary key (id) ); 

create table contacts(id int not null,accountid int not null,firstname 
varchar(255) not null,
lastName varchar(255),phone varchar(100),email varchar(100),primary key (id));

MySQL Data ingestion

Import data from Account.csv into the accounts table and from Contact.csv into the contacts table in MySQL, ignoring the header row:
LOAD DATA LOCAL INFILE '/home/cloudera/workspace/Contact.csv' 
into table contacts fields terminated by ',' 
IGNORE 1 LINES;
LOAD DATA LOCAL INFILE '/home/cloudera/workspace/Account.csv' 
into table accounts fields terminated by ',' 
IGNORE 1 LINES;

Import data into Hive from the MySQL contacts and accounts tables

sqoop import --connect jdbc:mysql://localhost:3306/securemask \
  --username root --password cloudera \
  --table contacts \
  --fields-terminated-by "," \
  --target-dir /user/cloudera/workspace/contactsmask \
  --hive-import

sqoop import --connect jdbc:mysql://localhost:3306/securemask \
  --username root --password cloudera \
  --table accounts \
  --fields-terminated-by "," \
  --target-dir /user/cloudera/workspace/accountsmask \
  --hive-import

VALIDATION

The imported Hive tables can be viewed in Hue.

Create a UDF in Java

package hiveudf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class PImask extends UDF {

    // Mask a string value: when it is longer than two characters, keep the
    // first and last characters visible and mask everything in between;
    // otherwise mask every character. Spaces are left untouched.
    public String evaluate(String s) {
        if (s == null) {
            return null; // pass NULL columns through unchanged
        }
        int len = s.length();
        char[] input = s.toCharArray();
        if (len > 2) {
            for (int counter = 1; counter < len - 1; counter++) {
                if (input[counter] != ' ') {
                    input[counter] = '*';
                }
            }
        } else {
            for (int counter = 0; counter < len; counter++) {
                if (input[counter] != ' ') {
                    input[counter] = '*';
                }
            }
        }
        return String.valueOf(input);
    }

    // Mask an integer value completely: every digit becomes '*'.
    public String evaluate(int i) {
        char[] input = Integer.toString(i).toCharArray();
        for (int counter = 0; counter < input.length; counter++) {
            input[counter] = '*';
        }
        return String.valueOf(input);
    }
}
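As a quick sanity check outside Hive, the same masking rule can be exercised in plain Java. This standalone sketch (the class name and sample values are illustrative, not part of the UDF) reproduces the string-masking logic without the Hadoop dependency:

```java
public class PImaskDemo {

    // Same rule as the UDF: values longer than two characters keep their
    // first and last characters; shorter values are fully masked.
    static String mask(String s) {
        if (s == null) return null;
        char[] input = s.toCharArray();
        int len = input.length;
        int start = len > 2 ? 1 : 0;
        int end = len > 2 ? len - 1 : len;
        for (int i = start; i < end; i++) {
            if (input[i] != ' ') input[i] = '*';
        }
        return String.valueOf(input);
    }

    public static void main(String[] args) {
        System.out.println(mask("5551234567"));       // 5********7
        System.out.println(mask("john@example.com")); // j**************m
        System.out.println(mask("ab"));               // **
    }
}
```

Note that short values such as two-character strings are masked entirely, since keeping the first and last characters would reveal the whole value.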

Add the JAR and register the UDF
 hive> ADD JAR /home/cloudera/mask.jar; 
 hive> CREATE TEMPORARY FUNCTION MASK AS 'hiveudf.PImask';
 

Note: before running the masking queries below, make sure the UDF JAR has been added and the temporary function created as shown above.

Create similar tables in Hive
 hive> create table securedaccounts like accounts; 

DATA MASKING

Insert masked phone numbers into securedaccounts
 insert overwrite table securedaccounts select id,name,MASK(phone) from accounts; 

Insert masked phone and email into securedcontacts

insert overwrite table securedcontacts select id,accountid,firstname,
lastname,MASK(phone),
MASK(email) from contacts;

Export the masked tables (securedcontacts, securedaccounts) from Hive to MySQL
sqoop export --connect jdbc:mysql://localhost:3306/securemask \
  --username root --password cloudera \
  --table contacts \
  --export-dir /user/hive/warehouse/securedcontacts

sqoop export --connect jdbc:mysql://localhost:3306/securemask \
  --username root --password cloudera \
  --table accounts \
  --export-dir /user/hive/warehouse/securedaccounts

Note: I had to create a new table in MySQL as the export target.
