Friday, January 26, 2018

SQOOP Cheat Sheet

Sqoop commands:

List all commands
sqoop help

Connect and list all databases
Ex: sqoop list –databases \ 
       >--connect jdbc:mysql://quickstart.cloudera\ 
       >--username root –password cloudera

 List all tables in a specified database
sqoop list-tables --connect jdbc:mysql://quickstart.cloudera/retail_db  
            --username root --password cloudera

Simple Import
 sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db  
              --username root --password cloudera \
              > --table categories 

We can see the imported categories table using Hue as shown in the screenshot
Importing the table by working with one mapper
We can import table with required number of mappers in a specified directory path.

 $ sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db  
--username root --password cloudera  
--table products  
--fields-terminated-by '\t' \
> -m 1 \
> --target-dir /itemsdata

We can find that data transferred through sqoop in the specified hdfs directory as shown below


Code generated by Sqoop

  • By default sqoop runs four mappers when importing and exporting data.
  • While running sqoop job,it will create java classes and jar files.
  • We can see these files under /tmp/sqoop-cloudera/  

Import data to hive from mysql contacts and account tables

sqoop import --connect jdbc:mysql://localhost:3306/securemask 
--username root --password cloudera 
--table contacts 
--fields-terminated-by "," 
--target-dir /user/cloudera/workspace/contactsmask 
--hive-import 

In this import we are specifying the schema and not the directory path
sqoop import --connect jdbc:mysql://quickstart.cloudera/original 
--username root --password cloudera 
--table employee 
--hive-import 
--hive-table hiveschema.employee -m1 
--fields-terminated-by '\t' 
--direct; 

Incremental Import 
This option would be used if we need only incremental data(new data)
sqoop import \
> --connect jdbc:mysql://localhost:3306/retail_db \
> --username retail_dba --password cloudera \
> --table departments --check-column \
> department_id --incremental append --last-value 7

Export data from HDFS to mysql
Before we import ,we need to have the same schema/table in the destination location in this scenario in mysql

[cloudera@quickstart ~]$ sqoop export \
> --connect jdbc:mysql://quickstart.cloudera/retail_db \
> --username root --password cloudera \
> --table categories \
> --export-dir categories_data

Create jobs
Note: we need to have space before import. If we have space then we would get error
sqoop job --create myjob  
          -- import 
          --connect jdbc:mysql://localhost:3306/retail_db 
          --username root  
          --password cloudera  
          --table categories 

Show list of jobs
sqoop job --list

Execute the created job
sqoop job --exec myjob

Evaluate  in sqoop 
 Allows users to execute user defined queries against respective database.
sqoop eval --connect jdbc:mysql://localhost:3306/retail_db \
           --username root --password cloudera 
           --query "select * from departments"

No comments:

Post a Comment