Sunday, June 24, 2018

Overview of HBase

  • Is an open source, non-relational (NoSQL) database that runs on top of HDFS
  • Provides real-time read/write access to large datasets
  • Scales linearly to handle huge datasets with billions of rows and millions of columns, and easily combines data sources that use a wide variety of structures and schemas
  • Is natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN

is a web application for querying and visualizing data by interacting with Apache Hadoop.
It is a web interface for analyzing data with Apache Hadoop.

Accessing HBase by using the HBase Shell:
Connect to the running instance of HBase using the hbase shell command.

[cloudera@quickstart ~]$ hbase shell

Check the installed version

hbase(main):001:0> version
1.2.0-cdh5.12.0, rUnknown, Thu Jun 29 04:42:07 PDT 2017

Note: Table names, row keys, and column names must all be enclosed in quote characters.

Creating a table:
Use the create command to create a new table. We must specify the table name and the column family name. Here, test is the table name and cf is the column family name.

hbase(main):004:0> create 'test', 'cf'
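
A table can also be created with more than one column family, or with per-family settings. The lines below are only a sketch: the table names test2 and test3 and the family names cf1 and cf2 are made up for illustration, and VERSIONS => 3 is just an example setting.

```
# two column families with default settings (names are hypothetical)
create 'test2', 'cf1', 'cf2'

# one column family that keeps up to 3 versions of each cell
create 'test3', {NAME => 'cf', VERSIONS => 3}
```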

Listing table information:
Use the list command to confirm that the table exists.
hbase(main):003:0> list 'test'
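
Two related commands may be useful at this point. As a sketch, assuming the test table created above: describe prints the table's column families and their settings, and exists reports whether a table with that name is present.

```
describe 'test'
exists 'test'
```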

Populating the table:
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are composed of a column family prefix (cf in this example), followed by a colon and then a column qualifier suffix (a in this case).
hbase(main):004:0> put 'test','row1','cf:a','value1'
0 row(s) in 0.1290 seconds

hbase(main):005:0> put 'test','row2','cf:a','value2'
0 row(s) in 0.0140 seconds

hbase(main):006:0> put 'test','row3','cf:a','value3'
0 row(s) in 0.0080 seconds
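
Because the qualifier is simply the part after the colon, a row can hold several columns in the same family. A small sketch, where the qualifier b and the value value4 are made-up examples:

```
# adds a second column, cf:b, to row1 (same family cf, new qualifier b)
put 'test', 'row1', 'cf:b', 'value4'
```

No change to the table definition is needed for this, since only column families are declared when the table is created.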

Scanning the table for all data at once:
We can get data from HBase using the scan command. A scan can be limited, but for now, all data is fetched.
hbase(main):007:0> scan 'test'
ROW                   COLUMN+CELL                                               
 row1                 column=cf:a, timestamp=1517363261294, value=value1        
 row2                 column=cf:a, timestamp=1517363312331, value=value2        
 row3                 column=cf:a, timestamp=1517363341148, value=value3        
3 row(s) in 0.1180 seconds
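
To limit a scan rather than fetch everything, options can be passed in a hash after the table name. A minimal sketch using the test table; the particular limit, column, and start row are arbitrary choices:

```
# return at most 2 rows
scan 'test', {LIMIT => 2}

# return only column cf:a, starting from row key row2
scan 'test', {COLUMNS => 'cf:a', STARTROW => 'row2'}
```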

Getting a single row of data:
To get a single row of data at a time, we can use the get command.
hbase(main):008:0> get 'test', 'row1'
COLUMN                CELL                                                      
 cf:a                 timestamp=1517363261294, value=value1                     
1 row(s) in 0.0150 seconds
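
The get command can also be narrowed to a single column. A sketch, assuming the row1 data inserted earlier:

```
# fetch only the cf:a cell of row1
get 'test', 'row1', 'cf:a'

# equivalent form using an options hash
get 'test', 'row1', {COLUMN => 'cf:a'}
```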

Disabling and enabling a table:
If we want to delete a table or change its settings, as well as in some other situations, we need to disable the table first using the disable command. We can re-enable it using the enable command.
hbase(main):010:0> disable 'test'
0 row(s) in 2.3770 seconds

hbase(main):011:0> enable 'test'
0 row(s) in 1.3770 seconds
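
As an illustration of why disabling matters, the sketch below changes a column family setting and then shows how the table would be deleted. VERSIONS => 5 is an arbitrary example value, and the drop lines are commented out because drop permanently removes the table and its data.

```
disable 'test'
is_disabled 'test'                          # check the table state
alter 'test', NAME => 'cf', VERSIONS => 5   # change a column family setting
enable 'test'

# To delete the table instead, it must be disabled first:
# disable 'test'
# drop 'test'
```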

Exiting the HBase Shell:
To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
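
For completeness, the command is typed at the shell prompt with no arguments:

```
quit
```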

