We learnt about Hive in previous post. We will setup and run Hive (with MySQL based metastore) here. Note that we will use one of the machines used in previous post hence JAVA_HOME and HADOOP_HOME related variables are assumed to be set in ~/.bashrc
Installation
For installation we will use master node (note that here in this demo will run HiveQL using Hive commandline console only, hence we will just install on master node). We will use following details for installation:- Installation base directory:
- /home/anishsneh/installs
- Installation user name:
- anishsneh
- Hadoop details (will use same Hadoop Cluster configured in previous post) with address:
- hdfs://server01:9000
- We will use MySQL based metastore with following details (note that we need to install/configure a MySQL server with following details):
- Server name: server01
- Server URL: jdbc:mysql://localhost:3306
- Database name: hive
- MySQL username: hiveuser
- MySQL password: Welcome1hive
- Install Metastore Database
- Install MySQL server on server01 (using yum or any convenient method)
- Use root password as "Welcome1root"
- Configure Metastore Database
- Connect and configure MySQL metastore as follows:
[anishsneh@server01 installs]$ mysql -uroot -pWelcome1root Warning: Using a password on the command line interface can be insecure. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 5 Server version: 5.6.17 MySQL Community Server (GPL) Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> create database hive; Query OK, 1 row affected (0.26 sec) mysql> create user 'hiveuser'@'%' IDENTIFIED BY 'Welcome1hive'; Query OK, 0 rows affected (0.07 sec) mysql> GRANT all on *.* to 'hiveuser'@localhost identified by 'Welcome1hive'; Query OK, 0 rows affected (0.00 sec) mysql> flush privileges; Query OK, 0 rows affected (0.00 sec)
- Install Hive
- Download Apache Hive binary from Apache Website. We are using apache-hive-1.2.1-bin.tar.gz
- Set HIVE_HOME="/home/anishsneh/installs/apache-hive-1.2.1-bin" in ~/.bashrc (or wherever maintaining environment variables), reload profile/bash.
- Extract downloaded package to /home/anishsneh/installs, such that we have:
[anishsneh@server01 installs]$ ls -ltr apache-hive-1.2.1-bin total 476 -rw-rw-r--. 1 anishsneh anishsneh 24754 Apr 29 10:26 LICENSE -rw-rw-r--. 1 anishsneh anishsneh 421129 Jun 19 01:59 RELEASE_NOTES.txt -rw-rw-r--. 1 anishsneh anishsneh 4366 Jun 19 01:59 README.txt -rw-rw-r--. 1 anishsneh anishsneh 397 Jun 19 01:59 NOTICE drwxrwxr-x. 4 anishsneh anishsneh 4096 Sep 27 14:29 examples drwxrwxr-x. 3 anishsneh anishsneh 4096 Sep 27 14:29 bin drwxrwxr-x. 3 anishsneh anishsneh 4096 Sep 27 14:29 scripts drwxrwxr-x. 4 anishsneh anishsneh 4096 Sep 27 14:29 lib drwxrwxr-x. 7 anishsneh anishsneh 4096 Sep 27 14:29 hcatalog drwxrwxr-x. 2 anishsneh anishsneh 4096 Sep 27 14:29 conf
- Configure Hive
- Download MySQL connector zip from MySQL website (we are using mysql-connector-java-5.1.36), extract and copy mysql-connector-java-*.jar to $HIVE_HOME/lib such that we have:
[anishsneh@server01 installs]$ ls -ltr apache-hive-1.2.1-bin/lib/mysql-connector-* -rw-r--r--. 1 anishsneh anishsneh 972009 Sep 27 14:40 apache-hive-1.2.1-bin/lib/mysql-connector-java-5.1.36-bin.jar
- Copy hive configuration file from template as follows:
[anishsneh@server01 installs]$ cd $HIVE_HOME/conf [anishsneh@server01 conf]$ cp hive-default.xml.template hive-site.xml
- Edit newly created/copied hive-site.xml with following properties:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://localhost/hive?createDatabaseIfNotExist=true</value> <description>JDBC connect string for a JDBC metastore</description> </property>
<property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>Driver class name for a JDBC metastore</description> </property>
<property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> <description>Username to use against metastore database</description> </property>
<property> <name>javax.jdo.option.ConnectionPassword</name> <value>Welcome1hive</value> <description>password to use against metastore database</description> </property>
<property> <name>hive.metastore.warehouse.dir</name> <value>/data/hive/warehouse</value> <description>location of default database for the warehouse</description> </property>
<property> <name>hive.exec.local.scratchdir</name> <value>/tmp/${system:user.name}</value> <description>Local scratch space for Hive jobs</description> </property>
<property> <name>hive.downloaded.resources.dir</name> <value>/tmp/${hive.session.id}_resources</value> <description>Temporary local directory for added resources in the remote file system.</description> </property>
<property> <name>hive.server2.logging.operation.log.location</name> <value>/tmp/${system:user.name}/operation_logs</value> <description>Top level directory where operation logs are stored if logging functionality is enabled</description> </property>
<property> <name>hive.querylog.location</name> <value>/tmp/${system:user.name}</value> <description>Location of Hive run time structured log file</description> </property>
- Connect Hive
- Connect to hive as follows:
[anishsneh@server01 installs]$ cd $HIVE_HOME [anishsneh@server01 apache-hive-1.2.1-bin]$ ./bin/hive 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize 15/09/27 15:13:43 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed Logging initialized using configuration in jar:file:/home/anishsneh/installs/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/anishsneh/installs/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'. hive> show tables; OK Time taken: 1.359 seconds
- Verify Metastore
- After successful connection on another console connect to mysql as follows:
[anishsneh@server01 ~]$ mysql -uhiveuser -pWelcome1hive Warning: Using a password on the command line interface can be insecure. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 35 Server version: 5.6.17 MySQL Community Server (GPL) Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use hive Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +---------------------------+ | Tables_in_hive | +---------------------------+ | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | DATABASE_PARAMS | | DBS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | PARTITIONS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PARTITION_PARAMS | | PART_COL_STATS | | ROLES | | SDS | | SD_PARAMS | | SEQUENCE_TABLE | | SERDES | | SERDE_PARAMS | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TABLE_PARAMS | | TAB_COL_STATS | | TBLS | | VERSION | +---------------------------+ 29 rows in set (0.00 sec)
- Create database/tables
Connect to Hive console:
[anishsneh@server01 apache-hive-1.2.1-bin]$ cd $HIVE_HOME [anishsneh@server01 apache-hive-1.2.1-bin]$ ./bin/hive
Create database in Hive:hive> CREATE DATABASE user_db; OK Time taken: 1.139 seconds
Create Hive table:hive> USE user_db; OK Time taken: 0.044 seconds hive> CREATE TABLE IF NOT EXISTS demo_users (userid String, login String, name String) COMMENT 'User details' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE; OK Time taken: 0.384 seconds
File to be uploaded to demo_users table:anishsneh@server01 tmp]$ cat user_data.txt u00001 anishsneh Anish u00002 yogesh Yogesh u00003 bally Balwinder u00004 rakesh Rakesh u00005 manoj Manoj
Load data to Hive table:hive> LOAD DATA LOCAL INPATH '/tmp/user_data.txt' into table demo_users; Loading data to table user_db.demo_users Table user_db.demo_users stats: [numFiles=1, totalSize=108] OK Time taken: 0.663 seconds
View loaded table data:hive> SELECT * FROM user_db.demo_users; OK u00001 anishsneh Anish u00002 yogesh Yogesh u00003 bally Balwinder u00004 rakesh Rakesh u00005 manoj Manoj Time taken: 0.112 seconds, Fetched: 5 row(s)
- Verify created database/tables on HDFS
[anishsneh@server01 tmp]$ hadoop fs -cat hdfs://server01:9000/user/hive/warehouse/demo_db.db/demo_users/user_data.txt Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/anishsneh/installs/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'. 15/09/27 17:08:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable u00001 anishsneh Anish u00002 yogesh Yogesh u00003 bally Balwinder u00004 rakesh Rakesh u00005 manoj ManojIn next post we will learn about various input and output formats used in Hive.