Anish Sneh - Open Source: September 2015

In the previous post we learned how to set up and run Hive on our distributed Hadoop cluster. In this post we will look at the various Hive input and output formats.

Key Formats

TEXTFILE
AVRO
RCFILE
SEQUENCEFILE
PARQUET

We will use the same Hadoop cluster and Hive setup from the previous post.

Usage | Hands On

TEXTFILE

A delimited, human-readable text file, e.g. a text file with tab- or comma-separated fields. This is the default format for Hive unless the hive.default.fileformat configuration property is set to something else.
Syntax: STORED AS TEXTFILE
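
To check which format a given installation actually defaults to, the property can be printed from the Hive prompt. The following is a minimal sketch; the table name users_default is hypothetical and is used only to show that leaving out the STORED AS clause falls back to the configured default.

```sql
-- Print the current value of the default file format property
SET hive.default.fileformat;

-- Hypothetical table: with no STORED AS clause, Hive uses the default printed above
CREATE TABLE IF NOT EXISTS users_default (uid STRING, login STRING);
```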

CREATE TABLE

hive> USE user_db;
OK
Time taken: 0.044 seconds
hive> CREATE TABLE IF NOT EXISTS users_txt (uid String, login String, full_name String, email String, country String) COMMENT 'User details' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
OK
Time taken: 0.384 seconds
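
To confirm how the table was stored, its metadata can be inspected. As a sketch: for a TEXTFILE table the storage section of the output should list org.apache.hadoop.mapred.TextInputFormat as the InputFormat and org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat as the OutputFormat, although the exact layout of the output may differ between Hive versions.

```sql
USE user_db;

-- Show the table's columns, location, SerDe, InputFormat and OutputFormat
DESCRIBE FORMATTED users_txt;
```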

LOAD DATA

hive> LOAD DATA LOCAL INPATH '/tmp/users.csv' OVERWRITE INTO TABLE user_db.users_txt;
Loading data to table user_db.users_txt
Table user_db.users_txt stats: [numFiles=1, numRows=0, totalSize=7860, rawDataSize=0]
OK
Time taken: 1.138 seconds
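
Note that numRows=0 in the stats line above does not mean the table is empty. LOAD DATA only places the file in the table's directory without reading it, so row-level statistics are not collected at load time. If accurate statistics are needed, they can be computed explicitly, as in this sketch (it launches a job over the table, so it may take noticeably longer than the load itself):

```sql
USE user_db;

-- Scan the table and store basic statistics (row count, raw data size) in the metastore
ANALYZE TABLE users_txt COMPUTE STATISTICS;

-- numRows should now appear under Table Parameters
DESCRIBE FORMATTED users_txt;
```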

READ RECORDS

hive> SELECT * FROM users_txt LIMIT 10;
OK
755777ae-3d5f-415e-ac33-5d24db748e09 rjones0 Randy rjones0@archive.org RU
a4dae376-970e-4548-908e-cbe6bff88550 mmitchell1 Martin mhamilton1@stumbleupon.com FI
f4781787-c731-4db6-add2-13ab91de22a0 pharvey2 Peter pkim2@com.com FR
d35df636-a7c8-4c50-aa57-e99db4cbdb1a gjames3 Gary gtorres3@bbb.org LT
d26c04a3-ca28-4d2e-84cf-0104ad2acb92 rburton4 Russell rwest4@youtube.com YE
6a487cfb-5177-4cc2-bdbd-4bc4751b9592 pharris5 Patrick ptaylor5@cnn.com NO
3671d7f7-2a75-41dc-be84-609106e5bdfa kcrawford6 Keith ksmith6@weibo.com PT
beae01c4-3ee6-4c59-b0d6-60c5811367f2 jedwards7 Juan joliver7@fc2.com PH
899dc8a4-5a8f-44cf-ac23-ae8c3729836c slynch8 Samuel smcdonald8@princeton.edu VN
f274e93d-378c-4377-a9c7-7c235a36b72a mgray9 Martin mrodriguez9@constantcontact.com IE
Time taken: 0.696 seconds, Fetched: 10 row(s)

Monday, September 28, 2015

Hive | Input & Output Formats
