In the previous post we learned how to set up and run Hive on our distributed Hadoop cluster. In this post we will look at the various Hive input and output formats.
Key Formats
- TEXTFILE
- AVRO
- RCFILE
- SEQUENCEFILE
- PARQUET
Usage | Hands On
- TEXTFILE
- Delimited, human-readable text, e.g. a file with tab- or comma-separated fields. This is Hive's default format unless the hive.default.fileformat configuration property is set to something else.
- Syntax: STORED AS TEXTFILE
- CREATE TABLE
    hive> USE user_db;
    OK
    Time taken: 0.044 seconds
    hive> CREATE TABLE IF NOT EXISTS users_txt (
            uid String,
            login String,
            full_name String,
            email String,
            country String)
          COMMENT 'User details'
          ROW FORMAT DELIMITED
          FIELDS TERMINATED BY ','
          LINES TERMINATED BY '\n'
          STORED AS TEXTFILE;
    OK
    Time taken: 0.384 seconds
- LOAD DATA
    hive> LOAD DATA LOCAL INPATH '/tmp/users.csv' OVERWRITE INTO TABLE user_db.users_txt;
    Loading data to table user_db.users_txt
    Table user_db.users_txt stats: [numFiles=1, numRows=0, totalSize=7860, rawDataSize=0]
    OK
    Time taken: 1.138 seconds
- READ RECORDS
    hive> SELECT * FROM users_txt LIMIT 10;
    OK
    755777ae-3d5f-415e-ac33-5d24db748e09 rjones0 Randy rjones0@archive.org RU
    a4dae376-970e-4548-908e-cbe6bff88550 mmitchell1 Martin mhamilton1@stumbleupon.com FI
    f4781787-c731-4db6-add2-13ab91de22a0 pharvey2 Peter pkim2@com.com FR
    d35df636-a7c8-4c50-aa57-e99db4cbdb1a gjames3 Gary gtorres3@bbb.org LT
    d26c04a3-ca28-4d2e-84cf-0104ad2acb92 rburton4 Russell rwest4@youtube.com YE
    6a487cfb-5177-4cc2-bdbd-4bc4751b9592 pharris5 Patrick ptaylor5@cnn.com NO
    3671d7f7-2a75-41dc-be84-609106e5bdfa kcrawford6 Keith ksmith6@weibo.com PT
    beae01c4-3ee6-4c59-b0d6-60c5811367f2 jedwards7 Juan joliver7@fc2.com PH
    899dc8a4-5a8f-44cf-ac23-ae8c3729836c slynch8 Samuel smcdonald8@princeton.edu VN
    f274e93d-378c-4377-a9c7-7c235a36b72a mgray9 Martin mrodriguez9@constantcontact.com IE
    Time taken: 0.696 seconds, Fetched: 10 row(s)
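
To double-check which input and output formats Hive actually attached to the table, something like the following could be run in the same session. This is only a sketch: it assumes the user_db.users_txt table created above still exists, and the exact layout of the output varies between Hive versions, so no output is shown here.

```sql
-- Show the current value of the default file format property
-- (TextFile unless it has been overridden in your configuration).
SET hive.default.fileformat;

-- Print the table's storage details; look for the
-- "InputFormat", "OutputFormat" and "SerDe Library" rows.
DESCRIBE FORMATTED user_db.users_txt;
```

For a TEXTFILE table the storage section would normally list org.apache.hadoop.mapred.TextInputFormat as the input format and org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat as the output format, with LazySimpleSerDe handling the delimited rows.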