*Friday CLOSED

Timings 10.00 am - 08.00 pm

Call : 021-3455-6664, 0312-216-9325 DHA 021-35344-600, 03333808376, ISB 03333808376

Hive Job Interview Questions and Answers Karachi Pakistan Dubai

Hive Job Interview  Q&A

Top Hive Interview Questions – Most Asked

We are offering a list of industry-designed Apache Hive interview questions to help you ace your Hive job interview. In this interview questions list, you will learn what a Hive variable is, Hive table types, adding nodes in Hive, concatenation function in Hive, changing column data type, Hive query processor components, and Hive bucketing. Learn Big Data Hadoop from Omni Academy’s Hadoop Training and fast-track your career!

Top Answers to Hive Interview Questions

1. Differentiate between Pig and Hive.

CriteriaApache PigApache Hive
NatureProcedural data flow languageDeclarative SQL-like language
ApplicationUsed for programmingUsed for report creation
Used byResearchers and programmersMainly Data Analysts
Operates onThe client-side of a clusterThe server-side of a cluster
Accessing raw dataNot as fast as HiveQLFaster with in-built features
Schema or data typeAlways defined in the script itselfStored in the local database
Ease of learningTakes little extra time and effort to masterEasy to learn from database experts

2. What is the definition of Hive? What is the present version of Hive? Explain ACID transactions in Hive.

Hive is an open-source data warehouse system. We can use Hive for analyzing and querying large datasets. It’s similar to SQL. The present version of Hive is 0.13.1. Hive supports ACID (Atomicity, Consistency, Isolation, and Durability) transactions. ACID transactions are provided at row levels. Following are the options Hive uses to support ACID transactions:

  • Insert
  • Delete
  • Update

3. What is a Hive variable? What do we use it for?

Hive variables are basically created in the Hive environment that is referenced by Hive scripting languages. They allow to pass some values to a Hive query when the query starts executing. They use the source command.

4. What kind of data warehouse application is suitable for Hive? What are the types of tables in Hive?

Hive is not considered a full database. The design rules and regulations of Hadoop and HDFS have put restrictions on what Hive can do. However, Hive is most suitable for data warehouse applications because it:

  • Analyzes relatively static data
  • Has less responsive time
  • Does not make rapid changes in data

Although Hive doesn’t provide fundamental features required for Online Transaction Processing (OLTP), it is suitable for data warehouse applications in large datasets. There are two types of tables in Hive:

  • Managed tables
  • External tables

5. Can we change the settings within a Hive session? If yes, how?

Yes, we can change the settings within a Hive session using the SET command. It helps change the Hive job settings for an exact query. For example, the following command shows that buckets are occupied according to the table definition:

hive> SET hive.enforce.bucketing=true;

We can see the current value of any property by using SET with the property name. SET will list all the properties with their values set by Hive.

hive> SET hive.enforce.bucketing;

hive.enforce.bucketing=true

This list will not include the defaults of Hadoop. So, we should use the below code:

SET -v

It will list all the properties including the Hadoop defaults in the system.

6. Is it possible to add 100 nodes when we already have 100 nodes in Hive? If yes, how?

Yes, we can add the nodes by following the below steps:

Step 1: Take a new system; create a new username and password
Step 2: Install SSH and with the master node setup SSH connections
Step 3: Add ssh public_rsa id key to the authorized keys file
Step 4: Add the new DataNode hostname, IP address, and other details in /etc/hosts slaves file:

192.168.1.102 slave3.in slave3

Step 5: Start the DataNode on a new node
Step 6: Login to the new node like suhadoop or:

ssh -X hadoop@192.168.1.103

Step 7: Start HDFS of the newly added slave node by using the following command:

./bin/hadoop-daemon.sh start data node

Step 8: Check the output of the jps command on the new node

7. Explain the concatenation function in Hive with an example.

The concatenate function will join the input strings. We can specify
‘n’ number of strings separated by a comma.

Example:

CONCAT ('Omni Academy ','-','is','-','a','-','eLearning',’-’,’provider’);

Output:

 Omni Academy -is-a-eLearning-provider

Every time, we set the limits of the strings by ‘-‘. If it is common for every string, then Hive provides another command:

CONCAT_WS

In this case, we have to specify the set limits of the operator first as follows:

CONCAT_WS ('-',’Omni Academy’,’is’,’a’,’eLearning’,‘provider’);

Output:

Omni Academy-is-a-eLearning-provider

8. Explain the Trim and Reverse functions in Hive with examples.

  • The trim function will delete the spaces associated with a string.

Example:

TRIM(‘ OMNI ACADEMY ‘);

Output:

 OMNI ACADEMY 

To remove the leading space:

LTRIM(‘ OMNI ACADEMY ’);

To remove the trailing space:

RTRIM(‘ OMNI ACADEMY  ‘);
  • In the reverse function, characters are reversed in the string.

Example:

REVERSE(‘OMNI ACADEMY’);

Output:

TAAPILLETNI

9. How to change the column data type in Hive? Explain RLIKE in Hive.

We can change the column data type by using ALTER and CHANGE as follows:

ALTER TABLE table_name CHANGE column_namecolumn_namenew_datatype;

For example, if we want to change the data type of the salary column from integer to bigint in the employee table, we can use the following:

ALTER TABLE employee CHANGE salary salary BIGINT;

RLIKE: Its full form is Right-Like and it is a special function in Hive. It helps examine two substrings, i.e., if the substring of A matches with B, then it evaluates to true.

Example:

‘Omni Academy’ RLIKE ‘tell’  True
‘Omni Academy’ RLIKE ‘^I.*’  True (this is a regular expression)

Learn more about Apache Hive from this detailed blog post now!

10. What are the components used in Hive Query Processor?

Following are the components of a Hive Query Processor:

  • Parse and Semantic Analysis (ql/parse)
  • Metadata Layer (ql/metadata)
  • Type Interfaces (ql/typeinfo)
  • Sessions (ql/session)
  • Map/Reduce Execution Engine (ql/exec)
  • Plan Components (ql/plan)
  • Hive Function Framework (ql/udf)
  • Tools (ql/tools)
  • Optimizer (ql/optimizer)

11. What are Buckets in Hive?

Buckets in Hive are used in segregating Hive table data into multiple files or directories. They are used for efficient querying.

12. Explain the process to access subdirectories recursively in Hive queries.

By using the below commands, we can access subdirectories recursively in Hive:

hive> Set mapred.input.dir.recursive=true;
hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher level directory, and this is suitable for the directory structure like:

/data/country/state/city/

13. How to skip header rows from a table in Hive?

Imagine that header records in a table are as follows:

System=…
Version=…
Sub-version=…

Suppose, we do not want to include the above three lines of headers in our Hive query. To skip the header lines from our table in Hive, we will set a table property.

CREATE EXTERNAL TABLE employee (
name STRING,
job STRING,
dob STRING,
id INT,
salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ‘ STORED AS TEXTFILE
LOCATION ‘/user/data’
TBLPROPERTIES("skip.header.line.count"="2”);

14. What is the maximum size of a string data type supported by Hive? Explain how Hive supports binary formats.

The maximum size of a string data type supported by Hive is 2 GB. Hive supports the text file format by default, and it also supports the binary format sequence files, ORC files, Avro data files, and Parquet files.

  • Sequence file: It is a splittable, compressible, and row-oriented file with a general binary format.
  • ORC file: Optimized row columnar (ORC) format file is a record-columnar and column-oriented storage file. It divides the table in row split. Each split stores the value of the first row in the first column and follows subsequently.
  • Avro data file: It is the same as a sequence file that is splittable, compressible, and row-oriented but without the support of schema evolution and multilingual binding.
  • Parquet file: In Parquet format, along with storing rows of data adjacent to one another, we can also store column values adjacent to each other such that both horizontally and vertically datasets are partitioned.

15. What is the precedence order of Hive configuration?

We are using a precedence hierarchy for setting properties:

  1. The SET command in Hive
  2. The command-line –hiveconf option
  3. Hive-site.XML
  4. Hive-default.xml
  5. Hadoop-site.xml
  6. Hadoop-default.xml

16. If you run a select * query in Hive, why doesn’t it run MapReduce?

The hive.fetch.task.conversion property of Hive lowers the latency of MapReduce overhead, and in effect when executing queries such as SELECT, FILTER, LIMIT, etc. it skips the MapReduce function.

17. How can we improve the performance with ORC format tables in Hive?

We can store Hive data in a highly efficient manner in an Optimized Row Columnar (ORC) file format. It can simplify many Hive file format limitations. We can improve the performance by using ORC files while reading, writing, and processing data.

Set hive.compute.query.using.stats-true;
Set hive.stats.dbclass-fs;
CREATE TABLE orc_table (
idint,
name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘\:’
LINES TERMINATED BY ‘\n’
STORES AS ORC;

18. Explain the functionality of ObjectInspector.

ObjectInspector helps analyze the internal structure of a row object and the individual structure of columns in Hive. It also provides a uniform way to access complex objects that can be stored in multiple formats in the memory.

  • An instance of Java class
  • A standard Java object
  • A lazily initialized object

ObjectInspector tells the structure of the object and also the ways to access the internal fields inside the object.

19. Whenever we run a Hive query, a new metastore_db is created. Why?

A local metastore is created when we run Hive in an embedded mode. Before creating, it checks whether the metastore exists or not, and this metastore property is defined in the configuration file, hive-site.xml. The property is:

javax.jdo.option.ConnectionURL

with the default value:

jdbc:derby:;databaseName=metastore_db;create=true

Therefore, we have to change the behavior of the location to an absolute path so that from that location the metastore can be used.

20. Differentiate between Hive and HBase.

HiveHBase
Enables most SQL queriesDoes not allow SQL queries
Operations do not run in real timeOperations run in real time
A data warehouse frameworkA NoSQL database
Runs on top of MapReduceRuns on top of HDFS

21. How can we access the subdirectories recursively?

By using the below commands, we can access subdirectories recursively in Hive:

hive> Set mapred.input.dir.recursive=true;
hive> Set hive.mapred.supports.subdirectories=true;

Hive tables can be pointed to the higher level directory, and this is suitable for the directory structure:

/data/country/state/city/

22. What are the uses of Hive Explode?

Hadoop Developers consider an array as their input and convert it into a separate table row. To convert complicated data types into desired table formats, Hive uses Explode.

23. What is the available mechanism for connecting applications when we run Hive as a server?

  • Thrift Client: Using Thrift, we can call Hive commands from various programming languages, such as C++, PHP, Java, Python, and Ruby.
  • JDBC Driver: JDBC Driver enables accessing data with JDBC support, by translating calls from an application into SQL and passing the SQL queries to the Hive engine.
  • ODBC Driver: It implements the ODBC API standard for the Hive DBMS, enabling ODBC-compliant applications to interact seamlessly with Hive.

24. How do we write our own custom SerDe?

Mostly, end-users prefer writing a Deserializer instead of using SerDe as they want to read their own data format instead of writing to it, e.g., RegexDeserializer deserializes data with the help of the configuration parameter ‘regex’ and with a list of column names.

If our SerDe supports DDL (i.e., SerDe with parameterized columns and column types), we will probably implement a protocol based on DynamicSerDe, instead of writing a SerDe. This is because the framework passes DDL to SerDe through the ‘Thrift DDL’ format and it’s totally unnecessary to write a “Thrift DDL” parser.

25. Mention various date types supported by Hive.

The timestamp data type stores date in the java.sql.timestamp format.

Three collection data types in Hive are:

  • Arrays
  • Maps
  • Structs

26. Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? If yes, how? Give an example.

Yes, we can run UNIX shell commands from Hive using an ‘!‘ mark before the command. For example, !pwd at Hive prompt will display the current directory.
We can execute Hive queries from the script files using the source command.

Example:

Hive> source /path/to/file/file_with_query.hql

Related Courses 

RPA (Robotic Process Automation)

Machine Learning with 9 Practical Applications

Mastering Python – Machine Learning

Data Sciences with Python Machine Learning 

Data Sciences Specialization
Diploma in Big Data Analytics

Learn Internet of Things (IoT) Programming
Oracle BI – Create Analyses and Dashboards
Microsoft Power BI with Advance Excel

Join FREE – Big Data Workshop 

sharing is caring
Print Friendly, PDF & Email

Leave a Reply


ABOUT US

OMNI ACADEMY & CONSULTING is one of the most prestigious Training & Consulting firm, founded in 2010, under MHSG Consulting Group aim to help our customers in transforming their people and business - be more engage with customers through digital transformation. Helping People to Get Valuable Skills and Get Jobs.

Read More

Contact Us

Get your self enrolled for unlimited learning 1000+ Courses, Corporate Group Training, Instructor led Class-Room and ONLINE learning options. Join Now!
  • Head Office: A-2/3 Westland Trade Centre, Shahra-e-Faisal PECHS Karachi 75350 Pakistan Call 0213-455-6664 WhatsApp 0334-318-2845, 0336-7222-191, +92 312 2169325
  • Gulshan Branch: A-242, Sardar Ali Sabri Rd. Block-2, Gulshan-e-Iqbal, Karachi-75300, Call/WhatsApp 0213-498-6664, 0331-3929-217, 0334-1757-521, 0312-2169325
  • ONLINE INQUIRY: Call/WhatsApp +92 312 2169325, 0334-318-2845, Lahore 0333-3808376, Islamabad 0331-3929217, Saudi Arabia 050 2283468
  • DHA Branch: 14-C, Saher Commercial Area, Phase VII, Defence Housing Authority, Karachi-75500 Pakistan. 0213-5344600, 0337-7222-191, 0333-3808-376
  • info@omni-academy.com
  • FREE Support | WhatsApp/Chat/Call : +92 312 2169325
WORKING HOURS

  • Monday10.00am - 7.00pm
  • Tuesday10.00am - 7.00pm
  • Wednesday10.00am - 7.00pm
  • Thursday10.00am - 7.00pm
  • FridayClosed
  • Saturday10.00am - 7.00pm
  • Sunday10.00am - 7.00pm
Select your currency
PKR Pakistani rupee
WhatsApp Us