Hadoop gave companies a practical way to store and manage huge volumes of data. However, analyzing that data for insights proved to be a job best left to skilled data engineers, leaving data analysts out in the cold. Two Facebook data engineers created Apache Hive in 2008. Because SQL is a widely used and commonly understood language among data professionals, Hive was designed to automatically translate SQL-like queries into MapReduce jobs on Hadoop, using a language called HiveQL.
Hive is intended to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data. It provides a SQL-like interface that lets users perform ad-hoc querying, summarization, and data analysis with ease. Hive's SQL also gives users several ways to plug in their own functionality for custom analysis, such as User Defined Functions (UDFs).
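As a sketch of what this looks like in practice, the HiveQL below runs an ad-hoc aggregation and shows how a custom UDF is registered and called. The table name `page_views`, the jar path, and the UDF class are illustrative assumptions, not details from the text:

```sql
-- Hypothetical table of page views; names are illustrative.
SELECT userid, COUNT(*) AS visits
FROM page_views
WHERE view_date >= '2008-01-01'
GROUP BY userid
ORDER BY visits DESC
LIMIT 10;

-- A custom UDF, once registered, is called like a built-in function:
ADD JAR /path/to/my_udfs.jar;                                      -- assumed jar
CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.udf.NormalizeUrl';
SELECT normalize_url(referrer_url) FROM page_views;
```

Hive compiles each such statement into one or more MapReduce jobs behind the scenes, which is what lets analysts stay in SQL rather than writing Java MapReduce code.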
Types of tables
Hive has two types of tables:
A Managed table, also called an Internal table, is the default table type in Hive. When we create a table without specifying it as external, we get a Managed table, and Hive manages both the table's metadata and its data.
An External table is created when the data is also used outside Hive. Whenever we want to be able to delete the table's metadata while keeping the table's data as it is, we use an External table: dropping an external table deletes only the table's schema, leaving the underlying data files in place.
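The contrast between the two table types can be sketched with the following DDL. The table names, columns, and the `/data/logs` location are illustrative assumptions:

```sql
-- Managed (internal) table: Hive owns both the metadata and the data.
CREATE TABLE logs_managed (
  ts  STRING,
  msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- External table: only the schema is registered; the data stays at LOCATION.
CREATE EXTERNAL TABLE logs_external (
  ts  STRING,
  msg STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';            -- assumed HDFS path

DROP TABLE logs_managed;   -- deletes the metadata AND the data files
DROP TABLE logs_external;  -- deletes only the metadata; files remain at /data/logs
```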
Components of Hive
• User Interface (UI)
It provides an interface between the user and Hive, letting the user submit queries to the system. The Hive web UI, the Hive command line, and Hive HDInsight are supported.
• Driver
The driver receives the user's queries from the interface. It implements the notion of session handles and provides execute and fetch APIs modeled on JDBC/ODBC interfaces.
• Compiler
The compiler parses the query and performs semantic analysis on the different query blocks and query expressions. It eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
• Metastore
The metastore stores all the structural information about the different tables and partitions in the warehouse, including their attributes and attribute-level information.
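The metadata kept in the metastore can be inspected directly from HiveQL; the statements below assume a hypothetical `page_views` table:

```sql
-- Column definitions, storage location, and table type (MANAGED/EXTERNAL)
-- come from the metastore:
DESCRIBE FORMATTED page_views;

-- Partition metadata is also held in the metastore:
SHOW PARTITIONS page_views;
```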
• Execution Engine
The execution engine executes the plan produced by the compiler. The plan is a DAG (directed acyclic graph) of stages.
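The stage DAG for a given query can be inspected with HiveQL's `EXPLAIN` statement; the table name below is an illustrative assumption:

```sql
EXPLAIN
SELECT userid, COUNT(*)
FROM page_views
GROUP BY userid;
-- The output lists the stages (for example a map-reduce stage followed by a
-- fetch stage) and their dependencies, i.e. the DAG the execution engine runs.
```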