BigData
Hawq
2 | 3 days
Pivotal
EInnovator - HAWQ Training

HAWQ Course

Course Overview

In this course, delegates learn how to use the high-performance Hadoop database HAWQ/HDB to run SQL queries on BigData datasets.

Course topics include how to install and configure HAWQ/HDB, the architecture of HAWQ/HDB and its relation to Hadoop/HDFS, how to use the psql command-line tool, how to use the DDL to create tables and views, how to write SQL queries, how to import data into and export data from HAWQ/HDB, the data storage strategies in HAWQ/HDB, data distribution and partitioning, and query analysis and optimization.

Course Format and Modes of Delivery

  • Four days of instructor-led training
  • 50% lecture, 50% hands-on lab
  • Corporate On-Site
  • Public

Target Audience

  • Data Engineers and Data Scientists
  • BigData and IoT Software Developers and Architects
  • Hadoop and Database Administrators

Prerequisites

  • Familiarity with SQL and database concepts is useful.

Course Objectives

  • Learn how to install and configure HAWQ/HDB nodes and clusters
  • Understand the architecture of HAWQ/HDB
  • Learn how to use the psql command-line tool
  • Learn how to use the Postgres/HAWQ/HDB DDL to create tables and views
  • Learn how to write simple and advanced SQL queries in HAWQ
  • Learn how to import data into and export data from HAWQ/HDB
  • Learn different strategies for storage, distribution, and partitioning of data
  • Understand query plans and how to optimize query execution

Course Modules

  • HAWQ Features and Use Cases
  • HAWQ Installation and Configuration
  • HAWQ Architecture
  • Starting and Managing a HAWQ Cluster
  • The psql command-line tool
  • Creating Databases and Schemas
  • Creating Tables and Views
  • Basic SQL Queries
  • SQL Clauses and Queries
  • Data Types
  • Built-in Functions
  • Joins
  • Aggregation Functions
  • Windowing Functions
  • Arrays
  • User Defined Functions
  • User Defined Aggregates
  • Loading Data Sets from Files
  • Exporting Data Sets
  • Parallel Load with gpfdist
  • Parallel Load with the gpload tool
  • Defining External Tables
  • Accessing Data in Local Files with PXF
  • Accessing Data in HDFS with PXF
  • Extending PXF
  • Parquet Data Format
  • Distribution of Data in Tables
  • Partitioning of Data in Tables
  • Simple Query Plans
  • Query Plans with Data-Migration
  • Optimizing Queries
  • Java/JDBC HAWQ Driver
  • Using Spring JdbcTemplate with HAWQ
  • Spring JdbcTemplate API
  • Installing MADlib
  • Regression Models with MADlib
  • Bayesian Inference with MADlib
  • Clustering Algorithms with MADlib
  • Classification with Decision Trees and Random Forests in MADlib
  • Generating PMML Models
  • Hadoop Ecosystem
  • HDFS Architecture
  • HDFS Java API
  • HDFS Administration
  • IoT, BigData, and the Lambda Architecture
  • Spring XD Architecture and Shell
  • Spring XD – HAWQ Integration
  • Loading PMML Models in Spring XD