BigData & Analytics

Why BigData Processing?

Globally accessible internet-scale applications gather and produce large amounts of data that cannot be effectively processed using traditional models of data storage and computation. Alternative approaches are required to handle this deluge of data in useful time and in a reliable way. Modern big-data storage and processing software achieves this with storage, distribution, and processing models that scale horizontally on commodity hardware.

Why Advanced Analytics?

Advanced analytics and machine-learning algorithms reinforce the power of BigData processing by inferring patterns and providing actionable insight in response to the data. Knowledge of the science and technology required to benefit from this data is a distinguishing feature of cutting-edge enterprises.

Technologies of Choice

Hadoop is the de-facto standard open-source solution for BigData processing. Inspired by Google's research and internally used architectures, Hadoop combines a massively scalable and reliable distributed file system – HDFS – with a cluster manager and task scheduler – YARN – and a distributed data-processing model – Map-Reduce.

Spark provides an alternative model of distributed BigData computation through a functional programming model over immutable distributed datasets (RDDs). More expressive and efficient than Map-Reduce, it is a highly popular approach to BigData processing.

HAWQ/HDB provides a high-performance Postgres SQL engine on top of Hadoop/HDFS that brings the convenience of OLAP warehouses and analytics databases to the Hadoop world. It outperforms Hadoop's native SQL engine – Hive – by a considerable margin.

BigData Processing

Hadoop Infrastructure – HDFS & YARN

Hadoop/HDFS is the de-facto standard solution for BigData ingestion and storage. Many of the platforms used for BigData processing and analytics rely on HDFS for storage, including Hadoop's native Map-Reduce, Pig, and Hive, as well as Spark and HAWQ/HDB.

Hadoop/YARN is the standard cluster resource manager for Hadoop. It allows jobs from multiple BigData processing frameworks to be concurrently scheduled and executed, in a resource-efficient and fair way.

Map-Reduce – BigData Processing

The Map-Reduce framework, and the higher-level languages Pig and Hive, provide powerful abstractions for defining BigData processing jobs that execute in a massively parallel way.
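
The two phases of the model can be sketched in plain Python (a single-machine illustration of the classic word-count job; Hadoop runs the same map and reduce phases in parallel across a cluster, with a shuffle grouping pairs by key in between):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each key (Hadoop groups by key first)."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["big data needs big clusters", "data drives insight"]
word_counts = reduce_phase(map_phase(lines))
print(word_counts["big"])   # 2
print(word_counts["data"])  # 2
```

In a real Hadoop job the same mapper and reducer logic would be packaged as Mapper/Reducer classes (or Pig/Hive scripts) and executed over HDFS blocks.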

Consulting Services

EInnovator can help you identify business use cases that benefit from BigData processing, which data to gather and use, and how to implement data processing jobs using Hadoop frameworks.

EInnovator can help your organization define and set up Hadoop clusters, using a preferred Hadoop distribution – including Apache Hadoop and Pivotal's PHD, among others – and dimension and configure the system to provide the desired QoS.

Spark & MLlib

Spark provides a very flexible functional programming model for BigData processing. Advanced analytics can also be performed within the Spark ecosystem using the MLlib machine-learning library.

Use Cases:

  • BigData Processing with Functional Programming Model
  • Batch Processing
  • Processing of Event Streams
  • Machine Learning Algorithms on BigData
  • Analysis of user interaction patterns in websites
  • Sentiment analysis from social-media data
  • Process analysis from email messages
  • Fraud-detection
  • Credit simulations
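
The chained, functional style of Spark's RDD API – transformations such as map, filter, and reduceByKey over immutable datasets – can be illustrated with a plain-Python sketch (single-machine only, with hypothetical sample data; in PySpark the same pipeline would run distributed over an RDD):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical click-stream events: (user, action) pairs
events = [("user1", "click"), ("user2", "view"), ("user1", "click"),
          ("user3", "click"), ("user2", "click")]

# filter + map: keep only clicks, emit (user, 1) pairs
pairs = [(user, 1) for user, action in events if action == "click"]

# reduceByKey equivalent: group by user and sum the counts
pairs.sort(key=itemgetter(0))
clicks_per_user = {user: sum(n for _, n in group)
                   for user, group in groupby(pairs, key=itemgetter(0))}
print(clicks_per_user)  # {'user1': 2, 'user2': 1, 'user3': 1}
```

The same analysis in Spark stays structurally identical – a chain of transformations ending in an action – which is what makes the model expressive for the use cases above.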

Consulting Services

EInnovator can help you identify business use cases that benefit from Spark's distributed processing architecture and programming model, and how to solve and deploy advanced analytics problems using MLlib.

HAWQ/HDB & MADlib

HAWQ/HDB provides a familiar SQL interface to BigData processing, leveraging the massive storage capacity of HDFS and query plans optimized for distributed execution, bringing high performance and interactive usage to BigData processing tasks. Advanced analytics can also be performed within the HAWQ/HDB ecosystem using the MADlib machine-learning library.

Use Cases:

  • BigData Processing with Standard SQL
  • Batch Processing
  • Analysis of user interaction patterns in websites
  • Machine Learning Algorithms on BigData
  • Sentiment analysis from social-media data
  • Process analysis from email messages
  • Fraud-detection
  • Credit simulations
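
The appeal of this approach is that ordinary analytical SQL carries over unchanged. The sketch below shows the kind of aggregation query involved, using SQLite purely to keep the example self-contained and runnable – HAWQ/HDB accepts the same standard SQL but plans and executes it distributed over data stored in HDFS (table and column names here are hypothetical):

```python
import sqlite3

# In-memory SQLite stands in for the SQL engine; HAWQ/HDB would run
# the same query distributed over HDFS-resident tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, page TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("u1", "home"), ("u1", "pricing"), ("u2", "home"),
                  ("u3", "home"), ("u2", "pricing"), ("u1", "home")])

# Standard SQL aggregation: views per page, most popular first
rows = conn.execute("""
    SELECT page, COUNT(*) AS views
    FROM page_views
    GROUP BY page
    ORDER BY views DESC
""").fetchall()
print(rows)  # [('home', 4), ('pricing', 2)]
```

Because the interface is standard SQL, existing OLAP and reporting skills transfer directly to BigData-scale tables.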

Consulting Services

EInnovator can help adapt your organization's existing SQL skills to a BigData context, and explore the ways in which distributed BigData processing can be used to improve business processes and solutions. Advanced modelling and predictive analytics using MADlib can also be applied to bring new insights from data.