
Best Big Data Hadoop Training Institute


Big Data Hadoop Training in Pune by Industry Experts

Online / Offline, Weekday / Weekend Batches Available
Duration of Training: 6 weekends

Big Data Hadoop Syllabus
    Who is Hadoop for?

    IT professionals who want to move into one of the most in-demand technologies, sought by clients across almost all domains, for the reasons below:

  • Hadoop is open source, so it saves cost, and it solves Big Data problems that are very difficult or impossible to solve with the expensive tools on the market
  • It can process distributed data; there is no need to store the entire dataset in centralized storage, as other tools require.
  • Many existing tools and technologies are seeing job cuts because clients are moving towards a cheaper and more efficient solution: Hadoop
  • Industry analysts have projected around 4.4 million Big Data / Hadoop jobs in the market in the coming years.
    Why Hadoop?
  • A solution to the Big Data problem
  • Open-source technology
  • Built on open-source platforms
  • Contains several tools covering the entire ETL data-processing framework
  • It can process distributed data; there is no need to store the entire dataset in centralized storage, as SQL-based tools require.
    • Distributed computing
    • Data management – Industry Challenges
    • Overview of Big Data
    • Characteristics of Big Data
    • Types of data
    • Sources of Big Data
    • Big Data examples
    • What is streaming data?
    • Batch vs Streaming data processing
    • Overview of Analytics
    • Big data Hadoop opportunities
  • Why we need Hadoop
  • Data centers and Hadoop Cluster overview
  • Overview of Hadoop Daemons
  • Hadoop Cluster and Racks
  • Learning Linux required for Hadoop
  • Hadoop ecosystem tools overview
  • Understanding Hadoop configuration and installation
  • HDFS
  • HDFS Daemons – Namenode, Datanode, Secondary Namenode
  • Hadoop FS and processing environment UIs
  • Fault Tolerance
  • High Availability
  • Block Replication
  • How to read and write files
  • Hadoop FS shell commands
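    A short transcript sketches the HDFS shell commands covered here; it assumes a running Hadoop cluster, and the paths and file names are illustrative.

    ```shell
    # List the contents of an HDFS directory
    hdfs dfs -ls /user/training

    # Copy a local file into HDFS
    hdfs dfs -put sales.csv /user/training/sales.csv

    # Read a file back from HDFS
    hdfs dfs -cat /user/training/sales.csv | head

    # Check block and replication information for a file (fault tolerance in action)
    hdfs fsck /user/training/sales.csv -files -blocks

    # Remove a file
    hdfs dfs -rm /user/training/sales.csv
    ```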
  • YARN
  • YARN Daemons – Resource Manager, NodeManager etc.
  • Job assignment & Execution flow
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
  • Understand Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of a MapReduce Job
  • File Input/Output Formats in MapReduce Jobs
  • Text Input Format
  • Key Value Input Format
  • Sequence File Input Format
  • NLine Input Format
  • Joins
  • Map-side Joins
  • Reducer-side Joins
  • Word Count example (or Election Vote Count)
  • Covers five to ten MapReduce examples with real-time data.
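    The map, shuffle/sort, and reduce phases listed above can be sketched as a plain-Python simulation of word count; this mirrors the data flow only, not the Hadoop Java API, and all names are illustrative.

    ```python
    from collections import defaultdict

    def map_phase(lines):
        """Mapper: emit a (word, 1) pair for every word in every input line."""
        for line in lines:
            for word in line.split():
                yield (word.lower(), 1)

    def shuffle_phase(pairs):
        """Shuffle/sort: group values by key, as the framework does between phases."""
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        """Reducer: sum the counts emitted for each word."""
        return {word: sum(counts) for word, counts in groups.items()}

    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    counts = reduce_phase(shuffle_phase(map_phase(lines)))
    print(counts["the"])  # 3
    print(counts["fox"])  # 2
    ```

    In real Hadoop the same three roles are played by the Mapper class, the framework's shuffle, and the Reducer class.
    
    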
  • Data warehouse basics
  • OLTP vs OLAP Concepts
  • Hive
  • Hive Architecture
  • Metastore DB and Metastore Service
  • Hive Query Language (HQL)
  • Managed and External Tables
  • Partitioning & Bucketing
  • Query Optimization
  • Hiveserver2 (Thrift server)
  • JDBC / ODBC connections to Hive
  • Hive Transactions
  • Hive UDFs
  • Working with Avro Schema and AVRO file format
  • Hands-on with multiple real-time datasets
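    An HQL sketch ties the Hive topics together: a managed table with partitioning and bucketing, a partition load, and a partition-pruned query. Table and column names are illustrative.

    ```sql
    -- Managed table, partitioned by country and bucketed by user_id
    CREATE TABLE sales (
      user_id INT,
      amount  DOUBLE,
      ts      STRING
    )
    PARTITIONED BY (country STRING)
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC;

    -- Load one partition from a staging table
    INSERT INTO TABLE sales PARTITION (country = 'IN')
    SELECT user_id, amount, ts FROM staging_sales WHERE country = 'IN';

    -- The partition filter lets Hive scan only the matching partition
    SELECT user_id, SUM(amount) AS total
    FROM sales
    WHERE country = 'IN'
    GROUP BY user_id;
    ```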
  • Apache Pig
  • Advantage of Pig over MapReduce
  • Pig Latin (Scripting language for Pig)
  • Schema and Schema-less data in Pig
  • Structured and semi-structured data processing in Pig
  • Pig UDFs
  • HCatalog
  • Pig vs Hive Use case
  • Hands-on: two more daily-use-case data-analysis examples (Google data), plus analysis of a date-time dataset
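    A minimal Pig Latin script sketches the scripting style this module covers; the file paths are illustrative.

    ```pig
    -- Word count in Pig Latin
    lines   = LOAD '/user/training/input.txt' AS (line:chararray);
    words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
    STORE counts INTO '/user/training/wordcount_out';
    ```

    The same job takes one Java class plus driver code in raw MapReduce, which is the advantage of Pig mentioned above.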
  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase Data Model
  • Table and Row
  • Column Family and Column Qualifier
  • Cell and its Versioning
  • Categories of NoSQL Databases
  • Key-Value Database
  • Document Database
  • Column Family Database
  • HBASE Architecture
  • HMaster
  • Region Servers
  • Regions
  • MemStore
  • Store
  • SQL vs. NOSQL
  • How HBase differs from an RDBMS
  • HDFS vs. HBase
  • Client-side buffering or bulk uploads
  • HBase Designing Tables
  • HBase Operations
  • Get
  • Scan
  • Put
  • Delete
  • Live Dataset
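    The Get/Scan/Put/Delete operations above can be sketched as an HBase shell session; it assumes a running HBase, and the table and column names are illustrative.

    ```shell
    create 'users', 'info'                       # table with one column family
    put 'users', 'row1', 'info:name', 'Asha'     # Put: write one cell
    put 'users', 'row1', 'info:city', 'Pune'
    get 'users', 'row1'                          # Get: read a single row
    scan 'users', {COLUMNS => ['info:name']}     # Scan: iterate over rows
    delete 'users', 'row1', 'info:city'          # Delete: remove one cell
    ```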
  • Sqoop
  • Sqoop commands
  • Sqoop practical implementation
  • Importing data to HDFS
  • Importing data to Hive
  • Exporting data to RDBMS
  • Sqoop connectors
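    The import/export flows above can be sketched with the Sqoop CLI; the JDBC URL, credentials, and table names are illustrative and assume a reachable MySQL database.

    ```shell
    # Import an RDBMS table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table orders \
      --target-dir /user/train/orders \
      --num-mappers 4

    # Import directly into a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table orders \
      --hive-import --hive-table retail.orders

    # Export HDFS data back to an RDBMS table
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table order_summary \
      --export-dir /user/train/summary
    ```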
  • Flume
  • Flume commands
  • Configuration of Source, Channel and Sink
  • Fan-out flume agents
  • How to load data into Hadoop that is coming from a web server or other storage
  • How to load streaming Twitter data into HDFS using Flume
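    A Flume agent configuration sketch shows the source, channel, and sink wiring described above; the agent name, log path, and HDFS path are illustrative.

    ```properties
    # Tail a web-server log into HDFS
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # Source: tail the access log
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/httpd/access_log
    agent1.sources.src1.channels = ch1

    # Channel: in-memory buffer between source and sink
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # Sink: write events into date-partitioned HDFS directories
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /flume/weblogs/%Y-%m-%d
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.channel = ch1
    ```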
  • Oozie
  • Action Node and Control Flow node
  • Designing workflow jobs
  • How to schedule jobs using Oozie
  • How to schedule time-based jobs (Oozie coordinators)
  • Oozie Conf file
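    A minimal workflow definition sketches the action and control-flow nodes listed above: one Hive action, with ok/error transitions to end and kill nodes. The workflow name, script, and properties are illustrative.

    ```xml
    <workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
      <start to="run-hive"/>
      <action name="run-hive">
        <hive xmlns="uri:oozie:hive-action:0.5">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <script>etl.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Hive action failed</message>
      </kill>
      <end name="end"/>
    </workflow-app>
    ```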
  • Scala
  • Syntax, data types, and variables
  • Classes and Objects
  • Basic Types and Operations
  • Functional Objects
  • Built-in Control Structures
  • Functions and Closures
  • Composition and Inheritance
  • Scala’s Hierarchy
  • Traits
  • Packages and Imports
  • Working with Lists, Collections
  • Abstract Members
  • Implicit Conversions and Parameters
  • For Expressions Revisited
  • The Scala Collections API
  • Extractors
  • Modular Programming Using Objects
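    A small Scala sketch touches several of the topics above: a case class, list transformations, and a closure. All names and values are illustrative.

    ```scala
    case class Employee(name: String, salary: Double)

    object Demo extends App {
      val staff = List(Employee("Asha", 50000), Employee("Ravi", 65000))

      // Closure: `raise` captures the local value `pct`
      val pct = 0.10
      val raise = (e: Employee) => e.copy(salary = e.salary * (1 + pct))

      val updated = staff.map(raise)            // transform the whole list
      val total   = updated.map(_.salary).sum   // 126500.0
      println(total)
    }
    ```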
  • Spark
  • Architecture and Spark APIs
  • Spark components
  • Spark master
  • Driver
  • Executor
  • Worker
  • Significance of Spark context
  • Concept of Resilient distributed datasets (RDDs)
  • Properties of RDD
  • Creating RDDs
  • Transformations in RDD
  • Actions in RDD
  • Saving data through RDD
  • Key-value pair RDD
  • Invoking Spark shell
  • Loading a file in shell
  • Performing some basic operations on files in Spark shell
  • Spark application overview
  • Job scheduling process
  • DAG scheduler
  • RDD graph and lineage
  • Life cycle of a Spark application
  • How to choose between the different persistence levels for caching RDDs
  • Submit in cluster mode
  • Web UI – application monitoring
  • Important Spark configuration properties
  • Spark SQL overview
  • Spark SQL demo
  • SchemaRDD and DataFrames
  • Joining, Filtering and Sorting Dataset
  • Spark SQL example program demo and code walk through
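    A spark-shell sketch combines the RDD and Spark SQL topics above; it assumes the shell's predefined `spark` session, and all data values are illustrative.

    ```scala
    val sc = spark.sparkContext

    // RDD: transformations (filter/map) are lazy; actions (sum/collect) run a job
    val nums  = sc.parallelize(1 to 10)
    val evens = nums.filter(_ % 2 == 0).map(_ * 10)
    println(evens.sum())   // 300.0

    // Key-value pair RDD
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    println(pairs.reduceByKey(_ + _).collectAsMap())  // Map(a -> 4, b -> 2)

    // Spark SQL over a DataFrame registered as a temp view
    import spark.implicits._
    val df = Seq(("IN", 100), ("US", 250), ("IN", 50)).toDF("country", "amount")
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT country, SUM(amount) AS total FROM sales GROUP BY country").show()
    ```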
  • What is Kafka?
  • Cluster architecture With Hands On
  • Basic operation
  • Integration with spark
  • Integration with Camel
  • Additional Configuration
  • Security and Authentication
  • Apache Kafka With Spring Boot Integration
  • Running
  • Usecase
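    The basic Kafka operations above can be sketched with the bundled CLI tools; the broker address and topic name are illustrative and assume a running broker.

    ```shell
    # Create a topic
    kafka-topics.sh --bootstrap-server localhost:9092 \
      --create --topic clickstream --partitions 3 --replication-factor 1

    # Produce a few messages from the console
    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic clickstream

    # Consume the topic from the beginning
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic clickstream --from-beginning
    ```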
  • Introduction & Installing Splunk
  • Play with Data and Feed the Data
  • Searching & Reporting
  • Visualizing Your Data
  • Advanced Splunk Concepts
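    A one-line SPL search sketches the searching-and-reporting workflow above; the index and sourcetype names are illustrative.

    ```
    index=web sourcetype=access_combined status>=500
    | stats count AS errors BY host
    | sort -errors
    ```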
  • Introduction to NoSQL
  • What is NoSQL & NoSQL Data Types
  • System Setup Process
  • MongoDB Introduction
  • MongoDB Installation
  • DataBase Creation in MongoDB
  • ACID and the CAP Theorem
  • What is JSON and what are its features?
  • Differences between JSON and XML
  • CRUD Operations – Create , Read, Update, Delete
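    The CRUD operations above can be sketched in the MongoDB shell (mongosh); the database, collection, and document fields are illustrative and assume a running server.

    ```javascript
    use trainingdb

    // Create
    db.students.insertOne({ name: "Asha", course: "Hadoop", score: 88 })

    // Read
    db.students.find({ course: "Hadoop" })

    // Update
    db.students.updateOne({ name: "Asha" }, { $set: { score: 92 } })

    // Delete
    db.students.deleteOne({ name: "Asha" })
    ```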
  • Cassandra Introduction
  • Cassandra – Different Data Supports
  • Cassandra – Architecture in Detail
  • Cassandra’s SPOF (Single Point of Failure) & Replication Factor
  • Cassandra – Installation & Different Data Types
  • Database Creation in Cassandra
  • Tables Creation in Cassandra
  • Cassandra Database and Table Schema and Data
  • Update, Delete, Insert Data in Cassandra Table
  • Insert Data From File in Cassandra Table
  • Add & Delete Columns in Cassandra Table
  • Cassandra Collections
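    The Cassandra topics above can be sketched in CQL, from keyspace creation through collections and column changes; the keyspace, table, and column names are illustrative.

    ```sql
    CREATE KEYSPACE training
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    USE training;

    -- Table with a collection column
    CREATE TABLE users (
      user_id  int PRIMARY KEY,
      name     text,
      emails   set<text>   -- Cassandra collection type
    );

    INSERT INTO users (user_id, name, emails)
    VALUES (1, 'Asha', {'asha@example.com'});

    -- Add to a collection
    UPDATE users SET emails = emails + {'a2@example.com'} WHERE user_id = 1;

    -- Add and drop a column
    ALTER TABLE users ADD city text;
    ALTER TABLE users DROP city;

    DELETE FROM users WHERE user_id = 1;
    ```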