
Best Big Data Hadoop Training Institute


Big Data Hadoop Training in Pune by Industry Experts

Online / Offline, Weekday / Weekend Batches Available
Duration of Training: 6 weekends

Big Data Hadoop Syllabus
    Who is Hadoop for?

    IT professionals who want to move into one of the most in-demand technologies, sought by clients across almost all domains, for the reasons below:

  • Hadoop is open source, so it saves cost, and it solves Big Data problems that are very difficult or impossible to solve with the expensive tools on the market
  • It can process distributed data; there is no need to store the entire dataset in centralized storage, as other tools require.
  • Many existing tools and technologies are seeing job cuts because clients are moving towards a cheaper and more efficient solution: Hadoop
  • Industry analysts have projected around 4.4 million Big Data / Hadoop jobs in the market in the coming years.
    Why Hadoop?
  • A solution to the Big Data problem
  • Open-source technology
  • Built on open-source platforms
  • Contains several tools covering the entire ETL data-processing framework
  • It can process distributed data; there is no need to store the entire dataset in centralized storage, as SQL-based tools require.
    • Distributed computing
    • Data management – Industry Challenges
    • Overview of Big Data
    • Characteristics of Big Data
    • Types of data
    • Sources of Big Data
    • Big Data examples
    • What is streaming data?
    • Batch vs Streaming data processing
    • Overview of Analytics
    • Big data Hadoop opportunities
  • Why we need Hadoop
  • Data centers and Hadoop Cluster overview
  • Overview of Hadoop Daemons
  • Hadoop Cluster and Racks
  • Learning Linux required for Hadoop
  • Hadoop ecosystem tools overview
  • Understanding Hadoop configuration and installation
  • HDFS
  • HDFS Daemons – Namenode, Datanode, Secondary Namenode
  • Hadoop FS and processing environment UIs
  • Fault Tolerance
  • High Availability
  • Block Replication
  • How to read and write files
  • Hadoop FS shell commands
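    A short transcript sketches the HDFS shell commands covered here; it assumes a running Hadoop cluster, and the paths and file names are illustrative.

    ```shell
    # List the contents of an HDFS directory
    hdfs dfs -ls /user/training

    # Copy a local file into HDFS
    hdfs dfs -put sales.csv /user/training/sales.csv

    # Read a file back from HDFS
    hdfs dfs -cat /user/training/sales.csv | head

    # Check block and replication information for a file (fault tolerance in action)
    hdfs fsck /user/training/sales.csv -files -blocks

    # Remove a file
    hdfs dfs -rm /user/training/sales.csv
    ```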
  • YARN
  • YARN Daemons – Resource Manager, NodeManager etc.
  • Job assignment & Execution flow
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
  • Understand Difference Between Block and InputSplit
  • Role of RecordReader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of a MapReduce Job
  • File Input/Output Formats in MapReduce Jobs
  • Text Input Format
  • Key Value Input Format
  • Sequence File Input Format
  • NLine Input Format
  • Joins
  • Map-side Joins
  • Reducer-side Joins
  • Word Count example (or Election Vote Count)
  • Covers five to ten MapReduce examples with real-time data.
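    The map, shuffle/sort, and reduce phases listed above can be sketched as a plain-Python simulation of word count; this mirrors the data flow only, not the Hadoop Java API, and all names are illustrative.

    ```python
    from collections import defaultdict

    def map_phase(lines):
        """Mapper: emit a (word, 1) pair for every word in every input line."""
        for line in lines:
            for word in line.split():
                yield (word.lower(), 1)

    def shuffle_phase(pairs):
        """Shuffle/sort: group values by key, as the framework does between phases."""
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        """Reducer: sum the counts emitted for each word."""
        return {word: sum(counts) for word, counts in groups.items()}

    lines = ["the quick brown fox", "the lazy dog", "the fox"]
    counts = reduce_phase(shuffle_phase(map_phase(lines)))
    print(counts["the"])  # 3
    print(counts["fox"])  # 2
    ```

    In real Hadoop the same three roles are played by the Mapper class, the framework's shuffle, and the Reducer class.
    
    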
  • Data warehouse basics
  • OLTP vs OLAP Concepts
  • Hive
  • Hive Architecture
  • Metastore DB and Metastore Service
  • Hive Query Language (HQL)
  • Managed and External Tables
  • Partitioning & Bucketing
  • Query Optimization
  • Hiveserver2 (Thrift server)
  • JDBC / ODBC connections to Hive
  • Hive Transactions
  • Hive UDFs
  • Working with Avro Schema and AVRO file format
  • Hands-on with multiple real-time datasets
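    An HQL sketch ties the Hive topics together: a managed table with partitioning and bucketing, a partition load, and a partition-pruned query. Table and column names are illustrative.

    ```sql
    -- Managed table, partitioned by country and bucketed by user_id
    CREATE TABLE sales (
      user_id INT,
      amount  DOUBLE,
      ts      STRING
    )
    PARTITIONED BY (country STRING)
    CLUSTERED BY (user_id) INTO 8 BUCKETS
    STORED AS ORC;

    -- Load one partition from a staging table
    INSERT INTO TABLE sales PARTITION (country = 'IN')
    SELECT user_id, amount, ts FROM staging_sales WHERE country = 'IN';

    -- The partition filter lets Hive scan only the matching partition
    SELECT user_id, SUM(amount) AS total
    FROM sales
    WHERE country = 'IN'
    GROUP BY user_id;
    ```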
  • Apache Pig
  • Advantage of Pig over MapReduce
  • Pig Latin (Scripting language for Pig)
  • Schema and Schema-less data in Pig
  • Structured and semi-structured data processing in Pig
  • Pig UDFs
  • HCatalog
  • Pig vs Hive Use case
  • Hands-on: two more daily-use-case data-analysis examples (Google data), plus analysis of a date-time dataset
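    A minimal Pig Latin script sketches the scripting style this module covers; the file paths are illustrative.

    ```pig
    -- Word count in Pig Latin
    lines   = LOAD '/user/training/input.txt' AS (line:chararray);
    words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    grouped = GROUP words BY word;
    counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
    STORE counts INTO '/user/training/wordcount_out';
    ```

    The same job takes one Java class plus driver code in raw MapReduce, which is the advantage of Pig mentioned above.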
  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase Data Model
  • Table and Row
  • Column Family and Column Qualifier
  • Cell and its Versioning
  • Categories of NoSQL Databases
  • Key-Value Database
  • Document Database
  • Column Family Database
  • HBASE Architecture
  • HMaster
  • Region Servers
  • Regions
  • MemStore
  • Store
  • SQL vs. NOSQL
  • How HBase differs from an RDBMS
  • HDFS vs. HBase
  • Client-side buffering or bulk uploads
  • HBase Designing Tables
  • HBase Operations
  • Get
  • Scan
  • Put
  • Delete
  • Live Dataset
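    The Get/Scan/Put/Delete operations above can be sketched as an HBase shell session; it assumes a running HBase, and the table and column names are illustrative.

    ```shell
    create 'users', 'info'                       # table with one column family
    put 'users', 'row1', 'info:name', 'Asha'     # Put: write one cell
    put 'users', 'row1', 'info:city', 'Pune'
    get 'users', 'row1'                          # Get: read a single row
    scan 'users', {COLUMNS => ['info:name']}     # Scan: iterate over rows
    delete 'users', 'row1', 'info:city'          # Delete: remove one cell
    ```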
  • Sqoop
  • Sqoop commands
  • Sqoop practical implementation
  • Importing data to HDFS
  • Importing data to Hive
  • Exporting data to RDBMS
  • Sqoop connectors
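    The import/export flows above can be sketched with the Sqoop CLI; the JDBC URL, credentials, and table names are illustrative and assume a reachable MySQL database.

    ```shell
    # Import an RDBMS table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table orders \
      --target-dir /user/train/orders \
      --num-mappers 4

    # Import directly into a Hive table
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table orders \
      --hive-import --hive-table retail.orders

    # Export HDFS data back to an RDBMS table
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/retail \
      --username train --password-file /user/train/.pw \
      --table order_summary \
      --export-dir /user/train/summary
    ```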
  • Flume
  • Flume commands
  • Configuration of Source, Channel and Sink
  • Fan-out flume agents
  • How to load data into Hadoop that is coming from a web server or other storage
  • How to load streaming Twitter data into HDFS using Flume
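    A Flume agent configuration sketch shows the source, channel, and sink wiring described above; the agent name, log path, and HDFS path are illustrative.

    ```properties
    # Tail a web-server log into HDFS
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # Source: tail the access log
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/httpd/access_log
    agent1.sources.src1.channels = ch1

    # Channel: in-memory buffer between source and sink
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # Sink: write events into date-partitioned HDFS directories
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /flume/weblogs/%Y-%m-%d
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.channel = ch1
    ```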
  • Oozie
  • Action Node and Control Flow node
  • Designing workflow jobs
  • How to schedule jobs using Oozie
  • How to schedule time-based jobs (Oozie coordinators)
  • Oozie Conf file
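    A minimal workflow definition sketches the action and control-flow nodes listed above: one Hive action, with ok/error transitions to end and kill nodes. The workflow name, script, and properties are illustrative.

    ```xml
    <workflow-app name="daily-etl" xmlns="uri:oozie:workflow:0.5">
      <start to="run-hive"/>
      <action name="run-hive">
        <hive xmlns="uri:oozie:hive-action:0.5">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <script>etl.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Hive action failed</message>
      </kill>
      <end name="end"/>
    </workflow-app>
    ```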
  • Scala
  • Syntax, data types, and variables
  • Classes and Objects
  • Basic Types and Operations
  • Functional Objects
  • Built-in Control Structures
  • Functions and Closures
  • Composition and Inheritance
  • Scala’s Hierarchy
  • Traits
  • Packages and Imports
  • Working with Lists, Collections
  • Abstract Members
  • Implicit Conversions and Parameters
  • For Expressions Revisited
  • The Scala Collections API
  • Extractors
  • Modular Programming Using Objects
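    A small Scala sketch touches several of the topics above: a case class, list transformations, and a closure. All names and values are illustrative.

    ```scala
    case class Employee(name: String, salary: Double)

    object Demo extends App {
      val staff = List(Employee("Asha", 50000), Employee("Ravi", 65000))

      // Closure: `raise` captures the local value `pct`
      val pct = 0.10
      val raise = (e: Employee) => e.copy(salary = e.salary * (1 + pct))

      val updated = staff.map(raise)            // transform the whole list
      val total   = updated.map(_.salary).sum   // 126500.0
      println(total)
    }
    ```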
  • Spark
  • Architecture and Spark APIs
  • Spark components
  • Spark master
  • Driver
  • Executor
  • Worker
  • Significance of Spark context
  • Concept of Resilient distributed datasets (RDDs)
  • Properties of RDD
  • Creating RDDs
  • Transformations in RDD
  • Actions in RDD
  • Saving data through RDD
  • Key-value pair RDD
  • Invoking Spark shell
  • Loading a file in shell
  • Performing some basic operations on files in Spark shell
  • Spark application overview
  • Job scheduling process
  • DAG scheduler
  • RDD graph and lineage
  • Life cycle of a Spark application
  • How to choose between the different persistence levels for caching RDDs
  • Submit in cluster mode
  • Web UI – application monitoring
  • Important Spark configuration properties
  • Spark SQL overview
  • Spark SQL demo
  • SchemaRDD and DataFrames
  • Joining, Filtering and Sorting Dataset
  • Spark SQL example program demo and code walk through
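    A spark-shell sketch combines the RDD and Spark SQL topics above; it assumes the shell's predefined `spark` session, and all data values are illustrative.

    ```scala
    val sc = spark.sparkContext

    // RDD: transformations (filter/map) are lazy; actions (sum/collect) run a job
    val nums  = sc.parallelize(1 to 10)
    val evens = nums.filter(_ % 2 == 0).map(_ * 10)
    println(evens.sum())   // 300.0

    // Key-value pair RDD
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    println(pairs.reduceByKey(_ + _).collectAsMap())  // Map(a -> 4, b -> 2)

    // Spark SQL over a DataFrame registered as a temp view
    import spark.implicits._
    val df = Seq(("IN", 100), ("US", 250), ("IN", 50)).toDF("country", "amount")
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT country, SUM(amount) AS total FROM sales GROUP BY country").show()
    ```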
  • What is Kafka?
  • Cluster architecture With Hands On
  • Basic operation
  • Integration with spark
  • Integration with Camel
  • Additional Configuration
  • Security and Authentication
  • Apache Kafka With Spring Boot Integration
  • Running
  • Usecase
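    The basic Kafka operations above can be sketched with the bundled CLI tools; the broker address and topic name are illustrative and assume a running broker.

    ```shell
    # Create a topic
    kafka-topics.sh --bootstrap-server localhost:9092 \
      --create --topic clickstream --partitions 3 --replication-factor 1

    # Produce a few messages from the console
    kafka-console-producer.sh --bootstrap-server localhost:9092 --topic clickstream

    # Consume the topic from the beginning
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic clickstream --from-beginning
    ```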
  • Introduction & Installing Splunk
  • Play with Data and Feed the Data
  • Searching & Reporting
  • Visualizing Your Data
  • Advanced Splunk Concepts
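    A one-line SPL search sketches the searching-and-reporting workflow above; the index and sourcetype names are illustrative.

    ```
    index=web sourcetype=access_combined status>=500
    | stats count AS errors BY host
    | sort -errors
    ```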
  • Introduction to NoSQL
  • What is NoSQL & NoSQL Data Types
  • System Setup Process
  • MongoDB Introduction
  • MongoDB Installation
  • DataBase Creation in MongoDB
  • ACID and the CAP Theorem
  • What is JSON and what are its features?
  • Differences between JSON and XML
  • CRUD Operations – Create , Read, Update, Delete
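    The CRUD operations above can be sketched in the MongoDB shell (mongosh); the database, collection, and document fields are illustrative and assume a running server.

    ```javascript
    use trainingdb

    // Create
    db.students.insertOne({ name: "Asha", course: "Hadoop", score: 88 })

    // Read
    db.students.find({ course: "Hadoop" })

    // Update
    db.students.updateOne({ name: "Asha" }, { $set: { score: 92 } })

    // Delete
    db.students.deleteOne({ name: "Asha" })
    ```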
  • Cassandra Introduction
  • Cassandra – Different Data Supports
  • Cassandra – Architecture in Detail
  • Cassandra’s SPOF (Single Point of Failure) & Replication Factor
  • Cassandra – Installation & Different Data Types
  • Database Creation in Cassandra
  • Tables Creation in Cassandra
  • Cassandra Database and Table Schema and Data
  • Update, Delete, Insert Data in Cassandra Table
  • Insert Data From File in Cassandra Table
  • Add & Delete Columns in Cassandra Table
  • Cassandra Collections
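    The Cassandra topics above can be sketched in CQL, from keyspace creation through collections and column changes; the keyspace, table, and column names are illustrative.

    ```sql
    CREATE KEYSPACE training
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    USE training;

    -- Table with a collection column
    CREATE TABLE users (
      user_id  int PRIMARY KEY,
      name     text,
      emails   set<text>   -- Cassandra collection type
    );

    INSERT INTO users (user_id, name, emails)
    VALUES (1, 'Asha', {'asha@example.com'});

    -- Add to a collection
    UPDATE users SET emails = emails + {'a2@example.com'} WHERE user_id = 1;

    -- Add and drop a column
    ALTER TABLE users ADD city text;
    ALTER TABLE users DROP city;

    DELETE FROM users WHERE user_id = 1;
    ```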