• Software Training and Placement Center

Big Data Hadoop and Spark Full Stack Specialization Program

  • Overview
  • Course Highlights
  • Pre-requisites and Eligibility
  • Syllabus
  • Audience for this course
  • Batches
  • Mode of Training
  • Big Data Certification
  • Key Features

Overview

Hadoop is open-source software from Apache for storing and processing Big Data. It stores large volumes of data in a distributed, fault-tolerant manner on commodity hardware, and Hadoop tools then process that data in parallel over the Hadoop Distributed File System (HDFS).
Many large companies have recognized the benefits of Big Data analytics, so demand for Big Data professionals is high. Major companies look for Big Data Hadoop and Spark experts who know Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Hive, Pig, Oozie, Sqoop, Kafka, and Flume.

The Big Data Hadoop Training offers:

  • Comprehensive knowledge of the components of the Hadoop ecosystem, such as HDFS, MapReduce, Hive, Impala, Sqoop, Flume, Kafka, NiFi, Spark, Oozie, and HBase
  • The capability to ingest structured, semi-structured, and unstructured data into HDFS using Sqoop and Flume, and to analyze the large datasets stored in HDFS

Course Highlights

  • Introduction to Big Data & Hadoop Fundamentals
  • Overview of Big Data & HDFS Concepts along with Linux Commands
  • Linux Scripting in detail
  • Understanding of Hadoop ecosystem architecture and components
  • Introduction to MapReduce, YARN, Hive, HBase, Sqoop, Flume, Kafka
  • Deep dive into Hive
  • Apache Sqoop - Moving Data into Hadoop (and Vice Versa)
  • Apache Hive Basics
  • Advanced Hive & Components - Part 1
  • Advanced Hive & Components - Part 2
  • Hue, Ambari, Cloudera Manager
  • Intro to NoSQL – HBase, Phoenix
  • Oozie (Workflow Orchestration)
  • Learning Python
  • Learning Scala - A Guide to Functional Programming
  • Learning Apache Spark - General Purpose Cluster Computing Framework
  • Apache Spark use cases in depth
  • Spark Streaming and advanced streaming
  • Learning Apache Kafka, NiFi
  • Big Data on cloud
  • CI/CD Pipeline (GitHub, Maven, & Jenkins)
  • Real Time Projects on Big Data

Pre-requisites and Eligibility

There are no prerequisites for the Big Data Hadoop and Spark course. A basic understanding of computer science and basic knowledge of Linux commands and SQL are helpful but not mandatory; the course covers Linux commands, scripting, and SQL in detail.

Syllabus

  • Topic 1: Overview of Big Data & HDFS Concepts along with Linux Commands
    • Big Data Intro
    • HDFS Architecture
    • Hadoop Setup
    • Linux commands
    • HDFS commands
    • Quiz
    • Assessment
  • Topic 2: Linux Scripting in detail
    • History
    • Architecture
    • Development Commands
    • Env Variables
    • File Management
    • Directory Management
    • Admin Commands
    • Advanced Commands
    • Shell Scripting
    • Group and User Management
    • Permissions
    • Important directory structure
    • Disk utilities
    • Compression Techniques
    • Misc Commands
    • Kernel, Shell
    • Terminal, SSH, GUI
    • Automation & Scripting in Linux
    • Hands On Exercises
    • Quiz
    • Assessment
  • Topic 3: Deep Dive in Hadoop
    • What is Hadoop?
    • Evolution of Hadoop
    • Features of Hadoop
    • Characteristics of Hadoop
    • Hadoop compared with Traditional Dist. Systems
    • When to use Hadoop
    • Limitations of Hadoop
    • Components of Hadoop (HDFS, MapReduce, YARN)
    • Hadoop Architecture
    • Daemons in Hadoop Version 1 & 2
    • How Data is stored in Hadoop (Cluster, Datacenter, Split, Block, Rack Awareness, Replication, Heartbeat)
    • Hadoop 1.0 Limitation
    • NameNode High Availability
    • Quiz
    • Assessment
  • Topic 4: MapReduce - Distributed Computing Framework
    • MapReduce Intro
    • MapReduce - Theory / Depth
    • MapReduce Programming concept
    • MapReduce Practicals
    • Different types of files supported (Text, Sequence, Map and Avro)
    • MapReduce Job submission in YARN Cluster in detail
    • Tweaking mappers and reducers
    • MapReduce package and deployment
    • Quiz
    • Assessment
  • Topic 5: Apache Sqoop - Moving Data into Hadoop (and Vice Versa)
    • Sqoop Fundamentals
    • Sqoop Exercises
    • Sqoop Export
    • Sqoop Incremental
    • Sqoop Job
    • Sqoop Merge
    • Best practices & performance tuning
    • Sqoop Use cases
    • Quiz
    • Assessment
  • Topic 6: Apache Hive Basics
    • Introduction to Apache Hive
    • Understanding Apache Hive
    • Hive Practical
    • Hive: More to Know
    • Quiz
  • Topic 7: Advanced Hive & Components - Part 1
    • Hive Definition Level Optimizations : Theory
    • Hive Definition Level Optimizations : Practical
    • Hive Query Level Optimizations : Theory
    • Hive Query Level Optimizations : Practical
    • Hive Windowing Functions
    • Hive Ranking
    • Hive Sorting
    • Quiz
    • Assessment
  • Topic 8: Advanced Hive & Components - Part 2
    • Hive File Format
    • Hive File Format - Practicals
    • Hive Compression Techniques
    • Hive Vectorization & Changing the Hive Engine
    • Hive Thrift Server
    • Hive MSCK Repair
    • Hive Miscellaneous
    • Hive Optimization Techniques REWIND
    • Hive SCD
    • Hive Sqoop, HBase Integration
    • Hive Schema evolution (AVSC) use cases using AVRO dataset
    • Quiz
    • Assessment
  • Topic 9: Hue, Ambari, Cloudera Manager
    • Introduction
    • Cluster formation guide and implementation
    • Deployment in Cloud
    • Full Visibility into Cluster Health
    • Metrics & Dashboards
    • Heat Maps
    • Configurations
    • Services, Alerts, Admin activities
    • Provisioning, Managing and Monitoring Hadoop Clusters
    • Hue Introduction
    • Access Hive
    • Query executor
    • Data browser
    • Access Hive, HCatalog, Oozie, File Browser
    • Hortonworks/Cloudera
    • Cluster Design
    • Different nodes (Gateway, Ingestion, Edge)
    • System consideration
    • Commands (fsck, job, dfsadmin, distcp, balancer)
    • Schedulers in RM (Capacity, Fair, FIFO)
    • View all services in Ambari & Cloudera Manager
  • Topic 10: Intro to NoSQL - HBase
    • HBase Basics
    • CAP Theorem
    • HBase Architecture
    • HBase Practicals
    • Storage Hierarchy – Characteristics
    • Table Design
    • HMaster & Regions
    • Region Server & Zookeeper
    • Inside Region Server (Memstore, Blockcache, HFile, WAL)
    • Minor/Major Compactions
    • Role of Zookeeper
    • HBase Shell
    • Introduction to Filters
    • Row Key Design
    • Performance Tuning
    • Cassandra Overview
    • Integration with Hive
    • Integration with Hadoop (Mini Project)
    • Quiz
    • Assessment
  • Topic 11: Phoenix
    • Overview of Phoenix
    • Introduction
    • Architecture
    • History
    • Phoenix HBase Integration
    • HBase table, view creation
    • SQL & UDFs
    • SQL Line & PSQL Line of Phoenix
    • Phoenix Load & Query engine
    • Understanding coprocessor Configurations
    • Hive -> HBase -> Phoenix integration
    • Creation of views in Phoenix
    • Load bulk data using psql
    • Server log Aggregation use case
  • Topic 12: Oozie (Workflow Orchestration)
    • Introduction
    • History - Why Oozie
    • Components
    • Architecture
    • Workflow Engine
    • Nodes
    • Workflow
    • Coordinator
    • Action (MapReduce, Hive, Spark, Shell & Sqoop)
    • Introduction to Bundle
    • Email Notification
    • Error Handling
    • Installation
    • Workouts
    • Orchestration of end to end tools
    • Scheduling of data pipeline
    • Invoking shell script, Sqoop, Hive & Spark
  • Topic 13: Python
    • Python Introduction
    • Evolution
    • Application
    • Features
    • Installation & Configuration
    • Objectives
    • Flow Control
    • Variables
    • Data types
    • Functions
    • Modules
    • OOPS
    • Python for Spark
    • Structures
    • Collection types
    • Looping Constructs
    • Dictionary & Tuples
    • File I/O
  • Topic 14: Learning Scala - A Guide to Functional Programming
    • Scala and Spark Setup
    • Scala Basics
    • Scala Functional Programming
    • Scala Object Oriented Sessions
    • Quiz
    • Assessment
  • Topic 15: Apache Spark - General Purpose Cluster Computing Framework
    • Scala Interview Prep Series
    • Spark Fundamental Theory
    • Spark Fundamental Practical
    • Quiz
    • Assessment
  • Topic 16: YARN
    • Introduction to YARN
    • YARN Architecture
    • YARN Components
    • YARN Long-lived & Short-lived Daemons
    • YARN Schedulers
    • Job Submission under YARN
    • Multi-tenancy support of YARN
    • YARN High Availability
    • YARN Fault tolerance handling
    • MapReduce job submission using YARN
    • YARN UI
    • History Server
    • YARN Dynamic allocation
    • Containerization of YARN
    • Quiz
    • Assessment
  • Topic 17: Apache Spark Use Cases in Depth
    • Spark Real-Time Examples
    • Spark Shared Variables
    • YARN Rewind
    • Spark on YARN Architecture
    • Spark in depth
    • Quiz
    • Assessment
  • Topic 18: Spark Structured API - Part 1
    • Spark in depth continued
    • Spark DataFrames, DataSets
    • Quiz
    • Assessment
  • Topic 19: Spark Structured API - Part 2
    • Spark in depth continued
    • Quiz
    • Assessment
  • Topic 20: Spark Performance Tuning - Part 1
    • Spark Performance Tuning
    • Quiz
    • Assessment
  • Topic 21: Spark Performance Tuning - Part 2
    • Spark Broadcast Join With Low level API (RDD)
    • Spark Broadcast Join With Structured API (DataFrames)
    • Spark Submit: Client Mode vs Cluster Mode
    • Spark Join Optimizations
    • Spark Advanced Optimization: Sort Aggregate vs Hash Aggregate
    • Spark Catalyst, Tungsten, AST Optimizer
    • Spark Connecting to External Source
    • Quiz
  • Topic 22: Spark Streaming
    • Spark Real Time Processing
    • Understanding Discretized Stream (DStream) in Spark Streaming
    • Stream Processing in Spark - Word Count Example
    • Understanding Stateless and Stateful Transformations in Spark Streaming
    • Stateless Transformation - Word Count Example Using Eclipse IDE
    • Stateful Transformation - Word Count Example
    • Working with Sliding Windows
    • Quiz
  • Topic 23: Spark Advanced Streaming - Structured
    • Spark Structured Streaming - Part1
    • Spark Structured Streaming - Part2
    • Spark Structured Streaming - Part3
    • Quiz
  • Topic 24: Apache Kafka
    • Kafka Introduction
    • Applications, Cluster Setup
    • Broker fault tolerance
    • Architecture
    • Components
    • Partitions & Replication
    • Distribution of messages
    • Producer & Consumer workload distribution
    • Topics management
    • Brokers
    • Installation
    • Workouts
    • Console Publishing
    • Console Consuming
    • Topic options
    • Offset Management
    • Cluster deployment in cloud
  • Topic 25: NiFi
    • NiFi Introduction
    • Core Components
    • Architecture
    • NiFi Installation & Configuration
    • Fault tolerance
    • Data provenance, routing, mediation & transformation
    • NiFi -> Kafka -> Spark integration
    • Workouts
    • Scheduling
    • Real time streaming
    • Kafka producer & consumer
    • File streaming with HDFS integration
    • Data provenance
    • Packaging NiFi templates
    • Rest API Integration
    • Twitter data capture
    • Quiz
    • Assessment
  • Topic 26: Big Data on Cloud Part 1 (AWS S3, EMR, Athena+Glue)
    • Introduction to Cloud Computing And Running Spark Code on AWS EMR
    • Fundamentals of AWS for Big Data Developers
    • AWS Storage, Networking & CLI
    • AWS EMR: Launch an EMR Cluster Using Advanced Options
    • AWS Athena Session-1
    • AWS Athena Session-2
    • AWS Athena with Glue Session-3
    • Quiz
  • Topic 27: Big Data on Cloud Part 2 (Redshift, Glue, Airflow)
    • Database vs Datawarehouse vs Data lake
    • AWS Redshift Sessions
    • AWS Glue
    • Apache Airflow
    • Apache Airflow - Workflow Management Platform
    • Airflow Fundamentals Sessions
    • Airflow Practical Pipeline Sessions
  • Topic 28: CI/CD Pipeline (GitHub, Maven, & Jenkins)
    • DevOps Basics
    • Versioning
    • Create and use a repository
    • Start and manage a new branch
    • Make changes to a file and push them to GitHub as commits
    • Open and merge a pull request
    • Create Story boards
    • Desktop integration
    • Maven integration with Git
    • Create project in Maven
    • Add Scala nature
    • Maven operations
    • Adding and updating POM
    • Managing dependencies with the Maven repository
    • Building and installing Maven
    • Maven fat & lean jar build with submit
  • Topic 29: Real Time Project classes
    Real Time Projects on Big Data that are diverse in nature, covering data sets from multiple domains such as banking, healthcare, telecommunications, social media, insurance, and e-commerce.
  • Topic 30: Real Time Projects on Big Data
    • Project Statement
    • Dataset
    • Architectural Diagram and Solution
    • Task segregation
    • Project Demo Session
    • Code and snippets
    • Documentation
  • Topic 31: Interview Questions
  • Topic 32: Mock Interview and Mock Interview Answers
  • Topic 33: Resume Prep session and sample resumes

Audience for this course:

Administrators, Java developers, data analysts, database administrators, system architects, IT managers, IT administrators and operators, IT systems engineers, data engineers, data analytics administrators, cloud and web engineers, project managers, software architects, ETL and data warehousing professionals, business intelligence professionals, senior IT professionals, testing professionals, mainframe professionals, and graduates looking to build a career in the Big Data field.

Mode of Training

  • Classroom Training
  • Online Instructor-Led Live Training
  • Online Video Recorded Sessions Training

Weekday batch

  • Classroom Training @ Anna Nagar & OMR
  • Online Instructor-Led Training for Other Locations
  • Online Video Recorded Training Sessions for Other Locations

Weekend batch

  • Classroom Training @ Anna Nagar & OMR
  • Online Instructor-Led Training for Other Locations
  • Online Video Recorded Training Sessions for Other Locations

Fast-track batch

  • Classroom Training @ Anna Nagar & OMR
  • Online Instructor-Led Training for Other Locations
  • Online Video Recorded Training Sessions

Big Data Certification

Hiring companies look for certified Big Data Hadoop professionals. Our certification-oriented Big Data & Hadoop training helps you seize this opportunity and accelerate your career. We offer guidance for online Hadoop professional certification.
Get certified with: Cloudera CCP Data Engineer Exam (DE575), CCA Spark and Hadoop Developer (CCA 175), Databricks Certified Associate Developer for Apache Spark 3.0, Databricks Certified Professional Data Scientist, and HDP Certified Developer (HDPCD).

Key Features

  • Real-Time Projects on Big Data
  • 200+ Hours Course Duration
  • 100% Placement guarantee
  • Job Oriented Training
  • Hands-on with 30+ Case Studies
  • Fast track placement mode
  • Managed and mastered by highly skilled Industry Experts
  • Both Classroom Training and Online Training
  • Complete Career Guidance
  • Placement in Top MNCs
  • Online Professional Certification: Cloudera / Hortonworks / Databricks
  • 24/7 Support, 365 days a year
Call us