About the program
In this course, you will learn to:
- Explain Big Data and the ETL process
- Understand MapReduce and the Hadoop ecosystem
- Understand source-to-target mapping documents
- Apply different testing strategies and methods with varying tools
- Find mechanisms to test any business rule
- Determine appropriate sample sizes and data permutations
- Understand and apply automation testing for Big Data processes
Audience:
Testers, developers, managers, or anyone who is interested or involved in providing software quality for Big Data processes. Basic RDBMS and SQL knowledge is required; you may find a supportive workshop on our portal.
Contact Us
Call us: +1-800-543-5571
Mail us: training@infotek-solutions.com
Course curriculum
- Introduction
- Data concept
- Data formats
- Plain text, XML, JSON
- CSV, Tabular, RDBMS
- ETL Concept
- Extract
- Transform
- Load
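As a preview of the three ETL steps listed above, here is a minimal sketch in Python; all names and the tiny CSV sample are hypothetical, and a real pipeline would read from and write to actual data stores.

```python
# Extract: raw CSV lines standing in for a source file.
csv_rows = ["id,amount", "1,10.5", "2,20.0"]

def extract(lines):
    # Parse the header, then turn each data line into a dict.
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]

def transform(records):
    # Cast the amount column to float and add a derived column.
    return [
        {**r, "amount": float(r["amount"]),
         "amount_cents": int(float(r["amount"]) * 100)}
        for r in records
    ]

def load(records, target):
    # Stand-in for inserting rows into a warehouse table.
    target.extend(records)

warehouse = []
load(transform(extract(csv_rows)), warehouse)
print(warehouse[0]["amount_cents"])  # 1050
```

An ETL tester validates each of these three stages separately, which is why the curriculum later splits testing points along the same boundaries.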
- Big Data Concept/pipeline/architecture
- Data source
- MapReduce
- Data Warehouse
- Data Mart
- BI
- MapReduce Concept
- Mapping data to key, value pair
- Reducing the mapped data
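The two MapReduce phases above can be illustrated with the classic word-count example; this is a single-process sketch of the idea, not how Hadoop distributes the work across nodes.

```python
from collections import defaultdict

def map_phase(lines):
    # Map each word to a (key, value) pair: (word, 1).
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Group the pairs by key, then sum the values for each key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(map_phase(["big data", "big deal"]))
print(counts["big"])  # 2
```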
- Hadoop MapReduce Architecture
- Master Node
- Clustering data to nodes
- Creating jobs and distributing tasks to nodes
- Tracking Jobs through nodes
- Slave nodes
- Tracking their own tasks
- Mapping data
- Reducing data with other nodes
- Terminologies
- Payload, Mapper, Job, Task
- NameNode, DataNode, MasterNode
- JobTracker, TaskTracker, Task Attempt
- Roles of business intelligence
- BI program manager
- BI data architect, BI ETL architect, BI technical architect
- BI metadata manager, BI administrator
- Business Intelligence vs. Business Analytics
- Big Data and Data Mining
- Transactional vs. Analytical Databases vs. Big Data Stores
- Properties
- Infrastructure
- Data
- Validation tools
- Big data scenarios need different tools to:
- Manage distributed/clustered file system
- Manage nodes and the network
- Script for different jobs and tasks
- Track jobs running on each node
- Store mapped and reduced data
- Improve performance and reliability
- Present data in graphs, tables, ...
- A complete Hadoop Ecosystem
- Ambari, HDFS, HBase, Hive
- Sqoop, Pig, ZooKeeper
- NoSQL: MongoDB, Redis, CouchDB, Riak
- Mahout, Lucene/Solr, Avro, Oozie
- GIS tools for Hadoop on Github
- Flume, SQL on Hadoop, Clouds
- Spark: similar to Hadoop, but processes data in memory
- Processing Big Data using Hadoop
- Big data testing cycle
- Understand the business
- Understand the Big data process
- Prepare test plan
- Define test scenarios
- Write test cases
- Set up the test environment
- Execute test cases and generate defects
- Report and close test cycle
- Test Planning
- Test List
- Resource Estimation
- Prioritizing
- Scheduling
- Defect workflow
- Big data testing points
- Validate input data
- Validate staging process
- Validate mapping process
- Validate reducing process
- Validate output data
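The validation points above share one recurring check: no records should silently disappear between stages. A minimal sketch of such a stage-count check, with hypothetical counts collected during a test run:

```python
# Hypothetical record counts captured at each pipeline stage.
# Reducing legitimately shrinks the record count, so only the
# stage pairs that should preserve counts are compared.
stage_counts = {"input": 1000, "staging": 1000,
                "mapped": 1000, "reduced": 120, "output": 120}

def validate_counts(counts):
    issues = []
    if counts["staging"] != counts["input"]:
        issues.append("records lost between input and staging")
    if counts["mapped"] != counts["staging"]:
        issues.append("records lost during mapping")
    if counts["output"] != counts["reduced"]:
        issues.append("records lost between reduce and output")
    return issues

print(validate_counts(stage_counts))  # [] -> no issues
```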
- Architecture testing for
- Performance improvement
- Failover avoidance
- Performance testing
- Points
- Throughput test
- CPU, Memory, Network, Storage IO test
- Job compilation time test
- Component test
- Performance testing approach
- Set up big data
- Design work load
- Prepare nodes
- Execute, analyse and tune components
- Optimize the configuration
- Performance testing parameters
- Data storage
- Commit Logs
- Concurrency threads
- Caching
- Timeouts
- JVM
- MapReduce performance
- Messaging queue
- Watch for bottlenecks during the performance test
- ETL testing points
- Validate input data
- Validate extraction process
- Validate transformation process
- Validate loading process
- Validate output data
- Testing techniques
- Visual Compare
- Record Counts
- Minus Queries
- Automation test
- Basic test methods
- Visual test
- Source to Target record comparison
- Source to Target count
- Minus Queries
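The minus-query method listed above finds rows present in the source but missing from the target (and vice versa), typically with SQL `MINUS`/`EXCEPT`. The same idea can be sketched in Python with set differences; the row data here is hypothetical.

```python
# Rows as tuples, standing in for the result sets of two queries.
source = {(1, "Alice"), (2, "Bob"), (3, "Carol")}
target = {(1, "Alice"), (3, "Carol"), (4, "Dave")}

# Source MINUS target: rows the ETL dropped.
missing_in_target = source - target
# Target MINUS source: rows the ETL should not have produced.
extra_in_target = target - source

print(sorted(missing_in_target))  # [(2, 'Bob')]
print(sorted(extra_in_target))    # [(4, 'Dave')]
```

A simple record-count comparison (`len(source) == len(target)`) is faster but weaker: equal counts can still hide one dropped row plus one extra row, which is exactly what the minus query catches.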
- Automation for Big data testing
- Testing incremental loads
- Database level test
- Schema test
- Transpose test
- Integrity test
- Views test
- Table level test
- Row count test
- Column count test
- Row duplicate and null value test
- Columns data type test
- Table merging and splitting test
- Table constraints test
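Two of the table-level tests above, duplicate-row detection and null-value detection, can be sketched in a few lines; the sample rows are hypothetical stand-ins for a loaded target table.

```python
from collections import Counter

# Hypothetical target table rows as tuples.
rows = [(1, "a"), (2, None), (1, "a"), (3, "c")]

# Duplicate test: any row appearing more than once.
dup_rows = [row for row, n in Counter(rows).items() if n > 1]

# Null test: any row containing a None value.
null_rows = [row for row in rows if any(v is None for v in row)]

print(dup_rows)   # [(1, 'a')]
print(null_rows)  # [(2, None)]
```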
- Column level test
- Merging and splitting test
- Calculated and derived column test
- Transpose test
- Column constraints test
- Record level test
- Atomic test
- Visual test
- Data format test
- Record constraints test
- Transpose test
- Data casting and rounding test
- Field merging and splitting test
- Test Strategy
- Data Permutations
- Test Data Sampling
- Test Points
- Leveraging Test Tools
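Test data sampling, mentioned above, usually means checking a reproducible subset rather than every record. A minimal sketch, where the fraction, seed, and population are all hypothetical choices:

```python
import random

def sample_records(records, fraction=0.1, seed=42):
    # A fixed seed keeps the sample deterministic, so a failing
    # record can be reproduced in a later test run.
    rng = random.Random(seed)
    k = max(1, int(len(records) * fraction))
    return rng.sample(records, k)

population = list(range(1000))
sample = sample_records(population, fraction=0.05)
print(len(sample))  # 50
```

The right fraction depends on the data permutations the test must cover; rare permutations may need targeted sampling on top of the random subset.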
- Test environment needs
- Storage space
- Distributed clusters
- Minimum CPU and memory utilization
- Big data Bugs
- Missing Data
- Truncation
- Type Mismatch
- Null Translation
- Misplaced Data
- Miscalculation
- Extra records
- Logic Issues
- Duplicate Records
- Precision
- Sequence
- Rejected Rows
- Undocumented Requirements
- Tools for big data testing
- QuerySurge for big data testing
Meet your mentor
About Trainer Rahul:
- 7 years of IT experience in software testing, quality assurance and quality management.
- Experienced in leading and managing medium to large testing teams.
- Have extensively trained participants in the areas of Software Testing Concepts, Quality Assurance, Quality Center, QTP, LoadRunner, Bugzilla, JIRA and Selenium.
- Testing process owner at the organizations I have worked for.
- Have mentored resources and helped them set career paths and achieve testing certifications.
