About the program

In this course, you will learn to:

  • Explain Big Data and the ETL process
  • Understand MapReduce and the Hadoop ecosystem
  • Understand source-to-target mapping documents
  • Apply different testing strategies and methods with a variety of tools
  • Find mechanisms to test any business rule
  • Determine appropriate sample sizes and data permutations
  • Understand and apply automation testing for big data processes

Audience:

Testers, developers, managers, or anyone who is interested or involved in assuring software quality for Big Data processes. Basic RDBMS and SQL knowledge is required; if you need to build it, you can find a supporting workshop on our portal.

Contact us

Call us: +1-800-543-5571
Mail us: training@infotek-solutions.com

Course curriculum

  1. Introduction
    1. Data concept
    2. Data formats
      1. Plain text, XML, JSON
      2. CSV, Tabular, RDBMS
  2. ETL Concept
    1. Extract
    2. Transform
    3. Load
  3. Big Data Concept/pipeline/architecture
    1. Data source
    2. MapReduce
    3. Data Warehouse
    4. Data Mart
    5. BI
  4. MapReduce Concept
    1. Mapping data to key-value pairs
    2. Reducing the mapped data
  5. Hadoop MapReduce Architecture
    1. Master Node
      1. Clustering data to nodes
      2. Creating jobs and distributing tasks to nodes
      3. Tracking Jobs through nodes
    2. Slave nodes
      1. Tracking their own tasks
      2. Mapping data
      3. Reducing data with other nodes
  6. Terminologies
    1. Payload, Mapper, Job, Task
    2. NameNode, DataNode, MasterNode
    3. JobTracker, TaskTracker, Task Attempt
  7. Roles of business intelligence
    1. BI program manager
    2. BI data architect, BI ETL architect, BI technical architect
    3. BI metadata manager, BI administrator
  8. Business Intelligence vs. Business Analytics
  9. Big Data and Data Mining
  10. Transactional vs. Analytical Databases vs. Big Data Stores
    1. Properties
    2. Infrastructure
    3. Data
    4. Validation tools
  11. Big data scenarios need different tools to:
    1. Manage distributed/clustered file system
    2. Manage nodes and the network
    3. Script for different jobs and tasks
    4. Track jobs running on each node
    5. Store mapped and reduced data
    6. Improve performance and reliability
    7. Present data in graphs, tables, ...
  12. A complete Hadoop Ecosystem
    1. Ambari, HDFS, HBase, Hive
    2. Sqoop, Pig, ZooKeeper
    3. NoSQL: MongoDB, Redis, CouchDB, Riak
    4. Mahout, Lucene/Solr, Avro, Oozie
    5. GIS tools for Hadoop on GitHub
    6. Flume, SQL on Hadoop, Clouds
    7. Spark: similar to Hadoop MapReduce, but processes data in memory
  13. Processing Big Data using Hadoop
  14. Big data testing cycle
    1. Understand the business
    2. Understand the big data process
    3. Prepare the test plan
    4. Define test scenarios
    5. Write test cases
    6. Set up the test environment
    7. Execute test cases and log defects
    8. Report and close test cycle
  15. Test Planning
    1. Test List
    2. Resource Estimation
    3. Prioritizing
    4. Scheduling
    5. Defect workflow
  16. Big data testing points
    1. Validate input data
    2. Validate staging process
    3. Validate mapping process
    4. Validate reducing process
    5. Validate output data
  17. Architecture testing for
    1. Performance improvement
    2. Failover avoidance
  18. Performance testing
    1. Points
      1. Throughput test
      2. CPU, Memory, Network, Storage IO test
      3. Job compilation time test
      4. Component test
    2. Performance testing approach
      1. Set up the big data environment
      2. Design the workload
      3. Prepare nodes
      4. Execute, analyse and tune components
      5. Optimize the configuration
    3. Performance testing parameters
      1. Data storage
      2. Commit Logs
      3. Concurrency threads
      4. Caching
      5. Timeouts
      6. JVM
      7. MapReduce performance
      8. Message queue
    4. Watching for bottlenecks during performance tests
  19. ETL testing points
    1. Validate input data
    2. Validate extraction process
    3. Validate transformation process
    4. Validate loading process
    5. Validate output data
  20. Testing techniques
    1. Visual Compare
    2. Record Counts
    3. Minus Queries
    4. Automation test
  21. Basic test methods
    1. Visual test
    2. Source to Target record comparison
    3. Source to Target count
    4. Minus Queries
  22. Automation for Big data testing
  23. Testing of incremental loads
  24. Database level test
    1. Schema test
    2. Transpose test
    3. Integrity test
    4. Views test
  25. Table level test
    1. Row count test
    2. Column count test
    3. Row duplicate and null value test
    4. Columns data type test
    5. Table merging and splitting test
    6. Table constraints test
  26. Column level test
    1. Merging and splitting test
    2. Calculated and derived test
    3. Transpose test
    4. Column constraints test
  27. Record level test
    1. Atomic test
    2. Visual test
    3. Data format test
    4. Record constraints test
    5. Transpose test
    6. Data casting and rounding test
    7. Field merging and splitting test
  28. Test Strategy
    1. Data Permutations
    2. Test Data Sampling
    3. Test Points
    4. Leveraging Test Tools
  29. Test environment needs
    1. Storage space
    2. Distributed clusters
    3. Minimum CPU, Memory utilization
  30. Big data Bugs
    1. Missing Data
    2. Truncation
    3. Type Mismatch
    4. Null Translation
    5. Misplaced Data
    6. Miscalculation
    7. Extra records
    8. Logic Issues
    9. Duplicate Records
    10. Precision
    11. Sequence
    12. Rejected Rows
    13. Undocumented Requirements
  31. Tools for big data testing
  32. QuerySurge for big data testing
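
To give a feel for two of the basic test methods listed above (source-to-target count and minus queries), here is a minimal sketch in Python. It is only an illustration, not course material: the tables `source_orders` and `target_orders`, their columns, and the helper functions are hypothetical, and an in-memory SQLite database stands in for real source and target systems. SQLite spells the MINUS operator as EXCEPT.

```python
import sqlite3


def count_rows(conn, table):
    """Return the number of rows in a table (table name is a trusted, hypothetical identifier)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def minus_query(conn, left, right):
    """Return rows that exist in `left` but not in `right` (EXCEPT is SQLite's MINUS)."""
    return conn.execute(
        f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}"
    ).fetchall()


# Hypothetical source and target tables; row 2 was altered during the load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER, amount REAL);
    CREATE TABLE target_orders (id INTEGER, amount REAL);
    INSERT INTO source_orders VALUES (1, 10.0), (2, 20.5), (3, 30.0);
    INSERT INTO target_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# Source-to-target count: the row counts match, so this check alone passes.
source_count = count_rows(conn, "source_orders")
target_count = count_rows(conn, "target_orders")

# Minus queries in both directions reveal the row that the count check missed.
missing_in_target = minus_query(conn, "source_orders", "target_orders")
unexpected_in_target = minus_query(conn, "target_orders", "source_orders")

print("counts:", source_count, target_count)
print("in source but not in target:", missing_in_target)
print("in target but not in source:", unexpected_in_target)
```

In this sketch the record counts agree even though one value changed, which is why a count check is usually paired with a minus query run in both directions.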

Meet your mentor

About Trainer Rahul:

  • 7 years of IT experience in software testing, quality assurance and quality management.
  • Experienced in leading and managing medium to large testing teams.
  • Have extensively trained participants in Software Testing Concepts, Quality Assurance, Quality Center, QTP, LoadRunner, Bugzilla, JIRA and Selenium.
  • Testing process owner at the organizations where I have worked.
  • Have mentored resources and helped set a career path and achieve testing certifications.