Data Vault: Building a Scalable Data Warehouse Training Course
Data Vault Modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all the time". Its flexible, scalable, consistent and adaptable design encompasses the best aspects of 3rd normal form (3NF) and star schema.
In this instructor-led, live training, participants will learn how to build a Data Vault.
By the end of this training, participants will be able to:
- Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
- Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse.
- Develop a consistent and repeatable ETL (Extract, Transform, Load) process.
- Build and deploy highly scalable and repeatable warehouses.
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Course Outline
Introduction
- The shortcomings of existing data warehouse data modeling architectures
- Benefits of Data Vault modeling
Overview of Data Vault architecture and design principles
- SEI / CMM / Compliance
Data Vault applications
- Dynamic Data Warehousing
- Exploration Warehousing
- In-Database Data Mining
- Rapid Linking of External Information
Data Vault components
- Hubs, Links, Satellites
Building a Data Vault
Modeling Hubs, Links and Satellites
Data Vault reference rules
How components interact with each other
Modeling and populating a Data Vault
Converting 3NF OLTP to a Data Vault Enterprise Data Warehouse (EDW)
Understanding load dates, end-dates, and join operations
Business keys, relationships, link tables and join techniques
Query techniques
Load processing and query processing
Overview of Matrix Methodology
Getting data into data entities
Loading Hub Entities
Loading Link Entities
Loading Satellites
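As a rough illustration of the three loading steps listed above (a sketch, not taken from the course materials), the Python below models hubs, links and satellites as in-memory dictionaries. The `Vault` class, its method names, the column names and the choice of SHA-256 are all assumptions made for this example. A real Data Vault would use database tables and a load process designed for the target platform.

```python
import hashlib


def _digest(parts):
    """Hash a sequence of strings into one hex digest (SHA-256 is an assumed choice)."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    """Derive a key from business keys; trimming and upper-casing is an assumed convention."""
    return _digest([key.strip().upper() for key in business_keys])


class Vault:
    """Hypothetical in-memory stand-in for hub, link and satellite tables."""

    def __init__(self):
        self.hubs = {}        # hub name -> {hash key: row}
        self.links = {}       # link name -> {link key: row}
        self.satellites = {}  # satellite name -> {parent key: [rows, oldest first]}

    def load_hub(self, hub, business_key, source, load_date):
        """Insert a business key only if the hub has not seen it before."""
        hk = hash_key(business_key)
        self.hubs.setdefault(hub, {}).setdefault(hk, {
            "business_key": business_key.strip(),
            "record_source": source,
            "load_date": load_date,
        })
        return hk

    def load_link(self, link, hub_keys, source, load_date):
        """Record a relationship between hubs once, keyed by the combined hub keys."""
        lk = _digest(list(hub_keys))
        self.links.setdefault(link, {}).setdefault(lk, {
            "hub_keys": tuple(hub_keys),
            "record_source": source,
            "load_date": load_date,
        })
        return lk

    def load_satellite(self, satellite, parent_key, attributes, source, load_date):
        """Append a new row only when the descriptive attributes have changed."""
        history = self.satellites.setdefault(satellite, {}).setdefault(parent_key, [])
        hash_diff = _digest([f"{name}={attributes[name]}" for name in sorted(attributes)])
        if history and history[-1]["hash_diff"] == hash_diff:
            return False  # unchanged since the latest row, nothing to store
        history.append({
            "attributes": dict(attributes),
            "hash_diff": hash_diff,
            "record_source": source,
            "load_date": load_date,
        })
        return True
```

In this sketch, loading the same business key twice leaves the hub unchanged, while a satellite gains a new row each time the attributes differ from the latest stored row. This is how the history is kept without updating existing records.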
Using SEI/CMM Level 5 templates to obtain repeatable, reliable, and quantifiable results
Developing a consistent and repeatable ETL (Extract, Transform, Load) process
Building and deploying highly scalable and repeatable warehouses
Closing remarks
Requirements
- An understanding of data warehousing concepts
- An understanding of database and data modeling concepts
Audience
- Data modelers
- Data warehousing specialists
- Business Intelligence specialists
- Data engineers
- Database administrators
Open Training Courses require 5+ participants.
Testimonials (1)
how the trainer shows his knowledge in the subject he's teaching
john ernesto ii fernandez - Philippine AXA Life Insurance Corporation
Course - Data Vault: Building a Scalable Data Warehouse
Related Courses
Knowledge Discovery in Databases (KDD)
21 Hours
Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.
In this instructor-led, live course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.
Audience
- Data analysts or anyone interested in learning how to interpret data to solve problems
Format of the Course
- After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
Statistics with SPSS Predictive Analytics Software
14 Hours
Goal:
Learning to work with SPSS independently.
Audience:
Analysts, researchers, scientists, students and all those who want to learn to use the SPSS package and popular data mining techniques.
Data Mining
21 Hours
The course can be delivered with any tools, including free, open-source data mining software and applications.
From Data to Decision with Big Data and Predictive Analytics
21 Hours
Audience
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you.
It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing.
It is not aimed at people configuring the solution, although they will benefit from the big picture.
Delivery Mode
During the course delegates will be presented with working examples of mostly open source technologies.
Short lectures will be followed by presentations and simple exercises carried out by the participants.
Content and Software used
All software used is updated each time the course is run, so we always work with the newest available versions.
The course covers the process from obtaining, formatting, processing and analysing the data to automating decision making with machine learning.
Data Mining with R
14 Hours
R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Oracle SQL Intermediate - Data Extraction
14 Hours
The objective of the course is to enable participants to work with SQL in Oracle Database for data extraction at an intermediate level.
Data Mining and Analysis
28 Hours
Objective:
Delegates will be able to analyse big data sets, extract patterns, and choose the right variables impacting the results, so that a new model can be built to forecast results.
Introductory R for Biologists
28 Hours
R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining.
Data Mining & Machine Learning with R
14 Hours
R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
Data Visualization
28 Hours
This course is intended for engineers and decision makers working in data mining and knowledge discovery.
You will learn how to create effective plots and ways to present and represent your data in a way that will appeal to the decision makers and help them to understand hidden information.
Data Science for Big Data Analytics
35 Hours
Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.
Process Mining
21 Hours
Process mining, or Automated Business Process Discovery (ABPD), is a technique that applies algorithms to event logs for the purpose of analyzing business processes. Process mining goes beyond data storage and data analysis; it bridges data with processes and provides insights into the trends and patterns that affect process efficiency.
Format of the Course
- The course starts with an overview of the most commonly used techniques for process mining. We discuss the various process discovery algorithms and tools used for discovering and modeling processes based on raw event data. Real-life case studies are examined and data sets are analyzed using the ProM open-source framework.
MATLAB for Financial Applications
21 Hours
MATLAB is a numerical computing environment and programming language developed by MathWorks.
MonetDB
28 Hours
MonetDB is an open-source database that pioneered the column-store technology approach.
In this instructor-led, live training, participants will learn how to use MonetDB and how to get the most value out of it.
By the end of this training, participants will be able to:
- Understand MonetDB and its features
- Install and get started with MonetDB
- Explore and perform different functions and tasks in MonetDB
- Accelerate the delivery of their project by maximizing MonetDB capabilities
Audience
- Developers
- Technical experts
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Foundation R
7 Hours
The objective of the course is to enable participants to gain a mastery of the fundamentals of R and how to work with data.