Designing, operating and managing an Enterprise Data Lake
Most organizations today are dealing with multiple silos of information. These include cloud-based and on-premises transaction processing systems, multiple data warehouses, data marts, reference data management (RDM) systems, master data management (MDM) systems, enterprise content management (ECM) systems and, more recently, Big Data platforms such as Hadoop and NoSQL databases. In addition, the number of data sources is increasing dramatically, especially from outside the enterprise.
Given this situation, it is not surprising that many companies have ended up managing information in silos, with different tools used to prepare and manage data across these systems and with varying degrees of governance. In addition, it is not only IT that is now integrating data. Business users are also getting involved with new self-service data preparation tools. The question is, is this the only way to manage data? Is there another level that we can reach to allow us to govern data across an increasingly complex data landscape? Also, is there a strategy for unified data delivery that can speed up development and shorten time to value?
This 2-day course looks at the challenges faced by companies trying to deal with an exploding number of data sources, data collected in multiple data stores (cloud and on-premises) and multiple analytical systems. It also looks at the requirements for defining, governing, managing, unifying and sharing trusted, high-quality data products in a distributed and hybrid computing environment. It then explores a new approach to organizing a logical data lake to get control of your data, and shows how IT data architects, business users and IT developers can collaborate to build ready-made trusted data products. This includes data ingestion, automated data discovery, data profiling and tagging, and publishing data in an information catalog.
It also involves refining raw data to produce trusted ‘data products’, available as a service, that can be published in a data marketplace (catalog) for consumption across your company. The course also introduces multiple data lake configurations, including a centralized data lake and a ‘logical’ distributed data lake, as well as execution of jobs and governance across multiple data stores. It emphasizes the need for a common collaborative approach to governing and managing data of all types.
You will learn:
- How to define a strategy for producing trusted data as-a-service in a distributed environment of multiple data stores and data sources
- How to organize data in a centralised or distributed data environment to overcome complexity and chaos
- How to design, build, manage and operate a logical or centralised data lake within your organisation
- The critical importance of an information catalog in understanding what data is available as a service
- How data standardisation and business glossaries can help make sure data is understood
- An operating model for effective distributed information governance
- What technologies and implementation methodologies you need to get your data under control and produce ready-made trusted data products
- How to apply methodologies to get master and reference data, big data, data warehouse data and unstructured data under control, whether it is on-premises or in the cloud
Who should attend
IT Directors, CIOs, Chief Data Officers, IT Managers, BI Managers, BI Professionals, Data Warehousing Professionals, Data Integration Developers, Master Data Management Professionals, Big Data Professionals, Data Scientists, Enterprise Architects, Data Architects, Database Administrators, Compliance Managers who are responsible for data management (including metadata management, data integration, data quality, master data management and enterprise content management) and Business Data Analysts doing self-service data integration.
This course assumes that you have an understanding of basic data management principles as well as a high level of understanding of the concepts of data migration, data replication, metadata, data warehousing, data modelling, data cleansing, etc.