Data Engineering

Duration

2 Days

Languages

French - English

Trainer(s)

Thibault PERIER - Lead Data Engineer Astrakhan

This program covers Data Engineering, a recent field that emerged from the rapid growth in the volume of data collected in recent years. It spans a wide range of skills, including data architecture, data storage, data processing, and general knowledge of data science.

This training presents the core concepts of Data Engineering, along with the knowledge and skills a Data Engineer needs, through concrete examples built with the tools or programming languages most commonly used for each concept.

Compared to the “Data Architect” training, the “Data Engineering” training goes into more detail on the technologies used across the entire processing chain.

Target Audience

  • DBAs
  • Data Architects
  • Data Scientists

Prerequisites

  • Knowledge of data processing or of a specific related area (analysis, architecture, storage…)

Course Delivery

On site, in your offices

Remote, via Teams

Workshops

Training Program

Introduction to Data Engineering

How did we get to Data Engineering?

  • Relational data era & traditional Business Intelligence
  • Big Data, Hadoop & new storage technologies
  • Big Data & new integration methods
  • Infrastructure & industrialization (intro)

The place of Data Engineering

  • Data value chain
  • Roles around the data value chain
  • Data Engineer vs Data Scientist
  • Data Science & Machine Learning

The Data Engineer

  • Basic computer science knowledge required
  • Skills in programming languages for each part of the data processing chain
  • Knowledge of databases

Data Storage

  • Relational databases and SQL (MySQL, Oracle, SQL Server, PostgreSQL)
  • NoSQL (MongoDB, Elasticsearch…)
  • Hadoop (HDFS)

Data Processing

  • What is it?
  • Types of processing
  • Example with PySpark (Spark with Python)

Workflow Construction

  • Workflow scheduling
  • Data pipeline monitoring
  • Example of workflow scheduling with Apache Airflow

Infrastructure as Code

  • Infrastructure & industrialization
  • Containers (Docker)
  • Container orchestration (Kubernetes)
  • Infrastructure provisioning (Terraform, AWS CloudFormation, Azure Resource Manager)