Primeshare Academy - Datadog Certification Program
Datadog Observability
Fundamentals Certification Preparation
This hands-on Primeshare Academy course teaches enterprise observability and prepares students for the Datadog Fundamentals Certification Exam. Students will learn how to monitor infrastructure, applications, networks, logs, databases, cloud services, service maps, and critical dependencies using Datadog.
200 Practice Exam Questions
Course Description
The Primeshare Academy Datadog Certification Program is a practical, lab-driven observability course designed to help students build the knowledge and confidence needed to prepare for the Datadog Fundamentals Certification Exam. The course introduces students to the core principles of observability, including metrics, logs, traces, service dependencies, dashboards, monitors, alerting, incident response, and root cause analysis.
Students will build a sample SaaS company environment using GNS3 and use it throughout the course for hands-on labs. The lab architecture includes web servers, application services, database servers, network devices, simulated cloud services, and critical infrastructure dependencies. Students will learn how to deploy Datadog Agents, collect telemetry, create dashboards, configure alerts, analyze logs, monitor applications, review service maps, and understand dependency maps across the environment.
By the end of this course, students will be able to explain observability concepts, navigate the Datadog platform, monitor infrastructure and applications, troubleshoot incidents, interpret service maps, and complete a full Datadog monitoring implementation project for a simulated SaaS company.
What You Will Be Able to Do After This Course
Observability
Explain metrics, logs, traces, events, service health, MTTR, MTTD, and the role of observability in modern SaaS operations.
Datadog Platform
Navigate Datadog, understand agents, integrations, tags, dashboards, monitors, infrastructure views, logs, APM, and service maps.
GNS3 Lab
Build a simulated SaaS company architecture with web, app, database, network, and cloud-style services for hands-on monitoring labs.
Monitoring Skills
Monitor infrastructure, applications, logs, networks, databases, containers, and cloud services using practical Datadog workflows.
Incident Response
Use Datadog dashboards, alerts, logs, traces, and dependency maps to investigate incidents and perform root cause analysis.
Certification Readiness
Complete section quizzes, hands-on labs, a final capstone project, and a 200-question original practice exam to prepare for certification.
Course Sections
- Introduction to Observability
- Datadog Platform Foundations
- GNS3 SaaS Company Lab Architecture
- Installing and Configuring Datadog Agents
- Infrastructure Monitoring
- Application Performance Monitoring
- Log Management and Analysis
- Network Monitoring
- Database Monitoring
- Cloud and Container Monitoring
- Dashboards, Monitors, Alerting and SLOs
- Service Maps and Dependency Mapping
- Security, Incident Response and AIOps Basics
- Certification Review and Exam Readiness
Hands-On Lab Environment
Students will build and monitor a fictional SaaS company called PrimeRetail Cloud Services. The lab environment includes a customer portal, application server, database server, internal admin service, Linux and Windows hosts, network devices, and simulated cloud services.
This environment will be used throughout the course to practice Datadog Agent installation, infrastructure monitoring, APM, log collection, network checks, database monitoring, dashboards, monitors, service maps, dependency analysis, and incident response.
primeshare.net
Complete all course sections, quizzes, labs, the final project, and the practice exam to finish the Primeshare Academy Datadog Observability and Fundamentals Certification Program.
Section 1: Introduction to Observability
This section introduces students to the core concepts of observability and explains why modern organizations use platforms like Datadog to monitor infrastructure, applications, logs, traces, cloud services, networks, databases, service maps, and critical dependencies.
Section Overview
Observability is the foundation of modern IT operations. In the past, many organizations monitored individual servers, routers, applications, or databases separately. That approach worked when systems were simple. Today, most organizations operate cloud platforms, SaaS applications, APIs, databases, containers, third-party services, and distributed networks. A single customer transaction may pass through many different systems before it is completed.
Observability helps technical teams understand what is happening across this complex environment. It allows teams to collect, connect, analyze, and act on telemetry data such as metrics, logs, traces, events, service health, and dependency relationships.
This section prepares students to think like observability engineers. Students will learn the difference between monitoring and observability, the three pillars of observability, key operational metrics, common blind spots, and how platforms like Datadog help reduce downtime and improve incident response.
Learning Objectives
- Define observability and explain why it is important in modern IT operations.
- Explain the difference between traditional monitoring and observability.
- Describe the three pillars of observability: metrics, logs, and traces.
- Explain how events, alerts, dashboards, and service maps support observability.
- Understand MTTD, MTTR, availability, reliability, latency, traffic, errors, and saturation.
- Identify common operational blind spots in SaaS and cloud environments.
- Explain how observability supports incident response and root cause analysis.
- Identify which systems in a sample SaaS company require monitoring.
- Prepare for Datadog concepts such as agents, integrations, tags, dashboards, logs, APM, monitors, and service maps.
1.1 What Is Observability?
Observability is the ability to understand the internal state, health, behavior, and performance of a system by analyzing the data that system produces. In practical terms, observability helps engineers answer questions such as:
- Is the service healthy?
- Why is the application slow?
- Which service is failing?
- Which customers are affected?
- What changed before the problem started?
- Which dependency caused the outage?
- Is this an infrastructure, application, database, network, or cloud issue?
A system is observable when it produces enough meaningful telemetry for engineers to understand what is happening without needing to guess. The most common telemetry types are metrics, logs, and traces. Datadog describes these as key observability data types used to understand system behavior, identify performance issues, troubleshoot failures, and improve reliability.
Primeshare Academy Definition:
Observability is the operational ability to see, understand, and explain the behavior of critical systems using connected telemetry, context, and dependency visibility.
1.2 Monitoring vs Observability
Monitoring and observability are related, but they are not the same. Monitoring usually focuses on known conditions. For example, a monitoring tool may tell you that a server is down, CPU is above 90%, disk space is almost full, or a website is unavailable.
Observability goes further. It helps engineers investigate why something is happening, how systems are connected, what changed, which users are affected, and what action should be taken.
| Traditional Monitoring | Observability |
| --- | --- |
| Answers: Is it working? | Answers: Why is it behaving this way? |
| Focuses on known problems and fixed thresholds. | Helps investigate unknown, complex, or unexpected problems. |
| Often checks individual components. | Connects systems, services, dependencies, and business impact. |
| Usually reactive. | Supports proactive investigation and faster diagnosis. |
| Can create many isolated alerts. | Correlates signals to reduce noise and improve root cause analysis. |
Example:
Monitoring may say, "The checkout API is slow." Observability helps answer, "The checkout API is slow because database queries are taking longer after a deployment, and the payment service is also returning intermittent errors."
1.3 The Three Pillars of Observability
The three classic pillars of observability are metrics, logs, and traces. Each pillar provides a different type of operational evidence.
A. Metrics
Metrics are numerical measurements collected over time. They are useful for identifying trends, setting alert thresholds, building dashboards, and understanding system performance.
| Metric Type | Examples | Operational Use |
| --- | --- | --- |
| Infrastructure metrics | CPU, memory, disk, network I/O | Detect host saturation and capacity issues |
| Application metrics | Request rate, error rate, latency | Understand service performance |
| Database metrics | Connections, query time, locks, replication lag | Detect database bottlenecks |
| Business metrics | Orders, logins, payments, failed transactions | Connect technical health to business impact |
B. Logs
Logs are timestamped records of events generated by applications, servers, operating systems, network devices, security tools, and databases. Logs provide detailed evidence about what happened inside a system.
A good log entry may include a timestamp, service name, environment, severity level, user ID, request ID, error message, status code, and other useful attributes.
Example Log:
2026-06-01 14:05:12 ERROR checkout-api env=production user_id=4581 request_id=req-9281 message="Payment gateway timeout"
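The attributes in the example log only become searchable once the line is split into fields. The following is a minimal Python sketch of that idea, assuming the layout shown above (timestamp, severity, service name, then key=value pairs). The function name `parse_log_line` is hypothetical, and this is not how Datadog log pipelines are configured; it only illustrates what "structured" means for a log entry.

```python
import re
import shlex

LOG_LINE = (
    '2026-06-01 14:05:12 ERROR checkout-api env=production '
    'user_id=4581 request_id=req-9281 message="Payment gateway timeout"'
)

# Assumed layout: "<date> <time> <LEVEL> <service> key=value ..."
HEADER = re.compile(r"^(\S+ \S+) (\w+) (\S+) (.*)$")


def parse_log_line(line: str) -> dict:
    """Split one log line into a dictionary of named attributes."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    timestamp, level, service, rest = match.groups()
    record = {"timestamp": timestamp, "level": level, "service": service}
    # shlex.split keeps the quoted message together as a single token.
    for token in shlex.split(rest):
        key, _, value = token.partition("=")
        record[key] = value
    return record


record = parse_log_line(LOG_LINE)
print(record["service"], "|", record["request_id"], "|", record["message"])
```

With the fields separated, an engineer can filter on `service`, `env`, or `request_id` instead of searching raw text.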
C. Traces
Traces show the path of a request as it moves through multiple services. A trace is made of spans. Each span represents one operation, such as an API call, database query, external service request, or internal function.
Traces are especially important in distributed applications and microservices. They help engineers identify which part of a request is slow, failing, or causing downstream impact.
Example Trace Path:
User Browser -> Web Portal -> Checkout API -> Inventory Service -> Payment API -> MySQL Database
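To make the relationship between a trace and its spans concrete, here is a small Python sketch. It assumes a simplified span model (an ID, a parent ID, a service, and a duration) and hypothetical timings; it also assumes child spans run one after another, which real traces do not guarantee. The `Span` class and `self_times` function are illustrative names, not part of any Datadog library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root span of the trace
    service: str
    operation: str
    duration_ms: float


# Hypothetical timings for one checkout request.
TRACE = [
    Span(1, None, "web-portal", "GET /checkout", 900.0),
    Span(2, 1, "checkout-api", "POST /orders", 860.0),
    Span(3, 2, "inventory-service", "reserve_items", 60.0),
    Span(4, 2, "payment-api", "charge_card", 150.0),
    Span(5, 2, "mysql-database", "INSERT orders", 600.0),
]


def self_times(spans):
    """Time spent in each service itself, excluding time spent in its child spans."""
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = (
                child_total.get(span.parent_id, 0.0) + span.duration_ms
            )
    return {s.service: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


times = self_times(TRACE)
slowest = max(times, key=times.get)
print(f"Most time spent in: {slowest} ({times[slowest]} ms)")
```

The root span is the longest overall, but most of its time is spent waiting on downstream work. Subtracting child durations points to the database call as the real bottleneck in this made-up trace.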
1.4 Additional Observability Signals
Metrics, logs, and traces are the classic pillars, but modern observability also includes other important signals:
- Events: Changes or occurrences such as deployments, configuration updates, service restarts, or alerts.
- Profiles: Code-level performance data that helps identify CPU, memory, or function-level bottlenecks.
- Real User Monitoring: Data from actual user sessions, page loads, frontend errors, and user experience.
- Synthetic Monitoring: Automated tests that simulate user journeys to verify availability and performance.
- Service Maps: Visual maps showing how services communicate and depend on each other.
- Dependency Maps: Views that connect infrastructure, applications, databases, networks, and business services.
1.5 The Four Golden Signals
Google SRE popularized the four golden signals of monitoring: latency, traffic, errors, and saturation. These signals are useful because they focus on the health of user-facing services.
| Golden Signal | Meaning | Example |
| --- | --- | --- |
| Latency | How long it takes to serve a request | Checkout API p95 latency is 2.4 seconds |
| Traffic | How much demand is placed on the system | 500 requests per second |
| Errors | How many requests fail | HTTP 500 errors increased to 4% |
| Saturation | How full or overloaded the system is | CPU is 95% or database connections are exhausted |
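The four signals can be computed from very simple inputs. The sketch below is a Python illustration using hypothetical sample data: 100 requests observed in a 10-second window, and a database connection pool of 100. The nearest-rank percentile method is one common choice and is an assumption here, not necessarily how Datadog computes percentiles.

```python
# Hypothetical sample: (latency_seconds, http_status) for each request in the window.
WINDOW_SECONDS = 10
REQUESTS = [(0.2, 200)] * 90 + [(1.1, 200)] * 6 + [(2.4, 500)] * 4
DB_CONNECTIONS_IN_USE, DB_CONNECTIONS_MAX = 95, 100


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


latency_p95 = percentile([latency for latency, _ in REQUESTS], 95)
traffic_rps = len(REQUESTS) / WINDOW_SECONDS
error_rate = sum(1 for _, status in REQUESTS if status >= 500) / len(REQUESTS)
saturation = DB_CONNECTIONS_IN_USE / DB_CONNECTIONS_MAX

print(f"latency p95: {latency_p95} s")
print(f"traffic:     {traffic_rps} req/s")
print(f"errors:      {error_rate:.0%}")
print(f"saturation:  {saturation:.0%} of DB connections")
```

Note that the average latency in this sample would look healthy, while the p95 and the error rate reveal that a small share of users is having a poor experience.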
1.6 MTTD, MTTR, Reliability, and Availability
Mean Time to Detect (MTTD)
MTTD measures how long it takes to detect that a problem exists. A low MTTD means the organization identifies issues quickly.
Mean Time to Resolve (MTTR)
MTTR measures how long it takes to restore normal service after a problem is detected. Strong observability helps reduce MTTR because engineers can identify root cause faster.
Availability
Availability measures whether a service is accessible and functioning when users need it. A service can be available but still slow or degraded.
Reliability
Reliability measures whether a system consistently performs correctly over time. Observability helps teams improve reliability by detecting recurring issues, weak dependencies, and capacity risks.
Example:
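A customer portal outage begins at 9:00 AM. Datadog alerts the team at 9:03 AM, and the team resolves the issue at 9:25 AM. Time to detect is 3 minutes. Time to resolve, measured from detection as defined above, is 22 minutes, so customers were affected for 25 minutes in total. MTTD and MTTR are the averages of these values across many incidents.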
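The arithmetic in the example above can be checked with a few lines of Python. This sketch assumes the section's definitions (detection measured from the start of the outage, resolution measured from detection). The calendar date and the second incident are hypothetical and are included only to show why these metrics are called "mean" times.

```python
from datetime import datetime
from statistics import mean

# (started, detected, resolved); the first row is the outage from the example.
INCIDENTS = [
    (datetime(2026, 6, 1, 9, 0), datetime(2026, 6, 1, 9, 3), datetime(2026, 6, 1, 9, 25)),
    (datetime(2026, 6, 8, 14, 0), datetime(2026, 6, 8, 14, 7), datetime(2026, 6, 8, 14, 45)),
]


def minutes_between(start, end):
    """Elapsed time between two datetimes, in minutes."""
    return (end - start).total_seconds() / 60


detect_times = [minutes_between(started, detected) for started, detected, _ in INCIDENTS]
resolve_times = [minutes_between(detected, resolved) for _, detected, resolved in INCIDENTS]

mttd = mean(detect_times)
mttr = mean(resolve_times)

print(f"Incident 1: detect {detect_times[0]} min, resolve {resolve_times[0]} min")
print(f"MTTD across incidents: {mttd} min")
print(f"MTTR across incidents: {mttr} min")
```

Some organizations measure MTTR from the start of the outage instead of from detection, so it is worth confirming which convention a team uses before comparing numbers.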
1.7 Why Observability Matters in SaaS Companies
SaaS companies depend on uptime, performance, and trust. Customers expect applications to be available, fast, secure, and reliable. When systems fail, customers may lose access to important business workflows.
A SaaS company may include the following components:
- Web application
- Mobile application
- API gateway
- Authentication service
- Application servers
- Databases
- Message queues
- Payment processors
- Cloud infrastructure
- Load balancers
- DNS services
- Third-party integrations
If any critical dependency fails, the customer experience can be affected. Observability helps teams understand not only whether a component is failing, but also how that failure affects the larger service.
1.8 Common Operational Blind Spots
An operational blind spot is an area where the organization lacks enough visibility to detect, understand, or troubleshoot a problem.
| Blind Spot | Why It Is Dangerous |
| --- | --- |
| No centralized observability platform | Teams waste time switching tools during incidents. |
| No standard tagging | Telemetry cannot be grouped by service, team, environment, or business owner. |
| Logs are not collected | Engineers lack detailed evidence of application and system errors. |
| No traces | Teams cannot follow requests across distributed services. |
| No dependency map | Teams do not understand upstream and downstream service impact. |
| No alert quality review | Alert fatigue causes teams to ignore important signals. |
1.9 How Datadog Supports Observability
Datadog is a cloud-based observability and security platform. It helps teams collect and analyze telemetry from infrastructure, applications, logs, traces, cloud platforms, databases, containers, networks, and services.
Datadog supports observability through:
- Infrastructure Monitoring: Host, container, process, CPU, memory, disk, and network metrics.
- APM: Application performance monitoring and distributed tracing.
- Log Management: Log collection, search, parsing, indexing, and correlation.
- Network Monitoring: Network device, flow, DNS, and connectivity visibility.
- Database Monitoring: Query performance, database health, and dependency impact.
- Cloud Monitoring: Visibility into AWS, Azure, GCP, and hybrid environments.
- Dashboards: Visual summaries of system, application, and business health.
- Monitors: Alerts based on metrics, logs, traces, service checks, anomalies, and SLOs.
- Service Maps: Visual dependency maps for application services.
Certification Connection:
Datadog Fundamentals students must understand what each Datadog capability is used for and which operational problem it helps solve.
1.10 PrimeRetail Cloud Services Scenario
Throughout this course, students will work with a fictional SaaS company called PrimeRetail Cloud Services. PrimeRetail provides an online customer portal used by small businesses to manage orders, invoices, products, customer accounts, and support requests.
PrimeRetail has the following architecture:
| Component | Purpose | Observability Need |
| --- | --- | --- |
| Customer Web Portal | User-facing SaaS application | Availability, latency, errors, logs, traces |
| API Service | Processes customer requests | APM, error rates, request volume, trace analysis |
| MySQL Database | Stores orders, users, invoices, and products | Query performance, locks, storage, connections |
| Linux App Server | Runs backend services | CPU, memory, disk, process, logs, APM |
| Windows Admin Server | Internal operations tools | Host monitoring, service checks, Windows events |
| Router and Switch | Network connectivity | Interface status, packet loss, latency, SNMP |
1.11 Hands-On Lab: Identify Observability Requirements
Lab Objective
Students will review the PrimeRetail Cloud Services architecture and identify which systems require observability, what telemetry should be collected, and what business risks exist if those systems are not monitored.
Lab Tasks
- List all systems in the PrimeRetail environment.
- Identify which systems are customer-facing.
- Identify which systems are business-critical.
- For each system, list the metrics that should be collected.
- For each system, list the logs that should be collected.
- Identify which services require application tracing.
- Identify at least five possible failure scenarios.
- Create a simple dependency map showing how the customer portal depends on other services.
- Write a short incident example explaining how observability would help troubleshoot the issue.
Suggested Dependency Map
Customer User -> Customer Web Portal -> API Service -> MySQL Database
Customer Web Portal -> Authentication Service
API Service -> Payment Gateway
API Service -> Email Notification Service
All Services -> Network Router/Switch -> Internet/Cloud Services
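One way to reason about the suggested dependency map is to treat it as a directed graph and ask, "If this component fails, what else is affected?" The Python sketch below encodes the edges listed above as a plain dictionary; the shared network layer is left out to keep the example short. The function name `impacted_by` is hypothetical, and this is a study aid, not a representation of how Datadog builds its service map.

```python
from collections import deque

# Each key depends on the services in its list (edges from the suggested map).
DEPENDS_ON = {
    "Customer User": ["Customer Web Portal"],
    "Customer Web Portal": ["API Service", "Authentication Service"],
    "API Service": ["MySQL Database", "Payment Gateway", "Email Notification Service"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the graph so we can walk from a dependency back to its callers.
    dependents = {}
    for caller, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, []).append(caller)

    impacted, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for caller in dependents.get(node, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


for component in ("MySQL Database", "Authentication Service"):
    print(component, "->", sorted(impacted_by(component, DEPENDS_ON)))
```

A database failure reaches the customer through the API Service and the portal, while an authentication failure affects the portal directly. The same walk can be used to draft the failure scenarios requested in the lab tasks.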
Student Deliverables
- System inventory table
- Telemetry requirements table
- Dependency map
- Five failure scenarios
- One-page observability summary
1.12 Section 1 Knowledge Check
- What is observability?
- How is observability different from traditional monitoring?
- What are the three pillars of observability?
- Why are traces important in distributed systems?
- What is the difference between MTTD and MTTR?
- What are the four golden signals?
- Why do SaaS companies need dependency visibility?
- What is an operational blind spot?
- How can poor tagging reduce observability?
- How does Datadog support observability?
External Reading Resources
Students should review the following resources before completing the Section 1 quiz:
Recommended Video Resources
These videos are recommended to help students visualize observability, APM, tracing, and Datadog workflows:
Instructor Note:
If a video URL changes, search the Datadog YouTube channel for these topics: "Datadog observability", "Datadog APM", "Datadog distributed tracing", "Datadog logs", and "Datadog service map".
Section 1 Quiz Preview
Students will complete a 20-question quiz at the end of this section. The quiz will test their understanding of:
- Definition of observability
- Monitoring vs observability
- Metrics, logs, and traces
- Events, dashboards, alerts, and service maps
- MTTD and MTTR
- Four golden signals
- SaaS observability requirements
- Operational blind spots
- How Datadog supports observability
- PrimeRetail Cloud Services lab scenario
Section 1 Summary
In this section, students learned the foundation of observability, including metrics, logs, traces, monitoring vs observability, MTTD, MTTR, the four golden signals, SaaS operational risks, and the role of Datadog in centralized observability. Students are now ready to begin learning the Datadog platform in Section 2.
Section 2: Datadog Platform Foundations
In this section, students learn the core architecture, navigation, and operational concepts of the Datadog platform. Students will understand how Datadog collects telemetry, organizes data using tags, visualizes infrastructure, and enables operational monitoring across modern enterprise environments.
Section Learning Objectives
- Understand Datadog platform architecture
- Understand Datadog Agents and integrations
- Navigate the Datadog user interface
- Understand hosts, services, tags, metrics, logs, traces, and monitors
- Understand Datadog products and modules
- Explain why tagging standards are critical
- Create dashboards and investigate infrastructure
- Understand telemetry flow inside Datadog
- Prepare for Datadog Fundamentals certification concepts
2.1 Introduction to the Datadog Platform
Datadog is a cloud-native observability and security platform that helps organizations monitor infrastructure, applications, logs, databases, cloud services, networks, containers, and user experience from a centralized platform.
The platform allows teams to:
- Collect telemetry from systems and services
- Visualize operational health
- Detect incidents quickly
- Correlate metrics, logs, and traces
- Reduce Mean Time to Detect (MTTD)
- Reduce Mean Time to Resolve (MTTR)
- Investigate dependencies and root causes
- Improve service reliability
Unlike traditional siloed monitoring tools, Datadog centralizes observability into one connected platform.
2.2 Core Datadog Components
| Component | Purpose |
| --- | --- |
| Datadog Agent | Collects telemetry from hosts and services. |
| Infrastructure Monitoring | Monitors hosts, containers, CPU, memory, disk, and networks. |
| APM | Application Performance Monitoring and distributed tracing. |
| Log Management | Collects, indexes, searches, and analyzes logs. |
| Dashboards | Visualize operational health and trends. |
| Monitors | Generate alerts based on operational conditions. |
| Service Maps | Display relationships and dependencies between services. |
| Cloud Integrations | Connect AWS, Azure, GCP, Kubernetes, and SaaS platforms. |
Students should understand that Datadog is not just a monitoring dashboard. It is a connected observability platform that correlates operational signals from many different technologies.
2.3 How Datadog Collects Telemetry
The Datadog Agent is software that runs on the systems being monitored, such as physical servers, virtual machines, and containers. The Agent collects metrics, logs, traces, process information, events, and integration data.
Telemetry flow typically follows this pattern:
Host or Application -> Datadog Agent -> Datadog Cloud Platform -> Dashboards, Monitors, Logs, APM, Alerts
The Agent may collect:
- Infrastructure metrics
- Application metrics
- Container metrics
- Logs
- Traces
- Database metrics
- Process information
- Network metrics
2.4 Tags and Metadata
Tags are one of the most important concepts in Datadog. Tags provide context to telemetry and allow teams to filter, group, correlate, and analyze data efficiently.
Examples of common tags:
- env:production
- service:checkout-api
- team:payments
- region:us-west
- role:webserver
- tier:database
- company:primeretail
Without tags, telemetry becomes difficult to organize. Good tagging standards improve:
- Dashboard filtering
- Monitor routing
- Root cause analysis
- Service ownership visibility
- Incident response efficiency
- Cloud cost visibility
Strong observability depends heavily on good tagging standards.
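The effect of tagging on filtering and grouping can be shown with a short Python sketch. The host names below are hypothetical, and the tag values reuse the examples from this section. For simplicity the sketch assumes every tag has the `key:value` form with one value per key, which is a narrower model than Datadog's own tagging rules.

```python
# Hypothetical hosts and their tags; "legacy-01" has no tags at all.
HOSTS = {
    "web-01": ["env:production", "role:webserver", "region:us-west"],
    "api-01": ["env:production", "service:checkout-api", "team:payments"],
    "db-01": ["env:production", "tier:database", "team:payments"],
    "api-stg-01": ["env:staging", "service:checkout-api", "team:payments"],
    "legacy-01": [],
}


def parse_tags(tags):
    """Turn ['env:production', ...] into {'env': 'production', ...}."""
    return dict(tag.split(":", 1) for tag in tags)


def filter_hosts(hosts, **wanted):
    """Hosts whose tags match every requested key and value."""
    return [
        name
        for name, tags in hosts.items()
        if all(parse_tags(tags).get(key) == value for key, value in wanted.items())
    ]


def group_by(hosts, key):
    """Group host names by the value of one tag key."""
    groups = {}
    for name, tags in hosts.items():
        groups.setdefault(parse_tags(tags).get(key, "(untagged)"), []).append(name)
    return groups


print(filter_hosts(HOSTS, env="production", team="payments"))
print(group_by(HOSTS, "env"))
```

The untagged host falls into its own bucket and cannot be matched by any filter, which is exactly the visibility gap that a tagging standard is meant to prevent.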
2.5 Navigating the Datadog User Interface
Students should become comfortable navigating the Datadog platform. Important navigation areas include:
| UI Area | Purpose |
| --- | --- |
| Infrastructure List | View monitored hosts and infrastructure health. |
| Metrics Explorer | Search and graph metrics. |
| Dashboards | Build operational visualizations. |
| Logs | Search and analyze log data. |
| APM | Review traces and service performance. |
| Monitors | Manage alert conditions and notifications. |
| Service Map | Understand service relationships and dependencies. |
2.6 Datadog Integrations
Datadog supports hundreds of integrations for infrastructure, cloud services, operating systems, databases, containers, applications, and networking technologies.
Examples include:
- AWS
- Microsoft Azure
- Google Cloud Platform
- Kubernetes
- Docker
- MySQL
- NGINX
- Apache
- Redis
- MongoDB
- Cisco devices
- Linux
- Windows Server
Once an integration is enabled and configured, Datadog collects technology-specific telemetry for that system without custom instrumentation.
2.7 Dashboards and Visualizations
Dashboards allow organizations to visualize operational health using graphs, tables, heatmaps, toplists, alerts, logs, and status widgets.
Good dashboards answer operational questions such as:
- Is the platform healthy?
- Which service is degraded?
- Are users impacted?
- Is infrastructure overloaded?
- Which region is failing?
- Which application has increased errors?
Dashboards should be audience-specific:
- Executives need business-impact dashboards.
- Operations teams need infrastructure dashboards.
- Application teams need APM dashboards.
- NOC teams need alert-focused dashboards.
2.8 Monitors and Alerts
Monitors are used to detect abnormal conditions and notify teams when action is required.
Examples of monitor types:
- Metric monitors
- Log monitors
- APM monitors
- Service checks
- Composite monitors
- Anomaly monitors
- SLO monitors
A good monitor should:
- Be actionable
- Reduce alert fatigue
- Have clear thresholds
- Include ownership
- Support operational response
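The idea of a threshold-based metric monitor can be sketched in a few lines of Python. This is a simplified illustration, not Datadog's evaluation logic: it assumes the monitor averages the samples in one evaluation window and compares the result with a warning and a critical threshold. The function name, state labels, thresholds, and sample values are all hypothetical.

```python
from statistics import mean


def evaluate_monitor(samples, warning, critical):
    """Classify the average of the samples in one evaluation window."""
    if not samples:
        return "NO DATA"
    average = mean(samples)
    if average >= critical:
        return "ALERT"
    if average >= warning:
        return "WARN"
    return "OK"


# Hypothetical one-minute CPU utilization samples (percent) over a 5-minute window.
brief_spike = [40, 42, 97, 41, 43]
sustained_load = [92, 94, 95, 96, 97]

print("brief spike:   ", evaluate_monitor(brief_spike, warning=80, critical=90))
print("sustained load:", evaluate_monitor(sustained_load, warning=80, critical=90))
print("no samples:    ", evaluate_monitor([], warning=80, critical=90))
```

Evaluating an average over a window rather than a single data point is one way to keep a monitor actionable: a one-minute spike does not page anyone, while sustained load does.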
2.9 Hands-On Lab
Lab Objective
Students will explore the Datadog platform and identify where different telemetry types are visualized.
Lab Tasks
- Create or access a Datadog account.
- Navigate the Infrastructure List.
- Open Metrics Explorer and graph CPU metrics.
- Open Dashboards and review sample dashboards.
- Open the Logs section and search for logs.
- Open APM and review traces.
- Review the Service Map.
- Identify at least five common tags.
- Create a simple dashboard with two widgets.
Deliverables
- Dashboard screenshots
- Tag inventory
- Telemetry summary
- Short operational findings report
Section 2 Summary
In this section, students learned the foundations of the Datadog platform, including platform architecture, telemetry flow, tags, integrations, dashboards, monitors, and operational navigation. Students are now prepared to begin building the PrimeRetail SaaS lab environment in GNS3.