Dynatrace Maidenhead People 22

Senior Machine Learning Engineer

Dynatrace provides software intelligence to simplify cloud complexity and accelerate digital transformation. With automatic and intelligent observability at scale, our all-in-one platform delivers precise answers about the performance and security of applications, the underlying infrastructure, and the experience of all users to enable organizations to innovate faster, collaborate more efficiently, and deliver more value with dramatically less effort. That’s why many of the world’s largest organizations trust Dynatrace to modernize and automate cloud operations, release better software faster, and deliver unrivalled digital experiences.

Dynatrace makes it simple to monitor and run the most complex, hyper-scale multicloud systems. It is a full-stack, fully automated monitoring solution that can track every user and every transaction across every application.

The Opportunity:

We’re looking for a Senior Machine Learning Engineer (MLOps) to build and scale production ML services for our Business Insights products. You will drive delivery of major projects across both LLM and traditional ML domains, including data pipeline design, model training, deployment, and monitoring. You will collaborate with Data Science and Software Engineering to uphold standards for reliability, latency, and cost.

Your Tasks:

Engineering and Architecture

  • Design and implement robust data and ML pipelines for training, deployment, and inference at scale, ensuring reliability, performance, and cost efficiency across cloud environments.

  • Deliver production ML services using cloud‑native patterns (e.g., managed services, serverless, container orchestration) optimized for low latency and high throughput.

  • Establish MLOps practices: dataset and model versioning, experiment tracking, promotion gates from development to production, and safe rollback or canary strategies.

  • Build ETL/ELT workflows with clear schema management, data validation, reproducibility, and performance tuning for large‑scale datasets.

  • Implement strategies for scalable inference, including caching, batching, autoscaling, and hardware‑aware optimizations to meet service‑level objectives.

  • Set technical direction for ML service architecture and pipeline design, ensuring scalability and portability across platforms.

Operations, Reliability, and Governance

  • Instrument services with metrics, logs, and traces; maintain dashboards and alerts for latency, throughput, errors, drift, and cost.

  • Run offline and online evaluations for accuracy, drift, stability, and cost; maintain golden datasets and automated promotion gates.

  • Own lifecycle management: training/retraining schedules, deployment procedures, incident playbooks, and post‑incident reviews.

  • Implement robust access controls, secrets management, data governance, and auditability across platforms.

Minimum Requirements:

  • Professional Python: 5+ years writing production‑quality code with testing/packaging and ML/DS libraries (MLflow, FastAPI, scikit‑learn, PyTorch or TensorFlow).

  • MLOps: 3+ years with model registries, experiment tracking, promotion gates, and safe deployment strategies.

  • Data engineering: 3+ years building reliable ETL/ELT, schema evolution, data validation, and performance tuning on large‑scale datasets.

  • CI/CD and IaC: 3+ years designing and owning build/test/deploy pipelines, plus infrastructure automation.

  • Containers and orchestration: 3+ years operating ML services on Kubernetes or equivalent.

  • Communication: clear design docs, ability to explain trade‑offs to technical and non‑technical stakeholders.

  • Education: Master’s degree or equivalent practical experience in CS/Engineering/Math or related field.

Preferred Requirements:

  • Experience with SQL‑centric data platforms (e.g., Snowflake) or cloud ML workloads (AWS/GCP/Azure).

  • Observability and monitoring integration (Dynatrace or similar).

  • Workflow orchestration (Prefect, Airflow) and CI tools (Jenkins, GitHub Actions).

  • Streaming and near real‑time patterns (Kafka, Kinesis).

  • Security and privacy: PII handling, audit trails, policy enforcement.

  • Domain: telemetry and observability, time‑series modelling, anomaly detection.

Note to Recruiters and Agencies: Thank you for your interest in Dynatrace. Please note that we do not accept unsolicited agency resumes—do not forward them via our website or directly to Dynatrace employees. Dynatrace will not pay fees for unsolicited resumes, and any resumes received this way will be considered the property of Dynatrace.

Benefits and work-life perks

We offer best-in-class core rewards, including paid time off, financial security benefits, retirement savings plans, and health insurance. Beyond that, you’ll get other benefits and work-life perks designed to make your ride with us even more rewarding.

Rewards vary depending on your employment type. Some benefits and perks also differ by location — explore your city to see what’s available there.

Mental health support

Our Employee Assistance Program, powered by Telus Health, offers support for you and your family members.

Wellness Days

Four extra company-designated paid days off for you to recharge your batteries.

Flexibility

Our hybrid working model and flexible working hours offer you the flexibility you need.

Employee Stock Purchase Plan

Purchase company stock (NYSE:DT) at a discounted price and become a shareholder.

Learn & develop

Company-wide learning perks, designated team learning days, and more.

Volunteering day

A day of paid volunteer time to support a community or cause you care about.

Regular team events

We host Global Culture Parties, Family & Friends at Work Day, Global Breakfasts, Green Weeks, Pride Month, and beyond!

International vibe

Most of our offices and teams are proudly multicultural. English is our shared language, but we embrace and learn from each other's cultures.

About Dynatrace

Dynatrace (NYSE: DT) is the leading AI-powered observability and security platform. We're advancing observability for today's digital businesses, helping transform modern digital ecosystems' complexity into powerful business assets.

Our AI-driven insights cut through the noise, allowing customers to focus on what truly matters by automating manual tasks and resolving issues with pinpoint accuracy. Dynatrace offers simplicity, clarity, and reliability at scale to ensure teams can make informed decisions, minimize downtime, and drive their business forward with confidence.

Great Place To Work®

We’re proud to be certified as a Great Place To Work® in 16 countries—an achievement made possible by feedback from our greatest asset: an incredible team. We couldn’t have done it without them.
