Project Details
An alert management system that uses automated alert handling, critical incident reporting, and service classification to keep cloud services highly available and performant while reducing incident noise in PagerDuty, an incident management platform.
Things I Did
- Developed a production-grade solution for one of Australia's biggest retail enterprises to classify and prioritise alerts across Dynatrace and PagerDuty
- Analysed historical alert patterns and identified key entities driving high-severity incidents
- Built a rules engine for noise reduction and priority assignment
- Integrated with Dynatrace and PagerDuty for cross‑platform incident handling
- Implemented deduplication and enrichment to improve triage quality
- Set up metrics to track MTTA/MTTR improvements
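
To illustrate the ideas above, here is a minimal Python sketch of time-window deduplication, first-match priority rules, and a mean-time helper of the kind used for MTTA/MTTR. It is not the production implementation: the `Alert` fields, the `TIER1_ENTITIES` set, the rule table, the P1 to P4 labels, and the 10-minute window are all assumptions chosen for illustration, and it does not call the Dynatrace or PagerDuty APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical alert shape; real Dynatrace/PagerDuty payloads differ.
    source: str
    entity: str        # affected service or host
    title: str
    severity: str      # "critical", "warning", or "info"
    received_at: datetime


# Assumed set of business-critical services (illustrative names only).
TIER1_ENTITIES = {"checkout-api", "payments"}

# Illustrative rule table: (predicate, priority). The first match wins.
RULES = [
    (lambda a: a.severity == "critical" and a.entity in TIER1_ENTITIES, "P1"),
    (lambda a: a.severity == "critical", "P2"),
    (lambda a: a.severity == "warning", "P3"),
]


def assign_priority(alert: Alert) -> str:
    """Return the priority of the first matching rule, or P4 by default."""
    for predicate, priority in RULES:
        if predicate(alert):
            return priority
    return "P4"


def deduplicate(alerts, window=timedelta(minutes=10)):
    """Drop repeats of the same (entity, title) seen within `window`.

    The window slides: every repeat refreshes the last-seen time, so a
    steady stream of duplicates stays suppressed until it goes quiet.
    """
    last_seen = {}
    kept = []
    for alert in sorted(alerts, key=lambda a: a.received_at):
        key = (alert.entity, alert.title)
        previous = last_seen.get(key)
        if previous is None or alert.received_at - previous > window:
            kept.append(alert)
        last_seen[key] = alert.received_at
    return kept


def mean_minutes(durations):
    """Mean of a non-empty list of timedeltas, in minutes.

    MTTA uses (acknowledged - triggered); MTTR uses (resolved - triggered).
    """
    total_seconds = sum(d.total_seconds() for d in durations)
    return total_seconds / len(durations) / 60
```

In this sketch, deduplication runs before priority assignment so that rules are only evaluated for alerts that would actually reach an on-call responder. Comparing `mean_minutes` over acknowledgement or resolution durations before and after a rule change is one simple way to track MTTA/MTTR movement.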