
Senior Site Reliability Engineer - Azure

Office in Hyderabad, Pune, Bangalore, Gurgaon, Chennai, Coimbatore
Site Reliability Engineering & others

We are seeking a highly skilled and motivated Senior Site Reliability Engineer (SRE) to join our team and lead the building of robust, scalable, and secure systems on the Azure platform. In this role, you will be responsible for the reliability, performance, and efficiency of our cloud-based infrastructure, and for driving best practices in incident management, observability, and automation.

Responsibilities
  • Troubleshoot complex distributed systems and networking issues in a cloud-native environment
  • Ensure optimal performance and stability of Azure-based systems, utilizing tools like Azure Monitor, Log Analytics, and Application Insights
  • Architect, develop, and maintain Infrastructure as Code (IaC) solutions using ARM templates, Bicep, and Terraform
  • Implement and enforce observability solutions, develop metrics, and define and monitor SLOs/SLIs
  • Manage incident response processes and on-call rotations, and conduct post-incident analysis to prevent recurrence
  • Automate repetitive tasks, leveraging scripting languages such as Python, PowerShell, or Bash
  • Collaborate with engineering and operational teams to improve system reliability, scalability, and cost-efficiency
  • Drive continuous improvement in system design and operational processes across the organization
  • Advocate for SRE culture by promoting best practices in monitoring, deployment, and infrastructure optimization
Requirements
  • 5+ years of experience in SRE, DevOps, or related roles, with a strong track record in cloud environments (Azure experience required)
  • Deep expertise in troubleshooting distributed systems, networking, and cloud-native architectures
  • Hands-on experience with Azure monitoring, logging, and automation tools (e.g., Azure Monitor, Log Analytics, Application Insights, ARM, Bicep, Terraform)
  • Proficiency in at least one scripting or programming language (Python, PowerShell, Bash, etc.)
  • Strong understanding of incident management, on-call operations, and post-incident analysis
  • Experience in implementing observability solutions and defining SLOs/SLIs
  • Excellent communication skills and the ability to collaborate cross-functionally in high-pressure situations
Nice to have
  • Azure certifications (e.g., Azure Solutions Architect, Azure DevOps Engineer)
  • Experience working in environments with low SRE process maturity, including building practices from the ground up
  • Familiarity with CI/CD pipelines and infrastructure-as-code practices
  • Experience mentoring or leading SRE or DevOps teams