Senior SRE Engineer
Office in Pune
Site Reliability Engineering & others
We are seeking a Senior SRE Engineer to join our team and drive the reliability, scalability, and performance of our systems. If you are passionate about ensuring production excellence within complex, large-scale environments, this role is for you.
Responsibilities
- Apply SRE principles, including SLI, SLO, and error budget management, to enhance service reliability and availability
- Define meaningful metrics and alerts using monitoring and observability tools like Dynatrace and Splunk
- Manage and improve production environments using Kubernetes, Terraform, and database technologies (SQL/NoSQL)
- Develop automation scripts in Python or shell (e.g., Bash) to improve operational efficiency
- Lead incident management processes, conduct root cause analyses, and implement actionable postmortem improvements
- Maintain and optimize CI/CD pipelines using tools like Jenkins, Bamboo, or Concourse, aligning with DevOps standards
- Collaborate with cross-functional teams to ensure system reliability and prompt resolution of complex issues
- Enhance the scalability, performance, and reliability of distributed systems through innovative engineering solutions
- Apply software engineering concepts to support large-scale production environments
Requirements
- 5-10 years of experience in Site Reliability Engineering or a related field
- Strong understanding of SRE principles and practices, including SLI, SLO, and error budget management
- Proficiency with monitoring tools like Dynatrace and observability platforms like Splunk
- Expertise in Kubernetes, Terraform, and database technologies (SQL/NoSQL) in production environments
- Proficiency in scripting languages such as Python or shell (e.g., Bash) for automation
- Strong knowledge of CI/CD tools like Jenkins, Bamboo, or Concourse, combined with DevOps best practices
- Experience with incident management, automated root cause analysis, and leading postmortems
- Familiarity with software engineering concepts, system design, and distributed systems at scale
- Ability to define and implement system reliability improvements across diverse technical stacks
Nice to have
- Degree in a technical field or equivalent practical experience
- Familiarity with Java-based applications, Bitbucket, Maven, and Jenkins
- Experience with performance tuning and optimization of cloud-native Kubernetes applications
- Proficiency in large-scale Infrastructure as Code implementations using Terraform
- Background in managing SLAs, SLOs, SLIs, and error budgets for production systems
- Understanding of chaos engineering, resilience testing, and advanced reliability practices