
SRE - Site Reliability Engineer

Site Reliability Engineering, DevOps, Amazon Web Services, Terraform, Docker, Kubernetes, Python, Google Cloud Platform, Bash, PowerShell, Microsoft Azure
Hyderabad, Pune, Bangalore, Gurgaon, Chennai

We are seeking a talented and motivated Site Reliability Engineer to join our team. As a key member of our multidisciplinary team, you will play a crucial role in ensuring the reliability, performance, and security of our complex distributed systems. If you are passionate about operational risk management, have a deep understanding of Kubernetes and containers, and possess strong problem-solving skills, this role offers an exciting opportunity to contribute to the success of our operations.

  • Ability to rapidly and effectively understand requirements and translate them into technical solutions.
  • Ability to reason about performance, security, and process interactions in complex distributed systems. Passionate about managing operational risk.
  • Ability to work effectively as part of a diverse, multidisciplinary team.
  • Motivated and self-organized, with good time and work management skills.
  • Should have 3 to 5 years of experience as a Site Reliability Engineer.
  • Must have intermediate- to expert-level knowledge of Azure (preferred), AWS, or GCP cloud infrastructure, networking, security, and storage. (GCP will be decommissioned soon, so Azure-only experience is also acceptable.)
  • Must have intermediate-level core Python skills.
  • Must have intermediate- to expert-level debugging skills in Python, cloud, and Windows administration.
  • Must have intermediate-level knowledge of Windows or Linux administration. (Linux-only experience is acceptable; two weeks of Windows administration training can be provided.)
  • Good to have intermediate- to expert-level knowledge of infrastructure monitoring and application monitoring, and of related tools such as ELK, OpsBridge, or Dynatrace.
  • Good to have experience with observability and centralized logging.
  • Good to have knowledge of incident management tools (PagerDuty/Opsgenie/VictorOps).
  • Good to have knowledge of change management.
  • Good to have knowledge of SLOs, SLIs, and SLAs.
  • Good to have knowledge of Kubernetes and Docker.
  • Good to have knowledge of CI/CD (especially Azure DevOps).


For you
  • Insurance Coverage 
  • Paid leave – including maternity, bereavement, paternity, and special COVID-19 leave
  • Financial assistance for medical crises
  • Retiral Benefits – VPF and NPS 
  • Customized Mindfulness and Wellness programs 
  • EPAM Hobby Clubs

For your comfortable work
  • Hybrid Work Model 
  • Soft loans to set up workspace at home 
  • Stable workload 
  • Relocation opportunities with ‘EPAM without Borders’ program

For your growth
  • Certification training for technical and soft skills
  • Unlimited access to the LinkedIn Learning platform
  • Access to internal learning programs set up by world-class trainers
  • Community networking and idea creation platforms 
  • Mentorship programs 
  • Self-driven career progression tool