Site Reliability Engineer

Site Reliability Engineering, DevOps, Amazon Web Services, Terraform, Docker, Kubernetes, Python, Google Cloud Platform, Bash, PowerShell, Microsoft Azure
Hyderabad, Pune, Bangalore, Gurgaon, Chennai

We are seeking a talented and motivated Site Reliability Engineer to join our team. As a key member of our multidisciplinary team, you will play a crucial role in ensuring the reliability, performance, and security of our complex distributed systems. If you are passionate about operational risk management, have a deep understanding of Kubernetes and containers, and possess strong problem-solving skills, this role offers an exciting opportunity to contribute to the success of our operations.

  • Ability to rapidly and effectively understand requirements and translate them into technical solutions.
  • Ability to reason about performance, security, and process interactions in complex distributed systems. Passionate about managing operational risk.
  • Ability to work effectively as part of a diverse, multidisciplinary team.
  • Motivated and self-organized, with good time and work management skills.
  • Should have 3 to 5 years of experience as a Site Reliability Engineer.
  • Must have expert/intermediate level knowledge of Azure (preferred) or AWS/GCP cloud infrastructure, networking, security, and storage. (GCP will be decommissioned in the coming days, so Azure-only experience is also fine.)
  • Must have intermediate level core Python skills.
  • Must have expert/intermediate level debugging skills across Python, cloud, and Windows administration.
  • Must have intermediate level knowledge of Windows or Linux administration. (Linux-only experience is also okay; two weeks of Windows administration training can be provided.)
  • Good to have expert/intermediate level knowledge of infrastructure and application monitoring and related tools (ELK/Opsbridge/Dynatrace).
  • Good to have observability and centralized logging experience.
  • Good to have knowledge of incident management (PagerDuty/Opsgenie/VictorOps).
  • Good to have knowledge of change management.
  • Good to have knowledge of SLOs, SLIs, and SLAs.
  • Good to have knowledge of Kubernetes and Docker.
  • Good to have knowledge of CI/CD (especially Azure DevOps).


For you
  • Insurance Coverage
  • Paid Leaves – including maternity, bereavement, paternity, and special COVID-19 leaves
  • Financial assistance for medical crises
  • Retiral Benefits – VPF and NPS
  • Customized Mindfulness and Wellness programs
  • EPAM Hobby Clubs

For your comfortable work
  • Hybrid Work Model
  • Soft loans to set up a workspace at home
  • Stable workload
  • Relocation opportunities with the ‘EPAM without Borders’ program

For your growth
  • Certification training for technical and soft skills
  • Unlimited access to the LinkedIn Learning platform
  • Access to internal learning programs set up by world-class trainers
  • Community networking and idea creation platforms
  • Mentorship programs
  • Self-driven career progression tool