Senior Systems Engineer (DevOps & SRE)

Site Reliability Engineering, DevOps
Hyderabad, Bangalore, Pune, Gurgaon, Chennai

We are looking for a skilled and driven Site Reliability Engineer (SRE) to join our team.

The chosen candidate will play a key part in ensuring the reliability, scalability, and performance of our infrastructure and applications, including capacity planning. If you have a strong background in software engineering, system administration, containerization, and cloud technologies, you might be our ideal candidate.

Responsibilities
  • Designing, implementing, and managing scalable, reliable, and secure cloud infrastructure using tools such as Terraform, Kubernetes, and Docker
  • Building and maintaining monitoring and alerting systems for application and infrastructure health and performance with tools such as Prometheus, Grafana, and the ELK stack
  • Leading response efforts for critical incidents, conducting root cause analysis, and implementing long-term fixes to prevent recurrence
  • Developing, maintaining, and optimizing continuous integration and continuous deployment (CI/CD) pipelines using tools like Jenkins, GitLab CI, or CircleCI
  • Automating routine tasks and improving efficiency through scripting and tooling in languages such as Python, Bash, or Go
  • Implementing and managing security best practices for infrastructure and applications, including vulnerability assessments, penetration testing, and adherence to security standards
  • Collaborating closely with development, QA, and operations teams to ensure smooth integration and deployment of new features and updates
  • Conducting capacity planning and scaling infrastructure to meet present and future demands
  • Creating and maintaining thorough documentation for infrastructure, processes, and procedures

Requirements
  • A minimum of 5 years of experience in a DevOps/SRE role
  • Solid experience with cloud platforms such as AWS, GCP, or Azure
  • Proficiency in infrastructure as code (IaC) tools such as Terraform or CloudFormation
  • Significant experience with containerization and orchestration (Docker, Kubernetes)
  • In-depth knowledge of CI/CD tools (Jenkins, GitLab CI, CircleCI)
  • Proficiency in scripting languages (Python, Bash)
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack)
  • Ability to participate in capacity planning and scalability assessments to meet business growth and requirements
  • Familiarity with SLI, SLO, SLA, and error budget concepts and their implementation
  • Willingness to provide on-call support and participate in incident management and response activities as needed
  • Solid grasp of networking and security principles
  • Exceptional problem-solving skills and the ability to work under pressure
  • Strong communication and collaboration skills


For you
  • Insurance Coverage
  • Paid Leaves – including maternity, bereavement, paternity, and special COVID-19 leaves
  • Financial assistance for medical crisis
  • Retiral Benefits – VPF and NPS
  • Customized Mindfulness and Wellness programs
  • EPAM Hobby Clubs

For your comfortable work
  • Hybrid Work Model
  • Soft loans to set up workspace at home
  • Stable workload
  • Relocation opportunities with ‘EPAM without Borders’ program

For your growth
  • Certification trainings for technical and soft skills
  • Access to unlimited LinkedIn Learning platform
  • Access to internal learning programs set up by world class trainers
  • Community networking and idea creation platforms
  • Mentorship programs
  • Self-driven career progression tool