Lead Systems Engineer (DevOps & SRE)

Site Reliability Engineering, DevOps
Hyderabad, Bangalore, Pune, Gurgaon, Chennai

Join our organization as a Lead Systems Engineer (DevOps & SRE) and play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications.

The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies, and will lead the design, development, and maintenance of scalable and reliable infrastructure.

You will also be responsible for implementing and managing CI/CD pipelines, monitoring system performance and reliability, developing and maintaining automation tools, ensuring security and compliance, mentoring and guiding junior SREs and DevOps engineers, and staying up-to-date with the latest industry trends and technologies.

Responsibilities
  • Lead the design, development, and maintenance of scalable and reliable infrastructure
  • Implement and manage CI/CD pipelines to ensure efficient and smooth software releases
  • Monitor system performance and reliability, proactively identifying and resolving issues
  • Develop and maintain automation tools to streamline infrastructure management and deployment processes
  • Collaborate with development teams to ensure best practices for software development, deployment, and operations
  • Ensure security and compliance across all infrastructure and operations
  • Mentor and guide junior SREs and DevOps engineers, fostering a culture of collaboration and continuous learning
  • Conduct root cause analysis of system failures and implement solutions to prevent recurrence
  • Optimize resource utilization to ensure cost-effective operations
  • Stay up-to-date with the latest industry trends and technologies, integrating them into our processes where appropriate

Requirements
  • 8+ years of experience in a DevOps/SRE role
  • Strong experience with cloud platforms (AWS, GCP, Azure)
  • Proficiency in infrastructure as code (IaC) tools (Terraform, CloudFormation, etc.)
  • Extensive experience with containerization and orchestration (Docker, Kubernetes)
  • Strong knowledge of CI/CD tools (Jenkins, GitLab CI, CircleCI, etc.)
  • Proficiency in scripting languages (Python, Bash, etc.)
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack, etc.)
  • Ability to participate in capacity planning and scalability assessments to support business growth and requirements
  • Working knowledge of SLI, SLO, SLA, and error budget concepts and their implementation
  • Willingness to provide on-call support and participate in incident management and response activities as needed
  • Solid understanding of networking and security principles
  • Excellent problem-solving skills and the ability to work under pressure
  • Strong communication and collaboration skills
  • English proficiency at B2 level or higher

Technologies
  • CI/CD, Jenkins, Docker, Kubernetes, Terraform, Ansible, Python, Prometheus, Grafana, ELK stack, Splunk, Dynatrace, Datadog or similar, SLI, SLO, SLA and Error Budget concepts


For you
  • Insurance Coverage
  • Paid Leaves – including maternity, bereavement, paternity, and special COVID-19 leaves
  • Financial assistance for medical crisis
  • Retiral Benefits – VPF and NPS
  • Customized Mindfulness and Wellness programs
  • EPAM Hobby Clubs

For your comfortable work
  • Hybrid Work Model
  • Soft loans to set up workspace at home
  • Stable workload
  • Relocation opportunities with ‘EPAM without Borders’ program

For your growth
  • Certification trainings for technical and soft skills
  • Unlimited access to the LinkedIn Learning platform
  • Access to internal learning programs set up by world-class trainers
  • Community networking and idea creation platforms
  • Mentorship programs
  • Self-driven career progression tool