
Senior Systems Engineer (DevOps & SRE)

Site Reliability Engineering, DevOps
Hyderabad, Bangalore, Pune, Gurgaon, Chennai

We are looking for a skilled and driven Site Reliability Engineer (SRE) to become a part of our team.

The successful candidate will play a key part in safeguarding the reliability, scalability, and performance of our infrastructure and applications, and in planning their capacity. If you have a strong background in software engineering, system administration, containerization, and cloud technologies, you might be our ideal candidate.

Responsibilities
  • Designing, implementing, and managing scalable, reliable, and secure cloud infrastructure using tools such as Terraform, Kubernetes, and Docker
  • Building and maintaining monitoring and alerting systems for application and infrastructure health and performance with tools such as Prometheus, Grafana, and the ELK stack
  • Leading response efforts for critical incidents, conducting root cause analysis, and implementing long-term fixes to prevent recurrence
  • Developing, maintaining, and optimizing continuous integration and continuous deployment (CI/CD) pipelines using tools like Jenkins, GitLab CI, or CircleCI
  • Automating routine tasks and enhancing efficiency through scripting and tools, employing languages such as Python, Bash, or Go
  • Implementing and managing security best practices for infrastructure and applications, including vulnerability assessments, penetration testing, and adherence to security standards
  • Collaborating closely with development, QA, and operations teams to ensure smooth integration and deployment of new features and updates
  • Conducting capacity planning and scaling infrastructure to meet present and future demands
  • Creating and maintaining thorough documentation for infrastructure, processes, and procedures
Requirements
  • A minimum of 5 years of experience in a DevOps/SRE role
  • Solid experience with cloud platforms such as AWS, GCP, or Azure
  • Proficiency in infrastructure as code (IaC) tools such as Terraform or CloudFormation
  • Significant experience with containerization and orchestration (Docker, Kubernetes)
  • In-depth knowledge of CI/CD tools (Jenkins, GitLab CI, CircleCI)
  • Proficiency in scripting languages (Python, Bash)
  • Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack)
  • Ability to participate in capacity planning and scalability assessments to meet business growth and requirements
  • Familiarity with SLI, SLO, SLA, and error budget concepts and their implementation
  • Willingness to provide on-call support and participate in incident management and response activities as needed
  • Solid grasp of networking and security principles
  • Exceptional problem-solving skills and the ability to work under pressure
  • Strong communication and collaboration skills

Benefits
  • Insurance coverage 
  • Paid leave, including maternity, paternity, bereavement, and special COVID-19 leave
  • Financial assistance for medical crises
  • Retiral benefits – VPF and NPS
  • Customized Mindfulness and Wellness programs 
  • EPAM Hobby Clubs
Community
  • Flexible and hybrid work opportunities
  • Soft loans to set up workspace at home 
  • Relocation and mobility programs

Professional development

  • Access to soft skills training in general communication, presenting and public speaking, diversity, equity and inclusion (DEI), cultural intelligence, self-productivity, well-being and more
  • Unlimited access to the LinkedIn Learning Library, including 22,000+ courses 
  • Access to internal learning platforms, EPAM University and a wide range of professional communities and competency centers  
  • Community networking and idea creation platforms 
  • Mentorship programs 
  • Self-driven career progression tool
  • Upskilling, reskilling and certification courses