SRE - Senior Site Reliability Engineer
Site Reliability Engineering, DevOps, CI/CD, Amazon Web Services, Terraform, Docker, Kubernetes, Python, Google Cloud Platform, Bash, PowerShell, Microsoft Azure
Hyderabad, Pune, Bangalore, Gurgaon, Chennai
We are seeking a talented and motivated Senior Site Reliability Engineer to join our team. As a key member of our multidisciplinary team, you will play a crucial role in ensuring the reliability, performance, and security of our complex distributed systems. If you are passionate about operational risk management, have a deep understanding of Kubernetes and containers, and have strong problem-solving skills, this role offers an exciting opportunity to contribute to the success of our operations.
Responsibilities
- Ability to rapidly and effectively understand and translate requirements into technical solutions.
- Ability to reason about performance, security, and process interactions in complex distributed systems; passion for managing operational risk.
- Ability to work effectively as part of a diverse, multidisciplinary team.
- Motivated and self-organized, with good time and work management skills.
Requirements
- 5 to 9 years of experience.
- Systems engineering experience with a development background and an understanding of Kubernetes and containers, including:
- Good knowledge of infrastructure (networking, operating systems)
- Good knowledge of Linux
- Good knowledge of Kubernetes and Docker
- Good debugging skills
- Ability to handle operational issues
- Strong proficiency in at least one of Python, Bash, or PowerShell
- Strong problem-solving and analytical skills, and a solid grasp of algorithms
- Familiarity with cloud monitoring and an understanding of the SLI (service level indicator) concept
- Ability to communicate technical concepts effectively, both in writing and orally, and the interpersonal skills to collaborate effectively with colleagues across diverse technology teams and locations.
- Familiarity with at least one cloud provider (preferably GCP or Azure)
- Ability to identify, define, and maintain SLIs and SLOs for teams, as well as metrics such as MTTR, lead time for changes, deployment frequency, and change failure rate
- Ability to work with application teams to set up observability and telemetry
- Experience with at least one SRE tool, preferably Grafana, Dynatrace, or Splunk
Nice to have
- Experience with package management solutions such as Nix, APT, or Yum
- Experience working with Windows
- Knowledge of CI/CD (especially Azure DevOps)
- Knowledge of Kubernetes
- Knowledge of Istio
- Knowledge of GitOps tools (such as ArgoCD)
Benefits
For you
- Insurance Coverage
- Paid leave, including maternity, bereavement, paternity, and special COVID-19 leave
- Financial assistance during medical crises
- Retiral Benefits – VPF and NPS
- Customized Mindfulness and Wellness programs
- EPAM Hobby Clubs
For a comfortable work environment
- Hybrid Work Model
- Soft loans to set up workspace at home
- Stable workload
- Relocation opportunities with ‘EPAM without Borders’ program
For your growth
- Certification training for technical and soft skills
- Unlimited access to the LinkedIn Learning platform
- Access to internal learning programs set up by world-class trainers
- Community networking and idea creation platforms
- Mentorship programs
- Self-driven career progression tool