Lead Site Reliability Engineer
Site Reliability Engineering, Kubernetes, Docker, Linux, Jenkins, Splunk, Dynatrace, Grafana, Amazon Web Services, Python
Hyderabad, Bangalore, Gurgaon, Chennai, Pune
We are seeking a talented and motivated Lead Site Reliability Engineer (SRE) to join our organization.
The SRE will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure and applications, and in planning their capacity. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.
Responsibilities
- Design, build, and maintain scalable and reliable cloud infrastructure and services utilizing AWS or Azure
- Automate routine activities through scripting languages
- Proactively monitor system performance and troubleshoot issues to ensure high availability and robust performance
- Provide on-call support and manage incident response activities when necessary
Requirements
- 8 to 12 years of hands-on experience in site reliability engineering or related fields
- Proficiency in cloud platforms such as AWS or Azure
- Competency in scripting languages including Python, Bash, and PowerShell
- Background in automation tools such as Jenkins, GitLab, and Ansible/Chef
- Understanding of observability tools such as Grafana, Splunk, and Dynatrace
- Experience with containerization and orchestration technologies (Docker, Kubernetes)
- Familiarity with SLI, SLO, and SLA concepts and with error budget management