
Senior Site Reliability Engineer

Site Reliability Engineering, Datadog, Dynatrace, Splunk, Grafana, Jenkins, Kubernetes, Amazon Web Services, Python, Linux
Hyderabad, Bangalore, Pune, Gurgaon, Chennai, Mumbai

We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.

The Senior SRE will play a crucial role in capacity planning and in ensuring the reliability, scalability, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.

Responsibilities
  • Design, build, and maintain scalable, reliable, and efficient cloud infrastructure and services on platforms like AWS, Azure, or Google Cloud
  • Automate manual work using scripting/programming languages (Python/Bash/PowerShell, etc.) within cloud environments
  • Implement and manage automation tools (Jenkins, GitLab, Ansible/Chef) and processes for streamlined deployment, monitoring, and management of systems and applications in the cloud
  • Monitor system performance, troubleshoot issues proactively, and ensure high availability and performance
  • Utilize observability tools (Prometheus, Grafana, ELK stack, Splunk, Dynatrace, Datadog) for monitoring, alerting, and logging to identify and address potential issues
  • Participate in capacity planning and scalability assessments to support business growth and cloud resource optimization
  • Manage containerization and orchestration technologies such as Docker and Kubernetes, particularly in cloud-native environments
  • Ensure compliance with security best practices and standards in the cloud
  • Evaluate and recommend new technologies and practices to improve system reliability, performance, and efficiency
  • Document processes, procedures, and configurations for knowledge sharing and system integrity
Requirements
  • 5-8 years of experience in a similar role
  • Strong background in software engineering and system administration
  • Proficiency with cloud platforms like AWS, Azure, or Google Cloud
  • Experience with scripting/programming languages (Python/Bash/PowerShell)
  • Experience with automation tools (Jenkins, GitLab, Ansible/Chef)
  • Excellent communication and collaboration skills
  • Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes)
  • Knowledge of security practices and standards in the cloud
  • Familiarity with SLI, SLO, SLA, and error budget concepts
  • Strong problem-solving skills and experience with Agile methodologies and DevOps practices
Nice to have
  • Certifications in cloud technologies (AWS, Azure, Google Cloud)
  • Contributions to open-source projects
  • Prior experience in a leadership role in an SRE team