Site Reliability Engineer

Job description

We are seeking a Site Reliability Engineer responsible for designing, building, running, and monitoring public cloud infrastructure that supports a variety of mission-critical services. This is a highly technical, hands-on role that requires expertise in supporting systems at enterprise scale. You will deliver innovative solutions in engineering, reliability, monitoring, automation and orchestration.

Experience: 2+ years, with a degree in any discipline.

Location: Onsite/Remote

Responsibilities:

  • Engineer and support cloud platform IaaS and PaaS services
  • Partner with application teams to provision scalable workloads reliably across distributed compute resources
  • Provide engineering and operational support for distributed systems and network-based information security tools, including configuration management and provisioning
  • Implement and maintain security controls
  • Work closely with development teams to understand application performance and behaviour patterns, so that you can proactively monitor and tune systems and prevent issues before they occur
  • Identify opportunities to improve the reliability, performance and security of security tooling
  • Develop tools and automation to eliminate manual and repetitive efforts

Skills Required:

  • 2+ years of experience in software engineering and systems engineering, applied to managing operations
  • Experience supporting infrastructure and services in public and private cloud environments (Azure, AWS, GCP, OpenStack etc.)
  • Proficient with programming languages such as Python, Java, Ruby, Perl or Go, and with build tools such as Make, for building automation or integrating with APIs
  • Experience with common data formats such as JSON and YAML, and with compression utilities
  • Expertise with monitoring or log aggregation tools (Prometheus, Grafana, Splunk, ELK, etc.)
  • Expertise in key SRE skills (scalability, reliability and observability) and experience with a 24x7 on-call process
  • Familiarity with CI/CD tools and deployment processes
  • Solid understanding of, and experience with, centralized configuration management, coordination and provisioning technologies such as Ansible, Chef and Puppet
  • Experience implementing and working with open source frameworks
  • Excellent communication skills; must be capable of working with cross-functional technical and business teams and varying levels of management
  • Understanding of Agile methodologies such as Scrum, and the ability to work in a fast-paced environment
  • Strong project management skills, including excellent presentation skills
  • Must be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.
  • Solid understanding of Linux/Unix system internals, including kernel tuning
  • Experience with failure testing and chaos engineering
  • Working knowledge of network protocols and network-based services, including routing and network load balancing
  • Experience building and supporting containerized applications on platforms such as GKE, EKS and ECS