Site Reliability Engineer

Job description

We are seeking a Site Reliability Engineer responsible for designing, building, running, and monitoring public cloud infrastructure that supports a variety of mission-critical services. This is a highly technical, hands-on role that requires expertise in supporting systems at enterprise scale. You will deliver innovative solutions in engineering, reliability, monitoring, automation, and orchestration.

Experience: 2+ years, with a degree in any discipline.

Location: Onsite/Remote

Responsibilities:

As an SRE, contribute to engineering and supporting cloud platform IaaS and PaaS services
Partner with application teams to provision scalable workloads reliably across distributed compute resources
Provide engineering and operational support for distributed systems and network-based information security tools, including configuration management and provisioning
Implement and maintain security controls
Work closely with development teams to understand application performance and behaviour patterns to proactively monitor, tune and correct issues before they occur
Identify opportunities to improve the reliability, performance and security of security tooling
Develop tools and automation to eliminate manual and repetitive efforts

Skills Required:

2+ years of experience in software engineering and systems engineering, applied to managing operations
Experience supporting infrastructure and services in public and private cloud environments (Azure, AWS, GCP, OpenStack etc.)
Proficiency in programming languages and build tooling such as Python, Java, Ruby, Perl, Go and Make, for building automation or integrating with APIs
Experience with common data formats such as JSON and YAML, and with compression utilities
Expertise with monitoring or log aggregation tools (Prometheus, Grafana, Splunk, ELK, etc.)
Expertise in key SRE skills (scalability, reliability and observability) and 24x7 on-call processes
Familiarity with CI/CD tools and deployment processes
Solid understanding and experience with centralized configuration management, coordination and provisioning technologies, such as Ansible, Chef, Puppet, etc.
Experience implementing and working with open source frameworks
Excellent communication skills; must be capable of working with cross-functional technical and business teams and varying levels of management
Understanding of Agile methodologies such as Scrum, and the ability to work in a fast-paced environment
Strong project management skills, including excellent presentation skills
Must be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.
Solid understanding of Linux/Unix system internals, including kernel tuning
Experience with failure testing and chaos engineering
Working knowledge of network protocols and network-based services, including routing and network load balancing
Experience building and supporting containerized applications on platforms such as GKE, EKS and ECS
