Job Description:
As a Site Reliability Engineer, you will be responsible for running our production services as part of a group of developers, QA engineers, and operations engineers who ensure service reliability, scalability, and performance.
Administer, monitor, and manage large-scale production environments.
Implement automated procedures for deploying multiple versions of services and for the configuration, startup, and shutdown of services, tools, and application servers.
Work with engineering and release management to document and improve application operational procedures and processes.
Work with architects and developers to aid in design and improve stability.
Skills Required:
Experienced Kubernetes Administrator (CKA preferred, or willing to become certified)
Prometheus
Alertmanager
Grafana
Go
We are looking for senior-level Kubernetes engineers/SREs with enterprise-level experience. The ideal candidate will come from a Linux systems administration or DevOps engineering background. Candidates must have experience administering Kubernetes components and large clusters, and are expected to have solid experience troubleshooting Kubernetes.
This position can be 100% remote or, if you would rather work onsite, it is available in Sunnyvale, CA or Denver, CO.