Job Description:
8-10+ years of experience in Site Reliability / DevOps Engineering
Experience with monitoring and observability (Datadog, Splunk)
Expertise in AWS / Azure
Expertise in Kubernetes, kOps, & Helm 3.
You won't deploy Kubernetes / Docker - our software engineers & release engineers do that. Instead, you'll ensure we have the Docker registry for them, and you'll debug and fix issues with Kubernetes clusters. Your work will be more heavily focused on SRE & IAC.
Experience with Infrastructure as Code (IAC); Terraform is mandatory
Fluency in at least one language required: Python, C#, or Java. Should have strong API experience.
Strong leadership, initiative, and capacity for decision-making
Expert knowledge in any or all of these is a huge plus: Prometheus Operator, Grafana, Loki, ELK Stack, OpenTelemetry, Jaeger/OpenTracing (and yes, we use ALL of these!)
Participate in the on-call rotation for Operations support
Azure, Datadog or Splunk, PowerShell, Terraform
Bachelor's degree in CS or a related STEM engineering field strongly preferred
Good knowledge of CI and CD; preferred tools are Jenkins, Octopus, and GitHub Actions
Good knowledge of deploying Kubernetes resources the GitOps way; Argo CD is the preferred tool.
We are an equal opportunity employer. All aspects of employment, including the decision to hire, promote, discipline, or discharge, will be based on merit, competence, performance, and business needs. We do not discriminate on the basis of race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, citizenship/immigration status, veteran status, or any other status protected under federal, state, or local law.