Job Description:
Direct Client Requirement
Position: Senior SRE/Observability Engineer
Location: Dallas, TX
Type: Contract
  • 8+ years of experience in AWS, configuring alerts, monitoring, the OpenTelemetry framework, Terraform, and scripting.
  • In-depth knowledge of observability tools such as Prometheus, Grafana, Splunk, Netcool, ELK, AIM, Sumo Logic, and New Relic.
  • Strong understanding of licensing mechanisms and MELT.
  • Experience with Cloud Platforms (AWS/Azure), Kubernetes, CI/CD (Jenkins), and Infrastructure as Code (Terraform).
  • Ability to read and write code in Java, Python, Ruby, Node.js, and other relevant languages.
  • Proven experience in creating dashboards, establishing design patterns, and understanding application flows in containerized/microservice environments.
  • Excellent communication skills and the ability to work effectively across teams.
REQUIREMENTS
  • Implement and maintain observability solutions using Prometheus as the backend and GEM (Grafana Enterprise Metrics) as the middle layer.
  • Develop and manage Grafana dashboards for visualizing metrics and performance data.
  • Optimize and configure licensing mechanisms for observability tools.
  • Write and manage complex queries and alert definitions.
  • Bridge the gap between application development teams and SRE operations.
  • Manage and optimize OpenShift, Linux environments, and Grafana Enterprise Metrics.
  • Utilize MELT (Metrics, Events, Logs, and Traces) and plan for long-term data migration to AWS S3.
  • Configure and manage monitoring, alerts, and observability using a range of tools including Splunk, Netcool, ELK, and AIM.
  • Maintain deep technical knowledge and operational experience with tools like AppDynamics, Datadog, Dynatrace, New Relic, Sumo Logic, Splunk, Prometheus, and Grafana.
  • Understand and write code (Java, Python, Ruby, Node.js, etc.), programs, config files, and complex queries.
  • Implement and manage Infrastructure as Code (IaC) using Terraform.
  • Manage and optimize cloud platforms (AWS/Azure) and Kubernetes environments.
  • Establish design patterns for monitoring and benchmarking application uptime and performance.
  • Provide thought leadership and strategy in implementing and maintaining observability solutions.
  • Onboard new teams and data sources into the observability solutions.
  • Create and maintain operational process documentation for observability solutions.
  • Optimize the Observability Suite for monitoring applications and infrastructure.
  • Write queries for alerts, dashboards, and reporting.
             
