Job Description:

Job description for Kafka SRE:
Carry out SRE duties for the Kafka Streaming Platform.
Have a thorough understanding of the Kafka architecture, including the concepts of producers, consumers, topics, partitions, etc.
Monitor the platforms and follow runbooks/SOPs to manage platform and application problems.
Familiarize yourself with the cluster maintenance processes and implement changes as per the documented installation and validation plans.
Demonstrate strong troubleshooting and debugging skills: pinpoint and fix the issue, and advise on how to prevent similar problems in the future.
Conduct thorough root cause analysis of major production incidents, document the findings for future reference, and put proactive measures in place to improve system reliability.
Experience with cloud infrastructure in a production environment will be an added advantage for this role.
Automate routine tasks using scripts or automation tools to reduce manual work, lower the chance of human error, and improve system reliability.
The candidate will work in a hybrid model from the first day of joining.
The candidate should work from the office 3 days a week (Monday, Wednesday, Thursday).
Candidates need to work as per the roster and might need to work one weekend a month, with a comp-off in the following week.

Technical Skills required:
At least 2-3 years of experience for a junior-level role and 5+ years for a mid-level/senior-level role, working as a Site Reliability Engineer for a Kafka platform.
In-depth knowledge of core Kafka components such as producers, consumers, topics, partitions, etc.
Troubleshooting both Kafka platform service and application problems and identifying the root cause.
Hands-on experience with cloud technology will be an added advantage.
Writing Ansible playbooks and automating manual tasks using Ansible, shell scripting, and Python.
Should be familiar with Unix/Linux system internals, networking, and distributed systems.

             
