Role: Infra Lead (5 days Onsite/Week from Day 1)
Location: Southlake, TX
Full-time: $110k to $120k/Annum + Benefits
Job Description
Who are we looking for?
Infrastructure Lead with 9+ years of experience supporting infrastructure of large-scale distributed systems hosted on on-premises servers and cloud.
Primary Responsibilities:
· Handle infrastructure outages and performance issues with quick analysis and resolution.
· Troubleshoot infrastructure failures (OS, network, storage, file systems, LDAP, Active Directory, SAML, etc.) that can impact application availability and performance.
· Manage incidents and communicate effectively with users, application owners, and senior stakeholders across all areas.
· Implement common infrastructure activities (e.g., patch management, migrations, upgrades, assessments, certificate/password renewals).
· Coordinate application resiliency exercises within the required recovery time objective, and perform functional and non-functional validations during and after DR exercises.
· Actively participate in the change management process with a view to managing risk in the production environment.
· Minimize manual involvement by driving automation and continuous improvements to the operating environment, including development and configuration for dynamic monitoring and recovery.
· Identify and analyze patterns of incidents and problems, conduct flawless post-mortems, develop permanent remediation plans, and implement automation to prevent incidents from recurring.
· Build and improve SOPs for all maintenance activities.
· Challenge the existing infrastructure setup and processes, and suggest different ways to solve problems or improve stability.
Technical Skills:
· Expertise in Linux Server Administration (primary) and Windows Server Administration (secondary).
· Extensive hands-on experience in troubleshooting infrastructure failures (OS, network, storage, file systems, LDAP, Active Directory, SAML, etc.) that can impact application availability and performance.
· Deep expertise in implementing common infrastructure activities (e.g., patch management, migrations, upgrades, assessments, certificate/password renewals).
· Programming: Shell/Python.
· Troubleshooting of web application/service and database failures from an infrastructure perspective.
· Observability stack: AppDynamics, Splunk, ThousandEyes, ITRS, or similar tools.
· Experience with common networking protocols and services including TCP, UDP, DNS, DHCP, HTTP, SSH, FTP, SNMP, and LDAP
· Working knowledge of Remedy, JIRA, etc.
· Fair understanding of CI/CD and DevOps tools.
· Expertise in a cloud platform, preferably Google Cloud Platform.
· Configuration Management: SaltStack, Ansible, Puppet, Terraform etc.
Process Skills:
· Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
· Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
Behavioural Skills:
· Participates as a team member and fosters teamwork through inter-group coordination across the modules of the project.
· Proven ability to develop relationships with stakeholders and understand detailed business requirements across multiple project initiatives.
· Ability to steer technology direction and provide appropriate training to the team.
· Good attitude and willingness to learn.
Qualification:
· 9+ years of work experience in infrastructure and environment support.
· Educational qualification: B.Tech, BE, BCA, MCA, M.Tech, or equivalent technical degree from a reputed college.