American Credit Acceptance

Manager of IT Operations and Observability

Job Location US-ID-Boise
Posted Date 2 weeks ago (12/10/2024 10:48 AM)
ID 2024-4421

Job Description

We are seeking an experienced and proactive Manager of IT Operations and Observability to lead and optimize our IT operations and observability practices. This role is critical in ensuring the reliability, performance, and availability of our systems and infrastructure. The ideal candidate will be responsible for overseeing daily IT operations, driving operational excellence, and implementing observability tools and practices to proactively monitor system health and performance.
Key Responsibilities:
• Lead IT Operations: Manage and oversee the day-to-day IT operations to ensure the infrastructure and systems are stable, secure, and efficient.
• Observability and Monitoring: Implement, maintain, and optimize observability tools and practices (e.g., monitoring, logging, and alerting systems) to provide visibility into system performance and health.
• Incident Management: Oversee the identification, resolution, and post-incident analysis of critical incidents, ensuring minimal downtime and fast recovery times.
• Infrastructure Optimization: Work closely with engineering and development teams to improve infrastructure scalability, reliability, and cost efficiency through observability-driven insights.
• Continuous Improvement: Establish and maintain best practices for IT operations, automation, and observability to improve system reliability and operational efficiency.
• Collaboration: Work cross-functionally with engineering, security, and business teams to ensure IT operations are aligned with organizational goals and strategies.
• Vendor and Tool Management: Evaluate, select, and manage IT tools and observability platforms to ensure they meet operational requirements.
• Reporting and Metrics: Develop and maintain key performance indicators (KPIs), service level agreements (SLAs), and dashboards to report on system health, performance, and incident response.
• Team Leadership: Lead and mentor a team of IT operations and observability specialists, promoting a culture of continuous learning and improvement.
• Automation and Process Improvement: Drive automation initiatives to improve operational efficiency and reduce manual intervention in daily IT operations.
Qualifications:
• Education: Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field (or equivalent work experience).
• Experience:
o 7+ years of experience in IT operations, including at least 2 years in a leadership or managerial role.
o Strong experience with IT monitoring and observability tools (e.g., SQL Sentry, SolarWinds, Splunk, and AppDynamics).
o Hands-on experience with incident response, troubleshooting, and root cause analysis.
o Advanced certifications in cloud technologies (AWS, Azure, GCP) or ITIL.
o Experience with distributed systems, microservices architecture, and large-scale infrastructure.
o Familiarity with DevOps practices and methodologies.
• Technical Skills:
o Proficiency in monitoring, alerting, and log aggregation tools.
o Experience with cloud infrastructure, containerization, and orchestration tools (e.g., Kubernetes, Docker).
o Understanding of automation tools and practices (e.g., Ansible, Terraform, CI/CD pipelines).
o Experience in setting up and optimizing observability practices for both infrastructure and application performance.
• Soft Skills:
o Strong problem-solving abilities and critical thinking skills.
o Excellent communication and interpersonal skills.
o Ability to collaborate with cross-functional teams and manage diverse stakeholders.
Supervisory Responsibility
This position has supervisory responsibilities.

Work Environment and Physical Demands
This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, photocopiers, filing cabinets and fax machines.

Position Type/Expected Hours of Work
This is a full-time position with a Monday-Friday work schedule, with some schedule variations as needed, including an on-call coverage rotation. Occasional night or weekend work may be required for special projects.

Travel
This position will require up to 10% travel.

EEO Statement
ACA provides equal employment opportunities (EEO) to all applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. ACA complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities.

Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities and activities may change at any time with or without notice.
