Summary/Objective
The Observability Engineer will design, implement, and maintain observability solutions for complex systems and applications. This role requires a solid understanding of monitoring and observability practices, as well as expertise in tools and technologies used to collect and analyze performance, logging, and metrics data.
Essential Functions
Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Monitoring Setup and Configuration: Configure monitoring tools to gather data from various systems, applications, and network components. Define metrics, configure data collection agents, and ensure proper connectivity and access.
- Alert Management: Monitor alerts, perform triage to identify critical issues, analyze alert patterns, and configure escalation workflows to ensure timely response and resolution.
- Performance Analysis and Troubleshooting: Use observability tools to analyze metrics, logs, and traces. Conduct root cause analysis, troubleshoot issues, and identify areas for optimization.
- Incident Response: Collaborate across teams to respond to incidents quickly, handling triage, communication, and coordination with stakeholders. Participate in post-incident reviews to identify improvements.
- Dashboard and Visualization: Develop and maintain dashboards and visualizations that offer a consolidated view of system health and performance. Customize dashboards based on specific business and operational requirements.
- Capacity Planning and Scalability: Monitor resource utilization and trends to forecast capacity needs. Collaborate on resource planning and provisioning to support scalability and optimal performance.
- Tool Administration and Maintenance: Perform routine administration tasks for observability tools, including user management, access control, and system upgrades. Monitor the health and availability of these tools.
- Documentation and Knowledge Sharing: Document configurations, troubleshooting steps, and best practices. Contribute to knowledge bases and share insights with the team.
- Tool Integration and Automation: Integrate observability tools with other systems, including ticketing and incident management platforms. Automate monitoring configurations and reporting to improve efficiency.
- Continuous Improvement and Research: Stay updated on observability trends, research new tools and methods, and continuously improve monitoring setups to align with best practices.
- Other duties as assigned.
Qualifications
- Bachelor's degree in computer science or a related technical field preferred.
- 5+ years of experience in software engineering or IT with a focus on monitoring, alerting, and analysis.
- Proficiency in application, cloud infrastructure, and monitoring tool administration.
- Hands-on experience with SolarWinds, Elasticsearch or Amazon OpenSearch Service, and similar tools (e.g., Splunk).
- Experience with APM tools such as AppDynamics, Dynatrace, or New Relic.
- Proficiency in scripting languages (Python and PowerShell preferred) and familiarity with data formats such as JSON.
- Strong understanding of web services and CI/CD pipelines.
- Ability to thrive in a fast-paced environment, with excellent problem-solving skills, adaptability, and teamwork.
- Knowledge of Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.
- Passion for DevOps, application/API monitoring, automation, and reliability.
Work Environment and Physical Demands
This job operates in a professional office environment. This role routinely uses standard office equipment such as computers, phones, photocopiers, filing cabinets and fax machines.
Position Type/Expected Hours of Work
This is a full-time position with a Monday-Friday work schedule, with some variations as needed, including an on-call coverage rotation. Occasional night or weekend work may be required for special projects.
Travel
This position will require up to 10% travel.
EEO Statement
ACA provides equal employment opportunities (EEO) to all applicants for employment without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, disability, genetic information, marital status, amnesty, or status as a covered veteran in accordance with applicable federal, state and local laws. ACA complies with applicable state and local laws governing non-discrimination in employment in every location in which the company has facilities.
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee for this job. Duties, responsibilities and activities may change at any time with or without notice.