Observability Engineer

OpenAI - San Francisco, CA


  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
  • Proven proficiency in monitoring tools (e.g., Datadog, Prometheus, Grafana, ELK stack) and cloud platforms (e.g., AWS, Azure, GCP).
  • Strong background in software engineering, with expertise in programming or scripting languages such as Python, Java, or Go.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Excellent communication skills: the role involves interfacing with various stakeholders, presenting findings and plans, and documenting systems and processes.
  • Experience with microservices architecture and service mesh technologies.
  • Strong understanding of distributed systems, networking, and database technologies.
  • Excellent problem-solving skills and ability to work in a fast-paced environment.

What You'll Be Doing:

  • Develop and maintain systems that allow for effective monitoring, logging, and tracing of software applications. This includes choosing appropriate tools and technologies, setting up dashboards, and ensuring the scalability and reliability of the observability infrastructure.
  • Develop and integrate tools for logging, monitoring, and alerting to enhance visibility into system performance. Ensure compatibility and efficiency across various platforms and services.
  • Collaborate with different engineering teams to integrate observability practices into their workflows.
  • Regularly analyze system performance and identify areas for improvement. This involves working closely with other engineering teams to understand their needs and challenges and providing insights and solutions for better system performance.
  • Stay up to date with the latest trends in observability, logging, monitoring, and cloud technologies. Introduce innovative solutions and best practices to improve system observability and reliability, and experiment with new tools and practices to enhance the observability landscape.
  • Participate in strategic planning for the technology roadmap, including scalability, cost-effectiveness, and risk management considerations related to observability infrastructure.
  • Create comprehensive documentation for observability systems and processes. Prepare reports and insights for management regarding system performance and reliability.

Nice to Haves:

  • Have a track record of building, operating, and accelerating observability systems at scale that empower your fellow engineers.
  • Enjoy seeking out and addressing bottlenecks and areas for performance improvement in our systems.
  • Utilize Infrastructure as Code (IaC) principles to automate infrastructure provisioning and configuration management.
  • Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services.
  • Help create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
  • Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done.

Perks and Benefits:

  • This role is exclusively based in our San Francisco HQ. We offer relocation assistance to new employees.


