Description
We are looking for a self-motivated Senior Software Engineer to join our Payments team. As a member of this team, you will be responsible for designing, implementing, and maintaining systems and tools that support flow-level observability, payments reliability, and scalability.
Your primary focus will be on building and managing large-scale platforms to improve the availability of our Payments platform for internal and external stakeholders. You will collaborate closely with other Payments engineering teams and Infra teams to ensure services are instrumented, scalable, and resilient to support our growing business.
Key responsibilities include:
- Designing, implementing, and maintaining systems and tools at a platform level that support flow-level observability, payments reliability, and scalability.
- Identifying and driving improvements to the availability, observability, and resiliency of Airbnb Payments.
- Developing observability standards and frameworks for new product readiness to ensure service reliability in service-oriented architecture (SOA) and distributed systems.
- Building domain expertise to achieve scalability by understanding the nuances of Payments across processing, compliance, and infra.
- Driving large-scale migration and adoption projects for observability and reliability by collaborating across Payments teams.
- Leading initiatives that promote a culture of reliability throughout the organization by improving incident management platforms and instrumentation.
Requirements:
- 7+ years of experience in back-end software development focusing on large-scale distributed systems.
- BE/B.Tech in Computer Science or a related technical field.
- Strong software development skills in one or more languages or frameworks such as Java, Python, Kotlin, Scala, or Ruby on Rails.
- Experience in building intelligent AI agents and systems powered by Large Language Models is a plus.
- Exposure to the architectural patterns of large, high-scale web applications (e.g., well-designed APIs, high-volume data pipelines, efficient algorithms).
- Familiarity with cloud platforms like AWS or Google Cloud Platform.
- Deep understanding of software development best practices, including version control, automated testing, CI/CD, and code reviews.
- Experience in incident management, monitoring, alerting, and root cause analysis.
- Effective leadership and communication skills to coordinate cross-functional teams during large-scale projects.
- Experience with initiatives involving auto-scaling, self-healing mechanisms, chaos engineering, and performance optimization is a plus.
- Previous experience in AI/ML will also be a plus.
If you are a strong problem solver and have previously worked on a team that is on call for production systems, we encourage you to apply.