Expert Application Engineer - SRE

Discover Financial Services

This job is no longer accepting applications

See open jobs at Discover Financial Services.
Software Engineering · Full-time
Riverwoods, IL, USA
Posted on Thursday, February 23, 2023

About This Role

Discover. A brighter future.

With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it — we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.

Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Job Description

As an Application Reliability Engineer, you’ll tap into your passion for finding and fixing inefficiencies to solve our reliability and performance issues. In our Agile environment, you’ll focus on availability, latency, performance, efficiency, change and problem management, monitoring, emergency response and capacity planning of our services. Your projects will deliver enhanced infrastructure, development, and deployment automation at Discover.


  • Analyse, design, program, test, and deploy new user stories and features with high quality (security, reliability, operations) to production
  • Achieves team commitments (and influences others to do the same) by using informal leadership & highly developed communication skills
  • Has oversight of design decisions and guides the team to achieve key results for the products assigned to them
  • Remediates issues using engineering principles and creates proactive design solutions for potential failures
  • Work with a team of site reliability engineers that is responsible for building the continuous reliability mindset, shepherding problem management, and driving key site reliability engineering practices into the organization.
  • Design and drive monitoring, alerting, and ticket reporting strategies to measure SLA, SLO, MTTI, MTTR, etc., and align with management expectations to minimize production downtime.
  • Guide site reliability automation to help eliminate manual toil and create a self-healing capability
  • Participate in selection of appropriate automation tools, defining technology, quality, experience and implementation standards and practices within own technical domain.
  • Fosters a culture of excellence and continuous learning within the chapter. Establishes and tracks appropriate OKRs to ensure outcomes are met.
  • Creates solutions addressing high impact technology and business priorities
  • Competent in multiple contexts, such as programming languages, security, automation, testing, infrastructure, and performance and is the go-to person for many people (inside and outside of their team)
  • Proactively identifies and mitigates issues based on intuition and experience in multiple domains
  • Contributes to and leads technology communities at Discover

Minimum Qualifications

At a minimum, here’s what we need from you:

  • Bachelor's – Computer Science or related
  • 8+ Years – Information Technology, (Software) Engineering, or related
  • Internal applicants only: technical proficiency rating of expert on the Dreyfus engineering scale

Preferred Qualifications

Bonus Points If You Have:

  • Experience with SRE design to address reliability and resiliency with five-nines (99.999%) availability
  • Deep knowledge and understanding of emerging trends in the SRE field
  • High level of familiarity with the Linux command line and scripting
  • Extremely comfortable with production environments, firewalls and networking
  • Studied architectural patterns at scale, including thoughtfully designed APIs, repeatable delivery pipelines, and efficient computer engineering principles
  • Strong experience in deploying, observing, monitoring and alerting in K8s with OLTP
  • Knowledge of the automation tools such as Ansible, Terraform, or Chef
  • Deep knowledge of working in public cloud (AWS, Azure, GCP)
  • Working knowledge of messaging services like RabbitMQ, SQS, Kafka
  • Working experience in caching systems like Redis
  • Working experience with logging and monitoring systems (Splunk, Datadog, New Relic, AppDynamics, Instana, CatchPoint) with a mindset towards predictive analysis
  • Strong experience with Continuous Integration and Continuous Delivery models, including Blue/Green and Canary release models, is a plus

External applicants will be required to perform a technical interview.

What are you waiting for? Apply today!

All Discover employees place our customers at the very center of our work. To deliver on our promises to our customers, each of us contributes every day to a culture that values compliance and risk management.

Discover is committed to a diverse and inclusive workplace. Discover is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or other legally protected status. (Know Your Rights)

Discover Financial Services is an equal opportunity employer.
