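COVID-19:

The health of our colleagues and applicants is our top priority. Citi is therefore closely monitoring the COVID-19 situation. Until further notice, we have implemented precautionary measures across the company worldwide, including temporarily conducting all candidate interviews virtually.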

Site Reliability Engineer - Data & Platform (Remote)

Job ID 22508410 Primary Location London, United Kingdom, Remote; Job Category Technology

Now is an extremely exciting time to join a newly formed group within Citi. The Institutional Clients Group - Engineering and Architecture Practice (EAP) is responsible for defining and building core architecture and technology strategy for the ICG.  

This position is in the Kafka-as-a-Service team, which sits under Common Platform Engineering (CPE). CPE is a department within the EAP group whose mission is to provide engineering for common platform capabilities in ICG, to engineer solutions that codify the firm's data strategy into frameworks & tools, and to define 'Common Product' standards that enable efficient adoption of common components.

We are looking for an SRE with a software engineering background who is passionate about running large-scale, multi-tenant distributed data systems for customers that expect a very high level of availability. In this role, you will be responsible for the availability, performance, monitoring, emergency response, and capacity planning of the data systems.

If you love the hum of big data systems, thinking about how to make them run as smoothly as possible, and want to have a big influence on the architecture plus operational design points of the systems, then you will fit right in. Your solutions will be leveraged by tens of thousands of developers across Citi supporting applications used by hundreds of thousands of internal and client users.

What you’ll be doing:

  • Design & build observability solutions for distributed systems

  • Contribute to the continuous automation of toil, and drive & evangelize the four key DORA metrics

  • Establish Service Level Objectives for core services, monitor their Service Level Indicators, and implement error-budget based alerting

  • Help the operations team by building solutions that allow them to identify and resolve health issues in the data systems as quickly as possible

  • Automate the deployment of infrastructure and applications for data systems such as Kafka

  • Support the rapid growth of the platform by expanding its deployment strategy to OpenShift and cloud environments (AWS EKS, GKE)

  • Design and implement service improvements for performance & security, relentlessly improve reliability and facilitate effective incident response, mitigation & resolution

  • Write and review technical documents, including design, requirements, and process documentation

  • Advocate for a culture of platform automation, with an obsession for an everything-as-code approach

What we are looking for:

  • 4+ years’ experience in Site Reliability Engineering, creating scalable and highly reliable systems

  • Strong fundamentals in distributed systems design and operation with experience building automation to operate large-scale data systems

  • Experience designing & implementing observability solutions for data systems to enable a holistic view of system health

  • Strong understanding of modern site reliability engineering practices and ability to apply them to improve the reliability of systems

  • Experience creating, deploying, and managing the lifecycle of containerised applications on Kubernetes

  • Experience in an agile development environment using a modern programming language such as Python, Golang, Java, Kotlin, Scala, or similar

What gives you an edge:

  • Experience working with distributed systems and stream processing solutions; hands-on experience with Apache Kafka is highly desirable

  • Strong grasp of DevSecOps practices and ability to contribute to improving systems reliability, quality, and time-to-market

  • Experience designing and implementing multiple automated deployment pipelines at both the application and infrastructure levels. Ideally, you would have experience with Ansible and Terraform on multiple projects

  • Experience working with the HashiCorp tool set, specifically Vault for secrets management and Consul for service discovery

  • Experience deploying applications and infrastructure into the cloud

-------------------------------------------------

Job Family Group:

Technology

-------------------------------------------------

Job Family:

Applications Development

------------------------------------------------------

Time Type:

Full time

------------------------------------------------------

Citi is an equal opportunity and affirmative action employer.

Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.

View the "EEO is the Law" poster. View the EEO is the Law Supplement.

View the EEO Policy Statement.

View the Pay Transparency Posting

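  • Join the Citi team of more than 220,000 talented and diverse employees

  • Socially responsible employees who volunteer in communities in 90 countries

  • Meaningful career opportunities with a physical presence in more than 95 markets

We foster a culture that embraces all individuals and encourages diverse perspectives, where you can make an impact and grow your career. Citi values colleagues who demonstrate high professional standards, a strong sense of integrity and generosity, intellectual curiosity, and rigor. We recognize the importance of your decision to build a career at Citi, and we pledge to match your commitment with our own.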
