
Senior Lead Engineer

Capital One, Evanston, IL
April 23, 2024

Job Summary:

Capital One is searching for a Senior Lead Engineer in Generative AI Infrastructure to play a pivotal role in advancing foundational AI capabilities. The position contributes to initiatives that include building distributed training clusters, deploying large language models (LLMs) on GPU instances, and supporting AI research and development within public cloud infrastructure.

Job Duties and Responsibilities:

  • Deploy large-scale distributed training clusters in the public cloud, focusing on optimizing the storage and networking stack and employing multiple parallelism strategies.
  • Design and implement fault-tolerant infrastructure to support long-running large-scale training tasks, utilizing containers and checkpointing libraries.
  • Develop run-time infrastructure to serve large ML models such as LLMs and foundation models (FMs) in the public cloud.
  • Establish infrastructure for deploying search indexes and embeddings in vector databases, closely aligning with other capabilities.
  • Collaborate with cloud and container infrastructure teams, as well as AI researchers, to design and implement key capabilities.

Qualifications and Experience:

  • Bachelor's degree in Computer Science, Computer Engineering, or a related technical field.
  • At least 8 years of experience designing and constructing data-intensive solutions using distributed computing.
  • Proficiency in Python, Go, Scala, or Java, with a minimum of 8 years of programming experience.
  • Minimum of 1 year of experience with high-performance computing (HPC), vector embedding, or semantic search technologies.
  • At least 1 year of experience building, scaling, and optimizing training or inference systems for deep neural networks.

Preferred Qualifications:

  • Master's or Doctoral degree in Computer Science, Computer Engineering, Electrical Engineering, Mathematics, or a related field.
  • Background in machine learning with experience in large-scale training and deployment of deep neural networks and/or transformer architectures.
  • Experience with machine learning frameworks such as TensorFlow, PyTorch, Lightning, and MosaicML.
  • Ability to thrive in a fast-paced environment with ambiguity and competing priorities and deadlines.
  • Previous experience at technology and product-driven companies or startups.
  • Ability to iterate rapidly with researchers and engineers to enhance product experience while building foundational capabilities.
  • Familiarity with deploying large neural network models in demanding production environments.
  • Experience building GPU clusters in the public cloud with tightly coupled storage and networking.

Salary:

  • New York City (Hybrid On-Site): $234,700 - $267,900 for Sr. Lead Machine Learning Engineer

Benefits:

  • Performance-based incentive compensation, including cash bonuses and long-term incentives.
  • Comprehensive health, financial, and other benefits supporting total well-being.
  • Opportunities for career growth and development within Capital One.
  • Inclusive and supportive work environment fostering personal and professional growth.

About Company:

Join Capital One in its mission to create trustworthy, reliable, and human-in-the-loop AI systems, transforming the banking industry for good. Help us reimagine how we serve our customers and businesses with emerging AI capabilities.

