NVIDIA

AI and ML Infra Software Engineer, GPU Clusters

Remote
Full-Time | Depends on Experience | Senior Level | Masters
Job Description

Welcome to NVIDIA, a company at the forefront of artificial intelligence and machine learning technology. As an AI and ML Infra Software Engineer, you will play a crucial role in building and maintaining our cutting-edge GPU clusters that power our industry-leading AI and ML solutions. We are seeking a highly skilled and motivated individual who is passionate about harnessing the power of AI and ML to drive innovation and change. The ideal candidate will have a strong background in software engineering, with a focus on infrastructure and experience working with GPU clusters. If you are ready to join a dynamic and innovative team, and make a significant impact in the world of AI and ML, then we want to hear from you!

  1. Develop and maintain cutting-edge GPU clusters for AI and ML applications.
  2. Collaborate with cross-functional teams to design and implement efficient and scalable infrastructure solutions.
  3. Troubleshoot and resolve any issues related to the GPU clusters.
  4. Stay up to date with the latest advancements in AI and ML technology and incorporate them into the infrastructure.
  5. Conduct thorough testing and performance evaluations to ensure the reliability and efficiency of the GPU clusters.
  6. Optimize system performance and resource utilization to support large-scale AI and ML workloads.
  7. Provide technical support and guidance to other team members and departments utilizing the GPU clusters.
  8. Document and maintain accurate records of all infrastructure configurations and changes.
  9. Continuously monitor and analyze system performance metrics to identify areas for improvement.
  10. Keep up-to-date with industry trends and best practices in AI and ML infrastructure.
  11. Collaborate with vendors and third-party providers to evaluate and integrate new technologies and tools.
  12. Participate in an on-call rotation to provide 24/7 support for critical systems.
  13. Proactively identify and address any potential security vulnerabilities in the infrastructure.
  14. Contribute to the development of internal software tools and processes to improve efficiency and productivity.
  15. Communicate project progress and status updates to the team and management.
Where is this job?
This job opening is listed as 100% remote.
Job Qualifications
  • Extensive knowledge and experience in designing and implementing AI and ML infrastructure using GPU clusters, including proficiency in CUDA programming and parallel computing.

  • Strong background in software engineering, with expertise in developing and maintaining large-scale distributed systems for AI and ML applications.

  • Proficiency in containerization technologies such as Docker and Kubernetes, and experience deploying and managing AI and ML workloads on GPU clusters.

  • Familiarity with AI and ML frameworks such as TensorFlow, PyTorch, and Caffe, and ability to optimize them for performance on GPU clusters.

  • Demonstrated ability to troubleshoot and resolve complex technical issues related to GPU clusters and AI/ML infrastructure, and experience with performance monitoring and tuning for optimal resource utilization.

Required Skills
  • Distributed systems

  • Cloud Computing

  • Containerization

  • Data Processing

  • ML algorithms

  • Programming Languages

  • Infrastructure Automation

  • Parallel Computing

  • AI Development

  • Dev

  • GPU Architectures

  • High-Performance Computing

Soft Skills
  • Communication

  • Conflict Resolution

  • Leadership

  • Time Management

  • Creativity

  • Flexibility

  • Teamwork

  • Adaptability

  • Problem-Solving

  • Empathy

Compensation

According to JobzMall, the average salary range for an AI and ML Infra Software Engineer, GPU Clusters is $120,000 to $180,000 per year. This can vary depending on factors such as location, experience, and company size.

Additional Information
NVIDIA is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based upon race, religion, color, national origin, sex, sexual orientation, gender identity, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
Required Languages: English
Job Posted: February 24th, 2025
Apply Before: April 11th, 2026
This job posting is from a verified source. 
Reposted

About NVIDIA

NVIDIA Corp. designs and manufactures computer graphics processors, chipsets, and related multimedia software. The company operates through two segments: Graphics Processing Unit and Tegra Processor. The Graphics Processing Unit segment includes sales of the company's GeForce discrete and chipset products that support desktop and notebook PCs, plus license fees from Intel and sales of memory products. The Tegra Processor segment provides processors that deliver a superior visual and multimedia experience on tablets, smartphones, and gaming devices while consuming minimal power.
