
AI and ML Infra Software Engineer, GPU Clusters
Welcome to NVIDIA, a company at the forefront of artificial intelligence and machine learning technology. As an AI and ML Infra Software Engineer, you will play a crucial role in building and maintaining our cutting-edge GPU clusters that power our industry-leading AI and ML solutions. We are seeking a highly skilled and motivated individual who is passionate about harnessing the power of AI and ML to drive innovation and change. The ideal candidate will have a strong background in software engineering, with a focus on infrastructure and experience working with GPU clusters. If you are ready to join a dynamic and innovative team, and make a significant impact in the world of AI and ML, then we want to hear from you!
- Develop and maintain cutting-edge GPU clusters for AI and ML applications.
- Collaborate with cross-functional teams to design and implement efficient and scalable infrastructure solutions.
- Troubleshoot and resolve any issues related to the GPU clusters.
- Stay up to date with the latest advancements in AI and ML technology and integrate them into the infrastructure.
- Conduct thorough testing and performance evaluations to ensure the reliability and efficiency of the GPU clusters.
- Optimize system performance and resource utilization to support large-scale AI and ML workloads.
- Provide technical support and guidance to other team members and departments utilizing the GPU clusters.
- Document and maintain accurate records of all infrastructure configurations and changes.
- Continuously monitor and analyze system performance metrics to identify areas for improvement.
- Keep up-to-date with industry trends and best practices in AI and ML infrastructure.
- Collaborate with vendors and third-party providers to evaluate and integrate new technologies and tools.
- Participate in on-call rotation to provide 24/7 support for critical systems.
- Proactively identify and address any potential security vulnerabilities in the infrastructure.
- Contribute to the development of internal software tools and processes to improve efficiency and productivity.
- Communicate project progress and status updates to the team and management.
Extensive knowledge and experience in designing and implementing AI and ML infrastructure using GPU clusters, including proficiency in CUDA programming and parallel computing.
Strong background in software engineering, with expertise in developing and maintaining large-scale distributed systems for AI and ML applications.
Proficiency in containerization technologies such as Docker and Kubernetes, and experience deploying and managing AI and ML workloads on GPU clusters.
Familiarity with AI and ML frameworks such as TensorFlow, PyTorch, and Caffe, and ability to optimize them for performance on GPU clusters.
Demonstrated ability to troubleshoot and resolve complex technical issues related to GPU clusters and AI/ML infrastructure, and experience with performance monitoring and tuning for optimal resource utilization.
Distributed systems
Cloud Computing
Containerization
Data Processing
ML algorithms
Programming Languages
Infrastructure Automation
Parallel Computing
AI Development
Dev
GPU Architectures
High-Performance Computing
Communication
Conflict Resolution
Leadership
Time management
Creativity
Flexibility
Teamwork
Adaptability
Problem-Solving
Empathy
According to JobzMall, the average salary range for an AI and ML Infra Software Engineer, GPU Clusters is $120,000 to $180,000 per year. This can vary depending on factors such as location, experience, and company size.
NVIDIA Corp. designs and manufactures computer graphics processors, chipsets, and related multimedia software. The company operates through two segments: Graphics Processing Unit and Tegra Processor. The Graphics Processing Unit segment includes sales of the company's GeForce discrete and chipset products that support desktop and notebook PCs, plus license fees from Intel and sales of memory products. The Tegra Processor segment provides processors that deliver superior visual and multimedia experiences on tablets, smartphones, and gaming devices while consuming minimal power.

