• Locations
  • Paris, France
  • London
  • France
• Last Published
  • Dec. 4, 2024
• Sector
  • AI/ML
About Mistral
- At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world. Our mission is to make AI ubiquitous and open.
- We are creative, low-ego, team-spirited, and have been passionate about AI for years.
- We hire people who thrive in competitive environments, because they find them more fun to work in.
- We hire passionate women and men from all over the world.
- Our teams are distributed between France, the UK and the USA.

Role Summary
- We work at the cutting edge of science and technology, combining modern cloud environments with High-Performance Computing (HPC) standards. Our clusters use the latest available GPUs at large scale.
- As an HPC System Administrator, you will be responsible for managing our clusters and ensuring their smooth operation for all users at all sites.
- You will be the interface between our research and production users and our various cloud providers, addressing node issues, capacity requests, image upgrades and tooling needs.
- Location: France.

Key Responsibilities
- Oversee the strategic design, system performance, resource allocation, configuration management and operational support for both our hardware and software systems.
- Troubleshoot and solve complex technical problems, taking a proactive approach to system optimization and issue resolution.
- Ensure that HPC-standard security practices and compliance requirements are followed.
- Maintain comprehensive documentation for infrastructure, configurations and procedures.
- Collaborate effectively across teams of engineers and researchers.

Qualifications & profile
We’re looking for a blend of experience with:
- High-performance networking
- GPUs in large-scale distributed networks
- Large-scale distributed storage file systems and providers
- Linux kernel and OS
- Virtualization and container architecture in cloud environments (Docker, Kubernetes, OpenStack...)
- SLURM
- Software, hardware and network failure troubleshooting

Now, it would be ideal if you had experience with:
- LLM training
- AI/ML frameworks
- GPU programming (CUDA)

We’re also looking for people who are:
- Passionate
- Self-directed
- Low-ego
- Team players

Benefits
- Daily lunch vouchers
- Contribution to a Gympass subscription
- Monthly contribution to a mobility pass
- Full health insurance for you and your family
- Generous parental leave policy