- Location
- Seoul, South Korea
- Last Published
- Dec. 6, 2024
- Sector
- AI/ML
Who we are
At Twelve Labs, we are pioneering cutting-edge multimodal foundation models that comprehend videos the way humans do. Our models have redefined the standards in video-language modeling, giving us more intuitive and far-reaching capabilities and fundamentally transforming the way we interact with and analyze various forms of media. With $77 million in Seed and Series A funding, we are backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, as well as prominent AI visionaries and founders including Fei-Fei Li, Silvio Savarese, and Alexandr Wang. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

We are a global company that values the uniqueness of each person’s journey. The differences in our cultural, educational, and life experiences allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About the role
We are seeking a research intern to collaborate directly with Twelve Labs ML Research Scientists on the development of cutting-edge Video Foundation Models and Video Language Foundation Models. As a research intern, you will engage in many aspects of the research process, including data processing, model training methodologies, and architecture design. You will also have the opportunity to advance existing hypotheses or independently validate new ones based on the needs of the project. Join us in shaping the future of video AI technology through hands-on research collaboration and innovation.
In this role, you will
- Participate in AI research projects that contribute to the Video Foundation Model, including hypothesis validation and research development
- Design and discuss strategies for data collection and annotation required for model training
- Regularly communicate with team members, including project leads, to provide feedback on ongoing projects
You may be a good fit if you have
- Proficiency in Python and PyTorch
- Experience and interest in research fields such as Large Language Models, Vision Language Models, Video Language Models, Video Representation Learning, Video Understanding, Action Recognition, or similar areas
- A proactive, responsible attitude and pride in the work you perform in your role
- Effective communication and collaboration skills to work with project leads and other researchers
- Experience with real-world deep learning projects that have been applied to products or have research value (including open source projects)
- A record of research publications at top-tier AI conferences (NeurIPS, ICML, ICLR, AAAI, NAACL, ACL, EMNLP, CVPR, ICCV, ECCV, KDD, SIGGRAPH) in fields such as Computer Vision, Video Understanding, Language Models, or Vision Language Models
- Experience in top-tier AI conference challenges, Kaggle competitions, or high rankings in domestic/international AI competitions
- Currently enrolled in or completed at least one year of master's or doctoral studies in AI, ML, or related fields
- Strong written and verbal communication skills in English