Meta Introduces the Groundbreaking SeamlessM4T Model for Multimodal Translation

Meta researchers have introduced SeamlessM4T, a multilingual and multitask model that translates and transcribes across both text and speech. The internet, mobile devices, social platforms, and communication tools have given people unprecedented access to content in many languages. SeamlessM4T aims to make fluid communication and understanding across those languages a practical reality.

Equipped with an impressive array of capabilities, SeamlessM4T includes:


  • Automated speech recognition for nearly 100 languages
  • Speech-to-text translation supporting almost 100 input and output languages
  • Speech-to-speech translation for close to 100 input languages and 35 output languages (including English)
  • Text-to-text translation for nearly 100 languages
  • Text-to-speech translation for approximately 100 input languages and 35 output languages (including English)
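
The capability list above amounts to a small task matrix over input and output modalities. As a minimal sketch, the routing a unified multitask model performs can be modeled as a lookup. The task codes below (ASR, S2TT, S2ST, T2TT, T2ST) are common shorthand for these tasks, but this mapping is a toy illustration, not the SeamlessM4T API:

```python
# Toy illustration of routing a request to one of the five supported tasks.
# Task codes are conventional shorthand, not SeamlessM4T's actual interface.

TASKS = {
    ("speech", "text", "same"): "ASR",     # automatic speech recognition
    ("speech", "text", "other"): "S2TT",   # speech-to-text translation
    ("speech", "speech", "other"): "S2ST", # speech-to-speech translation
    ("text", "text", "other"): "T2TT",     # text-to-text translation
    ("text", "speech", "other"): "T2ST",   # text-to-speech translation
}

def route(src_modality: str, tgt_modality: str, same_language: bool) -> str:
    """Map an input/output modality pair to the task a unified model would run."""
    key = (src_modality, tgt_modality, "same" if same_language else "other")
    if key not in TASKS:
        raise ValueError(f"unsupported combination: {key}")
    return TASKS[key]
```

For example, `route("speech", "text", False)` selects speech-to-text translation, while the same modalities within one language select speech recognition. The point of the sketch is that a single model covers all five cells of this matrix, where earlier systems stitched together separate subsystems.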


SeamlessM4T is being released to researchers and developers under the CC BY-NC 4.0 license, reflecting a commitment to open science. Meta has also released the metadata for SeamlessAlign, the largest open multimodal translation dataset assembled to date, comprising 270,000 hours of aligned speech and text. This enables independent data analysis and further exploration within the research community.

The creation of SeamlessM4T addresses a longstanding challenge in multilingual communication. Unlike previous systems, which were confined by limited language coverage and relied on separate subsystems for each task, SeamlessM4T is a unified model that handles speech-to-speech and speech-to-text translation end to end. Meta built on prior work such as No Language Left Behind (NLLB) and the Universal Speech Translator to craft this cohesive multilingual model. With strong performance on both low-resource and high-resource languages, SeamlessM4T has the potential to transform cross-language communication.


At the core of the model's architecture lies the multitask UnitY model, which generates both translated text and translated speech. UnitY supports a variety of translation tasks, encompassing automatic speech recognition, text-to-text translation, and speech-to-speech translation, all within a single model. To train it, Meta combined text and speech encoders, self-supervised speech representations, and a two-pass decoding process that produces text first and speech units second.

Meta also places a strong emphasis on a responsible AI framework to ensure accuracy and safety. The company reports extensive work on measuring and reducing added toxicity and gender bias in the model's outputs.

The public release of SeamlessM4T encourages collaborative research and development within the AI community. As the world becomes more interconnected, SeamlessM4T's ability to lower language barriers stands as a testament to AI-driven innovation, and it moves us closer to a future where people can genuinely understand one another, irrespective of the language they speak.
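The two-pass structure described above (decode translated text first, then predict speech from it) can be sketched as a toy pipeline. Every component here is an illustrative stand-in with trivial behavior, assumed for the sake of the sketch; it shows only the data flow of a UnitY-style system, not Meta's implementation:

```python
# Conceptual sketch of a UnitY-style two-pass pipeline. All components are
# toy stand-ins: real systems use learned encoders, decoders, and vocoders.

def speech_encoder(audio: list) -> list:
    """Stand-in for the self-supervised speech encoder: audio -> features."""
    return [x * 0.5 for x in audio]

def text_decoder(features: list, tgt_lang: str) -> str:
    """First pass: decode translated text in the target language."""
    return f"<{tgt_lang}> translation from {len(features)} frames"

def unit_decoder(text: str) -> list:
    """Second pass: predict discrete acoustic units from the decoded text."""
    return [ord(c) % 100 for c in text]

def vocoder(units: list) -> list:
    """Turn discrete units into a waveform (here, a trivial scaling)."""
    return [u / 100 for u in units]

def unity_s2st(audio: list, tgt_lang: str):
    """Speech-to-speech translation: encode -> text pass -> unit pass -> vocoder."""
    features = speech_encoder(audio)
    text = text_decoder(features, tgt_lang)
    waveform = vocoder(unit_decoder(text))
    return text, waveform
```

The design choice worth noting is that the intermediate text output is not a throwaway: it is exactly what the speech-to-text tasks return, which is how one model can serve ASR, speech-to-text translation, and speech-to-speech translation at once.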