Abu Dhabi, United Arab Emirates–(BUSINESSWIRE)–The Technology Innovation Institute (TII), a global research center and the applied research pillar of the Abu Dhabi Advanced Technology Research Council (ATRC), today announced the launch of NOOR, the world's largest Arabic-language natural language processing (NLP) model to date.
TII's team of top researchers and artificial intelligence (AI) specialists partnered with LightOn, a technology company whose mission is to unlock the potential of large-scale AI for businesses, to transform Arabic NLP modeling. Beyond the language model itself, NOOR provides a complete pipeline for high-quality data, including large-scale exploration, filtering, and curation. It also supports large-scale distributed training and serving, enabling applications built on efficient model inference and specialization.
Dr. Ray O. Johnson, CEO of TII and ASPIRE, said: "With this development, we are well on our way to strengthening our research capabilities and credentials, and to elevating the status of Abu Dhabi and the United Arab Emirates as a serious research ecosystem. Our teams of experts have demonstrated once again that this region can achieve breakthrough R&D results that positively influence the world."
Dr. Ebtesam Almazrouei, Director of the Artificial Intelligence Unit at TII, noted: "Large language models have been all the rage in the world of natural language processing, and we are proud to announce this state-of-the-art model with 10 billion parameters, the largest Arabic NLP model in the world. The unique, large Arabic dataset collected to train the model is the result of many months of hard work, including collecting, cleaning, and filtering data from various sources. We would like to thank the entire team that worked on this project for ensuring that NOOR becomes the gold standard for Arabic language exploration by academics and businesses around the world."
Speaking at the launch, Professor Mérouane Debbah, Chief Researcher at the Digital Science Research Center and the Artificial Intelligence Unit at TII, said: "Thanks to NOOR, TII has expanded the scope of Modern Standard Arabic language modeling, leveraging large-language-model know-how to build cutting-edge interdisciplinary expertise in this new generation of AI research."
Built to be the world's largest high-quality, cross-domain Arabic dataset, NOOR's unique training corpus contains more than 30 billion words and combines web data with books, poetry, news articles, and technical information, greatly expanding the model's applicability.
Dr. Ebtesam Almazrouei said that the NOOR model is based on the popular transformer architecture. It uses a decoder architecture similar in structure to that of GPT-3 (Generative Pre-trained Transformer 3) and is designed for generative tasks. This architecture has been updated to reflect the latest developments in machine learning, including improvements such as better positional embeddings. To ensure quality at scale in the NOOR dataset, the TII team designed an automated filtering pipeline based on machine learning techniques. These tools identify text that matches high-quality reference data and keep spam out of the model's training data.
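The release does not describe how TII's filter works internally. As a rough illustration of the idea of scoring text against high-quality reference material and rejecting spam, the sketch below swaps in a simple, non-learned heuristic: vocabulary overlap with a reference corpus, penalized by token repetition. All function names and the 0.3 threshold are hypothetical and are not TII's method.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenization (illustrative only; no normalization or stemming).
    return text.split()


def build_reference_vocab(reference_docs: list[str]) -> set[str]:
    # Collect every token seen in the hypothetical high-quality reference corpus.
    vocab: set[str] = set()
    for doc in reference_docs:
        vocab.update(tokenize(doc))
    return vocab


def quality_score(doc: str, vocab: set[str]) -> float:
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    # Fraction of tokens that also occur in the reference vocabulary.
    overlap = sum(1 for t in tokens if t in vocab) / len(tokens)
    # Spam tends to repeat one token; measure the share of the most frequent token.
    top_count = Counter(tokens).most_common(1)[0][1]
    repetition = top_count / len(tokens)
    return overlap * (1.0 - repetition)


def filter_corpus(docs: list[str], vocab: set[str], threshold: float = 0.3) -> list[str]:
    # Keep only documents whose heuristic score clears the (assumed) threshold.
    return [d for d in docs if quality_score(d, vocab) >= threshold]


if __name__ == "__main__":
    ref = ["the model reads books and poetry", "news articles cover technical topics"]
    vocab = build_reference_vocab(ref)
    print(filter_corpus(["the model reads news articles", "buy buy buy buy buy now"], vocab))
```

A production pipeline of the kind the release describes would replace `quality_score` with a trained classifier, but the surrounding filter-by-threshold structure stays the same.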
Trained on 128 A100 GPUs, NOOR uses a state-of-the-art 3D parallelism approach with Megatron and DeepSpeed to distribute computation while making efficient use of the available hardware.
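In 3D parallelism, the GPU count is factored into tensor-parallel (tp), pipeline-parallel (pp), and data-parallel (dp) degrees with tp × pp × dp equal to the number of GPUs. The release does not state NOOR's actual degrees, so the sketch below only enumerates valid splits of 128 GPUs and maps a global rank to its coordinates. The cap of 8 on tp (one 8-GPU node) and the rank layout, with tensor-parallel ranks varying fastest, then data-parallel, then pipeline stages, are assumptions for illustration.

```python
def factorizations(world_size: int, max_tp: int = 8) -> list[tuple[int, int, int]]:
    """Return every (tp, pp, dp) with tp * pp * dp == world_size.

    max_tp=8 assumes tensor parallelism stays inside one 8-GPU node (assumption).
    """
    configs = []
    for tp in range(1, max_tp + 1):
        if world_size % tp:
            continue
        rest = world_size // tp
        for pp in range(1, rest + 1):
            if rest % pp == 0:
                configs.append((tp, pp, rest // pp))
    return configs


def rank_to_coords(rank: int, tp: int, pp: int, dp: int) -> tuple[int, int, int]:
    """Map a global rank to (tp_rank, pp_rank, dp_rank).

    Assumed layout: tensor-parallel rank varies fastest, then data-parallel,
    then pipeline stage, i.e. rank = pp_rank * (tp * dp) + dp_rank * tp + tp_rank.
    """
    if not 0 <= rank < tp * pp * dp:
        raise ValueError("rank out of range")
    tp_rank = rank % tp
    dp_rank = (rank // tp) % dp
    pp_rank = rank // (tp * dp)
    return tp_rank, pp_rank, dp_rank


if __name__ == "__main__":
    # One hypothetical split of 128 A100s: 8-way tensor, 4-way pipeline, 4-way data parallel.
    tp, pp, dp = 8, 4, 4
    print(len(factorizations(128)), "valid splits")
    print(rank_to_coords(127, tp, pp, dp))
```

The design choice behind keeping tp small and within a node is that tensor parallelism exchanges activations at every layer, whereas pipeline and data parallelism communicate less often and tolerate slower inter-node links.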
The Director of the Artificial Intelligence Unit noted that this achievement is just the first step in the unit’s efforts to contribute to the UAE’s broader artificial intelligence strategy.
The model was named "NOOR", which means "light" in Arabic, to emphasize the connection between the Arabic language model and the enlightenment of the mind.
About Technology Innovation Institute (TII)
For more information, please visit the following website: www.tii.ae
The text of this press release, which results from a translation, should in no way be considered official. The only authentic version of the press release is the one in its original language. Translations should always be checked against the source text, which alone is authoritative.
*Source: AETO Wire
Technology Innovation Institute announces the launch of NOOR, the largest Arabic-language NLP model in the world – i-Actu