A Framework For Improving Security In Text-To-Image Generation Systems


The development of machine learning technologies capable of generating text and visual elements based on user input has opened new avenues for the cost-effective creation of targeted content. Among these technologies, generative text-to-image (T2I) systems are particularly notable for their global impact on creative practices.


T2I technologies such as DALL-E 3 and Stable Diffusion use deep learning to create realistic visuals that respond to written prompts or commands. Although these technologies are becoming more common, there are concerns about their potential abuse, from privacy violations to misinformation or image manipulation.


A team of researchers from the Hong Kong University of Science and Technology and the University of Oxford recently introduced Latent Guard, a framework aimed at improving the security of T2I systems. Detailed in a paper posted to the arXiv preprint server, the system aims to block the creation of content deemed inappropriate or harmful by analyzing user prompts for banned ideas or terms. "Given the capability of T2I models in generating detailed images, there is a risk that they can be used to generate inappropriate content," Runtao Liu, Ashkan Khakzar, and their team note in their report.


To address this problem, current safeguards rely either on blacklists of specific terms, which can be circumvented with creative rewording, or on malicious-content detectors, which require large amounts of training data and adapt poorly to new threats. Latent Guard is introduced as a new solution designed to strengthen security measures for T2I content creation. It builds on the idea of blocking prohibited concepts in prompts to limit the creation of content that could be considered unethical. However, unlike traditional blacklisting methods, which can be bypassed by simply changing the wording, Latent Guard examines the meaning of the input, detecting malicious intent even when the prompt does not explicitly contain prohibited terms.
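The weakness of keyword blacklists described above can be illustrated with a minimal sketch. The blacklist, function name, and example prompts below are hypothetical, not taken from the Latent Guard paper:

```python
# Hypothetical sketch: a naive keyword blacklist is trivially circumvented
# by rephrasing, since it matches surface words rather than meaning.

BLACKLIST = {"weapon", "gun"}

def blacklist_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blacklisted word."""
    words = prompt.lower().split()
    return any(term in words for term in BLACKLIST)

print(blacklist_filter("a man holding a gun"))      # → True (caught)
print(blacklist_filter("a man holding a firearm"))  # → False (same idea slips through)
```

A synonym or paraphrase carries the same harmful concept but shares no surface token with the blacklist, which is exactly the gap a meaning-based check is meant to close.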


"Building on the concept of blacklists, Latent Guard works by learning a latent space on top of the T2I model's text encoder, which allows the detection of harmful concepts in the processed text," explained Liu, Khakzar and their colleagues. The framework they propose involves a sophisticated approach, including a dedicated data generation pipeline, unique architectural components, and a training strategy designed to make effective use of the data.
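The core idea of checking for concepts in an embedding space, rather than for literal words, can be sketched as follows. This is an illustrative toy, not the Latent Guard implementation: the vectors are made up, and a real system would embed prompts and blacklisted concepts with the T2I model's text encoder plus a learned projection. The threshold value is likewise hypothetical:

```python
# Toy sketch of concept detection in an embedding space: a prompt is flagged
# if its embedding lies close to the embedding of any blacklisted concept,
# so paraphrases with similar meaning are still caught.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend these came from a text encoder: similar meanings -> nearby vectors.
concept_embeddings = {"violence": [0.9, 0.1, 0.2]}
prompt_embedding = [0.85, 0.15, 0.25]  # e.g. a reworded violent prompt

THRESHOLD = 0.9  # in practice, tuned on held-out safe/unsafe prompts

def is_unsafe(prompt_emb, concepts, threshold=THRESHOLD):
    """Flag the prompt if it is close to any blacklisted concept embedding."""
    return any(cosine(prompt_emb, c) >= threshold for c in concepts.values())

print(is_unsafe(prompt_embedding, concept_embeddings))  # → True
```

Because the comparison happens in embedding space, a prompt that never mentions a banned word can still land near a banned concept and be rejected.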


In their analysis, Liu, Khakzar, and their colleagues evaluated Latent Guard against existing safety approaches on several datasets. One such dataset, CoPro, was specifically designed for their study and consists of over 176,000 safe and unsafe text prompts. According to their findings, Latent Guard consistently identified unsafe prompts under different conditions, showing strong generalization across datasets and contexts.
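Evaluating a prompt filter on a labeled set of safe and unsafe prompts, as done at much larger scale with CoPro, amounts to comparing the filter's decisions against ground-truth labels. The tiny dataset and simple keyword filter below are hypothetical, included only to show the shape of such an evaluation:

```python
# Hypothetical evaluation sketch: accuracy of a prompt filter on a small
# labeled set of (prompt, is_unsafe) pairs. Real benchmarks like CoPro
# contain over 176,000 prompts; this toy set has four.

labeled_prompts = [
    ("a sunny beach with palm trees", False),  # safe
    ("a man holding a gun", True),             # unsafe
    ("a man holding a firearm", True),         # unsafe paraphrase
    ("a bowl of fruit on a table", False),     # safe
]

def keyword_filter(prompt: str) -> bool:
    """A naive baseline: flag only the literal word 'gun'."""
    return "gun" in prompt.lower().split()

correct = sum(keyword_filter(p) == label for p, label in labeled_prompts)
accuracy = correct / len(labeled_prompts)
print(f"accuracy: {accuracy:.2f}")  # → accuracy: 0.75 (misses the paraphrase)
```

The baseline's miss on the paraphrased prompt is precisely the failure mode a concept-level detector is designed to measure and reduce.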


These promising initial results suggest that Latent Guard is a meaningful step toward making T2I technologies more secure and reducing the risk of misuse. The researchers plan to release the technical details and the CoPro dataset on GitHub, inviting further experimentation and development by others in the field.