Varvara Guljajeva & Mar Canet Sola: Exploring the Catastrophic Impact of Human Interaction on Nature: Navigating the Latent Space of Diffusion Models through the Gaze

Large AI models, such as Stable Diffusion, have compressed billions of nature images that bear traces of both the catastrophic impact of human activity and the mesmerizing beauty of natural landscapes, seamlessly merging these contrasting aspects. Humanity's history, as available online, is stored in datasets like LAION-5B, which aggregated over five billion image-text pairs from the internet for training large models such as Stable Diffusion. The weights of the checkpoint file are the repository where all this knowledge remains compressed and stored.

The artwork “Visions of Destruction” introduces an innovative real-time AI-driven interactive latent cinema experience to its audience. By harnessing the power of gaze and leveraging prompts that depict various forms of ecological damage caused by humans, this groundbreaking project first creates beautiful landscapes that then morph, in a mesmerizing flow of animated transformations, into landscapes of ecological disaster. Guided by the viewers’ gazes, which are seamlessly detected by an advanced eye-tracking system, the real-time animations unfold, offering a captivating exploration of the consequences of our actions on the environment. In other words, the simple act of observation results in the transformation of the imagery.

The piece unveils the possibility space of a generative model as it fearlessly explores the digitized history of human transformations of nature captured in the training images. In the latent cinema of AI, dreams and nightmares intertwine as the generative model unravels hidden dimensions and unleashes new possibilities, creating visually indeterminate landscapes [1] of catastrophes inflicted on nature. It opens up a realm of limitless creativity and imagination, creating a unique experience for each viewer. It becomes a powerful tool, providing artists and filmmakers with a canvas to transmute their visions into tangible reality, blurring the line between the creator and the created. Moreover, gaze is not only the mode of interaction in this system, selecting the salient parts of the image the viewers look at, but also a metaphor for humanity’s gaze over nature and its impact.

In terms of its technological aspects, this art installation demonstrates recent advancements in text-to-image models. Moreover, the interactive art piece uses real-time deep learning technology for image generation and animation, skillfully combined with an innovative method of interaction through eye-tracking. To achieve this, we make use of the Stable Diffusion inpainting model, with masks created from the specific areas of the image where the viewer looks. These masks confine the transformation of the image to the viewer’s focal points, guided by prompts that poignantly address the theme of human-induced destruction of nature, and the system interpolates from the current frame to the newly generated images. Finally, when a landscape has been destroyed by the gaze interaction and is not looked at for a few seconds, it regenerates into a new ‘beautiful landscape’.
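Although the installation's implementation is not published, the loop described above can be illustrated with a minimal sketch, assuming the Hugging Face diffusers inpainting pipeline. Here, get_gaze_point() and display() are hypothetical stand-ins for the eye-tracker SDK and the projection output, and the prompts are illustrative placeholders rather than the artists' actual prompts:

```python
import random
import time

import numpy as np
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

SIZE = 512          # Stable Diffusion's native resolution
IDLE_SECONDS = 3.0  # idle time before the landscape regenerates

LANDSCAPE_PROMPT = "a beautiful pristine natural landscape, golden hour"
DESTRUCTION_PROMPTS = [  # illustrative placeholders
    "clear-cut deforestation, burnt tree stumps",
    "oil spill covering the shoreline",
    "open-pit mine scarring the land",
]

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def get_gaze_point():
    """Hypothetical eye-tracker hook: gaze point in image coordinates,
    or None if nobody is looking at the screen."""
    ...

def display(frame):
    """Hypothetical output hook sending a frame to the projector."""
    ...

def gaze_mask(x, y, radius=96):
    """White disc around the gaze point: the region to be inpainted."""
    mask = Image.new("L", (SIZE, SIZE), 0)
    ImageDraw.Draw(mask).ellipse(
        (x - radius, y - radius, x + radius, y + radius), fill=255
    )
    return mask

def crossfade(a, b, steps=24):
    """Linear interpolation between two frames for a smooth transition."""
    a, b = np.asarray(a, np.float32), np.asarray(b, np.float32)
    for t in np.linspace(0.0, 1.0, steps):
        yield Image.fromarray(((1 - t) * a + t * b).astype(np.uint8))

def generate_landscape():
    """A fresh 'beautiful landscape' via inpainting with a full-frame mask."""
    blank = Image.new("RGB", (SIZE, SIZE))
    full_mask = Image.new("L", (SIZE, SIZE), 255)
    return pipe(prompt=LANDSCAPE_PROMPT, image=blank,
                mask_image=full_mask).images[0]

frame = generate_landscape()
last_gaze = time.time()
while True:
    gaze = get_gaze_point()
    if gaze is not None:
        # Destroy the region the viewer is looking at, then animate to it.
        last_gaze = time.time()
        target = pipe(prompt=random.choice(DESTRUCTION_PROMPTS),
                      image=frame, mask_image=gaze_mask(*gaze)).images[0]
        for frame in crossfade(frame, target):
            display(frame)
    elif time.time() - last_gaze > IDLE_SECONDS:
        # Nobody has looked for a while: regenerate a pristine landscape.
        for frame in crossfade(frame, generate_landscape()):
            display(frame)
        last_gaze = time.time()
```

The per-frame crossfade stands in for whatever latent-space interpolation the installation actually uses; the overall structure (gaze → mask → inpainting → animated transition → idle regeneration) follows the description above.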

To sum up, the proposed presentation is a case study that introduces real-time gaze-based navigation of AI-generated animation, depicting destructive human events in beautiful landscapes that are themselves synthetically generated.

References

[1] Hertzmann, Aaron. “Visual indeterminacy in GAN art.” In ACM SIGGRAPH 2020 Art Gallery, pp. 424–428. 2020.