TY - GEN
T1 - The Five-Dollar Model
T2 - 19th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2023
AU - Merino, Timothy
AU - Negri, Roman
AU - Rajesh, Dipika
AU - Charity, M.
AU - Togelius, Julian
N1 - Publisher Copyright: Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/10/6
Y1 - 2023/10/6
N2 - The five-dollar model is a lightweight text-to-image generative architecture that generates low-dimensional images or tile maps from an encoded text prompt. This model can successfully generate accurate and aesthetically pleasing content in low-dimensional domains with limited amounts of training data. Despite the small size of both the model and the datasets, the generated images or maps still maintain the encoded semantic meaning of the textual prompt. We apply this model to three small datasets (pixel art video game maps, video game sprite images, and downscaled emoji images) and apply novel augmentation strategies to improve the performance of our model on these limited datasets. We evaluate our models’ performance using the cosine similarity between text-image pairs, computed with the CLIP ViT-B/32 model, to demonstrate generation quality.
UR - http://www.scopus.com/inward/record.url?scp=85175401527&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175401527&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85175401527
T3 - Proceedings - AAAI Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE
SP - 107
EP - 115
BT - Proceedings - AAAI Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE
A2 - Eger, Markus
A2 - Cardona-Rivera, Rogelio Enrique
PB - Association for the Advancement of Artificial Intelligence
Y2 - 8 October 2023 through 12 October 2023
ER -