In 2022 and 2023, generative AI systems that create original content such as images, music, and text became a widely discussed topic. Examples include Stable Diffusion, which generates images, and ChatGPT, which generates text through interactive dialogue.
In fact, from the summer to the fall of 2020, we at VECTION had been working on the following project.
It was a game in which inputting a summary of a budget proposal would generate a corresponding "face".
Whether at the national or local government level, in the private sector, or in smaller groups, the way a group allocates its budget reflects the group's nature. However, grasping the overall picture, balance, and tendencies of a particular budget proposal just by looking at a table of numbers is difficult. The "nature" of a budget proposal is shaped like a high-dimensional figure: interpreting it directly from the numbers requires expertise (or may be impossible), and visualizing it as a graph is cumbersome. Even once it is graphed, it does not necessarily become easier to understand.
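To make the "high-dimensional figure" a little more concrete: one way to read a budget proposal is as a point whose coordinates are the shares allocated to each category, so a budget with 30 categories is a point in 30-dimensional space. The sketch below illustrates only that assumption; the category names and amounts are made up, and `to_share_vector` and `budget_distance` are hypothetical helpers, not part of any tool described in this essay.

```python
import math

def to_share_vector(allocations: dict[str, float], categories: list[str]) -> list[float]:
    """Turn {category: amount} into a fixed-order vector of budget shares.

    Categories missing from `allocations` get a share of 0.0, and keys not in
    `categories` are ignored, so every proposal lands in the same
    len(categories)-dimensional space.
    """
    total = sum(allocations.get(c, 0.0) for c in categories)
    if total <= 0:
        raise ValueError("allocations must have a positive total")
    return [allocations.get(c, 0.0) / total for c in categories]

def budget_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two share vectors (0.0 means identical balance)."""
    return math.dist(a, b)

# Hypothetical categories and amounts, for illustration only.
CATEGORIES = ["welfare", "education", "defense", "infrastructure"]
plan_a = to_share_vector({"welfare": 50, "education": 30, "defense": 20}, CATEGORIES)
plan_b = to_share_vector({"welfare": 20, "defense": 60, "infrastructure": 20}, CATEGORIES)
print(plan_a)  # [0.5, 0.3, 0.2, 0.0]
print(budget_distance(plan_a, plan_b))
```

Even with only four categories, the numbers say little about the "character" of each plan at a glance, which is the gap the essay proposes a face could fill.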
A person's "face," on the other hand, is also something like a high-dimensional figure, compressing many coordinates of information ranging from emotion to personality, age, health, and individual identity. Humans are remarkably sensitive to facial expressions and to subtle differences in facial detail, which lets them take in a wealth of information at a glance. We think this is a fairly reasonable hypothesis.
Could a person's "face," then, serve like a "chart" or a "graph" that summarizes the complex and elusive "whole picture" (nature and tendencies) of a budget proposal? That was the idea behind this project. We remember bursting into laughter at how ridiculous it sounded when it was first suggested.
At the same time, VECTION has long dreamed of a system called "Mirror Budget," in which all citizens continuously submit their ideal national budget allocations and the tallied results are always visible. We believe that for democracy to function properly, many people need to feel familiar with the concept of a "budget proposal." A tool that generates all kinds of faces from all kinds of budgets, usable half as a joke or as a game, would simply be interesting and fun to have.
So we set out to build it using models that were popular at the time: BERT for natural language processing and SAGAN for image generation. The result is the image at the beginning of this essay: a "face" created from a budget proposal. The donut-shaped graph (a Sunburst chart) represents the budget allocation, and the "face" generated from that budget is displayed in the center. Selecting a budget proposal in the graph changes the "face" accordingly.
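The code of the original tool is not reproduced here, so the following is only a guess at how a Sunburst selection could be turned into text for the language model. It assumes the budget is a tree (total, then ministries or programs, then individual projects) and that the text sent to BERT is the concatenation of the project summaries under the selected segment. `BudgetNode`, `find`, and `selection_text` are hypothetical names, and the example tree and amounts are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class BudgetNode:
    """One segment of a Sunburst chart: a ministry, a program, or a project."""
    name: str
    amount: float = 0.0   # used only for leaves (individual projects)
    summary: str = ""     # short text description of a leaf project
    children: list[BudgetNode] = field(default_factory=list)

    def total(self) -> float:
        # An inner segment's size is the sum of everything beneath it.
        if not self.children:
            return self.amount
        return sum(child.total() for child in self.children)

def find(root: BudgetNode, path: list[str]) -> BudgetNode:
    """Follow a click path such as ["Disaster response"] down the tree."""
    node = root
    for name in path:
        for child in node.children:
            if child.name == name:
                node = child
                break
        else:
            raise KeyError(name)
    return node

def selection_text(root: BudgetNode, path: list[str]) -> str:
    """Concatenate the summaries of every node under the selected segment, left to right."""
    stack, parts = [find(root, path)], []
    while stack:
        node = stack.pop()
        if node.summary:
            parts.append(node.summary)
        stack.extend(reversed(node.children))  # reversed so the leftmost child is visited first
    return " ".join(parts)

# Invented example tree (amounts in arbitrary units).
root = BudgetNode("Total", children=[
    BudgetNode("Disaster response", children=[
        BudgetNode("Psychiatric team", 3.0, "Develop a disaster psychiatric team."),
        BudgetNode("Shelters", 5.0, "Expand evacuation shelters."),
    ]),
    BudgetNode("Defense", children=[
        BudgetNode("Missile defense", 10.0, "Add missile defense to ships."),
    ]),
])
print(selection_text(root, ["Disaster response"]))
```

The resulting string would then be encoded and passed on to the image generator; the segment's `total()` gives the wedge size the chart draws.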
We even created a video of the interactive experience with the tool.
The budget data we used at the time came from the Japanese government, retrieved via "JUDGIT!," a website for searching government projects (https://judgit.net/).
However, while we were working out how to control the "meaning" of the "face" and the constraints on it (which facial features should correspond to which budget properties, how much of those connections to explain, or whether offering no explanation at all would be more interesting), we became too busy to release the tool publicly. A shame, really.
Even so, we still find the idea of representing complex and cumbersome budget proposals with a "face" interesting. Moreover, if the mechanism behind the bot legislator (https://mirror.xyz/vection.eth/Lkv1_-QciAG1811juIinXYNzLDeNan8zAeW7uHVl9KA) were generalized, a symbolic "face" that summarizes the overall picture (properties and tendencies) of policies and budget proposals would be a useful thing to have.
We think it is important to consider how such tools can help us understand complex issues and make informed decisions. It is exciting to imagine where this technology could be applied in the future, and how it could make democratic processes more transparent and accessible to everyone.
With this in mind, we looked at Stable Diffusion, which is currently trending, and noticed a part of its structure similar to one we had used in our earlier tool: an input for a vector that quantifies the meaning of the text. So we tried doing the same thing using the website "getimg.ai."
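The "input for a vector" is the conditioning signal: a text encoder turns words into numbers, and the image generator consumes those numbers together with random noise. The sketch below assumes the simplest conditional-GAN arrangement, where noise and text vector are concatenated, and replaces BERT with a toy hashed bag-of-words encoder so that it runs without any model. It is an illustration, not our original code; Stable Diffusion itself feeds its CLIP text embeddings into the denoising network through cross-attention rather than concatenation. `toy_text_vector`, `generator_input`, and the vector sizes are hypothetical.

```python
import hashlib
import math
import random

def toy_text_vector(text: str, dim: int = 16) -> list[float]:
    """Stand-in for a real text encoder such as BERT: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def generator_input(text: str, seed: int, noise_dim: int = 8) -> list[float]:
    """Build the generator input [z; c]: latent noise z followed by text vector c."""
    rng = random.Random(seed)  # same seed -> same noise
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + toy_text_vector(text)

care = generator_input("psychiatric care in disaster areas", seed=0)
military = generator_input("ballistic missile defense", seed=0)
# Same seed, different text: the noise half is identical and only the text half changes.
print(care[:8] == military[:8])
```

Holding the seed fixed while changing only the text is also the idea behind the comparison later in this essay: any difference in the faces should then come from the meaning of the words.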
One reason we wanted to try this again is that we learned Stable Diffusion can generate appropriate images from abstract prompts (text). In other words, it might automatically establish a correspondence between suitable meanings (budget proposals) and images (faces). In our earlier tool, the meaning structure was handled solely by BERT, and the image generation part was hardly constrained by the concepts expressed in words.
We entered the following text prompt into "getimg.ai" and obtained the corresponding image:
"One close-up of a face symbolizing the project 'to strengthen the disaster support system by developing a 'Tokyo Metropolitan Government Disaster Psychiatric Team' to provide psychiatric care and mental health support in disaster areas.'"
(Source: "Tokyo Metropolitan Government General Account Supplementary Budget No.2, 2019")
The result was a female figure who appears to be providing psychological care during a disaster, which seems appropriate for the prompt.
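Both prompts in this essay wrap a project summary in the same fixed frame, so the text side is easy to automate across many budget items. A minimal sketch, assuming the summary is pasted in as plain text; `build_face_prompt` and its whitespace cleanup are our own illustration, not a feature of getimg.ai.

```python
PROMPT_TEMPLATE = "One close-up of a face symbolizing the project '{summary}'"

def build_face_prompt(summary: str) -> str:
    """Wrap a budget-project summary in the fixed face prompt."""
    # Collapse line breaks and repeated spaces left over from copying table cells.
    cleaned = " ".join(summary.split())
    return PROMPT_TEMPLATE.format(summary=cleaned)

print(build_face_prompt("to strengthen the disaster support system\n  by developing a psychiatric team"))
# One close-up of a face symbolizing the project 'to strengthen the disaster support system by developing a psychiatric team'
```

Keeping the frame fixed means two prompts differ only in the project description, which matters for the care-versus-military comparison below.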
However, because a background might distract viewers and a realistic human face might introduce biases related to race or gender, we set the initial image to a "Noh mask" (a traditional Japanese theatrical mask) and tried again. (The mask is from the Tokyo National Museum, and the photo was taken from Wikimedia.)
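Starting from an initial image is what diffusion tools usually call image-to-image generation: the initial image is partly noised and then denoised under the text prompt, and a "strength" setting decides how much of it survives. We do not know getimg.ai's internals, so this sketch only illustrates the common convention. The step arithmetic follows the one used in Hugging Face diffusers' img2img pipeline, and the noising function is the standard DDPM forward process; `img2img_schedule` and `noise_init_image` are hypothetical names.

```python
import math
import random

def img2img_schedule(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (start_index, steps_run) for an img2img run.

    strength=0.0 skips all denoising and keeps the initial image;
    strength=1.0 runs every step, so the initial image is essentially discarded.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    start_index = max(num_inference_steps - init_steps, 0)
    return start_index, init_steps

def noise_init_image(pixels: list[float], alpha_bar: float, seed: int = 0) -> list[float]:
    """DDPM forward process: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.

    alpha_bar near 1 keeps the image almost intact; near 0 it is mostly noise.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in [0, 1]")
    rng = random.Random(seed)
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in pixels]

print(img2img_schedule(50, 0.75))  # (13, 37): skip 13 steps, run the last 37
```

Under this convention, a moderate strength lets the mask's outline and lack of background carry through while the prompt reshapes the expression.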
We made no adjustments and simply tried it out, but how should we evaluate the resulting face?
So we decided to build a comparison by generating faces for contrasting budget summaries (which differ only in the meaning of their wording, not in their numerical values). We replaced the initial Noh mask with two different masks: one for a "care" prompt and the other for a prompt "opposite" to it, a "military" one.
"One close-up of a face symbolizing the project to 'establish a posture of eight BMD Aegis ships as stipulated in the new National Defense Program Guidelines in order to further strengthen our nation's defense against ballistic missile threats.'"
(Source: "Addition of BMD function to Aegis ships", https://judgit.net/projects/5036)
The nine images below show the results of generating faces for these two prompts.
We wondered whether there is any point in using a "face" as a symbol for budget proposals, and whether the face ("Omote," the Noh term for a mask) can properly express differences in the nature of budgets (and whether we can expect it to). We also challenged ourselves to see whether we could tell the "care" and "military" budgets apart from their appearance alone, writing our answers down on paper as an analog test.
Incidentally, the correct answer rate among VECTION members was 82%. One member scored only about 50% because he judged from the small images without enlarging them, which suggests that details matter (maybe?).
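Scoring the paper test is simple arithmetic, with one subtlety: a group rate can be computed by pooling all answers or by averaging each member's rate, and the two agree only when every member answers the same number of images. A minimal sketch of the pooled version; `score` and `pooled_rate` are hypothetical helpers, and the key and answer sheets below are invented placeholders, not our actual results.

```python
def score(answers: list[str], key: list[str]) -> tuple[int, int]:
    """Return (number correct, number of images) for one answer sheet."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must cover the same images")
    correct = sum(a == k for a, k in zip(answers, key))
    return correct, len(key)

def pooled_rate(sheets: dict[str, list[str]], key: list[str]) -> float:
    """Total correct answers divided by total answers across all members."""
    correct = total = 0
    for answers in sheets.values():
        c, n = score(answers, key)
        correct += c
        total += n
    return correct / total

# Placeholder data: a four-image key and two members' answer sheets.
key = ["care", "military", "military", "care"]
sheets = {
    "member_1": ["care", "military", "military", "care"],
    "member_2": ["military", "military", "care", "care"],
}
print(score(sheets["member_2"], key))  # (2, 4)
print(pooled_rate(sheets, key))        # 0.75
```

With only two labels, a sheet filled in at random is expected to score about 50%, so an individual score near that level is hard to distinguish from guessing.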
We made various other attempts and found that the Noh mask works well, producing a "good face."
Now, how can we put this quality to use? To be continued.
| Care | Care | Care |
| Care | Military | Military |
| Military | Military | Military |
Correct answer rate summary: 1/9 = about 11%
Article summary and system draft: Asaki Nishikawa, Yoshimi Kikuya
Article composition: Toshihiro Furuya
Images: Yoshimi Kikuya