The Language of Modifications

a short study on the languages of descriptions and modifications

Evan Pu
Dec 5, 2022

Imagine describing a task for your friend to perform. It is unlikely they’ll get it right on the first try. Often, additional communication is needed to modify and improve what has been done so far.

At NeurIPS 2022, I conducted a small study to get a sense of the following:

Q1: How valuable is the modification process?

Q2: Are the languages of modification and description different?

telephone pictionary

I chose the telephone-pictionary task: given a starting image, a group of people alternately describe it (using words) and redraw it based on the description.

Person1 sees the image from the previous generation and gives a description. Person2 sees only the description and attempts to recover the original image.

This continues for several iterations. As you can see, the language is descriptive and aims to have the drawer recover the original image in one shot.

telephone pictionary with modifications

What if we allow an additional step of modification to correct some of the errors? It looks something like this:

Person1 uses descriptive language so that Person2 can generate an image from scratch. Person3 uses modification language so that Person4 can alter an existing image.

Ideally, Person1 and Person3 are the same person (the “programmer”), and Person2 and Person4 are the same person (the “interpreter”). I made these people separate to avoid having to pair a programmer with an interpreter in the same iteration.
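One round of the game with a modification step can be sketched as a small pipeline. The function names below are illustrative placeholders for the four participants, not anything from the study itself:

```python
def iteration(image, describe, draw, request_edit, edit):
    """One round of telephone pictionary with a modification step.

    `describe`, `draw`, `request_edit`, and `edit` stand in for the four
    human participants; the names are illustrative placeholders.
    """
    description = describe(image)               # Person1: descriptive language
    drawing = draw(description)                 # Person2: draws from scratch
    instruction = request_edit(image, drawing)  # Person3: modification language
    return edit(drawing, instruction)           # Person4: alters the drawing
```

The returned drawing then becomes the input `image` of the next generation.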

All the collected data can be browsed at this website (some images might not load right away; just click the buttons to force a reload).

Q1: How important is the modification process?

Here’s how the original image changes over time given only descriptions:

As you can see, we quickly devolved into just a rectangle and a circle.

Here’s with both descriptions and modifications:

As we can see, with modification we were able to retain more details and arrive at a teddy-bear-like drawing.

We conclude that the process of modification is important.

Q2: Are the languages for description and modification different?

To answer this, I first transcribed all the language used into text form.

Then, I used few-shot prompting with GPT-3 to see whether it could reliably distinguish descriptive language from modification language. The texts of the first 2 generations served as the prompt, and I evaluated on the remaining 9 generations.
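A minimal sketch of what such a few-shot classification prompt might look like (the example texts, labels, and prompt wording here are my own assumptions, not the actual prompts or data from the study):

```python
# Hypothetical few-shot examples standing in for the first 2 generations'
# transcribed texts (invented for illustration, not the study's actual data).
FEW_SHOT_EXAMPLES = [
    ("a person with a round head standing next to a tall rectangle",
     "description"),
    ("make the head smaller and add two ears on top",
     "modification"),
]

def build_prompt(examples, query):
    """Assemble a few-shot classification prompt for a completion model."""
    lines = ["Label each text as 'description' or 'modification'.", ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)

def classify(query, complete):
    """`complete` is any prompt -> completion function, e.g. a GPT-3 call."""
    return complete(build_prompt(FEW_SHOT_EXAMPLES, query)).strip()
```

Plugging an actual GPT-3 completion call in for `complete` yields the classifier; the held-out generations are then labeled one text at a time.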

GPT-3 identified descriptions correctly 9/9 times, and modifications correctly 7/9 times.

We conclude that the languages of description and modification are different.

A few logistical remarks

This study consists of roughly 12 generations under 2 conditions: one without modifications (2 participants per generation) and one with modifications (4 participants per generation), for (2+4)*12, or roughly 70, participants in total. Each data point took roughly 4 minutes to collect (1 minute explaining the task and 2–3 minutes waiting for the answer), for a total of roughly 5 hours. This blog post, the associated interactive website, and the GPT-3 study took about another 10 hours, for roughly 15 hours of work in total.

I’m glad I got to do the data collection at NeurIPS, where the data quality is high and the annotators understood the task well, and I didn’t have to spin up a website and host it on Prolific.

Conclusion

Modification is valuable: without it, our drawing of a person devolves into just a rectangle and a circle. The language of modification is different from that of description: GPT-3 can reliably tell one from the other.

Current foundation models such as CLIP and Stable Diffusion are trained on descriptive data such as image-caption pairs. Consequently, while they can generate impressive results in one shot, it is difficult to interact with them further to modify and refine the current output.

We should be collecting more datasets of modifications, where the speaker uses language to tell the listener how to modify and improve an existing output. There are already several efforts in this direction, mostly in the domain of text and code edits, which is a promising start.

— evan 2022–12–05

p.s. a huge thanks to everyone who participated in this study. it is atypical for someone to approach you at a conference and ask for drawings, but you were so kind to me and put up with it. this blog is written for you.


Evan Pu

Research Scientist (Autodesk). PhD (MIT 2019). I work on Program Synthesis in the context of Human-Machine Communications