This week, after months of anticipation, we finally received our invite to OpenAI's DALL·E 2 Preview. Needless to say, we have been experimenting furiously to figure out how best to put this amazing new technology to work here at Slokie once it is fully released. Here's what we've found so far.
Our next book is going to be about a trip to a Mad Science Museum. The story for this book is already written, but one cool use of DALL·E I've found as an author (and founder of Slokie) is generating concept art that I can give an illustrator to show them what I envision for each page. For example, in college, I worked on a project to build an ornithopter, or flapping-wing flyer, so I wanted to include one in the Mad Science story. Below is what DALL·E produced when I entered the text:
"A photo of a robotic bird, digital art."
Here's another example that I generated with the following text:
"A photo of a science museum lobby with piles of junk, digital art"
Using this concept art as a guide, our illustrator can then offer their own fresh perspective on it and ensure that scenes and characters stay consistent from one page to the next, which is something that's tricky, if not outright impossible, to do with DALL·E. It's the perfect example of how technology like this doesn't replace the need for humans. It just lets us be more effective.
Along those lines, we've discovered other limitations that prevent DALL·E from serving as a direct upgrade to our current technology, though it should complement our existing software nicely. For example, OpenAI is very concerned about people abusing the tool to create deepfakes, so it prevents you from uploading photos of real people to incorporate into scenes. And even if those guardrails are removed at full launch, DALL·E 2 requires that you erase part of an image (making those pixels transparent) to mark the areas that should be AI-generated, which is something our own backend software already does. Regardless of what OpenAI decides in the end, we're fortunate to have that functionality figured out, because it needs to be in place to put real people into scenes either way.
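For the technically curious, that "erase an area" step amounts to punching a transparent hole in an RGBA image, which DALL·E then treats as the region to generate. Here's a minimal sketch using the Pillow library; the image size and rectangle coordinates are arbitrary placeholders, not values from our actual pipeline:

```python
from PIL import Image


def erase_region(img: Image.Image, box: tuple) -> Image.Image:
    """Return an RGBA copy of `img` with the pixels inside `box`
    (left, top, right, bottom) made fully transparent. DALL·E's edit
    mode interprets transparent pixels as "fill this in"."""
    out = img.convert("RGBA")
    hole = Image.new("RGBA", (box[2] - box[0], box[3] - box[1]), (0, 0, 0, 0))
    out.paste(hole, box)
    return out


# Stand-in for a piece of background artwork.
scene = Image.new("RGBA", (1024, 1024), (200, 180, 150, 255))
masked = erase_region(scene, (300, 300, 700, 700))
```

The resulting `masked` image can be saved as a PNG (PNG preserves the alpha channel) and used as the mask in an edit request.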
Therefore, the main application of DALL·E we have been experimenting with is using it to make small, personalized modifications to scenes. Essentially, DALL·E lets you mark areas of an existing image that you want to alter. So we can take static, pre-made background art from a human illustrator, which keeps certain elements consistent from scene to scene, and then change one or two things about it. We can ask for your child's favorite animal, seamlessly plug that animal into a scene, and then use our current machine learning models to plug your child in as well. For example, here is the same picture from above, but with a purple elephant added via DALL·E:
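Under the hood, an edit like the purple elephant boils down to a single call to OpenAI's image-edits endpoint: you send the original image, a mask with a transparent region, and a prompt describing what should appear there. The sketch below only assembles the request rather than sending it (the URL and form-field names follow OpenAI's public API docs; the prompt is just an illustrative placeholder):

```python
import os

# Endpoint from OpenAI's public Images API documentation.
API_URL = "https://api.openai.com/v1/images/edits"


def build_edit_payload(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Form fields for an image-edit request. The scene PNG and the
    transparency mask are attached separately as multipart file parts
    named 'image' and 'mask'."""
    return {"prompt": prompt, "n": str(n), "size": size}


payload = build_edit_payload(
    "a purple elephant standing in a science museum lobby, digital art"
)
# The real request would also carry an auth header like this one.
headers = {"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"}
```

A real integration would POST this payload (plus the image and mask files) and read the generated image URL out of the JSON response.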
Again, there are some limitations here that we're still sorting out, such as how to ensure that the animal looks consistent from one page to the next and how to keep this from being abused by bad actors. It may be that these limitations simply place guardrails around the stories we can tell and the variations we can offer: perhaps one scene features the child's favorite animal, the next their favorite food, and both are chosen from a pre-vetted list. Regardless, there's a huge amount of potential here, especially when you consider how this can complement our existing tech.
It's also worth mentioning that Google recently announced Imagen, a product very similar to DALL·E. We reached out to their team to see if we might gain access to that as well, in case it fits our needs a bit better, but have yet to hear back. Our application to use DALL·E took several months, however, so hopefully it's just a matter of time.