145: Mixed bag, 2
Creating and questioning reality: a prompting tip, and some philosophical navel-gazing
Two topics this week; both are pragmatic, but the second is more philosophical. Today’s post was polished while listening to my 17-year-old learn the bass line to Graham Central Station’s “Hair.”
Creating reality
One of the recurring, challenging lessons of GenAI is distinguishing how this new software is familiar, but also wildly different from the software and processes that came before.
As a use case, think about the task of defining a conceptual direction and the early stage of image curation. Imagine you’re on the hook to develop a new brand campaign—you’re going to create a new reality—so an early question is often “what might it look like?” And I’m using “look” to mean the campaign’s conceptual clothing. Based on the brief, relevant culture, your competition, and brand history, an ideas person will immediately have preferences and guesses, potential directions and gut instincts.
The challenge is in translating what sparks in your head into something—a presentation perhaps—which other people can comprehend and hopefully embrace. This is at least two tasks: 1) There’s the “how do I clarify what’s in my head” piece, and 2) “How do I translate that into images others will recognize, accept, and find appealing?”
The first bit could be called nomenclature. Or art history. Or design thinking. Even philosophy. It’s not enough to have an idea; you have to be able to explain and sell it. And the only way to learn how to do that was by reading books or going to school or practicing—i.e., what are the most effective words to convey images that represent a concept that’s “soft,” or “edgy,” or “bespoke”? While the Internet expanded access to this kind of skill building, it didn’t necessarily do the work for you. You still had to search and curate.
Generative AI is beginning to do the work for you.
Today anyone can boot up an LLM and type,
“I’m trying to develop an idea that [solves a problem, defines a new reality, etc. 👈🏽 the more specifically you define things, the better] and I don’t know where to begin. Can you help?”
And ChatGPT or Claude or Seenapse will gladly initiate a conversation with you to try and address that first task—getting what’s in your head, out. This conversational ability is a massive leap forward for all kinds of idea people.
Then there’s the second task, i.e. illuminating, dramatizing, and protecting the idea, typically with images. Now that we’ve got a legible, coherent idea we need to support it and sell it. What does my concept look like?
Before the Internet, you likely had to be able to draw, paint, process photos and/or have access to lots of printed resources to address this second task. And you’d work with specialists; there were roles paying decent salaries to manage the business of conceptual image creation and curation.
With the arrival of the Internet, the very first thing you’d do is collect keywords reflecting the vague notions in your head. See above, and why I think liberal arts majors have an advantage in writing prompts: if you have art or design history experience, you’ll have a much easier time assembling words that label and describe obscure notions. Then, in the pre-LLM era, you’d scour Google Images, Pinterest, and stock image sites; maybe even reference and scan printed materials. And maybe you’d find images that matched what was in your head.
However you slice it, those older processes were significantly constrained by what existed. Unless you could draw, or take great photos.
This is where GenAI is radically altering the creative process. Because now you can just ask a diffusion model to create what’s in your head. But we’re back to the original challenge: How do I put into words the amorphous, vague, nebulous concepts I’m excited about?
Because as exciting and empowering as Midjourney, Firefly, Meta and Flux might be, you still kind of need to know some words. You need to be able to cast an effective spell to get the model to conjure the image that’s in your head.
Again, you can ask an LLM for help.
Here’s a cut/paste prompt you can feed to ChatGPT—yes, it’s long but what do you care? Hit “submit” and it will lead to a version of this request…
Now all you have to do is type whatever notions are in your head, and ChatGPT will do the translating for you, giving you a more effective prompt you can feed into the image generation platform of your choosing. (You might want to remove any “--instructions” this bit of code injects at the end if you’re using the prompt inside Firefly, Flux or Meta.)
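If you’d rather script that translation step than run it in a chat window, here’s a minimal sketch of the same idea, assuming the OpenAI Python SDK and an API key in your environment. The “translator” instruction and the example notions below are placeholders I’ve invented, not the cut/paste prompt above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical "translator" instruction -- not the cut/paste prompt from this post.
TRANSLATOR = (
    "You are an art director's assistant. Turn the user's vague notions into a "
    "single, detailed image-generation prompt covering subject, mood, lighting, "
    "composition, medium, and style references. Return only the prompt."
)

# Whatever notions are in your head, in plain language.
notions = "soft but edgy, hand-made feeling, for a bespoke stationery brand"

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model will do
    messages=[
        {"role": "system", "content": TRANSLATOR},
        {"role": "user", "content": notions},
    ],
)

print(response.choices[0].message.content)  # paste into Midjourney, Firefly, etc.
```

Either way, the model’s job is the same: turn fuzzy language into a prompt an image model can act on.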
Questioning Reality
Speaking of generating images, what happens when GenAI technology is handed over to billions of people without much fanfare, or training? Specifically, what happens when anyone can create or alter literally any image, at any time, with remarkable ease and fidelity? (And the ability to broadly publish what you’ve generated has already been long established.)
This is the grand experiment we’re all engaged in now, like it or not.
Recently, Meta gave GenAI capabilities to everyone on Facebook, Instagram and WhatsApp. We seem to have survived, somehow. Then, in the past two weeks, Google released the new Pixel 9 phone with a tool called “Reimagine.”
In short, you can use AI to alter, combine and re-imagine photos. Quickly. Without training. At very high fidelity. On your phone.
As The Verge puts it in two related posts, “no one is ready for this.”
“Reimagine is a logical extension of last year’s Magic Editor tools, which let you select and erase parts of a scene or change the sky to look like a sunset. It was nothing shocking. But Reimagine doesn’t just take it a step further — it kicks the whole door down. You can select any nonhuman object or portion of a scene and type in a text prompt to generate something in that space. The results are often very convincing and even uncanny. The lighting, shadows, and perspective usually match the original photo. You can add fun stuff, sure, like wildflowers or rainbows or whatever. But that’s not the problem.”
“Anyone who buys a Pixel 9 — the latest model of Google’s flagship phone, available starting this week — will have access to the easiest, breeziest user interface for top-tier lies, built right into their mobile device.”
So what’s really at stake here?
Media literacy.
Or, more precisely, photo literacy.
Sarah Jeong, writing in The Verge, notes, “If I say Tiananmen Square, you will, most likely, envision the same photograph I do. This also goes for Abu Ghraib or napalm girl. These images… they have encapsulated truth to a degree that is impossible to fully express.”
When it no longer takes any real skill to reimagine (and disseminate) a photo, what remains real?
Benedict Evans takes the optimistic view: “But we’ve had forged text and forged images for as long as we’ve had text and images, and we seem to have survived? The real test seems to be whether people want to believe what’s obviously false, or want to disbelieve what’s obviously true, and that’s not a technology problem.”
Maybe it’s a philosophy problem. Stephen Wolfram thinks so. “Science is an incremental field where you’re not expecting that you’re going to be confronted with a major different way of thinking about things.” TechCrunch sums up Wolfram’s perspective, “a lot of the questions we’re now being confronted with by AI are actually at their core traditional philosophical questions.”
GenAI continues to unlock incredible capabilities and, with them, even more incredible opportunities to revisit ancient assumptions and questions about how we create what we believe and, most important, the agreements we make with each other about that reality.
AI+Creativity Update
🤖🎨 Midjourney is now officially free and available to anyone, again. But this time through a browser. Free accounts are limited to 25 generations per day.
🤖🤓 The VC firm Andreessen Horowitz has released its third overview of GenAI Consumer Apps. It’s a reminder of how global the space has become and how normalized things like “AI copywriting” now are in realms like dating-app messaging. Lots of useful links if you’re looking for that sort of thing.
📹 Here’s a great profile of iJustine, “the YouTuber that never quit” as The Verge puts it. Justine Ezarik’s 18-year journey in front of the camera is a story of tenacity, focus and a keen interest in always telling a story worth watching.
🤖🔎 System prompts are typically unseen code you or I can’t alter, which guides an LLM to behave in specific ways. For example, to “never identify a face,” or “refuse to use words including…” Anthropic has decided to buck the prevailing trend and publish its system prompts. They’re worth reading if only to glimpse the sentiments and preferences of the humans who created the code—and to infer how they imagine their tool engaging. As Kyle Wiggers at TechCrunch puts it, “If the prompts for Claude tell us anything, it’s that without human guidance and hand-holding, these models are frighteningly blank slates.”
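For the curious, here’s roughly where a system prompt sits when you reach a model through its API rather than a chat window. This is a minimal sketch assuming the Anthropic Python SDK; the instruction text is an invented example, not one of Anthropic’s published prompts.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in your environment

# Invented example instruction -- not one of Anthropic's published system prompts.
SYSTEM = "Never identify people in photos, and keep every answer under 100 words."

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    system=SYSTEM,  # steers the model before the user types anything
    messages=[{"role": "user", "content": "Who is in this photo?"}],
)

print(message.content[0].text)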