169: How do we evaluate creativity? Part 2
+ Google Shopping AI case study; ChatGPT o1 reactions; and a slew of AI+Creativity
I don’t know any creative people who have rejected AI outright. Most appear to remain on the sidelines, dabbling yet curious. A few creatives are shipping remarkable AI case studies, like Paul at Addition. If you’re looking for a thorough description of how AI can enable and fuel a creative concept (meaning an idea that would have been impossible without AI), check out Paul’s case study for his firm’s recent out-of-home campaign promoting Google Shopping. (The same could be said of Paula Scher’s recent story; check the links at the end for expanded perspective.)
Addition’s work bridges the realms of what I label “little c” and “BIG C” creativity, which I described back in October for part 1 of a series asking, “How do we evaluate creativity?”
Evaluating creativity, Part 2
As a reminder, one of my students asked a fair question a while back: How do creative leaders and clients evaluate creativity? Are there consistent rules, hallmarks or criteria a student could examine, to prepare for entering the ad business? The short answer is, “No, dear student. There are no singular criteria. Good luck.” Which is another way of saying this is a topic leaders should be talking about. How decisions get made shouldn’t be a mystery.
Yet, as part 1 revealed, there’s still plenty of opaque bureaucracy among agencies of all sizes and focus. Here are three more survey questions, along with salient quotes in response.
🧐 Does your agency connect its creativity with economic or behavioral results? In other words, does your firm have Key Performance Indicators (KPIs) or similar methods they use with clients to answer this question: "How will we know our creativity will work?"
No to economic or behavioral, but Yes it needs to meet certain criteria outlined in a brief.
Our creativity is measured on the buzz it will create for the client. Are we getting more visits to the website? Are people talking about the brand? Have we done our job bringing them into the conversation?
No. I keep it clear that what [we] do has an element of the intangible, although we do track recall, clicks, Facebook followers & engagements.
Yes. We talked a lot about the business impact of what we were doing and expected our creatives / designers to be comfortable with that conversation
* * *
🙋🏽‍♀️ How do your clients talk about creative effectiveness? What factors do they cite as methods or metrics for evaluating the efficacy of the ideas you're pitching?
Unfortunately, it’s usually based on short term results like impressions, engagement, click through rates, attendance numbers, etc. But they ignore more outcome-based measurements that focus on whether or not the creative asset led the customer along the journey and moved them to the next desired step or action, or the next phase in the sales funnel.
At the end of the day, clients measure results by sales. But they can't and won't ignore the fame building ideas and uptick eyes on the website. The savvy ones can see the time spent on every page.
A lot of what our team deals with is based largely on eComm and/or social data and if it jibes with those strategies along with the brand strategy.
Needs to be “unique” in the marketplace, needs to speak to a wide range of target clients, not just the high-performing ones.
* * *
🥇 Rank the factors your agency uses, by importance, to evaluate its creative output.
I used a wonky format in my original survey, asking for individual ranking (i.e. Definitely Most Important, Somewhat Important, Important, Least Important, Not a factor) per topic. I’m not a statistician. So I asked my friend ChatGPT how to make this make sense, and well, here you go.
Again, what fascinates me is the lack of any consistent or even standout metric for evaluating creativity across the range of agencies, tenure, industry, or media focus. You might think we’d rather not have one. And maybe that’s the point I can convey to my student. Creativity, by its very nature, resists a homogeneous approach to evaluation.
But that doesn’t mean we shouldn’t talk about it.
ChatGPT o1 released yesterday
If you’re paying for a Plus account, you’ve got this new version available to you now. Access it by opening the model menu, which likely defaults to GPT-4o, then selecting “o1.” (OMG, their naming systems are quite awful.) Anyway, remember back when the models seemed to change and improve often? Perhaps we’re back in a similar cycle, as OpenAI shakes things up with their “12 Days” marketing concept. We’ll see what 12 pieces of news they reveal between now and the holidays.
Is o1 better?
I think it might be.
But I only ran one test. I took a fairly complex assignment I had been trying to solve with GPT-4o. It involved reviewing multiple URLs, and structuring a layered research and analysis with lots of back and forth dialogue—this is an assignment where I want to engage with the LLM to eke out potential directions for further exploration. So I fed the same prompt to the new o1.
First, I noticed the system spent more time “thinking.” The first paragraph of o1’s response suggested it was trying to solve my assignment in a different manner than 4o, opening with “I’ll ask you some clarifying questions to better tailor the information.” This felt more like the o1-preview “reasoning” approach. The initial follow-up questions posed by o1 surfaced insights that had taken several interactions to elicit from 4o. In this one test I felt like I was getting to a deeper level of useful discussion faster than with the older 4o model.
I’m not the only one to suggest thinking of LLMs as infinite interns. But the metaphor makes sense here: as an intern, o1 was better organized, offered keener synthesis, and seemed to grasp my intentions better than the previous 4o model.
Obviously there’s a lot more to pursue. If you’ve got o1, I’d love to know what you’ve been learning about its peculiarities.
More AI+Creativity
🦻🏼🎄Apple’s Holiday Ad
We’ve got two hearing aid wearers in my family. So, as I said when they were announced, I’m sold on FDA-approved AirPod hearing aids. Nice to see Apple focusing their advertising might on this topic for the holidays.
🤖🎨 I’m with Paula Scher
“Government performance reports aren’t known to be page-turners. What if we could change that?” She and her team at Pentagram just released their work building Performance.gov for the US Office of Management and Budget and General Services Administration, a project which involved a lot of generative AI. Four videos in the Pentagram case study explain their process. Mia Blume at Designing with AI offers a robust analysis.
🤖🤔 Generative Worlds
Fei-Fei Li, the computer scientist behind ImageNet and a former chief scientist for AI at Google Cloud, wound up creating World Labs. And this week her firm teased their first solution, a means to generate a 3D world from a single image. (Li’s TED Talk is also worth your time.)
🤖 😵‍💫 Shaking up narrative flow
The video-generation platform Runway released new research this week exploring non-linear video editing. At least, that’s how I defined what they’re calling “mapping latent space.” This is important work, because it’s deconstructing all kinds of assumed UX and process methodologies which actually limit creative exploration. These tools aren’t live yet, but you can see how using AI to re-imagine ways in which narrative flow has been assumed to work might unlock fresh perspectives.
🤖 ✏️ Think different?
And all of the above is central to a canny thesis offered this week by Nate Jones in his newsletter.
“We're so focused on making AI think exactly like we imagine we think that we might be missing the new forms of intelligence emerging right before our eyes.”
Maybe it’s not about replacing jobs. Maybe it’s not about being more efficient.
Maybe it’s about thinking in ways we’re not accustomed to.
And isn’t that the whole point of creativity?
🎶 Today’s post was written to Doechii, then the band la lune’s shoegaze rendition of Fleetwood Mac’s “Dreams.”
Love all this. Also: I was sending the la lune video around this week. Maybe our algorithms are similar? A: it’s a good cover, but B: listen to how the kids help each other and support each other in the creative process. Different generation? Good parenting? And that drummer really knows what he’s doing.