Attention heatmaps for ad creatives
How attention heatmaps for ad creatives work, how accurate saliency prediction is on statics, and the pre-flight checks worth running before you spend.
You get two statics back from design. Version A has a big lifestyle photo, a model smiling at the camera, your offer in the corner. Version B is plainer: product, claim, price. You run both through a heatmap tool and version A lights up bright red on the model's face. Total attention. Looks like a winner.
Except none of that red is on your offer. The face won the glance and kept it, and the thing you are paying CPMs to communicate sits in a cold blue corner. The heatmap did not tell you which ad will convert. It told you something narrower and still useful: in version A, the eye never gets to the part that does the selling.
I run a saliency pass like this on every static in my own pipeline, so I have strong opinions about where these heatmaps help and where the vendors oversell them.
The takeaways
Strictly speaking, nothing. A predictive attention heatmap is computed by a saliency model: a neural network trained on datasets of real human eye-tracking recordings, which then estimates, pixel by pixel, where a first glance at a new image will most likely land. The red zones are high predicted fixation probability. No person looked at your creative.
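To make "pixel by pixel" concrete, here is a minimal sketch of how a raw saliency map can be read as a fixation probability map. It assumes the model's output is already available as a grid of non-negative scores. The toy grid and the function names are illustrative only, not the output or API of any specific tool.

```python
def to_probability_map(scores):
    """Scale a grid of non-negative saliency scores so all cells sum to 1."""
    total = sum(sum(row) for row in scores)
    if total <= 0:
        raise ValueError("saliency map has no mass to normalize")
    return [[value / total for value in row] for row in scores]


def hottest_pixel(prob_map):
    """Return (row, col) of the cell with the highest predicted probability."""
    best = max(
        ((value, r, c)
         for r, row in enumerate(prob_map)
         for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy 3x4 "creative" with made-up scores: one hot spot, e.g. where a face sits.
raw_scores = [
    [1, 1, 1, 1],
    [1, 8, 2, 1],
    [1, 1, 1, 1],
]

prob_map = to_probability_map(raw_scores)
peak = hottest_pixel(prob_map)
print(peak, round(prob_map[peak[0]][peak[1]], 2))
```

The "red zone" in a rendered heatmap is simply the area where these per-pixel probabilities are highest. Nothing in the computation involves a viewer.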
That puts it in a different category from two things it gets confused with. Live eye tracking measures real participants with cameras, costs more, and takes days per round. Click or scroll heatmaps (the Hotjar kind) record what visitors did on a live page, so they need traffic you have not spent yet. The saliency prediction needs neither people nor spend, which is the whole appeal: you get the read before the first euro leaves the account.
The trade is that you are reading a model's guess about the first seconds of pre-conscious attention. Neurons, one of the vendors in this space, frames its start-attention maps as the first two seconds of exposure. That is the window these models speak to.
Accurate enough at the narrow thing they do. Attention Insight, comparing its predictions against live eye-tracking studies, puts commercial saliency models at 90 to 96 percent accuracy, with academic models scoring a bit higher on public benchmarks. DeepGazeIIE, the model I use, comes from that academic line and has led the standard saliency benchmarks for static images.
But hold on to what "accurate" means here: agreement with where real fixations land in the opening moments on a still image. The model has never seen your offer, your audience, or your price point. It cannot tell a winning ad from a losing one. Two creatives can produce near-identical heatmaps and still perform 3x apart, because performance lives in the message, the offer, and the match to the person seeing it.
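One common way the research literature scores that agreement is Normalized Scanpath Saliency (NSS): z-score the predicted map, then average the z-scores at the pixels where real fixations landed. The sketch below shows that one metric only. It is not a claim about how any vendor computes its accuracy percentage, and the map and fixation coordinates are made up.

```python
from statistics import mean, pstdev


def nss(saliency, fixations):
    """Normalized Scanpath Saliency.

    saliency:  grid (list of rows) of predicted saliency values.
    fixations: list of (row, col) positions where real viewers fixated.
    Returns the mean z-scored saliency at the fixated pixels.
    Above 0 means fixations landed on hotter-than-average pixels.
    """
    flat = [value for row in saliency for value in row]
    mu = mean(flat)
    sigma = pstdev(flat)  # population standard deviation of the whole map
    if sigma == 0:
        raise ValueError("a flat saliency map cannot be z-scored")
    return mean([(saliency[r][c] - mu) / sigma for r, c in fixations])


# Hypothetical prediction with one hot spot.
predicted = [
    [1, 1, 1, 1],
    [1, 8, 2, 1],
    [1, 1, 1, 1],
]

# Hypothetical eye-tracking data: one set on the hot spot, one in the corners.
score_on_hot_spot = nss(predicted, [(1, 1), (1, 2)])
score_in_corners = nss(predicted, [(0, 0), (2, 3)])
```

A score like this says the prediction and the fixations line up. It carries no information about whether the creative sells.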
So the honest job description is small: the heatmap tells you whether attention even arrives at the elements that carry the message. Whether those elements persuade is outside its pay grade. It predicts attention, not persuasion.
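"Whether attention even arrives" can be put as a number: the share of predicted attention that falls inside the box around a message-carrying element. This is a minimal sketch, assuming you have the heatmap as a grid and have drawn the boxes yourself. The grid values, box coordinates, and function name are hypothetical.

```python
def attention_share(heatmap, box):
    """Fraction of total predicted attention that falls inside a box.

    box is (top, left, bottom, right) in grid cells; bottom and right
    are exclusive, like Python slices.
    """
    top, left, bottom, right = box
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap has no mass")
    inside = sum(sum(row[left:right]) for row in heatmap[top:bottom])
    return inside / total


# Toy 4x4 heatmap: hot face in the top-left, offer in the bottom-right corner.
heatmap = [
    [9, 7, 1, 1],
    [8, 6, 1, 1],
    [1, 1, 1, 2],
    [1, 1, 2, 2],
]
face_box = (0, 0, 2, 2)
offer_box = (2, 2, 4, 4)

face_share = attention_share(heatmap, face_box)
offer_share = attention_share(heatmap, offer_box)
print(f"face: {face_share:.0%}, offer: {offer_share:.0%}")
```

In this toy case the face takes about two thirds of the predicted attention and the offer about a sixth, which is the version A pattern from the opening.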
Three checks, two minutes per creative:
One caveat from the version A story above: a face glowing red is not automatically bad. Faces pull attention in nearly every saliency dataset, and a face gazing toward your headline can hand the glance onward. The fix is rarely "remove the face"; it is usually "make the face look at the thing you sell".
And keep the verdict advisory. In my pipeline the saliency pass flags problems; it never rejects a creative on its own, because a human can see context the model can't. The heatmap is a pre-flight check. The proof is still the live test, and reading that test without fooling yourself is its own discipline.
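As a sketch of what "advisory" can look like in code: the check below returns notes for a human and deliberately has no reject path. The region names, the 10 percent threshold, and the `Flag` type are placeholders for illustration, not the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Flag:
    region: str
    share: float
    note: str


def advisory_flags(region_shares, min_share=0.10):
    """Return a flag for each region whose predicted attention share is low.

    region_shares maps a region name to its share of predicted attention.
    min_share is an arbitrary placeholder threshold. The function only
    reports; the decision to ship or rework stays with a person.
    """
    flags = []
    for region, share in region_shares.items():
        if share < min_share:
            note = (f"'{region}' gets {share:.0%} of predicted attention; "
                    "worth a human look")
            flags.append(Flag(region, share, note))
    return flags


# Hypothetical region scores for one static.
flags = advisory_flags({"headline": 0.22, "offer": 0.04, "cta": 0.12})
for flag in flags:
    print(flag.note)
```

Returning a list of notes instead of a pass/fail boolean keeps the model's output in the role of evidence, with the verdict left to whoever reviews the draft.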
Statics only. A saliency model trained on still images has nothing valid to say about video: motion, cuts, and audio rewrite where attention goes, and a frame-by-frame heatmap of a video is a still-image answer to a moving-image question. I deliberately do not run scan-path prediction on video in my own stack for that reason. When a tool sells you video heatmaps from a static model, ask what it was trained on.
The same restraint applies inside Adscalr's creative workflow. Every generated static gets a DeepGazeIIE pass that produces the per-pixel heatmap, the scan-path order, and region scores; the flags land next to the draft, and a person decides what ships. The creative workflow page walks through how that fits into the rest of the creative loop.