01 Apple Podcasts spec and the cover-art pipeline
Apple Podcasts is the dominant directory for cover-art consumption (roughly 60 percent of US podcast listening as of 2024 per Edison Research), so its artwork spec is the one to build for:
- 3000 x 3000 pixels, the top of Apple's accepted range (1400 x 1400 minimum, 3000 x 3000 maximum).
- 1:1 square aspect ratio, no exceptions.
- RGB colour space, sRGB only. Adobe RGB and CMYK rejected.
- JPG or PNG. PNG with alpha is supported but rarely useful since the directory composites onto a default background.
- File size under 500 KB for the directory thumbnail.
The host portrait is rarely the entire cover (the show name and graphic identity dominate), but the host's face often appears as part of the composition or as the secondary host-bio strip. The portrait master needs to be 3000 x 3000 sRGB to cleanly support layout. Spotify's spec is more permissive but recommends matching Apple for cross-platform parity. Pocket Casts and Overcast pull from the same RSS feed art automatically.
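The checklist above can be turned into a small pre-upload check. What follows is a minimal sketch in Python, assuming the width, height, colour profile name, format, and byte size have already been read by whatever imaging tool is in use. The function name `check_cover_art` and its parameters are hypothetical, and the thresholds are simply the figures quoted in this section, not values pulled from any platform API.

```python
# Hypothetical pre-upload check. Thresholds are the figures quoted in this
# article, not values read from Apple or any other platform.
TARGET_SIDE_PX = 3000                 # the 3000 x 3000 master size
MAX_THUMBNAIL_BYTES = 500 * 1000      # "under 500 KB", assumed to mean 500,000 bytes
ALLOWED_FORMATS = {"JPG", "JPEG", "PNG"}


def check_cover_art(width, height, colour_profile, file_format, size_bytes):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if width != height:
        problems.append(f"not 1:1 square: {width} x {height}")
    if width != TARGET_SIDE_PX or height != TARGET_SIDE_PX:
        problems.append(
            f"expected {TARGET_SIDE_PX} x {TARGET_SIDE_PX}, got {width} x {height}"
        )
    if colour_profile.strip().lower() != "srgb":
        problems.append(f"colour profile is {colour_profile!r}, expected sRGB")
    if file_format.strip().upper() not in ALLOWED_FORMATS:
        problems.append(f"format {file_format!r} is not JPG or PNG")
    if size_bytes >= MAX_THUMBNAIL_BYTES:
        problems.append(f"file is {size_bytes} bytes, expected under {MAX_THUMBNAIL_BYTES}")
    return problems


if __name__ == "__main__":
    for problem in check_cover_art(1400, 1400, "Adobe RGB", "png", 620_000):
        print("FAIL:", problem)
```

Returning a list rather than raising on the first failure means a single pass reports every issue with a file, which is useful when a designer hands over several derivatives at once.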


02 The microphone and headphones conventions
The microphone in frame is the dominant cue for a working reason: directory-discovery views are pre-subscribe, and the first glance has to communicate "this is a podcast" before the show name registers. A headshot without audio context reads as a generic professional portrait; a headshot with a Shure SM7B, Rode NT1, Electro-Voice RE20, or Neumann TLM 103 reads as podcast-host immediately.
The Shure SM7B, the broadcast cardioid associated with Joe Rogan, Jocko Podcast, and most successful video-and-audio podcasts, has been the most-photographed podcast microphone since 2018. The Rode NT1 is the entry-level studio favourite. The Electro-Voice RE20 is the broadcast-radio classic that NPR-tradition shows favour.
Jonathan Mannion, the New York hip-hop portrait specialist who has shot Jay-Z, DMX, Aaliyah, and Eminem, has crossed into podcast-host portraiture through Drink Champs and Math Hoffa shoots. His convention: SM7B large in lower-frame, slight upward camera angle, dramatic side-lit register that reads music-and-culture rather than corporate. Joel Caldwell's Brooklyn studio runs a parallel register for business and interview podcasts, with cleaner light and the microphone slightly less dominant.
Headphones around the neck or with one earpiece on read as in-the-middle-of-recording rather than as a staged studio portrait. The model carries its own signal: Audio-Technica ATH-M50x for prosumer, Sony MDR-7506 for broadcast tradition, AKG K371 or Beyerdynamic DT 770 for audiophile-tilt, Bose QuietComfort for consumer-friendly. A wellness podcast hosted by a yoga teacher should not appear with broadcast-bulky AKGs; a serious music-criticism podcast should not appear with consumer Bose.
03 Lighting register and show-genre wardrobe
The lighting register sits between the speaker register (warm and energetic) and the music-portrait register (dramatic and characterful):
- 1m softbox key at 30 to 45 degrees subject-side, slightly higher than corporate, often warmed with 1/4 CTO gel.
- 60cm softbox or strip-light fill at 1:3 to 1:4.
- Hair light or rim with strong fall-off.
- Background light optional, often coloured gel (deep red, electric blue, amber) for music-and-culture register.
The aesthetic is closer to a record-cover portrait than to a corporate annual-report portrait. Mannion uses vintage Profoto Acute strobes and 1m gridded softboxes for controlled fall-off; the same setup translates cleanly to the music-podcast register.
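The 1:3 to 1:4 fill ratio in the list above is easier to set on a light meter when expressed in stops. A minimal sketch, assuming the ratio is a plain fill-to-key intensity ratio (1:3 meaning the fill delivers one third of the key's light); lighting ratios are quoted under more than one convention, so treat the helper name `fill_ratio_to_stops` and this interpretation as assumptions.

```python
import math


def fill_ratio_to_stops(key_parts, fill_parts=1):
    """Stops by which the fill sits under the key, for a fill:key ratio
    of fill_parts:key_parts (assumed to be a plain intensity ratio)."""
    return math.log2(key_parts / fill_parts)


if __name__ == "__main__":
    for key_parts in (3, 4):
        stops = fill_ratio_to_stops(key_parts)
        print(f"1:{key_parts} fill is about {stops:.2f} stops under the key")
```

Under that assumption, a 1:4 fill sits exactly two stops under the key and a 1:3 fill a little over one and a half.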
Wardrobe varies by genre more than by the host's day job:
- Music and culture. Streetwear-adjacent, dark jacket over solid tee, characterful eyewear, hat sometimes works. Joe Budden Podcast, Drink Champs.
- Business and interview. Business-casual, blazer over open-collar shirt without tie. Tim Ferriss Show, How I Built This, Masters of Scale; portraits often pull from the same delivered file as the host's LinkedIn profile and trade-press features in Forbes, Inc., and Entrepreneur.
- News and politics. Closer to broadcast-anchor register, blazer-and-shirt or simple solid blouse. The Daily, Pod Save America, Up First.
- Health and wellness. Warmer earth-tone palette, relaxed knit. Huberman Lab, On Purpose with Jay Shetty.
- Comedy. Most permissive, often very casual or character-aligned. SmartLess, Conan O'Brien Needs a Friend.
The genre signals wardrobe more than the host's other professional context. A management consultant running a fitness podcast should dress closer to fitness register, not corporate.
04 Day rates and the 9:16 derivative
- Mid-market $300 to $700. 60-minute slot. Local commercial-portrait studios with audio-equipment props. Produces a 3000 x 3000 master plus square and vertical derivatives.
- Mid-premium $700 to $1200. 60 to 90 minute slot. Genre-specialist photographers like Joel Caldwell and operators in Los Angeles, London, Austin.
- Premium $1200 to $1500+. Publication-tier profile or music-portrait crossover. Mannion-tier or comparable runs higher when the deliverable feeds a music or culture publication.
Many hosts skip the mid-market tier at first and shoot at home with available light and a tripod, then commission cover-art design separately. This works for a show's first 50 to 100 episodes; an audience-building show usually justifies a $500 to $800 session by the time listener count crosses 5000 per episode.
Hosts increasingly need a vertical 9:16 for Instagram Reels, TikTok, and YouTube Shorts clip-show distribution. Modern sessions shoot wide so the master crops to 1:1 (Apple square), 9:16 (Reels and Shorts), and 16:9 (YouTube thumbnail and conference slide). The 9:16 derivative is what older portraits lack; the social clip-show distribution model only became dominant from 2021 forward. A session shot before 2021 likely needs a re-shoot or fresh derivative session.
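The three derivatives can be worked out from the master's pixel dimensions before the shoot, to confirm the frame is wide enough. A minimal sketch, assuming each derivative is the largest crop of the target ratio centred on the frame; in practice the crop is placed around the face and microphone, so `centered_crop_box` and the 6000 x 4000 example master are illustrative assumptions only.

```python
def centered_crop_box(master_w, master_h, ratio_w, ratio_h):
    """Largest centred crop of ratio_w:ratio_h inside the master frame.

    Returns (left, top, right, bottom) in pixels.
    """
    # Integer cross-multiplication compares the two aspect ratios without floats.
    if master_w * ratio_h >= master_h * ratio_w:
        # Master is at least as wide as the target ratio: keep the full height.
        crop_h = master_h
        crop_w = master_h * ratio_w // ratio_h
    else:
        # Master is narrower than the target ratio: keep the full width.
        crop_w = master_w
        crop_h = master_w * ratio_h // ratio_w
    left = (master_w - crop_w) // 2
    top = (master_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    master = (6000, 4000)  # hypothetical wide master frame
    for name, ratio in {"1:1": (1, 1), "9:16": (9, 16), "16:9": (16, 9)}.items():
        print(name, centered_crop_box(*master, *ratio))
```

For the hypothetical 6000 x 4000 frame the square crop comes out at 4000 px per side, which leaves room to downsample to the 3000 x 3000 master rather than upscale.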
05 The microphone-without-host failure mode
Show-art design templates often produce cover art with a microphone but no host face. That is fine for the cover art, but it leaves the host-bio strip without a portrait. A brand that grows past the early-discovery phase wants both the cover and a host portrait at directory-spec resolution. Working brands commission both, often from different specialists. Conflating the two compromises both.
Two podcasts can have nearly identical genre, budget, and equipment and produce wildly different portraits because micro-decisions diverge: Shure SM7B versus Rode NT1, ATH-M50x versus DT 770, vintage gel-lit versus clean studio, dark palette versus mid-tone earth. Each is a vote on what the show is.
For the platform-profile context see the LinkedIn headshot ideas spoke. For the speaker-and-conference overlap see the speaker headshot ideas spoke. For the author-and-publication context see the author headshot ideas spoke.
For solo AI-generated stylised podcast-host aesthetic portraits where the directory-art deployment is supplemental, MyPhotoAI generates single-person output in casual-to-business registers from 5 to 15 selfies. Useful as a placeholder for a podcast just launching, where the actual session with audio-equipment props and directory-spec resolution remains the working choice for an audience-building show. Starter plan is $15.


