The Sound of Money — 3-part series
Part 1: Does Jim Cramer's Voice Move Markets?
Part 2: What Your Lender Is Really Hearing
Part 3: Can AI Hear What We Can't?

Find two clips on YouTube. The first: Jim Cramer on Mad Money, calling a stock. The second: Andrew Ross Sorkin on Squawk Box, covering the same company on the same day. Watch them back to back. The information is nearly identical — same stock, same earnings numbers, same analyst consensus. And yet notice what happens in your body. Cramer's clip makes you lean forward. Your pulse ticks up. You feel urgency. Sorkin's clip feels like sitting across from a thoughtful colleague. Same information. Completely different experience.

Here's the question almost nobody in finance is asking: does that difference show up in how people trade?

We've been reading the news wrong

Over the past two decades, a whole field has emerged around measuring media sentiment and its effect on markets. Researchers scrape headlines and articles, score them with sentiment dictionaries or NLP models, and test whether positive or negative language predicts trading behaviour. Often, it does. Paul Tetlock's foundational 2007 study showed that unusually pessimistic language in the Wall Street Journal's "Abreast of the Market" column predicted downward pressure on market prices, followed by a reversion, and that unusually high or low pessimism predicted higher trading volume.
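To make the text-only approach concrete, here is a minimal sketch of dictionary-based tone scoring, the simplest version of what this literature does: count words from curated positive and negative lists and express them as shares of the text. The two word lists below are tiny hypothetical stand-ins; published work relies on large curated dictionaries such as the Harvard IV-4 categories or the Loughran-McDonald finance word lists, and increasingly on trained models.

```python
import re

# Hypothetical toy word lists for illustration only; real studies use large
# curated dictionaries (e.g. Harvard IV-4 or Loughran-McDonald).
NEGATIVE = {"loss", "decline", "miss", "weak", "disappointing", "fear", "risk"}
POSITIVE = {"gain", "beat", "strong", "growth", "record", "upgrade"}


def tone_scores(text):
    """Share of negative and positive words in `text`, plus a net tone score."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"negative": 0.0, "positive": 0.0, "net": 0.0}
    neg = sum(w in NEGATIVE for w in words)
    pos = sum(w in POSITIVE for w in words)
    n = len(words)
    # Net tone runs from -1 (all negative words) to +1 (all positive words).
    return {"negative": neg / n, "positive": pos / n, "net": (pos - neg) / n}


headline = "Chipmaker shares decline after weak guidance and a revenue miss"
print(tone_scores(headline))  # {'negative': 0.3, 'positive': 0.0, 'net': -0.3}
```

Note what the function never sees: who said the headline, how fast, or how loudly. Everything it knows is in the string.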

But this research shares a critical assumption: that the words are what matter. When the source is spoken, as with TV segments or earnings calls, studies in this tradition typically convert the audio to text before analysis. The voice is stripped away. The pacing is discarded. What remains is a transcript, and the transcript is what gets analysed. This is a bit like studying a performance by reading the sheet music.

What the voice carries

There's a term in linguistics for the information carried by how something is said, as opposed to what is said: paralinguistic cues. These include pitch, pitch variability, speech rate, pause frequency, and loudness dynamics. Research in psychology has found that these cues carry substantial emotional information, sometimes more than the words themselves. When someone speaks quickly, with high pitch variability and few pauses, listeners tend to register urgency, often without being aware that they have.
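As an illustration of how these cues become numbers, the sketch below summarises frame-level pitch and loudness tracks into clip-level features. It assumes those tracks have already been estimated (one value per 10 ms frame, with pitch set to 0 for unvoiced frames) and that a transcript word count is available for speech rate. The function name, the -40 dB silence threshold and the 200 ms minimum pause length are hypothetical, illustrative choices, not standard values.

```python
from statistics import mean, pstdev


def paralinguistic_features(pitch_hz, energy_db, n_words, hop_s=0.01,
                            silence_db=-40.0, min_pause_s=0.2):
    """Summarise aligned frame-level tracks into clip-level paralinguistic cues.

    pitch_hz:  per-frame pitch estimate in Hz, 0.0 for unvoiced frames (assumed).
    energy_db: per-frame loudness in dB, same length as pitch_hz (assumed).
    n_words:   word count from the transcript, used for speech rate.
    """
    if not pitch_hz or len(pitch_hz) != len(energy_db):
        raise ValueError("pitch and energy tracks must be non-empty and aligned")
    duration_min = len(energy_db) * hop_s / 60.0
    voiced = [p for p in pitch_hz if p > 0]
    loud = [e for e in energy_db if e >= silence_db]

    # A pause is a run of consecutive quiet frames lasting at least min_pause_s.
    min_frames = round(min_pause_s / hop_s)
    pauses, run = 0, 0
    for e in list(energy_db) + [float("inf")]:  # sentinel closes a trailing run
        if e < silence_db:
            run += 1
        else:
            if run >= min_frames:
                pauses += 1
            run = 0

    return {
        "pitch_mean_hz": mean(voiced) if voiced else 0.0,
        "pitch_sd_hz": pstdev(voiced) if len(voiced) > 1 else 0.0,  # pitch variability
        "speech_rate_wpm": n_words / duration_min,
        "pauses_per_min": pauses / duration_min,
        "loudness_range_db": (max(loud) - min(loud)) if loud else 0.0,
    }


# One second of hypothetical frames: a 250 ms gap counts as a pause,
# the final 150 ms gap is too short to count.
pitch = [200.0] * 30 + [0.0] * 25 + [220.0] * 30 + [0.0] * 15
energy = [-20.0] * 30 + [-60.0] * 25 + [-10.0] * 30 + [-60.0] * 15
print(paralinguistic_features(pitch, energy, n_words=3))
```

Pitch variability is reported here as a standard deviation in Hz; measuring it in semitones instead would make speakers with different baseline pitch easier to compare.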

The proposed mechanism has a name: emotional contagion. Developed by psychologist Elaine Hatfield and her colleagues, the theory holds that people automatically mimic and synchronise with the emotional expressions of those around them, including vocal tone, and come to share the emotion as a result. We don't just hear emotion in a voice. We catch it.

Why this should matter to finance

If emotional contagion operates through broadcast media, then the delivery style of financial broadcasters is not neutral packaging. It is an active channel through which sentiment is transmitted to audiences. A retail investor watching Cramer deliver a segment on a tech stock that missed earnings receives the same facts as someone watching Sorkin — but Cramer's delivery transmits urgency. The viewer feels that something needs to be done now.

The Elaboration Likelihood Model, developed by Petty and Cacioppo, suggests why. When people are less motivated or less able to process information carefully, they take the "peripheral route", relying on surface cues such as the speaker's confidence or emotional intensity. Many retail investors, much of the time, are plausibly on the peripheral route. They are not building discounted cash flow models while watching Mad Money. The voice is the message.

A new metric: the Acoustic-Textual Divergence Score

If we can separately measure the sentiment of what was said (textual sentiment, using standard NLP) and how it was said (acoustic sentiment, using audio feature extraction), we can calculate the gap between the two. I call this the Acoustic-Textual Divergence Score (ATDS).
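One way to compute that gap, sketched under explicit assumptions: each segment already has an acoustic "alarm" score and a textual negativity score, both oriented so that higher means more alarming. Because the two come from different models on different scales, each is standardised (z-scored) across the sample of segments before subtracting, so ATDS is the acoustic z-score minus the textual z-score, and a positive value means the voice sounds more alarmed than the words. The z-score construction and the function names are my illustrative choices, not a settled definition.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise scores to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # no variation: nothing to standardise
    return [(v - mu) / sd for v in values]


def atds(acoustic_alarm, text_negativity):
    """Hypothetical Acoustic-Textual Divergence Score, one value per segment.

    Both inputs are per-segment scores oriented so that higher = more alarming.
    Positive output: the voice sounds more alarmed than the words warrant.
    Negative output: the voice sounds calmer than the words.
    """
    if len(acoustic_alarm) != len(text_negativity):
        raise ValueError("need one acoustic and one textual score per segment")
    za, zt = zscores(acoustic_alarm), zscores(text_negativity)
    return [a - t for a, t in zip(za, zt)]


# Three hypothetical segments: the first pairs an alarmed voice with mild words.
scores = atds([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
print([round(s, 2) for s in scores])  # [2.45, -1.22, -1.22]
```

Because both channels are standardised within the sample, ATDS is relative: a segment is divergent compared with the other segments analysed, so the choice of sample (one show, one season, several broadcasters) shapes what counts as high.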

A high positive ATDS means the broadcaster sounds more alarmed than the words warrant: the text says "modest disappointment" but the voice communicates "disaster." A strongly negative ATDS means the broadcaster sounds calmer than the content justifies. Both are interesting, but I'd expect the positive case to be the one most likely to trigger overreaction. If high-ATDS segments predict abnormal trading volume or short-term price overreaction, we'd have evidence that the paralinguistic channel is doing real work in financial markets.
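The empirical test could start as simply as this sketch: measure abnormal trading volume for the stock discussed in each segment, then compare high-ATDS segments with the rest. Abnormal volume here is log volume on the event day minus the mean log volume over a trailing window, a common simple benchmark; the 20-day window, the choice of event day and the ATDS threshold are hypothetical parameters. A real study would also need controls, for instance for the earnings news itself and for market-wide volume, plus a proper significance test, typically in a regression rather than a raw difference in means.

```python
import math
from statistics import mean


def abnormal_log_volume(volumes, event_day, window=20):
    """Log volume on event_day minus mean log volume over the prior `window` days.

    volumes:   daily share volumes for one stock, oldest first (all > 0).
    event_day: index of the trading day being tested (e.g. the day after a
               segment airs; an assumption, not a fixed rule).
    """
    if event_day < window:
        raise ValueError("not enough history before the event day")
    baseline = mean(math.log(v) for v in volumes[event_day - window:event_day])
    return math.log(volumes[event_day]) - baseline


def compare_groups(events, threshold):
    """Mean abnormal log volume for high-ATDS segments versus all others.

    events:    (atds_score, abnormal_log_volume) pairs, one per broadcast segment.
    threshold: hypothetical cut-off for a "high" ATDS segment.
    """
    high = [av for score, av in events if score >= threshold]
    rest = [av for score, av in events if score < threshold]
    if not high or not rest:
        raise ValueError("both groups need at least one segment")
    return mean(high), mean(rest), mean(high) - mean(rest)


# Hypothetical example: volume doubles on the tested day.
volumes = [1_000_000.0] * 20 + [2_000_000.0]
print(round(abnormal_log_volume(volumes, event_day=20), 3))  # 0.693 (log 2)
```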

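The tools to measure this exist. Pitch, speech rate, and loudness can be extracted from audio using open-source libraries, and textual sentiment analysis is mature, with finance-specific dictionaries and models readily available. What's missing is a systematic combination of the two applied to financial broadcasting. Vocal cues have been studied in company earnings calls, but rarely in the shows retail investors actually watch.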
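To show that the acoustic side needs nothing exotic, here is a minimal pure-Python sketch of two such measurements for a single short frame of audio: loudness as RMS level in decibels, and pitch via autocorrelation (the lag at which the signal best matches a shifted copy of itself). It is deliberately simple and would struggle with real broadcast audio; open-source tools such as librosa (`librosa.pyin`, `librosa.feature.rms`) or Praat, accessible from Python through the parselmouth package, offer more robust estimators. The 75 to 400 Hz search range and the 0.3 voicing cut-off are illustrative assumptions.

```python
import math


def rms_db(frame):
    """Loudness of one frame as RMS level in dB, relative to full-scale amplitude 1.0."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0, voicing=0.3):
    """Rough pitch estimate in Hz for one frame; returns 0.0 if it looks unvoiced.

    Searches the lags corresponding to fmin..fmax and picks the one where the
    mean-removed frame correlates most strongly with a shifted copy of itself.
    """
    n = len(frame)
    centre = sum(frame) / n
    x = [s - centre for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0:
        return 0.0  # silent frame
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalised autocorrelation at this lag: higher means a closer match.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing:
        return 0.0  # weak periodicity: treat as unvoiced
    return sample_rate / best_lag


sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(320)]  # 40 ms of a 200 Hz tone
print(autocorr_pitch(tone, sr))  # 200.0
print(round(rms_db(tone), 2))    # -3.01
```

Running these functions over successive overlapping frames, for example 40 ms windows every 10 ms, yields the frame-level pitch and loudness tracks from which clip-level cues such as pitch variability and pause frequency are computed.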

Why this matters beyond Wall Street

If broadcaster delivery style can move retail equity markets, the mechanism may not stop there. Anywhere people make financial decisions while consuming media, similar dynamics could apply. Lenders watching a BBC investigation into greenwashing don't just process the facts; they absorb the tone. Valuers reading a Financial Times column on commercial real estate risk are influenced not just by the argument but by the urgency of the prose. This is particularly relevant in property and sustainability, where media narratives about climate risk and regulatory tightening are becoming more frequent, and more dramatic.

In Part 2, I explore how these dynamics apply specifically to property markets — where lender confidence, valuation assumptions, and ESG narratives intersect in ways that most people in real estate aren't thinking about yet.