Dr. John H. Johnson is a professional economist, author, and speaker. He’s known internationally for his ability to explain highly sophisticated concepts in a simple, straightforward manner. He’s also President and CEO of Edgeworth Economics.
We start off debunking a couple of humorous cases of manipulated data (think: pregnant men in the UK and grilled cheese leading to a better love life) and John explains why we’re so susceptible to questionable data. John also shares his thoughts on the relationship between content marketing and data, and some statistics about the credibility boost enjoyed by articles with seemingly scientific graphics.
John’s work on data is amusing, insightful, and useful for anybody that tries to persuade others with data. His thoughts on presenting data in truthful yet engaging ways will surely leave you questioning the data you come across in your day-to-day existence.
Welcome to The Brainfluence podcast with Roger Dooley, author, speaker, and educator on neuromarketing and the psychology of persuasion. Every week, we talk with thought leaders that will help you improve your influence with factual evidence and concrete research. Introducing your host, Roger Dooley.
Roger: Welcome to the Brainfluence Podcast. I’m Roger Dooley. Our guest this week is a numbers guy. He’s got a PhD in Econometrics from MIT, but don’t be put off. His specialty is explaining to people like you and me what numbers really mean, how to use them and how to make numbers work for you. He’s the co-author of the new book Every Data: The Misinformation Hidden in the Little Data You Consume Every Day. We’re going to learn not just how we can all become smarter about interpreting the data that’s constantly thrown at us, but also how to use data to be more persuasive in our marketing and in communicating with others. Welcome to the show, John.
John: Oh, thank you. Great to be here, Roger.
Roger: This is John Johnson, I don’t think I said that in my intro. John, one of the effects in your book that really stuck out at me was that 17,000 British men were reported in a medical journal to have been treated for pregnancy. So is there something in the water over there or should we be looking for new types of male birth control? What’s going on?
John: That’s one of the interesting stories from the book. It’s not surprising, that’s actually a fairly significant coding error on an insurance form, but the point of the story in addition to being a little bit funny of course, is that we talk about data all the time. We talk about data in so many different settings in our life, but sometimes people step back and forget a really basic axiom which is if the data you’re relying upon is wrong, then you can do the most sophisticated analysis in the world and it really doesn’t matter. It’s not going to tell you anything meaningful.
You really need to think hard about what is the data I have and does it just make sense. Does it pass what I’d call the sniff test? That’s kind of the first. That’s always what I talk about when I talk about that kind of information being incorrect and thinking of the pregnant men in Great Britain.
Roger: Yeah. At first I thought it might be related to the grilled cheese lovers and sex phenomenon. Maybe that had something to do with these really fertile English guys. Why don’t you explain about the amazing relationship between grilled cheese lovers and sex.
John: Well, we’re getting off right to the highlights of the book.
Roger: I know. Well, I think that it’s important for everybody to know that this is not a dry book of sort of psychology and statistics and about numeracy or something, but rather it really shows how often we are if not misled, at least sort of sucked in by stories that don’t have that much basis in fact.
John: Well, I think that’s a good way to put it. Look, numbers are everywhere, but the things that get our attention, the click bait, the commercials, the advertising, the claims, these days the political claims, are things that sort of have some numbers hanging around them, but are sort of twisted, contorted, slightly … You look at it from one side it’s not quite as objective as you might think.
The grilled cheese lovers having better sex lives and being better people overall was sort of the headline. An odd coincidence is the book had actually come out on National Grilled Cheese Day, but the publisher didn’t like the title Grilled Cheese Sex as much as I thought that would sell a lot more books.
Roger: Wow, that would have been perfect timing.
John: Exactly, but the point of the grilled cheese story is when we actually dug under the headlines a lot more, what it turned out is that that was based on respondents to an online survey on a dating website for National Grilled Cheese Day. I can’t imagine a worse source for finding out someone’s love life than going to see what they say on a dating website when they’re trying to find a mate. Also just sort of the timing and how do you measure love of grilled cheese?
Well, it was self-reported and what does that mean to us? Again, a cute little story, a cute little relationship, but I wouldn’t run out and start eating grilled cheese every day for the hopes that it would improve my love life.
Roger: Yeah, although it makes a great click bait title. When you see that in the margin, what, grilled cheese? Better sex? Got to read it?
John: Exactly right.
Roger: With today’s emphasis on content marketing you see so much of that. I think anybody who reads Every Data will be a lot smarter about how they consume and use data. Most of our listeners are marketers, probably most of them are digital marketers. Hard to be a marketer today without being at least in part a digital marketer. I think that there are a few areas in the book that really resonate. One that you touch on is academic data. Folks who read my stuff online, read my book, know that a lot of the data that I and other people use comes from academic studies. Then the other area is how we use data in our marketing to inform people, to persuade them and hopefully not deceive them.
Looking at the academic side first, so much of what we talk about has its origins in academic research, but you raise some interesting points that when you drill down into a lot of these social science experiments you find that the subjects are often undergrad students at an elite university. The sample size is 47 or something, which when you think about it it’s not really a very good representation of the average citizen or average consumer in the US, much less the world. What are some of the other problems that you’ve seen in academic data reports?
John: Well, I think the big issue, first of all academic disciplines vary widely in terms of how they think about data, in terms of what types of data they can use. You’ve got everything from lab experiments where people are generating data to social science situations where oftentimes you either have an experimental, a laboratory with undergraduate students to then just data collected in the real world. There’s just big breadth of data that is used.
I think the key things when I talk to people about academic studies is not so much that they may not be careful or thoughtful, that doesn’t mean every study is careful or thoughtful, but more interpreting it appropriately for the question that you’re trying to answer is really critical. You’re talking about the undergraduate students that are given beer money basically to participate in some kind of psychology study. Well, that sample may not be very representative of the group of people you’re going to care about with your particular content marketing.
We talk all the time about whether or not an academic study has the fancy term is external validity. You can determine something and look at a sample of people, but then the question is can that be more broadly applied? One of the big things I say when people, I’m a big fan of rigor. I’m a big fan of careful academic work, but you have to think about if you’re going to try to translate that into the real world does the certain restrictions, assumptions, methods that were used, are they really appropriate for drawing conclusions that are going to apply to the specific problem you have? I think that’s the way you can really start to use academic studies effectively.
Roger: Yeah, and we’ve kind of beaten the replication crisis in social science to death in some past episodes so we’re not going to dig into that, but what you see there is even beyond sort of external validity. Just trying to replicate an experiment that was conducted with say Duke undergrads with Berkeley graduate students, that doesn’t always work, much less going out into the real world.
The takeaway for me there is certainly a lot of this academic work can give us a direction to look for answers in and perhaps ideas to test in our marketing, but we shouldn’t necessarily assume that it’s going to work magically, that if the word free doubled the response rate with undergrads that that same thing is going to happen on your eCommerce website or whatever. The nice thing is at least in digital marketing it’s relatively straight forward to test things except in sort of low volume situations.
For any business that has reasonable traffic you don’t have to just rely on luck or somebody else’s data and hope for the best. You can actually test these in your own environment.
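To make the low-volume caveat concrete, here is a minimal sketch of one common way to check an A/B test, a pooled two-proportion z-test. It is not something described in the conversation, and every number in it is hypothetical: the same 10% versus 15% conversion rates are compared once with 200 visitors per variant and once with 20,000.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF written with math.erf, then doubled for two sides.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical low-traffic test: 20/200 vs 30/200 conversions.
z_small, p_small = two_proportion_z_test(20, 200, 30, 200)

# Hypothetical high-traffic test with the same rates: 2,000/20,000 vs 3,000/20,000.
z_large, p_large = two_proportion_z_test(2000, 20000, 3000, 20000)

print(f"low volume:  z={z_small:.2f}, p={p_small:.3f}")
print(f"high volume: z={z_large:.2f}, p={p_large:.3g}")
```

With these made-up counts the low-traffic result does not clear the conventional 0.05 threshold while the high-traffic one does, even though the observed rates are identical.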
John: Well exactly. I think also the notion, one of the things I try to talk about is not so much that people should be terrified of data, but they should feel empowered by it, right? In the world we live in today data is not going away. Data is going to be a fundamental part of decision making. People that can use it effectively to supplement their current thinking, to in some cases replace their current thinking. It really depends on the questions.
I try to be very pragmatic about this. I think that data is within the grasp of everyone and I don’t think you have to be a PhD in statistics to understand it. I do think you have to think hard and bring discipline to the kinds of questions you want to answer. You’re describing the digital marketing context. I think there’s a lot of potential for experimentation, changes, how do people respond, but you just have to think very carefully through on the upfront end what is the question you’re really trying to get to? How are you going to isolate the key concepts you care about and what data is most useful to collect to ultimately determine the answer you want to get at?
Roger: Mm-hmm (affirmative). Sticking with the marketing theme, certainly one of the biggest buzzwords in the last few years has been content marketing and everybody’s looking to create great attention-grabbing content that’s going to get shared and certainly clicked to begin with and then shared once people read it. There’s a big emphasis to, I’m using at least the veneer of science or data to add credibility, so you see articles that will grab some random statistics from the census bureau or someplace to prove that Colorado is the healthiest state or Austin is the best city to live in.
Actually the latter one isn’t from some cheesy content farm. That’s from US News. A survey they just published that Austin is the best city to live in in the US. Since I moved to Austin confirmation bias really says to me that it’s totally plausible and entirely reasonable. This isn’t even a digital era thing. US News and others have been doing that since the print days. Do you think the problem is worse though today?
John: I do and I think it’s partly a function of just the pure volume of information that is sort of bombarding us every day. I have some quotes in the book from IBM about the fact that 90% of the world’s data has been created in the last two years. That’s a pretty phenomenal rate of creation. Now I think you have to think about data broadly. It’s all things. It’s news, it’s Twitter, it’s hard data, structured data, documents, paper, pictures, photos.
There’s a lot of different components to it, but in this world with so much information and this constant need to sort of get attention or how do you get yourself to stand out, I think there’s a lot of room that’s ripe for abuse, but also then I think a second piece to it is there’s a lot, you mentioned confirmation bias in kind of an interesting funny way about Austin, but there’s an awful lot of confirmation bias in terms of what people even decide to read.
They look in today’s world you can look for the news source that actually supports your opinion. That might have very little to do with objectivity. It’s just people tend to gravitate towards that that will confirm their preconceived notions. It is challenging. Then I think I’ll overlay on that something that I’ve spent a lot of time on with the book and actually gave a TEDx Talk on and that’s how headlines can translate the data and statistics in a very misleading way, often not intentionally, but it really is exactly where this kind of goes head to head, right?
People not particularly knowing exactly how to cite statistics, people looking for the sexiest most punchy headlines, trying to compete in a crowded space and then misrepresenting what the numbers actually mean, sometimes intentionally, sometimes not. How do you make sense of all that?
Roger: Right. I think on that topic part of it is certainly just a lack of numeracy or skills in that area, where you’ve got say a risk of cancer that is .01% under one condition and .02% under another condition. It’s basically highly improbable no matter what. But the headline will be something doubles your risk of cancer. It’s true, but it unnecessarily scares people.
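The gap between "doubles your risk" and "highly improbable no matter what" is the gap between relative and absolute risk. The sketch below works through Roger's figures, assuming both are percentages (0.01% and 0.02%); the function name and the per-100,000 framing are illustrative choices, not something from the book.

```python
def risk_change(baseline_pct, exposed_pct):
    """Compare two risks given as percentages.

    Returns (relative_risk, absolute_change_in_points, extra_cases_per_100k).
    """
    relative_risk = exposed_pct / baseline_pct
    absolute_change = exposed_pct - baseline_pct  # in percentage points
    # Convert percentage points into additional cases per 100,000 people.
    extra_cases_per_100k = absolute_change / 100 * 100_000
    return relative_risk, absolute_change, extra_cases_per_100k


# Assumed reading of the example: 0.01% risk under one condition, 0.02% under the other.
relative, points, extra = risk_change(0.01, 0.02)

print(f"Relative risk: {relative:.1f}x")
print(f"Absolute change: {points:.2f} percentage points")
print(f"Extra cases per 100,000 people: {extra:.0f}")
```

The headline reports the first number (the risk doubles); the second and third show that the change amounts to roughly ten additional cases per 100,000 people.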
John: Right. I read one that was sort of how do you want to get, Drink A Beer, It’s Good for Your Brain was the headline.
Roger: Confirmation bias again. It sounds good to me.
John: Right, exactly. When I looked at the story that was about that it was based on a study of lab rats. Yes, it was true that the lab rats, when they gave them beer it made them happier, but if you sort of scaled it up I would have to drink the equivalent of 28 kegs of beer to be as happy as these rats.
Roger: Wow. That’s a lot of beer. Yeah.
John: That doesn’t make a lot of sense. I don’t think I’d be happy. I think I’d be dead. The point is there are these things where things get translated and it was a cute story. I’m like great, beer is good for my brain. I like that. Then you sort of look at what the numbers were and it really was a stretch from the reality of what the story actually was about.
Roger: Yeah. The biggest problem I see with some of this number manipulation is that people make important life decisions based on it. I spent years looking at the higher-ed space with College Confidential, a firm I co-founded, and so many students and families built their college lists which is really sort of a life altering decision. You only get to make that undergrad school decision once. They built that list from US News rankings.
It’s an easy trap to fall into because there’s thousands of colleges and universities in the United States and you can’t possibly investigate them all or know about them all so you start looking for ways to filter that. Some folks may use logical filters like I want someplace that’s no more than a two hour drive away for whatever reason. Or I want a school on an ocean that has a good marine biology department. Those are sort of logical filters.
To winnow this list of thousands down to five or 10 or a dozen, US News provides this sort of really easy tool. Well, these are the best and they really, people often believe that these reflect true quality differences because they don’t really dig into the data and what’s underlying them and what the assumptions are. Not that those metrics that they use aren’t necessarily valid for some purposes, but to combine these in a specific ratio and say okay, well this year Harvard was better than Columbia, but Columbia is better than Princeton and so on really is kind of a ridiculous thing. Unfortunately that’s the way people think.
John: People love lists, right? They love to see rankings, lists of the best hospitals, lists of the best doctors, lists of the best schools. The problem is there are some, yes, US News has their criteria and they’re going to base it on some things and there’s a lot of detail in the US News Magazine if you bother to look and see how they’re coming up with their rankings, but that’s not what most people do. They don’t dig into that. They just see the list and number one to 10. Okay, I should apply to Princeton because that’s number two. Or I should apply to MIT because that’s number one without really getting into the guts of how are those lists generated, what are the criteria that sort of put someone on the list?
Some schools don’t participate in US News and World Report anymore. Does that mean they’re not good schools? No. We love to think about rankings, but again it goes back to there are actually numbers under lots of these things. It could be that US News has got a series of factors they weigh that is appropriate for your particular high school senior, but for a lot of us there’s going to be some unique set of things that are going to weigh very heavily. Again, there is no substitute for actually digging in harder to sort of see what the numbers mean and how they apply in your personal setting.
Roger: One thing I’ll give US News credit for is that they do provide pretty good transparency on their ranking factors and how they combine them and so on. I don’t know if they still do that, but you used to be able to apply your own ranking factors or modify the numbers in the weighting to create your own list. That’s good and I think that if you’re going to do that kind of a rating scheme, whether it’s the most livable city or whatever, transparency is a good idea and preferably even let folks modify it if they can. Although obviously you don’t always have the ability to create that kind of easy tool on a one shot article.
John: Yeah. You need to just know what the criteria are. One of the things that I talked about in the book a little bit was buffalo chicken wings helped rank Buffalo as the third best food city in the world. That seemed kind of …
Roger: Wow. I grew up in Buffalo and that’s a stretch.
John: Okay. Well alright, and so I thought that’s kind of odd. I actually also grew up in Buffalo so when I looked at it, it actually turned out it was National Geographic had put together a list of cities that were known for a single iconic food. The buffalo chicken wing, or if you live in Buffalo it’s just wings, but for everyone else it’s buffalo chicken wings, right? Pasta Bolognese for Bologna, Italy.
What it really was was not a ranking of best food cities, but cities that were associated with a specific food. The headline was picked up as Buffalo is the Third Best Food City. That’s not quite what it meant.
Roger: Right. Apparently they missed beef on weck there too, which is the something else that’s really important in Buffalo.
John: That’s right. That’s a big part of Buffalo food.
Roger: While we’re on the topic of rankings, in the book you touch on Google and SEO, search engine optimization. Since the earliest days search marketers have been trying to crack the code, figure out how to rank on page one preferably at the top of the list. Initially you could actually do that with just a little bit of analysis. You could figure out the exact formula, but of course that hasn’t worked for years and years and now Google claims to have 200 ranking factors. That might be true.
It might be a smokescreen at least in part, but people like Moz publish data on what top ranking pages look like. They note that they’re talking about correlation, not necessarily causation, but I think a lot of people again misuse that data. They say okay, great. The best formula for if you want to rank number one is if you have a 1,500 word page length. I just made that up. I have no idea if that ranks well or not. I think SEO experts actually have a clue.
Just a couple weeks ago we had Stephan Spencer on who’s the co-author of The Art of SEO. Probably one of the most common beliefs, I won’t necessarily call it a myth because I don’t know that it’s a myth for sure, is that Google uses social media shares in its algorithm. It’s a very logical assumption. You would think that gee, if I were Google and I see this page had been shared by thousands of people on Facebook, that would be an indicator that okay, there must be some quality here. What Stephan would say is that actually they don’t do that for a variety of technical reasons and perhaps even legal reasons, but there’s a correlation between shares and the quality of content.
When you have a lot of shares you get more links, and links are a known ranking factor. Everybody believes that to be true in other things too. Like the time on site. If an article is really good, you’re going to spend more time reading it. That’s something else that Google supposedly likes. Trying to disentangle correlation and causation is always challenging.
John: Well, and that’s one of the really tricky problems I think because there really it is truly the case that how are you getting at what are the real causal factors? Yes, there is some formula or set of formulas, I don’t know any more than you do about what the SEO actual formulas are, but however their process is you’re going to think if you step back objectively we’re trying to get at measures of quality and relevance and the types of things you expect when you put some search term in.
Given that, so many of the metrics you might rely upon to think about that, whether it’s social media likes, whether it’s forwarding, whether it’s time on the site, they’re going to be highly correlated with quality and relevance. Disentangling here’s the way I get myself up on the list because I do these magic things, that’s a pretty hard problem. The other thing is that algorithm or series of algorithms is likely changing over time, being updated constantly.
Even if you think you know today how to do it, there’s no guarantee tomorrow. I hate to be too pragmatic, but I’d be like produce the best content you can.
Roger: Right, right. Yeah, that’s the general advice of experienced SEOs too. Like don’t worry that much about the details beyond some sort of obvious on page stuff.
John: When you write a book, this is my first book and when you write a book you have an entire series. I’ve gone through all the effort to, I had a blog for a full year before I wrote the book. Social media, I have a fairly active Twitter account and things like this. I have a blog on the Huffington Post now where during the political campaign I particularly talked a lot about numbers in polling.
The point that’s relevant to what you’re saying is I’m always amazed when we sort of see which of my blog posts end up high on Google or not when you do searches on polling or polling expert. Sometimes I write things that I think are great content and they don’t get as much traction. Then other times I’m like, “Wow, I’m at the top of the list on Google.” It’s an interesting reality as a content provider. You don’t always know what your market is going to respond to, but you also don’t know if it’s cumulative or otherwise.
That’s an interesting data problem as I said applying to something completely different that I’m somewhat new to. I’m an economist. I wasn’t used to this whole world of digital content that I had to create being an author.
Roger: Right. Similar to Google rankings is viral activity. Often as a writer you produce a piece of content that you think is absolutely great and you know a million people are going to share it and it sort of lands with a thud. A few people say, “Hey, that’s pretty good,” but other than that it doesn’t get traction where something that you might consider to be an inferior piece for whatever reason takes off and gets shares. Sometimes there’s a luck factor too.
Another technique for presenting data that you talk about which can be let’s say interesting is changing the vertical axis of a graph. For instance if somebody runs a test and they find that changing from a green buy button to a red buy button increases the conversion rate from 10% to 10.5%. If you just present that on a graph it’s not going to look like much of a difference, but if you change the vertical axis so that now it starts at nine, suddenly that half point difference is going to look really huge. It’s kind of an obvious thing.
It seems like everybody does it. I’ve been surprised though when paying attention to academic papers I’ve seen serious academics doing the same thing. I’m sure if you ask them they would say, “Well, it’s just so that we can better illustrate the effect of our intervention.” It still really creates the appearance of a dramatic effect when the effect really wasn’t that dramatic.
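The visual effect of moving the axis can be put into numbers. The sketch below uses the conversion rates from Roger's example (10% and 10.5%) and compares how tall the two bars would be drawn when the vertical axis starts at zero versus at nine; the function name is illustrative, and the calculation assumes a plain bar chart where bar height is the value minus the axis start.

```python
def drawn_height_ratio(control, variant, axis_start=0.0):
    """Ratio of the variant bar's drawn height to the control bar's drawn height."""
    if axis_start >= min(control, variant):
        raise ValueError("axis_start must be below both values")
    # A bar is drawn from the axis start up to its value.
    return (variant - axis_start) / (control - axis_start)


control_rate = 10.0   # green button conversion rate, in percent
variant_rate = 10.5   # red button conversion rate, in percent

honest = drawn_height_ratio(control_rate, variant_rate, axis_start=0.0)
truncated = drawn_height_ratio(control_rate, variant_rate, axis_start=9.0)

print(f"Axis from 0: variant bar is {honest:.2f}x the control bar")
print(f"Axis from 9: variant bar is {truncated:.2f}x the control bar")
```

With the axis at zero the red-button bar is drawn about 5% taller than the green one; with the axis starting at nine it is drawn 50% taller, although the underlying data is unchanged.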
John: Well, and this is sort of really one of the interesting things that I think marketers and anyone who relies on data has to deal with. There’s an inherent tension. Obviously if you’re doing sound science, if you’re trying to be true to the numbers, you want to report your results in the most honest straight forward way possible. I think that requires a degree of transparency. That requires you to be honest about what the strengths and weaknesses of your results are, those kind of things.
At the same time you’re also trying to make an argument often with data where you’re trying to present it in the most compelling way to emphasize the points that you think the data supports. I think things don’t have to be competing interests, but they can be, right? I don’t have a good feel for whether that half a percent that you described change is big or small relative to what other types of things I could do to my website, other types of buttons or things like that.
You’d want to put that in some type of perspective. We take you through a series of examples in the book where we take the exact same study on exercise and mortality and present seven graphs in all different ways and show how I can tell you seven different stories just based on how I change the axis or how I group the data and things like that. That’s one of those things though where I think a degree of awareness and thought is really important with this, just being thoughtful about how you organize the data and being true to the data at the end of the day is a pretty important concept to have credibility.
Roger: Mm-hmm (affirmative). Speaking of graphs, you mention a study that I’ve written about myself that when you add a graph or chart to content, it becomes so much more persuasive. In the study you cite two-thirds of the people who read an article just in text form agreed with its conclusions, but when they added a graph to it that said exactly the same things that the text had said, suddenly 97% of the people agreed with the conclusions of the study.
Really a dramatic impact on the credibility of the article just from that graph. I think you missed an opportunity there, John. You could have included a graph showing that difference and maybe even started the axis at about 50% to emphasize it.
John: Yeah. No, I think that’s right. You can imagine, we had a number of different … It was pretty funny, we had a number of different cuts of those graphs as we tried to figure out what we could put in the book or not in terms of the right visuals to draw the biggest contrast. It was an interesting exercise too because that’s not what I’m used to doing. I’m just trying to display data in the most honest way possible, not think about it the other way, what would be the most dishonest way possible. That was sort of an interesting exercise.
Roger: Right. I think that’s really important though for our listeners that just the presence of a chart can be really very important if you’re trying to communicate. That would be true if you’re writing content and sort of just choosing a stock photo that vaguely is related to whatever it is you’re writing about. That chart could make it more effective in persuading people and more sharable too.
John: What I don’t know from the study that you’re talking about that I cite in the book is whether what we’re seeing when you see a result like that is a function of just people paying more attention when there’s a picture, or is it a case that it sort of appeals to a broader set of learning styles? Part of interpreting data is what is it that registers with you as a person? There are auditory learners, there are people who sort of visually can learn. There are people that are really sort of pure mathematics type that it’s just give me the numbers.
I think that from my background when I’ve taught these types of things that the key to also using data effectively is making sure you hit on enough of the different learning styles of your potential audience that you can have the broadest possible impact. That’s a place where pictorial versus a number versus a text can really make a big difference.
Roger: Yeah. I’m sure some people just see this text and because of the way they consume information they sort of glaze over and just don’t find it as persuasive. It sort of reminds me of another study that had neuroscientists evaluate an academic article. Interestingly enough the article was believed to be more credible when images of brain scans that didn’t actually illustrate the point of the article, these were semi-random fMRI brain scans, were included.
Somehow, they apparently added a science-y veneer to the conclusions and made them more credible. This was not a lay audience where you figure okay well, a lay audience might be influenced by extraneous brain scans, but this was actually an audience of neuroscientists. Quite, quite interesting.
John: Yeah, that is actually fascinating.
Roger: John, explain about cherry picking and how Gerber used or uses that technique to sell baby food.
John: Well, so the cherry picking example is actually from a case that was actually litigated in the late ’90s where the Federal Trade Commission actually engaged with Gerber over claims that four out of five pediatricians prefer Gerber baby food. If you actually looked at the underlying data behind that claim you would think if four out of five, that’s 80% of the pediatricians. It actually turns out it was only 12%.
How do you get to 12% and still say four out of five? Well, they started with a sample of pediatricians and the first thing they asked them is, “Do you recommend baby food to your patients at least once a week?” A fairly large percentage did not. They dropped them out of the survey. Then they say, “All right, well do you recommend a specific brand of baby food to your patients in a given week?” Another larger percentage didn’t have any brand in mind so they dropped them out.
Then of the subset of pediatricians who both recommended baby food once a week and recommended a brand of baby food once a week, it was true that four out of five of those pediatricians recommended Gerber baby food. Gerber had high brand recognition amongst those that actually recommended a brand, but the majority didn’t recommend baby food or of the ones that did also didn’t recommend a specific brand. That’s a pretty big difference in terms of how you might think about the numbers.
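The arithmetic behind the two figures can be checked directly. If four out of five (80%) of the final subset recommended Gerber and that group is 12% of everyone surveyed, then the subset itself must have been about 15% of the original sample. The sketch below works that out; the sample of 1,000 pediatricians is a hypothetical round number chosen only to make the counts easy to read, not a figure from the case.

```python
def implied_subset_share(overall_share, share_within_subset):
    """Share of the full sample that must have survived the screening questions."""
    return overall_share / share_within_subset


overall_gerber_share = 0.12   # share of all surveyed pediatricians, per the transcript
within_subset_share = 0.80    # "four out of five" among those who named a brand

subset_share = implied_subset_share(overall_gerber_share, within_subset_share)

# Hypothetical sample size, for illustration only.
surveyed = 1000
in_subset = round(surveyed * subset_share)
recommend_gerber = round(in_subset * within_subset_share)

print(f"Share of sample left after screening: {subset_share:.0%}")
print(f"Of {surveyed} surveyed: {in_subset} named a brand, {recommend_gerber} named Gerber")
print(f"Gerber share of subset: {recommend_gerber / in_subset:.0%}")
print(f"Gerber share of everyone surveyed: {recommend_gerber / surveyed:.0%}")
```

Both claims come from the same counts; the only thing that changes is which denominator is used.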
Roger: Right. Probably most folks have difficulty coming up with another brand of baby food than Gerber. There’s some availability effect going on there too I guess. John, I think that Every Data can really be useful for marketers who want to be persuasive, but don’t want to cross ethical lines. Do you have any other advice for them?
John: Well, I guess what I always sort of tell people is that again, I really think that data is a part of a narrative in a story. We can learn a lot from data, we can put things in context. There’s a lot of different functions for data though. Data at times is helping us to just know, I call it like a photograph. It’s descriptive. What’s going on?
Then there’s times when we’re trying to draw really careful statistical relationships. We think we want to consider the two or three different options. How can we use the data to sift through these things? I think the biggest thing is that data in and of itself is not very useful without a fair amount of thought and I always say the upfront thought is really what are the questions you want to answer and is the data going to give you the right answer or not?
It doesn’t have, it’s not a magic bullet. It can’t answer every question, but I think it can be a very, very important complement and at times it can be definitive, but I just think being very realistic and very thoughtful about your use of data is the best way to make sure you’re not misled by it and not misleading others with the data you have.
Roger: Makes sense. John, where can people find you and your content online?
John: You can find me on my website at www.JohnHJohnsonPhD.com. There’s links there to my Twitter feed @EveryData and my blog on the Huffington Post and all sorts of other things like that.
Roger: Okay. We have been speaking with John Johnson, the co-author of the new book Every Data: The Misinformation Hidden in the Little Data You Consume Every Day. We will link to John’s website, to Every Data and to any other resources we mentioned on the show notes page at RogerDooley.com/podcast. We’ll have a text version of our conversation there too. John, thanks for being on the show.
John: Thank you so much, Roger. It was my pleasure.
Thank you for joining me for this episode of The Brainfluence podcast. To continue the discussion and to find your own path to brainy success, please visit us at RogerDooley.com.