Wednesday, June 27, 2007

Totally New Concept: Statistical Sampling

From QandO: Liberal Media Group argues liberal media isn’t really liberal

Media Matters is pushing back against the MSNBC story about the donation patterns of journalists by arguing that MSNBC only cited a small sample of all journalists.

  • MEDIA MATTERS: Kurtz claimed "a lot of journalists" are giving to Dems — but number giving at all is tiny percentage of whole
  • MEDIA MATTERS: Kurtz again cited report on journalists’ donations without noting that only tiny fraction gave at all

Yes, if only there was some method of making inferences about a population based on data from a smaller sample.


It really makes me wonder if someone at Media Matters ran this "argument" by a colleague and they responded "Yeah! That's gold!"

At this point you need to stop trying. You're embarrassing the species.

9 comments:

Anonymous said...

What's an embarrassment to the species is that neither of you seem to know what statistical sampling is.

QandO writes: "...if only there was some method of making inferences about a population based on data from a smaller sample."

Well, gee, yes there is. But just because somebody takes a sample of a larger population doesn't make it statistically valid and inferences based upon the sample accurate.

MSNBC didn't take a sample in a way that would let you make inferences about the larger population as a whole. This is not to say whether the media is liberal or not, just that this report is not evidence of it. All it does is suggest, with certainty, I'll gladly add, that of the journalists who donated to campaigns (a tiny fraction of all journalists, which is Media Matters' point), the great majority gave to Democratic candidates. But it proves nothing about the political leanings of journalists as a whole, which some media outlets have tried to suggest it does; that is, that most journalists are liberal.

That sample cannot be considered statistically representative of the larger population of journalists (there was no effort to draw it in such a way). Media Matters' explicit argument, that MSNBC's report fails to recognize that these contributions are not necessarily representative of journalists as a whole, is absolutely correct. If their implicit argument is that the media isn't liberally biased (which QandO and you seem to suggest they are making, but which cannot be proven), that is neither confirmed nor denied by the MSNBC report.

Now Media Matters, as it is wont to do, misses Kurtz's point. Kurtz is very adamant that no journalist should contribute to candidates or parties, so that any number is too many, especially 300. He doesn't care that it's only a fraction of working journalists- it's still too many.

Walt

Rich Horton said...

Gimme a break Walt,

The "sample" used in this case is much LARGER compared to the pool of journalists than most (if not all) of the polls used to measure US public opinion. (Which sometimes use samples under 1000 from a pop. of 300+ million.)

Granted, the criteria used for selecting the represented population were not random or scientific; however, that doesn't mean valid logical inferences cannot be drawn. You would have to show that there was something inherently biased in the selection of journalists used by the MSNBC report before you could make any such claim.

We make judgements about larger populations based upon non-scientifically selected samples all the time and we get them right more often than not. I do not need a scientifically constructed poll to infer that University professors in the Arts & Sciences are more likely to be Democrat than Republican. I can use the samples represented by those few I have met to draw a larger inference.

Anonymous said...

"Granted, the criteria used for selecting the represented population were not random or scientific; however, that doesn't mean valid logical inferences cannot be drawn."

That's exactly what it means when you don't have a representative sample. You're merely guessing. It may be an educated guess, but it's a guess nonetheless.

Yes, people do make inferences all the time based on non-random samples. But they are not necessarily right most of the time. The world is littered with examples of people who incorrectly infer the behavior or makeup of a larger population based on a limited sample. You see how careless people and the media (bloggers included) are when it comes to trumpeting the results of web surveys, which in most instances suffer from self-selection bias.

"We make judgements about larger populations based upon non-scientifically selected samples all the time and we get them right more often than not. I do not need a scientifically constructed poll to infer that University professors in the Arts & Sciences are more likely to be Democrat than Republican. I can use the samples represented by those few I have met to draw a larger inference."

But your error here is that you already have a sense of the population. You would need a scientifically constructed poll if you didn't have preexisting knowledge of the population. You already know what the general population of Democrats in University Arts and Sciences is, and the sample you would select would reflect that bias. But if you talked to a couple of Democrats in a business school, would you infer the proper proportion of their numbers in the large population? No.

"Which sometimes use samples under 1000 from a pop. of 300+ million" Public opinion polls that are not randomly drawn, whether they use fewer than a thousand respondents or not, are not valid from a statistical standpoint. (Hence the famous "Dewey Defeats Truman" headline.) The optimal number is about 1500; you can use a smaller random sample, but you may have a larger margin of error (depending on your sampling).
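To put rough numbers on the sample-size point, here is a minimal sketch in Python. It assumes a simple random sample, a 95% confidence level, the normal approximation, and the worst-case proportion p = 0.5. The function name `margin_of_error` and the 300 million population figure are illustrative only, not taken from any particular poll.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes the normal approximation; p=0.5 is the worst case and z=1.96
    corresponds to 95% confidence. If a population size is given, the
    finite population correction is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Hypothetical sample sizes drawn from a population of 300 million.
for n in (500, 1000, 1500):
    moe = margin_of_error(n, population=300_000_000)
    print(f"n={n}: +/- {moe * 100:.1f} points")
```

Under those assumptions the margin is roughly 3 points at 1,000 respondents and about 2.5 points at 1,500, and the size of the population barely matters. The formula only holds for a randomly drawn sample; it says nothing about a sample that was not drawn that way.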

With the MSNBC study (and it wasn't a poll; it reported on the entire population of donating journalists, given their stated limitations), you would not necessarily know whether the media was liberal or not based on their sample if you did not have prior knowledge of the tendency of the population itself. The finding may seem logical to you, and I'm not even disputing whether it is or not, but what I am saying is that in no way is that a statistically valid sample. You cannot make a valid statistical inference about the larger population of journalists, and their biases, based upon this sample.

By the way, "We make judgements about larger populations based upon non-scientifically selected samples all the time and we get them right more often than not," is an empirical question and I believe we need to identify a sample to test that. You and I should be excluded because we are both right and wrong more often than not.

Walt

Rich Horton said...

You have to forgive me, I often live in my own little world. I'm obviously using ideas from the realm of formal logic that may or may not have direct correspondence to things in statistics. I certainly do not believe that statistics is the only valid way to make inferences about populations. For example when you say:

"But if you talked to a couple of Democrats in a business school, would you infer the proper proportion of their numbers in the large population? No."

I would say of course not. But that is because such an approach would violate the strictures needed in Peircean Abduction (in the formal logic sense). Obviously one couldn't make a valid statistical inference (and yes it is a guess, though an educated one) based upon a total sample size of two, and yes I would bring other information into play. But I could certainly make valid inferences WITHOUT the use of a statistically valid sample.

You also say: 'By the way, "We make judgements about larger populations based upon non-scientifically selected samples all the time and we get them right more often than not," is an empirical question and I believe we need to identify a sample to test that. You and I should be excluded because we are both right and wrong more often than not.'

What do you think would happen if we set up a questionnaire that asked a grand total of three questions:

The first two would use a 1 to 7 scale:

Rate these two media sources on their trustworthiness from 1 Not Trustworthy to 7 Very Trustworthy:

1. National Public Radio

2. Fox News Channel

We would then ask a third question on party ID:

3. You identify yourself as:

A. Democrat
B. Republican
C. Independent

Let us say we only looked at the answers to questions 1 & 2. How well do you think you and I could do using just the information from those two insignificant questions to determine the party ID of the entire group of people who answered:

1: 7
2: 1

or answered

1: 1
2: 7

I'd say we would get it right more often than not. Don't you?

Anonymous said...

I think we'd do fine determining the party ID of the group we questioned.

But we couldn't say a thing about anything other than that group. So your example is not a question of sampling, it's just a matter of identifying members of a group based on general characteristics. Which is not what the MSNBC report did.

I don't disagree with you that you, me, and a lot of people can make relatively accurate judgements based on logical inferences, and often on even small samples. Heck, we could not function in life without the ability to do so. I never suggested that using statistics is the only valid way to make inferences about populations. I never made that point because I was talking specifically about the issue of samples and drawing generalizations based on those samples. Though we may be correct in drawing a generalization based on a poorly drawn sample, we may be doing so in spite of the sample, not because of it. But you have no way of proving you're right about your generalization, and providing that proof is the purpose of using properly drawn samples and valid statistical techniques. In fact, none of us are probably as adept at drawing correct inferences from a sample to a larger population as we think we are. However, as with logic, the more we know about the subject we are studying via statistics, the better our research design will be (think about some of the methodologically complex studies in political science that were worthless because the authors did not know basic Poli Sci 101 information).

The original post was specifically about making inferences about a larger population using data from a smaller sample, which implies very clearly the use of valid statistical methods. It does not come across as a discussion of formal logic. So the flippant dismissal of Media Matters' complaint on those grounds, with a charge that they were ignorant in such matters, was incorrect (on that particular point; see my complaint about them above).

BTW- the last line of my previous post was a joke. Not a great one, mind you.

Walt

Rich Horton said...

"I think we'd do fine determining the party ID of the group we questioned.

"But we couldn't say a thing about anything other than that group."

So, if we conducted our little unscientific survey at, say, the APSA convention, you would argue that it only told us things about the people taking the survey?

Would you at least call it "suggestive"?

Uriah said...

By choosing those that donate, the group "self selects" for the most fervent among the population as a whole. This makes the sample even more useful because the most fervent would also be the most likely to show significant bias. Of course, this whole thing that Walt is arguing has more than a touch of the "angels dancing on pins" to it.

Rich Horton said...

I know where Walt is coming from, and in the narrowest of ways he is correct. However, I think the finding is suggestive UNLESS you can present a compelling reason why we should EXPECT there to be such a discrepancy in rates of campaign contributions.

For example, if we knew that Democrats receive several times the number of individual donations that Republicans do in the general population, we would expect the subset "journalists" to reflect that as well. However, that is not the case.

Now, there may be other factors in play here. There may be more than a bit of "self selection" going on. For example, social workers as a group OVERWHELMINGLY support Democratic party politics, mostly because more Democrats are attracted to that as an occupation. The same is probably true of journalists as a group (and for many of the exact same impulses that lead social workers to social work.) However, that would still leave us with the conclusion that the subset "journalists" deviates substantially from the general population.

Anonymous said...

I've always wanted to dance on the head of the pin, but then again, I'm no angel.

This whole argument depends on the conclusions you want to draw from the sample of donating journalists. If you want to say that most journalists have a liberal bias because of the findings of the sample, or if you want to use the sample to say, see, there's a liberal bias in the media, then you are flat out wrong. There's nothing narrow or esoteric about it. It's incredibly sloppy thinking to draw conclusions such as that based on a limited, nonrepresentative sample.
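A toy calculation makes the point concrete. This is only a sketch with made-up numbers (none of them come from the MSNBC report), and the two-group model and the function name `democratic_share_of_donors` are assumptions for illustration. It shows that very different populations can produce the same split among the people who choose to donate.

```python
def democratic_share_of_donors(pop_share_dem, donate_rate_dem, donate_rate_rep):
    """Share of donors who give to Democrats in a two-group toy model.

    pop_share_dem:   fraction of all journalists who lean Democratic
    donate_rate_dem: fraction of Democratic-leaning journalists who donate
    donate_rate_rep: fraction of Republican-leaning journalists who donate
    """
    dem_donors = pop_share_dem * donate_rate_dem
    rep_donors = (1 - pop_share_dem) * donate_rate_rep
    return dem_donors / (dem_donors + rep_donors)

# Three hypothetical worlds (invented numbers) that yield the same donor split.
scenarios = [
    (0.80, 0.010, 0.010),  # lopsided population, equal donation rates
    (0.50, 0.020, 0.005),  # even population, one side donates 4x as often
    (0.40, 0.030, 0.005),  # Democratic-leaning minority that donates 6x as often
]
for pop, rate_d, rate_r in scenarios:
    share = democratic_share_of_donors(pop, rate_d, rate_r)
    print(f"population {pop:.0%} Dem -> donors {share:.0%} Dem")
```

All three scenarios give a donor pool that is 80% Democratic, even though the underlying population ranges from 40% to 80%. Without knowing the donation rates in each group, the donor split alone cannot tell you which world you are in.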

Now, if you want to say that we think journalists have a liberal bias because we know all this other stuff (opinion polls, etc.) and this finding seems to suggest a confirmation of the pattern, I have no problem with that. But the sample in and of itself is pretty flimsy evidence, and indeed adds nothing to the argument.

As far as the sample being reflective of journalists who donate, which I think IM's last post alludes to, I would say that's probably accurate, even given its limitations. But for making generalizations about the nature or biases of journalists who haven't donated, which is how it has been used, it is useless.

My point is that calling out Media Matters because they argue that this sample is limited and should not be considered representative of journalists at large is fallacious. It's as simple as that.

You see this all the time in political and social debate: people use the behavior of a small group to draw wild generalizations about a larger group (IM, I believe you have addressed this issue more than once in your blog, especially about media acting as if liberal Catholics spoke for the church. A rough example, but I don't have the time to dig through your archives).

But the reason this is not a dancing on the head of a pin argument is that the misuse of samples has serious consequences. Policies are formed that impact larger populations because of anecdotes about the behavior of smaller groups, which are used as examples of problems that must be corrected.

Walt