Tuesday, December 18, 2018

New paper: the accuracy of FiveThirtyEight's 2018 election predictions: an exploratory analysis

I submitted a paper with this title to SocArXiv, which you can read here in the unlikely event that you want to. (The content of that paper was originally posted here but it has since been removed, 'cause there's no need for it to be in 2 places at once.)

Friday, December 7, 2018

Stereotype accuracy part II

(Introductory author's note: all quotes in this post that I did not write will be in Courier font, but everything else will be in Times New Roman.)

In a previous post, I looked at the obviously fishy claims that Rutgers social psychology professor Lee Jussim and his colleagues (but especially Jussim himself) have been making regarding the purported accuracy of stereotypes. Before broadening this post to look at the many questionable arguments Jussim has made about many other topics, I will point out that Jussim's claim that stereotypes are usually very accurate (a claim that is false) has crept its way into a number of recent peer-reviewed papers that cite it as evidence of a supposedly widespread, structural bias in favor of liberal views in social psychology. Consequently, in this post I will critique some recent articles not written by Jussim or his colleagues, but which cite their research on stereotypes and portray it in a favorable light.

So first I need to sum up the argument being made by those advancing Jussim et al.'s claims about stereotype accuracy: ostensibly, there is overwhelming evidence that stereotypes are moderately to highly accurate, but liberal social psychologists (i.e. almost all social psychologists), blinded by their ideological preconceptions, refuse to even approach or consider this evidence. Martin (2016), for instance, claims,


"...stereotype accuracy has been considered a taboo topic, and only a small number of researchers have investigated if stereotypes are accurate (e.g., Jussim 2012b). Much of this research has shown that stereotypes are indeed accurate (on average), particularly in direction. These findings contradict the assertion by some scholars that stereotypes primarily arise from intergroup envy or scorn (e.g., Fiske 2010). Rather, they develop from valid observations of the social world. Far from being the foolish mistake-makers that social psychologists have made them out to be (Baumeister 2010), humans are mostly perceptive observers. Were it not for the taboo against accuracy research, this scientific discovery might have occurred earlier."
Hoo boy, there's a lot of bullshit there! Firstly, we see the familiar victim mentality of anyone pushing a controversial claim that they insist is supported by strong scientific evidence: they attack their critics as motivated by political correctness, and as too afraid of "taboos" to even touch certain oh-so-controversial topics with a ten-foot pole. This is reminiscent of the argument style behavior geneticists often use, which also involves accusing their critics of political, rather than scientific, motivations. Aaron Panofsky's 2014 book Misbehaving Science refers to this style of (ad hominem) argumentation as the "hitting-them-over-the-head" style. Panofsky states that the goal of this discursive style "...was not to seek synthesis, integration, or sober rational persuasion but to engage in polemical scientific attack, declaring themselves as crusaders who would rout the antigenetics heresy gripping behavioral science" (Panofsky 2014, p. 142).

In the field of behavior genetics (BG), this style of argumentation often manifests as behavior genetics researchers calling their critics "blank slatists", or saying they have some sworn ideological allegiance to total environmental determinism/the standard social science model when explaining human behavior. This lets BGists portray themselves as offering the reasonable idea of maybe letting genes be part of the equation that leads to human behavioral traits, as an alternative to those nutjobs who want to pretend that human genes and evolution don't even exist. Here we see Martin similarly using this approach to avoid addressing specific points made by his critics, instead trying to elicit sympathy from readers by portraying himself as under attack by the PC brigade that supposedly controls the vast majority of academia.

Where was I? Oh yeah, Martin's article. Martin was saying that the idea of stereotypes being accurate a) could've been researched empirically for a long time, but b) wasn't researched empirically nearly as often as it could have been, because c) almost all social psychologists were blinded by the taboo against such research by their supposedly all-encompassing liberal ideologies. Further, he claims that d) when a handful of brave, Galileo-like mavericks finally stood up to the leftist cabal that rules almost all of the social psychology field with an iron fist, e) they proved that stereotypes are actually very accurate, on average, which f) proves that stereotypes arise from accurate perceptions of reality, not prejudice.

Before addressing these arguments I want to point out another fundamental issue with the "stereotypes are accurate" argument that I did not mention in my previous post on Jussim's work in this area. Specifically, as Jussim himself acknowledges, there is not a single dimension of "accuracy" on which a perception or belief can be assessed, but rather several possible "scales" on which one may attempt to do so. In a 2015 journal article, for example, Jussim et al. note that there are two distinct ways that stereotype accuracy can be assessed: discrepancy scores and correspondence. As Jussim et al. further explain,
"One method of assessing accuracy is not “better” than the other; each contributes unique information (Jussim, 2012; Ryan, 2002). Discrepancy scores indicate how close perceivers’ stereotypes come to being perfectly accurate (scores of 0 reflect perfect accuracy). Correspondence indicates how well people’s beliefs covary with criteria" (Jussim et al. 2015, p. 492). 
So it would behoove those who want to make confident claims about the "accuracy" of stereotypes to use both methods (or take into account studies that do so) before concluding that stereotypes are either accurate or inaccurate. So surely, when Jussim et al. (2015) claim in their paper's abstract that the accuracy of stereotypes is "one of the largest and most replicable findings in social psychology", they are basing this on both types of studies, right? This is not at all the impression you get from their Table 2, which claims to present, and I am not making this up, "Stereotype Accuracy Correlations From Over 50 Studies Showing That Stereotypes Are More Accurate Than Social-Psychological Hypotheses". In coming to the obviously provocative conclusion that stereotype accuracy is actually greater than that of social-psychological hypotheses collectively, they appear to be paying attention only to correlations between perceived and actual group characteristics, not to discrepancy scores.

That being said, they do acknowledge the existence and results of discrepancy-score studies bearing on this topic, e.g. when they say "Although not every study examined discrepancy scores, when they did, a plurality or majority of all consensual stereotype judgments were accurate. For example, an international study of accuracy in consensual gender stereotypes about the Big Five personality characteristics found that discrepancy scores for all five reflected accuracy (Lockenhoff et al., 2014)." However, it should be pointed out that it is more difficult to assess the relative "accuracy" of psychological hypotheses and stereotypes when the latter are assessed based on discrepancy scores (e.g. 1 SD) rather than correlation coefficients. Further, the statement that Lockenhoff et al. (2014) "found that discrepancy scores for all five reflected accuracy", referring to the Big Five model of personality traits (Neuroticism (N), Extraversion (E), Openness to Experience (O), Agreeableness (A), and Conscientiousness (C)), seems to be rather at odds with the following quote from that very paper (Lockenhoff et al. 2014, p. 685): "Across all facets of N (and for N1: Anxiety in particular), assessed sex differences appeared to be more pronounced than GSDs [gender stereotype differences], and this was true for both self-reports and observer-ratings."
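To make the difference between the two methods concrete, here is a minimal sketch in Python (all numbers are made up for illustration and don't come from any actual study):

    import numpy as np

    # One perceiver's beliefs about what % of some group has each of five
    # traits, versus the actual percentages. (All numbers made up.)
    perceived = np.array([60.0, 35.0, 70.0, 20.0, 50.0])
    actual = np.array([55.0, 40.0, 45.0, 25.0, 52.0])

    # Discrepancy scores: how far each belief is from the criterion
    # (0 = perfect accuracy).
    print(np.abs(perceived - actual))  # [ 5.  5. 25.  5.  2.]

    # Correspondence: how well beliefs covary with the criteria,
    # i.e. a Pearson correlation across the five traits.
    print(round(np.corrcoef(perceived, actual)[0, 1], 2))  # ~0.8

    # The catch: inflate every belief by 20 percentage points and the
    # correlation doesn't budge, while every discrepancy score balloons.
    print(round(np.corrcoef(perceived + 20, actual)[0, 1], 2))  # still ~0.8

Note how a perceiver can misjudge every single group substantially (here, by 20+ points across the board) and still earn a high "accuracy" correlation--which is exactly why leaning only on correspondence correlations, as that Table 2 does, is a problem.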

Though the above points seem troubling (though I'm hardly an impartial judge of how compelling my own arguments are), I think that the biggest problem with this research is that, ironically, it stereotypes stereotypes themselves by referring to them in blanket terms as "accurate". This ignores not only the highly problematic nature of referring to entire groups as all possessing a characteristic without acknowledging variation within groups on that characteristic, but also the fact that even by these researchers' own criteria, some stereotypes are decidedly inaccurate. Political stereotypes, for instance, were said to "exaggerate group differences" by Jussim et al. (2015). In addition, these authors note that "Empirical reports based on independent samples from around the world (e.g., McCrae et al., 2013) have consistently found little national-character stereotype accuracy". Consequently, blanket statements about the "accuracy" of stereotypes commit the very fallacy of generalization for which psychologists have been criticizing stereotypes for decades now: they ignore that not all members of group x (in this case, stereotypes) have characteristic y (in this case, accuracy).

Sources
Jussim et al. 2015
Lockenhoff et al. 2014
Panofsky 2014
Martin 2016

Wednesday, October 31, 2018

Gottfredson vs. Gottfredson

I'd like to introduce you to Linda Gottfredson, former professor of educational psychology at the University of Delaware and recipient of her very own page on the SPLC's "fighting hate" website. But if you've been reading this blog for long enough you'll already have seen me talk about some of Gottfredson's work. Specifically, last July I critiqued an article she wrote in 2013 lavishing praise on racialist psychologist J. Philippe Rushton and disparaging his detractors. But here I wanted to look at her work in "g theory" over a long period of time and try to understand exactly what she thinks about the topic.

Brief overview before I start: g theory is based on the idea that there is a single "general intelligence", aka g (note the italics: that's important), that IQ tests measure (though of course some better than others). The evidence for the existence of this g (aka "g factor" or "general factor") is said to be, above all else, the positive correlations between scores on different types of cognitive ability tests--even those that are very different in their scope and subject matter. g theorists thus tend to talk about people who are very intelligent as having high levels of g, and vice versa, implicitly assuming that "intelligence" can be "objectively determined and measured" by IQ tests in all people everywhere in the world with no exceptions. (The "objectively determined and measured" quote is a reference to the 1904 article by psychologist Charles Spearman that started this "theory".)
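For the curious, here's roughly what "finding" g looks like in practice, as a minimal simulation I wrote for illustration (not any g theorist's actual code; the factor loadings are made up). You generate test scores that share one common ingredient, observe that every test correlates positively with every other (the famous "positive manifold"), and extract the first principal component, which is what gets labeled g:

    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_tests = 1000, 6

    # Simulate one common factor plus test-specific noise.
    # (Loadings are made up for illustration.)
    common = rng.normal(size=n_people)
    loadings = np.array([0.8, 0.7, 0.6, 0.5, 0.7, 0.6])
    scores = np.outer(common, loadings) + 0.6 * rng.normal(size=(n_people, n_tests))

    # The "positive manifold": all pairwise test correlations are positive.
    R = np.corrcoef(scores, rowvar=False)
    off_diag = R[~np.eye(n_tests, dtype=bool)]
    print(off_diag.min() > 0)  # True

    # The first principal component of R is what gets labeled "g".
    eigvals = np.linalg.eigvalsh(R)  # sorted ascending
    print(eigvals[-1] / n_tests)  # share of variance on the "g" component

Of course, whether that first component corresponds to a real causal entity inside people's heads, as opposed to a mere statistical summary of the correlations, is precisely what's in dispute.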

So I wanted to start by trying to answer this question: does Gottfredson believe that IQ/g is a fixed quality that cannot be changed by environmental interventions, or does she acknowledge that people are not born with a fixed, immutable quantity of intelligence, and that they can be made smarter by certain environmental interventions and changes? Let's look at some quotes from her previous writings to get an answer (all emphases are mine):

Gottfredson (1994, p. 15): "That IQ may be highly heritable does not mean that it is not affected by the environment. Individuals are not born with fixed, unchangeable levels of intelligence (no one claims they are)." 

Gottfredson (2000): "“Genetic” does not mean “fixed” or “unchangeable.” Just as genetically caused differences are not necessarily irremediable (consider diabetes and poor vision), environ­mental effects are not necessarily reversible (consider lead poisoning and head injuries). Both sources of low IQ may be preventable to some extent. Genetic screening and gene therapy, for instance, are both intended to prevent genetic disorders such as mental retardation."


Gottfredson (2003, p. 114): "No g theorist claims that g is “fixed.” This is a canard and distracts readers from the pertinent point, which is that individual differences in g become highly stable and more heritable by adolescence." 


Gottfredson (2009, p. 415): "...if you state that people’s IQ scores are stable over time or highly genetic (both true), many people will hear you claiming that intelligence level is fixed in stone from birth (false)—unless you anticipate and correct that common misunderstanding."


Seems clear enough. Linda Gottfredson doesn't think that someone's IQ/intelligence/g is a fixed number, as is evident from all of the quotes cited above. In other words, it appears that she is willing to acknowledge the malleability of intelligence with respect to social/educational interventions. But perhaps she actually believes the exact opposite: that intelligence (i.e. IQ score) is a fixed, genetically determined quantity that we can't significantly change. Don't take my word for it, though; listen to what she herself says in the very sources I quoted above: 



  • "IQs do gradually stabilize during childhood, however, and generally change little thereafter." (Gottfredson 1994, p. 15)
  • "There is no effective means, as yet, for raising low IQs permanently." (Gottfredson 2000)
  • In the two other quotations above (from 2003 and 2009), you see her talk about how g (aka general intelligence) is "(very) heritable", "highly stable", and "highly genetic". But does that mean it's fixed, or that policy makers shouldn't even bother to change it with programs like Head Start? Well, she herself provides us with a clear(-ish) answer to this question in a 2005 paper in which she stated:
  • "Jensen’s 1969 conclusion about the failure of socioeducational interventions to raise low IQs substantially and permanently still stands" (Gottfredson 2005, p. 313). This is a reference to the (in)famous paper by Jensen in the Harvard Educational Review that really got the genetic-determinist black-IQ-inferiority "debate" started 49 years ago. 

So in practice, she is saying that people's IQs tend to stay at about the same value (after childhood, anyway), even though in theory, she acknowledges that this doesn't have to happen. And in her 2000 article that I cited above, she suggests that we might be able to raise people's IQs ("Both [genetic and environmental] sources of low IQ may be preventable to some extent"), but then switches from theoretical optimism to supposedly realistic pessimism by saying we can't currently do it permanently (or at least we couldn't in 2000).

And in 2016, she wrote, "Were the distribution of g unstable or malleable, g's effect sizes for various types of performance and life outcomes would not remain so regular, so consistent, so patterned decade after decade at the population level (cf. Gordon, 1997)" (Gottfredson 2016, p. 125; emphasis in original).


Ugh, so confusing. IQ is malleable, but it isn't, at least not by any method that exists now? I wonder which Linda Gottfredson we are to believe. This kind of ambiguity is brought to you by what Howard Gardner dubbed "scholarly brinkmanship": coming right up to the edge of an extreme conclusion, strongly implying it, but being careful never to state it outright. Here we see Gottfredson engaging in scholarly brinkmanship with regard to the idea of the genetic determination of low IQ and society's putative inability to do anything about it (aka "genetic fatalism", Alper & Beckwith 1993).


There's more where that came from: she often emphasizes that research on the supposed genetic basis of black-white IQ differences doesn't necessarily have any policy implications: 

Gottfredson et al. 1997 (p. 15): "The research findings neither dictate nor preclude any particular social policy, because they can never determine our goals. They can, however, help us estimate the likely success and side-effects of pursuing those goals via different means."

So she says that this research is only relevant to social policy in that it can shed light on how effective certain programs would be at achieving goals, but that it can't help us make the (obviously subjective) decisions about what our goals should be. But the disingenuous part is the claim that IQ-genetics-race research "neither dictate[s] nor preclude[s] any social policy"--because I can think of someone who would not agree with that statement. In fact, this person believes that research "showing" that racial IQ differences are mainly due to genetics does demonstrate that certain social policies will be doomed to fail. This person has written sentences like the following:
Much social policy has long been based on the false presumption that there exist no stubborn or consequential differences in mental capability. Worse than merely fruitless, such policy has produced one predictable failure and side effect after another, breeding widespread cynicism and recrimination...Civil rights advocates resolutely ignore the possibility that a distressingly high proportion of poor Black youth may be more disadvantaged today by low IQ than by racial discrimination, and thus that they will realize few if any benefits (unlike their more able brethren) from ever-more aggressive affirmative action [Emphasis mine].*
And:

...social science and social policy are now dominated by the theory that discrimination accounts for all racial disparities in achievements and well-being. This theory collapses, however, if deprived of the egalitarian fiction, as does the credibility of much current social policy.** 
You'll never guess who the person is who wrote these statements--unless you have been paying even a modicum of attention to the previous parts of this post, or if you skipped ahead to the footnotes from the asterisks. In either case it should be obvious that Gottfredson wrote both of the above passages. This supports the point that the SPLC made on their "Extremist Files" profile of her:
She concludes “Mainstream Science” by claiming that her ideas “neither dictate nor preclude any social policy.” But much of her career has been dedicated to the idea that because IQ determines social outcomes, and racial disparities in IQ are innate and immutable, policies intended to reduce racial inequality are doomed to fail, and may even exacerbate the problems they’re intended to remedy.

*Gottfredson 1997, p. 124-5
**Gottfredson 1994, p. 55

Friday, October 12, 2018

What is Mankind Quarterly's impact factor?

Officially, this "scientific" white-supremacist pseudo-journal does not have an impact factor at all (at least not from the Journal Citation Reports, which is the only kind that's considered official). But what is the next best thing--their unofficial impact factor?

According to ProQuest, 62 papers were published in Mankind Quarterly in 2017. Of these, only 7 were cited even once on ProQuest: 6 of those 7 papers were each cited only once, while the other one was cited twice.

And in 2016? We need to include 2016 data because impact factors are based on 2 years of data: "the impact factor of a journal is calculated by dividing the number of current year citations to the source items published in that journal during the previous two years".


Doing the same search as above for 2016 and 2017 yields 115 papers. Of these articles, as of today (10/9/18), only 8 had been cited at all. Each of them was cited once, except for one that had been cited 3 times. Anyway, this yields a total of (7*1)+3=10 citations, which when divided by 115 citable articles yields 0.087--lower than any academic journal impact factor I have seen in almost five years of editing and creating Wikipedia articles on this subject (the one I like to use for comparison is Psychological Reports because its IF is always pretty low; yet even it has an IF of 0.667, which is almost eight times that of MQ based on these estimates).
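For anyone who wants to check my arithmetic, the whole calculation fits in a few lines of Python (with my ProQuest counts from above hard-coded):

    # Unofficial "impact factor" estimate for Mankind Quarterly,
    # using my ProQuest counts from above.
    citable_items = 115        # papers published in 2016 + 2017
    citations = 7 * 1 + 1 * 3  # 7 papers cited once, 1 paper cited 3 times

    mq_if = citations / citable_items
    print(round(mq_if, 3))            # 0.087

    # For comparison: Psychological Reports, IF = 0.667
    print(round(0.667 / mq_if, 1))    # 7.7, i.e. almost 8 times higher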


Let's look in more detail at these 10 citations. The number of citations and the journals in which they appeared are as follows:
  • 4 in Personality and Individual Differences
  • 1 in the book "Cognitive capitalism: Human capital and the wellbeing of nations" by Heiner Rindermann
  • 1 in a dissertation
  • 2 in Intelligence
  • 2 in the Journal of Individual Differences

Monday, September 3, 2018

Richard Lynn and Gerhard Meisenberg removed from the editorial board of Intelligence

Elsevier and/or journal editor-in-chief Richard Haier appear to have unceremoniously removed race scientists Richard Lynn and Gerhard Meisenberg (both editors-in-chief of Mankind Quarterly) from the editorial board of the respected peer-reviewed journal Intelligence. Back in January, Angela Saini noted their status as editorial board members in a column in the Guardian, and the following month, Ben van der Merwe pointed out the same thing in an article in the New Statesman.* When Saini initially confronted Haier about the status of Lynn and Meisenberg as editorial board members, Haier told her, "I consulted several people about this. I decided that it’s better to deal with these things with sunlight and by inclusion. The area of the relationship between intelligence and group differences is probably the most incendiary area in the whole of psychology. And some of the people who work in that area have said incendiary things … I have read some quotes, indirect quotes, that disturb me, but throwing people off an editorial board for expressing an opinion really kind of puts us in a dicey area. I prefer to let the papers and the data speak for themselves." (Emphasis mine.)

Thanks to the Wayback Machine, we know they were both still listed as editors as recently as April (note that Lynn was listed with no affiliation at all, unlike all of the other editors). And in fact, even the most recent complete volume of the journal (July/August 2018) lists both Lynn and Meisenberg as editors. But as of today, both Lynn and Meisenberg's names have been removed from the journal's editorial board page, clearly in response to public criticism of their status as board members. One wonders if Personality and Individual Differences will do the same, since Lynn (though not Meisenberg) is a member of their editorial board.

*Note: Van der Merwe also noted that "Two other board members are Heiner Rindermann and Jan te Nijenhuis, frequent contributors to Mankind Quarterly and the London Conference on Intelligence." Update: I didn't notice this when I first wrote this post, but apparently te Nijenhuis (but not Rindermann) has been removed from the editorial board as of Sept. 3, despite the fact that te Nijenhuis was listed as an editor in the aforementioned July/August 2018 issue.

Addendum: After writing and publishing this post I discovered that some RationalWiki editor(s) had already noticed this and posted about it last Sunday on this page.

Wednesday, July 25, 2018

Rejections redux

So in a previous post I described how two papers I previously submitted to two different Elsevier journals (Intelligence and the Journal of Criminal Justice) were rejected, both within 24 hours. But there are other articles I have submitted to reputable journals too, all of which have now been rejected, indicating I clearly have to be more attentive to detail and more meticulous before I try doing this again. Anyway, I wanted to update my readers on what has happened with the other submissions (some of this was also stated at the bottom of the old post, linked above).

Earlier today (7/24/2018) I got a rejection email from Personality and Individual Differences (about state IQs). I have also had another paper rejected by Intelligence (which was a meta-analysis of the black-white mean IQ gap, and which is different from the rejected paper mentioned above and in the post from this May) and yet another (about the validity of FS/S, a widely used gun ownership proxy) that was rejected by Crime & Delinquency. So that's a total of 5 rejections. I'm 0-for-5, baby! No but seriously I clearly have to learn a lot more about these subjects and especially how to organize a scientific paper before trying this again. But I have every intention of trying again at some point.

That is all.

Wednesday, July 18, 2018

The story of His Excellency, a banned Wikipedian

His Excellency was a user who was highly critical of Wikipedia's purported anti-Muslim bias. He was put on something called "personal attack parole" in a 2006 arbitration case (mostly courtesy-blanked, and the old page was deleted from the history, but it can still be viewed here*). Later the same year, after he violated said parole, apparently by making anti-Semitic attacks on other editors, this was upgraded to a 4-month ban, after which he was placed on a year of probation. On 3/14/2007, he was indeffed for "multiple cases of IP socking and harassment".

Timothy Usher, a linguist at the Santa Fe Institute, was also sanctioned in the aforementioned arbitration case. The Committee found (in a 6-1 ruling) that "Timothy Usher has engaged in incivility and edit warring regarding Islam articles. In particular, he has personalized the conflict and engaged in harassment of His excellency." He later returned as Proabivouac (misspelled at this link for some reason). Under this username, he was emergency-banned and indeffed by ArbCom on 10/19/2008 for "long-term disruption". The block log for that day reads "Consult ArbCom privately for any discussion of this block; do not unblock without ArbCom's permission".

*Note that this is a very old version of Wikipedia saved in the Wayback Machine so it looks like shit and there's basically no formatting, as is usual for such old captures of Wikipedia pages.

Wednesday, July 11, 2018

A human BG debate: can it happen?

I am trying to organize a debate about human quantitative behavior genetic studies and whether or not they are fatally flawed. Originally it was just supposed to be about classical twin studies, but then I decided, why limit the scope to that? It's not like human BGists* only ever do such studies--they have plenty of other research designs (reared-apart twins, families, adoption, etc.). So I decided I'd start emailing people to ask if they'd be willing to participate in a one-on-one debate about whether human BG studies have value or are totally worthless. Here I will be listing everyone I've emailed to ask about this potential debate and their responses (if any).


  1. Robert Plomin (BGist) - Automated email reply saying he'll be "away from the office with only sporadic email contact until 15 July 2013." Yes, seriously. He hasn't updated his auto-reply in FIVE YEARS!!!
  2. Jay Joseph (critic) - He responded the same day! He said "I might be interested in a debate, depending on how the details work out. But this Saturday is too soon, as these things require much more time and planning...Thanks for your interest, and please let me know what develops as far as a debate is concerned. For several reasons, I doubt that many twin researchers would want to debate me."**
  3. Irwin Waldman (BGist) - He responded the same day! He said yes, but that he's on a family vacation now and won't be able to participate for a week or so.
  4. Evan Charney (critic) - haven't heard back yet
Notes
*My abbreviation for "behavior geneticist". Hope it catches on.
**I did mention maybe doing it this Saturday in the first email I sent Joseph, when planning of this potential event was still even more tentative than it is now. But now it is obvious that will definitely be too soon, for 2 reasons. One is that, as Joseph pointed out, this takes a lot of time to plan successfully--way more than just a few days. The other is that, as I noted above, Waldman won't be able to debate for like a week, and he might be the only BGist I can get to represent the field, so picking Saturday is a bad idea for that reason too.

Sunday, July 8, 2018

Wikipedia and the United States: a comparative case study

This is based on a comparison I first thought up about 4 years ago and briefly posted about on Wikipedia (in the Village Pump for proposals, to be specific). *cringes internally with embarrassment thinking about all the shitty ideas I proposed back in 2014 on Wikipedia that were overwhelmingly shot down*

...anyway, as I was saying, here's the analogy. Kind of like Mad Libs or a choose-your-own-adventure story, you have two options for each choice (each of which is in parentheses). The first option applies to Wikipedia and the second applies to the United States.

Once upon a time, a small, brave group of Americans (Jimmy Wales & Larry Sanger/the Founding Fathers) were tired of living under the tyranny of British rule (Encyclopedia Britannica/King George III), and so decided to create their own (encyclopedia/country): one that would be defined by freedom from an oppressive bureaucracy that exerted total control over what people could (read in encyclopedias/do or say). So this group of brave freedom-fighters (all of whom happened to be white men, not that that has anything to do with anything) founded their (encyclopedia/country), dubbing it (Wikipedia/the United States of America).

As implied in the fact that this (encyclopedia/country) was founded in response to a desire for freedom from the total control of an oppressive oligarchy, its main founding principle was that of liberty. Specifically, the (encyclopedia/country) was dedicated to the ability of anyone to (edit/pursue happiness in) it as much as they wanted, and to do so without the oppressive interference of a handful of (Encyclopedia Britannica expert editors/British leaders), who had unfairly denied them a voice in their own (encyclopedia/government).

Of course, this ideal of complete freedom wouldn't last long, despite its noble intentions. So while the (encyclopedia/country) initially had very little in the way of an organized (bureaucracy/government), such a system later needed to be created. Elements of the organized (bureaucracy/government) that controlled basic features of the (website/country) included the (Arbitration Committee/Supreme Court), which consists of (thirteen/nine) judicial decision-makers who get to issue binding decisions affecting all the (editors/citizens) of the (website/country). The (Arbitration Committee/Supreme Court) mainly aims to interpret established (Wikipedia policy/legal precedent) in the context of a specific case appealed to it.

Other key components of the structure of (Wikipedia/the United States) include (Jimbo/the President), a figurehead who mainly acts to represent the entity rather than actually make key decisions about how it should be run, and (Admins/Congress), a group of officials elected by the (editors/citizens) of the (website/country) at large to do the "dirty work" of running the whole thing.
There are also other bureaucratic elements to (Wikipedia/the United States), including the punishment for breaking (policy/the law), which is (being blocked/imprisonment)--potentially indefinitely, if what you did was bad enough.

Some thoughts on "Misbehaving Science"

Recently I created an article on UCLA sociologist Aaron Panofsky on Wikipedia. I had previously created an article about his 2014 anti-behavior genetics book, Misbehaving Science, in May. I haven't read the entire book, but I have read reviews of it and bits and pieces of the book itself on Google Books preview (so really just a few pages at the beginning, until I get the "you have reached your viewing limit" message and can't keep reading). So you should take what I'm saying about this book with a grain of salt.

With that significant caveat out of the way, I will begin discussing this book now. The book is a sociological study of the development of the field of human behavior genetics over time, with a specific emphasis on the many highly controversial findings that have been reported in this field. The main argument of the book appears to be that in human BG, unlike in "real science" that is working like it's supposed to, controversies are never resolved, and rancorous debate over fundamental issues persists for many years. Thus, the book's title refers to a type of science in which results of studies are so inconsistent that no one can come to a clear conclusion about anything, partly because of "anomie" (lack of clear guidance) in the field (which would mean, I guess, that human BG researchers don't have clear or specific "rules" about how to conduct their research the "right way").

You'll notice I specified "human BG" when summarizing the book above, and that's because the human stuff, obviously, is more interesting, controversial, and is more discussed in the book. No one other than scientists who actually research animal BG really gives a shit about its results, except insofar as they apply to humans, which is only slightly. But human BG is really interesting and relevant to hot-button issues, so the media loves to publicize its findings. The most obvious example is the "gay gene" study published by Dean Hamer et al. in 1993. The study concluded that there was a link between markers on the X chromosome and male homosexuality. Of course, its results were not replicated despite the huge amount of media attention they got at the time.

So why does controversy keep resurfacing in human BG? One possible answer is that no one can advance the field's knowledge by replicating results, and the same research designs are used over and over to produce statistically random results based on bogus assumptions. Another possible answer is that BGists intentionally make provocative claims without caring about whether they are scientifically responsible, just so they can get media attention and increase their "scientific capital" (Panofsky looooooves this phrase for some reason). Misbehaving Science appears to endorse both of these answers. It appears that the picture he paints of human BG is one in which a controversial claim is made, other scientists try to replicate it, they fail, and then the cycle starts all over again.

Why Lee Jussim is wrong about stereotype "accuracy"

It has long been fashionable for academics pushing right-wing views to claim that they are merely trying to pursue honest, objective science and bravely standing up to the academic establishment--one which is invariably assumed to be composed of liberals whose views are based on their political ideologies rather than on scientific evidence. Recently, Rutgers psychologist Lee Jussim has been playing this game with respect to the specific subject of the "accuracy" of stereotypes. He claims that stereotypes are generally very accurate, and that there is a widespread assumption in both academia and the general public to the contrary that is simply objectively false (e.g. Jussim, Crawford, & Rubinstein 2015). 

That's right, a professor at a prestigious American university actually wants to hold up people's beliefs about entire groups of human beings as highly accurate representations of reality. Obvious questions abound: is Dr. Jussim saying that black people are lazy violent criminals who always eat fried chicken and watermelons, and that Asian Americans are bad drivers who are good at math, and that women love pink things? After all, these are all stereotypes, and he is saying that many stereotypes are very "accurate". So it follows that many stereotypes that people are very familiar with, and often highly offended by, must be very "accurate", according to Jussim. This would, one imagines, include some or all of the stereotypes mentioned above. But to be fair, let's look closer at Jussim's argument before painting him as some sort of racist/sexist simply for trying to legitimize people's intergroup beliefs. After all, maybe the really offensive stereotypes like those previously mentioned are also the ones that aren't accurate (not least because they can't really be accurate: see below).

So what's wrong with this line of argument? First of all, it assumes that people's beliefs can accurately be represented in quantitative terms. Before explaining why this is a highly problematic assumption, I will explain how researchers in the past have assessed the (in)accuracy of stereotypes. As Jussim himself wrote in 2012, "There are many different ways to test for the accuracy of stereotypes, because there are many different types or aspects of accuracy.  However, one type is quite simple -- the correspondence of stereotype beliefs with criteria.  If I believe 60% of adult women are over 5' 4" tall, and 56% voted for the Democrat in the last Presidential election, and that 35% of all adult women have college degrees, how well do my beliefs correspond to the actual probabilities?  One can do this sort of thing for many different types of groups." 


In other words, asking people what % of members of group B (hereafter simply B) have characteristic X, and comparing it to the actual %, assumes that the answers to the % question will accurately represent how people normally perceive others as a result of holding stereotypes about the groups to which they belong. That is, such an approach to assessing the "accuracy" of stereotypes assumes that people's beliefs are statistical (e.g. 30% of B have X) rather than generic (all B have X). But this assumption may be wrong.

Another problematic assumption inherent in any claim that stereotypes are to some degree accurate is that stereotypes contain content that can be operationalized in a way that conforms to reality. That is, the "stereotype accuracy" argument assumes that a stereotype being described as accurate is the result of things that members of the group to which it applies actually do. But of course, there are some stereotypes that, by their nature, certainly bear no relationship to objective reality. For instance, if it is believed that African Americans are apelike and mentally inferior to white people, then this is a stereotype that clearly cannot be assessed on its (in)accuracy, insofar as "inferiority" and the state of resembling an ape are both entirely subjective characteristics. Notably, the same is clearly true of many other characteristics featured in stereotypes (e.g. laziness, athleticism, etc.), which are likewise too subjective to be measured objectively. Thus such stereotypes cannot be the result of an accurate perception of reality, but must instead be artifacts of prejudice or other cognitive biases.

But the biggest problem by far with claims of stereotype "accuracy" is that they ignore this point: Most people in a group won't fit the stereotype, however accurate it may be. By this I simply mean that if the stereotype is that members of group B are X, researchers like Jussim who want to play up its accuracy will frame it as "(Number) % of members of group B are X" or "members of group B are (number) % less likely than members of group A to be X". Then the number in these formulations can be compared to the actual quantity to assess how "accurate" the stereotype is. In reality, however, my point is that even for the most "accurate" stereotypes, the stereotype will actually be highly inaccurate in that most people in group B will not be X. As we shall see, this point is a lot less controversial than you might think. 

The examples given by Jussim of the % of women who voted Democrat or are more than 5' 4" tall seem of little or no relevance to the stereotypes people actually hold about groups. What about more abstract and common stereotypes of women, like that they are more emotional, more accommodating, and better equipped to do housework than men? Is this actually true of most women (to the extent that such a question can even be answered objectively)? Fundamentally, the problem here is that even the most "accurate" stereotype will often lead to conclusions that are not just wrong, but highly offensive, when used to make judgments about individuals.
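Here's a toy example of what I mean (the rates are made up by me, not taken from any stereotype study): a stereotype can be "accurate" in the directional/correspondence sense while still being false of most members of the stereotyped group.

    # Made-up rates for illustration.
    rate_A = 0.15  # share of group A that is X
    rate_B = 0.30  # share of group B that is X -- twice group A's rate!

    # The stereotype "B is X" points in the right direction (0.30 > 0.15),
    # so a correspondence-style test would score it as accurate. But apply
    # it to a random individual from group B and you'll be wrong...
    print(1 - rate_B)  # ...70% of the time.

A "highly accurate" stereotype by these standards, and yet wrong about 7 out of every 10 people it gets applied to.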



Consider the conclusions Jussim et al. (2018) have recently reached on this subject, namely, that "race, gender, and age stereotypes tend to be moderately to highly accurate." What does this mean? Does it mean that judgments based on stereotypes of someone of a given race/gender/age will usually be correct? What % of the time will such judgments be correct? 

To their credit, Jussim et al. appear to anticipate this obvious criticism, writing (p. 7), 
This line of reasoning’s suggestion – that all stereotypes are inaccurate because most members of a group fail to fit a stereotype – is partially justified. It is true that most members of a group will fail to perfectly fit a stereotype. This, however, does not mean that the stereotype is inaccurate. To understand why requires understanding how this reasoning confounds two different levels of analysis and how considerably greater conceptual clarity can be brought to understanding stereotype accuracy by clearly distinguishing among these levels of analysis. [Emphasis mine.]
So there it is: the emperor has no clothes! They admit the obvious fact that most stereotypes aren't really accurate, in the sense that people can't use them to make accurate individual-level inferences. What does this mean? Among other things, it means Jussim et al. acknowledge that most blacks aren't athletic or lazy, and that most Asians aren't bad drivers but good at math, and that most women aren't emotional and submissive, and that this holds for all other stereotypes as well. But the fundamental point is that stereotypes are oversimplified because they attribute the same characteristics to all members of a group, which leads to inaccurate conclusions most of the time, because such an attribution ignores that most people in the group don't have those characteristics. This is the basic shit about how stereotypes have long been defined that Jussim and his colleagues love to bitch about, yet in the above passage they admit that it's actually true--but, but, it, uh, doesn't count as real accuracy!

Here's how they dismiss this problem later in the same paper (p. 9):
Absolutist stereotypes – beliefs that all members of a group have some attribute – will indeed almost always be false, because there are almost always wide variations among individuals. A single exception invalidates an absolutist belief. Just as a belief that the temperature in all locations in Alaska is always below freezing will be disconfirmed by a single reading of 33 degrees Fahrenheit in Juneau on July 15th at 1pm, a belief that all Germans are efficient will be disconfirmed by discovery of a single inefficient German. The vast accumulated empirical evidence on stereotypes, however, has yet to report a single person who holds absolutist stereotypes. Instead, the evidence indicates that most stereotypes are quantitative and probabilistic, not absolute (Citations omitted).
Really? Most stereotypes are in the form of quantitative values? Most people's stereotypes are that a given percentage of members of group B have trait X? That seems quite hard to believe, since not even statisticians, much less ordinary non-experts, think in quantitative terms on a regular basis in their day-to-day lives. Aside from being contradicted by the Hammond et al. paper cited previously, the claim that most stereotypes are quantitative, not absolute doesn't appear to be well-supported by the sources cited. The Hammond et al. study aimed specifically to assess the cognitive structure of stereotypes, but the 3 sources cited by Jussim et al. (2018) did no such thing.

For instance, McCauley & Stitt (1978), one of these 3 sources, just proposed "...a quantitative and individual measure of stereotyping, based on defining stereotypes as probabilistic predictions that distinguish the stereotyped group from others." So they defined stereotypes as probabilistic rather than empirically demonstrating that they are, which clearly doesn't support Jussim et al.'s views at all.

What about Judd et al. (1995)? This study assessed white and black Americans' stereotypes of each other's groups. The study aimed "...to examine theoretical issues in stereotyping and to describe the current state of ethnic interrelations among young people." Their findings were: "Throughout, the samples of African Americans demonstrate interethnic judgments that are consistent with existing work on stereotyping and ethnocentrism. White American students, however, reported judgements that replicate neither the out-group homogeneity effect nor ethnocentrism." So...white people don't perceive black people (the out-group) as all being similar (homogeneous), but black people do hold such perceptions of white people. Interesting, but it doesn't support the claim about most stereotypes being non-absolute.

Lastly, there is Swim (1994), which "assessed the accuracy of people's stereotypes about gender differences in 2 studies by comparing perceptions of sizes of gender differences with meta-analytic findings." But this obviously assumes that numerical answers given to researchers' questions about "what percent of women have college degrees" or other such questions are accurate reflections of stereotypic beliefs.

I will close by quoting some important points previously made on this subject by Ryan (2003), who made several observations very similar to those I make in this post: 
"The stereotypes that women are passive or that Blacks are athletic, for example, are no doubt erroneous if they are meant to imply that all women or all Blacks are so. And on what basis would one determine the actual passivity of women or the actual athleticism of Blacks anyway? Further, the notion that stereotypes can be accurate seems to imply that group attributes should be applied to individual group members so long as those attributes (or stereotypes) are accurate. This implication seems highly offensive in a society that values the individual and his or her unique merits." 

Tuesday, July 3, 2018

A true "WTF" moment...in an academic journal

It is very rare for me to be reading an academic journal article and come across something that makes me think "What the fuck are these author(s) talking about?" But this just happened to me while I was reading this paper about biosocial criminology, written by many of its key proponents.

Just a brief recap: biosocial criminology is a stealthy and, so far, disturbingly successful attempt to smuggle genetic determinism into the field of criminology, all the while employing the age-old behavior genetic style of "hitting-them-over-the-head" and accusing critics of having an ideological opposition to any potential role of genetics/biology in human behavior. In reality, of course, genetic determinists, whatever they may want to call themselves, simply ignore obvious facts, such as that, as Sir Michael Rutter put it, "genes do not, and cannot, code for socially defined behaviors" (quoted in Charney 2008). Crime, of course, is a socially defined behavior, one which is categorized as crime by subjective, socially constructed criteria. Thus, the same behavior (like violence) is considered crime in some contexts, but not in others (like war) (e.g. Rosenfeld 2009).

Anyway, the moment that I thought was really weird in the paper I was just reading (Beaver et al. 2015) is shown below in bold in the quoted passage from this paper (note: all quotes taken from other sources in this post will be in Arial):

"When the biosocial perspective began to emerge, and biosocial criminologists began searching for biosocial samples that could be analyzed, the Add Health was an obvious choice. The reason is because it is genetically informative, as it includes kinship pairs along with specific genetic polymorphisms. As a result, this sample represented the key dataset that was used (and that continues to be used) by biosocial criminologists. Once biosocial criminologists began to use the Add Health on a widespread basis, a criminological witch-hunt ensued. The Add Health was made to seem as though it had fatal flaws, and some journals, such as Crime and Delinquency, created editorial policies barring any more studies using the Add Health from being published in the journal. Other biosocial critics have argued that biosocial criminologists have overused these data and there simply is nothing else that can be offered from them. Such a view by criminologists is, of course, nothing more than dressed-up rhetoric, particularly when considered against the fact that (1) there are more than 10,000 Add Health users (certainly not all of these are biosocial researchers) and (2) that the National Institutes of Health just awarded the Add Health a $22.7 million grant to collect a fifth wave of data on the Add Health participants. Against this backdrop, outside of biosocial critics, it does not appear that experts in other fields view the Add Health as only being used by biosocial researchers, as being dried up, or as being unimportant." [Emphasis mine, needless to say.]

Yeah, I thought the use of the phrase "witch-hunt" by Beaver et al. (2015) was very strange in a peer-reviewed journal. I mean, it's not new for these researchers to label their critics as ideologically motivated--in fact, the very same paper includes quotes like this: "...some reviewers are ideologically opposed to biosocial research and thus, employ virtually any tactic to provide a harsh critique of the submitted manuscript". But that's nothing compared to this quote from the same paper: "Certain journal editors, for instance, view themselves as gatekeepers of knowledge and strategically prevent biosocial studies from being published in their journals. In order to maintain the guise of being fair and impartial scholars, they hide behind the review process as though it obviates them from being biased against certain bodies of research or from stamping out studies submitted for publication by biosocial criminologists."

Where was I? Oh yeah, the "witch hunt" thing. So as I was saying, biosocial criminologists have giant persecution complexes and love to attack their critics as ideologically/politically motivated, but even by this standard, accusing their critics of an organized "witch hunt" against them seemed pretty weird, to put it mildly. 

What's the context? They're talking about the Add Health dataset, which they note has been used for non-biosocial criminological research for many years already. Biosocial criminologists tried to use this dataset to assess the role of genetic factors in criminal behaviors, because, as Beaver et al. (2015) (hereafter B15) themselves note, "...it is genetically informative, as it includes kinship pairs along with specific genetic polymorphisms." So what is the pushback they're complaining about? Well, the main criticism of the use of Add Health data for "genetically informed" research in criminology seems to have been made by Burt & Simons (2014), who wrote the following:

"...of the identified 20 criminological twin studies published since 2008, 17 used the Add Health data. We do not argue that the genetic twin sample in the Add Health is deficient; indeed, the quality of the data seems to be extraordinary (Harris et al., 2006). We do believe, however, that reproducing findings of similar heritabilities for various criminal‐related traits on the same set of 289 MZ and 452 DZ twin pairs is problematic. Moreover, this means that most recent heritability estimates in criminology have been based on the same imperfect measures (self‐control, delinquent peers, delinquency, and victimization) that are available in the Add Health data."

That's one component of the criticism of genetic research based on Add Health. There's also this complaint B15 make: "...some journals, such as Crime and Delinquency, created editorial policies barring any more studies using the Add Health from being published in the journal." I found this claim hard to believe, so I searched the archives of Crime & Delinquency for any editorial statement even mentioning Add Health (aka the National Longitudinal Study of Adolescent to Adult Health, formerly known as the National Longitudinal Study of Adolescent Health). I found none, but I did find papers using Add Health data that have been published in the journal this year (e.g. this one and this one). So clearly, any "policy" such as the one B15 claim existed at C&D does not exist anymore, assuming it ever did.

Basically, criticism of one's research, even if it includes attempts to prevent the publication of studies that tell us nothing we do not already know, should not be dismissed as an ideologically motivated "witch hunt". Unless of course you have no stronger, scientifically grounded argument with which to defend yourself.

Tuesday, June 26, 2018

A new law

In this post, I coin what I will name Scott's Law, which states:

"As an online discussion about a Supreme Court case grows longer, the probability of the decision's opponents comparing it to the Dred Scott v. Sandford decision approaches 1."

This was partly inspired by the recent op-ed "Move over, Dred Scott" by David D. Cole in The Washington Post, about the Supreme Court upholding Trump's travel ban.


There are plenty of examples to be found if you look around a bit, such as Keith Olbermann and Citizens United (Olbermann's slamming of that decision was another major inspiration behind my coining of this law), Alan Grayson (also talking about Citizens United), Keith Ellison talking about the same Supreme Court case regarding Trump's travel ban as above, Rick Santorum about Obergefell v. Hodges, Ben Shapiro on Obamacare, and of course, Roe v. Wade and conservative Christians.


As with Godwin's Law, there are some situations where comparing a case you don't like to Dred Scott makes sense, and it's not automatically an invalid comparison. After all, Dred Scott is widely regarded by historians as among the worst Supreme Court decisions ever handed down (alongside Plessy v. Ferguson).

Thursday, May 24, 2018

A tale of two rejections

So far this year, I have submitted 4 articles to respected peer-reviewed journals: 2 to Intelligence, 1 to the Journal of Criminal Justice (JCJ), and 1 to Crime & Delinquency (CD). Of these 4 articles, 2 of them (1 submitted to Intelligence and the one submitted to the JCJ) have been rejected,* and it is these articles whose story I want to tell in this post (why am I saying "whose" when I'm talking about academic papers, not people? It sounds weird, but I can't think of a less awkward way to word this sentence, so whatever.)

Let's start with the paper I submitted to Intelligence, which was actually based in large part on this post (which I posted here in December 2017). I actually copied and pasted the post into the Word document I submitted to the journal (but of course, I modified it a lot before submitting it--what, do you think I'm that stupid? Hah!). Anyway, I submitted it in March of this year--while writing this post I actually dug back into my emails and discovered that this paper was submitted on March 4. I also discovered that the title of the submission was "A scientific critique of four arguments made in support of hereditarianism".

It was rejected the next day, with no opportunity to revise and resubmit, which of course was depressing and discouraging, but not nearly as much as it would've been if I'd had a lot more experience with this entire process. Weirdly, though I could find the email confirming that they'd received my submission, I couldn't find the email saying they rejected it. But I do remember the gist of the reasons it was rejected: the editor-in-chief of Intelligence, Richard Haier, said that he thought my review of the literature was too selective. He had some other criticisms that I don't remember off the top of my head. So of course after getting this news I just gave up on this particular submission and tried to move on, and I have done so since then without any major obstacles.

So what about the second paper? The one submitted to the JCJ? Well, that one was submitted later in March and was based largely on my previous post criticizing the paper by Walsh & Yun published in the JCJ last year. This paper was submitted at night, before I went to bed, and the next morning, once I woke up and had time to check my email, I discovered that it had already been rejected--in less than 12 hours! The journal's editor, Matt DeLisi, said that it was "out of scope"--which I think is BS, since the original Walsh/Yun paper was no less out of scope than any of the content in my critique. I actually briefly tried to get this decision appealed but gave up after failing to find a remotely effective way to do so.

So I will end this post by answering another obvious question: what about the other 2 papers that I mentioned at the start of the post? Well, they are both still under review: the CD one was submitted almost 2 weeks ago and nothing seems to have happened since then** (which is certainly weird), while the other Intelligence one is also still under review* (less surprising, since I just submitted it last night). The subject of the CD paper is whether the % of suicides committed with guns is a valid proxy for gun ownership over time, and the Intelligence paper is a meta-analysis of the black-white (mean) IQ gap in the US. Another obvious contrast between these submissions: the CD one still says "awaiting reviewer selection" 12 days after I submitted it (on May 13), while the Intelligence one has already had a reviewer assigned! Certainly this is not the kind of experience that will make me enthusiastic about submitting to another SAGE journal in the future, to say the least.

*Update 5/25/18: the other Intelligence submission was just rejected as well.
**Update 6/20/18: the CD one has also been rejected.

Friday, May 18, 2018

Me and OpenPsych

OK, I feel like this is important enough to address here. I'd been wondering whether I should address it at all, and I'm now convinced that people deserve to know what happened.

So let's start from the beginning: despite the fact that I was very suspicious of many aspects of the OpenPsych journals (which I have criticized on this blog before), and of Emil Kirkegaard, the grad-school dropout who founded them, I still decided to submit a paper to one of these journals. Why would I choose to associate myself with someone with views such as his (most notably his alleged support for child rape, as discussed in detail elsewhere)? The answer, basically, is that I thought that if I could get a well-designed paper published there--one that did not conform to the hereditarian ideology of Kirkegaard and many other members of the journals' "editorial boards"--I could turn the journal around and get it taken seriously as a legitimate outlet for peer-reviewed research. I also hoped I could offer advice to make the journals operate more like legitimate peer-reviewed outlets should, and give Kirkegaard a chance to respond to criticisms of the way they work.

As time went on, however, I decided that it was very unlikely that I would be able to move this journal from the category of scientific-racism echo chamber to that of respected new (albeit unusual) journal, and I didn't want my own reputation to suffer unnecessarily. Eventually I decided I should try to publish my paper (which can be viewed on OSF here) in a better journal: one that doesn't require me to come up with desperate justifications for associating myself with it. I haven't submitted it yet but I hope to do so soon.

I hope this is an adequate explanation of why I would associate myself with someone like this and with his ideological ilk. I agreed with the hereditarians behind this journal on almost nothing, but I hoped to help them advance scientific inquiry in an unbiased manner; I no longer believe that to be likely enough to justify working with them.

Friday, May 11, 2018

A roundup of criticisms of Richard Lynn's controversial intelligence "research"

You know that guy Richard Lynn, former emeritus professor of psychology at the University of Ulster (in Northern Ireland) who had his "emeritus" title revoked by the University last month? The guy whose website says that he has found that "the average IQ of blacks in sub-Saharan Africa is approximately 70" and that "men have a higher average IQ than women by about 5 IQs [sic] points"? The guy who is one of only a few academics both able and willing to argue in favor of eugenics (seriously, I am not making this up)? Who once said that it is “inevitable that whites will become a minority in the US sometime in the middle decades of the next century and this will entail a considerable deterioration in the quality of social, cultural and economic life"? 

Well, he has published a large number of articles in peer-reviewed journals (some reputable, others not), as well as a bunch of books. These books contain arguments about putative race/sex differences in intelligence and other traits that Lynn claims are based on strong empirical evidence. But are they really? Many researchers say "no!" This post will compile examples of researchers criticizing Lynn's work on controversial topics. Note: I previously did something similar with J. Philippe Rushton's "research" in this post.



  1. Kamin (1995), in his review of the Bell Curve (which includes citations to several of Lynn's articles), had this to say about Lynn's 1991 article in the racist journal Mankind Quarterly (full text here): "Lynn's 1991 paper describes a 1989 publication by Ken Owen as "the best single study of the Negroid intelligence." The study compared white, Indian and black pupils on the Junior Aptitude Tests; no coloured pupils were included. The mean "Negroid" IQ in that study, according to Lynn, was 69. But Owen did not in fact assign IQs to any of the groups he tested; he merely reported test-score differences between groups, expressed in terms of standard deviation units. The IQ figure was concocted by Lynn out of those data. There is, as Owen made clear, no reason to suppose that low scores of blacks had much to do with genetics: "the knowledge of English of the majority of black testees was so poor that certain [of the] tests... proved to be virtually unusable." Further, the tests assumed that Zulu pupils were familiar with electrical appliances, microscopes and "Western type of ladies' accessories."...The test's developer, John Raven, repeatedly insisted that results on the Progressive Matrices tests cannot be converted into IQs. Matrices scores, unlike IQs, are not symmetrical around their mean (no "bell curve" here). There is thus no meaningful way to convert an average of raw Matrices scores into an IQ...A. L. Pons did test 1,011 Zambian copper miners, whose average number of correct responses was 34. Pons reported on this work orally; his data were summarized in tabular form in a paper by D. H. Crawford-Nutt. Lynn took the Pons data from Crawford-Nutt's paper and converted the number of correct responses into a bogus average "IQ" of 75. Lynn chose to ignore the substance of Crawford-Nutt's paper, which reported that 228 black high school students in Soweto scored an average of 45 correct responses on the Matrices--HIGHER than the mean of 44 achieved by the same-age white sample on whom the test's norms had been established and well above the mean of Owen's coloured pupils." [Emphasis mine.] (For a toy numerical illustration of the SD-units-to-IQ conversion Kamin describes here, see the sketch at the end of this list.)
  2. Wicherts et al. (2010a): "On the basis of the samples he deemed representative, Lynn concluded that the average IQ of sub-Saharan Africans stands at 67 when compared to UK norms after a correction of the Flynn Effect. We criticize his methods for being unsystematic...Lynn's methods in selecting samples remain unsystematic; he is inconsistent in his reasons to exclude samples, and too unspecific to allow replication by independent raters...Lynn asserts that his conversion method from CPM scores to SPM norms is unproblematic, because ceiling effects are absent. This assertion is untenable." [Emphasis mine.]
  3. Zuckerman (2003): "Lynn's claim that certain races or ethnic groups have a higher incidence of psychopathic personality is not substantiated by large scale community studies in America that show no differences between these groups in the diagnosis of antisocial personality disorder. No consistent racial differences are found in traits closely associated with psychopathy, sensation seeking and psychoticism, and, Lynn to the contrary, the Psychopathic Deviate scale of the MMPI." (There's a lot more criticism of Lynn in the rest of this paper, which makes sense since its title describes it as a "critique of Lynn (2002)", which is a paper on race and psychopathic personality. Anyway, read the rest of it if you want to hear more of Zuckerman's criticisms of Lynn (2002), and the rest of Lynn's work as well.)
  4. Hill (2002): "Finding a modest yet statistically significant correlation between skin tone and vocabulary test scores among African Americans, Lynn (2002) concludes that “intelligence in African Americans is significantly determined by the proportion of Caucasian genes” (p. 365). In this reanalysis of Lynn's data, I demonstrate that his bivariate association disappears once childhood environmental factors are considered. Therefore, a genetic link between skin color and intelligence among African Americans cannot be supported in his data."
  5. Wicherts et al. (2010b): "On the basis of several reviews of the literature, Lynn [Lynn, R., (2006). Race differences in intelligence: An evolutionary analysis. Augusta, GA: Washington Summit Publishers.] and Lynn and Vanhanen [Lynn, R., & Vanhanen, T., (2006). IQ and global inequality. Augusta, GA: Washington Summit Publishers.] concluded that the average IQ of the Black population of sub-Saharan Africa lies below 70...The assertion that the average IQ of [Black Sub-Saharan] Africans is below 70 is not tenable, even under the most lenient of inclusion criteria...our extensive search for relevant studies resulted in additional studies of IQ in Africa that Lynn (and Vanhanen) missed. This was partly caused by the fact that we had access to African journals that did not show up in Lynn (and Vanhanen)'s work. Because Lynn (and Vanhanen) missed a sizeable portion of the relevant literature, their estimate of average IQ of Africans is clearly too low [sic]." [Emphasis mine.]
  6. Wicherts et al. (2010c) (note: this is a response to a response by Lynn and Gerhard Meisenberg to the conclusions of Wicherts et al. (2010b)): "Lynn and Meisenberg's assessment of the samples' representativeness is not associated with any of the objective sampling characteristics, but rather with the average IQ in the sample. This suggests that Lynn and Meisenberg excluded samples of Africans who average IQs above 75 because they deemed these samples unrepresentative on the basis of the samples' relatively high IQs. We conclude that Lynn and Meisenberg's unsystematic methods are questionable and their results untrustworthy."
  7. Volken (2003) (note: this is a review of IQ and the Wealth of Nations, a 2002 book co-authored by Lynn and political scientist Tatu Vanhanen): "Recently Richard Lynn and Tatu Vanhanen have presented evidence that differences in national IQ account for the substantial variation in national per capita income and growth. However, their findings must be considered as highly problematic. The authors neither make use of state‐of‐the‐art methodological techniques nor can they substantiate their theoretical claims. More precisely the authors confuse IQ with human capital and fail to adequately discuss the causal sequence of their argument."
  8. Lichten (2008): "In a recent article in this journal [i.e. the Journal of Biosocial Science], Lynn et al. (2007) found a high correlation between average national IQs and achievement test scores in 67 countries and concluded ‘The correlation is so high that national IQs and educational achievement appear to be measures of the same construct.’ The author finds here the data do not support this conclusion."
  9. Skeem et al. (2003) (note: this is a critique of Lynn (2002)): "Lynn's analysis is problematic on three primary counts. First, he equates psychopathy with generalized antisocial behavior and social deviance and fails to distinguish longstanding personality-based from behavior-based conceptions of this syndrome. Second, Lynn presumes rather than demonstrates that genetic factors explain race differences in antisocial behavior and social deviance, neglecting such potential alternative explanations as socioeconomic status and measured verbal intelligence. Third, Lynn presents an evolutionary explanation for putative racial and ethnic group differences in psychopathy that fails to reflect current methods and practices of evolutionary biology and genetics."
  10. Beraldo (2010) was one of many articles published in Intelligence critiquing the conclusions of Lynn (2010), a study which claimed that regional IQ in Italy is strongly correlated with income, mortality, stature, and other variables. Beraldo focuses specifically on the putative IQ-income correlation in Lynn's paper, arguing that "Lynn's analysis is not sufficiently robust to support its conclusions." Later, he says: "A critical point which makes the results of Lynn unconvincing, is that they are not grounded on a clear distinction between correlation and causation."
  11. Cornoldi et al. (2010) was another article in the same category as 10). It focuses on the validity of the school assessment data that Lynn used to calculate his "IQ" scores for various areas of Italy, making the following 4 points: "1) school measures should be used for deriving IQ indices only in cases where contextual variables are not crucial: there is evidence that partialling out the role of contextual variables may lead to reduction or even elimination of PISA differences; in particular, schooling effects are shown through different sets of data obtained for younger grades; 2) in the case of South Italy, the PISA data may have exaggerated the differences, since data obtained with tasks similar to the PISA tasks (MT-advanced) show smaller differences; 3) national official data, obtained by INVALSI (2009a) on large numbers of primary school children, support these conclusions, suggesting that schooling may have a critical role; 4) purer measures of IQ obtained during the standardisation of Raven's Progressive Coloured Matrices also show no significant differences in IQ between children from South and North Italy."
  12. Cornoldi et al. (2013) is a response to some of Lynn's responses to criticisms of his 2010 paper on the IQ in Italy. It states: "...the use of PISA data to make inferences about regional differences in intelligence is questionable, and in any case, both PISA and other recent surveys on achievement of North and South Italy students offer some results that do not support Lynn's conclusions."
  13. Moreale & Levendis (2014): "We re-examine Lynn and Vanhanen's argument that gross domestic product (GDP) depends upon IQ...education has a stronger impact on GDP than does IQ, whose effect we find to be insignificant. In other words, it is a country's actual human capital, rather than its potential human capital, which determines its GDP. In short, we are unable to replicate their results."
  14. Parker (2004) was a paper written by Macon Paul Parker, who was then an undergraduate in his senior year at the College of Charleston. It is dedicated largely to critiquing and reanalyzing Lynn (1999), a study claiming a statistically significant correlation between higher intelligence and a lower number of children and siblings. Parker incorrectly talks about this paper as though it was published in 2000, for whatever reason. Anyway, Parker concludes that "Lynn argues that his analysis provides evidence of dysgenic fertility for the first two generations in the twentieth century in the United States (Lynn). However, a substantial body of evidence exists to refute such claims. The re-specification and reanalysis reveals that the lack of control for other variables which influence fertility reveals that education and socioeconomic status are not merely proxies for intelligence, but play an important, spurious role in relating intelligence and fertility...education plays a more important role in determining number of children, before race, age, sex, socioeconomic status and intelligence."
  15. Huang & Hauser (2000): "Using aggregate data from the General Social Survey (GSS), 1974-1996, Lynn (1998) claims that the Black-White intelligence difference in the United States has not been narrowing over time. We replicate Lynn’s analysis and challenge his conclusion by identifying several methodological problems. By analyzing changes in Black-White differences in the GSS vocabulary test across survey years, rather than birth cohorts, Lynn overlooks both the duration and the significance of the Black-White convergence."
  16. Thomas (2011): "In the early 1990s, psychologist Richard Lynn published papers documenting average reaction times and decision times in samples of nine-year-olds taken from across the world. After summarizing these data, Lynn interpreted his results as evidence of national and racial differences in decision time and general intelligence. Others have also interpreted Lynn's data as evidence of racial differences in decision time and intelligence. However, comparing Lynn's summaries with his original reports shows that Lynn misreported and omitted some of his own data. Once these errors are fixed the rankings of nations in Lynn's datasets are unstable across different decision time measures."
  17. Robinson et al. (2011): "...data on Italian regional differences in educational achievement obtained in a much larger INVALSI study of 2,089,829 Italian schoolchildren provide unequivocal evidence that Lynn's educational achievement measure [based on OECD tests] is not a valid index of IQ differences. More generally, the lengthy literature review in Lynn's article reveals uncritical acceptance of reported correlations between any putative index of IQ and socio‐economic variables. Any measure of cognitive performance that is correlated with IQ is considered a measure of IQ, even if there is only a weak correlation. All correlations between such measures and socio‐economic or public health variables are viewed as evidence of direct causal relationships. In all cases, causality is assumed to be in the direction that supports Lynn's doctrine when it would be equally valid to argue that socio‐economic and public health differences cause differences in the performance of IQ tests." [Emphasis mine.]
  18. Felice & Giugliano (2011): "...the evidence presented by the author [i.e. Lynn] is not sufficient to say that the IQ of Southern Italians is lower than the one of Northern Italians...his analysis does not prove that there is any causal link between what he defines as IQ and any of the variables mentioned...there is no evidence that the alleged differences in IQ are persistent in time and, therefore, attributable to genetic factors."
  19. Daniele (2015): "This paper has examined Lynn's (2010a) hypothesis that socio-economic inequalities between the Italian regions are explained by genetically-rooted differences in average intelligence...Results show how both IQ and math test scores are strongly related to current socio-economic development of Italian regions. But, when historical data on income, infant mortality or life expectancy are used, a different picture emerges: the correlations are insignificant, weak, or, as in the case of infant mortality, do not support the suggested link between “regional intelligence” and socio-economic development at all." [Bolded emphasis mine.]
  20. Daniele & Malamina (2011): "Socioeconomic disparity between North and South Italy has been recently explained by Lynn (2010) as the result of a lower intelligence quotient (IQ) of the Southern population. The present article discusses the procedure followed by Lynn, supplementing his data with new information on school assessments and per head regional income. Genetic North–South differences are then discussed on the basis of the most recent literature on the subject. The results do not confirm the suggested IQ-economy causal link." [Emphasis mine.]
  21. D'Amico et al. (2012): "Our examination of intelligence test score differences between the north and south of Italy led to results that are very different from those reached by Lynn (2010). Our results demonstrate that by using intelligence tests to assess differences in ability rather than using achievement scores as a proxy for intelligence, children from the south of Italy did not earn lower scores than those from the north of Italy." [Emphasis in original.]
  22. Berhanu (2007): "I review the book IQ and the Wealth of Nations, written by Richard Lynn and Tatu Vanhanen...The essay exposes the racist, sexist, and antihuman nature of the research tradition in which the authors anchored their studies and the deep methodological flaws and theoretical assumptions that appear in their book. The low standards of scholarship evident in the book render it largely irrelevant for modern science. This essay specifically deals with the IQ value of Ethiopian immigrants that came from Israel, used by the authors as representing the National Average IQ of Ethiopia. Most of these immigrants had rudimentary knowledge of literacy, and experienced an abrupt transition from rural Ethiopia to Israel with all the accompanying effects that it entails such as trauma, dislocation, and cultural shock. The test was conducted a few months after their arrival. That specific study, conducted by two Israelis, that assigns low IQ to the immigrants is also replete with technical and statistical errors."
  23. Berhanu (2011): "Lynn's central thesis in Chapter 2 [of his 2006 book Race Differences in Intelligence] is that aspects of physical appearance—phenotype—are outward manifestations of heritable traits such as abilities, propensities for certain behaviours, diseases, and other sociocultural characteristics. He attempts to demonstrate that in Chapters 3-17. These are all futile attempts, however, given the state of the art and current genetic research...Not only are the relevant genetic data absent, but the distribution of polygenic phenotypes does not suggest that race is a useful category. On this very shaky ground, American society has created social arrangements and public policies that assume that race is a real phenomenon and that distinct racial populations exist. And still worse, in the name of science, Lynn advances his continued essentialist position that race is real."
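
Finally, since several of the critiques above (especially Kamin's in item 1 and Wicherts et al.'s in items 2 and 5) revolve around how test-score gaps reported in standard deviation units get turned into headline "IQ" figures, here is a quick toy sketch of that generic arithmetic and of why it breaks down when raw scores aren't normally distributed. To be clear: the numbers and the sd_gap_to_iq function below are made up purely for illustration--this is the generic conversion the critics describe, not a reconstruction of Lynn's actual calculations.

# Toy sketch (with made-up numbers) of the generic "SD gap -> IQ" conversion
# that critics like Kamin describe. NOT Lynn's actual calculation.
import statistics

def sd_gap_to_iq(gap_in_sd, reference_mean=100.0, sd=15.0):
    # Nominal IQ of a group scoring gap_in_sd standard deviations below the
    # reference group, assuming (crucially!) normally distributed scores
    # with the conventional IQ mean of 100 and SD of 15.
    return reference_mean - gap_in_sd * sd

print(sd_gap_to_iq(2.0))  # 70.0 -- a ~2 SD gap yields the kind of figure Lynn reported

# The catch Kamin points to: Raven's Matrices raw scores are bounded counts
# of correct answers and are not symmetric around their mean. For a skewed
# distribution the mean and median diverge, so placing an *average* raw
# score on the symmetric IQ scale misplaces the typical test-taker:
skewed = [1] * 50 + [2] * 25 + [3] * 12 + [10] * 13  # toy right-skewed scores
print(statistics.mean(skewed))    # 2.66
print(statistics.median(skewed))  # 1.5

Obviously this is just the arithmetic; the substantive problems (unrepresentative samples, language barriers, cherry-picked studies, and so on) are laid out in the quotes above.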