You can find our results on the slides below, or you can look at the pdf here. Please be warned that the figures are not publication quality. As you will see, the labels are sometimes in awkward places or weirdly formatted. However, I think the results are fairly robust, and at this point they are unlikely to change much.
The brief summary is that, after some back and forth, we managed to identify the origin of the difference between the two data sets. In the end we find a gender difference that is not as large as Strumia et al found in the Inspire data, and not as small as we originally found in the arXiv data. The male/female ratio for the citations normalized to authors is about 1.5 for both the arXiv and the Inspire data.
We then tried to find out where the difference comes from. This is not all that obvious because the particular measure that Strumia used combines various kinds of data. Eg, it depends on how frequently authors collaborate, how many papers they publish, and how much those papers are cited.
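For concreteness, the measure in question can be sketched in a few lines of Python. This is a minimal illustration only; the record format and field names are made up here, not taken from the actual analysis:

```python
def author_index(papers):
    """Sum, over one author's papers, of each paper's citation
    count divided by its number of authors.

    papers: list of dicts with (hypothetical) keys
    'citations' and 'n_authors'."""
    return sum(p["citations"] / p["n_authors"] for p in papers)

# Illustrative example: a single-authored paper with 30 citations
# contributes 30, a three-author paper with 30 citations contributes 10.
papers = [
    {"citations": 30, "n_authors": 1},
    {"citations": 30, "n_authors": 3},
]
print(author_index(papers))  # 40.0
```

The example already shows why the measure mixes several kinds of data: the same citation count contributes very differently depending on how many coauthors share it.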
We know that the total number of citations is comparable for men and women. It turns out that part of the reason why women have a lower score when one counts the total citations divided by the number of authors is that women write (relatively) fewer single-authored papers than men.
This, however, does not explain the entire difference, because if you look at the citations per single-authored paper (ie, without summing over all papers), then women also get fewer citations.
We then looked at where those citations are (or are not) coming from, and found that both men and women cite single-authored papers with female authors at a lower frequency than you would expect from their share among the citeable papers. It turns out that in the past 20 years the trend in women-to-women citations (single-authored papers only) has gone up, while for men-to-women citations it has remained low.
It is not a huge difference, but since there are so many more men than women in those fields, the lack of citations from male authors to female authors has a big impact on the overall number of citations that women receive.
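The comparison described above can be sketched as an observed-to-expected ratio. All numbers and names here are illustrative, not taken from our data:

```python
def citation_ratio(citations_to_women, total_citations, female_share):
    """Hypothetical observed/expected ratio: the fraction of a group's
    citations that go to female-authored (single-author) papers,
    divided by the female share among the citeable papers.
    A value of 1.0 means women are cited in proportion to their
    share; below 1.0 means they are under-cited."""
    observed = citations_to_women / total_citations
    return observed / female_share

# Illustrative: if 6% of a group's citations go to female-authored
# papers while women wrote 10% of the citeable papers, the ratio
# is approximately 0.6, i.e. under-citation.
print(citation_ratio(60, 1000, 0.10))
```

Because men cast the large majority of all citations, even a modest under-citation ratio on the male side translates into a sizeable dent in the total citations women receive.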
In all those analyses, we have removed authors who have not published a paper in the past 3 years or who have fewer than 5 papers in total. This is to prevent the higher percentage of dropouts among women from pulling down the female average.
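The filtering criteria can be written out explicitly. Again a hedged sketch: the function name and input format are my own, chosen only to illustrate the two conditions:

```python
def keep_author(pub_years, min_papers=5, recent_window=3, reference_year=2019):
    """Keep an author only if they have at least `min_papers` papers
    in total AND at least one paper within the last `recent_window`
    years before `reference_year`.

    pub_years: list of publication years of the author's papers
    (hypothetical input format)."""
    if len(pub_years) < min_papers:
        return False
    return max(pub_years) >= reference_year - recent_window

# Illustrative examples:
print(keep_author([2016, 2017, 2017, 2018, 2019]))  # True
print(keep_author([2010, 2011, 2012, 2013, 2014]))  # False: no recent paper
print(keep_author([2018, 2019]))                    # False: too few papers
```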
One of the most frequent questions I get when I speak about our bibliometric stuff (not only this, but also our earlier works) is what my own scores are on the various indices. I usually answer this question with “I don’t know.” We don’t dig around in the data and look for familiar names. Once we have identified all of an author’s papers, we treat authors as numbers, and besides this, you don’t normally browse data tables with millions of entries.
Having said this, I have come to understand that people ask this question to figure out what my stakes are, and if I do not respond, they think I have something to hide. Let me therefore just show you what my curve looks like if you look at the index that Strumia has considered (ie the number of citations divided by the number of authors, summed up over time), because I think there is something to learn from this.
(This is the figure from the Inspire-data.)
Besides hoping to erase the impression that I have a hidden agenda, the reason I am showing you this is to illustrate that you have to be careful when interpreting bibliometric measures. Just because someone scores well on a particular index doesn’t mean they are hugely successful. I am certainly not. I am 42 years old and have a temporary position on a contract that will run out next year. I may be many things, but successful I am not.
The reason I do well on this particular index is simply that I am an anti-social introvert who doesn’t like to work with other people. And, evidently, I am too old to be apologetic about this. Since most of my papers are single-authored, I get to collect my citations pretty much undiluted, in contrast to people who prefer to work in groups.
I have every reason to say that the measure Strumia proposes is a great measure and everyone should use it, because then maybe I’d finally get tenured. But if this measure became widely used, it would strongly discourage researchers from collaborating, and I do not think that would be good for science.
The take-away message is that bibliometric analysis delivers facts but the interpretation of those facts can be difficult.
This research was supported by the Foundational Questions Institute.