To train the network, we took a random sample of authors and asked the network to predict these authors’ publication data. In each cycle the network learned how good or bad its prediction was and then tried to further improve it.
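In code, one cycle of this kind of training might look roughly like the sketch below. This is purely for illustration: the feed-forward architecture, the author features, and the training details are placeholders, not necessarily the setup used in the paper.

```python
# Illustrative sketch only: a generic feed-forward regressor mapping
# author features (hypothetical inputs) to a citation-based score.
import torch
import torch.nn as nn

n_features = 16  # hypothetical number of features describing an author's past record

model = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),          # output: the predicted score
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(features, targets, epochs=100):
    """features: (n_authors, n_features) tensor; targets: (n_authors, 1) tensor."""
    for _ in range(epochs):
        optimizer.zero_grad()
        prediction = model(features)         # the network makes a prediction...
        loss = loss_fn(prediction, targets)  # ...learns how good or bad it was...
        loss.backward()
        optimizer.step()                     # ...and adjusts its weights to improve
```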
Concretely, we trained the network to predict the h-index, a measure of citation impact: the largest number h such that a researcher has h papers with at least h citations each. We didn’t use this number because we think it’s particularly important, but simply because other groups have previously studied it with neural networks in disciplines other than physics. Looking at the h-index therefore allowed us to compare our results with those of the other groups.
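For concreteness, here is a small Python function that computes the h-index from a list of citation counts. This is just the textbook definition, not code from our analysis.

```python
def h_index(citations):
    """Largest h such that the author has h papers with at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # -> 4
```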
After completing the training, we asked how well the network can predict the citations accumulated by authors that were not in the training group. The common way to quantify the goodness of such a prediction is the coefficient of determination, R². The higher the coefficient of determination, the more of the variation in the actual numbers the prediction accounts for, hence the better the prediction. The figure below shows the result of our neural network, compared with some other predictors. As you can see, we did pretty well!
The blue (solid) curve labelled “Net” shows how good the prediction of our network is when extrapolating the h-index over the number of years. The other two curves use simpler predictors on the same data.
We found a coefficient of determination of 0.85 for a prediction over ten years. Earlier studies based on machine learning found 0.48 in the life sciences and 0.72 in the computer sciences.
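For those who want the formula, R² compares the squared prediction errors to the total variance of the actual values. A minimal Python version, equivalent to what standard routines such as scikit-learn’s r2_score compute, might look like this:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```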
But admittedly the coefficient of determination doesn’t tell you all that much unless you’re a statistician. So for illustration, here are some example trajectories that show the network’s prediction compared with the actual trend (more examples in the paper).
However, the fact that our prediction is better than the earlier ones is only partly due to our network’s performance. It turns out that our data are also intrinsically easier to predict, even with simple measures. You can, for example, just linearly extrapolate the h-index, and while that prediction isn’t as good as the network’s, it is still better than the predictions from the other disciplines. You can see this in the figure above for the coefficient of determination: used on the arXiv data, even the simple predictors achieve something like 0.75.
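To illustrate what such a simple predictor can look like, here is a sketch of a straight-line extrapolation of an author’s h-index history in Python. The numbers and the choice of a first-degree fit are hypothetical; this is not the exact baseline used in the paper.

```python
import numpy as np

def linear_extrapolation(years, h_values, years_ahead=10):
    """Fit a straight line to an author's past h-index history and
    extrapolate it years_ahead into the future."""
    slope, intercept = np.polyfit(years, h_values, deg=1)
    future_year = years[-1] + years_ahead
    return slope * future_year + intercept

# e.g. an author whose h-index grew linearly from 2 to 10 over five years:
print(linear_extrapolation([2010, 2011, 2012, 2013, 2014], [2, 4, 6, 8, 10]))  # ~30
```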
Why that is so, we don’t know. One possible reason could be that the sub-disciplines of physics are more compartmentalized and researchers often stay in the fields that they started out with. Or, as Nima Arkani-Hamed put it when I interviewed him, “everybody does the analytic continuation of what they’ve been doing for their PhD.” (Srsly, the book is fun, you don’t want to miss it.) In this case you establish a reputation early on and your colleagues know what to expect from you. It seems plausible to me that in such highly specialized communities it would be easier to extrapolate citations than in more mixed-up communities. But really this is just speculation; the data don’t tell us that.
Having said this, by and large the network predictions are scarily good. And that’s even though our data is woefully incomplete. We cannot presently, for example, include any papers that are not on the arXiv. Now, in some categories, like hep-th, pretty much all papers are on the arXiv. But in other categories that isn’t the case. So we are simply missing information about what researchers are doing. We also have the usual problem of identifying authors by their names, and haven’t always been able to find the journal in which a paper was published.
Now, if you allow me to extrapolate the present situation, data will become better and more complete. Also the author-identification problem will, hopefully, be resolved at some point. And this means that the predictivity of neural networks chewing on this data is likely to increase some more.
Of course we did not actually make future predictions in the present paper, because in that case we wouldn’t have been able to quantify how good the prediction was. But we could now go and train the network with data up to 2018 and extrapolate up to 2028. And I predict it won’t be long until such extrapolations of scientists’ research careers are used in hiring and funding decisions. Sounds scary?
Oh, I know, many of you are now dying to see the extrapolation of their own publishing history. I haven’t seen mine. (Really I haven’t. We treat the authors as anonymous numbers.) But (if I can get funding for it) we will make these predictions publicly available in the coming year. If we don’t, rest assured someone else will. And in this case it might end up being proprietary software.
My personal conclusion from this study is that it’s about time we think about how to deal with personalized predictors for research activity.
I am proud of my intellectual growth over the decades, finally actually understanding physics through teaching it, and following my star in my research, with the best still ahead....whee! what a privilege, to be alive, and doing physics!
bee:
why should anyone care about such things?
in my day (the 70's) the scientific citation index was used to evaluate the research accomplishment of individuals who were 'up for tenure' but that was eventually abandoned because its methodology is inherently flawed. and ALWAYS, university administrators (department heads, deans, various academic promotion committees) say that they consider the number and size of one's research grants to be a valid measure of the acceptance of a faculty member's work by their peers (this, of course, is an outright lie. grants are counted because of the $ they bring into the university which can be used to pay for so-called 'indirect costs').
the best way to evaluate a researcher's work (not the 'productivity', which has nothing whatever to do with the quality of the work) was stated by Feynman in response to a request for him to evaluate someone's work:
Dear Professor Feynman,
Dr Marvin Chester is presently under consideration for promotion to the Associate Professorship in our department. I would be very grateful for a letter from you evaluating his stature as a physicist. May I thank you in advance for your cooperation in this matter.
Sincerely, D.S. Saxon, Chairman
Dick: Sorry to bother you, but we really need this sort of thing. David S.
Dr. D.S. Saxon, Chairman
Department of Physics
University of California, Los Angeles
Los Angeles, California
Dear David:
This is in answer to your request for a letter evaluating Dr. Marvin Chester’s research contributions and his stature as a physicist.
What’s the matter with you fellows, he has been right there the past few years – can’t you “evaluate” him best yourself? I can’t do much better than the first time you asked me, a few years ago when he was working here, because I haven’t followed his research in detail. At that time, I was very much impressed with his originality, his ability to carry a theoretical argument to its practical, experimental conclusions, and to design and perform the key experiments. Rarely have I met that combination in such good balance in a student. Was I wrong? How has he been making out?
Sincerely yours, R.P. Feynman
btw - hope your book is doing well. it's a fun read (although rather depressing since it is likely to have as much practical impact as the books of Smolin and Woit, which is to say, none). academia, and academic administrators in particular, follows its own rules in selecting faculty for hiring and promotion. i see the future of theoretical physics as rather bleak. our golden era is long gone. more's the pity.
naive theorist
naivetheorist:
You could answer your own question if you ask yourself what has changed since the 1970s. Answer: More administrators asking for ever more quantifiers for research output. That many universities look at how much grant money someone brings in only pushes around the bump under the carpet. (I even comment on this in my book, see appendix.) Because who decides who gets grants? And are you seriously telling me that the reviewers don't care how many papers the applicant has written and how well those were cited?
Besides this, as I wrote in the blogpost and as we also explain in the paper, we looked at the h-index to be able to compare our results with those of other groups. Maybe in the end you will want to predict something else. And maybe you don't like that prediction at all, but that doesn't matter, because someone will do it anyway. Best,
B.
There's also the "corporatification" of big science. It's possible for a person to spend their entire 40 or 50 year research career in the bosom of the LHC. It means that the research activity is exceedingly correlated over the years.
Isn't a predictor an outcome confirmator?
ReplyDeleteJust listening in.
sean s.
"Why that is so, we don’t know" Replace creating paradigms (science) for adding parameters (politics). Displaced parallel trajectories obtain.
https://www.youtube.com/watch?v=grXDKcsv7KA&t=0m30s
… Individual dissent is group failure for suspension bridges without anchorages.
Falsify quantum mechanics! A molecular beam traverses a grating. A given molecule's wavefunction traverses all slits. Tracing paths collapses the pattern - no dissipation occurs. A homochiral molecular beam engages Hund's paradox. Molecules must racemize or give no pattern, or QM is busted. Markus Arndt for apparatus. Exceptionally chiral, multiply-connected (no classical racemization path) molecules are literature syntheses. Or bloody use natural camphor.
QM is validated to 15 decimal places. QM percentage failure is madness - by peer vote. Don’t look.
Dr. H. This article is tangential to your book. I suppose such related work is natural (not meant in the "physics sense") and for a time there will be some interest in what you write, though perhaps not from the physics community. If you are invited to the SF area please let me know. I am a good driver and escort. Ask Ruth Kastner!
Meanwhile your book is wonderful. I am writing a review now and will let you know when it is up. You also suggest my next essay, for there is a rationale... Not all beautiful theories are true, but it will turn out that all true theories are beautiful... At least so I will argue... :)
Lots of citations are nice but the ability to bring in lots of grant money is far nicer. The two are not necessarily correlated.
Tanner,
Not necessarily, but they almost certainly are very strongly correlated.
bee:
ReplyDelete"are you seriously telling me that the reviewers don't care how many papers the applicant has written and how well those were cited?"
i'm saying that the administrators in a university don't care at all about the 'quality' of the faculty (as either researchers or teachers), per se, and the faculty don't care enough to read and judge the work of their colleagues. the university (at least in the U.S.) is corrupt through and through. it's quite nauseating to me.
but i'm old and cranky so i no longer worry about the long term future of theoretical physics or of the university. no, i just read scientific papers (arXiv and the internet are my lifelines), watch movies and observe the devolution of the U.S. (and the free world) under Trump. if the human species is replaced by another (natural or artificial), it'll probably be an improvement.
naive theorist.
I'm visualizing a "Minority Report" scenario where three "pre-impact" neural nets vote on tenure after a year of grad school.
I was a member of a funding committee for about ten years, and yes, I suppose I am some sort of heretic, but I did not look at citations, and only glanced at the number of publications to make sure the person was still alive and functioning. I had two criteria: was the proposed project worth doing (and yes, I know that is subjective, but the applicant had to try and persuade me, and here the BS filter is useful). The second criterion was what had the applicant achieved in the past few years? Was it significant or was it handle-cranking? Yes, that is subjective as well, but again the applicant was supposed to provide evidence here, and again the BS filter was useful. Additionally, I read a few papers, just to check on whether I thought the assessment here was fair. This took an unfortunately large amount of time, and I was glad to finish and find time to do something more productive for me. Of course decisions were made by a panel, and there were interesting games played there, but overall, I felt the system worked well then. My only regrets were at the border of fund/not fund, and that border, in my opinion, was too harsh simply because there was not really enough money. I am sure other panels have faced this problem.
OOOoooohhhh! I JUST THIS VERY MOMENT unwrapped my LOST IN MATH, which I'd pre-ordered, a million years ago! Rather drab in appearance, but full of good stuff, I am sure!
ReplyDeleteHello Dr. Hossenfelder,
A quick comment on statistics, and I feel like a quibbler in this: R^2 is the coefficient of determination, not correlation, and assesses explained variation due to the experimental variable. Derived from the coefficient of correlation, which ranges from -1.0 to +1.0, R^2 cannot be negative, obviously.
I greatly enjoy your style and look forward to reading your book,
Tim
Timothy,
Thanks for your note, I keep confusing these two. I have fixed this in the blogpost.
Hi, I'm new here and an alien from another field of study.
Two points:
1. If your data was incomplete then your network's prediction has no true predictive value on real world, complete data. Machine learning works under certain assumptions, principal among them that the distribution of the data stays the same between seen and unseen data. If this assumption is violated, performance on real world data will drop precipitously.
Of course, if your data *was* complete your network would begin to overfit to noise. To combat this effect you would need *a lot* of data, probably so much you'd never really be able to collect it. We're talking something in the order of hundreds of thousands of examples (i.e. authors). Realistically, that's really hard to get.
You can try to compensate in various ways: regularisation, basically. Unfortunately, even state-of-the-art deep neural nets trained by large corporations with access to (really) large amounts of data and computing power tend to overfit like the blazes.
2. Neural networks (and machine learning classifiers in general) are a rubbish way to make predictions about the real world. For one thing, with a bit of elbow grease it's possible, even simple, to push the accuracy of a prediction to about 80%. But that's not because your model is really good at predicting something. Usually, it's the result of some creative data manipulation.
Now, in a true scientific field I know that manipulating data to get a certain result is somewhat frowned upon. In machine learning on the other hand we are interested exactly in the little "tricks" that can give us good results on our testing data.
Unfortunately, machine learning results are nearly completely meaningless in the real world, for the reasons I highlight above: overfitting and a lack of any guarantee that the distributional consistency assumption will hold when a model is deployed in the real world.
Perhaps the focus of your work should be to point out that real-world predictions using machine learning techniques are tenuous at best and therefore should not be accepted as inevitable, but shown up as useless.
Hi Sabine,
why don't you sell this neural net? (License and maintenance fees.) If someone is silly enough to buy it, why give it away for free? After all, this is your work, and the results of your ideas.
Best,
J.
stassa,
Not sure what your comment about "incomplete data" is supposed to mean. Data about the real world is always incomplete, unless you believe we live in a computer simulation and you got the numbers straight from the programmer.
If we were to make a real prediction we would of course presently have the same problem of incomplete data, with authors not having all of their papers on the arXiv.
We actually do have some hundred thousand authors, but we throw out most of them because (interestingly enough) most authors on the arXiv only have one or two papers (which had never occurred to me).
In any case, the method we are using can in principle include other databases, increasing the amount of data by at least a factor ten. We really just used the arXiv because a) it's easy, b) we're familiar with the terrain, and c) well, you have to start somewhere.
So, as I said, if I can get funding, we'll try to improve on that. But I think it's a good start. Best,
B.
akidbelle,
I understand where you're coming from, but your suggestion doesn't fit with my hopes for this project. I am not interested in making money, I am worried about the current organization of academic research, and I think that the way we currently assess "success" is a big part of the problem. We'll not solve this problem by developing proprietary software for extrapolating publication counts that will create yet more pressure on researchers. Best,
B.
I just finished reading your book this evening. If nothing else your paper reinforces the theme of your book.
Your discussion makes clear for me the problems facing the theoretical physics community.
Sad to say, many of the human foibles that you delineated are present in other academic and professional groups.
Your book will be a valuable addition to my library.