The New Scoring System.

So let’s see, how much can I say about this topic. Summary statements (SS) arrived while I was on vacation. Thankfully I couldn’t open them from abroad.

These SS are from a proposal that was scored under the old scoring system as an A1 submission, and scored again using the new system as an A2 submission. AND IT WENT BACKWARDS- by a lot… to the 40 something percentile. That’s an impact/priority score in the 50s using the new system. How can a proposal that I thought was better- and addressed all the critiques from the previous 2 rounds of review… do WORSE???  Apparently the reviewers didn’t think it was better, and that’s the only thing that counts.

Because everyone keeps asking me what one gets back as a review now that there is a new format, I’m gonna tell y’all. First, I got an overall impact/priority score and percentile ranking, followed by a short paragraph summarizing the review panel discussion. The description of the project (provided by me) followed, as did the public health significance section. I got three individual critiques, in a standard format. Each individual critique began with individual scores (on a 1-9 scale, I think, although it doesn’t say) in each of the following categories: significance, investigator, innovation, and approach. In the next section of each critique, each of the areas that I listed in the previous sentence, as well as ‘overall impact’, was evaluated by its strengths and weaknesses, given as 2-3 bullet points for each. So, for example, one of the critiques had 5 bullet points total for ‘overall impact’, pretty equally divided between strength and weakness categories.

How the individual 1-9 scores in each category were added, subtracted, calculated, averaged- or what-the-hell-ever was done to them- to arrive at the big number … the ‘impact/priority’ score, was unclear to me. I’ll have to poke around and see if there is an explanation of how this works on some NIH website, or maybe writedit can explain it to me. It seems odd that I didn’t get an average 1-9 score (averaged out from the 3 reviewers maybe?) for each individual category (significance, etc.), or an individual score in each of the areas from the whole panel.  Looking at the scores from the individual critiques for each category, the scores seem to be all over the map.  Furthermore, the bullet points and the score in a category sometimes don’t match. I mean, do you give someone a 2 for innovation- then say ‘approach really isn’t very different from previous work’??? Cause if 1 is the best score you can get for innovation, or any other category for that matter, a score of 2 for innovation should be pretty fricking innovative. But WHATEVER.

My overall impression of this new arrangement? First, I thought the critiques were quite general… just lacking in specificity. And more blunt. If I were going to turn this proposal around again, I would find it hard to address these very broad brush strokes. I also think it would be harder to determine when a proposal should really be dumpster bound, because those blunt general criticisms appear more negative than they really are. Perhaps.

Second, I’m uncertain how to define ‘significance’. What I find significant for work in my field, colleagues in my field might not find so significant, so where does the consensus lie? For example, I think it would be hugely significant to definitively describe, for the first time, a set of genes that are important for a particular bacterium to live in a particular niche that is critical for its ability to cause disease. But colleagues in my field might say: who cares about your pathogen-du-jour? Such pathogen-centric results aren’t significant in the context of the wider field of microbiology. Or we can come back to the whole hypothesis-driven vs. descriptive argument again. We already know that many scientists value studies that determine a molecular mechanism over other studies that uncover the potential for a new molecular mechanism (since such studies by definition would be descriptive). Who’s to say which idea of significance should get a score of 1, and which should get a 9?

Third, I spent such enormous effort learning how to read the reviews in the old system, figuring out what was meant by each sort of ‘stock’ critique- there will definitely be a learning curve to deciphering this, in all its generality, in the new scoring system.


10 thoughts on “The New Scoring System.”

  1. The only reviewers who give individual scores for the different categories are the assigned reviewers. They–along with all the other members of the panel–also give an overall score, which is averaged and multiplied by ten to give the final overall score that gets percentiled. There is no requirement for any particular numerical relationship between the specific individual category scores and the overall score given by each assigned reviewer. So, an assigned reviewer could theoretically give you all 2s and 3s on the individual categories, and then give your application an overall 5.

  2. CPP- Could someone tell me how this:

    ‘There is no requirement for any particular numerical relationship between the specific individual category scores and the overall score given by each assigned reviewer. So, an assigned reviewer could theoretically give you all 2s and 3s on the individual categories, and then give your application an overall 5.’

    makes sense?

  3. It does not.

    Based on what you and others have said, I am getting concerned about whether the reviews are really providing useful feedback. Combined with only a single revision being allowed, I wonder if I will ever get an R01!

  4. DrDrA – The powers-that-be at NIH are very explicit about the lack of a relationship between the individual category scores and the overall impact score. They are providing the “individual categories” as a way for the assigned reviewers to “communicate their intention better” to the applicant. The overall impact score is only “related” in the sense that assigned reviewers are self-consistent.

    As an example from a recent study section, we had several cases where the individual scores were all very good (1’s and 2’s) except for one person who found a fatal flaw in one aspect (and gave that aspect a 6), from which the overall impact score was a 6. Since that reviewer convinced the entire study section that he was right, the poor applicant got a 60.

    Also, we found at the study section that no one knew how to judge the scoring system. It sounded like NIH wanted the scores to translate non-linearly (something like 10 = 1.0, 20-70 = 2.0-3.5, and 80-90 = >3.5) but my study section translated the numbers very linearly (new = 2*old). This caused a lot of problems because some grants were scored on the non-linear system and others on the more linear system.

    The other issue that I would like to point out is that as a reviewer I found it very difficult to turn things that were both positives and negatives into bullet points. For example, the 2 with “it’s a standard technique” might be because it’s a standard technique that works really well. People also found it very difficult to discuss from their bullet points. (They kept missing key facts that always got coded into the paragraphs.)

    There’s no question that the new system is very different from the old and there’s going to be a lot of pain and suffering as we figure it out.

    The more I used the old system, the more it made sense to me. I learned how to interpret the difference between a 1.4 and a 1.6, between a 2.4 and a 2.6. I learned what code words were hidden in paragraphs, and I learned when things had to change and how in a revision. As a reviewer, I always found that my scores (even when I was sure they were way out of whack) matched my colleagues surprisingly well. This time none of those things were true. From a reviewer standpoint, I found myself liking this review structure less and less the more I used it.

  5. DrDrA–It is important to bear in mind that the overall score is called an “impact” score. According to NIH: “Each member’s impact score will reflect his/her evaluation of the overall impact that the project is likely to have on the research field(s) involved.” The overall score is not merely reflecting the quality of the proposal, but the likelihood that this work will meaningfully advance the field. Thus, each individual aspect of the proposal may be OK, but the whole can be less than (or greater than) the sum of the parts.

    That said, CPP’s precise scenario is unlikely in practical terms. What is more likely is that an application can get very good scores (1-3) on 4 criteria, but the fifth sinks the grant. One example is the aforementioned scenario where a fatal flaw in “approach” garners a 6 for that item but also a 6 overall. Similarly, an extremely well-written, well-supported application might be viewed as having limited significance (e.g., a bunny-hopping grant that ends up in a non-bunny-hopping study section). On the other hand, a grant might get a 5 or 6 on innovativeness but the reviewer feels that the methods are appropriate and sound and the work needs to be done, so the overall impact is given a 2.

    Though I am still getting comfortable with it, I like the new scoring system. Can’t say the same for the bullet point critique format.

  6. N-c the trouble is that the subscores are essentially meaningless info with no functional purpose other than to confuse the hell out of applicants. Heckuva job NIH!

  7. bikemonkey – The subscores can be useful as a communication medium for a reviewer to try to tell an applicant where the problem is. (As in my earlier example of a fatal flaw.) Partially, I think this is supposed to counteract the weakening of the communication due to the stupid bullet-point idea. Personally, I think that the paragraphs communicated what needed to be fixed just fine.

    There’s nothing in the review document that says we have to write bullet points separated by “strengths” and “weaknesses”. I suggest that we all write paragraphs again. Viva la revolution!

  8. I’ve had reviewers look at a grant and one say it was too broad while the other said it was too narrow. We resubmitted several times (not to NIH), each time addressing all the reviewers’ comments, but to no avail. My take on this and some other similar situations is that reviewers are just not excited by the proposal: wanting to fund something else instead, they deny it on any tiny (or non-existent) reason. My conclusion: if your proposal gets kicked back for stupid reasons that nobody can explain, re-write it, putting more emphasis on how your work is great value for money and will benefit mankind hugely. Also make it easier to follow as the reviewers will already be tired and glazing over by the time they start to read. As to the changes in the review process – I ask myself whether deep down I really believe the changes will affect me at all?

  9. I just went through the exercise of reading the summary statement tea leaves for my shiny new investigator A0 (35th percentile). The new format was as confusing as everyone made it out to be, with no correlation between the subscores and the overall impact. The bullet point critiques for at least one reviewer were uninterpretable. Several conversations with colleagues on study sections offered up a range of experiences with the new system. Most reviewers attempted to assign individual subscores more or less in line with the overall impact of the proposal; so if the overall impact were in the 3s, then the subscores were in the 1-4 range with more 3/4s.

    At least one of my reviewers did attempt to communicate critiques in a transparent manner despite the charge to be brief. The most informative bit was a long statement to the applicant appended to the bullet point critiques. The other reviewers were much more vague though. What helped was the conversation with the PO who went over the discussion notes.
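
For anyone who wants the arithmetic from comment 1 spelled out, here is a rough sketch in Python. It only encodes what that comment says: every panel member gives an overall score on the 1-9 scale, those scores are averaged, and the average is multiplied by ten. The function name, the input checks, and the rounding to a whole number are assumptions made for illustration, not NIH's documented procedure. Note that the assigned reviewers' category scores (significance, approach, and so on) never enter the calculation.

```python
def overall_impact_score(panel_scores):
    """Average the panel's overall scores and multiply by ten.

    panel_scores: one overall score (1-9, 1 is best) per panel member.
    Rounding to a whole number is an assumption for this sketch.
    """
    if not panel_scores:
        raise ValueError("need at least one panel score")
    if any(not 1 <= s <= 9 for s in panel_scores):
        raise ValueError("overall scores are on a 1-9 scale")
    mean = sum(panel_scores) / len(panel_scores)
    return round(mean * 10)


# Comment 4's scenario: one reviewer finds a fatal flaw, gives a 6,
# and convinces the whole (hypothetical 20-member) panel to do the same.
fatal_flaw_panel = [6] * 20
print(overall_impact_score(fatal_flaw_panel))

# A panel that does not agree: the result is just the mean times ten.
split_panel = [2, 3, 5]
print(overall_impact_score(split_panel))
```

Under these assumptions the first panel lands on a 60, matching the "poor applicant got a 60" example, no matter how good the other category scores were.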
