Highlights from Int'l Workshop On Mining Scientific Publications (WOSP2016)

by Mike Lauruhn

Co-located with the Joint Conference on Digital Libraries (JCDL), the 5th International Workshop On Mining Scientific Publications took place on June 22-23 at the Newark, New Jersey campus of Rutgers University. An engaged crowd of about 25 listened to paper presentations and participated in conversation and networking. Twelve papers and demos were presented in addition to keynotes and invited talks. A few themes surfaced over the course of the presentations. One was how to extract tangible, measurable credit for contributions to research from parts of a paper other than its explicit citations.

In their paper, "Measuring Scientific Impact Beyond Citation Counts," Robert Patton and his colleagues from the Oak Ridge National Laboratory described the notion of context-aware citation analysis. The paper identifies two gaps in using citation counts to measure impact: "1. Not all cited works provide an equal contribution to a publication; 2. Not all required resources are provided appropriate credit or even credit at all."

The paper explains that "some cited works are provided merely for reference or background purposes for the reader while other cited works are so critical to the citing work that the citing work would probably not have even existed if not for the existence of the cited work."

On the second point, about giving required resources appropriate credit, the authors argue that researchers who use shared resources (such as computing facilities unique to an institution like a National Lab) should expressly credit that institution as a contributor to the research output. This is essential on a few fronts. First, it contributes to overall reproducibility initiatives. Second, it helps those who maintain such shared resources and services quantify their own scientific contributions and outputs. That quantification, in turn, helps with budget decisions. Related to that theme, I presented a paper on behalf of Elsevier Labs and The Arabidopsis Information Resource (TAIR) on the additional value, in terms of citations, that authors receive when they use a shared resource such as a Model Organism Database.

Patton and his colleagues conclude that future work should focus on a context-aware citation analysis that captures the different values that citations contribute to a citing work, and that these different values should factor into measuring impact. They also suggest that assessing impact in more detail from full content should reveal the manner in which a research area "begins, grows, and fades," and that the stages of that lifecycle could also be used to assess impact.
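To make the contrast with plain citation counts concrete, here is a minimal sketch of what a context-weighted count could look like. It is only an illustration of the general idea, not the method from the paper: the context labels, the weights, and the function name `weighted_citation_scores` are all hypothetical, and the sample citations are made up.

```python
from collections import defaultdict

# Hypothetical context labels and weights, chosen only for illustration.
# They are NOT taken from the paper.
CONTEXT_WEIGHTS = {"background": 0.25, "method": 1.0, "critical": 2.0}


def weighted_citation_scores(citations):
    """Sum a context-dependent weight for each cited work.

    `citations` is an iterable of (cited_id, context_label) pairs.
    An unknown context label raises KeyError rather than being guessed.
    """
    scores = defaultdict(float)
    for cited_id, context in citations:
        scores[cited_id] += CONTEXT_WEIGHTS[context]
    return dict(scores)


# Made-up example: two passing mentions versus one critical dependency,
# plus a shared facility credited as a required resource.
citations = [
    ("paper_a", "background"),
    ("paper_a", "background"),
    ("paper_b", "critical"),
    ("facility_x", "method"),
]
scores = weighted_citation_scores(citations)
```

Under a plain count, paper_a (two mentions) would outrank paper_b (one mention). With these assumed weights the order reverses, which is the kind of difference a context-aware analysis is meant to expose.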

Also related to the lifecycle of a particular area of research, Shubhanshu Mishra presented a long paper, "Quantifying conceptual novelty in the biomedical literature." In it, the team used MeSH terms to measure and quantify the novelty of MEDLINE articles. Their research found interesting trends in the biomedical domain, where concepts tend to pass through four phases: Burn-In, Accelerating Growth, Decelerating Growth, and Constant Growth.
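One simple way to picture term-based novelty is to ask how recently the concepts attached to an article first appeared in the literature. The sketch below does that on toy data. It is an assumption-laden illustration rather than the authors' actual measure: the function names are hypothetical, and the years and term assignments are invented for the example.

```python
def first_appearance(articles):
    """Map each concept to the earliest publication year in which it appears.

    `articles` is an iterable of (year, set_of_terms) pairs.
    """
    first_seen = {}
    for year, terms in articles:
        for term in terms:
            if term not in first_seen or year < first_seen[term]:
                first_seen[term] = year
    return first_seen


def youngest_concept_age(year, terms, first_seen):
    """Age in years of the newest concept on an article.

    A smaller value means the article uses a more recently introduced concept.
    """
    return min(year - first_seen[term] for term in terms)


# Toy corpus: years and term assignments are made up for illustration.
articles = [
    (1990, {"Neoplasms", "Mice"}),
    (2005, {"Neoplasms", "MicroRNAs"}),
    (2010, {"Mice", "MicroRNAs"}),
]
first_seen = first_appearance(articles)
ages = [youngest_concept_age(year, terms, first_seen) for year, terms in articles]
```

In this toy corpus the 2005 article scores as novel because it introduces a concept that is new to the collection, while the 2010 article only reuses concepts that are already at least five years old.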

In addition, they found that the novelty of an article can also be measured by identifying new combinations of concepts. They also attempted to measure authors' novelty across their careers, finding that novelty goes down over time.

Proceedings from WOSP2016 will be in the October/November issue of D-Lib.