Jane: suggesting journals, finding experts

Volume 24 Issue 5 March 2008


Martijn J. Schuemie *

Department of Medical Informatics, Erasmus University Medical Center Rotterdam, 3000 CA, Rotterdam, The Netherlands

*To whom correspondence should be addressed.

Jan A. Kors

Department of Medical Informatics, Erasmus University Medical Center Rotterdam, 3000 CA, Rotterdam, The Netherlands

Associate Editor: Jonathan Wren

Bioinformatics, Volume 24, Issue 5, March 2008, Pages 727–728, https://doi.org/10.1093/bioinformatics/btn006

Article history

Received: 31 October 2007

Revision received: 15 December 2007

Accepted: 02 January 2008

Published: 28 January 2008

Abstract

Summary: With an exponentially growing number of articles being published every year, scientists can use some help in determining which journal is most appropriate for publishing their results, and which other scientists can be called upon to review their work.

Jane (Journal/Author Name Estimator) is a freely available web-based application that, on the basis of a sample text (e.g. the title and abstract of a manuscript), can suggest journals and experts who have published similar articles.

Availability: http://biosemantics.org/jane

Contact: m.schuemie@erasmusmc.nl

1 INTRODUCTION

PubMed (Wheeler et al., 2007) is growing exponentially. In 1996, 520 148 articles were published versus 793 919 in 2006. Interestingly, the number of different journals in which these articles were published did not show a similar growth: 5006 in 1996 versus 5100 in 2006. There is a steady turnover: according to the PubMed Journals database, 1707 journals were started between 1996 and 2006. The number of authors publishing one or more papers every year does increase rapidly: 543 974 in 1996 versus 867 919 in 2006.

For all these authors, finding the appropriate journal to publish their work becomes increasingly difficult: many journals deal with a wide diversity of topics, and many articles are multi-disciplinary, leading for instance to computer scientists publishing in biomedical journals. At the same time, finding reviewers among the growing number of peers also becomes more of a problem. We developed Jane (Journal/Author Name Estimator) to help with both tasks.

2 USING JANE

2.1 Finding journals and authors

The user starts by entering a piece of text as query (Fig. 1). Typically, this will be the title and abstract of the article for which the user wants to find a suitable journal or reviewer. The application will return an ordered list of results, with a confidence score for each item. Furthermore, it is possible to show the articles on which the score of a specific journal or author was based, as well as other similar articles. This can help a user to evaluate whether the journal is really the suitable medium for publishing his or her findings, or whether the selected author is really knowledgeable about the topic of the article used as input.

Fig. 1.

Screenshots of Jane. From left to right: (1) Starting screen: you can enter the text of your title and abstract, select additional options, and choose whether you want to find journals or authors; (2) Results screen: the application returns an ordered list of journals or authors. For each item, a confidence score is given, along with an option to show the articles on which the score is based; (3) Results screen showing the articles for a journal: the user can choose to view these and other similar articles in PubMed.

2.2 Extra features

Users can refine their search by selecting specific languages and types of publications. The search algorithm will then compare the input text only to those articles that meet these specifications. For instance, by selecting ‘Japanese’ and the publication type ‘review’, the system will return those journals containing the most similar Japanese review articles.

Some authors may be hesitant to send an abstract of their latest research to an unknown server. Therefore, we have included an option to scramble the input before submission. Scrambling simply entails putting the words in the text in alphabetical order, which makes it next to impossible to reconstruct the original text. It has no effect on the search, because the search compares texts by the words they contain and not by the order in which those words appear.
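As a minimal sketch of the scrambling idea, the snippet below sorts the words of a text alphabetically and checks that the multiset of words is unchanged. The function names, the lowercasing and the regular-expression tokenizer are assumptions made for illustration; the text does not specify how Jane tokenizes the input before scrambling.

```python
import re
from collections import Counter


def words(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs (an assumption).
    return re.findall(r"\w+", text.lower())


def scramble(text):
    # Put the words in alphabetical order, discarding the original order.
    return " ".join(sorted(words(text)))


def bag_of_words(text):
    # Word counts are all that an order-insensitive search needs.
    return Counter(words(text))


original = "Jane suggests journals and experts for a manuscript abstract"
scrambled = scramble(original)
same_bag = bag_of_words(original) == bag_of_words(scrambled)
```

Here `same_bag` is true for any input, which is why a search based on word counts would return the same results for the scrambled text.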

3 IMPLEMENTATION

The open source search engine Lucene (Gospodnetic and Hatcher, 2005) is used to find articles that are similar to the input query. Texts are tokenized using the standard Lucene tokenizer, and are subsequently compared using the Lucene MoreLikeThis algorithm, a very efficient implementation of the traditional TF*IDF vector space model.
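To illustrate the vector space model mentioned above, the following sketch ranks a few documents against a query by TF*IDF-weighted cosine similarity. It is plain Python and not Lucene's MoreLikeThis; the tokenizer, the IDF formula and the example texts are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer, not the standard Lucene tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # One sparse TF*IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed smoothing
    return [{t: c * idf[t] for t, c in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [
    "gene expression profiling in breast cancer",
    "text mining of biomedical literature",
    "breast cancer gene mutation screening",
]
query = "mining biomedical text for gene names"

# For simplicity the query is included when computing document frequencies.
vectors = tfidf_vectors(corpus + [query])
query_vec = vectors[-1]
ranked = sorted(range(len(corpus)),
                key=lambda i: cosine(query_vec, vectors[i]),
                reverse=True)
```

The list `ranked` holds corpus indices from most to least similar to the query. A production search engine avoids comparing the query against every document by using an inverted index.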

After retrieving the ordered list of most similar records, a weighted k-nearest neighbor approach is used to determine the journal or author list. For each item (i.e. a journal or author), we add the Lucene similarity scores for the articles belonging to this item in the k top-ranking records. To produce confidence scores, these sums are then normalized so that the scores add up to 100%. Results are ordered by confidence score. A leave-one-out evaluation showed that the best performance was achieved using k = 50.
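The aggregation step can be sketched as follows. The function `rank_items` and the shape of `hits` (a list of item and similarity pairs, already sorted by descending similarity) are hypothetical; the text does not describe how an article with several authors is expanded into hits, so that part is left out.

```python
from collections import defaultdict


def rank_items(hits, k=50):
    """Sum similarity scores per item over the k top-ranking hits and
    normalize the sums so that they add up to 100%.

    hits: list of (item, similarity) pairs sorted by descending similarity.
    Returns a list of (item, confidence_percent), best first.
    """
    totals = defaultdict(float)
    for item, score in hits[:k]:
        totals[item] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(item, 100.0 * total / grand_total) for item, total in ranked]


# Made-up similarity scores for illustration.
hits = [
    ("Bioinformatics", 0.9),
    ("BMC Bioinformatics", 0.6),
    ("Bioinformatics", 0.3),
    ("Nucleic Acids Res", 0.2),
]
result = rank_items(hits, k=3)
```

With `k=3` only the first three hits count, so the fourth journal does not appear in `result` and the two remaining confidence scores sum to 100%.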

We indexed all 4 171 368 articles from 4513 journals in Medline that

  • contained an abstract,

  • were published in the last 10 years,

  • did not belong to one of these categories: comment, editorial, news, historical article, congresses, biography, newspaper article, practice guideline, interview, bibliography, legal cases, lectures, consensus development conference, addresses, clinical conference, patient education handout, directory, technical report, festschrift, retraction of publication, retracted publication, duplicate publication, scientific integrity review, published erratum, periodical index, dictionary, legislation or government publication and

  • belonged to a journal with at least 25 publications in the last 10 years, and at least one publication in the last 12 months.
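The selection criteria above can be expressed as a filter over records. This is a sketch under several assumptions: the `Record` layout is hypothetical and not the Medline schema, the exclusion set is abbreviated, the time windows are approximated in days, and the per-journal counts are taken over records that already pass the first three criteria, which the text does not state explicitly.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta

# Abbreviated for brevity; the full exclusion list is given in the text.
EXCLUDED_TYPES = {"comment", "editorial", "news", "biography",
                  "published erratum"}

TEN_YEARS = timedelta(days=3650)  # approximation of "the last 10 years"
ONE_YEAR = timedelta(days=365)    # approximation of "the last 12 months"


@dataclass
class Record:  # hypothetical record layout
    journal: str
    published: date
    abstract: str = ""
    publication_types: set = field(default_factory=set)


def is_candidate(rec, today):
    # Criteria 1 to 3: has an abstract, is recent, is not an excluded type.
    return (bool(rec.abstract.strip())
            and today - rec.published <= TEN_YEARS
            and not (rec.publication_types & EXCLUDED_TYPES))


def select_records(records, today, min_per_journal=25):
    candidates = [r for r in records if is_candidate(r, today)]
    per_journal = Counter(r.journal for r in candidates)
    active = {r.journal for r in candidates
              if today - r.published <= ONE_YEAR}
    # Criterion 4: enough publications and at least one recent publication.
    return [r for r in candidates
            if per_journal[r.journal] >= min_per_journal
            and r.journal in active]
```

Calling `select_records(records, date.today())` returns only the records that would be indexed under these assumptions.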

4 COMPARISON WITH OTHER TOOLS

PubMed itself offers the possibility to search for ‘similar articles’, but only existing Medline records can be used as queries. Many other systems offer some means of finding authors and/or journals, for instance GoPubMed (Doms and Schroeder, 2005) and HubMed (Eaton, 2006), but they all use Boolean keyword-based queries.

One system, called eTBLAST (Errami et al., 2007), does accept full abstracts to search for journals and authors. It retrieves the 400 most similar articles using a vector-space approach, and for these articles a text-alignment score is calculated and aggregated per journal or author. We compared the performance of Jane to eTBLAST using a random set of 1000 citations that were entered into PubMed in the 3 days before the test and were consequently not in the training sets of Jane and eTBLAST at that time. For each citation, we tested how well the systems predicted the authors of the paper, and the journal in which the paper was published.

Figure 2 shows that Jane outperforms eTBLAST (P < 0.001 and P = 0.010 for journals and authors, respectively, using a sign test to compare ranks). Furthermore, even though eTBLAST runs on a 20 CPU Linux cluster and Jane was tested on a dual CPU system, eTBLAST searches were much slower than Jane searches: the average search times were 114.0 and 0.6 seconds, respectively. Because eTBLAST currently has more users than Jane, we simulated an extra average load of 100 000 queries per day on our server whilst determining our search time.
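A sign test on paired ranks can be sketched as follows. The text does not say whether an exact or approximate test was used or how ties were handled, so the exact two-sided binomial form and the dropping of ties are assumptions, and the ranks below are made up for illustration.

```python
from math import comb


def sign_test(ranks_a, ranks_b):
    """Two-sided exact sign test on paired ranks (lower rank is better).

    Pairs with equal ranks are dropped. Returns the p-value under the
    null hypothesis that either system is equally likely to rank better.
    """
    wins_a = sum(a < b for a, b in zip(ranks_a, ranks_b))
    wins_b = sum(a > b for a, b in zip(ranks_a, ranks_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer successes out of n fair coin flips.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up ranks of the correct journal for eight queries.
ranks_system_a = [1, 2, 1, 3, 1, 1, 2, 1]
ranks_system_b = [4, 5, 2, 3, 6, 2, 7, 3]
p_value = sign_test(ranks_system_a, ranks_system_b)
```

In this toy example system A ranks better in seven pairs and one pair is tied, so the test is computed over seven pairs.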

Fig. 2.

Cumulative histogram of the rank of the correct journal and the highest ranking correct author in the result lists of eTBLAST and Jane for a test set of 1000 abstracts (e.g. for Jane, the correct journal appeared at the top of the list for 23% of the abstracts, it appeared in the top 2 for 36% of the abstracts, etc.).
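The cumulative histogram described in the caption can be computed from the rank of the correct item for each query. The function name and the example ranks below are hypothetical; `None` marks a query for which the correct item did not appear in the result list.

```python
def cumulative_hit_rate(ranks, max_rank=10):
    """Percentage of queries whose correct item is in the top r,
    for r = 1 .. max_rank.

    ranks: 1-based rank of the correct item per query, or None if absent.
    """
    n = len(ranks)
    if n == 0:
        return []
    return [
        100.0 * sum(1 for r in ranks if r is not None and r <= top) / n
        for top in range(1, max_rank + 1)
    ]


# Made-up ranks for ten queries.
ranks = [1, 3, None, 2, 1, 7, 2, None, 1, 4]
curve = cumulative_hit_rate(ranks, max_rank=5)
```

Each entry of `curve` corresponds to one bar of the cumulative histogram, so the values never decrease as the rank cutoff grows.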

5 DISCUSSION

Compared with other such tools, Jane is a simple, fast and accurate tool for finding journals and authors.

We tested how well Jane predicts the journal in which a paper was published, assuming that this journal was the most appropriate one. Obviously, this may not always be the case, since many journals overlap considerably and journal choice may be influenced by many factors. Based on a qualitative analysis of a small sample of the abstracts for which the correct journal did not appear in the top 10, we believe that the abstracts would also have been appropriate for many of the top-ranking journals returned by Jane. The same holds true for authors: although we can assume that an author is knowledgeable about the paper (s)he wrote, other, more experienced authors might qualify as better experts.

Jane is freely available. The underlying database of indexed abstracts will regularly be updated.

ACKNOWLEDGEMENTS

This study was supported by the Biorange project sp 4.1.1. of the Netherlands Bioinformatics Centre.

Conflict of Interest: none declared.

REFERENCES

Doms A, Schroeder M. GoPubMed: exploring PubMed with the gene ontology, Nucleic Acids Res, 2005, vol. 33 (pg. W783-W786)

Eaton AD. HubMed: a web-based biomedical literature search interface, Nucleic Acids Res, 2006, vol. 34 (pg. W745-W747)

Errami M, et al. eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications, Nucleic Acids Res, 2007, vol. 35 (pg. W12-W15)

Gospodnetic O, Hatcher E. Lucene in Action, 2005, Greenwich, Manning Publications

Wheeler DL, et al. Database resources of the National Center for Biotechnology Information, Nucleic Acids Res, 2007, vol. 35 (pg. D5-D12)

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
