Martijn J. Schuemie* and Jan A. Kors
Department of Medical Informatics, Erasmus University Medical Center Rotterdam, 3000 CA, Rotterdam, The Netherlands
*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Bioinformatics, Volume 24, Issue 5, March 2008, Pages 727–728, https://doi.org/10.1093/bioinformatics/btn006
Received: 31 October 2007
Revision received: 15 December 2007
Accepted: 02 January 2008
Published: 28 January 2008
Martijn J. Schuemie, Jan A. Kors, Jane: suggesting journals, finding experts, Bioinformatics, Volume 24, Issue 5, March 2008, Pages 727–728, https://doi.org/10.1093/bioinformatics/btn006
Abstract
Summary: With an exponentially growing number of articles being published every year, scientists can use some help in determining which journal is most appropriate for publishing their results, and which other scientists can be called upon to review their work.
Jane (Journal/Author Name Estimator) is a freely available web-based application that, on the basis of a sample text (e.g. the title and abstract of a manuscript), can suggest journals and experts who have published similar articles.
Availability: http://biosemantics.org/jane
Contact: m.schuemie@erasmusmc.nl
1 INTRODUCTION
PubMed (Wheeler et al., 2007) is growing exponentially. In 1996, 520 148 articles were published versus 793 919 in 2006. Interestingly, the number of different journals in which these articles were published did not show a similar growth: 5006 in 1996 versus 5100 in 2006. There is a steady turnover: according to the PubMed Journals database, 1707 journals were started between 1996 and 2006. The number of authors publishing one or more papers every year does increase rapidly: 543 974 in 1996 versus 867 919 in 2006.
For all these authors, finding the appropriate journal to publish their work becomes increasingly difficult: many journals deal with a wide diversity of topics, and many articles are multi-disciplinary, leading for instance to computer scientists publishing in biomedical journals. At the same time, finding reviewers among the growing number of peers also becomes more of a problem. We developed Jane (Journal/Author Name Estimator) to help with both tasks.
2 USING JANE
2.1 Finding journals and authors
The user starts by entering a piece of text as query (Fig. 1). Typically, this will be the title and abstract of the article for which the user wants to find a suitable journal or reviewer. The application will return an ordered list of results, with a confidence score for each item. Furthermore, it is possible to show the articles on which the score of a specific journal or author was based, as well as other similar articles. This can help a user to evaluate whether the journal is really the suitable medium for publishing his or her findings, or whether the selected author is really knowledgeable about the topic of the article used as input.
Fig. 1.
Screenshots of Jane. From right to left: (1) Starting screen: you can enter the text of your title and abstract, select additional options, and choose whether you want to find journals or authors; (2) Results screen: the application returns an ordered list of journals or authors. For each item, a confidence score is given, and an option to show the articles on which the score is based; (3) Results screen showing the articles for a journal: The user can choose to view these and other similar articles in PubMed.
2.2 Extra features
Users can refine their search by selecting specific languages and types of publications. The search algorithm will then compare the input text only to those articles that meet these specifications. For instance, by selecting ‘Japanese’ and the publication type ‘review’, the system will return those journals containing the most similar Japanese review articles.
Some authors may be hesitant to send an abstract of their latest research to an unknown server. Therefore, we have included an option to scramble the input before submission. Scrambling simply entails putting the words in the text in alphabetic order, which makes it next to impossible to reconstruct the original text, but has no effect on the search.
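The scrambling step can be illustrated with a minimal sketch. The function name `scramble`, the lowercasing and the regular-expression tokenization are assumptions made for illustration; the article only states that the words are put in alphabetic order. The sketch also checks that the multiset of words, which is all a bag-of-words search uses, is unchanged.

```python
import re
from collections import Counter


def scramble(text: str) -> str:
    """Return the words of `text` in alphabetical order (hypothetical helper)."""
    # Assumption: words are lowercased runs of letters/digits.
    words = re.findall(r"\w+", text.lower())
    return " ".join(sorted(words))


original = "Jane suggests journals and finds experts"
scrambled = scramble(original)
print(scrambled)  # and experts finds jane journals suggests

# Word order is lost, but the word counts are identical.
same_bag = Counter(scrambled.split()) == Counter(re.findall(r"\w+", original.lower()))
print(same_bag)  # True
```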
3 IMPLEMENTATION
The open source search engine Lucene (Gospodnetic and Hatcher, 2005) is used to find articles that are similar to the input query. Texts are tokenized using the standard Lucene tokenizer, and are subsequently compared using the Lucene MoreLikeThis algorithm, a very efficient implementation of the traditional TF*IDF vector space model.
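To make the vector space idea concrete, the following is a minimal pure-Python sketch of TF*IDF weighting with cosine similarity. It is not Lucene's MoreLikeThis implementation: the tokenizer, the IDF formula and all function names are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document TF*IDF vectors and the IDF table used to build them."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep some weight.
    idf = {t: math.log(len(docs) / df) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return (document index, similarity) pairs, most similar first."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "gene expression in yeast",
    "journal recommendation from abstracts",
    "protein folding simulation",
]
for index, score in rank("journal recommendation", docs):
    print(index, round(score, 3))
```

Only the second document shares terms with the query, so it is ranked first and the other two receive a similarity of zero.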
After retrieving the ordered list of most similar records, a weighted k-nearest neighbor approach is used to determine the journal or author list. For each item (i.e. a journal or author), we add the Lucene similarity scores for the articles belonging to this item in the k top-ranking records. To produce confidence scores, these sums are then normalized so that the scores add up to 100%. Results are ordered by confidence score. A leave-one-out evaluation showed that the best performance was achieved using k = 50.
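The aggregation described above can be sketched as follows. The input format (a ranked list of `(labels, similarity)` pairs, where `labels` holds the journal or the authors of one retrieved article) and the function name `confidence_scores` are assumptions for illustration; only the summing over the top k records and the normalization to 100% come from the text.

```python
from collections import defaultdict


def confidence_scores(hits, k=50):
    """Aggregate similarity scores per item over the k top-ranking records.

    hits: list of (labels, similarity) pairs, ordered by descending similarity.
    Returns (item, confidence in percent) pairs, highest confidence first.
    """
    totals = defaultdict(float)
    for labels, similarity in hits[:k]:
        for label in labels:
            totals[label] += similarity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    # Normalize so that the confidence scores add up to 100%.
    return [(item, 100.0 * total / grand_total) for item, total in ranked]


# Made-up similarity scores for three retrieved articles.
hits = [
    (("Bioinformatics",), 0.9),
    (("BMC Bioinformatics",), 0.6),
    (("Bioinformatics",), 0.5),
]
for journal, confidence in confidence_scores(hits):
    print(f"{journal}: {confidence:.0f}%")
```

With these made-up scores, Bioinformatics receives roughly 70% and BMC Bioinformatics roughly 30%.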
We indexed all 4 171 368 articles from 4513 journals in Medline that
- contained an abstract,
- were published in the last 10 years,
- did not belong to one of these categories: comment, editorial, news, historical article, congresses, biography, newspaper article, practice guideline, interview, bibliography, legal cases, lectures, consensus development conference, addresses, clinical conference, patient education handout, directory, technical report, festschrift, retraction of publication, retracted publication, duplicate publication, scientific integrity review, published erratum, periodical index, dictionary, legislation or government publication, and
- belonged to a journal with at least 25 publications in the last 10 years, and at least one publication in the last 12 months.
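The inclusion criteria above can be expressed as a filter over citation records. This is a sketch under stated assumptions: the `Record` fields are hypothetical, the time windows are approximated in days, 'legislation' and 'government publication' are treated as two separate categories, and journal-level counts are taken over all records in the 10-year window, which the article does not specify.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

EXCLUDED_TYPES = frozenset({
    "comment", "editorial", "news", "historical article", "congresses",
    "biography", "newspaper article", "practice guideline", "interview",
    "bibliography", "legal cases", "lectures",
    "consensus development conference", "addresses", "clinical conference",
    "patient education handout", "directory", "technical report",
    "festschrift", "retraction of publication", "retracted publication",
    "duplicate publication", "scientific integrity review",
    "published erratum", "periodical index", "dictionary", "legislation",
    "government publication",
})


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one Medline citation."""
    pmid: int
    journal: str
    published: date
    has_abstract: bool
    publication_types: frozenset


def select_records(records, today):
    """Return the records that satisfy all four inclusion criteria."""
    ten_years_ago = today - timedelta(days=3653)  # approximately 10 years
    one_year_ago = today - timedelta(days=365)    # approximately 12 months
    recent = [r for r in records if r.published >= ten_years_ago]
    # Assumption: journal-level counts use every record in the 10-year window.
    per_journal = Counter(r.journal for r in recent)
    latest = {}
    for r in recent:
        latest[r.journal] = max(latest.get(r.journal, r.published), r.published)
    return [
        r for r in recent
        if r.has_abstract
        and not (r.publication_types & EXCLUDED_TYPES)
        and per_journal[r.journal] >= 25
        and latest[r.journal] >= one_year_ago
    ]
```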
4 COMPARISON WITH OTHER TOOLS
PubMed itself offers the possibility to search for ‘similar articles’, but only existing Medline records can be used as queries. Many other systems offer some means of finding authors and/or journals, but they all rely on Boolean keyword-based queries, for instance GoPubMed (Doms and Schroeder, 2005) and HubMed (Eaton, 2006).
One system, called eTBLAST (Errami et al., 2007), does accept full abstracts to search for journals and authors. It retrieves the 400 most similar articles using a vector-space approach, and for these articles a text-alignment score is calculated and aggregated per journal or author. We compared the performance of Jane to eTBLAST using a random set of 1000 citations that were entered into PubMed in the 3 days before the test, and were consequently not in the training sets of Jane and eTBLAST at that time. For each citation, we tested how well the systems predicted the authors of the paper, and the journal in which the paper was published.
Figure 2 shows that Jane outperforms eTBLAST (P < 0.001 and P = 0.010 for journals and authors, respectively, using a sign test to compare ranks). Furthermore, even though eTBLAST runs on a 20 CPU Linux cluster and Jane was tested on a dual CPU system, eTBLAST searches were much slower than Jane searches: the average search times were 114.0 and 0.6 seconds, respectively. Because eTBLAST currently has more users than Jane, we simulated an extra average load of 100 000 queries per day on our server whilst determining our search time.
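A sign test on paired ranks can be sketched as follows. This is an illustrative exact two-sided version, not necessarily the exact procedure used in the evaluation; the function name, the handling of ties (dropped) and the example ranks are assumptions. A citation for which a system never returns the correct item could be given a rank of `float("inf")`.

```python
from math import comb


def sign_test(ranks_a, ranks_b):
    """Two-sided exact sign test on paired ranks (lower rank is better).

    Ties are dropped. Returns the p-value under the null hypothesis that
    either system is equally likely to give the better rank.
    """
    wins_a = sum(a < b for a, b in zip(ranks_a, ranks_b))
    wins_b = sum(a > b for a, b in zip(ranks_a, ranks_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins out of n under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Made-up ranks of the correct journal for eight test citations.
system_a = [1, 1, 2, 1, 3, 1, 2, 1]
system_b = [2, 3, 5, 4, 3, 2, 9, 6]
print(sign_test(system_a, system_b))  # 0.015625
```

In this made-up example, system A gives the better rank in seven of the seven non-tied pairs, so the p-value is 2 × (1/2)^7 = 0.015625.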
Fig. 2.
Cumulative histogram of the rank of the correct journal and the highest ranking correct author in the result lists of eTBLAST and Jane for a test set of 1000 abstracts (e.g. for Jane, the correct journal appeared at the top of the list for 23% of the abstracts, it appeared in the top 2 for 36% of the abstracts, etc.).
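The quantity plotted in such a cumulative histogram can be computed as in the sketch below. The function name and the use of `None` for citations whose correct item was not returned are assumptions for illustration.

```python
def cumulative_rank_histogram(ranks, max_rank=10):
    """Fraction of queries whose correct item appears at rank <= r, for r = 1..max_rank.

    ranks: rank of the correct item per query, or None if it was not returned.
    """
    total = len(ranks)
    if total == 0:
        return []
    return [
        sum(1 for r in ranks if r is not None and r <= cutoff) / total
        for cutoff in range(1, max_rank + 1)
    ]


# Made-up ranks for five test abstracts.
print(cumulative_rank_histogram([1, 2, 2, None, 5], max_rank=3))  # [0.2, 0.6, 0.6]
```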
5 DISCUSSION
Compared with other such tools, Jane is a simple, fast and accurate tool for finding journals and authors.
We tested how well Jane predicts the journal in which a paper was published, assuming that this journal was the most appropriate one. Obviously, this may not always be the case, since many journals overlap considerably and journal choice may be influenced by many factors. Based on a qualitative analysis of a small sample of the abstracts for which the correct journal did not appear in the top 10, we believe that these abstracts would also have been appropriate for many of the top-ranking journals returned by Jane. The same holds true for authors: although we can assume that an author is knowledgeable about the paper (s)he wrote, other, more experienced authors might qualify as better experts.
Jane is freely available. The underlying database of indexed abstracts will regularly be updated.
ACKNOWLEDGEMENTS
This study was supported by the Biorange project sp 4.1.1. of the Netherlands Bioinformatics Centre.
Conflict of Interest: none declared.
REFERENCES
Doms A, Schroeder M. GoPubMed: exploring PubMed with the gene ontology, Nucleic Acids Res, 2005, vol. 33 (pg. W783-W786)
Eaton AD. HubMed: a web-based biomedical literature search interface, Nucleic Acids Res, 2006, vol. 34 (pg. W745-W747)
Errami M, et al. eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications, Nucleic Acids Res, 2007, vol. 35 (pg. W12-W15)
Gospodnetic O, Hatcher E. Lucene in Action, 2005, Greenwich, Manning Publications
Wheeler DL, et al. Database resources of the National Center for Biotechnology Information, Nucleic Acids Res, 2007, vol. 35 (pg. D5-D12)
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org