TY - JOUR
T1 - Information extraction for enhanced access to disease outbreak reports
AU - Grishman, Ralph
AU - Huttunen, Silja
AU - Yangarber, Roman
N1 - Funding Information:
This research was supported by the Defense Advanced Research Projects Agency as part of the Translingual Information Detection, Extraction and Summarization (TIDES) program, under Grant N66001-001-1-8917 from the Space and Naval Warfare Systems Center San Diego, and by the National Science Foundation under Grant IIS-0081962. This paper does not necessarily reflect the position or the policy of the US Government.
PY - 2002/8
Y1 - 2002/8
N2 - Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
AB - Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
UR - http://www.scopus.com/inward/record.url?scp=0036706221&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036706221&partnerID=8YFLogxK
U2 - 10.1016/S1532-0464(03)00013-3
DO - 10.1016/S1532-0464(03)00013-3
M3 - Article
C2 - 12755518
AN - SCOPUS:0036706221
SN - 1532-0464
VL - 35
SP - 236
EP - 246
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 4
ER -