Real-Time understanding of humanitarian crises via targeted information retrieval

K. T. Pham, P. Sattigeri, A. Dhurandhar, A. C. Jacob, M. Vukovic, P. Chataigner, J. Freire, A. Mojsilovic, K. R. Varshney

Research output: Contribution to journal › Article › peer-review


Humanitarian relief agencies must assess humanitarian crises occurring in the world to prioritize the aid that can be offered. While the rapidly growing availability of relevant information enables better decisions to be made, it also creates an important challenge: how to find, collect, and categorize this information in a timely manner. To address the problem, we propose a targeted retrieval system that automates these tasks. The system uses historical data collected and labeled by subject matter experts to train a classifier that identifies relevant content. Using this classifier, it deploys a focused crawler to locate and retrieve data at scale. The system also incorporates feedback from subject matter experts to adapt to new concepts and information sources. A novel component of the system is a re-crawling algorithm that improves the crawler's efficiency in retrieving recent data. Our preliminary results show that the algorithm can increase the freshness of collected data while simultaneously decreasing crawling effort. Furthermore, we show that focused crawling outperforms general crawling in this domain. Our initial prototype has received positive feedback from analysts at the Assessment Capacities Project, a humanitarian response agency.

Original language: English (US)
Article number: 8167726
Pages (from-to): 71-712
Number of pages: 642
Journal: IBM Journal of Research and Development
Issue number: 6
State: Published - Nov 1 2017

ASJC Scopus subject areas

  • General Computer Science

