Extraction of (key,value) pairs from unstructured ads

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we focus on the problem of extracting structured labeled data from short unstructured ad-postings from online sources like Craigslist, where ads are posted on various topics, such as job postings, rentals, car sales etc. A fundamental challenge in addressing this problem is that most ad-postings are highly unstructured, short-text postings written in an informal manner with no inherent grammar or well-defined dictionary. We propose unsupervised and supervised algorithms for extracting structured data from unstructured ads in the form of (key, value) pairs, where the keys naturally represent topic-specific features in the ads. The unsupervised algorithm is centered around building an affinity graph using the words from a topic-specific corpus of such ads, where the edge weights represent affinities between words; the (key, value) extraction algorithm identifies specific groups of words in the affinity graph corresponding to different classes of key attributes. The supervised algorithm uses a Conditional Random Field based training algorithm to identify specific structured (key, value) pairs based on pre-defined topic-specific structural data representations of ads. Based on a corpus of car and apartment ad-postings from Craigslist, the unsupervised algorithm reported an accuracy of 67.74% and 68.74% for car and apartment ads respectively. The supervised algorithm demonstrated an improved performance with accuracies of 74.07% and 72.59% respectively.
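
To make the affinity-graph idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how affinity is measured, so this example assumes, purely for illustration, that the affinity between two words is the number of times they occur within a small window of each other across the corpus. The function name `build_affinity_graph`, the `window` parameter, and the sample ads are all hypothetical.

```python
from collections import Counter


def build_affinity_graph(ads, window=1):
    """Build a weighted word graph from a list of ad strings.

    ASSUMPTION: affinity = co-occurrence count within `window` following
    tokens. The paper's actual affinity measure is not given in the abstract.
    Returns a Counter mapping (word_a, word_b) -> weight, with word_a < word_b.
    """
    edges = Counter()
    for ad in ads:
        tokens = ad.lower().split()
        for i, left in enumerate(tokens):
            # Pair each token with the next `window` tokens.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


# Hypothetical toy corpus of car ads.
sample_ads = [
    "2005 honda civic 80k miles",
    "2010 honda accord 40k miles",
    "honda civic low miles",
]
graph = build_affinity_graph(sample_ads, window=1)
```

In such a graph, words that repeatedly appear next to each other (for example a make and its models) end up connected by heavier edges, which is the kind of signal a grouping step could use to collect candidate values for a key.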

Original language: English (US)
Title of host publication: Natural Language Access to Big Data - Papers from the AAAI Fall Symposium, Technical Report
Publisher: AI Access Foundation
Pages: 10-17
Number of pages: 8
ISBN (Electronic): 9781577356967
State: Published - 2014
Event: 2014 AAAI Fall Symposium - Arlington, United States
Duration: Nov 13 2014 → Nov 15 2014

Publication series

Name: AAAI Fall Symposium - Technical Report
Volume: FS-14-06

Other

Other: 2014 AAAI Fall Symposium
Country: United States
City: Arlington
Period: 11/13/14 → 11/15/14

ASJC Scopus subject areas

  • Engineering(all)
