Improving web spam classifiers using link structure

Qingqing Gan, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Web spam has been recognized as one of the top challenges in the search engine industry [14]. Much recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide whether a node should be relabeled based on the preclassified result and knowledge about its neighborhood. Our experimental results show visible improvements with respect to precision and recall.
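The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which base classifier, features, or relabeling heuristics the authors use, so everything below is an assumption made for illustration: the function name `relabel_by_neighborhood`, the `spam_threshold` and `flip_fraction` parameters, and the simple rule of flipping a label when a large fraction of a node's neighbors carry the opposite first-stage label.

```python
from typing import Dict, List


def relabel_by_neighborhood(
    scores: Dict[str, float],
    neighbors: Dict[str, List[str]],
    spam_threshold: float = 0.5,   # hypothetical cutoff for the stage-1 score
    flip_fraction: float = 0.75,   # hypothetical neighbor agreement needed to flip
) -> Dict[str, bool]:
    """Return a spam label (True = spam) per node after a neighborhood pass."""
    # Stage 1: turn the base classifier's scores into hard labels.
    initial = {node: score >= spam_threshold for node, score in scores.items()}

    # Stage 2: revisit each node using only the stage-1 labels of its
    # neighbors, so the result does not depend on iteration order.
    final = dict(initial)
    for node, label in initial.items():
        known = [n for n in neighbors.get(node, []) if n in initial]
        if not known:
            continue  # no neighborhood evidence, keep the stage-1 label
        spam_fraction = sum(initial[n] for n in known) / len(known)
        if not label and spam_fraction >= flip_fraction:
            final[node] = True    # non-spam node surrounded by spam
        elif label and (1.0 - spam_fraction) >= flip_fraction:
            final[node] = False   # spam node surrounded by non-spam
    return final
```

With this rule, a node scored as non-spam whose neighbors are all labeled spam is relabeled as spam, and a node scored as spam whose neighbors are all non-spam is relabeled the other way. Whether the paper's heuristics behave like this is not stated in the abstract.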

    Original language: English (US)
    Title of host publication: AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
    Pages: 17-20
    Number of pages: 4
    State: Published - 2007
    Event: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web - Banff, AB, Canada
    Duration: May 8 2007 → May 8 2007

    Publication series

    Name: ACM International Conference Proceeding Series
    Volume: 215

    Other

    Other: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web
    Country/Territory: Canada
    City: Banff, AB
    Period: 5/8/07 → 5/8/07

    Keywords

    • Classification
    • Link analysis
    • Machine learning
    • Search engines
    • Web mining
    • Web spam detection

    ASJC Scopus subject areas

    • Software
    • Human-Computer Interaction
    • Computer Vision and Pattern Recognition
    • Computer Networks and Communications
