Web Page Segmentation for Non Visual Skimming - GREYC hultech
Conference Papers Year : 2019

Web Page Segmentation for Non Visual Skimming

Judith Jeyafreeda Andrew
Fabrice Maurel
Gaël Dias

Abstract

Web page segmentation aims to break a page into smaller blocks in which semantically coherent content is kept together. Tasks targeted by this technique include advertisement detection and main content extraction. In this paper, we study different segmentation strategies for the task of non visual skimming. For that purpose, we treat web page segmentation as a clustering problem over visual elements, where (1) all visual elements must be clustered, (2) a fixed number of clusters must be discovered, and (3) the elements of a cluster should be visually connected. We therefore study three algorithms that comply with these constraints: K-means, F-K-means, and Guided Expansion. Evaluation shows that Guided Expansion yields statistically relevant results in terms of compactness and separateness, and satisfies more logical constraints than the other strategies.
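To make the clustering formulation concrete, here is a minimal sketch of plain K-means applied to visual elements. It is an illustration under stated assumptions, not the authors' implementation: the bounding-box representation `(x, y, width, height)`, the function name `kmeans_segment`, and the evenly spaced seeding are all hypothetical choices, and geometric proximity of box centres is used only as a rough stand-in for visual connectedness. F-K-means and Guided Expansion are not sketched here because the abstract does not describe how they work.

```python
from math import dist


def kmeans_segment(boxes, k, iterations=20):
    """Cluster bounding boxes (x, y, width, height) into k zones by box centre.

    Returns one cluster label per box, so every element is assigned
    (constraint 1) and at most k clusters are produced (constraint 2).
    """
    if not 1 <= k <= len(boxes):
        raise ValueError("k must be between 1 and the number of boxes")

    # Represent each visual element by the centre of its bounding box.
    centres = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]

    # Assumption: seed with k elements evenly spaced in document order.
    step = len(centres) / k
    seeds = [centres[int(i * step)] for i in range(k)]

    labels = [0] * len(centres)
    for _ in range(iterations):
        # Assignment step: each element joins its nearest seed.
        labels = [min(range(k), key=lambda c: dist(p, seeds[c])) for p in centres]

        # Update step: move each seed to the mean of its members.
        new_seeds = []
        for c in range(k):
            members = [p for p, label in zip(centres, labels) if label == c]
            if members:
                new_seeds.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous seed.
                new_seeds.append(seeds[c])

        if new_seeds == seeds:  # converged
            break
        seeds = new_seeds

    return labels
```

Calling `kmeans_segment(boxes, k)` on a list of element boxes returns a zone index per element. Note that plain K-means can leave a cluster empty and does not check adjacency between elements, which is one reason constrained variants are of interest for this task.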
Main file
PACLIC_33_2019-Web_Page_Segmentation_for_Non_Visual_Skimming.pdf (670.83 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02309625, version 1 (09-10-2019)

Identifiers

  • HAL Id: hal-02309625, version 1

Cite

Judith Jeyafreeda Andrew, Stéphane Ferrari, Fabrice Maurel, Gaël Dias, Emmanuel Giguet. Web Page Segmentation for Non Visual Skimming. The 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33), Sep 2019, Hakodate, Japan. ⟨hal-02309625⟩
