Effective semi-supervised learning strategies for automatic sentence segmentation

Dalva, Dogan; Guz, Umit; Gurkan, Hakan

doi:10.1016/j.patrec.2017.10.010

1 January 2018 Journal article Open Access

JSON-LD (schema.org)

{
  "@context": "https://schema.org/", 
  "@id": 29899, 
  "@type": "ScholarlyArticle", 
  "creator": [
    {
      "@type": "Person", 
      "affiliation": "FMV ISIK Univ, Fac Engn, Dept Elect & Elect Engn, Istanbul, Turkey", 
      "name": "Dalva, Dogan"
    }, 
    {
      "@type": "Person", 
      "affiliation": "FMV ISIK Univ, Fac Engn, Dept Elect & Elect Engn, Istanbul, Turkey", 
      "name": "Guz, Umit"
    }, 
    {
      "@type": "Person", 
      "name": "Gurkan, Hakan"
    }
  ], 
  "datePublished": "2018-01-01", 
  "description": "The primary objective of the sentence segmentation process is to determine the sentence boundaries of a stream of words output by automatic speech recognizers. Statistical methods developed for sentence segmentation require a significant amount of labeled data, which is time-consuming, labor-intensive and expensive to produce. In this work, we propose new multi-view semi-supervised learning strategies for the sentence boundary classification problem using lexical, prosodic, and morphological information. The aim is to find effective semi-supervised machine learning strategies when only small sets of sentence boundary labeled data are available. We primarily investigate two semi-supervised learning approaches, called self-training and co-training. Different example selection strategies were also used for co-training, namely, agreement, disagreement and self-combined. Furthermore, we propose three-view and committee-based algorithms incorporating the agreement, disagreement and self-combined strategies using three disjoint feature sets. We present comparative results of different learning strategies on the sentence segmentation task. The experimental results show that the sentence segmentation performance can be highly improved using the multi-view learning strategies that we propose, since data sets can be represented by three redundantly sufficient and disjoint feature sets. We show that the proposed strategies substantially improve the average baseline F-measure from 67.66% to 75.15% and from 64.84% to 66.32% when only a small set of manually labeled data is available for Turkish and English spoken languages, respectively. (c) 2017 Elsevier B.V. All rights reserved.", 
  "headline": "Effective semi-supervised learning strategies for automatic sentence segmentation", 
  "identifier": 29899, 
  "image": "https://aperta.ulakbim.gov.tr/static/img/logo/aperta_logo_with_icon.svg", 
  "license": "http://www.opendefinition.org/licenses/cc-by", 
  "name": "Effective semi-supervised learning strategies for automatic sentence segmentation", 
  "url": "https://aperta.ulakbim.gov.tr/record/29899"
}


Views	28
Downloads	4
Data volume	676 Bytes
Unique views	27
Unique downloads	4

Record Information

Publication date: 01/01/2018
Published in: Pattern Recognition Letters, 105, pp. 76-86.
License: Creative Commons Attribution
