<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
<identifier identifierType="URL">https://aperta.ulakbim.gov.tr/record/237694</identifier>
<creators>
<creator>
<creatorName>Naderalvojoud, Behzad</creatorName>
<givenName>Behzad</givenName>
<familyName>Naderalvojoud</familyName>
<affiliation>Hacettepe Univ, Dept Comp Engn, Ankara, Turkey</affiliation>
</creator>
<creator>
<creatorName>Ozsoy, Adnan</creatorName>
<givenName>Adnan</givenName>
<familyName>Ozsoy</familyName>
<affiliation>Hacettepe Univ, Dept Comp Engn, Ankara, Turkey</affiliation>
</creator>
</creators>
<titles>
<title>A Non-Sequential Refinement Approach to Improve Word Embeddings Using GPU-Based String Matching Algorithms</title>
</titles>
<publisher>Aperta</publisher>
<publicationYear>2021</publicationYear>
<dates>
<date dateType="Issued">2021-01-01</date>
</dates>
<resourceType resourceTypeGeneral="Text">Journal article</resourceType>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://aperta.ulakbim.gov.tr/record/237694</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.1007/s10586-021-03321-4</relatedIdentifier>
</relatedIdentifiers>
<rightsList>
<rights rightsURI="http://www.opendefinition.org/licenses/cc-by">Creative Commons Attribution</rights>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">Unlike word embedding models that learn vectors for a collection of words sequentially, this paper proposes a non-sequential refinement approach that improves the vectors of particular words, using a string matching algorithm to speed up the process. The key idea is to change the order of training in the embedding learning model and force it to learn the vector of a particular word completely before moving on to other target words. The learned vector of the given word and its context vectors are then used to train the remaining target words, so that later words are trained against more accurate word vectors. In this study, the effect of training order in the Skip-gram model is investigated, and a quantitative and qualitative comparison of the learned vectors is made on the word similarity task. To speed up the process, a GPU-based string matching algorithm is used to find the occurrences of the given word in the training corpus. Incorporating a GPU-based string matching algorithm into the Skip-gram model to refine particular word vectors is, to the best of our knowledge, the first such use in the literature. Additionally, we provide an in-depth analysis of GPU parallelization and identify string matching algorithms that are suitable for integration into word embedding models.</description>
</descriptions>
</resource>
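The refinement strategy summarized in the abstract can be sketched in a few lines of Python. This is a hypothetical illustration under stated assumptions, not the authors' implementation: a plain Python scan stands in for the GPU string-matching step, and a toy averaging update stands in for the real Skip-gram gradient step. All function names (`find_occurrences`, `context_words`, `refine_vector`) are illustrative.

```python
# Toy sketch of non-sequential refinement: locate every occurrence of a
# chosen target word first (the role played by GPU string matching in the
# paper), gather its contexts, and train that word's vector to completion
# before any other word is trained.

def find_occurrences(tokens, target):
    # Stand-in for the string-matching algorithm: all positions of `target`.
    return [i for i, tok in enumerate(tokens) if tok == target]

def context_words(tokens, positions, window=2):
    # Collect the context words around every occurrence of the target.
    ctx = []
    for pos in positions:
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        ctx.extend(tokens[j] for j in range(lo, hi) if j != pos)
    return ctx

def refine_vector(vec, context_vecs, lr=0.1, epochs=20):
    # Toy stand-in for the Skip-gram update: pull the target vector toward
    # its context vectors so that words trained later see an already
    # refined vector (the real model uses softmax / negative sampling).
    for _ in range(epochs):
        for cvec in context_vecs:
            vec = [v + lr * (c - v) for v, c in zip(vec, cvec)]
    return vec

if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    positions = find_occurrences(tokens, "the")
    print(positions)                                   # [0, 4]
    print(context_words(tokens, positions, window=1))  # ['cat', 'on', 'mat']
```

The point of the ordering is visible in `refine_vector`: the target word's vector is fully updated against all of its contexts before control ever moves to another word, whereas a sequential pass would interleave partial updates of many words.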