Journal article Open Access

Multi-Stream Word-Based Compression Algorithm for Compressed Text Search

Ozturk, Emir; Mesut, Altan; Diri, Banu


MARC21 XML

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://www.opendefinition.org/licenses/cc-by</subfield>
    <subfield code="a">Creative Commons Attribution</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:zenodo.org:35275</subfield>
    <subfield code="p">user-tubitak-destekli-proje-yayinlari</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">In this article, we present a novel word-based lossless compression algorithm for text files using a semi-static model. We named this method the multi-stream word-based compression algorithm (MWCA) because it stores the compressed forms of the words in three individual streams depending on their frequencies in the text, and stores two dictionaries and a bit vector as side information. In our experiments, MWCA achieves a compression ratio of 3.23 bpc on average and 2.88 bpc for files larger than 50 MB; if a variable-length encoder such as Huffman coding is applied after MWCA, these ratios are reduced to 2.65 and 2.44 bpc, respectively. MWCA supports exact word matching without decompression, and its multi-stream approach reduces search time relative to single-stream algorithms. Additionally, the MWCA multi-stream structure reduces network load, because only the necessary streams need to be requested from the database. With its fast compressed search and multi-stream structure, we believe that MWCA is a good solution, especially for storing and searching big text data.</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-tubitak-destekli-proje-yayinlari</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="4">
    <subfield code="n">12</subfield>
    <subfield code="v">43</subfield>
    <subfield code="c">8209-8221</subfield>
    <subfield code="p">ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2018-01-01</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.1007/s13369-018-3378-9</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <controlfield tag="001">35275</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Ozturk, Emir</subfield>
    <subfield code="u">Trakya Univ, Comp Engn Dept, Edirne, Turkey</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="u">https://aperta.ulakbim.gov.tr/record/35275/files/bib-7bc2e1ec-62f0-4c69-8e62-9e1b5cddc815.txt</subfield>
    <subfield code="s">178</subfield>
    <subfield code="z">md5:a69715484f2e26202ce692a24cf12a25</subfield>
  </datafield>
  <controlfield tag="005">20210315192023.0</controlfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">article</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Mesut, Altan</subfield>
    <subfield code="u">Trakya Univ, Comp Engn Dept, Edirne, Turkey</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Diri, Banu</subfield>
    <subfield code="u">Yildiz Tech Univ, Comp Engn Dept, Istanbul, Turkey</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Multi-Stream Word-Based Compression Algorithm for Compressed Text Search</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
</record>
Views 11
Downloads 5
Data volume 890 Bytes
Unique views 11
Unique downloads 5
