Journal article Open Access

Architectural Trade-Off Analysis for Accelerating LSTM Network Using Radix-r OBC Scheme

Khan, Mohd Tasleem; Yantir, Hasan Erdem; Salama, Khaled Nabil; Eltawil, Ahmed M.


JSON-LD (schema.org)

{
  "@context": "https://schema.org/", 
  "@id": 265184, 
  "@type": "ScholarlyArticle", 
  "creator": [
    {
      "@type": "Person", 
      "affiliation": "Indian Inst Technol, Indian Sch Mines, Dept Elect Engn, Dhanbad 826004, Jharkhand, India", 
      "name": "Khan, Mohd Tasleem"
    }, 
    {
      "@type": "Person", 
      "affiliation": "TUBITAK Informat & Informat Secur Res Ctr, TR-41470 Kocaeli, Turkiye", 
      "name": "Yantir, Hasan Erdem"
    }, 
    {
      "@type": "Person", 
      "affiliation": "King Abdullah Univ Sci & Technol, Dept Comp Elect & Math Sci & Engn CEMSE Div, Elect & Comp Engn Program, Thuwal 23955, Saudi Arabia", 
      "name": "Salama, Khaled Nabil"
    }, 
    {
      "@type": "Person", 
      "affiliation": "King Abdullah Univ Sci & Technol, Dept Comp Elect & Math Sci & Engn CEMSE Div, Elect & Comp Engn Program, Thuwal 23955, Saudi Arabia", 
      "name": "Eltawil, Ahmed M."
    }
  ], 
  "datePublished": "2023-01-01", 
  "description": "<p>This paper presents an architectural trade-off analysis of two architectures (Type I and Type II) for accelerating fixed-point long short-term memory (LSTM) networks based on circulant matrix-vector multiplications (MVMs) using a radix-r offset binary coding (OBC) scheme. The Type I MVM architecture rotates the weights with the proposed modulo-cum interleaver and uses partial product generators (PPGs) with a single generation unit across a column. It is hardware-optimized using a single adder tree through time multiplexing. Meanwhile, the Type II MVM architecture rotates the inputs with the proposed store-cum interleaver and uses single PPGs with a single generation unit across a row. It is time-optimized by unfolding the shift-accumulate unit into a shift-add tree, followed by pipelining. A new design for element-wise multiplication using a radix-r PPG is also presented. Both designs are extended to their block-circulant variants for certain accuracy requirements. Post-synthesis results for the Type I and II architectures across different model, kernel, and radix sizes and clock frequencies yield several efficient designs. Compared with the prior scheme, the Type I architecture for 128x128 with r = 2 on 28 nm FDSOI technology at 800 MHz occupies 32.27% less area and consumes 67.89% less power at the same throughput, while the Type II architecture, at the expense of area and power, provides 40x higher throughput.</p>", 
  "headline": "Architectural Trade-Off Analysis for Accelerating LSTM Network Using Radix-<i>r</i> OBC Scheme", 
  "identifier": 265184, 
  "image": "https://aperta.ulakbim.gov.tr/static/img/logo/aperta_logo_with_icon.svg", 
  "license": "http://www.opendefinition.org/licenses/cc-by", 
  "name": "Architectural Trade-Off Analysis for Accelerating LSTM Network Using Radix-<i>r</i> OBC Scheme", 
  "url": "https://aperta.ulakbim.gov.tr/record/265184"
}
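
The abstract contrasts an architecture that rotates the weights (Type I) with one that rotates the inputs (Type II). The sketch below only illustrates the algebraic identity that makes both options possible for a circulant matrix-vector product; it is not the paper's hardware design and models neither the OBC scheme nor fixed-point arithmetic. The function names and the convention that the matrix is defined by its first column `c`, with `C[i][j] = c[(i - j) mod n]`, are assumptions made for this example.

```python
def rotate_right(v, k):
    """Return v cyclically rotated right by k: result[i] == v[(i - k) % n]."""
    k %= len(v)
    return v[-k:] + v[:-k]


def circulant_dense(c):
    """Build the full circulant matrix from its first column c (assumed convention)."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def mvm_dense(c, x):
    """Reference result: ordinary matrix-vector product with the dense matrix."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in circulant_dense(c)]


def mvm_rotate_weights(c, x):
    """Keep x fixed and rotate one stored weight row to obtain every other row."""
    n = len(c)
    row0 = [c[(-j) % n] for j in range(n)]  # first row of the circulant matrix
    return [
        sum(w * xj for w, xj in zip(rotate_right(row0, i), x))
        for i in range(n)
    ]


def mvm_rotate_inputs(c, x):
    """Keep the weights fixed and accumulate c[k] times x rotated by k."""
    n = len(c)
    y = [0] * n
    for k in range(n):
        xr = rotate_right(x, k)
        for i in range(n):
            y[i] += c[k] * xr[i]
    return y


if __name__ == "__main__":
    c = [1, 2, 3, 4]
    x = [5, 6, 7, 8]
    print(mvm_dense(c, x))
    print(mvm_rotate_weights(c, x))
    print(mvm_rotate_inputs(c, x))
```

All three functions return the same vector, so only `n` weights need to be stored instead of `n * n`. Whether the rotation is applied to the weights or to the inputs is then an implementation choice, which is the trade-off the two architectures explore.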