Fast Scloud+: A Fast Hardware Implementation for the Unstructured LWE-based KEM - Scloud+

Jing Tian, Nanjing University
Yaodong Wei, Nanjing University
Dejun Xu, Nanjing University
Kai Wang, Nanjing University
Anyu Wang, Tsinghua University
Zhiyuan Qiu, Shandong Institute of Blockchain
Fu Yao, Huawei Technologies
Guang Zeng, Huawei Technologies

Abstract

Scloud+ is an unstructured LWE-based key encapsulation mechanism (KEM) with conservative quantum security, in which ternary secrets and lattice coding are incorporated for higher computational and communication efficiency. However, its efficiency is still far inferior to that of structured LWE-based KEMs such as ML-KEM (standardized by NIST). In this paper, we present a configurable hardware architecture for Scloud+.KEM to improve its computational efficiency. Many algorithmic and architectural co-optimizations are proposed to reduce complexity and increase the degree of parallelism. Specifically, the matrix multiplications are computed block by block in serial, with each block calculated in one cycle without using any multipliers. In addition, all random bits are generated by an unfolded Keccak core that is well matched to the data flow required by the block matrix multiplier. The proposed design is coded in Verilog and implemented in SMIC 40nm LP CMOS technology. Synthesis results show that Scloud+.KEM-128 takes only 23.0 μs, 24.3 μs, and 24.6 μs in the KeyGen, Encaps, and Decaps stages, respectively, with an area of 0.69 mm², significantly narrowing the gap with state-of-the-art Kyber hardware implementations.
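The abstract says the matrix multiplications need no multipliers. A plausible reading, given that Scloud+ uses ternary secrets, is that multiplying by a secret entry in {-1, 0, +1} reduces to subtracting, skipping, or adding. The Python sketch below illustrates that idea in software under stated assumptions: the modulus q = 2^12, the block size `bs`, and the helper names `ternary_block_product` and `blocked_matmul` are hypothetical choices for illustration, not the paper's parameters or its Verilog architecture. The outer loops walk over blocks serially, and each call to the block function stands in for what the abstract describes as a one-cycle block computation.

```python
import random

# Hypothetical parameters for illustration only (not taken from the paper):
# a power-of-two modulus q = 2**12 makes "mod q" a simple bit mask.
Q_BITS = 12
Q_MASK = (1 << Q_BITS) - 1


def ternary_block_product(a_block, s_block):
    """Return a_block * s_block mod q, where every entry of s_block is -1, 0 or +1.

    Because each secret entry is ternary, a "product" a*s is just +a, -a or
    nothing, so the inner loop uses only additions and subtractions.
    """
    rows, inner, cols = len(a_block), len(s_block), len(s_block[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                s = s_block[k][j]
                if s == 1:
                    acc += a_block[i][k]
                elif s == -1:
                    acc -= a_block[i][k]
            # Masking with 2**Q_BITS - 1 reduces mod 2**Q_BITS, also for negative acc.
            out[i][j] = acc & Q_MASK
    return out


def blocked_matmul(a, s, bs):
    """Compute A*S mod q by processing bs x bs blocks one after another.

    Hypothetically, each ternary_block_product call corresponds to one
    hardware cycle; here it is simply a function call inside a loop.
    """
    n, m, p = len(a), len(s), len(s[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, p, bs):
            for k0 in range(0, m, bs):
                a_blk = [row[k0:k0 + bs] for row in a[i0:i0 + bs]]
                s_blk = [row[j0:j0 + bs] for row in s[k0:k0 + bs]]
                part = ternary_block_product(a_blk, s_blk)
                # Accumulate the partial block result into C, still mod q.
                for di, prow in enumerate(part):
                    for dj, v in enumerate(prow):
                        c[i0 + di][j0 + dj] = (c[i0 + di][j0 + dj] + v) & Q_MASK
    return c


if __name__ == "__main__":
    rng = random.Random(2025)
    n, m, p = 6, 8, 5  # deliberately not multiples of the block size
    A = [[rng.randrange(1 << Q_BITS) for _ in range(m)] for _ in range(n)]
    S = [[rng.choice((-1, 0, 1)) for _ in range(p)] for _ in range(m)]
    # Reference: ordinary multiply-and-reduce.
    ref = [[sum(A[i][k] * S[k][j] for k in range(m)) % (1 << Q_BITS)
            for j in range(p)] for i in range(n)]
    assert blocked_matmul(A, S, bs=4) == ref
    print("blocked multiplier-free result matches reference")
```

The self-check only confirms that the add/subtract-only formulation agrees with ordinary multiplication mod q. In a hardware block multiplier the additions for a whole block would presumably be laid out in parallel rather than looped, which is what allows a block to finish in a single cycle.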

Metadata

Available format(s)
PDF
Category
Implementation
Publication info
Preprint.
Keywords
post-quantum cryptography
learning with errors
lattice code
Hardware Implementation
ASIC
Contact author(s)
tianjing @ nju edu cn
yaodongwei @ smail nju edu cn
xudejun @ smail nju edu cn
wang_kai @ smail nju edu cn
anyuwang @ tsinghua edu cn
qiuzhiyuan @ sdibc cn
yaofu3 @ huawei com
zengguang13 @ huawei com
History
2025-03-17: approved
2025-03-16: received
License
Creative Commons Attribution-NonCommercial
CC BY-NC

BibTeX

@misc{cryptoeprint:2025/497,
      author = {Jing Tian and Yaodong Wei and Dejun Xu and Kai Wang and Anyu Wang and Zhiyuan Qiu and Fu Yao and Guang Zeng},
      title = {Fast Scloud+: A Fast Hardware Implementation for the Unstructured {LWE}-based {KEM} - Scloud+},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/497},
      year = {2025},
      url = {}
}