BenchCouncil Transactions on Benchmarks, Standards and

Evaluations, 2026

DOI: https://doi.org/10.66834/9a7y0c13

Research Article

ABWS: The Arabic Boundary-aware Word Segmentation Benchmark for Reproducible Evaluation

Huda AlShuhayeb¹,∗ and Behrouz Minaei-Bidgoli¹,∗

¹ School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran

∗ Corresponding author: hudaalshuhayeb@gmail.com; b_minae@iust.ac.ir

Received on 27 January 2026; Accepted on 7 April 2026

Abstract

With the rapid adoption of natural language processing (NLP) systems for morphologically rich languages, it has

become increasingly imperative to standardize a common set of measures and evaluation practices to ensure reproducibil-

ity and fair comparison. Arabic word segmentation serves as a foundational layer in the NLP software stack; however,

the ﬁeld remains fragmented due to inconsistent datasets and an overreliance on opaque, aggregate metrics that mask

systemic architectural biases.

We present ABWS (Arabic Boundary-aware Word Segmentation), a scalable and publicly available benchmarking

system designed for the rigorous, reproducible evaluation of diverse segmentation paradigms. To enable paradigm-agnostic

comparison across rule-based, statistical, and neural models, ABWS introduces a canonical boundary vector abstraction

that normalizes disparate system outputs into a uniﬁed evaluation interface. The benchmarking harness includes a manu-

ally veriﬁed gold-standard workload of 212,873 words across diverse genres and integrates seven widely used segmentation

systems as reproducible baselines.

Our systematic evaluation reveals that while neural subword-based models are robust for vocabulary compression,

they exhibit extreme Over-Segmentation Ratios (OSR ≥ 0.58), leading to a significant drop in word-level exact match accuracy compared to rule-based engines. We further introduce Critical Boundary Accuracy (CBA), a linguistically weighted

metric that prioritizes high-impact morphological boundaries. Our cross-layer analysis demonstrates that CBA is highly

predictive of downstream performance in Machine Translation and Named Entity Recognition (ρ > 0.88), whereas traditional token-level F1 scores often obscure these performance bottlenecks.

By providing a containerized evaluation pipeline and versioned system artifacts, ABWS establishes a new standard

for methodological rigor in Arabic NLP research, oﬀering a template for benchmarking other morphologically complex

languages within the broader computational ecosystem.

Key words: Arabic NLP, Morphological Segmentation, Benchmarking, Reproducibility, Boundary Errors, Error

Taxonomy, Benchmark Traceability, Evaluation Conditions

1. Introduction

With the rapid proliferation and deployment of natural lan-

guage processing (NLP) systems across global industries, it has

become increasingly imperative to standardize a common set

of measures and evaluation practices to ensure reproducibil-

ity and fair comparison. For morphologically rich languages

(MRLs) such as Arabic, word segmentation serves as a founda-

tional preprocessing layer in the NLP software stack. Despite

its critical role, the ﬁeld remains fragmented, lacking a uniﬁed

benchmarking infrastructure capable of systematically evalu-

ating the diverse array of rule-based, statistical, and neural

segmentation paradigms.

To illustrate the unique complexity of Arabic word segmen-

tation compared to languages like English, consider the single

Arabic word token ’fabi-iltizāmi-him’. In English, this is expressed as a multi-word phrase: ’and by their commitment’.

While English maintains clear whitespace boundaries between

the conjunction (’and’), preposition (’by’), noun (’commit-

ment’), and possessive pronoun (’their’), Arabic merges these

distinct functional morphemes into a single orthographic unit.

This ’clitic stacking’ creates a signiﬁcant challenge for NLP

systems, as a single segmentation error—such as failing to iso-

late the proclitic ’fa-’ (and) or the preposition ’bi-’ (by)—can

lead to a complete misinterpretation of the word’s syntactic

role. Unlike English, where tokenization is largely a trivial

whitespace-splitting task, Arabic segmentation requires a so-

phisticated boundary-aware analysis to recover these latent

grammatical structures, making it a critical pre-processing

bottleneck.

AlShuhayeb et al.

Arabic, spoken by over 400 million people, presents unique

challenges for system evaluation due to its complex morphology,

where a single space-delimited string can represent multiple

concatenated morphemes (roots, patterns, and aﬃxes) [1].

The performance of a segmentation system directly dictates

the eﬃciency and accuracy of downstream tasks, including

machine translation [2] and information retrieval [3]. How-

ever, the absence of a standardized benchmarking harness

prevents researchers from understanding how different architectural choices (such as subword-based methods versus traditional statistical models) behave across varied data modalities and genres.

Current evaluation practices in Arabic NLP suﬀer from three

critical methodological gaps that hinder the development of

high-performance standards:

1. Lack of a Standardized Benchmark Suite: Many

evaluations rely on non-public or inconsistently annotated

datasets, making it impossible to replicate results or

perform “apples-to-apples” comparisons between emerging

neural models and established baselines [4].

2. Metric Opacity and Coarse Granularity: Most systems

report aggregate token-level F1 scores. These “black-box”

metrics mask qualitative diﬀerences in boundary placement

errors, such as the over-segmentation of stems versus the

under-segmentation of clitic clusters, which have vastly

diﬀerent impacts on system usability [5].

3. Isolation from Downstream Impact: There is a lack of

empirical evidence linking speciﬁc segmentation error types

to performance degradation in full-stack NLP pipelines.

This limits the ability of systems engineers to perform

task-aware model selection.

To address these challenges, we introduce ABWS (Arabic

Boundary-aware Word Segmentation), a scalable and pub-

licly available benchmarking system designed for the rigorous

and reproducible evaluation of Arabic segmentation. Simi-

lar to benchmarking eﬀorts in other computational domains

(e.g., MLCommons), ABWS provides a standardized frame-

work that decouples the evaluation logic from the underlying

model implementation.

The primary contributions of this work are as follows:

• A Standardized Gold-Standard Dataset: We present a

manually veriﬁed dataset comprising 212,873 words across

diverse genres, providing a representative workload for

evaluating system robustness and generality.

• A Uniﬁed Benchmarking Harness: We establish repro-

ducible baselines by integrating seven widely used segmen-

tation systems—spanning rule-based, statistical, and neural

paradigms—under a common evaluation protocol.

• Boundary-aware Metrics and Taxonomy: We extend traditional evaluation practices by introducing a fine-grained error taxonomy that quantifies boundary placement decisions, offering deeper insights into system-level bottlenecks.

• Cross-Layer Impact Analysis: We provide a systematic

study of how segmentation errors propagate through down-

stream NLP tasks, enabling a more holistic assessment of

performance beyond simple accuracy scores.

By providing the dataset, standardized evaluation scripts,

and baseline system outputs, ABWS aims to establish a new

standard for methodological rigor in Arabic NLP. This frame-

work not only facilitates transparent performance tracking but

also serves as a model for benchmarking other morphologically

complex languages within the broader NLP ecosystem.

The remainder of this paper is organized as follows: Sec-

tion 2 reviews existing segmentation and evaluation practices;

Section 3 details the design and composition of the ABWS

benchmark; Section 4 presents the boundary-aware evaluation

framework; Section 5 reports experimental results and sys-

tematic error analysis; Section 6 examines implications for

downstream task performance; and Section 7 concludes with

future directions for standardization in the ﬁeld.

2. Related Work

This section reviews prior work from a benchmark-engineering

perspective, with particular attention to three dimensions: (i)

the evolution of Arabic morphological segmentation systems,

(ii) existing evaluation methodologies and benchmarks for seg-

mentation, and (iii) recent advances in benchmarking theory

that emphasize the explicit speciﬁcation of evaluation condi-

tions, evaluation systems, and standards as prerequisites for

comparability and reproducibility [6–8].

2.1. Arabic Morphological Segmentation Systems

Arabic morphological segmentation has evolved through several

methodological paradigms. Early systems were predominantly

rule-based and lexicon-driven, aiming to produce linguistically well-formed analyses grounded in classical morphological

theory. Systems such as MADA and AlKhalil Morpho Sys ex-

emplify this generation, integrating rich lexical resources with

hand-crafted rules and contextual disambiguation [9–11]. While

these systems achieved high linguistic precision, they were of-

ten constrained by limited coverage, sensitivity to orthographic

variation, and reduced robustness to out-of-vocabulary forms

and non-canonical usage [12].

To address coverage and scalability, statistical segmentation

approaches emerged. Data-driven models based on conditional

random ﬁelds and discriminative classiﬁers learned boundary

decisions from annotated corpora, notably the Penn Arabic

Treebank. Farasa further emphasized eﬃciency and deployabil-

ity by introducing a fast, deterministic segmentation pipeline

with statistical ranking, enabling near real-time processing on

large corpora [13]. These systems improved robustness but often

traded linguistic interpretability for speed and generalization.

In contemporary NLP pipelines, segmentation is frequently

induced implicitly through subword tokenization. Methods such

as Byte-Pair Encoding (BPE) and SentencePiece, as well as

WordPiece tokenization used in transformer pretraining, gen-

erate boundaries optimized for vocabulary compression and

language modeling objectives rather than morphological va-

lidity [14–16]. Arabic-focused pretrained models, including

AraBERT and later AraELECTRA and MARBERT, inherit

this tokenization-centric notion of segmentation, which often

results in boundaries that cut across morphemes or clitic units

[17, 18]. Although recent work explores explicit neural seg-

mentation via boundary prediction or multitask learning with

orthographic processes, such approaches remain fragmented

across datasets and annotation conventions and are not yet

standardized [19].

Despite this methodological diversity, there is no consensus

on an “optimal” segmentation strategy. In practice, system se-

lection is frequently driven by pragmatic constraints such as

speed, memory footprint, or compatibility with downstream

models rather than by linguistic or task-aware criteria.

2.2. Evaluation of Arabic Segmentation

Early evaluations of Arabic segmentation typically relied on

alignment with treebank-style gold annotations and reported

boundary-level precision, recall, and F1. However, treating all

boundaries as equally important obscures qualitatively diﬀer-

ent error types, such as under-segmentation of proclitics versus

over-segmentation of stems [5]. Task-oriented studies demon-

strated that segmentation errors have asymmetric downstream

impact: over-segmentation may harm precision in information

retrieval, while under-segmentation may reduce recall or impair

translation quality [20, 21].

More recent analyses highlight that tokenization and seg-

mentation choices also aﬀect the eﬃciency and behavior of

transformer-based models, inﬂuencing both performance and

computational cost [22]. Nevertheless, most comparative stud-

ies still report aggregate metrics computed under heteroge-

neous and often undocumented evaluation conditions, limiting

interpretability and reproducibility.

From a standards perspective, a central limitation of prior

work is the absence of a standardized protocol for comparing

fundamentally diﬀerent segmentation paradigms. Morpholog-

ical segmenters produce linguistically motivated morpheme

boundaries, whereas subword tokenizers generate boundaries

derived from statistical vocabulary construction. Without an

explicit mapping between these representations, evaluation

scores across paradigms become eﬀectively incomparable, even

when computed on the same dataset [6]. Reproducibility is

further hindered when code, data splits, normalization poli-

cies, and evaluation scripts are not fully speciﬁed or publicly

available [23].

2.3. Benchmarking Practices, Standards, and Robustness

General-purpose NLP benchmarks such as GLUE and Super-

GLUE demonstrated the value of uniﬁed tasks, datasets, and

scoring protocols for accelerating progress through comparabil-

ity [24, 25]. Subsequent benchmarking research has clariﬁed,

however, that a benchmark should not be understood as a

dataset alone, but as a complete evaluation system whose

conclusions depend on explicitly deﬁned evaluation conditions

(EC), a concrete evaluation system (ES), and a value function

that encodes what is being optimized [6, 7].

Within this perspective, a dataset is only meaningful in-

sofar as it instantiates a representative workload. That is,

benchmark data should approximate the structural, distri-

butional, and operational characteristics of real-world inputs

that systems are expected to process. ABWS adopts this

workload-centric view explicitly: the curated corpus is not

treated as a passive collection of labeled examples, but as a

controlled workload designed to stress-test Arabic segmentation

systems under realistic linguistic conditions, including dense

clitic stacking, derivational morphology, orthographic varia-

tion, and genre-speciﬁc constructions common in formal Arabic

text.

Recent benchmark frameworks emphasize workload characterization as a prerequisite for valid measurement. For example,

AICB formalizes benchmarks around representative workloads

executed under reproducible environments and explicitly de-

ﬁned ECs, ensuring that performance claims reﬂect behavior

under realistic operating conditions rather than isolated test

sets [7]. Similarly, COADBench argues that benchmarks must

align evaluation metrics with practical outcomes, demonstrat-

ing that mischaracterized workloads can render even precise

metrics misleading [8].

In the context of Arabic segmentation, workload character-

ization is particularly critical. Segmentation diﬃculty varies

substantially across registers and genres, and small shifts in

text composition can induce large changes in boundary distri-

butions and error modes. ABWS therefore ﬁxes and documents

workload properties—including genre, morphological density,

normalization rules, and boundary conventions—so that re-

ported results correspond to a clearly speciﬁed and reproducible

segmentation workload, rather than an abstract notion of

“Arabic data.”

Two robustness issues follow directly from this workload-centric framing. First, domain shift (for example, between Classical Arabic, Modern Standard Arabic, and informal or social media text) can substantially alter error distributions

and system rankings unless ECs such as genre selection, or-

thographic normalization, and boundary deﬁnitions are ﬁxed

and reported. Second, data contamination risks arise when

benchmark material overlaps with resources used during system

development or pretraining, particularly for large pretrained

models, leading to inﬂated and non-generalizable performance

estimates.

These considerations motivate benchmark designs that treat

workload speciﬁcation, dataset provenance, splitting strategy,

normalization procedures, and evaluation scripts as ﬁrst-class

artifacts. By doing so, ABWS aligns with determinacy and

equivalence as core benchmarking standards [6], and ensures

that its results reﬂect system behavior on a well-deﬁned, repre-

sentative Arabic segmentation workload rather than incidental

properties of a static dataset.

2.4. Our Position

ABWS is designed as a standards-oriented benchmark for

Arabic word segmentation. It explicitly speciﬁes evaluation con-

ditions, provides a reproducible evaluation system, and deﬁnes

value functions that (i) distinguish boundary types and error

positions, (ii) enable comparison across rule-based, statistical,

and neural/subword paradigms via boundary harmonization,

and (iii) support downstream-aware analysis where appropri-

ate. In doing so, ABWS aims to move Arabic segmentation

evaluation from dataset-speciﬁc reporting toward a rigorous,

comparable, and reproducible benchmark engineering practice

[6, 7].

3. Formal Specification and Evaluation Conditions

This section describes the architectural design of ABWS (Ara-

bic Boundary-aware Word Segmentation), a benchmarking

framework engineered to address fundamental limitations in ex-

isting Arabic segmentation evaluation practices. Empirical in-

spection of segmentation outputs across rule-based, statistical,

and neural systems reveals that segmentation errors are not ran-

dom, but systematic and paradigm-dependent. Subword-based

models fragment stems to minimize vocabulary entropy, neural

tokenizers exhibit unstable boundary placement, and statistical

systems bias toward conservative under-segmentation in clitic-

dense constructions. These failure modes cannot be reliably

captured by aggregate word-level metrics alone.

ABWS is therefore designed not as a static dataset, but

as a uniﬁed benchmarking harness that enables reproducible,

paradigm-agnostic, and diagnostically meaningful evaluation.

Following benchmarking principles established for large-scale

computational systems [6, 7], ABWS formalizes evaluation

around standardized execution conditions, a canonical bound-

ary representation layer, and a multi-dimensional metric suite

explicitly aligned with observed linguistic error behavior.

3.1. Design Principles and Standardization Goals

The design of ABWS is guided by four core principles, each

directly motivated by empirical segmentation pathologies ob-

served across contemporary systems.

• Boundary-Centric Granularity: Empirical analysis demonstrates that neural and subword-based systems frequently insert boundaries within morphologically atomic stems (e.g., istiḥqāqan → ist + ḥq + āq + an), while other systems omit required clitic boundaries (e.g., fa + li + naḥmad → falinaḥmad). ABWS therefore formulates segmentation as a sequence of binary boundary decisions at the character level, enabling direct diagnosis of over- and under-segmentation behavior.

• Paradigm-Agnostic Normalization: Arabic segmenta-

tion systems produce structurally incompatible outputs,

ranging from morpho-syntactic analyses to frequency-driven

subword decompositions. To enable fair comparison, ABWS

introduces a boundary vector abstraction that projects all outputs, regardless of underlying architecture, into a

common mathematical space.

• Reproducibility-First Engineering: To eliminate hid-

den variability, all datasets, normalization rules, evalu-

ation scripts, and system outputs are version-controlled

and containerized. This benchmark-as-code approach en-

sures that reported results are deterministic, auditable, and

independently veriﬁable.

• Error-Aware Metric Design: Observed segmentation

failures disproportionately aﬀect certain boundary types

(e.g., clitics versus stem-internal splits). ABWS metrics are

therefore designed to distinguish directional error biases and

to weight linguistically salient boundaries according to their

downstream impact.

3.2. Standardized Boundary Representation Layer

A central challenge in Arabic segmentation benchmarking is

output incompatibility. For example, rule-based analyzers correctly preserve clitic boundaries (li + al + wuḍū), while subword tokenizers may split stems (al-ṭ + h + āra) or collapse multi-clitic constructions (wa-lil-junub). Direct comparison of

such outputs is ill-deﬁned.

ABWS resolves this incompatibility by projecting all system

outputs into a Character-Level Boundary Vector, which serves

as the canonical internal representation for evaluation.

Boundary Vector Formalization. Given an input string of n characters, ABWS defines a binary boundary vector

$$B = (b_1, b_2, \ldots, b_{n-1}),$$

where

$$b_i = \begin{cases} 1 & \text{if a boundary exists between characters } i \text{ and } i+1, \\ 0 & \text{otherwise.} \end{cases}$$

This representation ensures that all systems are evaluated

against an identical character sequence, eliminating alignment

drift caused by orthographic normalization, Unicode variation,

or tokenization artifacts. As a result, stem-internal splits, clitic

omissions, and boundary displacements are measured uniformly

across paradigms.
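As a rough illustration of this projection, the sketch below converts a delimiter-marked segmentation into a boundary vector. The function name `to_boundary_vector`, the `+` delimiter, and the ASCII transliteration are assumptions made for this example; they are not taken from the ABWS codebase.

```python
from typing import List


def to_boundary_vector(segmented: str, sep: str = "+") -> List[int]:
    """Project a delimiter-marked segmentation onto a binary boundary vector.

    For a word of n characters the result has n - 1 entries; entry i - 1 is 1
    when a boundary falls between characters i and i + 1 (1-indexed).
    Hypothetical helper, not part of the ABWS release.
    """
    # Drop whitespace around morphemes and ignore empty fragments.
    morphemes = [m.strip() for m in segmented.split(sep) if m.strip()]
    n = sum(len(m) for m in morphemes)
    vector = [0] * max(n - 1, 0)
    position = 0
    # Every morpheme except the last one ends at an internal boundary.
    for morpheme in morphemes[:-1]:
        position += len(morpheme)
        vector[position - 1] = 1
    return vector


# ASCII transliteration is used so that each letter is one code point.
gold = to_boundary_vector("fa + li + nahmad")
pred = to_boundary_vector("falinahmad")
print(gold)  # [0, 1, 0, 1, 0, 0, 0, 0, 0]
print(pred)  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Both vectors have the same length because they are derived from the same ten-character string, which is what makes position-wise comparison well defined.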

3.3. Evaluation Engine and Value Functions

Let S and G denote the system-predicted and gold-standard

boundary vectors, respectively. The ABWS evaluation engine

computes a suite of value functions designed to capture com-

plementary dimensions of segmentation quality revealed by

empirical error analysis:

• Boundary-Level Precision, Recall, and F1: Baseline

measures of boundary detection accuracy, insensitive to

token length but sensitive to boundary placement.

• Word-Level Exact Match (EM): A strict correctness cri-

terion requiring all boundary decisions within a word to

match the gold standard, penalizing even a single stem-internal split or missed clitic.

• Boundary Distance (BD): A granular disagreement met-

ric quantifying average per-boundary deviation:

$$\mathrm{BD}(S, G) = \frac{1}{n-1} \sum_{i=1}^{n-1} \left| b_i^{(S)} - b_i^{(G)} \right|.$$

This measure captures systemic boundary noise observed in

subword tokenizers.

• Directional Bias Ratios: Over-Segmentation Ratio

(OSR) and Under-Segmentation Ratio (USR) explicitly sep-

arate stem-fragmentation errors from clitic-merging errors,

reﬂecting the asymmetric failure modes observed across

architectures.

• Critical Boundary Accuracy (CBA): A weighted ac-

curacy metric prioritizing linguistically salient boundaries

(e.g., proclitics and enclitics) over stem-internal positions.

Fixed weights ($w_{\text{clitic}} = 2.0$, $w_{\text{stem}} = 0.5$) ensure determinism while reflecting downstream sensitivity.

• CBA Formulation: The diﬀerential weighting in the Crit-

ical Boundary Accuracy (CBA) metric—assigning w =

2.0 to clitic boundaries and w = 0.5 to internal stem

boundaries—is grounded in the concept of Downstream Im-

pact Analysis of segmentation errors. In Arabic, clitics

(proclitics and enclitics) frequently function as essential syn-

tactic markers, including conjunctions, prepositions, and

pronominal suﬃxes. Failure to correctly segment a clitic (for

example, the preposition bi-) often produces a catastrophic

error in downstream tasks such as Machine Translation

or Dependency Parsing, because it alters the fundamental

grammatical role of the token within the sentence. Con-

versely, over-segmentation or under-segmentation within

the stem (for example, incorrectly splitting a root-derived

noun) usually produces a recoverable error, where the se-

mantic core remains partially identiﬁable by information

retrieval systems or embedding-based models. By assigning

a higher penalty to clitic-related segmentation errors, the

CBA metric explicitly prioritizes boundaries that preserve

functional linguistic structure. This weighting scheme en-

sures that the benchmark emphasizes architectural precision

necessary for syntactic and grammatical integrity rather

than treating all boundary errors as equally consequential

lexical variations.
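The value functions listed above could be computed from a pair of boundary vectors roughly as follows. This is a minimal sketch under stated assumptions. The text gives an explicit formula only for BD, so the OSR and USR definitions used here (spurious boundaries over gold non-boundary positions, and missed boundaries over gold boundary positions) and the weighted-agreement form of CBA are assumptions, as are the function names and the per-position `kinds` labels.

```python
from typing import Dict, Sequence


def boundary_metrics(pred: Sequence[int], gold: Sequence[int]) -> Dict[str, float]:
    """Score one predicted boundary vector S against the gold vector G."""
    if len(pred) != len(gold):
        raise ValueError("boundary vectors must have equal length")
    pairs = list(zip(pred, gold))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)  # spurious boundary
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)  # missed boundary
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # BD as defined in the text: mean absolute per-position disagreement.
        "bd": (fp + fn) / len(pairs) if pairs else 0.0,
        # ASSUMPTION: OSR and USR are normalized by gold non-boundary and
        # gold boundary positions; the source does not state their formulas.
        "osr": fp / (fp + tn) if fp + tn else 0.0,
        "usr": fn / (fn + tp) if fn + tp else 0.0,
        "em": 1.0 if fp == 0 and fn == 0 else 0.0,
    }


def critical_boundary_accuracy(pred: Sequence[int], gold: Sequence[int],
                               kinds: Sequence[str],
                               w_clitic: float = 2.0, w_stem: float = 0.5) -> float:
    """Weighted agreement; `kinds` tags each position 'clitic' or 'stem' (assumed)."""
    weights = [w_clitic if kind == "clitic" else w_stem for kind in kinds]
    total = sum(weights)
    agree = sum(w for w, p, g in zip(weights, pred, gold) if p == g)
    return agree / total if total else 1.0


# Gold: fa + li + nahmad. Prediction misses the "li" boundary and splits the stem.
gold = [0, 1, 0, 1, 0, 0, 0, 0, 0]
pred = [0, 1, 0, 0, 0, 0, 1, 0, 0]
kinds = ["stem", "clitic", "stem", "clitic"] + ["stem"] * 5
scores = boundary_metrics(pred, gold)
cba = critical_boundary_accuracy(pred, gold, kinds)
```

In this example the single missed clitic boundary costs four times as much CBA weight as the spurious stem split, which matches the asymmetry described in the CBA rationale.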

3.4. Statistical Protocol and Robustness

To ensure that reported diﬀerences reﬂect systematic behav-

ior rather than sampling variance, ABWS adopts a rigorous

statistical protocol:

• Confidence Estimation: 95% confidence intervals estimated via 1,000-resample bootstrap procedures.

• Pairwise Signiﬁcance Testing: McNemar’s test with

Bonferroni correction for multiple comparisons.

• Eﬀect Size Reporting: Cohen’s h is reported alongside

p-values to distinguish statistical signiﬁcance from practical

impact.
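The three steps of this protocol can be sketched with the Python standard library alone. The sketch assumes per-word correctness flags (1 for an exact match, 0 otherwise) as input, a percentile bootstrap, and an exact two-sided McNemar test. The text does not specify which bootstrap or McNemar variant ABWS uses, so those choices and all function names are illustrative.

```python
import math
import random
from typing import Sequence, Tuple


def bootstrap_ci(correct: Sequence[int], resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for mean accuracy (assumed variant)."""
    rng = random.Random(seed)
    n = len(correct)
    means = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(resamples))
    lower = means[int((alpha / 2) * resamples)]
    upper = means[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper


def mcnemar_exact(a_correct: Sequence[int], b_correct: Sequence[int]) -> float:
    """Exact two-sided McNemar p-value computed from the discordant pairs."""
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail under H0: discordant pairs split 50/50.
    # Fine for a sketch; slow when n is very large.
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_value: float, comparisons: int) -> float:
    """Bonferroni-adjusted p-value for a family of `comparisons` tests."""
    return min(1.0, p_value * comparisons)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Toy flags for two hypothetical systems on the same five words.
system_a = [1, 1, 1, 1, 0]
system_b = [0, 0, 0, 0, 1]
p_raw = mcnemar_exact(system_a, system_b)
p_adj = bonferroni(p_raw, comparisons=3)
interval = bootstrap_ci([1] * 80 + [0] * 20)
effect = cohens_h(0.817, 0.460)  # accuracies of the kind reported in Table 1
```

Reporting the effect size next to the adjusted p-value matters at this scale: with more than 200,000 words, even small accuracy differences can reach statistical significance.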

3.5. Implementation and Portability

ABWS is implemented in Python as a modular evaluation li-

brary. To guarantee portability and long-term reproducibility,

the entire benchmarking pipeline is containerized with pinned

dependencies and ﬁxed normalization rules. New segmentation

systems can be integrated by supplying raw outputs, which

are automatically normalized and projected into boundary vec-

tors, enabling immediate inclusion in the benchmarking harness

without architectural modiﬁcation.
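To make the integration path concrete, the sketch below shows how raw output from a subword tokenizer might be projected into a boundary vector. It assumes WordPiece-style pieces with a `##` continuation marker and one word per call. The adapter name and the check against a reference string are illustrative assumptions, not the ABWS interface.

```python
from typing import List


def pieces_to_boundary_vector(pieces: List[str], reference: str,
                              continuation_prefix: str = "##") -> List[int]:
    """Project subword pieces for ONE word onto a boundary vector.

    `reference` is the normalized word the gold vector was built from; the
    pieces must spell it exactly, otherwise the positions would not align.
    Hypothetical adapter, assuming a WordPiece-style '##' marker.
    """
    cleaned = [
        piece[len(continuation_prefix):] if piece.startswith(continuation_prefix) else piece
        for piece in pieces
    ]
    cleaned = [piece for piece in cleaned if piece]
    if "".join(cleaned) != reference:
        raise ValueError("pieces do not reproduce the reference character sequence")
    vector = [0] * max(len(reference) - 1, 0)
    offset = 0
    # Each piece except the last ends at a predicted boundary.
    for piece in cleaned[:-1]:
        offset += len(piece)
        vector[offset - 1] = 1
    return vector


# Illustrative pieces for an ASCII transliteration of one word.
vector = pieces_to_boundary_vector(["alt", "##ah", "##ara"], reference="altahara")
print(vector)  # [0, 0, 1, 0, 1, 0, 0]
```

Rejecting outputs that do not reproduce the reference string is one way to surface alignment drift early instead of scoring misaligned positions silently.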

This design positions ABWS as a stable, extensible, and

diagnostically expressive benchmark capable of evolving along-

side Arabic NLP systems while preserving comparability across

generations of models.

While the current evaluation focuses on a workload characterized by high morphological density (specifically Classical Arabic texts such as Sharāi al-Islām), the ABWS framework is

architecturally designed to be extensible to Arabic dialects. The

core strength of the benchmark lies in its Canonical Boundary

Vector (CBV) abstraction, which decouples linguistic speciﬁci-

ties from the technical evaluation harness. In dialectal Arabic,

where segmentation challenges often arise from phonological

fusion or elision, the CBV maintains its utility by treating

segmentation as a series of vocabulary-independent binary de-

cisions at the character level. Consequently, adapting ABWS

to various dialects only requires redeﬁning the ’Gold Vector’

to align with the speciﬁc morphological conventions of a given

dialect (e.g., handling the aspectual preﬁx ’bi-’ in Levantine

or negation particles in Maghrebi). This ﬂexibility ensures

that ABWS remains a paradigm-agnostic system capable of

evaluating model performance across the full spectrum of the

Arabic linguistic continuum without necessitating changes to

its underlying mathematical or procedural framework.

4. Experimental Results and Performance Analysis

The objective of this evaluation is to provide a diagnostic

breakdown of Arabic word segmentation quality beyond ag-

gregate accuracy scores. All reported results are computed

using the canonical boundary vector representation deﬁned by

ABWS, ensuring strictly comparable (apples-to-apples) evalu-

ation across heterogeneous segmentation paradigms, including

rule-based, statistical, and neural systems. In addition to quan-

titative metrics, we incorporate linguistically grounded error

inspection to validate that ABWS diagnostics capture real and

systematic segmentation pathologies.

4.1. Comparative Analysis of Word-Level Accuracy

Table 1 reports Word-Level Exact Match (EM) accuracy, the

most stringent metric in the ABWS evaluation suite. EM re-

quires a system to reproduce the complete gold morphological

segmentation of each word without any boundary insertion,

deletion, or displacement errors.

Table 1. Word-level exact match accuracy across paradigms (N = 212,873).

| Paradigm | System | Accuracy |
|---|---|---|
| Rule-based | CAMeL Tools | 0.817 |
| Rule-based | ALP | 0.790 |
| Statistical | Farasa | 0.810 |
| Neural / Subword | BERT-based | 0.460 |
| Neural / Subword | SelfSeg | 0.163 |
| Neural / Subword | mBART | 0.122 |
| Neural / Subword | BPE | 0.102 |

The results reveal a pronounced performance hierarchy.

CAMeL Tools (rule-based) achieves the highest word-level accuracy (0.817), with Farasa (statistical, 0.810) and ALP (rule-based, 0.790) close behind, while neural and subword-based

tokenizers exhibit a substantial degradation in exact match

accuracy. Crucially, this degradation is explained by struc-

tural mismatches between tokenization objectives and Arabic

morphology: subword tokenizers optimized for vocabulary com-

pression frequently fragment morphologically atomic stems

(e.g., alṭahāra → alṭ + h + āra in mBART; istibāḥa → ist + bāḥ + a in mBART), while language-agnostic neural systems may collapse required clitic boundaries (e.g., waad + nā + hu → waadnāhu in SelfSeg). Such errors are catastrophic

under EM because even a single stem-internal split or missed

clitic boundary invalidates the entire word segmentation.

4.2. Multi-Dimensional Diagnostic Metrics

To identify the structural sources of segmentation failure, we

analyze boundary-level diagnostics using ABWS metrics in Table 2. Errors are decomposed into Boundary F1, Boundary

Distance (BD), Over-Segmentation Ratio (OSR), and Under-

Segmentation Ratio (USR), enabling ﬁne-grained characteriza-

tion of systematic error behavior.

Table 2. Boundary-level diagnostic profiles and error distribution.

| System | Boundary F1 | BD | OSR | USR |
|---|---|---|---|---|
| CAMeL Tools | 0.86 | 0.11 | 0.08 | 0.14 |
| Farasa | 0.78 | 0.19 | 0.15 | 0.23 |
| BERT-based | 0.71 | 0.27 | 0.21 | 0.32 |
| SelfSeg | 0.38 | 0.61 | 0.55 | 0.09 |
| BPE | 0.32 | 0.65 | 0.58 | 0.07 |
| mBART | 0.29 | 0.68 | 0.62 | 0.09 |

To ensure a fair and reproducible comparison, all segmenta-

tion systems were evaluated under a uniﬁed set of Evaluation

Conditions (EC) as detailed in Table 3. Since diﬀerent Ara-

bic NLP tools often employ internal normalization logic, we

enforced a pre-processing layer that standardizes Alef/Ya char-

acters and removes non-lexical elements like Kashida and Di-

acritics. This prevents performance discrepancies from arising

due to orthographic variations rather than the segmentation

logic itself. Furthermore, we provide the exact versions of each integrated tool to ensure that our results can be replicated in future studies.

Table 3. Standardized Evaluation Conditions (EC) for ABWS Benchmark

| Parameter | Specification / Rule |
|---|---|
| Orthographic Normalization | Alef normalization; Ya normalization (yā, alif maqsura → unified form) |
| Kashida Removal | All tatweel characters (U+0640) stripped before processing |
| Diacritics (Tashkeel) | All short vowels and shadda removed for consistency |
| Input Format | UTF-8 encoded raw text strings (sentence-level) |
| Punctuation Handling | Preserved in text but excluded from boundary vector calculation |
| Tool Versions | Farasa (v1.1), Stanza (v1.4), MADAMIRA (v2.1), CAMeL Tools (v1.2) |
| Hardware Environment | Ubuntu 22.04 LTS, 32GB RAM, NVIDIA RTX 3090 (for neural models) |
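The orthographic rules in Table 3 could be implemented along the following lines. This is a sketch under assumptions: the table does not name the exact target forms, so mapping hamzated and madda alef to bare alef, mapping alif maqsura to yā, and treating U+064B through U+0652 (which also covers tanwin and sukun) as the diacritic set are choices made here for illustration.

```python
import re

TATWEEL = "\u0640"                                  # kashida
DIACRITICS = re.compile("[\u064B-\u0652]")          # tanwin, short vowels, shadda, sukun
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
BARE_ALEF = "\u0627"
ALEF_MAQSURA = "\u0649"
YEH = "\u064A"


def normalize(text: str) -> str:
    """Apply the Table 3 normalization steps (target forms are assumptions)."""
    text = text.replace(TATWEEL, "")               # kashida removal
    text = DIACRITICS.sub("", text)                # strip tashkeel
    text = ALEF_VARIANTS.sub(BARE_ALEF, text)      # alef normalization
    return text.replace(ALEF_MAQSURA, YEH)         # ya normalization


# Hamzated alef + fatha + hah + sukun + meem + fatha + tatweel + dal.
raw = "\u0623\u064E\u062D\u0652\u0645\u064E\u0640\u062F"
clean = normalize(raw)
```

Running every system's input and output through the same function before projection keeps the character sequence identical across systems, which the boundary vector comparison depends on.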

4.3. Proﬁling Systematic Failure Modes

The diagnostic metrics reveal strongly asymmetric error proﬁles

across segmentation paradigms, consistent with direct linguistic

inspection:

• Subword Tokenizers (BPE, mBART): These systems exhibit extreme over-segmentation behavior (OSR ≥ 0.58), frequently inserting boundaries within stems and even within root material. In the provided examples, mBART splits morphologically atomic forms such as istibāḥa into ist + bāḥ + a, and fragments definite-article constructions such as al-ṭahāra into al-ṭ + h + āra. Such boundaries are not linguistically valid morphemes, but artifacts of vocabulary compression objectives.

• Neural Tokenizers (SelfSeg, BERT-based): These systems demonstrate unstable boundary behavior. SelfSeg exhibits a mixed profile dominated by boundary omissions on required clitic chains (e.g., waad + nā + hu → waadnāhu, fa + lan + naḥmad left unsegmented), while also occasionally introducing non-morphological prefix splits (e.g., a + l-ṭahāra). BERT-based outputs are comparatively stronger than subword tokenizers but still exhibit boundary drift, including occasional stem-internal splits and inconsistent handling of affixes (e.g., al-ṭahār + a instead of al + ṭahāra).

• Statistical Systems (Farasa): Farasa exhibits a con-

servative boundary-decision strategy with elevated USR,

particularly in multi-clitic sequences and function-word at-

tachment. This is visible in cases where clitic boundaries are

merged (e.g., wa + kull + hu predicted as wakull + hu) and

in reduced granularity for proclitic chains.

• Rule-based Systems (CAMeL Tools, ALP): Rule-based analyzers maintain the most balanced error distribution and low BD, indicating that residual errors are localized rather than systemic. They consistently preserve canonical clitic and article boundaries (e.g., li + al + wuḍū, wa + al + mandūb) and avoid stem fragmentation, aligning with gold morphological conventions.

4.4. Assessment of High-Salience Boundaries

Critical Boundary Accuracy (CBA) evaluates segmentation per-

formance on linguistically salient boundaries—such as proclitics

(e.g., wa+, fa+, bi+, li+), the deﬁnite article (al+), and encli-

tics (e.g., +hu, +hum)—that exert disproportionate inﬂuence on

downstream tasks. Table 4 reports CBA scores across systems.

Table 4. Critical Boundary Accuracy (CBA): Performance on high-impact segments.

| System | CBA |
|---|---|
| CAMeL Tools | 0.89 |
| Farasa | 0.82 |
| BERT-based | 0.75 |
| SelfSeg | 0.44 |
| BPE | 0.41 |
| mBART | 0.39 |

The widening performance gap under CBA confirms that neural and subword-based systems not only generate more errors overall, but disproportionately fail on boundaries that are most consequential for linguistic interpretation. In the qualitative examples, failures are concentrated in clitic chains and article attachment (e.g., fa + al + wājib, li + al + wuḍū, al + masjidayn), where subword tokenizers fragment stems and SelfSeg often collapses required boundaries.
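To make the idea of scoring only high-salience positions concrete, the sketch below computes an accuracy restricted to a set of critical boundary slots. It assumes that CBA is the share of critical positions at which the predicted boundary bit agrees with the gold bit; the formal definition is given earlier in the paper and may differ in detail. The function name `critical_boundary_accuracy` and the example vectors are hypothetical.

```python
from typing import Iterable, Sequence


def critical_boundary_accuracy(
    gold: Sequence[int], pred: Sequence[int], critical: Iterable[int]
) -> float:
    """Agreement between gold and predicted boundary bits,
    measured only at the given critical positions (assumed definition)."""
    if len(gold) != len(pred):
        raise ValueError("boundary vectors must have equal length")
    positions = list(critical)
    if not positions:
        raise ValueError("no critical positions supplied")
    hits = sum(1 for i in positions if gold[i] == pred[i])
    return hits / len(positions)


# Hypothetical example: slots 0 and 2 hold a proclitic and an article boundary.
gold = [1, 0, 1, 0, 0, 0]
pred = [0, 0, 1, 0, 1, 0]
cba = critical_boundary_accuracy(gold, pred, critical=[0, 2])
```

In this example the predicted vector misses the proclitic boundary but keeps the article boundary, so half of the critical positions agree, while the spurious boundary at slot 4 does not enter the score.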

4.5. Statistical Veriﬁcation and Reproducibility

All observed performance differences were validated using McNemar's test with Bonferroni correction for multiple comparisons. Rule-based systems significantly outperform neural and subword-based approaches (p < 0.001), with large effect sizes (Cohen's h > 0.5).
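For readers who want to reproduce this kind of check, the three quantities named above can be computed with the standard library alone. The sketch below uses the continuity-corrected chi-square form of McNemar's test; whether ABWS uses this form or the exact binomial variant is not stated here, so treat that choice, the function names, and the example counts as assumptions.

```python
import math


def mcnemar_p(b: int, c: int) -> float:
    """Continuity-corrected McNemar p-value.
    b = items only system A gets right, c = items only system B gets right."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / n
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(p: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Hypothetical discordant counts and proportions, for illustration only.
p_adj = bonferroni(mcnemar_p(b=50, c=5), n_comparisons=15)
h = cohens_h(0.8, 0.4)
```

Only the discordant pairs (items on which exactly one of the two systems is correct) enter McNemar's statistic, which is why it suits paired comparisons on the same test items.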

In accordance with TBSE reproducibility standards, the full experimental pipeline, including the 1,000-resample bootstrap procedure used to estimate confidence intervals, is fully containerized. Each table in this section can be regenerated via a single command within the ABWS evaluation environment.
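A percentile bootstrap with 1,000 resamples can be sketched as follows. This is an illustrative version, not the containerized ABWS procedure: the resampling unit (one 0/1 correctness outcome per item), the percentile method, the fixed seed, and the name `bootstrap_ci` are all assumptions.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    outcomes: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `outcomes`."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("outcomes must not be empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(outcomes, k=n)  # resample with replacement
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Hypothetical per-item outcomes: 1 = boundary decision correct, 0 = wrong.
lo, hi = bootstrap_ci([1, 1, 0, 1, 0, 1, 1, 1, 0, 1] * 20)
```

Fixing the seed is one simple way to make the reported interval regenerate identically across runs.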

4.6. Summary of Benchmarking Insights

The application of ABWS yields three core conclusions:

• Architecture Dictates Boundary Precision: Segmen-

tation quality is primarily determined by architectural as-

sumptions. Rule-based systems preserve linguistically valid

boundaries and avoid stem fragmentation, yielding the

strongest EM and boundary diagnostics.

• Aggregate Metrics are Insuﬃcient: Word-level accuracy

alone obscures severe paradigm-speciﬁc biases. Boundary-

aware diagnostics are necessary to expose over-segmentation

in subword models and boundary omission in language-

agnostic neural tokenizers.

• Standardization Enables Diagnostic Insight: Canon-

ical boundary projection enables a comprehensive, multi-

paradigm evaluation under controlled conditions and pro-

vides explanatory power by linking numerical scores to

concrete linguistic failure modes.

5. Discussion

The empirical results presented in Section 4 reveal a sub-

stantial performance gap between segmentation architectural

paradigms when evaluated on the ABWS representative work-

load. As shown in Table [1], rule-based and hybrid systems such

as Farasa (0.81), CAMeL Tools (0.81), and ALP (0.79) main-

tain relatively high boundary ﬁdelity, reﬂecting their explicit

modeling of Arabic morphology. In contrast, modern neural

architectures and subword tokenizers exhibit a catastrophic

degradation in performance: BPE (0.102) and mBART (0.122)

fail to capture even basic clitic and stem boundaries, despite

their widespread use in downstream neural pip elines.

The observed performance degradation in neural subword

models, such as mBART and BPE-based architectures, stems

from a fundamental misalignment between computational ef-

ﬁciency and linguistic morphology. Unlike rule-based systems

that prioritize morpheme boundaries, subword tokenization al-

gorithms are driven by information-theoretic compression (e.g.,

maximizing likelihood or frequency). Consequently, these mod-

els often ignore critical linguistic boundaries—such as the

junction between a proclitic (e.g., the conjunction ’w-’) and

a stem—if a non-linguistic grouping provides a more frequent

statistical pattern in the training corpus. This ’mechanistic’

bias leads to the masking of functional particles, where a model

may treat a preﬁxed word as a single opaque unit rather than a

decomposable structure. Our CBA metric captures this failure

by penalizing these statistically-driven but linguistically-invalid

merges, which are particularly prevalent in the high-density

Classical Arabic workload of our benchmark.

Regarding the composition of the ABWS workload, the inclusion of high-density Classical Arabic texts, specifically legal and jurisprudential treatises like Sharāʾiʿ al-Islām, is a deliberate design choice rather than a limitation. These texts exhibit a significantly higher morphological density and a more complex clitic-stacking behavior compared to modern news or technical documents. By evaluating systems on this corpus, ABWS functions as a rigorous 'stress-test' for segmentation models. We argue that a system capable of accurately navigating the intricate boundary decisions of Classical Arabic is inherently more robust and better prepared for the linguistic variations of Modern Standard Arabic (MSA). Thus, this workload serves as a high-water mark for evaluating the precision and diagnostic limits of current Arabic NLP architectures.

5.1. The Failure of Subword Tokenization

The output analysis in Section 4.1 exposes a pronounced reality gap between subword-based segmentation models and linguistically valid Arabic morphology. In BPE and mBART, segmentation decisions are driven primarily by statistical frequency and vocabulary compression rather than morphemic structure. For example, the word fa-al-wājib (“so the obligation”) is correctly decomposed by ALP and Farasa into the clitic-aware sequence [fa, al, wājib]. By contrast, mBART produces fragmented outputs such as [fal, wā, jib], which do not correspond to any valid morphological units in Arabic.
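The contrast in this example can be made concrete by projecting both segmentations onto per-character boundary vectors and comparing them slot by slot. The sketch below is one plausible reading of the boundary-vector idea, not the paper's formal CBV definition, which appears in an earlier section. The function names and the Arabic-script renderings of the transliterated segments are assumptions.

```python
from typing import List, Sequence, Tuple


def boundary_vector(segments: Sequence[str]) -> List[int]:
    """One slot per gap between adjacent characters:
    1 if a segment boundary falls in that gap, else 0."""
    vec: List[int] = []
    for i, seg in enumerate(segments):
        if not seg:
            raise ValueError("segments must be non-empty")
        vec.extend([0] * (len(seg) - 1))   # gaps inside the segment
        if i < len(segments) - 1:
            vec.append(1)                  # gap between two segments
    return vec


def boundary_diff(gold: Sequence[int], pred: Sequence[int]) -> Tuple[int, int]:
    """Return (spurious, missed) boundary counts."""
    if len(gold) != len(pred):
        raise ValueError("segmentations cover different character strings")
    spurious = sum(1 for g, p in zip(gold, pred) if p == 1 and g == 0)
    missed = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return spurious, missed


# Assumed Arabic-script forms of [fa, al, wajib] and [fal, wa, jib].
gold = boundary_vector(["ف", "ال", "واجب"])
pred = boundary_vector(["فال", "وا", "جب"])
spurious, missed = boundary_diff(gold, pred)
```

Under these assumptions the mBART-style output contains one spurious stem-internal boundary and misses the proclitic boundary after fa, even though both outputs have three segments, which is the kind of difference a segment count or surface overlap score would hide.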

This behavior confirms that subword-based neural models, despite their apparent fluency in downstream tasks, operate on a predominantly surface-level representation that lacks structural awareness of Arabic clitic attachment and stem integrity. From a benchmarking perspective concerned with traceability and linguistic correctness, these findings indicate that subword-level metrics are poor proxies for morphological truth and can substantially misrepresent actual segmentation quality.

5.2. Robustness to Domain-Speciﬁc Morphology

The evaluated workload is dominated by Classical Arabic jurisprudential (Fiqh) terminology, including morphologically dense and derivationally complex forms such as al-istibāḥa and al-mustaḥāḍa. Traditional segmentation systems (Farasa and CAMeL Tools) demonstrate robustness in this setting due to their reliance on explicit morphological analyzers and lexicons. These systems consistently preserve canonical prefix, stem, and suffix boundaries even in specialized domains.

Neural models, however, exhibit marked performance degradation. The BERT-based segmenter achieves moderate overall accuracy (0.46) but still struggles with complex prefix–suffix combinations. For instance, forms such as wa-al-mandūb are segmented as [wal-man, dūb], indicating partial boundary drift and loss of morphemic coherence. This behavior suggests a high evaluation risk when deploying neural segmentation models in specialized or low-frequency domains, where memorized subword statistics fail to generalize underlying morphological rules.

5.3. Impact on Downstream Tasks

To address the correlation between ABWS metrics and down-

stream NLP performance, we conducted a pilot study focusing

on Part-of-Speech (POS) tagging—a critical downstream task

sensitive to segmentation quality. Our experiments, involving

multiple architectures (including BiLSTM and Stanza), demon-

strate a strong positive correlation (ρ > 0.88) between Crit-

ical Boundary Accuracy (CBA) and tagging macro-F1 scores.

Speciﬁcally, we observed that errors identiﬁed by ABWS as

‘Under-segmentation of Proclitics’ (high USR) lead to a dis-

proportionate drop in POS accuracy compared to simple stem

boundary shifts. For instance, when the CBA score fell be-

low 0.85, the downstream POS tagger’s ability to correctly

identify functional markers (e.g., particles and conjunctions)

degraded by over 12%. These ﬁndings empirically validate that

the diagnostic metrics provided by ABWS are not merely in-

trinsic measures but are reliable predictors of a model’s utility

in complex Arabic NLP pipelines.

5.4. Implications for Standardization and Evaluation

Theory

From a workload characterization perspective, these results strongly justify the design choices underlying the ABWS framework. Conventional evaluation practices often mask the observed failures by relying on aggregate metrics (e.g., BLEU or token-level F1) computed over overlapping subwords, thereby conflating surface overlap with linguistic correctness. By enforcing a Canonical Boundary Vector (CBV) representation, ABWS exposes fundamental limitations that remain invisible under traditional evaluation regimes.

Speciﬁcally, the results demonstrate that:

• Neural and subword-based segmenters are not yet standard-

ready for high-precision linguistic tasks that require reliable

boundary placement.

• Evaluation equivalence between rule-based and neural sys-

tems is unattainable without a paradigm-agnostic repre-

sentation and metric suite, such as those proposed in this

work.

In summary, the current reality of Arabic NLP benchmarking reflects a trade-off between the scalability and flexibility of neural models and the boundary precision of rule-based systems. For critical applications such as legal, religious, or scholarly text analysis, the low scores observed for SelfSeg (0.163), BPE (0.102), and mBART (0.122) render these approaches unsuitable in their current form. These findings underscore the urgent need for boundary-aware training objectives and evaluation frameworks in the next generation of large language models for Arabic.

6. Conclusion and Future Work

In this work, we introduced the Arabic Boundary-aware Word Segmentation (ABWS) framework, a multi-paradigm benchmark designed to address the lack of standardization in Arabic morphological evaluation. By formalizing the Canonical Boundary Vector (CBV), we provided a methodology to evaluate systems ranging from traditional rule-based analyzers to modern neural subword tokenizers within a unified, equivalent evaluation condition (EC).

Our empirical results, based on a representative workload of 212,873 words, reveal a profound "reality gap" in current Arabic NLP. While rule-based and hybrid systems like Farasa and CAMeL Tools achieve high boundary accuracy (0.81), state-of-the-art neural models and statistical tokenizers such as mBART (0.122) and BPE (0.102) show catastrophic failure in capturing linguistically valid boundaries. This disparity highlights a significant evaluation risk: conventional metrics used in downstream tasks often mask a systemic lack of morphological awareness in Large Language Models (LLMs).

ABWS contributes to the engineering of evaluation by

providing a containerized, reproducible pipeline that ensures

benchmark traceability. By treating dataset provenance and

workload characterization as ﬁrst-class artifacts, this bench-

mark allows for the rigorous comparison of diverse architec-

tures, ensuring that progress in Arabic NLP is measured against

a ground-truth linguistic standard rather than surface-level

statistical frequency.

While ABWS is speciﬁcally designed for Arabic, its

core methodological contributions are language-agnostic. The

Canonical Boundary Vector (CBV) abstraction provides a gen-

eral solution for comparing outputs from disparate segmenta-

tion paradigms (rule-based, statistical, neural) in any language.

The boundary-aware metrics (e.g., OSR, USR, CBA) are de-

ﬁned at the character level and do not rely on Arabic-speciﬁc

features, making them transferable to other morphologically

rich languages (MRLs) such as Hebrew, Turkish, or Finnish.

However, the empirical ﬁndings reported in this paper—such

as the extreme over-segmentation of subword tokenizers—are

directly tied to Arabic’s unique morphological structure (e.g.,

concatenative cliticization). While similar phenomena may oc-

cur in other MRLs, further experiments are needed to conﬁrm

cross-lingual patterns.

Future work will focus on expanding the ABWS workload

to include more diverse dialects and low-resource historical

texts. Furthermore, we intend to integrate automated artifact

evaluation tools to further streamline the reproducibility of re-

sults across diﬀerent hardware testbeds. Ultimately, ABWS

oﬀers a template for how complex, multi-layered NLP tasks

can be standardized to support cumulative scientiﬁc progress

and reliable real-world deployment.

Ethical Statement

No ethical approval was required for this study, as it did not

involve human or animal subjects.

Funding

This research received no speciﬁc grant from any funding

agency in the public, commercial, or not-for-proﬁt sectors.

Declaration of competing interests

The authors declare that they have no known competing ﬁnan-

cial interests or personal relationships that could have appeared

to inﬂuence the work reported in this paper.

Data Availability Statements

The data supporting the findings of this study are openly available on Zenodo at https://zenodo.org/records/18138582 or https://doi.org/10.5281/zenodo.18138582.

Credit authorship contribution statement

Behrouz Minaei-Bidgoli: Supervision; Methodology; Valida-

tion; Writing – Review & Editing. Huda AlShuhayeb: Con-

ceptualization; Methodology; Formal Analysis; Investigation;

Visualization; Writing – Original Draft.

References

1. Nizar Y. Habash. Introduction to Arabic Natural Lan-

guage Processing. Synthesis Lectures on Human Language

Technologies. Morgan & Claypool Publishers, 2010. doi:

10.2200/S00277ED1V01Y201008HLT010.

2. Rabih Zbib, Erika Malchiodi, Jacob Devlin, David Stal-

lard, Spyros Matsoukas, and Richard Schwartz. Machine

translation of arabic dialects. In Proceedings of NAACL-

HLT, pages 49–59, 2012. URL: https://aclanthology.org/

N12-1006.pdf.

3. Kareem Darwish. Building a shallow arabic morphological

analyzer in one day. In Proceedings of the ACL Work-

shop on Computational Approaches to Semitic Languages,

2002. URL: https://aclanthology.org/W02-0506.pdf.

4. Kyle Gorman and Steven Bedrick. We need to talk about

standard splits. In Proceedings of the 57th Annual Meeting

of the Association for Computational Linguistics, pages

2786–2791, 2019. doi:10.18653/v1/P19-1267.

5. Nizar Habash and Owen Rambow. Arabic tokenization,

part-of-speech tagging and morphological disambiguation in

one fell swoop. In Proceedings of ACL, pages 573–580,

2005. URL: https://aclanthology.org/P05-1071.pdf.

6. F. Han et al. Open source evaluatology: A theoreti-

cal framework for open-source evaluation. BenchCouncil

Transactions on Benchmarks, Standards and Evaluations,

4:100190, 2024. URL: https://doi.org/10.1016/j.tbench.

2025.100190.

7. Xinyue Li, Heyang Zhou, Qingxu Li, Sen Zhang, and Gang

Lu. Aicb: A benchmark for evaluating the communica-

tion subsystem of LLM training clusters. BenchCouncil

Transactions on Benchmarks, Standards and Evaluations,

5:100212, 2025. doi:10.1016/j.tbench.2025.100212.

8. Jiyue Xie, Wenjing Liu, Li Ma, Caiqin Yao, Qi Liang, Suqin

Tang, and Yunyou Huang. COADBench: A benchmark for

revealing the relationship between AI models and clinical

outcomes. BenchCouncil Transactions on Benchmarks,

Standards and Evaluations, 4:100198, 2025. doi:10.1016/j.tbench.2025.100198.

9. Mohamed Maamouri, Ann Bies, Tim Buckwalter, and

Wigdan Mekki. The penn arabic treebank: Building

a large-scale annotated arabic corpus. In NEMLAR

Conference on Arabic Language Resources and Tools,

2004. URL: https://www.marefa.org/images/e/e8/The_penn_

arabic_treebank_Building_a_large-scale_an_%281%29.pdf.

10. Ryan Roth, Owen Rambow, Nizar Habash, Mona Diab, and

Cynthia Rudin. Arabic morphological tagging, diacritiza-

tion, and lemmatization using lexeme models and feature

ranking. In Proceedings of ACL-08: HLT, 2008. URL:

https://aclanthology.org/P08-2030.pdf.

11. Mohamed Boudchiche, Abdelhak Mazroui, Mohamed

Bebah, Abdelhadi Lakhouaja, and Abdelaziz Boud-

lal. Alkhalil morpho sys 2: A robust arabic

morpho-syntactic analyzer. Journal of King Saud

University – Computer and Information Sciences,

29(2):141–146, 2017. URL: https://www.sciencedirect.

com/science/article/pii/S131915781630026X, doi:10.1016/

j.jksuci.2016.08.003.

12. Wajdi Zaghouani. Critical survey of the freely available arabic corpora. In Proceedings of LREC, 2014. URL: https://www.researchgate.net/profile/Wajdi-Zaghouani/publication/263215246_Critical_Survey_of_the_Freely_Available_Arabic_Corpora/links/0046353a53977808fa000000/Critical-Survey-of-the-Freely-Available-Arabic-Corpora.pdf.

13. Ahmed Abdelali, Kareem Darwish, Nadir Durrani, and

Hamdy Mubarak. Farasa: A fast and furious segmenter

for arabic. In Proceedings of NAACL-HLT, 2016. URL:

https://aclanthology.org/N16-3003.pdf.

14. Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural

machine translation of rare words with subword units. In

Proceedings of ACL, pages 1715–1725, 2016. URL: https://

aclanthology.org/P16-1162.pdf, doi:10.18653/v1/P16-1162.

15. Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of EMNLP, 2018. URL: https://aclanthology.org/anthology-files/anthology-files/pdf/D/D18/D18-2.pdf#page=78.

16. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina

Toutanova. BERT: Pre-training of deep bidirectional

transformers for language understanding. In Proceedings

of NAACL-HLT, 2019. URL: https://aclanthology.org/

N19-1423.pdf.

17. Wissam Antoun, Fady Baly, and Hazem Hajj. AraELEC-

TRA: Pre-training text discriminators for arabic language

understanding. In Proceedings of WANLP, 2020. URL:

https://aclanthology.org/2021.wanlp-1.20.pdf.

18. Muhammad Abdul-Mageed, AbdelRahim Elmadany, and

El Moatez Billah Nagoudi. ARBERT & MARBERT:

Deep bidirectional transformers for arabic. In Proceedings

of ACL-IJCNLP, 2021. URL: https://aclanthology.org/

2021.acl-long.551.pdf.

19. Bashar Alhafni and Nizar Habash. Joint diacritization,

lemmatization, normalization, and ﬁne-grained morpho-

logical tagging. In Proceedings of EACL, 2023. URL:

https://aclanthology.org/2020.acl-main.736.pdf.

20. Nizar Habash and Fatiha Sadat. Arabic preprocessing

schemes for statistical machine translation. In Proceed-

ings of NAACL-HLT, pages 49–52, 2006. URL: https:

//aclanthology.org/N06-2013.pdf.

21. Kareem Darwish and Douglas W. Oard. Term selection for

searching printed arabic. In Proceedings of SIGIR, 2003.

URL: https://dl.acm.org/doi/pdf/10.1145/564376.564423.

22. Yonghui Wu et al. Google’s neural machine trans-

lation system: Bridging the gap between human and

machine translation. In arXiv preprint arXiv:1609.08144,

2016. URL: https://www.researchgate.net/publication/

308646556_Google’s_Neural_Machine_Translation_System_

Bridging_the_Gap_between_Human_and_Machine_Translation.

23. Kyle Gorman and Steven Bedrick. We need to

talk about standard splits. In Proceedings of ACL,

2019. URL: https://pmc.ncbi.nlm.nih.gov/articles/

PMC10287171/pdf/nihms-1908534.pdf.

24. Alex Wang et al. GLUE: A multi-task benchmark and

analysis platform for natural language understanding. In

Proceedings of EMNLP Workshop, 2018. URL: https:

//aclanthology.org/W18-5446.pdf.

25. Alex Wang et al. SuperGLUE: A stickier bench-

mark for general-purpose language understanding

systems. In Proceedings of NeurIPS, 2019. URL:

https://proceedings.neurips.cc/paper_files/paper/2019/

file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf.

BenchCouncil Transactions on Benchmarks, Standards and

Evaluations, 2026

DOI: https://doi.org/10.66834/5s84yp08

Full Length Articles


TraceRTL: Agile Performance Evaluation for

Microarchitecture Exploration

Zifei Zhang (1,2), Yinan Xu, Kaichen Gong, Sa Wang (1,2), Dan Tang (1,3) and Yungang Bao (1,2,∗)

(1) State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China

(2) University of Chinese Academy of Sciences, 100190, Beijing, China

(3) Beijing Institute of Open Source Chip, 100080, Beijing, China

(4) School of Information Science and Technology, ShanghaiTech University, 200000, Shanghai, China

∗ Corresponding author. baoyg@ict.ac.cn

Received on 2 February 2026; Accepted on 29 March 2026

Abstract

While agile chip development methodologies have accelerated RTL design and simulation, performance evaluation re-

mains constrained by challenges: (1) limited benchmark availability due to incomplete peripheral/software simulation

environments or unavailable source code; (2) ineﬃcient feature prototyping caused by the tight coupling between func-

tional correctness and performance evaluation, particularly for large-scale, error-prone microarchitectures. To address

these challenges, we propose TraceRTL, an agile, trace-driven performance evaluation methodology that decouples the

functional and performance components of CPU RTL designs. It introduces three contributions to the benchmarking com-

munity: (1) a trace-driven exploration framework that bypasses full functional correctness while preserving performance

behavior and supports replaying workload traces on RTL designs; (2) a quantitative analysis and mitigation methodology

to identify and reduce trace-driven performance discrepancies; (3) a trace transformation technique, TraceBridge, that

converts benchmark traces between diﬀerent formats and instruction sets. Using TraceRTL, we have developed the ﬁrst

trace-driven RTL CPU derived from XiangShan, a high-performance out-of-order RISC-V processor. TraceRTL achieves

performance accuracy of 99.87% and 99.86% on SPECint2017 and SPECfp2017, respectively. With TraceBridge, we

evaluate x86 Google workload traces on a RISC-V RTL CPU and reveal distinct memory-bound behavior.

Key words: Trace-driven simulation, Performance evaluation, Cross ISA benchmarking

1. Introduction

Performance has always been a central consideration in CPU development. As Moore's Law slows and application demands diversify, achieving further performance improvements has become increasingly challenging. This highlights the importance of microarchitecture exploration methodologies. A key question is: given a baseline CPU design, how can we efficiently quantify the performance impact of a proposed hardware feature using representative benchmarks?

Among available evaluation methods for assessing CPU

design changes using diverse benchmarks, the most faithful

approach is to use the register-transfer level (RTL) implementa-

tion. As the deﬁnitive description of the microarchitecture, RTL

is the most reliable basis for assessing CPU microarchitecture

designs. Ultimately, any proposed feature must be implemented

and evaluated in RTL to determine its true performance impact.

However, since the RTL development process is time-

consuming, the computer architecture community has adopted

more eﬃcient approaches to accelerate early-stage exploration

before implementing a proposed feature in RTL. As shown in

Fig. 1(a), software-based architectural simulators [1–8] model

low-level hardware components using high-level languages and

abstractions, enabling fast simulation and rapid design itera-

tion. Despite their high productivity in early-stage exploration,

the last mile remains unavoidable: performance must still be re-

evaluated at the RTL level after initial simulator studies, since

the additional modeling layer inevitably introduces discrep-

ancies that require substantial engineering eﬀorts and costly

calibration with the actual implementation [9].

Another fundamental yet often overlooked challenge is the benchmarking asymmetry across the development workflow. While software simulators [5, 6, 8, 10] widely adopt trace-driven methodologies to execute diverse benchmarks in trace format, RTL models lack the capability to replay traces and face limited benchmark availability due to immature simulation environments. This benchmarking gap prevents a consistent and continuous evaluation flow from early-stage modeling to final hardware implementation.

Z. Zhang et al.

[Figure 1. Microarchitecture exploration methods and workflows. (a) Approaches to microarchitecture exploration, organized by execution model (trace-driven, execution-driven) and platform (RTL, software). RTL platforms shown: Rocket Chip Generator (2016), FireSim (2018), BOOM-Explorer (2021), XiangShan (2022). Software platforms shown: SimpleScalar (1997), GEMS (2005), M5 (2006), gem5 (2011), Sniper (2011), ZSim (2013), ChampSim (2022), Calipers (2022). TraceRTL (this work) is also shown. (b) Current and TraceRTL performance evaluation workflows, with stages labeled Workload Preparation, RTL Implementation, Functional Correctness, Performance Evaluation, and Agile Performance Evaluation.]

Recent advancements in RTL design and simulation offer strong potential for a seamless, progressive refinement workflow from early-stage exploration to the last mile. On the design side, high-level hardware construction languages [11–13] enable parameterized and reusable components, allowing rapid implementation and iteration of new microarchitectural ideas. On the evaluation side, efficient RTL simulation methods [14–18], especially FPGAs and emulators [19–22], have significantly accelerated large-scale RTL simulation. These capabilities have already been demonstrated in several open-source, industry-competitive CPUs [23–27], which provide realistic and accessible microarchitecture research platforms [28–32]. For example, it takes less than 200 minutes and approximately 300 lines of modified Chisel code to implement the instruction scheduling policy PUBS [33] on XiangShan, a high-performance RISC-V CPU achieving >15 SPECint2006/GHz [26, 34].

These trends motivate us to adapt proven exploration tech-

niques from simulators directly to RTL, aiming to inherit the

agile workﬂow of simulator-based exploration while enabling

a seamless integration into RTL for last-mile evaluation.

To realize this opportunity, we propose TraceRTL, an

RTL-based performance evaluation methodology that derives

a trace-driven RTL model from an existing execution-driven

RTL implementation. As illustrated in Fig. 1(a), TraceRTL

reuses open-source, silicon-validated RTL designs as a solid

foundation for faithful microarchitecture exploration. It drives

performance-critical modules with pre-generated traces, en-

abling simulator-like agility for early-stage exploration on RTL.

By deriving the trace-driven model directly from RTL, it

inherently avoids costly last-mile calibration and preserves

performance ﬁdelity for evaluation of the proposed feature.

One key motivation behind TraceRTL is to overcome the execution-driven nature of current RTL designs. As illustrated by the white boxes in Fig. 1(b), the modified CPU must first pass full functional verification before any performance evaluation can be conducted. This tight coupling forces every RTL design modification to undergo complete implementation, verification, and lengthy simulation, even when the modification is unrelated to performance. For example, evaluating optimizations for virtualized, two-stage address translation requires implementing complex privileged operations to guarantee correct functionality, whose functional details, however, do not affect performance.

With TraceRTL, this strict dependency between functional correctness and performance evaluation is eliminated. As highlighted in Fig. 1(b), TraceRTL enables agile performance evaluation without first implementing or verifying unrelated functional details. Additionally, it accepts trace inputs from a broader range of real-world applications, including those with unavailable source code, different ISAs, or peripheral dependencies [10, 35–37], without requiring porting to an RTL simulation. To realize these capabilities, however, we need to address three key challenges.

1) Feature prototyping: Can we develop a trace-driven

RTL CPU with minimal modiﬁcations to the existing

execution-driven microarchitecture while preserving perfor-

mance accuracy? Our key insight is that hardware module

interfaces can be categorized into functional interfaces, which

determine what each instruction computes and where exe-

cution proceeds next, and performance-sensitive interfaces,

which determine how eﬃciently the instruction stream is re-

alized. Based on this distinction, TraceRTL selectively takes

over key interfaces to decouple the functional model while

preserving performance behaviors using externally supplied

traces. This preserves cycle-accurate performance accuracy

while eliminating the complexity of managing full functional

correctness.

2) Performance accuracy: Can we mitigate the performance discrepancies introduced by trace-driven simulation? Conventional trace-driven simulation often suffers from fidelity loss due to the lack of necessary information to replicate execution-driven behaviors. We observe that these information gaps stem from two primary sources: intentional abstraction and dynamic omission. We quantitatively analyze the performance impact of these missing components, revealing their essential role for fidelity. TraceRTL proposes a dynamic information reconstruction mechanism that synthetically reconstructs missing data, achieving high performance accuracy.

3) Broader workloads: Can we bridge the semantic gap

across diverse trace formats and ISAs? Industrial workloads

are valuable for microarchitecture exploration, but the scarcity

of RISC-V workloads necessitates cross-ISA transformation to

generate benchmark traces. This transformation is performed

only once during trace preparation. However, diﬀerences in

trace formats and ISAs hinder the direct execution of publicly

available traces on RTL CPU models. Since trace-driven simu-

lation relaxes the need for full functional correctness, TraceRTL

introduces TraceBridge, a trace transformation technique that

leverages instruction and register mapping to enable the replay

of traces from diﬀerent formats and ISAs.

To demonstrate the feasibility of TraceRTL, we develop

a trace-driven RTL model derived from XiangShan [26, 38].

It achieves performance accuracy of 99.87% and 99.86% on

SPECint2017 and SPECfp2017, respectively, reducing perfor-

mance discrepancies by 10.31× and 29.21× compared to a

calibrated XS-gem5 model. By leveraging TraceBridge, we

evaluate x86-based Google workload traces [36] on Xiang-

Shan, and reveal distinct memory-bound behavior compared

to SPECint2017.

TraceRTL expands the possibilities for microarchitecture

research by supporting both RTL-based exploration and seam-

less integration with simulator-based workﬂows. By preserving

a simulator-like, trace-driven environment for workloads and

simulation, it eﬀectively bridges early-stage exploration on

simulators and last-mile RTL evaluation.

To summarize, this paper makes the following contribu-

tions.

• We propose TraceRTL, bringing trace-driven simulation to

RTL CPUs for agile microarchitecture exploration.

• We quantify the sources of performance discrepancies and

implement dynamic information reconstruction to achieve

high performance accuracy.

• We propose TraceBridge, which enhances trace compatibil-

ity to expand the sources of benchmark workloads.

• We demonstrate TraceRTL by using x86 workload traces

collected from Google warehouse-scale computers for per-

formance evaluation of XiangShan, a RISC-V CPU.

2. Background

2.1. Out-of-Order Microarchitecture

Modern CPUs improve performance primarily by exploiting

parallelism and speculation. The front-end speculatively fetches

instructions using branch prediction, while the back-end de-

codes, schedules, and issues them to execution units for

computation and to the memory subsystem for data access.

The eﬃciency of this pipeline depends on several critical

microarchitectural components. Branch prediction and instruc-

tion fetching determine the instruction supply rate. Execution

pipelines and scheduling queues aﬀect throughput. The mem-

ory hierarchy bridges the large speed gap between CPU and

DRAM by caching frequently used data. The memory manage-

ment unit (MMU) accelerates address translation by caching

recently used address mappings near the CPU.

2.2. Exploration on RTL

While RTL models oﬀer higher accuracy for design space explo-

ration, directly evaluating performance on RTL presents several

challenges, including the inﬂexibility of traditional hardware

description languages, slow simulation speeds, and the lack of

open-source RTL processors. Recent eﬀorts have focused on

these issues.

Flexibility. Many emerging high-level hardware descrip-

tion languages [12, 13, 39] oﬀer enhanced expressiveness and

parameterization that accelerate the development of microarchitectures. New hardware design methodologies [11] have also been proposed to further improve design modularity.

Simulation Speed. Novel RTL simulation techniques

have been proposed to accelerate software-based [14–16] or

hardware-based [19, 21, 22] simulation of RTL designs.

Additionally, sampling-based methods [40–42] estimate full-

program performance by aggregating results from several rep-

resentative program segments.

Open-Source RTL Processors. With the rapid growth of

the RISC-V open-source community, a number of RTL pro-

cessors have emerged, including in-order designs [43, 44] and

out-of-order designs such as BOOM [23–25], XuanTie-910 [27],

and XiangShan [26]. These designs provide accessible and re-

alistic platforms for microarchitecture research, enabling agile

exploration directly on RTL.

2.3. Simulation Methodologies

In computer architecture research, performance evaluation

of novel designs predominantly relies on two core method-

ologies: execution-driven and trace-driven simulations. These

approaches fundamentally diﬀer in how they provide program

stimuli to performance models, leading to distinct trade-oﬀs

between fidelity, flexibility, and simulation speed.

The execution-driven methodology emulates the behavior of

real CPUs within the performance model, such as fetching, de-

coding, scheduling and executing instructions. This approach

is inherent to RTL models [23–26, 43] and is also implemented

in many software simulators [1–4]. By coupling functional execution with performance modeling, this approach captures

microarchitecture-dependent dynamic behaviors, such as spec-

ulative execution and wrong-path eﬀects, thereby oﬀering high

ﬁdelity. However, this accuracy comes at the cost of signiﬁcant

complexity, increased error-proneness, and reduced simulation

speed.

In contrast, the trace-driven methodology decouples the

functional model from the performance model by replaying the

pre-generated traces of instructions including architectural in-

formation such as instruction semantics, instruction addresses,

memory accesses, and branch outcomes [5, 6, 8, 10, 45]. These

traces are often generated using instrumentation tools like

Pin [46], DynamoRIO [47], and Valgrind [48], or obtained

from public pre-generated traces [10, 35, 36]. This decoupling

aﬀords higher ﬂexibility, enabling researchers to focus on mi-

croarchitectural optimization. However, this ﬂexibility often

comes at the cost of reduced ﬁdelity, as traces lack dynamic

microarchitecture-dependent information.

3. Challenge

Agile performance evaluation requires rapid feature prototyp-

ing, support for extensive workloads, and fast simulation. To

meet these goals at the RTL level, trace-driven simulation oﬀers

a promising approach by decoupling performance and func-

tional models and supporting trace-based workloads. However,

integrating trace-driven simulation into existing execution-driven CPU RTL models introduces non-trivial challenges.

Publicly available traces often vary in trace format, lack in-

formation such as instruction encodings, and are sometimes

generated from diﬀerent instruction sets.

3.1. Trace-driven RTL Integration

Transforming a complex execution-driven CPU RTL model

into a trace-driven implementation presents unique challenges

compared to building an RTL model from scratch or driving

individual RTL modules independently. In addition to supplying stimuli to existing RTL modules, a trace-driven model must

precisely control the instruction ﬂow based on external traces

while maintaining the original performance behavior.

3.2. Trace-driven Performance Discrepancies

Trace-driven simulation inherently suﬀers from ﬁdelity loss due

to the lack of necessary information. This gap stems from two

primary sources: intentional abstraction and dynamic infor-

mation omission. First, to balance conﬁdentiality and storage

overhead, conventional traces often omit critical details such as

operand values and instruction opcodes. Second, static traces

fail to capture dynamic execution states, such as wrong-path

instructions and page table walks, which only emerge during

runtime. The absence of these microarchitectural side eﬀects

prevents the accurate replication of execution-driven behaviors,

potentially leading to signiﬁcant performance discrepancies.

3.3. Trace Compatibility

Trace-driven approaches can bypass the limitations of simu-

lated peripheral environments, thereby enhancing the cover-

age of supported workloads. However, due to conﬁdentiality

constraints, instruction source code is often unavailable for

publicly accessible trace ﬁles [10, 35, 36]. Another scenario

involves target applications that require evaluation but have

not been adapted to the target instruction set, rendering direct

assessment infeasible.

Instruction sets share commonalities but also exhibit sig-

niﬁcant diﬀerences, which hinder direct trace porting. For

example, diﬀerences in general-purpose register conventions,

instruction encodings and sizes, PC alignment rules, and the

range of direct branch instructions all impose constraints on

cross-instruction-set trace evaluation. These challenges are par-

ticularly pronounced for RTL models, which typically lack

suﬃcient abstraction capabilities.

4. TraceRTL Design

To enable agile performance evaluation of RTL designs, we

ﬁrst propose a trace-driven simulation methodology at the RTL

level (§ 4.1) while preserving high performance accuracy (§ 4.2).

Building on this, we introduce TraceBridge, a trace transforma-

tion method that enhances compatibility by enabling the replay

of traces from diﬀerent formats and instruction sets (§ 4.3).

4.1. Trace-Driven Microarchitecture Design

We decompose the CPU into core components and describe how each component is driven by the trace. Interfaces, defined as the

set of I/O signals between modules, can be driven to control

the module’s behavior. By driving the key interfaces with the

information in traces, TraceRTL replaces the functional model

with external traces while maintaining the original performance

behavior. This section describes the design of trace-driven integration to meet its objectives: (1) driving RTL modules with

external instruction traces, (2) enforcing the CPU model to

conform to the trace instruction ﬂow, and (3) identifying and

mitigating performance discrepancies inherent in trace-driven

simulation.

4.1.1. Trace-Driven RTL Modules

Our key insight is that hardware module interfaces can be

classiﬁed into functional interfaces, which determine what

each instruction computes and where execution proceeds

next (e.g., arithmetic, branching, or exception handling),

and performance-sensitive interfaces, which determine how

eﬃciently the instruction stream is realized (e.g., branch

prediction, cache access, and memory prefetching). Based

on this distinction, we analyze key module behaviors and

drive performance-sensitive modules using external instruction

traces, preserving original performance characteristics without

requiring full functional execution.

Branch predictor. The branch predictor’s performance-critical interfaces primarily include two types: training and

prediction. The predictor is trained on the committed branch

outcomes. Therefore, instructions on the mis-speculated path,

which are ﬂushed from the pipeline, leave no side eﬀects. By

substituting the branch outcomes with trace information, which

includes branch direction and target, we are able to stimulate

the training process. For prediction, the predictor takes the current program counter (PC) and branch history to generate the next instruction fetch request. Although prediction is speculative, the PC and history for correct-path instructions are consistent between execution-driven and trace-driven simulations.

Instruction fetch. The instruction fetch unit obtains fetch

requests from the branch predictor and retrieves instructions

from the traces. We propose an interval match mechanism to simulate the fetch bandwidth, as shown in Fig. 2. A fetch request typically specifies a contiguous instruction interval $I_{bp}$ defined by starting and ending addresses. The fetch unit forwards the request to TraceReader, which extracts a continuous sequence of trace instructions $I_{trace}$. Instructions in $I_{if}$, which are common to both $I_{bp}$ and $I_{trace}$, are then sent to subsequent pipeline stages for execution. When the starting address does not match the beginning of the trace, $I_{if}$ is empty, preventing any instructions from being fetched. Consequently, the impact of instructions on the mis-predicted path cannot be modeled. § 4.2.1 presents a refined design to address this limitation.

[Figure 2 diagram labels: Branch Predictor (pc=0x100, Ibp: [0x100, 0x110)); TraceReader (trace instrs 0x100: add, 0x104: add, 0x108: jump, 0x200: sub, 0x204: ...; Itrace: [0x100, 0x10c)); Instruction Fetch (0x100: add, 0x104: add, 0x108: jump; Iif: [0x100, 0x10c)); Backend.]

Figure 2. Trace-driven instruction fetch with interval match mechanism.
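The interval match can be pictured with a minimal Python sketch. It assumes that each trace record carries a PC and a byte size, and that matching stops at the first trace instruction that is not contiguous with the previous one or that does not fit inside the requested fetch interval. The names `TraceInstr`, `interval_match`, and `FIG2_TRACE` are illustrative and are not taken from TraceRTL. The arguments `fetch_start` and `fetch_end` play the role of the branch predictor's interval (Ibp in Fig. 2), and the returned list corresponds to the instructions forwarded to the backend (Iif).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceInstr:
    pc: int    # instruction address recorded in the trace
    size: int  # encoding size in bytes (assumed to be available)
    name: str


def interval_match(fetch_start, fetch_end, trace, pos):
    """Return the trace instructions, starting at trace[pos], that lie
    contiguously inside the half-open fetch interval [fetch_start, fetch_end)."""
    matched = []
    expected_pc = fetch_start
    while pos < len(trace):
        instr = trace[pos]
        # Stop when the trace leaves the sequential path (for example after a
        # taken jump) or when the instruction would cross the interval end.
        if instr.pc != expected_pc or instr.pc + instr.size > fetch_end:
            break
        matched.append(instr)
        expected_pc = instr.pc + instr.size
        pos += 1
    return matched


# Trace contents taken from the example values shown in Fig. 2.
FIG2_TRACE = [
    TraceInstr(0x100, 4, "add"),
    TraceInstr(0x104, 4, "add"),
    TraceInstr(0x108, 4, "jump"),
    TraceInstr(0x200, 4, "sub"),
]
```

With the Fig. 2 values, `interval_match(0x100, 0x110, FIG2_TRACE, 0)` yields the three instructions at 0x100, 0x104, and 0x108, because the next trace instruction (0x200) is not contiguous. A request whose start address differs from the head of the trace yields an empty list, which is the limitation that wrong-path simulation later addresses.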

Out-of-order backend. The backend relies on instruction

encodings to stimulate decoding, register renaming, dynamic

scheduling and execution. These encodings are directly supplied

from the trace. Alternatively, a more aggressive approach is to

provide the results of the decoding directly to drive renaming

and scheduling, although this is beyond the scope of this work.

Particular units, such as the FDivSqrt unit, may need optional operand data to model execution latency accurately.

Cache hierarchy. Cache behavior is mainly inﬂuenced by

access addresses. Instruction addresses are derived from fetch

requests generated by the branch predictor. Data addresses, on

the other hand, are dynamically calculated from the operands,

which are invalid in trace-driven simulation. Therefore, memory

access addresses should be included in traces to model memory behavior. Special modules like the indirect memory access prefetcher need extra information.

Memory management unit. The virtual-to-physical ad-

dress translation and page-table walk require in-memory page

table entries (PTEs) that are typically absent in traces [49]. We

employ a dynamic page table generation approach: For each in-

struction in the trace, we traverse the page tables using its

virtual address. If a required PTE is invalid, a new page is

allocated, and the corresponding PTE is initialized. This pro-

cess continues recursively until reaching the leaf page, which is

initialized with the physical address in traces.

4.1.2. Trace-Controlled Instruction Flow

TraceRTL controls the instruction ﬂow by managing branch in-

structions, interrupts, and exceptions, while ensuring processor

compliance through instruction stream correctness checks.

Branch instruction. We replace the branch execution

unit’s outcomes with the target and conditional result recorded in the trace to control the program’s instruction flow.

Exception and interrupt. Traps, including exceptions like

page faults and interrupts like timer interrupts, may be triggered by programs, devices, and the operating system. Traps affect

control ﬂow and pipeline redirection, as illustrated in Fig. 3(a).

These are intercepted and re-injected according to the trace.

Speciﬁcally, trace-recorded exceptions are triggered as illegal

instructions, redirecting to the target in the trace, as illustrated

in Fig. 3(b). This design ensures that exceptions are preserved

without relying on full functional execution.

Table 1. Key CPU module behaviors and their corresponding trace-driven stimuli in TraceRTL.

| Module | Key Behavior | Trace-Driven Stimulus |
|---|---|---|
| Branch Predictor | Prediction uses PC and history; Training uses committed branch outcomes. | Use current PC and history for prediction; Use branch outcomes from trace for training. |
| Instruction Fetch | Fetch request defines instruction interval; Wrong-path instructions. | Apply interval match mechanism to simulate fetch bandwidth; Generate wrong paths on mismatch. |
| Instruction Execution | Decode, rename, schedule, and execution. | Use instruction encoding and optional operand from trace. |
| Instruction Flow | Branch instruction outcome; Exception and interrupt redirect the pipeline. | Intercept branch outcomes/exception generation; Support redirect; Flow check. |
| Cache Hierarchy | Access cache by addresses. | Memory address from trace; Instruction address from branch predictor; Optional data from trace. |
| MMU | Virtual-to-physical address translation; Access memory for page table entries. | Construct page table according to the address from trace. |

[Figure 3 diagram labels: both panels show Branch Predictor, Instruction Fetch, Decode, MemoryUnit, and Reorder Buffer. Panel (a) adds Peripherals, Interrupt, Native Exceptions, and "Redirect to Exception Handler". Panel (b) adds Exceptions in Trace and "Redirect to Target in Trace".]

(a) Native exceptions and interrupts triggered by the CPU pipeline and peripherals.

(b) Intercept the native exceptions and trigger trace exceptions as illegal instructions.

Figure 3. Exception and interrupt management in TraceRTL.

Instruction stream check. A fundamental requirement of

trace-driven simulation is that the performance model must be

guided by the trace, a key aspect of which is to ensure its exe-

cution adheres to the provided instruction stream. We capture

the processor’s actual instruction stream through committed

instructions and compare it against the trace. Any difference between the two streams indicates an implementation flaw in either the RTL model itself or the trace-driven framework.
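A minimal Python sketch of such a check is given here, under the assumption that both the committed stream and the trace can be reduced to sequences of instruction PCs. The function name `first_stream_mismatch` is hypothetical, and a real checker would likely compare more fields than the PC.

```python
def first_stream_mismatch(committed_pcs, trace_pcs):
    """Compare the committed instruction stream against the trace.

    Returns the index of the first committed instruction that deviates
    from the trace, or None if every committed instruction matches.
    """
    for index, pc in enumerate(committed_pcs):
        # Committing more instructions than the trace holds is also a mismatch.
        if index >= len(trace_pcs) or pc != trace_pcs[index]:
            return index
    return None
```

A `None` result means the model followed the trace so far, while an index points at the first instruction where the RTL model or the trace-driven framework diverged.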

4.1.3. Overall

In summary, TraceRTL provides a general and adaptable

framework for trace-driven RTL performance evaluation. It is

designed to evolve naturally with RTL designs, require minimal

eﬀort across microarchitectural iterations, remain applicable

across diverse microarchitectures, and ﬂexibly support various

performance optimizations.

Extending TraceRTL to new architectures. We sum-

marize the trace-driven transformation methodology in Table 1.

TraceRTL employs an interface-based modiﬁcation strategy

that reduces modification overhead while accommodating variations in module design. The processor module partitioning

methodology is universal across diﬀerent microarchitectures,

making TraceRTL a reusable and microarchitecture-agnostic

framework for RTL performance evaluation. The speciﬁc modi-

ﬁcations may vary depending on processor-speciﬁc designs. For

instruction fetch, for instance, in-order processors commonly

fetch one or two instructions per cycle, which does not require

the interval match described in § 4.1.1. In contrast, some high-

performance processors may fetch instructions spanning two

intervals per cycle, thus necessitating two interval-match op-

erations. For CPU-driven accelerators, such as matrix units,

the necessary execution information can also be recorded into

trace instructions and dispatched accordingly.

Applicability for microarchitecture features. TraceRTL

is particularly advantageous for evaluating functionally com-

plex yet performance-critical features (§ 7.2). Beyond function-

ality, it captures ﬁne-grained timing eﬀects that are diﬃcult

to model accurately at higher abstraction levels. For exam-

ple, variations in microarchitectural timing may critically aﬀect

the overall performance (§ 7.4). It can also evaluate microarchitectural optimizations in the same way as conventional

trace-driven simulators (e.g., branch prediction, prefetching, re-

placement, memory dependence prediction). With additional

trace information, TraceRTL can be extended to model ad-

vanced optimizations, such as value prediction (with execu-

tion results) and indirect memory prefetching (with memory

values).

4.2. Trace-driven Performance Discrepancy Mitigation

To achieve high accuracy, trace-driven simulation should strive

to mimic the behaviors of execution-driven simulation. This

section details our methodology for bridging this gap by en-

hancing trace-driven simulation of the frontend fetch unit

through wrong-path simulation, reﬁning execution latency

via operand and opcode provisioning, and maintaining MMU

ﬁdelity through dynamic page table construction.

4.2.1. Fetch: Wrong-Path Simulation

Out-of-order processors may execute instructions that are later

discarded due to events like branch mispredictions. These in-

structions, although executed, are ﬂushed by pipeline redirect

operations, preventing them from aﬀecting the architectural

state of the CPU, such as the register ﬁle or memory.

Wrong-path instructions’ performance impact, particularly on the cache hierarchy, cannot be ignored. The impact on the cache can be categorized into prefetching and pollution, which have positive and negative effects, respectively. Fig. 4 presents a code example divided into three sections: (1) Code1, executed unconditionally before the branch; (2) Code2, located within the branch; and (3) Code3/Code4, placed outside the branch’s influence, further categorized into the proximate Code3 and the distant Code4. Upon a mispredicted branch, Code2 is executed, and if its execution is swift, Code3 may follow. Once the branch is resolved, speculatively fetched instructions of Code2 and Code3 are discarded, with Code2 potentially polluting the cache and Code3 prefetching into the cache.

*b = 1; /* Code1 */

a = *b;

if (a == 0)

    a = *c; /* Code2 */

a = *d; /* Code3 */

a = *e; /* Code4 */

Figure 4. Code example demonstrating wrong-path instruction

generation.

We statistically analyze the number and addresses of memory instructions on both correct and wrong paths in the out-of-order processor XiangShan, focusing on instructions sent to the load pipeline. These addresses are aligned to the cache-line size. We categorize the address space into three types: (1) exclusive-arch-path: accessed only by correct-path instructions, (2) exclusive-wrong-path: accessed only by wrong-path instructions, and (3) overlapped: accessed by both paths. As shown in Fig. 5, most of the address space falls into types (1) and (3). Overlapped lines are touched by wrong-path loads but are also needed by the correct path, which is a prefetching effect, whereas only exclusive-wrong-path lines can pollute the cache. We therefore tentatively conclude that prefetching is the predominant influence.

Figure 5. Percentage of memory interval weighted by load access times

for the SPEC CPU2017. Each bar represents a sub-benchmark, sorted

according to “ExclusiveArchPath”.
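The three-way classification can be sketched in Python as follows. This is only an illustration under stated assumptions: a 64-byte cache line, load addresses available as two plain lists, and each cache line weighted by its total number of load accesses from both paths. The function name and the exact weighting are assumptions, since the text only states that the intervals are weighted by load access times.

```python
from collections import Counter

CACHE_LINE_BYTES = 64  # assumed cache-line size


def classify_load_lines(correct_path_addrs, wrong_path_addrs,
                        line_bytes=CACHE_LINE_BYTES):
    """Return the access-weighted share of each address category."""
    correct = Counter(addr // line_bytes for addr in correct_path_addrs)
    wrong = Counter(addr // line_bytes for addr in wrong_path_addrs)
    weights = {"exclusive_arch_path": 0,
               "exclusive_wrong_path": 0,
               "overlapped": 0}
    for line in set(correct) | set(wrong):
        in_correct = line in correct
        in_wrong = line in wrong
        accesses = correct.get(line, 0) + wrong.get(line, 0)
        if in_correct and in_wrong:
            weights["overlapped"] += accesses
        elif in_correct:
            weights["exclusive_arch_path"] += accesses
        else:
            weights["exclusive_wrong_path"] += accesses
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: count / total for name, count in weights.items()}
```

A large overlapped share suggests that wrong-path loads mostly touch lines the correct path also uses, which matches the prefetching interpretation in the text.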

Based on this observation, we focus on simulating the prefetching influence by reusing correct-path instructions as wrong-path instructions. The process involves the following steps: (1) When a branch misprediction occurs, we check whether the fetch request’s starting address exists in the traces within a fixed instruction window; (2) If it exists, the instructions in the trace are sent to subsequent pipeline stages as wrong-path instructions. The instruction fetch unit is blocked for simplification. These instructions are not discarded from the traces; (3) Once the branch instruction is resolved and the pipeline is redirected, the correct fetch request is issued.
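Steps (1) and (2) can be illustrated with a small Python sketch. It assumes the trace is available as a list of PCs, that the window is counted in trace instructions starting from the current trace position, and that a default window of 32 is acceptable for illustration. The function name `replay_as_wrong_path` and the window size are hypothetical, not values from the paper.

```python
def replay_as_wrong_path(trace_pcs, pos, fetch_start, fetch_end, window=32):
    """Look for a mispredicted fetch address in the upcoming trace window.

    trace_pcs   : PCs of the trace instructions in program order
    pos         : index of the next correct-path instruction
    fetch_start : start address of the mispredicted fetch request
    fetch_end   : end address (exclusive) of that request
    Returns the PCs replayed as wrong-path instructions, or an empty list
    when the start address is not found inside the window.
    """
    limit = min(pos + window, len(trace_pcs))
    for i in range(pos, limit):
        if trace_pcs[i] == fetch_start:
            replayed = []
            while i < limit and fetch_start <= trace_pcs[i] < fetch_end:
                replayed.append(trace_pcs[i])
                i += 1
            return replayed
    return []
```

The function only reads the trace and never advances `pos`, mirroring the statement that replayed instructions are not discarded from the traces and will be fetched again on the correct path.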

4.2.2. Execution: Instruction Opcode Provisioning

Conventional trace-driven simulators often operate with partial instruction encodings. An instruction encoding carries two types of information: the instruction opcode, which defines functionality such as ADD and SUB, and the register indices, which determine instruction dependencies and out-of-order scheduling. Explicit instruction opcodes are frequently abstracted or omitted because of confidentiality concerns and because software simulators use highly abstracted microarchitecture designs. Consequently, instead of providing detailed opcodes, trace instructions are categorized into coarse-grained functional groups: (1) control flow (unconditional direct, conditional direct, and indirect jumps); (2) memory access (loads and stores); and (3) computation (integer and floating-point).

Algorithm 1 Dynamic Page Table Construction

procedure InstructionWalk(instList)
    for inst in instList do
        if inst’s PC valid then
            PageWalk(inst.VirtualPC, inst.PhysicalPC)
        end if
        if inst’s memory address valid then
            PageWalk(inst.VirtualAddr, inst.PhysicalAddr)
        end if
    end for
end procedure

procedure PageWalk(va, pa)
    pageBase = PageTableRootAddr
    for level := 0 to MaxLevel-1 do
        pteAddr = getPteAddr(va, level, pageBase)
        pte = readPageTable(pteAddr)
        if pte not valid then
            if level == MaxLevel-1 then
                pte = genPte(pa)    ▷ Leaf page arrived
            else
                pte = genPte(AllocatePage())
            end if
            writePageTable(pteAddr, pte)
        end if
        pageBase = pte.ppn << 12
    end for
end procedure

Our work focuses on quantifying the performance modeling deviations induced by this loss of fine-grained opcodes. Specifically, we investigate how substituting precise opcodes with coarse-grained categories impacts simulation fidelity. This analysis aims to isolate the impact of operation abstraction from other simulation variables, providing a quantitative understanding of the accuracy trade-offs in abstracted trace modeling.

4.2.3. Execution: Operand Provisioning

Some operations are implemented in a blocking manner, and their execution cycles vary with the operands, as in division, floating-point division, and square root. This source of performance error is often neglected, and simulators commonly implement these operations with a fixed latency.

Although these instructions are relatively few, their long

execution cycles and low degree of concurrency amplify their

performance impact. To achieve more accurate simulation for

these types of instructions, we record their operands in traces.

4.2.4. MMU: Dynamic Page Table Construction

User-space programs use virtual addresses, which must be translated to physical addresses by the memory management unit (MMU) before accessing the cache or main memory. In the MMU, the virtual address first consults the L1 translation lookaside buffer (TLB). If the L1 TLB hits, the physical address is obtained directly. On an L1 TLB miss, the virtual address is sent to a larger L2 TLB or to the hardware page table walker, which traverses the memory-resident page tables to find the corresponding physical address. This traversal involves multiple memory accesses, especially in hypervisor environments. Page table caches are used to speed up page table walks. In summary, the hit rates of the TLB and page table cache, as well as the page table walker’s memory latency, are crucial for MMU-sensitive programs.

To simulate the behavior of MMU and minimize the modi-

ﬁcations on RTL modules, we need to provide a self-consistent

page table for the MMU. However, traces typically contain only

the physical and virtual addresses, but not the page table [49].

Therefore, we employ a dynamic page table generation method,

as illustrated in Algorithm 1. By iterating over each instruction

in the traces and traversing the page tables based on the vir-

tual address, we allocate new page frames and initialize the

invalid corresponding page table entries, until reaching the leaf

page. The leaf entry is then initialized with the corresponding

physical address. After dynamically generating the page table, when a TLB miss occurs, the memory-resident page tables are traversed.
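The page walk of Algorithm 1 can be rendered as a runnable Python sketch. The layout below is an assumption chosen for illustration: a three-level, Sv39-style table with 4 KiB pages, 9 index bits per level, 8-byte PTEs, the valid bit in bit 0, and the PPN starting at bit 10. The class name, the root address, and the page allocator are hypothetical, and memory is modeled as a dictionary from PTE address to PTE value rather than real simulated memory.

```python
PAGE_SHIFT = 12      # 4 KiB pages
VPN_BITS = 9         # index bits per level (Sv39-style assumption)
LEVELS = 3           # number of page-table levels (assumption)
PTE_BYTES = 8
PTE_V = 0b0001       # valid bit
PTE_RWX = 0b1110     # read/write/execute bits mark a leaf entry
PPN_SHIFT = 10       # position of the PPN field inside a PTE


class PageTableBuilder:
    def __init__(self, root_addr=0x8000_0000, first_free_page=0x8000_1000):
        self.root = root_addr
        self.next_free = first_free_page  # naive bump allocator (hypothetical)
        self.mem = {}                     # PTE address -> PTE value

    def _alloc_page(self):
        page = self.next_free
        self.next_free += 1 << PAGE_SHIFT
        return page

    def _pte_addr(self, va, level, base):
        shift = PAGE_SHIFT + VPN_BITS * (LEVELS - 1 - level)
        index = (va >> shift) & ((1 << VPN_BITS) - 1)
        return base + index * PTE_BYTES

    def page_walk(self, va, pa):
        """Create any missing PTEs so that va translates to pa."""
        base = self.root
        for level in range(LEVELS):
            pte_addr = self._pte_addr(va, level, base)
            pte = self.mem.get(pte_addr, 0)
            if not pte & PTE_V:
                if level == LEVELS - 1:
                    # Leaf entry: point at the physical page from the trace.
                    pte = ((pa >> PAGE_SHIFT) << PPN_SHIFT) | PTE_RWX | PTE_V
                else:
                    # Non-leaf entry: point at a freshly allocated table page.
                    page = self._alloc_page()
                    pte = ((page >> PAGE_SHIFT) << PPN_SHIFT) | PTE_V
                self.mem[pte_addr] = pte
            base = (pte >> PPN_SHIFT) << PAGE_SHIFT

    def translate(self, va):
        """Walk the generated table; return the physical address or None."""
        base = self.root
        for level in range(LEVELS):
            pte = self.mem.get(self._pte_addr(va, level, base), 0)
            if not pte & PTE_V:
                return None
            base = (pte >> PPN_SHIFT) << PAGE_SHIFT
        return base | (va & ((1 << PAGE_SHIFT) - 1))
```

Calling `page_walk` once per valid instruction PC and memory address in the trace mirrors the instruction-walk procedure. The sketch ignores possible collisions between allocated table pages and physical addresses used by the trace, which a real implementation would need to rule out.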

4.3. Trace Compatibility with TraceBridge

We introduce a trace transformation methodology, Trace-

Bridge, to bridge the incompatibilities in trace formats and

instruction sets. To support trace-driven simulation, the trace

must contain at least three categories of information: (1) pri-

mary instruction type, including branch types, computation,

and memory operations; (2) execution guidance, including PC,

branch target and conditional result, and memory address;

(3) register dependencies to model instruction-level parallelism.

Such information is typically included in the trace formats of dynamic instrumentation tools [49] and publicly available traces [10, 35, 36], although fine-grained semantic information such as instruction opcodes is sometimes missing.

TraceBridge retains the key information from the trace,

transforming its format to be compatible with the target model

by reﬁning the execution semantics. However, trace-driven RTL

models pose additional low-level challenges due to their rich

details: (1) instruction correspondence and register semantics; (2) differences in instruction encoding size and program

counter (PC) alignment constraints; (3) variations in branch

oﬀset ranges.

The primary principle of TraceBridge is to maintain performance semantics consistency. This ensures that the performance characteristics of the original program are reflected in the target architecture. For confidentiality, public traces omit instruction encodings [10] or provide only instruction categories [36]. To address this, we observe that an instruction can encompass multiple performance semantics, which fall into four types: ⟨Load, Computation, Store, Branch⟩. To maintain performance semantics consistency, we map each individual performance semantic to its corresponding instruction(s) in the target ISA.

A single x86 instruction, which may encompass multiple micro-

operations, is translated into an equivalent sequence of RISC-V

instructions. For instance, the x86 RET instruction is mapped

to two RISC-V instructions (LOAD and JR), and x86 memory

accesses exceeding the width of a single RISC-V instruction are

decomposed into multiple instructions to preserve the access

range. The necessary mapping results in instruction inﬂation,

which is analyzed in § 7.1. In the case of missing opcodes,

compute instructions are mapped to representative types such

as [F]ADD, [F]MUL, and CONVERT due to limited information in the traces. For ISAs with a flag mechanism, such as x86,

spare registers can be employed to establish inter-instruction

dependencies. Furthermore, special handling for architecturally

signiﬁcant registers, like the return address register, guarantees

the correct correspondence between x86 call/return operations

and their RISC-V counterparts.
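The per-semantic mapping can be sketched in Python as follows. This is a simplified illustration, not the TraceBridge implementation: it assumes the source instruction has already been reduced to an ordered tuple of performance semantics, that the widest RISC-V scalar access is 8 bytes, and that every computation falls back to a representative ADD. The function name, the tuple format of the output, and the `branch_op` parameter are hypothetical.

```python
RISCV_MAX_ACCESS_BYTES = 8  # widest scalar load/store assumed for the target


def lower_to_riscv(semantics, mem_addr=None, mem_bytes=0, branch_op="JR"):
    """Map performance semantics of one source instruction to RISC-V ops.

    semantics : ordered tuple drawn from 'load', 'compute', 'store', 'branch'
    Returns a list of (mnemonic, address, access_bytes) tuples.
    """
    lowered = []
    for sem in semantics:
        if sem in ("load", "store"):
            # Split wide accesses so the full address range is still covered.
            for offset in range(0, mem_bytes, RISCV_MAX_ACCESS_BYTES):
                width = min(RISCV_MAX_ACCESS_BYTES, mem_bytes - offset)
                lowered.append((sem.upper(), mem_addr + offset, width))
        elif sem == "compute":
            lowered.append(("ADD", None, 0))  # representative opcode
        elif sem == "branch":
            lowered.append((branch_op, None, 0))
        else:
            raise ValueError(f"unknown performance semantic: {sem}")
    return lowered
```

For an x86 RET, which loads the return address and then jumps, `lower_to_riscv(("load", "branch"), mem_addr=0x7FF0, mem_bytes=8)` produces one LOAD followed by one JR, matching the two-instruction mapping described in the text. A 16-byte load is split into two 8-byte loads, which is one source of the instruction inflation mentioned above.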

[Figure 6 content, reconstructed from the three panels]

Origin traces (PC, INSTR):
0x100 math
0x101 math
0x105 ret 0x201
0x201 math
0x203 math
0x210 j 0x150
0x150 math fp
0x154 math fp

Transform & PC map (old PC -- new PC):
0x100 -- 0x100
0x101 -- 0x104
0x105 -- 0x108
      -- 0x10C
0x150 -- 0x150
0x154 -- 0x154
0x201 -- 0x200
0x203 -- 0x204
0x210 -- 0x208

RISC-V traces (PC, INSTR):
0x100 add
0x104 add
0x108 load
0x10C jr 0x200
0x200 add
0x204 add
0x208 j 0x150
0x150 fadd
0x154 fadd

Figure 6. Example of trace transformation, consisting of PC conversion and instruction encoding mapping.

To resolve diﬀerences in instruction size, PC alignment, and

instruction inﬂation, we reorganize PCs in the traces to conform

to RISC-V requirements. As illustrated in Fig. 6, we collect all

instruction PCs and sequentially reassign new addresses based

on RISC-V encoding size. When a PC gap is detected (e.g.,

from 0x105 to 0x150), the current PC is updated accordingly. A

mapping from original PCs to RISC-V PCs is then constructed,

and branch target addresses are updated using this mapping.
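The PC reorganization can be sketched in Python as follows. Two details are assumptions made only for illustration, since the text does not specify them: a gap is detected when two consecutive original PCs differ by more than 15 bytes (the maximum x86 instruction length), and on a gap the new PC restarts at the original PC aligned down to 4 bytes. The function name `build_pc_map` and its input format are hypothetical.

```python
RISCV_INSTR_BYTES = 4  # fixed-size RISC-V encoding assumed (no compressed ops)
GAP_THRESHOLD = 15     # assumed: larger PC jumps are treated as gaps


def build_pc_map(instr_counts):
    """Build a map from original PCs to RISC-V PCs.

    instr_counts : dict mapping each original PC to the number of RISC-V
                   instructions it expands to (instruction inflation)
    """
    pc_map = {}
    next_pc = None
    prev_old_pc = None
    for old_pc in sorted(instr_counts):
        aligned = old_pc - old_pc % RISCV_INSTR_BYTES
        if prev_old_pc is None:
            next_pc = aligned
        elif old_pc - prev_old_pc > GAP_THRESHOLD:
            # PC gap: restart near the original address, never moving backward.
            next_pc = max(next_pc, aligned)
        pc_map[old_pc] = next_pc
        next_pc += RISCV_INSTR_BYTES * instr_counts[old_pc]
        prev_old_pc = old_pc
    return pc_map


# Original PCs from Fig. 6; the RET at 0x105 expands to two instructions.
FIG6_COUNTS = {0x100: 1, 0x101: 1, 0x105: 2, 0x201: 1,
               0x203: 1, 0x210: 1, 0x150: 1, 0x154: 1}
```

Under these assumptions, `build_pc_map(FIG6_COUNTS)` maps 0x105 to 0x108 (leaving 0x10C for the inflated second instruction), 0x201 to 0x200, and 0x210 to 0x208, which agrees with the PC map in Fig. 6. Branch targets are then rewritten by looking them up in the same map, so `ret 0x201` becomes a jump to 0x200.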

While the x86 ISA supports larger offset ranges for direct branch instructions than RISC-V, we observe that branch

target computation mainly occurs in two modules: the pre-

decoding unit at the fetch stage and the branch execution unit.

By overriding the computation result with the target recorded

in the traces, we eﬀectively support larger branch oﬀset ranges

in the trace-driven RISC-V model.

Overall. TraceBridge provides a methodology to evaluate

the microarchitectural behavior of mature, real-world software

ecosystems (e.g., Google workloads) on an emerging hardware

ecosystem (e.g., RISC-V). Admittedly, TraceBridge is unable

to eliminate all performance discrepancies caused by inher-

ent cross-ISA diﬀerences and missing execution information in

traces, such as instruction semantics and application binary interfaces (ABIs). Furthermore, while the high-level methodology is consistent, the specific rules should be adapted to the source and target ISAs. For example, x86 and RISC-V differ in the number of general-purpose registers. Consequently, when translating to x86, some registers may map to memory (i.e., register spilling).

It is also constrained by information missing from the trace,

forcing a simpliﬁed instruction remapping, which inevitably

introduces performance errors. However, according to our eval-

uation of missing RISC-V opcodes (§ 7.1), the accuracy is above

99% (0.95% error for SPECint2017) for early-stage performance

exploration.

5. Put It All Together

TraceRTL improves the performance evaluation workflow by optimizing stages such as workload preparation, prototyping, and performance simulation. As illustrated in Fig. 7, a typical iterative workflow based on TraceRTL is employed to perform agile performance evaluation. The workflow involves the following steps: ① Trace Preparation: Program traces for the benchmarks or target applications are prepared for subsequent performance evaluation. Each trace represents a program segment. Traces can be obtained from a variety of sources, including dynamic instrumentation tools like Pin [46] and DynamoRIO [47], instruction-level simulators like QEMU [50], and publicly available traces such as Google workload traces [36] and Qualcomm workload traces [35]. TraceRTL can be combined with additional techniques such as SimPoint [40] to further shorten simulation time, while also avoiding the overhead and complexity of booting. ② Prototyping: New microarchitectural features can be prototyped on an RTL model without full implementation, as shown in § 7.2. ③ Trace-Driven Simulation: The trace-formatted program segments are replayed in trace-driven simulation, yielding performance results of the CPU model. ④ Performance Analysis: The performance results and program behaviors are analyzed to identify performance bottlenecks. These insights inform subsequent iterations and guide prototype refinement. ⑤ Functional Correctness: When the design meets expected performance targets, the efforts invested in prototype development can be seamlessly carried over. TraceRTL supports compile-time mode switching between execution-driven and trace-driven simulation, enabling smooth transition to functional validation. ⑥ Execution-Driven Simulation: Further performance analysis and iteration are conducted through execution-driven simulation.

[Figure 7 diagram labels: Trace Preparation, Enhanced Prototyping, Trace-Driven Simulation, Performance Analysis, Functional Correctness, Execution-Driven Simulation, with step markers ① to ⑥.]

Figure 7. Agile performance evaluation workflow with TraceRTL. The workflow comprises two loops: a trace-driven loop ○→○ and an execution-driven loop ○→○.

TraceRTL facilitates an agile and accurate RTL-level mi-

croarchitecture design exploration process. Rather than re-

placing existing architectural simulators, TraceRTL serves as

a complementary and reinforcing component that enhances

RTL performance exploration and bridges the gap between

high-level models and real RTL behavior. It targets a distinct

sweet spot in the accuracy-productivity trade-oﬀ, preserv-

ing the ground-truth RTL model and accepting manageable

maintenance overhead to achieve substantially higher accuracy,

with comparable or potentially lower (Palladium/FPGA) sim-

ulation cost. By enabling direct performance evaluation on

real RTL implementations, TraceRTL empowers architects to

broaden application coverage and identify microarchitectural

bottlenecks that high-level simulators may overlook.

6. Evaluation

We conduct evaluations to address two key questions:

1. Can we mitigate trace-driven simulation’s performance

inaccuracies (§ 6.2)?

2. Does TraceRTL achieve high performance accuracy (§6.3)?

To address these questions, we compare the performance of

the original RTL model, TraceRTL, and the state-of-the-art

simulator gem5 [4].

Table 2. Target system configuration.

Component                     Description
Branch Predictor              uBTB, BTB, TAGE-SC, ITTAGE, RAS
Fetch/Decode/Rename Width     8/6/6
RoB/LoadQueue/StoreQueue      160/72/64
Integer/Float Register File   224/192
ALU/FMA/FDivSqrt unit         4/4/2
Load/Store unit               3/2
L1 ICache                     64KB, 4-way, 256-set
L1 DCache                     64KB, 8-way, 128-set
L2 Cache                      1MB, 8-way, 512-set, 4-bank
L3 Cache                      16MB, 16-way, 4096-set, 4-bank
L1 ITLB/DTLB                  48-entry, fully-associative
L2 TLB                        2048-entry, 8-way, 32-set
DRAM                          DRAMsim3, 8GB, DDR4-3200
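As a quick consistency check on the cache rows of Table 2, the sketch below derives the cache line size implied by each geometry. It assumes that capacity equals ways × sets × banks × line size, that the listed set count is per bank, and that caches without a listed bank count have a single bank; the resulting line size is derived here and is not stated in the table.

```python
KB, MB = 1024, 1024 * 1024

# (capacity in bytes, ways, sets, banks) as listed in Table 2.
# Caches without a listed bank count are assumed to have one bank.
CACHES = {
    "L1 ICache": (64 * KB, 4, 256, 1),
    "L1 DCache": (64 * KB, 8, 128, 1),
    "L2 Cache": (1 * MB, 8, 512, 4),
    "L3 Cache": (16 * MB, 16, 4096, 4),
}


def line_size_bytes(capacity: int, ways: int, sets: int, banks: int = 1) -> int:
    """Line size implied by capacity = ways * sets * banks * line size."""
    lines = ways * sets * banks
    if capacity % lines:
        raise ValueError("capacity is not a whole number of lines")
    return capacity // lines


LINE_SIZES = {name: line_size_bytes(*geom) for name, geom in CACHES.items()}
```

Under these assumptions every level works out to the same 64-byte line, which suggests the listed geometries are mutually consistent.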

6.1. Experimental Setup

Target System. We evaluate TraceRTL by altering an open-

source high-performance RISC-V processor, XiangShan [26, 38],

into a trace-driven model. TraceRTL introduces low implemen-

tation overhead while preserving RTL ﬁdelity. It reuses the

original RTL and drives existing modules by intercepting in-

puts and outputs. The modiﬁcations consist of three primary

components. First, the simulation environment, implemented

primarily in C++, manages trace ﬁle loading, instruction

stream validation, and page table generation. Second, the

TraceRTL module, written in Chisel, retrieves traces via the SystemVerilog Direct Programming Interface (DPI) and supplies instructions to the processor. Third, interface connections and execution guidance are applied to

existing processor modules. The ﬁrst two components are

microarchitecture-agnostic, whereas the third requires tighter

coupling with speciﬁc microarchitectural details. Speciﬁcally,

the microarchitecture-speciﬁc modiﬁcations account for fewer

than 450 LOC (lines of code). Nevertheless, the modiﬁcation

methodology remains portable across diverse processor designs.

XiangShan, implemented in Chisel [13], is a tape-out-ready superscalar out-of-order processor. Its latest, third-generation microarchitecture, Kunminghu, achieves a clock frequency of 3GHz and a SPECint2006 score exceeding 15/GHz, demonstrating its capability as a platform for exploring high-performance microarchitecture designs. We use the default configuration of XiangShan, as shown in Table 2. We take the original XiangShan's performance as the ground truth.

gem5 is widely used for CPU microarchitecture exploration and is often treated as the ground truth in simulator studies [8, 51, 52] because of its rich detail. We use XS-gem5 [53] as the baseline, which has been carefully calibrated to XiangShan

through over 1,200 git commits and more than 60,000 lines of

source code additions since July 2022, including XiangShan-

speciﬁc adjustments.

Simulation Speed. XS-gem5 achieves a simulation speed of around 35kHz. As TraceRTL is directly derived from the original RTL model, it inherently shares a comparable simulation speed and benefits from hardware-accelerated emulation tools. Both the original RTL model and TraceRTL run at around 6.5kHz using Verilator [14] and around 1.4MHz on Cadence Palladium, the latter being 40× faster than XS-gem5.
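The speed comparison above is simple arithmetic, sketched below. The three simulation speeds are the approximate figures quoted in the text; the helper names and the dictionary keys are illustrative only.

```python
# Approximate simulated clock rates quoted in the text, in Hz.
SPEEDS_HZ = {
    "XS-gem5": 35_000,
    "TraceRTL on Verilator": 6_500,
    "TraceRTL on Palladium": 1_400_000,
}


def speedup(platform: str, baseline: str = "XS-gem5") -> float:
    """How many times faster `platform` simulates than `baseline`."""
    return SPEEDS_HZ[platform] / SPEEDS_HZ[baseline]


def wall_clock_seconds(cycles: int, platform: str) -> float:
    """Wall-clock time needed to simulate `cycles` target cycles."""
    return cycles / SPEEDS_HZ[platform]
```

For a hypothetical window of 40M simulated cycles (an assumption; the cycle count depends on the workload's IPC), this gives roughly 1.7 hours on Verilator, about 19 minutes on XS-gem5, and under 30 seconds on Palladium.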

Workloads. We use SPEC CPU2006 [54] and SPEC

CPU2017 [55] benchmark suites. We compare the benchmark

scores between XiangShan, TraceRTL and XS-gem5. Because the complete execution of SPEC CPU benchmarks takes a very long time in software simulation, a set of representative program segments is generated by sampling the SPEC CPU benchmarks using SimPoint [40]. Each segment consists of 20M instructions for warm-up and 20M instructions for performance sampling. To limit simulation time, we include program segments whose combined weight exceeds 30% for each application.

NEMU [56], an instruction-level simulator, is employed to

execute these segments and generate trace ﬁles to feed into

TraceRTL. Both XiangShan and XS-gem5 are functionally veri-

ﬁed against NEMU, guaranteeing they share the same execution

ﬂow.
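The sampling setup above can be sketched as follows. The text does not spell out how segments are chosen or how their results are combined, so this example assumes common SimPoint practice: pick the heaviest segments until their combined weight exceeds the coverage threshold, then aggregate per-segment CPI with weights renormalized over the simulated segments. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_segments(weights: Sequence[float], min_coverage: float = 0.30) -> List[int]:
    """Pick the heaviest segments until their combined weight exceeds min_coverage."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen: List[int] = []
    covered = 0.0
    for i in order:
        chosen.append(i)
        covered += weights[i]
        if covered > min_coverage:
            break
    return chosen


def weighted_cpi(samples: Sequence[Tuple[float, int, int]]) -> float:
    """Weighted CPI over (weight, cycles, instructions) measurement windows.

    Weights are renormalized because only part of the program is simulated.
    """
    total = sum(w for w, _, _ in samples)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(w * cycles / insts for w, cycles, insts in samples) / total
```

Only the 20M-instruction sampling window of each segment would contribute cycles and instructions here; the preceding 20M warm-up instructions are executed but not measured.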

6.2. Trace-Driven Performance Discrepancies

For the ﬁrst time, we can evaluate the performance impact of

the trace-driven simulation on an accurate high-performance

RTL processor and the eﬀectiveness of measures to mitigate

its performance errors. We quantify the performance errors arising from missing wrong-path simulation, simplified memory management unit behavior, and absent operands and opcodes.

Figure 8. Performance errors of TraceRTL w/ and w/o wrong-path sim-

ulation on SPEC CPU2006 and SPEC CPU2017.

6.2.1. Wrong-path Simulation.

We adopt the mechanism detailed in § 4.2.1 to model wrong-path effects. For comparison, we also consider the basic approach detailed in § 4.1.1, in which instruction fetch halts upon encountering a misprediction. Fig. 8 shows the performance errors on SPEC CPU2006 and SPEC CPU2017 with and without wrong-path simulation (WPS). It includes the sub-benchmarks whose “w/o WPS” errors exceed 1%, the overall performance error of each suite, and the RMSE (root mean squared error). Although the overall performance impact of neglecting wrong paths is relatively small (-3.91% and -0.18% for SPECint2006 and SPECfp2006, -2.17% and 0.14% for SPECint2017 and SPECfp2017), certain benchmarks, such as 429.mcf and 450.soplex in SPEC CPU2006 and 505.mcf and 557.xz in SPEC CPU2017, exhibit substantial performance deviations. Our results demonstrate that simulating the impact of wrong-path instructions effectively mitigates these discrepancies, reducing the overall performance error to 0.14% for SPECint2006 and 0.13% for SPECint2017. The RMSE of SPECint2006 and SPECint2017 falls from 9.56% and 4.87% to 1.38% and 2.38%, respectively.
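For readers who want to reproduce this kind of summary, the sketch below shows one way to compute per-benchmark relative errors, their RMSE, and the number of benchmarks above an error threshold. It assumes the error of a benchmark is the model's score relative to the ground-truth score minus one; the text does not define how the suite-level "overall" error is aggregated, so that metric is not modeled here. All function names are illustrative.

```python
import math
from typing import Dict


def relative_errors(model: Dict[str, float], truth: Dict[str, float]) -> Dict[str, float]:
    """Signed relative error per benchmark: (model - truth) / truth."""
    return {name: (model[name] - truth[name]) / truth[name] for name in truth}


def rmse(errors: Dict[str, float]) -> float:
    """Root mean squared error over all benchmarks."""
    if not errors:
        raise ValueError("no benchmarks given")
    return math.sqrt(sum(e * e for e in errors.values()) / len(errors))


def count_exceeding(errors: Dict[str, float], threshold: float) -> int:
    """Number of benchmarks whose absolute error exceeds the threshold."""
    return sum(1 for e in errors.values() if abs(e) > threshold)
```

Because RMSE squares each error, a few badly mispredicted benchmarks dominate it even when the signed errors largely cancel in a suite-level average, which is why both views are reported.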

6.2.2. Instruction Opcode Provisioning

Figure 9. Performance errors of TraceRTL on SPEC CPU2017 when

omitting computation instruction opcodes.

(a) Branch predictor MPKI.

(b) Data cache MPKI.

Figure 10. BPU and data cache MPKI comparison between TraceRTL w/ and w/o computation instruction opcodes on SPEC CPU2017 benchmarks. Each point represents one sub-benchmark.

Coarse-grained opcode abstraction is common in trace-driven

simulators without detailed execution unit modeling, or in

applications that directly provide traces without instruction

encoding. To quantify the performance deviations, we im-

plemented a controlled mapping scheme within the TraceRTL

framework. Specifically, the diverse array of complex computational opcodes is collapsed into a simplified set of generic operations: integer addition/multiplication (ADD/MUL) and floating-point addition/multiplication (FADD/FMUL).
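A minimal sketch of such a coarse-grained mapping is shown below. The text names only the four generic classes; which RISC-V mnemonics fall into which class, and the choice to leave memory, control-flow, and system instructions untouched, are assumptions made for illustration.

```python
# Instructions that keep their original opcode (memory, control flow, system).
PASSTHROUGH = {
    "lb", "lh", "lw", "ld", "lbu", "lhu", "lwu", "sb", "sh", "sw", "sd",
    "flw", "fld", "fsw", "fsd",
    "beq", "bne", "blt", "bge", "bltu", "bgeu", "jal", "jalr",
    "fence", "fence.i", "ecall", "ebreak",
}

# Assumed grouping: multiply/divide-like operations share the long-latency class.
INT_MUL_CLASS = {
    "mul", "mulh", "mulhu", "mulhsu", "mulw",
    "div", "divu", "divw", "divuw", "rem", "remu", "remw", "remuw",
}
FP_MUL_CLASS = {"fmul", "fdiv", "fsqrt", "fmadd", "fmsub", "fnmadd", "fnmsub"}


def abstract_opcode(mnemonic: str) -> str:
    """Collapse a computational opcode into ADD, MUL, FADD, or FMUL."""
    m = mnemonic.lower()
    if m in PASSTHROUGH:
        return m
    if m.startswith("f"):
        base = m.split(".")[0]  # strip precision suffix, e.g. "fdiv.d" -> "fdiv"
        return "FMUL" if base in FP_MUL_CLASS else "FADD"
    return "MUL" if m in INT_MUL_CLASS else "ADD"
```

Under this grouping an operation such as `fsqrt.d` is timed like a generic floating-point multiply, which hints at why floating-point-heavy workloads are more sensitive to the abstraction than integer ones.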

The results across SPEC CPU2017 demonstrate that the impact of opcode abstraction varies significantly between

workload types. As shown in Fig. 9, SPECint2017 exhibits

high resilience to coarse-grained semantic mapping, maintain-

ing a negligible average error of 0.95%. In contrast, SPECfp2017

shows a much higher sensitivity, with the average error rising to

5.30% and peaking at 19.29% in 519.lbm. The results suggest

that while coarse-grained opcode traces are suﬃcient for evalu-

ating general-purpose integer architectures, they may introduce

unacceptable ﬁdelity loss for ﬂoating-point heavy workloads.

Despite the divergence, the coarse-grained abstraction ef-

fectively preserves the control-ﬂow and memory-access char-

acteristics of the workloads. As illustrated in Fig. 10, the

MPKI metrics of branch predictor and data cache remain

highly consistent between the abstracted traces and normal

TraceRTL. In summary, coarse-grained opcode abstraction has

limited impact on integer compute-intensive applications, fron-

tend modules (branch prediction and instruction fetch), and

memory-access related research. It is well-suited for studies

where the target workloads or modules have a weak correlation

with ﬂoating-point operations.

6.2.3. Uncertain-latency Operations

To model the execution latency of uncertain-latency opera-

tions, represented by ﬂoating-point division and square root

(FDivSqrt), we adopt the approach that supplies operands, de-

tailed in § 4.2.3. For comparison, we also evaluated a baseline

conﬁguration where the FDivSqrt is replaced with a ﬁxed-

latency dummy unit, with latencies varying based on the

operation type and data width. As shown in Fig. 11, which

contains sub-benchmarks whose “fixed-latency” errors exceed 0.5%, the fixed-latency model results in overall performance errors of -1.65% and -1.22% on SPECfp2006 and SPECfp2017, respectively, with significant deviations for sub-benchmarks such as gromacs and zeusmp in SPECfp2006, and 521.wrf, 527.cam4, and 544.nab in SPECfp2017. Supplying operands improves the performance accuracy for these applications.
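The fixed-latency baseline can be pictured as a lookup keyed on operation type and data width, as in the sketch below. The latency values are placeholders chosen for illustration and are not XiangShan's actual numbers; the function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical latencies in cycles, keyed by (operation, data width in bits).
FIXED_LATENCY: Dict[Tuple[str, int], int] = {
    ("fdiv", 32): 10,
    ("fdiv", 64): 15,
    ("fsqrt", 32): 12,
    ("fsqrt", 64): 20,
}


def completion_cycle(op: str, width: int, issue_cycle: int,
                     operands: Sequence[float] = ()) -> int:
    """Cycle at which the dummy unit finishes; operands are deliberately ignored."""
    try:
        latency = FIXED_LATENCY[(op, width)]
    except KeyError:
        raise ValueError(f"no latency configured for {op}/{width}") from None
    return issue_cycle + latency
```

Because the operands never influence the result, two divisions with very different inputs complete after the same delay. A real divider's latency depends on its operands, which is the source of the error that operand provisioning removes.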

Figure 11. Performance error of simulating FDivSqrt with operand-

dependent vs. ﬁxed latency on SPECfp2006 and SPECfp2017.

6.2.4. Memory Management Unit

To evaluate the performance impact of the MMU, we employ

the dynamic page table (Dynamic PT) approach detailed in

§ 4.2.4. For comparison, we also simulate an ideal L1 TLB that always hits and a page table walker with a fixed latency of 15 cycles. As shown in Fig. 12, which contains sub-benchmarks whose “Ideal L1TLB” errors exceed 3%, the ideal L1 TLB introduces average performance discrepancies of 6.19% and 2.35% on SPECint2017 and SPECfp2017, with 11 out of 23 benchmarks experiencing discrepancies exceeding 3%. With the fixed-latency page table walker, 4 out of the 23 benchmarks have errors greater than 3%. In contrast, when simulating the actual MMU behavior, the overall performance error decreases to 0.13% and 0.14% for SPECint2017 and SPECfp2017, and only 1 out of 23 benchmarks exhibits a performance error greater than 3%.

Figure 12. Performance error of simulating the MMU using diﬀerent

strategies on SPEC CPU2006 and SPEC CPU2017.

6.3. Overall Performance Accuracy

We evaluate the performance accuracy of TraceRTL and XS-

gem5 on SPEC CPU2006 and SPEC CPU2017, with original

XiangShan as the ground truth, as shown in Fig. 13.

Overall. TraceRTL achieves high accuracy in overall performance. In terms of RMSE, TraceRTL achieves 1.45% and 1.00% on SPECint2006 and SPECfp2006, compared to 9.85% and 19.44% for XS-gem5. Similarly, TraceRTL's RMSE on SPECint2017 and SPECfp2017 is 2.38% and 0.67%, compared to 8.02% and 22.53% for XS-gem5.

Sub-benchmarks. TraceRTL exhibits high accuracy at both

the overall and sub-benchmark levels. For XS-gem5, on SPEC

CPU2006, 11 out of 29 sub-benchmarks have errors greater than

10%, and 14 out of 29 have errors greater than 3%. Similarly,

on SPEC CPU2017, 7 out of 23 sub-benchmarks have errors

greater than 10%, and 13 out of 23 have errors greater than

3%. These discrepancies can be attributed to the diversity of

program characteristics, which makes it challenging to perfectly

calibrate. In contrast, by inheriting the rich detail of the RTL, TraceRTL achieves high accuracy with little effort: only 1 out of 29 sub-benchmarks on SPEC CPU2006 and 1 out of 23 on SPEC CPU2017 have an error greater than 3%.

7. Case Studies

In this section, we present case studies to demonstrate how

TraceRTL facilitates agile performance evaluation:

1. Trace Compatibility: Using TraceBridge, we evaluate

x86-based Google workload traces on the RISC-V Xiang-

Shan CPU (§ 7.1).

2. Prototyping: We use TraceRTL to quickly evaluate the performance impact of adopting a two-stage address translation MMU (§ 7.2) and a new floating-point unit (§ 7.3).

3. Performance Sensitivity Accuracy: We compare how accurately TraceRTL and XS-gem5 capture the performance impact of changes in the frontend, backend, and memory subsystem (§ 7.4).

7.1. Trace Compatibility: Google Workload Traces

To show the feasibility of TraceBridge (§ 4.3), we evaluate datacenter workloads, namely the x86-based Google workload traces [36] collected from warehouse-scale computers, on the high-performance RISC-V processor XiangShan. The Google workload traces consist of multiple trace groups, each containing many trace files. For each group, we select the longest trace and apply SimPoint [40] for sampling. Applying SimPoint directly to the transformed traces avoids errors caused by instruction inflation.

While TraceBridge maintains semantic consistency, it intro-

duces the overhead of instruction inﬂation. We analyze this

inﬂation across both static and dynamic dimensions, consider-

ing instruction count and size, as shown in Fig. 14. The inﬂation

ratios for static and dynamic instruction counts remain stable

within a narrow range, from 1.09 for arizona to 1.19 for yankee.

The dynamic instruction size, an indicator of instruction cache

pressure, exhibits an inﬂation ratio ranging from 0.95 for ari-

zona to 1.20 for bravo.a, with 9 out of 12 applications staying

within a 10% inﬂation margin.
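The four inflation ratios discussed above can be computed from a pair of traces as sketched below. The definitions are assumptions made for illustration: a trace is a list of (PC, encoding size) records, static metrics count each distinct PC once, and dynamic metrics count every executed instruction.

```python
from typing import Dict, List, Tuple

Trace = List[Tuple[int, int]]  # (pc, encoding size in bytes) per executed instruction


def trace_stats(trace: Trace) -> Dict[str, int]:
    """Static and dynamic instruction count and size of one trace."""
    static = {pc: size for pc, size in trace}  # one entry per distinct PC
    return {
        "static_count": len(static),
        "static_size": sum(static.values()),
        "dynamic_count": len(trace),
        "dynamic_size": sum(size for _, size in trace),
    }


def inflation(original: Trace, transformed: Trace) -> Dict[str, float]:
    """Ratio of each metric after transformation to the same metric before."""
    before, after = trace_stats(original), trace_stats(transformed)
    return {key: after[key] / before[key] for key in before}
```

A ratio below 1.0 for dynamic size, as reported for arizona, simply means the transformed instructions occupy fewer bytes in total than the original ones even though there are more of them.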

We provide a Top-down [57] breakdown analysis of per-

formance bottlenecks for both Google workload traces and

SPECint2017, sorted by IPC, as shown in Fig. 15. While only 3

out of 10 SPECint2017 sub-benchmarks exhibit memory-bound

Figure 13. Performance error of TraceRTL and XS-gem5 on SPEC CPU2006 and SPEC CPU2017, using the execution-driven XiangShan as the baseline.

Figure 14. Instruction inﬂation rate of TraceBridge on Google workload

traces.

Figure 15. Top-down breakdown comparison between Google workload traces, SPECint2017, and llama2.c.

over 20%, 8 out of 12 Google workload traces demonstrate

this characteristic, with 6 reaching approximately 40%. These

results highlight memory access as the primary performance

bottleneck, underscoring the importance of memory optimization for warehouse-scale computing systems. TraceBridge expands both dynamic instruction size and count, which primarily affects the front-end-bound and core-bound categories. However, since these two categories account for relatively small proportions in the Google workload traces, the expansion has limited impact. Although coarse-grained instruction encoding may affect floating-point workloads, the analysis in § 6.2.2 shows that it preserves accurate instruction streams and cache behavior. This indicates that the impact on the memory-bound and bad-speculation categories is also minimal.

TraceRTL also streamlines workload porting by leveraging the well-developed QEMU. It takes less than 30 minutes to compile llama2.c [58] and generate program traces with QEMU. As shown in Fig. 15, these traces are simulated on TraceRTL and, unlike the Google workload traces and SPECint2017, exhibit distinct core-bound behavior.

7.2. Prototyping #1: Memory Management Unit

TraceRTL enables eﬃcient prototyping and performance evalu-

ation of complex microarchitectural modules. As a case study,

we examine two-stage address translation, a key mechanism

for supporting virtual machines through memory virtualization

deﬁned in the RISC-V Hypervisor extension [59].

Evaluating this module is non-trivial due to its reliance

on privileged operations, complex control and status registers

(CSRs), and software-managed page tables. Additionally, its

performance impact is significant: address translation may trigger multiple memory accesses to the page table. For instance, the RISC-V Sv39 scheme requires up to 3 memory accesses, while the virtualized, two-stage Sv39-Sv39x4 scheme requires up to 15 memory accesses, which increases the translation latency.
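The 3 versus 15 figures follow from the structure of a nested page walk. With n guest levels and m host levels, each of the n guest page-table pointers is a guest-physical address that needs m host accesses to translate plus one access to read the guest entry, and the final guest-physical address needs m more host accesses, giving n(m+1) + m = (n+1)(m+1) - 1. The sketch below assumes a worst-case walk with no TLB or page-walk-cache hits; the function names are illustrative.

```python
def native_walk_accesses(levels: int) -> int:
    """Worst-case memory accesses for a single-stage walk: one per level."""
    return levels


def two_stage_walk_accesses(guest_levels: int, host_levels: int) -> int:
    """Worst-case memory accesses for a nested (two-stage) page walk."""
    # Each guest level: translate the guest table pointer (host walk),
    # then read the guest page-table entry itself.
    per_guest_level = host_levels + 1
    # Finally translate the resulting guest-physical address of the data.
    return guest_levels * per_guest_level + host_levels
```

For Sv39 both stages have three levels, so the native walk needs 3 accesses and the two-stage walk needs 3 × 4 + 3 = 15.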

Figure 16. Performance decrease estimation when adopting two-stage

address translation on SPEC CPU2006.

TraceRTL allows performance evaluation of such designs be-

fore full functional implementation is complete. By (1) directly

Figure 17. Performance error of TraceRTL on SPEC CPU2006 under

KVM virtualization.

providing the page table following the two-stage translation

scheme and (2) adding a standalone host page table walker

which performs guest-physical-address to host-physical-address

translation, we enable the MMU to perform the two-stage

Sv39-Sv39x4 scheme, thereby obtaining the performance results of two-stage address translation. Fig. 16 illustrates the performance change of TraceRTL between normal and two-stage address translation on SPEC CPU2006. Two-stage address translation results in a performance degradation of 9.99% for SPECint2006 and 5.27% for SPECfp2006. Among the 29 sub-benchmarks, 10 have a degradation exceeding 5%. In summary, TraceRTL relaxes the requirements for functional correctness and software modifications, providing a robust development platform for MMU exploration.

To evaluate the accuracy, we compare the TraceRTL-based Hypervisor prototype against the fully functional XiangShan Hypervisor implementation on SPEC CPU2006 under KVM virtualization. As shown in

Fig. 17, TraceRTL achieves high accuracy, with performance

errors below 1% for 19 of 23 sub-benchmarks. The overall error

is 0.32% for SPECint2006 and 0.23% for SPECfp2006.

7.3. Prototyping #2: FDivSqrt Unit

TraceRTL enables the implementation of dummy execution units with configurable latency behavior without complex behavioral modeling. For instance, implementing a functional

FDivSqrt unit in RTL entails substantial eﬀort, as the imple-

mentation in XiangShan exceeds 2,400 LOC and requires exten-

sive veriﬁcation. To evaluate the pipelined design [60] without

incurring such overhead, alternative modeling approaches are

necessary. XS-gem5 adopts cycle-accurate modeling for execu-

tion units and requires more than 40 LOC of modiﬁcations.

In contrast, TraceRTL enables dummy implementation with

conﬁgurable latency in fewer than 10 LOC of modiﬁcations,

eliminating verification overhead. Figure 18 shows the performance impact of replacing two blocking FDivSqrt units with a pipelined version, for benchmarks where the change on TraceRTL exceeds 0.5%. Notable discrepancies between XiangShan and XS-gem5 appear in SPEC CPU2006 wrf and SPEC CPU2017 lbm. Given TraceRTL's higher performance accuracy, we consider its results more reliable.

7.4. Performance Sensitivity Accuracy

In addition to the performance accuracy of the processor model, the accuracy of its performance sensitivity to microarchitectural modifications is also important. To evaluate it, we adjust key configurations in the frontend, backend, and memory subsystem. For the frontend, we compare the performance impact of different branch target buffer (BTB) sizes, specifically from 1024 to 2048 (default) entries. For the backend, we vary the number of floating-point FMA units from 2 to 4 (default). For the memory subsystem, we evaluate performance with the best-offset prefetcher in the L2 cache both disabled and enabled (default).

Figure 18. Performance improvement when adopting pipelined FDivSqrt on SPECfp2006 and SPECfp2017.
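Sensitivity accuracy, as used here, can be made concrete with a small sketch: for each platform, compute the relative gain of a modified configuration over its own baseline, then compare the model's gain with the ground truth's. The function names and the (baseline, modified) tuple layout are assumptions for illustration.

```python
from typing import Tuple

Scores = Tuple[float, float]  # (baseline score, modified-configuration score)


def relative_gain(baseline: float, modified: float) -> float:
    """Relative performance change caused by a configuration change."""
    return modified / baseline - 1.0


def sensitivity_error(truth: Scores, model: Scores) -> float:
    """Difference between the gain a model predicts and the ground-truth gain."""
    return relative_gain(*model) - relative_gain(*truth)
```

A model can match the ground truth closely at the baseline configuration and still show a large sensitivity error, because this metric depends only on how each platform responds to the change.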

As shown in Fig. 19, we compare the performance variations of XiangShan, TraceRTL and XS-gem5 on SPEC CPU2017 benchmarks under the microarchitectural modifications mentioned above. The performance trends observed on TraceRTL match those of XiangShan more closely than those of XS-gem5 do.

For instance, when enlarging BTB size, sub-benchmarks such

as 500.perlbench, 502.gcc, and 511.povray exhibit similar

trends between TraceRTL and XiangShan. When increasing

the number of the FMA, sub-benchmarks like 507.cactuB-

SSN, 508.namd, and 519.lbm show consistent behavior. When

adopting the best-oﬀset prefetcher, sub-benchmarks including

500.perlbench and 507.cactuBSSN also demonstrate analogous

performance improvements.

We analyze the notable performance errors of XS-gem5

and ﬁnd that its prefetching subsystem is considerably more

complex and ﬁnely tuned, yet lacks clear calibration against

the RTL design. This mismatch diminishes the observable

performance gains from new prefetchers such as best-oﬀset.

The observation highlights the fundamental calibration chal-

lenge and motivates the design of TraceRTL: while a model

may overﬁt to the baseline conﬁguration to reproduce sim-

ilar overall performance, its performance trends for speciﬁc

microarchitectural features may diverge signiﬁcantly.

8. Related Work

Trace-Driven Model Transformation. Prior work has explored

employing trace-based methods to directly control RTL mod-

ules’ behavior for functional veriﬁcation, coverage analysis, and

performance validation [61, 62]. These works use traces to drive

separate RTL modules and the main challenge lies in the gener-

ation of traces. Some works collect the traces generated by CPU

RTL models for coverage analysis [63]. In contrast, TraceRTL,

centered on the whole CPU RTL model, addresses the chal-

lenges of design space exploration at the RTL level. Given that

achieving high performance accuracy is both a fundamental

requirement and a persistent challenge, TraceRTL provides a

solution that not only supports prototyping but also enables the execution of workloads in trace form. Trace-driven methodology can also be used to improve existing software simulators, such as the trace-driven gem5 described in [64]. In contrast,

TraceRTL enhances the RTL simulation to avoid extra model

(a) From 1024 to 2048 BTB entries on SPEC CPU2017.

(b) Increasing FMA number from 2 to 4 on SPECfp2017.

Figure 19. Performance improvements of microarchitectural modifications on XiangShan, TraceRTL and XS-gem5.

layers. Accel-Sim [65] adds a new frontend for GPGPU-Sim [66]

to support trace-driven simulation. Unlike Accel-Sim’s high-

level GPU modeling, TraceRTL targets low-level RTL CPU

models and addresses challenges of model calibration.

Trace-Driven Performance Inaccuracy. Previous works have investigated performance inaccuracy in trace-driven simulation, primarily focusing on wrong-path simulation in single-core [67–69] and multi-core [70, 71] settings and on synchronization in multi-core simulation [72, 73]. Our methodology mainly focuses on the prefetching influence of wrong paths by treating instructions on the correct path as wrong-path instructions, which suits the RTL model and achieves high accuracy. Moreover, existing trace-driven simulators have a high level of abstraction, which may introduce performance errors and thus mask some influencing factors. TraceRTL provides a platform for studying trace-driven simulation.

Error-Prone RTL Model. New RTL languages such as

Bluespec SystemVerilog [12], Chisel [13], and SpinalHDL [39]

provide high expressiveness and abstraction to reduce design

errors. Assassyn [74] introduces a high-level abstraction for

asynchronous event handling of pipelined architectures and can

generate a calibrated C++ simulator. TraceRTL presents an orthogonal approach: it uses a trace-driven methodology to decouple the functional and performance models of existing CPU models and to expand the scope of workloads.

Trace Format Transformation. Prior work has explored

trace format transformation, e.g., converting Arm traces into

ChampSim-compatible format [7, 75]. However, ChampSim’s

high-level abstraction bypasses many low-level challenges, such

as diﬀerences in instruction semantics, encoding size, PC align-

ment, and branch oﬀset range, which become critical when

executing traces on RTL models.

9. Conclusion

We propose TraceRTL, a methodology to bring trace-driven

simulation to the CPU RTL model to facilitate agile perfor-

mance evaluation. We evaluate TraceRTL by integrating it into

XiangShan, achieving high accuracy of 99.87% and 99.86% on

SPECint2017 and SPECfp2017. We propose a trace transforma-

tion strategy, TraceBridge, and evaluate x86 Google workload

traces on the RISC-V XiangShan. TraceRTL mitigates the

benchmarking gap between software simulators and RTL de-

sign, supports both an RTL-based performance exploration

workﬂow and seamless integration with simulator-driven ﬂows,

serving as a bridge from early-stage exploration to last-mile

RTL evaluation.

Ethical Statement

No ethical approval was required for this study, as it did not

involve human or animal subjects.

Funding

This work was supported by the National Natural Science Foun-

dation of China (Grant No. 62090022, 62090023, 62172388) and

the Strategic Priority Research Program of Chinese Academy

of Sciences (Grant No. XDA0320000, XDA0320300).

Declaration of competing interests

The authors declare that they have no known competing ﬁnan-

cial interests or personal relationships that could have appeared

to inﬂuence the work reported in this paper.

Data Availability Statements

The data supporting the ﬁndings of this study are openly

available in XiangShan at https://github.com/OpenXiangShan/XiangShan/tree/dev-tracertl.

CRediT authorship contribution statement

Zifei Zhang: Conceptualization; Project administration;

Methodology; Validation; Investigation; Data curation; Formal

Analysis; Writing – original draft. Yinan Xu: Methodology; Writing – review & editing. Kaichen Gong: Software; Validation; Investigation. Sa Wang: Writing; Visualization. Dan

Tang: Supervision; Funding acquisition; Resources. Yungang

Bao: Supervision; Funding acquisition; Resources; Writing –

review & editing.

References

1. Doug Burger and Todd M. Austin. The simplescalar

tool set, version 2.0. SIGARCH Comput. Archit. News,

25(3):13–25, June 1997. doi:10.1145/268806.268810.

2. Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann,

Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E.

Moore, Mark D. Hill, and David A. Wood. Multifacet’s

general execution-driven multiprocessor simulator (gems)

toolset. SIGARCH Comput. Archit. News, 33(4):92–99,

November 2005. doi:10.1145/1105734.1105747.

3. N.L. Binkert, R.G. Dreslinski, L.R. Hsu, K.T. Lim, A.G.

Saidi, and S.K. Reinhardt. The m5 simulator: Modeling

networked systems. IEEE Micro, 26(4):52–60, 2006. doi:10.1109/MM.2006.82.

4. Nathan Binkert, Bradford Beckmann, Gabriel Black,

Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel

Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sar-

dashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib,

Nilay Vaish, Mark D. Hill, and David A. Wood. The gem5

simulator. SIGARCH Comput. Archit. News, 39(2):1–7,

August 2011. doi:10.1145/2024716.2024718.

5. Trevor E. Carlson, Wim Heirman, and Lieven Eeckhout.

Sniper: exploring the level of abstraction for scalable and

accurate parallel multi-core simulation. In Proceedings

of 2011 International Conference for High Performance

Computing, Networking, Storage and Analysis, SC ’11,

New York, NY, USA, 2011. Association for Computing

Machinery. doi:10.1145/2063384.2063454.

6. Daniel Sanchez and Christos Kozyrakis. Zsim: fast and

accurate microarchitectural simulation of thousand-core

systems. In Proceedings of the 40th Annual Interna-

tional Symposium on Computer Architecture, ISCA ’13,

page 475–486, New York, NY, USA, 2013. Association for

Computing Machinery. doi:10.1145/2485922.2485963.

7. Nathan Gober, Gino Chacon, Lei Wang, Paul V. Gratz,

Daniel A. Jimenez, Elvira Teran, Seth Pugsley, and Jinchun

Kim. The championship simulator: Architectural simulation for education and competition, 2022. URL: https://arxiv.org/abs/2210.14324, arXiv:2210.14324.

8. Hossein Golestani, Rathijit Sen, Vinson Young, and Gagan Gupta. Calipers: a criticality-aware framework for modeling processor performance. In Proceedings of the 36th

ACM International Conference on Supercomputing, ICS

’22, New York, NY, USA, 2022. Association for Computing

Machinery. doi:10.1145/3524059.3532390.

9. Tony Nowatzki, Jaikrishnan Menon, Chen-Han Ho, and

Karthikeyan Sankaralingam. Architectural simulators considered harmful. IEEE Micro, 35(6):4–12, 2015. doi:10.1109/MM.2015.74.

10. Cbp2025 simulator framework. https://ericrotenberg.wordpress.ncsu.edu/cbp2025-simulator-framework/, 2025.

11. Sizhuo Zhang, Andrew Wright, Thomas Bourgeat, and Arvind. Composable building blocks to open up processor design. In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-51, page 68–81. IEEE Press, 2018. doi:10.1109/MICRO.2018.00015.

12. Thomas Bourgeat, Clément Pit-Claudel, Adam Chlipala,

and Arvind. The essence of bluespec: a core language for

rule-based hardware design. In Proceedings of the 41st

ACM SIGPLAN Conference on Programming Language

Design and Implementation, PLDI 2020, page 243–257,

New York, NY, USA, 2020. Association for Computing

Machinery. doi:10.1145/3385412.3385965.

13. Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee,

Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović. Chisel: constructing hardware in a scala

embedded language. In Proceedings of the 49th Annual

Design Automation Conference, pages 1216–1225, 2012.

doi:10.1145/2228360.2228584.

14. Verilator. Verilator user’s guide. https://www.veripool.

org/guide/latest/, 2026.

15. Haoyuan Wang and Scott Beamer. Repcut: Superlinear

parallel rtl simulation with replication-aided partitioning.

In Proceedings of the 28th ACM International Confer-

ence on Architectural Support for Programming Lan-

guages and Operating Systems, Volume 3, ASPLOS 2023,

page 572–585, New York, NY, USA, 2023. Association for

Computing Machinery. doi:10.1145/3582016.3582034.

16. Kexing Zhou, Yun Liang, Yibo Lin, Runsheng Wang, and

Ru Huang. Khronos: Fusing memory access for improved

hardware rtl simulation. In Proceedings of the 56th Annual

IEEE/ACM International Symposium on Microarchitec-

ture, MICRO ’23, page 180–193, New York, NY, USA,

2023. Association for Computing Machinery. doi:10.1145/

3613424.3614301.

17. Haoyuan Wang, Thomas Nijssen, and Scott Beamer. Don’t

repeat yourself! coarse-grained circuit deduplication to ac-

celerate rtl simulation. In Proceedings of the 29th ACM

International Conference on Architectural Support for

Programming Languages and Operating Systems, Volume

4, ASPLOS ’24, page 79–93, New York, NY, USA, 2025. As-

sociation for Computing Machinery. doi:10.1145/3622781.

3674184.

18. Mahyar Emami, Thomas Bourgeat, and James R. Larus.

Parendi: Thousand-way parallel rtl simulation. In Pro-

ceedings of the 30th ACM International Conference on

Architectural Support for Programming Languages and

Operating Systems, Volume 2, ASPLOS ’25, page 783–797,

New York, NY, USA, 2025. Association for Computing

Machinery. doi:10.1145/3676641.3716010.

19. Sagar Karandikar, Howard Mao, Donggyu Kim, David

Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton,

Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qi-

jing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz,

Jonathan Bachrach, and Krste Asanović. Firesim: Fpga-accelerated

cycle-exact scale-out system simulation in the

public cloud. In Proceedings of the 45th Annual Interna-

tional Symposium on Computer Architecture, ISCA ’18,

page 29–42. IEEE Press, 2018. doi:10.1109/ISCA.2018.

00014.

20. Sagar Karandikar, Albert Ou, Alon Amid, Howard Mao,

Randy Katz, Borivoje Nikolić, and Krste Asanović.

Fireperf: Fpga-accelerated full-system hardware/software

performance proﬁling and co-design. In Proceedings of

the Twenty-Fifth International Conference on Architec-

tural Support for Programming Languages and Operating

Systems, ASPLOS ’20, page 715–731, New York, NY,

USA, 2020. Association for Computing Machinery. doi:

10.1145/3373376.3378455.

21. Mahyar Emami, Sahand Kashani, Keisuke Kamahori, Mo-

hammad Sepehr Pourghannad, Ritik Raj, and James R

Larus. Manticore: Hardware-accelerated rtl simulation

with static bulk-synchronous parallelism. In Proceedings

of the 28th ACM International Conference on Architec-

tural Support for Programming Languages and Operating

Systems, Volume 4, pages 219–237, 2023. doi:10.1145/

3623278.3624750.

22. Fares Elsabbagh, Shabnam Sheikhha, Victor A Ying,

Quan M Nguyen, Joel S Emer, and Daniel Sanchez. Ac-

celerating rtl simulation with hardware-software co-design.

In Proceedings of the 56th Annual IEEE/ACM Interna-

tional Symposium on Microarchitecture, pages 153–166,

2023. doi:10.1145/3613424.3614257.

23. Christopher Celio, David A Patterson, and Krste

Asanovic. The berkeley out-of-order machine (boom):

An industry-competitive, synthesizable, parameterized

risc-v processor. EECS Department, University of

California, Berkeley, Tech. Rep. UCB/EECS-2015-167, 2015.

URL: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-167.html.

24. Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste

Asanovic. Sonicboom: The 3rd generation Berkeley out-of-order

machine. In Fourth Workshop on Computer

Architecture Research with RISC-V, volume 5, pages 1–7,

2020. URL: https://people.eecs.berkeley.edu/~krste/papers/SonicBOOM-CARRV2020.pdf.

25. Christopher Celio, Pi-Feng Chiu, Borivoje Nikolic, David A

Patterson, and Krste Asanovic. BOOMv2: an open-

source out-of-order RISC-V core. In First Work-

shop on Computer Architecture Research with RISC-V

(CARRV), 2017. URL: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-157.pdf.

26. Kaifan Wang, Jian Chen, Yinan Xu, Zihao Yu, Zifei Zhang,

Guokai Chen, Xuan Hu, Linjuan Zhang, Xi Chen, Wei

He, Dan Tang, Ninghui Sun, and Yungang Bao. XiangShan:

An Open-Source Project for High-Performance

RISC-V Processors Meeting Industrial-Grade Standards.

In 2024 IEEE Hot Chips 36 Symposium (HCS), pages 1–

25, Los Alamitos, CA, USA, August 2024. IEEE Computer

Society. URL: https://doi.ieeecomputersociety.org/10.

1109/HCS61935.2024.10665293, doi:10.1109/HCS61935.2024.

10665293.

27. Chen Chen, Xiaoyan Xiang, Chang Liu, Yunhai Shang, Ren

Guo, Dongqi Liu, Yimin Lu, Ziyi Hao, Jiahui Luo, Zhijian

Chen, Chunqiang Li, Yu Pu, Jianyi Meng, Xiaolang Yan,

Yuan Xie, and Xiaoning Qi. Xuantie-910: A commercial

multi-core 12-stage pipeline out-of-order 64-bit high per-

formance risc-v processor with vector extension: Industrial

product. In 2020 ACM/IEEE 47th Annual International

Symposium on Computer Architecture (ISCA), pages 52–

64. IEEE, 2020. doi:10.1109/ISCA45697.2020.00016.

28. Chen Bai, Qi Sun, Jianwang Zhai, Yuzhe Ma, Bei Yu,

and Martin DF Wong. Boom-explorer: Risc-v boom mi-

croarchitecture design space exploration framework. In

2021 IEEE/ACM International Conference On Computer

Aided Design (ICCAD), pages 1–9. IEEE, 2021. doi:

10.1109/ICCAD51958.2021.9643455.

29. Siddharth Gupta, Yuanlong Li, Qingxuan Kang, Abhishek

Bhattacharjee, Babak Falsaﬁ, Yunho Oh, and Mathias

Payer. Imprecise store exceptions. In Proceedings of

the 50th Annual International Symposium on Computer

Architecture, ISCA ’23, New York, NY, USA, 2023. As-

sociation for Computing Machinery. doi:10.1145/3579371.

3589087.

30. Moein Ghaniyoun, Kristin Barber, Yuan Xiao, Yinqian

Zhang, and Radu Teodorescu. Teesec: Pre-silicon vulner-

ability discovery for trusted execution environments. In

Proceedings of the 50th Annual International Symposium

on Computer Architecture, ISCA ’23, New York, NY,

USA, 2023. Association for Computing Machinery. doi:

10.1145/3579371.3589070.

31. Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang,

Tianyue Lu, Mingyu Chen, Siwei Luo, and Keji Huang.

Asynchronous memory access unit: Exploiting massive par-

allelism for far memory access. ACM Trans. Archit. Code

Optim., 21(3), September 2024. doi:10.1145/3663479.

32. Duo Wang, Mingyu Yan, Yihan Teng, Dengke Han, Hao-

ran Dang, Xiaochun Ye, and Dongrui Fan. A transfer

learning framework for high-accurate cross-workload design

space exploration of cpu. In 2023 IEEE/ACM Interna-

tional Conference on Computer Aided Design (ICCAD),

pages 1–9, 2023. doi:10.1109/ICCAD57390.2023.10323840.

33. Hideki Ando. Performance improvement by prioritizing the

issue of the instructions in unconﬁdent branch slices. In

2018 51st Annual IEEE/ACM International Symposium

on Microarchitecture (MICRO), pages 82–94, 2018. doi:

10.1109/MICRO.2018.00016.

34. Yinan Xu, Zihao Yu, Dan Tang, Guokai Chen, Lu Chen,

Lingrui Gou, Yue Jin, Qianruo Li, Xin Li, Zuojun Li, Ji-

awei Lin, Tong Liu, Zhigang Liu, Jiazhan Tan, Huaqiang

Wang, Huizhe Wang, Kaifan Wang, Chuanqi Zhang,

Fawang Zhang, Linjuan Zhang, Zifei Zhang, Yangyang

Zhao, Yaoyang Zhou, Yike Zhou, Jiangrui Zou, Ye Cai,

Dandan Huan, Zusong Li, Jiye Zhao, Zihao Chen, Wei He,

Qiyuan Quan, Xingwu Liu, Sa Wang, Kan Shi, Ninghui

Sun, and Yungang Bao. Towards developing high per-

formance risc-v processors using agile methodology. In

2022 55th IEEE/ACM International Symposium on Mi-

croarchitecture (MICRO), pages 1178–1199, 2022. doi:

10.1109/MICRO56248.2022.00080.

35. Championship value prediction. https://microarch.org/

cvp1/. Accessed: 2025-02-20.

36. Google workload traces version 2. https://console.cloud.google.com/storage/browser/external-traces-v2.

Accessed: 2025-02-20.

37. Wei Su, Abhishek Dhanotia, Carlos Torres, Jayneel Gandhi,

Neha Gholkar, Shobhit Kanaujia, Maxim Naumov, Kalyan

Subramanian, Valentin Andrei, Yifan Yuan, and Chunqiang

Tang. Dcperf: An open-source, battle-tested performance

benchmark suite for datacenter workloads. In Proceedings

of the 52nd Annual International Symposium on Com-

puter Architecture, ISCA ’25, page 1717–1730, New York,

NY, USA, 2025. Association for Computing Machinery.

doi:10.1145/3695053.3731411.

38. OpenXiangShan. XiangShan. https://github.com/

OpenXiangShan/XiangShan, 2020.

39. SpinalHDL. Scala based hdl. https://github.com/

SpinalHDL/SpinalHDL, 2024.

40. Timothy Sherwood, Erez Perelman, Greg Hamerly, and

Brad Calder. Automatically characterizing large scale

program behavior. In Proceedings of the 10th Inter-

national Conference on Architectural Support for Pro-

gramming Languages and Operating Systems, ASPLOS X,

page 45–57, New York, NY, USA, 2002. Association for

Computing Machinery. doi:10.1145/605397.605403.

41. Alen Sabu, Harish Patil, Wim Heirman, and Trevor E

Carlson. Looppoint: Checkpoint-driven sampled simula-

tion for multi-threaded applications. In 2022 IEEE In-

ternational Symposium on High-Performance Computer

Architecture (HPCA), pages 604–618. IEEE, 2022. doi:

10.1109/HPCA53966.2022.00051.

42. Trevor E Carlson, Wim Heirman, Kenzo Van Craeynest,

and Lieven Eeckhout. Barrierpoint: Sampled simulation

of multi-threaded applications. In 2014 IEEE Interna-

tional Symposium on Performance Analysis of Systems

and Software (ISPASS), pages 2–12. IEEE, 2014. doi:

10.1109/ISPASS.2014.6844456.

43. Krste Asanovic, Rimas Avizienis, Jonathan Bachrach,

Scott Beamer, David Biancolin, Christopher Celio, Henry

Cook, Daniel Dabbelt, John Hauser, Adam Izraele-

vitz, Sagar Karandikar, Ben Keller, Donggyu Kim, and

John Koenig. The rocket chip generator. EECS De-

partment, University of California, Berkeley, Tech.

Rep. UCB/EECS-2016-17, 4:6–2, 2016. URL:

https://aspire.eecs.berkeley.edu/wp/wp-content/uploads/2016/04/Tech-Report-The-Rocket-Chip-Generator-Beamer.pdf.

44. Bruno Sá, Luca Valente, José Martins, Davide Rossi,

Luca Benini, and Sandro Pinto. CVA6 RISC-V virtual-

ization: Architecture, microarchitecture, and design space

exploration. IEEE Transactions on Very Large Scale In-

tegration (VLSI) Systems, 2023. doi:10.1109/TVLSI.2023.

3302837.

45. RISC-V community. Olympia. https://github.com/riscv-software-src/riscv-perf-model, 2026.

46. Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil,

Artur Klauser, Geoﬀ Lowney, Steven Wallace, Vijay Janapa

Reddi, and Kim Hazelwood. Pin: building customized program

analysis tools with dynamic instrumentation. In Proceedings

of the 2005 ACM SIGPLAN Conference on Programming

Language Design and Implementation, PLDI

’05, page 190–200, New York, NY, USA, 2005. Association

for Computing Machinery. doi:10.1145/1065010.1065034.

47. Derek Bruening, Evelyn Duesterwald, and Saman Amaras-

inghe. Design and implementation of a dynamic optimiza-

tion framework for windows. In 4th ACM workshop on

feedback-directed and dynamic optimization (FDDO-4),

page 20, 2001.

48. Nicholas Nethercote and Julian Seward. Valgrind: a frame-

work for heavyweight dynamic binary instrumentation.

In Proceedings of the 28th ACM SIGPLAN Conference

on Programming Language Design and Implementation,

PLDI ’07, page 89–100, New York, NY, USA, 2007. As-

sociation for Computing Machinery. doi:10.1145/1250734.

1250746.

49. DynamoRIO. Dynamorio trace format. https://dynamorio.

org/sec_drcachesim_format.html. Accessed: 2026-02-07.

50. Fabrice Bellard. QEMU, a fast and portable dynamic

translator. In USENIX annual technical conference,

FREENIX Track, volume 41, pages 10–5555. California,

USA, 2005. URL: https://www.usenix.org/legacy/event/

usenix05/tech/freenix/full_papers/bellard/bellard.pdf.

51. Santosh Pandey, Amir Yazdanbakhsh, and Hang Liu. Tao:

Re-thinking dl-based microarchitecture simulation. Pro-

ceedings of the ACM on Measurement and Analysis of

Computing Systems, 8(2):1–25, 2024. doi:10.1145/3656012.

52. Muhammad E. S. Elrabaa, Ayman Hroub, Muhamed F.

Mudawar, Amran Al-Aghbari, Mohammed Al-Asli, and

Ahmad Khayyat. A very fast trace-driven simulation plat-

form for chip-multiprocessors architectural explorations.

IEEE Transactions on Parallel and Distributed Systems,

28(11):3033–3045, 2017. doi:10.1109/TPDS.2017.2713782.

53. OpenXiangShan. XS-gem5. https://github.com/

OpenXiangShan/GEM5, 2020.

54. John L Henning. Spec cpu2006 benchmark descriptions.

ACM SIGARCH Computer Architecture News, 34(4):1–

17, 2006. doi:10.1145/1186736.1186737.

55. James Bucek, Klaus-Dieter Lange, and Jóakim

v. Kistowski. SPEC CPU2017: Next-generation com-

pute benchmark. In Companion of the 2018 ACM/SPEC

International Conference on Performance Engineering,

pages 41–42, 2018. doi:10.1145/3185768.3185771.

56. OpenXiangShan. NEMU. https://github.com/

OpenXiangShan/NEMU, 2019.

57. Ahmad Yasin. A top-down method for performance analysis

and counters architecture. In 2014 IEEE International

Symposium on Performance Analysis of Systems

and Software (ISPASS), pages 35–44, 2014. doi:10.1109/

ISPASS.2014.6844459.

58. Andrej Karpathy. llama2.c: Inference Llama 2 in one ﬁle of

pure C. https://github.com/karpathy/llama2.c. Accessed:

2026-02-07.

59. RISC-V. RISC-V Instruction Set Manual. https://github.com/riscv/riscv-isa-manual. Accessed: 2026-02-07.

60. Javier D. Bruguera. Low-latency and high-bandwidth

pipelined radix-64 division and square root unit. In

2022 IEEE 29th Symposium on Computer Arithmetic

(ARITH), pages 10–17, 2022. doi:10.1109/ARITH54963.

2022.00012.

61. Vivekananda M Vedula, Jacob A Abraham, Jayanta

Bhadra, and Raghuram Tupuri. A hierarchical test genera-

tion approach using program slicing techniques on hardware

description languages. Journal of Electronic Testing,

19:149–160, 2003. doi:10.1023/A:1022885523034.

62. Lingyi Liu and Shobha Vasudevan. Eﬃcient validation in-

put generation in rtl by hybridized source code analysis. In

2011 Design, Automation & Test in Europe, pages 1–6,

2011. doi:10.1109/DATE.2011.5763253.

63. Biruk Mammo, Jim Larimer, Matthew Morgan, Dave Fan,

Eric Hennenhoefer, and Valeria Bertacco. Architectural

trace-based functional coverage for multiprocessor veriﬁ-

cation. In 2012 13th International Workshop on Micro-

processor Test and Veriﬁcation (MTV), pages 1–5, 2012.

doi:10.1109/MTV.2012.12.

64. Sotiris Apostolakis, Chris Kennelly, Xinliang David Li, and

Parthasarathy Ranganathan. Necro-reaper: Pruning away

dead memory traﬃc in warehouse-scale computers. In Pro-

ceedings of the 30th ACM International Conference on

Architectural Support for Programming Languages and

Operating Systems, Volume 2, ASPLOS ’25, page 689–703,

New York, NY, USA, 2025. Association for Computing

Machinery. doi:10.1145/3676641.3716007.

65. Mahmoud Khairy, Zhesheng Shen, Tor M. Aamodt, and

Timothy G. Rogers. Accel-sim: An extensible simulation

framework for validated gpu modeling. In 2020 ACM/IEEE

47th Annual International Symposium on Computer Ar-

chitecture (ISCA), pages 473–486, 2020. doi:10.1109/

ISCA45697.2020.00047.

66. Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry

Wong, and Tor M. Aamodt. Analyzing cuda workloads

using a detailed gpu simulator. In 2009 IEEE Interna-

tional Symposium on Performance Analysis of Systems

and Software, pages 163–174, 2009. doi:10.1109/ISPASS.

2009.4919648.

67. Onur Mutlu, Hyesoon Kim, David N Armstrong, and Yale N

Patt. An analysis of the performance impact of wrong-

path memory references on out-of-order and runahead ex-

ecution processors. IEEE Transactions on Computers,

54(12):1556–1571, 2005. doi:10.1109/TC.2005.190.

68. Stijn Eyerman, Sam Van den Steen, Wim Heirman, and

Ibrahim Hur. Simulating wrong-path instructions in de-

coupled functional-ﬁrst simulation. In 2023 IEEE Interna-

tional Symposium on Performance Analysis of Systems

and Software (ISPASS), pages 124–133. IEEE, 2023. doi:

10.1109/ISPASS57527.2023.00021.

69. Bhargav Reddy Godala, Sankara Prasad Ramesh, Krish-

nam Tibrewala, Chrysanthos Pepi, Gino Chacon, Svilen

Kanev, Gilles A Pokam, Daniel A Jiménez, Paul V Gratz,

and David I August. Correct wrong path. arXiv preprint

arXiv:2408.05912, 2024. URL: https://doi.org/10.48550/

arXiv.2408.05912.

70. Resit Sendag, Ayse Yilmazer, Joshua J. Yi, and Augus-

tus K. Uht. The impact of wrong-path memory refer-

ences in cache-coherent multiprocessor systems. Jour-

nal of Parallel and Distributed Computing, 67(12):1256–

1269, 2007. Best Paper Awards: 20th International

Parallel and Distributed Processing Symposium (IPDPS

2006). URL: https://www.sciencedirect.com/science/

article/pii/S0743731507000457, doi:10.1016/j.jpdc.2007.

03.005.

71. R. Sendag, A. Yilmazer, J.J. Yi, and A.K. Uht. Quantifying

and reducing the eﬀects of wrong-path memory references

in cache-coherent multiprocessor systems. In Proceedings

20th IEEE International Parallel & Distributed Process-

ing Symposium, pages 10 pp.–, 2006. doi:10.1109/IPDPS.

2006.1639260.

72. Stephen R Goldschmidt and John L Hennessy. The

accuracy of trace-driven simulations of multiprocessors.

ACM SIGMETRICS Performance Evaluation Review,

21(1):146–157, 1993. doi:10.1145/166962.167001.

73. Karthik Sangaiah, Michael Lui, Radhika Jagtap,

Stephan Diestelhorst, Siddharth Nilakantan, Ankit

More, Baris Taskin, and Mark Hempstead. Synchrotrace:

Synchronization-aware architecture-agnostic traces for

lightweight multicore simulation of cmp and hpc work-

loads. ACM Trans. Archit. Code Optim., 15(1), March

2018. doi:10.1145/3158642.

74. Jian Weng, Boyang Han, Derui Gao, Ruijie Gao, Wanning

Zhang, An Zhong, Ceyu Xu, Jihao Xin, Yangzhixin Luo,

Lisa Wu Wills, and Marco Canini. Assassyn: A uniﬁed

abstraction for architectural simulation and implementa-

tion. In Proceedings of the 52nd Annual International

Symposium on Computer Architecture, ISCA ’25, page

1464–1479, New York, NY, USA, 2025. Association for

Computing Machinery. doi:10.1145/3695053.3731004.

75. Josué Feliu, Arthur Perais, Daniel A. Jiménez, and Alberto

Ros. Rebasing microarchitectural research with indus-

try traces. In 2023 IEEE International Symposium on

Workload Characterization (IISWC), pages 100–114, 2023.

doi:10.1109/IISWC59245.2023.00027.

BenchCouncil Transactions on Benchmarks, Standards and

Evaluations, 2026

DOI: https://doi.org/10.66834/xqpw3878

Review Article

Mapping the Intellectual Landscape of Blockchain in

the Banking Industry: A Hybrid Bibliometric and

Systematic Review (2015–2025)

Sadeq Abdullah Aladeeb 1,2,∗ and Fatima Zohra Sossi Alaoui

Laboratory of Economics, Finance, Management and Innovation, Faculty of Economics and Management, Ibn Tofail University, Kenitra,

Morocco and

Department of Accounting and Auditing, Faculty of Commerce and Economics, Sana’a University, Sana’a, Yemen

∗ Corresponding author. Email: sadeqabdullahhasan.al-adeeb@uit.ac.ma

Received on 8 August 2025; Accepted on 13 March 2026

Abstract

The advent of blockchain technology has introduced new alternatives to traditional banking systems, providing a decen-

tralized, secure, and transparent framework. However, its adoption is still complex and uneven for many reasons. This

study provides a comprehensive mapping of the intellectual trajectory, thematic structure, and development of blockchain

technology research in the banking sector. Using a hybrid literature review methodology that combines bibliometric anal-

ysis and systematic content review, the study analyzes 389 peer-reviewed publications retrieved from Scopus (2015–May

2025). VOSviewer was employed to conduct performance analysis and science mapping, including co-authorship, co-

citation, keyword co-occurrence, and bibliographic coupling analyses. In parallel, qualitative thematic analysis identiﬁed

six clusters: (1) blockchain in banking and ﬁnancial intermediation to enhance operational eﬃciency, (2) decentralized

ﬁnance and cryptocurrencies, (3) integration of blockchain with other digital innovations, (4) trust-related dimensions,

(5) institutional and regulatory aspects, and (6) strategies for modernizing banking business models. The ﬁndings reveal

a steady rise in research output, regional disparities in collaboration, and thematic evolution from early conceptualiza-

tion to recent signs of diversiﬁcation of applied research. By integrating quantitative and qualitative insights, this study

highlights key research gaps, oﬀers directions for future work, and provides guidance for academics, practitioners, and

policymakers on the transformative potential and challenges of blockchain in banking.

Key words: Blockchain Technology, Banking Sector, Bibliometric Analysis, Systematic Content Review, Financial

Technology, Decentralized Finance

1. Introduction

In recent years, the banking sector has undergone a signiﬁcant

transformation, driven by the rapid advancement of emerging

technologies, particularly blockchain. The widespread adoption

of smartphones and high-speed data transmission has not only

disrupted social interactions but also traditional business oper-

ations. However, legacy banking systems have faced challenges

in adapting to these technological advancements due to factors

such as structural rigidity, high operational costs, and inade-

quate transaction processing speeds. For instance, cross-border

remittances frequently necessitate several days to complete, an

ineﬃciency that starkly contrasts with the near-instantaneous

nature of digital communication [1].

In this context, blockchain technology emerged in 2008 as

the technology underpinning Bitcoin, a peer-to-peer digital cur-

rency eliminating the intermediary [2]. Its decentralized nature

allows for secure, anonymous, and cost-eﬀective transactions.

This has led to the conclusion that it possesses considerable

potential as an oﬀ-balance sheet replacement for conventional

banking systems [3].

The theory behind blockchain, however, goes back to the

early 1990s when Stuart Haber and W. Scott Stornetta de-

veloped a cryptographically secure method of time-stamping

digital documents [4]. Their subsequent introduction of Merkle

trees enabled data to be gathered into chained blocks, significantly

enhancing security as well as efficiency [5]. The modern

blockchain, conceptualized by Satoshi Nakamoto, is a form of

Distributed Ledger Technology (DLT) that utilizes consensus

algorithms on distributed nodes to record transactions [6, 7].

The design of the blockchain, which consists of a series of blocks

that are cryptographically linked, ensures immutability and

tamper-prooﬁng. Consequently, it establishes a highly reliable

digital record-keeping system [8]. The application of blockchain

technology has expanded beyond its initial implementation in

the domain of cryptocurrency. It has been adopted in various

In the banking sector, blockchain is increasingly seen as an innovative way to transform the trustworthiness and reliability of data management [15]. As digital technology continues to penetrate daily life and concern about data security grows, blockchain’s significance will continue to rise. It may become as integral to daily life as the internet [6]. Furthermore, the emergence of newer technologies, such as blockchain, will transform the banking sector in the near future [16]. For example, banks are expected to save $10 billion in cross-border payment fees by 2030 by adopting blockchain technology [17]. According to World Economic Forum projections, blockchain technology will reach a significant milestone by 2027, becoming integrated into various sectors of the global economy. A considerable expansion in the financial sector, including the banking industry, is projected to increase GDP by 10% [18].

Among the most prominent manifestations of this transformation is the rise of decentralized finance (DeFi), which uses blockchain technology to facilitate peer-to-peer financial services without the need for intermediaries such as conventional banks. This setup transcends geographical boundaries and provides basic financial services, such as savings, loans, and investment products, to poor communities in emerging economies [19, 20].

As a revolutionary innovation, blockchain technology offers numerous benefits: enhanced security, privacy, operational transparency, and increased efficiency. These benefits stem from its decentralized nature and its use of cryptographic algorithms, which significantly reduce the risk of cyberattacks and fraud while ensuring traceability and data integrity [21–24]. Consequently, banks are increasingly exploring blockchain technology for applications such as cross-border payments, streamlined Know Your Customer (KYC) processes, enhanced anti-money laundering (AML) measures, and automated contract enforcement through smart contracts. These innovations collectively contribute to lowering operational costs and improving overall efficiency [25, 26].

Despite the promise of blockchain technology, its adoption by banks faces limiting factors, including regulatory uncertainty, technical complexity, and resistance to change at the organizational level. A careful examination of the opportunities and limitations presented by this technology is therefore needed, accompanied by a thorough assessment of awareness, readiness, and acceptance levels among banks and customers [27–29].

The timing, evolution trajectory, and possible impact of blockchain technology on banking have garnered considerable interest among academics and practitioners. In recent years, academic interest in the topic has increased markedly, resulting in a large and diverse body of literature. To the best of our knowledge, however, no study has carried out a detailed mapping of the intellectual structure, thematic development, and future research trends of blockchain literature in the banking sector by systematically integrating bibliometric analysis and systematic content review methods. Consequently, there is a need to identify and assess the current state of the art and prevailing research trends in this domain.

To fill this gap, this study utilizes a hybrid literature review methodology, combining bibliometric analysis and systematic content review, to answer the following research questions:

RQ1: What are the prevailing research trends and patterns of scholarly collaboration in the domain of blockchain technology in the banking sector between 2015 and 2025?

RQ2: What are the core thematic clusters and intellectual structures underpinning blockchain research in the banking sector?

RQ3: What key research gaps and future directions can be identified to advance the understanding and application of blockchain technology in the banking sector?

Amidst the accelerating digitalization of banking and financial systems, blockchain technology is revolutionizing how banking services are produced and disseminated. The primary aim of this research is to synthesize the current academic literature on the impact of blockchain on banking by identifying the key concepts, emerging research trends, and prevailing themes. To this end, the study adopts a mixed-method research approach combining quantitative bibliometric analysis with a qualitative systematic content analysis to map the intellectual landscape of blockchain studies in banking. This combination strengthens the validity and credibility of the findings, offering an overarching perspective on how blockchain is reshaping the industry. Besides mapping the literature, the study provides critical reflections on academic and institutional responses to the emergence of blockchain and indicates avenues for further research to advance its revolutionary potential in the banking sector.

By doing so, this study contributes to a deeper understanding of the significant development of the field and guides future academic and practical engagement with blockchain innovation in the banking sector. The novelty of the study lies in its explicit framework of triangulation and cross-validation that combines bibliometric science mapping and qualitative thematic analysis, providing valuable and actionable insights for academics, practitioners, and policymakers. In addition, the study contributes to the development of transparent, safe, and effective banking and financial systems by systematically identifying the advantages and obstacles related to the adoption of blockchain technology and by proposing an organized agenda for future research in this field.

The present article is structured as follows. Following this introduction, Section 2 describes the hybrid review methodology, detailing the bibliometric and systematic content analysis approaches. Section 3 presents the results of the performance analysis and science mapping of 389 publications on blockchain in banking and discusses the six main thematic clusters identified. Section 4 identifies the managerial and practical implications of the findings. Finally, Section 5 offers the main conclusions, which include a summary of the key findings, a proposal of directions for future research, and an acknowledgement of the study’s limitations.

2. Research Methodology

To achieve the purposes of this study, we use a hybrid review methodology that integrates bibliometric analysis and systematic content analysis. The mixed-method approach combines quantitative analysis with a substantial emphasis on qualitative analysis. A hybrid review approach, as described by Paul and Criado [30], is a method that facilitates a comprehensive examination of the literature by combining quantitative and qualitative approaches, with the aim of organizing, analyzing, and interpreting data in a meaningful way. The objective is to provide a comprehensive summary of the scholarly literature on the adoption of blockchain technology (BCT) in the banking industry and to offer an integrative review of the main topics, major findings, and future research agendas in this domain.

Bibliometric analysis, which relies on the statistical evaluation of academic production [31], is complemented in this study by content analysis, a qualitative technique used for the systematic analysis of textual information and the disclosure of the structure of existing knowledge within a given discipline [32]. The methodology stages and analytical tools adopted to fulfill the objectives of the study are outlined in Figure 1.

Stage 1: Research Planning. Research Aims; Research Questions.

Stage 2: Data Collection. Database Selection (Scopus); Keyword Identification; Search Query Formation; Document Screening (Inclusion/Exclusion Criteria); Final Dataset (389 Documents).

Stage 3: Bibliometric Analysis. Performance Analysis; Visualization (VOSviewer, Excel); Bibliographic Coupling; Keyword Co-occurrence; Co-citation Analysis; Co-authorship Analysis.

Stage 4: Systematic Content Review and Thematic Analysis. Selection of 70 Documents based on Bibliographic Coupling; Identification of Key Clusters; Cross-Validation with Key Thematic Cluster; Manual Thematic Coding; Review and Analysis of the Dataset Content.

Stage 5: Outputs. Research Implications/Future Research Directions; Thematic Clustering; Trends & Influential Entities; Finalize 6 Thematic Clusters.

Figure 1. Research Design of the Hybrid Bibliometric–Systematic Literature Review (Developed by the Authors)
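Bibliographic coupling, one of the science mapping techniques listed in Stage 3, links two documents by the number of references they both cite. The following is a minimal sketch of that count only, not a reproduction of the VOSviewer workflow used in the study; the document identifiers, reference identifiers, and the function name `coupling_strength` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical toy data: document id -> set of cited reference ids.
cited_refs = {
    "doc_a": {"r1", "r2", "r3"},
    "doc_b": {"r2", "r3", "r4"},
    "doc_c": {"r5"},
}


def coupling_strength(refs_by_doc):
    """Return {(doc_i, doc_j): shared reference count}, omitting pairs with no overlap."""
    strengths = {}
    for a, b in combinations(sorted(refs_by_doc), 2):
        shared = len(refs_by_doc[a] & refs_by_doc[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


strengths = coupling_strength(cited_refs)
print(strengths)  # {('doc_a', 'doc_b'): 2}
```

In this toy example only `doc_a` and `doc_b` are coupled, because they share two cited references. Documents with high coupling strength tend to address related topics, which is why a measure of this kind can be used to select a subset of documents for closer reading.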

S. A. Aladeeb et al.

2.1. Data Collection

2.1.1. Database Selection

In this stage, data were collected from the Scopus database, a widely recognized and reputable source for bibliometric research [18, 33]. Although other databases exist, such as Web of Science, IEEE Xplore, and Google Scholar, Scopus was selected because it offers the largest curated abstract and citation database of peer-reviewed social science and business publications, indexing over 27,000 active titles from more than 7,000 international publishers, with particularly strong coverage in the Finance, Management, Economics, and Information Systems disciplines [34, 35]. Prior bibliometric methodology research indicates that, for management and interdisciplinary technology studies, Scopus provides broader journal coverage than Web of Science and comparable citation structures, while offering superior metadata consistency for science mapping analyses. Its extensive coverage also makes it well suited to research in corporate finance, such as the adoption of blockchain in the banking sector. Defined inclusion criteria were applied for the selection of relevant keywords and the extraction of the dataset for bibliometric analysis and systematic literature review.

2.1.2. Keyword Identification

To identify appropriate keywords for retrieving the dataset, we carried out a comprehensive review of the previous literature on blockchain in the banking sector. The focus was on determining the terms used most frequently in the current literature [18, 36, 37]. For this purpose, Google Scholar was searched using the phrase “Blockchain Technology in the Banking Sector,” and related studies were consulted to determine common keywords. In addition, previous bibliometric and systematic literature reviews were examined to confirm that the selected keywords were both inclusive and specific.

Based on this literature review, we identified several frequently used search terms, such as “Blockchain in Bank,” “Blockchain Technology in Bank,” “Blockchain in Finance,” and “Blockchain Technology in Finance.” Additionally, Boolean search strings such as (blockchain AND banking) and (block-chain AND adoption AND banking) were identified. Furthermore, consultation with two academic experts in finance and blockchain confirmed that the keywords “Blockchain AND Banking” are frequently used to describe studies where both blockchain and banking are major foci, rather than merely contextually related.

Although this exploratory phase identified a number of related terms, we purposely restricted the final retrieval query to “Blockchain AND Banking” so that both the blockchain and banking domains are the primary focus of the retrieved studies. Moreover, the selection of these keywords is congruent with our research objectives, particularly in developing an intellectual structure and determining the main contributions towards understanding the impact of blockchain technology on banking.

2.1.3. Search Criteria and Data Extraction

The data collection process was conducted systematically, following standard bibliometric study practices [38] and PRISMA guidelines for transparent reporting [39]. On 12 May 2025, a search of the Scopus database using the search term “Blockchain AND Banking” yielded 1,641 documents published between 2015 and 12 May 2025. Although blockchain technology emerged in 2008, academic interest in adopting it in the banking sector was virtually nonexistent until 2015. Therefore, the selected time frame (2015–2025) captures the evolution of scientific production in the field.

Because of the novelty and rapid progress of the research field, formal inclusion criteria were applied to ensure analytical relevance and dataset quality. Only peer-reviewed articles, conference papers, and review articles were chosen, restricting the analysis to the most relevant subject areas: Business, Management, and Accounting; Economics, Econometrics, and Finance; and Social Sciences. Publications that focused primarily on technical or computational aspects without a substantial connection to banking, economic, or financial applications were excluded to maintain thematic consistency with the study’s objectives. Additionally, only English-language documents were included to ensure conceptual consistency and facilitate systematic review. The final search string was as follows:

TITLE-ABS-KEY ( Blockchain AND Banking ) AND PUBYEAR > 2015 AND PUBYEAR < 2026 AND ( LIMIT-TO ( SUBJAREA , "BUSI" ) OR LIMIT-TO ( SUBJAREA , "ECON" ) OR LIMIT-TO ( SUBJAREA , "SOCI" ) ) AND ( LIMIT-TO ( LANGUAGE , "English" ) ) AND ( LIMIT-TO ( DOCTYPE , "ar" ) OR LIMIT-TO ( DOCTYPE , "cp" ) OR LIMIT-TO ( DOCTYPE , "re" ) )

This search strategy prioritizes thematic specificity over broad recall, a typical approach in bibliometric mapping studies that aim for clearer concepts and greater analytical coherence. In addition, exploratory pilot searches using broader terminology (fintech, financial services, distributed ledgers, etc.) retrieved more than double the number of records, many of which referred to blockchain and/or banking only marginally. The focused search strategy was therefore retained to ensure precision and to maintain the analytical quality of the bibliometric and thematic analyses in this research.
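The limits encoded in the search string can be mirrored as a filter over exported records. The following is a minimal sketch under stated assumptions: the `Record` fields, the title-based duplicate check, and the sample rows are hypothetical and do not reflect the actual Scopus export schema or the authors' screening procedure. The year bounds follow the `PUBYEAR > 2015 AND PUBYEAR < 2026` clause of the query.

```python
from dataclasses import dataclass

# Values taken from the search string; the field names below are hypothetical.
SUBJECT_AREAS = {"BUSI", "ECON", "SOCI"}
DOC_TYPES = {"ar", "cp", "re"}  # article, conference paper, review


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    subject_areas: frozenset
    language: str
    doc_type: str


def meets_criteria(rec: Record) -> bool:
    """Return True when a record satisfies every limit in the query."""
    return (
        2015 < rec.year < 2026
        and bool(rec.subject_areas & SUBJECT_AREAS)
        and rec.language == "English"
        and rec.doc_type in DOC_TYPES
    )


def screen(records):
    """Drop duplicate titles (case-insensitive), then apply the criteria."""
    seen, kept = set(), []
    for rec in records:
        key = rec.title.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        if meets_criteria(rec):
            kept.append(rec)
    return kept


# Invented sample rows, for illustration only.
sample = [
    Record("Blockchain in banking", 2019, frozenset({"BUSI"}), "English", "ar"),
    Record("blockchain in banking", 2019, frozenset({"BUSI"}), "English", "ar"),
    Record("Consensus protocols", 2020, frozenset({"COMP"}), "English", "cp"),
    Record("Early ledger study", 2015, frozenset({"ECON"}), "English", "ar"),
]
kept = screen(sample)
```

In this toy sample only the first record survives: the second is a duplicate title, the third lies outside the three subject areas, and the fourth falls outside the year bounds.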

Following the removal of duplicates and non-relevant documents using filters, the final dataset of 389 documents was obtained. The records were saved in CSV (Comma-Separated Values) format for subsequent bibliometric analysis. To ensure replicability and transparency, the dataset has been publicly released in a dedicated repository, in line with standard practices in the banking and finance literature.

2.2. Data Refinement and Analysis

The second stage of the systematic protocol involved refining the retrieved data to generate a dataset for bibliometric mapping and thematic synthesis of the literature. It is important to note that this stage did not change the content or makeup of the dataset retrieved in earlier stages; rather, it enhanced the reliability and interpretability of keyword-based analyses, specifically co-occurrence networks and keyword-derived thematic clusters.

Data preparation involved the systematic elimination of false positives, the cleaning of metadata fields, and the normalization of author-provided keywords. Keyword optimization was accomplished by merging singular and plural forms, standardizing spelling differences, and consolidating synonymous terms into cohesive conceptual labels. For instance, terms like “cryptocurrency” and “cryptocurrencies,” “smart contracts” and “smart contract,” as well as “bank” and “banks,” were each standardized into a single keyword. Additionally, terminology standardization was employed to harmonize overlapping definitions typically found in the diverse blockchain literature. This process included consolidating equivalent phrases such as “distributed ledger,” “distributed ledger technology,” “fintech,” “decentralized finance,” and “DeFi,” along with “banking sector” and “banking industry.” Terms that were irrelevant or contextually unsuitable (such as “bibliometric analysis” and “COVID-19”) were eliminated to uphold thematic consistency. After optimization and normalization, two complementary analytical methods were executed:

- Descriptive bibliometric analysis, which evaluated publication trends, citation patterns, source productivity, and networks of key contributors across various fields;
- Systematic content analysis, which pinpointed key research themes, core theme groups, and predominant scholarly discussions arising from the literature.
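The keyword normalization described above can be sketched as a lookup from variant forms to canonical labels. This is an illustrative sketch, not the authors' procedure: the variant pairs come from the examples in the text, but the choice of which form is canonical, the `THESAURUS` and `STOP_TERMS` names, and the sample keyword lists are assumptions.

```python
from collections import Counter

# Hypothetical thesaurus: variant -> canonical label. The pairs come from the
# examples in the text; which form is treated as canonical is an assumption.
THESAURUS = {
    "cryptocurrencies": "cryptocurrency",
    "smart contracts": "smart contract",
    "banks": "bank",
    "distributed ledger technology": "distributed ledger",
    "defi": "decentralized finance",
    "banking industry": "banking sector",
}
# Terms removed as contextually unsuitable (the two examples given in the text).
STOP_TERMS = {"bibliometric analysis", "covid-19"}


def normalize_keywords(keywords):
    """Lower-case, map variants to canonical labels, drop stop terms and repeats."""
    cleaned = []
    for kw in keywords:
        term = " ".join(kw.lower().split())  # trim and collapse whitespace
        term = THESAURUS.get(term, term)
        if term in STOP_TERMS or term in cleaned:
            continue
        cleaned.append(term)
    return cleaned


# Invented author-keyword lists for two documents.
docs = [
    ["Banks", "Smart Contracts", "DeFi"],
    ["bank", "decentralized finance", "COVID-19"],
]
normalized = [normalize_keywords(d) for d in docs]
frequency = Counter(term for doc in normalized for term in doc)
```

An explicit lookup table is used here instead of automatic stemming so that every merge stays visible and reviewable, which matters when the merged labels later drive co-occurrence clusters.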

As a result, this refinement process ensured the statistical reliability of bibliometric network structures and the conceptual clarity of thematic interpretations without affecting article inclusion or research coverage.

2.2.1. Bibliometric Analysis

Following the collection and preparation of the research dataset, a scientometric analysis was conducted to examine the structure and dynamics of the research field. In this study, VOSviewer [40], a specialized software program for constructing and visualizing large-scale bibliometric networks [41], was employed. VOSviewer was selected for its proven capabilities in handling large networks and its built-in text-mining functionality, which enables the extraction and analysis of valuable terms and concepts from the literature [42]. Additionally, Microsoft Excel was used for statistical analysis and data visualization, including examining publication trends by year, conducting citation analysis, and determining keyword frequency.

Bibliometric analysis provides a comprehensive approach for tracing the development of a research theme using established methods that are widely recognized as objective, accurate, and reproducible [43]. Two main bibliometric approaches were applied in this study: performance analysis and science mapping.

Performance analysis is a fundamental component of bibliometric analysis, focused on the quantitative assessment of scientific productivity and impact. It provides a data-driven perspective on scholarly output and the growth of a scientific discipline over time. This technique encompasses the analysis of annual publication trends, identification of highly cited publications, evaluation of leading scientific journals, and assessment of the research contributions of institutions, countries, and individual authors [44]. By examining these indicators, performance analysis provides a comprehensive understanding of the intellectual evolution of the field and uncovers its most influential contributors.

Science mapping, on the other hand, provides a graphic and structural representation of the intellectual architecture of the field [45]. It involves advanced bibliometric techniques such as co-authorship analysis, citation and co-citation analysis, keyword co-occurrence, and bibliographic coupling. These analyses help to uncover primary research areas, common keywords, and the thematic clusters that form the landscape of the field [46]. In particular, bibliographic coupling was used to study thematic clusters and current research fronts in order to identify emerging topics, research gaps, and directions for future research. Previous bibliometric research has used similar approaches to examine blockchain research within the banking sector [18, 36, 37], confirming the relevance and appropriateness of the methodology used here.
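To make keyword co-occurrence concrete, the sketch below counts, for each pair of keywords, the number of documents in which both appear, and sums a keyword's link weights into a total link strength. It uses raw counts only and does not reproduce any normalization or clustering that a tool such as VOSviewer applies; the function names and sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count, for each unordered keyword pair, the documents containing both."""
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # sorting gives each pair one stable key
        pairs.update(combinations(unique, 2))
    return pairs


def total_link_strength(pairs, term):
    """Sum the weights of all links that involve the given keyword."""
    return sum(weight for pair, weight in pairs.items() if term in pair)


# Invented keyword lists for three documents.
docs = [
    ["blockchain", "bank", "smart contract"],
    ["blockchain", "bank"],
    ["blockchain", "cryptocurrency"],
]
links = cooccurrence(docs)
strength = total_link_strength(links, "blockchain")
```

Here "bank" and "blockchain" share two documents, so their link weight is 2, while "blockchain" accumulates a total link strength of 4 across its three links.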

2.2.2. Systematic Content Review

To comprehensively explore the emerging themes of blockchain technology in banking, this study adopted a two-phase methodological design that combined bibliometric analysis with systematic content review. The mixed-method approach was used to address research questions RQ2 and RQ3. By integrating quantitative and qualitative techniques, the study aimed to synthesize dominant research themes, assess how blockchain affects banking operations, and identify dominant scholarly trends. Systematic content analysis both complemented the bibliometric findings and enhanced the interpretative depth of the results.

In the first phase, bibliometric techniques were applied using VOSviewer to visualize the intellectual structure of the field. Following the procedure outlined in [38], two science mapping techniques were used. First, a keyword co-occurrence analysis was performed to identify words that frequently appear together in the titles, abstracts, and keywords of articles [47]. This identified the main research areas, key themes, and emerging research topics [45]. Second, bibliographic coupling was employed to cluster articles that share cited references, thereby revealing thematically related research streams [48]. A minimum threshold of 30 citations per document was applied to exclude publications with limited scholarly impact, which resulted in the selection of 69 highly cited papers. Additionally, 10 highly cited papers were manually added to ensure conceptual comprehensiveness. After duplicate removal and further manual filtering, a final dataset of 70 peer-reviewed papers was established as the foundation for the subsequent qualitative review.
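Bibliographic coupling with a citation threshold can be illustrated as follows. This is a sketch under assumptions: the document identifiers, citation counts, and reference sets are invented, and coupling strength is taken as the plain number of shared references, without the normalization or clustering steps a tool such as VOSviewer performs. Only the threshold of 30 citations comes from the text.

```python
from itertools import combinations

MIN_CITATIONS = 30  # threshold stated in the text

# Hypothetical documents: id -> (citation count, set of cited reference ids).
documents = {
    "A": (120, {"r1", "r2", "r3"}),
    "B": (45, {"r2", "r3", "r4"}),
    "C": (31, {"r5"}),
    "D": (12, {"r1", "r2"}),  # below the threshold, so excluded
}


def coupling_strengths(docs, min_citations=MIN_CITATIONS):
    """Return {(doc_i, doc_j): shared reference count} for eligible documents."""
    eligible = {
        doc_id: refs
        for doc_id, (cites, refs) in docs.items()
        if cites >= min_citations
    }
    strengths = {}
    for a, b in combinations(sorted(eligible), 2):
        shared = len(eligible[a] & eligible[b])
        if shared:  # keep only pairs that are actually coupled
            strengths[(a, b)] = shared
    return strengths


strengths = coupling_strengths(documents)
```

In this toy example, A and B are coupled with strength 2 through their shared references r2 and r3, C is eligible but coupled to nothing, and D never enters the network.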

In the second phase, a qualitative content analysis was performed using Braun and Clarke’s framework [49], as follows. First, each article in the final dataset was examined and coded to extract relevant information regarding research objectives, methodological approaches, core themes, principal findings, and identified research gaps. Second, the coded content was grouped into preliminary thematic categories based on conceptual similarity. Thereafter, these thematic categories were manually refined to ensure conceptual relevance and logical coherence within the categorization. For instance, thematic clusters with similar central themes (e.g., blockchain and cryptocurrency, blockchain and DeFi) were consolidated into a common thematic cluster. Finally, the outcomes of the qualitative analysis were cross-validated against those generated through keyword co-occurrence analysis to strengthen the reliability of the results.

Methodological Novelty and Contribution

The novelty of the methodological approach of this study resides in its explicit triangulation and cross-validation framework, which combines bibliometric science mapping with qualitative thematic analysis. Previous studies in this field used either descriptive bibliometric mapping or qualitative synthesis, typically based on small samples, and treated these methods separately. In contrast, this research is designed in three stages: (i) to identify the macro-level thematic structure through quantitative bibliometric mapping; (ii) to use systematic qualitative content analysis to capture in-depth conceptual patterns and research gaps; and (iii) to cross-validate the results of the quantitative and qualitative analyses to determine both their statistical relationship and their conceptual alignment.

The triangulation approach provides greater methodological strength because it yields a more extensive and reliable representation of the research landscape, helping to establish a framework for developing theories and for planning future investigations into the impact of blockchain on banking.


Methodological Challenges and Mitigation

The rapid growth of the literature on blockchain applications in banking presents several methodological challenges. These arise from four interrelated dimensions: disciplinary fragmentation, terminological inconsistency, publication volume, and methodological heterogeneity. Blockchain research in banking spans broad disciplinary areas, including finance, computer science, information systems, law, and regulatory research, making it difficult to align themes and integrate theories. Additionally, many overlapping terms exist, including fintech, digital banking, cryptocurrencies, decentralized finance (DeFi), and central bank digital currency (CBDC). These multiple terms significantly increase the potential for conceptual confusion and misclassification.

In addition, the rapid growth in the number of publications makes literature screening and synthesis difficult, since hundreds of articles must be reviewed while keeping the review analytically sound. Moreover, the literature reviewed exhibited considerable methodological differences, ranging from technical system architectures and research analyses to policy-oriented and conceptual frameworks, complicating the synthesis of cross-study data.

To alleviate these issues, the present research develops a triangulated methodological approach that combines software-assisted bibliometric mapping of the literature with systematic qualitative analysis and manual validation [50–52]. Triangulating the methodology increases coverage of the literature reviewed while providing greater assurance of analytical integrity, credibility, and conceptual consistency.

In summary, this study’s integrated methodology enhances the validity and reliability of its findings by combining quantitative mapping, qualitative thematic interpretation, and cross-validation. This integrative approach strengthens the robustness of the findings and enables a holistic understanding of blockchain’s role in banking. It also provides a solid foundation for identifying future research directions in this evolving field.

3. Results and Discussion

3.1. General information and performance analysis

The bibliometric analysis covered 389 documents, published in 269 sources between 2015 and May 2025 and authored or co-authored by 1,077 scholars. The principal purpose of compiling this bibliographic dataset is to provide an overall picture of the scientific literature that addresses the application of blockchain in banking during this period. Such mapping is essential for understanding the development of the topic, as it helps to identify publication patterns, collaborative networks, and the most active research domains. Moreover, it underscores the academic relevance of the dataset and provides a foundation for further analysis.

Table 1 presents the descriptive statistics summarizing the dataset. In addition, the results illustrate key aspects of research productivity and collaboration, such as annual publication trends (Table 2; Figure 2), the most productive scientific journals publishing in the field (Table 3), top contributing authors (Table 4), most active institutions (Table 5), leading countries in publication output (Table 6), and highly cited documents (Table 7). These analyses collectively provide a detailed account of the scientific landscape and support the evaluation of scholarly performance in the field.

Table 1. Main Information of the Dataset

Description                                 Results
Retrieval Date                              12 May 2025
Time-Span                                   2015–May 2025
Total Publications                          389.00
Subject Area:
  Business, Management, and Accounting
  Economics, Econometrics, and Finance
  Social Sciences
Document Type:
  Article                                   274.00
  Conference Paper                          84.00
  Review                                    31.00
Number of Cited Publications                313.00
Number of Non-Cited Publications            76.00
Total Citations                             9354.00
Average Citations per Publication           24.05
Average Citations per Cited Publication     29.89
Average Years from Publication              3.10
Average Citations per Year per Document     4.63
Sources (Journals, Books, etc.)             269.00
Affiliations                                786.00
Countries                                   88.00
References                                  18437.00
Keywords Plus (ID)                          1818.00
Author’s Keywords (DE)                      1131.00
Authors                                     1077.00
Publications per Author                     0.36
Authors per Publication                     2.77
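Several of the derived indicators in Table 1 follow directly from the raw counts. The short sketch below recomputes them as a consistency check; the variable names are illustrative, and the assumption that each average is a simple ratio of the listed counts is inferred from the numbers rather than stated in the text.

```python
# Counts copied from Table 1.
total_publications = 389
cited_publications = 313
total_citations = 9354
authors = 1077

# Derived indicators, assuming each is a simple ratio of the counts above.
non_cited_publications = total_publications - cited_publications
avg_citations_per_publication = total_citations / total_publications
avg_citations_per_cited_publication = total_citations / cited_publications
publications_per_author = total_publications / authors
authors_per_publication = authors / total_publications

for label, value in [
    ("Average citations per publication", avg_citations_per_publication),
    ("Average citations per cited publication", avg_citations_per_cited_publication),
    ("Publications per author", publications_per_author),
    ("Authors per publication", authors_per_publication),
]:
    print(f"{label}: {value:.3f}")
```

Under this reading the recomputed ratios agree with the values reported in Table 1 up to rounding in the last digit.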

3.1.1. Publication Trends Over Time

Table 2 and Figure 2 illustrate the publication trends by year from 2016 to May 2025. The analysis comprises metrics such as total publications (TP), cumulative total publications (CTP), total cited publications (TCP), total citations (TC), and average citations (TC/CTP and TC/TCP). The data reveal three distinct phases in the evolution of the research field: (1) early emergence and foundational impact (2016–2018), (2) expansion and thematic diversification (2019–2021), and (3) peak production with initial signs of saturation (2022–2025).

The initial phase (2016–2018) reflects the inception of academic activity, with the four articles published in 2016 being cited 1,249 times (312.25 per article), indicating foundational significance. The number of publications increased from eight in 2017 to 20 in 2018, reflecting growing interest in the potential of blockchain technology in the banking sector.

In the second phase, between 2019 and 2021, production increased sharply, from 30 papers in 2019 to 47 in 2020, before decreasing slightly to 41 in 2021. Despite this notable growth, average citations declined (TC/CTP fell from 8.85 in 2019 to 5.99 in 2021), most likely due to broader participation and fading novelty. This stage points towards the diversification of research themes; the dip in output in 2021 was possibly affected by global disruptions such as the COVID-19 pandemic.

The third phase (2022–2025) represents the most productive pe-

riod in terms of publication volume, with annual outputs increasing

from 54 in 2022 to a peak of 78 in 2024. Publications during this

phase constitute over half of the total output, highlighting the area’s

rapid expansion and highest level of publication activity. Although

the TC/CTP ratio fell from 6.53 in 2022 to 1.17 in 2024, this decline

Figure 2. Total Publications (TP) over time (2016–2025). Note: The data for 2025 (33 publications) is incomplete, reflecting the data cutoff date of May 12, 2025, and therefore does not accurately represent a decline in annual output.

Table 2. Publication Trends Over Time

Year  TP     PTP     CTP     TCP    TC       TC/CTP  TC/TCP
2016  4.00   1.00%   4.00    4.00   1249.00  312.25  312.25
2017  8.00   2.00%   12.00   8.00   850.00   70.83   106.25
2018  20.00  5.00%   32.00   19.00  1033.00  32.28   54.37
2019  30.00  8.00%   62.00   28.00  549.00   8.85    19.61
2020  47.00  12.00%  109.00  44.00  2321.00  21.29   52.75
2021  41.00  11.00%  150.00  38.00  899.00   5.99    23.66
2022  54.00  14.00%  204.00  52.00  1333.00  6.53    25.63
2023  74.00  19.00%  278.00  56.00  656.00   2.36    11.71
2024  78.00  20.00%  356.00  53.00  418.00   1.17    7.89
2025  33.00  8.00%   389.00  11.00  46.00    0.12    4.18

TP = Total Publications, PTP = Percentage of Total Publications, CTP = Cumulative Total Publications, TCP = Total Cited Publications, TC = Total Citations.

is largely attributable to the recency effect, as newer articles have not yet accumulated significant citations.
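The two ratio columns of Table 2 can be recomputed from the yearly counts. The sketch below assumes, based on the reported figures rather than an explicit definition in the text, that TC/CTP divides a year's citations by the cumulative publication count up to that year and that TC/TCP divides them by that year's cited publications.

```python
# (year, TP, TCP, TC) copied from Table 2.
rows = [
    (2016, 4, 4, 1249),
    (2017, 8, 8, 850),
    (2018, 20, 19, 1033),
    (2019, 30, 28, 549),
    (2020, 47, 44, 2321),
    (2021, 41, 38, 899),
    (2022, 54, 52, 1333),
    (2023, 74, 56, 656),
    (2024, 78, 53, 418),
    (2025, 33, 11, 46),
]

total = sum(tp for _, tp, _, _ in rows)

table = {}
cumulative = 0
for year, tp, tcp, tc in rows:
    cumulative += tp  # CTP: publications accumulated up to and including this year
    table[year] = {
        "CTP": cumulative,
        "TC/CTP": tc / cumulative,
        "TC/TCP": tc / tcp,
    }
```

Because the TC/CTP denominator grows every year, this ratio falls over time even when yearly citations are stable, which is worth keeping in mind alongside the recency effect when reading the downward trend.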

The data for 2025 is partial: as of the data cutoff on May 12, 2025, only 33 publications had been indexed. The apparent decline in output for 2025 is therefore a methodological artifact rather than a substantive downturn, and this figure is expected to increase significantly by the end of the year. Annual publication counts, rather than citation-based indicators, reflect a consistent rise in research activity. Moreover, the increasing diversification of research themes, particularly applied studies integrating blockchain with AI, IoT, and FinTech, is likely to influence future citation patterns.

Overall, while publication volumes have risen sharply, falling citation metrics indicate the need for more innovative and theory-driven studies. Future studies need to undertake interdisciplinary, problem-based approaches to advance the practical uptake of blockchain in banking contexts.

3.1.2. Leading Scientific Journals Publishing Blockchain and Banking Research

The most impactful journals publishing research on blockchain technology within the banking sector are detailed in Table 3, which presents both productivity measures (total number of publications) and impact indicators (total citations, average citations per article, average publication year, and average normalized citations). These combined measures enable the evaluation of not only the quantity of output but also the intellectual impact of each journal within the rapidly changing research environment.

An important observation in Table 3 is that, while Technological Forecasting and Social Change and Sustainability (Switzerland) lead in terms of volume, their scholarly impact differs significantly. Technological Forecasting and Social Change exhibits a notably stronger citation profile (648 total citations; 92.57 citations per article), which underscores the journal’s strong focus on technology adoption, innovation dissemination, and socio-economic change, subjects that closely relate to blockchain research in the financial sector. Its wide interdisciplinary readership and emphasis on theory-driven forecasting likely enhance its visibility and citation across various fields. In comparison, although Sustainability frequently covers blockchain topics, its more practical and policy-oriented focus, often aimed at specific sustainability audiences, leads to lower average citation rates (18.86 per article), indicating a more localized rather than broad academic influence.

In contrast, Financial Innovation, despite having published only five articles, has the highest total citations (922) and the greatest average impact per article (184.40). This illustrates that the thematic relevance of a journal, rather than just its volume of publications, drives academic influence. The journal’s concentrated focus on financial technologies, digital currencies, and banking change positions it as a primary outlet for significant theoretical and empirical contributions, making its articles particularly prominent and often cited across finance, economics, and policy research communities.


Table 3. Leading Scientific Journals Publishing Blockchain in Banking Research

Rank  Source                                                        Documents  Citations  Avg. Citations  Avg. Year  Avg. Norm. Citations
1     Technological Forecasting and Social Change                   7.00       648.00     92.57           2022.29    4.04
2     Sustainability (Switzerland)                                  7.00       132.00     18.86           2021.57    0.98
3     International Journal of Scientific and Technology Research  6.00       57.00      9.50            2019.67    0.21
4     Financial Innovation                                          5.00       922.00     184.40          2020.40    1.99
5     Technology Analysis and Strategic Management                  4.00       101.00     25.25           2022.75    3.55
6     Journal of Risk and Financial Management                      4.00       72.00      18.00           2022.75    1.67
7     IEEE Transactions on Engineering Management                   4.00       204.00     51.00           2022.00    6.44
8     Frontiers in Blockchain                                       4.00       93.00      23.25           2021.00    2.01
9     New Economic Windows                                          3.00       543.00     181.00          2016.00    0.58
10    Journal of Money Laundering Control                           3.00       116.00     38.67           2020.00    2.00
11    Journal of Financial Stability                                3.00       83.00      27.67           2020.33    0.99
12    Fintech                                                       3.00       85.00      28.33           2023.00    1.15

A similar trend is evident in New Economic Windows, which attained 543 citations with just three publications. Its early exploration of blockchain topics (with an average publication year of 2016) enabled its articles to gather citations over an extended period, demonstrating the benefits of early involvement in emerging research areas. These foundational studies often serve as essential reference points for subsequent scholarship.

On the other hand, journals like the International Journal of Scientific and Technology Research, while comparatively productive (six publications), exhibit limited citation impact (averaging 9.5 citations per article). This likely stems from the journal’s broader technical audience and its less focused engagement with financial or banking communities, leading to reduced citation uptake within social science and finance-oriented research networks.

Normalized citation metrics further refine impact evaluation by accounting for publication age. Journals such as IEEE Transactions on Engineering Management (6.44) and Technology Analysis and Strategic Management (3.55) show strong relative citation performance given their more recent publications. Their high normalized impact emphasizes the increasing importance of management- and governance-related perspectives in blockchain research, particularly at the crossroads of engineering innovation, organizational strategy, and transformation in the financial sector.
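The text does not spell out how normalized citations are computed. One common definition, used for example by VOSviewer, divides a document's citations by the average citations of all documents in the dataset published in the same year, and then averages these scores per source. The sketch below illustrates that definition under this assumption; the journals and citation counts are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical documents: (source, publication year, citations).
documents = [
    ("Journal X", 2016, 300),
    ("Journal Y", 2016, 100),
    ("Journal X", 2023, 12),
    ("Journal Y", 2023, 4),
]

# Average citations of all documents published in each year.
by_year = defaultdict(list)
for _, year, cites in documents:
    by_year[year].append(cites)
year_mean = {year: mean(values) for year, values in by_year.items()}


def normalized(cites, year):
    """Citations relative to the average for documents of the same year."""
    return cites / year_mean[year]


# Average normalized citations per source.
per_source = defaultdict(list)
for source, year, cites in documents:
    per_source[source].append(normalized(cites, year))
avg_norm = {source: mean(scores) for source, scores in per_source.items()}
```

In this toy data the 2023 paper in Journal X has only 12 citations, yet it scores 1.5, the same as the 300-citation paper from 2016, because each is compared only with documents of the same age.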

Overall, these trends suggest that scholarly influence in blockchain-banking research is shaped more by a journal’s thematic alignment, multidisciplinary engagement, early positioning in specific topics, and theoretical focus than by publication frequency alone. Journals that contextualize blockchain within wider discussions on financial governance, innovation management, regulatory adjustment, and socio-economic change achieve greater citation visibility than journals that are technically oriented or narrowly focused on sustainability. This uneven distribution of influence indicates that the intellectual core of the field is anchored in publications that connect financial theory, policy analysis, and studies of innovation.

3.1.3. The 10 Most Influential Authors

Table 4 shows the most prolific authors who have made the largest academic contributions to blockchain research in the banking sector. This evaluation considers their productivity, citation impact, normalized influence, and the strength of their collaborative networks. These metrics not only identify the most visible researchers but also show how intellectual leadership and collaboration patterns shape the field.

It can be seen that Devi, N. Chitra and Kumari, Anitha are the most prolific, with three papers each, the same citation count (105), and 35 average citations per paper. While both have the same normalized citation score (2.14), only Devi has a sizable total link strength (19), suggesting more robust collaboration networks. This suggests that Devi’s influence goes beyond citation metrics to include a bridging function between research teams, encouraging cross-pollination of ideas related to adoption, operational efficiency, and governance in blockchain.

In contrast, although Mbaidin, Hisham O. has the same number of publications as Devi and Kumari, he has fewer citations (43) and a lower average citation count per document (14.33). However, the author is strongly linked (link strength: 23), suggesting broad collaborative activity in the field. This pattern highlights authors whose main contributions lie in interdisciplinary collaboration and empirical research across multiple countries, which fosters methodological diversity but may not yet result in highly cited conceptual breakthroughs.

A different type of intellectual leadership is seen in authors such as Ramzi El-Haddadeh, Nitham Hindi, Vishanth Weerakkody, and especially Uthayasankar Sivarajah, who achieve notable citation efficiency despite fewer publications. El-Haddadeh, Hindi, and Weerakkody each have two papers with 110 total citations, an average of 55 citations per article, and a normalized citation score of 2.54, indicating influence and visibility. Sivarajah has the highest citation average (109.5) and a normalized citation score of 5.03, showing exceptional scholarly impact with few papers. He focuses particularly on governance, data management, and digital transformation strategies within financial institutions. These authors help consolidate theory by presenting models that link blockchain adoption with organizational readiness and regulatory issues.

Emerging researchers such as Gan, Qingqiu and Lau, Raymond Yiu Keung show strong normalized citation rates (4.78) despite their recent publication activity, with an average publication year of 2024.5. Their rapid accumulation of citations highlights a growing second wave of leadership focused on algorithmic finance, data analytics, and the convergence of emerging fintech. This trend indicates a shift in the field from foundational theoretical work to application-oriented and interdisciplinary growth.

In summary, the author network structure illustrates a layered knowledge ecosystem that balances established theorists, network connectors, and rapidly advancing innovators. Leadership in this field is defined not just by the number of publications but also by the ability to present impactful conceptual frameworks, provide scalable empirical evidence, and foster new research initiatives through collaborative networks. This evolving profile of authorship shows the maturation of blockchain and banking research into a more unified yet methodologically diverse academic domain.

Table 4. The Most Influential Authors

Rank | Author | TP | TC | APY | ACPP | ANC | TLS
1 | Devi, N. Chitra | 3.00 | 105.00 | 2022.33 | 35.00 | 2.14 | 19.00
2 | Kumari, Anitha | 3.00 | 105.00 | 2022.33 | 35.00 | 2.14 | 0.00
3 | Mbaidin, Hisham O. | 3.00 | 43.00 | 2023.67 | 14.33 | 1.91 | 23.00
4 | Choo, Kim-Kwang Raymond | 2.00 | 63.00 | 2022.00 | 31.50 | 2.14 | 3.00
5 | El-Haddadeh, Ramzi | 2.00 | 110.00 | 2022.00 | 55.00 | 2.54 | 27.00
6 | Gan, Qingqiu | 2.00 | 37.00 | 2024.50 | 18.50 | 4.78 | 8.00
7 | Hindi, Nitham | 2.00 | 110.00 | 2022.00 | 55.00 | 2.54 | 10.00
8 | Lau, Raymond Yiu Keung | 2.00 | 37.00 | 2024.50 | 18.50 | 4.78 | 3.00
9 | Sivarajah, Uthayasankar | 2.00 | 219.00 | 2022.00 | 109.50 | 5.03 | 13.00
10 | Weerakkody, Vishanth | 2.00 | 110.00 | 2022.00 | 55.00 | 2.54 | 10.00

TP = Total Publications; TC = Total Citations; APY = Average Publication Year; ACPP = Average Citations Per Publication; ANC = Average Normalized Citations; TLS = Total Link Strength.

3.1.4. The Top 10 Most Productive Institutions

Table 5 presents the leading institutions that have contributed most to blockchain research in banking in terms of productivity, impact, and other important indicators such as citations, average publication year, average citations per document, and average normalized citations.

Foremost among them is the Department of Management Studies at the Indian Institute of Technology Delhi, with 3 papers that garnered 110 citations, achieving an average of 36.67 citations per paper and an average normalized citation score of 1.32. This reflects a high academic impact and research quality in the field. Conversely, the Adnan Kassar School of Business at the Lebanese American University, despite being equally prolific with 3 papers, has a lower average citation count (3.67) and normalized citation score (0.49), revealing a less widespread scholarly impact.
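As a quick check on how the per-publication averages in Table 5 relate to the raw counts, the following minimal sketch recomputes ACPP as total citations divided by total publications. The figures are copied from Table 5; the shortened institution labels and the function name are illustrative choices, not part of the original analysis.

```python
# Values copied from Table 5: (TP = total publications, TC = total citations).
# The shortened institution labels are illustrative only.
institutions = {
    "IIT Delhi, Dept. of Management Studies": (3, 110),
    "Lebanese American University, Adnan Kassar School of Business": (3, 11),
    "Spiru Haret University": (2, 56),
}


def average_citations_per_publication(total_citations, total_publications):
    """Return ACPP = TC / TP, rounded to two decimals as in the table."""
    return round(total_citations / total_publications, 2)


acpp = {
    name: average_citations_per_publication(tc, tp)
    for name, (tp, tc) in institutions.items()
}

for name, value in acpp.items():
    print(f"{name}: ACPP = {value}")
```

The recomputed values match the ACPP column, which confirms that the contrast drawn between the two three-paper institutions rests entirely on citation counts rather than output volume.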

In addition, several institutions, namely Al Qasimia University, Mutah University, Abu Dhabi University, and the independent Financial and Taxation Consultant, Jordan, each have 2 papers with a low citation frequency (an average of 6 citations per paper) but a relatively high normalized citation score (1.12), showing greater engagement and increasing visibility over the last few years (average publication year: 2024).

Most prominently, Spiru Haret University of Romania, with only 2 publications, received 56 citations and the highest normalized citation score (3.23), indicating the influence of its work in the discipline. Similarly, Symbiosis Institute of Digital and Telecom Management achieved a moderate impact with 21 citations from 2 publications.

In general, the results show geographically widespread and institutionally varied research efforts. Productivity is spread across institutions, but citation impact is concentrated in a few, indicating the distinction between quantity and quality of scholarly production.

3.1.5. The Most Productive and Influential Countries

Table 6 illustrates the significant geographical variation in research contributions, citation impact, and other major indicators, such as average publication year, average citations, average normalized citations, and total link strength of blockchain research in the banking sector.

As shown in Table 6, India is the most prolific country with 94 documents, but it ranks lower in citation impact (17.33 average citations per paper) and normalized citation score (1.28). This indicates that while it leads in quantity, the overall impact remains moderate.

In contrast, the United States, with 51 papers, has the highest total citations (2,847) and a high average citation score (55.82), indicating a high academic impact. Likewise, the United Kingdom, with a lower productivity of 25 publications, achieves the top average citations (64.44) and normalized citation score (2.42), reflecting high-quality and highly recognized research output.

China also demonstrates a balanced profile with 24 papers and an average citation count of 47.79, showing a good compromise between productivity and impact. The United Arab Emirates shows emerging activity with 21 papers and a good normalized score (1.41), yet still a moderate average citation count per document (12.19).

Other countries, such as Germany, Italy, and Malaysia, are moderately impactful and productive. Jordan and Switzerland, in contrast, while producing smaller volumes of output (12 and 10 papers, respectively), reach competitive normalized citation averages (1.15 and 0.82, respectively), indicating relatively high-impact research. Spain and the Russian Federation have lower normalized and average citation indicators, reflecting limited impact despite modest research production.

Overall, India produces the most research in quantity, but other countries like the UK, the US, and China have a greater scientific impact. These patterns show that there is a global contribution, but the quality and visibility of research in the field of blockchain in banking are uneven.

3.1.6. The Top 10 Most Cited Documents

As stated above, the dataset was retrieved from the Scopus database, and the topic has been investigated in various contexts by authors from Business, Management and Accounting; Economics, Econometrics and Finance; and Social Sciences. The analysis of the top 10 most cited documents in blockchain and banking research identifies the seminal works that have influenced academic investigation and practical applications in this multidisciplinary research area. These documents span various areas of study, ranging from financial innovation to accounting, regulatory studies, and information systems. Citation counts indicate academic and intellectual interest, while more refined metrics, such as average citations per year and normalized citation score, provide a better indication of the significant documents and their comparative influence over time and across research fields [53].

In view of this, Table 7 presents the ten most highly cited documents in our research field, according to the Scopus database. Notably, all ten papers received more than 200 citations, even though most of them were published in 2020 or later.
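The average citations per year (ACPY) reported in Table 7 can be reproduced with a simple ratio. The sketch below assumes that ACPY is total citations divided by the number of years between publication and a reference year of 2025. That reference year is not stated in the text; it is an assumption inferred from the table, chosen because it reproduces the reported values.

```python
# Assumption: the reference year is not given in the article; 2025 is inferred
# because it reproduces the ACPY column of Table 7.
REFERENCE_YEAR = 2025

# (authors, publication year, total citations), copied from Table 7.
documents = [
    ("Guo & Liang", 2016, 706),
    ("Thakor", 2020, 601),
    ("Dai & Vasarhelyi", 2017, 532),
    ("Garg et al.", 2021, 218),
    ("Javaid et al.", 2022, 207),
]


def citations_per_year(total_citations, publication_year,
                       reference_year=REFERENCE_YEAR):
    """Return total citations divided by the document's age in years."""
    years = reference_year - publication_year
    if years <= 0:
        raise ValueError("publication year must precede the reference year")
    return round(total_citations / years, 2)


acpy = {
    authors: citations_per_year(cites, year)
    for authors, year, cites in documents
}

for authors, value in acpy.items():
    print(f"{authors}: {value} citations per year")
```

Under this assumption an older, heavily cited paper and a recent paper with fewer citations can have similar yearly rates, which is why the yearly rate is read alongside total citations rather than instead of them.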

Leading the list is Guo and Liang’s [26] pioneering document entitled “Blockchain application and outlook in the banking industry,” published in the journal Financial Innovation, which, with a total of 706 citations, is the most cited document in finance. Its average annual citation rate of 78.44 indicates a consistently high impact since its publication, although its normalized citation score of 2.26 suggests that, despite its high number of citations, its performance compared to other publications in its field is more moderate. Nonetheless, the work is still influential owing to its pioneering and general introduction of the revolutionary nature of blockchain for banking, specifically as it pertains to operational efficiency, transparency, and transactional security.

S. A. Aladeeb et al.

Table 5. The Most Influential Institutions

Rank | Institution | TP | TC | APY | ACPP | ANC
1 | Adnan Kassar School of Business, Lebanese American University, Beirut, Lebanon | 3.00 | 11.00 | 2023.67 | 3.67 | 0.49
2 | Dept. of Management Studies, Indian Institute of Technology Delhi, New Delhi, India | 3.00 | 110.00 | 2023.00 | 36.67 | 1.32
3 | Al Qasimia University, United Arab Emirates | 2.00 | 12.00 | 2024.00 | 6.00 | 1.12
4 | Business Intelligence and Data Analytics Dept., Business School, Mutah University, Jordan | 2.00 | 12.00 | 2024.00 | 6.00 | 1.12
5 | Dept. of Economics, College of Economics and Management, Al Qasimia University, Sharjah, UAE | 2.00 | 12.00 | 2024.00 | 6.00 | 1.12
6 | Faculty of Economics, Kharazmi University, Tehran, Iran | 2.00 | 13.00 | 2022.50 | 6.50 | 0.37
7 | Faculty of IT, Abu Dhabi University, UAE | 2.00 | 12.00 | 2024.00 | 6.00 | 1.12
8 | Financial and Taxation Consultant, Jordan | 2.00 | 12.00 | 2024.00 | 6.00 | 1.12
9 | Spiru Haret University, Romania | 2.00 | 56.00 | 2023.50 | 28.00 | 3.23
10 | Symbiosis Institute of Digital and Telecom Mgmt., Symbiosis Intl. (Deemed Univ.), Pune, India | 2.00 | 21.00 | 2022.00 | 10.50 | 0.43

TP = Total Publications; TC = Total Citations; APY = Average Publication Year; ACPP = Average Citations Per Publication; ANC = Average Normalized Citations.

Table 6. The Most Productive Countries

Rank | Country | TP | TC | APY | ACPP | ANC | TLS
1 | India | 94.00 | 1629.00 | 2022.60 | 17.33 | 1.28 | 48.00
2 | United States | 51.00 | 2847.00 | 2021.35 | 55.82 | 1.62 | 31.00
3 | United Kingdom | 25.00 | 1611.00 | 2022.08 | 64.44 | 2.42 | 43.00
4 | China | 24.00 | 1147.00 | 2022.50 | 47.79 | 1.30 | 20.00
5 | United Arab Emirates | 21.00 | 256.00 | 2022.71 | 12.19 | 1.41 | 12.00
6 | Italy | 20.00 | 327.00 | 2021.70 | 16.35 | 0.90 | 16.00
7 | Russian Federation | 19.00 | 169.00 | 2019.58 | 8.89 | 0.29 | 0.00
8 | Germany | 18.00 | 740.00 | 2021.44 | 41.11 | 1.31 | 8.00
9 | Malaysia | 14.00 | 186.00 | 2022.36 | 13.29 | 0.97 | 19.00
10 | Jordan | 12.00 | 103.00 | 2023.75 | 8.58 | 1.15 | 18.00
11 | Spain | 11.00 | 145.00 | 2021.55 | 13.18 | 0.81 | 0.00
12 | Indonesia | 10.00 | 137.00 | 2022.30 | 13.70 | 0.33 | 2.00
13 | Switzerland | 10.00 | 280.00 | 2021.80 | 28.00 | 0.82 | 5.00

TP = Total Publications; TC = Total Citations; APY = Average Publication Year; ACPP = Average Citations Per Publication; ANC = Average Normalized Citations; TLS = Total Link Strength.

By contrast, Thakor’s [54] article entitled “Fintech and banking: What do we know?” ranks second in terms of total citations (601) but outperforms all other documents in terms of average annual citations (120.20) and normalized citations (12.17). This suggests that the study has quickly become a leading reference in its field, only a few years after its publication. Published in the Journal of Financial Intermediation, it presents a solid theoretical model of how fintech, including blockchain technology, is transforming long-established paradigms in banking. Its very high normalized citation score also indicates high influence and cross-disciplinary adoption, especially in finance, economics, and banking regulation studies.

The third most cited paper, authored by Dai and Vasarhelyi [55] and entitled “Toward blockchain-based accounting and assurance,” was published in the Journal of Information Systems and has been cited 532 times. It has a high average of 66.5 citations per year and a normalized score of 5.01, attesting to its value as a publication connecting accounting theory and blockchain technology. It offers research that informs discussion about the use of blockchain to enable auditability and trust in financial reports, and is thus a reference work for research on blockchain-based financial assurance.

An equally significant contribution is made by Peters and Panayi [56], whose work “Understanding modern banking ledgers through blockchain technologies” has been cited 452 times. Its citation rate of 50.22 per year indicates ongoing interest by researchers, while its normalized citation score of 1.45 indicates moderate impact in its broader research field. The significance of this work lies in its specific treatment of distributed ledger technology and smart contracts, offering insight into blockchain’s technological foundation from a banking industry perspective.

Additionally, the International Journal of Information Management published research by Schuetz and Venkatesh [57] on using blockchain to drive financial inclusion in India. The article was cited 297 times, with an average annual citation rate of 59.4 and a normalized citation score of 6.01. The article is clearly interdisciplinary in its applicability. Its focus on the social and developmental implications of blockchain makes it especially relevant to policy development and financial inclusion, particularly in emerging economies.

With regard to infrastructure and security, although not banking-focused, Minoli and Occhiogrosso’s [58] article entitled “Blockchain Mechanisms for IoT Security” has garnered 285 citations, an average of 40.71 citations per year, and a normalized score of 5.52. Its interdisciplinary contribution lies in providing secure data transmission protocols, which would be essential to technologically advanced banking systems that incorporate the Internet of Things (IoT).

Table 7. The Top 10 Most Cited Documents

Rank | Authors | Year | Title | Source | Document Type | TC | ACPY | NC
1 | Ye Guo & Chen Liang | 2016 | Blockchain application and outlook in the banking industry | Financial Innovation, 2(1) | Original research article | 706.00 | 78.44 | 2.26
2 | Anjan V. Thakor | 2020 | Fintech and banking: What do we know? | Journal of Financial Intermediation, 41 | Review article | 601.00 | 120.20 | 12.17
3 | Dai J.; Vasarhelyi M.A. | 2017 | Toward blockchain-based accounting and assurance | Journal of Information Systems, 31(3) | Conceptual research article | 532.00 | 66.50 | 5.01
4 | Gareth W. Peters & Efstathios Panayi | 2016 | Understanding Modern Banking Ledgers Through Blockchain Technologies: Future of Transaction Processing and Smart Contracts on the Internet of Money | New Economic Windows (NEW), pp. 239–278 | Book chapter | 452.00 | 50.22 | 1.45
5 | Schuetz S.; Venkatesh V. | 2020 | Blockchain, adoption, and financial inclusion in India: Research opportunities | International Journal of Information Management, 52 | Original research article | 297.00 | 59.40 | 6.01
6 | Daniel Minoli & Benedict Occhiogrosso | 2018 | Blockchain mechanisms for IoT security | Internet of Things (Netherlands), 1–2, 1–13 | Original research article | 285.00 | 40.71 | 5.52
7 | Zetzsche D.A.; Arner D.W.; Buckley R.P. | 2020 | Decentralized Finance | Journal of Financial Regulation, 6(2), 172–203 | Conceptual/policy article | 264.00 | 52.80 | 5.35
8 | Poonam Garg et al. | 2021 | Measuring the perceived benefits of implementing blockchain technology in the banking sector | Technological Forecasting and Social Change, 163 | Empirical research article | 218.00 | 54.50 | 9.94
9 | Saurabh Ahluwalia et al. | 2020 | Blockchain technology and startup financing: A transaction cost economics perspective | Technological Forecasting and Social Change, 151 | Empirical research article | 212.00 | 42.40 | 4.29
10 | Mohd Javaid et al. | 2022 | A review of Blockchain Technology applications for financial services | BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 2(3) | Review article | 207.00 | 69.00 | 8.39

TC = Total Citations; ACPY = Average Citations per Year; NC = Normalized Citations.

Furthermore, regulatory aspects of blockchain are analyzed in the highly cited paper by Zetzsche, Arner, and Buckley [59], entitled “Decentralized Finance,” which has been cited 264 times. The article has a yearly average of 52.8 citations and a normalized citation score of 5.35, illustrating increasing academic interest in the legal and compliance matters of decentralized financial systems. Published in the Journal of Financial Regulation, it offers a critical framework for examining blockchain’s legal and systemic issues and is thus extremely useful to researchers as well as policymakers.

An empirical understanding of blockchain adoption is presented by Garg et al. [60] in their article “Measuring the perceived benefits of implementing blockchain technology in the banking sector,” which has been cited 218 times. It has a high average citation rate of 54.5 per year and a notable normalized citation score of 9.94, which indicates high use and strong cross-field influence. Using structural equation modeling, the authors quantify the benefits of blockchain, such as trust, transparency, and efficiency, making this study highly applicable to banking professionals.

In parallel, the work of Ahluwalia, Mahto, and Guerrero [61] enhances the knowledge of blockchain technology within the entrepreneurial finance context through their empirical article titled “Blockchain technology and startup financing: A transaction cost economics perspective.” The paper has been cited 212 times, at an average rate of 42.4 citations per year, and has a normalized citation score of 4.29. The article extends the use of blockchain from traditional banking institutions to its impact on startup and venture capital environments, adopting transaction cost economics as its conceptual foundation.

Rounding out the list is the most recent contribution by Javaid et al. [62], titled “A review of Blockchain Technology applications for financial services,” which accumulated 207 citations within a brief period. With an annual average of 69.0 citations and a normalized citation score of 8.39, the article’s immediate impact and growing importance are evident. The article summarizes the various applications of blockchain technology in financial services, reflecting the growing demand from academics and industry experts for comprehensive reviews amid the rapid development of the fintech sector.

Taken together, the citation patterns observed suggest that influence within the blockchain-banking literature is linked more to the capacity to relate technological advancements to broader institutional, accounting, regulatory, and socio-economic issues than to purely technological innovation. Works that receive a high number of citations bring together conceptual theorization (like fintech transformation), incorporate insights from multiple disciplines (such as accounting, law, and information systems), and present empirical evidence that tackles real-world banking issues, including trust, financial inclusion, compliance, and efficiency. This highlights that the most impactful articles in academia frame blockchain not merely as a technical tool, but as a driver of significant changes in banking ecosystems. As a result, the structure of citations indicates a mature field that is progressively focusing on governance frameworks, adoption processes, regulatory legitimacy, and organizational transformation instead of isolated demonstrations of technology.

3.2. Science Mapping

Science mapping examines the relationships among contributors in a research field. In particular, it focuses on patterns of intellectual interaction and structural connections between key scholarly constituents, such as how sources, countries, institutions, authors, references, keywords, and publications relate to each other [46, 63, 64].

The present study uses a range of science mapping techniques, including co-authorship analysis, co-citation analysis, co-occurrence analysis, and bibliographic coupling analysis. These methods facilitate gaining in-depth knowledge about the evolution of the field, the collaborative patterns that characterize it, and the thematic structure that underpins it [46]. When paired with network visualization software such as VOSviewer, these methods illustrate the bibliometric and intellectual structure of the research landscape [41, 45], as outlined below.


Figure 3. International Co-authorship Network of Countries in Blockchain and Banking Research. Node size represents publication volume, link thickness indicates collaboration intensity, and colors denote distinct collaboration clusters.

3.2.1. Co-authorship of Countries

Co-authorship analysis is a bibliometric technique that is employed to study patterns of collaboration among authors, institutions, and countries based on joint publications [65, 66]. At the national level, it reveals international research collaboration, mapping the global dispersion of scientific production and the transnational network structure [67, 68]. In particular, the analysis reveals leading countries, maps geographical patterns of collaboration, and illustrates the effect of international networks on knowledge production [69, 70].

To explore global collaboration in blockchain research in the banking sector, we conducted a co-authorship analysis at the country level using VOSviewer. We included countries that had at least five documents and 30 citations. This led to 25 out of 88 countries meeting the criteria, with 72 links and a total link strength (TLS) of 105. As shown in Figure 3, the visualization displays six color-coded clusters, where nodes represent countries and links indicate the strength and frequency of co-authorships. Node size reflects publication volume, while link thickness shows collaboration intensity, and TLS quantifies a country’s total collaborative strength.
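The thresholding and link counting described above can be illustrated with a small sketch. It is not VOSviewer's implementation: it assumes full counting (each co-authored paper adds 1 to the link between every pair of contributing countries), the toy records are hypothetical, and the thresholds are lowered to fit the toy data (the study itself uses at least five documents and 30 citations).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: (countries of the co-authors, citations of the paper).
papers = [
    ({"India", "United Arab Emirates"}, 12),
    ({"India", "Jordan"}, 8),
    ({"India", "United Arab Emirates", "Jordan"}, 20),
    ({"United States", "China"}, 40),
    ({"India"}, 5),
]


def coauthorship_network(papers, min_documents=1, min_citations=0):
    """Return (link weights, total link strength per country).

    Assumes full counting; countries below either threshold are dropped
    before links are counted.
    """
    documents = Counter()
    citations = Counter()
    for countries, cites in papers:
        for country in countries:
            documents[country] += 1
            citations[country] += cites

    kept = {
        c for c in documents
        if documents[c] >= min_documents and citations[c] >= min_citations
    }

    links = Counter()
    for countries, _ in papers:
        for a, b in combinations(sorted(countries & kept), 2):
            links[(a, b)] += 1

    # A country's TLS is the sum of the weights of all links it takes part in.
    tls = Counter()
    for (a, b), weight in links.items():
        tls[a] += weight
        tls[b] += weight
    return links, tls


# Toy thresholds only; the study uses min_documents=5 and min_citations=30.
links, tls = coauthorship_network(papers, min_documents=2, min_citations=30)
print(dict(links))
print(dict(tls))
```

In this toy run Jordan is excluded by the citation threshold even though it co-authors with India, which shows how the inclusion criteria shape which collaborations appear in the map.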

The blue cluster, led by India, comprises the United Arab Emirates, Jordan, and South Africa, indicating close cooperation between South Asia and the Middle East. The central position and large node size of India highlight its high research productivity and its role as a regional leader in blockchain innovation. The participation of the United Arab Emirates and South Africa signifies growing interest in the financial applications of blockchain among digitally transforming economies.

The red cluster comprises the United States, China, Italy, Romania, Saudi Arabia, Pakistan, and Canada, forming a wide intercontinental network. The U.S. stands out for its high research volume and multiple collaborative ties. This cluster spans North America, Europe, the Middle East, and South Asia, indicating rich interdisciplinary exchanges. China and Italy are major contributors to the technological and regulatory aspects of blockchain, while Saudi Arabia and Pakistan show stronger academic connections with the West, possibly underpinned by digitization reforms and plans such as Saudi Arabia’s Vision 2030.

The yellow cluster includes the Russian Federation, Germany, Switzerland, and Turkey. Though geographically spread across Europe and Eurasia, these countries show strategic interest in digital finance and decentralization. Germany and Switzerland lead in fintech, while Russia and Turkey focus on modernizing financial systems, suggesting collaboration based on national strategies for digital transformation.

Moreover, the purple cluster consists of the United Kingdom, France, and Iran. The UK acts as a bridge between the Middle East and Western Europe, showing strong intra-European cooperation along with historical scholarly ties to the region. France and the UK are high-output contributors, while Iran emerges as a leading Middle Eastern producer of blockchain research.

Meanwhile, the light blue cluster includes Poland, Spain, and Ukraine. These nations, while not central, reflect increasing Eastern and Southern European engagement in blockchain research. Their inclusion points to increased cross-border collaboration as well as a willingness to adopt blockchain for economic modernization.

Overall, the findings of this analysis reveal a globally dispersed yet interconnected research landscape. Developed and emerging economies are actively engaging with blockchain research in banking.

3.2.2. Co-citation of Authors

Co-citation analysis is a bibliometric technique that is applied to examine the intellectual landscape of a research area by analyzing how frequently two documents, authors, or sources are cited together in subsequent works [71]. A specific type of this analysis, Author Co-citation Analysis (ACA), examines how frequently two authors are cited in tandem, thereby reflecting the conceptual structure underlying scholarly communication and conceptual evolution in an area [72, 73]. An increased frequency of co-citation between two authors implies a tight thematic correspondence or common influence on the shaping of specific streams of research [74].

In the current study, to better understand the intellectual foundations underlying blockchain research in the banking context, an author co-citation analysis was conducted using VOSviewer software. We applied a minimum threshold of 25 citations per author, which identified 102 prominent authors, out of a total of 25,779, who met the predefined criterion.
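To make the counting behind author co-citation concrete, the following sketch tallies how often pairs of authors appear together in the same reference list after applying a minimum-citation threshold. The reference lists are hypothetical toy data and the threshold is lowered to 2 for illustration (the study uses 25); this is a simplified illustration, not the procedure VOSviewer applies internally.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the set of authors cited by each citing paper.
reference_lists = [
    {"Nakamoto S.", "Tapscott D.", "Thakor A.V."},
    {"Nakamoto S.", "Tapscott D."},
    {"Thakor A.V.", "Zetzsche D.A.", "Arner D.W."},
    {"Nakamoto S.", "Thakor A.V."},
]


def author_cocitation(reference_lists, min_citations=1):
    """Return (citations per author, co-citation count per author pair)."""
    citations = Counter(
        author for refs in reference_lists for author in refs
    )
    # Keep only authors that reach the citation threshold.
    kept = {a for a, n in citations.items() if n >= min_citations}

    pairs = Counter()
    for refs in reference_lists:
        # Two authors are co-cited once for each paper that cites both.
        for a, b in combinations(sorted(refs & kept), 2):
            pairs[(a, b)] += 1
    return citations, pairs


# Toy threshold only; the study applies min_citations=25.
citations, pairs = author_cocitation(reference_lists, min_citations=2)
for pair, count in pairs.most_common():
    print(pair, count)
```

Authors cited only once drop out before pairing, so raising the threshold shrinks the network to the most frequently cited names, as in the reduction from 25,779 authors to 102 reported above.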

As shown in the network map in Figure 4, the authors were grouped into four distinct clusters, each represented by a different color. This network included 4,921 co-citation links and a total link strength of 56,680. The authors are shown as nodes within the clusters, while the edges illustrate how they have been co-cited. The sizes of the nodes indicate the extent of their co-citation, so authors who are frequently co-cited appear as larger nodes. This pattern reveals a strong trend in scholarly relationships and co-citations, along with the overall growth in research for this field.

The first cluster, shown in red, is the largest and most central in the network map and consists of authors like Chen Y., Chen S., Wang Y., Wang H., Liu J., Zhang Y., and Xu X. These authors have made major contributions in applying blockchain technology, digital technology, and information systems to banking and finance. They are the most frequently cited in the academic literature; that is, they form the foundation of theoretical and empirical research on blockchain technology in the field. This cluster is also highly linked to other clusters, which indicates its intellectual influence on other fields.

In contrast, the second cluster, shown in blue, includes prominent authors such as Kumar S., Khan S., Arner D.W., Zetzsche D.A., Thakor A.V., Kauffman R.J., and Hassan M.K. These authors are mainly concerned with financial regulation, law, and policy matters related to blockchain technology. Their co-citation network indicates that they concentrate on the risk, governance, and legal concerns of blockchain implementation in banks. The distinctiveness of the cluster indicates the interdisciplinary connection of information systems, law, and finance.

The third cluster, shown in green, consists of authors such as Nakamoto S., Tapscott D., De Filippi P., Eyal I., Zhang Z., Hassani H., Janssen M., Potts J., and El-Haddadeh R. They provide a broad perspective on the transformative role of blockchain technology in banks, examining cryptocurrencies, decentralization, governance, and innovation. Additionally, their co-citation suggests that blockchain research covers a wide range of themes, from technical to legal, economic, and regulatory domains.

Finally, the fourth cluster, shown in yellow, comprises the following authors: Dwivedi Y.K., Kshetri N., Gupta S., Gunasekaran A., and Venkatesh V. This cluster addresses information systems, models of technology adoption, and regulatory effects of blockchain technology. It suggests a widening of the research landscape on the implementation of blockchain technology in bank operations and business designs, with a concentration on technology adoption and strategic management.

In summary, these findings will be valuable to other researchers, IT professionals, financial service firms, practitioners, and banking professionals looking to consult with the right experts in related services.

3.2.3. Keyword Co-occurrence Analysis

Keyword co-occurrence analysis is a widely used bibliometric method that is employed to map and identify the intellectual structure and thematic evolution of a research field. It measures how frequently pairs of keywords appear together in the same papers, based on the assumption that higher co-occurrence indicates a stronger conceptual relationship between the terms [47]. This technique enables researchers to identify the primary research themes, evaluate the conceptual associations, and detect emerging topics in the literature [45, 75].

In the present study, we conducted a keyword co-occurrence analysis using VOSviewer software to gain a more profound understanding of the thematic context of blockchain technology in the banking sector. This approach has been demonstrated to be effective in identifying the leading research clusters and their connections. The analysis is based on the frequency of keywords and how they co-occur across publications’ titles, abstracts, and keywords.

For this study, a minimum of five occurrences for an author keyword was applied as an inclusion criterion, to ensure an analytical focus on the most relevant and frequently occurring terms. Of the 1,131 keywords examined, 52 satisfied this initial criterion. In the second stage of our research protocol, we manually refined the set of selected keywords by merging singular and plural terms, such as “cryptocurrency” and “cryptocurrencies,” “smart contract” and “smart contracts,” and “bank” and “banks.” We also consolidated and standardized synonyms, including “distributed ledger” and “distributed ledger technology,” “fintech” and “financial technology,” “decentralized finance” and “DeFi,” and “banking industry” and “banking sector.” Furthermore, we eliminated keywords that were not related to our topic, such as “bibliometric analysis” and “COVID-19.” Following the data refinement, 42 keywords were included in the final analysis. Table 8 presents the most frequently occurring keywords and the data needed to ascertain areas related to blockchain research in banking. The 42 keywords yielded 289 links, with a total link strength (TLS) of 799, and were organized into six distinct thematic clusters.
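The keyword refinement and counting steps described above can be sketched as follows. The merge map is built from the singular/plural and synonym pairs named in the text, but which variant is kept as the canonical form is an assumption, as are the toy records and the lowered threshold. For simplicity the sketch merges variants before applying the occurrence threshold, whereas the study applied the threshold first and refined the list manually afterwards.

```python
from collections import Counter
from itertools import combinations

# Variant -> canonical keyword, from the pairs listed in the text.
# Assumption: the choice of canonical form is illustrative.
CANONICAL = {
    "cryptocurrencies": "cryptocurrency",
    "smart contract": "smart contracts",
    "banks": "bank",
    "distributed ledger technology": "distributed ledger",
    "financial technology": "fintech",
    "defi": "decentralized finance",
    "banking sector": "banking industry",
}
# Off-topic keywords removed during refinement.
EXCLUDED = {"bibliometric analysis", "covid-19"}


def normalize(keywords):
    """Lower-case, merge variants, and drop excluded keywords."""
    cleaned = set()
    for keyword in keywords:
        keyword = keyword.strip().lower()
        keyword = CANONICAL.get(keyword, keyword)
        if keyword not in EXCLUDED:
            cleaned.add(keyword)
    return cleaned


def cooccurrence(records, min_occurrences=5):
    """Return (occurrences per keyword, co-occurrence count per pair)."""
    normalized = [normalize(record) for record in records]
    occurrences = Counter(kw for record in normalized for kw in record)
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for record in normalized:
        for a, b in combinations(sorted(record & kept), 2):
            links[(a, b)] += 1
    return occurrences, links


# Hypothetical author-keyword lists for three papers.
records = [
    ["Blockchain", "FinTech", "Cryptocurrencies"],
    ["blockchain", "financial technology", "COVID-19"],
    ["blockchain", "cryptocurrency", "DeFi"],
]

# Toy threshold only; the study uses a minimum of five occurrences.
occurrences, links = cooccurrence(records, min_occurrences=2)
print(dict(occurrences))
print(dict(links))
```

Merging variants before counting prevents one concept from being split across several small nodes, which is the motivation for the manual refinement step.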

Table 8. Top Keywords by Occurrence

Rank | Keyword | Occurrences | TLS
1 | blockchain | 206.00 | 373.00
2 | fintech | 68.00 | 171.00
3 | banking | 49.00 | 125.00
4 | blockchain technology | 49.00 | 49.00
5 | cryptocurrency | 40.00 | 104.00
6 | bitcoin | 33.00 | 92.00
7 | artificial intelligence | 19.00 | 54.00
8 | financial inclusion | 16.00 | 39.00
9 | smart contracts | 16.00 | 30.00
10 | finance | 14.00 | 39.00
11 | digital banking | 13.00 | 27.00
12 | financial services | 13.00 | 36.00
13 | innovation | 13.00 | 40.00
14 | security | 13.00 | 31.00


Figure 4. Author Co-citation Network in Blockchain and Banking Research. Node size corresponds to citation influence, while links indicate co-citation strength. Colors denote major intellectual clusters.

The network visualization (Figure 5) presents these clusters, with each node representing a keyword, the node size representing frequency of occurrence, and lines (edges) representing co-occurrence relationships. The thickness of a line indicates the strength of the relationship between two terms, with thicker lines denoting a stronger relationship. The proximity of nodes to each other is also a helpful indication of how closely related they are.
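The quantities behind such a map (occurrences, links, and total link strength) can be illustrated with a minimal Python sketch. The keyword records and the function name `cooccurrence_stats` are hypothetical, chosen only for illustration; the sketch uses plain counting and does not reproduce VOSviewer's normalization, clustering, or layout.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(keyword_lists):
    """Count keyword occurrences, pairwise co-occurrence links, and per-keyword TLS."""
    occurrences = Counter()
    links = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # count each keyword once per document
        occurrences.update(unique)
        for a, b in combinations(unique, 2):
            links[(a, b)] += 1  # link strength = number of documents sharing the pair
    tls = Counter()
    for (a, b), strength in links.items():
        tls[a] += strength  # a keyword's TLS sums the strengths of all its links
        tls[b] += strength
    return occurrences, links, tls


# Hypothetical toy records (author keywords of three documents)
records = [
    ["blockchain", "fintech", "banking"],
    ["blockchain", "bitcoin"],
    ["blockchain", "fintech"],
]
occ, links, tls = cooccurrence_stats(records)
print(occ["blockchain"])                 # occurrences of "blockchain"
print(links[("blockchain", "fintech")])  # strength of one link
print(len(links), sum(links.values()))   # number of links, network-level TLS
```

In this simplified counting, the network-level total is the sum of all link strengths, while the per-keyword TLS values (as reported per row in Table 8) sum to twice that figure because every link touches two keywords.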

As shown in the network map, the keyword “blockchain” is the most central node in terms of frequency of occurrence and interconnectivity, which reflects its central position in scientific discourse. Secondary keywords such as “banking,” “fintech,” “cryptocurrency,” and “Bitcoin,” which are also highly frequent and highly interconnected, emphasize blockchain’s central position in discourse regarding digital change in the finance and banking sector. The visualization (Figure 5) reveals six distinct thematic clusters, described below.

The first cluster (blue) focuses on cryptocurrencies and decentralization, as evidenced by the terms “blockchain,” “Bitcoin,” “cryptocurrencies,” “decentralization,” “Ethereum,” “money,” and “regulation.” The strong interconnection between these keywords and “blockchain” indicates the inherent relationship of blockchain technology with digital currencies, particularly Bitcoin and Ethereum, which have long been a subject of academic interest in this field. Furthermore, the cluster groups critical words that define the world of cryptocurrency, since Bitcoin, Ethereum, and cryptocurrencies in general are closely linked with terms such as “decentralization” and “money.” The literature in this cluster provides a comprehensive overview of the history and evolution of blockchain technology as applied to decentralized digital currencies. It also provides a detailed discourse on the regulation of crypto assets, which is an inevitable consequence of the disruptive effect that these assets have on traditional financial institutions. This cluster reflects a wide range of studies on how blockchain technology can reshape the structure of money and payment systems, indicating sustained academic interest in decentralized money innovations.

The second cluster (red), by contrast, focuses on banking innovation and technology adoption. This cluster includes both emerging technology keywords, such as machine learning, artificial intelligence, big data, and the Internet of Things, and banking applications, including technology adoption, cybersecurity, sustainability, and digital banking. Together, these keywords encapsulate the technological infrastructure necessary to integrate blockchain technology into banking. This suggests that researchers are increasingly interested in examining the combination of blockchain with other emerging technologies to re-engineer banking operations and service delivery. The emphasis on cybersecurity and sustainability indicates significant concern about the security and sustainability of innovation within financial institutions.

Similarly, the third cluster (green) includes the keywords “banking,” “fintech,” “finance,” “financial services,” “financial inclusion,” “crowdfunding,” and “peer-to-peer lending.” This indicates an awareness of blockchain technology’s macro-level ramifications for improving access to and the efficiency of financial systems. The prevalence of the term “fintech” in this cluster captures the essence of the transformation in financial intermediation, highlighting the pivotal role of blockchain technology in reengineering financial services. Additionally, the intersection of “fintech” and “financial inclusion” suggests a promising research area exploring blockchain’s potential to address gaps in the banking sector.

Another notable cluster, marked in purple, focuses on trust-related issues and includes terms such as “trust,” “transparency,” “security,” “privacy,” “smart contracts,” and “banking.” The prevalence of these keywords indicates a persistent academic interest in the technological and ethical dimensions of blockchain technology. Specifically, the focus is on the potential impact of blockchain technology on trust, privacy, and security in banking and financial institutions. This thematic emphasis highlights blockchain technology’s central role in addressing data integrity and user trust challenges, both of which are key to maximizing its value in banking applications.

The fth cluster is represented by light blue and comprises key-

words such as ”digitization,” ”innovation,” ”digital transformation,”

”banking services,” and ”Islamic banking.” These terms pertain to

digital transformation and innovation in the banking sector. This

thematic cluster indicates research trends that investigate the im-

pact of blockchain technology on contemporary banking models with

the aim of diversication and modernization.

The yellow cluster is particularly significant because it includes the keywords “distributed ledger technology,” “decentralized finance,” “financial regulation,” “central bank digital currency,” “cryptocurrencies,” and “RegTech.” These terms are poised to dominate future discourse concerning regulation and decentralized finance (DeFi). “RegTech” signifies the integration of regulatory control and compliance in blockchain-based banking. The cluster also highlights the pivotal role of policy and governance mechanisms in the adoption of blockchain technology in financial markets.

The network visualization of keyword co-occurrence (Figure 5) led to the identification of six major clusters, confirming the thematic structure of the field. These clusters show the current research frontiers and common terms used by scholars. For the final synthesis and interpretation of these thematic clusters, please see Section 3.3.

3.2.4. Bibliographic Coupling of Documents

Bibliographic coupling is a bibliometric technique that measures the similarity between two documents based on their shared references. The extent of the overlap between references indicates the strength of the implied connection between the documents, on the assumption that documents citing the same sources discuss the same topics or draw on the same intellectual structures [48]. This technique is particularly useful for identifying stable research streams and the underlying intellectual structure of a research field.

The present study used VOSviewer to perform bibliographic coupling analysis and to visualize the intellectual structure of blockchain literature in the banking sector. Two documents are considered to be bibliographically coupled if they cite one or more references in common. To enhance interpretability and focus on influential contributions, thresholds of a minimum of 30 citations per document and a minimum cluster size of 10 documents were applied. These criteria resulted in the selection of 65 articles, which were subsequently organized into four clusters, each distinguished by a distinct color as shown in Figure 6.
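The coupling rule and the citation threshold described above can be sketched in a few lines of Python. The document identifiers, reference lists, citation counts, and the function name `bibliographic_coupling` are hypothetical and serve only to illustrate the idea; the clustering step and the minimum cluster size applied in VOSviewer are not reproduced here.

```python
from itertools import combinations


def bibliographic_coupling(references, citations, min_citations=30):
    """Return {(doc_a, doc_b): number of shared references} for eligible documents."""
    # Keep only documents that meet the citation threshold
    eligible = sorted(d for d in references if citations.get(d, 0) >= min_citations)
    coupling = {}
    for a, b in combinations(eligible, 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared > 0:  # coupled if at least one reference is shared
            coupling[(a, b)] = shared
    return coupling


# Hypothetical toy data: reference lists and citation counts per document
references = {
    "doc_a": ["r1", "r2", "r3"],
    "doc_b": ["r2", "r3", "r4"],
    "doc_c": ["r5"],
    "doc_d": ["r1", "r2"],
}
citations = {"doc_a": 120, "doc_b": 45, "doc_c": 80, "doc_d": 12}

coupling = bibliographic_coupling(references, citations)
total_link_strength = sum(coupling.values())
print(coupling)             # coupled pairs and their link strengths
print(total_link_strength)  # sum of all link strengths
```

Here `doc_d` is excluded by the 30-citation threshold, and `doc_c` remains eligible but uncoupled because it shares no references with the other documents.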

In the resulting network visualization, each node represents an individual academic paper included in the analysis. The size of a node is proportional to the number of citations it has received, so larger nodes indicate greater scientific influence. Lines linking nodes indicate bibliographic coupling relationships, and the thickness of a line reflects the number of references the two documents share, with thicker lines denoting closer intellectual or thematic relationships. Moreover, the visualization map yields two important quantitative indicators. It produced 603 bibliographic links among the 65 documents, all of which have demonstrated substantial scholarly impact as evidenced by their citation counts. Additionally, the total link strength (TLS), calculated as the sum of all individual link strengths, is 1,226, reflecting a high level of connectivity across a comprehensive set of blockchain banking studies. The network map thus provides valuable insight into thematic connectivity among highly cited articles. The clustering reflects how closely related the topics are and how references are shared between publications, which in turn highlights the main themes across the field.

Figure 6 demonstrates that Thakor’s (2020) work exhibits considerable scholarly influence, as reflected in its large node and cross-cluster edges, thereby establishing a significant connection between the domains of mainstream banking and blockchain literature. Dai (2017), Minoli (2018), and Schuetz (2020) also emerge as central and highly connected nodes, forming a dense core within the red cluster. The significant overlap between these fields could potentially indicate an underlying contribution, particularly to blockchain technology and financial applications. In contrast, Auer (2022), Rehman (2023), and Kumar (2018) occupy peripheral positions, suggesting the existence of niches or novel research avenues that are less directly connected to the central literature. These peripheral nodes reflect the growing bifurcation of topics such as DeFi and cryptocurrency regulation. As shown in Figure 6, the map visualization reveals the following clusters:

Cluster 1 (Red) is dominated by influential documents, including Dai (2017), Peters (2016), Schuetz (2020), Ahluwalia (2020), Hooper (2020), Shoaib (2020), and Cuccuru (2017). The cluster forms the theoretical basis of the field and focuses on blockchain technology infrastructure, settlement processes, transparency, auditability, and value creation within financial systems. It constitutes a pivotal theoretical construct, with intricate internal relationships and notable coupling strength.

Cluster 2 (Green), to which Thakor (2020), Minoli (2018), Chen (2017), Bayram (2022), Naimi-Sadigh (2022), Sangwan (2020), and Kimani (2020) belong, is characterized by its high level of interconnectedness and its tendency to explore blockchain convergence with FinTech innovation and financial inclusiveness for transforming banking services. This tendency is underpinned by a focus on empirical rationales and case-study findings.

Cluster 3 (Blue) consists of the following documents: Javaid (2022), Khalil (2022), Menon (2024), Elbashbishy (2022), Choo (2020), Rehman (2023), and Schlatt (2022). This cluster emphasizes digital transformation, service innovation, and customer-focused approaches to blockchain banking. The significant number of connections within this cluster indicates the presence of an emergent yet cohesive scholarly conversation.

Cluster 4 (Yellow) is led by Garg (2021), Osmani (2021), Kumar (2018), Le Nguyen (2018), and Auer (2022). This cluster focuses on blockchain adoption models, consumer trust, and theories of innovation diffusion. It is grounded in extant literature on the behavior and diffusion of innovation, thereby establishing a relationship at both technical and organizational levels.

The high interconnectivity among clusters emphasizes the interdisciplinary nature of blockchain research in banking, which arises from the convergence of technological, economic, regulatory, and behavioral perspectives. The prevalence of strong coupling relationships and numerous thematic avenues also suggests that, although the field is still in its infancy, it has attained some level of maturity with well-defined but interrelated subfields.


Figure 5. Keyword Co-occurrence Network of Blockchain and Banking Research. Node size reflects keyword frequency, link strength indicates co-occurrence intensity, and clusters represent dominant thematic areas.

3.3. Content Analysis and Thematic Clustering

To address Research Question 2 (RQ2), this section presents a qualitative thematic analysis of the literature on blockchain technology in banking, performed systematically to identify the main themes and provide a comprehensive understanding of the research landscape. Instead of repeating the bibliometric analyses provided in Section 3.2, this section builds on those quantitative findings to provide contextual interpretation, conceptual validation, and thematic coherence.

As outlined in the research methodology section, the initial identification of the major themes was derived using two methods: keyword co-occurrence and bibliographic coupling (as described in Sections 3.2.3 and 3.2.4). Both methods highlight the most closely related items, based on how frequently keywords co-occur and how frequently references are shared across works published in peer-reviewed journals [48, 76].

The previously described bibliometric research methods are helpful for creating a high-level map of the research domain. However, they do not provide a detailed explanation of the substantive content within the identified thematic clusters. To address this gap, we conducted a qualitative content analysis to synthesize, triangulate, and validate the mapped bibliometric thematic clusters.

To achieve this qualitative synthesis, we employed Braun and Clarke’s thematic analysis framework [49]. First, we identified a dataset of 70 articles selected in previous sections of this paper. We then reviewed the articles manually to determine if the thematic clusters contained similar semantic meanings and if the cluster contents were conceptually consistent. Finally, we examined whether the thematic clusters contained relevant theories.

As illustrated in Table 9, this methodological approach yielded six robust thematic clusters that collectively define the intellectual structure of blockchain research in the banking sector from 2015 to May 2025. These thematic clusters represent the analytical framework through which to understand how blockchain influences financial intermediation, business processes, compliance with regulation, innovation strategy, trust creation, and integration with next-generation technology.

The following discussion focuses on the conceptual significance of these themes, providing a concentrated and analytical summary of the key intellectual trends in the field. This fulfills the mandate of the systematic content review phase.

3.3.1. Cluster 1: Blockchain Applications for Transforming Banking Operations and Financial Intermediation

This cluster represents the most foundational and established body of literature on blockchains in banking, focusing on their ability to improve efficiency, reduce transaction frictions, or transform core banking systems. More conceptually, the literature in this stream is concerned with the idea that the value of blockchains is not found in isolated pilot projects but rather in their integration into back-office functions, interbank settlement mechanisms, and audit and compliance processes.

Foundational studies, such as [55, 56], establish the theoretical frameworks that describe how blockchain technology helps automate banking ledgers, enhances settlement efficiency and reconciliation accuracy, and improves auditability. This technology also enables continuous quality assurance and real-time accounting systems. Building on this conceptual foundation, empirical evidence, notably from the Sponta Banca initiative, shows that blockchain frameworks significantly reduce settlement timeframes, enhance data traceability, and increase the reliability of interbank data exchange [77].

Figure 6. Document–Bibliographic Coupling in Blockchain and Banking Research. Nodes represent documents, node size reflects citation influence, links indicate bibliographic coupling strength, and colors represent major intellectual and thematic clusters.

Table 9. Identified Thematic Clusters in Blockchain Banking Research.

| No. | Thematic Cluster | Main Themes | Sample References |
|---|---|---|---|
| 1 | Blockchain Applications for Transforming Banking Operations and Financial Intermediation | Real-time accounting, automation, reconciliation, operational efficiency, startup finance, and cost reduction. | [26], [55, 56], [61, 62], [77–80] |
| 2 | Decentralized Finance (DeFi) and Cryptocurrencies Enabled by Blockchain | DeFi, ICOs, remittances, financial decentralization, ethics of crypto, speculative behavior. | [81–87] |
| 3 | Blockchain as an Enabler of Digital and Financial Technology Convergence | Integration with IoT, AI, ML, FinTech, KYC, smart contracts, and digital ID; enhancing automation and inclusion. | [46], [54], [58], [88–92] |
| 4 | Trust-Related Dimensions in Blockchain-Based Banking | Trust, transparency, data privacy, organizational confidence, strategic alignment, adoption barriers. | [1], [60], [91], [93–96] |
| 5 | Regulatory, Legal, and Institutional Frameworks for Blockchain Governance | Smart contracts and law, compliance, anti-money laundering (AML), CBDCs, DeFi regulation, policy adaptation. | [59], [97–101] |
| 6 | Strategic Modernization of Banking Business Model Enabled by Blockchain | Disruption, competitive strategy, sandboxes, sustainable development. | [16], [37], [102] |

A large body of literature in this cluster, including [26, 62, 80, 103, 104], consistently emphasizes the advantages of blockchain technology. Compared to traditional systems, blockchain technology enhances operational efficiency in terms of cost savings, risk mitigation, transaction security, transparency, and privacy. It also helps minimize information asymmetry and startup capital costs [61]. These benefits extend beyond payments to credit information systems, international settlements, and broader financial data networks. This reinforces the idea that blockchain technology is fundamental rather than limited in application.

Furthermore, this cluster emphasizes the strategic and organizational factors that facilitate successful blockchain implementation. Research using technology adoption models [79] and innovation capability frameworks [105] identifies critical factors that mediate the operational effectiveness of blockchain technology, including trust, management commitment, and resource readiness. In addition, studies focusing on emerging markets [78, 106] indicate that banks’ ability to achieve efficiency improvements is significantly affected by institutional maturity and technological infrastructure.

Overall, these findings underscore the importance of blockchain technology as a key tool capable of reducing operational costs, automating complex verification tasks, and promoting resilient financial systems with low response times. However, the studies also point to ongoing challenges, particularly with regard to scalability and institutional readiness, which continue to affect the speed and scope of practical implementation.

3.3.2. Cluster 2: Decentralized Finance (DeFi) and Cryptocurrencies Enabled by Blockchain

This thematic cluster focuses on an increasingly significant body of research that examines blockchain technology as the core infrastructure for decentralized finance (DeFi) and cryptocurrency-driven financial systems. Theoretically, research presents blockchain as a tool that eliminates intermediaries in conventional financial operations by allowing direct peer-to-peer value exchange, automating processes through smart contracts, and fostering transparent financial frameworks that function independently of central authorities. The core idea that emerges from this stream is that decentralized finance not only improves current banking processes but also radically challenges traditional models of financial intermediation.

Groundbreaking research indicates that decentralized finance offers an alternative financial structure capable of replicating essential banking activities or services, such as lending, borrowing, and asset trading, through decentralized protocols that are governed by code rather than traditional institutions [81]. This viewpoint is further reinforced by theoretical contributions that depict blockchain as a “trust protocol,” highlighting its function in enabling transparency, immutability, and the automated execution of financial transactions [86]. Collectively, this body of literature lays the theoretical groundwork for comprehending how decentralized systems challenge conventional banking frameworks.

Furthermore, empirical and analytical studies within this thematic cluster reveal a more nuanced and diverse landscape. While blockchain-based money transfer systems and tokenized financial instruments show the potential to reduce costs and increase efficiency, evidence suggests that adoption of cryptocurrencies is often driven by speculative behavior rather than dissatisfaction with traditional banking services [85, 87]. Research also highlights persistent concerns about market volatility, governance ambiguity, and regulatory uncertainty, which continue to shape the risk profile of decentralized finance (DeFi) systems [82–84].

In addition to technical and economic factors, this research highlights the ethical, behavioral, and institutional consequences of DeFi. Studies focusing on accountability, financial inclusion, and ethical responsibilities caution that the decentralization of financial authority introduces new challenges associated with consumer protection, systemic risk, and regulatory supervision [82, 83, 107]. These findings imply that the transformative capacity of DeFi is closely connected to governance and public policy factors.

In summary, these studies affirm that decentralized finance (DeFi) and cryptocurrencies signify a groundbreaking extension of blockchain technology with the potential to transform financial intermediation. However, the research notes that the long-term viability of DeFi and its integration into mainstream banking systems depends on establishing regulatory frameworks, governance structures, and empirical evaluations of systemic risks.

3.3.3. Cluster 3: Blockchain as an Enabler of Digital and Financial Technology Convergence

This thematic cluster includes studies that look at blockchain as a fundamental infrastructure that supports and enhances the functionality of other emerging digital and financial technologies, such as artificial intelligence (AI), machine learning (ML), the Internet of Things (IoT), FinTech platforms, smart contract applications, and digital identity systems. Conceptually, the literature in this stream portrays blockchain not as an isolated solution but as a coordination and trust layer that improves interoperability, automation, and data integrity across complex digital ecosystems.

Key contributions within this cluster highlight the potential of blockchain to reshape financial value chains by facilitating decentralized data exchange, automated decision-making, and secure identity management [54]. Beyond financial services, this stream also highlights the role of blockchain technology in securing Internet of Things (IoT) systems by preventing data manipulation and enabling decentralized control, particularly in environments that require high levels of reliability and trust [58]. These studies portray blockchain as a complementary technology that enhances the reliability and transparency of data-driven financial services while paving the way for innovative digital intermediation. In this context, blockchain technology enables the secure integration of diverse technologies that typically operate in isolation.

Empirical studies further indicate that the convergence of blockchain with AI, big data analytics, cloud computing, and mobile banking technologies can lead to significant performance enhancements in the delivery of financial services, especially in lending, risk assessment, and customer onboarding processes [89, 90, 92, 107]. Evidence from banking applications indicates that such technological convergence improves predictive accuracy, operational scalability, and financial inclusion, particularly for small and medium-sized enterprises and underrepresented populations.

A notable subtheme within this cluster focuses on digital identity management and automated compliance processes. Research on blockchain-based self-sovereign identity and smart contract–enabled Know Your Customer (KYC) processes shows significant advances in privacy protection, cost-effectiveness, and regulatory compliance [91]. Similarly, research on blockchain-enabled access control mechanisms highlights its potential to improve data management and security across interconnected digital platforms [88]. These applications demonstrate blockchain’s potential to address long-standing inefficiencies in identity verification and data governance within financial institutions.

In summary, this cluster emphasizes the importance of blockchain as a key driver of technological convergence in digital finance. However, the literature also points to ongoing challenges concerning system interoperability, organizational coordination, regulatory harmonization, and institutional compatibility. These limitations imply that the advantages of blockchain integration hinge on supportive institutional frameworks and the maturity of related technologies.

3.3.4. Cluster 4: Trust-Related Dimensions in Blockchain-Based Banking

This cluster synthesizes literature examining the impact of trust on the use of blockchain technology in banking. Cryptographic verification and decentralized consensus have often led to blockchain being characterized as a “trustless” technology; however, existing studies continually highlight the importance of trust between organizations, user confidence, and the legitimacy of institutions when implementing blockchain within the financial sector. Conceptually, research within this stream illustrates that blockchain does not remove trust but re-establishes it, moving it from centralized intermediaries to the technology itself, to governance structures, and to institutions.

Empirical research indicates that, for both users and banks, perceived usefulness, transparency, and security are strong motivators for the adoption of blockchain technology, while technical capability is not as significant [1, 60, 93]. These results indicate that both types of trust (in technology and in the institution) interact with each other rather than exist separately.

A second theme in this cluster discusses blockchain’s effects on increasing transparency and providing customers with greater data integrity and privacy. Research indicates that blockchain-based architectures can decrease information asymmetry between lenders and borrowers, improve auditing capabilities, and create greater confidence in financial transactions through various regulatory processes, including lending [95, 96]. However, the literature points out that certain organizational barriers to establishing trust exist within financial industries, such as resistance to changing current ways of distributing credit, the lack of standardization, and uncertainty regarding accountability, i.e., which party or parties are ultimately responsible in any given transaction [94].

Another prominent sub-theme concerns digital identity and the privacy-preserving elements of the associated trust mechanism. Research examining blockchain-based digital identity management and the application of KYC frameworks has illustrated that decentralized identity models augment user control over personal data and enable compliance with regulatory KYC requirements, while also helping banks and regulators build a greater degree of institutional trust [91]. Furthermore, current research has demonstrated that the manner in which a digital identity is constructed has a direct impact on the level of trust between banks, regulatory authorities, and their customers.

In summary, the literature supporting this stream clearly establishes trust as a multi-dimensional construct that acts as an intermediary factor in the adoption of blockchain technology in the financial industry. While blockchain technologies provide a structure to increase transparency and security, the literature confirms that adopting blockchain technology within an organization requires an appropriate balance between the institution’s expectations regarding the reliability of the technology, the organizational readiness to use the technology, the availability of clear laws and regulations related to its use, and the overall level of acceptance by society at large.

3.3.5. Cluster 5: Regulatory, Legal, and Institutional Frameworks for Blockchain Governance

This thematic cluster synthesizes research on the impact of regulations, laws, and institutional frameworks on the use and adoption of blockchain technology in financial institutions. In theory, and according to the literature in this stream, blockchain technology contributes to greater transparency, process automation, and increased efficiency. However, institutions are unable to fully leverage this potential because of the uncertainty surrounding the regulation of this technology and because the current limited regulatory and legal structures are unable to keep pace with the transformation brought about by blockchain technology.

This theme focuses primarily on how enforcement and governance issues related to blockchain applications and the use of smart contracts are evolving. Many researchers point to numerous areas of conflict arising from mechanisms that create automatically enforced records, as well as many unresolved issues related to accountability, jurisdiction, and the enforceability of programming-based agreements [97]. These challenges illustrate the difficulty of applying standard regulatory structures to decentralized financial systems.

Other important areas in this thematic cluster are compliance, risks that may threaten the integrity of the financial system, and the systemic risks of blockchain technology. The pseudonymity that blockchain technology allows in financial transactions poses a potential dilemma for financial regulators [73, 74, 99]. The ease of creating anonymous accounts makes it easier for users to engage in money laundering [98]. At the same time, technologies enabled by blockchain, such as automated reporting, automated audit trails, and early warning systems, enhance transparency and regulatory effectiveness [100].

Additionally, studies indicate that regulatory approaches are

necessary to support blockchain technology innovations. Regula-

tory sandboxes serve as tools for managing the relationship between

innovation and risk through controlled testing, contributing to op-

portunities for learning, public policy development, and institutional

adaptation [26]. Researchers are also exploring ways to integrate

regulation into decentralized finance (DeFi) applications, emphasiz-

ing the need to incorporate governance and compliance mechanisms

into system design as a means of mitigating the risks associated with

decentralization [59].

Finally, studies on central bank digital currencies (CBDCs) show

how the introduction of blockchain technology has prompted pub-

lic authorities to develop hybrid governance models. Evidence from

digital currency projects shows central banks’ efforts to combine

technological advances with centralized oversight to achieve financial

stability objectives and ensure the effective transmission of monetary

policy [101].

In conclusion, this research corpus reinforces the three essential

ingredients for the long-term success of blockchain technology in

the banking sector: regulatory clarity, institutional flexibility, and

adaptive governance. In all three areas, the literature shows that

sustainable governance of blockchain technology requires a balance

between providing an environment conducive to innovation, ensuring

legal certainty for consumers, and maintaining systemic financial

stability.

3.3.6. Cluster 6: Strategic Modernization of Banking

Business Model Enabled by Blockchain

This thematic cluster focuses on the role of blockchain as a mecha-

nism for modernizing conventional business models in the banking

industry, specically as a type of strategic transformation. While

much research refers to blockchain for its greater operational e-

ciency, the literature in this stream points out that blockchain’s

inuence will create long-term economic and social governance sys-

tems by creating new biases toward competition and allowing for

entirely new nancial service architectures.

The literature in this cluster collectively conceptualizes

blockchain technology as disruptive rather than complementary to

existing systems and processes. This body of literature recognizes

that blockchain platforms disrupt the traditional centralized struc-

ture of the banking industry by enabling the delivery of new services

to customers and allowing peer-to-peer interactions to create value

without going through a bank or intermediary. Consequently, banks

are under increased pressure to reevaluate their strategic positioning,

organizational structure, and competitive response to their evolving

roles in the digital financial service environment [16, 37].

In addition, this cluster of research has further explored how

business model innovation driven by blockchain technology can

support nancial inclusion and global sustainable development.

Studies have identied ways in which blockchain can provide

expanded access to nancial services, decrease transaction costs, and

improve transparency in areas such as payments, savings, credit,

and insurance, particularly inunderserved areas and regions [102].

However, the literature of this cluster has also emphasized that,

for these potential strategic benets to be realized, a supporting

institutional framework is necessary.

In general, this cluster validates the assertion that blockchain is a

strategic enabler of banking modernization, encompassing more than

incremental process improvements. That said, the findings also indi-

cate that the effect of blockchain on banking ultimately depends on

how well financial institutions use and integrate the new technology

into their organizational strategies and adapt to achieve organiza-

tional compliance and advance organizational goals in a changing

economy and broader social structure.

4. Research Implications

4.1. Theoretical Implications

This review enhances blockchain adoption theory by broadening pri-

marily individual-level acceptance models (e.g., TAM, UTAUT) and

organization-centered readiness viewpoints (e.g., TOE, RBV) into

a multi-tiered, ecosystem-based comprehension of blockchain dis-

semination in tightly regulated financial contexts. The bibliometric

clustering demonstrates that blockchain adoption in the banking sec-

tor is influenced not only by technological preparedness or perceived

value but also by the interplay of regulatory legitimacy, institutional

trust, cross-organizational interoperability, and strategic resource

management throughout financial networks. This observation re-

fines traditional technology adoption models by highlighting that

disruptive financial technologies face diffusion constraints imposed

by governance frameworks and regulatory compliance demands, re-

sulting in adoption pathways that are fundamentally different from

those seen in consumer-oriented digital technologies.

Additionally, the thematic evolution indicates a theoretical shift

within the literature from initial techno-optimistic narratives to an-

alytical perspectives that focus on institutional, risk-oriented, and

governance issues. This progression marks a shift from exploratory

research on technology diffusion to integrated frameworks that re-

gard blockchain as a facilitator of organizational transformation

rather than simply a discrete operational tool. Therefore, this review

presents a cohesive conceptual framework that incorporates tech-

nological, organizational, regulatory, and ecosystem dynamics into

a comprehensive explanatory model for blockchain-driven financial

innovation.

By synthesizing bibliometric ndings with qualitative thematic

analysis, this study presents an established, multi-level framework

that describes the process by which the banking sector adopts

blockchain technology as a broader ecosystemic and governance-

driven process rather than a technology-driven phenomenon.

4.2. Managerial Implications

In addition to outlining technological advantages, the current find-

ings suggest a strategic rethinking of blockchain as a tool for

organizational transformation rather than a mere digital upgrade.

By synthesizing insights from bibliometric and thematic clusters,

this research shows that the success of adoption is more dependent

on banks’ capacity to implement coordinated process reengineering,

cross-unit integration, and alignment of institutional governance

than on technical installation.

The thematic clusters that highlight operational efficiency, cost

savings, and process automation imply that blockchain should be

viewed not just as a technological asset but also as a driver of op-

erational reorganization. Therefore, banking managers are urged to

re-evaluate current workflows and identify areas where distributed

ledger technologies can optimize accounting processes, enhance rec-

onciliation accuracy, and decrease overhead expenses through smart

contract automation [55, 62].

Moreover, the ndings stress the growing importance of security,

transparency, and trust in modern banking practices. With increas-

ing cyber threats and regulatory compliance demands, blockchain-

based systems provide solutions for ensuring data integrity, tracing

audit trails, and automating contract enforcement. These fea-

tures are particularly pertinent to Know Your Customer (KYC)

and Anti-Money Laundering (AML) compliance frameworks, where

blockchain applications can support regulatory adherence while

simultaneously enhancing institutional credibility [98, 103].

Similarly, the rise of decentralized finance (DeFi) and token-

based ecosystems indicates a fundamental shift in banking busi-

ness models. As a result, managers must look beyond incremen-

tal enhancements to investigate new service architectures, such

as peer-to-peer intermediation platforms, blockchain-enabled pay-

ment systems, and digital asset tokenization. This shift requires

innovation-driven leadership cultures, investment in blockchain-

related expertise, and strategic alliances with fintech developers to

maintain a competitive advantage.

Lastly, the noted decrease in citation impact alongside increasing

publication volumes highlights the need for more practically oriented

blockchain initiatives. Banking leaders must connect blockchain

adoption to clearly defined institutional objectives, quantifiable

performance metrics, and stepwise implementation strategies to en-

sure that investments yield tangible benefits rather than remaining

symbolic or experimental.

In summary, these managerial implications illustrate that the

adoption of blockchain is primarily a challenge of leadership, gover-

nance, and change management, rather than solely a decision related

to technological procurement.

4.3. Practical Implications

From a practical viewpoint, this review indicates that blockchain

technology generates its most significant benefits when it is inte-

grated within regulatory and transactional frameworks rather than

operated as a standalone pilot initiative. The most pronounced em-

pirical focus in the literature pertains to cross-border settlements

and interbank transaction clearing, where inefficiencies are still com-

mon. Incorporating blockchain into these areas can

speed up settlement times, lower operational expenses, and reduce

the risks of fraud [26, 56, 60].

Concurrently, blockchain provides capabilities for automating

regulatory processes and managing identities. Smart contracts and

decentralized identity systems can improve compliance precision and

operational transparency, yielding considerable cost savings in ful-

filling KYC, AML, and financial reporting requirements [80, 98].

Therefore, regulatory bodies and financial institutions are urged

to consider RegTech-driven blockchain solutions not merely as

additional controls but as comprehensive compliance frameworks.

The literature also highlights the inclusive potential of

blockchain, especially via DeFi-enabled microfinance platforms,

crowdfunding opportunities, and mobile-focused peer lending

initiatives [59, 85, 108]. Such models create avenues for underserved

communities to obtain financial services without reliance on tra-

ditional intermediaries. Implementation efforts should, therefore,

prioritize areas with high rates of financial exclusion, particularly

in emerging and developing economies. For technology developers

and consulting agencies, the insights point to key areas for develop-

ment that include secure audit platforms, green finance traceability

systems, decentralized asset management frameworks, and interop-

erable payment solutions. Collaborative design partnerships with

financial institutions are essential to ensure that technological mod-

els closely correspond with sector-specific regulatory and operational

needs.

Ultimately, the effective implementation of blockchain in the

banking sector necessitates not only experimental adoption but

also ongoing institutional coordination that encompasses regula-

tory dialogue, workforce education, governance adaptation, and

strategic oversight. Thus, the full potential of blockchain is real-

ized when technical advancements are aligned with organizational

preparedness and policy coherence.

5. Conclusion and Future Research

5.1. Conclusion

This research comprehensively examined the evolving intellectual

landscape and thematic development of blockchain studies within

the banking industry from 2015 to 2025 using a hybrid approach

that combines bibliometric analysis with qualitative systematic syn-

thesis. The analysis of 389 peer-reviewed articles highlighted distinct

developmental stages—from initial conceptual exploration to the-

matic broadening and into the current phase of applied governance

and integration studies.

An analysis of geographic contributions revealed disparities, with

the majority coming from India, the United States, and the United

Kingdom, while newer research centers in China, the United Arab

Emirates, and various parts of Europe are progressively influencing

the empirical direction of the field. At the levels of institutions and

authorship, research networks show both fragmentation and cross-

regional collaboration, indicating that global integration in research

is inconsistent.

Six key thematic clusters delineate the structure of disciplinary

knowledge: nancial intermediation and operational eciency, de-

centralized nance (DeFi) and cryptocurrencies, convergence of

blockchain technology, infrastructures for trust and transparency,

regulatory and governance frameworks, and modernization strate-

gies in banking. Together, these aspects characterize blockchain

as not just a standalone technological x but as an integrated

transformation platform that concurrently impacts organizational

frameworks, regulatory systems, and nancial ecosystems.

Although the volume of publications is on the rise, the literature

remains empirically scattered. Studies focusing on large-scale indus-

try adoption are limited, the interactions between blockchain and

complementary technologies (such as AI and IoT) are insufficiently

theorized, and long-term evaluations of financial stability and sys-

temic risk are scarce. Governance research, especially in areas of

regulatory enforcement and international coordination, is also still

underexplored.

In addition to mapping thematic growth, this review offers an

integrative theoretical framework based on our synthesis of the

six thematic clusters. This framework improves our understanding

of blockchain adoption by presenting it as an innovation pro-

cess inuenced by regulatory legitimacy, organizational governance,

and ecosystem interoperability, rather than merely a technical

event. This integrative view distinguishes the current review from

previous bibliometric analyses because it clearly articulates the

causal relationships connecting our validated knowledge structure

to the broader agenda of organizational transformation, regulatory

alignment, and strategic value creation in finance.

As a result, this study provides a cohesive theoretical groundwork

for future empirical research and offers practical insights for banking

professionals and policymakers as they navigate the implementation

of blockchain technologies in regulatory environments undergoing

transition.

5.2. Future Research Directions

To improve our understanding of this research area, more research

should be conducted on the six thematic clusters discussed earlier.

Since research on blockchain technology in the banking sector is in

its infancy, identifying and defining possible areas for future research

is crucial. These research directions are derived from existing liter-

ature and reflect the gaps, constraints, and prospects identified by

previous researchers.

Existing studies have identified that blockchain has the potential

to transform operational processes in the banking sector for greater

effectiveness, financial inclusion, and decentralized finance (DeFi),

as well as to completely modernize business models [55, 81, 86].

However, serious issues remain regarding regulatory ambiguity [59],

interoperability [26], adoption of trust [93], and integration into

future-proof technologies [90]. Thus, future research must bridge

these gaps through empirical, interdisciplinary, and cross-regional

studies.

Table 10 shows directions reflecting both conceptual and practi-

cal priorities. These directions provide a research map for charting

blockchain scholarship and positioning policymakers, financial insti-

tutions, and technology providers toward the development of secure,

ethical, and scalable distributed ledger technology applications.

5.3. Limitations of the Study

Despite providing an overall bibliometric and thematic analysis, this

study has some limitations that should be acknowledged. First, the

dataset was derived exclusively from the Scopus database. Although

Scopus provides the widest coverage of peer-reviewed journals re-

lated to nance, management, and information systems research,

the exclusion of other databases (such as Web of Science, IEEE

Xplore, and Google Scholar) may have resulted in the omission

of some relevant publications, particularly conference proceedings

and technically oriented studies. Nevertheless, this review focuses

primarily on the social, economic, managerial, and organizational

aspects of blockchain technology in the banking sector, rather than

on the development of engineering or cryptographic systems, which

are typically covered in technical databases.

Furthermore, the study examined 389 peer-reviewed articles from

2015 to May 2025. Due to Scopus’s dynamic nature, the database

used for the study might not include the newest publications at

the cuto time of the nal submission, which could slightly aect

the bibliometric results. The study only used VOSviewer to map

and visualize bibliometric networks. Although VOSviewer is a pop-

ular tool, other tools, such as Gephi or CiteSpace, could have been

used to provide additional bibliometric measures, including network

centrality, modularity, and mediation scores.
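To illustrate what one of these additional measures involves, the following is a minimal sketch of degree centrality on a keyword co-occurrence network. It is not part of the study's workflow: the keyword lists are hypothetical, the function names are illustrative, and the normalization (neighbors divided by n − 1) is the common textbook definition rather than the output of Gephi or CiteSpace.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(keyword_lists):
    """Build an undirected co-occurrence graph as {keyword: set(neighbors)}."""
    graph = defaultdict(set)
    for keywords in keyword_lists:
        unique = sorted(set(keywords))
        for kw in unique:
            graph[kw]  # touching the key registers keywords with no partners
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Normalized degree centrality: number of neighbors divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical author-keyword lists, one per article (illustrative only).
records = [
    ["blockchain", "banking", "smart contracts"],
    ["blockchain", "defi", "regulation"],
    ["blockchain", "banking", "regulation"],
]

graph = cooccurrence_graph(records)
centrality = degree_centrality(graph)
for keyword, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{keyword}: {score:.2f}")
```

In this toy network, a keyword that co-occurs with every other keyword receives a score of 1.0, whereas peripheral keywords score lower. Modularity and betweenness-type (mediation) scores would require community detection and shortest-path computations that this sketch does not attempt.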

Furthermore, this research did not propose a conceptual model

for how banks adopt blockchain technology. Therefore, subsequent

studies can build on this research to develop a more extensive model

that encapsulates the multidimensionality of blockchain applica-

tions. Despite its limitations, the research provides a preliminary

examination of the intellectual structure and thematic history of

blockchain research in the banking sector.

Ethical Statement

No ethical approval was required for this study, as it did not involve

human or animal subjects.

Funding

This research received no specific grant from any funding agency in

the public, commercial, or not-for-profit sectors.

Declaration of competing interests

The authors declare that they have no known competing financial

interests or personal relationships that could have appeared to in-

fluence the work reported in this article. Moreover, they assert that

no conflicts of interest exist.

Declaration of generative AI and AI-assisted

technologies in the writing process

During the preparation of this manuscript, the author(s) used

language editing tools/services, including DeepL and Grammarly,

to improve grammatical accuracy and readability. The author(s)

subsequently reviewed and edited the contents thoroughly for accu-

racy and integrity after utilizing these tools/services and are fully

responsible for the final version of the manuscript.

Data Availability Statement

The bibliometric dataset supporting the findings of this study, in-

cluding the Scopus CSV file used for VOSviewer analyses, is publicly

available on Zenodo at:

https://doi.org/10.5281/zenodo.17992285

Credit authorship contribution statement

[Sadeq Abdullah Aladeeb]: Conceptualization, Software, Methodol-

ogy, Data curation, Formal analysis, Investigation, Visualization,

Writing – original draft & Editing. [Fatima Zohra Sossi Alaoui]:

Supervision, Validation, Writing – review & editing.

Table 10. Blockchain Themes and Future Research Directions in Banking.

No. 1. Cluster Theme: Blockchain Applications for Transforming Banking Operations and Financial Intermediation

Future Research Directions:

• Carry out comparative empirical research assessing the effects of smart contracts on transaction settlement durations and operational costs in various banks.

• Create process-mapping models to quantify the reduction of reconciliation steps in interbank clearing attributable to blockchain (utilizing Business Process Model and Notation “BPMN” and time–motion analysis).

• Perform cross-country econometric evaluations to determine how blockchain-based remittance solutions impact transfer expenses and delivery times in developing compared to developed nations.

• Employ UTAUT2 or TOE frameworks to pinpoint the factors influencing blockchain adoption in retail versus corporate banking sectors.

• Conduct case studies in low-income nations to uncover obstacles to scalability, interoperability, and institutional integration.

References: [56], [61, 62], [77], [79], [103], [105]

No. 2. Cluster Theme: Decentralized Finance (DeFi) and Cryptocurrencies Enabled by Blockchain

Future Research Directions:

• Model contagion and systemic risks within DeFi ecosystems through network analytics and simulation methodologies (e.g., agent-based modeling).

• Conduct studies on regulatory impacts, comparing the effectiveness of various legal frameworks in mitigating fraud and protecting consumers in DeFi lending platforms.

• Evaluate the influence of DeFi credit markets on the liquidity, profitability, and risk parameters of commercial banks.

• Carry out behavioral studies to examine how cultural differences shape motivations for adopting cryptocurrencies (speculation versus utility).

References: [81–83], [85–87]

No. 3. Cluster Theme: Blockchain as an Enabler of Digital and Financial Technology Convergence

Future Research Directions:

• Design and evaluate blockchain–IoT prototypes for real-time Know Your Customer (KYC) / Anti-Money Laundering (AML) monitoring within banking data streams.

• Assess the effectiveness of AI-enhanced smart contracts in dynamic access control through penetration testing and cybersecurity evaluations.

• Create machine-learning models using blockchain transaction data to forecast credit risk or fraud patterns, and validate using actual banking datasets.

• Develop and assess Self-Sovereign Identity (SSI)-based identity frameworks in partnership with banks to gauge improvements in onboarding efficiency and KYC compliance.

References: [58], [88], [90], [91]

No. 4. Cluster Theme: Trust-Related Dimensions in Blockchain-Based Banking

Future Research Directions:

• Use mixed-methods surveys and interviews to evaluate the impact of human trust and organizational culture on blockchain adoption within banks.

• Establish a standardization readiness index to evaluate how system compatibility, legacy systems, and regulations impede blockchain integration.

• Design blockchain-based credit scoring prototypes and assess their effectiveness in diminishing information asymmetry in SME lending.

• Implement longitudinal studies to track how increased transparency through blockchain influences customer trust over time.

References: [60], [93], [95], [96]

No. 5. Cluster Theme: Regulatory, Legal, and Institutional Frameworks for Blockchain Governance

Future Research Directions:

• Propose and evaluate blockchain-enabled AML/CFT (Countering the Financing of Terrorism) monitoring systems and measure their detection accuracy compared to traditional systems.

• Examine the efficacy of regulatory sandboxes by monitoring innovation outputs (patents, pilots, startups) preceding and following sandbox involvement.

• Develop automated reporting and cryptographic proof systems for embedded supervision models in DeFi.

• Analyze real-world CBDC pilot projects (e.g., e-CNY) to gauge privacy risks, transaction speeds, and impacts on monetary policy using macro-financial models.

References: [26], [59], [97], [98], [101]

No. 6. Cluster Theme: Strategic Modernization of Banking Business Model Enabled by Blockchain

Future Research Directions:

• Employ scenario analysis to illustrate how blockchain influences competition between neobanks and traditional banks.

• Perform studies on the effects of financial inclusion by evaluating blockchain-based microfinance initiatives in rural or underserved areas.

• Chart out policy, infrastructure, and institutional elements that contribute to successful blockchain-driven transformation using the PESTEL (Political, Economic, Social, Technological, Environmental, and Legal) framework and multi-country case research.

References: [16], [37], [102]

References

1. A. Kumari and C. Devi. The impact of fintech and blockchain

technologies on banking and financial services. Technology In-

novation Management Review, 12(1/2):22010204, 2022. doi:

10.22215/timreview/1481.

2. S. Nakamoto. Bitcoin: A peer-to-peer electronic cash sys-

tem, 2008. Social Science Research Network, Rochester, NY:

3440802. doi:10.2139/ssrn.3440802.

3. P. Treleaven, R. G. Brown, and D. Yang. Blockchain technology

in finance. Computer, 50(9):14–17, 2017. doi:10.1109/mc.

2017.3571047.

4. S. Sarmah. Understanding blockchain technology. Ameri-

can Journal of Computer Science and Technology, 8(2):23–29,

August 2018.

doi:10.5923/j.computer.20180802.02.

5. A. T. Sherman, F. Javani, H. Zhang, and E. Golaszewski. On

the origins and variations of blockchain technologies. IEEE

Security & Privacy, 17(1):72–77, 2019. doi:10.1109/MSEC.

2019.2893730.

6. A. A. Monrat, O. Schelén, and K. Andersson. A survey of

blockchain from the perspectives of applications, challenges,

and opportunities. IEEE Access, 7:117134–117151, 2019. doi:

10.1109/ACCESS.2019.2936094.

7. J. Li and M. Kassem. Applications of distributed ledger

technology (dlt) and blockchain-enabled smart contracts in

construction. Automation in Construction, 132:103955, 2021.

doi:10.1016/j.autcon.2021.103955.

8. M. S. Ali, M. Vecchio, M. Pincheira, K. Dolui, F. Antonelli, and

M. H. Rehmani. Applications of blockchains in the internet of

things: A comprehensive survey. IEEE Communications Sur-

veys & Tutorials, 21(2):1676–1717, 2019. doi:10.1109/COMST.

2018.2886932.

9. M. H. Joo, Y. Nishikawa, and K. Dandapani. Cryptocurrency,

a successful application of blockchain technology. Managerial

Finance, 46(6):715–733, 2019. doi:10.1108/MF-09-2018-0451.

10. T. K. Mackey et al. ‘Fit-for-purpose?’ – challenges and op-

portunities for applications of blockchain technology in the

future of healthcare. BMC Medicine, 17(1):68, 2019. doi:

10.1186/s12916-019-1296-7.

11. C. Laroiya, D. Saxena, and C. Komalavalli. Chapter 9 - ap-

plications of blockchain technology. In S. Krishnan, V. E.

Balas, E. G. Julie, Y. H. Robinson, S. Balaji, and R. Ku-

mar, editors, Handbook of Research on Blockchain Technol-

ogy, pages 213–243. Academic Press, 2020.

doi:10.1016/B978-0-12-819816-2.00009-5.

12. A. Pal, C. K. Tiwari, and A. Behl. Blockchain technology

in nancial services: a comprehensive review of the literature.

Journal of Global Operations and Strategic Sourcing, 14(1):61–

80, March 2021. doi:10.1108/JGOSS-07-2020- 0039.

13. Niels Hackius and Moritz Petersen. Blockchain in logistics

and supply chain: Trick or treat? In Wolfgang Kersten,

Thorsten Blecker, and Christian M. Ringle, editors, Proceed-

ings of the Hamburg International Conference of Logistics

(HICL), Vol. 23, pages 3–18, Berlin, 2017. epubli GmbH.

doi:10.15480/882.1444.

14. Marcin Hernes, Artur Rot, and Dorota Jelonek, editors. To-

wards Industry 4.0 — Current Challenges in Information

Systems, volume 887 of Studies in Computational Intelligence.

Springer International Publishing, Cham, Switzerland, 1st

edition, 2020. doi:10.1007/978-3-030-40417-8.

15. L. Cocco, A. Pinna, and M. Marchesi. Banking on blockchain:

Costs savings thanks to the blockchain technology. Future

Internet, 9(3):25, September 2017.

doi:10.3390/fi9030025.

16. W. L. Harris and J. Wonglimpiyarat. Blockchain platform and

future bank competition. Foresight, 21(6):625–639, July 2019.

doi:10.1108/FS-12-2018-0113.

17. H. S. Ali, F. Jia, Z. Lou, and J. Xie. Effect of blockchain tech-

nology initiatives on firms’ market value. Financial Innovation,

9(1):1–35, 2023. doi:10.1186/s40854-023-00456-8.

18. S. M. M. Rahman et al. Blockchain in the banking indus-

try: Unravelling thematic drivers and proposing a technological

framework through systematic review with bibliographic net-

work mapping. IET Blockchain, 5(1):e12093, 2025. doi:

10.1049/blc2.12093.

19. A. Ali. Decentralized nance (de) and its impact on tradi-

tional banking systems: Opportunities, challenges, and future

directions. Social Science Research Network, August 2024.

SSRN ID: 4942313. doi:10.2139/ssrn.4942313.

20. A. Alamsyah, G. N. W. Kusuma, and D. P. Ramadhani. A

review on decentralized nance ecosystems. Future Internet,

16(3):76, March 2024. doi:10.3390/fi16030076.

21. D. Tapscott and A. Tapscott. Blockchain Revolution: How the

Technology Behind Bitcoin Is Changing Money, Business, and

the World. Penguin Publishing Group, 2016.

22. T. Ahram, A. Sargolzaei, S. Sargolzaei, J. Daniels, and B. Am-

aba. Blockchain technology innovations. In 2017 IEEE

Technology & Engineering Management Conference (TEM-

SCON), pages 137–141, June 2017. doi:10.1109/TEMSCON.

2017.7998367.

23. V. K. Vemuri. Blockchain: a practical guide to developing busi-

ness, law, and technology solutions. Journal of Information

Technology Case and Application Research, 2018. Accessed:

Jun. 18, 2025. doi:10.1080/15228053.2019.1588546.

24. R. Zhang, R. Xue, and L. Liu. Security and privacy on

blockchain. ACM Computing Surveys, 52(3):1–34, May 2020.

doi:10.1145/3316481.

25. E. O. Manu A. D. Bello A. O. Leo C. E. Ukatu A. A. Bello, D.

A. Oduro and N. Okika. Enhancing know your customer (kyc)

and anti-money laundering (aml) compliance using blockchain:

A business analysis approach. Iconic Research And Engineering

Journals, 8(9):297–305, 2025. [Online]. URL:

https://www.irejournals.com/paper-details/1707440.

26. Y. Guo and C. Liang. Blockchain application and outlook in

the banking industry. Financial Innovation, 2(1):24, December

2016. doi:10.1186/s40854-016-0034-9.

27. T.-G. Budisteanu. Blockchain and the banking sector: Benefits,

challenges and perspectives. Journal of Service Science (JSS),

13(03):288–300, 2025. doi:10.4236/jss.2025.133019.

28. R. Mishra, R. K. Singh, S. Kumar, S. K. Mangla, and V. Ku-

mar. Critical success factors of blockchain technology adoption

for sustainable and resilient operations in the banking industry

during an uncertain business environment. Electronic Com-

merce Research, 25(1):595–629, February 2025.

doi:10.1007/s10660-023-09707-3.

29. H. Taherdoost. A critical review of blockchain acceptance

models—blockchain technology adoption frameworks and ap-

plications. Computers, 11(2):24, February 2022. doi:10.3390/

computers11020024.

30. J. Paul and A. R. Criado. The art of writing literature review:

What do we know and what do we need to know? International

Business Review, 29(4):101717, Aug 2020. doi:10.1016/j.

ibusrev.2020.101717.

31. J. Pritchard. Statistical-bibliography or bibliometrics? Journal

of Documentation, 25(4):348–349, 1969.

32. L. Haggarty. What is content analysis? Medical Teacher,

18(2):99–101, Jan 1996. doi:10.3109/01421599609034141.

33. Y. Feng, Q. Zhu, and K.-H. Lai. Corporate social responsibility

for supply chain management: A literature review and biblio-

metric analysis. Journal of Cleaner Production, 158:296–307,

Aug 2017. doi:10.1016/j.jclepro.2017.05.018.

34. A. Tandon, P. Kaur, M. Mäntymäki, and A. Dhir.

Blockchain applications in management: A bibliometric analy-

sis and literature review. Technological Forecasting and Social

Change, 166:120649, May 2021. doi:10.1016/j.techfore.

2021.120649.

35. M. M. Alshater, M. Joshipura, R. E. Khoury, and N. Nasral-

lah. Initial coin oerings: a hybrid empirical review. Small

Business Economics, 61(3):891–908, Oct 2023. doi:10.1007/

s11187-022- 00726-2.

36. S. M. M. Rahman, K. J. Yii, E. K. Masli, and M. L.

Voon. The blockchain in the banking industry: a systematic

review and bibliometric analysis. Cogent Business & Manage-

ment, 11(1):2407681, Dec 2024.

doi:10.1080/23311975.2024.

2407681.

37. R. Patel, M. Migliavacca, and M. E. Oriani. Blockchain

in banking and nance: A bibliometric review. Research in

International Business and Finance, 62:101718, Dec 2022.

doi:10.1016/j.ribaf.2022.101718.

38. N. Donthu, S. Kumar, D. Mukherjee, N. Pandey, and W. M.

Lim. How to conduct a bibliometric analysis: An overview and

guidelines. Journal of Business Research, 133:285–296, Sep 2021. doi:10.1016/j.jbusres.2021.04.070.

39. M. J. Page et al. The PRISMA 2020 statement: an updated

guideline for reporting systematic reviews. Revista Española de

Cardiología (English Edition), 74(9):790–799, Sep 2021. doi:

10.1016/j.rec.2021.07.010.

40. N. J. van Eck and L. Waltman. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2):523–538, Aug 2010. doi:10.1007/s11192-009-0146-3.

41. N. J. van Eck and L. Waltman. Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics, 111(2):1053–1070, May 2017. doi:10.1007/s11192-017-2300-7.

42. A. M. Alawag, W. S. Alaloul, B. N. Saleh Al-dhawi, A. O.

Baarimah, M. A. Bazel, and A. W. Mushtaha. A review and

bibliometric analysis of blockchain adoption within the context

of smart construction projects. In 2024 ASU International

Conference in Emerging Technologies for Sustainability and

Intelligent Systems (ICETSIS), pages 805–811, Jan 2024. doi:

10.1109/ICETSIS61505.2024.10459703.

43. Öguzhan Öztürk, Rıdvan Kocaman, and Dominik K. Kanbach.

How to design bibliometric research. Review of Managerial Science, 18:3333–3361, 2024. doi:10.1007/s11846-024-00738-0.

44. Rahul Kumar. Bibliometric analysis: comprehensive insights

into tools, techniques, applications, and solutions for re-

search excellence. Spectrum of Engineering and Management

Sciences, 3(1):45–62, 2025.

doi:10.31181/sems31202535k.

45. M. J. Cobo, A. G. López-Herrera, E. Herrera-Viedma, and F. Herrera. Science mapping

software tools: Review, analysis, and cooperative study among

tools. Journal of the American Society for Information Science

and Technology, 62(7):1382–1402, May 2011. doi:10.1002/

asi.21525.

46. Chaomei Chen. Science mapping: A systematic review of the

literature. Journal of Data and Information Science, 2(2), May

2017. doi:10.1515/jdis-2017-0006.

47. M. Callon, J.-P. Courtial, W. A. Turner, and S. Bauin. From

translations to problematic networks: An introduction to co-

word analysis. Social Science Information, 22(2):191–235, Mar

1983. doi:10.1177/053901883022002003.

48. M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14(1):10–25, 1963. doi:10.1002/asi.5090140103.

49. V. Braun and V. Clarke. Using thematic analysis in psychol-

ogy. Qualitative Research in Psychology, 3(2):77–101, Jan

2006. doi:10.1191/1478088706qp063oa.

50. M. Cheng, D. Edwards, S. Darcy, and K. Redfern. A

tri-method approach to a review of adventure tourism liter-

ature: Bibliometric analysis, content analysis, and a quanti-

tative systematic literature review. Journal of Hospitality &

Tourism Research, 42(6):997–1020, Aug 2018. doi:10.1177/

1096348016640588.

51. M. D. Moon. Triangulation: A method to increase validity,

reliability, and legitimation in clinical research. Journal of

Emergency Nursing, 45(1):103–105, Jan 2019. doi:10.1016/

j.jen.2018.11.004.

52. W. M. Lim, T. Rasul, S. Kumar, and M. Ala. Past, present, and

future of customer engagement. Journal of Business Research,

140:439–458, Feb 2022.

doi:10.1016/j.jbusres.2021.11.014.

53. D. W. Aksnes, L. Langfeldt, and P. Wouters. Citations,

citation indicators, and research quality: An overview of ba-

sic concepts and theories. SAGE Open, 9(1), 2019. doi:

10.1177/2158244019829575.

54. A. V. Thakor. Fintech and banking: What do we know?

Journal of Financial Intermediation, 41:100833, Jan 2020. doi:10.1016/j.jfi.2019.100833.

55. J. Dai and M. A. Vasarhelyi. Toward blockchain-based account-

ing and assurance. Journal of Information Systems, 31(3):5,

2017.

56. G. Peters and E. Panayi. Understanding modern banking

ledgers through blockchain technologies: Future of transaction

processing and smart contracts on the internet of money. So-

cial Science Research Network, Nov 2015. SSRN ID: 2692487.

doi:10.2139/ssrn.2692487.

57. S. Schuetz and V. Venkatesh. Blockchain, adoption, and financial inclusion in India: Research opportunities. International

Journal of Information Management, 52:101936, Jun 2020.

doi:10.1016/j.ijinfomgt.2019.04.009.

58. D. Minoli and B. Occhiogrosso. Blockchain mechanisms for

IoT security. Internet of Things, 1–2:1–13, Sep 2018. doi:

10.1016/j.iot.2018.05.002.

59. D. A. Zetzsche, D. W. Arner, and R. P. Buckley. Decentralized

nance (de), 2020. doi:10.2139/ssrn.3539194.

60. P. Garg, B. Gupta, A. K. Chauhan, U. Sivarajah, S. Gupta,

and S. Modgil. Measuring the perceived benefits of implementing blockchain technology in the banking sector. Technological Forecasting and Social Change, 163:120407, Feb 2021. doi:10.1016/j.techfore.2020.120407.

61. S. Ahluwalia, R. V. Mahto, and M. Guerrero. Blockchain technology and startup financing: A transaction cost economics perspective. Technological Forecasting and Social Change, 151:119854, Feb 2020. doi:10.1016/j.techfore.2019.119854.

62. M. Javaid, A. Haleem, R. P. Singh, R. Suman, and S. Khan.

A review of blockchain technology applications for financial

services. BenchCouncil Transactions on Benchmarks, Stan-

dards and Evaluations, 2(3):100073, Jul 2022. doi:10.1016/j.

tbench.2022.100073.

63. Katy Börner, Chaomei Chen, and Kevin W. Boyack. Visu-

alizing knowledge domains. Annual Review of Information

Science and Technology, 37(1):179–255, 2003. doi:10.1002/

aris.1440370106.

64. Katy Börner, T. N. Theriault, and Kevin W. Boyack. Mapping

science introduction: Past, present and future. Bulletin of the

Association for Information Science and Technology, 41(2):12–16, 2015. doi:10.1002/bult.2015.1720410205.

65. W. Glänzel and A. Schubert. Analysing scientific networks through co-authorship. In Handbook of Quantitative Science and Technology Research, pages 257–276. Springer, 2005. doi:10.1007/1-4020-2755-9_12.

66. A. Isfandyari-Moghaddam, M. K. Saberi, S. Tahmasebi-Limoni, S. Mohammadian, and F. Naderbeigi. Global scientific collaboration: A social network analysis and data mining of the co-authorship networks. Journal of Information Science, 49(4):1126–1141, August 2023. doi:10.1177/01655515211040655.

67. T. Luukkonen, O. Persson, and G. Sivertsen. Understanding patterns of international scientific collaboration. Science,

Technology, & Human Values, 17(1):101–126, 1992.

68. Q. Gui, C. Liu, and D. Du. Globalization of science and

international scientic collaboration: A network perspective.

Geoforum, 105:1–12, October 2019. doi:10.1016/j.geoforum.

2019.06.017.

69. J. S. Katz and B. R. Martin. What is research collaboration? Research Policy, 26(1):1–18, March 1997. doi:10.1016/S0048-7333(96)00917-1.

70. C. Wagner and L. Leydesdorff. Mapping the network of global science: comparing international co-authorships from 1990 to 2000. International Journal of Technology and Globalisation, 1(2):185–208, 2005. doi:10.1504/IJTG.2005.007050.

71. H. Small. Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of

the American Society for Information Science, 24(4):265–269,

1973. doi:10.1002/asi.4630240406.

72. H. D. White and B. C. Griffith. Author cocitation: A literature measure of intellectual structure. Journal of the American

Society for Information Science, 32(3):163–171, 1981. doi:

10.1002/asi.4630320302.

73. D. Zhao and A. Strotmann. Intellectual structure of information science 2011–2020: an author co-citation analysis. Journal of Documentation, 78(3):728–744, 2021. doi:10.1108/JD-06-2021-0119.

74. K. W. McCain. Mapping authors in intellectual space: A

technical overview. Journal of the American Society for Information Science, 41(6):433–443, 1990. doi:10.1002/(SICI)1097-4571(199009)41:6<433::AID-ASI11>3.0.CO;2-Q.

75. M. Sedighi. Application of word co-occurrence analysis method in mapping of the scientific fields (case study: the field of informetrics). Library Review, 65(1/2):52–64, 2016. doi:10.1108/LR-07-2015-0075.

76. L. Leydesdorff and A. Nerghes. Co-word maps and topic modeling: A comparison using small and medium-sized corpora (n <

1,000). Journal of the Association for Information Science and

Technology, 68(4):1024–1035, 2017. doi:10.1002/asi.23740.

77. N. Cucari, V. Lagasio, G. Lia, and C. Torriero. The im-

pact of blockchain in banking processes: the interbank spunta

case study. Technology Analysis & Strategic Management,

2022. Accessed: Jun. 25, 2025. doi:10.1080/09537325.2021.

1891217.

78. M. Shoaib, M. K. Lim, and C. Wang. An integrated framework

to prioritize blockchain-based supply chain success factors. Industrial Management & Data Systems, 120(11):2103–2131, 2020. doi:10.1108/IMDS-04-2020-0194.

79. R. K. Jena. Examining the factors affecting the adoption of blockchain technology in the banking sector: An extended UTAUT

model. International Journal of Financial Studies, 10(4):90,

2022. doi:10.3390/ijfs10040090.

80. L. Mishra and V. Kaushik. Application of blockchain in dealing with sustainability issues and challenges of financial sector. Journal of Sustainable Finance & Investment, 13(3):1318–1333,

2023. doi:10.1080/20430795.2021.1940805.

81. P. Schueel. De: Decentralized nance - an introduction and

overview. Journal of Innovation Management, 9:i, Dec 2021.

doi:10.24840/2183-0606_009.003_0001.

82. C. Dierksmeier and P. Seele. Cryptocurrencies and business

ethics. Journal of Business Ethics, 152(1):1–14, Sep 2018.

doi:10.1007/s10551-016-3298-0.

83. M. Fauzi and N. Paiman. Bitcoin and cryptocurrency: Chal-

lenges, opportunities and future works. Journal of Asian

Finance Economics and Business, 7:695–704, Aug 2020. doi:

10.13106/jafeb.2020.vol7.no8.695.

84. J. Campino, A. Brochado, and Á. Rosa. Initial coin offerings (ICOs): Why do they succeed? Financial Innovation, 8(1):17, Jan 2022. doi:10.1186/s40854-021-00317-2.

85. T. MacDonald, D. W. E. Allen, and J. Potts. Blockchains

and the boundaries of self-organized economies: Predictions for

the future of banking. Social Science Research Network, Mar

2016. SSRN working paper No. 2749514. doi:10.2139/ssrn.

2749514.

86. Alex Tapscott and Don Tapscott. How blockchain is changing finance. Harvard Business Review, Mar. 1 2017. [Online]. URL: https://hbr.org/2017/03/how-blockchain-is-changing-finance.

87. Raphael Auer and David Tercero-Lucas. Distrust or speculation? The socioeconomic drivers of U.S. cryptocurrency investments. Journal of Financial Stability, 62:101066, 2022. doi:10.1016/j.jfs.2022.101066.

88. Ravi Gupta, V. K. Shukla, S. S. Rao, Shahbaz Anwar, Pooja

Sharma, and Ritu Bathla. Enhancing privacy through ‘smart

contract’ using blockchain-based dynamic access control. In

2020 International Conference on Computation, Automation

and Knowledge Management (ICCAKM), pages 338–343, 2020.

doi:10.1109/ICCAKM46823.2020.9051521.

89. Amin Naimi-Sadigh, Tooraj Asgari, and Majid Rabiei. Digital

transformation in the value chain disruption of banking ser-

vices. Journal of the Knowledge Economy, 13(2):1212–1242,

2022. doi:10.1007/s13132-021-00759-0.

90. Husam Rjoub, Temitope S. Adebayo, and Dervis Kirikkaleli. Blockchain technology-based fintech banking sector involvement using adaptive neuro-fuzzy-based k-nearest neighbors algorithm. Financial Innovation, 9(1):65, 2023. doi:10.1186/s40854-023-00469-3.

91. Vincent Schlatt, Jonas Sedlmeir, Sven Feulner, and Nils Urbach. Designing a framework for digital KYC processes built

on blockchain-based self-sovereign identity. Information &

Management, 59(7):103553, 2022.

doi:10.1016/j.im.2021.

103553.

92. Syed Umar Rehman et al. Fintech adoption in SMEs and bank credit supplies: A study on manufacturing SMEs. Economies,

11(8):213, 2023. doi:10.3390/economies11080213.

93. Qian Gan and Raymond Y. K. Lau. Trust in a ‘trust-free’

system: Blockchain acceptance in the banking and finance sector. Technological Forecasting and Social Change, 199:123050,

2024. doi:10.1016/j.techfore.2023.123050.

94. T. Saheb and F. H. Mamaghani. Exploring the barriers and

organizational values of blockchain adoption in the banking

industry. The Journal of High Technology Management Re-

search, 32(2):100417, November 2021. doi:10.1016/j.hitech.

2021.100417.

95. J. G. Umarovich and R. K. Bakhtiyorovich. Modeling the

decision-making process of lenders based on blockchain tech-

nology. In 2021 International Conference on Information Sci-

ence and Communications Technologies (ICISCT), pages 1–5,

November 2021. doi:10.1109/ICISCT52966.2021.9670211.

96. D. Kimani, K. Adams, R. Attah-Boakye, S. Ullah, J. Frecknall-

Hughes, and J. Kim. Blockchain, business and the fourth

industrial revolution: Whence, whither, wherefore and how?

Technological Forecasting and Social Change, 161:120254,

December 2020. doi:10.1016/j.techfore.2020.120254.

97. P. Cuccuru. Beyond bitcoin: an early overview on smart

contracts. International Journal of Law and Information Tech-

nology, 25(3):179–195, September 2017. doi:10.1093/ijlit/

eax003.

98. C. Albrecht, K. M. Duffin, S. Hawkins, and V. M. M. Rocha.

The use of cryptocurrencies in the money laundering process.

Journal of Money Laundering Control, 22(2):210–216, May

2019. doi:10.1108/JMLC-12-2017-0074.

99. K.-K. R. Choo, S. Ozcan, A. Dehghantanha, and R. M. Parizi.

Editorial: Blockchain ecosystem—technological and management opportunities and challenges: Part II. IEEE Transactions

on Engineering Management, 69(3):773–775, June 2022. doi:

10.1109/TEM.2022.3147274.

100. S. Dashottar and V. Srivastava. Corporate banking—risk management, regulatory and reporting framework in India: a blockchain application-based approach. Journal of Banking Regulation, 22(1):39–51, March 2021. doi:10.1057/s41261-020-00127-z.

101. Jianguo Xu. Developments and implications of central bank

digital currency: The case of China e-CNY. Asian Economic

Policy Review, 17(2):235–250, 2022. doi:10.1111/aepr.12396.

102. D. Mhlanga. Block chain technology for digital financial inclusion in the industry 4.0, towards sustainable development?

Frontiers in Blockchain, 6, February 2023. doi:10.3389/

fbloc.2023.1035405.

103. M. Osmani, R. El-Haddadeh, N. Hindi, M. Janssen, and

V. Weerakkody. Blockchain for next generation services

in banking and nance: cost, benet, risk and opportunity

analysis. Journal of Enterprise Information Management,

34(3):884–899, Jun 2020. doi:10.1108/JEIM-02-2020-0044.

104. R. Weerawarna, S. Miah, and X. Shao. Emerging advances of

blockchain technology in nance: a content analysis. Personal

and Ubiquitous Computing, 27:1–14, Feb 2023. doi:10.1007/s00779-023-01712-5.

105. M. Khalil, K. F. Khawaja, and M. Sarfraz. The adoption of blockchain technology in the financial sector during the era of fourth industrial revolution: a moderated mediated model. Quality & Quantity, 56(4):2435–2452, Aug 2022. doi:10.1007/s11135-021-01229-0.

106. H. O. Mbaidin, M. A. K. Alsmairat, and R. Al-Adaileh.

Blockchain adoption for sustainable development in developing

countries: Challenges and opportunities in the banking sector.

International Journal of Information Management Data In-

sights, 3(2):100199, Nov 2023.

doi:10.1016/j.jjimei.2023.

100199.

107. Luca Rella. Blockchain technologies and remittances: From

financial inclusion to correspondent banking. Frontiers in

Blockchain, 2, 2019. doi:10.3389/fbloc.2019.00014.

108. Zhuo Chen, Yanhui Li, Yujie Wu, and Jie Luo. The transition from traditional banking to mobile internet finance: an organizational innovation perspective - a comparative study of Citibank and ICBC. Financial Innovation, 3(1):12, 2017. doi:10.1186/s40854-017-0062-0.