
Fabrizio Lombardi
Northeastern University · Electrical and Energy Engineering
Active 1982–2026
About
Fabrizio Lombardi is the ITC Endowed Professor of Electrical and Computer Engineering at Northeastern University, Boston. He graduated in 1977 from the University of Essex with a B.Sc. (Hons.) in Electronic Engineering and subsequently joined the Microwave Research Unit at University College London, where he earned a Master's degree in Microwaves and Modern Optics, a Diploma in Microwave Engineering, and a Ph.D. from the University of London. His research focuses on fault-tolerant computing, VLSI CAD, testing, configurable computing, distributed systems, quantum and nano computing, ATE systems, and defect tolerance. Lombardi served as Chair of the Department of Electrical and Computer Engineering at Northeastern University from 1998 to 2004 and has been involved in organizing numerous international symposia and conferences. He is an IEEE Fellow, has served as Editor-in-Chief of IEEE Transactions on Nanotechnology, and has held leadership positions within IEEE societies. His publication record and editorial activities cover digital systems testing, nanotechnology, and emerging computing paradigms.
Research topics
- Computer Science
- Electronic engineering
- Engineering
- Algorithm
- Machine Learning
- Electrical engineering
- Computer hardware
- Parallel computing
- Arithmetic
- Artificial Intelligence
- Mathematics
- Computer engineering
- Embedded system
- Computational science
- Physics
- Programming language
- Computer architecture
- Reliability engineering
Selected publications
Arithmetic Circuits in Unipolar Format for Stochastic Computing (SC)
IEEE Open Journal of Nanotechnology · 2026-01-01
article · Open access · Senior author
Stochastic Computing (SC) is a computational technique that executes arithmetic operations on random bitstreams. Traditionally, stochastic arithmetic circuits are derived from probability equations; however, not all arithmetic operations can be generated directly with this technique. This paper introduces a sequential processing technique to design SC arithmetic circuits in unipolar encoding format. The proposed technique processes each bit of data in the input bitstream by considering the previous information. The output bitstream can then easily be generated. An SC adder, an SC subtractor, an SC comparator, an SC absolute subtractor, an SC multiplier by integer, and an SC divider by integer are proposed. The proposed technique provides many advantages over the traditional methods, such as ease of design, flexibility, fewer limitations when generating arithmetic functions, and efficient circuits. Moreover, the value of the output signal can be read without the need to multiply the result. The impact of each parameter in the proposed SC arithmetic circuits is studied to assess the accuracy of the proposed SC circuit; then, a comparison between the proposed and other SC circuits is presented. The results in this paper show that the proposed designs are significantly better than other SC arithmetic circuits in terms of accuracy, correlation between input signals to the output data, and cross-correlation to input signals, area, delay, power dissipation, and power delay product (PDP). For example, in the proposed SC adder circuit, its accuracy is improved by 64.67% while its delay, power dissipation, area, and PDP are improved by 45.79%, 85.86%, 85.60%, and 92.33% respectively.
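For readers unfamiliar with unipolar stochastic computing, the following is a minimal sketch of the conventional, probability-equation style of SC arithmetic that the abstract contrasts itself with. It is not the paper's sequential technique. All function names are hypothetical, and the stream length and seed are arbitrary choices for illustration.

```python
import random

def encode_unipolar(p, length, rng):
    """Return a bitstream whose fraction of 1s approximates p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode_unipolar(bits):
    """Estimate the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Conventional SC multiplier: bitwise AND of two independent streams."""
    return [a & b for a, b in zip(a_bits, b_bits)]

def sc_scaled_add(a_bits, b_bits, select_bits):
    """Conventional SC adder: a multiplexer driven by a p = 0.5 select stream.

    The output encodes (a + b) / 2, not a + b.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]

rng = random.Random(0)      # fixed seed, chosen only for repeatability
n = 4096                    # stream length, an arbitrary assumption
a = encode_unipolar(0.5, n, rng)
b = encode_unipolar(0.25, n, rng)
sel = encode_unipolar(0.5, n, rng)

product = decode_unipolar(sc_multiply(a, b))            # approximately 0.125
half_sum = decode_unipolar(sc_scaled_add(a, b, sel))    # approximately 0.375
```

The multiplexer adder returns a scaled sum, so its output has to be multiplied by 2 to recover a + b. That is an example of the kind of rescaling step the abstract says its circuits avoid.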
Reliable modular designs under time-continuous input data
Microelectronics Reliability · 2025-01-04 · 1 citation
article · Senior author · Corresponding
Mixed-Precision Floating-Point Formats and Arithmetic for Highly Accurate Artificial Neural Networks
2025-12-16
article · Open access · Senior author
Recent developments in artificial neural networks (ANNs) have resulted in applications requiring the integration of power and computational efficiency without compromising model accuracy. The data format and its precision play an important role in the final performance of complex ANN processing. As a compromise between hardware metrics and numerical accuracy, the so-called mixed-precision technique for floating-point (FP) calculations has been proposed. Since multiply-accumulation (MAC) is the most important operation, fused multiply-add (FMA) units designed for mixed-precision calculation can offer solutions by dynamically adjusting precision according to the operation's requirements, thus supporting both accuracy and efficiency. In this work, designs for several computational operations used during inference and training of ANNs are proposed; these include an FMA-based MAC unit that supports neuron computation with mixed precision, and an efficient normalization unit based on novel FP division and square root schemes. Evaluation results verify the benefits of the proposed design in terms of computational accuracy and operational latency. Also, several metrics are compared for different popular FP precision formats to show that the final accuracy can be improved by choosing the proper precision.
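The benefit of accumulating low-precision products in a wider format can be seen in a small software experiment. The sketch below is an illustration only, not the paper's MAC unit. It emulates half- and single-precision rounding with Python's `struct` module, and the helper names are hypothetical.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_half_accumulator(xs, ws):
    """Dot product with half-precision inputs, products, and accumulator."""
    acc = 0.0
    for x, w in zip(xs, ws):
        prod = to_half(to_half(x) * to_half(w))
        acc = to_half(acc + prod)
    return acc

def dot_mixed(xs, ws):
    """Half-precision inputs, single-precision accumulator.

    The product of two halves is exact in double precision, so the only
    rounding per step is the one into the single-precision accumulator.
    """
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = to_single(acc + to_half(x) * to_half(w))
    return acc

ones = [1.0] * 4096
low = dot_half_accumulator(ones, ones)   # stalls at 2048.0
mixed = dot_mixed(ones, ones)            # 4096.0
```

Above 2048 the spacing between adjacent half-precision values is 2, so adding 1 to a half-precision accumulator no longer changes it. The wider accumulator does not hit this limit for the same input.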
Adaptive Separately Constrained Triplet Loss (A-SCTL) for High-Performance Triplet Networks
IEEE Transactions on Nanotechnology · 2025-01-01
article · Open access · Senior author
Triplet Networks (TNs) consist of three subchannels and are widely utilized in machine learning applications. The efficacy of TNs is highly dependent on the loss function employed during training. This paper proposes a novel loss function for TNs, referred to as the Adaptive Separately Constrained Triplet Loss (A-SCTL). The unique feature of A-SCTL is the separation of intra-class and inter-class constraints, strictly adhering to the objective of similarity-measuring networks. Its adaptive strategy leverages the dynamics between inter-class and intra-class terms to achieve a balanced convergence; without manually adjusting hyperparameters, it enhances flexibility and facilitates adaptation across various applications. Moreover, A-SCTL mitigates possible false solutions and offers insights into network behavior through the dependency of the two constraint terms. Performance metrics of the loss functions are evaluated in deep metric learning classification and face recognition tasks. Simulations illustrate the evolution of the two loss terms and the adaptive hyperparameter across training epochs; the results demonstrate that TNs utilizing A-SCTL outperform other existing loss functions in accuracy. Additionally, this paper details the hardware implementation of A-SCTL and evaluates its associated overhead. Results show that compared to other losses, the additional hardware overhead required for A-SCTL is negligible (0.008% energy per operation) when considering the entire TN system.
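As background, the sketch below shows one common form of the standard margin-based triplet loss, the kind of baseline that A-SCTL is compared against. The A-SCTL formulation itself is not given in the abstract and is not reproduced here. The function names and the default margin are assumptions for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    A single hinge couples the intra-class distance d(a, p) and the
    inter-class distance d(a, n) in one term.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)

# Easy triplet: the negative is far away, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Hard triplet: the negative is as close as the positive, so the loss equals the margin.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

Because only the difference of the two distances is penalized here, the loss can be zero even when same-class embeddings are far apart. The abstract describes A-SCTL as constraining the two terms separately.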
Approximate Memory Protection Against Double-Adjacent Bit Errors with Low Redundancy SEC-DAEC Codes
2025-10-21
article · Senior author
Error protection schemes (e.g., Error Correction Codes (ECCs)) must be implemented for memories in safety-critical applications to guarantee a dependable system operation. However, for hardware-constrained systems at the nanoscale, the major challenge of employing ECCs is to meet the requirements of both dependability and hardware efficiency. A solution is to further consider the specific protection requirements of different systems, rather than developing only generic ECCs. For example, for some applications like machine learning, data tends to have different importance and only errors on some critical bits can degrade performance. In this paper, an approximate protection scheme using codes derived from original Single Error Correction-Double Adjacent Error Detection (SEC-DAED) codes is proposed. The proposed codes expand the correction capability for several critical bits under DAEs (the most frequent error pattern for memories with advanced technology), while keeping the number of parity bits and decoding complexity unchanged. The design of such codes is established to achieve the maximum number of critical bits that can be protected; examples for protecting 16- and 32-bit memories are provided to verify the theoretically supported number of critical bits (6 and 20 bits respectively). Moreover, compared to existing ECCs with similar protection capability, the proposed codes incur the smallest memory redundancy and lower (or similar) encoder/decoder overhead as per synthesis results. Finally, a case study for protecting the parameter memory of neural networks is presented; results show that although protecting only some critical bits under DAEs, the model with the proposed scheme achieves the same accuracy as in the error-free case.
A Configurable Floating-Point Fused Multiply-Add Design With Mixed Precision for AI Accelerators
IEEE Transactions on Circuits and Systems for Artificial Intelligence · 2025-05-13 · 1 citation
article · Senior author
Hardware accelerators for deep learning in artificial intelligence applications must often meet stringent constraints for accuracy and throughput. In addition to architecture/algorithm improvements, high-performance computational techniques such as mixed precision are also required. In this paper, a floating-point (FP) fused multiply-add (FMA) unit supporting mixed/multiple precision is proposed. A wide range of conventional FP formats (such as half and single) as well as emerging formats (including E4M3, E5M2, DLFloat, BFloat16 and TF32) are supported in the proposed design. In addition to all these formats, the proposed design is flexible in manipulating the exponent and mantissa lengths for 8, 16 and 32-bit FP numbers based on the needs of an application. The proposed FMA can be configured to support either multiple normal FMA operations, or alternatively mixed precision in ASIC. It is fully pipelined, and in each cycle the input bit streams are processed based on the provided configuration, independently of the previous cycles. For normal FMA operations, the proposed design utilizes sharing of resources to parallelize multiple operations based on the available hardware and required precision. For mixed precision, the FMA accumulates the lower-precision dot products into higher precision to avoid overflow/underflow. It improves computational accuracy by adding all possible dot products at the same time while decreasing the number of rounding operations to prevent rounding errors. An innovative method to accumulate the dot products and the aligned addend is also proposed. By considering tradeoffs between reusing the available hardware and removing unnecessary complex units, a more efficient and flexible design is attained in terms of hardware metrics and the range of supported precisions compared to other designs found in the technical literature. Extensive simulation results for comparative analysis are provided.
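The idea of configurable exponent and mantissa lengths can be illustrated in software with a generic decoder. This is a minimal sketch assuming IEEE-style conventions (one sign bit, a bias of 2^(e-1) - 1, subnormals when the exponent field is zero). It deliberately ignores the special encodings (infinities and NaNs), which differ between formats. The function name is hypothetical and the sketch is unrelated to the paper's hardware design.

```python
def decode_minifloat(bits, exp_bits, man_bits):
    """Decode an IEEE-style floating-point bit pattern with configurable widths.

    Handles normal and subnormal values only; infinities and NaNs are not
    modeled, because their encodings vary from format to format.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    fraction = mantissa / (1 << man_bits)
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)

# The same byte means different values under different field splits.
as_e4m3 = decode_minifloat(0x38, exp_bits=4, man_bits=3)      # 1.0
as_e5m2 = decode_minifloat(0x38, exp_bits=5, man_bits=2)      # 0.5
as_half = decode_minifloat(0x3C00, exp_bits=5, man_bits=10)   # 1.0
as_bf16 = decode_minifloat(0x3F80, exp_bits=8, man_bits=7)    # 1.0
```

Moving one bit from the mantissa to the exponent trades precision for dynamic range, which is why the byte `0x38` decodes to 1.0 as E4M3 but to 0.5 as E5M2.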
Can ChatGPT Learn to Count Letters?
Computer · 2025-02-20 · 4 citations
article · Open access · Senior author
Large language models (LLMs) struggle on simple tasks such as counting the number of occurrences of a letter in a word. In this paper, we investigate if ChatGPT can learn to count letters and propose an efficient solution.
Low-Power Multiplier Designs by Leveraging Correlations of 2×2 Encoded Partial Products
IEEE Transactions on Computers · 2025-09-02
article
Multipliers, particularly those with small bit widths, are essential for modern neural network (NN) applications. In addition, multiple-precision multipliers are in high demand for efficient NN accelerators; therefore, recursive multipliers used in low-precision fusion schemes are gaining increasing attention. In this work, we design exact recursive multipliers based on customized approximate full adders (AFAs) for low-power purposes. Initially, the partial products (PPs) encoded by 2×2 multiplications are analyzed, which reveals the correlations among adjacent PPs. Based on these correlations, we propose 4×4 recursive multiplier architectures where certain full adders (FAs) can be simplified without affecting the correctness of the multiplication. Manual and synthesis-tool-based FA simplifications are performed separately. The obtained 4×4 multipliers are then used to construct 8×8 multipliers based on a low-power recursive architecture. Finally, the proposed signed and unsigned 4×4 and 8×8 multipliers are evaluated using a 28nm CMOS technology. Compared with DesignWare (DW) multipliers, the proposed signed and unsigned 4×4 multipliers achieve power reductions of 16.5% and 11.6%, respectively, without compromising area or delay; alternatively, the delay can be reduced by 20.9% and 39.4%, respectively, without compromising power or area. For signed and unsigned 8×8 multipliers, the maximum power reductions are 9.7% and 13.7%, respectively, albeit with a trade-off in area.
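The recursive structure mentioned in the abstract, in which 4×4 multipliers are built from 2×2 products and 8×8 multipliers from 4×4 ones, can be sketched arithmetically as follows. This is an unsigned, behavioral illustration only. It does not model the paper's adder simplifications or signed variants, and the function names are hypothetical.

```python
def mul2x2(a, b):
    """2x2 unsigned building block: inputs in 0..3, product in 0..9."""
    assert 0 <= a < 4 and 0 <= b < 4
    return a * b

def mul4x4(a, b):
    """4x4 unsigned multiply assembled from four 2x2 partial products."""
    a_hi, a_lo = a >> 2, a & 0b11
    b_hi, b_lo = b >> 2, b & 0b11
    return ((mul2x2(a_hi, b_hi) << 4)
            + ((mul2x2(a_hi, b_lo) + mul2x2(a_lo, b_hi)) << 2)
            + mul2x2(a_lo, b_lo))

def mul8x8(a, b):
    """8x8 unsigned multiply assembled from four 4x4 multiplies."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((mul4x4(a_hi, b_hi) << 8)
            + ((mul4x4(a_hi, b_lo) + mul4x4(a_lo, b_hi)) << 4)
            + mul4x4(a_lo, b_lo))

# Exhaustive check against the built-in multiplication.
ok_4x4 = all(mul4x4(a, b) == a * b for a in range(16) for b in range(16))
ok_8x8 = all(mul8x8(a, b) == a * b for a in range(256) for b in range(256))
```

In hardware, the shifted partial products are summed by an adder tree. According to the abstract, correlations among adjacent 2×2 partial products allow some of those full adders to be simplified while the result stays exact.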
On the Dependable Operation of Key-Value Caches in Large Language Models (LLMs)
2025-03-28 · 1 citations
preprint · Open access · Senior author
The use of Transformer-based architectures has triggered the fast development of large language models (LLMs); LLMs achieve unprecedented performance for a wide range of natural language processing tasks. The Attention mechanism in LLMs is computationally intensive, so most LLMs choose to cache the Key and Value vectors of existing tokens to achieve a tradeoff between computational complexity and additional memory; this technique is generally known as KV cache. The size of this cache can be even larger than the memory storing the parameters, so, because dependable operation is a requirement in many applications, it is very important to assess its performance in the presence of soft errors in the memory. To the best of the authors' knowledge, the impact of soft errors on the memory for the KV cache has not been previously studied. In this paper, the impact of bit-flip memory errors on the KV caches (with half-precision floating-point values) of two widely used LLMs (Mistral-7B and LlaMA2-7B) is evaluated based on error injection simulation. The results show that the first two exponent bits of the cache values are critical for LLM dependability, and errors on the prefilling stage tend to have a more severe impact than those on the decoding stage.
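To give a sense of why the high exponent bits of a half-precision value matter, the sketch below flips individual bits of a single stored number. It is an illustration under simple assumptions (one value, one flipped bit) and does not reproduce the paper's error injection framework. The function name is hypothetical.

```python
import struct

def flip_bit_half(value, bit):
    """Flip one bit of a half-precision value and return the corrupted value.

    Bit 15 is the sign, bits 14..10 are the exponent (14 is the most
    significant), and bits 9..0 are the mantissa.
    """
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    (corrupted,) = struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))
    return corrupted

original = 0.5
top_exponent = flip_bit_half(original, 14)      # 32768.0
second_exponent = flip_bit_half(original, 13)   # 0.001953125
low_mantissa = flip_bit_half(original, 0)       # 0.50048828125
sign_flip = flip_bit_half(original, 15)         # -0.5
```

A flip in the lowest mantissa bit perturbs the value by about 0.1%, while a flip in the most significant exponent bit changes it by several orders of magnitude.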
IEEE Transactions on Computers · 2025-09-01
article · Open access · Senior author
Dependable operation of Large Language Models (LLMs) in the presence of hardware errors (caused, for example, by radiation) has become a pressing concern. At the same time, the scale and complexity of LLMs limit the overhead that can be added to detect errors. Therefore, there is a need for low-cost error detection schemes. Concurrent Error Detection (CED) uses the properties of a system to detect errors, so it is an appealing approach. In this paper, we present a new methodology and scheme for error detection in LLMs: Concurrent Linguistic Error Detection (CLED). Its main principle is that the output of LLMs should be valid and coherent text; therefore, when the text is not valid or differs significantly from normal text, it is likely that there is an error. Hence, errors can potentially be detected by checking the linguistic features of the text generated by LLMs. This has the following main advantages: 1) low overhead, as the checks are simple, and 2) general applicability regardless of the LLM implementation details, because text correctness is not related to the LLM algorithms or implementations. The proposed CLED has been evaluated on two LLMs: T5 and OPUS-MT. The results show that with a 1% overhead, CLED can detect more than 87% of the errors, making it suitable to improve LLM dependability at low cost.
Frequent coauthors
- Cecilia Metra · Los Alamitos Medical Center · 194 shared
- Thomas M. Conte · 146 shared
- Gianluca Setti · King Abdullah University of Science and Technology · 146 shared
- Shanshan Liu · SAIC-GM (China) · 142 shared
- Jill Gostin · American University of Beirut · 138 shared
- David S. Ebert · 132 shared
- Elizabeth Burd · 131 shared
- Stefano Zanero · Politecnico di Milano · 130 shared
Awards & honors
- IEEE/Engineering Foundation Research Initiation Award (1985/…
- Silver Quill Award from Motorola-Austin (1996)
- Outstanding Engineering Research Award at Northeastern Unive…
- International Research Award from the Ministry of Science an…
- Fellow, Institute of Electrical and Electronics Engineers