Mikhail Dorojevets

Associate Professor · Verified

Stony Brook University · Electrical and Computer Engineering

Active 1995–2018

h-index: 15
Citations: 829
Papers: 48

About

Mikhail Dorojevets is an Associate Professor in the Department of Electrical and Computer Engineering at Stony Brook University. His research focuses on parallel computer architecture, high-performance systems design, and superconductor processors. His publications cover the design and testing of superconductor RSFQ and RQL arithmetic, storage, and filter circuits, as well as FPGA-based filters for deep packet inspection.

Research topics

  • Computer science
  • Parallel computing
  • Electrical engineering
  • Computer hardware
  • Embedded system

Selected publications

  • Energy-Efficient Superconductor Bloom Filters for Streaming Data Inspection

    IEEE Transactions on Dependable and Secure Computing · 2018-06-04 · 4 citations

    article · 1st author · Corresponding

    Bloom filters can be used in network intrusion detection systems to detect known attack signatures in packet payloads. In this paper we propose and analyze the potential application of superconductor flux quantum technology for streaming data inspection with Bloom filters designed with Reciprocal Quantum Logic (RQL). This paper describes the gate-level design, performance, and energy-efficiency analysis of three superconductor 2 Kbit Bloom filters with 1) the run-time selection of the number of hashes per stream, and 2) different numbers of input streams per Bloom filter. The Bloom filter circuits were designed using a bottom-up approach with manual placing and routing of basic RQL gates. The design complexity is below 97K Josephson junctions. The highest clock frequency reached in the simulation of the circuits is 14.7 GHz. The false positive rates of the RQL Bloom filters are in very close agreement with the theoretical expectations of the false positive probability for the filters. For the cryocooling efficiency of 0.1 percent, the RQL Bloom filters demonstrate high energy efficiency in the range of ~1.5-43.6 pJ/stream/operation at room temperature for stream lengths from 16 to 256 bits. All circuits are designed and simulated for the 248 nm MIT Lincoln Laboratory SFQ5ee fabrication process.

  • FPGA-based satisfiability filters for deep packet inspection

    2018-05-01 · 1 citation

    article · Senior author

    Satisfiability (SAT) filters have been recently proposed as a fast and storage-efficient way of implementing set membership operations. In this paper we discuss the application of random SAT filters with k hash functions (k-SAT filters) for detecting the potential presence of known malicious signatures (byte patterns) in packet payloads to prevent cyber attacks. We developed and verified the operation of an FPGA-based 3-SAT filter with 3 hash functions per signature. The hash functions are implemented with bit stream processing circuits using the CRC-32 polynomial. The 3-SAT filter with 1,024 variables has a single-instance architecture with 64 solutions for a set of 3,360 input test patterns extracted from the content fields of the known malicious signatures in the Snort intrusion detection system database. During a filter construction phase, the 64 “good” solutions with the maximum Hamming distance between them were selected from the 8,000 solutions found by a SAT solver. A Digilent Arty A7 with an Artix-7 FPGA was used to implement the filter design. The complete FPGA filter system operates at a 200 MHz clock rate and uses 720 Kbit of BRAM, 17,606 LUTs, and 20,296 flip-flops. The experimentally observed false positive rate for 50,000 randomly generated signatures of different lengths was ~1.6%. The 3-SAT FPGA design can be used to work with any set of signatures of interest with no need for changing and re-synthesizing VHDL code and reprogramming the entire FPGA. The results of this project allow for better understanding and planning of our next steps in the work on k-SAT filter applications for deep packet inspection.

  • Novel integration of Dimetheus and WalkSAT solvers for k-SAT filter construction

    2017-05-01 · 2 citations

    article · Senior author

    This paper describes a novel approach used to integrate two leading satisfiability (SAT) solvers, Dimetheus and WalkSAT, into a system to provide users with solver selection and solution customization capabilities. The two solvers are efficient for two different cases, have different execution procedures, and generate a single solution from a single command line. This integration provides the most efficient way to find multiple random solutions of any set membership problem from a common single command line. To build an effective k-SAT filter, multiple random solutions are essential. The integration also provides a unified solution output format rather than the two different output formats of the two solvers. The theoretical approach and a practical C program were developed and tested during the work in an ongoing project to build the world's first practical k-SAT filter for deep packet inspection in network intrusion detection systems at Stony Brook University.

  • Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor

    2015-03-10 · 1 citation

    article · 1st author · Corresponding

    The major objective of the project was to design and demonstrate operation of key components of a 30 GHz 16-bit RSFQ processor prototype implemented with the AIST/ISTEC 10 kA/cm² fabrication process. Our team has developed complete logical and physical designs of five RSFQ chips using the CONNECT cell library and RSFQ CAD tools developed at the Universities of Yokohama and Nagoya (Japan). The major results are the world's first successful design, fabrication, and demonstration of correct operation of a 20 GHz 8x8-bit parallel carry-save RSFQ multiplier with approximately 6K JJs, a 16-bit sparse-tree wave-pipelined RSFQ adder with approximately 10K JJs, and partial operation of an 8-bit ALU chip with approximately 9K JJs. The goal of the second phase of the project was to gain a detailed understanding of the performance, complexity, and energy efficiency of on-chip storage units implemented with superconductor Reciprocal Quantum Logic (RQL) using our RQL VHDL cell library tuned to the MIT Lincoln Laboratory 10 kA/cm² 248 nm process. The 8.5 GHz 1-4 Kbit 32-/64-bit multi-ported scratchpad memory, register files, write-through and write-back caches designed with RQL Non-Destructive Read-Out storage cells have the average energy consumption of 3.0-9.5 fJ/bit/operation at room temperature using the cryocooling efficiency of 0.1%.

  • Fast pipelined storage for high-performance energy-efficient computing with superconductor technology

    2015-10-01 · 12 citations

    article · 1st author · Corresponding

    New superconductor single flux quantum (SFQ) technology, such as Reciprocal Quantum Logic (RQL), is currently considered one of the promising candidates for high-performance energy-efficient computing. This paper presents our work on the design and detailed energy efficiency analysis of three types of 32- and 64-bit RQL multi-ported pipelined local storage structures (13 total), namely 1) random access memory (RAM) and register files, 2) direct-mapped write-through and write-back caches, and 3) first-in-first-out (FIFO) buffers. Our layout-aware cell-level design process uses a VHDL RQL cell library developed at the Ultra High Speed Computing Laboratory at Stony Brook University (SBU). The SBU VHDL RQL cell library specifies the dynamic and standby energy consumption, gate delays, the number of Josephson junctions (JJs) per cell, and approximate sizes of individual cells based on the parameters of the 248 nm 100 μA/μm² 10 Nb metal layer SFQ fabrication process currently under development at the MIT Lincoln Laboratory. Gate and wire delays as well as clock skew are taken into account during digital circuit simulation done with Mentor Graphics CAD tools. After completing a physical chip layout, the circuit models need to be updated and re-simulated to include the effects of parasitic inductances and actual wire lengths on signal propagation delays. To meet both performance and energy efficiency targets, the RQL storage structures were designed with RQL non-destructive read-out single-bit storage cells. We chose a relatively moderate clock frequency of 8.5 GHz for all storage units to keep their read latencies in the range of 1-3 cycles. The most complex design in terms of JJs is a triple-ported 4 Kbit 64x64-bit register file with 253,918 JJs and a read access latency of 338 ps. The highest energy consumption in terms of energy/operation/bit (~9.5 aJ at 4.2 K) is for a write hit in a 2 Kbit 32-bit wide write-back cache. The average energy consumption of the RQL storage designs varies from ~1.6 aJ/operation/bit for a small 4x32-bit FIFO to 7.3 aJ/operation/bit for the 2 Kbit write-back cache at 4.2 K. Given the cryocooler efficiency of 0.1%, this corresponds to an energy consumption of ~1.6-7.3 fJ/operation/bit at room temperature. The physical implementation of the RQL storage units will become feasible upon the development of the target MIT fabrication process and CAD tools for VLSI RQL chip design in 2015-2016.

  • Towards 32-bit Energy-Efficient Superconductor RQL Processors: The Cell-Level Design and Analysis of Key Processing and On-Chip Storage Units

    IEEE Transactions on Applied Superconductivity · 2014-11-06 · 31 citations

    article · 1st author · Corresponding

    New superconductor single flux quantum logics with no static power dissipation in bias resistors, such as Reciprocal Quantum Logic (RQL), offer opportunities to create energy-efficient superconductor processors operating at high frequencies with ultra-low power consumption. This paper discusses the results of our work on the cell-level design and analysis of a benchmark set of 32-/64-bit RQL processor integer and floating-point units such as adders, multipliers, an arithmetic-logic unit, and an array shifter, as well as small 1-4 Kbit RQL on-chip storage components such as register files, on-chip memory, and top-level caches. Our layout-aware design process includes the complete cell-level design and approximate physical layout of the circuits followed by the VHDL simulation, verification, and energy profiling using our RQL VHDL cell library tuned to the future MIT Lincoln Laboratory 10 kA/cm² 248 nm process with 10 Nb metal layers and the minimum JJ critical current of 38 μA. Our designs have the energy efficiency of ~1.0 single-precision TFLOPS/W and ~0.5 double-precision TFLOPS/W for floating-point units, and ~1-24 TOPS/W for 32-bit integer units at room temperature using the cryocooling efficiency of 0.1% (1000 W/W). The 1-4 Kbit 32-/64-bit multi-ported scratchpad memory, register files, write-through and write-back caches designed with RQL Non-Destructive Read-Out storage cells have the average energy consumption of 3.0-9.5 fJ/bit/operation at room temperature using the cryocooling efficiency of 0.1%. While these results are very promising, more work is needed to evaluate the contribution of the energy costs of instruction scheduling and off-chip main memory access to the energy efficiency of RQL computing across a whole system.

  • Demonstration of an 8×8-bit RSFQ multi-port register file

    2013-07-01 · 8 citations

    article

    As a part of the 8-bit RSFQ processor datapath development, we have designed, fabricated, and experimentally demonstrated an 8×8-bit RSFQ multi-port register file. The register file provides input data operands and stores Arithmetic Logic Unit (ALU) results. It can perform two simultaneous non-destructive “read” operations and one “write” operation and is capable of storing eight 8-bit words. The distinct feature of the design is an extensive use of passive transmission lines (PTLs) for very complex interconnects inside the register file. The register file is designed for integration with the recently demonstrated 20-GHz 8-bit RSFQ ALU. It is fabricated with HYPRES's standard 1.0-μm 4.5-kA/cm² process. The circuit is placed on a 1 cm × 1 cm chip and consists of ~4,000 Josephson junctions.

  • 16-Bit Wave-Pipelined Sparse-Tree RSFQ Adder

    IEEE Transactions on Applied Superconductivity · 2012-12-12 · 35 citations

    article · 1st author · Corresponding

    In this paper, we discuss the architecture, design, and testing of the first 16-bit asynchronous wave-pipelined sparse-tree superconductor rapid single flux quantum adder implemented using the ISTEC 10 kA/cm² ADP2.1 fabrication process. Compared to the Kogge-Stone adder, our parallel-prefix sparse-tree adder has better energy efficiency with significantly reduced complexity (at the expense of latency) and almost no decrease in operation frequency. The 16-bit adder core (without SFQ-to-dc and dc-to-SFQ converters) has 9941 Josephson junctions occupying an area of 8.5 mm². It is designed for the target operation frequency of 30 GHz with the expected latency of 352 ps at the bias voltage of 2.5 mV. The adder chip was fabricated and successfully tested at low frequency for all test patterns with measured bias margins of +9.8%/-10.7%.

  • 20-GHz 8 × 8-bit Parallel Carry-Save Pipelined RSFQ Multiplier

    IEEE Transactions on Applied Superconductivity · 2012-11-21 · 22 citations

    article · 1st author · Corresponding

    We will discuss the microarchitecture, design, and testing of the first 8 × 8-bit (by modulo 256) parallel carry-save RSFQ multiplier implemented using the ISTEC 10-kA/cm² 1.0-μm fabrication technology. Partial products are asynchronously generated and sent to the reduction stage at the internal “hardwired” rate of 80 GHz. The 8 × 8-bit RSFQ multiplier uses a two-level parallel carry-save reduction tree that significantly reduces the multiplier latency. The 80-GHz carry-save reduction is implemented with asynchronous data-driven wave-pipelined [4:2] compressors built with toggle flip-flop cells. The design has mostly regular layout with both local and global connections between modules. The multiplier core (without SFQ-to-DC and DC-to-SFQ converters) has 5948 Josephson junctions occupying the area of 3.5 mm². The multiplier is designed with the target operation frequency of 20 GHz and has the latency of 447 ps at the bias voltage of 2.5 mV. Despite some challenges due to fabrication process parameter variations and flux trapping, the multiplier chip was fabricated and successfully tested for the vast majority of test vectors by the Stony Brook designers with the assistance of colleagues from Yokohama National University in February 2012. While multiplier test operations were generated at low frequency, each of these operations was executed at the “hardwired” rate of 80 GHz. The fabricated chip operated with the measured DC bias margins of ±5%.

  • 8-Bit Asynchronous Sparse-Tree Superconductor RSFQ Arithmetic-Logic Unit With a Rich Set of Operations

    IEEE Transactions on Applied Superconductivity · 2012-11-21 · 34 citations

    article · 1st author · Corresponding

    This paper describes the design and testing of an 8-bit asynchronous wave-pipelined sparse-tree Rapid Single Flux Quantum (RSFQ) Arithmetic Logic Unit (ALU). Compared to previously developed RSFQ ALUs, this unit features an extensive set of 8 arithmetic and 12 logical operations. The execution of ALU operations consists of two steps. First, when necessary, one or both operands are inverted, and then operations are performed on these pre-processed data. Unlike the RSFQ Kogge-Stone-based designs, our parallel-prefix sparse-tree ALU has significantly reduced circuit complexity while maintaining robust operational margins at high frequency. An 8-bit ALU has been implemented with the International Superconductivity Technology Center (ISTEC) 10 kA/cm² 1.0 μm 9-metal ADP2.1 fabrication process as a joint effort between Stony Brook University, Yokohama National University, and Nagoya University. Using the CONNECT cell library and SFQ CAD tools developed at Nagoya and Yokohama, the Stony Brook team has developed a complete logical and physical design of the ALU chip. The 8-bit ALU core (without SFQ-to-dc and dc-to-SFQ converters) consists of 8832 Josephson junctions with an area of 7.2 mm². Simulations show that the ALU can operate at the maximum rate of 42 GHz. It has the latency of 374 ps at a bias voltage of 2.5 mV. The chip was fabricated and tested at low frequency in 2012. Testing results showed malfunctioning of some gates but despite these shortcomings we still verified several ALU operations with the measured DC bias voltage margins of ±1.8%.
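
The abstract of "Energy-Efficient Superconductor Bloom Filters for Streaming Data Inspection" compares measured false positive rates with theoretical expectations. As a rough software illustration only (the paper describes gate-level RQL hardware, not this code), the sketch below shows a 2 Kbit Bloom filter with a selectable number of hashes and the standard false positive approximation. The class name, the salted CRC-32 hashing through `zlib.crc32`, the default of four hashes, and the sample signatures are all assumptions made for this example.

```python
import math
import zlib


class BloomFilter:
    """Software sketch of a Bloom filter; not the RQL circuit from the paper."""

    def __init__(self, num_bits=2048, num_hashes=4):
        self.num_bits = num_bits          # 2 Kbit array, the size named in the abstract
        self.num_hashes = num_hashes      # hypothetical default, selectable per instance
        self.bits = bytearray(num_bits)   # one byte per bit, for clarity

    def _positions(self, item: bytes):
        # Assumption: derive k hash values from CRC-32 with a one-byte salt.
        for salt in range(self.num_hashes):
            yield zlib.crc32(bytes([salt]) + item) % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


signatures = [f"attack-signature-{i}".encode() for i in range(100)]  # made-up data
bloom = BloomFilter()
for sig in signatures:
    bloom.add(sig)
```

Every stored signature passes `might_contain`, while an unseen payload is rejected unless all of its hashed bit positions happen to be set, which is the false positive case the formula estimates.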
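
The abstract of "FPGA-based satisfiability filters for deep packet inspection" describes hashing each signature with three hash functions and keeping several solver solutions. A toy software sketch of that idea, under stated assumptions, follows: each item is hashed to a 3-literal clause, a solver finds assignments that satisfy every clause, and a query passes only if its clause is satisfied by all stored solutions. The 64-variable size, the salted `zlib.crc32` hashing, the small random-walk solver (a stand-in for the real SAT solvers named in the listing), and the sample data are assumptions, not the published design.

```python
import random
import zlib

NUM_VARS = 64   # toy size; the abstract's filter uses 1,024 variables
K = 3           # literals per clause, as in a 3-SAT filter


def clause_for(item: bytes):
    """Map an item to K literals (variable index, required value) via salted CRC-32."""
    literals = []
    for salt in range(K):
        h = zlib.crc32(bytes([salt]) + item)
        literals.append(((h >> 1) % NUM_VARS, bool(h & 1)))
    return literals


def satisfied(clause, assignment):
    return any(assignment[var] == value for var, value in clause)


def solve(clauses, rng, max_flips=100_000):
    """Tiny random-walk solver standing in for a real SAT solver."""
    assignment = [rng.random() < 0.5 for _ in range(NUM_VARS)]
    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c, assignment)]
        if not unsat:
            return assignment
        # Flip one variable from a randomly chosen unsatisfied clause.
        var, _ = rng.choice(rng.choice(unsat))
        assignment[var] = not assignment[var]
    raise RuntimeError("no solution found within the flip budget")


def build_filter(items, num_solutions=4, seed=0):
    rng = random.Random(seed)
    clauses = [clause_for(item) for item in items]
    return [solve(clauses, rng) for _ in range(num_solutions)]


def might_contain(solutions, item: bytes):
    # A stored item's clause is satisfied by every solution by construction.
    clause = clause_for(item)
    return all(satisfied(clause, s) for s in solutions)


members = [f"signature-{i}".encode() for i in range(20)]  # made-up data
sat_filter = build_filter(members)
```

Stored items always pass the query. An unseen item passes only if its clause happens to be satisfied by every stored solution, so keeping more solutions lowers the false positive rate at the cost of more storage.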
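
Several abstracts above quote energy at 4.2 K and then at room temperature using a cryocooling efficiency of 0.1% (1000 W/W). The arithmetic behind that conversion can be sketched as follows, using the per-bit figures from the "Fast pipelined storage" abstract. The function and constant names are hypothetical, and the model is simply a division by the cooling efficiency.

```python
def room_temperature_energy(energy_at_4k, cryocooler_efficiency=0.001):
    """Scale energy dissipated at 4.2 K by the cooling overhead (0.1% = 1000 W/W)."""
    return energy_at_4k / cryocooler_efficiency


ATTO = 1e-18   # joules per attojoule
FEMTO = 1e-15  # joules per femtojoule

fifo_room = room_temperature_energy(1.6 * ATTO)    # small 4x32-bit FIFO
cache_room = room_temperature_energy(7.3 * ATTO)   # 2 Kbit write-back cache
print(f"{fifo_room / FEMTO:.1f} fJ, {cache_room / FEMTO:.1f} fJ")
```

Dividing by 0.001 multiplies each figure by 1000, which turns attojoules into femtojoules and reproduces the ~1.6-7.3 fJ/operation/bit range stated in the abstract.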
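
The multiplier abstract above mentions a two-level carry-save reduction tree built from [4:2] compressors, with results taken modulo 256. A bit-level software sketch of that arithmetic follows. It models each [4:2] compressor as two chained 3:2 carry-save stages and arranges them in two levels, which is an assumption for illustration and says nothing about the asynchronous wave-pipelined RSFQ circuit itself. All function names are hypothetical.

```python
MASK = 0xFF  # results are taken modulo 256, as in the abstract


def compress_3_2(a, b, c):
    """Carry-save adder: three words in, sum and carry words out, no carry propagation."""
    total = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return total & MASK, carry & MASK


def compress_4_2(a, b, c, d):
    """[4:2] compressor modeled as two chained 3:2 stages."""
    s1, c1 = compress_3_2(a, b, c)
    return compress_3_2(s1, c1, d)


def multiply_mod_256(x, y):
    # Eight partial products: x shifted by i wherever bit i of y is set.
    partial = [((x << i) & MASK) if (y >> i) & 1 else 0 for i in range(8)]
    # Level 1: two [4:2] compressors reduce eight words to four.
    s_lo, c_lo = compress_4_2(*partial[:4])
    s_hi, c_hi = compress_4_2(*partial[4:])
    # Level 2: one more [4:2] compressor reduces four words to two.
    s, c = compress_4_2(s_lo, c_lo, s_hi, c_hi)
    # Final carry-propagate addition of the sum and carry words.
    return (s + c) & MASK
```

Because each compressor keeps the sum and carry as separate words, carries ripple only once, in the final addition, which is why carry-save trees shorten multiplier latency.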

Frequent coauthors

  • P. Bunyk

    D-Wave Systems (Canada)

    16 shared
  • Dmitry Zinoviev

    Suffolk University

    9 shared
  • Artur K. Kasperek

    Stony Brook University

    9 shared
  • Christopher L. Ayala

    Yokohama National University

    6 shared
  • Q.P. Herr

    5 shared
  • A. H. Silver

    Yokohama National University

    5 shared
  • Anubhav Sahu

    Hypres (United States)

    4 shared
  • L.A. Abelson

    Northrop Grumman (United States)

    4 shared

Labs

  • Electrical and Computer Engineering · PI

Education

  • Ph.D., Computer Science

    University of California, Los Angeles

    2005
  • M.S., Computer Science

    University of California, Los Angeles

    2001
  • B.S., Computer Science

    University of California, Los Angeles

    1999