Raz Lapid

I'm a Staff AI Security Researcher at Intuit with a Ph.D. in Computer Science. My work focuses on deep learning robustness and AI safety: I lead research on adversarial robustness, LLM security, and trustworthy AI, developing methods to stress-test, attack, and defend modern neural networks under real-world conditions.

My research spans adversarial attacks and defenses for vision and language models, black-box and gray-box threat models, jailbreaking of LLMs, and the intersection of evolutionary computation with deep learning. I've published at venues including ICLR, ICCV, NeurIPS, and AAAI.

Israel · Staff AI Security Researcher @ Intuit · Ph.D. in Computer Science

Research Interests

My research centers on understanding and improving the robustness, security, and trustworthiness of deep learning systems. I develop novel attack methodologies to expose vulnerabilities in neural networks, from adversarial patches that fool object detectors to black-box jailbreaks that bypass LLM alignment, and design principled defenses to mitigate these threats in deployment.

Selected Publications

* denotes equal contribution

On the Robustness of Diffusion-Based Image Compression to Bit-Flip Errors
AIGENS @ CVPR · 2026
A. Vaisman, G. Pomerants, R. Lapid

Shows that RCC-based diffusion compressors are substantially more robust to bit-flip corruption, and introduces a more robust Turbo-DDCM variant with minimal impact on the rate-distortion-perception trade-off.

Activation Steering for Masked Diffusion Language Models
ReALM-GEN @ ICLR · 2026
A. Shnaidman, E. Feiglin, O. Yaari, E. Mentel, A. LeVi, R. Lapid

Introduces activation steering for masked diffusion language models, extracting low-dimensional directions from contrastive prompts to control safety-related behaviors at inference time.
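
To make the mechanism concrete, here is a minimal PyTorch sketch of the general activation-steering recipe: extract a direction as the mean last-token hidden-state difference between contrastive prompt sets, then shift hidden states along it during the forward pass. The helper names, layer choice, and scaling factor are illustrative assumptions, not the paper's exact procedure.

    import torch

    def steering_direction(model, tokenizer, pos_prompts, neg_prompts, layer):
        """Mean last-token hidden-state difference between contrastive
        prompt sets (hypothetical helper; the paper's extraction may differ)."""
        def mean_hidden(prompts):
            states = []
            for p in prompts:
                ids = tokenizer(p, return_tensors="pt").input_ids
                with torch.no_grad():
                    out = model(ids, output_hidden_states=True)
                states.append(out.hidden_states[layer][0, -1])
            return torch.stack(states).mean(dim=0)
        return mean_hidden(pos_prompts) - mean_hidden(neg_prompts)

    def add_steering_hook(block, direction, alpha=4.0):
        """Shift the block's hidden states by alpha * direction at inference."""
        def hook(_module, _inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            steered = hidden + alpha * direction.to(hidden.dtype)
            return (steered,) + output[1:] if isinstance(output, tuple) else steered
        return block.register_forward_hook(hook)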

BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts
TMLR · 2026 | NVIDIA GTC · 2026
E. Feiglin, N. Hutnik, R. Lapid

Introduces BenchOverflow, a model-agnostic benchmark of nine plain-text prompting strategies that induce excessive outputs without jailbreaks or policy circumvention.

Backdoors in Conditional Diffusion: Threats to Responsible Synthetic Data Pipelines
RSD @ AAAI · 2026
R. Lapid, A. Dubin

Reveals backdoor vulnerabilities in ControlNet-based diffusion models that threaten responsible synthetic data generation pipelines.

Breaking Audio Large Language Models by Attacking Only the Encoder
Preprint · 2025
R. Ziv, R. Lapid, M. Sipper

Proposes a universal targeted latent-space attack on audio-language models by perturbing only the audio encoder.
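
As a rough illustration, the sketch below optimizes a single universal waveform perturbation that drives the encoder's latent toward a chosen target across a whole batch of clips; the encoder interface, norm bound, and hyperparameters are assumptions, not the paper's setup.

    import torch
    import torch.nn.functional as F

    def universal_latent_attack(encoder, clips, target_latent,
                                steps=500, lr=1e-3, eps=0.01):
        """Optimize one perturbation, shared by all clips, that drives the
        audio encoder's output toward a chosen target latent."""
        delta = torch.zeros_like(clips[:1], requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            latents = encoder(clips + delta)      # delta broadcasts over the batch
            loss = F.mse_loss(latents, target_latent.expand_as(latents))
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)           # keep the perturbation quiet
        return delta.detach()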

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations
Preprint · 2025
A. LeVi, R. Lapid, R. Himelstein, C. Baskin, R.S. Ziv, A. Mendelson

Introduces task-aware post-training quantization that leverages LLMs' hidden representations to allocate precision where it matters most.

Pulling Back the Curtain: Unsupervised Adversarial Detection via Contrastive Auxiliary Networks
SafeMM-AI @ ICCV · 2025 · Spotlight
E. Mizrahi, R. Lapid, M. Sipper

Presents an unsupervised contrastive-learning approach to detect adversarial examples without requiring attack-specific training data.

Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
VLM4RWD @ NeurIPS · 2025
R. Kazoom, R. Lapid, M. Sipper, O. Hadar

Proposes a training-free retrieval-augmented generation approach for detecting adversarial patch attacks on vision systems.

Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
SeT-LLM @ ICLR · 2024
R. Lapid, R. Langberg, M. Sipper

Introduces a universal black-box jailbreaking method for LLMs using evolutionary optimization, bypassing alignment safeguards without gradient access.
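
The evolutionary loop can be sketched in a few lines: a toy genetic algorithm over adversarial suffixes, driven only by a black-box fitness score (e.g., a judge that rates how compliant the model's response is). The score_fn, vocabulary, and GA settings are placeholders rather than the paper's configuration.

    import random

    def evolve_suffix(score_fn, vocab, suffix_len=20, pop_size=32,
                      generations=100, mutation_rate=0.1):
        """Toy genetic algorithm for a universal adversarial suffix.
        score_fn(tokens) -> float, higher = closer to a jailbreak (black-box)."""
        pop = [[random.choice(vocab) for _ in range(suffix_len)]
               for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(pop, key=score_fn, reverse=True)
            elites = ranked[:pop_size // 4]            # survivors
            children = []
            while len(elites) + len(children) < pop_size:
                a, b = random.sample(elites, 2)
                cut = random.randrange(1, suffix_len)  # one-point crossover
                child = a[:cut] + b[cut:]
                child = [random.choice(vocab) if random.random() < mutation_rate
                         else tok for tok in child]    # token-level mutation
                children.append(child)
            pop = elites + children
        return max(pop, key=score_fn)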

XAI-Based Detection of Adversarial Attacks on Deepfake Detectors
TMLR · 2024
B. Pinhasov*, R. Lapid*, R. Ohayon, M. Sipper

Uses eXplainable AI to identify adversarial manipulations targeting deepfake detection systems.

On the Robustness of Kolmogorov-Arnold Networks: An Adversarial Perspective
TMLR · 2024
T. Alter*, R. Lapid*, M. Sipper

Presents the first systematic evaluation of the adversarial robustness of Kolmogorov-Arnold Networks (KANs).

Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors
Mathematics · 2024
R. Lapid, A. Dubin, M. Sipper

Addresses adaptive adversarial attacks on detector-based defenses and proposes resilient detection architectures.

Patch of Invisibility: Naturalistic Physical Black-Box Adversarial Attacks on Object Detectors
ECML-PKDD · 2024
R. Lapid, E. Mizrahi, M. Sipper

Generates naturalistic adversarial patches that physically fool object detectors using black-box evolutionary methods.

I See Dead People: Gray-Box Adversarial Attack on Image-to-Text Models
ECML-PKDD · 2023
R. Lapid, M. Sipper

Presents a gray-box attack against encoder-decoder image captioning models, manipulating generated text through adversarial image perturbations.

Foiling Explanations in Deep Neural Networks
TMLR · 2022
S.V. Tamam*, R. Lapid*, M. Sipper

Demonstrates that popular DNN explanation methods can be adversarially manipulated to produce misleading interpretations.
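
For intuition, here is a white-box, gradient-based sketch of the underlying objective: push a stand-in input-gradient saliency map toward an arbitrary target while keeping the model's prediction close to its clean output. The paper itself operates on several explanation methods and does not assume this exact formulation, so this is illustration only.

    import torch
    import torch.nn.functional as F

    def saliency(model, x, cls):
        """Vanilla input-gradient saliency, standing in for the explanation
        methods studied in the paper."""
        grad, = torch.autograd.grad(model(x)[:, cls].sum(), x, create_graph=True)
        return grad.abs().sum(dim=1)

    def foil(model, x, cls, target_map, steps=300, lr=1e-2, eps=8 / 255):
        """Steer the explanation toward target_map while keeping the model's
        prediction close to its clean output (white-box illustration only)."""
        with torch.no_grad():
            clean_probs = model(x).softmax(-1)
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            x_adv = (x + delta).clamp(0, 1)
            loss = (F.mse_loss(saliency(model, x_adv, cls), target_map)
                    + F.kl_div(model(x_adv).log_softmax(-1), clean_probs,
                               reduction="batchmean"))
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)            # keep changes imperceptible
        return (x + delta).clamp(0, 1).detach()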

Evolution of Activation Functions for Deep Learning-Based Image Classification
GECCO · 2022
R. Lapid, M. Sipper

Uses evolutionary algorithms to discover novel activation functions that outperform ReLU on image classification benchmarks.

An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep CNNs
Algorithms · 2022
R. Lapid, Z. Haramaty, M. Sipper

Proposes a query-efficient evolutionary black-box attack that generates adversarial examples without gradient information.
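
The flavor of the approach can be sketched as a simple (1+lambda)-style evolution strategy: mutate a bounded perturbation, keep whichever child most lowers the true-class probability, and stop once the model misclassifies, all through query access alone. The query_fn interface and hyperparameters below are illustrative assumptions.

    import numpy as np

    def evo_attack(query_fn, image, label, eps=0.05, sigma=0.01,
                   pop=20, iters=500):
        """Gradient-free (1+lambda)-style evolution strategy. The model is
        touched only through query_fn(image) -> class-probability vector."""
        best = np.zeros_like(image)
        best_score = query_fn(np.clip(image + best, 0, 1))[label]
        for _ in range(iters):
            for _ in range(pop):
                child = np.clip(best + sigma * np.random.randn(*image.shape),
                                -eps, eps)         # mutate, stay in the L-inf ball
                score = query_fn(np.clip(image + child, 0, 1))[label]
                if score < best_score:             # lower true-class prob wins
                    best, best_score = child, score
            if query_fn(np.clip(image + best, 0, 1)).argmax() != label:
                break                              # misclassified: attack succeeded
        return np.clip(image + best, 0, 1)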

Education

2021 – 2024
Ph.D. in Computer Science, Direct Track, Ben-Gurion University of the Negev

AI Safety, Deep Learning and Evolutionary Computation. Advisor: Prof. Moshe Sipper.

2020 – 2021
M.Sc. in Computer Science, Magna cum laude, Ben-Gurion University of the Negev

AI Safety, Deep Learning and Evolutionary Computation. Advisor: Prof. Moshe Sipper.

2016 – 2020
B.Sc. in Computer Science, Ben-Gurion University of the Negev