I'm a Staff AI Security Researcher at Intuit with a Ph.D. in Computer Science. My work focuses on deep learning robustness and AI safety: I lead research on adversarial robustness, LLM security, and trustworthy AI, developing methods to stress-test, attack, and defend modern neural networks under real-world conditions.
My work spans adversarial attacks & defenses for vision and language models, black-box and gray-box threat models, jailbreaking of LLMs, and the intersection of evolutionary computation with deep learning. I've published at venues including ICLR, ICCV, NeurIPS, and AAAI.
Israel · Staff AI Security Researcher @ Intuit · Ph.D. in Computer Science
My research centers on understanding and improving the robustness, security, and trustworthiness of deep learning systems. I develop novel attack methodologies to expose vulnerabilities in neural networks—from adversarial patches that fool object detectors to black-box jailbreaks that bypass LLM alignment—and design principled defenses to mitigate these threats in deployment.
* denotes equal contribution
Shows that RCC-based diffusion compressors are substantially more robust to bit-flip corruption, and introduces a hardened Turbo-DDCM variant with minimal impact on the rate-distortion-perception trade-off.
Introduces activation steering for masked diffusion language models, extracting low-dimensional directions from contrastive prompts to control safety-related behaviors at inference time.
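To give a flavor of the mechanism, here is a minimal sketch of contrastive activation steering, not the paper's implementation: the model, tokenizer, layer index, and prompt sets are placeholders, and the API mirrors a generic Hugging Face-style interface rather than a masked diffusion LM specifically.

```python
import torch

@torch.no_grad()
def steering_direction(model, tokenizer, pos_prompts, neg_prompts, layer):
    """Mean hidden-state difference between contrastive prompt sets at one layer."""
    def mean_hidden(prompts):
        states = []
        for p in prompts:
            ids = tokenizer(p, return_tensors="pt").input_ids
            out = model(ids, output_hidden_states=True)
            states.append(out.hidden_states[layer].mean(dim=1))  # average over tokens
        return torch.cat(states).mean(dim=0)
    direction = mean_hidden(pos_prompts) - mean_hidden(neg_prompts)
    return direction / direction.norm()  # unit-norm steering vector

def add_steering_hook(layer_module, direction, alpha=4.0):
    """Shift the chosen layer's output along the steering direction at inference time."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer_module.register_forward_hook(hook)
```

The scale `alpha` trades off how strongly the safety-related direction is applied against how much it degrades fluency; in practice it is tuned on held-out prompts.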
Introduces BenchOverflow, a model-agnostic benchmark of nine plain-text prompting strategies that induce excessive outputs without jailbreaks or policy circumvention.
Reveals backdoor vulnerabilities in ControlNet-based diffusion models that threaten responsible synthetic data generation pipelines.
Proposes a universal targeted latent-space attack on audio-language models by perturbing only the audio encoder.
Introduces task-aware post-training quantization that leverages LLMs' hidden representations to allocate precision where it matters most.
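As a loose illustration of the bit-allocation idea (a hypothetical heuristic, not the paper's actual criterion): score each layer's sensitivity from calibration-time hidden activations, then spend a fixed average bit budget on the most sensitive layers first.

```python
import torch

@torch.no_grad()
def layer_sensitivity(hidden_states):
    """Toy sensitivity score: variance of each layer's activations on calibration data."""
    return [h.float().var().item() for h in hidden_states]

def allocate_bits(scores, avg_budget=4, choices=(2, 4, 8)):
    """Greedily give higher precision to the most sensitive layers while keeping
    the average bit-width at or below the budget."""
    n = len(scores)
    bits = [min(choices)] * n            # start everyone at the lowest precision
    total, limit = sum(bits), avg_budget * n
    for i in sorted(range(n), key=lambda i: scores[i], reverse=True):
        for b in sorted(choices, reverse=True):
            if total - bits[i] + b <= limit:
                total += b - bits[i]     # upgrade this layer to the largest affordable width
                bits[i] = b
                break
    return bits                          # per-layer bit-widths
```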
Presents an unsupervised contrastive-learning approach to detect adversarial examples without requiring attack-specific training data.
Proposes a training-free retrieval-augmented generation approach for detecting adversarial patch attacks on vision systems.
Introduces a universal black-box jailbreaking method for LLMs using evolutionary optimization, bypassing alignment safeguards without gradient access.
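A minimal sketch of the underlying idea, a gradient-free genetic search over an adversarial suffix scored purely from the target model's responses; the fitness function, vocabulary, and hyperparameters here are illustrative assumptions, not the published algorithm.

```python
import random

def evolve_suffix(score_fn, vocab, suffix_len=20, pop_size=32, generations=100,
                  mutation_rate=0.1):
    """Evolve an adversarial suffix that maximizes a black-box score.

    score_fn(tokens) -> float should query only the target model's outputs,
    e.g. how strongly the response complies with the forbidden request.
    """
    population = [[random.choice(vocab) for _ in range(suffix_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score_fn, reverse=True)
        elite = ranked[: pop_size // 4]                     # keep the best quarter
        children = []
        while len(children) < pop_size - len(elite):
            p1, p2 = random.sample(elite, 2)
            cut = random.randrange(1, suffix_len)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [random.choice(vocab) if random.random() < mutation_rate else t
                     for t in child]                        # token-level mutation
            children.append(child)
        population = elite + children
    return max(population, key=score_fn)
```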
Uses eXplainable AI to identify adversarial manipulations targeting deepfake detection systems.
First systematic adversarial robustness evaluation of Kolmogorov-Arnold Networks (KANs).
Addresses adaptive adversarial attacks on detector-based defenses and proposes resilient detection architectures.
Generates naturalistic adversarial patches that physically fool object detectors using black-box evolutionary methods.
Presents a gray-box attack against encoder-decoder image captioning models, manipulating generated text through adversarial image perturbations.
Demonstrates that popular DNN explanation methods can be adversarially manipulated to produce misleading interpretations.
Uses evolutionary algorithms to discover novel activation functions that outperform ReLU on image classification benchmarks.
Proposes a query-efficient evolutionary black-box attack that generates adversarial examples without gradient information.
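For intuition, a stripped-down (1+1)-style evolutionary loop that searches an L-infinity ball around the input using only model queries; `predict_fn`, the mutation scale, and the query budget are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def evolutionary_attack(predict_fn, x, true_label, epsilon=0.05,
                        max_queries=10_000, sigma=0.01):
    """Search for a misclassifying perturbation using only model queries.

    predict_fn(image) -> class-probability vector; no gradients are used.
    """
    rng = np.random.default_rng(0)
    delta = np.zeros_like(x)
    best = predict_fn(np.clip(x, 0, 1))[true_label]         # prob. of the true class
    for _ in range(max_queries):
        candidate = np.clip(delta + sigma * rng.standard_normal(x.shape),
                            -epsilon, epsilon)               # mutate within the L-inf ball
        probs = predict_fn(np.clip(x + candidate, 0, 1))
        if probs.argmax() != true_label:                     # early exit on success
            return np.clip(x + candidate, 0, 1)
        if probs[true_label] < best:                         # keep mutations that help
            delta, best = candidate, probs[true_label]
    return np.clip(x + delta, 0, 1)
```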
AI Safety, Deep Learning and Evolutionary Computation. Advisor: Prof. Moshe Sipper.