Jon Bisila
Research Engineer, Sandia National Laboratories
Building and evaluating AI systems for mission-critical applications.
MS in Computer Science, Georgia Tech (May 2026).
About
I'm a research engineer at Sandia National Laboratories, where I work on evaluating and understanding AI systems, particularly large language models. My focus is on building benchmarks and evaluation frameworks that help us understand not just whether models work, but how and why they fail in practice, especially on problems where reliability matters.
What fascinates me is the gap between benchmark performance and real-world behavior. Models can ace standardized tests but fail in unexpected ways when deployed. I enjoy building end-to-end ML systems that bridge this gap, from research prototypes to production systems that solve real problems.
I'm currently finishing my MS in Computer Science at Georgia Tech (May 2026) while working full-time, focusing on moving research concepts into practice for mission-critical applications where the work actually makes a difference.
Projects
C-to-Rust Code Translation Benchmark
Current · Principal Investigator & Lead Developer
Leading a team building a benchmark that evaluates LLM capabilities for C-to-Rust code translation, with a focus on memory safety. As legacy codebases migrate to memory-safe languages, understanding where automated translation works (and where it fails) is critical for assessing the security, efficiency, and overall feasibility of AI-assisted modernization of legacy systems. The framework evaluates state-of-the-art models and techniques on realistic translation tasks, establishing baselines for what's currently possible and identifying key failure modes.
Multilingual Entity Extraction Pipeline
2023–2024 · Lead Research Engineer
Built and deployed an end-to-end NLP pipeline using specialized LLMs for domain-agnostic entity extraction and translation across multiple languages. The system augments a manual analysis process, enabling web-scale data processing with dynamic scaling based on load. Conducted initial research that demonstrated off-the-shelf solutions were insufficient, then designed and validated a custom LLM-based approach that met both performance and throughput requirements for production deployment.
Select Publications
Anomaly Detection in Video Using Compression
Michael R. Smith, Renee Gooding, Jonathan Bisila, Christina Ting
IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) (2024)
For the Public Good: Connecting, Retaining, and Recognizing Current and Future RSEs at US National Research Laboratories and Agencies
Miranda R. Mundt, Keith Beattie, Jonathan Bisila, et al.
Computing in Science & Engineering, IEEE, Vol. 24, No. 6 (2022)
DevOps Pragmatic Practices and Potential Perils in Scientific Software Development
Reed Milewicz, Jonathan Bisila, Miranda Mundt, et al.
International Congress on Information and Communication Technology, Springer (2023)
Topic Modeling with Natural Language Processing for Identification of Nuclear Proliferation-Relevant Scientific and Technical Publications
Jonathan Bisila, Daniel Dunlavy, Zoe Nellie Gastelum, Craig D. Ulmer
Sandia National Laboratories Technical Report (2020)
Blog
Thoughts on AI/ML research, evaluation frameworks, and other topics I'm exploring.
Read My Blog
Contact
Feel free to reach out if you'd like to discuss AI/ML research, evaluation frameworks, or potential opportunities.