Curriculum Vitae


Software engineer with a systems-oriented mindset, specializing in performance engineering. Experienced in lock-free concurrency, event-driven architectures, kernel-level networking (eBPF), low-level optimization of GPU model inference, and GPU memory models.

Contact

Education

The University of Illinois at Chicago

Aug 2024 – May 2026

M.S. Computer Science · GPA 3.9 / 4.0

FPT University

Aug 2017 – Aug 2021

B.S. Computer Science · Full-ride scholarship

Experience

Founding Software Engineer · Vizgard Ltd

Jul 2021 – Aug 2024

London, United Kingdom

  • Helped take the company from an early prototype to over $2.5M in venture funding.
  • Led the design and development of a real-time computer vision system for surveillance and unmanned-system automation. Architected an event-driven, lock-free MPMC (multi-producer, multi-consumer) double-buffer pipeline to reduce latency and improve memory locality, sustaining 6 simultaneous 1080p camera streams on an NVIDIA Jetson and 40 streams across 4 NVIDIA RTX 6000 Ada GPUs.
  • Built and optimized pipelines for multiple deep learning models, including object detection (DETR/YOLO), object tracking (SiameseRPN++, DeepSORT), pose estimation (AlphaPose), and face redaction. Trained models on GCP with PyTorch/TensorFlow and deployed them with TensorRT for high-speed inference.
  • Led performance optimization across the stack (profiling, memory locality, batching, async execution). Moving CPU-bound stages into CUDA kernels accelerated critical paths by ~3× (measured with Nsight Systems and Remotery) without compromising model accuracy.
  • Developed a low-latency WebRTC streaming server using GStreamer and Node.js to deliver real-time HD video to browsers and RTSP endpoints.

Software Research Engineer · VinAI Research (acquired by Qualcomm Research)

Jan 2020 – Jul 2021

  • Designed a two-stage infrared anti-spoofing system with a false acceptance rate (FAR) below 3%.
  • Sped up the face recognition model 4× at equivalent accuracy using knowledge distillation and network optimization.

Projects

  • Investigated weak memory consistency behavior on NVIDIA GPUs using PTX and CUDA, identifying synchronization patterns to improve kernel reliability and performance.

  • Linux kernel firewall: a dynamic eBPF-based network firewall, written in Rust, that blocks traffic and mitigates DDoS attacks through real-time IP filtering.

  • Conversational Agent using AWS Bedrock

    EC2 · Scala · Go · gRPC · Python · AWS Lambda

    Built a distributed LLM training pipeline on Apache Spark, along with RESTful and gRPC servers for cloud integration. An EC2 service routes requests to AWS Lambda and Bedrock, powering a LLaMA-based conversational agent.

  • WebGPU Black Hole Raymarching

    WebGPU · WGSL · TypeScript

    Real-time black hole raymarcher in WGSL that renders an accretion disk and gravitational lensing by integrating light paths through the Schwarzschild metric per pixel. Live demo on this site.

Skills

Languages
C/C++, Python, Java, JavaScript, TypeScript, Scala, Go, Rust
DL accelerators & inference
CUDA, TensorRT, Triton, ONNX, GPU/TPU
ML/DL frameworks
PyTorch, JAX, TensorFlow, Scikit-Learn
Systems
Linux, eBPF/XDP, memory models, perf/Remotery/Nsight profiling
Generative AI / multimodal
LLMs, VLMs, Transformers, HuggingFace, LoRA/QLoRA, RAG
Databases & big data
MySQL, MongoDB, Redis, Spark, Hadoop
MLOps
Docker, Kubernetes, Jenkins, AWS, GCP, Azure