I'm currently focussed on LLM inference and kernel-level optimization using C++ and CUDA.
I'm looking to work on low-level ML systems and performance optimization.
Core contributor focused on model providers, toolkits, and other features. Regularly review PRs and help shape design decisions.
Added an MCC metric for evaluation, expanded test coverage, and fixed several edge-case bugs.
Implemented the exclude_modules parameter in LoRAConfig, improving fine-tuning control for large models.