CUDA Kernel Optimizer

2 weeks ago

Greater London, United Kingdom Mercor Full time

CUDA Kernel Optimizer - ML Engineer at Mercor

Role Overview
Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize throughput while maintaining correctness and reproducibility.

Key Responsibilities
Develop, tune, and benchmark CUDA kernels for tensor and operator workloads.
Optimize for occupancy, memory coalescing, instruction-level parallelism, and warp scheduling.
Profile and diagnose performance bottlenecks using Nsight Systems, Nsight Compute, and comparable tools.
Report performance metrics, analyze speedups, and propose architectural improvements.
Collaborate asynchronously with PyTorch Operator Specialists to integrate kernels into production frameworks.
Produce well-documented, reproducible benchmarks and performance write-ups.

Ideal Qualifications
Deep expertise in CUDA programming, GPU architecture, and memory optimization.
Proven ability to achieve quantifiable performance improvements across hardware generations.
Proficiency with mixed precision, Tensor Core usage, and low-level numerical stability considerations.
Familiarity with frameworks like PyTorch, TensorFlow, or Triton (not required but beneficial).
Strong communication skills and independent problem-solving ability.
Demonstrated open-source, research, or performance benchmarking contributions.

More About the Opportunity
Ideal for independent contractors who thrive in performance-critical, systems-level work.
Engagements focus on measurable, high-impact kernel optimizations and scalability studies.
Work is fully remote and asynchronous; deliverables are outcome-driven.
Access to shared benchmarking infrastructure and reproducibility tooling via Mercor support resources.

Compensation & Contract Terms
Typical range: $120‑$250/hour, depending on scope, specialization, and results achieved.
Payment is based on accepted task output rather than a flat hourly rate.
Structured as a contract-based engagement, not an employment relationship.
Compensation tied to measurable deliverables or agreed milestones.
Confidentiality, IP, and NDA terms as defined per engagement.

Application Process
Submit a brief overview of prior CUDA optimization experience, profiling results, or performance reports.
Include links to relevant GitHub repos, papers, or benchmarks if available.
Indicate your hourly rate, time availability, and preferred engagement length.
Selected experts may complete a small, paid pilot kernel optimization project.

About Mercor
Mercor connects domain experts with top AI research and technology organizations through project-based contracts. Contractors operate independently, with full flexibility over methods, timelines, and tools. Our mission is to help top engineers and researchers access frontier technical work without rigid employment structures.

Seniority level: Not Applicable
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development

Remote CUDA Kernel Optimizer for ML Performance

2 weeks ago

Greater London, United Kingdom Mercor Full time

A leading technology consultancy is seeking a CUDA Kernel Optimizer - ML Engineer to develop and benchmark CUDA kernels, focusing on performance optimization and profiling. This role is ideal for independent contractors who excel in systems-level work with a compensation range of $120-$250/hour based on deliverables. Candidates should have deep expertise in...
CUDA Kernel Optimizer ML Engineer

2 weeks ago

London, United Kingdom Mercor Full time

1) Role Overview Mercor is engaging advanced CUDA experts who specialize in GPU kernel optimization, performance profiling, and numerical efficiency. These professionals possess a deep mental model of how modern GPU architectures execute deep learning workloads. They are comfortable translating algorithmic concepts into finely tuned kernels that maximize...
CUDA Engineer

1 week ago

Greater London, United Kingdom FD Technologies Full time

About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change over time and can...
CUDA Engineer

2 weeks ago

London, United Kingdom KX Full time

About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change over time and can...
CUDA Engineer

2 weeks ago

London, United Kingdom KX Full time

About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change over time and...
CUDA Engineer

1 week ago

London, United Kingdom KX Full time

Job Description About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change...
CUDA Engineer

1 week ago

London, United Kingdom KX Full time

About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change over time and can...
CUDA Engineer

2 weeks ago

London Area, United Kingdom KX Full time

About KX KX software powers the time-aware data-driven decisions that enable fast-moving companies to outpace competitors, realizing the full potential of their AI investments. The KX platform delivers transformational value by addressing data challenges related to completeness, timeliness and efficiency, ensuring companies understand change over time and can...
ML Systems/Infrastructure Engineer

5 days ago

Greater London, United Kingdom Oriole Networks Full time

Oriole is seeking a talented ML Systems/Infrastructure Engineer to help co‑optimize our AI/ML software stack with cutting‑edge network hardware. You’ll be a key contributor to a high‑impact, agile team focused on integrating middleware communication libraries and modelling the performance of large‑scale AI/ML workloads. Key Responsibilities Design...
ML Engineer, Large Language Models

1 week ago

Greater London, United Kingdom Nebius Group Full time

ML Engineer, Large Language Models (LLM Training & Inference Optimization) Amsterdam, Netherlands; London, United Kingdom; Remote - Europe Why work at Nebius Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without...
