Description

We are seeking an experienced professional to join our team in the role of Senior Consultant Specialist (Model Hosting/Inference Optimization). As a key member of our AI platform team, you will collaborate closely with AI researchers, data scientists, software engineers, and product teams to deliver production-grade solutions that combine optimized inference, reliable hosting, and flexible fine-tuning capabilities for a wide range of AI models.

Principal responsibilities:

Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware.
Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).
Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.
Own inference health and performance monitoring: latency, throughput, TTFT, memory, availability; troubleshoot bottlenecks and deployment issues.
Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation.
Ensure hosting systems meet production standards for reliability, scalability, security, and high availability.
Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets.
Work with data scientists/domain experts to define objectives and metrics, validate results, and integrate fine-tuned models into the hosting/inference stack.

Requirements:

Bachelor’s/Master’s/PhD in ML/NLP/CS/Data Science/Statistics (or a related field).
3 years of experience on AI platforms, covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred.
Strong engineering skills in Python and CUDA, with solid understanding of GPU/CPU architecture and HPC fundamentals.
Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation, and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure.
End-to-end fine-tuning expertise: data prep, distributed training, hyperparameter tuning, HF/Accelerate/LoRA/QLoRA; plus benchmarking/monitoring/troubleshooting, AI-native mindset, and effective use of coding assistants.

You’ll achieve more when you join HSBC. HSBC is an equal opportunity employer committed to building a culture where all employees are valued and respected, and where their opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment.

To apply, use the employer's original posting: https://portal.careers.hsbc.com/careers/job/563774611335304