HSBC

Senior Consultant Specialist (Model Hosting/Inference Optimization)

Senior | Full-time | Guangzhou, Guangdong

First indexed 18 Jun 2026

Description

We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.

Principal responsibilities:

  • Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware.
  • Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).
  • Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.
  • Own inference health and performance monitoring: latency, throughput, time to first token (TTFT), memory usage, and availability; troubleshoot bottlenecks and deployment issues.
  • Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation.
  • Ensure hosting systems meet production standards for reliability, scalability, security, and high availability.
  • Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets.
  • Work with data scientists/domain experts to define objectives and metrics, validate results, and integrate fine-tuned models into the hosting/inference stack.

Requirements:

  • Bachelor’s/Master’s/PhD in ML/NLP/CS/Data Science/Statistics (or related).
  • 3 years of experience on AI platforms, covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred.
  • Strong engineering skills in Python and CUDA, with solid understanding of GPU/CPU architecture and HPC fundamentals.
  • Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation, and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure.
  • End-to-end fine-tuning expertise: data preparation, distributed training, hyperparameter tuning, and Hugging Face/Accelerate/LoRA/QLoRA; plus benchmarking, monitoring, and troubleshooting skills, an AI-native mindset, and effective use of coding assistants.
This listing is enriched and indexed by YubHub. To apply, use the employer's original posting: https://portal.careers.hsbc.com/careers/job/563774611491345