# Senior Consultant Specialist (Model Hosting/Inference Optimization)

**Company**: HSBC Software Development (GuangDong) Limited
**Location**: Guangzhou
**Work arrangement**: onsite
**Experience**: senior
**Job type**: full-time
**Category**: Engineering
**Industry**: Technology

**Apply**: https://portal.careers.hsbc.com/careers/job/563774611335304?utm_source=yubhub.co&utm_medium=jobs_feed&utm_campaign=apply
**Canonical**: https://yubhub.co/jobs/job_b16bcb1b-63c

## Description

We are seeking an experienced professional to join our team in the role of Senior Consultant Specialist (Model Hosting/Inference Optimization). As a key member of our AI platform team, you will collaborate closely with AI researchers, data scientists, software engineers, and product teams to deliver production-grade solutions that combine optimized inference, reliable hosting, and flexible fine-tuning capabilities for a wide range of AI models.

Principal responsibilities:

- Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware.

- Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).

- Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.

- Own inference health and performance monitoring (latency, throughput, time to first token (TTFT), memory, and availability), and troubleshoot bottlenecks and deployment issues.

- Partner with hardware teams to apply hardware-specific optimisations and improve resource utilisation.

- Ensure hosting systems meet production standards for reliability, scalability, security, and high availability.

- Build end-to-end, scalable fine-tuning pipelines to adapt foundation models using domain datasets.

- Work with data scientists/domain experts to define objectives and metrics, validate results, and integrate fine-tuned models into the hosting/inference stack.

Requirements:

- Bachelor’s/Master’s/PhD in ML/NLP/CS/Data Science/Statistics (or related).

- 3 years of experience on AI platforms, covering both model hosting/inference optimisation and fine-tuning pipelines; LLM experience strongly preferred.

- Strong engineering skills in Python and CUDA, with solid understanding of GPU/CPU architecture and HPC fundamentals.

- Deep inference expertise: KV-cache, batching, quantisation (INT4/FP8/GPTQ/AWQ), operator optimisation, and framework integration (vLLM, TensorRT-LLM, SGLang); hands-on hosting on Docker/Kubernetes and AWS/GCP/Azure.

- End-to-end fine-tuning expertise: data prep, distributed training, hyperparameter tuning, and HF/Accelerate/LoRA/QLoRA; plus benchmarking, monitoring, and troubleshooting skills, an AI-native mindset, and effective use of coding assistants.

You’ll achieve more when you join HSBC. HSBC is an equal opportunity employer committed to building a culture where all employees are valued and respected, and where their opinions count. We take pride in providing a workplace that fosters continuous professional development, flexible working, and opportunities to grow within an inclusive and diverse environment.

## Skills

### Required
- Python
- CUDA
- GPU/CPU architecture
- HPC fundamentals
- KV-cache
- batching
- quantisation
- operator optimisation
- framework integration
- Docker
- Kubernetes
- AWS
- GCP
- Azure
- data prep
- distributed training
- hyperparameter tuning
- HF/Accelerate/LoRA/QLoRA
