Key Responsibilities
Core Technical Areas:
- LLM Fine-Tuning & RAG: Fine-tune open-weight language models on domain-specific data using techniques such as supervised fine-tuning (SFT) and quantization. Build and optimize Retrieval-Augmented Generation (RAG) pipelines that integrate vector databases and policy document ingestion.
- Inference Optimization: Serve models efficiently using production inference engines (e.g., vLLM, SGLang). Apply quantization and batching strategies to meet strict latency and throughput SLAs.
- MLOps & Deployment: Manage model deployment pipelines across DEV, UAT, PRE-PRD, and PRD environments on enterprise cloud infrastructure (IBM watsonx.ai / OpenShift).
AI-Assisted Development:
- Use modern LLMs to accelerate development: Python code generation, prompt engineering, and pipeline optimization.
- Engage in prompt engineering to refine how systems interact with complex, multilingual datasets.
Research & Prototyping:
- Evaluate emerging open-source models, inference frameworks, and AI libraries for production feasibility.
- Produce written validation reports and contribute to technical design and test documentation.
Talent Cultivation & Mentorship (What You Will Learn)
- Broad Exposure: You will understand how LLM fine-tuning, RAG, inference optimization, and deployment pipelines interact in a real-world production system.
- Technical Guidance: Work directly with senior engineers to learn how to move AI models from notebook experiments to production-ready, enterprise-grade code.
- Impactful Work: Your contributions will directly power a live AI system handling real financial data at scale.
Requirements
Technical Requirements:
- Degree in Computer Science, Data Science, AI, or a related field.
- Hands-on experience with LLM fine-tuning, RAG pipelines, or model serving.
- Strong proficiency in Python.
- Solid understanding of machine learning fundamentals and deep learning frameworks (PyTorch, TensorFlow).
- Familiarity with relevant libraries such as Hugging Face Transformers (with open-weight models such as Qwen or DeepSeek), LangChain, LlamaIndex, or vLLM.
- Ability to read and write technical documentation in English.
- Proficiency in using AI coding tools (e.g., Claude Code, GPT-codex) to accelerate development cycles and run rapid proof-of-concept (PoC) feasibility studies.
Nice-to-Haves:
- Experience with LoRA, QLoRA, Unsloth, DPO, or RLHF fine-tuning techniques.
- Familiarity with quantization (INT4, INT8) and production inference optimization.
- Experience with vector databases (e.g., Milvus, pgvector).
- Exposure to IBM watsonx.ai, OpenShift, or Kubernetes.
- Experience with multilingual NLP, particularly CJK (Chinese, Japanese, Korean) datasets.
- Prior hands-on experience with AI projects in a financial services or regulated industry context.
What We Offer
- Flexible working hours and work-from-home policy.
- Subsidized access to premium AI development tools to empower your workflow.
- On-the-job training and technical guidance.
- Opportunity to work on a high-impact, production LLM system in the financial services sector.
- Exposure to a cutting-edge open-weight model stack and enterprise-grade deployment practices.