Kubernetes for Generative AI Solutions

// GitHub https://github.com/webmakaka/Kubernetes-for-Generative-AI-Solutions/tree/main

Environment setup

// HuggingFace https://huggingface.co/docs/huggingface_hub/en/installation

$ pip install --upgrade huggingface_hub

// Alternative: standalone CLI installer, https://huggingface.co/docs/huggingface_hub/en/guides/cli
$ curl -LsSf https://hf.co/cli/install.sh | bash


// Generate an access token
https://huggingface.co/settings/tokens

$ hf auth login

Running in Docker

$ git clone https://github.com/webmakaka/Kubernetes-for-Generative-AI-Solutions.git

$ cd Kubernetes-for-Generative-AI-Solutions/ch02

$ hf download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q2_K.gguf --local-dir .
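
// Optional sanity check before building the image. GGUF files start with
// the 4-byte magic "GGUF", so a truncated download or a saved error page
// can be caught early. The helper name is_gguf is hypothetical and not
// part of the repository; this is a minimal sketch.

```shell
# is_gguf FILE
# Succeeds only if FILE exists and its first four bytes are "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage, assuming the file was downloaded into the current directory:
#   is_gguf llama-2-7b-chat.Q2_K.gguf && echo "model file looks valid"
```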

$ docker build -t my-llama .

$ docker tag my-llama webmakaka/my-llama
$ docker push webmakaka/my-llama

$ docker run -p 8000:5000 webmakaka/my-llama

// OK!
$ curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Create a poem about humanity?","sys_msg":"You are a helpful, respectful, and honest assistant. Always provide safe, unbiased, and positive responses. Avoid harmful, unethical, or illegal content. If a question is unclear or incorrect, explain why. If unsure, do not provide false information."}' \
  | jq .
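
// If the request is sent right after `docker run`, it may fail while the
// model is still loading (an assumption; startup time depends on the
// machine). A small retry helper can wrap the curl call. The function
// name retry and its arguments are hypothetical; this is a sketch.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage, assuming the container from the step above is running:
#   retry 30 2 curl -sf -o /dev/null -X POST http://localhost:8000/predict \
#     -H "Content-Type: application/json" \
#     -d '{"prompt":"ping","sys_msg":"You are a helpful assistant."}'
```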

Repeating in Kubernetes

Minikube + MetalLB

$ kubectl create deploy my-llama --image webmakaka/my-llama

$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
my-llama-8649ff89c6-djdvd   1/1     Running   0          5m35s
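
// `kubectl create deploy my-llama` labels the pods with app=my-llama,
// which is what the Service selector later relies on. A declarative
// equivalent can be sketched as below. The helper deployment_manifest is
// hypothetical, and port 5000 is taken from the docker run mapping above.

```shell
# deployment_manifest NAME IMAGE PORT
# Prints a minimal Deployment manifest whose pods carry the label app=NAME.
deployment_manifest() {
  cat << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
  labels:
    app: $1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
    spec:
      containers:
      - name: $1
        image: $2
        ports:
        - containerPort: $3
EOF
}

# Usage, instead of `kubectl create deploy`:
#   deployment_manifest my-llama webmakaka/my-llama 5000 | kubectl apply -f -
```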

$ cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: my-llama-svc
  name: my-llama-svc
  annotations:
    # AWS-specific annotations; they have no effect on Minikube + MetalLB
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 5000
  type: LoadBalancer
  selector:
    app: my-llama
EOF

$ export NLB_URL=$(kubectl get svc my-llama-svc -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo ${NLB_URL}
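
// MetalLB reports the external address in the `.ip` field, while AWS load
// balancers report it in `.hostname`. Since the manifest carries AWS
// annotations, a lookup that handles both cases may be convenient. The
// helpers first_non_empty and lb_address are hypothetical; a sketch only.

```shell
# first_non_empty VALUE...
# Prints the first argument that is not an empty string; fails if none.
first_non_empty() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# lb_address SERVICE_NAME
# Prints the Service's external IP, falling back to its external hostname.
lb_address() {
  svc=$1
  ip=$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  host=$(kubectl get svc "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  first_non_empty "$ip" "$host"
}

# Usage:
#   export NLB_URL=$(lb_address my-llama-svc)
```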

// OK!
$ curl -X POST http://${NLB_URL}/predict \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Create a poem about humanity?","sys_msg":"You are a helpful, respectful, and honest assistant. Always provide safe, unbiased, and positive responses. Avoid harmful, unethical, or illegal content. If a question is unclear or incorrect, explain why. If unsure, do not provide false information."}' \
  | jq .