Kubernetes for Generative AI Solutions
// GitHub https://github.com/webmakaka/Kubernetes-for-Generative-AI-Solutions/tree/main
Подготовка окружения
// HuggingFace https://huggingface.co/docs/huggingface_hub/en/installation
$ pip install --upgrade huggingface_hub
// https://huggingface.co/docs/huggingface_hub/en/guides/cli
$ curl -LsSf https://hf.co/cli/install.sh | bash
// GENERATE TOKEN
https://huggingface.co/settings/tokens
$ hf auth login
Запуск в docker
$ git clone github.com:webmakaka/Kubernetes-for-Generative-AI-Solutions.git
$ cd Kubernetes-for-Generative-AI-Solutions/ch02
$ hf download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q2_K.gguf --local-dir .
$ docker build -t my-llama .
$ docker tag my-llama webmakaka/my-llama
$ docker push webmakaka/my-llama
$ docker run -p 8000:5000 webmakaka/my-llama
// OK!
$ curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"prompt":"Create a poem about humanity?","sys_msg":"You are a helpful, respectful, and honest assistant. Always provide safe, unbiased, and positive responses. Avoid harmful, unethical, or illegal content. If a question is unclear or incorrect, explain why. If unsure, do not provide false information."}' \
| jq .
Повтор в kubernetes
Minikube + Metal LB
$ kubectl create deploy my-llama --image webmakaka/my-llama
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
my-llama-8649ff89c6-djdvd 1/1 Running 0 5m35s
$ cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
labels:
app: my-llama-svc
name: my-llama-svc
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
ports:
- port: 80
protocol: TCP
targetPort: 5000
type: LoadBalancer
selector:
app: my-llama
EOF
$ export NLB_URL=$(kubectl get svc my-llama-svc -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ echo ${NLB_URL}
// OK!
$ curl -X POST http://${NLB_URL}/predict \
-H "Content-Type: application/json" \
-d '{"prompt":"Create a poem about humanity?","sys_msg":"You are a helpful, respectful, and honest assistant. Always provide safe, unbiased, and positive responses. Avoid harmful, unethical, or illegal content. If a question is unclear or incorrect, explain why. If unsure, do not provide false information."}' \
| jq .