I recently ran into a problem: a Deployment running nexus3 works fine for a while, and then its replicas get set to 0. HPA is not enabled, restartPolicy is Always, the kube-apiserver audit logs show no human operation, and the nexus3 logs contain no errors.
deployment yaml:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: service-nexus3-deployment
  namespace: service
  annotations:
    deployment.kubernetes.io/revision: '6'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-nexus3
      envronment: test
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: service-nexus3
        envronment: test
      annotations:
        kubesphere.io/restartedAt: '2022-02-16T01:11:44.479Z'
    spec:
      volumes:
        - name: service-nexus3-volume
          persistentVolumeClaim:
            claimName: service-nexus3-pvc
        - name: docker-proxy
          configMap:
            name: docker-proxy
            defaultMode: 493
      containers:
        - name: nexus3
          # Aliyun image registry; repository name removed
          image: 'registry.cn-hangzhou.aliyuncs.com/nexus3-latest'
          ports:
            - name: tcp8081
              containerPort: 8081
              protocol: TCP
          resources:
            limits:
              cpu: '4'
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - name: service-nexus3-volume
              mountPath: /data/server/nexus3/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
        - name: docker-proxy
          # Aliyun image registry; repository name removed
          image: 'registry.cn-hangzhou.aliyuncs.com/nginx-latest'
          ports:
            - name: tcp80
              containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: '2'
              memory: 4Gi
            requests:
              cpu: 500m
              memory: 1Gi
          volumeMounts:
            - name: docker-proxy
              mountPath: /usr/local/nginx/conf/vhosts/
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: Always
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      nodeSelector:
        disktype: raid1
      securityContext: {}
      imagePullSecrets:
        - name: registrysecret
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
HPA:
# kubectl get hpa -A
No resources found
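Since there is no HPA, something else must be writing .spec.replicas on the Deployment. One rough way to see which API client last touched that field is the server-side-apply managedFields metadata (a sketch; --show-managed-fields needs kubectl 1.21+, on older clients the same data is visible in the raw API object). Look for the managedFields entry whose fieldsV1 contains f:spec / f:replicas and note its manager and time:
# kubectl get deployment service-nexus3-deployment -n service -o yaml --show-managed-fields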
deployment describe:
...
...
Events:
  Type    Reason             Age                From                   Message
  ----    ------             ---                ----                   -------
  Normal  ScalingReplicaSet  34m (x2 over 38h)  deployment-controller  Scaled down replica set service-nexus3-deployment-57995fcd76 to 0
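kubectl describe only keeps a trimmed event list, so pulling all events in the namespace ordered by time may show what else happened around the scale-down (a quick sketch):
# kubectl get events -n service --sort-by=.lastTimestamp | grep -i nexus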
kube-controller-manager logs:
# kubectl logs kube-controller-manager-k8s-130 -n kube-system|grep nexus
I0509 10:49:11.687356 1 event.go:281] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"service", Name:"service-nexus3-deployment", UID:"e0c4abba-bbe5-4c19-9853-de63ee571124", APIVersion:"apps/v1", ResourceVersion:"126342143", FieldPath:""}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set service-nexus3-deployment-57995fcd76 to 0
I0509 10:49:11.701642 1 event.go:281] Event(v1.ObjectReference{Kind:"ReplicaSet", Namespace:"service", Name:"service-nexus3-deployment-57995fcd76", UID:"9f96fdf1-1e20-4c83-ad18-1b3640d52493", APIVersion:"apps/v1", ResourceVersion:"126342151", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: service-nexus3-deployment-57995fcd76-t6bhx
kube-apiserver audit logs:
nexus3 logs:
This has happened several times now. I've gone through all the logs and I'm really out of ideas, so I'm posting here hoping someone can offer some pointers.
1
anonydmer 2022-05-09 16:53:58 +08:00
Check whether the service itself is unstable and the container keeps failing and restarting.
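For example, something like this (assuming the pods carry the app=service-nexus3 label from the Deployment above) would show the RESTARTS counter:
# kubectl get pods -n service -l app=service-nexus3 -o wide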
2
rabbitz OP
Before the replicas went to 0, the RESTARTS count was always 0.
3
rabbitz OP
Sorry, the image above was the wrong one; the one below is the right one.
4
wubowen 2022-05-09 17:38:14 +08:00
I'm a bit doubtful that the audit-log screenshot proves this wasn't a manual operation. Even a manual scale still ends with the replicaset controller deleting the pod, right? Maybe search the audit logs directly for operations by kubeconfig users and check whether anyone actually ran a scale.
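A rough sketch of that kind of search, assuming the apiserver writes JSON-format audit events to a file (the path below is only a placeholder) and jq is available; it lists who issued update/patch requests against the Deployment, including the scale subresource:
# grep service-nexus3-deployment /var/log/kubernetes/audit/audit.log | \
    jq -r 'select(.objectRef.resource=="deployments" and (.verb=="patch" or .verb=="update"))
           | [.requestReceivedTimestamp, .user.username, .verb, (.objectRef.subresource // "-"), .userAgent] | @tsv'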
5
defunct9 2022-05-09 17:46:28 +08:00
Open up SSH access and let me take a look.
6
basefas 2022-05-09 17:48:48 +08:00
Monitor this deployment's replicas, alert when the value changes, then look at the events.
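A low-tech sketch of that, assuming you can leave a watch running (an alert on kube-state-metrics' kube_deployment_spec_replicas metric would be the less manual equivalent):
# kubectl get deployment service-nexus3-deployment -n service -w \
    -o custom-columns=NAME:.metadata.name,SPEC:.spec.replicas,READY:.status.readyReplicas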
7
hwdef 2022-05-09 17:53:01 +08:00
The Hello Kitty avatar is not bad, kind of cute.