Stack/Layer Based Troubleshooting
UCP Troubleshooting
[UCP Architecture](https://docs.docker.com/datacenter/ucp/2.2/guides/architecture/)
[UCP Node States](https://docs.docker.com/ee/ucp/admin/monitor-and-troubleshoot/troubleshoot-node-messages/#ucp-node-states)
Troubleshooting UCP Cluster Matrix
Swarm Configurations Troubleshooting
On a UCP manager node, check the health of the key-value store (etcd) cluster:
docker exec -it ucp-kv etcdctl \
--endpoint https://127.0.0.1:2379 \
--ca-file /etc/docker/ssl/ca.pem \
--cert-file /etc/docker/ssl/cert.pem \
--key-file /etc/docker/ssl/key.pem \
cluster-health
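For scripted checks, the output of `cluster-health` can be summarized and turned into an exit code. The sketch below is a hypothetical helper, not part of UCP. It assumes the etcdctl v2 output format (`member <id> is healthy: ...`, followed by a final `cluster is healthy` line), which may differ between versions.

```shell
# summarize_etcd_health: read `etcdctl cluster-health` output on stdin,
# print problem members and a one-line summary, and return non-zero
# unless the final verdict is "cluster is healthy".
# The line formats matched here are an assumption based on etcdctl v2.
summarize_etcd_health() {
  awk '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    /^member .* is healthy/                  { healthy++ }
    /^member .* is (unhealthy|unreachable)/  { bad++; print "problem: " $0 }
    /^cluster is /                           { verdict = $0 }
    END {
      printf "%d healthy, %d problem member(s); %s\n", healthy, bad, verdict
      exit (bad > 0 || verdict != "cluster is healthy")
    }
  '
}
```

To use it, run the `docker exec` command above without `-it` (a TTY cannot be allocated when the output is piped) and append `| summarize_etcd_health`.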
RethinkDB Status
NODE_ADDRESS=$(docker info --format '{{.Swarm.NodeAddr}}')
VERSION=$(docker image ls --format '{{.Tag}}' docker/ucp-auth | head -n 1)
docker container run --rm -v ucp-auth-store-certs:/tls docker/ucp-auth:${VERSION} --db-addr=${NODE_ADDRESS}:12383 db-status
DTR Troubleshooting
Health Checks
- https://dtr.schoolofdevops.org/_ping
- https://dtr.schoolofdevops.org/nginx_status
- https://dtr.schoolofdevops.org/api/v0/meta/cluster_status
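The three health endpoints can be polled in one pass. This is a minimal sketch; the function names are made up for illustration, and it assumes `curl` is installed. `-k` skips certificate verification, which is only appropriate for self-signed lab setups. The `cluster_status` endpoint may require admin credentials, in which case an unauthenticated request is expected to return an error status.

```shell
# dtr_health_urls: print the three DTR health-check URLs for a host.
dtr_health_urls() {
  local host="$1" path
  for path in /_ping /nginx_status /api/v0/meta/cluster_status; do
    printf 'https://%s%s\n' "$host" "$path"
  done
}

# check_dtr: print the HTTP status code returned by each endpoint.
check_dtr() {
  local url code
  dtr_health_urls "$1" | while read -r url; do
    code=$(curl -ks -o /dev/null -m 10 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Example: check_dtr dtr.schoolofdevops.org
```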
DTR Overlay and RethinkDB Troubleshooting: https://docs.docker.com/ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-with-logs/
Swarm Troubleshooting
Node Level
docker node ls
docker node inspect <node>
docker node update --availability drain <node>
docker node rm <node>
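Before draining or removing anything, it helps to list only the nodes that look unhealthy. The sketch below filters the output of `docker node ls` using its `--format` placeholders; `problem_nodes` is a hypothetical helper name, and it treats any node that is not both `Ready` and `Active` as worth a look.

```shell
# problem_nodes: read "hostname status availability" lines on stdin and
# print the hostnames of nodes that are not Ready or not Active.
problem_nodes() {
  awk '$2 != "Ready" || $3 != "Active" { print $1 }'
}

# Example (run on a manager node):
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.Availability}}' | problem_nodes
```

Nodes that were drained on purpose also show up in this list, so the output is a starting point rather than a verdict.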
Service Level
docker service ls
docker service ps <service>
docker service inspect <service>
docker service logs <service>
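In the inspect output, compare the CreatedAt and UpdatedAt timestamps with the time the issue started; an update shortly before the first symptoms points to a deployment or configuration change.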
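The timestamp comparison can be scripted. In this sketch, `service_times`, `normalize_ts`, and `is_later` are hypothetical helpers. It assumes the timestamps are UTC values in RFC 3339 form (as produced by the `json` template function), so they can be compared as strings once quotes and fractional seconds are removed.

```shell
# service_times: print a service's CreatedAt and UpdatedAt as JSON strings.
service_times() {
  docker service inspect --format '{{json .CreatedAt}} {{json .UpdatedAt}}' "$1"
}

# normalize_ts: drop surrounding quotes and fractional seconds.
normalize_ts() {
  printf '%s\n' "$1" | sed -e 's/"//g' -e 's/\.[0-9]*Z$/Z/'
}

# is_later A B: succeed if timestamp A is strictly later than B.
# String comparison is only valid because both values are UTC and
# share the same fixed-width layout after normalization.
is_later() {
  awk -v a="$(normalize_ts "$1")" -v b="$(normalize_ts "$2")" \
    'BEGIN { exit !((a "") > (b "")) }'
}

# Example: was the service updated after the issue was first seen?
#   set -- $(service_times <service>)
#   is_later "$2" "2018-05-01T10:00:00Z" && echo "updated after issue start"
```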
Tasks/Containers
docker inspect <container>
docker logs <container>
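For a stopped container, the exit code in the inspect output is often the quickest clue. The helper below is an illustrative sketch (the name `explain_exit_code` is made up). It relies on the common convention that codes above 128 mean "128 + signal number", which holds for most but not all programs.

```shell
# explain_exit_code: give a rough interpretation of a container exit code.
explain_exit_code() {
  local code="$1" sig
  if [ "$code" -eq 0 ]; then
    echo "0: exited normally"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    echo "$code: likely terminated by signal $sig ($(kill -l "$sig"))"
  else
    echo "$code: application error"
  fi
}

# Example:
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' <container>)"
#   docker inspect --format '{{.State.OOMKilled}}' <container>   # true if killed for exceeding memory
```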
Finding Docker Daemon Logs
- Ubuntu (old, using upstart) - /var/log/upstart/docker.log
- Ubuntu (new, using systemd) - sudo journalctl -fu docker.service
- Boot2Docker - /var/log/docker.log
- Debian GNU/Linux - /var/log/daemon.log
- CentOS - grep docker /var/log/daemon.log
- CoreOS - journalctl -u docker.service
- Fedora - journalctl -u docker.service
- Red Hat Enterprise Linux Server - grep docker /var/log/messages
- openSUSE - journalctl -u docker.service
- OSX - ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/log/docker.log
- Windows - Get-EventLog -LogName Application -Source Docker -After (Get-Date).AddMinutes(-5) | Sort-Object Time
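On an unfamiliar Linux host, the right entry from the list above can be guessed by probing. This is a rough sketch with a hypothetical function name: it treats the presence of `/run/systemd/system` as the sign of a systemd host and otherwise returns the first readable log file from the list. It does not cover macOS or Windows.

```shell
# docker_log_hint: print the journalctl command or the log file that most
# likely holds the Docker daemon logs on this Linux host.
docker_log_hint() {
  local f
  if [ -d /run/systemd/system ]; then
    echo "journalctl -u docker.service"
    return 0
  fi
  for f in /var/log/upstart/docker.log /var/log/docker.log \
           /var/log/daemon.log /var/log/messages; do
    if [ -r "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  echo "no known Docker daemon log found"
  return 1
}
```

For the shared files (`daemon.log`, `messages`), filter the result with `grep docker`.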
System Troubleshooting
File Descriptors
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr
Sample output of /proc/sys/fs/file-nr:
1632 0 202648
where:
- 1632: file handles currently allocated
- 0: allocated file handles that are currently unused
- 202648: maximum number of file handles (the same value as file-max)
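The three fields are easier to judge as a percentage of the maximum. A minimal sketch follows; `fd_usage` is a hypothetical helper that reads `/proc/sys/fs/file-nr` by default, or standard input when called with `-`.

```shell
# fd_usage: report allocated file handles as a share of the system maximum.
# Input format: "<allocated> <unused> <max>" as in /proc/sys/fs/file-nr.
fd_usage() {
  awk '{ printf "%d of %d file handles allocated (%.2f%%)\n", $1, $3, $1 * 100 / $3 }' \
    "${1:-/proc/sys/fs/file-nr}"
}

# Example with the sample values from this section:
#   echo '1632 0 202648' | fd_usage -
#   1632 of 202648 file handles allocated (0.81%)
```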
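To update the limits for the current shell, system-wide, and for a single container, respectively (the sysctl change does not survive a reboot):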
ulimit -n 99999
sysctl -w fs.file-max=100000
docker run --ulimit nofile=90000:90000 <image-tag>
To check open files
lsof
lsof | wc -l
lsof -p <pid>
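When `lsof` is slow or not installed, the per-process counts can be read from `/proc` directly. The sketch below uses a hypothetical helper name and is Linux-only. Without root it only sees processes owned by the current user, because other users' `fd` directories are unreadable.

```shell
# top_fd_procs [N]: list the N processes with the most open file
# descriptors as "count pid name", highest first (default N=10).
top_fd_procs() {
  local limit="${1:-10}" dir pid count name
  for dir in /proc/[0-9]*/fd; do
    pid="${dir#/proc/}"
    pid="${pid%/fd}"
    # Unreadable or vanished directories yield 0 entries and are skipped.
    count=$(ls -A "$dir" 2>/dev/null | wc -l)
    [ "$count" -gt 0 ] || continue
    name=$(cat "/proc/$pid/comm" 2>/dev/null)
    printf '%s %s %s\n' "$count" "$pid" "$name"
  done | sort -rn | head -n "$limit"
}

# Example: sudo bash -c "$(declare -f top_fd_procs); top_fd_procs 5"
```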
References
Etcd and RethinkDB Troubleshooting: https://docs.docker.com/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-configurations/#check-the-status-of-the-database
Where are the Docker daemon logs (Stack Overflow discussion): https://stackoverflow.com/questions/30969435/where-is-the-docker-daemon-log
File Descriptors: https://www.netadmintools.com/art295.html