Upgrade Advisory
This documentation is for Flux (v1) and Helm Operator (v1). Both projects are in maintenance mode and will soon reach end-of-life. We strongly recommend you familiarise yourself with the newest Flux and start looking at your migration path.
For documentation regarding the latest Flux, please refer to this section.
Troubleshooting
Also see the issues labeled with FAQ, which often explain workarounds.
Flux is taking a long time to apply manifests when it syncs
If you notice that Flux takes tens of seconds or minutes to get through each sync, while you can apply the same manifests very quickly by hand, you may be running into this issue: fluxcd/flux#1422.
Briefly, the problem is that mounting a volume into $HOME/.kube effectively disables kubectl’s caching, which makes it much slower. You may have used such a volume mount to override $HOME/.kube/config, possibly unknowingly: the Helm chart did this for you prior to fluxcd/flux#1435.
The remedy is to mount the override to some other place in the filesystem, and use the environment entry KUBECONFIG to point kubectl at it. This is what the Helm chart now does, so fixing it may be as easy as reapplying the chart if that’s what you’re using.
This is also documented in the FAQ.
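As a minimal sketch of that remedy, the fragment below mounts the override outside $HOME/.kube and points kubectl at it through KUBECONFIG. The volume name, the ConfigMap name and the mount path are hypothetical and chosen only for illustration; the Helm chart may use different ones.

```yaml
# Fragment of the fluxd Deployment (sketch; names and paths are assumptions).
spec:
  template:
    spec:
      containers:
      - name: flux
        image: docker.io/fluxcd/flux
        env:
        # Point kubectl at the override instead of shadowing $HOME/.kube
        - name: KUBECONFIG
          value: /etc/fluxd/kube/config    # hypothetical path
        volumeMounts:
        - name: kubeconfig                 # hypothetical volume name
          mountPath: /etc/fluxd/kube       # anywhere other than $HOME/.kube
          readOnly: true
      volumes:
      - name: kubeconfig
        configMap:
          name: flux-kubeconfig            # hypothetical ConfigMap with a "config" key
```

Because nothing is mounted over $HOME/.kube, kubectl can keep writing its cache there.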
fluxctl returns a 500 Internal Server Error
This usually indicates there’s a bug in the Flux daemon somewhere – in which case please tell us about it!
Flux answers everything with git repo is not configured
This means Flux can’t read from and write to the git repo. Check that
- … you’ve supplied a git repo URL. If it’s of the form https://github.com/user/repo then you will need to use the SSH-style URL, git@github.com:user/repo, instead.
- … the deploy key has read/write access to the repo. In GitHub, deploy keys are installed in the settings for a repository. To get the deploy key Flux is using, use fluxctl identity.
- … that the host where your git repo lives is in ~/.ssh/known_hosts in the fluxd container. We prime the container image with host keys for github.com, gitlab.com, bitbucket.org, dev.azure.com, and vs-ssh.visualstudio.com, but if you’re using your own git server, you’ll need to add its host key. See “Using a private Git host”.
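For the first check, a sketch of how the SSH-style URL might appear in the fluxd container arguments, assuming the repo URL is passed with the --git-url argument and using the placeholder user/repo from the text above:

```yaml
# Fragment of the fluxd container spec (sketch; adapt to your own Deployment).
containers:
- name: flux
  image: docker.io/fluxcd/flux
  args:
  # SSH-style URL, not https://github.com/user/repo
  - --git-url=git@github.com:user/repo
```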
I’m using GCR/GKE and I keep seeing “Quota exceeded” in logs
GCP (in general) has quite conservative API rate limiting, and Flux’s default settings can bump API usage over the limits. See fluxcd/flux#1016 for advice.
Flux doesn’t seem to be able to use my imagePullSecrets
If you’re using kubectl v1.13.x to create them, then it may be due to this problem. In short, there was a breaking change to how kubectl creates secrets that found its way into the Kubernetes 1.13.0 release. It has been corrected in kubectl v1.13.2, so using that version or newer to create secrets should fix the problem.
Why are my images not showing up in the list of images?
Sometimes, instead of seeing the various images and their tags, the output of fluxctl list-images shows nothing. There are a number of reasons this can happen:
- Flux just hasn’t fetched the image metadata yet. This may be the case if you’ve only just started using a particular image in a workload.
- Flux can’t get suitable credentials for the image repository. At present, it looks at imagePullSecrets attached to workloads, service accounts, platform-provided credentials on GCP, AWS or Azure, and a Docker config file if you mount one into the fluxd container (see the command-line usage).
- When using images in ECR, from EC2, the NodeInstanceRole for the worker node running fluxd must have permissions to query the ECR registry (or registries) in question. eksctl and kops (with .iam.allowContainerRegistry=true) both make sure this is the case.
- When using images from ACR in AKS, the HostPath /etc/kubernetes/azure.json should be mounted into the Flux Pod. Set registry.acr.enabled=True in the helm chart or alter the Deployment:

      spec:
        containers:
          image: docker.io/fluxcd/flux
          ...
          volumeMounts:
          - name: acr-credentials
            mountPath: /etc/kubernetes/azure.json
            readOnly: true
        volumes:
        - name: acr-credentials
          hostPath:
            path: /etc/kubernetes/azure.json
            type: ""

  If you encounter permission errors, you can alternatively create a secret acr-credentials based on the azure.json file and set registry.acr.secretName=acr-credentials.
- Flux excludes images with no suitable manifest (linux amd64) in manifestlist
- Flux doesn’t yet understand image refs that use digests instead of tags; see fluxcd/flux#885.
If none of these explanations seem to apply, please file an issue.
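For the ACR case in the list above, the two chart settings could be written as a Helm values fragment along these lines. This is a sketch: the nesting is inferred from the dotted keys registry.acr.enabled and registry.acr.secretName, and the secret itself is assumed to already exist.

```yaml
# Helm values fragment (sketch; nesting inferred from the dotted keys).
registry:
  acr:
    enabled: true
    # Only needed when the hostPath mount runs into permission errors;
    # assumes a secret named acr-credentials built from azure.json.
    secretName: acr-credentials
```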
Why do my image tags appear out of order?
You may notice that the ordering given to image tags does not always correspond with the order in which you pushed the images. That’s because Flux sorts them by the image creation time; and, if you have retagged an older image, the creation time won’t correspond to when you pushed the image. (Why does Flux look at the image creation time? In general there is no way for Flux to retrieve the time at which a tag was pushed from an image registry.)
This can happen if you explicitly tag an image that already exists. Because of the way Docker shares image layers, it can also happen implicitly if you happen to build an image that is identical to an existing image.
If this appears to be a problem for you, one way to ensure each image build has its own creation time is to label it with a build time; e.g., using OpenContainers pre-defined annotations.
What is the “sync tag”; or, why do I see a flux-sync tag in my git repo?
Flux keeps track of the last commit that it’s applied to the cluster, by pushing a tag (controlled by the command-line flags --git-sync-tag and --git-label) to the git repository. This gives it a persistent high water mark, so even if it is restarted from scratch, it will be able to tell where it got to.
Technically, it only needs this to be able to determine which image releases (including automated upgrades) it has applied, and that only matters if it has been asked to report those with the --connect flag. Future versions of Flux may be more sparing in use of the sync tag.
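As a rough illustration, the tag name could be changed in the fluxd container arguments like this. The value flux-sync-staging is a hypothetical name, for example to give each cluster sharing a repo its own tag; check the command-line usage for how --git-label and --git-sync-tag interact.

```yaml
# Fragment of the fluxd container spec (sketch).
args:
# Hypothetical tag name; replaces the flux-sync tag you would otherwise see
- --git-label=flux-sync-staging
```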
Flux fails with an error log similar to couldn’t get resource list for example.com/version: the server is currently unable to handle the request
This means your Kubernetes cluster fails to respond to list queries for resources in example.com/version.
If the error is transient, Flux will work once the error recedes.
However, the error won’t normally go away since most of the time it’s caused by a misconfiguration of your cluster.
For instance, you can run into this problem:
- When a Kubernetes Webhook server is removed without removing its Webhook definition.
- When a custom resource definition (CRD) is not available due to a FailedDiscoveryCheck error.
We recommend trying to address the root cause by fixing your cluster configuration. In the examples above, you would need to remove the Webhook definition or add the CRD.
However, fixing your cluster configuration may not always be possible. The problem is common enough that Flux provides a flag called --k8s-unsafe-exclude-resource. The name says it all: you should only use it if you know what you are doing.
--k8s-unsafe-exclude-resource will tell Flux to avoid querying the cluster for those resources. This in turn means that Flux won’t take into account those excluded cluster resources when syncing. This can cause excluded resources:
- to be unexpectedly overwritten by their corresponding definition in Git during a sync (even if they are annotated with flux.weave.works/ignore: "true" on the cluster-side).
- not to be garbage-collected.
The rule of thumb is that you can use --k8s-unsafe-exclude-resource on resources not matching any manifests in your Git repository.
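A sketch of how the flag might be set for the failing example.com/version group from the error message above. The pattern syntax shown here (a glob over group/version/kind) is an assumption, so verify it against the command-line usage before relying on it.

```yaml
# Fragment of the fluxd container spec (sketch; pattern format is an assumption).
args:
# Skip querying the cluster for resources under the failing group/version
- --k8s-unsafe-exclude-resource=example.com/version/*
```

Keep the pattern as narrow as possible, since anything it matches is ignored when syncing and is no longer garbage-collected.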