Caches "support" in tekton pipelines
This article is an exploratory one. It may contain examples and commands that do not work for the reader. |
The idea of this article is to explore the different ways to "handle" caches when using tektoncd/pipeline or openshift-pipelines. It should evolve as time passes, as it explores both what is possible today and what could be done built-in. If someday tektoncd/pipeline or any other tektoncd component provides these features, this article will either be adapted or marked as deprecated.
Introduction
Cache and caching are very abstract, wide concepts. In the context of a CI pipeline, they can take several forms depending on what the pipeline and tasks are doing. If we are building an image, the cache is most likely related to the "base image" and the layers re-used across multiple builds. If we are building a Go project, it could be around fetching the dependencies (if there is no vendor folder) or around the build cache (the Go compiler cache, to speed up the next compilation).
We will focus on filesystem-based caching in the rest of the article. Things like a cluster-wide registry for OCI images, a Maven proxy or a Go modules proxy for getting dependencies quicker are out of the scope of this article, as they don't have anything to do with Tekton itself. |
As of today (end of 2022), there is nothing specific related to caches in the tektoncd/pipeline API. This means it is up to the Task authors, users or tektoncd component integrators to manage caches how they see fit. Let's explore different ways this can be handled.
What caching looks like in Tekton
Even though there are a lot of different ways to define caching, in terms of how it "looks" in Tekton it is relatively straightforward. Filesystem-based caching in Tekton maps really well to workspaces. Most if not all of the rest of this article will use workspaces as the feature to handle caching.
There are two parts to this for Tekton:
- How Tasks are written to easily use/interact with caches. For example, a go build Task should be able to use a GOCACHE if provided, but doesn't need to know anything about how it is provided. This is the easiest part, and the least opinionated one as well.
- How to provide the cache content to the TaskRun. This is the tricky part, and there are a lot of different possible ways to do that.
Authoring cache-friendly tasks (example with go)
This is the easy part of supporting caches in Tekton, and it is also the most important one for Task authors. This part is about how to write tasks that can easily be hooked up, somehow, with some caching mechanism. We'll go through examples with different "languages" to draw a decent picture.
As the author of a Task, especially if we aim to make this Task as shareable as possible, this is the only thing we should care about. Someone else (a.k.a. the user using this Task, or a tool) will have to provide the cache.
The Task definition
Go uses GOCACHE to store any build-related cache (compilation) and GOMODCACHE for go.mod dependencies. To write a cache-friendly Task, we can declare two optional workspaces and export those environment variables only when the workspaces are bound:
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: golang-build
spec:
description: >-
This Task is Golang task to build Go projects.
params:
- name: package
description: base package to build in
- name: packages
description: "packages to build (default: ./cmd/...)"
default: "./cmd/..."
      # Additional parameters
# […]
workspaces:
- name: source
- name: cache-go (1)
optional: true
- name: cache-gomod (2)
optional: true
steps:
- name: build
image: docker.io/library/golang:1.18
workingDir: $(workspaces.source.path)
script: |
[[ "$(workspaces.cache-go.bound)" == "true" ]] && { (3)
export GOCACHE=$(workspaces.cache-go.path)
}
[[ "$(workspaces.cache-gomod.bound)" == "true" ]] && { (4)
export GOMODCACHE=$(workspaces.cache-gomod.path)
}
go build $(params.flags) $(params.packages)
env:
- name: GOOS
value: #[…]
1 | This defines an optional workspace for setting up the GOCACHE . |
2 | This defines an optional workspace for setting up the GOMODCACHE . |
3 | If something is bound to the cache-go , export GOCACHE to point to the workspace. |
4 | If something is bound to the cache-gomod , export GOMODCACHE to point to the workspace. |
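To consume this Task directly, a user only needs to bind something to those optional workspaces. As a minimal sketch, assuming two pre-existing PVCs named go-cache and gomod-cache (we create them in the next section) and a hypothetical my-sources PVC containing the sources to build, a TaskRun could look like this:
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: golang-build-with-cache
spec:
  taskRef:
    name: golang-build
  params:
    - name: package
      value: github.com/tektoncd/pipeline
    # […] any other parameters required by the Task
  workspaces:
    - name: source
      persistentVolumeClaim:
        claimName: my-sources # hypothetical PVC holding the sources to build
    - name: cache-go
      persistentVolumeClaim:
        claimName: go-cache
    - name: cache-gomod
      persistentVolumeClaim:
        claimName: gomod-cache
If the cache workspaces are omitted, the Task still works; it simply builds without any cache.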
Providing the cache
Now, the idea is to provide the cache at runtime, using a PVC for example. Let’s give an example with a Pipeline
and a PipelineRun
.
First, we need one PVC for the go build cache and one for the go module cache: a 4Gi one for the former and a 2Gi one for the latter.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: go-cache
spec:
resources:
requests:
storage: 4Gi
volumeMode: Filesystem
accessModes:
- ReadWriteMany
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: gomod-cache
spec:
resources:
requests:
storage: 2Gi
volumeMode: Filesystem
accessModes:
- ReadWriteMany
The Pipeline will fetch some sources using the git-clone Task and build them using our golang-build Task.
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
name: build-go-with-optional-cache
spec:
workspaces:
- name: shared-workspace
- name: go-cache (1)
optional: true
- name: gomod-cache (2)
optional: true
params:
- name: git-url
default: https://github.com/tektoncd/pipeline
tasks:
- name: fetch-repository
taskRef:
name: git-clone
workspaces:
- name: output
workspace: shared-workspace
params:
- name: url
value: $(params.git-url)
- name: subdirectory
value: ""
- name: deleteExisting
value: "true"
- name: build
taskRef:
name: golang-build
runAfter:
- fetch-repository
workspaces:
- name: source
workspace: shared-workspace
    - name: cache-go (3)
      workspace: go-cache
    - name: cache-gomod (4)
      workspace: gomod-cache
1 | As for the Task , we define a workspace for GOCACHE . |
2 | As for the Task , we define a workspace for GOMODCACHE . |
3 | We bind the Pipeline's go-cache workspace to the cache-go workspace defined in the Task . |
4 | We bind the Pipeline's gomod-cache workspace to the cache-gomod workspace defined in the Task . |
Now the PipelineRun
.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: build-go-with-cache-run
spec:
pipelineRef:
name: build-go-with-optional-cache
params:
- name: git-url
value: https://github.com/tektoncd/pipeline
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Mi
- name: go-cache
persistentVolumeClaim: (1)
claimName: go-cache
- name: gomod-cache
persistentVolumeClaim: (2)
claimName: gomod-cache
1 | We bind our go-cache PVC to the go-cache workspace. |
2 | We bind our gomod-cache PVC to the gomod-cache workspace. |
Shortcomings
There are a few possible shortcomings with this approach:
- Depending on the class of the persistent storage, it might be tricky to get those PVCs provisioned. In addition, the cluster might have quotas on the number of PVCs used, and these PVCs being "always" there would take up some of that quota.
- Similar to the previous point, depending on the access mode (ReadWriteMany, ReadWriteOnce, …), it may force all the tasks of the pipeline to run on the same node, or make the cache read-only (which would be way less useful).
Full cache support in a Task
Following what we just did, we could enhance our Task definition to handle caching itself. At authoring time, we can add steps that pull and push some folder/workspace. There would be two steps to add: cache-fetch and cache-upload. For each implementation, the content/tool/workflow of those steps would differ.
Using an oci image
- cache-fetch , which would consist of:
  - compute a unique hash from something that identifies the cache; for example, for go we would use the go.sum file
  - try to fetch an image tagged with that hash
  - if it doesn't exist, we just warn and pass on
  - if it succeeds, we extract it to the go-cache workspace/folder
- cache-upload , which would consist of:
  - compute a unique hash from something that identifies the cache; for example, for go we would use the go.sum file
  - push the image tagged with that hash
  - possible optimization: compare the cache content / image tarball with the one we fetched; if it's similar, we don't even need to push
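To make this more concrete, here is a rough sketch of what those two steps could look like for the go-cache workspace of our golang-build Task, using crane as one possible tool to pull and push the cache as an oci image. Everything here is an assumption for illustration: the container image, the image naming and the exact crane invocations are not provided by Tekton or by any existing Task.
- name: cache-fetch
  image: registry.example.com/crane-with-shell # hypothetical image providing crane and a shell
  workingDir: $(workspaces.source.path)
  script: |
    set -e
    # Identify the cache by hashing go.sum (hypothetical cache key).
    hash=$(sha256sum go.sum | cut -d' ' -f1)
    image="quay.io/vdemeest/go-cache:${hash}" # hypothetical cache image name
    # Try to fetch the cache image; warn and pass on if it does not exist.
    # (Assumes the cache-go workspace is bound.)
    if crane export "${image}" /tmp/cache.tar; then
      tar -xf /tmp/cache.tar -C "$(workspaces.cache-go.path)"
      echo "cache restored from ${image}"
    else
      echo "no cache found for ${image}, continuing without it"
    fi
# […] the existing build step goes in between […]
- name: cache-upload
  image: registry.example.com/crane-with-shell # hypothetical image providing crane and a shell
  workingDir: $(workspaces.source.path)
  script: |
    set -e
    hash=$(sha256sum go.sum | cut -d' ' -f1)
    image="quay.io/vdemeest/go-cache:${hash}" # hypothetical cache image name
    # Package the cache workspace as a single layer and push it on top of an empty base.
    # (The optimization mentioned above would skip this push when the content did not change.)
    tar -cf /tmp/upload.tar -C "$(workspaces.cache-go.path)" .
    crane append -f /tmp/upload.tar -t "${image}"
The gomod-cache workspace would be handled the same way, with its own image and a key derived from go.sum (or go.mod).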
Remote resolution explorations
The general idea behind this exploration is very similar to tekton-wrap-pipeline
.
Tekton resolver that allows running a `Pipeline` with `emptydir` workspaces, using different means to transfer data from one `Task` to the other. This is an experimentation around not using PVCs for sharing data with workspaces in a Pipeline.
The idea, adapted to caching, would be to enhance the Task with steps that pull and push the cache(s) in the correct workspaces, bound to emptydir.
This needs to be completed. |
tekton-wrap-pipeline
We can use tekton-wrap-pipeline directly. If we take the previous example, we can rewrite the PipelineRun above as follows.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: simple-pipelinerun
spec:
serviceAccountName: mysa
pipelineRef:
resolver: wrap (1)
params:
- name: pipelineref
value: build-go-with-optional-cache
- name: workspaces
value: go-cache,gomod-cache (2)
- name: target
value: quay.io/vdemeest/pipelinerun-$(context.pipelineRun.namespace)-{{workspace}}:latest (3)
- name: base
value: quay.io/vdemeest/pipelinerun-$(context.pipelineRun.namespace)-{{workspace}}:latest (4)
params:
- name: git-url
value: https://github.com/tektoncd/pipeline
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Mi
- name: go-cache (5)
emptyDir: {}
- name: gomod-cache (6)
emptyDir: {}
1 | We are using remote resolution with the resolver named wrap (provided by tekton-wrap-pipeline ). |
2 | These are the 2 workspaces we want to handle with the "wrapper". In short, this means: for those 2 workspaces, use an oci image (a different one for each) to save and push the content. |
3 | This is the naming template for the target image to use: one image per namespace and per workspace (go-cache/gomod-cache). |
4 | This is the naming template for the base image; using the same template ensures we "keep" the content from one run to the next. |
5 | We "bind" the go-cache workspace with an emptydir just to "pass validation". |
6 | Same with gomod-cache: we "bind" the gomod-cache workspace with an emptydir just to "pass validation". |
This approach has a few shortcomings as of today:
- Using a base image means we need to "create" the repository prior to being able to run (otherwise, it will fail to get the content of the cache because it doesn't exist).
- As it is proposed, it will share the go-cache and gomod-cache for all runs using this in the same namespace. Fiddling with target and base allows you to decide what to use, but it still doesn't take into account the go.sum , …
- As of today, it only works with OCI images.
- As of today, it needs extra auth to be able to push/pull the cache to an oci image registry.
- tekton-wrap-pipeline appends layers, which means at some point the image will be too big and have too many layers. In our case, we don't necessarily care about the layers but only about the final content.
tekton-cache-pipeline
We can build on top of tekton-wrap-pipeline to provide an "easier" way to set up caching. The idea is to be able to write the following PipelineRun .
As said above, tekton-wrap-pipeline appends layers, which means at some point the image will be too big and have too many layers. In our case, we don't necessarily care about the layers but only about the final content. What we want here is a way to get some content from a given hash.
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
name: simple-pipelinerun
spec:
serviceAccountName: mysa
pipelineRef:
resolver: cache (1)
params:
- name: pipelineref
value: build-go-with-optional-cache
- name: workspaces
value: go-cache,gomod-cache (2)
- name: files (3)
value: **/go.sum
- name: target (4)
value: quay.io/vdemeest/cache/{{workspace}}:{{hash}}
params:
- name: git-url
value: https://github.com/tektoncd/pipeline
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 100Mi
- name: go-cache
emptyDir: {}
- name: gomod-cache
emptyDir: {}
1 | We are using our tekton-cache-pipeline resolver :) |
2 | These are the 2 workspaces we want to handle with the "wrapper". In short, this means: for those 2 workspaces, use an oci image (a different one for each) to save and push the content. |
3 | Which files to compute the hash from. The idea here is that we will compute the hash and try to fetch the content (using an oci image for now) tagged with that hash; if it doesn't exist, we don't fetch anything. But in any case, we'll push an image tagged with that hash at the end. |
4 | Very similar to tekton-wrap-pipeline , this is the naming pattern for the image we want to push the cache to / pull it from. |
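To illustrate callout 3, a hypothetical step injected by this resolver could derive the cache image reference roughly as follows; the hashing recipe and the resolved image name are assumptions, not an existing implementation:
# Hypothetical: hash the content of every go.sum file found in the sources.
hash=$(find . -name go.sum | sort | xargs cat | sha256sum | cut -d' ' -f1)
# Hypothetical: resolve the {{workspace}} and {{hash}} placeholders of the target parameter.
image="quay.io/vdemeest/cache/gomod-cache:${hash}"
Any recipe that produces a stable hash for the same set of dependencies would work; the important part is that the tag changes when go.sum changes.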
This needs to be implemented. |
CustomTask explorations
The idea is very similar to remote resolution, but using CustomTask
instead.
This needs to be completed. |