Under digest bucket create 2 buckets, one for storing origin blob
and one for storing deduped blobs.
PutBlob() - puts an origin blob in both buckets
- puts a deduped blob in deduped bucket
GetBlob() - returns blobs only from origin bucket
DeleteBlob() - deletes an origin blob from both buckets
and moves one deduped blob into origin bucket
- deletes a deduped blob from deduped bucket
[storage] When deleting an origin blob, next time we GetBlob() we get a
deduped blob with no content and we will move the content from
the deleted origin blob to it (inside s3.go).
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
in order to know which blob is 'real' (has content)
we need to know which was the first blob inserted in cache,
because that is always the real one.
because we can not modify the keys order in boltdb we'll do this
by marking the first blob inserted with a value
when GetBlob() return the blob which is marked
when PutBlob() if is the first one, mark it
when DeleteBlob() in case deleted is marked then mark the next blob
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
This commit uses native go fuzzing to fuzz test implementations
of storage in storage_fs.
moved fuzzing testdata for storage_fs in separate repo
added make target and script for importing fuzz data and running all fuzz tests
Signed-off-by: Alex Stan <alexandrustan96@yahoo.ro>
Because s3 doesn't support hard links we store duplicated blobs
as empty files. When the original blob is deleted its content is
moved to the the next duplicated blob and so on.
Signed-off-by: Petu Eusebiu <peusebiu@cisco.com>
PR (linter: upgrade linter version #405) triggered lint job which failed
with many errors generated by various linters. Configurations were added to
golangcilint.yaml and several refactorings were made in order to improve the
results of the linter.
maintidx linter disabled
Signed-off-by: Alex Stan <alexandrustan96@yahoo.ro>
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Behavior controlled by configuration (default=off)
It is a trade-off between performance and consistency.
References:
[1] https://github.com/golang/go/issues/20599
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
previously mount blob will look for blob that is provided in http request and try to hard link that path
but ideally we should look for path from our cache and do the hard link of that particular path.
this commit does the same.
added support to point multiple storage locations in zot by running multiple instance of zot in background.
see examples/config-multiple.json for more info about config.
Closes#181
In use cases, when there are large images with shared layers across
repositories, clients may benefit from not re-uploading the same blobs
over and over again.
We ensure this by hard linking when check-blob api is called.
Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
The idea initially was to use bazel to do our builds, however golang
build system is now good enough and our code base is entirely go.
It is also slowing down our travis ci/cd pipeline.
The storage layer is protected with read-write locks.
However, we may be holding the locks over unnecessarily large critical
sections.
The typical workflow is that a blob is first uploaded via a per-client
private session-id meaning the blob is not publicly visible yet. When
the blob being uploaded is very large, the transfer takes a long time
while holding the lock.
Private session-id based uploads don't really need locks, and hold locks
only when blobs are published after the upload is complete.
dstRecord :- blob path stored in cache.
dst :- blob path that is trying to be uploaded.
Currently, if the actual blob on disk may have been removed by GC/delete, during syncing the cache dst is being passed to DeleteBlob function and retry section is being continuously called because DeleteBlob function never deletes dst path (doesn't exist in db), dstRecord should be passed into DeleteBlob function because dstRecord is actual blob path stored in db.
If dst and dstRecord path value is same then this issue will not be produced and DeleteBlob method will delete the blob info from cache but if both are different then DeleteBlob method will try to delete dst path which is not in cache.
Note:- boltdb delete method return nil even when value doesn't exist (https://godoc.org/github.com/boltdb/bolt#Bucket.Delete)
We perform inline garbage collection of orphan blobs. However, the
dist-spec poses a problem because blobs begin their life as orphan blobs
and then a manifest is add which refers to these blobs.
We use umoci's GC() to perform garbage collection and policy support
has been added recently which can control whether a blob can be skipped
for GC.
In this patch, we use a time-based policy to skip blobs.
Go version changed to 1.14.4
Golangci-lint changed to 1.26.0
Bazel version changed to 3.0.0
Bazel rules_go version changed to 0.23.3
Bazel gazelle version changed to v0.21.0
Bazel build tools version changed to 0.25.1
Bazel skylib version changed to 1.0.2
An existing manifest descriptor in index.json can be updated with
different manifest contents for the same/existing tag. We were updating
the digest but not the size field causing GC to report an error.
Add a unit test case to cover this.
Add logs.
In a production use case we found that the actual rootdir can be moved.
Currently, cache entries for dedupe record the full blob path which
doesn't work in the move use case.
Only for dedupe cache entries, record relative blob paths.
Since we want to conform to dist-spec, sometimes the gc and dedupe
optimizations conflict with the conformance tests that are being run.
So allow them to be turned off via configuration params.
Upstream conformance tests are being updated, so we need to align along
with our internal GC and dedupe features.
Add a new example config file which plays nice with conformance tests.
DeleteImageManifest() updated to deal with the case where the same
manifest can be created with multiple tags and deleted with the same
digest - so all entries must be deleted.
DeleteBlob() delete the digest key (bucket) when last reference is
dropped
As the number of repos and layers increases, the greater the probability
that layers are duplicated. We dedupe using hard links when content is
the same. This is intended to be purely a storage layer optimization.
Access control when available is orthogonal this optimization.
Add a durable cache to help speed up layer lookups.
Update README.
Add more unit tests.
Clients today expect the repo to clean up if there are unused blobs, not to
manually delete things they think are unused. Let's do that, and use
umoci's code to do it since it's tested and works.
v2: also run GC on update as well as delete
v3: fix up error return paths needing two args
Signed-off-by: Serge Hallyn <shallyn@cisco.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Fixes issue #67.
As per dist spec, DELETE of a image manifest can only be done with
digest as <reference> param. Previously, tags were being allowed as
well. This is not conformant to the spec.
dist-spec compliance tests are now becoming a part of dist-spec repo
itself - we want to be compliant
pkg/api/regex.go:
* revert uppercasing in repository names
pkg/api/routes.go:
* ListTags() should support the URL params 'n' and 'last'
for pagination
* s/uuid/session_id/g to use the dist-spec's naming
* Fix off-by-one error in GetBlobUpload()'s http response "Range" header
* DeleteManifest() success status code is 202
* Fix PatchBlobUpload() to account for "streamed" use case
where neither "Content-Length" nor "Content-Range" headers are set
pkg/storage/storage.go:
* Add a "streamed" version of PutBlobChunk() called PutBlobChunkStreamed()
pkg/compliance/v1_0_0/check.go:
* fix unit tests to account for changed response status codes