This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

add unarchiver example #425

Closed
wants to merge 3 commits into from
92 changes: 65 additions & 27 deletions README.md
@@ -1,6 +1,6 @@
# archiver [![Go Reference](https://pkg.go.dev/badge/github.com/mholt/archiver/v4.svg)](https://pkg.go.dev/github.com/mholt/archiver/v4) [![Ubuntu-latest](https://github.com/mholt/archiver/actions/workflows/ubuntu-latest.yml/badge.svg)](https://github.com/mholt/archiver/actions/workflows/ubuntu-latest.yml) [![Macos-latest](https://github.com/mholt/archiver/actions/workflows/macos-latest.yml/badge.svg)](https://github.com/mholt/archiver/actions/workflows/macos-latest.yml) [![Windows-latest](https://github.com/mholt/archiver/actions/workflows/windows-latest.yml/badge.svg)](https://github.com/mholt/archiver/actions/workflows/windows-latest.yml)

Introducing **Archiver 4.0** - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CLI in this generic replacement for several platform-specific or format-specific archive utilities.
Introducing **Archiver 4.0 (alpha)** - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CLI in this generic replacement for several platform-specific or format-specific archive utilities.

**:warning: v4 is in ALPHA. The core library APIs work pretty well but the command has not been implemented yet, nor have most automated tests. If you need the `arc` command, stick with v3 for now.**

@@ -11,8 +11,8 @@ Introducing **Archiver 4.0** - a cross-platform, multi-format archive utility an
- By file name
- By header
- Traverse directories, archive files, and any other file uniformly as [`io/fs`](https://pkg.go.dev/io/fs) file systems:
- [`DirFS`](https://pkg.go.dev/github.com/mholt/archiver/v4#DirFS)
- [`FileFS`](https://pkg.go.dev/github.com/mholt/archiver/v4#FileFS)
- [`DirFS`](https://pkg.go.dev/github.com/mholt/archiver/v4#DirFS)
- [`ArchiveFS`](https://pkg.go.dev/github.com/mholt/archiver/v4#ArchiveFS)
- Compress and decompress files
- Create and extract archive files
@@ -91,11 +91,12 @@ if err != nil {
}
defer out.Close()

// we can use the CompressedArchive type to gzip a tarball
// we can use the Archive type to gzip a tarball
// (compression is not required; you could use Tar directly)
format := archiver.CompressedArchive{
format := archiver.Archive{
Compression: archiver.Gz{},
Archival: archiver.Tar{},
Extraction: archiver.Tar{},
}

// create the archive
@@ -111,26 +112,16 @@ The first parameter to `FilesFromDisk()` is an optional options struct, allowing

Extracting an archive, extracting _from_ an archive, and walking an archive are all the same function.

Simply use your format type (e.g. `Zip`) to call `Extract()`. You'll pass in a context (for cancellation), the input stream, the list of files you want out of the archive, and a callback function to handle each file.

If you want all the files, pass in a nil list of file paths.
Simply use your format type (e.g. `Zip`) to call `Extract()`. You'll pass in a context (for cancellation), the input stream, and a callback function to handle each file.

```go
// the type that will be used to read the input stream
format := archiver.Zip{}

// the list of files we want out of the archive; any
// directories will include all their contents unless
// we return fs.SkipDir from our handler
// (leave this nil to walk ALL files from the archive)
fileList := []string{"file1.txt", "subfolder"}
var format archiver.Zip

handler := func(ctx context.Context, f archiver.File) error {
err := format.Extract(ctx, input, func(ctx context.Context, f archiver.File) error {
// do something with the file
return nil
}

err := format.Extract(ctx, input, fileList, handler)
})
if err != nil {
return err
}
@@ -141,7 +132,7 @@ if err != nil {
Have an input stream with unknown contents? No problem, archiver can identify it for you. It will try matching based on filename and/or the header (which peeks at the stream):

```go
format, input, err := archiver.Identify("filename.tar.zst", input)
format, input, err := archiver.Identify(ctx, "filename.tar.zst", input)
if err != nil {
return err
}
@@ -154,8 +145,8 @@ if ex, ok := format.(archiver.Extractor); ok {
}

// or maybe it's compressed and you want to decompress it?
if decom, ok := format.(archiver.Decompressor); ok {
rc, err := decom.OpenReader(unknownFile)
if decomp, ok := format.(archiver.Decompressor); ok {
rc, err := decomp.OpenReader(unknownFile)
if err != nil {
return err
}
@@ -165,13 +156,58 @@ if decom, ok := format.(archiver.Decompressor); ok {
}
```

`Identify()` works by reading an arbitrary number of bytes from the beginning of the stream (just enough to check for file headers). It buffers them and returns a new reader that lets you re-read them anew.
`Identify()` works by reading an arbitrary number of bytes from the beginning of the stream (just enough to check for file headers). It buffers them and returns a new reader that lets you re-read them. If your input stream is an `io.Seeker`, however, no buffer is created (it uses `Seek()` instead).

### Automatically identifying formats and extracting archives

Combining the above two features, you can automatically identify the format of an input stream and extract it:

```go
func Unarchive(tarball, dst string) error {
	f, err := os.Open(tarball)
	if err != nil {
		return fmt.Errorf("open tarball %s: %w", tarball, err)
	}
	defer f.Close()

	// Identify the format and get a reader that replays any peeked bytes.
	// NOTE: Identify is called without a ctx and Extract with a nil file list,
	// unlike the signatures shown in the sections above; this appears to match
	// v4.0.0-alpha.8, the version pinned in examples/unarchiver/go.mod.
	format, input, err := archiver.Identify(tarball, f)
	if err != nil {
		return fmt.Errorf("identify format: %w", err)
	}

	// Check if the format supports extraction
	extractor, ok := format.(archiver.Extractor)
	if !ok {
		return fmt.Errorf("format of %s does not support extraction", tarball)
	}

	// Ensure the destination directory exists
	// (createDir is a helper you implement, e.g. around os.MkdirAll)
	if err := createDir(dst); err != nil {
		return fmt.Errorf("creating destination directory: %w", err)
	}

	// Handle each file found in the archive
	handler := func(ctx context.Context, f archiver.File) error {
		log.Printf("Processing file: %s", f.NameInArchive)
		return handleFile(f, dst) // implement handleFile to write the file to destination
	}

	// Use the extractor to process all files in the archive (a nil file list means all files)
	if err := extractor.Extract(context.Background(), input, nil, handler); err != nil {
		return fmt.Errorf("extracting files: %w", err)
	}

	log.Printf("Unarchiving completed successfully.")
	return nil
}
```

See the [example](./examples/unarchiver) for details.

### Virtual file systems

This is my favorite feature.

Let's say you have a file. It could be a real directory on disk, an archive, a compressed archive, or any other regular file. You don't really care; you just want to use it uniformly no matter what it is.
Let's say you have a file. It could be a real directory on disk, an archive, a compressed archive, or any other regular file (or stream!). You don't really care; you just want to use it uniformly no matter what it is.

Use archiver to simply create a file system:

@@ -182,7 +218,7 @@ Use archiver to simply create a file system:
// - a compressed archive ("example.tar.gz")
// - a regular file ("example.txt")
// - a compressed regular file ("example.txt.gz")
fsys, err := archiver.FileSystem(filename)
fsys, err := archiver.FileSystem(ctx, filename, nil)
if err != nil {
return err
}
@@ -212,7 +248,7 @@ if dir, ok := f.(fs.ReadDirFile); ok {
return err
}
for _, e := range entries {
		fmt.Println(e.Name())
}
}
```
@@ -225,7 +261,7 @@ if err != nil {
return err
}
for _, e := range entries {
	fmt.Println(e.Name())
}
```

@@ -247,6 +283,8 @@ if err != nil {
}
```

**Important .tar note:** Tar files do not efficiently implement file system semantics due to their roots in sequential-access design for tapes. File systems inherently assume random access, but tar files need to be read from the beginning to access something at the end. This is especially slow when the archive is compressed. Optimizations have been implemented to amortize `ReadDir()` calls so that `fs.WalkDir()` only has to scan the archive once, but they use more memory. Open calls require another scan to find the file. It may be more efficient to use `Tar.Extract()` directly if file system semantics are not important to you.

#### Use with `http.FileServer`

It can be used with `http.FileServer` to browse archives and directories in a browser. However, due to how `http.FileServer` works, don't use it directly with compressed files; instead, wrap it as follows:
@@ -257,7 +295,7 @@ http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
// disable range request
writer.Header().Set("Accept-Ranges", "none")
request.Header.Del("Range")

// disable content-type sniffing
ctype := mime.TypeByExtension(filepath.Ext(request.URL.Path))
writer.Header()["Content-Type"] = nil
25 changes: 25 additions & 0 deletions examples/unarchiver/go.mod
@@ -0,0 +1,25 @@
module unarchiver

go 1.22.7

require github.com/mholt/archiver/v4 v4.0.0-alpha.8

require (
github.com/andybalholm/brotli v1.0.4 // indirect
github.com/bodgit/plumbing v1.2.0 // indirect
github.com/bodgit/sevenzip v1.3.0 // indirect
github.com/bodgit/windows v1.0.0 // indirect
github.com/connesc/cipherio v0.2.1 // indirect
github.com/dsnet/compress v0.0.1 // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/hashicorp/errwrap v1.0.0 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/klauspost/compress v1.15.9 // indirect
github.com/klauspost/pgzip v1.2.5 // indirect
github.com/nwaples/rardecode/v2 v2.0.0-beta.2 // indirect
github.com/pierrec/lz4/v4 v4.1.15 // indirect
github.com/therootcompany/xz v1.0.1 // indirect
github.com/ulikunitz/xz v0.5.10 // indirect
go4.org v0.0.0-20200411211856-f5505b9728dd // indirect
golang.org/x/text v0.3.8 // indirect
)