| author | Přemysl Eric Janouch <p@janouch.name> | 2024-01-18 18:16:18 +0100 |
|---|---|---|
| committer | Přemysl Eric Janouch <p@janouch.name> | 2024-01-18 18:31:10 +0100 |
| commit | 77e988365d1c3a5ee67301c99c8992fbed7ce7bd (patch) | |
| tree | fdae4f1c9f4bffa022802bf1c2ac55dbd1442136 /deeptagger | |
| parent | 8df76dbaab2912e86059f8c7d8e4d2abf350a5d3 (diff) | |
Add some benchmarks and information
Diffstat (limited to 'deeptagger')

| -rw-r--r-- | deeptagger/README.adoc | 119 |
| -rwxr-xr-x | deeptagger/bench-interpret.sh | 51 |

2 files changed, 163 insertions, 7 deletions
diff --git a/deeptagger/README.adoc b/deeptagger/README.adoc
index 8ea83cc..8d65dfe 100644
--- a/deeptagger/README.adoc
+++ b/deeptagger/README.adoc
@@ -2,24 +2,129 @@ deeptagger
 ==========
 This is an automatic image tagger/classifier written in C++,
-without using any Python, and primarily targets various anime models.
+primarily targeting various anime models.
 
-Unfortunately, you will still need Python and some luck to prepare the models,
-achieved by running download.sh.  You will need about 20 gigabytes of space.
+Unfortunately, you will still need Python 3, as well as some luck, to prepare
+the models, achieved by running download.sh.  You will need about 20 gigabytes
+of space for this operation.
 
-Very little effort is made to make this work on non-Unix systems.
+"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
+that do not support symbolic batch sizes.  The script attempts to fix this
+by running custom exports.
 
-Getting this to work
---------------------
+You're invited to change things to suit your particular needs.
+
+Getting it to work
+------------------
 To build the evaluator, install a C++ compiler, CMake, and development packages
 of GraphicsMagick and ONNX Runtime.
 
 Prebuilt ONNX Runtime can be most conveniently downloaded from
 https://github.com/microsoft/onnxruntime/releases[GitHub releases].
-Remember to install CUDA packages, such as _nvidia-cudnn_ on Debian,
+Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
 if you plan on using the GPU-enabled options.
 
  $ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
  $ cmake --build build
  $ ./download.sh
  $ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg
+
+Very little effort is made to make the project compatible with non-POSIX
+systems.
+
+Options
+-------
+--batch 1::
+	This program makes use of batches by decoding and preparing multiple images
+	in parallel before sending them off to models.
+	Batching requires appropriate models.
+--cpu::
+	Force CPU inference, which is usually extremely slow.
+--debug::
+	Increase verbosity.
+--options "CUDAExecutionProvider;device_id=0"::
+	Set various ONNX Runtime execution provider options.
+--pipe::
+	Take input filenames from the standard input.
+--threshold 0.1::
+	Output weight threshold.  Needs to be set very high on ML-Danbooru models.
+
+Model benchmarks
+----------------
+These were measured on a machine with GeForce RTX 4090 (24G),
+and Ryzen 9 7950X3D (32 threads), on a sample of 704 images,
+which took over eight hours.
+
+There is room for further performance tuning.
+
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|ML-Danbooru Caformer dec-5-97527|16|OOM
+|WD v1.4 ViT v2 (batch)|16|19 s
+|DeepDanbooru|16|21 s
+|WD v1.4 SwinV2 v2 (batch)|16|21 s
+|WD v1.4 ViT v2 (batch)|4|27 s
+|WD v1.4 SwinV2 v2 (batch)|4|30 s
+|DeepDanbooru|4|31 s
+|ML-Danbooru TResNet-D 6-30000|16|31 s
+|WD v1.4 MOAT v2 (batch)|16|31 s
+|WD v1.4 ConvNeXT v2 (batch)|16|32 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
+|ML-Danbooru TResNet-D 6-30000|4|39 s
+|WD v1.4 ConvNeXT v2 (batch)|4|39 s
+|WD v1.4 MOAT v2 (batch)|4|39 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
+|WD v1.4 ViT v2|1|43 s
+|WD v1.4 ViT v2 (batch)|1|43 s
+|ML-Danbooru Caformer dec-5-97527|4|48 s
+|DeepDanbooru|1|53 s
+|WD v1.4 MOAT v2|1|53 s
+|WD v1.4 ConvNeXT v2|1|54 s
+|WD v1.4 MOAT v2 (batch)|1|54 s
+|WD v1.4 SwinV2 v2|1|54 s
+|WD v1.4 SwinV2 v2 (batch)|1|54 s
+|WD v1.4 ConvNeXT v2 (batch)|1|56 s
+|WD v1.4 ConvNeXTV2 v2|1|56 s
+|ML-Danbooru TResNet-D 6-30000|1|58 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
+|ML-Danbooru Caformer dec-5-97527|1|73 s
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|DeepDanbooru|16|45 s
+|DeepDanbooru|4|54 s
+|DeepDanbooru|1|88 s
+|ML-Danbooru TResNet-D 6-30000|4|139 s
+|ML-Danbooru TResNet-D 6-30000|16|162 s
+|ML-Danbooru TResNet-D 6-30000|1|167 s
+|WD v1.4 ConvNeXT v2|1|208 s
+|WD v1.4 ConvNeXT v2 (batch)|4|226 s
+|WD v1.4 ConvNeXT v2 (batch)|16|238 s
+|WD v1.4 ConvNeXTV2 v2|1|245 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
+|WD v1.4 ViT v2 (batch)|16|270 s
+|WD v1.4 ConvNeXT v2 (batch)|1|272 s
+|WD v1.4 SwinV2 v2 (batch)|4|277 s
+|WD v1.4 ViT v2 (batch)|4|277 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
+|WD v1.4 SwinV2 v2 (batch)|1|300 s
+|WD v1.4 SwinV2 v2|1|302 s
+|WD v1.4 SwinV2 v2 (batch)|16|305 s
+|WD v1.4 MOAT v2 (batch)|4|307 s
+|WD v1.4 ViT v2|1|308 s
+|WD v1.4 ViT v2 (batch)|1|311 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
+|WD v1.4 MOAT v2|1|332 s
+|WD v1.4 MOAT v2 (batch)|16|335 s
+|WD v1.4 MOAT v2 (batch)|1|339 s
+|ML-Danbooru Caformer dec-5-97527|4|637 s
+|ML-Danbooru Caformer dec-5-97527|16|689 s
+|ML-Danbooru Caformer dec-5-97527|1|829 s
+|===
diff --git a/deeptagger/bench-interpret.sh b/deeptagger/bench-interpret.sh
new file mode 100755
index 0000000..ffad9c9
--- /dev/null
+++ b/deeptagger/bench-interpret.sh
@@ -0,0 +1,51 @@
+#!/bin/sh -e
+parse() {
+	awk 'BEGIN {
+		OFS = FS = "\t"
+	} {
+		name = $1
+		path = $2
+		cpu = $3 != ""
+		batch = $4
+		time = $5
+
+		if (path ~ "/batch-")
+			name = name " (batch)"
+		else if (name ~ /^WD / && batch > 1)
+			next
+	} {
+		group = name FS cpu FS batch
+		if (lastgroup != group) {
+			if (lastgroup)
+				print lastgroup, mintime
+
+			lastgroup = group
+			mintime = time
+		} else {
+			if (mintime > time)
+				mintime = time
+		}
+	} END {
+		print lastgroup, mintime
+	}' "${BENCH_LOG:-bench.out}"
+}
+
+cat <<END
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+	!$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+$(parse | awk -F'\t' 'BEGIN { OFS = "|" }
+	$2 { print "", $1, $3, $4 " s" }' | sort -t'|' -nk4)
+|===
+END
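The added bench-interpret.sh collapses repeated benchmark runs down to the minimum time per (model, CPU/GPU, batch size) configuration, then formats the survivors as AsciiDoc table rows sorted by time. Below is a self-contained sketch of that pipeline on made-up input; the model names, paths, and timings are hypothetical and only illustrate the bench.out record format (name, model path, CPU flag, batch size, seconds, tab-separated):

```shell
#!/bin/sh
# Fake bench.out: two runs of one DeepDanbooru configuration, plus one run
# of a WD model whose path marks it as a batch-enabled export.
BENCH_LOG=$(mktemp)
trap 'rm -f "$BENCH_LOG"' EXIT
printf 'DeepDanbooru\tmodels/dd\t\t4\t33\n' >>"$BENCH_LOG"
printf 'DeepDanbooru\tmodels/dd\t\t4\t31\n' >>"$BENCH_LOG"
printf 'WD v1.4 ViT v2\tmodels/batch-wd-vit\t\t4\t27\n' >>"$BENCH_LOG"

# Stage 1: keep the minimum time of each consecutive run group; tag
# batch-export models and drop WD base models that cannot batch.
# Stage 2: emit GPU rows (empty CPU flag) as |name|batch|time, sorted by time.
awk 'BEGIN { OFS = FS = "\t" } {
	name = $1; cpu = $3 != ""; batch = $4; time = $5
	if ($2 ~ "/batch-") name = name " (batch)"
	else if (name ~ /^WD / && batch > 1) next
	group = name FS cpu FS batch
	if (lastgroup != group) {
		if (lastgroup) print lastgroup, mintime
		lastgroup = group; mintime = time
	} else if (mintime > time) mintime = time
} END { print lastgroup, mintime }' "$BENCH_LOG" |
awk -F'\t' 'BEGIN { OFS = "|" } !$2 { print "", $1, $3, $4 " s" }' |
sort -t'|' -nk4
# Prints:
# |WD v1.4 ViT v2 (batch)|4|27 s
# |DeepDanbooru|4|31 s
```

Note the grouping relies on runs of the same configuration being adjacent in the log, exactly as in the script's parse() function, so a pre-sorted bench.out is assumed.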
