From 77e988365d1c3a5ee67301c99c8992fbed7ce7bd Mon Sep 17 00:00:00 2001
From: Přemysl Eric Janouch
Date: Thu, 18 Jan 2024 18:16:18 +0100
Subject: Add some benchmarks and information
---
deeptagger/README.adoc | 119 +++++++++++++++++++++++++++++++++++++++---
deeptagger/bench-interpret.sh | 51 ++++++++++++++++++
2 files changed, 163 insertions(+), 7 deletions(-)
create mode 100755 deeptagger/bench-interpret.sh
diff --git a/deeptagger/README.adoc b/deeptagger/README.adoc
index 8ea83cc..8d65dfe 100644
--- a/deeptagger/README.adoc
+++ b/deeptagger/README.adoc
@@ -2,24 +2,129 @@ deeptagger
==========
This is an automatic image tagger/classifier written in C++,
-without using any Python, and primarily targets various anime models.
+primarily targeting various anime models.
-Unfortunately, you will still need Python and some luck to prepare the models,
-achieved by running download.sh. You will need about 20 gigabytes of space.
+Unfortunately, you will still need Python 3, as well as some luck, to prepare
+the models, achieved by running download.sh. You will need about 20 gigabytes
+of space for this operation.
-Very little effort is made to make this work on non-Unix systems.
+"WaifuDiffusion v1.4" models are officially distributed with ONNX model exports
+that do not support symbolic batch sizes. The script attempts to fix this
+by running custom exports.
-Getting this to work
---------------------
+You're invited to change things to suit your particular needs.
+
+Getting it to work
+------------------
To build the evaluator, install a C++ compiler, CMake, and development packages
of GraphicsMagick and ONNX Runtime.
Prebuilt ONNX Runtime can be most conveniently downloaded from
https://github.com/microsoft/onnxruntime/releases[GitHub releases].
-Remember to install CUDA packages, such as _nvidia-cudnn_ on Debian,
+Remember to also install CUDA packages, such as _nvidia-cudnn_ on Debian,
if you plan on using the GPU-enabled options.
$ cmake -DONNXRuntime_ROOT=/path/to/onnxruntime -B build
$ cmake --build build
$ ./download.sh
$ build/deeptagger models/deepdanbooru-v3-20211112-sgd-e28.model image.jpg
+
+Very little effort is made to make the project compatible with non-POSIX
+systems.
+
+Options
+-------
+--batch 1::
+ This program makes use of batches by decoding and preparing multiple images
+ in parallel before sending them off to models.
+ Batching requires appropriate models.
+--cpu::
+ Force CPU inference, which is usually extremely slow.
+--debug::
+ Increase verbosity.
+--options "CUDAExecutionProvider;device_id=0"::
+ Set various ONNX Runtime execution provider options.
+--pipe::
+ Take input filenames from the standard input.
+--threshold 0.1::
+ Output weight threshold. Needs to be set very high on ML-Danbooru models.
+
+Model benchmarks
+----------------
+These were measured on a machine with a GeForce RTX 4090 (24G)
+and a Ryzen 9 7950X3D (32 threads), on a sample of 704 images.
+The measurements took over eight hours in total.
+
+There is room for further performance tuning.
+
+GPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|ML-Danbooru Caformer dec-5-97527|16|OOM
+|WD v1.4 ViT v2 (batch)|16|19 s
+|DeepDanbooru|16|21 s
+|WD v1.4 SwinV2 v2 (batch)|16|21 s
+|WD v1.4 ViT v2 (batch)|4|27 s
+|WD v1.4 SwinV2 v2 (batch)|4|30 s
+|DeepDanbooru|4|31 s
+|ML-Danbooru TResNet-D 6-30000|16|31 s
+|WD v1.4 MOAT v2 (batch)|16|31 s
+|WD v1.4 ConvNeXT v2 (batch)|16|32 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|36 s
+|ML-Danbooru TResNet-D 6-30000|4|39 s
+|WD v1.4 ConvNeXT v2 (batch)|4|39 s
+|WD v1.4 MOAT v2 (batch)|4|39 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|43 s
+|WD v1.4 ViT v2|1|43 s
+|WD v1.4 ViT v2 (batch)|1|43 s
+|ML-Danbooru Caformer dec-5-97527|4|48 s
+|DeepDanbooru|1|53 s
+|WD v1.4 MOAT v2|1|53 s
+|WD v1.4 ConvNeXT v2|1|54 s
+|WD v1.4 MOAT v2 (batch)|1|54 s
+|WD v1.4 SwinV2 v2|1|54 s
+|WD v1.4 SwinV2 v2 (batch)|1|54 s
+|WD v1.4 ConvNeXT v2 (batch)|1|56 s
+|WD v1.4 ConvNeXTV2 v2|1|56 s
+|ML-Danbooru TResNet-D 6-30000|1|58 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|58 s
+|ML-Danbooru Caformer dec-5-97527|1|73 s
+|===
+
+CPU inference
+~~~~~~~~~~~~~
+[cols="<,>,>", options=header]
+|===
+|Model|Batch size|Time
+|DeepDanbooru|16|45 s
+|DeepDanbooru|4|54 s
+|DeepDanbooru|1|88 s
+|ML-Danbooru TResNet-D 6-30000|4|139 s
+|ML-Danbooru TResNet-D 6-30000|16|162 s
+|ML-Danbooru TResNet-D 6-30000|1|167 s
+|WD v1.4 ConvNeXT v2|1|208 s
+|WD v1.4 ConvNeXT v2 (batch)|4|226 s
+|WD v1.4 ConvNeXT v2 (batch)|16|238 s
+|WD v1.4 ConvNeXTV2 v2|1|245 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|4|268 s
+|WD v1.4 ViT v2 (batch)|16|270 s
+|WD v1.4 ConvNeXT v2 (batch)|1|272 s
+|WD v1.4 SwinV2 v2 (batch)|4|277 s
+|WD v1.4 ViT v2 (batch)|4|277 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|16|294 s
+|WD v1.4 SwinV2 v2 (batch)|1|300 s
+|WD v1.4 SwinV2 v2|1|302 s
+|WD v1.4 SwinV2 v2 (batch)|16|305 s
+|WD v1.4 MOAT v2 (batch)|4|307 s
+|WD v1.4 ViT v2|1|308 s
+|WD v1.4 ViT v2 (batch)|1|311 s
+|WD v1.4 ConvNeXTV2 v2 (batch)|1|312 s
+|WD v1.4 MOAT v2|1|332 s
+|WD v1.4 MOAT v2 (batch)|16|335 s
+|WD v1.4 MOAT v2 (batch)|1|339 s
+|ML-Danbooru Caformer dec-5-97527|4|637 s
+|ML-Danbooru Caformer dec-5-97527|16|689 s
+|ML-Danbooru Caformer dec-5-97527|1|829 s
+|===
diff --git a/deeptagger/bench-interpret.sh b/deeptagger/bench-interpret.sh
new file mode 100755
index 0000000..ffad9c9
--- /dev/null
+++ b/deeptagger/bench-interpret.sh
@@ -0,0 +1,51 @@
+#!/bin/sh -e
+parse() {
+ awk 'BEGIN {
+ OFS = FS = "\t"
+ } {
+ name = $1
+ path = $2
+ cpu = $3 != ""
+ batch = $4
+ time = $5
+
+ if (path ~ "/batch-")
+ name = name " (batch)"
+ else if (name ~ /^WD / && batch > 1)
+ next
+ } {
+ group = name FS cpu FS batch
+ if (lastgroup != group) {
+ if (lastgroup)
+ print lastgroup, mintime
+
+ lastgroup = group
+ mintime = time
+ } else {
+ if (mintime > time)
+ mintime = time
+ }
+ } END {
+ print lastgroup, mintime
+ }' "${BENCH_LOG:-bench.out}"
+}
+
+cat <