Implementation and further improvement of the classical Artistic Neural Style Transfer algorithm by Gatys et al.
The original algorithm by Gatys et al. extracts features of the so-called "content" and "style" images from a few
intermediate layers of a pretrained neural network and uses the obtained feature maps to transfer the corresponding
features from the style image onto the content one.
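As a reminder of how the original approach works, here is a minimal, illustrative sketch of the Gatys-style losses in PyTorch. The choice of VGG19 layers, the content layer, and the loss weights below are assumptions for illustration and not necessarily the ones used in this project.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen pretrained VGG19 feature extractor (inputs are assumed to be normalized
# the way torchvision's VGG19 expects).
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract_features(x, layer_ids=(1, 6, 11, 20, 29)):
    """Collect activations from a few intermediate VGG19 layers (illustrative choice)."""
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layer_ids:
            feats.append(x)
        if i == max(layer_ids):
            break
    return feats

def gram(feat):
    """Gram matrix of a feature map: channel-to-channel correlations."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def total_loss(gen, content_feats, style_grams, alpha=1.0, beta=1e4):
    """Weighted sum of the content loss and the Gram-matrix style loss."""
    gen_feats = extract_features(gen)
    content_loss = F.mse_loss(gen_feats[-2], content_feats[-2])
    style_loss = sum(F.mse_loss(gram(g), s) for g, s in zip(gen_feats, style_grams))
    return alpha * content_loss + beta * style_loss
```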
However, the classical algorithm has several weaknesses. First, it works well only for images whose size is limited by
the sizes of the feature maps used. For example, if we use convolutional layers from VGG19, which was trained
on images of resolution 224 × 224 × 3, the classical Gatys algorithm cannot produce a larger output image of good
quality.
Secondly, it is very sensitive to the content image's color contrast level. In regions of low contrast
the algorithm converges very slowly, leaving spots of "style baldness", while in places where the color gradient leaps
abruptly it tends to noticeably diverge, producing ugly distortions of the original content image. These effects can be clearly seen in the images of a bird below.
To overcome the above flaws, the improvements described below were implemented.
The final version shows fast convergence and good visual quality that do not depend on the target resolution of the output image.
A few examples of output quality for three resolution levels are shown below (see the next section for details). The first column corresponds to a single pyramid level (256 pixels
along the shortest dimension of the image being optimized); the second column to two pyramid levels (512 pixels); the third column
to three levels (1024 pixels):
This algorithm implements the idea of minimizing the loss function for several sizes of the content-style image pair
simultaneously, with the losses of all the levels of an image pyramid optimized together (a sketch of this idea is given below).
With this approach, all the scales of features are extracted from the input images: the large features are amplified at
the small image levels, while the small features are extracted at the large ones.
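The following sketch illustrates this multi-scale idea under stated assumptions: the level count, the base resolution of 256 pixels, and the interpolation mode are illustrative, and the helpers extract_features, gram, and total_loss are the ones from the sketch above. The actual pyramid construction in this repository may differ.

```python
import torch.nn.functional as F

def pyramid(img, num_levels=3, base=256):
    """Resized copies of the image with 256, 512, 1024, ... pixels along the shortest side."""
    h, w = img.shape[-2:]
    levels = []
    for level in range(num_levels):
        short = base * 2 ** level
        scale = short / min(h, w)
        size = (max(1, round(h * scale)), max(1, round(w * scale)))
        levels.append(F.interpolate(img, size=size, mode='bilinear', align_corners=False))
    return levels

def multi_scale_loss(gen, content, style, num_levels=3):
    """Sum of the style-transfer losses computed at every pyramid level."""
    loss = 0.0
    for g, c, s in zip(pyramid(gen, num_levels),
                       pyramid(content, num_levels),
                       pyramid(style, num_levels)):
        content_feats = extract_features(c)
        style_grams = [gram(f) for f in extract_features(s)]
        loss = loss + total_loss(g, content_feats, style_grams)
    return loss
```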
The modifications mentioned above work quite satisfactorily for images that have a more or less uniform contrast level.
But there is a noticeable problem with the image of a bird against a clear sky (see the 4 images above).
This picture suffers from two diseases: "sky baldness" and "edge distortion". Both can be overcome by adding
some artificially generated noise at the start of the optimization. But what kind of noise should be used?
This algorithm was implemented step by step, gradually improving the visual quality of the output image.
Second, a multi-level structure with a different noise granularity for each level was added. At each
user-specified level, noise spots of a different size are used, triggering style features of a different scale.
The results of using different noise scales (16 and 128 spots along the shortest dimension of the content image, i.e.
a large and a medium-sized noise, respectively), together with the corresponding noise maps, can be seen in the figures below.
Large-sized noise (noise map):
Large-sized noise (result):
Medium-sized noise (noise map):
Medium-sized noise (result):
It is possible to use noise of several scales at the same time. For instance, for the promo images
shown at the top of the page, noise levels with 9, 18, and 36 spots along the shortest dimension of the content image were used,
together with pixel-wide noise (a sketch of such multi-granularity noise follows).
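A minimal sketch of how such multi-granularity noise could be generated, assuming that a "spot" is one cell of a coarse random grid upscaled to the image size; the function names and the amplitude parameter are hypothetical and not taken from the repository.

```python
import torch
import torch.nn.functional as F

def spot_noise(shape, spots, amplitude=0.1):
    """Noise map with approximately `spots` cells along the shortest spatial dimension."""
    b, c, h, w = shape
    scale = spots / min(h, w)
    coarse = torch.randn(b, c, max(1, round(h * scale)), max(1, round(w * scale)))
    # Nearest-neighbour upscaling turns each coarse cell into a uniform "spot".
    return amplitude * F.interpolate(coarse, size=(h, w), mode='nearest')

def layered_noise(shape, spot_counts=(9, 18, 36), amplitude=0.1):
    """Sum of noise maps of several granularities, plus pixel-wide noise."""
    pixel_noise = amplitude * torch.randn(*shape)
    return pixel_noise + sum(spot_noise(shape, s, amplitude) for s in spot_counts)
```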
Third, instead of using random normally-distributed color noise, it was decided to build the noise map by randomly permuting the pixels of the input style
image. This improvement removes the colors that are irrelevant to the style.
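A sketch of this permutation idea is shown below; the tiling and cropping used to match the target size are assumptions made for illustration.

```python
import torch

def style_pixel_noise(style_img, out_hw):
    """Build a noise map from the style image's own pixels by random permutation."""
    b, c, h, w = style_img.shape
    pixels = style_img.reshape(b, c, h * w)
    perm = torch.randperm(h * w, device=style_img.device)
    shuffled = pixels[:, :, perm]
    out_h, out_w = out_hw
    # Tile if the style image has fewer pixels than the target, then crop (assumed scheme).
    reps = -(-(out_h * out_w) // (h * w))  # ceiling division
    tiled = shuffled.repeat(1, 1, reps)[:, :, :out_h * out_w]
    return tiled.reshape(b, c, out_h, out_w)
```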
Finally, a dependency on the absolute value of the local gradient was added (the larger the gradient value, the less
noise is added to that region, and vice versa), as well as Gaussian-like envelopes for each noise level. The latter
modification was needed to lower the level of noise in the central part of an image, so that mostly
large-scale features are obtained there, which visually distinguishes the central foreground objects from the more detailed
background. The final version of the images can be seen in the promo images at the top of this page.
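A rough sketch of how these two attenuation factors could look; the exact functional forms (the 1/(1+g) weighting and a Gaussian-like envelope with sigma=0.35) are assumptions and not necessarily what the project uses.

```python
import torch

def gradient_attenuation(content_gray, eps=1e-6):
    """Weight in (0, 1]: small where the local gradient magnitude of the content is large."""
    gy = content_gray[..., 1:, :] - content_gray[..., :-1, :]
    gx = content_gray[..., :, 1:] - content_gray[..., :, :-1]
    grad = torch.zeros_like(content_gray)
    grad[..., 1:, :] += gy.abs()
    grad[..., :, 1:] += gx.abs()
    return 1.0 / (1.0 + grad / (grad.mean() + eps))

def center_envelope(h, w, sigma=0.35):
    """Gaussian-like factor that suppresses noise toward the image center."""
    ys = torch.linspace(-1, 1, h).view(h, 1)
    xs = torch.linspace(-1, 1, w).view(1, w)
    r2 = ys ** 2 + xs ** 2
    return 1.0 - torch.exp(-r2 / (2 * sigma ** 2))
```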
To install, clone the repository to your working machine and follow these simple steps:
Create and activate a new Python virtual environment inside the project folder.
Also, make sure that the required system packages are installed:
$ sudo apt install python3.11-venv libgl1-mesa-glx libglib2.0-0 python3-pip
Then create the virtual environment itself:
$ cd ArtStyleTransfer
$ python -m venv ./venv
Activate the new virtual environment:
$ . venv/bin/activate
Install all the packages listed inside the files requirements-base.txt and requirements-torch.txt:
$ pip install -r requirements-base.txt
$ pip install -r requirements-torch.txt
The project expects a GPU compatible with CUDA 12.1.
To run the web UI of the "lab", just execute the command:
$ python lab.py
To run the Telegram bot, first create a file token_DO_NOT_COMMIT.py in the current directory
(see subsection Obtaining Telegram bot token). After the bot is created and the token is obtained and set, you can just run the bot backend:
$ python tlbot.py
To run the Telegram bot, first create a file token_DO_NOT_COMMIT.py in the current directory with the following content:
TOKEN = "YOUR_BOT_TOKEN"
The bot token (in place of "YOUR_BOT_TOKEN") can be obtained via https://t.me/BotFather .
It is assumed that in the process of obtaining the token you will also create your own bot. The instructions of BotFather
are self-explanatory.
It is also possible to use the Dockerfile from the project folder to build a Docker image that automatically
runs the Telegram bot. To do this, first create a file token_DO_NOT_COMMIT.py in the current project directory
(see subsection Obtaining Telegram bot token). Then build the Docker image with the command
$ docker build -t ast .
and run it with
$ docker run ast
Two variants of asynchronous UI are implemented for the current project: a Telegram bot and a web-based "lab".
To use the bot, just send it a pair of images in one message. The first image will be taken as the content image,
the second as the style. The bot will start working, producing an intermediate result at every 20% of the progress.
Run python lab.py ; after a couple of seconds (plus a bit of time for downloading the pretrained neural net data if you run
it for the first time), it will start producing images and reporting them. To see the report, open your
browser at http://<host>:8080 . The page doesn't update itself; refresh it manually.
The lab.py app is not interactive. All the configuration is done in the code itself (see config.py for the default settings).
The original code is located at my GitHub profile:
https://github.com/irenemizus/ArtStyleTransfer