“Drawing” Audio Curves

A small demo where I used my graphics library for audio processing.

In this example I drew two audio curves. For better tracking I used pulsed signals and a simple environment to eliminate background noises. The distance is mapped with a quadratic function to the volume level (this is most likely identical to OpenAL) Click on the image to Download the FLAC audio file:

Example: Audio Curves
Example: Audio Curves

Slideshow: Deconstruction of Optical Illusions

Human intelligence allows us introspectiveness. We can scrutinize our perception and identify problematic workflows in our biological “machine”.(Instead of displaying a BSOD).

I recently discovered two images of optical illusions in the WWW. I could not identify the source or the original author of the images (further information would be useful). Nevertheless I was interested in their construction.

Optical Illusion: Dots become invisible

The dots are placed inside a less pronounced Hermann-Grid illusion. It is difficult to distinguish real dots with imaginary spots.

The recreated Image in its original size can be viewed here: (link)

Source code: GitHub

Optical Illusion: Image Parts Starts To move

Different gray values are used to simulate spatial structures. The gray values are interpreted as illuminated and shadowed parts. By flipping the light source and slope in parts of the image, contradicting structures are generated. There are now at least four different interpretations of the spatial structure.

The recreated Image in its original size can be viewed here: (link)

Source code: GitHub

Will I Dream?

Anomaly Detection in Neural Networks.

Will I Dream?
Will I Dream? (Render Sequence made with Forced Perspective, Color Essence, Virtual Mannequin)

Training and usage of neural-networks is sensible for problem cases with an unknown or fuzzy domain. Consequently the complexity and the size of the belonging problem is likewise unknown. However it may be helpful to estimate the error-rate for non verifiable results.

Artificial Intelligence described as class inside computational complexity theory

It is necessary to identify AI as complexity class to obtain clues about its performance. Obviously neural-networks and other machine learning techniques can be run on Turing Machines and at least for some types of NNs it can be shown that they are Turing complete.

Otherwise we know that the results of NNs have heuristic characteristics. They can be both false-positive and false-negative. I also assume that they always stop and return a result.

The closest match I could find inside the class of search problems are Monte Carlo algorithms. The corresponding complexity class for decidable randomized problems is BPP.

Under these assumptions it is now possible to calculate a two dimensional map to predict the error rate (between 0 = infrared and 1 = ultraviolet). The X-Axis thereby defines the domain size and the Y-Axis the problem complexity:

errorexpect

It is also possible to inspect the result, if the error-rate does not grow linearly with the domain size..

errorsize

or with the complexity:

errorcompexity

Comparing macOS with Linux background network traffic

A short overview of unwittingly opened network connections.

A network traffic graph generated from my Mac running macOS Sierra. The data was collected during a 10 hour workday. The computer was started and left unused during this time. The upper half shows individual connections, each with its own color. Below the number of transmitted packages per time-frame is shown.
A network traffic graph generated from my Mac running macOS Sierra. The data was collected during a 10 hour workday. The computer was started and left unused during this time. The upper half shows individual connections, each with its own color. Below the number of transmitted packages per time-frame are shown.

Over a period of ten hours my Mac handled 157 different connections. Most of them are used to uphold LAN services like neighbor advertisement(NDP). I crated an application that creates small graphs for a visual comparison of network traffic.

Overview of Internet connections

NTP

Overall 115 connections via NTP were counted. From the incoming addresses one was striking: scan-04b.shadowserver.org. A fast search revealed that my firewall could use a rule-set update (Ref: https://ntpscan.shadowserver.org/). At first, a NTP query was started every minute and by the end of the 10 hour test the interval time extended to 20 minutes. These irregular intervals can be explained by NTPs update strategy to maintain the highest possible accuracy. If a high accuracy of less than 10 milliseconds is not needed, an alternative method available with Linux/SystemD can be used. With the systemd-timesyncd service the system clock is synchronized only once during boot:

Email and Social Media accounts

Email accounts are queried hourly without user activity.

Other Services

Other Services, for example up-to-date weather information are queried every three hours.

Jabber service

Apples Jabber service was activated every 10 Minutes.

The Cloud

The Mac connected itself once every hour to the cloud. Interestingly the data transmitted with Transport Layer Security (TLS) 1.2 never changed.

Application Updates

Applications, not distributed by the AppStore, have their own update intervals. At 10 o clock GpgTools decided to update itself.

Oddities

At two occasions the system tried to send defective tcp packages. Both were rejected by the router.

Linux
Linux Network Graph
Linux Network Graph

I stopped the test with Linux after three hours. The system was silent and did not open any connection outside the LAN.

Considerations

I suspect that my Mac has a very individual setup and differs a lot form other possible configurations, but it should be expected that macOS on average creates more background network traffic than a Linux distribution like ArchLinux.

From macOS To Linux:
Port Obstacles And Benchmarks

During the port of my macOS applications and libraries to Linux over the last few years, I was several times surprised by unexpected results and different behaviors between the two operation systems.

Retinal Pigment Epithelium and a Dynamic compression Filter
Example algorithm that was ported (optimized) for Linux: Applying a Retinal Pigment Epithelium-Filter combined with a Light Value Compression-Filter.

The first surprise was that Objective-Cs messaging (dynamic method calls) is fast and provides no notable disadvantage against C++.

In the following article I will show my experiences of porting and optimizing my image manipulating library used for Color Essence to Linux. I will do this by benchmarking one of my monochrome filters. The framework for the Color Essence library consists primarily of a histogram and modification step:

The read and write loops are executed in parallel while the inner block is executed in sequence. All images are processed in HDR(float). LDR images are converted. For benchmarks the image Char_(4088888924).jpg (Resolution: 2,372 × 3,160) (Link: commons.wikimedia) by DARREN ST0NE was used.

Algorithm

The RHD monochrome algorithm compacts unnecessary (near black) parts of the light value spectrum and also enhances it further by adding an unobtrusive saturation. This saturation imitates the normal blood circulation in the retina and is almost unnoticeable without a direct comparison to a grayscale image. The image contrast is further improved by a premultiplied noise. The Algorithm has a \(\mathcal{O}(2*n)\) runtime. In other words it should not be significantly slower than a copy operation. Because of a missing overview of monochrome filters, I cannot say that this algorithm is unique or new.

Benchmark results during Linux optimizations.
Benchmark results during Linux optimizations.

Building the original Source Code

Objective-C can be directly compiled and used on Linux. Basic data structures and algorithms are provided by the GnuStep framework. GnuStep is the Open Source branch of OpenStep OS which is also one of the direct ancestors of OS X. Although the development separated decades ago, it is in most cases still simple to cross compile a OS X application for Linux. What is missing are mostly utility functions and features that are back-ported from IOS like Storyboards, or the graphics API Mantle.

The port is simplified by using the same compiler (clang) for both operation systems. Moreover, recent efforts of Apple to distribute the programming language Swift on other operation systems helps. With the Linux Version of swift comes the library called libdispatch which is also necessary for Objective-C multi-threading.

Swift

Until recently I used Swift solely for the development of User Interfaces, but all to common language and API updates made me rethink this decision. Furthermore the Linux Swift ABI seems to be incompatible with GnuStep.

The Setup

For ArchLinux I’m using the default package:
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

The GnuStep libraries are directly build from sources (Link: github). With this setup and only few sourcecode modifications, a usable Linux build should be possible. Obviously the port suffers a lot of performance problems compared to the OS X reference. (bar 1 and 8)

The Math

Some time ago my Linux compiler started complaining that terms like If(x == x) always returns true. Up until then this was my only check for NAN results. I replaced this clause by If(isnormal(x)).

Compiler Flags

In contrast to a consumer version of an application, which has to work on any device type, it is now possible to create an optimized build for your server or Linux desktop. So instead of the default release flags:
OPT_RELEASE=-Os -fPIC -ffast-math
It is possible to use additional compiler parameters like:
OPT_RELEASE=-O3 -fPIC -ffast-math -march=native -fslp-vectorize-aggressive
Interestingly the resulting binary (bar 2) was considerably slower than the original build. Statistics collected with perf -stat indicates that the clangs SIMD optimization doesn’t work as expected (stalled backend cycles at 40%).

RANDOM

Further profiling revealed that premultiplied noise, created by a random generator was a major bottleneck. My original algorithm uses the FreeBSD variant arc4random() (libBSD). Replacing the function with the rand() from stdlib gives the first performance boost (bar 3).

RANDOM Array

The Linux rand() function is thread safe, but depends on a single source and is therefore time consuming. A small pre-computed random array can avoid this bottleneck (bar 4).

Disabling libDispatch

The simple framework described above allows a lock free read and write access on images. To utilize this feature it was necessary to replace the previous used OpenEXR library with a self-development. The new image API is optimized for vectorization and allows arbitrary channel and channel block selections. In theory this should result in a linear speedup of the calculation for each CPU core available. I therefore replaced the outer row loop with a dispatch queue. Similar to OpenMP, libDispatch creates a thread pool and divides the work between a workgroup. However, in this case the overhead of the thread creation exceeded the actual calculation time (Something that did not become apparent on macOS.). By disabling libdispatch, the Linux port first time surpasses the original algorithm performance (bar5).

OpenMP / OpenACC

Still not convinced that parallelism is out of scope I tried OpenMP (GCC, CLANG) and OpenACC (only GCC) with several options. The Dispatch:

was replaced by:

respectively with:

Finally the inner loop was reduced to:

But still the algorithm could not benefit from parallelism on Linux systems. It was not before the code was completely rewritten in C++ and OpenMP was fine-tuned for the actual input image that a minimal speed advantage could be measured:

Every worker processes 1580 lines per call. This value obviously depends on the image width and height and is used to reduce the work on two thread calls.

GCC/C++ Rewrite

GCC(g++) is the default C++ compiler on any Linux distribution. It is therefore obvious but also costly to port Objective-C code to C++. GnuStep dependencies have to be replaced by STL/Boost methods. Despite the effort the benchmark results are almost identical to the original OS X application(bar 7).

GPU Computing

Optimization with Cuda
Optimization with Cuda.

Alternatives for GPU Computing are Apples Metal Compute Shading, Nvidia Cuda, OpenCL, OpenGL Compute Shading and Vulkan Compute Shading. To save time and effort I used Nvidias Cuda for this port. Since Cuda compiles C/C++11 source code, the greatest challenge remaining was the correct selection of the datagrid and block size. (These values can be copied from one of the many image processing demos from the Cuda SDK.) The processing time of 0.357479 seconds still seems high but it includes the image copy to and from the GPU. The average kernel execution time calculated with the following code is 0.147714 seconds:

Recap

Depending on the used APIs a Linux build of a macOS application is a matter of hours. It opens the possibility for further optimizations and the use on Servers. MacOS multithreading is still better suited (faster) for desktop applications. With the rewrite of Objective-C to C++ no direct speed gain can be achieved but it allows a painless transition to HPC with Nvidias Cuda API.

Arithmetic Operations on Random Fields

Recovering steganograms after lossy operations.

The question if a steganogram can be recovered from a modified image depends on the executed operations during the modification process. It is advisable to add a header to simplify the identification.

Binary Operations

A bit is encoded in a random field by its position, radius and predefined probability. These parameters can vary during the encoding as long as the mapping is disjunct. In the following image the top left dot is the original bit, followed by AND, OR and XOR operations with fields of changing probabilities:

Binary Operation on a random Field
Binary Operation on a random Field

Combining two random fields with the binary operations AND or OR creates a result where the original bit (0/1) is recoverable. If the information of the encoding is known a complementary random field can be generated and the encoded bit can be erased with a XOR operation (last image). To complicate the erasure, the probability for the 1 bit can vary during the encoding. Any recoverable pseudo random function can be chosen but a noise map, like Perlin maps, create unobtrusive variations:
Adding Fluctuations wit a Noise Map
Adding Fluctuations wit a Noise Map

Complex Operations

Complex image operations like scale, rotate or blur are destructive as soon as the encoded bits are removed or coalesced:

Complex Operations On a Ramdom Field
Complex Operations On a Ramdom Field

By training a Bayesian network on the header data the remaining bits can be recovered.

Demo

The following video contains a steganogram with the technique described above and the previous blog post:

Current Demo of an embedded steganogram.
Current Demo of an embedded steganogram.