Gekkota - synthesizer and audio composition program


Overview

Gekkota is a piece of software that generates audio data algorithmically, using linear algebra to compute the sound wave frames. It uses C/C++/HIP to harness the power of CPUs and GPUs to build electronic music from algorithms and patterns. The application features both a live performance interface and a recording/sequencing user interface.

The engine splits its processing between foreground and background activities to create music. Incoming UI or MIDI events trigger algorithms, and these algorithms shape the data used on the CPU and GPU to output sound to the ALSA driver. The application leverages several Linux platforms: Wayland, the display protocol and compositor model that succeeds Xorg; ALSA, the Linux kernel's sound subsystem and its userland library; Blend2D, a high performance 2D vector graphics engine; and the AMD ROCm/HIP platform for running the sound generation on the GPU.
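As a rough sketch of the final hop to the sound card, rendered frames can be written to an ALSA PCM device. The device name, sample format, and latency below are assumptions for illustration, not Gekkota's actual output code:

```c++
// Minimal sketch of pushing rendered frames to ALSA (illustrative only).
// Assumes the "default" device, 2 interleaved channels at 192 kHz, and
// 32-bit samples for simplicity; the project targets 24-bit output.
#include <alsa/asoundlib.h>
#include <cstdint>
#include <vector>

int main() {
    snd_pcm_t *pcm = nullptr;
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return 1;

    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S32_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2 /* channels */, 192000 /* rate */,
                           1 /* allow resampling */, 500000 /* 0.5 s latency */) < 0)
        return 1;

    // One hundredth of a second of silence: 1920 frames * 2 channels.
    std::vector<int32_t> frames(1920 * 2, 0);
    snd_pcm_writei(pcm, frames.data(), 1920);

    snd_pcm_drain(pcm);
    snd_pcm_close(pcm);
    return 0;
}
```

In the real engine the buffer would of course come from the GPU fold described below rather than being silence.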

Objectives

The objective of Gekkota is to be a system that can synthesize transitions in sound shapes to create unique mosaic effects for music. Further uses, such as synthesizing audio from 3D models or generating analysis plugins, are possible, though not in active development.

Architecture

Gekkota is written in C/C++ using HIP for GPU processing in a Linux environment. The main application is broken up into subfolders that handle the inputs, outputs, and processing of the system. Within the Gekkota application there are separate threads for managing the UI and MIDI instruments and for processing the sound data on the CPU/GPU. Each thread listens to, or calls out to, its respective device and gathers information.
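A minimal sketch of that thread split, with hypothetical names (EventQueue, midi_listener, ui_loop, and render_loop are illustrative, not Gekkota's actual structure):

```c++
// Illustrative sketch of the thread split described above; all names here
// (EventQueue, midi_listener, ui_loop, render_loop) are hypothetical.
#include <atomic>
#include <mutex>
#include <queue>
#include <thread>

struct Event { int type; int value; };   // e.g. a note-on from the UI or MIDI

struct EventQueue {
    std::mutex m;
    std::queue<Event> q;
    void push(const Event &e) { std::lock_guard<std::mutex> lk(m); q.push(e); }
    bool pop(Event &e) {
        std::lock_guard<std::mutex> lk(m);
        if (q.empty()) return false;
        e = q.front(); q.pop(); return true;
    }
};

std::atomic<bool> running{true};

void midi_listener(EventQueue &events) {
    while (running) { /* read MIDI packets and push Events */ }
}

void ui_loop(EventQueue &events) {
    while (running) { /* handle Wayland/UI input and push Events */ }
}

void render_loop(EventQueue &events) {
    Event e;
    while (running) {
        while (events.pop(e)) { /* update the sound segments from the event */ }
        /* run the GPU stages and write the frames to ALSA */
    }
}

int main() {
    EventQueue events;
    std::thread midi(midi_listener, std::ref(events));
    std::thread ui(ui_loop, std::ref(events));
    std::thread render(render_loop, std::ref(events));
    midi.join(); ui.join(); render.join();
    return 0;
}
```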

The sound algorithms are packed into a packet structure so they can easily be shared as memory with the GPU for processing. Most of the arithmetic/vector calculations take place on the GPU in C code operating on struct gka_entry arrays (see audio-segment.h for details). These segment arrays are designed to be navigated by both the CPU and the GPU, so the same memory can be used to compose sounds (CPU) and render them to frames (GPU).

architecture diagram
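The real entry layout is defined in audio-segment.h; the struct below is only a placeholder, used to sketch the sharing pattern of allocating a flat, index-linked array on the host and copying it to the GPU with HIP:

```c++
// Sketch of sharing a flat segment array with the GPU via HIP.
// gka_entry_sketch is a placeholder; the real layout is in audio-segment.h.
#include <hip/hip_runtime.h>
#include <vector>

struct gka_entry_sketch {
    double start;    // when the segment begins, in frames
    double value;    // e.g. a target frequency or volume
    unsigned next;   // index of the next entry, so the GPU can walk the array
};

int main() {
    std::vector<gka_entry_sketch> host(1024);
    // ... CPU-side composition fills in the entries here ...

    gka_entry_sketch *device = nullptr;
    size_t bytes = host.size() * sizeof(gka_entry_sketch);
    hipMalloc(&device, bytes);
    hipMemcpy(device, host.data(), bytes, hipMemcpyHostToDevice);

    // ... kernels walk the same flat layout by index; no pointers to fix up ...

    hipFree(device);
    return 0;
}
```

Keeping everything index-linked in one flat allocation is what lets the same bytes describe a sound on both sides of the copy.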

GPU calculation stages

In order for sound to be calculated in parallel, the time-sensitive parts must be considered: individual sounds can happen independently of each other, but sound waves are periodic and therefore require a component of in-series processing. Because of this, the GPU computation process happens in three stages (see hip-calculations.hpp for more details on how the kernels run).
  1. Calculate the steps for a given frame. This runs one thread per frame (at 192 kHz, one hundredth of a second is 1920 frames, and therefore 1920 GPU threads). The steps are the differences in sine wave phase position from the last time the sound was rendered to a frame. The step changes with frequency, because a higher frequency means a shorter sine wave period and therefore a larger step per frame.
  2. Calculate the phase of the sound for the given frame. This depends on the frames preceding it, so this stage is parallelized one thread per sound, not one thread per frame.
  3. Generate frames and fold them all into one value per frame for the specified period. In this calculation, using the outcomes of stages 1 and 2, we again use one thread per frame. A minimal sketch of the three stages follows this list.
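The sketch below shows the three stages as HIP kernels under the simplifying assumption that each sound is a plain sine wave whose per-frame frequency and volume are already laid out in device arrays; the kernel names and layout are illustrative, while the real kernels live in hip-calculations.hpp:

```c++
// Illustrative three-stage kernel layout, not the real kernels from
// hip-calculations.hpp. Each sound is assumed to be a plain sine wave whose
// per-frame frequency and volume are already stored in device arrays.
#include <hip/hip_runtime.h>
#include <math.h>

#define FRAMES 1920            // 1/100 s at 192 kHz
#define RATE   192000.0
#define TWO_PI 6.283185307179586

// Stage 1: one thread per frame - the phase step each sound makes in that frame.
__global__ void calc_steps(const double *freq, double *step, int sounds) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;       // frame index
    if (i >= FRAMES) return;
    for (int s = 0; s < sounds; ++s)
        step[s * FRAMES + i] = TWO_PI * freq[s * FRAMES + i] / RATE;
}

// Stage 2: one thread per sound - the phase is a running sum of steps, which
// is the inherently serial part, so it cannot be split across frames.
__global__ void calc_phases(const double *step, double *phase,
                            double *carry, int sounds) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;       // sound index
    if (s >= sounds) return;
    double p = carry[s];                 // phase carried over from the previous period
    for (int i = 0; i < FRAMES; ++i) {
        p += step[s * FRAMES + i];
        phase[s * FRAMES + i] = p;
    }
    carry[s] = fmod(p, TWO_PI);
}

// Stage 3: one thread per frame - evaluate every sound and fold the results
// into a single value for that frame.
__global__ void fold_frames(const double *phase, const double *volume,
                            double *out, int sounds) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;       // frame index
    if (i >= FRAMES) return;
    double acc = 0.0;
    for (int s = 0; s < sounds; ++s)
        acc += volume[s * FRAMES + i] * sin(phase[s * FRAMES + i]);
    out[i] = acc;
}
```

Launching the kernels in sequence preserves the dependency described above: steps are independent per frame, phases need a running sum per sound, and the fold is again independent per frame.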
stages diagram

In the example below, four sounds are used to generate frames; each frame represents one value in the PCM output. For example, with 24-bit 192 kHz output, the final frame cells would be converted from doubles into 24-bit integers, of which 192,000 play per second for each left or right audio channel.
sounds         frames
melody1         0.12    0.13    0.14    0.15   ...
melody2         0.00    0.00    0.00    0.10   ...
kick           -0.01   -0.00    0.01    0.02   ...
snare           0.23    0.25    0.30    0.28   ...
final frame     0.34    0.38    0.45    0.55   ...
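That last conversion from a folded double into a 24-bit sample is essentially a clamp and scale; a small sketch, assuming the folded values are already normalized to the -1.0 to 1.0 range:

```c++
// Sketch of converting a folded frame value (double) into a signed 24-bit PCM
// sample, assuming the value is already normalized to [-1.0, 1.0].
#include <algorithm>
#include <cstdint>

int32_t to_pcm24(double frame) {
    double clamped = std::max(-1.0, std::min(1.0, frame));
    // The signed 24-bit range is -8388608 .. 8388607.
    return static_cast<int32_t>(clamped * 8388607.0);
}
```

For instance, the final frame value 0.34 above would become roughly 2852126, one of the 192,000 samples emitted per second on each channel.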

Current state

The present state of Gekkota is that the calculation and memory architectures are in place, along with most of the base UI infrastructure. The engine successfully processes sounds according to the algorithms for volume and frequency, using the GPU to expedite the calculations. Visualization of the data is in place, but use of the events to propagate information is limited. MIDI devices are read directly as binary packets for playing live.
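Reading those MIDI packets as raw bytes could look roughly like the sketch below, which uses ALSA's rawmidi interface; the device name is a placeholder, and this is not necessarily how Gekkota opens its devices:

```c++
// Rough sketch of reading raw MIDI bytes with ALSA's rawmidi interface.
// The device name "hw:1,0,0" is a placeholder for whatever controller is attached.
#include <alsa/asoundlib.h>
#include <cstdio>

int main() {
    snd_rawmidi_t *in = nullptr;
    if (snd_rawmidi_open(&in, nullptr, "hw:1,0,0", 0) < 0)
        return 1;

    unsigned char buf[3];
    while (snd_rawmidi_read(in, buf, sizeof buf) > 0) {
        // A note-on message is three bytes: status, note number, velocity.
        std::printf("status %02x note %d velocity %d\n", buf[0], buf[1], buf[2]);
    }
    snd_rawmidi_close(in);
    return 0;
}
```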

Roadmap