The Instrument
The "Instrument" is a project to develop a Java based application system by building on a general purpose Signal Processing engine. In the first instance this platform provides a framework for the implementation of a Music Information Processing application. The aim is to deliver tools and capabilities that can be used to experiment creatively with musical sound by extracting, analysing and synthesising existing and new musical data features for a given input.
The purpose of my doing this "side" project was to:
- Strengthen and update my core Java skills, especially when I was seeking my last role.
- Be a fun and playful creative exercise, providing genuinely useful tools to support my hobby as an experimental musician and film artist, which gives me joy. (Spotify and YouTube)
The work is based on ideas and techniques that I have taken from the many wonderful insights in the rich field of "MIR" (Music Information Retrieval) research and practice. I have also derived and built on many examples of Java signal processing code from here on GitHub and beyond. Please see the Acknowledgements section below, but most of all I must credit Meinard Muller for his great and unique textbook, "Fundamentals of Music Processing" (Amazon Books), and also the developers of the Java project "Tarsos", which I have used as the core library for Java audio signal processing. (Tarsos on Github)
Please refer to the Instrument Project README and code for more details on technical aspects of the implementation. However, briefly, my aim here was to use pure, core Java 17 built as a Maven project.
The overall design pattern of "perception", "thought" and "action" playfully models the elements of the "body/mind" as an organism.
Signal input in the form of an Audio waveform based stream of data samples is read in and analysed in multiple dimensions to extract useful musical sound related features. A layered network of dynamic time based processing units derives, integrates and aggregates an internal data model of musical information, including harmonic, melodic and rhythmic features. A continuous processing pipeline manages a stream of data and builds a complex model that is used to generate and synthesise Musical artefacts for output via sound playback, storage and even feedback into the system.
A modular approach allows for the delivery of alternative client implementations on top of the core engine. These include: Desktop (Java packaged as a cross-platform installable on a PC), Command line batch, and Cloud service using AWS Lambda functions. For the latter case a separate project (please see instrumentamper and the web app site, Instrument Amp) provides a "React" based web app which can be used to access the "Instrument" Cloud Service on AWS and is suitable for use on a mobile phone or any browser.
All aspects of the system are fully configured with reusable and externalised key/value property files. The system can be monitored and controlled through a rich UI on the desktop, or in a simpler, one-shot mode on the command line or through the web app browser UI.
The following screenshots show examples of the "Desktop" format Instrument Java Swing based UI.
The following screenshots show an example of the "mobile" format Instrument Amp web page.
Input: Bird Song WAV file
XC128838.-.Common.Blackbird.-.Turdus.merula.merula.mp4
Output: MIDI Voice Tracks
XC128838.-.Common.Blackbird.-.Turdus.merula.merula_recording_track_voices.1.mp4
Output: MIDI Ensemble Tracks
XC128838.-.Common.Blackbird.-.Turdus.merula.merula_recording_track_ensemble.mp4
package: jomu.instrument
Main application container, and root launch and lifecycle control.
Organ API - Common base API for the organisation of system modules, composed as a hierarchy of interacting Component parts sharing a coordinated lifecycle. The instrument is built as a Body of Organs (a minimal illustrative sketch follows the list below).
Instrument component - Main root container and lifecycle manager.
Manages lifecycle of:
- Controller
- Storage
- Workspace
- Console
- Coordinator
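A minimal sketch of how such a coordinated lifecycle hierarchy might look; the interface and method names here are illustrative assumptions rather than the project's actual API:

```java
// Hypothetical sketch of an Organ lifecycle hierarchy; class and method
// names are illustrative and may differ from the real project code.
public interface Organ {
    void initialise(); // build internal state and wire up child components
    void start();      // begin active processing
    void stop();       // stop processing and release resources
}

// The root container delegates lifecycle calls to its child Organs.
public class Instrument implements Organ {
    private final java.util.List<Organ> organs = new java.util.ArrayList<>();

    public void add(Organ organ) {
        organs.add(organ);
    }

    @Override
    public void initialise() {
        organs.forEach(Organ::initialise);
    }

    @Override
    public void start() {
        organs.forEach(Organ::start);
    }

    @Override
    public void stop() {
        organs.forEach(Organ::stop);
    }
}
```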
package: jomu.instrument.monitor
User interface for monitoring and application control functions.
Console component manages lifecycle of:
- Visor API defines user interface.
package: jomu.instrument.control
System Configuration control and process coordination functions.
Controller API manages lifecycle of:
- ParameterManager
Coordinator API manages lifecycle of:
- Cortex
- Hearing
- Voice
package: jomu.instrument.perception
Input and pre-processing of external signal sources, including:
Hearing component processes Audio signals as sampled sound wave form data from file or microphone source.
package: jomu.instrument.cognition
Internal data processing.
Dynamic multi-layer network of Processing Cells.
Cortex component manages lifecycle of the complex data model processing network composed of NuCell components.
Each Cell contains a specialised instance of a Processor that acts on the items in the data stream, including layers of Cells specific to the processing of Audio data streams.
Processors update internal data model state through the Workspace API.
package: jomu.instrument.actuation
Processing and output of the results of the internal data processing.
Voice component processes internal data model and produces Audio and Midi output through sound players and as file.
package: jomu.instrument.workspace
Data model for processed data.
Atlas component contains collections of ToneMap components that encapsulate dimensions of data from the processing streams.
InstrumentSessionManager component contains status information for the current user session and processing runs.
package: jomu.instrument.storage
Data store for managing persistence of processed data.
Storage component manages:
- ObjectStore
- FrameStore
- InstrumentStoreService
- instrument : Parent module
- instrument-core : Core signal processing framework and functionality (as per design above.)
- instrument-desktop : PC Java Desktop implementation of instrument API including Swing UI, MicroStream DB and local file store.
- instrument-cmd : Simple implementation of instrument API for command line operation.
- instrument-st : System tests for Core Instrument functions.
- instrument-aws : AWS based Cloud Services parent module.
- instrument-s3handler : AWS Cloud Services implementation of instrument API including Lambda functions and S3 store.
- instrument-s3handler-cdk : AWS Cloud Services "CDK" infrastructure-as-code implementation of deployable AWS service components.
- instrument-s3handler-st : System tests for AWS Cloud Services implementation of instrument API.
High level view of Process flow through System modules
Audio Stream Processing Pipeline
Contains components for managing User Interaction.
API for components to implement various forms of UI. For example, the implementation using Java Swing for the Desktop application.
Functions include:
- Display various graphical Views of internal data from Workspace objects.
- Provide controls to operate the system, e.g. load audio files.
- Set values for system Configuration parameters.
Flow: Provides controls to initiate Audio processing flow through the Coordinator API
Contains components for managing and configuring system state.
Component coordinates central processing actions
Functions include:
- Initialise Audio processing sub systems
- Set values for System Configuration parameters.
Flow: Start Audio processing stream through Hearing component
Contains components for handling external system signal input and pre-processing.
Component manages input processing of Audio data.
Functions include:
- Accept request to start processing an Audio data source.
- Accept stream from Microphone or File.
- Create a new AudioStream object to encapsulate processing functions for a given streamId.
- Set up an AudioFeatureProcessor and add to AudioDispatcher.
- Start the AudioStream process.
AudioFeatureProcessor handles streamed Audio data from the input and derives a diverse set of useful Audio features from a collection of AudioEventSource objects.
Features are sliced up and grouped over a given incremental 100ms time frame in a sequence of AudioEventFeatures, by sequence number. For each such frame a signal is sent to the Source cell at the head of the processing network managed by the Cortex sub system within the Cognition module. The time frame increment is configurable, but 100ms is a reasonable compromise to provide the resolution and response required across the system.
Contains components for handling internal data processing and the transformation of input to output.
Manages a network of Cell components.
Functions include:
- Initially uses the Weaver component to build the processing network by connecting together a collection of Cells, each associated with a specific type of Processor. Cell instances are created using the Generator component.
- Cells can be connected together from input to output in a many-to-many relationship.
- The network is composed of layers, within which are groups of Cells, allowing for a series of parallel and sequential processing events on data frames in the current, ongoing stream sequence from input Source to output Sink.
- Each processor Cell accepts input on an asynchronous queue and processes data frames on its own dedicated Thread.
- AudioEventFeatures for each sequence feed into the processes that, gradually and in cooperation with each other, build a data model in the Workspace.
- Within the Workspace, many varieties of ToneMaps and other data items are generated to contain data as calculated by the processors.
- Useful musical information is created by deriving, integrating and synthesising Musical information relating to 12 tone Pitch, Note, Tone, Rhythm, Chords, Beats and so on.
- A final Sink process sends a signal for each incoming frame stream sequence to the Voice component for subsequent output processing.
Contains components for handling output data processing depending on the forms of action as configured.
Component to manage Audio data output processing functions.
Functions include:
- Manage MIDI and Audio Wave Players.
- Accept data frames from the Cortex Sink cell and generate MIDI data based on musical information in the Workspace data model for the given stream sequence data frame.
- MIDI sub systems are configured through the ParameterManager, including instrument channels, volume, beat timing and so on.
- Write MIDI tracks to files on file system as configured.
Contains components for managing internal data model.
- Contains a Map of ToneMaps
- ToneMaps contain ToneTimeFrames that contain ToneMapElements and other items that encapsulate useful Musical information as extracted for each Audio stream by the ongoing processes in the Cortex.
- Component that contains status information for the current user session and processing runs.
Contains components for managing access to persistent storage of data across the system.
API for components to implement various forms of persistent file data storage. For example, the implementation using the local file system and MicroStream IO for the Desktop application, or the implementation using the S3 object store for the AWS Cloud service version.
API for components to implement various forms of persistent model data storage. For example, the implementation using the local file system and MicroStream IO for the Desktop application.
Summary description of the function of the main components in each sub system module.
Supports the following functions:
- Audio File and Microphone line input. Build Buffered Input Streams of Audio data.
- Support automatic Audio File format conversion to WAV file as required internally
- Set up Audio Calibration of power levels across the Audio stream.
- Creates a new AudioStream object to encapsulate processing functions for a given StreamId.
- Set up AudioDispatcher to drive chain of audio data sample buffer processors.
- Set up an AudioFeatureProcessor which contains a set of AudioEventSource objects.
- Add AudioFeatureProcessor to AudioDispatcher.
- Start the AudioStream process run.
- Manage closing and shutdown of AudioStream at end of process or on stop command.
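As a rough illustration of this set-up, a Tarsos DSP dispatcher chain reading a WAV file might be wired along these lines; the buffer size, thread handling and the anonymous feature processor are assumptions of the sketch, not the project's actual code:

```java
import java.io.File;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;

public class HearingSketch {

    public static void main(String[] args) throws Exception {
        int bufferSize = 4096;  // samples per buffer (illustrative value)
        int overlap = 0;        // buffer overlap in samples

        // Build a dispatcher that reads sample buffers from a WAV file.
        AudioDispatcher dispatcher =
                AudioDispatcherFactory.fromFile(new File("input.wav"), bufferSize, overlap);

        // A feature processor is just another AudioProcessor in the chain.
        dispatcher.addAudioProcessor(new AudioProcessor() {
            @Override
            public boolean process(AudioEvent audioEvent) {
                float[] samples = audioEvent.getFloatBuffer();
                // ... derive audio features from this sample buffer ...
                return true; // pass the buffer on to any downstream processors
            }

            @Override
            public void processingFinished() {
                // ... flush remaining features and close down the stream ...
            }
        });

        // Run the dispatcher on its own thread until the stream ends.
        new Thread(dispatcher, "audio-dispatcher").start();
    }
}
```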
The AudioFeatureProcessor manages processing of streamed Audio data from the input and derives a diverse set of useful Audio features from a collection of AudioEventSource objects.
Features are sliced up and grouped over a given incremental 100ms time frame in a sequence of AudioEventFeatures, by sequence number. For each such frame a signal is sent to the Source cell at the head of the processing network managed by the Cortex sub system within the Cognition module. The time frame increment is configurable, but 100ms is a reasonable compromise to provide the resolution and response required across the system.
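For example, with a 100ms increment, the sequence number for an incoming buffer can be derived from its timestamp; a hypothetical helper, not the project's actual code:

```java
// Hypothetical framing helper: maps an audio event timestamp (in seconds)
// onto a 100ms frame sequence number used to group extracted features.
public final class FrameSequencer {

    private final double frameIncrementSeconds;

    public FrameSequencer(double frameIncrementSeconds) {
        this.frameIncrementSeconds = frameIncrementSeconds; // e.g. 0.1 for 100ms
    }

    public int sequenceNumberFor(double timeStampSeconds) {
        return (int) Math.floor(timeStampSeconds / frameIncrementSeconds);
    }
}
```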
The Cortex manages a network of Cell components each with an associated Processor.
- Weaver.
Binds together the Cortex Cell processing network. Cells are chained together via Dendrite input and Axon output connectors to form layered groups of linked Cell processing units.
- Generator.
Used by Weaver to build specific instances of Processors to bind to given Cells as defined by the CellType.
- Cell
A Cell is the basic abstract holder for processing unit functions within the Cortex network. It is defined by a CellType and provides a lifecycle API.
- NuCell
A NuCell extends the basic Cell class and functions as a general node within the Cortex network by managing links between upstream and downstream nodes. The API is modelled as a simple "Neuron" like structure with a set of Dendrites for input and an Axon for output connections. Each cell has its own BlockingQueue to receive incoming signals. When a sufficient signal threshold rule is satisfied, the associated Processor function is invoked. The Processor takes a message off the queue containing the StreamId and frame Sequence Number, which are used to act on the relevant parts of the Workspace data model. As a result of processing, new frames of data are generated for subsequent processing further downstream. The idea of the NuCell component comes from both the concept of a Perceptron and that of a UGen, the latter being common in audio signal processing applications (e.g. as described in the project Beads).
- Cell Processors
A set of various Cell Processors extend the ProcessorCommon base class and provide specific processing functions, some of which are described by the components below.
This diagram illustrates the NuCell scheme:
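As a companion to the diagram, here is a loose code sketch of the same scheme; the class, record and method names are illustrative assumptions, and the threshold rule is simplified to processing every queued signal:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Loose sketch of the NuCell idea: names and structure are illustrative only.
public class NuCellSketch {

    // A signal identifies the stream and the frame sequence number to act on.
    record Signal(String streamId, int sequence) {}

    interface Processor {
        void process(String streamId, int sequence); // updates the Workspace model
    }

    private final BlockingQueue<Signal> dendrites = new LinkedBlockingQueue<>();
    private final List<NuCellSketch> axonTargets = new CopyOnWriteArrayList<>();
    private final Processor processor;

    public NuCellSketch(Processor processor) {
        this.processor = processor;
        // Each cell consumes its queue on its own dedicated thread.
        Thread worker = new Thread(this::consume, "nucell-worker");
        worker.setDaemon(true);
        worker.start();
    }

    public void connectTo(NuCellSketch downstream) {
        axonTargets.add(downstream);
    }

    public void receive(Signal signal) {
        dendrites.offer(signal);
    }

    private void consume() {
        try {
            while (true) {
                // The real cell applies a threshold rule before invoking the
                // Processor; here every queued signal is processed directly.
                Signal signal = dendrites.take();
                processor.process(signal.streamId(), signal.sequence());
                // Fire the processed frame reference on to downstream cells.
                axonTargets.forEach(cell -> cell.receive(signal));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```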
Bridge from Hearing into the Cortex processing network
- Source Input
A cortex cell Processor that accepts the initial trigger signal from the Hearing AudioFeatureProcessor. This signal contains the StreamId and Sequence Number indicating that a new set of AudioEventFeatures is ready for processing by the Cortex network. The processor acts as a bridge and passes the signal on to the next Differentiation layer in the network.
Initial extraction and transformation of audio features into internal ToneMap data models. This layer will only process data from the single current sequence time frame.
- ConstantQ
A cortex cell Processor that creates a specific ToneMap frame using data from the ConstantQFeatures as extracted from the Audio stream. The Constant Q transform function is provided by the Tarsos DSP library. Constant Q provides a good basis for deriving musically relevant 12 tone pitched data from the sampled audio data source, with the time frame scale defaulting to 100ms. ConstantQ
The Processor further applies many optional filters as defined by the system configuration, including:
Calibration, Normalisation, Envelope Whitening, Harmonic Overtone filtering, Pitch Semitone Sharpening, and Amplitude filters (Compression, Squaring, Low Thresholding, Scaling, Adaptive Whitening and Decibel Scaling).
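As an illustration of the amplitude-shaping steps, normalisation and decibel scaling over a frame of magnitudes might look roughly like this (a simplified sketch, not the project's filter code):

```java
// Simplified sketch of two of the amplitude filters applied to a frame of
// Constant Q magnitudes; the real processor chains many more such steps.
public final class AmplitudeFilters {

    // Scale the frame so the strongest bin has magnitude 1.0.
    public static float[] normalise(float[] magnitudes) {
        float max = 0f;
        for (float m : magnitudes) {
            max = Math.max(max, m);
        }
        if (max == 0f) {
            return magnitudes.clone();
        }
        float[] out = new float[magnitudes.length];
        for (int i = 0; i < magnitudes.length; i++) {
            out[i] = magnitudes[i] / max;
        }
        return out;
    }

    // Convert linear magnitudes to a decibel scale with a low-level floor.
    public static float[] toDecibels(float[] magnitudes, float floorDb) {
        float[] out = new float[magnitudes.length];
        for (int i = 0; i < magnitudes.length; i++) {
            float db = (float) (20.0 * Math.log10(Math.max(magnitudes[i], 1e-10)));
            out[i] = Math.max(db, floorDb);
        }
        return out;
    }
}
```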
- FFT PitchDetect
A cortex cell Processor that creates a specific ToneMap frame using data from the PitchDetectorFeatures as extracted from the Audio stream. These features are derived from a data source that uses the SDFT pitch detection algorithm with a PitchProcessor as provided by the Tarsos DSP library. Discrete-time STFT
The Processor (and others below) further applies a pitch detection algorithm, as yet far from complete, partially based on the great work of A. Klapuri. [A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness"] (https://www.ee.columbia.edu/~dpwe/papers/Klap03-multif0.pdf)
- SpectralPeaks
A cortex cell Processor that creates a specific ToneMap frame using data from the SpectralPeaksFeatures as extracted from the Audio stream. These features are derived from a data source that uses the SDFT pitch detection algorithm with a SpectralPeaksProcessor as provided by the Tarsos DSP library.
- YIN
A cortex cell Processor that creates a specific ToneMap frame using data from the YINFeatures as extracted from the Audio stream. These features are derived from a data source that uses the YIN pitch detection algorithm with a PitchProcessor as provided by the Tarsos DSP library. The YIN algorithm aims to derive a single dominant pitch which may be of use in some cases. YIN
- Cepstrum
A cortex cell Processor that creates a specific ToneMap frame using data from the CepstrumFeatures as extracted from the Audio stream. These features are derived from a data source that uses a custom Autocorrelation algorithm. CEPSTRUM Autocorrelation (https://en.wikipedia.org/wiki/Cepstrum)
- MFCC
A cortex cell Processor that creates a specific ToneMap frame using data from the MFCCFeatures (Mel-frequency cepstral coefficients,) as extracted from the Audio stream. These features are derived from a data source that uses a custom Autocorrelation algorithm. MFCC
- SACF
A cortex cell Processor that creates a specific ToneMap frame using data from the SACFFeatures as extracted from the Audio stream. These features are derived from a data source that uses a custom Autocorrelation algorithm. SACF
- Beat Detect
A cortex cell Processor that creates a specific ToneMap frame using data from the BeatFeatures as extracted from the Audio stream. These features are derived from a data source that uses the ComplexOnsetDetector as provided by the Tarsos DSP library.
- Percussion
A cortex cell Processor that creates a specific ToneMap frame using data from the PercussionFeatures as extracted from the Audio stream. These features are derived from a data source that uses the PercussionOnsetDetector as provided by the Tarsos DSP library.
Further processing of output from the Differentiation Layer to derive further useful features in the Workspace models. This layer may process data from across a series of time sequence frames, current and "historical".
- Onset Detect
A cortex cell Processor that creates a specific ToneMap frame derived from data generated by the ConstantQ processor as above. Onsets, as delta amplitude, are detected by comparing previous time data frames with the current data frame for a given stream sequence.
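In outline, such a delta-amplitude onset measure could be computed per pitch bin as follows (a simplified, hypothetical sketch):

```java
// Simplified sketch: the onset value per pitch bin is the positive change in
// amplitude between the previous frame and the current frame.
public final class OnsetSketch {

    public static float[] onsetDeltas(float[] previousFrame, float[] currentFrame) {
        float[] onsets = new float[currentFrame.length];
        for (int i = 0; i < currentFrame.length; i++) {
            float previous = i < previousFrame.length ? previousFrame[i] : 0f;
            onsets[i] = Math.max(0f, currentFrame[i] - previous);
        }
        return onsets;
    }
}
```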
- Chroma
A cortex cell AudioChromaPreProcessor and AudioChromaPostProcessor that create a set of specific ToneMap frames of "Chromatic Chord" based musically useful information as derived from data generated by the ConstantQ and other pitch extraction processors as above.
ChordNotes for a given ToneMap time frame are calculated by using the "CENS" (Chroma Energy Normalized Statistics) technique. An aggregation and summary of power within 12 tones across the whole spectral frame is derived with a degree of temporal smoothing and aggregation. Meinard Muller and CENS
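Roughly, the chroma folding behind this works along the following lines (a simplified sketch of the idea; the quantisation and temporal smoothing steps of full CENS are omitted):

```java
// Simplified sketch of the chroma folding behind CENS: pitch-bin powers are
// aggregated into 12 pitch classes and L2-normalised; the full CENS technique
// also quantises the values and smooths them over neighbouring time frames.
public final class ChromaSketch {

    public static double[] chromaVector(double[] pitchPowers, int lowestMidiNote) {
        double[] chroma = new double[12];
        for (int i = 0; i < pitchPowers.length; i++) {
            int pitchClass = (lowestMidiNote + i) % 12;
            chroma[pitchClass] += pitchPowers[i];
        }
        // L2 normalisation so frames with different overall power are comparable.
        double norm = 0;
        for (double value : chroma) {
            norm += value * value;
        }
        norm = Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < 12; i++) {
                chroma[i] /= norm;
            }
        }
        return chroma;
    }
}
```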
- HPS
A cortex cell AudioHpsProcessor that creates a set of specific ToneMap frames of "Harmonic/Percussive Separated" musically useful information as derived from data generated by the ConstantQ and other pitch extraction processors as above.
Pitch amplitudes for a given ToneMap time frame are calculated by using the "HPS" technique. Alternative frames represent either power smoothed "horizontally" over multiple time frames to extract harmonically rich elements, or power smoothed "vertically" across pitch to isolate more percussive elements; from each, masking frames are further produced. Meinard Muller and HPS
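In essence, HPS median-smooths the time/pitch power matrix along each axis and compares the two estimates to build masks; a simplified sketch with a hypothetical helper class and a fixed filter length:

```java
import java.util.Arrays;

// Simplified sketch of the HPS idea on a power matrix indexed as
// power[timeFrame][pitchBin]; the real processor works incrementally on
// ToneMap frames and derives masking frames from these two estimates.
public final class HpsSketch {

    // Median-smooth along the time axis: emphasises sustained (harmonic) energy.
    public static double harmonicEstimate(double[][] power, int t, int p, int halfLen) {
        double[] window = new double[2 * halfLen + 1];
        for (int k = -halfLen; k <= halfLen; k++) {
            int tk = Math.min(Math.max(t + k, 0), power.length - 1);
            window[k + halfLen] = power[tk][p];
        }
        return median(window);
    }

    // Median-smooth along the pitch axis: emphasises broadband (percussive) energy.
    public static double percussiveEstimate(double[][] power, int t, int p, int halfLen) {
        double[] window = new double[2 * halfLen + 1];
        for (int k = -halfLen; k <= halfLen; k++) {
            int pk = Math.min(Math.max(p + k, 0), power[t].length - 1);
            window[k + halfLen] = power[t][pk];
        }
        return median(window);
    }

    // A binary mask keeps a bin where the harmonic estimate dominates.
    public static boolean harmonicMask(double[][] power, int t, int p, int halfLen) {
        return harmonicEstimate(power, t, p, halfLen) >= percussiveEstimate(power, t, p, halfLen);
    }

    private static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }
}
```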
Processing of output from the Derivation and Differentiation Layer to combine and aggregate further useful features in the Workspace models. This layer may process data from across a series of time sequence frames, current and "historical".
- Integrate
A cortex cell AudioIntegrateProcessor that creates a set of ToneMap frames of data merged from all of the above upstream processors.
- Notate
A cortex cell AudioNotateProcessor that creates a set of ToneMap frames of data derived from the merged output of the Integrate layer.
Uses AudioTuner functions to scan across time frames and generate a set of NoteListElements that each describe an individual candidate "Note" - as a pitch that exists with a certain duration across time "frames".
Final layer of processing, taking output from all the above layers to synthesise a full description of useful musical information in terms of harmonics, pitched notes, chords, rhythm and so on. This layer may process data from across a series of time sequence frames, current and "historical".
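A much-simplified sketch of that scan for a single pitch bin: a note candidate opens when power crosses an onset threshold and closes when it falls below an offset threshold (names and thresholds are hypothetical; the real AudioTuner applies many more rules):

```java
import java.util.ArrayList;
import java.util.List;

// Much-simplified sketch of note candidate extraction for one pitch bin.
public final class NoteScanSketch {

    record NoteCandidate(int pitchIndex, double startTime, double duration, double peakAmplitude) {}

    public static List<NoteCandidate> scan(double[] framePowers, int pitchIndex,
                                           double frameSeconds, double onThreshold, double offThreshold) {
        List<NoteCandidate> notes = new ArrayList<>();
        int startFrame = -1;
        double peak = 0;
        for (int frame = 0; frame < framePowers.length; frame++) {
            double power = framePowers[frame];
            if (startFrame < 0 && power >= onThreshold) {
                startFrame = frame;           // note onset
                peak = power;
            } else if (startFrame >= 0) {
                peak = Math.max(peak, power);
                if (power < offThreshold) {   // note offset
                    notes.add(new NoteCandidate(pitchIndex, startFrame * frameSeconds,
                            (frame - startFrame) * frameSeconds, peak));
                    startFrame = -1;
                    peak = 0;
                }
            }
        }
        return notes;
    }
}
```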
- Synthesis
A cortex cell AudioSynthesisProcessor that creates a set of ToneMap frames of data derived from upstream layers.
Assembles data from ToneMaps for pitch, note, chord and beat, and uses ToneSynthesiser and NoteTrackerSynthesiser functions to build a final aggregation of musical information across all frames in the stream.
Bridge from Cortex processing network out to the Voice sub system process.
- Sink Output
A cortex cell AudioSinkProcessor that calls the Voice musical data and sound output functions, passing a reference to incoming frames from the Synthesis layer.
Signals processing network close-down operations for a given StreamId when the last Sequence Number frame for the current stream is detected.
Entity Diagram of ToneMap Data Model:
- Atlas
A main directory for access to ToneMaps by key.
- ToneMap
Contains summary data and a set of ToneTimeFrames for a specific type of information as generated by the Cortex process run for a given stream.
- ToneTimeFrame
Contains summary data and a set of ToneMapElements for a specific type of information as generated by the Cortex process run for a given stream.
- ToneMapElement
Attributes describe tone power values for a specific pitch index in the context of an associated ToneTimeFrame.
- PitchSet
Defines the "Pitch" dimensions for a ToneTimeFrame in terms of the associated range of 12 tone based pitch and midi note index values.
- TimeSet
Defines the "Time" dimensions for a ToneTimeFrame in terms of the start time each frame is created.
- NoteListElement
Attributes describe a candidate pitched "note" entity in the context of an associated ToneTimeFrame in terms of start time, duration, pitch index and amplitude.
- ChordListElement
Attributes describe a candidate pitched "chroma chord" entity in the context of an associated ToneTimeFrame in terms of start time, duration, possible 12 note pitch indexes and amplitudes each defined by ChordNotes.
- BeatListElement
Attributes describe a candidate rhythmic "beat" entity in the context of an associated ToneTimeFrame in terms of start time, duration and amplitude.
- ToneSynthesiser
A component that provides a synthesise function generating useful musical information from the ToneMap data model, including note track generation using the NoteTracker, and quantisation and aggregation functions.
- NoteTracker
A component that provides tracker functions generating NoteTracks to describe collections of pitched notes, chords and beats.
- MIDI Synthesis
The MidiSynthesizer component supports a Midi file writer and multi-track Midi player using the standard Java Sound API and whatever built-in support is provided by the runtime operating system platform.
13 tracks of various types can be enabled and assigned to different MIDI GM instruments through the configuration and control scheme. These include 4 "voice" tracks (for pitched note tracks), 2 "chord" tracks and 2 "pads" (each on MIDI GM Channels 1-9), and 4 "beats" tracks (on MIDI GM Channel 10).
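For reference, writing a note event to one track with the standard Java Sound API looks roughly like this; the channel, pitch, tick values and output file name are illustrative only:

```java
import java.io.File;

import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

public class MidiWriteSketch {

    public static void main(String[] args) throws Exception {
        // A sequence with 480 ticks per quarter note; one track per voice.
        Sequence sequence = new Sequence(Sequence.PPQ, 480);
        Track voiceTrack = sequence.createTrack();

        int channel = 0;   // "voice" track channel (illustrative)
        int pitch = 60;    // middle C
        int velocity = 90;

        // Note on at tick 0, note off one quarter note later.
        voiceTrack.add(new MidiEvent(
                new ShortMessage(ShortMessage.NOTE_ON, channel, pitch, velocity), 0));
        voiceTrack.add(new MidiEvent(
                new ShortMessage(ShortMessage.NOTE_OFF, channel, pitch, 0), 480));

        // Write a standard (type 1) multi-track MIDI file.
        MidiSystem.write(sequence, 1, new File("recording_track_voices.mid"));
    }
}
```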
- Audio Synthesis
The AudioSynthesizer aims to support an Audio file writer and player. This is currently inoperative and under development.
- Configuration Parameter Files
All aspects of the system can be controlled by a commonly accessible ParameterManager function.
This is initialised from configuration property files containing key/value text entries.
Keys are defined in the file InstrumentParameterNames
Properties files are included in project class path, e.g. in resource directories.
There are three levels of configuration:
- "instrument.properties" file in the "instrument-core" module - default common configuration.
- "instrument-client.properties" file, a single instance should exist in the deployed client implementation module - e.g. in the instrument-desktop" module.
- Various alternative - "instument-*****.properties". Override "styles" that can be dynamically loaded and generated from the UI, e.g. "instrument-folk.properties" - contains sub set of value tuned to a specific audio data source type or task.
Properties are validated by the ParameterValidator component and this validation is controlled by configuration in the file "parameter-validation.properties".
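The layering behaves much like loading one java.util.Properties object on top of another; a rough sketch (the helper class and method names are hypothetical, while the file names are those listed above):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Rough sketch of the three-level property layering: core defaults, then the
// client override file, then an optional dynamically selected "style" file.
// Later loads override earlier keys with the same name.
public final class ParameterLoadingSketch {

    public static Properties load(String styleName) throws IOException {
        Properties properties = new Properties();
        loadFromClasspath(properties, "instrument.properties");          // core defaults
        loadFromClasspath(properties, "instrument-client.properties");   // client overrides
        if (styleName != null) {
            loadFromClasspath(properties, "instrument-" + styleName + ".properties"); // style overrides
        }
        return properties;
    }

    private static void loadFromClasspath(Properties properties, String resource) throws IOException {
        try (InputStream in = ParameterLoadingSketch.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                properties.load(in);
            }
        }
    }
}
```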
Please refer to the Instrument Project README and code for more details on technical aspects of the implementation.
However, briefly, my aim here was to use pure, core Java 17 built as a Maven project.
I have also used the Quarkus framework for CDI. Not really necessary, but it gave me some experience with it, and it works well with the AWS Cloud service version of the implementation too.
Desktop application installers with self contained JDK created using JPackage.
Java Swing was used for the Desktop client. Though it may be deemed old fashioned, I can code easily in it and I needed to turn something out quick and dirty. I realise an alternative UI, e.g. JavaFX or some non-Java framework, would be the modern way, but this will suffice for now.
At work over recent years I have used VSCode and IntelliJ. However, my first love is Eclipse and, given the option, I prefer to continue working with that tool, as I have done on this project. It keeps up to date and, with some configuration, more than suffices, and I still prefer that solid UI.
- "Meinard Muller" for his great and unique text book, "Fundamentals of Music processing" (Amazon Books)
- The developers of the Java project, "Tarsos" which I have used as the core library for Java audio signal processing. (Tarsos on Github)