Welcome to DASH’s documentation!

This is a listing of DASH classes that may be relevant for further development and understanding of the code. It is by no means exhaustive or complete, but it should highlight how the demonstrator works.

Speech enhancement modules

class mono_model.MonoModel(path, scaling_factor=1, clip=0)

MonoModel wraps monophonic masking for simple use as a model within Runtime. It loads its neural network from path and prepares it for fast evaluation. The model is assumed to accept plain absolute values of the spectrum and return a soft mask for that spectrum. Masks are scaled and clipped before being applied to the data.

initialize()

Prepare the model - load all required data.

process(sample)

Accept a single multichannel frame. Discard all but one channel and perform masking on that channel.
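
As a sketch of what this step amounts to (not the class's actual code: net stands in for the loaded network, the (freq_bins, channels) sample layout is an assumption, and clipping is read as a mask floor):

    import numpy as np

    # Hypothetical sketch of MonoModel.process: keep one channel, predict a
    # soft mask from the magnitude spectrum, then scale, clip and apply it.
    def process(sample, net, scaling_factor=1.0, clip=0.0):
        channel = sample[:, 0]                     # discard all but one channel
        magnitude = np.abs(channel)                # plain absolute values of the spectrum
        mask = net.predict(magnitude[None, :])[0]  # soft mask for this frame
        mask = np.clip(mask * scaling_factor, clip, 1.0)  # scale and clip
        return channel * mask                      # apply the mask to the data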

class mvdr_model.Model(n, frame_len, delay_and_sum, use_channels, model_name, choose=None)

This class wraps the process of MVDR beamforming and the subsystems required for it. It loads a model from a prespecified path, uses the model's outputs to determine the dominant source, and uses the dominant source to update the covariance matrices of both noise and speech. The speech covariance matrix is used to estimate the direction of incidence of sound from the main source.

gcc_phat(sigl_fft, sigr_fft, max_delay, distance)

Method for computing the angle for a pair of microphones, used to localize the source. Not used in the main pipeline.
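
GCC-PHAT is standard enough to sketch. The code below mirrors the documented signature rather than reproducing the class's implementation; it assumes sigl_fft/sigr_fft are one-sided FFTs (np.fft.rfft) of the two microphone signals, and the sample rate and speed of sound are assumptions:

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s, assumed
    SAMPLE_RATE = 16000     # Hz, assumed

    def gcc_phat(sigl_fft, sigr_fft, max_delay, distance):
        cross = sigl_fft * np.conj(sigr_fft)
        cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase only
        correlation = np.fft.irfft(cross)
        # Search only physically plausible delays around lag zero.
        lags = np.concatenate([np.arange(0, max_delay + 1),
                               np.arange(-max_delay, 0)])
        candidates = np.concatenate([correlation[:max_delay + 1],
                                     correlation[-max_delay:]])
        delay = lags[np.argmax(candidates)]
        # Convert the sample delay to an angle of incidence.
        sin_theta = delay * SPEED_OF_SOUND / (SAMPLE_RATE * distance)
        return np.degrees(np.arcsin(np.clip(sin_theta, -1.0, 1.0)))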

initialize()

Initialize the model - preload it and perform some dry runs to reduce latency

process(ffts)

Process the sample - accepts a single time frame with multiple channels and returns the beamformed signal. Uses LSTM masking as part of the beamforming process.
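
The MVDR step itself has a closed form once the noise covariance matrix and steering vector are known; a minimal per-frequency-bin sketch (not the class's actual code):

    import numpy as np

    # Per-frequency MVDR weights: w = R_n^{-1} d / (d^H R_n^{-1} d).
    # R_noise is the (n_mics, n_mics) noise covariance at one frequency bin,
    # steering is the (n_mics,) steering vector toward the dominant source.
    def mvdr_weights(R_noise, steering):
        Rn_inv_d = np.linalg.solve(R_noise, steering)
        return Rn_inv_d / (steering.conj() @ Rn_inv_d)

    # Applying the weights to one multichannel bin x gives a single
    # beamformed value y = w^H x.
    def apply_beamformer(weights, x):
        return weights.conj() @ x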

class post_filter.DAEPostFilter(fname='storage/dae-pf.h5', n_fft=1024)

Postfilter for the signal based on a denoising autoencoder (DAE). The DAE accepts a time context, so the class inherits from BufferMixin. The class also contains methods to train the postfilter.

initialize()

Call this before processing starts

process(sample)

Accept a single mono sample, push it to the rolling buffer, and then process the buffer with the model. The model's input and output are log-power spectra. Phase is reapplied at the end of processing.
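
The log-power round trip might look roughly like this (a sketch; dae and frame_spectrum are hypothetical names, and a Keras-style predict is assumed):

    import numpy as np

    # Sketch of the process() round trip: the model sees log-power,
    # the phase of the noisy input is reapplied afterwards.
    def postfilter_frame(frame_spectrum, dae):
        log_power = np.log(np.abs(frame_spectrum) ** 2 + 1e-12)
        phase = np.angle(frame_spectrum)
        enhanced_log_power = dae.predict(log_power[None, :])[0]
        magnitude = np.exp(enhanced_log_power / 2)   # back from log-power
        return magnitude * np.exp(1j * phase)        # reapply the noisy phase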

classmethod train(model_config, train_X, train_Y, valid_ratio=0.1, path_to_save='storage/dae-pf.h5', n_fft=512)

Creates and trains the postfilter model; intended to be called from a training script. train_X should be padded by 16 from the beginning of the recording; n_fft determines the size of the network.
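
A hypothetical training call, respecting the padding caveat above (the model_config contents, file names and array shapes below are assumptions, not documented values):

    import numpy as np
    from post_filter import DAEPostFilter

    model_config = {"hidden_layers": 3}         # assumed structure, not documented
    train_X = np.load("noisy_logpower.npy")     # padded by 16 from the start
    train_Y = np.load("clean_logpower.npy")

    DAEPostFilter.train(model_config, train_X, train_Y, valid_ratio=0.1,
                        path_to_save="storage/dae-pf.h5", n_fft=512)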

Preprocessing modules

class audio.Audio(buffer_size=1024, buffer_hop=128, sample_rate=16000, n_in_channels=6, n_out_channels=1, input_device_id=None, output_device_id=None, input_from_file=None, play_output=True, save_input=False, save_output=False, record_name=None)

Class to record and play data

An application can create an instance of this class and pass it to models. The application can use its methods (or the methods as callbacks) to fetch more data for the next iteration of a model. The class uses a sliding buffer, so the user only needs to read that buffer and STFT it, and the processing is set up.

Args:

    buffer_size (int, optional): number of samples in a single output frame
    buffer_hop (int, optional): number of samples removed from the buffer on a single read
    sample_rate (int, optional): sample rate [Hz] of the recording
    n_in_channels (int, optional): number of input channels
    n_out_channels (int, optional): number of output channels
    input_device_id (int, optional): index of the input device to use
    output_device_id (int, optional): index of the output device to use
    input_from_file (str, optional): path to the file from which to read input; if not provided, input is taken from the input audio device
    play_output (bool, optional): play output to the speakers
    save_input (bool, optional): save the recorded input to a file stored in 'records/inputs/'; default False
    save_output (bool, optional): save the played output to a file stored in 'records/outputs/'; default False
    record_name (str, optional): name of the output file

close()

Close all threads

get_input()

Get values from the buffer, encode them, and return the result.

Returns:
    np.array of shape (buffer_size, n_in_channels)

open()

Create and start threads

write_to_output(arr)

Decode the values and pass them to the buffer.

Args:
    arr (np.array of shape (buffer_hop, n_out_channels)): frames to be played
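
Putting the pieces together, a typical read-process-write loop could look like the sketch below; enhance stands in for any model's process step, and windowing/overlap-add (see utils.Remix below) is glossed over:

    import numpy as np
    from audio import Audio

    def enhance(spectrum):                        # placeholder for a model's process()
        return spectrum

    audio = Audio(buffer_size=1024, buffer_hop=128, n_in_channels=6)
    audio.open()                                  # create and start the threads
    try:
        while True:
            frames = audio.get_input()            # (buffer_size, n_in_channels)
            spectrum = np.fft.rfft(frames, axis=0) # STFT of the sliding buffer
            out = np.fft.irfft(enhance(spectrum), axis=0)
            audio.write_to_output(out[:128, :1])  # (buffer_hop, n_out_channels)
    finally:
        audio.close()                             # close all threads
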
class audio.PlayThread(p, buffer, hop, sample_rate, channels, id=None, play=True, record_to_file=False, record_name=None)

Thread that passes data stored in the buffer to the speakers

Args:

    p (pyAudio): Python interface to PortAudio
    buffer (queue.Queue): queue with byte strings to be played
    hop (int): number of samples to play in a single read
    sample_rate (int): sample rate [Hz] of playback
    channels (int): number of channels to play
    id (int, optional): index of the output device to use
    play (bool, optional): play output to the speakers
    record_to_file (bool, optional): save the played output to a file stored in 'records/outputs/'; default False
    record_name (str, optional): name of the saved file

run()

Method representing the thread’s activity

Wait until the buffer is full, then play frames from the buffer until the thread is stopped.

stop()

Stop the thread, play what's left in the buffer, and close the stream

class audio.ReadThread(p, buffer, hop, sample_rate, channels, id=None, from_file=None, record_to_file=True)

Thread that reads data from the microphones and passes it to the buffer

Args:

    p (pyAudio): Python interface to PortAudio
    buffer (queue.Queue): queue to which byte strings are written
    hop (int): number of samples to record in a single read
    sample_rate (int): sample rate [Hz] of the recording
    channels (int): number of channels to record
    id (int, optional): index of the input device to use
    from_file (str, optional): path to the file from which to read input; if not provided, input is taken from the input audio device
    record_to_file (bool, optional): save the recorded input to a file stored in 'records/inputs/'; default True

run()

Method representing the thread’s activity

Get data from the microphones or from a file and put it into the buffer

stop()

Stop thread and close stream

class runtime.Runtime

build(audio_config, post_filter_config, model_config)

Builds and initializes all subcomponents. Accepts dictionaries with parameters for the model classes, as well as a 'mode' parameter that chooses one of the available classes. The available models are listed in MODEL_LIB and POSTFILTER_LIB in this module.

main(audio_config=None, post_filter_config=None, model_config=None)

Main processing loop. All processing should happen here; all configuration should be kept elsewhere, and training should be done in other files.
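
A hypothetical invocation; the dictionary keys mirror the constructor arguments documented here, but the exact 'mode' strings come from MODEL_LIB and POSTFILTER_LIB and are assumed below:

    from runtime import Runtime

    # Hypothetical configuration; 'mvdr' and 'dae' are assumed mode names.
    audio_config = {"buffer_size": 1024, "buffer_hop": 128,
                    "sample_rate": 16000, "n_in_channels": 6}
    model_config = {"mode": "mvdr", "frame_len": 1024, "use_channels": 6}
    post_filter_config = {"mode": "dae", "fname": "storage/dae-pf.h5"}

    runtime = Runtime()
    runtime.main(audio_config=audio_config,
                 post_filter_config=post_filter_config,
                 model_config=model_config)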

pause()

Stops processing temporarily and waits for any other thread to restart it

rebuild(config)

Pick an available configuration and rebuild the pipeline with it

utils.BufferMixin(buffer_size=[1, 257], dtype=<class 'numpy.float32'>)

Factory of classes that act as rolling buffers of the appropriate type
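
In spirit, the mixin provides a fixed-size buffer over frames; a stand-alone sketch of that behaviour (not the factory's actual code; the 16-frame context is chosen for illustration):

    import numpy as np

    # New frames push the oldest ones out; the whole buffer is the
    # time context a model like DAEPostFilter consumes.
    class RollingBuffer:
        def __init__(self, buffer_size=(16, 257), dtype=np.float32):
            self.buffer = np.zeros(buffer_size, dtype=dtype)

        def push(self, frame):
            self.buffer = np.roll(self.buffer, -1, axis=0)
            self.buffer[-1] = frame            # newest frame at the end
            return self.buffer                 # full time context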

class utils.Remix(buffer_size, buffer_hop, channels)

Reconstruction of the signal by overlap-add

process(sample)

Method to call the reconstruction.
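
Bare-bones overlap-add, ignoring windowing (a sketch, not the class's implementation):

    import numpy as np

    # Each processed frame of length buffer_size is added onto a running
    # accumulator, shifted by buffer_hop each time; the first buffer_hop
    # samples of the accumulator are complete and ready to play.
    class OverlapAdd:
        def __init__(self, buffer_size=1024, buffer_hop=128):
            self.hop = buffer_hop
            self.acc = np.zeros(buffer_size)

        def process(self, frame):
            self.acc += frame
            out = self.acc[:self.hop].copy()        # finished hop-sized chunk
            self.acc = np.roll(self.acc, -self.hop) # advance by one hop
            self.acc[-self.hop:] = 0.0
            return out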

class utils.AdaptiveGain(level=0.005, update_win=0.975, max_gain=10)

The signal is amplified in AdaptiveGain by up to max_gain times to match the prespecified power level. The current power level is a windowed measurement, which avoids sudden bursts of gain.
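
The gain rule can be written in a few lines; a sketch using the documented defaults (reading update_win as an exponential-smoothing factor is an assumption):

    import numpy as np

    # Track a smoothed power estimate and amplify toward the target
    # level, capping the gain at max_gain.
    class AdaptiveGainSketch:
        def __init__(self, level=0.005, update_win=0.975, max_gain=10):
            self.level = level                # target power level
            self.update_win = update_win      # smoothing factor of the window
            self.max_gain = max_gain
            self.power = level                # running power estimate

        def process(self, sample):
            frame_power = np.mean(np.abs(sample) ** 2)
            # Windowed measurement: smoothing avoids sudden gain bursts.
            self.power = (self.update_win * self.power
                          + (1 - self.update_win) * frame_power)
            gain = min(np.sqrt(self.level / (self.power + 1e-12)), self.max_gain)
            return sample * gain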