Welcome to DASH's documentation!
This is a listing of DASH classes relevant to further development and to understanding the code. It is by no means exhaustive or complete, but it should highlight how the demonstrator works.
Speech enhancement modules
class mono_model.MonoModel(path, scaling_factor=1, clip=0)
MonoModel wraps monophonic masking into a simple model usable within Runtime. It loads its neural network from path and prepares it for fast evaluation. The model is assumed to accept the plain absolute values of the spectrum and to return a soft mask for that spectrum. The mask is scaled and clipped before it is applied to the data.
initialize()
Prepare the model: load all required data.
process(sample)
Accept a single multichannel frame. Discard all but one channel and perform masking on that channel.
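A minimal sketch of the masking step described above, assuming a Keras-style model object with a predict method; the exact scaling and clipping rule inside MonoModel is an assumption here:

    import numpy as np

    def mask_frame(spectrum, model, scaling_factor=1.0, clip=0.0):
        """Hypothetical version of MonoModel's masking step."""
        magnitude = np.abs(spectrum)                       # the model sees plain absolute values
        mask = model.predict(magnitude[np.newaxis, :])[0]  # soft mask predicted by the network
        mask = np.clip(mask * scaling_factor, clip, 1.0)   # scale, then clip (rule assumed)
        return spectrum * mask                             # apply the mask to the complex frame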
class mvdr_model.Model(n, frame_len, delay_and_sum, use_channels, model_name, choose=None)
This class wraps MVDR beamforming and the subsystems required for MVDR. It loads a model from a prespecified path, uses the model's outputs to determine the dominant source, and uses the dominant source to update the covariance matrices of both noise and speech. The speech covariance matrix is used to estimate the direction of incidence of sound from the main source.
gcc_phat(sigl_fft, sigr_fft, max_delay, distance)
Compute the angle for a pair of microphones; used to localize the source. Not used in the main pipeline.
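Since the method is not used in the main pipeline, the following is only a generic sketch of GCC-PHAT; the sample_rate argument and the speed-of-sound constant are assumptions, not part of the API:

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s; assumed constant

    def gcc_phat_angle(sigl_fft, sigr_fft, max_delay, distance, sample_rate=16000):
        # Cross-power spectrum whitened by its magnitude (the PHAT weighting)
        cross = sigl_fft * np.conj(sigr_fft)
        cross /= np.abs(cross) + 1e-12
        # In the time domain the correlation peaks at the inter-channel delay
        corr = np.fft.irfft(cross)
        corr = np.concatenate((corr[-max_delay:], corr[:max_delay + 1]))
        delay = np.argmax(np.abs(corr)) - max_delay      # delay in samples
        tau = delay / sample_rate                        # delay in seconds
        # Far-field delay-to-angle relation for a pair spaced `distance` apart
        return np.arcsin(np.clip(tau * SPEED_OF_SOUND / distance, -1.0, 1.0))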
initialize()
Initialize the model: preload it and perform some dry runs to reduce latency.
process(ffts)
Process the sample: accepts a single time frame with multiple channels and returns the beamformed signal. Uses LSTM masking as a part of the beamforming process.
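For reference, a minimal sketch of the classic MVDR weight computation the class description refers to, for a single frequency bin; variable names are illustrative, and the real class derives its covariance matrices from the LSTM masks:

    import numpy as np

    def mvdr_weights(noise_cov, steering):
        """MVDR weights for one bin: w = R_n^{-1} d / (d^H R_n^{-1} d)."""
        rn_inv_d = np.linalg.solve(noise_cov, steering)
        return rn_inv_d / (steering.conj() @ rn_inv_d)

    # One beamformed bin is then y = w^H x:
    # y = mvdr_weights(noise_cov, steering).conj() @ fft_bin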
class post_filter.DAEPostFilter(fname='storage/dae-pf.h5', n_fft=1024)
Postfilter for the signal, based on a DAE. The DAE accepts a temporal context, therefore the class inherits BufferMixin. The class also contains methods to train the postfilter.
initialize()
Call this before processing starts.
process(sample)
Accept a single mono sample, push it to the rolling buffer, and then process the buffer with the model. The input and output of the model are log-power; the phase is reapplied at the end of processing.
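A sketch of this step under the stated log-power convention; the context-buffer layout and the model's input shape are assumptions:

    import numpy as np

    def postfilter_step(sample_fft, model, context):
        """Hypothetical DAE post-filtering of one mono frame."""
        phase = np.angle(sample_fft)                      # kept aside, reapplied at the end
        log_power = np.log(np.abs(sample_fft) ** 2 + 1e-12)
        context = np.roll(context, -1, axis=0)            # rolling buffer of past frames
        context[-1] = log_power
        enhanced = model.predict(context[np.newaxis])[0]  # log-power in, log-power out
        magnitude = np.exp(enhanced / 2)                  # back from log-power to magnitude
        return magnitude * np.exp(1j * phase), context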
classmethod train(model_config, train_X, train_Y, valid_ratio=0.1, path_to_save='storage/dae-pf.h5', n_fft=512)
This should create a model from a training script. train_X should be padded by 16 from the beginning of the recording. n_fft determines the size of the network.
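A hypothetical invocation; the contents of model_config and the exact shapes of train_X / train_Y are not documented here, so the values below are placeholders only:

    # Placeholder call; the array names are illustrative.
    DAEPostFilter.train(
        model_config={},                  # placeholder configuration
        train_X=noisy_logpower_frames,    # padded by 16 from the start of the recording
        train_Y=clean_logpower_frames,
        valid_ratio=0.1,
        path_to_save='storage/dae-pf.h5',
        n_fft=512,
    )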
Preprocessing modules
class audio.Audio(buffer_size=1024, buffer_hop=128, sample_rate=16000, n_in_channels=6, n_out_channels=1, input_device_id=None, output_device_id=None, input_from_file=None, play_output=True, save_input=False, save_output=False, record_name=None)
Class to record and play audio data.
An application can create an instance of this class and pass it to models, using its methods (or methods as callbacks) to fetch more data for the next iteration of a model. The class maintains a sliding buffer, so the user only needs to read that buffer and STFT it; the rest of the processing is set up. A hypothetical usage loop is sketched after the argument list below.
Args:
buffer_size (int, optional): Number of samples in a single output frame.
buffer_hop (int, optional): Number of samples removed from the buffer on a single read.
sample_rate (int, optional): Sample rate [Hz] of the recording.
n_in_channels (int, optional): Number of input channels.
n_out_channels (int, optional): Number of output channels.
input_device_id (int, optional): Index of the input device to use.
output_device_id (int, optional): Index of the output device to use.
input_from_file (str, optional): Path to the file from which to read input; if not provided, input is taken from the input audio device.
play_output (bool, optional): Play output to the speakers.
save_input (bool, optional): Also save the recorded input to a file stored in 'records/inputs/'. Defaults to False.
save_output (bool, optional): Also save the played output to a file stored in 'records/outputs/'. Defaults to False.
record_name (str, optional): Name of the output file.
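The usage loop mentioned above, as a hypothetical sketch; the STFT and the pass-through "processing" are placeholders for whatever the pipeline supplies:

    import numpy as np
    from audio import Audio

    audio = Audio(buffer_size=1024, buffer_hop=128, sample_rate=16000,
                  n_in_channels=6, n_out_channels=1)
    audio.open()
    try:
        while True:
            frame = audio.get_input()                   # (buffer_size, n_in_channels)
            spectrum = np.fft.rfft(frame, axis=0)       # STFT of the sliding buffer
            mono = np.fft.irfft(spectrum[:, 0])[-128:]  # placeholder "processing", channel 0
            audio.write_to_output(mono[:, np.newaxis])  # (buffer_hop, n_out_channels)
    finally:
        audio.close()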
close()
Close all threads.
get_input()
Get values from the buffer, encode them, and return them.
Returns:
np.array of shape (buffer_size, n_in_channels)
open()
Create and start the threads.
write_to_output(arr)
Decode the values and pass them to the buffer.
Args:
arr (np.array of shape (buffer_hop, n_out_channels)): Frames to be played.
class audio.PlayThread(p, buffer, hop, sample_rate, channels, id=None, play=True, record_to_file=False, record_name=None)
Thread which passes data stored in the buffer to the speakers.
Args:
p (pyAudio): Python interface to PortAudio.
buffer (queue.Queue): Queue with byte strings to be played.
hop (int): Number of samples to play in a single read.
sample_rate (int): Sample rate [Hz] of playback.
channels (int): Number of channels to play.
id (int, optional): Index of the output device to use.
play (bool, optional): Play output to the speakers.
record_to_file (bool, optional): Also save the played output to a file stored in 'records/outputs/'. Defaults to False.
record_name (str, optional): Name of the saved file.
run()
Method representing the thread's activity.
Wait until the buffer is full, then play frames from the buffer until the thread is stopped.
stop()
Stop the thread, play what's left in the buffer, and close the stream.
class audio.ReadThread(p, buffer, hop, sample_rate, channels, id=None, from_file=None, record_to_file=True)
Thread which reads data from the microphones and passes it to the buffer.
Args:
p (pyAudio): Python interface to PortAudio.
buffer (queue.Queue): Queue to which byte strings are written.
hop (int): Number of samples to record in a single read.
sample_rate (int): Sample rate [Hz] of the recording.
channels (int): Number of channels to record.
id (int, optional): Index of the input device to use.
from_file (str, optional): Path to the file from which to read input; if not provided, input is taken from the input audio device.
record_to_file (bool, optional): Also save the recorded input to a file stored in 'records/inputs/'.
run()
Method representing the thread's activity.
Get data from the microphones or from a file and put it into the buffer.
stop()
Stop the thread and close the stream.
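Both thread classes follow an ordinary queue-based producer/consumer pattern, roughly as below; capture_chunk and play_chunk are hypothetical stand-ins for the PortAudio calls:

    import queue
    import threading
    import time

    buf = queue.Queue(maxsize=64)

    def capture_chunk():              # hypothetical stand-in for the PortAudio read
        time.sleep(0.008)             # 128 samples at 16 kHz
        return b'\x00' * 128 * 2 * 6  # silence: 128 frames, int16, 6 channels

    def play_chunk(chunk):            # hypothetical stand-in for the PortAudio write
        pass

    def reader():                     # the ReadThread role: device -> queue
        while True:
            buf.put(capture_chunk())

    def player():                     # the PlayThread role: queue -> device
        while True:
            play_chunk(buf.get())

    threading.Thread(target=reader, daemon=True).start()
    threading.Thread(target=player, daemon=True).start()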
class runtime.Runtime
build(audio_config, post_filter_config, model_config)
Build and initialize all subcomponents. Accepts dictionaries of parameters for the model classes, as well as a 'mode' parameter which chooses one of the available classes. The available models are listed in MODEL_LIB and POSTFILTER_LIB in this module.
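A hypothetical call; the valid 'mode' strings are whatever keys MODEL_LIB and POSTFILTER_LIB define, so 'mvdr' and 'dae' below are placeholders:

    from runtime import Runtime

    runtime = Runtime()
    runtime.build(
        audio_config={'buffer_size': 1024, 'buffer_hop': 128, 'sample_rate': 16000},
        post_filter_config={'mode': 'dae', 'n_fft': 1024},
        model_config={'mode': 'mvdr', 'frame_len': 1024},
    )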
main(audio_config=None, post_filter_config=None, model_config=None)
Main processing loop. All processing should happen here; all configuration should live elsewhere, and training should be done in other files.
pause()
Stop processing for a while and await any other thread to restart it.
rebuild(config)
Pick an available configuration and rebuild the pipeline with it.
utils.BufferMixin(buffer_size=[1, 257], dtype=<class 'numpy.float32'>)
Factory of classes that act as rolling buffers of the appropriate type.
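A minimal sketch of the rolling-buffer behaviour such a factory can provide; the real implementation in utils may differ:

    import numpy as np

    class RollingBuffer:
        """Keeps the most recent frames in a fixed-size window."""

        def __init__(self, buffer_size=(16, 257), dtype=np.float32):
            self.buffer = np.zeros(buffer_size, dtype=dtype)

        def push(self, frame):
            self.buffer = np.roll(self.buffer, -1, axis=0)  # drop the oldest frame
            self.buffer[-1] = frame                         # append the newest one
            return self.buffer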
class utils.Remix(buffer_size, buffer_hop, channels)
Reconstruction of the signal by overlap-add.
process(sample)
Method to call the reconstruction.
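A generic illustration of the overlap-add reconstruction the class performs (single channel here; Remix itself is multichannel):

    import numpy as np

    def overlap_add(frames, hop):
        """Add each length-N frame into the output at multiples of hop."""
        n_frames, frame_len = frames.shape
        out = np.zeros((n_frames - 1) * hop + frame_len)
        for i, frame in enumerate(frames):
            out[i * hop:i * hop + frame_len] += frame
        return out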
class utils.AdaptiveGain(level=0.005, update_win=0.975, max_gain=10)
The signal is amplified by AdaptiveGain, up to max_gain times, to match the prespecified power level. The current power level is a windowed measurement, to avoid sudden bursts of gain.
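A minimal sketch of the behaviour described above; the exact smoothing rule is an assumption based on the update_win parameter:

    import numpy as np

    class AdaptiveGainSketch:
        """Illustrative automatic gain control with a smoothed power estimate."""

        def __init__(self, level=0.005, update_win=0.975, max_gain=10):
            self.level, self.update_win, self.max_gain = level, update_win, max_gain
            self.power = level                  # start the estimate at the target level

        def process(self, sample):
            # Exponentially windowed power measurement avoids sudden bursts of gain
            self.power = (self.update_win * self.power
                          + (1 - self.update_win) * float(np.mean(sample ** 2)))
            gain = min(np.sqrt(self.level / (self.power + 1e-12)), self.max_gain)
            return sample * gain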