Once the graphical application is launched (by typing ./specanasyn_gui in a terminal), we see the main window. It has a menu and space for the tabs. One tab, which is always present, shows text about the operations performed. Other tabs show waveforms, spectra, or sinusoids, and appear once the corresponding data has been generated.
A typical use of the application is to load an audio file, analyze it, optionally save the analysis data to a text file, synthesize from the analysis, and save the result to another audio file. Another possibility is to load a text file with analysis data, synthesize it, and save the result.
At any moment, if audio data is loaded, we can play it by selecting the Play menu and then the appropriate option. If we have data in the application and want to clear it and start again, we can select File -> Clear/Reset on the menu.
The application has five parameter windows: analysis parameters, synthesis parameters, playing parameters, equalizer, and compressor/expander. They share a common behaviour: once the desired parameters are selected, pressing Ok closes the window and saves the parameters; pressing Cancel, or closing the window with the 'x' icon, discards the changes; and pressing Default Values loads the default values for all parameters in the window without closing it.
To do the analysis, we must first load an audio file. For this, we select File -> Open on the menu and, in the window that appears, select the desired file. The file format is detected automatically. To load a raw (headerless) file, we check Open file in RAW mode and select the sample format, sample rate, and number of channels. We then press the Open button, and a progress bar indicates how much of the file has been loaded. When loading finishes, the Text tab shows the file data, including its maximum amplitude, and one new tab per channel shows the waveform of that channel, labeled WAVE c0, WAVE c1, ... These tabs have time on the horizontal axis and amplitude on the vertical axis.
Now, if we want to analyze the audio data with non-default parameters, we select Analysis -> Parameters.... In the parameter window that will appear, we can specify the desired configuration.
In the Window Size box, we select the size of the STFT window, in samples. Larger sizes give higher frequency resolution and lower time resolution; smaller sizes give lower frequency resolution and higher time resolution.
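As a rough illustration of this trade-off, the following sketch computes the spacing between frequency bins and the duration of one window for a few window sizes. The 44100 Hz sample rate and the function name are assumptions chosen for the example; they are not taken from the application.

```python
def stft_resolution(window_size, sample_rate):
    """Return (bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate / window_size      # finer for larger windows
    window_duration_s = window_size / sample_rate   # coarser for larger windows
    return bin_spacing_hz, window_duration_s


# Assumed sample rate of 44100 Hz, for illustration only.
for size in (512, 2048, 8192):
    spacing, duration = stft_resolution(size, 44100)
    print(f"window {size:5d}: {spacing:7.2f} Hz per bin, {duration * 1000:6.1f} ms per window")
```

Doubling the window size halves the bin spacing and doubles the time span covered by each window.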
Below, in the Window Advance box, we select how many samples the window advances at each step. For example, a window size of 2048 and an advance of 512 give an overlap of 75%.
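The overlap figure follows directly from the two values. A minimal sketch, with hypothetical function names, and assuming that only complete windows are analyzed (the application may pad the signal differently):

```python
def overlap_fraction(window_size, window_advance):
    """Fraction of each window shared with the next one."""
    if not 0 < window_advance <= window_size:
        raise ValueError("advance must be between 1 and the window size")
    return 1.0 - window_advance / window_size


def frame_starts(num_samples, window_size, window_advance):
    """Start index of each complete window (no padding assumed)."""
    return list(range(0, num_samples - window_size + 1, window_advance))


print(overlap_fraction(2048, 512))       # 0.75, the 75% from the text
print(frame_starts(4096, 2048, 512))     # [0, 512, 1024, 1536, 2048]
```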
In the Window Type box, we select the window function that is applied to each window before the FFT.
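To show what applying a window function means, the sketch below multiplies a frame of samples by a Hann window. The symmetric Hann definition used here is the common textbook one; whether the application uses exactly this variant is an assumption.

```python
import math


def hann_window(size):
    """Symmetric Hann window (textbook definition, assumed here)."""
    if size < 2:
        return [1.0] * size
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1)) for n in range(size)]


def apply_window(frame, window):
    """Multiply each sample by the matching window coefficient."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [sample * weight for sample, weight in zip(frame, window)]


# A constant frame is tapered towards zero at both ends.
print(apply_window([1.0] * 5, hann_window(5)))
```

The tapering reduces the discontinuity at the window edges, which would otherwise spread energy across neighbouring frequency bins.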
In Minimum Amplitude Shown (dB), we can specify the amplitude that corresponds to white. Larger amplitudes are drawn in darker colors; smaller amplitudes are drawn in white as well.
In Discard Sinusoids Smaller than (dB), if the checkbox is selected, sinusoids smaller than the chosen value are discarded. If the checkbox is not selected, no sinusoids are discarded for that reason.
In Discard lower frequencies (bins) we can specify how many of the lowest bin indexes (near 0 Hz) are discarded.
In Discard upper frequencies (bins) we can specify how many of the highest bin indexes (near the Nyquist frequency) are discarded.
In Keep only this Number of Sinusoids, if the checkbox is selected, only the specified number of most intense sinusoids is kept and the rest are discarded.
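The four discarding options above can be pictured as successive filters over the spectral peaks of one frame. This is only a sketch under stated assumptions: the peak representation (bin index, amplitude in dB), the function name, and the order in which the filters are applied are all hypothetical, not the application's actual implementation.

```python
def filter_peaks(peaks, num_bins, min_db=None, discard_lower=0,
                 discard_upper=0, keep=None):
    """Filter the peaks of one frame.

    peaks: list of (bin_index, amplitude_db) tuples (assumed representation).
    min_db / keep: None plays the role of an unchecked checkbox.
    """
    kept = [
        (bin_index, amp_db)
        for bin_index, amp_db in peaks
        # Drop the lowest and highest bins.
        if discard_lower <= bin_index < num_bins - discard_upper
        # Drop peaks below the amplitude threshold, if one is set.
        and (min_db is None or amp_db >= min_db)
    ]
    if keep is not None:
        # Keep only the most intense peaks.
        kept = sorted(kept, key=lambda peak: peak[1], reverse=True)[:keep]
    return sorted(kept)  # back in frequency order


frame_peaks = [(0, -3.0), (5, -20.0), (10, -70.0), (40, -10.0), (63, -5.0)]
print(filter_peaks(frame_peaks, num_bins=64, min_db=-60.0,
                   discard_lower=1, discard_upper=1, keep=2))
```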
In Discard Transitions Greater than (bins), if the checkbox is selected, a sinusoid in one window is not linked (associated) to a sinusoid in the next window when their frequency indexes differ by more than the specified number of bins.
Last, in Minimum Length of Sinusoids (windows), we can choose between allowing all sinusoids (except those discarded elsewhere) and allowing only sinusoids that last more than the specified number of windows.
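To make the linking and minimum-length ideas concrete, here is a small sketch that joins peaks of consecutive frames into tracks (partials). The greedy nearest-bin strategy, the data layout, and the function names are assumptions made for illustration; the application's actual tracking algorithm is not described in this text. The length test uses "at least min_frames", which is also an assumption.

```python
def link_peaks(frames, max_jump):
    """Link peaks across frames into tracks.

    frames: list of lists of bin indexes, one list per frame (assumed layout).
    Returns a list of tracks, each a list of (frame_index, bin_index).
    """
    finished = []
    active = []  # tracks that reached the previous frame
    for frame_index, bins in enumerate(frames):
        unused = set(bins)
        still_active = []
        for track in active:
            last_bin = track[-1][1]
            # Only transitions of at most max_jump bins are allowed.
            candidates = [b for b in unused if abs(b - last_bin) <= max_jump]
            if candidates:
                best = min(candidates, key=lambda b: (abs(b - last_bin), b))
                unused.remove(best)
                track.append((frame_index, best))
                still_active.append(track)
            else:
                finished.append(track)  # the track ends here
        # Every unmatched peak starts a new track.
        for b in sorted(unused):
            still_active.append([(frame_index, b)])
        active = still_active
    return finished + active


def drop_short(tracks, min_frames):
    """Keep only tracks lasting at least min_frames frames (assumed rule)."""
    return [track for track in tracks if len(track) >= min_frames]


tracks = link_peaks([[10, 50], [11, 80], [12]], max_jump=3)
print(drop_short(tracks, min_frames=2))
```

In this example the jump from bin 50 to bin 80 exceeds the limit, so those two peaks become separate one-frame tracks and are removed by the length filter.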
Now that we have the sound and the parameters, we can do the analysis.
For that, we first do the STFT analysis by selecting Analysis -> STFT on the menu. A progress bar shows how much of the process is done. When it ends, there is one new tab per channel, labeled Spectrum c0, Spectrum c1, ... These tabs show time on the horizontal axis and frequency on the vertical axis, and each point is colored darker the more intense the sound is at the corresponding frequency and time. The Text tab shows some data about the STFT analysis, such as the number of frames processed.
Then we do the sinusoidal analysis, which obtains the partials. For that, we select Analysis -> Sinusoidal. As in the previous case, a progress bar shows how much of the process is done. When it ends, there is one new tab per channel, labeled Sinusoids c0, Sinusoids c1, ... These tabs have time on the horizontal axis and frequency on the vertical axis, and show a series of more or less horizontal lines that correspond to the partials detected and not discarded. The Text tab shows some data about this analysis phase, such as the amplitude of the most intense partial.
Once the analysis is done, we can save it to a file for later processing, or we can do the synthesis immediately.
To save it, we select File -> Save Analysis on the menu. In the window that appears, we enter the name of the file where we want to save the data. A progress bar indicates how much of the process is done, and when it ends, the Text tab shows a confirmation that the data was saved.
Now we can do the synthesis. If we have just done the analysis, we can skip to the next step. Otherwise, we have to load a text file with analysis data. For this, we select File -> Open Analysis on the menu, and choose a file in the window. A progress bar will indicate the part that is loaded. In the Text tab there will be information about the loaded data, and in the Sinusoids c0... tabs (as many as channels) there will be the sinusoidal representation.
Now, if we want to synthesize the audio signal with non-default parameters, we select Synthesis -> Parameters... on the menu, and in the parameter window that will appear we can specify the desired configuration.
To specify an equalization to be applied to the generated sound, we click the Equalization button. A window appears with a horizontal line that represents an identity response (all frequencies unmodified). The horizontal axis is frequency and the vertical axis is amplitude, in dB. There is a logarithmic scale in both axes. In the Predefined Settings combobox we can select a predefined equalization, which becomes immediately visible in the upper box.
If we want to specify an arbitrary equalization, we can do it with the mouse, by creating points. Lines are drawn that interpolate the amplitude values between the points. There are three operations: the first is to create a point (clicking with the left mouse button on a blank zone and, without releasing the button, dragging the point to the desired position); the second is to move a point (clicking with the left mouse button on a point and dragging it to the desired position); the third is to remove a point (clicking with the center or right button on the point to delete). We cannot change the order of the points along the x axis.
By selecting the Change Sample Rate to (Hz) box, we can specify a sampling rate for the synthesized signal. The default value (when the box is not checked) is that of the original signal.
In Time Stretching we set a factor that determines the duration of the synthesized sound. For instance, a value of 1.0 keeps the original duration, a value of 0.5 makes the sound last half as long, and a value of 2.0 makes the synthesized signal last twice as long as the original.
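One way to picture time stretching in a sinusoidal model is that the analysis frames keep their contents but are placed at scaled positions in time. The sketch below shows only that bookkeeping; the function name and the idea of scaling frame times uniformly are assumptions for illustration, not a description of the application's code.

```python
def stretched_frame_times(num_frames, window_advance, sample_rate, factor):
    """Time (in seconds) at which each analysis frame is placed after stretching."""
    advance_s = window_advance / sample_rate  # original spacing between frames
    return [index * advance_s * factor for index in range(num_frames)]


# Three frames spaced 0.5 s apart, stretched by 2.0, end up 1.0 s apart.
print(stretched_frame_times(3, 512, 1024, 2.0))
```

Because only the frame positions change, the frequencies of the partials are unaffected, which is why duration and pitch can be adjusted independently.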
The value in Frequency Scaling (semitones) sets a frequency scaling, in semitones. A positive or negative value multiplies the original frequencies by a factor that shifts them up or down, respectively.
In Frequency Shifting (Hz), we can enter a value, in Hertz, that is added to the sound frequencies, shifting them by the desired number of Hertz (positive or negative).
If Silence out-of-range frequencies is selected and, because of some shifting or scaling, frequencies appear below 0 Hz or above the Nyquist frequency, they are silenced in order to avoid aliasing.
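The three frequency options can be combined in a single mapping. In this sketch the semitone value is converted with the equal-tempered factor 2^(semitones/12), and scaling is applied before shifting; both the conversion and the order are assumptions, as are the function and parameter names.

```python
def transform_frequency(freq_hz, semitones=0.0, shift_hz=0.0,
                        sample_rate=44100, silence_out_of_range=True):
    """Return the new frequency in Hz, or None if the partial is silenced."""
    # Assumed order: multiplicative scaling first, additive shift second.
    new_freq = freq_hz * 2.0 ** (semitones / 12.0) + shift_hz
    nyquist = sample_rate / 2.0
    if silence_out_of_range and not (0.0 <= new_freq <= nyquist):
        return None  # silenced to avoid aliasing
    return new_freq


print(transform_frequency(440.0, semitones=12))        # one octave up: 880.0
print(transform_frequency(20000.0, shift_hz=5000.0))   # above Nyquist: None
print(transform_frequency(100.0, shift_hz=-200.0))     # below 0 Hz: None
```

Scaling preserves the ratios between partials (and so the harmonic structure), while shifting adds the same offset to all of them and generally does not.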
When the Interpolate Frequency between frames box is selected, the frequency of partials that last more than one frame varies continuously during synthesis, instead of changing only once per frame.
When the Interpolate Amplitude between frames box is selected, the amplitude of partials that last more than one frame varies continuously during synthesis, instead of changing only once per frame. This avoids possible audible 'clicks' caused by abrupt amplitude changes.
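The effect of both interpolation options can be sketched by synthesizing one frame of a single partial. Linear interpolation and a running phase accumulator are assumed here for simplicity; the application may interpolate differently, and the function name is hypothetical.

```python
import math


def synth_segment(f0, f1, a0, a1, num_samples, sample_rate,
                  phase=0.0, interpolate=True):
    """Synthesize one frame of a partial going from (f0, a0) to (f1, a1).

    Returns (samples, final_phase) so the next frame can continue smoothly.
    """
    samples = []
    for n in range(num_samples):
        # With interpolation off, the frame keeps its starting values.
        frac = n / num_samples if interpolate else 0.0
        freq = f0 + (f1 - f0) * frac
        amp = a0 + (a1 - a0) * frac
        samples.append(amp * math.sin(phase))
        phase += 2.0 * math.pi * freq / sample_rate
    return samples, phase


# Amplitude fades from 1.0 towards 0.0 within the frame instead of jumping.
faded, _ = synth_segment(100.0, 100.0, 1.0, 0.0, 4, 400.0, phase=math.pi / 2)
print(faded)
```

Carrying the phase from one frame to the next keeps the waveform continuous even when the frequency changes between frames.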
Below, we have options that determine how sinusoids terminate.
If the Delay Sinusoid Until Crosses Over 0 box is checked, ending sinusoids last slightly beyond the limit of a frame, until they cross zero amplitude. This avoids audible distortion by making them end near zero amplitude.
If Decay from last value to 0 is selected, when a sinusoid ends, an exponential decay towards zero is created, using the chosen multiplicative factor.
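An exponential decay with a multiplicative factor means that each step multiplies the previous value by the same number between 0 and 1. The sketch below generates such a tail; the stopping floor, the step granularity (per sample or per frame), and the function name are assumptions, since the text does not specify them.

```python
def decay_tail(last_value, factor, floor=1e-4):
    """Values of an exponential decay from last_value towards zero.

    factor: multiplicative factor, 0 <= factor < 1 (1.0 would never decay).
    floor: assumed cutoff below which the tail is considered silent.
    """
    if not 0.0 <= factor < 1.0:
        raise ValueError("factor must satisfy 0 <= factor < 1")
    tail = []
    value = last_value * factor
    while abs(value) >= floor:
        tail.append(value)
        value *= factor
    return tail


print(decay_tail(1.0, 0.5, floor=0.1))  # [0.5, 0.25, 0.125]
```

Factors close to 1 give a long, gentle fade, while factors close to 0 end the sinusoid almost immediately.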
To apply dynamic range compression or expansion, we press the Compression / Expansion button, and a parameter window appears. In that window there is a box that graphically represents the response of the dynamic range processor (compressor or expander) as a function of the input level. The black point represents the threshold. In the Mode of operation section, we select Off (no action) if we don't want any processing, Compressor to act as a compressor, and Expander to act as an expander. Below we can specify the Attack time in seconds (how long the compressor/expander takes to act once the threshold is reached). Then, we can choose the Release time in seconds (how long it takes to stop acting once the threshold is no longer reached). Below we can select the amplitude Threshold, in dB (the level at which the compressor/expander starts to act). Then, we can specify the Ratio of compression/expansion, from 0 (acts as a limiter) to 1 (no action). Finally, we can select Stereo link, to make the compression/expansion act at the same time and with the same gain in all channels; otherwise, each channel is processed separately.
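The meaning of the Ratio range (0 as limiter, 1 as no action) can be illustrated with a static compressor curve in which the ratio is the slope of the output level above the threshold. This is a sketch of that interpretation only: it ignores attack and release times, does not cover the expander mode, and the function name is hypothetical.

```python
def compress_db(input_db, threshold_db, ratio):
    """Static compressor curve: output level in dB for a given input level.

    ratio: slope above the threshold, 0 (limiter) to 1 (no action), as an
    assumed reading of the Ratio parameter.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if input_db <= threshold_db:
        return input_db  # below the threshold nothing changes
    return threshold_db + (input_db - threshold_db) * ratio


for ratio in (0.0, 0.5, 1.0):
    print(ratio, compress_db(-10.0, threshold_db=-20.0, ratio=ratio))
```

With a threshold of -20 dB, an input of -10 dB comes out at -20 dB for ratio 0, -15 dB for ratio 0.5, and unchanged for ratio 1.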
In the normalization part, there are three options: No change (not recommended), which does not normalize at all (overflow or very low amplitudes may appear); Normalize if Greater than 1.0, which reduces the maximum amplitude to 1.0 if it is higher; and Normalize Always, which normalizes in every case, so the maximum amplitude is always 1.0.
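The three normalization options differ only in when the signal is divided by its peak amplitude. A minimal sketch, where the mode names "none", "if_greater", and "always" are hypothetical labels for the three options:

```python
def normalize(samples, mode="if_greater"):
    """Normalize samples according to one of three assumed mode names."""
    if mode not in ("none", "if_greater", "always"):
        raise ValueError(f"unknown mode: {mode}")
    peak = max((abs(s) for s in samples), default=0.0)
    if mode == "none" or peak == 0.0:
        return list(samples)          # nothing to do (or silent signal)
    if mode == "if_greater" and peak <= 1.0:
        return list(samples)          # already within range
    return [s / peak for s in samples]


print(normalize([0.5, -2.0], "if_greater"))   # [0.25, -1.0]
print(normalize([0.5, -0.25], "if_greater"))  # unchanged
print(normalize([0.5, -0.25], "always"))      # [1.0, -0.5]
```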
Now that we have the data and the parameters, we can do the synthesis. For that, we select Synthesis -> Synthesize on the menu. A progress bar indicates how much of the process is done, and at the end one tab per channel appears, labeled Synthesis c0, etc., where we can see the resulting waveform. The horizontal axis is time, and the vertical axis is amplitude. The Text tab shows information about the synthesized sound, such as its maximum amplitude.
We can also save the obtained signal to a file. For that, we select File -> Save Synthesis on the menu. In the window we choose a directory and file name, the file type (WAV, AIFF, AU, headerless RAW, FLAC, OGG, ...), and the sample format (signed 16 bit, floating point, ...). Note that the OGG format ignores the sample format, and not all file types support all sample formats. When we have selected the appropriate parameters, we press Save; the file is saved (if possible) and the confirmation (or error) appears in the Text tab.
If we have the signal in memory, we can play it, to hear it. We can select on the menu: Play -> Play Original to play the original sound, or Play -> Play Synthesis to play the synthesized sound. A progress bar will indicate the part that has been played. We can stop the playing by selecting Play -> Stop.
To see the playing parameters, we select Play -> Parameters... on the menu. A window appears with three options: No change, which tries to play the sound with all its channels; Convert to 1 channel, which mixes all sound channels into a single channel; and Convert to 2 channels, which always outputs 2 channels. These options are useful when the sound has a number of channels that the sound card does not support. These settings do not permanently change the sound signal; they only affect playback.
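The channel conversions can be sketched as simple per-sample operations. Averaging the channels for the mono mix, and duplicating or averaging for the stereo case, are assumptions made for illustration; the text does not say how the application mixes channels, and the function names are hypothetical.

```python
def to_mono(frames):
    """Mix every multi-channel frame down to one value (average assumed)."""
    return [sum(frame) / len(frame) for frame in frames]


def to_stereo(frames):
    """Always produce 2 channels per frame (assumed strategy)."""
    stereo = []
    for frame in frames:
        if len(frame) == 2:
            stereo.append(tuple(frame))           # already stereo
        else:
            mono = sum(frame) / len(frame)        # 1 channel, or more than 2
            stereo.append((mono, mono))
    return stereo


print(to_mono([(1.0, 0.0), (0.5, 0.5)]))   # [0.5, 0.5]
print(to_stereo([(0.25,), (0.5,)]))        # [(0.25, 0.25), (0.5, 0.5)]
```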
A command line application was also created, with the same functionality as the graphical one, except that it does not allow a customized equalization (it allows choosing a predefined equalization setup).
The output obtained by executing the application without arguments, or with --usage, is as follows:

Usage: specanasyn_cli [-12340?] [--raw-format=number] [--raw-channels=number]
        [--raw-rate=Hz] [-s|--window-size=samples] [-i|--window-incr=samples]
        [-t|--window-type=number] [-n|--do-not-discard-smaller]
        [-d|--discard-smaller=dB] [-j|--discard-lower=bins]
        [-v|--discard-upper=bins] [-c|--do-not-crop-maximums]
        [-k|--keep-maximums=number] [-o|--do-not-discard-transitions]
        [-a|--discard-transitions=bins] [-m|--minimum-length=frames]
        [--file-formats] [--window-types] [--analysis-default]
        [--output-type=number] [--output-format=number] [-e|--equalize=index]
        [-u|--output-rate=Hz] [-r|--time-stretch=factor]
        [-f|--frequency-scaling=semitones] [-q|--frequency-shifting=Hz]
        [-z|--do-not-silence-freq] [-p|--amplitude-scaling=times]
        [-l|--do-not-interpolate-freq] [-b|--do-not-interpolate-ampl]
        [-y|--delay-sinusoids] [-g|--do-not-decay] [-h|--decay-factor=factor]
        [-w|--do-compression] [-x|--do-expansion] [--attack=s] [--release=s]
        [--threshold=dB] [--ratio=ratio] [--stereo-link] [--file-types]
        [--equalize-predefined] [--synthesis-default] [-1|--output-analysis]
        [-2|--output-synthesis] [-3|--output-synthesis-from-analysis]
        [-4|--output-both] [-0|--examples] [-?|--help] [--usage]
The output obtained by executing the application with the --help or -? argument is as follows:

Usage: specanasyn_cli [OPTIONS]* <files>
      --raw-format=number                 analysis: raw format (see --file-formats)
      --raw-channels=number               analysis: raw format: channels
      --raw-rate=Hz                       analysis: raw format: sampling rate
  -s, --window-size=samples               analysis: window size
  -i, --window-incr=samples               analysis: window incr
  -t, --window-type=number                analysis: window type (see --window-types)
  -n, --do-not-discard-smaller            analysis: do not discard smaller maximums
  -d, --discard-smaller=dB                analysis: discard maximums smaller than
  -j, --discard-lower=bins                analysis: discard lower frequencies
  -v, --discard-upper=bins                analysis: discard upper frequencies
  -c, --do-not-crop-maximums              analysis: do not crop maximums exceeding given number
  -k, --keep-maximums=number              analysis: keep only this number of maximums
  -o, --do-not-discard-transitions        analysis: do not discard bigger transitions
  -a, --discard-transitions=bins          analysis: discard transitions greater than given number of bins
  -m, --minimum-length=frames             analysis: minimum length of sinusiods (only 1 or 2 frames)
      --file-formats                      print table of raw formats available (sample types)
      --window-types                      analysis: print table of window types
      --analysis-default                  analysis: print table of default values
      --output-type=number                synthesis: choose type of output file (see --file-types)
      --output-format=number              synthesis: choose format of output file (see --file-formats)
  -e, --equalize=index                    synthesis: predefined equalization (see --equalize-predefined)
  -u, --output-rate=Hz                    synthesis: change output sample rate
  -r, --time-stretch=factor               synthesis: time stretching
  -f, --frequency-scaling=semitones       synthesis: frequency scaling
  -q, --frequency-shifting=Hz             synthesis: frequency shifting
  -z, --do-not-silence-freq               synthesis: do not silence out-of-range frequencies
  -p, --amplitude-scaling=times           synthesis: amplitude scaling
  -l, --do-not-interpolate-freq           synthesis: do not interpolate frequency between frames
  -b, --do-not-interpolate-ampl           synthesis: do not interpolate amplitude between frames
  -y, --delay-sinusoids                   synthesis: delay sinusoid until crosses over 0.0
  -g, --do-not-decay                      synthesis: do not decay from last value to 0.0
  -h, --decay-factor=factor               synthesis: decay from last value to 0.0 with given factor [0..1]
  -w, --do-compression                    synthesis: do audio level compression
  -x, --do-expansion                      synthesis: do audio level expansion
      --attack=s                          synthesis: compressor/expansor attack time
      --release=s                         synthesis: compressor/expansor release time
      --threshold=dB                      synthesis: compressor/expansor amplitude threshold
      --ratio=ratio                       synthesis: compressor/expansor ratio [0..1]
      --stereo-link                       synthesis: compressor/expansor: apply stereo link
      --file-types                        print table of file types available (WAV, AU, ...)
      --equalize-predefined               synthesis: print table of predefined equalization types
      --synthesis-default                 synthesis: print table of default values
  -1, --output-analysis                   output analysis: <file.wave> <file.txt>
  -2, --output-synthesis                  output synthesis: <file.wave> <file_synthesized.wave>
  -3, --output-synthesis-from-analysis    output synthesis: <file.txt> <file_synthesized.wave>
  -4, --output-both                       output analysis and syntehsis: <file.wave> <file.txt> <file_synthesized.wave>
  -0, --examples                          print some examples of command line arguments

Help options:
  -?, --help                              Show this help message
      --usage                             Display brief usage message
The output obtained by executing the application with the -0 argument is as follows:

$ ./specanasyn_cli -1 sound.wav sound2.txt
    analyzes sound.wav and puts analysis results on sound2.txt
    (with default analysis options)
$ ./specanasyn_cli -2 sound.wav sound2.wav
    analyzes sound.wav, synthesizes it, and puts synthesis results on sound2.wav
    (with default analysis and synthesis options)
$ ./specanasyn_cli -3 sound2.txt sound2.wav
    synthesizes sound2.txt and puts synthesis results on sound2.wav
    (with default synthesis options)
$ ./specanasyn_cli -4 sound.wav sound2.txt sound2.wav
    analyzes and synthesizes sound.wav, put analysis results on sound2.txt
    and synthesis results on sound2.wav
    (with default analysis and synthesis options)
$ ./specanasyn_cli -s 1024 -i 256 -1 file.wav file2.txt
    analyzes with window-size=1024 and window-advance=256 (75% overlapping)
$ ./specanasyn_cli -r 2.0 -u 44100 -3 file2.txt file2.wav
    synthesizes with time-stretch=2.0 (2 times larger)
    and changes output sample rate to 44100
Chosen files:
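When the command line application is driven from a script, the argument list can be built programmatically. The sketch below assembles an analysis command using only the -s, -i, and -1 options shown in the examples above; the helper name is hypothetical, and the executable path is taken from those examples. It only builds and prints the command, so it does not require the binary to be present.

```python
import shlex


def analysis_command(input_audio, output_text, window_size=None,
                     window_advance=None, executable="./specanasyn_cli"):
    """Build the argument list for an analysis-only run (option -1)."""
    command = [executable]
    if window_size is not None:
        command += ["-s", str(window_size)]
    if window_advance is not None:
        command += ["-i", str(window_advance)]
    command += ["-1", input_audio, output_text]
    return command


command = analysis_command("file.wav", "file2.txt",
                           window_size=1024, window_advance=256)
print(shlex.join(command))
# To run it for real (needs the specanasyn_cli binary in the current directory):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as a single string, avoids quoting problems with file names that contain spaces.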
The output obtained by executing the application with the --file-types argument is as follows:

 0: WAV - Microsoft WAV (little endian)
 1: AIFF - Apple/SGI AIFF (big endian)
 2: AU - Sun/NeXT AU (big endian)
 3: RAW - RAW PCM data
 4: PAF - Ensoniq PARIS
 5: SVX - Amiga IFF / SVX8 / SV16
 6: NIST - Sphere NIST
 7: VOC - VOC
 8: IRCAM - Berkeley/IRCAM/CARL
 9: W64 - Sonic Foundry's 64 bit RIFF/WAV
10: MAT4 - Matlab (tm) V4.2 / GNU Octave 2.0
11: MAT5 - Matlab (tm) V5.0 / GNU Octave 2.1
12: PVF - Portable Voice Format
13: XI - Fasttracker 2 Extended Instrument
14: HTK - HMM Tool Kit format
15: SDS - Midi Sample Dump Standard
16: AVR - Audio Visual Research
17: WAVEX - MS WAVE with WAVEFORMATEX
18: SD2 - Sound Designer 2
19: FLAC - Free Lossless Audio Codec
20: CAF - Core Audio File
21: WVE - Psion WVE
22: OGG - OGG VORBIS
23: MPC2K - Akai MPC 2000 sampler
24: RF64 - RF64 WAV
Chosen files:
The output obtained by executing the application with the --file-formats argument is as follows:

0: PCM_S8 - Signed 8 bit
1: PCM_16 - Signed 16 bit
2: PCM_24 - Signed 24 bit
3: PCM_32 - Signed 32 bit
4: PCM_U8 - Unsigned 8 bit
5: FLOAT - 32 bit floating point
6: DOUBLE - 64 bit floating point
Chosen files:
The output obtained by executing the application with the --window-types argument is as follows:

0: rectangular
1: Hamming
2: Hanning
3: triangular
4: Blackman
Chosen files: