SpecAnaSyn - manual

Once the graphical application is launched (by typing ./specanasyn_gui in a terminal), the main window appears. It contains a menu and an area for tabs. One tab, which is always present, shows text messages about the operations performed. The other tabs show waveforms, spectra, or sinusoids, and appear once the corresponding data has been generated.

One way to use the application is to load an audio file, analyze it, optionally save the analysis data to a text file, synthesize a new signal from the analysis, and save the result to another audio file. Another is to load a text file with analysis data, synthesize it, and save the result.

At any moment, if audio data is loaded, we can play it by opening the Play menu and selecting the appropriate option. If we have data in the application and want to clear it and start again, we can select File -> Clear/Reset on the menu.

The application has five parameter windows (analysis parameters, synthesis parameters, playing parameters, equalizer, and compressor/expander), and they all behave the same way. Once the desired parameters are selected, pressing Ok closes the window and saves them. Pressing Cancel, or closing the window with the 'x' icon, discards the changes. Pressing Default Values loads the default values for all parameters in the window without closing it.

In order to do the analysis, we must first load an audio file. For this we select File -> Open on the menu and choose the desired file in the window that appears. The file format is detected automatically. To load a raw (headerless) file, we check Open file in RAW mode and select the sample format, sample rate, and number of channels. We then press the Open button, and a progress bar indicates how much of the file has been loaded. When loading finishes, the Text tab shows information about the file, including its maximum amplitude, and one new tab per channel appears, labeled WAVE c0, WAVE c1, ..., with the waveform of that channel. Each waveform has time on the horizontal axis and amplitude on the vertical axis.

Now, if we want to analyze the audio data with non-default parameters, we select Analysis -> Parameters... on the menu. In the parameter window that appears, we can specify the desired configuration.

In the Window Size box, we select the size of the STFT window. Larger sizes give higher frequency resolution and lower time resolution; smaller sizes give lower frequency resolution and higher time resolution.

Below it, in the Window Advance box, we select how many samples the window advances at each step. For example, with a window size of 2048 and an advance of 512, consecutive windows overlap by 75%.
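
As a check of that arithmetic, the overlap fraction is 1 - advance/size. The Python sketch below computes it together with the first sample of each analysis window. The function names are hypothetical, and placing windows only where they fit entirely inside the signal (no zero padding) is an assumption, not documented SpecAnaSyn behaviour.

```python
def overlap_percent(window_size: int, window_advance: int) -> float:
    """Percentage of samples shared by two consecutive analysis windows."""
    if not 0 < window_advance <= window_size:
        raise ValueError("advance must be in (0, window_size]")
    return 100.0 * (window_size - window_advance) / window_size


def window_starts(n_samples: int, window_size: int, window_advance: int) -> list[int]:
    """First sample index of every window that fits entirely in the signal
    (assumption: no zero padding at the ends)."""
    return list(range(0, n_samples - window_size + 1, window_advance))


print(overlap_percent(2048, 512))      # 75.0
print(window_starts(4096, 2048, 512))  # [0, 512, 1024, 1536, 2048]
```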

In the Window Type box, we select the window function that is applied to each frame before the FFT.
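
For reference, the five window types listed by the command line option --window-types can be written with their usual textbook formulas. This Python sketch is illustrative only. The use of the symmetric form (denominator N - 1) rather than a periodic variant is an assumption, as is the exact triangular (Bartlett) definition. The function names are hypothetical.

```python
import math


def make_window(kind: str, n: int) -> list[float]:
    """Return an n-point window (symmetric form, assumed)."""
    if n == 1:
        return [1.0]
    m = n - 1
    if kind == "rectangular":
        return [1.0] * n
    if kind == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * i / m) for i in range(n)]
    if kind == "hanning":
        return [0.5 - 0.5 * math.cos(2 * math.pi * i / m) for i in range(n)]
    if kind == "triangular":
        # Bartlett form: zero at both ends, 1.0 in the middle (assumed variant)
        return [1.0 - abs((i - m / 2) / (m / 2)) for i in range(n)]
    if kind == "blackman":
        return [0.42 - 0.5 * math.cos(2 * math.pi * i / m)
                + 0.08 * math.cos(4 * math.pi * i / m) for i in range(n)]
    raise ValueError(f"unknown window type: {kind}")


def apply_window(frame: list[float], window: list[float]) -> list[float]:
    """Multiply a frame by the window, sample by sample, before the FFT."""
    return [x * w for x, w in zip(frame, window)]
```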

In Minimum Amplitude Shown (dB), we can specify the amplitude that is drawn as white. Larger amplitudes are drawn with darker colors. Smaller amplitudes are also drawn white.
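
The amplitude-to-colour mapping described above can be sketched as a clamp followed by a linear scale. In this Python sketch the grey level runs from 0 (black) to 255 (white). The manual only fixes the white end of the scale, so drawing 0 dB as black and interpolating linearly in dB are assumptions. The function name is hypothetical.

```python
import math


def amplitude_to_grey(amplitude: float, min_db: float) -> int:
    """Map a linear amplitude to a grey level: 255 = white, 0 = black.

    Assumptions: 0 dB (amplitude 1.0) is drawn black, and grey varies
    linearly in dB between min_db and 0 dB.
    """
    if min_db >= 0:
        raise ValueError("min_db must be negative")
    db = 20 * math.log10(amplitude) if amplitude > 0 else -math.inf
    db = min(max(db, min_db), 0.0)  # below min_db -> white, above 0 dB -> black
    return round(255 * db / min_db)


print(amplitude_to_grey(0.1, -80.0))  # 64 (-20 dB is a quarter of the way to white)
```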

In Discard Sinusoids Smaller than (dB), if the checkbox is selected, sinusoids with an amplitude below the chosen value are discarded. If the checkbox is not selected, no sinusoids are discarded for this reason.

In Discard lower frequencies (bins), we can specify how many of the lowest frequency indexes (near 0 Hz) are discarded.

In Discard upper frequencies (bins), we can specify how many of the highest frequency indexes (near the Nyquist frequency) are discarded.
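
Because both discard options are given in bins, converting them to hertz can be useful. Bin k of an N-point FFT at sample rate fs lies at k·fs/N, so the Nyquist frequency fs/2 corresponds to bin N/2. This Python sketch assumes that the window size equals the FFT size (no zero padding) and that the kept range is the one described in the comment. Both function names are hypothetical.

```python
def bin_to_hz(k: int, fft_size: int, sample_rate: float) -> float:
    """Centre frequency of FFT bin k."""
    return k * sample_rate / fft_size


def kept_bins(fft_size: int, discard_lower: int, discard_upper: int) -> range:
    """Bins left after dropping discard_lower bins starting at bin 0 and
    discard_upper bins ending at the Nyquist bin (fft_size // 2), inclusive."""
    nyquist_bin = fft_size // 2
    return range(discard_lower, nyquist_bin - discard_upper + 1)


bins = kept_bins(2048, 5, 10)
print(bins.start, bins.stop - 1)             # 5 1014
print(bin_to_hz(bins.start, 2048, 44100.0))  # 107.666015625
```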

In Keep only this Number of Sinusoids, if the checkbox is selected, only the specified number of most intense sinusoids is kept, and the rest are discarded.
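
The two level-based filters above (discarding sinusoids below a level and keeping only the N strongest) can be sketched for the peaks of one frame. In this Python example, representing each peak as a (bin, amplitude in dB) pair and applying the level threshold before the count limit are assumptions. The function name is hypothetical.

```python
from typing import Optional


def filter_peaks(peaks: list[tuple[int, float]], min_db: Optional[float] = None,
                 keep: Optional[int] = None) -> list[tuple[int, float]]:
    """Filter the (bin, dB) peaks of one frame.

    min_db: drop peaks below this level (None = checkbox off).
    keep:   keep only this many strongest peaks (None = checkbox off).
    The result is returned sorted by bin.
    """
    if min_db is not None:
        peaks = [p for p in peaks if p[1] >= min_db]
    if keep is not None:
        peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(peaks)


frame = [(10, -12.0), (25, -70.0), (40, -30.0), (55, -5.0)]
print(filter_peaks(frame, min_db=-60.0, keep=2))  # [(10, -12.0), (55, -5.0)]
```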

In Discard Transitions Greater than (bins), if the checkbox is selected, a sinusoid in one window is not linked (associated) with a sinusoid in the next window when their frequency indexes differ by more than the specified number of bins.
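
A common way to perform this kind of linking is greedy nearest-neighbour matching between the peaks of consecutive frames, with the transition limit as the maximum allowed jump. The Python sketch below illustrates that general technique only. It is not necessarily how SpecAnaSyn matches peaks, and the function name is hypothetical.

```python
from typing import Optional


def link_peaks(prev_bins: list[int], next_bins: list[int],
               max_jump: Optional[int]) -> dict[int, int]:
    """Map peak bins of one frame to peak bins of the next frame.

    Candidate pairs are taken in order of increasing bin distance, and each
    peak is used at most once. Pairs farther apart than max_jump are not
    linked (None = checkbox off, no limit).
    """
    pairs = sorted((abs(a - b), a, b) for a in prev_bins for b in next_bins)
    links: dict[int, int] = {}
    used_next: set[int] = set()
    for dist, a, b in pairs:
        if max_jump is not None and dist > max_jump:
            break  # pairs are sorted, so all remaining ones are farther apart
        if a not in links and b not in used_next:
            links[a] = b
            used_next.add(b)
    return links


print(link_peaks([10, 50, 90], [12, 49, 130], max_jump=5))  # {50: 49, 10: 12}
```

In this example, the peak at bin 90 finds no partner within 5 bins, so its sinusoid ends. The peak at bin 130 likewise starts a new sinusoid.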

Lastly, in Minimum Length of Sinusoids (windows), we can choose between allowing all sinusoids (except those discarded by the other criteria) and allowing only sinusoids that last more than the specified number of windows.

Now that we have the sound and the parameters, we can do the analysis.

To do so, we first perform the STFT analysis by selecting Analysis -> STFT on the menu. A progress bar shows how much of the process has been done. When it finishes, one new tab per channel appears, labeled Spectrum c0, Spectrum c1, ... These tabs show time on the horizontal axis and frequency on the vertical axis. Each point is shaded darker the more intense the sound is at that frequency and time. The Text tab shows some data about the STFT analysis, such as the number of frames processed.
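
What the STFT computes for each frame can be illustrated with a direct DFT. The windowed frame is transformed, and the magnitude of each bin is converted to dB, which is the value shaded in the Spectrum tabs. This Python sketch uses a plain O(N²) DFT for clarity, whereas a real implementation would use an FFT. The normalisation by 2/Σw, chosen so that a full-scale sinusoid centred on a bin reads about 0 dB, is an assumption. The function name is hypothetical.

```python
import cmath
import math


def frame_spectrum_db(frame: list[float], window: list[float]) -> list[float]:
    """Magnitude in dB of bins 0..N/2 of one windowed frame (direct DFT)."""
    n = len(frame)
    xw = [x * w for x, w in zip(frame, window)]
    scale = 2.0 / sum(window)  # assumed normalisation: unit sinusoid -> about 0 dB
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mag = abs(acc) * scale
        spectrum.append(20 * math.log10(mag) if mag > 0 else -math.inf)
    return spectrum


n, fs = 64, 8000.0
tone = [math.sin(2 * math.pi * 1000.0 * t / fs) for t in range(n)]  # falls on bin 8
db = frame_spectrum_db(tone, [1.0] * n)
print(max(range(len(db)), key=db.__getitem__))  # 8
```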

Then we perform the sinusoidal analysis, which obtains the partials, by selecting Analysis -> Sinusoidal. As before, a progress bar shows how much of the process has been done. When it finishes, one new tab per channel appears, labeled Sinusoids c0, Sinusoids c1, ... These tabs have time on the horizontal axis and frequency on the vertical axis. They show a series of roughly horizontal lines that correspond to the partials detected and not discarded. The Text tab shows some data about this analysis phase, such as the amplitude of the most intense partial.
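
Sinusoidal analysis usually starts by locating spectral peaks in each frame. The manual does not describe SpecAnaSyn's peak detector, so this minimal Python sketch makes two assumptions. It treats a peak as a bin whose magnitude is greater than both of its neighbours, and it lets the discard-lower/upper limits bound the search range. The function name is hypothetical.

```python
def find_peaks(mag_db: list[float], lower: int = 0, upper: int = 0) -> list[int]:
    """Return bins that are strict local maxima of a magnitude spectrum,
    ignoring `lower` bins at the bottom and `upper` bins at the top."""
    first = max(lower, 1)
    last = min(len(mag_db) - 1 - upper, len(mag_db) - 2)
    return [k for k in range(first, last + 1)
            if mag_db[k] > mag_db[k - 1] and mag_db[k] > mag_db[k + 1]]


spectrum = [-90, -60, -20, -55, -80, -40, -45, -90, -30, -95]
print(find_peaks(spectrum))           # [2, 5, 8]
print(find_peaks(spectrum, lower=3))  # [5, 8]
```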

Once the analysis is done, we can save it to a file for later processing, or do the synthesis immediately.

To save it, we select File -> Save Analysis on the menu and, in the window that appears, enter the name of the file where the data will be saved. A progress bar indicates how much of the process has been done. When it finishes, the Text tab shows a confirmation that the data was saved.

Now we can do the synthesis. If we have just done the analysis, we can skip this step. Otherwise, we have to load a text file with analysis data: we select File -> Open Analysis on the menu and choose a file in the window. A progress bar indicates how much has been loaded. The Text tab then shows information about the loaded data, and the Sinusoids c0, ... tabs (one per channel) show the sinusoidal representation.

Now, if we want to synthesize the audio signal with non-default parameters, we select Synthesis -> Parameters... on the menu. In the parameter window that appears, we can specify the desired configuration.

To specify the equalization applied when the sound is generated, we click the Equalization button. A window appears with a horizontal line representing an identity response (all frequencies unmodified). The horizontal axis is frequency and the vertical axis is amplitude in dB, and both axes use a logarithmic scale. In the Predefined Settings combobox we can select a predefined equalization, which is immediately shown in the upper box.

We can also specify an arbitrary equalization with the mouse by creating points. The curve is drawn as lines between the points, which interpolate the amplitude values. There are three operations. To create a point, we click with the left mouse button on an empty area and, without releasing the button, drag the point to the desired position. To move a point, we click on it with the left mouse button and drag it to the desired position. To remove a point, we click on it with the middle or right button. Points cannot be reordered along the x axis.
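
The curve drawn between the points can be evaluated with piecewise-linear interpolation. Because the frequency axis is displayed logarithmically, this Python sketch interpolates the gain in dB linearly against log10(frequency). It also holds the end values flat outside the first and last points. Both choices are assumptions rather than documented behaviour, and the function names are hypothetical.

```python
import bisect
import math


def eq_gain_db(points: list[tuple[float, float]], freq_hz: float) -> float:
    """Gain in dB at freq_hz for equalizer points (freq_hz, gain_db) sorted
    by frequency; interpolation is linear in log10(frequency) (assumed)."""
    if not points:
        return 0.0  # identity response
    freqs = [f for f, _ in points]
    if freq_hz <= freqs[0]:
        return points[0][1]
    if freq_hz >= freqs[-1]:
        return points[-1][1]
    i = bisect.bisect_right(freqs, freq_hz)  # freqs[i-1] <= freq_hz < freqs[i]
    (f0, g0), (f1, g1) = points[i - 1], points[i]
    t = (math.log10(freq_hz) - math.log10(f0)) / (math.log10(f1) - math.log10(f0))
    return g0 + t * (g1 - g0)


def db_to_gain(db: float) -> float:
    """Linear amplitude factor for a gain in dB."""
    return 10 ** (db / 20)


curve = [(100.0, 0.0), (1000.0, -12.0), (10000.0, 0.0)]
print(eq_gain_db(curve, 316.2278))  # about -6.0: halfway between 100 and 1000 Hz on a log axis
```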

By checking the Change Sample Rate to (Hz) box, we can specify a sampling rate for the synthesized signal. When the box is not checked, the sample rate of the original signal is used.

In Time Stretching we set a factor that determines the duration of the synthesized sound. For instance, a value of 1.0 keeps the original duration, a value of 0.5 halves it, and a value of 2.0 makes the synthesized signal last twice as long as the original.
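
A worked example shows how the time-stretch factor and the output sample rate combine. It assumes, as the descriptions above suggest, that stretching scales the duration while changing the sample rate leaves the duration unchanged. The result is an approximation, since frame-based synthesis may not match it to the sample. The function name is hypothetical.

```python
def output_length(n_in: int, rate_in: float, stretch: float, rate_out: float) -> int:
    """Approximate number of output samples: duration * stretch * rate_out."""
    duration_s = n_in / rate_in
    return round(duration_s * stretch * rate_out)


# 2 s at 22050 Hz, stretched by 2.0 and written at 44100 Hz -> 4 s of output
print(output_length(44100, 22050.0, 2.0, 44100.0))  # 176400
```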

The value in Frequency Scaling (semitones) specifies a frequency scaling in semitones. A positive or negative value multiplies the original frequencies by a factor that shifts them up or down, respectively.

In Frequency Shifting (Hz), we can enter a value in hertz that is added to the frequencies of the sound, shifting them up (positive value) or down (negative value) by that amount.

If Silence out-of-range frequencies is selected and some scaling or shifting moves frequencies below 0 Hz or above the Nyquist frequency, those frequencies are silenced in order to avoid aliasing.
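
The three frequency options above can be summarised in a few lines. In equal temperament, a scaling of s semitones multiplies a frequency by 2^(s/12), so +12 semitones doubles it. The manual does not state the order of operations, so this Python sketch assumes that scaling is applied before shifting and that a silenced partial is given amplitude 0. Here sample_rate is the output rate. The function name is hypothetical.

```python
def modify_frequency(freq_hz: float, amp: float, semitones: float, shift_hz: float,
                     sample_rate: float, silence_out_of_range: bool = True
                     ) -> tuple[float, float]:
    """Return (frequency, amplitude) of a partial after scaling, then shifting."""
    new_freq = freq_hz * 2 ** (semitones / 12) + shift_hz
    nyquist = sample_rate / 2
    if silence_out_of_range and not 0.0 <= new_freq <= nyquist:
        amp = 0.0  # would alias: silence it instead
    return new_freq, amp


print(modify_frequency(440.0, 0.5, 12, 0.0, 44100.0))      # (880.0, 0.5)
print(modify_frequency(15000.0, 0.5, 7, 0.0, 44100.0)[1])  # 0.0 (about 22474 Hz > 22050 Hz)
```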

If the Interpolate Frequency between frames box is selected, the frequency of partials that last more than one frame varies continuously during synthesis, instead of changing only once per frame.

If the Interpolate Amplitude between frames box is selected, the amplitude of partials that last more than one frame varies continuously during synthesis, instead of changing only once per frame. This avoids audible 'clicks' caused by abrupt amplitude changes.
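
Interpolating between frames amounts to ramping an oscillator's frequency and amplitude from one frame's values to the next over the samples in between. Phase is accumulated so that the waveform stays continuous. This minimal Python sketch renders one partial over one frame and assumes linear ramps, since the interpolation curve SpecAnaSyn uses is not documented. The function name is hypothetical.

```python
import math


def render_partial(f0: float, f1: float, a0: float, a1: float, n: int,
                   sample_rate: float, phase: float = 0.0,
                   interp_freq: bool = True, interp_ampl: bool = True
                   ) -> tuple[list[float], float]:
    """Synthesize n samples of one partial going from (f0, a0) to (f1, a1).

    With interpolation disabled, the frame's starting value is held for the
    whole frame. Returns the samples and the phase to continue from.
    """
    out = []
    for i in range(n):
        t = i / n
        f = f0 + (f1 - f0) * t if interp_freq else f0
        a = a0 + (a1 - a0) * t if interp_ampl else a0
        out.append(a * math.sin(phase))
        phase += 2 * math.pi * f / sample_rate  # phase accumulation keeps the wave continuous
    return out, phase % (2 * math.pi)


samples, next_phase = render_partial(440.0, 440.0, 0.0, 1.0, 512, 44100.0)
```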

Below, there are options that determine how sinusoids end.

By checking the Delay Sinusoid Until Crosses Over 0 box, ending sinusoids last slightly beyond the end of their frame, until they cross zero amplitude. Because they end near zero amplitude, audible distortions are avoided.

If Decay from last value to 0 is selected, an exponential decay towards zero is generated when a sinusoid ends, using the chosen multiplicative factor.
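
Both ending options can be sketched for a single partial whose last frame has just finished. In this Python example the decay multiplies the amplitude by the chosen factor once per sample and stops below a small floor. Applying the factor per sample, the 1e-4 floor, and the per-sample zero-crossing test are all assumptions, not SpecAnaSyn's documented behaviour. The function names are hypothetical.

```python
import math


def tail_until_zero_crossing(amp: float, freq_hz: float, phase: float,
                             sample_rate: float, max_samples: int = 10000) -> list[float]:
    """Keep a partial sounding past the frame end until its value changes sign."""
    out: list[float] = []
    prev = amp * math.sin(phase)
    if prev == 0.0:
        return out  # already at zero: nothing to add
    step = 2 * math.pi * freq_hz / sample_rate
    for _ in range(max_samples):
        phase += step
        value = amp * math.sin(phase)
        if value == 0.0 or (value > 0) != (prev > 0):
            break  # zero crossing reached: stop here
        out.append(value)
        prev = value
    return out


def exponential_decay(amp: float, freq_hz: float, phase: float, sample_rate: float,
                      factor: float, floor: float = 1e-4) -> list[float]:
    """Fade a partial out by multiplying its amplitude by `factor` each sample."""
    if not 0.0 <= factor < 1.0:
        raise ValueError("factor must be in [0, 1)")
    out = []
    while amp > floor:
        phase += 2 * math.pi * freq_hz / sample_rate
        out.append(amp * math.sin(phase))
        amp *= factor
    return out
```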

To apply dynamic range compression or expansion, we press the Compression / Expansion button, and a parameter window appears. It contains a box that graphically represents the response of the dynamic range processor (compressor or expander) as a function of its input. The black point represents the threshold. In the Mode of operation section, we select Off if we don't want any processing, Compressor to act as a compressor, or Expander to act as an expander. Below, we can specify the Attack time in seconds (how long the compressor/expander takes to act once the threshold is reached). Next comes the Release time in seconds (how long it takes to stop acting once the threshold is no longer reached). Below that, we can select the amplitude Threshold in dB (the level at which the compressor/expander starts to act). Then we can specify the Ratio of compression/expansion, from 0 (acts as a limiter) to 1 (no action). Finally, we can select Stereo link to make the compression/expansion act at the same time and with the same gain on all channels; otherwise, each channel is processed separately.
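
The compressor's static curve and its attack/release behaviour can be sketched with a common textbook design. Above the threshold, the excess level in dB is multiplied by the ratio, so 0 gives a limiter and 1 leaves the signal unchanged. The applied gain then follows a one-pole smoother that uses the attack time when gain reduction increases and the release time when it recovers. This Python sketch covers only the compressor. The exact curves and time-constant definitions used by SpecAnaSyn are assumptions, and the function names are hypothetical. With Stereo link, a single gain sequence would be computed (for example, from the loudest channel) and applied to every channel.

```python
import math


def compressor_gain_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain (dB) of a compressor; ratio in [0, 1], 1 = no action."""
    if level_db <= threshold_db:
        return 0.0
    out_db = threshold_db + ratio * (level_db - threshold_db)
    return out_db - level_db  # negative value: gain reduction


def smooth_gains(target_db: list[float], sample_rate: float,
                 attack_s: float, release_s: float) -> list[float]:
    """One-pole smoothing of per-sample target gains in dB."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate)) if attack_s > 0 else 0.0
    a_rel = math.exp(-1.0 / (release_s * sample_rate)) if release_s > 0 else 0.0
    g, out = 0.0, []
    for t in target_db:
        coeff = a_att if t < g else a_rel  # more reduction -> attack, less -> release
        g = coeff * g + (1.0 - coeff) * t
        out.append(g)
    return out


print(compressor_gain_db(-6.0, -18.0, 0.5))  # -6.0 (12 dB over threshold becomes 6 dB over)
```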

For amplitude normalization, there are three options. No change (not recommended) does not normalize at all, so overflow or very low amplitudes may occur. Normalize if Greater than 1.0 scales the signal down to a maximum amplitude of 1.0 if it would otherwise be higher. Normalize Always normalizes in every case, so the maximum amplitude is always 1.0.
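
The three normalization options can be expressed as one function over all channels. This Python sketch assumes a single peak is measured across all channels, which preserves the balance between them. The mode names "none", "if_greater" and "always" are hypothetical stand-ins for No change, Normalize if Greater than 1.0, and Normalize Always.

```python
MODES = ("none", "if_greater", "always")


def normalize(channels: list[list[float]], mode: str) -> list[list[float]]:
    """Scale all channels by one common factor according to `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    peak = max((abs(x) for ch in channels for x in ch), default=0.0)
    if peak == 0.0 or mode == "none" or (mode == "if_greater" and peak <= 1.0):
        return [list(ch) for ch in channels]  # unchanged copy
    return [[x / peak for x in ch] for ch in channels]


print(normalize([[0.5, -2.0], [1.0]], "if_greater"))  # [[0.25, -1.0], [0.5]]
```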

Now that we have the data and the parameters, we can do the synthesis by selecting Synthesis -> Synthesize on the menu. A progress bar indicates how much has been done. At the end, one new tab per channel appears, labeled Synthesis c0, etc., showing the resulting waveform with time on the horizontal axis and amplitude on the vertical axis. The Text tab shows information about the synthesized sound, such as its maximum amplitude.

We can also save the resulting signal to a file by selecting File -> Save Synthesis on the menu. In the window that appears, we choose a directory and file name, the file type (WAV, AIFF, AU, headerless RAW, FLAC, OGG, ...), and the sample format (signed 16 bit, floating point, ...). Note that the OGG format ignores the sample format, and not all file types support all sample formats. Once the appropriate parameters are selected, we press Save. The file is saved (if possible), and the confirmation (or error) appears in the Text tab.

If a signal is in memory, we can play it. On the menu, we can select Play -> Play Original to play the original sound, or Play -> Play Synthesis to play the synthesized sound. A progress bar indicates how much has been played. We can stop playback by selecting Play -> Stop.

To set the playing parameters, we select Play -> Parameters... on the menu. A window appears with three options. No change tries to play the sound with all its channels. Convert to 1 channel mixes all channels into a single one. Convert to 2 channels always outputs 2 channels. These options are useful when the sound has a number of channels that the sound card does not support. These settings do not change the sound signal permanently; they only affect playback.
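
The two channel conversions could be implemented as in this Python sketch. The manual does not say how channels are mapped, so the mapping is an assumption. Mixing to one channel averages all channels. Converting to two channels duplicates a mono signal, keeps a stereo one, and otherwise averages even-numbered channels into the left output and odd-numbered channels into the right. The function names are hypothetical.

```python
def to_mono(channels: list[list[float]]) -> list[list[float]]:
    """Mix all channels into one by averaging them sample by sample."""
    n = len(channels)
    return [[sum(frame) / n for frame in zip(*channels)]]


def to_stereo(channels: list[list[float]]) -> list[list[float]]:
    """Produce exactly two channels (hypothetical mapping, see text)."""
    if len(channels) == 1:
        return [list(channels[0]), list(channels[0])]
    if len(channels) == 2:
        return [list(ch) for ch in channels]
    left = to_mono(channels[0::2])[0]   # channels 0, 2, 4, ...
    right = to_mono(channels[1::2])[0]  # channels 1, 3, 5, ...
    return [left, right]


print(to_mono([[1.0, 0.0], [0.0, 1.0]]))  # [[0.5, 0.5]]
```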

A command line application was also created, with the same functionality as the graphical one, except that it does not allow a custom equalization (it only allows choosing a predefined equalization setup).

The output obtained by running the application without arguments, or with --usage, is as follows:

Usage: specanasyn_cli [-12340?] [--raw-format=number] [--raw-channels=number]
        [--raw-rate=Hz] [-s|--window-size=samples] [-i|--window-incr=samples]
        [-t|--window-type=number] [-n|--do-not-discard-smaller]
        [-d|--discard-smaller=dB] [-j|--discard-lower=bins]
        [-v|--discard-upper=bins] [-c|--do-not-crop-maximums]
        [-k|--keep-maximums=number] [-o|--do-not-discard-transitions]
        [-a|--discard-transitions=bins] [-m|--minimum-length=frames]
        [--file-formats] [--window-types] [--analysis-default]
        [--output-type=number] [--output-format=number] [-e|--equalize=index]
        [-u|--output-rate=Hz] [-r|--time-stretch=factor]
        [-f|--frequency-scaling=semitones] [-q|--frequency-shifting=Hz]
        [-z|--do-not-silence-freq] [-p|--amplitude-scaling=times]
        [-l|--do-not-interpolate-freq] [-b|--do-not-interpolate-ampl]
        [-y|--delay-sinusoids] [-g|--do-not-decay] [-h|--decay-factor=factor]
        [-w|--do-compression] [-x|--do-expansion] [--attack=s] [--release=s]
        [--threshold=dB] [--ratio=ratio] [--stereo-link] [--file-types]
        [--equalize-predefined] [--synthesis-default] [-1|--output-analysis]
        [-2|--output-synthesis] [-3|--output-synthesis-from-analysis]
        [-4|--output-both] [-0|--examples] [-?|--help] [--usage]

The output obtained by running the application with the --help or -? argument is as follows:

Usage: specanasyn_cli [OPTIONS]* <files>
      --raw-format=number                  analysis: raw format (see
                                           --file-formats)
      --raw-channels=number                analysis: raw format: channels
      --raw-rate=Hz                        analysis: raw format: sampling rate
  -s, --window-size=samples                analysis: window size
  -i, --window-incr=samples                analysis: window incr
  -t, --window-type=number                 analysis: window type (see
                                           --window-types)
  -n, --do-not-discard-smaller             analysis: do not discard smaller
                                           maximums
  -d, --discard-smaller=dB                 analysis: discard maximums smaller
                                           than
  -j, --discard-lower=bins                 analysis: discard lower frequencies
  -v, --discard-upper=bins                 analysis: discard upper frequencies
  -c, --do-not-crop-maximums               analysis: do not crop maximums
                                           exceeding given number
  -k, --keep-maximums=number               analysis: keep only this number of
                                           maximums
  -o, --do-not-discard-transitions         analysis: do not discard bigger
                                           transitions
  -a, --discard-transitions=bins           analysis: discard transitions
                                           greater than given number of bins
  -m, --minimum-length=frames              analysis: minimum length of
                                           sinusiods (only 1 or 2 frames)
      --file-formats                       print table of raw formats
                                           available (sample types)
      --window-types                       analysis: print table of window
                                           types
      --analysis-default                   analysis: print table of default
                                           values
      --output-type=number                 synthesis: choose type of output
                                           file (see --file-types)
      --output-format=number               synthesis: choose format of output
                                           file (see --file-formats)
  -e, --equalize=index                     synthesis: predefined equalization
                                           (see --equalize-predefined)
  -u, --output-rate=Hz                     synthesis: change output sample rate
  -r, --time-stretch=factor                synthesis: time stretching
  -f, --frequency-scaling=semitones        synthesis: frequency scaling
  -q, --frequency-shifting=Hz              synthesis: frequency shifting
  -z, --do-not-silence-freq                synthesis: do not silence
                                           out-of-range frequencies
  -p, --amplitude-scaling=times            synthesis: amplitude scaling
  -l, --do-not-interpolate-freq            synthesis: do not interpolate
                                           frequency between frames
  -b, --do-not-interpolate-ampl            synthesis: do not interpolate
                                           amplitude between frames
  -y, --delay-sinusoids                    synthesis: delay sinusoid until
                                           crosses over 0.0
  -g, --do-not-decay                       synthesis: do not decay from last
                                           value to 0.0
  -h, --decay-factor=factor                synthesis: decay from last value to
                                           0.0 with given factor [0..1]
  -w, --do-compression                     synthesis: do audio level
                                           compression
  -x, --do-expansion                       synthesis: do audio level expansion
      --attack=s                           synthesis: compressor/expansor
                                           attack time
      --release=s                          synthesis: compressor/expansor
                                           release time
      --threshold=dB                       synthesis: compressor/expansor
                                           amplitude threshold
      --ratio=ratio                        synthesis: compressor/expansor
                                           ratio [0..1]
      --stereo-link                        synthesis: compressor/expansor:
                                           apply stereo link
      --file-types                         print table of file types available
                                           (WAV, AU, ...)
      --equalize-predefined                synthesis: print table of
                                           predefined equalization types
      --synthesis-default                  synthesis: print table of default
                                           values
  -1, --output-analysis                    output analysis: <file.wave>
                                           <file.txt>
  -2, --output-synthesis                   output synthesis: <file.wave>
                                           <file_synthesized.wave>
  -3, --output-synthesis-from-analysis     output synthesis: <file.txt>
                                           <file_synthesized.wave>
  -4, --output-both                        output analysis and syntehsis:
                                           <file.wave> <file.txt>
                                           <file_synthesized.wave>
  -0, --examples                           print some examples of command line
                                           arguments

Help options:
  -?, --help                               Show this help message
      --usage                              Display brief usage message

The output obtained by running the application with the -0 argument is as follows:

$ ./specanasyn_cli -1 sound.wav sound2.txt
	analyzes sound.wav and puts analysis results on sound2.txt (with default analysis options)
$ ./specanasyn_cli -2 sound.wav sound2.wav
	analyzes sound.wav, synthesizes it, and puts synthesis results on sound2.wav (with default analysis and synthesis options)
$ ./specanasyn_cli -3 sound2.txt sound2.wav
	synthesizes sound2.txt and puts synthesis results on sound2.wav (with default synthesis options)
$ ./specanasyn_cli -4 sound.wav sound2.txt sound2.wav
	analyzes and synthesizes sound.wav, put analysis results on sound2.txt and synthesis results on sound2.wav (with default analysis and synthesis options)
$ ./specanasyn_cli -s 1024 -i 256 -1 file.wav file2.txt
	analyzes with window-size=1024 and window-advance=256 (75% overlapping)
$ ./specanasyn_cli -r 2.0 -u 44100 -3 file2.txt file2.wav
	synthesizes with time-stretch=2.0 (2 times larger) and changes output sample rate to 44100
Chosen files:

The output obtained by running the application with the --file-types argument is as follows:

0: WAV - Microsoft WAV (little endian)
1: AIFF - Apple/SGI AIFF (big endian)
2: AU - Sun/NeXT AU (big endian)
3: RAW - RAW PCM data
4: PAF - Ensoniq PARIS
5: SVX - Amiga IFF / SVX8 / SV16
6: NIST - Sphere NIST
7: VOC - VOC
8: IRCAM - Berkeley/IRCAM/CARL
9: W64 - Sonic Foundry's 64 bit RIFF/WAV
10: MAT4 - Matlab (tm) V4.2 / GNU Octave 2.0
11: MAT5 - Matlab (tm) V5.0 / GNU Octave 2.1
12: PVF - Portable Voice Format
13: XI - Fasttracker 2 Extended Instrument
14: HTK - HMM Tool Kit format
15: SDS - Midi Sample Dump Standard
16: AVR - Audio Visual Research
17: WAVEX - MS WAVE with WAVEFORMATEX
18: SD2 - Sound Designer 2
19: FLAC - Free Lossless Audio Codec
20: CAF - Core Audio File
21: WVE - Psion WVE
22: OGG - OGG VORBIS
23: MPC2K - Akai MPC 2000 sampler
24: RF64 - RF64 WAV
Chosen files:

The output obtained by running the application with the --file-formats argument is as follows:

0: PCM_S8 - Signed 8 bit
1: PCM_16 - Signed 16 bit
2: PCM_24 - Signed 24 bit
3: PCM_32 - Signed 32 bit
4: PCM_U8 - Unsigned 8 bit
5: FLOAT - 32 bit floating point
6: DOUBLE - 64 bit floating point
Chosen files:

The output obtained by running the application with the --window-types argument is as follows:

0: rectangular
1: Hamming
2: Hanning
3: triangular
4: Blackman
Chosen files:

SpecAnaSyn - Spectral Analysis and Synthesis

Graphical application manual

Introduction

Parameter windows

Analysis: load audio file

Analysis: select analysis parameters

Analysis: STFT parameters

Analysis: sinusoidal analysis parameters

Analysis: perform the analysis

Analysis: save the analysis data

Synthesis: obtain analysis data

Synthesis: select synthesis parameters

Synthesis: equalization parameters

Synthesis: other synthesis parameters

Synthesis: compressor/expander parameters

Synthesis: amplitude normalization parameters

Synthesis: perform the synthesis

Synthesis: save the synthesis data

Playing: play the signals

Playing: playing parameters

Command line application manual

Introduction

Integrated help

Usage

Help

Examples

File types

File formats

Window types