TY - GEN
T1 - Mapping Timbre Space in Regional Music Collections using Harmonic-Percussive Source Separation (HPSS) Decomposition
AU - Guedes, Carlos
AU - Ganguli, Kaustuv
AU - Plachouras, Christos
AU - Senturk, Sertan
AU - Eisenberg, Andrew Jarad
PY - 2020/9/4
Y1 - 2020/9/4
AB - Timbre, the tonal qualities that define a particular sound or source, can refer to an instrument class (violin, piano) or a quality (bright, rough); it is often defined comparatively as the attribute that allows us to differentiate sounds of the same pitch, loudness, duration, and spatial location (Grey, 1975). Characterizing musical timbre is essential for tasks such as automatic database indexing, similarity measurement, and automatic sound recognition (Fourer et al., 2014). Peeters et al. (2011) proposed a large set of audio feature descriptors for quantifying timbre, which can be grouped into four broad classes: temporal, harmonic, spectral, and perceptual. The paradigms of auditory modeling (Cosi et al., 1994) and acoustic scene analysis (Abeßer et al., 2017; Huzaifah, 2017) have also made extensive use of timbral features for classification tasks. Timbre spaces, in their typical connotation (Bello, 2010), are built by empirically measuring the perceived (dis)similarity between sounds and projecting it onto a low-dimensional space whose dimensions are assigned a semantic interpretation (brightness, temporal variation, synchronicity, etc.). We recreate timbre spaces in the acoustic domain by extracting low-level features with similar interpretations (centroid, spectral flux, attack time, etc.) using audio analysis and machine learning.
Based on our previous work (Trochidis et al., 2019), in this paper we decompose the traditional mel-frequency cepstral coefficient (MFCC) features into harmonic and percussive components and introduce temporal context (De Leon & Martinez, 2012) into the analysis of the timbre spaces. We discuss the advantages of the resulting stationary and transient components over the original MFCC features in terms of clustering and visualization. The rest of the paper presents the proposed methodology, the experimental results, and, finally, the insights obtained.
M3 - Conference contribution
BT - Proceedings of the 2nd International Conference on Timbre
ER -