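Energy. Energy is an engineering quantity measured from the signal, not a perceptual one. Common variants are RMS energy (the average level), band energy (computed separately per frequency band), and mel energy (band energy on the mel scale).

Rhythm extraction. Spectral flux, the amount of change in the spectrum from one frame to the next, is a common way to detect onsets. Onsets are the low-level event candidates that rhythm is built from, and several onsets can occur at the same time. The beat is the metrical grid that a human listener perceives.

Timbre: the difference in tone color caused by the relative strengths of the harmonics plus the spectral envelope.

Mel: a rescaling of the Hz axis that approximates the perceptual scale of human hearing.

The log spectrum makes the fine harmonic pattern and the smooth spectral envelope easier to separate. The two combine multiplicatively in the spectrum, and taking the log turns that product into a sum.

![[Residual after subtracting smooth envelope fine harmonic pattern becomes isolated.png]]

As its last step, MFCC applies a DCT along the frequency axis, which is a Fourier-like decomposition of the log mel spectrum. The lower the order of a DCT coefficient, the coarser the structure it describes; the higher the order, the closer it gets to harmonic detail. This is why MFCC can extract timbre.

What is a chord? A set of notes stacked vertically at a single instant.

> Is pressing a few random notes also a chord? In the broad sense yes, but common chords usually have some structure.

Common chords. In `array`, 0 is the root and a step of 1 is one semitone. Intervals are folded into one octave, so a 9th appears as 2.

```json
[
  {"name": "major", "array": [0, 4, 7], "commonness": 5},
  {"name": "minor", "array": [0, 3, 7], "commonness": 5},
  {"name": "sus4", "array": [0, 5, 7], "commonness": 5},
  {"name": "sus2", "array": [0, 2, 7], "commonness": 4},
  {"name": "dominant7", "array": [0, 4, 7, 10], "commonness": 5},
  {"name": "major7", "array": [0, 4, 7, 11], "commonness": 5},
  {"name": "minor7", "array": [0, 3, 7, 10], "commonness": 5},
  {"name": "add9", "array": [0, 2, 4, 7], "commonness": 5},
  {"name": "minor_add9", "array": [0, 2, 3, 7], "commonness": 4},
  {"name": "power5", "array": [0, 7], "commonness": 5},
  {"name": "major6", "array": [0, 4, 7, 9], "commonness": 4},
  {"name": "minor6", "array": [0, 3, 7, 9], "commonness": 3},
  {"name": "half_diminished", "array": [0, 3, 6, 10], "commonness": 4},
  {"name": "diminished7", "array": [0, 3, 6, 9], "commonness": 3},
  {"name": "dominant9", "array": [0, 2, 4, 7, 10], "commonness": 5},
  {"name": "major9", "array": [0, 2, 4, 7, 11], "commonness": 5},
  {"name": "minor9", "array": [0, 2, 3, 7, 10], "commonness": 5},
  {"name": "nine_sus4", "array": [0, 2, 5, 7, 10], "commonness": 5},
  {"name": "six_nine", "array": [0, 2, 4, 7, 9], "commonness": 4},
  {"name": "major7_sharp11", "array": [0, 4, 6, 7, 11], "commonness": 4}
]
```

What is melody? A melody is a single-note line that can be hummed.

What is harmony? Harmony is the relationship between chords and the way they progress, for example C major -> G major -> Am -> F.

Chroma folds all frequencies onto the 12 pitch classes, discarding the octave, which makes it possible to recognize repeated harmonic structures.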
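
To make the chroma folding concrete, here is a minimal sketch in plain Python. It assumes equal temperament with A4 = 440 Hz and that spectral peaks have already been extracted as (frequency, magnitude) pairs. The function names (`freq_to_pitch_class`, `fold_to_chroma`, `match_chord`), the 0.5 threshold, and the exact-set matching rule are illustrative assumptions, not a standard algorithm. Only four interval sets from the chord table above are used.

```python
import math

PITCH_CLASS_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# A few interval sets copied from the chord table (semitones above the root).
CHORD_TEMPLATES = {
    "major": [0, 4, 7],
    "minor": [0, 3, 7],
    "sus4": [0, 5, 7],
    "dominant7": [0, 4, 7, 10],
}


def freq_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a positive frequency to a pitch class 0..11 (0 = C).

    Assumes equal temperament; 69 is the MIDI note number of A4.
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def fold_to_chroma(peaks):
    """Fold (frequency_hz, magnitude) pairs into a 12-bin chroma vector."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[freq_to_pitch_class(freq_hz)] += magnitude
    return chroma


def match_chord(chroma, threshold=0.5):
    """Return (root_name, chord_name) if the active bins equal a template, else None.

    A bin counts as active when it reaches `threshold` times the largest bin
    (a hypothetical rule chosen for this sketch).
    """
    peak = max(chroma)
    if peak <= 0:
        return None
    active = {pc for pc, value in enumerate(chroma) if value >= threshold * peak}
    for root in range(12):
        for name, intervals in CHORD_TEMPLATES.items():
            if {(root + i) % 12 for i in intervals} == active:
                return PITCH_CLASS_NAMES[root], name
    return None


# C4, E4, G4 plus C5: the two C notes land in the same chroma bin.
c_major_peaks = [(261.63, 1.0), (329.63, 1.0), (392.00, 1.0), (523.25, 0.5)]
print(match_chord(fold_to_chroma(c_major_peaks)))
```

Exact set matching is ambiguous once more chord types are added: major6 on C and minor7 on A share the pitch classes {C, E, G, A}, so a fuller version would need the bass note or some scoring rule to choose between them.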