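Energy. Energy is an engineering quantity with several common variants: RMS energy (averaged over a frame), band energy (computed separately for each frequency band), and mel energy (computed per mel band).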
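To make the first two variants concrete, here is a minimal, dependency-free sketch. It assumes a mono signal stored as a plain list of floats, and the helper names (`frame_signal`, `rms_energy`, `band_energy`) are hypothetical. Band energy is taken as the sum of squared DFT magnitudes over the bins that fall inside a frequency range; a direct DFT is used for clarity rather than speed.

```python
import math

def frame_signal(x, frame_len, hop):
    """Hypothetical helper: split a 1-D signal into frames, hop samples apart (tail dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def band_energy(frame, sr, lo_hz, hi_hz):
    """Sum of |X[k]|^2 over DFT bins whose centre frequency lies in [lo_hz, hi_hz)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        if lo_hz <= k * sr / n < hi_hz:
            re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            total += re * re + im * im
    return total

# Usage: a 500 Hz sine at 8 kHz, cut into 80-sample frames (exactly 5 periods each).
sr = 8000
signal = [math.sin(2 * math.pi * 500 * t / sr) for t in range(800)]
frames = frame_signal(signal, frame_len=80, hop=80)
rms_per_frame = [rms_energy(f) for f in frames]       # each close to 1/sqrt(2)
low_band = band_energy(frames[0], sr, 400, 600)       # contains the 500 Hz bin
high_band = band_energy(frames[0], sr, 1000, 2000)    # essentially no energy here
```

Mel energy follows the same idea as band energy, except that the bands are spaced on the mel scale and usually weighted by triangular filters.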
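Rhythm extraction. Spectral flux (the frame-to-frame change in the spectrum) is a common way to detect onsets. Onsets are the low-level candidate events that rhythm is built from, and several onsets can occur at the same time. The beat, by contrast, is the metrical grid that humans perceive.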
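A minimal sketch of spectral-flux onset detection, assuming the signal is already split into equal-length frames. It uses the half-wave rectified variant (only magnitude *increases* count) and a fixed threshold for peak picking; both are common simplifications, and real systems usually use an adaptive threshold. The function names are hypothetical.

```python
import math

def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 via a direct DFT (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

def spectral_flux(frames):
    """Half-wave rectified flux: summed magnitude increases between consecutive frames."""
    specs = [magnitude_spectrum(f) for f in frames]
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(specs, specs[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux

def pick_onsets(flux, threshold):
    """Onset candidates: interior local maxima of the flux above a fixed threshold."""
    return [i for i in range(1, len(flux) - 1)
            if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]

# Usage: three silent frames followed by three frames of a steady tone.
n = 32
silence = [0.0] * n
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
flux = spectral_flux([silence] * 3 + [tone] * 3)
onsets = pick_onsets(flux, threshold=1.0)  # the tone starts at frame 3
```

Note that the steady tone only produces flux at its start: once the spectrum stops changing, the flux drops back to zero, which is why flux marks onsets rather than loudness.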
Timbre: the difference in tone color produced by how strongly each harmonic sounds, together with the spectral envelope.
Mel: a rescaling of the Hz axis that approximates the perceptual pitch scale of human hearing.
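The rescaling can be written as a single formula. This sketch assumes the HTK-style definition $m = 2595 \log_{10}(1 + f/700)$; other mel formulas exist (some libraries default to a different one), so treat the exact constants as one common choice. `mel_band_edges` is a hypothetical helper showing that bands equally spaced in mel become wider in Hz as frequency rises.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Hypothetical helper: n_bands + 2 frequencies (Hz) equally spaced on the mel axis.

    Consecutive triples (left, centre, right) are the corners of the usual
    triangular mel filters.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 2)]

# Usage: 10 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]  # strictly increasing in Hz
```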
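The log spectrum makes the fine harmonic pattern and the smooth spectral envelope easier to separate: in the linear spectrum the two combine multiplicatively, and taking the log turns that product into a sum.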
![[Residual after subtracting smooth envelope fine harmonic pattern becomes isolated.png]]
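The idea in the figure can be reproduced on a synthetic spectrum. If $|X(f)| = |H(f)| \cdot |E(f)|$ (envelope times harmonic pattern), then $\log|X(f)| = \log|H(f)| + \log|E(f)|$, so a smoothing filter can estimate the envelope term and subtraction leaves the harmonic term. The envelope, the comb with a 10-bin spacing, and the moving-average smoother are all assumptions chosen for illustration, not a standard method.

```python
import math

def moving_average(xs, width):
    """Average over `width` consecutive samples (window shortened at the right edge)."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - width // 2)
        window = xs[lo:lo + width]
        out.append(sum(window) / len(window))
    return out

n_bins = 200
# Hypothetical spectrum: smooth decaying envelope times a harmonic comb
# whose peaks repeat every 10 bins (a fixed f0 spacing).
envelope = [math.exp(-k / 80) for k in range(n_bins)]
comb = [1.5 + math.cos(2 * math.pi * k / 10) for k in range(n_bins)]
spectrum = [e * c for e, c in zip(envelope, comb)]      # multiplicative structure

log_spec = [math.log(s) for s in spectrum]              # log turns the product into a sum
env_est = moving_average(log_spec, 10)                  # one-period average ~ log envelope
residual = [x - e for x, e in zip(log_spec, env_est)]   # remainder ~ log harmonic pattern
```

Because the smoothing window spans exactly one harmonic period, away from the edges the residual is purely periodic (the envelope trend cancels), which is the "fine harmonic pattern becomes isolated" effect.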
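MFCC ends by applying a DCT (a real-valued, Fourier-like cosine decomposition) along the frequency axis of the log-mel spectrum. Lower-order DCT coefficients describe the coarse, large-scale shape of the spectrum (the envelope), while higher-order coefficients get closer to fine harmonic detail. Keeping only the low-order coefficients is therefore what lets MFCC capture timbre.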
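A sketch of the final MFCC step, assuming the log-mel values are already computed (here they are synthetic: a slow curve plus a fast ripple across 40 bands). It uses the orthonormal DCT-II; keeping 13 coefficients is a common convention rather than a requirement, and the helper names are hypothetical.

```python
import math

def dct2(x):
    """Orthonormal DCT-II along the (mel) frequency axis."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct2(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [
        c[0] * math.sqrt(1 / n)
        + sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
              for k in range(1, n))
        for i in range(n)
    ]

def mfcc_from_log_mel(log_mel, n_coeffs=13):
    """Hypothetical helper: keep the first n_coeffs DCT coefficients."""
    return dct2(log_mel)[:n_coeffs]

def low_order_reconstruction(log_mel, n_keep):
    """Zero every coefficient from n_keep upward and invert: only the coarse shape survives."""
    c = dct2(log_mel)
    return idct2(c[:n_keep] + [0.0] * (len(c) - n_keep))

n_bands = 40

def basis(k):
    """The k-th DCT-II cosine across the bands (unnormalised)."""
    return [math.cos(math.pi * k * (2 * i + 1) / (2 * n_bands)) for i in range(n_bands)]

# Synthetic log-mel frame: coarse shape (k = 1) plus fine ripple (k = 30).
slow, fast = basis(1), basis(30)
log_mel = [3.0 * s + 0.5 * f for s, f in zip(slow, fast)]
coeffs = dct2(log_mel)                         # non-zero only at k = 1 and k = 30
smooth = low_order_reconstruction(log_mel, 8)  # recovers 3.0 * basis(1)
mfcc = mfcc_from_log_mel(log_mel)
```

Truncation acts as a low-pass filter over the frequency axis: the ripple lives in a high-order coefficient and is discarded, while the coarse shape survives.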
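What is a chord? A set of notes sounding together at a single instant (a vertical slice of the music).
> Is pressing a few random keys also a chord? Broadly speaking, yes, but common chords usually have a recognizable interval structure relative to a root, like the ones listed below.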
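``` jsonc
// Common chords. Each "array" lists semitone offsets from the root (1 = one semitone),
// folded into one octave, so a 9th appears as 2 and a #11 as 6. Higher "commonness" = more common.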
[
{"name": "major", "array": [0, 4, 7], "commonness": 5},
{"name": "minor", "array": [0, 3, 7], "commonness": 5},
{"name": "sus4", "array": [0, 5, 7], "commonness": 5},
{"name": "sus2", "array": [0, 2, 7], "commonness": 4},
{"name": "dominant7", "array": [0, 4, 7, 10], "commonness": 5},
{"name": "major7", "array": [0, 4, 7, 11], "commonness": 5},
{"name": "minor7", "array": [0, 3, 7, 10], "commonness": 5},
{"name": "add9", "array": [0, 2, 4, 7], "commonness": 5},
{"name": "minor_add9", "array": [0, 2, 3, 7], "commonness": 4},
{"name": "power5", "array": [0, 7], "commonness": 5},
{"name": "major6", "array": [0, 4, 7, 9], "commonness": 4},
{"name": "minor6", "array": [0, 3, 7, 9], "commonness": 3},
{"name": "half_diminished", "array": [0, 3, 6, 10], "commonness": 4},
{"name": "diminished7", "array": [0, 3, 6, 9], "commonness": 3},
{"name": "dominant9", "array": [0, 2, 4, 7, 10], "commonness": 5},
{"name": "major9", "array": [0, 2, 4, 7, 11], "commonness": 5},
{"name": "minor9", "array": [0, 2, 3, 7, 10], "commonness": 5},
{"name": "nine_sus4", "array": [0, 2, 5, 7, 10], "commonness": 5},
{"name": "six_nine", "array": [0, 2, 4, 7, 9], "commonness": 4},
{"name": "major7_sharp11", "array": [0, 4, 6, 7, 11], "commonness": 4}
]
```
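Templates like these can be used to name a chord from its notes. This sketch tries every root and every template and reports all exact pitch-class matches; it copies only a subset of the table above, and `name_chord` and the sharp-only note spelling are assumptions for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Subset of the templates above (semitone offsets from the root).
CHORD_TEMPLATES = {
    "major": [0, 4, 7],
    "minor": [0, 3, 7],
    "sus4": [0, 5, 7],
    "sus2": [0, 2, 7],
    "dominant7": [0, 4, 7, 10],
    "major7": [0, 4, 7, 11],
    "minor7": [0, 3, 7, 10],
    "major6": [0, 4, 7, 9],
}

def name_chord(notes):
    """Hypothetical helper: all (root, name) whose transposed template equals the notes' pitch classes."""
    pcs = {n % 12 for n in notes}  # MIDI numbers or pitch classes both work
    matches = []
    for root in range(12):
        for name, template in CHORD_TEMPLATES.items():
            if {(root + i) % 12 for i in template} == pcs:
                matches.append((NOTE_NAMES[root], name))
    return matches

print(name_chord([60, 64, 67]))   # [('C', 'major')]
print(name_chord([0, 4, 7, 9]))   # [('C', 'major6'), ('A', 'minor7')]
```

The second call shows that one pitch-class set can carry several names; context (bass note, surrounding chords) decides which reading is intended.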
What is melody? A monophonic line of single notes, something you can hum.
What is harmony? Harmony refers to the relationships between chords and how they progress, e.g. C major -> G major -> A minor -> F major.
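One standard way to express "relationships between chords" independently of the actual pitches is Roman-numeral analysis relative to a key. This sketch assumes a major key and only handles diatonic roots (anything else is marked `?`); `to_roman` and the chord tuples are hypothetical representations.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of scale degrees I..VII
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]
NOTE_TO_PC = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
              "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11}

def to_roman(progression, key):
    """Hypothetical helper: label (root, quality) chords by scale degree in a major key."""
    out = []
    for root, quality in progression:
        offset = (NOTE_TO_PC[root] - NOTE_TO_PC[key]) % 12
        if offset not in MAJOR_SCALE:
            out.append("?")  # non-diatonic root; a fuller analysis would name it (e.g. bVII)
            continue
        numeral = NUMERALS[MAJOR_SCALE.index(offset)]
        out.append(numeral.lower() if quality == "minor" else numeral)  # lowercase = minor
    return out

in_c = to_roman([("C", "major"), ("G", "major"), ("A", "minor"), ("F", "major")], "C")
in_g = to_roman([("G", "major"), ("D", "major"), ("E", "minor"), ("C", "major")], "G")
print(in_c, in_g)  # ['I', 'V', 'vi', 'IV'] ['I', 'V', 'vi', 'IV']
```

The same progression transposed to G gets identical labels, which is the sense in which harmony is about relationships rather than absolute pitches.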
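Chroma folds all frequencies into the 12 pitch classes, discarding which octave a note is in, so repeated harmonic structures can be recognized even when they are voiced differently.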
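A simplified chroma sketch, assuming spectral peaks have already been extracted as (frequency, magnitude) pairs and that tuning is A4 = 440 Hz in equal temperament. Real chroma features are usually computed from every STFT bin rather than a few peaks, and the function names here are hypothetical.

```python
import math

def freq_to_pitch_class(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered pitch class, with C = 0 and A = 9."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    return round(midi) % 12

def chroma_vector(peaks):
    """Fold (frequency, magnitude) peaks into 12 bins, normalised so the largest bin is 1."""
    chroma = [0.0] * 12
    for freq, mag in peaks:
        chroma[freq_to_pitch_class(freq)] += mag
    top = max(chroma)
    return [c / top for c in chroma] if top > 0 else chroma

def cosine(a, b):
    """Cosine similarity between two chroma vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# C major in close position (C4 E4 G4) vs. spread over three octaves (C3 G3 E5).
close_voicing = chroma_vector([(261.63, 1.0), (329.63, 1.0), (392.00, 1.0)])
open_voicing = chroma_vector([(130.81, 1.0), (196.00, 1.0), (659.26, 1.0)])
a_minor = chroma_vector([(220.00, 1.0), (261.63, 1.0), (329.63, 1.0)])
```

Both C major voicings fold to the same chroma (similarity 1), while A minor, which shares two of the three pitch classes, scores 2/3; comparing chroma frames this way is what lets a repeated progression be spotted regardless of voicing.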