An open-source Python library for audio analysis and feature extraction - V2EX

This topic was created 660 days ago; the information in it may have since evolved or changed.

https://github.com/libAudioFlux/audioFlux

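A library for audio and music analysis and feature extraction. It supports dozens of time-frequency analysis transforms and hundreds of corresponding time-domain and frequency-domain feature combinations. The features can be fed to deep learning networks for training and used to study audio tasks such as classification, separation, music information retrieval (MIR), and ASR.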

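Systematic, multi-dimensional feature extraction and combination that can be applied flexibly to research and analysis across many tasks.
High performance: most of the core is implemented in C and uses each platform's FFT hardware acceleration, which makes large-scale feature extraction practical.
Mobile support: it can handle real-time computation on audio streams on mobile devices.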

Quick start

pip install audioflux

import numpy as np
import audioflux as af

import matplotlib.pyplot as plt
from audioflux.display import fill_spec

# Get the path of the bundled 220 Hz sample audio file
sample_path = af.utils.sample_path('220')

# Read audio data and sample rate
audio_arr, sr = af.read(sample_path)

# Extract mel spectrogram
spec_arr, mel_fre_band_arr = af.mel_spectrogram(audio_arr, num=128, radix2_exp=12, samplate=sr)
spec_arr = np.abs(spec_arr)

# Extract mfcc
mfcc_arr, _ = af.mfcc(audio_arr, cc_num=13, mel_num=128, radix2_exp=12, samplate=sr)

# Display
audio_len = audio_arr.shape[0]
# Compute the x/y coordinates used for plotting
x_coords = np.linspace(0, audio_len / sr, spec_arr.shape[1] + 1)
y_coords = np.insert(mel_fre_band_arr, 0, 0)
fig, ax = plt.subplots()
img = fill_spec(spec_arr, axes=ax,
                x_coords=x_coords, y_coords=y_coords,
                x_axis='time', y_axis='log',
                title='Mel Spectrogram')
fig.colorbar(img, ax=ax)

fig, ax = plt.subplots()
img = fill_spec(mfcc_arr, axes=ax,
                x_coords=x_coords, x_axis='time',
                title='MFCC')
fig.colorbar(img, ax=ax)

plt.show()

If you're interested, please give it a Star

https://github.com/libAudioFlux/audioFlux

More examples

https://github.com/libAudioFlux/audioFlux#other-examples

8 replies • 2023-04-15 20:13:04 +08:00

1

CMLab

2023-03-09 15:41:29 +08:00

I copied the code and ran it in IPython; it does produce the plots.

2

smallsung

2023-03-09 16:47:55 +08:00

Looks like it's cross-platform. Thumbs up.

3

tigerstudent

2023-03-09 17:13:28 +08:00

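Before running ASR, how do you filter out ambient noise, such as people nearby talking quietly, announcements over a loudspeaker, or crowd chatter?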

4

CMLab

2023-03-09 17:19:03 +08:00

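@tigerstudent It depends on the specific audio source and use case: whether the noise is low-frequency or high-frequency, or has a distinctive frequency-domain distribution. Only then can you treat it in a targeted way. For ASR, the simplest approach is to run the audio through a high-pass filter.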
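To make the high-pass suggestion concrete, here is a minimal sketch in plain NumPy. The reply does not name a filter design or a cutoff, so the first-order (RC-style) filter, the 100 Hz cutoff, and the function name `highpass_first_order` are all assumptions chosen for illustration. Note that a high-pass filter only attenuates low-frequency noise such as hum or rumble; nearby voices occupy the same band as the target speech and will pass through.

```python
import numpy as np

def highpass_first_order(x, sr, cutoff_hz=100.0):
    """First-order (RC-style) high-pass filter.

    cutoff_hz=100.0 is an illustrative assumption, not a recommended value.
    """
    x = np.asarray(x, dtype=np.float64)
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)  # time constant for the chosen cutoff
    dt = 1.0 / sr                         # sampling interval
    alpha = rc / (rc + dt)
    y = np.zeros_like(x)
    if x.size == 0:
        return y
    y[0] = x[0]
    for n in range(1, x.size):
        # Standard discrete RC high-pass recurrence
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

# Synthetic check: a 50 Hz hum should be attenuated, a 1 kHz tone mostly kept.
sr = 16000
t = np.arange(sr) / sr
hum = np.sin(2 * np.pi * 50 * t)     # stands in for low-frequency noise
tone = np.sin(2 * np.pi * 1000 * t)  # stands in for speech-band content

hum_out = highpass_first_order(hum, sr)
tone_out = highpass_first_order(tone, sr)

print("hum RMS ratio: ", np.sqrt(np.mean(hum_out ** 2) / np.mean(hum ** 2)))
print("tone RMS ratio:", np.sqrt(np.mean(tone_out ** 2) / np.mean(tone ** 2)))
```

In practice `x` would be the `audio_arr` returned by `af.read`, and a steeper filter may be preferable; the first-order design is used here only because it fits in a few lines.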

5

CMLab

2023-03-09 17:24:08 +08:00

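@tigerstudent If the use case demands better results, you can go the deep learning route: label noise data from your own domain, then train a model on spectral masks.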
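As a rough illustration of what a spectral mask is, the sketch below builds a training target from a clean signal and a noise signal. The reply does not say which mask to use, so the ideal ratio mask, the STFT settings (`n_fft=512`, `hop=128`), and the helper names `stft_mag` and `ideal_ratio_mask` are assumptions; no network is trained here, only the target a network would learn to predict from the noisy spectrogram.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Per-bin ratio of clean magnitude to clean + noise magnitude, in [0, 1]."""
    return clean_mag / (clean_mag + noise_mag + eps)

# Synthetic stand-ins: a 440 Hz tone as "speech", white noise as "background".
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal(sr)
noisy = clean + noise

clean_mag = stft_mag(clean)
noise_mag = stft_mag(noise)
noisy_mag = stft_mag(noisy)

# Training target: a model would take noisy_mag as input and predict this mask.
mask = ideal_ratio_mask(clean_mag, noise_mag)

# At inference, a predicted mask is multiplied with the noisy spectrogram.
enhanced_mag = mask * noisy_mag

print(mask.shape, float(mask.min()), float(mask.max()))
```

The mask is close to 1 in bins dominated by the clean signal and close to 0 where noise dominates, which is why labeled pairs of clean and noisy audio from the target domain are needed to train such a model.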

6

kevinlq

2023-03-09 22:01:18 +08:00

Starred. I happen to be interested in this area, so I'll dig into it.

7

D2h0VL89HMAU417B

2023-03-10 08:49:14 +08:00

Here to learn.

8

jememouse

2023-04-15 20:13:04 +08:00 via iPad

Nice, I've been looking for material on this lately.
