Feat/hzjh-AudioOps#488
Open
starlight6336 wants to merge 2 commits into
added 2 commits
May 18, 2026 17:10
Audio operator tests
Operators covered:
- audio_anomaly_filter
- audio_asr_transcribe
- audio_dc_offset_removal
- audio_fast_lang_id
- audio_fast_lang_id_text
- audio_format_convert
- audio_gtcrn_denoise
- audio_hum_notch
- audio_noise_gate
- audio_text_summarize
General preparation
Test materials can be selected from the following directories:
- humanSpeech\zh\aishell_0000.wav
- humanSpeech\zh\aishell_0001.wav
- humanSpeech\zh\aishell_0002.wav
- humanSpeech\en\librispeech_0000.wav
- humanSpeech\en\librispeech_0001.wav
- humanSpeech\en\librispeech_0002.wav
- audio\summary\84-121123-0000.flac
- audio\summary\84-121123-0001.flac
- audio\summary\BAC009S0002W0122.wav
- audio\summary\BAC009S0002W0123.wav
Check whether ext_params is consistent with the operator's function.
1. audio_anomaly_filter
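Purpose: detect whether the audio is anomalous and write the result to ext_params.audio_quality. The operator's output must still contain the audio; it must not output only a label.
Recommended materials:
Test steps:
- Select audio_anomaly_filter.
- Set the parameters:
  - minDur = 1.0
  - maxDur = 20000.0
  - silenceRatioTh = 0.8
  - skipInvalidDownstream = true
- ext_params.audio_quality.quality_flag should be ok or invalid.
- ext_params.audio_quality.duration, silence_ratio, and global_rms should be present.
- Set minDur to a value clearly longer than the test audio, e.g. 9999.
- quality_flag should then be invalid.
- The output should carry the __quality_invalid_... marker.
- Whether the item is skipped downstream is determined by skipInvalidDownstream.
Pass criteria: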
- The audio_quality information is complete.
2. audio_asr_transcribe
Purpose: take audio as input, call the ASR model, and output the transcript.
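With language = auto, audio_asr_transcribe is expected to read the upstream LID result or a filename marker. A minimal sketch of that lookup, assuming the LID result sits at ext_params.audio_lid.lang (where audio_fast_lang_id writes it) and that the marker appears as __lid_zh or __lid_en somewhere in the filename. The helper name resolve_language and the precedence (LID result before filename marker) are hypothetical, not the operator's actual code.

```python
import re
from typing import Optional

# Hypothetical marker pattern: the guide only says filenames may carry
# __lid_zh or __lid_en; where the marker sits in the name is assumed.
_LID_MARKER = re.compile(r"__lid_(zh|en)")


def resolve_language(language: str, ext_params: dict, filename: str) -> Optional[str]:
    """Pick the ASR language; returns 'zh', 'en', or None (hypothetical helper)."""
    if language != "auto":
        return language  # an explicit zh / en setting is used as-is
    # 1) Upstream LID result, as written by audio_fast_lang_id.
    lid = (ext_params.get("audio_lid") or {}).get("lang")
    if lid in ("zh", "en"):
        return lid
    # 2) Fall back to a __lid_zh / __lid_en marker in the filename.
    match = _LID_MARKER.search(filename)
    return match.group(1) if match else None
```

For example, resolve_language("auto", {}, "aishell_0000__lid_zh.wav") returns "zh", while a None result would mean auto mode found no hint at all.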
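Recommended materials:
Prerequisites:
- The model files train.yaml, final.pt, and units.txt are available.
Test steps:
- Set language = zh.
- Set device according to the environment, preferring a device that is actually available; use cpu if there is no NPU.
- Run audio_asr_transcribe.
- ext_params.audio_asr_transcribe.language should be zh.
- transcript_source should be present, usually with the value asr.
- Set language = en.
- Use librispeech_*.wav as input.
- Run audio_fast_lang_id first.
- Use its output as the audio_asr_transcribe input.
- Set the audio_asr_transcribe parameter language = auto.
Pass criteria: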
- With language = auto, the operator can read the upstream LID result or the filename marker.
3. audio_dc_offset_removal
Purpose: remove the DC offset from the audio and output the processed WAV audio.
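DC offset removal often amounts to subtracting each channel's mean. A minimal NumPy sketch, assuming the audio has already been loaded as a float array (for example with soundfile.read); the name remove_dc_offset is hypothetical, and the operator may use a different method, such as a high-pass filter.

```python
import numpy as np


def remove_dc_offset(samples: np.ndarray) -> np.ndarray:
    """Subtract each channel's mean (hypothetical helper).

    Accepts mono audio shaped (frames,) or multi-channel audio shaped
    (frames, channels), as returned by soundfile.read.
    """
    x = np.asarray(samples, dtype=np.float64)
    # keepdims=True keeps the mean broadcastable against both shapes.
    return x - x.mean(axis=0, keepdims=True)
```

A quick sanity check on the output is that its mean is close to zero on every channel.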
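Recommended materials:
Test steps:
- Select audio_dc_offset_removal; no parameters need to be configured.
- The output should be wav.
- Feed the output into audio_asr_transcribe.
Pass criteria: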
4. audio_fast_lang_id
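Purpose: identify the language of the speech audio as zh or en, write the result to ext_params.audio_lid.lang, and keep the original audio for downstream operators.
Recommended materials:
Prerequisites:
Test steps:
- Select audio_fast_lang_id.
- Set the parameters:
  - device = cpu
  - maxSeconds = 3.0
- The output should remain audio, not zh or en text.
- For Chinese material, ext_params.audio_lid.lang should be zh.
- For English material, ext_params.audio_lid.lang should be en.
- The output should carry a __lid_zh or __lid_en marker.
- Use the output as the audio_asr_transcribe input.
- Set audio_asr_transcribe.language to auto.
Pass criteria: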
5. audio_fast_lang_id_text
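Purpose: identify the speech language and output a text label file directly. This is a terminal annotation operator: it replaces the audio with the text zh or en.
Recommended materials:
Prerequisites:
Test steps:
- Select audio_fast_lang_id_text.
- Set the parameters:
  - device = cpu
  - maxSeconds = 3.0
- For Chinese material, the output should be zh.
- For English material, the output should be en.
Pass criteria: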
6. audio_format_convert
Purpose: convert the audio format, sample rate, and channel count, and output the processed audio.
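The sample-rate and channel part of the conversion can be sketched as below. It assumes float input shaped (frames,) or (frames, channels), downmixes to mono by averaging, and resamples with scipy.signal.resample_poly; the name convert_audio is hypothetical and the operator's actual resampler is not stated in this guide. Writing the result as wav, flac, or ogg would then be up to the encoder (for example soundfile.write).

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def convert_audio(samples: np.ndarray, src_rate: int,
                  sample_rate: int = 16000, channels: int = 1) -> np.ndarray:
    """Downmix and resample (hypothetical helper mirroring sampleRate / channels)."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 1:
        x = x[:, None]  # treat mono as (frames, 1)
    if channels == 1:
        x = x.mean(axis=1, keepdims=True)  # downmix by averaging channels
    elif channels != x.shape[1]:
        raise ValueError("this sketch only keeps the channel count or downmixes to mono")
    if src_rate != sample_rate:
        g = gcd(src_rate, sample_rate)
        # Polyphase resampling by the rational factor sample_rate / src_rate.
        x = resample_poly(x, sample_rate // g, src_rate // g, axis=0)
    return x[:, 0] if channels == 1 else x
```

For a one-second 44.1 kHz stereo clip, convert_audio(clip, 44100) returns 16000 mono samples, matching the sampleRate = 16000 and channels = 1 settings used in the test steps.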
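Recommended materials:
Test steps:
- Select audio_format_convert.
- Set the parameters:
  - targetFormat = wav
  - sampleRate = 16000
  - channels = 1
- ext_params.audio_format_convert.format should be wav.
- ext_params.audio_format_convert.sample_rate should be 16000.
- ext_params.audio_format_convert.channels should be 1.
- Change targetFormat to flac or ogg.
Pass criteria: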
7. audio_gtcrn_denoise
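Purpose: denoise the audio with the GTCRN ONNX model and output WAV audio.
Recommended materials:
Prerequisites:
Test steps:
- Select audio_gtcrn_denoise.
- Leave modelPath at its default, or set it to the actual absolute path of the model.
- Feed the output into audio_asr_transcribe.
Pass criteria: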
8. audio_hum_notch
Purpose: suppress 50 Hz or 60 Hz mains hum with a notch filter and output WAV audio.
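A notch at freqHz with quality factor q can be sketched with scipy.signal.iirnotch, applied forward and backward with scipy.signal.filtfilt. This is an assumption about the technique, not the operator's code; the name hum_notch and its freq_hz / q arguments only mirror the freqHz and q parameters in the test steps.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch


def hum_notch(samples: np.ndarray, sample_rate: int,
              freq_hz: float = 50.0, q: float = 30.0) -> np.ndarray:
    """Notch out mains hum at freq_hz (hypothetical helper mirroring freqHz / q)."""
    # iirnotch designs a second-order IIR notch; its -3 dB bandwidth is
    # freq_hz / q, i.e. about 1.7 Hz for 50 Hz with q = 30.
    b, a = iirnotch(freq_hz, q, fs=sample_rate)
    # filtfilt runs the filter forward and backward, so there is no phase shift.
    return filtfilt(b, a, np.asarray(samples, dtype=np.float64), axis=0)
```

For the freqHz = 60 case, call hum_notch(x, sr, freq_hz=60.0). A high q keeps the notch narrow so speech is barely affected, but the filter also takes longer to settle, so some hum can remain at the very start and end of a clip. This sketch removes only the fundamental, not harmonics such as 100 Hz or 150 Hz.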
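Recommended materials:
Prerequisites:
- soundfile, numpy, and scipy are installed.
Test steps:
- Select audio_hum_notch.
- Set the parameters:
  - freqHz = 50
  - q = 30
- Change freqHz to 60.
Pass criteria: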
9. audio_noise_gate
Purpose: attenuate low-energy audio frames that fall below a threshold and output WAV audio.
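A frame-based gate using the four parameters from the test steps might look like this: frames of frameMs milliseconds are taken every hopMs, each frame's RMS level is compared with thresholdDb (in dBFS, assuming samples in [-1, 1]), and samples not covered by any open frame are scaled by floorRatio. This is a hypothetical sketch; the name noise_gate is illustrative, and the operator may smooth the gain or measure level differently.

```python
import numpy as np


def noise_gate(samples: np.ndarray, sample_rate: int, threshold_db: float = -45.0,
               frame_ms: float = 20.0, hop_ms: float = 10.0,
               floor_ratio: float = 0.05) -> np.ndarray:
    """Hard frame-based gate (hypothetical helper mirroring the operator's parameters)."""
    x = np.asarray(samples, dtype=np.float64)
    frame = max(1, int(sample_rate * frame_ms / 1000))
    hop = max(1, int(sample_rate * hop_ms / 1000))
    # Every sample starts "closed"; any frame above the threshold opens its span.
    gain = np.full(len(x), floor_ratio)
    for start in range(0, len(x), hop):
        seg = x[start:start + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        level_db = 20 * np.log10(rms + 1e-12)  # dBFS, assuming full scale = 1.0
        if level_db >= threshold_db:
            gain[start:start + frame] = 1.0
    # No attack/release smoothing here; broadcast the gain over channels if any.
    return x * (gain[:, None] if x.ndim == 2 else gain)
```

With floor_ratio = 0 the gated regions become silent, which is what the floorRatio = 0 step exercises; raising threshold_db to -20 closes the gate on more frames.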
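Recommended materials:
Prerequisites:
- soundfile and numpy are installed.
Test steps:
- Select audio_noise_gate.
- Set the parameters:
  - thresholdDb = -45
  - frameMs = 20
  - hopMs = 10
  - floorRatio = 0.05
- Set thresholdDb to -20.
- Set floorRatio to 0.
Pass criteria: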
10. audio_text_summarize
Purpose: take ASR text as input and output a summary. The input to this operator is text, not audio.
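One simple reading of method = extractive with the two length limits is lead-based selection: keep whole sentences from the start of the text until the character budget (Chinese) or word budget (English) would be exceeded. The sketch below assumes that heuristic and detects Chinese by the presence of CJK characters; both are assumptions, and the name summarize is hypothetical.

```python
import re

# Split after Chinese or ASCII sentence-ending punctuation (assumed rule).
_SENT_SPLIT = re.compile(r"(?<=[。!?.!?])\s*")
_CJK = re.compile(r"[\u4e00-\u9fff]")


def summarize(text: str, max_chars_zh: int = 40, max_words_en: int = 18) -> str:
    """Lead-based extractive summary (hypothetical helper)."""
    sentences = [s for s in _SENT_SPLIT.split(text.strip()) if s]
    is_zh = bool(_CJK.search(text))
    # Chinese budgets count characters; English budgets count words.
    size = len if is_zh else (lambda s: len(s.split()))
    budget = max_chars_zh if is_zh else max_words_en
    picked, used = [], 0
    for sent in sentences:
        cost = size(sent)
        if used + cost > budget:
            break
        picked.append(sent)
        used += cost
    if not picked and sentences:
        # The first sentence alone is over budget: hard-truncate it.
        first = sentences[0]
        return first[:budget] if is_zh else " ".join(first.split()[:budget])
    return ("" if is_zh else " ").join(picked)
```

Keeping whole sentences makes the output read naturally; the hard truncation only kicks in when a single sentence is longer than the budget.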
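Recommended materials and preceding pipeline:
We recommend first running ASR on the following audio files, then using the resulting ASR output dataset as the input to this operator:
Test steps:
- Run audio_asr_transcribe to obtain a text output dataset.
- Use that dataset as the input to audio_text_summarize.
- Select audio_text_summarize.
- Set the parameters:
  - method = extractive
  - lineMode = single
  - maxSummaryCharsZh = 40
  - maxSummaryWordsEn = 18
  - preserveKeys = true
- The length of Chinese summaries is limited by maxSummaryCharsZh.
- The length of English summaries is limited by maxSummaryWordsEn.
- ext_params.audio_text_summarize.method should be extractive.
Pass criteria: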