# Conversational Speech generator tool

Tool to generate multiple-end audio tracks to simulate conversational speech
with two or more participants.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the Example
section).

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios with unpredictable network
delay (e.g., end-to-end RTC assessment).

Instead, the generated tracks can be used when the delay is constant, including
the case in which there is no delay.
For instance, echo cancellation in the APM module can be evaluated by using the
track of one end as the input and the track of the other end as the reverse
input.

By using negative and positive time offsets, one can reproduce cross-talk
(aka double-talk) and silence in the conversation, respectively.

### Example

For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4 (speaker A)
and b1, b2 (speaker B).
The text file with the timing information may look like this:

```
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
```

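The first column indicates the speaker name, the second the audio track file
name, and the third the offset (in milliseconds) used to concatenate the chunks.
Each offset is applied relative to the end of the previous turn in the file,
whoever the speaker is. A positive offset inserts silence. A negative offset
makes the turn start before the previous one ends, producing cross-talk.
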
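To make the concatenation rule concrete, here is a minimal Python sketch of how the timing file could be parsed and each turn placed on a shared timeline. It is not the tool's implementation. The names `Turn`, `parse_timing` and `schedule` are hypothetical. The `DURATIONS` mapping, which treats every track as 1000 ms long like the walkthrough that follows, replaces reading the audio files. The sketch also assumes, as the example's numbers imply, that each offset is measured from the end of the previous turn in the file.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaking turn placed on the shared timeline (hypothetical type)."""
    speaker: str
    track: str
    start_ms: int
    end_ms: int


def parse_timing(text: str) -> list[tuple[str, str, int]]:
    """Parse '<speaker> <track> <offset_ms>' lines, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        speaker, track, offset = line.split()
        entries.append((speaker, track, int(offset)))
    return entries


def schedule(entries: list[tuple[str, str, int]],
             durations_ms: dict[str, int]) -> list[Turn]:
    """Start each turn at (end of the previous turn) + offset.

    Assumption of this sketch: a positive offset leaves a gap (silence) and
    a negative offset overlaps the previous turn (cross-talk).
    """
    turns = []
    prev_end = 0
    for speaker, track, offset in entries:
        start = prev_end + offset
        if start < 0:
            raise ValueError(f"{track} would start before t = 0")
        end = start + durations_ms[track]
        turns.append(Turn(speaker, track, start, end))
        prev_end = end
    return turns


TIMING = """\
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
"""

# Hypothetical durations: every track is 1000 ms long, as in the example.
DURATIONS = {name: 1000 for name in ("a1", "a2", "a3", "a4", "b1", "b2")}

if __name__ == "__main__":
    for turn in schedule(parse_timing(TIMING), DURATIONS):
        print(f"{turn.speaker} {turn.track}: {turn.start_ms}-{turn.end_ms} ms")
```

With these inputs, a2 starts at 2100 ms, 100 ms after b1 ends. b2 is placed at 2900-3900 ms, so it overlaps the last 200 ms of a2. a3 then starts at 3900 ms, right after b2.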
Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks (A and B) that look like this:

Track A
```
a1 (1000 ms)
silence (1100 ms)
a2 (1000 ms)
silence (800 ms)
a3 (1000 ms)
a4 (1000 ms)
```

Track B
```
silence (1000 ms)
b1 (1000 ms)
silence (900 ms)
b2 (1000 ms)
silence (2000 ms)
```

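Once the turns are placed, each output track can be derived in three steps. Take one speaker's turns, fill the gaps between them with silence, and pad the end up to the length of the whole conversation. The minimal sketch below does this for the two listings above. `track_layout` is a hypothetical helper, and the turn tuples (speaker, track, start_ms, end_ms) are copied from the example. Rejecting a speaker who overlaps their own earlier turn is an assumption of this sketch, not documented tool behavior.

```python
# (speaker, track, start_ms, end_ms) for the example; every track is 1000 ms.
TURNS = [
    ("A", "a1", 0, 1000),
    ("B", "b1", 1000, 2000),
    ("A", "a2", 2100, 3100),
    ("B", "b2", 2900, 3900),
    ("A", "a3", 3900, 4900),
    ("A", "a4", 4900, 5900),
]


def track_layout(turns, speaker):
    """Return [(label, duration_ms), ...] describing one speaker's track."""
    total_ms = max(end for _, _, _, end in turns)  # conversation length
    layout = []
    cursor = 0  # end of the last chunk written to this speaker's track
    for who, name, start, end in sorted(turns, key=lambda t: t[2]):
        if who != speaker:
            continue
        # Assumption of this sketch: a speaker may not overlap themselves.
        if start < cursor:
            raise ValueError(f"{name} overlaps an earlier turn by {speaker}")
        if start > cursor:
            layout.append(("silence", start - cursor))
        layout.append((name, end - start))
        cursor = end
    if cursor < total_ms:
        layout.append(("silence", total_ms - cursor))  # trailing padding
    return layout


if __name__ == "__main__":
    for speaker in ("A", "B"):
        print(f"Track {speaker}")
        for label, ms in track_layout(TURNS, speaker):
            print(f"{label} ({ms} ms)")
```

Padding every track to the same total length (5900 ms here) keeps the tracks sample-aligned. That matters when one is used as the input and another as the reverse input.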
The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech).

```
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
```
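A diagram like this can be produced by quantizing each turn into 100 ms cells. Cross-talk and mutual silence can then be measured by comparing the rows column by column. Below is a minimal sketch with hypothetical helpers `render_rows` and `talk_and_silence_ms`. The turns are again the example's (speaker, track, start_ms, end_ms) tuples. Drawing a partially covered cell as speech is a choice of this sketch.

```python
# (speaker, track, start_ms, end_ms) for the example; every track is 1000 ms.
TURNS = [
    ("A", "a1", 0, 1000),
    ("B", "b1", 1000, 2000),
    ("A", "a2", 2100, 3100),
    ("B", "b2", 2900, 3900),
    ("A", "a3", 3900, 4900),
    ("A", "a4", 4900, 5900),
]


def render_rows(turns, speakers, ms_per_char=100):
    """Map each speaker to a row of '*' (speech) and '.' (silence)."""
    total_ms = max(end for _, _, _, end in turns)
    n_cells = -(-total_ms // ms_per_char)  # ceiling division
    rows = {}
    for speaker in speakers:
        cells = ["."] * n_cells
        for who, _, start, end in turns:
            if who != speaker:
                continue
            # Floor the start and ceil the end so partial cells count as speech.
            for i in range(start // ms_per_char, -(-end // ms_per_char)):
                cells[i] = "*"
        rows[speaker] = "".join(cells)
    return rows


def talk_and_silence_ms(row_a, row_b, ms_per_char=100):
    """Return (cross-talk ms, mutual-silence ms) for two equal-length rows."""
    cross_talk = sum(1 for x, y in zip(row_a, row_b) if x == y == "*")
    silence = sum(1 for x, y in zip(row_a, row_b) if x == y == ".")
    return cross_talk * ms_per_char, silence * ms_per_char


if __name__ == "__main__":
    rows = render_rows(TURNS, ["A", "B"])
    for speaker, row in rows.items():
        print(f"{speaker}: {row}")
    print(talk_and_silence_ms(rows["A"], rows["B"]))  # (200, 100)
```

For the example, the comparison reports 200 ms of cross-talk (a2 and b2) and 100 ms in which neither speaker talks (between b1 and a2).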