TensorFlow Speech Commands


The TensorFlow Speech Commands dataset is a set of one-second .wav audio files, each containing a single spoken English word. The words come from a small set of commands and are spoken by a variety of speakers. Twenty of the words are core words; the other 10 are auxiliary words that can be used to test whether an algorithm ignores speech that does not contain a trigger word. Along with the 30 words, the dataset includes a collection of background noise audio files. The dataset was originally designed for limited-vocabulary speech recognition tasks. The audio clips were collected by Google and recorded by volunteers in uncontrolled locations around the world.

Dataset Metadata

| Field | Value |
| --- | --- |
| Format | WAV |
| License | CC BY 4.0 |
| Domain | Audio |
| Number of Records | 65,000 WAV files |
| Data Split | Train: 51,094 audio clips; Validation: 6,798 audio clips; Test: 6,835 audio clips |
| Size | 1.49 GB |
| Dataset Origin | The audio clips were originally collected by Google and recorded by volunteers in uncontrolled locations around the world. |
| Dataset Version | Version 1 (March 17, 2020) |
| Data Coverage | Core words: Yes, No, Up, Down, Left, Right, On, Off, Stop, Go, Zero, One, Two, Three, Four, Five, Six, Seven, Eight, and Nine. Auxiliary words: Bed, Bird, Cat, Dog, Happy, House, Marvin, Sheila, Tree, and Wow. Background noise: doing_the_dishes, dude_miaowing, exercise_bike, pink_noise, running_tap, and white_noise. To learn more about the data collection process, see the README.md in the data archive. |
| Business Use Case | Build voice recognition systems for Internet of Things, automotive, security, and UX/UI applications. Build voice-based search applications and voice-activated assistants. |
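
The core/auxiliary distinction listed under Data Coverage is often turned into training labels by keeping each core word as its own class and collapsing the auxiliary words into one catch-all class. The following is a minimal sketch of that idea, not part of the dataset tooling. The lowercase spellings, the `unknown` label name, and the `target_label` helper are assumptions made for illustration.

```python
# Core and auxiliary words as listed in the Data Coverage field.
# Lowercase spellings are an assumption made for this sketch.
CORE_WORDS = {
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine",
}
AUXILIARY_WORDS = {
    "bed", "bird", "cat", "dog", "happy", "house", "marvin", "sheila",
    "tree", "wow",
}

UNKNOWN_LABEL = "unknown"  # hypothetical name for the catch-all class


def target_label(word: str) -> str:
    """Map a spoken word to a training label.

    Core words keep their own label; auxiliary words collapse into a
    single catch-all class so a model can learn to ignore them.
    """
    normalized = word.strip().lower()
    if normalized in CORE_WORDS:
        return normalized
    if normalized in AUXILIARY_WORDS:
        return UNKNOWN_LABEL
    raise ValueError(f"not a Speech Commands word: {word!r}")
```

With this mapping, `target_label("Stop")` keeps its own class while `target_label("Marvin")` falls into the catch-all class. Whether to collapse the auxiliary words at all depends on the task.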

Dataset Archive Contents

| File or Folder | Description |
| --- | --- |
| 31 audio clip folders | Folders containing audio clips |
| testing_list.txt | Paths to all the files in the test set |
| validation_list.txt | Paths to all the files in the validation set |
| LICENSE.txt | Terms of use |
| README.md | Explains data collection, processing details, and steps for splitting the dataset |
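
The archive ships list files for the test and validation sets but none for training, so one plausible way to rebuild the three splits is to treat every clip that appears in neither list as a training clip. The sketch below assumes that each list file holds one clip path per line, relative to the archive root, and that the caller passes clip paths in the same form. The helper names `read_split_list` and `assign_splits` are hypothetical. The archive's README.md remains the authoritative description of the split procedure.

```python
from typing import Dict, Iterable, List


def read_split_list(path: str) -> List[str]:
    """Read one relative clip path per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def assign_splits(
    all_clips: Iterable[str],
    validation_clips: Iterable[str],
    testing_clips: Iterable[str],
) -> Dict[str, List[str]]:
    """Partition clip paths into train, validation, and test lists.

    Assumption: a clip belongs to the training set when it appears in
    neither the validation list nor the testing list.
    """
    validation = set(validation_clips)
    testing = set(testing_clips)
    splits: Dict[str, List[str]] = {"train": [], "validation": [], "test": []}
    for clip in all_clips:
        if clip in testing:
            splits["test"].append(clip)
        elif clip in validation:
            splits["validation"].append(clip)
        else:
            splits["train"].append(clip)
    return splits
```

A typical call would pass `read_split_list("validation_list.txt")` and `read_split_list("testing_list.txt")` together with the clip paths found in the word folders. Background noise files would normally be left out of `all_clips`, since they are not spoken words.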

Data Glossary and Preview

Click here to explore the data glossary, sample records, and additional dataset metadata.

Use the Dataset

This dataset is complemented by starter notebooks to help you get started.

Quick access in Python (requires the pardata PyPI package):

$ pip install pardata

import pardata
data = pardata.load_dataset('tensorflow_speech_commands')
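
Independently of pardata, an individual clip from the extracted archive can be inspected with Python's standard `wave` module, for example to confirm its duration and sample rate before training. This is a minimal sketch. The `clip_info` helper is hypothetical, and it assumes the clip is an uncompressed PCM WAV file, which is what the `wave` module reads.

```python
import wave


def clip_info(path: str) -> dict:
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "num_frames": frames,
            # Duration in seconds is the frame count divided by the frame rate.
            "duration_s": frames / rate,
        }
```

Calling `clip_info` on a path inside one of the word folders returns a dictionary whose `duration_s` value can be compared against the one-second clip length described above.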


Citation

@article{speechcommands,
  title={Speech Commands: A public dataset for single-word speech recognition.},
  author={Warden, Pete},
  journal={Dataset available from http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz},
  year={2017}
}