The "MNIST" of Brain Digits

Version 1.03 of the open database contains 1,207,293 brain signals of 2 seconds each, captured while the subject saw a digit (from 0 to 9) and thought about it. The signals were recorded over almost 2 years, between 2014 and 2015, from a single test subject, David Vivancos. In 2018 we also started sharing a new open dataset, the "IMAGENET" of The Brain, and in 2021 we started The Visual "MNIST" of Brain Digits, with real individual MNIST digits shown.

All the signals were captured using commercial (not medical-grade) EEG devices: NeuroSky MindWave, Emotiv EPOC, Interaxon Muse and Emotiv Insight, covering a total of 19 brain (10/20) locations.

Four files are available for download:

Database | File | Zip size | File size | Date | Mirror
MindWave | MindBigData-MW-v1.0.zip | 62.6 MB (65,663,303 bytes) | 297 MB (311,994,495 bytes) | 09/11/2015 |
EPOC** | MindBigData-EP-v1.0.zip | 408 MB (427,958,689 bytes) | 2.66 GB (2,859,712,035 bytes) | 06/16/2018 | US DataHub Mirror
Muse | MindBigData-MU-v1.0.zip | 62.6 MB (65,663,303 bytes) | 297 MB (311,994,495 bytes) | 09/11/2015 |
Insight* | MindBigData-IN-v1.06.zip | 25.3 MB (26,610,979 bytes) | 184 MB (193,010,330 bytes) | 12/10/2019 |

We built our own tools to capture the signals, but there is no post-processing on our side, so they come raw, exactly as read from each EEG device: 395,072,896 data points in total.

Feel free to test any machine learning, deep learning or other algorithm you think could fit. We only ask that you acknowledge the source, and please let us know about your performance!

We have chosen not to split the signals into training/test/validation sets at this point, so pick the distribution you prefer.
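Since the split is left to you, here is a minimal sketch of one possible way to do it, not an official recommendation. It assumes each signal has already been loaded into a dict with "device" and "event" keys (a hypothetical in-memory form), and it keeps all channels of the same event in the same set, so that a multichannel capture does not end up partly in training and partly in test. The function name, the 80/10/10 fractions and the seed are only illustrative.

```python
import random
from collections import defaultdict


def split_by_event(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test, keeping each event together.

    `records` is an iterable of dicts with at least "device" and "event"
    keys (an assumed in-memory form of the file lines).
    """
    # Group every line that belongs to the same capture event.
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["device"], rec["event"])].append(rec)

    # Shuffle the events (not the individual lines) reproducibly.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    n_train = int(len(keys) * fractions[0])
    n_val = int(len(keys) * fractions[1])
    parts = {
        "train": keys[:n_train],
        "validation": keys[n_train:n_train + n_val],
        "test": keys[n_train + n_val:],
    }
    # Expand each set of events back into its lines.
    return {name: [r for k in ks for r in groups[k]] for name, ks in parts.items()}
```

Splitting on events rather than on single lines matters mostly for the multichannel devices, where several lines share one event.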

A small portion of the signals were captured without the stimulus of seeing the digits, for contrast. They all correspond to random actions not related to thinking about or seeing digits, and they use the code -1. You can decide whether or not to use them in your tests.
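As an illustration of the two obvious options, the sketch below either drops the -1 signals or keeps them as an extra "no digit" class. It assumes each signal is a dict with a "code" key (a hypothetical in-memory form); the function names and the extra label 10 are our own choices for the example, not part of the dataset.

```python
RANDOM_CODE = -1  # code the dataset uses for signals unrelated to digits


def drop_random(records):
    """Keep only signals recorded with a digit stimulus (codes 0 to 9)."""
    return [r for r in records if r["code"] != RANDOM_CODE]


def relabel_random(records, no_digit_label=10):
    """Keep every signal, mapping code -1 to an extra class label."""
    return [
        {**r, "code": no_digit_label if r["code"] == RANDOM_CODE else r["code"]}
        for r in records
    ]
```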


This is the distribution of the signals per device and digit:

Device/Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | -1 | Total
MindWave (MW) | 5,531 | 5,498 | 5,517 | 5,416 | 5,381 | 5,568 | 5,476 | 5,552 | 5,545 | 5,450 | 12,701 | 67,635
EPOC (EP) | 91,224 | 88,914 | 90,930 | 92,652 | 88,886 | 91,994 | 91,322 | 88,718 | 91,728 | 91,882 | 2,226 | 910,476
Muse (MU) | 11,904 | 11,632 | 11,920 | 11,832 | 11,536 | 12,052 | 12,368 | 12,080 | 12,208 | 11,988 | 44,412 | 163,932
Insight (IN)* | 6,305 | 6,740 | 6,535 | 6,605 | 6,620 | 6,460 | 6,425 | 6,470 | 6,590 | 6,500 | 0 | 65,250
Total | 114,964 | 112,784 | 114,902 | 116,505 | 112,423 | 116,074 | 115,591 | 112,820 | 116,071 | 115,820 | 59,339 | 1,207,293
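If you want to check these counts against a file you downloaded, a minimal sketch follows. It assumes the tab-separated layout described in the format section of this page (device in the third field, code in the fifth) and that every line is one signal; the function name is only illustrative.

```python
from collections import Counter


def count_signals(lines):
    """Count signals per (device, code) from tab-separated dataset lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        device, code = fields[2], int(fields[4])
        counts[(device, code)] += 1
    return counts
```

Because the function only needs an iterable of lines, an open file object can be passed directly, which avoids loading a multi-gigabyte file into memory.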

* Insight captures started in September 2015. The dataset was updated to fix the channel separation by comma and to use a dot for decimals, instead of commas only. Last update: 10/12/2019, v1.06.

** The EPOC dataset was updated to fix the channel separation by comma and to use a dot for decimals, instead of commas only. Last update: 06/16/2018, v1.01.



The data is stored in a very simple text format including:

[id]: a number, only for reference purposes.

[event]: an integer id, used to distinguish the same event captured at different brain locations. It is used only by multichannel devices (all except MW).

[device]: a 2-character string identifying the device used to capture the signals: "MW" for MindWave, "EP" for Emotiv EPOC, "MU" for Interaxon Muse and "IN" for Emotiv Insight.

[channel]: a string identifying the 10/20 brain location of the signal, with possible values:
MindWave "FP1"
EPOC "AF3", "F7", "F3", "FC5", "T7", "P7", "O1", "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"
Muse "TP9", "FP1", "FP2", "TP10"
Insight "AF3", "AF4", "T7", "T8", "PZ"

[code]: an integer identifying the digit being thought about/seen, with possible values 0,1,2,3,4,5,6,7,8,9, or -1 for random captured signals not related to any of the digits.

[size]: an integer giving the number of values captured in the 2 seconds of this signal. Since the sampling rate of each device differs, in "theory" the value is close to 512 Hz for MW, 128 Hz for EP, 220 Hz for MU and 128 Hz for IN, for each of the 2 seconds.

[data]: a comma-separated set of numbers with the time-series amplitude of the signal. Each device uses a different precision for the electrical potential captured from the brain: integers in the case of MW and MU, real numbers in the case of EP and IN.
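To illustrate how the [event] field ties together the channels of a multichannel device, here is a small sketch. It assumes the lines have already been loaded into dicts with "device", "event", "channel" and "data" keys (a hypothetical in-memory form), and it keys on the device as well as the event, in case event ids are not unique across devices.

```python
from collections import defaultdict


def group_by_event(records):
    """Map (device, event) -> {channel: data} for each captured event."""
    events = defaultdict(dict)
    for rec in records:
        events[(rec["device"], rec["event"])][rec["channel"]] = rec["data"]
    return dict(events)
```

For MW, which has a single channel, each group simply holds one entry.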

There are no headers in the files; every line is a signal, and the fields are separated by tabs.
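The seven tab-separated fields described above can be read with a few lines of code. This is a minimal sketch, not our capture tooling: the Signal class and parse_line function are names made up for the example, and all amplitudes are converted to floats for simplicity, even for the devices that store integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    id: int
    event: int
    device: str
    channel: str
    code: int
    size: int
    data: List[float]


def parse_line(line: str) -> Signal:
    """Parse one tab-separated line of a dataset file into a Signal."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    id_, event, device, channel, code, size, data = fields
    # The [data] field is itself comma-separated; skip empty pieces.
    values = [float(v) for v in data.split(",") if v]
    return Signal(int(id_), int(event), device, channel, int(code), int(size), values)
```

A reasonable sanity check after parsing is to compare len(signal.data) with signal.size.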

For example, one line from each device could look like this (the header row is shown for reference only; it is not in the files):

[id] [event] [device] [channel] [code] [size] [data]
27 27 MW FP1 5 952 18,12,13,12,5,3,11,23,37,36,26,24,35,42……
67650 67636 EP F7 7 260 4482.564102,4477.435897,4484.102564…….
978210 132693 MU TP10 1 476 506,508,509,501,497,494,497,490,490,493……
1142043 173652 IN AF3 0 256 4259.487179,4237.948717,4247.179487,4242.051282……
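The [size] values in these example lines show how the real number of samples differs from the "theoretical" rate. A small worked sketch, assuming the window is exactly 2 seconds (the constant and function names are only illustrative):

```python
WINDOW_SECONDS = 2.0
# Nominal sampling rates quoted in the [size] field description.
NOMINAL_HZ = {"MW": 512, "EP": 128, "MU": 220, "IN": 128}


def effective_rate(size, window_seconds=WINDOW_SECONDS):
    """Samples per second implied by the [size] of one signal."""
    return size / window_seconds


# [size] values taken from the four example lines.
example_sizes = {"MW": 952, "EP": 260, "MU": 476, "IN": 256}
for device, size in example_sizes.items():
    rate = effective_rate(size)
    print(f"{device}: {size} samples -> {rate:.0f} Hz (nominal {NOMINAL_HZ[device]} Hz)")
```

For the four example lines this gives 476, 130, 238 and 128 Hz, so signals from the same device may need padding, cropping or resampling before being fed to a model that expects a fixed length.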


Each EEG device captures the signals via different sensors, located in these areas of my brain. The color represents the device: MindWave, EPOC, Muse, Insight.

David Vivancos Brain 10/20 Locations


- Giving sense to EEG records (course IFT6390 "machine learning" by Pascal Vincent from MILA), by Amin Shahab, Marc Sayn-Urpar, René Doumbouya, Thomas George & Vincent Antaki.

- Contribution aux décompositions rapides des matrices et tenseurs, Viet-Dung Nguyen, thesis, Université d’Orléans, Nov-16th-2016

- Fast learning of scale-free networks based on Cholesky factorization, Vladislav Jelisavčić, Ivan Stojkovic, Veljko Milutinovic, Zoran Obradovic, May-2018

- Combination of Wavelet and MLP Neural Network for Emotion Recognition System, Phuong Huy Nguyen, Thai Nguyen University of Technology (TNUT) & Thu May Duong, Thi Mai Thuong Duong, Thu Huong Nguyen, University of Information and Communication Technology, Vietnam, Nov-2018

- A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction, Jordan J. Bird, Diego R. Faria, Luis J. Manso, Anikó Ekárt and Christopher D. Buckingham, School of Engineering and Applied Science, Aston University, Birmingham, UK, Mar-2019

- Novel joint algorithm based on EEG in complex scenarios, Dongwei Chen, Weiqi Yang, Rui Miao, Lan Huang, Liu Zhang, Chunjian Deng & Na Han, School of Business, Beijing Institute of Technology, Zhuhai, China, Aug-2019

- HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography, Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen and Qiang Yan, Hong Kong University of Science and Technology, AI Lab, WeBank Co. Ltd., Sep-2019

- Universal EEG Encoder for Learning Diverse Intelligent Tasks, Baani Leen Kaur Jolly, Palash Aggrawal, Surabhi S Nath, Viresh Gupta, Manraj Singh Grover, Rajiv Ratn Shah, MIDAS Lab, IIIT-Delhi, Nov-2019

- Stanford CS230 - Group Project Final Report, Roman Pinchuk and Will Ross, 2020

- Mental State Recognition and Recommendation of Aids to Stabilize the Mind Using Wearable EEG, M.W.A. Aruni Wijesuriya, University of Colombo School of Computing, 2020

- Generating the image viewed from EEG signals, Gaffari ÇELİK, Muhammed Fatih, 2020

- EEG-Based Emotion Classification for Alzheimer’s Disease Patients Using Conventional Machine Learning and Recurrent Neural Network Models, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams, Aug-2020

- Understanding Brain Dynamics for Color Perception Using Wearable EEG Headband, Jungryul Seo, Teemu H. Laine, Gyuhwan Oh, Kyung-Ah Sohn, Dec-2020

- Toward lightweight fusion of AI logic and EEG sensors to enable ultra edge-based EEG analytics on IoT devices, Tazrin Tahrat, May-2021

- Deep Learning in EEG: Advance of the Last Ten-Year Critical Period, Shu Gong, Kaibo Xing, Andrzej Cichocki, Junhua Li, May-2021

- Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData, Nandini Kumari, Shamama Anwar, Vandana Bhattacharjee, Jun-2021

Contact us if you need any more info.

Let's decode My Brain!
August 27th 2021
David Vivancos

This MindBigData The "MNIST" of Brain Digits is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/