MindBigData
The "MNIST" of Brain Digits

Version 1.03 of the open database contains 1,207,293 brain signals of 2 seconds each, captured while the subject was seeing a digit (from 0 to 9) and thinking about it. The signals were recorded over almost 2 years, between 2014 and 2015, from a single test subject, David Vivancos. In 2018 we also started sharing a new open dataset, the "IMAGENET" of The Brain, and in 2021 we started The Visual "MNIST" of Brain Digits, with real individual MNIST digits shown. Don't miss MindBigData2023 MNIST-8B, the new multimodal dataset with 8 billion data points.

Update December 2023: Check the new Hugging Face Leaderboard of Models

Update January 2023: Read the paper "MindBigData 2022 A Large Dataset of Brain Signals" and find alternative prepared dataset downloads at Hugging Face

All the signals have been captured using commercial (not medical-grade) EEG devices: NeuroSky MindWave, Emotiv EPOC, Interaxon Muse & Emotiv Insight, covering a total of 19 brain (10/20) locations.

Four files are available for download:

Database | File | Zip size | File size | Date | Mirror
MindWave | MindBigData-MW-v1.0.zip | 62.6 MB (65,663,303 bytes) | 297 MB (311,994,495 bytes) | 09/11/2015 | -
EPOC** | MindBigData-EP-v1.0.zip | 408 MB (427,958,689 bytes) | 2.66 GB (2,859,712,035 bytes) | 06/16/2018 | US DataHub Mirror
Muse | MindBigData-MU-v1.0.zip | 62.6 MB (65,663,303 bytes) | 297 MB (311,994,495 bytes) | 09/11/2015 | -
Insight* | MindBigData-IN-v1.06.zip | 25.3 MB (26,610,979 bytes) | 184 MB (193,010,330 bytes) | 12/10/2019 | -

We built our own tools to capture the signals, but we apply no post-processing, so they come raw, exactly as read from each EEG device: 395,072,896 data points in total.

Feel free to test any machine learning, deep learning, or other algorithm you think could fit. We only ask that you acknowledge the source, and please let us know how it performs!

We have chosen not to split the signals into training/test/validation sets at this point, so pick the distribution you prefer.
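One possible way to build such a split is sketched below in Python. It assumes the signals of a single device file have already been loaded as dicts with an "event" key (an assumed in-memory layout; the function name and the 80/10/10 fractions are illustrative, not something the dataset prescribes). Whole events are assigned to one partition, so the channels of a single multichannel capture never end up on both sides of the split.

```python
import random
from collections import defaultdict

def split_by_event(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test, keeping each event whole.

    `records` is an iterable of dicts with at least an "event" key
    (an assumed layout, not something the files define).
    """
    by_event = defaultdict(list)
    for rec in records:
        by_event[rec["event"]].append(rec)

    events = sorted(by_event)              # fixed order before shuffling
    random.Random(seed).shuffle(events)    # reproducible shuffle

    n_train = int(len(events) * fractions[0])
    n_val = int(len(events) * fractions[1])
    parts = {
        "train": events[:n_train],
        "validation": events[n_train:n_train + n_val],
        "test": events[n_train + n_val:],  # whatever remains
    }
    return {name: [rec for ev in evs for rec in by_event[ev]]
            for name, evs in parts.items()}
```

Splitting on individual lines instead of events would be simpler, but for the multichannel devices it would let the same 2-second capture appear in both training and test data.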

A small portion of the signals were captured without the stimulus of seeing the digits, for contrast. They all correspond to random actions not related to thinking about or seeing digits, and they use the code -1. You can decide whether or not to use them in your tests.
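Both options can be sketched in a few lines of Python, assuming each signal is held as a dict with an integer "code" key (an assumed layout; the function names are illustrative only). The first helper drops the -1 signals, the second keeps them as an eleventh "no digit" class.

```python
def drop_random_signals(records):
    """Keep only signals labelled with a digit (code 0 to 9)."""
    return [rec for rec in records if rec["code"] != -1]

def as_eleven_classes(code):
    """Map code -1 to class index 10; digit codes are returned unchanged."""
    return 10 if code == -1 else code
```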


SIGNAL DISTRIBUTION:


This is the distribution of the signals per device and digit:

Device/Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | -1 | Total
MindWave (MW) | 5,531 | 5,498 | 5,517 | 5,416 | 5,381 | 5,568 | 5,476 | 5,552 | 5,545 | 5,450 | 12,701 | 67,635
EPOC (EP) | 91,224 | 88,914 | 90,930 | 92,652 | 88,886 | 91,994 | 91,322 | 88,718 | 91,728 | 91,882 | 2,226 | 910,476
Muse (MU) | 11,904 | 11,632 | 11,920 | 11,832 | 11,536 | 12,052 | 12,368 | 12,080 | 12,208 | 11,988 | 44,412 | 163,932
Insight (IN)* | 6,305 | 6,740 | 6,535 | 6,605 | 6,620 | 6,460 | 6,425 | 6,470 | 6,590 | 6,500 | 0 | 65,250
Total | 114,964 | 112,784 | 114,902 | 116,505 | 112,423 | 116,074 | 115,591 | 112,820 | 116,071 | 115,820 | 59,339 | 1,207,293
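Since the digit classes are nearly balanced, chance-level accuracy is close to 10% for every device. As a small worked example in Python, the sketch below takes the EPOC digit counts from the table above and computes the majority-class baseline (always predicting the most frequent digit), which comes out at about 10.2%. The function name is illustrative.

```python
# Signals per digit 0-9 for the EPOC (EP) device, copied from the table above.
epoc_counts = [91224, 88914, 90930, 92652, 88886,
               91994, 91322, 88718, 91728, 91882]

def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(counts) / sum(counts)

baseline = majority_baseline(epoc_counts)
print(f"EPOC majority-class baseline: {baseline:.4f}")
```

Any reported accuracy on the ten digit classes is easier to interpret when compared against this figure.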

* Insight captures started in September 2015. The dataset was updated so that commas are used only as separators and dots as decimal marks (previously commas were used for both). Last update 12/10/2019, v1.06.

** The EPOC dataset was updated so that commas are used only as separators and dots as decimal marks (previously commas were used for both). Last update 06/16/2018, v1.01.

 

FILE FORMAT:

The data is stored in a very simple text format including:

[id]: a number, for reference purposes only.

[event]: an integer id, used to identify the same event captured at different brain locations; only relevant for multichannel devices (all except MW).

[device]: a 2-character string that identifies the device used to capture the signals: "MW" for MindWave, "EP" for Emotiv EPOC, "MU" for Interaxon Muse & "IN" for Emotiv Insight.

[channel]: a string that identifies the 10/20 brain location of the signal, with possible values:
 
MindWave "FP1"
EPOC "AF3", "F7", "F3", "FC5", "T7", "P7", "O1", "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"
Muse "TP9", "FP1", "FP2", "TP10"
Insight "AF3", "AF4", "T7", "T8", "PZ"

[code]: an integer that identifies the digit being thought/seen, with possible values 0,1,2,3,4,5,6,7,8,9 or -1 for random captured signals not related to any of the digits.

[size]: an integer, the number of values captured in the 2 seconds of this signal. It varies because each device samples at a different rate: in "theory" close to 512 Hz for MW, 128 Hz for EP, 220 Hz for MU & 128 Hz for IN, so [size] is roughly twice that rate.

[data]: a comma-separated set of numbers with the time-series amplitude of the signal. Each device uses a different precision for the electrical potential captured from the brain: integers in the case of MW & MU, real numbers in the case of EP & IN.

There are no headers in the files; every line is a signal, and the fields are separated by tabs.
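Because [size] varies from signal to signal, models that expect fixed-length input need some kind of length normalization. The Python sketch below compares a signal with its nominal length (nominal rate times 2 seconds, using the rates quoted above) and truncates it or pads it by repeating the last sample. This is only one simple choice, assumed here for illustration (resampling is another); the dataset itself does not prescribe any method, and all names are hypothetical.

```python
NOMINAL_HZ = {"MW": 512, "EP": 128, "MU": 220, "IN": 128}  # rates quoted above
DURATION_S = 2  # every signal covers 2 seconds

def nominal_length(device):
    """Number of samples expected at the nominal sampling rate."""
    return NOMINAL_HZ[device] * DURATION_S

def effective_hz(size):
    """Sampling rate implied by the [size] field of one signal."""
    return size / DURATION_S

def fix_length(samples, device):
    """Truncate `samples`, or pad by repeating the last value,
    so the result has the nominal length for `device`."""
    target = nominal_length(device)
    trimmed = list(samples[:target])
    if not trimmed:
        raise ValueError("signal has no samples")
    return trimmed + [trimmed[-1]] * (target - len(trimmed))
```

For instance, the MindWave example line in this document has a [size] of 952, which implies about 476 Hz rather than the nominal 512 Hz, so `fix_length` would pad it to 1024 values.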

For example, one line from each device could look like this (the header row is shown only for reference; it is not in the files):

[id] [event] [device] [channel] [code] [size] [data]
27 27 MW FP1 5 952 18,12,13,12,5,3,11,23,37,36,26,24,35,42……
67650 67636 EP F7 7 260 4482.564102,4477.435897,4484.102564…….
978210 132693 MU TP10 1 476 506,508,509,501,497,494,497,490,490,493……
1142043 173652 IN AF3 0 256 4259.487179,4237.948717,4247.179487,4242.051282……
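A minimal Python sketch of a reader for this format follows, assuming the seven tab-separated fields appear in the order listed above. The `Signal` class and the `parse_line` and `read_signals` functions are illustrative names, not part of any official MindBigData tooling.

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Signal:
    id: int
    event: int
    device: str
    channel: str
    code: int
    size: int
    data: List[float]

def parse_line(line: str) -> Signal:
    """Parse one tab-separated line into a Signal record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-separated fields, got {len(fields)}")
    id_, event, device, channel, code, size, data = fields
    # Values are comma-separated; MW and MU integers are read as floats too.
    samples = [float(value) for value in data.split(",") if value]
    return Signal(int(id_), int(event), device.strip(), channel.strip(),
                  int(code), int(size), samples)

def read_signals(path: str) -> Iterator[Signal]:
    """Yield one Signal per non-empty line of a file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_line(line)
```

Comparing `len(signal.data)` with `signal.size` after parsing is a cheap sanity check that a line was read completely.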

BRAIN LOCATIONS:

Each EEG device captures the signals via different sensors, located in these areas of my brain. The color represents the device: MindWave, EPOC, Muse, Insight.

David Vivancos Brain 10/20 Locations

RELATED RESEARCH, CITATIONS & RESULTS by 3rd parties:

- Giving sense to EEG records (course IFT6390 "machine learning" by Pascal Vincent from MILA), by Amin Shahab, Marc Sayn-Urpar, René Doumbouya, Thomas George & Vincent Antaki

- Contribution aux décompositions rapides des matrices et tenseurs, Viet-Dung NGUYEN, THÈSE, UNIVERSITÉ D’ORLÉANS, Nov-16th-2016

- Repositórios de dados de pesquisa na Espanha: breve análise, Fernanda Passini MORENO, Universidade de Brasília, Jun-2017

- Fast learning of scale-free networks based on Cholesky factorization, Vladislav Jelisavčić, Ivan Stojkovic, Veljko Milutinovic, Zoran Obradovic, May-2018

- STRUCTURED LEARNING FROM BIG DATA BASED ON PROBABILISTIC GRAPHICAL MODELS, Vladislav Jelisavčić, UNIVERSITY OF BELGRADE SCHOOL OF ELECTRICAL ENGINEERING, May-2018

- Combination of Wavelet and MLP Neural Network for Emotion Recognition System, Phuong Huy Nguyen, Thai Nguyen University of Technology (TNUT) & Thu May Duong, Thi Mai Thuong Duong, Thu Huong Nguyen, University of Information and Communication Technology, Vietnam, Nov-2018

- A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction, Jordan J. Bird, Diego R. Faria, Luis J. Manso, Anikó Ekárt, and Christopher D. Buckingham, School of Engineering and Applied Science, Aston University, Birmingham, UK, Mar-2019

- Novel joint algorithm based on EEG in complex scenarios, Dongwei Chen, Weiqi Yang, Rui Miao, Lan Huang, Liu Zhang, Chunjian Deng & Na Han, School of Business, Beijing Institute of Technology, Zhuhai, China, Aug-2019

- HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography, Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen and Qiang Yan, Hong Kong University of Science and Technology, AI Lab, WeBank Co. Ltd., Sep-2019

- Universal EEG Encoder for Learning Diverse Intelligent Tasks, Baani Leen Kaur Jolly, Palash Aggrawal, Surabhi S Nath, Viresh Gupta, Manraj Singh Grover, Rajiv Ratn Shah, MIDAS Lab, IIIT-Delhi, Nov-2019

- Deep Learning based Recognition of Visual Digit Reading Using Frequency Band of EEG, Jaesik Kim, Jeongryeol Seo, and Kyungah Son, Ajou University, Republic of Korea, 2019

- Stanford CS230 - Group Project Final Report, Roman Pinchuk and Will Ross, 2020

- Mental State Recognition and Recommendation of Aids to Stabilize the Mind Using Wearable EEG, M.W.A. Aruni Wijesuriya, University of Colombo School of Computing, 2020

- Generating the image viewed from EEG signals, Gaffari ÇELİK, Muhammed Fatih, 2020

- EEG-Based Emotion Classification for Alzheimer’s Disease Patients Using Conventional Machine Learning and Recurrent Neural Network Models, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams, Aug-2020

- Analysis of Multi-class Classification of EEG Signals Using Deep Learning, Dipayan Das, Tamal Chowdhury and Umapada Pal, National Institute of Technology, Durgapur, India, Oct-2020

- Understanding Brain Dynamics for Color Perception Using Wearable EEG Headband, Jungryul Seo, Teemu H. Laine, Gyuhwan Oh, Kyung-Ah Sohn, Dec-2020

- Image-based Deep Learning Approach for EEG Signal Classification, Haneul Yoo, Jungyul Seo, Kyung-Ah Sohn, Ajou University, 2020

- Frequency Band and PCA Feature Comparison for EEG Signal Classification, Wayan Pio Pratama, Made Windu Antara Kesiman, Gede Aris Gunadi, Apr-2021

- Toward lightweight fusion of AI logic and EEG sensors to enable ultra edge-based EEG analytics on IoT devices, Tazrin Tahrat, May-2021

- Deep Learning in EEG: Advance of the Last Ten-Year Critical Period, Shu Gong, Kaibo Xing, Andrzej Cichocki, Junhua Li, May-2021

- Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData, Nandini Kumari, Shamama Anwar, Vandana Bhattacharjee, Jun-2021

- Using Convolutional Neural Networks for EEG analysis, Arina Kazakova, Swarthmore College, Pennsylvania, United States, Jul-2021

- Emotional behavior analysis based on EEG signal processing using Machine Learning: A case study, Salim KLIBI, Makram Mestiri, Imed Riadh FARAH, National School of Computer Sciences, Tunis, Tunisia, Jul-2021

- Visual Brain Decoding for Short Duration EEG Signals, Rahul Mishra, Krishan Sharma, Arnav Bhavsar, Aug-2021

- Quality analysis for reliable complex multiclass neuroscience signal classification via electroencephalography, Ashutosh Shankhdhar, Pawan Kumar Verma, Prateek Agrawal, Vishu Madaan, Charu Gupta, Jan-2022

- A Combinational Deep Learning Approach to Visually Evoked EEG-Based Image Classification, Nandini Kumari, Shamama Anwar, Vandana Bhattacharjee, Jan-2022

- TOWARD RELIABLE SIGNALS DECODING FOR ELECTROENCEPHALOGRAM: A BENCHMARK STUDY TO EEGNEX, Xia Chen, Xiangbin Teng, Han Chen, Yafeng Pan & Philipp Geyer, Leibniz University Hannover, Germany, Department of Education and Psychology, Freie Universität Berlin, Germany & Department of Psychology and Behavioral Sciences, Zhejiang University, China, Jul-2022

- A Review on EEG Data Classification Methods for Brain–Computer Interface, Vaibhav Jadhav, Namita Tiwari and Meenu Chawla, Maulana Azad National Institute of Technology, Bhopal, India, Sep-2022

- A survey of electroencephalography open datasets and their applications in deep learning, Alberto Nogales, Álvaro García-Tejedor, Universidad Francisco de Vitoria, Sep-2022

- Fortifying Brain Signals for Robust Interpretation, Kanan Wahengbam, Kshetrimayum Linthoinganbi Devi, Aheibam Dinamani Singh, Indian Institute of Technology Guwahati, Guwahati, India, Nov-2022

- Visually evoked brain signals guided image regeneration using GAN variants, Nandini Kumari, Manipal Institute of Technology, Shamama Anwar, Vandana Bhattacharjee, Sudip Kumar Sahana, Birla Institute of Technology, Mar-2023

- EEG-based classification of imagined digits using a recurrent neural network, Nrushingh Charan Mahapatra and Prachet Bhuyan, Apr-2023

- Emotions Classification from EEG Waves Using Deep Learning, Vrachnaki Ioanna, University of Western Attica, 2023

- RECOGNITION OF HUMAN EMOTIONS BASED ON EEG BRAINWAVE SIGNALS USING MACHINE LEARNING TECHNIQUES-A COMPARATIVE STUDY, Saba Tahseen, Ajit Dantii, Christ University, Bengaluru, India, 2023

- Flexible In-Ga-Zn-N-O synaptic transistor for ultralow-power neuromorphic computing and EEG-based brain-computer interfaces, Shuangqing Fan, Enxiu Wu, Minghui Cao, Ting Xu, Tong Liu, Lijun Yang, Jie Su and Jing Liu, Jul-2023

- Decoding Thoughts with Deep Learning: EEG-Based Digit Detection using CNNs, Diganta Kalita, Aug-2023

- Neural decoding of InterAxon Muse data using a recurrent convolutional neural network, Andrew Perez, Nov-2023

- Salient Arithmetic Data Extraction from Brain Activity via an Improved Deep Network, Nastaran Khaleghi, Shaghayegh Hashemi, Sevda Zafarmandi Ardabili, Sobhan Sheykhivand and Sebelan Danishvar, Dec-2023


Contact us if you need any more info.

Let's decode My Brain!
December 30th 2023
David Vivancos
vivancos@vivancos.com

This MindBigData The "MNIST" of Brain Digits is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/