MindBigData
The Visual "MNIST" of Brain Digits (2021)

Visual MNIST of Brain Digits

In 2014 I started capturing brain signals and released the first versions of the "MNIST" of brain digits, and in 2018 I released another open dataset with a subset of the "IMAGENET" of The Brain. Since many researchers asked about improvements to the "MNIST" of brain digits dataset, I decided to release a new one, this time with a subset of the real Yann LeCun "MNIST" digits being shown while the EEG signals are captured. Capture started on August 27th 2021. (Stay tuned for updates).

Version 0.17 (last update 01/12/2022) of the open database contains 72,000 brain signals of 2 seconds each, captured with the stimulus of seeing a real MNIST digit (from 0 to 9; 18,000 digits so far) and thinking about it, plus the same amount of signals with another 2 seconds of seeing a black screen, shown in between the digits. All signals come from a single test subject, David Vivancos, in a controlled still experiment to reduce EMG noise and avoid blinks.

All the signals have been captured using commercial (not medical grade) EEG headsets such as the Interaxon Muse 2 (new headsets will be added), covering so far a total of 4 brain (10/20) locations, plus other biosignals.

Files available for download:

DataBase     File                                       Zip size                    File size                      Date        MNIST Digits  Version
Muse2-v0.17  MindBigDataVisualMnist2021-Muse2v0.17.zip  709 MB (743,459,783 bytes)  2.37 GB (2,551,653,293 bytes)  01/12/2022  18,000        Beta 0.17
Muse2-v0.14  MindBigDataVisualMnist2021-Muse2v0.14.zip  592 MB (621,214,944 bytes)  1.97 GB (2,126,055,435 bytes)  12/16/2021  15,000        Beta 0.14
Muse2-v0.09  MindBigDataVisualMnist2021-Muse2v0.09.zip  394 MB (413,159,565 bytes)  1.31 GB (1,417,356,709 bytes)  11/05/2021  10,000        Beta 0.09
Muse2-v0.04  MindBigDataVisualMnist2021-Muse2v0.04.zip  195 MB (205,309,158 bytes)  677 MB (710,554,967 bytes)     09/17/2021  5,000         Beta 0.04
Muse2-v0.01  MindBigDataVisualMnist2021-Muse2v0.01.zip  77.4 MB (81,200,465 bytes)  271 MB (284,676,053 bytes)     08/27/2021  2,000         Beta 0.01

We built our own tools to capture the signals, but there is no post-processing on our side*, so they come raw, as they are read from each EEG device. In total so far there are 73,728,000 EEG data points, plus 55 million PPG data points, 55 million accelerometer data points and 55 million gyroscope data points.
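The totals above can be reproduced from the figures given elsewhere in this document. The following is a minimal sketch, assuming 18,000 digits in v0.17, one black-screen segment per digit, and 512 samples per channel per segment; the constant and function names are illustrative only.

```python
# Illustrative arithmetic check of the v0.17 data point totals.
DIGITS = 18_000            # MNIST digits shown so far (v0.17)
SEGMENTS_PER_DIGIT = 2     # one digit segment plus one black-screen segment
SAMPLES_PER_SIGNAL = 512   # 2 seconds at 256 Hz
EEG_CHANNELS = 4           # TP9, AF7, AF8, TP10
PPG_CHANNELS = 3           # ambient, infrared, red
ACC_AXES = 3               # X, Y, Z
GYRO_AXES = 3              # X, Y, Z


def data_points(channels: int) -> int:
    """Total samples for a sensor with the given number of channels."""
    return DIGITS * SEGMENTS_PER_DIGIT * channels * SAMPLES_PER_SIGNAL


print(f"EEG:  {data_points(EEG_CHANNELS):,}")   # EEG:  73,728,000
print(f"PPG:  {data_points(PPG_CHANNELS):,}")   # PPG:  55,296,000
print(f"Acc:  {data_points(ACC_AXES):,}")       # Acc:  55,296,000
print(f"Gyro: {data_points(GYRO_AXES):,}")      # Gyro: 55,296,000
```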

Feel free to test any machine learning, deep learning or other algorithm you think could fit. We only ask that you acknowledge the source, and please let us know about your performance!

*For the Muse2 device we chose to include the PPG (photoplethysmography) blood volume changes sensor data too, and XYZ Accelerometer & Gyroscope info, all captured at the same starting timestamp but interpolated to 256 Hz so that all the signals have the same number of samples. The source rates were 64 Hz for PPG and 52 Hz for Acc & Gyro.
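To illustrate what this upsampling implies, here is a minimal sketch that stretches a 2-second signal onto 512 samples. It assumes plain linear interpolation with the first and last samples kept fixed; the interpolation method actually used by the capture tools is not stated here, and the function name resample_linear is hypothetical.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], n_out: int) -> List[float]:
    """Linearly interpolate `samples` onto `n_out` evenly spaced points.

    Assumption: the first and last input samples map to the first and
    last output samples. The real capture tools may do this differently.
    """
    n_in = len(samples)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least 2 input and 2 output samples")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)   # fractional index into the input
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# 2 seconds of PPG at 64 Hz is 128 samples; Acc/Gyro at 52 Hz is 104 samples.
ppg_raw = [float(i) for i in range(128)]
acc_raw = [float(i) for i in range(104)]
print(len(resample_linear(ppg_raw, 512)))   # 512
print(len(resample_linear(acc_raw, 512)))   # 512
```

Because the stored PPG, accelerometer and gyroscope values are already interpolated, they carry no more information than the 64 Hz and 52 Hz sources, which may matter when choosing filters or features.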

In this new dataset other bio-signals have been included beyond EEG, to foster the use of multimodal data in training algorithms, since it could help different lines of research.

January 2022 Update: Due to EEG signal noise detected in some channels of the Muse2 recordings, two subsets have been created that keep only the best signals, still in raw format: "Cut2" with 2 channels (TP9 & TP10) and "Cut3" with 3 channels (TP9, AF7 & TP10).

DataBase         File                                           Zip size                    File size                   Date        MNIST Digits  Version
Muse2-v0.16Cut2  MindBigDataVisualMnist2021-Muse2v0.16Cut2.zip  189 MB (198,358,166 bytes)  659 MB (691,242,736 bytes)  01/09/2022  11,387        Beta 0.16Cut2
Muse2-v0.16Cut3  MindBigDataVisualMnist2021-Muse2v0.16Cut3.zip  20.9 MB (21,917,547 bytes)  74.8 MB (78,476,950 bytes)  01/09/2022  1,184         Beta 0.16Cut3

The original Muse2 datasets with the 4 EEG channels above can still be used in many cases with further preprocessing.

New EEG Headsets will be added through 2022.

FILE FORMAT:

The data is stored in a very simple text format (csv like, all comma separated) including:

[dataset]: a text pointing to the original Yann LeCun MNIST source type, can be "TRAIN" or "TEST", related to the 60,000 train digits and 10,000 test digits.

[origin]: 1 integer, used to reference the location of the original digit in the Yann LeCun MNIST source data files, from 0 to 59,999 for train and from 0 to 9,999 for test, or -1 to indicate a black signal (meaning not from the original MNIST datasets)

[digit_event]: 1 integer with the original MNIST label of the image from 0 to 9 or -1 to indicate black signal (no digit shown)

[original_png]: 784 integers (comma separated), with the original pixel intensities from the Yann LeCun MNIST source png files shown; each pixel can have a value from 0 to 255 (for black signals all will be 0s). 784 comes from 28x28, since it is a single-channel square image, flattened

[timestamp]: 1 Unix-like timestamp for the initial time of capture of the signals for this digit

[EEGdata]:
For Muse2
  512 floating point (comma separated) EEG - TP9 channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) EEG - AF7 channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) EEG - AF8 channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) EEG - TP10 channel raw signal (2 secs at 256 Hz)
For Muse2 Cut2 (only TP9 & TP10)
For Muse2 Cut3 (only TP9, AF7 & TP10)

[PPGdata]:
For Muse2 (only)
  512 floating point (comma separated) PPG1 ambient channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) PPG2 infrared channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) PPG3 red channel raw signal (2 secs at 256 Hz)

[Accdata]:
For Muse2
  512 floating point (comma separated) Accelerometer X channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) Accelerometer Y channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) Accelerometer Z channel raw signal (2 secs at 256 Hz)

[Gyrodata]:
For Muse2
  512 floating point (comma separated) Gyroscope X channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) Gyroscope Y channel raw signal (2 secs at 256 Hz), followed by
  512 floating point (comma separated) Gyroscope Z channel raw signal (2 secs at 256 Hz)

For Muse2 data, in total there are 7,444 comma-separated values per row (6,932 for Muse2 Cut3 & 6,420 for Muse2 Cut2)

There are no headers in the files
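The row layout described above can be read with a few lines of code. The following is a minimal sketch, not an official loader: the names parse_row, read_rows and expected_length and the short channel labels are illustrative. It assumes the Cut2 and Cut3 files keep the remaining EEG channels in the order listed above, and that the timestamp parses as a number.

```python
import csv
from typing import Dict, Iterator, List

SAMPLES = 512          # 2 seconds at 256 Hz
PIXELS = 28 * 28       # flattened MNIST image

# EEG channel order per file variant (Cut2/Cut3 order is an assumption).
EEG_CHANNELS = {
    "Muse2": ["TP9", "AF7", "AF8", "TP10"],
    "Cut3": ["TP9", "AF7", "TP10"],
    "Cut2": ["TP9", "TP10"],
}
# Non-EEG signals, in the order they follow the EEG block (labels are ours).
OTHER_CHANNELS = ["PPG1", "PPG2", "PPG3",
                  "AccX", "AccY", "AccZ",
                  "GyroX", "GyroY", "GyroZ"]


def expected_length(variant: str) -> int:
    """Number of comma-separated values per row for a file variant."""
    n_signals = len(EEG_CHANNELS[variant]) + len(OTHER_CHANNELS)
    return 4 + PIXELS + SAMPLES * n_signals


def parse_row(fields: List[str], variant: str = "Muse2") -> Dict:
    """Split one headerless row into named fields and per-channel signals."""
    if len(fields) != expected_length(variant):
        raise ValueError(f"expected {expected_length(variant)} values, "
                         f"got {len(fields)}")
    record = {
        "dataset": fields[0].strip(),               # "TRAIN" or "TEST"
        "origin": int(fields[1]),                   # MNIST index, or -1
        "digit_event": int(fields[2]),              # 0 to 9, or -1 for black
        "original_png": [int(v) for v in fields[3:3 + PIXELS]],
        "timestamp": float(fields[3 + PIXELS]),
        "signals": {},
    }
    pos = 4 + PIXELS
    for name in EEG_CHANNELS[variant] + OTHER_CHANNELS:
        record["signals"][name] = [float(v) for v in fields[pos:pos + SAMPLES]]
        pos += SAMPLES
    return record


def read_rows(path: str, variant: str = "Muse2") -> Iterator[Dict]:
    """Yield parsed records from an extracted data file (no header row)."""
    with open(path, newline="") as handle:
        for fields in csv.reader(handle):
            if fields:
                yield parse_row(fields, variant)


print(expected_length("Muse2"), expected_length("Cut3"), expected_length("Cut2"))
# 7444 6932 6420
```

Reading row by row keeps memory use low, which may help given that the largest extracted file is over 2 GB. Rows whose digit_event is -1 are the black-screen segments.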

 

RELATED RESEARCH, CITATIONS & RESULTS by 3rd parties (Using the previous "MNIST" of brain digits dataset):

- Giving sense to EEG records (course IFT6390 "machine learning" by Pascal Vincent from MILA) by Amin Shahab, Marc Sayn-Urpar, René Doumbouya, Thomas George & Vincent Antaki.

- Contribution aux décompositions rapides des matrices et tenseurs, Viet-Dung NGUYEN, THÈSE UNIVERSITÉ D’ORLÉANS, Nov-16th-2016

- Fast learning of scale-free networks based on Cholesky factorization, Vladislav Jelisavčić, Ivan Stojkovic, Veljko Milutinovic, Zoran Obradovic, May-2018

- STRUCTURED LEARNING FROM BIG DATA BASED ON PROBABILISTIC GRAPHICAL MODELS, Vladislav Jelisavčić, UNIVERSITY OF BELGRADE SCHOOL OF ELECTRICAL ENGINEERING, May-2018

- Combination of Wavelet and MLP Neural Network for Emotion Recognition System, Phuong Huy Nguyen, Thai Nguyen University of Technology (TNUT) & Thu May Duong, Thi Mai Thuong Duong, Thu Huong Nguyen, University of Information and Communication Technology, Vietnam, Nov-2018

- A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction, Jordan J. Bird, Diego R. Faria, Luis J. Manso, Anikó Ekárt, and Christopher D. Buckingham, School of Engineering and Applied Science, Aston University, Birmingham, UK, Mar-2019

- Novel joint algorithm based on EEG in complex scenarios, Dongwei Chen, Weiqi Yang, Rui Miao, Lan Huang, Liu Zhang, Chunjian Deng & Na Han, School of Business, Beijing Institute of Technology, Zhuhai, China, Aug-2019

- HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography, Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen and Qiang Yan, Hong Kong University of Science and Technology, AI Lab, WeBank Co. Ltd., Sep-2019

- Universal EEG Encoder for Learning Diverse Intelligent Tasks, Baani Leen Kaur Jolly, Palash Aggrawal, Surabhi S Nath, Viresh Gupta, Manraj Singh Grover, Rajiv Ratn Shah, MIDAS Lab, IIIT-Delhi, Nov-2019

- Stanford CS230 - Group Project Final Report, Roman Pinchuk and Will Ross, 2020

- Mental State Recognition and Recommendation of Aids to Stabilize the Mind Using Wearable EEG, M.W.A. Aruni Wijesuriya, University of Colombo School of Computing, 2020

- Generating the image viewed from EEG signals, Gaffari ÇELİK, Muhammed Fatih, 2020

- EEG-Based Emotion Classification for Alzheimer’s Disease Patients Using Conventional Machine Learning and Recurrent Neural Network Models, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams, Aug-2020

- Understanding Brain Dynamics for Color Perception Using Wearable EEG Headband, Jungryul Seo, Teemu H. Laine, Gyuhwan Oh, Kyung-Ah Sohn, Dec-2020

- Frequency Band and PCA Feature Comparison for EEG Signal Classification, Wayan Pio Pratama, Made Windu Antara Kesiman, Gede Aris Gunadi, Apr-2021

- Toward lightweight fusion of AI logic and EEG sensors to enable ultra edge-based EEG analytics on IoT devices, Tazrin Tahrat, May-2021

- Deep Learning in EEG: Advance of the Last Ten-Year Critical Period, Shu Gong, Kaibo Xing, Andrzej Cichocki, Junhua Li, May-2021

- Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData, Nandini Kumari, Shamama Anwar, Vandana Bhattacharjee, Jun-2021

- Using Convolutional Neural Networks for EEG analysis, Arina Kazakova, Swarthmore College, Pennsylvania, United States, Jul-2021

- Visual Brain Decoding for Short Duration EEG Signals, Rahul Mishra, Krishan Sharma, Arnav Bhavsar, Aug-2021

- Quality analysis for reliable complex multiclass neuroscience signal classification via electroencephalography, Ashutosh Shankhdhar, Pawan Kumar Verma, Prateek Agrawal, Vishu Madaan, Charu Gupta, Jan-2022

 

RELATED RESEARCH, CITATIONS & RESULTS by 3rd parties (Using the previous "IMAGENET" of the brain dataset):

- Inferencia de la Topologia de Grafs, Tura Gimeno Sabater, UPC, 2020

- Understanding Brain Dynamics for Color Perception using Wearable EEG headband, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams, York University, Toronto, Canada, 2020

- Developing a Data Visualization Tool for the Evaluation Process of a Graphical User Authentication System, Loizos Siakallis, UNIVERSITY OF CYPRUS USA, 2020

- Object classification from randomized EEG trials, Hamad Ahmed, Ronnie B Wilbur, Hari M Bharadwaj and Jeffrey Mark, Purdue University, USA, 2020

- Evaluating the ML Models for MindBigData (IMAGENET) of the Brain Signals, Priyanka Jain, Mayuresh Panchpor, Saumya Kushwaha and Naveen Kumar Jain, Artificial Intelligence Group, CDAC, Delhi, 2022


Contact us if you need any more info.

Let's decode My Brain!
August 15th 2022
David Vivancos
vivancos@vivancos.com

This MindBigData The Visual "MNIST" of Brain Digits is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/