MindBigData
The Visual "MNIST" of Brain Digits (2021)


In 2014 I started capturing brain signals and released the first versions of the "MNIST" of Brain Digits, and in 2018 I released another open dataset with a subset of the "IMAGENET" of the Brain. Since many researchers asked about improvements to the "MNIST" of brain digits dataset, I decided to release a new one, but this time with the real Yann LeCun 70,000 "MNIST" digits being shown while the EEG signals are captured. Instead of waiting to release it when all the signals are captured, the releases will be gradual and incremental, starting on August 27th 2021. (Stay tuned for updates.)

Version 0.03 (last updated 09/09/2021) of the open database contains 16,000 brain signals of 2 seconds each, captured with the stimulus of seeing a real MNIST digit (from 0 to 9), 4,000 digits so far, and thinking about it, plus the same amount of signals for another 2 seconds of seeing a black screen, shown in between the digits. All come from a single test subject, David Vivancos, in a controlled, still experiment to reduce noise from EMG and to avoid blinks.

All the signals have been captured using commercial (not medical-grade) EEG devices, such as the Interaxon Muse 2 (new headsets will be added), covering so far a total of 4 brain (10/20) locations, plus other bio-signals.

Files available for download:

DataBase: Muse2-v0.03
File: MindBigDataVisualMnist2021-Muse2v0.03.zip
Zip size: 156 MB (163,888,563 bytes)
File size: 542 MB (569,085,630 bytes)
Date: 09/09/2021
MNIST Digits: 4,000
Version: Beta 0.03

We built our own tools to capture the signals, but there is no post-processing on our side*, so they come raw as they are read from each EEG device: in total so far 16,384,000 EEG data points, plus 12,288,000 PPG data points, 12,288,000 accelerometer data points and 12,288,000 gyroscope data points.
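The totals above are consistent with 4,000 digit captures plus 4,000 black-screen captures, each 2 seconds at 256 Hz per channel. A quick sanity check (a sketch, assuming that breakdown):

```python
# Sanity check: dataset totals = captures x channels x samples per channel.
captures = 4000 + 4000          # digit events + black-screen events
samples = 2 * 256               # 2 seconds at 256 Hz = 512 samples

eeg_channels = 4                # TP9, AF7, AF8, TP10
ppg_channels = 3                # ambient, infrared, red
acc_channels = 3                # X, Y, Z
gyro_channels = 3               # X, Y, Z

assert captures * eeg_channels * samples == 16_384_000
assert captures * ppg_channels * samples == 12_288_000
assert captures * acc_channels * samples == 12_288_000
assert captures * gyro_channels * samples == 12_288_000
print("totals match")
```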

Feel free to test any machine learning, deep learning or whatever algorithm you think could fit; we only ask that you acknowledge the source, and please let us know about your performance!

*For the Muse2 device we chose to also include the PPG (photoplethysmography) blood volume changes sensor data, plus XYZ accelerometer & gyroscope info, all captured from the same starting timestamp but interpolated to 256 Hz so that all the signals have the same number of samples; the source rates were 64 Hz for PPG and 52 Hz for Acc & Gyro.
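The upsampling described above can be sketched with simple linear interpolation (a minimal illustration using NumPy; the actual capture tools may use a different interpolation method):

```python
import numpy as np

def upsample_to_256hz(signal, source_hz, seconds=2):
    """Linearly interpolate a raw signal to 256 Hz so every channel
    ends up with the same number of samples (512 for a 2-second capture)."""
    t_source = np.arange(len(signal)) / source_hz   # original sample times
    t_target = np.arange(256 * seconds) / 256.0     # 256 Hz sample times
    return np.interp(t_target, t_source, signal)

# A 2-second PPG capture at 64 Hz has 128 samples...
ppg_64hz = np.random.randn(128)
assert len(upsample_to_256hz(ppg_64hz, source_hz=64)) == 512

# ...and a 2-second accelerometer/gyroscope capture at 52 Hz has 104 samples.
gyro_52hz = np.random.randn(104)
assert len(upsample_to_256hz(gyro_52hz, source_hz=52)) == 512
```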

In this new dataset other bio-signals have been included beyond EEG, to foster the use of multimodal data in training algorithms, since it could help different lines of research.

FILE FORMAT:

The data is stored in a very simple text format (csv like, all comma separated) including:

[dataset]: a text pointing to the original Yann LeCun MNIST source type; can be "TRAIN" or "TEST", referring to the 60,000 train digits and the 10,000 test digits.

[origin]: 1 integer, referencing the location of the original digit in the Yann LeCun MNIST source data files: 0-59,999 for train, 0-9,999 for test, or -1 to indicate a black signal (meaning not from the original MNIST datasets).

[digit_event]: 1 integer with the original MNIST label of the image, from 0 to 9, or -1 to indicate a black signal (no digit shown).

[original_png]: 784 integers (comma separated) with the original pixel intensities from the Yann LeCun MNIST source png files shown; each pixel can have a value from 0 to 255 (for a black signal all will be 0s). 784 comes from 28x28, since it is a single-channel square image, flattened.

[timestamp]: 1 Unix-like timestamp for the initial time of capture of the signals for this digit capture.

[EEGdata]:
For Muse2  
  512 floating point values (comma separated) EEG - TP9  channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) EEG - AF7  channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) EEG - AF8  channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) EEG - TP10 channel raw signal (2 secs at 256 Hz)

[PPGdata]:
For Muse2 (only)  
  512 floating point values (comma separated) PPG1 ambient  channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) PPG2 infrared channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) PPG3 red      channel raw signal (2 secs at 256 Hz)

[Accdata]:
For Muse2
  512 floating point values (comma separated) Accelerometer X channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) Accelerometer Y channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) Accelerometer Z channel raw signal (2 secs at 256 Hz)

[Gyrodata]:
For Muse2
  512 floating point values (comma separated) Gyroscope X channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) Gyroscope Y channel raw signal (2 secs at 256 Hz), followed by
  512 floating point values (comma separated) Gyroscope Z channel raw signal (2 secs at 256 Hz)

For Muse2 data, there are in total 7,444 comma-separated values per row (5 metadata/label fields, 784 pixels and 13 channels of 512 samples each).

There are no headers in the files.
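Based on the field layout above, a single row could be parsed with a minimal sketch like the following (the field offsets follow the order described; names and structure of the returned dict are our own choice, not part of the file format):

```python
# Minimal row parser for a Muse2 Visual MNIST row, following the layout above:
# dataset, origin, digit_event, 784 pixels, timestamp, then 13 x 512 samples.
def parse_row(line):
    values = line.rstrip("\n").split(",")
    assert len(values) == 7444  # 3 + 784 + 1 header fields + 13 channels x 512

    row = {
        "dataset": values[0],                             # "TRAIN" or "TEST"
        "origin": int(values[1]),                         # MNIST index, -1 for black
        "digit_event": int(values[2]),                    # 0-9, or -1 for black screen
        "original_png": [int(v) for v in values[3:787]],  # 28x28 pixels, flattened
        "timestamp": values[787],                         # kept raw; Unix-like timestamp
    }

    # 13 signal channels of 512 samples each: 4 EEG + 3 PPG + 3 Acc + 3 Gyro
    channels = ["TP9", "AF7", "AF8", "TP10",
                "PPG1", "PPG2", "PPG3",
                "AccX", "AccY", "AccZ",
                "GyroX", "GyroY", "GyroZ"]
    offset = 788
    for name in channels:
        row[name] = [float(v) for v in values[offset:offset + 512]]
        offset += 512
    return row
```

The 784 pixel values can then be reshaped to 28x28 to recover the image that was shown.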

 

RELATED RESEARCH, CITATIONS & RESULTS by 3rd parties (Using the previous "MNIST" of brain digits dataset):

- Giving sense to EEG records (course IFT6390 "Machine Learning" by Pascal Vincent, MILA) by Amin Shahab, Marc Sayn-Urpar, René Doumbouya, Thomas George & Vincent Antaki.

- Contribution aux décompositions rapides des matrices et tenseurs, Viet-Dung Nguyen, PhD thesis, Université d'Orléans  Nov-16th-2016

- Fast learning of scale-free networks based on Cholesky factorization, Vladislav Jelisavčić, Ivan Stojkovic, Veljko Milutinovic, Zoran Obradovic  May-2018

- Structured Learning from Big Data Based on Probabilistic Graphical Models, Vladislav Jelisavčić, University of Belgrade School of Electrical Engineering  May-2018

- Combination of Wavelet and MLP Neural Network for Emotion Recognition System, Phuong Huy Nguyen (Thai Nguyen University of Technology, TNUT) & Thu May Duong, Thi Mai Thuong Duong, Thu Huong Nguyen (University of Information and Communication Technology, Vietnam)  Nov-2018

- A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction, Jordan J. Bird , Diego R. Faria, Luis J. Manso, Anikó Ekárt, and Christopher D. Buckingham, School of Engineering and Applied Science, Aston University, Birmingham, UK   Mar-2019

- Novel joint algorithm based on EEG in complex scenarios, Dongwei Chen, Weiqi Yang, Rui Miao, Lan Huang, Liu Zhang, Chunjian Deng & Na Han School of Business, Beijing Institute of Technology, Zhuhai, China   Aug-2019

- HHHFL: Hierarchical Heterogeneous Horizontal Federated Learning for Electroencephalography, Dashan Gao, Ce Ju, Xiguang Wei, Yang Liu, Tianjian Chen and Qiang Yan, Hong Kong University of Science and Technology, AI Lab, WeBank Co. Ltd.  Sep-2019

- Universal EEG Encoder for Learning Diverse Intelligent Tasks, Baani Leen Kaur Jolly, Palash Aggrawal, Surabhi S Nath, Viresh Gupta, Manraj Singh Grover, Rajiv Ratn Shah, MIDAS Lab, IIIT-Delhi  Nov-2019

- Stanford CS230 - Group Project Final Report, Roman Pinchuk and Will Ross  2020

- Mental State Recognition and Recommendation of Aids to Stabilize the Mind Using Wearable EEG, M.W.A. Aruni Wijesuriya, University of Colombo School of Computing  2020

- Generating the image viewed from EEG signals, Gaffari ÇELİK, Muhammed Fatih  2020

- EEG-Based Emotion Classification for Alzheimer's Disease Patients Using Conventional Machine Learning and Recurrent Neural Network Models, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams  Aug-2020

- Understanding Brain Dynamics for Color Perception Using Wearable EEG Headband, Jungryul Seo, Teemu H. Laine, Gyuhwan Oh, Kyung-Ah Sohn  Dec-2020

- Toward lightweight fusion of AI logic and EEG sensors to enable ultra edge-based EEG analytics on IoT devices, Tazrin Tahrat  May-2021

- Deep Learning in EEG: Advance of the Last Ten-Year Critical Period, Shu Gong, Kaibo Xing, Andrzej Cichocki, Junhua Li  May-2021

- Convolutional Neural Network-Based Visually Evoked EEG Classification Model on MindBigData, Nandini Kumari, Shamama Anwar, Vandana Bhattacharjee  Jun-2021

 

RELATED RESEARCH, CITATIONS & RESULTS by 3rd parties (Using the previous "IMAGENET" of the brain dataset):

- Inferencia de la Topologia de Grafs, Tura Gimeno Sabater, UPC  2020

- Understanding Brain Dynamics for Color Perception using Wearable EEG headband, Mahima Chaudhary, Sumona Mukhopadhyay, Marin Litoiu, Lauren E Sergio, Meaghan S Adams, York University, Toronto, Canada  2020

- Developing a Data Visualization Tool for the Evaluation Process of a Graphical User Authentication System, Loizos Siakallis, University of Cyprus  2020

- Object classification from randomized EEG trials, Hamad Ahmed, Ronnie B Wilbur, Hari M Bharadwaj and Jeffrey Mark, Purdue University, USA  2020


Contact us if you need any more info.

Let's decode My Brain!
September 9th 2021
David Vivancos
vivancos@vivancos.com

This MindBigData The Visual "MNIST" of Brain Digits is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/