As part of this project we collected and preprocessed several datasets, which we used to develop and test various classification methods. Below you will find samples of the collected datasets. We are happy to share the full sets; please contact us by email to arrange a way of sharing the data.

The prepared datasets are as follows:
  • letters.zip: handwritten lowercase Latin alphabet letters: a dataset comprising 32,220 images, ca. 1,240 letters in each class. This dataset was created with the help of 16 of our students, each writing about 80 copies of each letter. Samples are organised in folders, with the folder name being the class label. Images are black and white or grayscale, 24 by 24 pixels.
  • music.zip: printed music notation: a dataset of 27,326 images in 20 classes. The symbols were cut from actual music scores, so they are contaminated with the stave in the background. The scores were written by different composers and cover different music genres. The dataset is highly imbalanced, both with respect to the number of samples found in the scores and with respect to their properties, such as shape and size. The dataset contains black and white or grayscale images; image size varies because the patterns are not scaled. Samples are organised in folders, with the folder name indicating the class name. Class names use the underscore character.
  • music_foreign.zip: foreign patterns discovered in printed music notation. The dataset contains 710 samples originating from the same music scores as music.zip. Foreign samples do not belong to any native class. We used this dataset to test the proposed rejection mechanisms.
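Because letters.zip and music.zip store samples in one folder per class, an extracted archive can be indexed with a few lines of standard-library Python. The following is only a minimal sketch under stated assumptions: the function names list_samples and class_counts are hypothetical, and the set of image file extensions is a guess, since the file format is not stated above.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; the actual file format of the archives is not stated.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def list_samples(root):
    """Return (path, label) pairs, taking each sub-folder name as the class label."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, class_dir.name))
    return samples


def class_counts(samples):
    """Count samples per class, e.g. to inspect class imbalance."""
    return Counter(label for _, label in samples)
```

Calling list_samples on the directory an archive was extracted to (the location is up to you) and passing the result to class_counts gives a per-class tally, which is a quick way to see the imbalance described for music.zip. Since music_foreign.zip is not described as being organised in class folders, this sketch may not apply to it.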