DAPlankton: a benchmark dataset for fine-grained domain adaptation
Description
The DAPlankton dataset consists of over 110k expert-labeled plankton images. The data is divided into two subsets: DAPlankton_LAB and DAPlankton_SEA. DAPlankton_LAB consists of images captured from multiple mono-specific phytoplankton cultures, which were analysed using three different imaging instruments: the Imaging FlowCytobot (IFCB), the CytoSense (CS) flow cytometer, and the FlowCam (FC) imaging microscope. Each instrument produces cropped images containing a single plankton particle. An expert further verified the class of each image, ensuring that there was no cross-contamination between different cultures. This process resulted in a balanced dataset with negligible label uncertainty. DAPlankton_SEA consists of images captured from water samples collected from the Baltic Sea using two different imaging instruments: IFCB and CS. Each image was manually labeled by an expert. DAPlankton_SEA provides a realistic and more challenging dataset with a large class imbalance and natural intra-class variance.
If you use this dataset in your research, we kindly ask that you reference the following paper:
D. Batrakhanov, T. Eerola, K. Kraft, L. Haraguchi, L. Lensu, S. Suikkanen, M.T. Camarena-Gomez, J. Seppälä, H. Kälviäinen, DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation, arXiv, 2024.
**Data composition**
DAPlankton_LAB contains, in total, 47 471 images from 15 phytoplankton species and 3 different domains (imaging instruments). The number of images per class-domain combination varies between 286 and 2618. The list of classes (species) is as follows:
- Aphanizomenon flosaquae
- Apocalathium malmogiense
- Chrysotila roscoffensis
- Diatoma tenuis
- Gymnodinium corollarium
- Kryptoperidinium foliaceum
- Levanderina fissa
- Melosira arctica
- Nephroselmis pyriformis
- Peridiniella catenata
- Pseudopedinella sp.
- Rhinomonas nottbecki
- Rhodomonas salina
- Teleaulax acuta
- Tetraselmis sp.
DAPlankton_SEA contains, in total, 64 453 images from 31 plankton classes and 2 different domains. The number of images per class-domain combination varies between 5 and 12 280. The list of classes is as follows:
- Aphanizomenon flosaquae
- Centrales sp
- Chaetoceros sp
- Chaetoceros sp (single)
- Chlorococcales
- Chroococcales
- Ciliata
- Cryptomonadales
- Cryptophyceae Teleaulax
- Cyclotella choctawhatcheeana
- Dinophyceae
- Dinophysis acuminata
- Dolichospermum Anabaenopsis
- Dolichospermum Anabaenopsis (coiled)
- Euglenophyceae
- Eutreptiella sp
- Gymnodiniales
- Gymnodinium like
- Heterocapsa rotundata
- Heterocapsa triquetra
- Heterocyte
- Katablepharis remigera
- Mesodinium rubrum
- Monoraphidium contortum
- Nitzschia paleacea
- Nodularia spumigena
- Oocystis sp
- Pseudopedinella sp.
- Pyramimonas sp.
- Skeletonema marinoi
- Snowella Woronichinia
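The figures above can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution: the dictionary layout and the helper names `adaptation_tasks` and `imbalance_ratio` are hypothetical, and treating every ordered pair of instruments as one source-to-target adaptation task is an assumption about how the benchmark might be used, not a protocol stated in this description.

```python
from itertools import permutations

# Figures copied from the description above; the structure itself is hypothetical.
SUBSETS = {
    "DAPlankton_LAB": {
        "domains": ["IFCB", "CS", "FC"],
        "classes": 15,
        "images": 47_471,
        "min_per_combo": 286,     # smallest class-domain combination
        "max_per_combo": 2_618,   # largest class-domain combination
    },
    "DAPlankton_SEA": {
        "domains": ["IFCB", "CS"],
        "classes": 31,
        "images": 64_453,
        "min_per_combo": 5,
        "max_per_combo": 12_280,
    },
}


def adaptation_tasks(domains):
    """Return every ordered (source, target) pair of distinct domains."""
    return list(permutations(domains, 2))


def imbalance_ratio(subset):
    """Ratio between the largest and smallest class-domain combination."""
    return subset["max_per_combo"] / subset["min_per_combo"]


# Total number of images across both subsets.
total_images = sum(s["images"] for s in SUBSETS.values())

# Candidate source -> target tasks per subset (assumption: all ordered pairs).
tasks = {name: adaptation_tasks(s["domains"]) for name, s in SUBSETS.items()}

for name, subset in SUBSETS.items():
    print(name, len(tasks[name]), "tasks,",
          f"max/min images per combination: {imbalance_ratio(subset):.2f}")
print("total images:", total_images)
```

Under these assumptions, the sum of the two subset sizes matches the "over 110k" figure in the description, and the max/min ratio gives a rough indication of how much more imbalanced DAPlankton_SEA is than DAPlankton_LAB.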
Year of publication
2024
Authors
Kaisa Kraft - Creator, Contributor
Lumi Haraguchi - Creator, Contributor
Jukka Seppälä - Contributor
Sanna Suikkanen - Contributor
Instituto Español de Oceanografia
Maria Teresa Camarena-Gomez - Creator, Contributor
Daniel Batrakhanov - Creator, Contributor
Heikki Kälviäinen - Contributor
Lasse Lensu - Contributor
Other information
Fields of science
Computer and information sciences; Environmental sciences
Open access
Open
License
Creative Commons Attribution 4.0 International (CC BY 4.0)