Despite the increasing amounts of underwater acoustic data that become available every day, only a small proportion is actually labelled. MERIDIAN is developing a crowdsourcing platform for labeling datasets, which can then be used to train machine learning models. The platform will allow users to create labeling sessions for their datasets and invite participants to label. With a flexible structure, it is possible to choose the public that will have access to a particular dataset, from a select group of experts to anyone with internet access.

Screenshot of crowd-sourcing application

It can even be limited to a classroom, in which case the platform can be turned into a teaching resource. We are still at an early stage of development, but the features that have been implemented so far were enough to set up a web app that was used as an educational activity for primary school students in Nova Scotia. In this case, no labels were collected as the data used was already labelled, so it worked as a fun activity to for students to learn more about marine bioacoustics and the problems caused by noise pollution. This activity will be presented at the 6th International Symposium in Education of CRIFPE in April under the title “A Big Data Educational Program for Mi’kmaq Children in Nova Scotia”.

As we continue to work on this tool, we hope it will support use cases at several levels, from expert bioacousticians who can use it as a convenient platform for collaboration across institutions, to citizen science initiatives that leverage the contributions of volunteers around the world, all contributing to an increased availability of labelled data to support the development of AI-assisted research.