Threatened Species News Dataset

Description

This data is part of the research article: Automated retrieval of information on threatened species from online sources using machine learning, Ritwik Kulkarni and Enrico Di Minin, 2021, Methods in Ecology and Evolution. Kindly cite this article for the dataset.

1. Considering limited conservation resources, gathering and analyzing information from digital data sources can help investigate the global biodiversity crisis in a cost-efficient manner. Development and application of methods for automated content analysis of digital data sources are especially important in the context of investigating human-nature interactions.

2. In this study, we introduce methods to automatically collect information on species threatened by wildlife trade from online news. An end-to-end pipeline is constructed that begins with searching for and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and proceeds with natural language processing and machine learning methods to filter and retain only relevant articles. Additional relevant information is then extracted from each article using a Named Entity Recognition model.

3. The data collected over a one-month period included 15,088 articles and focused on 585 species listed in Appendix I of CITES. The accuracy of the neural network in detecting relevant articles was 95.91%, while the Named Entity Recognition model helped extract information on prices, locations, and quantities of traded animals. A regularly updated database is generated by the system, which can be queried and analysed for various research purposes and to inform conservation decision-making.

4. The results demonstrate that natural language processing can be used efficiently to extract information from digital text content. The proposed methods can be applied to multiple digital data platforms at the same time and used to investigate human-nature interactions in conservation science and practice.
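The filter-then-extract flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a keyword match stands in for the trained neural network, regular expressions stand in for the Named Entity Recognition model, and location extraction is left out because it needs an actual NER model. All names here (Article, TRADE_TERMS, is_relevant, extract_entities, run_pipeline) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Assumption: a hand-picked keyword list stands in for the trained
# relevance classifier described in the article.
TRADE_TERMS = ("seized", "smuggled", "trafficking", "poached", "illegal trade")

# Assumption: simple regular expressions stand in for the NER model.
PRICE_RE = re.compile(r"(?:USD|\$|€|EUR)\s?\d[\d,]*(?:\.\d+)?")
QUANTITY_RE = re.compile(
    r"\b\d[\d,]*\s+(?:kg|kilograms|tusks|skins|animals|specimens)\b",
    re.IGNORECASE,
)


@dataclass
class Article:
    """One downloaded news article about a CITES Appendix I species."""
    species: str
    text: str
    prices: list = field(default_factory=list)
    quantities: list = field(default_factory=list)


def is_relevant(article):
    """Keep an article only if it mentions at least one trade-related term."""
    text = article.text.lower()
    return any(term in text for term in TRADE_TERMS)


def extract_entities(article):
    """Attach price and quantity mentions found in the article text."""
    article.prices = PRICE_RE.findall(article.text)
    article.quantities = [m.group(0) for m in QUANTITY_RE.finditer(article.text)]
    return article


def run_pipeline(articles):
    """Filter out irrelevant articles, then extract entities from the rest."""
    return [extract_entities(a) for a in articles if is_relevant(a)]


if __name__ == "__main__":
    sample = [
        Article("Panthera tigris", "Police seized 12 skins worth $4,500 at the border."),
        Article("Panthera tigris", "The zoo welcomed a tiger cub this spring."),
    ]
    for article in run_pipeline(sample):
        print(article.species, article.prices, article.quantities)
```

In this sketch the relevance filter runs before extraction, mirroring the order given in the abstract, so the extraction step only ever sees articles that passed the filter.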

Year of publication

2021

Type of data

Authors

Enrico Di Minin - Contributor, Creator, Curator, Rights holder

Ritwik Kulkarni - Contributor, Creator, Curator, Publisher, Rights holder

Project

Other information

Fields of science

Environmental sciences

Language

English

Open access

Open

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Keywords

conservation, Machine learning, Natural language processing, CITES, Online News, threatened species

