Data management planning in BiSC

In this post we want to explain in detail the data management process of the BiSC Project. Any data analysis project begins with data collection, followed by validation and cleaning. Once you are confident that the information is accurate, you can proceed with the statistical analysis and try to draw meaningful conclusions.

In the BiSC Project, data collection and processing is the longest and most important stage. As you already know, the project requires large-scale data collection from very diverse sources and on very diverse topics, over a time window that covers both pregnancy monitoring and postnatal follow-up. The data collected concerns not only the development of the baby but also other aspects, such as lifestyle, exposure to air pollution and noise, and the presence of green spaces in the city. Working with such diverse data makes the analysis more complex, which is why different groups of experts take part in reviewing the information (obstetricians, biologists, experts in geospatial systems, geneticists, psychologists, experts in physical activity, noise experts, etc.).

Good data quality is essential to ensure that the conclusions drawn from the analysis are accurate and reliable. This can be achieved by reviewing the sources the data comes from, applying statistical methods to examine the data, and checking that the data is complete, consistent, and coherent. Without proper validation, the study’s conclusions could be wrong, leading to poor decisions or wasted resources. That’s why we put a lot of time and effort into it!
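To give an idea of what such checks can look like in practice, here is a minimal Python sketch. It is only an illustration, not the actual BiSC validation pipeline: the field names (`participant_id`, `visit_date`, `gestational_week`) and the plausible range for gestational weeks are assumptions made for this example.

```python
from datetime import date

# Hypothetical field names; the real BiSC variables are not described in this post.
REQUIRED_FIELDS = ("participant_id", "visit_date", "gestational_week")


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Consistency: numeric values must fall in a plausible range
    # (0 to 45 weeks is an assumed range for this example).
    week = record.get("gestational_week")
    if isinstance(week, (int, float)) and not (0 <= week <= 45):
        problems.append(f"implausible gestational_week: {week}")

    # Coherence: a visit cannot be dated in the future.
    visit = record.get("visit_date")
    if isinstance(visit, date) and visit > date.today():
        problems.append("visit_date is in the future")

    return problems


def validate_all(records):
    """Map the index of each problematic record to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            report[index] = problems
    return report
```

Calling `validate_all` on a list of records returns only the records that need attention, so that a reviewer can focus on them instead of reading the whole dataset.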

In addition, because the BiSC Project collects a lot of personal data, protocols are needed to guarantee the privacy of the participants. These measures include anonymization or pseudonymization of the data, as well as physical and digital security measures to prevent data loss or leaks. This ensures that personal data is protected and is not shared without the participants' consent.
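As an illustration of what pseudonymization means, the following Python sketch replaces a participant identifier with a code derived from a secret key. This is one common technique (a keyed hash, HMAC-SHA256), shown under assumptions. It is not necessarily the method used in BiSC, and the field names `name` and `participant_id` are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Keep the first 16 hex characters to get a short, readable code.
    return digest.hexdigest()[:16]


def strip_direct_identifiers(record: dict, secret_key: bytes) -> dict:
    """Return a copy without the name, with the ID replaced by a pseudonym."""
    # "name" and "participant_id" are hypothetical field names for this example.
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in ("name", "participant_id")
    }
    cleaned["pseudo_id"] = pseudonymize(record["participant_id"], secret_key)
    return cleaned
```

The same identifier always produces the same pseudonym, so records from different visits can still be linked. Without the secret key, which would be stored separately from the data, the pseudonym cannot be traced back to the participant by simply hashing candidate identifiers.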

When collaborating research centers or groups need to use data collected in the BiSC Project for an epidemiological or sample analysis related to our project, we draw up what we call a DTA (“Data Transfer Agreement”). It specifies what data will be transferred, with what permissions, and all the legal terms of the transfer. These agreements are signed between institutions and are very strict in order to guarantee data security.

As for the current state of data analysis at BiSC, data is still being collected at the follow-up visits, but our data management team is working at full speed. For the data from the prenatal stage, validation and cleaning are expected to be completed in the coming weeks, so that the statistical analysis, whose protocol has already been designed, can begin. This analysis will consist not only of a descriptive analysis of the processed information, but also of a selection of the most important factors, a study of the associations of the mother’s exposure to pollutants during pregnancy and, finally, verification that these associations are not biased by confounding factors.

The hypotheses for this part have been clearly defined and will be rigorously studied using appropriate statistical methods. The first results and conclusions for the pregnancy period are expected before the summer.

Example of the structure and planning of one of the BiSC projects, FRONTIER. “WP” stands for “work package”: in the case of FRONTIER, these are the characterization of lifestyle (WP2), exposure to air pollution (WP3), exposure to noise (WP4) and green spaces (WP5). All of them are part of the statistical analysis (WP6).

This post was written by Toni Galmés, data manager at the BiSC Project.