Advances in our ability to generate large datasets have drastically changed the way biological research is conducted. It is therefore crucial that these data resources are integrated as tightly and efficiently as possible, so that researchers can use them to their full potential.
The ORCAE portal is a well-established online resource that enables communities to manually curate automatically generated genome annotations [55]. It currently contains more than 25 genomes, including that of the brown seaweed model Ectocarpus siliculosus, and attracts several thousand visitors a month. The main focus of the resource is currently the curation of structural and functional gene annotations. However, in its current form, ORCAE should be well suited to being upgraded into a central integration point for all sorts of ‘-omic’ information.
This project, which is based at VIB, a life sciences research institute in Belgium, is extending the ORCAE platform so that metabolome and pathway information can be linked together and manually curated on top of the existing gene and genome functionality. Shu-Min Kao has been appointed as the researcher for project 14. He aims to incorporate pathway information from resources such as KEGG, Reactome and BioCyc into the existing ORCAE platform, as a backbone to anchor metabolome data already available or generated by ALFF partners. As a second step, he will develop links between metabolomics and transcriptomics (in collaboration with the researcher appointed to project 15). This will open up possibilities to integrate pathway information with gene set (enrichment) analysis results. At least in the initial stages, we plan to rely on software such as the Bioconductor package Pathview to perform large-scale analyses. Finally, Shu-Min aims to set up the means and protocols to allow users to incorporate custom datasets.
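As an illustration of the kind of gene set analysis that pathway integration makes possible, the following is a minimal sketch of a simple over-representation test (a one-sided hypergeometric test) in Python. It is not the ORCAE implementation and does not use Pathview; the function names, pathway names and gene identifiers are all hypothetical and chosen only for the example.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_selected genes are drawn without replacement from n_universe genes,
    of which n_pathway belong to the pathway of interest.
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def enrich(pathways, selected_genes, universe):
    """Test each pathway for over-representation among the selected genes.

    pathways: dict mapping a pathway name to an iterable of gene identifiers.
    Returns a dict mapping pathway name to (overlap size, p-value).
    Genes outside the universe are ignored (an assumption of this sketch).
    """
    universe = set(universe)
    selected = set(selected_genes) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = selected & members
        p_value = hypergeom_enrichment_p(
            len(universe), len(members), len(selected), len(overlap)
        )
        results[name] = (len(overlap), p_value)
    return results


# Hypothetical toy data: ten genes, two made-up pathways.
universe = [f"gene_{i}" for i in range(10)]
pathways = {
    "hypothetical_lipid_pathway": ["gene_0", "gene_1", "gene_2"],
    "hypothetical_carbon_pathway": ["gene_7", "gene_8", "gene_9"],
}
differentially_expressed = ["gene_0", "gene_1", "gene_2"]

results = enrich(pathways, differentially_expressed, universe)
for name, (overlap, p_value) in results.items():
    print(f"{name}: overlap={overlap}, p={p_value:.4f}")
```

In practice the p-values would need correcting for multiple testing across pathways, and dedicated packages offer more refined enrichment methods; the sketch only shows how pathway membership and an expression-derived gene list come together.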
The specifics of what to integrate and visualise, and how, will depend greatly on the needs expressed by ALFF scientists. To maximise the biological relevance of this project, we will address objectives 1) and 2) above using two case studies of immediate relevance for ALFF: i) integration of EctoGEM, a recently released Ectocarpus metabolic network (in collaboration with ESRs 10 & 15); ii) analysis and integration of available and newly generated metabolome, transcriptome and genome data for Nannochloropsis (ESR 12) and other eustigmatophytes (ESR 5). These data will be used to investigate the effect of bacterial communities on algal pathways (e.g. carbon fixation or lipid synthesis) and to identify the cellular response mechanisms responsible.
The key outcomes of this research will be new releases of ORCAE which provide:
- Integrated pathway information;
- Protocols for users to incorporate and analyse mixed transcriptomics datasets, allowing efficient querying and hypothesis generation; and
- An integrated resource of genomics, transcriptomics and metabolomics data for Nannochloropsis, including an analysis of how its metabolic efficiency is affected by changes in its bacterial community.