Hands-on workshop (hackathon): From signals to environmentally tagged sequences III (SeqEnv III)

Dates: Monday 22 - Thursday 25 Sept 2014

Venue: Hellenic Centre for Marine Research, Crete, Greece

Organizers: Dr. Evangelos Pafilis (HCMR, pafilis@hcmr.gr; local organizer)

                     Dr. Christopher Quince (University of Glasgow, cq8u@udcf.gla.ac.uk)

                     Dr. Umer Zeeshan Ijaz (University of Glasgow, Umer.Ijaz@glasgow.ac.uk)

Scientific Background

"SeqEnv" (developed in two previous COST ES1103 workshops) is a pipeline that combines sequence analysis and text mining to annotate 16S rRNA and metagenomic microbial sequences with environment-descriptive terms (home page: http://environments.hcmr.gr/seqenv.html).

Novel microbial sequences are characterized in two steps: sequence similarity searches against public databases, followed by the recognition of terms such as "glacier", "pelagic", "forest", and "lagoon" (based on the Environment Ontology) within GenBank records (e.g. the "isolation source" field) and/or in the relevant literature (PubMed abstracts).
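The term-recognition step can be illustrated with a minimal sketch. It assumes simple case-insensitive, whole-word dictionary matching; the four-term list and the function name `tag_environments` are hypothetical stand-ins, and the matching strategy actually used by the pipeline may differ.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; the pipeline draws its
# terms from the Environment Ontology, which is far larger.
ENVIRONMENT_TERMS = ["glacier", "pelagic", "forest", "lagoon"]


def tag_environments(text, terms=ENVIRONMENT_TERMS):
    """Count case-insensitive whole-word matches of each term in `text`."""
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


example = "Sediment from a coastal lagoon; compared with pelagic and Lagoon water."
print(dict(tag_environments(example)))
```

Per-term counts of this kind are the sort of input that a tag cloud or heatmap could be built from. Note that whole-word matching misses inflected forms such as "lagoons", so a real tagger would need synonym and plural handling.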

Subsequently, a range of visualizations, such as tag clouds, heatmaps, and plots, are generated to describe OTUs and samples.

The pipeline may be invoked on either (a) DNA sequences or (b) protein sequences. Currently, two text sources are mined for environment-descriptive terms: (1) sequence-related PubMed abstracts and (2) the isolation source field of GenBank records. Both are collected on the fly via eutils.
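As a rough illustration of the second text source, the sketch below builds an NCBI E-utilities `efetch` URL for a GenBank flat-file record and extracts `/isolation_source` qualifiers from the record text. The source states only that eutils is used; the choice of `efetch` with `db=nuccore`, the helper names, the placeholder accession `XX000000`, and the embedded sample record are all assumptions made for this example.

```python
import re
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession):
    """Build an efetch URL returning a nucleotide record as GenBank flat file."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def isolation_sources(genbank_text):
    """Return the values of all /isolation_source qualifiers in a record."""
    matches = re.findall(r'/isolation_source="([^"]*)"', genbank_text)
    # Qualifier values may wrap across lines, so collapse runs of whitespace.
    return [" ".join(value.split()) for value in matches]


# Made-up record fragment for illustration; not a real GenBank entry.
SAMPLE_RECORD = '''
FEATURES             Location/Qualifiers
     source          1..1400
                     /organism="uncultured bacterium"
                     /isolation_source="sediment from a coastal
                     lagoon"
'''

print(genbank_record_url("XX000000"))  # placeholder accession
print(isolation_sources(SAMPLE_RECORD))
```

The sketch deliberately performs no network request; in practice the URL would be fetched and the response text passed to `isolation_sources`, subject to NCBI's usage guidelines.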

The pipeline has already been applied to a range of datasets, e.g. Greek lagoon, Swedish lake/river, African and Asian pit latrine, and Black Sea sediment sample datasets.

SeqEnv III, although Information Technology oriented, aims to close the gap with biologist end users and to promote uptake of the service (in particular among the COST ES1103 community).

Participants (by scientific area):

Statistical Analysis, Microbial Ecology, Machine Learning:
    Dr. Christopher Quince (cq8u@udcf.gla.ac.uk, University of Glasgow)
    Dr. Umer Zeeshan Ijaz (Umer.Ijaz@glasgow.ac.uk, University of Glasgow, via Skype)

Sequence Analysis, Bioinformatics:
    Dr. Anastasis Oulas (oulas@hcmr.gr, HCMR)

Biology Use Cases:
    Ms. Christina Pavloudi (cpavloud@hcmr.gr, HCMR)

Text Mining, Systems Biology, Bioinformatics:
    Dr. Lars Juhl Jensen (lars.juhl.jensen@cpr.ku.dk, CPR-NNF, Copenhagen)
    Dr. Evangelos Pafilis (pafilis@hcmr.gr, HCMR)

Statistical / Phylogenetic Analysis:
    Dr. Tomas Flouri (Tomas.Flouri@h-its.org, HITS, Heidelberg)

Statistical Analysis, Web Development / Visualization:
    Dr. Lex Overmars (L.Overmars@uva.nl, UVA Amsterdam)

Microbial Ecology, Computational Biology:
    Dr. Conor Meehan (cmeehan@itg.be, Institute of Tropical Medicine, Antwerp, Belgium)
    Mr. Tomas Vetrovsky (kostelecke.uzeniny@seznam.cz, Institute of Microbiology, Prague)
    Mr. Lucas Sinclair (lucas.sinclair@me.com, Uppsala University)