Dr. Jorge Abreu Vicente
🔭🪐 Astrophysicist
🤖🧬 ML / AI researcher in biomedical sciences
🖋🗣 Science communicator
I research how to use natural language processing (NLP) to build open science tools that revolutionize the way we do and understand science, by developing generative means of structuring data via large language models (LLMs) and knowledge graphs (KGs). I use these technologies to annotate and curate all molecular and cell biology knowledge into data structures that are understandable by both humans and machines.
Publication records: NASA/ADS Abstract System | ORCID | Google Scholar
2017 | Ph.D. cum laude in Natural Sciences
Ruprecht-Karls-Universität Heidelberg, Heidelberg, DE Thesis: Molecular cloud structure at Galactic scales Written score: 1/1 Member of the prestigious International Max Planck Research School. |
2012 | M. S. in Physics and Mathematics
IRAM & Universidad de Granada, Granada, ES Thesis: Carbono ionizado en el eje mayor de M33. Honors in Master Thesis |
2010 | B. S. in Physics
Universidad de La Laguna, La Laguna, ES Graduated with honors in optics. |
Academic research experience
2022- | Senior staff: Machine learning developer
EMBO (European Molecular Biology Organization), Heidelberg, DE Research and development of language models for biomedical data curation. Generation of a knowledge graph for molecular and cell biology. Transforming open science through AI initiatives. |
2013-17 | Postdoctoral researcher and PhD student
Max-Planck-Institute for Astronomy, Heidelberg, DE World-class research on star formation and molecular cloud structure at Galactic scales. |
2013 | Research associate
Instituto de Astrofísica de Andalucía, Granada, ES Automated data processing and analysis pipeline for the IRAM 30m telescope. |
2010-13 | Astronomer on duty, IRAM 30m telescope
Instituto de Radioastronomía Milimétrica (IRAM), Granada, ES Quality assurance of observational data. Spectral and image processing and analysis. Writing technical documentation and reports. |
Industry experience
2019-22 | Head of Center of Excellence Data Science and Innovation
CAMELOT Group, Mannheim, DE Led the company-wide AI transformation. Led the collaboration with AWS. Designed the data strategy and led the implementation of Camelot Data Intelligent Digital Services. Intelligent document processing: automated data extraction from unstructured documents. |
2017-19 | Data scientist
Datavard AG, Heidelberg, DE Creation and development of an award-winning application. Evaluation and implementation of data science projects. Fast iteration and experimentation, complex application prototyping. |
Honors and awards
2021 | Selected for inclusion in the book 'Inspiraciones Nocturnas VII' (Diversidad Literaria) |
2018 | Most Innovative Project 2018 by the IA4SP. (Datavard AG) |
2018 | 3rd Prize in the Innojam, SAP Campus Basel 2018. (Datavard AG) |
2017 | Cover Page of the Astronomy and Astrophysics Journal. |
2013-17 | PhD Fellowship in International Max Planck Research School. |
2012 | Master Thesis Honors. Thesis grade: 10/10. |
2010-12 | MSc fellowship, Instituto de Radioastronomía Milimétrica. |
2010 | Best scientific poster in the III COEFFIS. |
2010 | Honors in Optics. |
Research publications
Refereed publications
2023 |
The SourceData-NLP dataset: integrating curation into scientific publishing for training large language models
Abreu-Vicente et al. (2023) arXiv.cs.CL, 2310.20440 |
Other publications
2022 | Paleontología galáctica. Article in Astronomía Magazine (Cover page) |
2021 | El viaje más largo (Inspiraciones Nocturnas VII - Varios Autores)
Diversidad Literaria |
Generated AI and coding resources
2023 | The SourceData-NLP dataset
The largest biomedical AI-ready dataset for NER and NEL. 🤗 HuggingFace Dataset for Biomedical NER/NEL. soda-data: generate the data. soda-model: generate models with the dataset. |
2023 | EMBO/sd-geneprod-roles-v2
Model for detecting the empirical roles of genes and proteins in experiments described in biomedical literature. 🤗 HuggingFace model |
2023 | EMBO/sd-smallmol-roles-v2
Model for detecting the empirical roles of chemicals and small molecules in experiments described in biomedical literature. 🤗 HuggingFace model |
2023 | EMBO/sd-ner-v2
Model that performs NER for 9 classes of biomedical entities: gene products, small molecules, cell lines, cell types, subcellular components, organisms, tissues, diseases, and experimental assays. 🤗 HuggingFace model |
2023 | EMBO/sd-panelization-v2
Model for separating figure captions of biomedical literature into their constituent panels. 🤗 HuggingFace model |
Conferences, workshops, seminars
Invited talks
2019 | Neural Networks in Astrophysics
Universidad Interamericana de Puerto Rico |
2016 | Molecular cloud structure at Galactic scales
Universität zu Köln, DE |
2016 | Density distribution and star formation: a Galactic perspective
Instituto de Astrofísica de Canarias, La Laguna, ES |
2014 | Density distribution and star formation: a Galactic perspective
Institute of Theoretical Astrophysics, Heidelberg, DE |
Academic AI
2023 | Transforming open science through AI initiatives at EMBO
AI InScide Out, Advances, challenges, and issues for AI in Science EMBL, Heidelberg, DE |
In proceedings academic contributions for Astronomy
2016 | Giant Molecular Filaments in the Milky Way
VIALACTEA 2016: Pontificia Università S. Tommaso D’Aquino, Rome, IT Poster |
2016 | Giant Molecular Filaments in the Milky Way
Star Formation 2016, University of Exeter, UK Poster |
2015 | Column density distribution in molecular clouds with ATLASGAL
2nd Heidelberg Harvard workshop on Star Formation, CfA, Boston, MA, USA Talk |
2014 | Column density distribution in molecular clouds with ATLASGAL
ISM–SPP Student meeting, Kardinal-Doepfner Haus, Freising, DE Talk |
2011 | [CII] emission in the major axis of M33
III Congreso de Estudiantes de la Facultad de Física, La Laguna, ES Poster (Awarded best scientific poster of the conference) |
2010 | Fue Cassiopea a la luminaria observada por John Flamsteed?
Young European Radio Astronomers Conference, Manchester, UK Talk |
Data science in industry
2020 | Machine Learning in business language I (Supervised learning)
Workshops organized by CAMELOT Group |
2020 | Machine Learning in business language II (Unsupervised learning)
Workshops organized by CAMELOT Group |
2020 | Machine Learning in business language III (Deep learning)
Workshops organized by CAMELOT Group |
2020 | CADET: Camelot Data Extraction Tool and intelligent optical character recognition
Online Workshop |
2019 | Enterprise deployment: The ultimate data science challenge
Online workshop of AWS and CAMELOT Group |
Media, outreach, and teaching
2020-21 | Astronomy podcast: La cúpula
Four chapters with Dr. Francisco Parra-Rojas. ivoox |
2020 | Founder of Punto Vernal. Amateur astronomy company and YouTube channel.
YouTube Channel |
2017 | Podcast: Primer exoplaneta similar a la tierra con atmósfera
Luciérnagas, Radiotelevisión diocesana ivoox |
2016 | Astronomy in elementary school Colegio PP Somascos, A Guarda, ES |
2014 | Teacher for astronomical lab course on stellar photometry
MPIA/Ruprecht-Karls-Universität Heidelberg, Heidelberg, DE |
2012 | Teaching assistant
PIIISA: Research Investigation by Young Students in Science, Granada, ES |
2011 | Teaching assistant at IRAM 30m Summer School: Star formation near and far.
Puertollano, ES |
2010-12 | Outreach talks and guided visits to the Sierra Nevada optical and radio observatories Puertollano, ES |