Showing 1–19 of 19 results for author: Sharifzadeh, S

  1. arXiv:2403.07750  [pdf, other]

    cs.CV cs.AI

    Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings

    Authors: Sahand Sharifzadeh, Christos Kaplanis, Shreya Pathak, Dharshan Kumaran, Anastasija Ilic, Jovana Mitrovic, Charles Blundell, Andrea Banino

    Abstract: The creation of high-quality human-labeled image-caption datasets presents a significant bottleneck in the development of Visual-Language Models (VLMs). In this work, we investigate an approach that leverages the strengths of Large Language Models (LLMs) and image generation models to create synthetic image-text pairs for efficient and effective VLM training. Our method employs a pretrained text-t…

    Submitted 7 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures

  2. arXiv:2402.13653  [pdf, other]

    cs.LG

    PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models

    Authors: Eli M Carrami, Sahand Sharifzadeh

    Abstract: We introduce the novel task of zero-shot Protein Question Answering (PQA) for free-form scientific enquiry. Given a previously unseen protein sequence and a natural language question, the task is to deliver a scientifically accurate answer. This task not only supports future biological research, but could also provide a test bed for assessing the scientific precision of large language models (LLMs…

    Submitted 21 February, 2024; originally announced February 2024.

  3. arXiv:2305.18391  [pdf, other]

    cs.LG cs.CL cs.CV

    MemeGraphs: Linking Memes to Knowledge Graphs

    Authors: Vasiliki Kougia, Simon Fetzel, Thomas Kirchmair, Erion Çano, Sina Moayed Baharlou, Sahand Sharifzadeh, Benjamin Roth

    Abstract: Memes are a popular form of communicating trends and ideas in social media and on the internet in general, combining the modalities of images and text. They can express humor and sarcasm but can also have offensive content. Analyzing and classifying memes automatically is challenging since their interpretation relies on the understanding of visual elements, language, and background knowledge. Thus…

    Submitted 26 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  4. arXiv:2303.13818  [pdf, other]

    cs.CV

    Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

    Authors: Yiheng Xiong, Jingsong Liu, Kamilia Zaripova, Sahand Sharifzadeh, Matthias Keicher, Nassir Navab

    Abstract: The extraction of structured clinical information from free-text radiology reports in the form of radiology graphs has been demonstrated to be a valuable approach for evaluating the clinical correctness of report-generation methods. However, the direct generation of radiology graphs from chest X-ray (CXR) images has not been attempted. To address this gap, we propose a novel approach called Prior-…

    Submitted 18 September, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: In GRAIL @ MICCAI 2023

  5. arXiv:2303.08046  [pdf, other]

    physics.ins-det cs.AI cs.CV hep-ph physics.data-an

    Ultra-High-Resolution Detector Simulation with Intra-Event Aware GAN and Self-Supervised Relational Reasoning

    Authors: Baran Hashemi, Nikolai Hartmann, Sahand Sharifzadeh, James Kahn, Thomas Kuhr

    Abstract: Simulating high-resolution detector responses is a storage-costly and computationally intensive process that has long been challenging in particle physics. Despite the ability of deep generative models to make this process more cost-efficient, ultra-high-resolution detector simulation still proves to be difficult as it contains correlated and fine-grained mutual information within an event. To o…

    Submitted 7 March, 2023; originally announced March 2023.

  6. arXiv:2212.12249  [pdf, other]

    cs.CV cs.LG

    Do DALL-E and Flamingo Understand Each Other?

    Authors: Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp

    Abstract: The field of multimodal research focusing on the comprehension and creation of both images and text has witnessed significant strides. This progress is exemplified by the emergence of sophisticated models dedicated to image captioning at scale, such as the notable Flamingo model and text-to-image generative models, with DALL-E serving as a prominent example. An interesting question worth exploring…

    Submitted 18 August, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

    Comments: Accepted to ICCV 2023

  7. arXiv:2208.10547  [pdf, other]

    cs.CV

    InstanceFormer: An Online Video Instance Segmentation Framework

    Authors: Rajat Koner, Tanveer Hannan, Suprosanna Shit, Sahand Sharifzadeh, Matthias Schubert, Thomas Seidl, Volker Tresp

    Abstract: Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full Spatio-temporal attention limit them in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage transfor…

    Submitted 22 August, 2022; originally announced August 2022.

    Report number: InstanceFormer:08-22

    Journal ref: Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-2023)

  8. arXiv:2204.14198  [pdf, other]

    cs.CV cs.AI cs.LG

    Flamingo: a Visual Language Model for Few-Shot Learning

    Authors: Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, et al. (2 additional authors not shown)

    Abstract: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily i…

    Submitted 15 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: 54 pages. In Proceedings of Neural Information Processing Systems (NeurIPS) 2022

  9. arXiv:2203.10202  [pdf, other]

    cs.CV

    Relationformer: A Unified Framework for Image-to-Graph Generation

    Authors: Suprosanna Shit, Rajat Koner, Bastian Wittmann, Johannes Paetzold, Ivan Ezhov, Hongwei Li, Jiazhen Pan, Sahand Sharifzadeh, Georgios Kaissis, Volker Tresp, Bjoern Menze

    Abstract: A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which pr…

    Submitted 18 March, 2022; originally announced March 2022.

  10. arXiv:2109.13392  [pdf, other]

    cs.AI

    The Tensor Brain: A Unified Theory of Perception, Memory and Semantic Decoding

    Authors: Volker Tresp, Sahand Sharifzadeh, Hang Li, Dario Konopatzki, Yunpu Ma

    Abstract: We present a unified computational theory of an agent's perception and memory. In our model, perception, episodic memory, and semantic memory are realized by different operational modes of the oscillating interactions between a symbolic index layer and a subsymbolic representation layer. The two layers form a bilayer tensor network (BTN). Although memory appears to be about the past, its main purp…

    Submitted 22 January, 2023; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: Neural Computation, Volume 35, Issue 2, February 2023

    MSC Class: 68T10

    ACM Class: I.2.6; I.2.10

    Journal ref: Neural Computation, Volume 35, Issue 2, February 2023

  11. arXiv:2103.10478  [pdf, other]

    cs.LG

    Unsupervised Doppler Radar-Based Activity Recognition for e-Healthcare

    Authors: Yordanka Karayaneva, Sara Sharifzadeh, Wenda Li, Yanguo Jing, Bo Tan

    Abstract: Passive radio frequency (RF) sensing and monitoring of human daily activities in elderly care homes is an emerging topic. Micro-Doppler radars are an appealing solution considering their non-intrusiveness, deep penetration, and high-distance range. Unsupervised activity recognition using Doppler radar data has not received attention, in spite of its importance in case of unlabelled or poorly label…

    Submitted 2 November, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

  12. arXiv:2102.04760  [pdf, other]

    cs.CV cs.AI

    Improving Scene Graph Classification by Exploiting Knowledge from Texts

    Authors: Sahand Sharifzadeh, Sina Moayed Baharlou, Martin Schmitt, Hinrich Schütze, Volker Tresp

    Abstract: Training scene graph classification models requires a large amount of annotated image data. Meanwhile, scene graphs represent relational knowledge that can be modeled with symbolic data from texts or knowledge graphs. While image annotation demands extensive labor, collecting textual descriptions of natural scenes requires less effort. In this work, we investigate whether textual scene description…

    Submitted 8 October, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

  13. arXiv:2011.10084  [pdf, other]

    cs.CV cs.LG

    Classification by Attention: Scene Graph Classification with Prior Knowledge

    Authors: Sahand Sharifzadeh, Sina Moayed Baharlou, Volker Tresp

    Abstract: A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, w…

    Submitted 17 December, 2020; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-2021)

  14. arXiv:2007.14175  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings

    Authors: Mehdi Ali, Max Berrendorf, Charles Tapley Hoyt, Laurent Vermue, Sahand Sharifzadeh, Volker Tresp, Jens Lehmann

    Abstract: Recently, knowledge graph embeddings (KGEs) received significant attention, and several software libraries have been developed for training and evaluating KGEs. While each of them addresses specific needs, we re-designed and re-implemented PyKEEN, one of the first KGE libraries, in a community effort. PyKEEN 1.0 enables users to compose knowledge graph embedding models (KGEMs) based on a wide rang…

    Submitted 30 July, 2020; v1 submitted 28 July, 2020; originally announced July 2020.

  15. arXiv:2006.13365  [pdf, other]

    cs.LG cs.AI stat.ML

    Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework

    Authors: Mehdi Ali, Max Berrendorf, Charles Tapley Hoyt, Laurent Vermue, Mikhail Galkin, Sahand Sharifzadeh, Asja Fischer, Volker Tresp, Jens Lehmann

    Abstract: The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. In order to assess the reproducibility of previously published results, we re-implemented and evaluated 21 interaction models in the PyKEEN software package. Here, we outline which results could be reproduced with their reported hyper…

    Submitted 1 November, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  16. arXiv:2001.11027  [pdf, other]

    cs.AI cs.CV cs.LG stat.ML

    The Tensor Brain: Semantic Decoding for Perception and Memory

    Authors: Volker Tresp, Sahand Sharifzadeh, Dario Konopatzki, Yunpu Ma

    Abstract: We analyse perception and memory, using mathematical models for knowledge graphs and tensors, to gain insights into the corresponding functionalities of the human mind. Our discussion is based on the concept of propositional sentences consisting of \textit{subject-predicate-object} (SPO) triples for expressing elementary facts. SPO sentences are the basis for most natural languages but might also…

    Submitted 10 February, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

  17. arXiv:1905.00966  [pdf, other]

    cs.CV cs.AI

    Improving Visual Relation Detection using Depth Maps

    Authors: Sahand Sharifzadeh, Sina Moayed Baharlou, Max Berrendorf, Rajat Koner, Volker Tresp

    Abstract: Visual relation detection methods rely on object information extracted from RGB images such as 2D bounding boxes, feature maps, and predicted class probabilities. We argue that depth maps can additionally provide valuable information on object relations, e.g. helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding. In this work, we stu…

    Submitted 17 October, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: International Conference on Pattern Recognition 2020

  18. arXiv:1904.09447  [pdf, other]

    cs.CL cs.AI

    An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing

    Authors: Martin Schmitt, Sahand Sharifzadeh, Volker Tresp, Hinrich Schütze

    Abstract: Knowledge graphs (KGs) can vary greatly from one domain to another. Therefore supervised approaches to both graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data; at the same time, adapting a model trained on a different domain is often impossible due to little or no overlap in entities and…

    Submitted 17 November, 2020; v1 submitted 20 April, 2019; originally announced April 2019.

    Comments: Accepted as long paper to EMNLP 2020

  19. arXiv:1612.03653  [pdf, other]

    cs.AI cs.RO

    Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

    Authors: Sahand Sharifzadeh, Ioannis Chiotellis, Rudolph Triebel, Daniel Cremers

    Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that,…

    Submitted 21 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: NIPS workshop on Deep Learning for Action and Interaction, 2016