Automatic image captioning with deep neural networks

Type: Thesis

Level: Master's

Title: Automatic image captioning with deep neural networks

Presenter: Elham Heidari

Supervisor: Dr. MirHossein Dezfoulian

Advisor: Dr. Muharram MansooriZadeh

Examiners: Dr. Mehdi Abbasi, Dr. Reza Mohammadi

Date and time of presentation: 13 March 2021, 13:00

Place of presentation: http://vc.basu.ac.ir/eng-thesis04

Abstract: In practical applications of machine vision and language understanding, an accurate image representation is of great importance. Most current systems use visual features and textual concepts as a summary of the image. However, such purely inferential representations are usually undesirable because they are composed of separate components whose relationships to one another are difficult to determine. In addition, they often fail to place the important concepts of the image in the generated captions. In this thesis, an iterative process is proposed for generating the caption. We represent each input image as a set of visual regions and corresponding textual concepts that reflect specific semantic concepts. For this purpose, we build two attention modules that integrate the visual features and the textual concepts extracted from the image through reciprocal updating. The output of these two modules is sent to the language model, and the iterative process continues until the desired caption is reached. A genetic algorithm is used to select the model hyperparameters. The experiments were performed on the MS COCO dataset. The results show that the method is effective and converges quickly. The proposed model can be generalized to a wide range of models for image and language applications.
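
The reciprocal updating of visual regions and textual concepts described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual architecture: the function names (`softmax`, `attend`, `mutual_update`), the scaled dot-product scoring, the averaging update rule, and the fixed number of steps are all assumptions made for illustration. The language model and the genetic hyperparameter search are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """For each query vector, return the attention-weighted mean of the key vectors."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every key
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return out

def mutual_update(regions, concepts, steps=2):
    """Alternately refine each modality using the other (hypothetical update rule)."""
    for _ in range(steps):
        # Both attention results are computed from the current state,
        # so the two modalities are updated simultaneously.
        from_concepts = attend(regions, concepts)
        from_regions = attend(concepts, regions)
        # Average each vector with what it attended to in the other modality
        regions = [[0.5 * (x + y) for x, y in zip(r, c)]
                   for r, c in zip(regions, from_concepts)]
        concepts = [[0.5 * (x + y) for x, y in zip(c, r)]
                    for c, r in zip(concepts, from_regions)]
    return regions, concepts

# Toy input: three 2-dimensional region features and two concept embeddings
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
concepts = [[1.0, 0.0], [0.0, 1.0]]
new_regions, new_concepts = mutual_update(regions, concepts)
```

In this sketch the number of regions and concepts is unchanged by the update; only the feature vectors are refined. In a full system the refined features would be passed on to a caption-generating language model.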
