Improve the performance of Text based Visual Question Answering

Type: Thesis

Level: Master's

Title: Improve the performance of Text based Visual Question Answering

Presenter: Kobra Farshidi

Supervisors: Dr Hassan Khotanlou, Dr Muharram Mansoorizadeh

Advisors:

Examiners: Dr Mir Hossein Dezfoulian, Dr Reza Mohammadi

Time and date of presentation: 19/02/2023

Place of presentation: Amphitheater

Abstract: In recent years, researchers have looked for methods that, when a user asks a textual question about the content of an image, lead the user to the desired answer. They proposed models for this task that achieved some success. However, despite work on many different architectures, the field has not yet reached a level of accuracy high enough for general use. In 2019, researchers found that most questions asked about images depend on analysing the text inside the image. For example, given an image of several books, a question may concern the writing on a book's cover; given an image of signs and boards in the street, a question may ask for the name of a particular store. To analyse and investigate this further, researchers introduced a new field: answering questions about the text inside images. Their efforts have followed three directions. First, they created models and architectures to improve accuracy in this field, with new architectures emerging continually. Second, they built related datasets, using different methods, that pay more attention to the text inside images. Third, they defined evaluation criteria for measuring accuracy in this field. Researchers concluded that such models must first extract the text inside the image, and for this they used the most up-to-date character recognition engines. However, because answering requires relating the question to a combination of visual objects and text tokens in the image, they used multimodal models, which process the image and the text together to reach the best answer. Over the last three years, they have applied the latest technologies and presented many different models.
The results show that accuracy in this field is increasing steadily as models advance and more relevant datasets are used, but more effort is still needed before it reaches a level of accuracy acceptable for use in industry and by the public.
