The Combination and Fusion of Text and Image for Multimedia Document Retrieval using Neural Networks

Type: Thesis

Level: Master's

Title: The Combination and Fusion of Text and Image for Multimedia Document Retrieval using Neural Networks

Presenter: Mohammad Moradli

Supervisors: Dr. MirHossein Dezfiulian

Advisors: Muharram Mansoorizadeh

Examiners: Dr. Mahdi Sakhaeinia, Dr. Mahdi Abbasi

Time and date of presentation: 22.9.2021, 4:00 pm

Place of presentation: Virtual conference

Abstract: Over the last decade, the rapid growth of multimedia information has increased the need for multimedia document retrieval. Retrieving multimedia documents means finding the samples in the available collection that are closest to a query sample. These samples can be of different data types; this research uses two, text and image. The main challenge is the semantic gap between data types, which makes it difficult to compute similarity across modalities. In the proposed model, the raw text is first preprocessed, and a BERT network then extracts the text feature vector. In parallel, VGGNet16 extracts the image feature vector. These feature vectors are passed to a GCN to learn intra-modality similarity. Next, the output of the GCN is fed to a Siamese network with two subnetworks to learn inter-modality correlation. Finally, the samples are mapped into Hamming space as hash codes of a specified length. The whole structure is trained end-to-end with a loss function that minimizes the distance between similar entities in Hamming space. The Wikipedia dataset is used in a semi-supervised setting. The results show that the proposed structure achieves good accuracy compared with previous models.
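
The final retrieval step described in the abstract, comparing fixed-length hash codes in Hamming space, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the zero-threshold binarization, and the toy embedding values are all assumptions made for illustration, and the BERT, VGGNet16, GCN, and Siamese stages that would produce the embeddings are omitted.

```python
from typing import List, Sequence


def binarize(embedding: Sequence[float]) -> List[int]:
    """Turn a real-valued vector into a binary hash code.

    Assumption: thresholding at zero; the thesis may use a different rule.
    """
    return [1 if x >= 0 else 0 for x in embedding]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Toy values (hypothetical): one text query embedding, three image embeddings.
text_query = binarize([0.7, -0.2, 0.1, -0.9])
image_codes = [
    binarize([-0.5, 0.3, -0.4, 0.8]),
    binarize([0.6, -0.1, 0.2, -0.7]),
    binarize([0.9, -0.3, -0.2, -0.1]),
]
ranking = rank_by_hamming(text_query, image_codes)
```

Because the codes are binary, comparing a query against a stored item only requires counting differing bits, which is the usual motivation for hashing-based cross-modal retrieval.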
