The Combination and Fusion of Text and Image for Multimedia Document Retrieval using Neural Networks - Faculty of Engineering
Type: Thesis
Degree level: Master's
Title: The Combination and Fusion of Text and Image for Multimedia Document Retrieval using Neural Networks
Presenter: Mohammad Moradli
Supervisors: Dr. MirHossein Dezfiulian
Advisors: Muharram Mansoorizadeh
Examiners: Dr. Mahdi Sakhaeinia, Dr. Mahdi Abbasi
Time and date of presentation: 22.9.2021, 4:00 pm
Place of presentation: Virtual conference
Abstract: Over the last decade, the rapid growth of multimedia information has increased the need to retrieve multimedia documents. Retrieving multimedia documents means finding, among the available data, the samples closest to a query sample. These samples can be of different data types. This research uses two data types: text and image. The main challenge is the semantic gap between data types, which makes it difficult to calculate similarity across modalities. In the proposed model, the raw text is first preprocessed, and a BERT network then extracts the text's feature vector. In parallel, VGGNet16 extracts the image's feature vector. These feature vectors are passed to a graph convolutional network (GCN) to learn intra-modality similarity. In the next step, the output of the GCN is given to a Siamese network with two subnetworks to learn inter-modality correlation. Finally, the samples are mapped into Hamming space as hash codes of a specified length. The whole structure is trained end-to-end using a loss function that minimizes the distance between similar entities in Hamming space. The Wikipedia dataset is used in a semi-supervised setting. The results show that the proposed structure achieves good accuracy compared to previous models.
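The final retrieval step described in the abstract, mapping samples to fixed-length hash codes and comparing them in Hamming space, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the sign-based binarization, the function names, and the 8-dimensional example vectors are all assumptions made for illustration, standing in for the outputs of the trained text and image subnetworks.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> List[int]:
    """Binarize a real-valued vector by sign (assumed scheme): 1 if > 0, else 0."""
    return [1 if x > 0 else 0 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length hash codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code: Sequence[int],
                    database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted from nearest to farthest in Hamming space."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )


# Hypothetical network outputs: one text query and three candidate images.
text_query = [0.9, -0.3, 0.4, -0.8, 0.1, -0.2, 0.7, -0.5]
image_db = [
    [-0.6, 0.2, -0.1, 0.9, -0.4, 0.3, -0.8, 0.5],  # opposite sign everywhere
    [0.8, -0.1, 0.5, -0.7, 0.2, -0.3, 0.6, -0.4],  # same signs as the query
    [0.7, 0.4, 0.3, -0.9, -0.2, -0.1, 0.5, -0.6],  # differs in two positions
]

query_code = to_hash_code(text_query)
db_codes = [to_hash_code(v) for v in image_db]
ranking = rank_by_hamming(query_code, db_codes)
print(ranking)  # [1, 2, 0]
```

Because both modalities end up as binary codes of the same length, a text query can be compared directly against image codes, and the comparison reduces to counting differing bits.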