Investigating and solving data division and partitioning challenge in big data systems


Type: Thesis

Level: Master's

Title: Investigating and solving data division and partitioning challenge in big data systems

Presenter: Zahra Amighi

Supervisors: Morteza Yousef Sanati (Ph.D.) - Mir Hosein Dezfoulian (Ph.D.)

Advisors:

Examiners: Moharam Mansouri Zade (Ph.D.) - Mahdi Sakhaei Nia (Ph.D.)

Time and date of presentation: October 14, 2020

Place of presentation:

Abstract: A data stream is an unbounded sequence of data that is generated rapidly and in high volume. Under this definition, processing a stream as a single entity is very difficult and, for some streams, impossible, so methods have been developed to process such data. One of the most common is clustering, which groups similar data items into a number of clusters. EvoStream is a data stream clustering algorithm that performs the final clustering gradually, using an evolutionary algorithm during idle times. While producing results competitive with other algorithms in the field, it effectively reduces the computational overhead of the offline phase. EvoStream treats the number of clusters as constant, whereas in real data streams this number varies over time and depends on the complexity of the input data. Moreover, because the onset and duration of idle periods follow no specific pattern, some evolutionary steps may not be completed, which reduces cluster quality due to the instability of the number of clusters. To solve these problems, this thesis presents a new algorithm that correctly detects the number of clusters and, while improving cluster quality, speeds up the execution of the evolutionary stage by up to four times.
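
To make the idea of idle-time evolutionary refinement more concrete, the following is a minimal sketch only. It is not the algorithm proposed in the thesis, nor EvoStream's actual implementation. It assumes micro-clusters are given as (point, weight) pairs, that fitness is a weighted sum of squared distances, and that the number of centers stays fixed (the limitation described above). All function names and parameters are hypothetical.

```python
import random


def weighted_ssq(centers, micro_clusters):
    """Weighted sum of squared distances from each micro-cluster to its nearest center."""
    total = 0.0
    for point, weight in micro_clusters:
        nearest = min(
            sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers
        )
        total += weight * nearest
    return total


def evolutionary_step(population, micro_clusters, rng, mutation_scale=0.1):
    """One generation: recombine two parents, mutate, replace the worst if the child is better."""
    parent_a, parent_b = rng.sample(population, 2)
    # Uniform crossover: each center is taken from one of the two parents.
    child = [list(rng.choice(pair)) for pair in zip(parent_a, parent_b)]
    # Gaussian mutation slightly perturbs every coordinate (assumed scale).
    child = [[x + rng.gauss(0.0, mutation_scale) for x in center] for center in child]
    worst = max(
        range(len(population)),
        key=lambda i: weighted_ssq(population[i], micro_clusters),
    )
    if weighted_ssq(child, micro_clusters) < weighted_ssq(population[worst], micro_clusters):
        population[worst] = child
    return population


def refine_while_idle(population, micro_clusters, idle_steps, seed=0):
    """Run as many generations as the idle period allows and return the best solution so far."""
    rng = random.Random(seed)
    for _ in range(idle_steps):
        evolutionary_step(population, micro_clusters, rng)
    return min(population, key=lambda s: weighted_ssq(s, micro_clusters))
```

Because each generation only replaces the worst candidate with a strictly better one, stopping early when an idle period ends still leaves a usable solution. The best fitness in the population never gets worse between steps.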

File: