Outlier detection in data stream using clustering

Type: Thesis

Level: Master's

Title: Outlier detection in data stream using clustering

Presenter: Maede Motallebi

Supervisors: Dr. Yousef Sanati

Advisors:

Examiners: Dr. Sakhayi Niya, Dr. Nosrati

Date of presentation: 2024

Place of presentation: Faculty of Engineering

Abstract: Today's vast volumes of data contain many interesting patterns, some of which recur regularly. Discovering these patterns helps identify outliers, the unexpected data points scattered throughout the data. As more data is generated, the likelihood of finding new patterns and outliers increases, and the sheer volume and speed of generation make big data processing methods useful. Much of this data arrives as data streams: continuous, real-time, sequential flows that must be processed as they arrive, because storing all of the data is impractical. Data streams are found in domains such as sensor networks, traffic management, and social networks. They can contain valuable knowledge, and uncovering it requires various processing techniques, one of which is data stream clustering, a form of big data mining. Before data mining methods can discover hidden knowledge in big data, a crucial preprocessing stage prepares the data; for instance, outliers and missing values are removed or corrected. Preprocessing techniques are therefore essential for extracting useful knowledge from the generated data. Their goal is to reduce the complexity of real-world data so that data mining methods extract patterns more effectively, learning is faster, and the structure of the raw data is easier for data mining algorithms to handle. Clustering is one of many data mining methods. As a machine learning technique, it groups data into clusters based on the data's inherent characteristics.
In general, data stream clustering divides data into homogeneous groups so that similarity among observations within each group is maximized and similarity between members of different groups is minimized. This type of processing can also be used to detect outliers. The proposed method builds on the STARE algorithm, and this research aims to enhance it: using DBSCAN clustering as a preprocessing step can improve the algorithm's accuracy.
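
To illustrate how density-based clustering can flag outliers, the following is a minimal sketch of DBSCAN in plain Python. It is not the STARE algorithm and not the thesis's implementation. The function name `dbscan`, the parameters `eps` and `min_pts`, and the sample points are assumptions chosen for illustration. Points that end up in no cluster are treated as outliers.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force region query; the result includes point i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point, keep expanding
        cluster += 1
    return labels


# Hypothetical sample: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (2.5, 9.0),
]
labels = dbscan(points, eps=0.3, min_pts=3)
outliers = [p for p, label in zip(points, labels) if label == NOISE]
print(labels)    # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(outliers)  # [(2.5, 9.0)]
```

In a streaming setting, a sketch like this would typically be run over a bounded window of recent points rather than the whole stream, since the full data cannot be stored. The brute-force neighbor search is quadratic in the window size and is used here only for brevity.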
