Towards Automated Machine Learning: Hyperparameter Optimization in Online Clustering

dc.contributor.advisor: El Shawi, Radwa, supervisor
dc.contributor.author: Rozgonjuk, Dmitri
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut
dc.date.accessioned: 2023-10-20T11:53:46Z
dc.date.available: 2023-10-20T11:53:46Z
dc.date.issued: 2023
dc.description.abstract: Machine Learning (ML) has demonstrated significant potential in data-driven applications, particularly in real-time use cases through online ML, which processes data streams and handles concept drift (changes in data distribution) dynamically. Automated ML (AutoML) seeks to automate ML pipeline tasks such as hyperparameter optimization (HPO) and model selection for improved performance. While some efforts have been made to integrate online ML and AutoML, research on automated online clustering remains limited. This thesis focuses on developing a potential HPO solution for online clustering settings. The aim was to propose an ensemble-based approach that leverages more than one internal clustering validation index (CVI) to address the evaluation problem in online clustering. HPO was implemented on top of the river framework. To compare the performance of HPO in online clustering, two online clustering algorithms were used on six synthetic datasets with ground truth labels. In HPO, models were separately optimized towards two internal CVIs, the Silhouette score and the Calinski-Harabasz Index, and models were compared using an external CVI, the Adjusted Rand Index. In the experiments, (a) online clustering algorithms with default parameters, (b) the best optimized online clustering algorithms, and (c) the ensemble of the best optimized models were compared. The findings revealed that the efficacy of HPO varies depending on the data type. On k-centroid-based datasets, the Silhouette-optimized model and the ensemble model outperformed other clustering solutions, while HPO and ensembling did not yield superior results on S-curve datasets.
dc.identifier.uri: https://hdl.handle.net/10062/93651
dc.language.iso: eng
dc.publisher: Tartu Ülikool
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: autoML
dc.subject: online ML
dc.subject: online clustering
dc.subject: hyperopt
dc.subject: river
dc.subject.other: master's theses
dc.subject.other: informatics
dc.subject.other: infotechnology
dc.title: Towards Automated Machine Learning: Hyperparameter Optimization in Online Clustering
dc.type: Thesis
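
The evaluation scheme described in the abstract (select hyperparameters with an internal CVI, then judge the selected model with an external CVI against ground truth) can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and is not taken from the thesis code: the sequential k-means learner stands in for the two online clustering algorithms, a plain grid search stands in for the HPO procedure, and the names make_blobs, online_kmeans, silhouette and adjusted_rand_index are hypothetical helpers rather than river or hyperopt APIs.

```python
import random
from collections import Counter
from math import comb, dist

def make_blobs(centers, n_per, spread, rng):
    """Hypothetical synthetic 2-D stream with ground-truth labels."""
    pts = [([cx + rng.gauss(0, spread), cy + rng.gauss(0, spread)], lab)
           for lab, (cx, cy) in enumerate(centers) for _ in range(n_per)]
    rng.shuffle(pts)
    return [p for p, _ in pts], [lab for _, lab in pts]

def online_kmeans(stream, k, lr):
    """Stand-in learner: one pass, each point nudges its nearest centroid."""
    centroids = [list(p) for p in stream[:k]]  # seed with the first k points
    for x in stream:
        j = min(range(k), key=lambda i: dist(x, centroids[i]))
        centroids[j] = [c + lr * (xi - c) for c, xi in zip(centroids[j], x)]
    # Label every point with its nearest final centroid.
    return [min(range(k), key=lambda i: dist(x, centroids[i])) for x in stream]

def silhouette(X, labels):
    """Internal CVI: mean silhouette coefficient (needs no ground truth)."""
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    total = 0.0
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0
        a = sum(dist(x, y) for y in own) / (len(own) - 1)
        b = min(sum(dist(x, y) for y in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(X)

def adjusted_rand_index(truth, pred):
    """External CVI: pair-counting agreement with labels, chance-corrected."""
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

rng = random.Random(0)
X, y = make_blobs([(0, 0), (5, 5), (0, 5)], 40, 0.5, rng)

# Grid search (assumed stand-in for HPO): score each setting internally.
results = []
for k in (2, 3, 4, 5):
    for lr in (0.05, 0.2):
        labels = online_kmeans(X, k, lr)
        results.append({"k": k, "lr": lr,
                        "sil": silhouette(X, labels),
                        "ari": adjusted_rand_index(y, labels)})

# Selection uses only the internal CVI; ARI is reported afterwards.
best = max(results, key=lambda r: r["sil"])
print(best)
```

The point of the design is the separation of roles: the Silhouette score drives selection because it needs no labels, while the Adjusted Rand Index is only read off afterwards to check whether the internally preferred model also agrees with the ground truth.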

Files

Original bundle

Name: MSc_thesis_Rozgonjuk.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format

Name: online_autoclust_hpo.zip
Size: 768.67 KB
Format: Compressed ZIP
Description: Appendices

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission