Efficient algorithms and novel big data management techniques and their applications in ubiquitous computing

Βονιτσάνος, Γεράσιμος
The thesis explores the pivotal role of data mining in computer science and data analysis, particularly in the context of the 4th Industrial Revolution. With the exponential increase in data generated by sources such as databases, mobile devices, and social media, effective data mining tools have become imperative. Data mining aims to unveil patterns and insights from large, complex, and structured datasets, enabling the identification of often hidden trends and interactions. This thesis covers diverse domains where data mining is applied, ranging from business analysis and bioinformatics to financial forecasting and sentiment analysis. It delves into clustering, classification, and anomaly detection algorithms, and uses data analysis tools and visualization techniques to present findings. One key focus of the research is the application of data mining techniques using Apache Spark, specifically addressing challenges posed by heterogeneous and semi-structured data. The architecture of Apache Spark is leveraged for data management and analysis. Real-time information retrieval from cultural content is emphasized through extensive dataset analysis, leading to customized content for users and improved engagement. The adoption of Apache Spark enables efficient processing and analysis of massive data volumes, with its streaming architecture used for managing data streams. The study validates the proposed approach with Twitter data, employing Apache Spark streaming for real-time cultural content analysis. The thesis further explores the Collaborative Filtering (CF) technique for recommendation systems, extending its application to higher-order systems using GeoSpark. This technique enhances understanding of user behavior by gathering inputs from varying distances. Another significant aspect of the research involves the use of GeoSpark for managing and analyzing spatiotemporal data.
By employing methods like Decision Trees and Random Forests, the study aims to extract insights from spatiotemporal data while focusing on privacy management. The thesis also investigates preprocessing of documents for analysis, using the Term Frequency-Inverse Document Frequency (TF-IDF) approach to create representative vectors. Furthermore, it presents predictive modeling for stock movements and explores the integration of emotional information from Twitter using Apache Spark. Chapters delve into diverse applications such as community detection algorithms, protein structure prediction, and the analysis of genetic variations. The application of data mining to movie recommendations and to understanding cryptocurrency sentiment through Twitter data is also discussed.
Algorithms, Big data, Machine learning, Data structures