Efficient algorithms for big data management

Δρίτσας, Ηλίας
In this doctoral research, I addressed data management problems by developing methods and techniques that, on the one hand, maintain or improve the privacy and anonymity of users and, on the other hand, are efficient in time and storage space for large databases. The research results focus on the following: (1) Evaluating query performance on a large database with and without the Bloom filter structure. (2) Evaluating the time, memory, and disk usage of the Privacy Preserving Record Linkage (PPRL) problem in the Hadoop MapReduce framework. (3) Methods for answering nearest-neighbor queries on spatio-temporal data (trajectories of moving users) while preserving anonymity, where the queries are applied to clustered or non-clustered data. The k-anonymity method was used, in which the anonymity set that camouflages each moving object of the spatio-temporal database consists of its k nearest neighbors. The robustness of the method was quantified with a probability of 1/k, and the effect of the dimensionality and correlation of the data on the preservation of anonymity and privacy was studied. (4) An improvement of the above method in terms of efficient storage of spatio-temporal data, obtained by applying nearest-neighbor queries to Hough-transformed nonlinear trajectories of moving objects. (5) An evaluation of secure k-NN queries in the GeoSpark environment. (6) Sentiment analysis on Twitter data and tourist forecasting on Apache Spark.
Bloom filters, Privacy preserving, K-NN queries, K-anonymity, Spatiotemporal databases, Sentiment analysis, Twitter, Apache Spark, GeoSpark
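
As an illustration of the anonymity set described in the abstract, the following minimal sketch builds, for one moving object at a single time instant, the set of its k nearest neighbors. It is not the thesis implementation: the function name `anonymity_set`, the 2D Euclidean distance, the brute-force search, and the choice to exclude the object itself from the set are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def anonymity_set(positions: Dict[str, Point], target: str, k: int) -> List[str]:
    """Return the ids of the k objects closest to `target`.

    Hypothetical helper: brute-force Euclidean k-NN over a snapshot of
    object positions. The target itself is excluded (an assumption).
    """
    if target not in positions:
        raise KeyError(f"unknown object id: {target}")
    if not 1 <= k < len(positions):
        raise ValueError("k must be between 1 and the number of other objects")

    tx, ty = positions[target]
    # Distance from the target to every other object in the snapshot.
    distances = [
        (math.hypot(x - tx, y - ty), oid)
        for oid, (x, y) in positions.items()
        if oid != target
    ]
    # Sort by distance (ties broken by id) and keep the k closest ids.
    distances.sort()
    return [oid for _, oid in distances[:k]]


# Example snapshot of four moving objects at one time instant (made-up data).
snapshot = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 2.0), "d": (5.0, 5.0)}
neighbors = anonymity_set(snapshot, "a", 2)
```

A production system would replace the brute-force scan with a spatial index, but the scan keeps the idea of a k-nearest-neighbor anonymity set easy to follow.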