1. Hadoop MapReduce for Mobile Clouds

The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed storage and computational capabilities on clusters of commodity hardware. Building Hadoop on a mobile network enables the devices to run data-intensive computing applications without direct knowledge of the underlying distributed systems complexities. However, these applications have severe energy and reliability constraints (e.g., caused by unexpected device failures or topology changes in a dynamic network). As mobile devices are more susceptible to unauthorized access than traditional servers, security is also a concern for sensitive data. Hence, it is paramount to consider reliability, energy efficiency and security for such applications. The MDFS (Mobile Distributed File System) [1] addresses these issues for big data processing in mobile clouds. We have developed the Hadoop MapReduce framework over MDFS and have studied its performance by varying input workloads in a real heterogeneous mobile cluster. Our evaluation shows that the implementation addresses all constraints in processing large amounts of data in mobile clouds. Thus, our system is a viable solution to meet the growing demands of data processing in a mobile environment.

2. Practical Privacy-Preserving MapReduce Based K-means Clustering over Large-scale Dataset

Clustering techniques have been widely adopted in many real-world data analysis applications, such as customer behavior analysis, targeted marketing, and digital forensics. With the explosion of data in today's big data era, a major trend for handling clustering over large-scale datasets is to outsource it to public cloud platforms. This is because cloud computing offers not only reliable services with performance guarantees, but also savings on in-house IT infrastructure. However, as datasets used for clustering may contain sensitive information, e.g., patient health information, commercial data, and behavioral data, directly outsourcing them to public cloud servers inevitably raises privacy concerns. In this paper, we propose a practical privacy-preserving K-means clustering scheme that can be efficiently outsourced to cloud servers. Our scheme allows cloud servers to perform clustering directly over encrypted datasets, while achieving computational complexity and accuracy comparable to clustering over unencrypted ones. We also investigate secure integration of MapReduce into our scheme, which makes our scheme extremely suitable for cloud computing environments. Thorough security analysis and numerical analysis demonstrate the performance of our scheme in terms of security and efficiency. Experimental evaluation over a dataset of 5 million objects further validates the practical performance of our scheme.
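To make the MapReduce formulation of K-means concrete, the following is a minimal single-process sketch of one common decomposition: the map step assigns each point to its nearest centroid, and the reduce step averages the points in each cluster. It operates on plaintext data and omits the encryption layer entirely, so it is not the scheme proposed in the paper. The function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans`) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (cluster index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reduce: sum the points of each cluster and divide by the count."""
    sums, counts = {}, defaultdict(int)
    for idx, (point, n) in pairs:
        acc = sums.get(idx, [0.0] * len(point))
        sums[idx] = [a + p for a, p in zip(acc, point)]
        counts[idx] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (plaintext, illustrative only)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
result = kmeans(points, centroids=[(0, 0), (10, 10)])
```

In a real Hadoop job the map and reduce functions would run on separate workers and the pairs would be shuffled by cluster index; here a generator stands in for that data flow.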

3. Robust Big Data Analytics for Electricity Price Forecasting in the Smart Grid

Electricity price forecasting is a significant part of the smart grid because it makes the smart grid cost efficient. Nevertheless, existing methods for price forecasting may have difficulty handling the huge volume of price data in the grid, since the redundancy from feature selection cannot be averted and an integrated infrastructure for coordinating the procedures in electricity price forecasting is also lacking. To solve this problem, a novel electricity price forecasting model is developed. Specifically, three modules are integrated in the proposed model. First, by merging the Random Forest (RF) and Relief-F algorithms, we propose a hybrid feature selector based on Grey Correlation Analysis (GCA) to eliminate the feature redundancy. Second, a combination of a kernel function and Principal Component Analysis (KPCA) is used in the feature extraction process to achieve dimensionality reduction. Finally, to forecast the price classification, we put forward a differential evolution (DE) based Support Vector Machine (SVM) classifier. Our proposed electricity price forecasting model is realized via these three parts. Numerical results show that our proposal outperforms other methods.
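As an illustration of the differential evolution step, the sketch below implements the classic DE/rand/1/bin variant in plain Python. The abstract does not say which DE variant or which SVM parameters are tuned, so everything here is an assumption: the function `toy_error` is a hypothetical stand-in for a classifier's validation error over two tunable parameters, and the population size, scale factor `f`, and crossover rate `cr` are conventional defaults rather than values from the paper.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimise objective over box bounds with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    lo, hi = bounds[j]
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    trial.append(min(max(value, lo), hi))  # clip to bounds
                else:
                    trial.append(pop[i][j])
            cost = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if cost <= costs[i]:
                pop[i], costs[i] = trial, cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def toy_error(params):
    """Hypothetical stand-in for validation error; minimum at (1.0, -2.0)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best_params, best_cost = differential_evolution(toy_error, [(-5, 5), (-5, 5)])
```

To tune an actual SVM, `toy_error` would be replaced by a function that trains the classifier with the candidate parameters and returns its cross-validated error; the search loop itself would stay the same.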





