Maintaining the security and privacy of big data in distributed storage poses many challenges. Discovering meaningful information in large quantities of data with unknown structure is a further challenge, because manually processing such a massive amount of data without machine assistance is beyond human ability. A centralized data processor cannot cope with the distributed nature of the data sources. Cloud architecture, on the other hand, offers an efficient way to share and store data across distributed cloud nodes. It is therefore imperative to develop a system that uses the cloud infrastructure for parallel processing of distributed data. However, several technical challenges, such as maintaining a common data model and reasoning across data nodes, need to be addressed before such a system can be used. In this talk, we explore the technology and applications, as well as product development and markets, focusing on secure data storage and on data fusion for knowledge discovery from distributed heterogeneous data sources in cloud environments. For data storage, we set aside traditional file-by-file storage and advocate new products that use fragment-based data storage technology, where a fragment is a piece of data that discloses nothing about the data content while providing higher security and privacy protection than encryption. For data fusion, we discuss a number of important functionalities based on a machine learning approach to information extraction and discovery in cloud environments. This approach can advance state-of-the-art fusion techniques and lead to several practical applications in commercial sectors. It provides an efficient way to discover and integrate information based on the parallel programming structure of MapReduce. Cloud nodes can further collaborate with each other through iterative processing to improve system performance. The system can quickly retrieve useful information from massive datasets to identify business opportunities.
Keynote speech presented at the Big Data Summit, Oct. 10, 2014.