Contextual Outlier Mining Algorithm Based on MapReduce
YANG Haifeng, YU Xiaolong, XUN Yaling, ZHANG Jifu
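Abstract:
Outlier mining algorithms have been studied extensively, but the comprehensibility and interpretability of their results have received comparatively little attention. Using relevant subspaces, this paper presents a contextual outlier mining algorithm under the MapReduce programming model. The algorithm uses the local sparsity difference to determine the relevant subspace of each data object and computes the object's outlier factor in that subspace. The outlier factor and the relevant attribute dimensions are then defined as the contextual information of the data object, which makes the result easier to understand. The N data objects with the largest outlier factors are selected as contextual outliers. The procedure is implemented as a parallel contextual outlier mining algorithm with the MapReduce programming model. Finally, experiments on UCI datasets verify the interpretability and effectiveness of the algorithm.
Keywords: outlier; contextual information; relevant subspace; comprehensibility; MapReduce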
Foundation: Graduate Science and Technology Innovation Project of Taiyuan University of Science and Technology (20151028)
Author: YANG Haifeng, YU Xiaolong, XUN Yaling, ZHANG Jifu
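The pipeline outlined in the abstract (relevant attributes per object, an outlier factor, top-N selection, map and reduce stages) can be illustrated with a small in-memory sketch. The abstract does not give the definition of the local sparsity difference, so the code below swaps in a simple stand-in: an object's mean distance to its k nearest values in one attribute, divided by the median of that quantity over all objects. The function names, the `threshold` parameter, and the round-robin partitioning are all hypothetical, and the map and reduce stages are plain Python functions rather than a Hadoop job.

```python
import heapq
from statistics import mean, median


def knn_spread(column, i, k):
    """Mean distance from column[i] to its k nearest values in one attribute."""
    dists = sorted(abs(column[i] - v) for j, v in enumerate(column) if j != i)
    return mean(dists[:k])


def sparsity_ratios(data, k):
    """Stand-in for the local sparsity difference (an assumption, not the paper's formula).

    Returns ratios[attribute][object]: the object's k-NN spread in that
    attribute divided by the median spread of all objects.
    """
    ratios = []
    for a in range(len(data[0])):
        column = [row[a] for row in data]
        spreads = [knn_spread(column, i, k) for i in range(len(column))]
        typical = median(spreads)
        ratios.append([s / typical if typical > 0 else 0.0 for s in spreads])
    return ratios


def map_phase(object_ids, ratios, threshold):
    """Emit (object_id, (outlier_factor, relevant_attributes)) for one partition."""
    for i in object_ids:
        # Relevant attributes: those in which the object is unusually sparse.
        relevant = [a for a, col in enumerate(ratios) if col[i] > threshold]
        # Outlier factor: summed ratio over the relevant attributes only.
        factor = sum(ratios[a][i] for a in relevant)
        yield i, (factor, relevant)


def reduce_phase(mapped, n):
    """Keep the n objects with the largest outlier factors."""
    return heapq.nlargest(n, mapped, key=lambda kv: kv[1][0])


def top_n_contextual_outliers(data, n, k=2, threshold=3.5, n_partitions=2):
    ratios = sparsity_ratios(data, k)  # shared with every mapper in this sketch
    partitions = [range(p, len(data), n_partitions) for p in range(n_partitions)]
    mapped = [kv for part in partitions for kv in map_phase(part, ratios, threshold)]
    return reduce_phase(mapped, n)


# Toy data: the last object is far from the rest in attribute 0 only.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (5.0, 1.0)]
top = top_n_contextual_outliers(data, n=1)
```

Each returned entry pairs an object index with its outlier factor and the list of attribute indices that produced it, which corresponds to the contextual information the abstract describes. In a distributed setting the neighbor statistics would also have to be computed across partitions; this sketch sidesteps that by sharing the precomputed ratios with every mapper.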