A Two-Phase Outliers Detection Algorithm Based on Subspace
YIN Yuejie, ZHAO Xujun
Abstract:
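Projecting high-dimensional data onto subspaces is one effective way to address the "curse of dimensionality". To improve mining efficiency, this paper presents a new two-phase, subspace-based outlier detection algorithm that uses a density threshold to screen candidate outliers and so reduce the amount of computation. In the first phase, the algorithm computes the density ratio of every data object in each dimension, takes the logarithm of the product of the density ratios over all dimensions, averaged over the dimensions, as the density coefficient, and selects the candidate outliers. In the second phase, the product of the deviation degrees of a candidate outlier's neighbors in each relevant subspace is taken as the deviation ratio, the product of the density coefficient and the deviation ratio is taken as the outlier coefficient, and the outliers are determined. Because the outlier coefficient is computed only for the candidate outliers, mining efficiency improves. Finally, experiments on UCI datasets verify that the algorithm preserves the accuracy of the mining results while improving mining efficiency.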
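The first phase described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the per-dimension density ratio, so the definition used here is an assumption: the number of objects within a fixed `radius` of an object along one dimension, divided by the average of that count over all objects. The names `density_coefficients`, `candidate_outliers`, `radius`, and `threshold` are hypothetical, and the second phase (deviation ratio over relevant subspaces) is not sketched because the abstract gives no definition for it.

```python
import math


def density_coefficients(data, radius):
    """Return one density coefficient per object.

    ASSUMPTION: the density ratio of object i in dimension j is its
    1-D neighbor count within `radius` divided by the mean count in
    that dimension. The paper's actual definition may differ.
    """
    n, d = len(data), len(data[0])
    # counts[i][j]: objects within `radius` of object i along dimension j
    # (the object itself is included, so every count is at least 1).
    counts = [
        [sum(1 for q in data if abs(q[j] - p[j]) <= radius) for j in range(d)]
        for p in data
    ]
    mean_counts = [sum(counts[i][j] for i in range(n)) / n for j in range(d)]

    coeffs = []
    for i in range(n):
        log_ratios = [math.log(counts[i][j] / mean_counts[j]) for j in range(d)]
        # Log of the product of the ratios, averaged over the dimensions,
        # which equals the mean of the per-dimension logs.
        coeffs.append(sum(log_ratios) / d)
    return coeffs


def candidate_outliers(data, radius, threshold):
    """Indices of objects whose density coefficient falls below `threshold`."""
    coeffs = density_coefficients(data, radius)
    return [i for i, c in enumerate(coeffs) if c < threshold]


# Four clustered points and one isolated point (illustrative data only).
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.0)]
print(candidate_outliers(points, radius=0.5, threshold=0.0))  # [4]
```

Under this assumed definition, objects in sparse regions get a negative coefficient, so only they pass on to the more expensive second phase; that is the source of the efficiency gain the abstract claims.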
Keywords: outlier detection; high-dimensional; projection; relevant subspace