Online Streaming Feature Selection Based on Neighborhood Rough Sets
CAI Jingjing, XUN Yaling, WANG Linqing, HE Huiai, SUN Jingjing
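Abstract:
Existing rough-set-based online streaming feature selection algorithms concentrate on handling irrelevant and redundant features, but they ignore the correlation between features and lack an effective dynamic update mechanism. To address these problems, this paper proposes OFSGN, an online streaming feature selection algorithm based on correlation-based neighborhood rough sets. The algorithm uses Euclidean distance to compute the similarity between features, obtains neighborhood sets through a newly defined neighborhood rough set method that combines feature-feature and feature-label correlation, and derives the optimal feature subset through online significance analysis and online redundancy analysis. Extensive experimental results demonstrate the effectiveness of the proposed algorithm and show that it has a clear advantage in accuracy.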
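The abstract does not give OFSGN's formulas, so the following is only a minimal sketch of the general pattern it describes (neighborhood sets built from Euclidean distance, followed by online significance analysis and online redundancy analysis). It swaps in the classical δ-neighborhood rough set dependency as a stand-in and does not model the paper's newly defined neighborhood or its feature-feature and feature-label correlation terms. The function names `neighborhood`, `dependency` and `online_select`, the radius `delta = 0.15`, and the assumption that features are pre-scaled to [0, 1] are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: a classical delta-neighborhood rough set, used as a
# stand-in for the paper's newly defined, correlation-aware neighborhood.


def neighborhood(X: Sequence[Sequence[float]], i: int,
                 subset: Sequence[int], delta: float) -> List[int]:
    """Indices of samples within Euclidean distance `delta` of sample i,
    measured only on the feature columns listed in `subset`."""
    return [
        j for j in range(len(X))
        if math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in subset)) <= delta
    ]


def dependency(X, y, subset, delta) -> float:
    """gamma_B(D): share of samples whose delta-neighborhood under B contains
    only samples with their own label (the positive region)."""
    if not subset:
        return 0.0
    pos = sum(
        1 for i in range(len(X))
        if all(y[j] == y[i] for j in neighborhood(X, i, subset, delta))
    )
    return pos / len(X)


def online_select(X, y, delta: float = 0.15) -> List[int]:
    """Visit feature columns one at a time, as if they arrive in a stream."""
    selected: List[int] = []
    current = 0.0  # dependency of the currently selected subset
    for f in range(len(X[0])):
        # Relevance: discard a feature whose own positive region is empty.
        if dependency(X, y, [f], delta) == 0.0:
            continue
        # Significance: keep f only if it raises the subset's dependency.
        candidate = selected + [f]
        gamma = dependency(X, y, candidate, delta)
        if gamma <= current:
            continue
        selected, current = candidate, gamma
        # Redundancy: drop earlier features whose removal does not lower it.
        for g in list(selected):
            if g == f:
                continue
            reduced = [h for h in selected if h != g]
            gamma = dependency(X, y, reduced, delta)
            if gamma >= current:
                selected, current = reduced, gamma
    return selected


if __name__ == "__main__":
    # Toy data (features assumed pre-scaled to [0, 1]):
    # f0 partially separates the classes, f1 is constant noise,
    # f2 separates the classes completely.
    X = [
        [0.0, 0.5, 0.0],
        [0.1, 0.5, 0.05],
        [0.5, 0.5, 0.1],
        [0.5, 0.5, 0.8],
        [0.9, 0.5, 0.9],
        [1.0, 0.5, 1.0],
    ]
    y = [0, 0, 0, 1, 1, 1]
    print(online_select(X, y))  # prints [2]
```

In the toy run, f0 is first accepted with dependency 4/6, f1 is rejected as irrelevant, and f2 raises the dependency to 1.0, after which f0 is removed as redundant. The redundancy test works because dropping a feature can only shrink Euclidean distances, so neighborhoods grow and the dependency can never increase. Each dependency evaluation is brute force, O(n²·|B|) in the number of samples n, so this version suits toy data only.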
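Keywords: rough sets; online streaming; feature selection; neighborhood rough sets; correlation
Foundation: National Natural Science Foundation of China (62272336); Shanxi Province Graduate Education Innovation Project (2022Y699Z)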