Making data classification more effective: An automated deep forest model
Journal of Industrial Information Integration (IF 10.4), Pub Date: 2024-11-10, DOI: 10.1016/j.jii.2024.100738
Jingwei Guo, Xiang Guo, Yihui Tian, Hao Zhan, Zhen-Song Chen, Muhammet Deveci

Although the deep forest model and its variants carry little risk of overfitting, they cannot automatically adapt to the features of the data; selecting forest learners still depends on manual experience and comparative experiments. This study proposes an automated deep forest model (ATDF) that determines the types and numbers of forest learners automatically from the training data. The model introduces a measure of forest learner variability based on normalized mutual information, which serves as the theoretical foundation for automating deep forests. A novel hierarchical clustering algorithm, also based on normalized mutual information, then groups forest learners at different granularities to determine the optimal forest learner types. The same method can determine the structure of stacking models in general, including deep forests. Finally, a Bayesian optimization algorithm based on the tree-structured Parzen estimator (TPE) determines the ideal number of forest learners of each type, with the goal of maximizing cross-validation scores. In addition, a standardized method for identifying forest learners is developed to guarantee consistent model outcomes. A series of comparative experiments on seven datasets from the UCI Machine Learning Repository confirmed the effectiveness and superiority of the proposed model. The results show that the proposed model is highly automated, adapts well to new data and tasks, and performs excellently on classification tasks.
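The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that learner variability is measured as 1 minus the normalized mutual information (NMI) between two learners' predicted labels, that NMI is normalized by the arithmetic mean of the two entropies, and that clusters are merged by average linkage until a hypothetical `threshold` is exceeded. The names `nmi` and `cluster_learners` are illustrative only.

```python
from collections import Counter
from math import log


def nmi(a, b):
    """NMI between two label sequences (arithmetic-mean normalization, an assumption)."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    mi = sum(
        c / n * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )
    denom = (h_a + h_b) / 2
    # Two constant predictions carry no entropy; treat them as identical.
    return 1.0 if denom == 0 else mi / denom


def cluster_learners(preds, threshold=0.5):
    """Group learners whose predictions are similar (distance = 1 - NMI).

    preds maps a learner name to its predicted labels on the same samples.
    threshold is a hypothetical cut-off on the average-linkage distance.
    """
    names = list(preds)
    dist = {
        frozenset((p, q)): 1.0 - nmi(preds[p], preds[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy predictions from three hypothetical learners on six samples.
    preds = {
        "rf": [0, 0, 1, 1, 0, 1],
        "et": [0, 0, 1, 1, 0, 1],
        "other": [0, 1, 0, 1, 0, 1],
    }
    print(cluster_learners(preds))
```

Under these assumptions, each resulting cluster would stand for one learner "type"; the number of learners per type would then be tuned separately, for example with a TPE-based optimizer, which this sketch does not cover.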

Updated: 2024-11-10