NSF Org: |
IIS Div Of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | September 13, 2001 |
Latest Amendment Date: | July 7, 2006 |
Award Number: | 0121175 |
Award Instrument: | Continuing Grant |
Program Manager: | Maria Zemankova, IIS Div Of Information & Intelligent Systems, CSE Direct For Computer & Info Scie & Enginr |
Start Date: | October 1, 2001 |
End Date: | September 30, 2007 (Estimated) |
Total Intended Award Amount: | $210,000.00 |
Total Awarded Amount to Date: | $2,187,700.00 |
Funds Obligated to Date: |
FY 2002 = $305,000.00 FY 2003 = $590,000.00 FY 2004 = $433,100.00 FY 2005 = $334,600.00 FY 2006 = $450,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
341 PINE TREE RD ITHACA NY US 14850-2820 (607)255-5014 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
341 PINE TREE RD ITHACA NY US 14850-2820 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | INFORMATION & KNOWLEDGE MANAGE |
Primary Program Source: |
app-0102 app-0103 app-0104 app-0105 app-0106 |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Data mining is one of the most promising information technologies today. This project studies decision trees, one of the most widely used data mining models, and addresses three complementary components of decision tree construction: bias in split selection, pruning, and regression tree construction. Bias in split selection is an important problem: choosing the "wrong" split attribute destroys the interpretability of the decision tree, and users can no longer trust the information the tree conveys. Through a large experimental study and a theoretical investigation, this project develops a framework for devising split selection methods with zero bias. The new methods will let users interpret a decision tree without fear of being misinformed by it. The second topic is the pruning of decision trees. Through a large experimental study of pruning on large datasets, the project investigates the computational and qualitative trade-offs between different pruning methods, with the aim of settling an ongoing debate about how to prune when datasets are large. Third, this research investigates scalable regression tree construction, developing methods that build regression trees with linear models in the leaf nodes and multivariate splits at intermediate nodes, all scaling to very large datasets with millions of records. The results are implemented in a publicly available decision tree construction tool and performance testbed, released as a software contribution to the research community. This research has many applications in electronic commerce, scientific data analysis, and computational biology.
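Bias in split selection means that a split criterion prefers some attributes for reasons unrelated to their predictive value. A well-known instance, offered here only as an illustration and not as this project's method, is the tendency of information gain to favor attributes with many distinct values. The minimal sketch below assumes a binary class and categorical attributes generated independently of the class; the helper names (`entropy`, `information_gain`, `average_gain`) and the parameters (`n_rows`, `trials`, `seed`) are hypothetical. Even though no attribute carries any information about the class, the average gain grows with the number of attribute values.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(values, labels):
    """Reduction in class entropy from splitting on a categorical attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def average_gain(num_values, n_rows=100, trials=500, seed=0):
    """Mean gain of a random attribute that is independent of the class label."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        labels = [rng.randint(0, 1) for _ in range(n_rows)]
        values = [rng.randrange(num_values) for _ in range(n_rows)]
        total += information_gain(values, labels)
    return total / trials


if __name__ == "__main__":
    # An unbiased criterion would score all three attributes about equally.
    for k in (2, 10, 50):
        print(k, round(average_gain(k), 4))
```

Because each extra attribute value creates another small child node, chance fluctuations in those nodes look like "purity", so a many-valued but irrelevant attribute can win the split. This is the kind of misleading choice that an unbiased split selection method is meant to rule out.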