The goal is to create a tree with high predictive accuracy while remaining simple to understand.

Choosing the Best Split: At each stage of the tree-building process, the algorithm chooses the feature that best divides the data into the most homogeneous subsets based on a predefined criterion, such as mean squared error.

Pruning the Tree: Once the tree is fully grown, it is often pruned to reduce overfitting. Pruning involves removing branches of the tree that do not improve the predictive accuracy of the model.

Making Predictions: Predictions are made by passing the input data down the tree, traversing the branches until it reaches a leaf node. The leaf node's output value is then used as the prediction for the input data.

Evaluating the Model: The model's performance is evaluated using the testing set. The evaluation metric is chosen based on the problem at hand; it could be mean squared error, mean absolute error, the R-squared (R²) score, etc.

Using the Model: Once trained and evaluated, the model can be used to make predictions on new, previously unseen data.

In Sklearn, decision tree regression can be done quite easily by using the DecisionTreeRegressor module of the sklearn.tree package.

Decision Tree Regressor Hyperparameters (Sklearn)

Hyperparameters are parameters that can be fine-tuned to improve the accuracy of a machine learning model. Some of the main hyperparameters provided by Sklearn's DecisionTreeRegressor module are as follows:

Criterion: This refers to the criterion used to evaluate the quality of a split in decision trees. The following values are supported: 'squared_error' (the default), 'absolute_error', 'friedman_mse', and 'poisson'.

Splitter: This denotes the strategy used for splitting at each node while creating the tree. The supported strategies are 'best' (default) and 'random'.

Max_depth: It denotes the tree's maximum depth. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.

Min_samples_split: It refers to the minimum number of samples needed to split an internal node. It supports any int or float value, and the default is 2.

Min_samples_leaf: It refers to the minimum number of samples required to be at a leaf node. It can be any int or float value, and the default is 1.

Max_features: It indicates the number of features to be considered in order to find the best split. It can have the values 'auto', 'sqrt', 'log2', None, int, or float.

Example of Decision Tree Regression in Sklearn

About Dataset

In this example, we'll use the Salary dataset, which has two attributes: 'YearsExperience' and 'Salary'. It is a simple dataset with only 29 records.

To begin, we import all of the libraries that will be needed in this example, including DecisionTreeRegressor (a baseline sketch is given at the end of this post).

Improving Results with K Cross Validation & Hyperparameter Tuning

We observed slight overfitting in the trained model in the preceding example. This is due to the small size of the dataset (29 rows): splitting it into train and test sets can result in information loss for training. To produce a good model that is less prone to overfitting, we can use K-fold cross validation rather than splitting the data into just two parts. Moreover, in that example, we did not pass any hyperparameters to the DecisionTreeRegressor object, so all default hyperparameters were used. Instead, we can experiment with various combinations of hyperparameters with the help of Sklearn's GridSearchCV module to arrive at a better accuracy, as shown in the sketches below.
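The code listings of the original post did not survive this copy, so here is a minimal sketch of the baseline example described above. The file name "Salary.csv", the 80/20 split, and random_state=42 are assumptions for illustration, not the post's actual settings.

```python
# Baseline sketch: train a default DecisionTreeRegressor on the Salary dataset.
# Assumption: the dataset is a CSV named "Salary.csv" with columns
# "YearsExperience" and "Salary" (file name not given in the original post).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

df = pd.read_csv("Salary.csv")
X = df[["YearsExperience"]]  # feature matrix must be 2-D
y = df["Salary"]             # target

# Plain two-way split; with only 29 rows this discards training information.
# test_size=0.2 and random_state=42 are assumed values.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# No hyperparameters passed, so all defaults are used (as in the original example).
reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)

# Comparing train and test R^2 is one way to spot the slight overfitting
# mentioned above: a much higher train score suggests the tree memorised the data.
print("train R^2:", r2_score(y_train, reg.predict(X_train)))
print("test  R^2:", r2_score(y_test, reg.predict(X_test)))
```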
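To see what replacing the two-way split with K-fold cross validation looks like, here is a short sketch using cross_val_score. The choice of k=5 is an assumption; the post does not fix a value.

```python
# K-fold cross validation sketch: every row of the 29-row dataset is used for
# both fitting and validation across the folds, instead of being held out once.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

scores = cross_val_score(
    DecisionTreeRegressor(random_state=42),
    X, y,                 # X, y as loaded in the baseline sketch above
    cv=5, scoring="r2",   # k=5 is an assumed value
)
print("per-fold R^2:", scores)
print("mean R^2:", scores.mean())
```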
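Finally, a hedged sketch of hyperparameter tuning with GridSearchCV over the hyperparameters listed earlier. The candidate values in param_grid are illustrative assumptions; the original post's exact grid is not preserved here.

```python
# GridSearchCV runs K-fold cross validation (cv=5 assumed) for every
# combination in the grid and keeps the best-scoring one.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Illustrative candidate values, not the post's actual grid.
param_grid = {
    "criterion": ["squared_error", "absolute_error", "friedman_mse"],
    "splitter": ["best", "random"],
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 4, 6],
    "min_samples_leaf": [1, 2, 3],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)  # X, y as loaded in the baseline sketch

print("best params:", search.best_params_)
print("best cross-validated R^2:", search.best_score_)
```

On a dataset this small the grid is cheap to search exhaustively; for larger grids or datasets, RandomizedSearchCV is the usual drop-in alternative.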