Abstract
Questions: What is the relative importance of our methodological decisions concerning sampling (plot size) and data analysis (data transformation, resemblance coefficient, hierarchical clustering strategy and number of clusters) in vegetation classification? Are there differences between the conclusions when the full range or only a more practical narrow range of methodological choices is tested? What is the difference between results for actual and random data?
Location: Rock grassland in Hungary.
Methods: The full procedure of vegetation classification was simulated using actual and random data. Variation in classification results was partitioned using distance-based redundancy analysis (RDA). The resulting RDA models were subjected to variation partitioning to determine the relative importance of methodological decisions.
Results: RDA models explained more variation in classifications of random than of real data. Classification algorithm, cluster level, data transformation and mean plot size were always among the most significant variables; however, the other variables also had a considerable effect in certain situations.
Conclusions: As the adjusted R² values suggest, the overall effect of methodological decisions on classifications is larger for randomly structured than for actual data, possibly due to a stronger clustering tendency in the latter. The clustering algorithm, cluster level, data transformation and plot size should be chosen most carefully before classification analyses, but any of the examined decisions can significantly affect the result. In addition to the mean, the range of plot sizes should also be carefully delimited during relevé selection for classification studies. The main decision about the classification algorithm is whether a chain-forming or a group-forming method is used. Data transformation had a more significant effect on real data than on simulations with random variation, which supports the view that different abundance scales reveal different facets of biologically relevant patterns in community composition. The resemblance measure had a relatively weak effect, suggesting that it is not as influential as previously thought.
Keywords
Data transformation, Flexible clustering, Model selection, Multivariate analysis, Plot size, Resemblance measure
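As an illustration of the Methods, a distance-based analysis of classification results needs some measure of how much two classifications of the same plots differ. The abstract does not say which measure was used, so the sketch below assumes one common option, one minus the Rand index; the function names and the toy partitions are hypothetical and not taken from the study.

```python
from itertools import combinations


def rand_dissimilarity(a, b):
    """Return 1 - Rand index for two partitions of the same plots.

    `a` and `b` are sequences of cluster labels, one label per plot.
    The choice of the Rand index is an assumption made for illustration.
    """
    if len(a) != len(b):
        raise ValueError("both partitions must label the same plots")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    # A pair of plots "agrees" when both partitions put it together
    # or both partitions keep it apart.
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return 1.0 - agree / len(pairs)


def dissimilarity_matrix(partitions):
    """Pairwise dissimilarities among classifications (one per method combination)."""
    n = len(partitions)
    return [
        [rand_dissimilarity(partitions[i], partitions[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy example: three classifications of four plots (hypothetical labels).
p1 = [0, 0, 1, 1]
p2 = [1, 1, 0, 0]  # same grouping as p1, labels swapped
p3 = [0, 1, 0, 1]  # a different grouping
matrix = dissimilarity_matrix([p1, p2, p3])
```

A matrix of this kind, with one row per combination of methodological choices, is the sort of input a distance-based ordination could take as its response, with the choices themselves (algorithm, cluster level, transformation, plot size, resemblance measure) as explanatory variables.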