Tree regression analysis is a relatively new (Breiman et al. 1984) and useful alternative statistical approach (Efron and Tibshirani 1991) since it does not require the existence of linear relationships among the variables or homoscedasticity in variances, and because interactions among habitat variables in relation to the organism are detected automatically in the analysis. Tree regression results present the relations among physical habitat conditions and nest density in an easily interpretable way by expressing nest-site selection as a hierarchy of decision and/or selection processes, based on physical habitat criteria. Since both tree and standard multiple regression are statistical analyses that quantify the significance and the relative importance of independent (habitat) variables on a dependent variable (nest density), the usefulness of tree regression analysis was compared with the better known standard regression analysis.
FIELD SAMPLING METHODS AND PRELIMINARY ANALYSES
Detection of smallmouth bass nests
Sampling was conducted in Lake Opeongo, Ontario (5780 ha, 45°42′ N, 78°22′ W), where smallmouth bass nest depths range between and m (most frequently from to m). By snorkelling along the 1-m depth contour, exact nest locations were recorded throughout the 155-km perimeter of Lake Opeongo in four years (1984, 1992, 1993, 1994), and throughout Jones Bay (a particularly high-density nesting area in Lake Opeongo) in eleven years (1977-1979, 1984, 1988-1994). These surveys of the smallmouth bass nesting population are among the most comprehensive records known for the breeding locations of a fish population (Rejwan et al. 1997).
Patchiness in nest distributions has been identified throughout a large range of spatial scales in Lake Opeongo (100-m to 10-km quadrat sizes) (Rejwan et al. 1997). Thirty-six 1-km-long littoral sites throughout Lake Opeongo and thirty-one 100-m-long littoral sites throughout Jones Bay, Lake Opeongo, were selected for measuring habitat conditions in such a way that they encompassed the entire ranges in nest densities in approximately uniform proportions.
Measurement of physical habitat variables
The wind/wave exposure of each study site was measured as the maximum fetch length along the prevailing wind directions at three equally spaced locations along the length of each 100-m and 1-km study site. Analysis (ANOVA: F = , F = , df = 7, 72, P ≪ , Tukey's multiple comparisons) of 10 years of continuously recording anemometer readings at the south tip of Lake Opeongo indicated that northwesterly and southwesterly wind velocities (in kilometers per hour) were significantly greater than winds from other directions over the duration of the average nest-guarding period (29 May to 30 June, Rejwan 1996). The maximum southwesterly and northwesterly fetch lengths were measured at each of three locations at 25%, 50%, and 75% of the distance along each 1-km and 100-m study site and were averaged to characterize exposure at each site. Southwest fetch lengths were multiplied by since northwesterly winds were an average of times slower than southwest winds (P < ).
Water temperatures were recorded in a single central location within each 1-km and 100-m site at 1 m (approximate nest depth) using one of three types of temperature-recording instruments: continuous digital recorders (Ryan Temp-Mentors and DataSonde Hydrolabs), continuous paper recorders (Ryan Thermistors), and Geneq max-min thermometers. Midpoint temperatures were recorded from the minimum and maximum temperatures from each 2-d time interval at each site over the duration of the nesting period (25 May to 30 June 1993 and 1 June to 27 June 1994), regardless of the type of temperature recorder deployed (Rejwan 1996). Highly significant correlations between 1994 and 1993 temperatures at both the 1-km (r² = , P < , n = 27 sites) and the 100-m (r² = , P < , n = 30) spatial scales indicated that there was consistency between the years in the among-site temperature differences. Consequently, 1994 temperatures were used to represent the site-specific temperatures, since the temperature records for the 1993 nesting season were unavailable for 10 of 67 sample sites.
Reticulation of the shoreline along each 1-km and 100-m study site was calculated using a measurement protocol that facilitates inter-scale comparisons. This method takes into account the fractal properties of shorelines, whereby reticulated shoreline lengths increase rapidly with increasing measurement resolution relative to lengths of straighter shorelines (Kent and Wong 1982, Sander 1987). At the small and large spatial scales in this study, the shoreline reticulation of each site was measured using one high-resolution (with a Run-Mate curvimeter) and one low-resolution (with a ruler) measurement. On a 1:1793 scale map (100-m sites) and a 1:10000 scale map (1-km sites) the real-image high resolutions were m and 10 m, respectively, and low resolutions were 20 m and 100 m, respectively. Measurements of shoreline length with the low-resolution method were subtracted from the high-resolution site lengths of either 100 m or 1 km to generate a measure of shoreline complexity that increases with increasing complexity (Rejwan 1996).
The fourth variable, littoral-floor rugosity, was quantified by measuring the straight-line end-to-end length of an 18-m-long fine-link ( cm) chain after lowering it over the profile of the littoral floor along the 1-m depth contour (approximate nest depth). The 18-m chain length was equal to the average territory range of a male smallmouth bass during the nest-guarding period (Scott 1993). Rugosity was measured at (and averaged over) ten and three equally spaced sampling sites within each 1-km and 100-m-long site, respectively. To produce a measure that was directly related to the littoral ruggedness of the terrain, these lengths were subtracted from the total chain length of 18 m.
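The rugosity index described above can be sketched in a few lines. This is a minimal illustration, assuming the straight-line chain lengths are already in metres; the function name and the example readings are hypothetical and are not taken from the field data.

```python
CHAIN_LENGTH_M = 18.0  # total chain length stated in the text


def rugosity_index(end_to_end_lengths_m, chain_length_m=CHAIN_LENGTH_M):
    """Average shortfall between chain length and straight-line length.

    A flat bottom gives an end-to-end length equal to the chain length
    (index 0); a rugged bottom shortens the end-to-end length, so the
    index grows with ruggedness.
    """
    if not end_to_end_lengths_m:
        raise ValueError("at least one chain measurement is required")
    shortfalls = [chain_length_m - d for d in end_to_end_lengths_m]
    return sum(shortfalls) / len(shortfalls)
```

For example, three hypothetical readings of 18.0, 17.0, and 16.0 m at a 100-m site would give an index of 1.0 m.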
METHODS OF STATISTICAL ANALYSIS
Standard multiple-regression analysis
A best-combinations method of standard multiple-regression analysis was employed whereby all combinations of physical variables were tested, and the smallest combination of physical variables that yielded the highest adjusted R² value was included in the standard multiple-regression equation. Triangular-shaped relationships that could not be linearized between nest density and both shoreline complexity and wave exposure at the 1-km scale were expected to limit the effectiveness of the standard multiple-regression analysis.
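A best-combinations search of this kind can be sketched as follows. This is an illustrative sketch only, not the authors' code: it fits every subset of predictors by ordinary least squares, scores each by adjusted R², and keeps the smallest subset that attains the highest score. The function names and the tie tolerance are assumptions introduced here.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_r2(columns, y):
    """Least-squares fit of y on the columns (plus intercept); adjusted R^2."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)]
           for p in range(k + 1)]
    xty = [sum(r[p] * y[i] for i, r in enumerate(rows)) for p in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def best_subset(predictors, y):
    """Return (adjusted R^2, variable names) for the best subset.

    Subsets are visited from smallest to largest and a larger subset must
    beat the current best by a small tolerance (an assumed 1e-12), so ties
    go to the smaller combination.
    """
    best = None
    names = sorted(predictors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            score = adjusted_r2([predictors[name] for name in subset], y)
            if best is None or score > best[0] + 1e-12:
                best = (score, subset)
    return best
```

Here `predictors` is a dictionary mapping a variable name to its list of site values. With four habitat variables the search covers only 15 subsets, but the number of models examined is what the later discussion of significance testing is concerned with.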
The mechanisms involved in tree regression analysis
Tree regression analysis involves a recursive partitioning of the study sites into two groups (of high and low nest density) that are each as internally similar as possible in nest density. The exact allocation of sites into the groups is determined by repeatedly dividing the data set (of thirty-six 1-km-long sites, or thirty-one 100-m-long sites) into two groups at every possible measurement within the entire range of measurements of each of the four habitat variables. After each of the binary splits, the variability (within-groups sum of squares) in nest density is calculated for the two groups of samples that are created, and these two values are summed. Ultimately, the habitat variable and split-point that result in the smallest amount of variance in nest density within the two created groups, combined, are used to divide the data set in the regression tree. Exactly the same assessment is then repeated with each of the two groups that were created, and each of the independent variables is assessed regardless of whether or not it was previously used in the tree. The data set is divided through this recursive binary-partitioning mechanism until there is no more than a single site in a group, or until there is no variation in nest density among sites within the group (Clark and Pregibon 1992, StatSci 1993).
The regression tree that is created is structured in a hierarchical fashion with the initial undivided data set at the top (the root) followed by binary splits, each of which is called a "node," to final undivided groups of sites ("leaves") at the bottom of the tree. The proportion of variance in nest density that is explained by each split is indicated by the vertical length of the "branches" that extend from each node to each subsequent node or leaf. The size of the regression tree is measured by the number of leaves (final groups) in the tree. Thus, the undivided data set is considered to have a tree size of 1.
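The recursive partitioning just described can be sketched as below. This is a simplified illustration rather than the S-PLUS implementation used in the study: split points are placed midway between adjacent observed values (an assumption), and the variable names in the usage note are hypothetical.

```python
def sum_of_squares(values):
    """Within-group sum of squared deviations from the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y):
    """Find the variable and threshold minimising the summed within-group SS.

    rows: one dict of habitat variables per site; y: nest densities.
    Returns (within_ss, variable, threshold), or None if no split exists.
    """
    best = None
    for var in rows[0]:
        levels = sorted(set(r[var] for r in rows))
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2  # assumed: midpoint between values
            left = [yi for r, yi in zip(rows, y) if r[var] < threshold]
            right = [yi for r, yi in zip(rows, y) if r[var] >= threshold]
            within = sum_of_squares(left) + sum_of_squares(right)
            if best is None or within < best[0]:
                best = (within, var, threshold)
    return best


def grow(rows, y):
    """Split recursively until a group has one site or no variation."""
    node = {"n": len(y), "mean": sum(y) / len(y)}
    if len(y) < 2 or sum_of_squares(y) == 0:
        return node  # leaf
    split = best_split(rows, y)
    if split is None:
        return node  # identical habitat values: cannot split further
    _, var, threshold = split
    node["var"], node["threshold"] = var, threshold
    left = [i for i, r in enumerate(rows) if r[var] < threshold]
    right = [i for i, r in enumerate(rows) if r[var] >= threshold]
    node["left"] = grow([rows[i] for i in left], [y[i] for i in left])
    node["right"] = grow([rows[i] for i in right], [y[i] for i in right])
    return node
```

Calling `grow(rows, y)` with, say, `rows = [{"temp": 14, "retic": 5}, ...]` returns a nested dictionary in which each internal node records the variable and threshold used, and each leaf records the number of sites and their mean nest density. Every variable is reconsidered at every node, as in the description above.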
Cross-validation analysis: determining tree size
Near the top of a regression tree, the early partitions of study sites are relatively likely to reflect the relationships that actually exist between nest densities and the habitat conditions. However, as tree branching continues, involving the partitioning of smaller and smaller sample sizes, the precision of each split diminishes. Consequently, branches near the bottom of the regression tree are less generalizable outside of the sample to the rest of the population that has not been sampled. To estimate what part of the regression tree quantifies the relations between nest density and habitat conditions that exist outside of the samples, across the entire Lake Opeongo (1-km scale) and Jones Bay (100-m scale) populations, a procedure known as "cross-validation" was employed (Clark and Pregibon 1992, StatSci 1993).
Cross-validation analysis involved the random partitioning of the data set into 10 groups of equal or similar size and the creation of a "cross-validation regression tree" with only 9 of the 10 groups. This cross-validation regression tree was used to predict nest density for each of the sites in the remaining 10th group (the Cross-Validation Group). Predicted and observed nest densities were compared by calculating variance from the predicted mean value within each leaf at all possible tree sizes, from the initial division of the data into a two-leaved tree, onwards. This procedure was repeated 10 times (10 cross-validation trees were created), so that each of the 10 groups of sites was used as the Cross-Validation Group once. The entire procedure was repeated 200 times, each time with a new random assortment of sites into 10 groups to ultimately produce 2000 cross-validation assessments (Manly 1991).
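The repeated 10-fold procedure can be sketched generically as follows. This is a hedged illustration, not the study's code: `fit` stands for any model-building routine (a regression tree of a given size, or a multiple regression) that returns a prediction function, and the sketch pools held-out residuals within each repeat, whereas the text describes 2000 separate fold-level assessments. All names are hypothetical.

```python
import random


def cross_validated_r2(x, y, fit, n_folds=10, n_repeats=200, seed=0):
    """Proportion of variance explained in held-out sites, one value per repeat.

    fit(train_x, train_y) must return a function mapping one site's
    predictors to a predicted nest density.
    """
    rng = random.Random(seed)
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)  # new random assortment of sites into groups
        folds = [order[k::n_folds] for k in range(n_folds)]
        ss_resid = 0.0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            predict = fit([x[i] for i in train], [y[i] for i in train])
            ss_resid += sum((y[i] - predict(x[i])) ** 2 for i in held_out)
        scores.append(1.0 - ss_resid / ss_total)
    return scores


def fit_mean(train_x, train_y):
    """Baseline model (tree of size 1): predict the training mean everywhere."""
    mean = sum(train_y) / len(train_y)
    return lambda _site: mean
```

The baseline `fit_mean` never explains held-out variance (its scores are at or below zero), which gives a reference point against which larger trees can be judged. A summary such as `statistics.median(scores)` corresponds to the median values reported in the results.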
The precision of the regression-tree predictions increases with tree size as long as the total amount of explained variation in nest density (total variance minus unexplained variance) among the cross-validated sites increases with increasing tree size. However, as tree size continues to increase, the total amount of explained variance in nest density will eventually decrease. This component of the regression tree is considered to be too imprecise to be generalizable beyond the sample. Therefore, all tree sizes for which no further increase in explained variance occurs are removed ("pruned") from the tree (Clark and Pregibon 1992).
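One way to read this pruning rule is: keep the smallest tree size at which cross-validated explained variance reaches its maximum. The sketch below encodes that interpretation; the function name and the example values in the usage note are hypothetical and are not results from the study.

```python
def pruned_tree_size(explained_by_size):
    """Smallest tree size attaining the highest cross-validated explained variance.

    explained_by_size maps tree size (number of leaves) to the explained
    variance among cross-validated sites. Size 1 (the undivided data set)
    is the fallback when no larger tree improves on it.
    """
    best_size = 1
    best_value = explained_by_size.get(1, 0.0)
    for size in sorted(explained_by_size):
        if explained_by_size[size] > best_value:
            best_size, best_value = size, explained_by_size[size]
    return best_size
```

For instance, hypothetical values of `{1: 0.0, 2: 0.35, 3: 0.48, 4: 0.41, 5: 0.30}` would lead to a three-leaf tree, because sizes 4 and 5 add no further explained variance.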
Cross-validation analysis: precision tests
Significance tests (which are used in tree and standard multiple-regression analyses, and are described in the next section) do not indicate the extent to which the samples (in this case, 1-km and 100-m sites selected from Lake Opeongo and Jones Bay, respectively) are representative of the whole population. Precision tests are an important method of providing this information. The loss of model precision in extrapolating to the whole population depends on the proportion of the population that was sampled (which is controllable), the amount of variation in the population (which is uncontrollable), and the extent to which models over-fit the data (the ratio of the sample size to the number of explanatory variables used in the analysis, and the intricacy of the model, both of which are controllable). The assumption that tree and standard multiple-regression models can be extrapolated to whole populations without much loss of accuracy was tested.
In this study, sample sites encompassed large proportions (25% [n = 36 sites] and 50% [n = 31 sites]) of the total nesting populations at the 1-km and 100-m spatial scales in Lake Opeongo and Jones Bay, respectively. Precision was calculated using the same cross-validation analyses that were used to determine the appropriate size of the regression tree models, described in the previous section. In estimating model precision, the important information from the cross-validation results is the actual amount of the total variance in nest density that can be explained in the Cross-Validation Group, rather than relative changes in the explained variance across different tree sizes. This cross-validation procedure was repeated (replicated exactly) in a cross-validation analysis of the standard multiple-regression results.
Tests of statistical significance
After the regression tree has been created and pruned using the cross-validation procedure, it is necessary to determine whether the pruned tree explains significantly more variance than a random regression tree of equal complexity. To determine this, the amount of variance (expressed as the R² value) explained by the True Regression Tree (created using the complete data set) was compared with R² values of regression trees (pruned to the same size as the True Tree) generated from 2000 random associations (Manly 1991) between nest density and the habitat variables. If the 'true' R² value was among the top 5% of the randomly generated R² values, the True Regression Tree represents significant (rather than chance) relations between nest density and the habitat variables used in the tree model.
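The randomization test can be sketched as below. This is an illustration under stated assumptions: `statistic` stands in for "fit a tree pruned to the same size and return its explained variance", the simple bivariate `r_squared` is only a placeholder for that step, and the +1 correction in the P value is a common convention rather than something stated in the text.

```python
import random


def r_squared(x, y):
    """Squared correlation between x and y (placeholder test statistic)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, statistic, n_permutations=2000, seed=0):
    """Share of random x-y associations whose statistic matches or beats the observed one."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    at_least = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between habitat and density
        if statistic(x, shuffled) >= observed:
            at_least += 1
    # Assumed convention: count the observed arrangement as one permutation.
    p_value = (at_least + 1) / (n_permutations + 1)
    return observed, p_value
```

A P value below 0.05 corresponds to the observed value falling among the top 5% of the randomly generated values. Because the full model-building search is rerun on every permutation, the search itself is accounted for in the reference distribution.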
Using this permutation method to determine the significance of the True Regression Tree is in many cases a more accurate way to test for significance than traditional significance tests used in standard multiple-regression analysis. This is because the traditional standard multiple-regression test for significance considers only the combination of physical variables that were used in the final best-fit regression line (or at most, the total number of independent variables used in the research), rather than adjusting for the a posteriori evaluation of often hundreds of possible combinations of independent variables that are initially considered in the search for the best-fit regression line. Thus, the calculated P value is many times smaller than it should be, resulting in an overestimate of the significance of the standard regression fit. This is a very important and common problem in standard multiple-regression analyses. In contrast, the P value that is calculated from the repeated sampling technique in tree regression correctly takes into account the approximate number of combinations of physical variables that are possible, since all combinations of the four physical habitat variables are considered in creating both the True Regression Tree, and the 2000 randomly generated trees.
RESULTS AND DISCUSSION
Smallmouth bass nest distributions at the 1-km spatial scale are not well described unilaterally by any of the four physical variables that were measured (temperature r² = , wind/wave exposure r² = , shoreline reticulation r² = , benthic rugosity r² = ). Standard multiple-regression results explained a significant amount of variance in nest density among the sites (R² = , adjusted R² = , P < , n = 36 sites; nest density = - + [.2] + ( x [.-4])f + , where r = shore reticulation, t = temperature, f = fetch exposure) despite the triangular (distinctly nonlinear) relations of fetch, and of littoral benthic rugosity, to nest density. Shoreline complexity, fetch, the temperature index, and a squared-transformation of the temperature index (to remove observed curvilinearity of residuals) were all significant components in this regression equation. However, on average, none of the variance in the cross-validated sites (median R² = , n = 2000 permutations) could be explained by standard regression equations created with the other 90% of the data. Thus, the standard multiple regression model would probably not be useful in explaining differences in nest density outside of the sample in the rest of the population.
The pruned regression tree (R² = , P < , n = 36 sites; Fig. 1) indicates a strong positive relationship between nest density and both temperature and shoreline complexity. The tree was pruned to a size of three leaves since subsequent sections of the tree model did not improve the predictions of nest densities among the cross-validated samples (Fig. 2). Cross-validation analysis also indicated that the tree regression model would likely maintain substantial precision in extrapolations outside the sample, to the population (median R² = , n = 2000 permutations; Fig. 2).
The regression tree results suggest that decision making by adult smallmouth bass is analogous to a hierarchical assessment of physical habitat variables, from primary considerations of temperature, to secondary considerations of shoreline complexity (Fig. 1). A large proportion of variance in nest density is explained by distinguishing the warmest 23% of the sites (above °C) from all other sites in the regression tree (R² = , n = 8 sites). This is probably largely generated by one or both of two known effects of temperature on reproduction in smallmouth bass populations near the northern limits of their range. First, early growth and survival rates of broods are very closely tied to local thermal conditions (Shuter et al. 1980), which are in turn tied to the abundance of that cohort at adulthood (MacLean et al. 1981). At water temperatures below ~15 °C, survival rates of young in their nests decline dramatically (Shuter et al. 1980). Since philopatry likely exists in the Lake Opeongo smallmouth bass population (Gross et al. 1994), high densities of nests would be less likely in areas where brood survival is consistently low. Second, in relatively warm environments (where the young develop more rapidly), guarding males can complete their nest-guarding duties sooner. Since adult food availability during the nest-guarding period likely limits the lifetime reproductive success of nest-guarding males (Ridgway and Shuter 1994), adults nesting in relatively warm environments are perhaps least likely to abandon their broods early, and/or are most likely to be able to return to nest in subsequent years.
If nest-patch locations in Lake Opeongo are restricted by minimum-temperature tolerances of smallmouth bass, as the above two mechanisms would suggest, nests in relatively warm areas may supply a larger proportion of their young to the cohort than do nests in colder regions. Since temperature patterns in Lake Opeongo at the 1-km scale show significant consistency between years (Rejwan et al. 1997), temperature sensitivity may create the stable patterns in nest distributions that have been documented. If this association were common in lakes, then particularly warm regions should be protected from anthropogenic activities.
Despite significant consistency in nest-patch locations, nest-site fidelity is not perfect (Ridgway et al. 1991a); sparse distributions of nests that occur outside of nest patches (Rejwan et al. 1997) probably exist as a consequence of imperfect nest-site fidelity and environmental conditions that are adequate (at least in some years) for brood survival. It is also possible that a few nesters persist in suboptimal, low-density nesting environments as a consequence of some survival advantage over nesters from high-density areas at other stages in their life history.
An interaction was identified by the regression tree whereby nest densities are particularly high among the eight warmest sites that have relatively reticulated shorelines (but no such relationship existed among the cooler sites). This may be the result of various possible mechanisms. Shoreline complexity was included in this study with the expectation that complex shorelines would be preferred in typically high-exposure windward environments, where warm epilimnetic water accumulates. Consequently, complex shorelines were expected to provide protection from wind-driven waves, known to be detrimental to brood survival (Goff 1985), while containing beneficial warm water temperatures. However, this mechanism does not appear to explain the positive relationship between shoreline complexity and high nest densities in Lake Opeongo. The three 1-km Jones Bay sites are among the four 1-km sites categorized in the regression tree by their particularly warm water conditions, complex shorelines, and high nest densities (Fig. 1), and Jones Bay is downwind from very large fetch lengths (up to 6 km) across the prevailing wind directions. Even though significant nest patchiness exists at a smaller (100-m) scale within Jones Bay (Rejwan et al. 1997), the nest patches are not confined to protected sections of the bay. Numerous alternative mechanisms may be responsible for the relation between complex shorelines and high nest density in warm areas, such as increased concealment of nest aggregations from brood predators.
At the 100-m scale (in Jones Bay) no significant relationship was detected between any of the four physical habitat variables and nest density from simple bivariate plots (r² range: .09, n = 31 sites for each of four plots), standard multiple regression (R² = , P < , n = 31 sites), or tree regression (R² = , P < , n = 31 sites) analyses. These results may be explained by one of two possibilities. First, statistical power may have been too low to detect relationships that indeed exist at the 100-m scale (relative to the 1-km scale) where less extreme (although significant) patchiness was observed (Rejwan et al. 1997). Alternatively, having satisfied the large-scale requirements of favorable thermal and shoreline reticulation conditions, small-scale nest patchiness in Jones Bay may be determined by other environmental factors such as temporally fine-scale differences in temperature (e.g., differences in short-term temperature fluctuations due to seiche activity, or in temperatures at the onset of the nesting period when among-site differences appear to be greatest).
The tree regression technique, based on the premise that bimodal relations among the variables exist, is better suited to the data in this study than the more commonly used standard multiple-regression analysis, which was extremely imprecise despite its statistical significance. Cross-validation results from both the standard and tree regression analyses demonstrate that caution should be taken in extrapolating sample-generated findings to processes underway within whole populations, even when very large proportions (23% of all 155 1-km-long sites) of the population have been sampled. Cross-validation analysis is an excellent method of evaluating the precision of new predictions, and should prove broadly useful for evaluating predictive models, rather than relying on significance tests alone.
ACKNOWLEDGMENTS
We are very grateful to Gene Wilde for his introduction to the subject of tree regression analysis. The work by Brenda Leach, Jenn Rowley, and numerous other field assistants from the Harkness Laboratory of Fisheries Research was very much appreciated. We are also grateful to Wayne Burchat from Ontario Hydro for the loans of Hydrolab temperature recorders. The advice and helpful suggestions of Philip Crowley and two anonymous reviewers were much appreciated.
This paper is a contribution of the Harkness Laboratory of Fisheries Research, Ontario Ministry of Natural Resources.
LITERATURE CITED
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and regression trees. Wadsworth, Belmont, California, USA.
Christensen, D. L., B. R. Herwig, D. E. Schindler, and S. R. Carpenter. 1996. Impacts of lakeshore residential development on coarse woody debris in north temperate lakes. Ecological Applications 6:1143-1149.
Clark, L. A., and D. Pregibon. 1992. Tree-based models. Pages 377-419 in J. M. Chambers and T. J. Hastie, editors. Statistical models in S. Wadsworth & Brooks, Pacific Grove, California, USA.
Dutilleul, P., and P. Legendre. 1993. Spatial heterogeneity against heteroscedasticity: an ecological paradigm versus a statistical concept. Oikos 66:152-171.
Efron, B., and R. Tibshirani. 1991. Statistical data analysis in the computer age. Science 253:390-395.
Goff, G. P. 1985. Environmental influences on annual variation in nest success of smallmouth bass, Micropterus dolomieui, in Long Point Bay, Lake Erie. Environmental Biology of Fishes 14:303-307.
Gross, M. L., A. R. Kapuscinski, and A. J. Faras. 1994. Nest-specific DNA fingerprints of smallmouth bass in Lake Opeongo, Ontario. Transactions of the American Fisheries Society 123:449-459.
Kent, C., and J. Wong. 1982. An index of littoral zone complexity and its measurement. Canadian Journal of Fisheries and Aquatic Sciences 39:847-853.
MacLean, J. A., B. J. Shuter, H. A. Regier, and J. C. MacLeod. 1981. Temperature and year class strength of smallmouth bass. Proceedings of the symposium on early life history of fish. International Council for the Exploration of the Sea 178:30-40.
Manly, B. F. J. 1991. Randomization and Monte Carlo methods in biology. Chapman & Hall, London, UK.
Philipp, D. P., C. A. Toline, M. Kubacki, D. B. F. Philipp, and F. Phelan. 1997. The impact of pre-season catch and release angling on the reproductive success of smallmouth and largemouth bass. North American Journal of Fisheries Management 17:557-567.
Philippi, T. E. 1993. Multiple regression: herbivory. Pages 84-94 in S. M. Scheiner and J. Gurevitch, editors. The design and analysis of ecological experiments. Chapman & Hall, New York, New York, USA.
Rejwan, C. 1996. The relations between smallmouth bass (Micropterus dolomieui) nest distributions and characteristics of their habitat in Lake Opeongo, Ontario. Thesis. University of Toronto, Mississauga, Ontario, Canada.
Rejwan, C., B. J. Shuter, M. S. Ridgway, and N. C. Collins. 1997. Spatial and temporal distributions of smallmouth bass (Micropterus dolomieui) nests in Lake Opeongo, Ontario. Canadian Journal of Fisheries and Aquatic Sciences 54:2007-2013.
Ridgway, M. S., J. A. MacLean, and J. C. MacLeod. 1991a. Nest-site fidelity in a centrarchid fish, the smallmouth bass (Micropterus dolomieui). Canadian Journal of Zoology 69:3103-3105.
Ridgway, M. S., and B. J. Shuter. 1994. The effects of supplemental food on reproduction in parental male smallmouth bass. Environmental Biology of Fishes 39:201-207.
Ridgway, M. S., B. J. Shuter, and E. E. Post. 1991b. The relative influence of body size and territorial behaviour on nesting asynchrony in male smallmouth bass, Micropterus dolomieui (Pisces: Centrarchidae). Journal of Animal Ecology 60:665-681.
Sabo, M. J., and D. J. Orth. 1994. Temporal variation in microhabitat use by age-0 smallmouth bass in the North Anna River, Virginia. Transactions of the American Fisheries Society 123:733-746.
Sander, L. M. 1987. Fractal growth. Scientific American 256:94-100.
Scott, R. J. 1993. The influence of parental care behaviour on space use by adult smallmouth bass, Micropterus dolomieui. Thesis. University of Guelph, Guelph, Ontario, Canada.
Serns, S. L. 1982. Relation of temperature and population density to first-year recruitment and growth of smallmouth bass in a Wisconsin lake. Transactions of the American Fisheries Society 111:570-574.
Shuter, B. J., J. A. MacLean, F. E. J. Fry, and H. A. Regier. 1980. Stochastic simulation of temperature effects on first-year survival of smallmouth bass. Transactions of the American Fisheries Society 109:1-34.
StatSci. 1993. Splus for Windows, version . Reference manual. Volume 1. Statistical Sciences, Inc., Seattle, Washington, USA.