Wednesday, September 17, 2014


Science: Data Selection


Scientists view data selection as forbidden.  That is certainly appropriate in a well-designed study if everything goes smoothly.  But it seems appropriate to exclude data that result from defective instrumentation or obvious recording errors, as well as data not meant to be part of a class being compared.  Any selection of data after it has been collected should be noted in research reports.  Even with the best of intentions, researchers can sometimes produce compromised data that should never have reached publication after good peer review.  Some examples I have encountered follow.

Salamander growth 

An excellent field study done by a former colleague showed a continuously rising straight-line growth curve.  My familiarity with growth curves, from investigating principles of growth for my master's research on protozoans and my doctoral research on crustacean growth, made me dubious about the growth reported for the Central American salamanders.  Dr. Vial informed me that his doctoral committee had approved excluding some data thought to be defective in the study.

I was discussing his study with him after a seminar he presented about it.  Successive years' data from salamanders that had not grown were excluded for fear that they came not from the same individuals but from others that had lost toes and so were misidentified as members of his toe-clipped sample.  I don't remember for certain whether such data were also excluded because tail loss and incomplete regeneration of a new tail might have accounted for the lack of growth.

It is likely that his salamanders' growth slowed with age due to one or more of the many factors that can slow growth.  The rapid growth of young individuals slows and in many species stops before natural death occurs.
1. Energy put into growth diminishes as more energy is put into reproduction.
2. Increased size may not be matched by a corresponding supply of materials for growth.
3. Accumulation of waste within cells may slow growth.
4. Conversion of L-isomers of amino acids to D-isomers may interfere with metabolism.
5. Telomeres of the chromosomes may be reduced below what is needed for growth.
6. Genetic control causing growth cessation may have evolved through natural selection to keep the species' age-class composition supplied with young and vibrant individuals.  Populations with such control would likely replace adjacent populations lacking it.
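How excluding no-growth recaptures could straighten a growth curve can be sketched with a small simulation.  This is only an illustration under assumed numbers, not a reconstruction of Dr. Vial's data: the growth rates, the measurement noise, the sample sizes and the function names are all hypothetical.

```python
import random
from statistics import mean

def recapture_increments(n, true_growth_mm, noise_sd_mm, seed):
    """Simulated year-to-year length changes: true growth plus measurement noise."""
    rng = random.Random(seed)
    return [true_growth_mm + rng.gauss(0.0, noise_sd_mm) for _ in range(n)]

def drop_no_growth(increments):
    """Exclude recaptures that show no growth, as if they were misidentified."""
    return [x for x in increments if x > 0.0]

# Hypothetical values: young animals grow fast, old animals barely grow.
young = recapture_increments(200, true_growth_mm=5.0, noise_sd_mm=1.0, seed=1)
old = recapture_increments(200, true_growth_mm=0.5, noise_sd_mm=1.0, seed=2)

young_kept = drop_no_growth(young)
old_kept = drop_no_growth(old)

# Compare mean yearly growth before and after the exclusion.
print("young: all %.2f  kept %.2f" % (mean(young), mean(young_kept)))
print("old:   all %.2f  kept %.2f" % (mean(old), mean(old_kept)))
```

In this sketch the exclusion hardly touches the young animals but raises the apparent growth of the old ones, which is the direction of error that would make a slowing curve look more like a straight line.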

Remote sensing

Another colleague assumed that satellite remote sensing of lake color could measure lake quality.  The assumption was correct in a general way, but I knew of one specific case where it did not work as planned.  Asylum Lake was one I frequently sampled with my aquatic ecology (limnology) classes.  Years of food wastes had polluted it severely, although recovery was proceeding after the waste input stopped.

During the years of his study I had noticed the surface often had many floating bits of duckweed, one of the smallest aquatic green plants.  Their chlorophyll is of a type associated with green algae that are found in lakes not severely polluted.  The algae suspended in water beneath the surface were primarily blue-green algae, which are indicative of lakes over-enriched with phosphorus.  So by that assessment the lake was eutrophic, not oligotrophic as the remote sensing indicated, presumably because the floating duckweed dominated the color seen from above.

Molecular biology errors

From my reference file is the following entry.
Lewin, Roger.  1988.  DNA clock conflict continues.  Science, 241:1756-1759.  Describes the article of Charles Sibley and Jon Ahlquist using DNA hybridization, criticized by Vincent Sarich and others.  The wrong side (Sarich) seemed to win by charging selection of data, but the Tmode that Sarich used is selection of a worse sort.

Sibley's work was a study of bird relationships.  He had eliminated some data that he knew, from his experience with the technique, were obviously contaminated, and he had properly noted this in his work.  Unfortunately, his approach seems to have been abandoned, although it avoids data selection flaws found in much recent work.

This blog's post of 5/31/2013 entitled "Science screw-up No. 1" describes a major episode, still not resolved, that resulted in an invalid interpretation of animal phyla relationships.

Missing data

Finding data that we did not know were missing is an unforeseeable event; one such finding enabled me to see some important aspects of the origin of animal groups.  Many posts starting in late June 2013 on the annelid theory of chordate origin and the pogonophorans may help clarify the significance of Webb's publications about the discovery of the formerly unknown segmented posterior extremity of pogonophorans.

Estimates of the time of glacial retreat in Michigan, based on carbon-dated strata of bog vegetation, have developed with a possible error.  Fossil carbon of low activity was probably incorporated into the bog vegetation from ancient carbonate deposits dissolved in the water entering lakes where the plants grew.  This missing aspect of the data would make the estimated age older by an amount that increases with the percentage of old carbon incorporated.
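The size of such an error can be roughed out from the standard radiocarbon age relation, age = -8033 × ln(fraction of modern activity), where 8033 years is the Libby mean life conventionally used for radiocarbon ages.  The sketch below assumes a fraction of the plant's carbon came from carbonate with no remaining carbon-14 at all; the function name and the example percentages are hypothetical and are not taken from any Michigan bog study.

```python
import math

LIBBY_MEAN_LIFE_YEARS = 8033.0  # conventional value for radiocarbon ages

def age_offset_years(dead_carbon_fraction):
    """Extra apparent age when this fraction of the carbon is carbon-14 free.

    Diluting a sample with dead carbon multiplies its activity by
    (1 - fraction), which adds -8033 * ln(1 - fraction) years to the age.
    """
    if not 0.0 <= dead_carbon_fraction < 1.0:
        raise ValueError("fraction must be at least 0 and less than 1")
    return -LIBBY_MEAN_LIFE_YEARS * math.log(1.0 - dead_carbon_fraction)

# Hypothetical percentages of carbonate-derived carbon in the plant material.
for percent in (1, 5, 10, 20):
    print(percent, "% old carbon adds about",
          round(age_offset_years(percent / 100.0)), "years")
```

Under this assumption each percent of old carbon adds roughly 80 years while the percentage is small, and the offset grows somewhat faster than proportionally as the percentage rises.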

The errors and other criticized findings you may find in my blog may be indicative of the value of an old saying: that we can or should learn from our mistakes; also that if you are afraid to make a mistake you are not likely to find anything.  I thank all of these researchers; if they hadn't tried, I wouldn't have much to say.

Joseph G. Engemann   September 17, 2014
