Edited by Chang Zhu
1. Normalization scales the data so that different arrays can be
compared.
2. Normalization usually requires that the arrays be identical (e.g., cDNA
arrays with the same clones printed). Although possible, it is difficult
to normalize data from arrays of different types and/or gene lists.
3. The two frequently used normalization options are:
A. Unit Column Mean --- Suggested for data in unlogged (linear) scale.
The data for each array (column) is adjusted so that the column mean
is 1. This is suggested for two-channel ratio data and is also applicable
to Affymetrix data.
B. Zero Column Mean --- Suggested for
data (cDNA or Affymetrix) that has been transformed to log scale.
The data for each array (column) is adjusted so that the column
mean is 0. Note that for ratio data, log scale is the "natural" scale, since
"up" and "down" changes are symmetric in log scale.
4. Normalization relies on genes whose expression does not change to
align different arrays. Filtering, however, selects for genes that change
across arrays. These goals conflict, yet they are intertwined. Care should
be taken in the order and extent of filtering and normalization.
5. Users are cautioned not to re-normalize data after
extensive filtering. For example, suppose you have a dataset composed of
20 normal and 20 tumor samples. If you set the filtering condition
to require that at least 15 samples show a minimum 2-fold change
compared to the row mean, the majority of genes in the resulting
dataset will have either higher or lower expression in normal
compared to tumor samples. Re-normalization will erase these
differences.
6. The above normalization options are provided in MicroHelper.
7. For more sophisticated normalization options, consult a statistician.
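
The two options in item 3 and the caution in item 5 can be illustrated
with a small sketch. This is not MicroHelper's code. It assumes that rows
are genes and columns are arrays, that there are no missing values, that
Unit Column Mean divides each value by its column mean, and that Zero
Column Mean subtracts the column mean from each value. The function names
and the toy numbers are hypothetical.

```python
def column_means(data):
    """Return the mean of each column (rows are genes, columns are arrays)."""
    n_rows = len(data)
    return [sum(col) / n_rows for col in zip(*data)]


def unit_column_mean(data):
    """Assumed form of option A: divide by the column mean (linear scale)."""
    means = column_means(data)
    return [[value / m for value, m in zip(row, means)] for row in data]


def zero_column_mean(log_data):
    """Assumed form of option B: subtract the column mean (log scale)."""
    means = column_means(log_data)
    return [[value - m for value, m in zip(row, means)] for row in log_data]


# Toy linear-scale ratios: 2 genes x 2 arrays.
ratios = [[1.0, 2.0], [3.0, 6.0]]
unit_scaled = unit_column_mean(ratios)      # each column now has mean 1

# Toy log-scale ratios: 2 genes x 2 arrays.
log_ratios = [[0.0, 1.0], [2.0, 3.0]]
centered = zero_column_mean(log_ratios)     # each column now has mean 0

# Extreme toy case for item 5: after filtering, every remaining gene is
# higher in the tumor arrays (columns 3-4) than in the normal arrays
# (columns 1-2). Re-centering each column removes that difference.
filtered = [[0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 2.0, 2.0]]
renormalized = zero_column_mean(filtered)

print(unit_scaled)
print(centered)
print(renormalized)
```

In the last case every value becomes 0 after re-normalization, so the
tumor versus normal difference that the filter selected for is no longer
visible. Real filtered data would be less extreme, but the effect goes in
the same direction.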