Journal of Leukocyte Biology Myeloid cells, immune suppression, tumor immunology

Originally published online as doi:10.1189/jlb.0203085 on July 15, 2003

Published online before print July 15, 2003
(Journal of Leukocyte Biology. 2003;74:602-610.)
© 2003 by Society for Leukocyte Biology

Developmental markers of B cells are superior to those of T cells for identification of stages with distinct gene expression profiles

Reinhard Hoffmann*,1, Thomas Seidl†, Ludovica Bruno† and Martin Dugas‡

* Max von Pettenkofer-Institut, Department Bacteriology, Munich, Germany;
† Institute of Cancer Research, Section of Gene Function and Regulation, Chester Beatty Laboratories, London, United Kingdom; and
‡ Department of Medical Informatics, Biometrics and Epidemiology, University of Munich, Germany

1 Correspondence: Max von Pettenkofer-Institut, Department Bacteriology, Pettenkoferstr. 9a, 80336 Munich, Germany. E-mail: r_hoffmann@m3401.mpk.med.uni-muenchen.de


    ABSTRACT
 
B and T lymphocytes develop through a series of cellular stages, which are defined by recombination status of the immunoglobulin and T cell receptor loci and can be separated by analysis of cell-surface markers. We evaluated how well 26 and 41 samples from five and eight developmental stages of B and T cell development, respectively, could be correctly assigned to their lineage of origin and developmental stage by analysis of the expression of 13,026 genes and expressed sequence tags (ESTs). The RNA expression patterns of eight genes correctly classified all 67 samples as belonging to the B cell or to the T cell lineage. Ninety-two to 100% of B-lineage samples could be correctly assigned to the protein-defined developmental stage by the RNA expression pattern of 29 genes. By contrast, RNA expression patterns of 39 genes were necessary to correctly assign 85–100% of T-lineage samples to the correct developmental stage. The sets of genes used for these classifications contain ESTs as well as known genes that have not previously been associated with lymphocyte development. Graphical display of the classifications shows that B-lineage samples are well separated from T-lineage samples, and samples from the five stages of B cell development are well separated from each other. By contrast, samples from the eight stages of T cell development cannot be separated precisely. We conclude that the protein markers currently widely used for separating stages of B cell development better identify molecularly distinct stages than those used for separating stages of T cell development.

Key Words: Microarray • B/T cell lineage • Principal Component Analysis


    INTRODUCTION
 
B and T lymphocytes are closely related cell types. They separate late during hematopoietic ontogeny and develop through a sequence of cellular stages before becoming mature lymphocytes expressing antigen-specific receptors. These cellular differentiation programs include somatic recombination of immunoglobulin (Ig) heavy- and light-chain gene loci in B cells and of T cell receptor β and α (TCR-β/α) chains in T cells. The recombination status of these loci can be used to temporally order and define the cellular stages [1, 2].

Cells undergoing this differentiation process express a stage- and lineage-specific set of surface markers. The B cell classification scheme of Rolink et al. [3] uses four cell-surface markers to distinguish among the five stages of B cell development [B220, c-kit, CD25, and surface IgM (sIgM); Fig. 1A]. In T cell development, the expression of four genes (CD25, CD44, CD4, and CD8) distinguishes eight consecutive cellular differentiation stages (Fig. 1B) [4]. Here, cellular stages are identified by the specific expression patterns of only a few genes. Although these stage assignments have been shown to perform well in a number of experiments, there might be additional, unknown marker genes. These might distinguish more precisely between the currently known stages, or they might help to define novel subpopulations functionally distinct from the currently known stages.



 
Figure 1. Markers used for flow cytometric cell sorting of (A) B and (B) T cell precursors. The cellular stages examined in the current study are represented as circles containing the stage designation, with the most immature stages on the left and the most mature stages on the right. Markers used for flow cytometric sorting are in boxes below the respective stage. Note that both CD4 single-positive (SP) and CD8 SP T cells are mature T cells originating from DPS cells. DN: CD4, CD8 double-negative; DPS: CD4, CD8 double-positive, small; DPL: CD4, CD8 double-positive, large.

 
We previously established gene expression profiles of thymic T cell and bone marrow (BM) B cell precursors [5, 6]. To generate these datasets, cells from five consecutive stages of mouse B cell development in BM and from eight consecutive stages of thymic T cell development were purified ex vivo by flow cytometric cell sorting. Gene expression profiles were generated using high-density oligonucleotide arrays [7], interrogating a total of 13,026 transcripts. Those earlier studies comprehensively describe the expression patterns of the more than 1000 genes that can be detected as differentially expressed by standard statistical methods and compare those expression patterns between B and T cell development. The rationale of the present study was fundamentally different. Here, we attempted to identify as few genes as possible that most precisely separate the different cellular populations from each other, using statistical algorithms especially suited for this purpose. This approach not only identifies genes that could serve as novel markers for identification of lymphocyte developmental stages, but it also evaluates how well the currently used markers separate cellular populations with distinct gene expression profiles. This type of investigation is not possible with the analysis as described previously [5, 6].

For identification of those genes that were able to separate the distinct cellular populations, we used supervised algorithms previously applied to the classification of leukemias [8, 9], other cancers [10, 11], and cancer cell lines according to their susceptibility to chemotherapeutic agents [12]. Briefly, all genes are first ranked for their ability to distinguish between the classes. Next, different numbers of the top-ranking genes are assembled into models. Based on these models, the samples are classified into two categories using a weighted voting scheme, in which every gene casts a vote for the sample to belong to one of the two classes. Next, the models are tested using a leave-one-out cross-validation approach. As a reference, an empirical distribution obtained by permuting the phenotype class labels is used. To avoid over-training, a final model is generated using the smallest number of genes capable of performing the desired classification with high confidence.

The accuracy of this classification process can be analyzed in two ways: First, the proportion of correctly predicted samples among all samples is recorded, resulting in a percentage of correctly classified samples (apparent classification accuracy). Second, by means of Principal Component Analysis (PCA), individual samples are represented on a two-dimensional plot according to the expression patterns of the genes used for classification. This graphical representation enables us to assess how well samples from different cell populations can be separated. We show that this strategy is able to identify novel lineage and stage-specific marker genes and that the currently used protein markers better identify molecularly distinct stages in B cell development than those for T cell development.


    MATERIALS AND METHODS
 
Gene expression datasets
The B cell and the T cell development gene expression datasets described here have been published previously in detail [5, 6]. Briefly, for the isolation of B cell precursors, total femoral BM cells of 5- to 6-week-old C57BL/6 mice (n=4 per experiment) were aliquoted into three parts. Cells were stained with monoclonal antibodies (mAb) as shown in Figure 1A, and five populations representing consecutive cellular differentiation stages were sorted by fluorescence-activated cell sorter. A total of 50,000 (pre-BI, large pre-BII) or 150,000 (small pre-BII, immature, and mature B) cells were sorted directly into TRIzol RNA isolation reagent (Life Technologies, Gaithersburg, MD) at 50,000 cells/500 µl TRIzol.

For isolation of thymic T cell precursors, cell suspensions from 4-week-old B6 mice (10 mice per experiment) were divided into two portions. One portion was used for the isolation of DP and SP thymocytes and was directly stained with CD4 and CD8 mAb (Fig. 1B). The second portion of the sample was used to isolate DN cells. First, CD4- and CD8-positive cells were removed by complement-mediated lysis. Subsequently, samples were stained with a cocktail of allophycocyanin-conjugated, lineage-specific mAb (B220, NK1.1, CD3, CD4, CD8) together with CD25 (fluorescein isothiocyanate-conjugated) and CD44 (phycoerythrin-conjugated). Populations were sorted as shown in Figure 1B.

A cell purity of ≥98% was routinely achieved. Flow cytometric profiles, expression patterns of genes known to be involved in lymphocyte development, and independent confirmation by polymerase chain reaction of some genes newly found to be differentially expressed can be reviewed in refs. [5, 6].

RNA was then subjected to two rounds of in vitro transcription-based RNA amplification as described earlier [5 , 13 , 14 ]. Affymetrix Mu11k subA and subB GeneChip® arrays were hybridized, washed, stained, and scanned according to the manufacturer’s specifications. Five to six independent, replicate experiments were performed per cell population examined. Scanned raw data images were processed with Affymetrix GeneChip v3.2 software, resulting in processed image (.cel) and numerical (.chp) files.

Data analysis
Arrays were normalized for differences in overall fluorescence using a piecewise running median line fitted onto an invariant set of synthesis features as a normalization curve [15 ]. "Average difference" values were then calculated based on the normalized fluorescence levels as a measure of transcript abundance. The average differences were log-transformed, and class prediction was performed according to the method of Golub et al. [8 ], as modified by Pomeroy et al. [16 ].

Notably, the signal-to-noise ratio of gene x was defined as $S_x = (\mu_1 - \mu_2)/(\sigma_1 + \sigma_2)$, where $\mu_k$ and $\sigma_k$ denote the mean expression and standard deviation of gene x in class k (k = 1, 2). According to a specified number of "informative" genes (e.g., 20), the best discriminating genes are selected. For each informative gene, a decision limit is calculated as $b_x = (\mu_1 + \mu_2)/2$. To classify a new sample, the gene expression levels of informative genes are taken, and for each gene x and sample y, a so-called vote is calculated as $V_x = S_x (g_{xy} - b_x)$, where $g_{xy}$ denotes the expression level of gene x in sample y. The votes of all informative genes are summed up ("weighted voting"), and depending on the sign of this sum, the new sample is classified as class 1 or class 2. The confidence in the prediction is calculated as $|\sum_x V_x| / \sum_x |V_x|$.
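The weighted voting rule described above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function names, the toy expression values, the use of the sample standard deviation, and the choice to assign a zero vote sum to class 2 are all assumptions made for illustration.

```python
from statistics import mean, stdev

def signal_to_noise(class1, class2):
    """S_x = (mu1 - mu2) / (sigma1 + sigma2) for one gene.
    Assumption: sample standard deviation; each class needs >= 2 values
    and at least one class must show some spread."""
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))

def decision_limit(class1, class2):
    """b_x = (mu1 + mu2) / 2 for one gene."""
    return (mean(class1) + mean(class2)) / 2

def weighted_vote(train1, train2, sample):
    """Classify one sample by weighted voting.

    train1[g] and train2[g] hold the training values of informative gene g
    in class 1 and class 2; sample[g] is the new sample's value for gene g.
    Returns (predicted class, confidence)."""
    votes = []
    for g, value in enumerate(sample):
        s_x = signal_to_noise(train1[g], train2[g])
        b_x = decision_limit(train1[g], train2[g])
        votes.append(s_x * (value - b_x))       # V_x = S_x (g_xy - b_x)
    total = sum(votes)
    denom = sum(abs(v) for v in votes)
    confidence = abs(total) / denom if denom else 0.0
    # Assumption: a vote sum of exactly zero is assigned to class 2.
    return (1 if total > 0 else 2), confidence

# Hypothetical toy data: gene 0 is high in class 1, gene 1 is high in class 2.
train1 = [[8.0, 9.0, 10.0], [2.0, 3.0, 4.0]]
train2 = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]

label_a, conf_a = weighted_vote(train1, train2, [9.5, 2.5])  # both genes agree
label_b, conf_b = weighted_vote(train1, train2, [9.5, 6.5])  # genes disagree
```

When all genes vote for the same class, the confidence is 1; disagreement between genes lowers it, which matches the interpretation of confidence as a measure of homogeneity among the genes in a model.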

To assess the significance of each gene, a permutation test is performed, which determines signal-to-noise ratios when class labels are permuted randomly.
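A permutation test of this kind could look like the following sketch. The two-sided comparison on the absolute signal-to-noise ratio, the number of permutations, the fixed random seed, and the add-one p-value estimate are assumptions for illustration and are not taken from the study.

```python
import random
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """S_x for one gene, given one value and one class label (1 or 2) per sample."""
    c1 = [v for v, l in zip(values, labels) if l == 1]
    c2 = [v for v, l in zip(values, labels) if l == 2]
    return (mean(c1) - mean(c2)) / (stdev(c1) + stdev(c2))

def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Estimate how often randomly permuted labels give a ratio at least as extreme.

    Assumptions: two-sided test on |S_x| and the (hits + 1) / (n_perm + 1) estimate."""
    rng = random.Random(seed)
    observed = abs(signal_to_noise(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                   # class sizes stay unchanged
        if abs(signal_to_noise(values, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical gene that separates the classes clearly.
values_informative = [10.1, 9.8, 10.5, 9.9, 10.3, 2.0, 2.4, 1.9, 2.2, 2.1]
labels_informative = [1] * 5 + [2] * 5
p_informative = permutation_p_value(values_informative, labels_informative)

# Hypothetical gene whose expression is unrelated to the class labels.
values_noise = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
labels_noise = [1, 2, 1, 2, 1, 2, 1, 2]
p_noise = permutation_p_value(values_noise, labels_noise)
```

In this toy setting the informative gene yields a small estimated p-value and the unrelated gene a large one, which is the contrast the permutation reference distribution is meant to expose.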

To assess the robustness of the classifier, a leave-one-out cross-validation is performed. Apparent accuracy is the rate of correctly classified samples.
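As a rough illustration of leave-one-out cross-validation with a weighted voting classifier, consider the sketch below. The helper names and toy data are hypothetical, and the gene set is held fixed across folds; whether gene selection was repeated within each fold in the original analysis is not stated here, so that simplification is an assumption.

```python
from statistics import mean, stdev

def train(samples, labels):
    """Return (S_x, b_x) per gene; samples is a list of per-sample gene value lists."""
    params = []
    for g in range(len(samples[0])):
        c1 = [s[g] for s, l in zip(samples, labels) if l == 1]
        c2 = [s[g] for s, l in zip(samples, labels) if l == 2]
        s_x = (mean(c1) - mean(c2)) / (stdev(c1) + stdev(c2))
        b_x = (mean(c1) + mean(c2)) / 2
        params.append((s_x, b_x))
    return params

def predict(params, sample):
    """Sum the weighted votes and return class 1 or class 2."""
    total = sum(s_x * (value - b_x) for (s_x, b_x), value in zip(params, sample))
    return 1 if total > 0 else 2

def loocv_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train(train_x, train_y), samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)   # apparent accuracy

# Hypothetical toy data: two genes, three samples per class.
samples = [[9.0, 2.0], [10.0, 3.0], [11.0, 2.5],
           [2.0, 8.0], [3.0, 9.0], [2.5, 10.0]]
labels = [1, 1, 1, 2, 2, 2]
accuracy = loocv_accuracy(samples, labels)
```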

In addition to Golub’s method [8 ], we apply a heuristic approach to select a minimal set of differential genes with high classification accuracy: For each informative gene, accuracy and confidence are calculated. The gene with highest accuracy and confidence is entered into the model. Next, we add each of the remaining informative genes, one-by-one, into the model and repeat the leave-one-out cross-validation. If the gene improves accuracy and confidence as measured in the leave-one-out cross-validation, it is added to the weighted voting model; otherwise, it is discarded. By this method, a subset of informative genes is selected, which is optimized in terms of accuracy and confidence.
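The heuristic described above might be sketched as follows. The text does not specify how "improves accuracy and confidence" is decided when the two measures disagree, so this sketch assumes a lexicographic comparison of the pair (accuracy, mean confidence); that rule, the function names, and the toy data are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean, stdev

def loocv_score(samples, labels, genes):
    """Leave-one-out (accuracy, mean confidence) of a weighted voting model on `genes`."""
    correct, confidences = 0, []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        votes = []
        for g in genes:
            c1 = [s[g] for s, l in zip(train_x, train_y) if l == 1]
            c2 = [s[g] for s, l in zip(train_x, train_y) if l == 2]
            s_x = (mean(c1) - mean(c2)) / (stdev(c1) + stdev(c2))
            b_x = (mean(c1) + mean(c2)) / 2
            votes.append(s_x * (samples[i][g] - b_x))
        total = sum(votes)
        denom = sum(abs(v) for v in votes)
        confidences.append(abs(total) / denom if denom else 0.0)
        correct += (1 if total > 0 else 2) == labels[i]
    return correct / len(samples), mean(confidences)

def greedy_select(samples, labels, candidates):
    """Seed the model with the best single gene, then keep each further gene
    only if it improves the cross-validated score.
    Assumption: scores are compared as (accuracy, confidence) tuples."""
    best = max(candidates, key=lambda g: loocv_score(samples, labels, [g]))
    model = [best]
    score = loocv_score(samples, labels, model)
    for g in candidates:
        if g in model:
            continue
        trial = loocv_score(samples, labels, model + [g])
        if trial > score:
            model.append(g)
            score = trial
    return model, score

# Hypothetical toy data: gene 0 separates the classes, gene 1 does not.
samples = [[9.0, 5.0], [10.0, 1.0], [11.0, 4.0],
           [2.0, 4.5], [3.0, 1.5], [2.5, 5.5]]
labels = [1, 1, 1, 2, 2, 2]
model, score = greedy_select(samples, labels, candidates=[1, 0])
```

In this toy case the uninformative gene is discarded, leaving a one-gene model, in the same spirit as the small final models reported in the Results.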

For PCA, the program JExpress (www.molmine.com) was used.
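To make the notion of "variance retained by the first two principal components" concrete, the following sketch computes that fraction for a small data matrix. It does not reproduce JExpress; it is a plain-Python approximation using power iteration with deflation on the covariance matrix, and the function name, start vector, iteration count, and toy data are assumptions.

```python
def pca_variance_retained(rows, n_components=2, iterations=500):
    """Fraction of total variance captured by the leading principal components.

    rows: list of samples, each a list of gene expression values.
    Assumption: the arbitrary start vector is not orthogonal to the
    leading eigenvectors, which holds for the toy data below."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    total = sum(cov[j][j] for j in range(d))    # trace = total variance
    eigenvalues = []
    for _ in range(n_components):
        v = [1.0 / (k + 1) for k in range(d)]   # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0:
                break                           # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient approximates the eigenvalue of the current component.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        eigenvalues.append(lam)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return sum(eigenvalues) / total

# Hypothetical data: the third "gene" is the sum of the first two,
# so two components capture all of the variance.
rows = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [2.0, 1.0, 3.0],
        [1.0, 3.0, 4.0], [3.0, 2.0, 5.0]]
retained_two = pca_variance_retained(rows, n_components=2)
retained_one = pca_variance_retained(rows, n_components=1)
```

A value close to 1 means that a two-dimensional plot loses little information; lower values mean that sample separation visible in the full gene space may not be visible in the plot.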


    RESULTS
 
Classification of samples according to the B or T cell lineage
First, we tested the ability of our gene expression datasets to classify samples according to their cellular lineage of origin, i.e., as B- or as T-lineage-derived cells. All samples derived from the B cell lineage (26 samples from five developmental stages) formed one class, and all samples derived from the T cell lineage (41 samples from eight developmental stages) formed the other class.

Figure 2 shows the performance of models containing different numbers of genes in predicting whether the samples belong to the B or T cell lineage, respectively. Here, classification accuracy is defined as number of samples classified correctly, divided by the total number of samples. High-confidence values (maximum 1) indicate a homogeneous pattern of the different genes in the model.



 
Figure 2. Performance of different models in distinguishing between B- and T-lineage cells. x-axis, Number of informative genes used for initiation of the model-building process. Left y-axis, Classification accuracy and confidence of the model initiated with the respective number of informative genes. Right y-axis, Number of genes remaining in the final, optimized model.

 
Initiating the model-building process with the top 10 informative genes, a classification accuracy of 100% and confidence of 0.95 were achieved with an optimized model containing eight genes. Notably, incorporation of more genes into the model-building process reduces classification accuracy and confidence (Fig. 2) , so this eight-gene model was chosen as the optimal one.

Of the eight genes forming this most reliable model, three are highly expressed in T-lineage cells (Fig. 3A). Two of these (TCR-β chain, Fyn-binding protein/Src-like adaptor protein) are known to be expressed in T cells [17, 18]. The third probe set is an EST with no known homologues. The remaining five classifying probe sets are up-regulated in B-lineage cells and represent four distinct genes (Fig. 3B). The early B cell factor is a known B cell-specific transcription factor [19], and the µ chain-associated protein 8HS20/VpreB3 complexes with Ig µ heavy chains during development [20]. α-Globin has not previously been shown to be expressed in lymphocytes or B cells in particular. However, a recent study shows expression of a human α-globin transgene in mouse BM but not thymus [21]. Again, one of the newly identified markers is an EST without known homologues.



 
Figure 3. Expression pattern of eight genes distinguishing between B- and T-lineage samples. x-axis, Forty-one T-lineage samples (left) and 26 B-lineage samples (right). y-axis, Average difference intensity. (A) Genes up-regulated in T-lineage samples; (B) genes up-regulated in B-lineage samples. EST, Expressed sequence tag.

 
To determine how well the B- and T-lineage samples are separated using this eight-gene model, we used PCA. Figure 4A shows a two-dimensional graph based on the first two principal components (see legend for details). 90.5% of the total variance is retained, and the B- and T-lineage samples separate perfectly.



 
Figure 4. (A) B- and T-lineage distinction based on the eight-gene model. In this PCA plot, every dot represents one sample derived from B- or from T-lineage cells. The expression levels of the eight genes in the model distinguishing optimally between the two lineages were transformed into principal components: the first principal component contains most of the variation, the second principal component contains most of the variation remaining after the first principal component has been subtracted, and so on. According to the first two principal components, the samples are then positioned in the two-dimensional plot. This allows for visualization of higher-dimensional data in a two-dimensional plane. (B) Stage distinction within the B cell lineage. PCA based on the five stage-specific models containing a total of 29 genes was performed, and the first two principal components were plotted. Symbols indicate samples derived from different B-lineage stages (early precursors to mature B cells). (C) Stage distinction within the T cell lineage. PCA based on the eight stage-specific models containing a total of 39 genes was performed, and the first two principal components were plotted. Symbols indicate samples derived from different T-lineage stages (early precursors to mature T cells).

 
Surprisingly, this highly specific model does not contain well-known markers such as CD19, CD20, or CD3. CD19 first appears in a model based on 80 informative genes with a classification accuracy of 97%, whereas CD20 and CD3 cannot be found in any model (data not shown). Thus, the eight genes identified here as novel lineage-specific markers outperform the known markers, at least on the RNA level.

Classification of B-lineage samples according to developmental stages
We next evaluated how well our machine-learning algorithm would be able to classify individual B cell precursor cell stages according to the global RNA gene expression program. This was achieved by classifying samples from one particular stage versus all others in turn for every developmental stage. The relative expression of the marker is characterized by the signal-to-noise ratio Sx (Table 1 ). All the Sx values of the novel marker genes are below the 0.1 or above the 99.9 percentile of those obtained after randomly permuting the dataset (data not shown), indicating that the association between RNA expression pattern and developmental stage is statistically significant.
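The "one stage versus all others" setup reduces a multi-stage problem to a series of two-class problems. A minimal sketch of how the class labels for each of these problems could be derived is shown below; the function names and the short example list of samples are hypothetical.

```python
def one_vs_rest_labels(stages, target):
    """Return 1 for samples of the target stage and 2 for all other samples."""
    return [1 if stage == target else 2 for stage in stages]

def all_binary_tasks(stages):
    """Map each distinct stage to its one-versus-rest label vector."""
    ordered = list(dict.fromkeys(stages))   # distinct stages in first-seen order
    return {stage: one_vs_rest_labels(stages, stage) for stage in ordered}

# Hypothetical sample annotation for six B-lineage samples.
stages = ["pre-BI", "pre-BI", "large pre-BII",
          "small pre-BII", "immature", "mature"]
tasks = all_binary_tasks(stages)
```

Each label vector can then be passed to a two-class classifier, giving one stage-specific model per developmental stage.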


 
Table 1. Genes Contained in the Five Models for Distinction of Stages in the B Cell Lineage

 
It is interesting that the number of genes used in these optimized models and the confidence achieved vary (Table 1) : Expression levels of five genes predict a sample to be derived from mature B cells with a confidence of 0.98, and a model of 13 genes reaches a confidence of only 0.64 in predicting immature B cell origin of the sample. However, more than 90% of all samples are classified correctly.

Of seven probe sets contained in the pre-BI cell prediction model, three are underexpressed in pre-BI cells as compared with the other B cell precursor stages. Two of these are specific for Igκ sequences, and the third one is specific for an EST contained in one UniGene cluster with an Fc receptor family member [22].

Of the four probe sets overexpressed in pre-BI cells as compared with the remainder of the B cell precursor stages, one is an EST contained in one UniGene cluster with, and highly similar to, the growth factor receptor-bound protein 7, an Src homology 2 domain-containing protein involved in the human epidermal growth factor receptor 2 signaling pathway in breast cancer [23]. The other three are known genes. Endoglin is a transforming growth factor-β receptor [24]. Thy-1 is a well-known cell-surface glycoprotein thought to be specific for thymocytes and neurons with a possible role in signaling from the T cell antigen receptor [25]. Finding this gene as a pre-BI cell-specific marker indicates that Thy-1 has a previously unrecognized role specifically in early B cell development or that these fully committed precursors still show some degree of multilineage gene expression characteristic of uncommitted precursors.

The model for predicting large pre-BII cells contains only citron, a putative rho/rac effector that is otherwise poorly characterized [26 ]. Still, up-regulation of this gene correctly predicts 92% of the samples.

Three probe sets form the best model for predicting small pre-BII cells. One EST contained in one UniGene cluster with, and very similar to, the interferon (IFN)-related developmental regulator 2 is underexpressed in small pre-BII cells. The remaining two probe sets are specific for protamine, a sperm nuclear basic protein with a function similar to histones [27].

The model predicting immature B cells contains 13 genes; together, these correctly predict 96% of the samples. Seven probe sets are underexpressed in immature B cells. These include the Ig surrogate light chain λ5. This gene is expressed only in pre-BI and a small subpopulation of large pre-BII cells [28]. It appears as a negative, immature B cell-specific marker, as low-level signals are uniformly detected in the small pre-BII cell samples, and one of five mature B cell samples shows a signal above background, which is likely to be an experimental outlier. Other negative markers include ESTs contained in UniGene clusters with the thymocyte B cell antigen, the kinesin-related protein KIFC5A, and the antimicrobial peptide hepcidin. One EST has no known homologues. Known genes that serve as negative markers include the prostaglandin receptor EP3 subtype and gelsolin, an actin filament severing and capping protein that is implicated in actin remodeling in growing and in apoptotic cells [29]. The model contains the following up-regulated genes: acrogranin, a secreted factor modifying mouse preimplantation development; properdin, a molecule stabilizing the C3 convertase in the alternative complement-activation pathway [30]; one antibody sequence; and three ESTs. All ESTs are contained in one UniGene cluster with an IFN-γ-inducible lysosomal thiol reductase, which facilitates the processing and presentation to antigen-specific T cells of protein antigens containing disulfide bonds [31].

Samples derived from mature B cells can be predicted with 100% accuracy using a model consisting of five probe sets. Two of them are underexpressed: one EST without known homologues and the β-galactoside-binding protein (bgbp), an autocrine-negative growth regulator. Three probe sets overexpressed in mature B cells represent one single gene, apolipoprotein B.

We used PCA to test how well the developmental stages can be resolved (Fig. 4B). A near-perfect separation of all stages is achieved, and 81.4% of the total variance is retained.

Classification of T-lineage samples according to developmental stages
Finally, a similar analysis was performed for samples derived from eight stages of T cell differentiation. Thirty-nine genes are contained in the cross-validated models (Table 2 ). Again, all the signal-to-noise ratios differ significantly from the values obtained after random permutation of the data (data not shown); thus, the association between RNA gene expression and developmental stage is statistically significant.


 
Table 2. Genes Contained in the Eight Models for Distinction of Stages in the T Cell Lineage

 
The DN1-specific model contains six genes, four of which are underexpressed in DN1 cells. Strikingly, five of six probe sets in the model are ESTs without any known genes contained in their UniGene clusters. The remaining gene is a calcium-binding protein from the S100 family. However, 98% of the samples are classified correctly.

Surprisingly, highly accurate models are also identified for the DN2, DN3, and DN4 cell populations, although we have shown earlier that these have very similar gene expression profiles [6]. For predicting DN2 cells, a combination of seven genes achieves a classification accuracy of 93%. Only two ESTs are contained in the model, both without homologous genes. The hyltk gene is a CSK homologous C-terminal Src kinase shown to be active in platelets and megakaryocytes [32]. T11 is a very late antigen-4 ligand serving as a cell adhesion molecule [33]. TRA1 regulates transmembrane transport of plasma membrane phospholipids [34], and ADSEVERIN has a gelsolin-like function [35]. Mast cell carboxypeptidase A is an extracellular protease.

Prediction of DN3 cells uses two EST probe sets without known homologues. It is interesting that one of them is already contained in the DN2 model, underlining the necessity to use more than one marker gene for classification in some cases.

Models for predicting DN4 and DPL cells are based almost exclusively on ESTs, and only one gene is contained in either of the two models, respectively. Nevertheless, classification accuracy is very high, with 98% and 100% for the DN4- and DPL-specific models, respectively. The putative pheromone receptor VR10 serves as a negative marker for DN4 cells, and another S100 family protein, calgizzarin/S100C/endothelial monocyte-activating polypeptide I, constitutes a positive marker for DPL cells.

DPS cells are characterized by exit from the cell cycle after TCR-αβ selection, a process that involves down-regulation of many genes. Thus, nine of the 11 genes needed to achieve a classification accuracy of 98% are down-regulated. These include cell cycle-related genes, such as a D-type cyclin or the BRCA-1 breast- and ovarian cancer-susceptibility gene. Of the two positive markers, one is an EST, and the other is dynamin, a microtubule motor protein.

Three genes correctly predict 93% of the CD4 SP cell-derived samples. Again, the predictors come from a variety of functional classes. Vascular endothelial growth factor B is a growth factor for endothelial cells. Annexin IV regulates membrane behavior in a calcium-dependent manner. These two genes serve as negative markers, and the creatine kinase B gene is contained as a positive marker.

In contrast to these highly specific models, prediction of CD8 single-positive T cells appears significantly more difficult. The best model involves only one gene, asparagine synthetase. The classification accuracy achieved is 85%, compared with >90% for the other stage-specific models.

Using PCA, only 63.5% of the total variance is retained by the first two components, and only DPL and DPS cells separate clearly from the rest of the samples (Fig. 4C). Thus, the amount of variance retained in the first two principal components is not sufficient for a clear distinction between the different T cell precursor stages.


    DISCUSSION
 
The present study evaluated how well cell samples from developmental stages of B and T cell development can be assigned to their cellular lineage of origin and developmental stage (as defined by expression of marker genes on the protein level) by their global mRNA-based gene expression patterns. This RNA-based classification works well for the distinction between B- and T-lineage cells, as well as for distinction of B cell developmental stages. However, stages of T cell development are much more difficult to classify by their global RNA-based gene expression patterns.

Before classification is performed, the algorithms used result in a list of genes that could serve as novel markers for a given developmental stage. As gene expression on the RNA level need not always be concordant with protein expression, the first question is whether the RNA-based analysis performed here actually discovers genes indicative of the cellular stages as defined on the protein level. Two lines of evidence show that this is indeed the case. First, in the leave-one-out cross-validation, the samples are usually classified to the correct (protein-defined) stage when classified according to the markers identified on the RNA level. Second, analysis of randomly permuted datasets shows that the association between gene expression patterns and developmental stage is statistically highly significant.

The marker genes thus identified were assembled in lineage- or stage-predicting models. The number of genes contained in any model had significant influence on the quality of the prediction, as inclusion of more genes than required can lead to misclassification of individual samples (Fig. 2) . This effect is termed "overfitting". Therefore, the final models contain as few genes as possible, that is, between one and 13 genes. This is smaller than most of the models described in the literature that have been identified with similar algorithms [8 , 12 ]. Although overfitting is effectively controlled by small model sizes, this might render the model more sensitive toward experimental noise. Obviously, these two effects must be balanced against each other in choosing the optimal size of the model. This has been achieved in choosing models with maximum classification accuracy and confidence.

Surprisingly, in the distinction between B- and T-lineage cells, none of the well-established marker genes such as CD19, CD20, or CD3 appears in the optimal or another highly reliable model. As these markers are often associated with antigen-receptor complexes [1, 2], they are developmentally regulated and show substantial variation in expression throughout the different precursor stages. Markers differentiating B- from T-lineage cells should, however, be uniformly expressed in all stages of one of the two lineages, as shown for the novel marker genes identified here (Fig. 3A and 3B). Although most of these genes have previously been described as expressed in B- or T-lineage cells, it has not been evident that they could, in combination, perfectly predict to which of the two lineages investigated a cell belongs.

In B cell development, models containing between one and 13 genes reach classification accuracies between 92% and 100%. Notably, these models contain genes that are up-regulated only in the stage under investigation as well as genes that are specifically not expressed in that stage. It will be interesting to determine whether these genes have some functional relevance, i.e., whether a block in development can be induced by enforced expression of these genes in the respective stage.

Many genes that have not previously been associated with lymphocyte development, including ESTs, are detected as novel marker genes. ApoE, for example, although known to have immune-regulating function [36 ], has never been assessed in B cell development but appears repeatedly in the mature B cell-specific model.

Classification of T cell developmental stages on the RNA level is more difficult than in B cell development. First, only one of eight populations examined can be classified with 100% accuracy, compared with three out of five stages of B cell development. Second, when analyzed by PCA, substantially less of the total variation is retained in the first two principal components, and the stages are not separated well on the corresponding plot.
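The fraction of total variation retained in the first two principal components can be computed as sketched below. This is a minimal illustration under stated assumptions, not the analysis performed here: the input is assumed to be a samples-by-genes matrix, the function name `variance_retained` is hypothetical, and the data are synthetic.

```python
import numpy as np

def variance_retained(X, n_components=2):
    """Fraction of total variance captured by the leading principal components.

    X is a samples-by-genes expression matrix (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=0)                   # center each gene across samples
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, in descending order
    var = s ** 2                              # proportional to variance per component
    return var[:n_components].sum() / var.sum()

rng = np.random.default_rng(1)
# Strongly structured data: two latent factors drive all 50 genes.
structured = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 50))
# Unstructured data: independent noise in every gene.
unstructured = rng.normal(size=(12, 50))
print(variance_retained(structured), variance_retained(unstructured))
```

A low retained fraction, as in the unstructured case, means that a two-dimensional plot discards much of the variation between samples, so poorly separated groups on such a plot are expected.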

One possible explanation is that the DN1 cell population is included in this analysis. These cells have multilineage potential [37, 38] and express many genes that are down-regulated during T cell differentiation and re-expressed in mature T cells [6]. However, omitting the DN1 cell population from the analysis does not result in an improved accuracy of the classification (data not shown).

A second explanation is that the RNA expression patterns are more similar among the different stages of T cell development than among the different stages of B cell development. This indicates that separating T cell precursors by expression of CD25, CD44, CD4, and CD8 incompletely distinguishes distinct entities on the RNA level. The genes identified here as significantly associated with the T cell developmental stages might help to better define those precursors.

Received February 28, 2003; revised April 25, 2003; accepted May 27, 2003.


    REFERENCES

  1. Benoist, C., Mathis, D. (1999) T-lymphocyte differentiation and biology Paul, W. E. eds. Fundamental Immunology ,367-411 Lippincott-Raven Philadelphia, PA; New York.
  2. Melchers, F., Rolink, A. (1999) B-lymphocyte development and biology Paul, W. E. eds. Fundamental Immunology ,183-224 Lippincott-Raven Philadelphia, PA; New York.
  3. Rolink, A., Grawunder, U., Winkler, T. H., Karasuyama, H., Melchers, F. (1994) IL-2 receptor alpha chain (CD25, TAC) expression defines a crucial stage in pre-B cell development Int. Immunol. 6,1257-1264
  4. Godfrey, D. I., Kennedy, J., Suda, T., Zlotnik, A. (1993) A developmental pathway involving four phenotypically and functionally distinct subsets of CD3-CD4-CD8-triple-negative adult mouse thymocytes defined by CD44 and CD25 expression J. Immunol. 150,4244-4252
  5. Hoffmann, R., Seidl, T., Neeb, M., Rolink, A., Melchers, F. (2002) Changes in gene expression profiles in developing B cells of murine bone marrow Genome Res. 12,98-111
  6. Hoffmann, R., Bruno, L., Seidl, T., Rolink, A., Melchers, F. (2003) Rules for gene usage inferred from a comparison of large-scale gene expression profiles of T and B lymphocyte development J. Immunol. 170,1339-1353
  7. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E. L. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays Nat. Biotechnol. 14,1675-1680
  8. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., Lander, E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring Science 286,531-537
  9. Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., Sallan, S. E., Lander, E. S., Golub, T. R., Korsmeyer, S. J. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia Nat. Genet. 30,41-47
  10. Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C. H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J. P., Poggio, T., Gerald, W., Loda, M., Lander, E. S., Golub, T. R. (2001) Multiclass cancer diagnosis using tumor gene expression signatures Proc. Natl. Acad. Sci. USA 98,15149-15154
  11. Bhattacharjee, A., Richards, W. G., Staunton, J., Li, C., Monti, S., Vasa, P., Ladd, C., Beheshti, J., Bueno, R., Gillette, M., Loda, M., Weber, G., Mark, E. J., Lander, E. S., Wong, W., Johnson, B. E., Golub, T. R., Sugarbaker, D. J., Meyerson, M. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses Proc. Natl. Acad. Sci. USA 98,13790-13795
  12. Staunton, J. E., Slonim, D. K., Coller, H. A., Tamayo, P., Angelo, M. J., Park, J., Scherf, U., Lee, J. K., Reinhold, W. O., Weinstein, J. N., Mesirov, J. P., Lander, E. S., Golub, T. R. (2001) Chemosensitivity prediction by transcriptional profiling Proc. Natl. Acad. Sci. USA 98,10787-10792
  13. Luo, L., Salunga, R. C., Guo, H., Bittner, A., Joy, K. C., Galindo, J. E., Xiao, H., Rogers, K. E., Wan, J. S., Jackson, M. R., Erlander, M. G. (1999) Gene expression profiles of laser-captured adjacent neuronal subtypes Nat. Med. 5,117-122
  14. Eberwine, J., Yeh, H., Miyashiro, K., Cao, Y., Nair, S., Finnell, R., Zettel, M., Coleman, P. (1992) Analysis of gene expression in single live neurons Proc. Natl. Acad. Sci. USA 89,3010-3014
  15. Li, C., Wong, W. H. (2001) Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection Proc. Natl. Acad. Sci. USA 98,31-36
  16. Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., Kim, J. Y., Goumnerova, L. C., Black, P. M., Lau, C., Allen, J. C., Zagzag, D., Olson, J. M., Curran, T., Wetmore, C., Biegel, J. A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D. N., Mesirov, J. P., Lander, E. S., Golub, T. R. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression Nature 415,436-442
  17. Griffiths, E. K., Krawczyk, C., Kong, Y. Y., Raab, M., Hyduk, S. J., Bouchard, D., Chan, V. S., Kozieradzki, I., Oliveira-Dos-Santos, A. J., Wakeham, A., Ohashi, P. S., Cybulsky, M. I., Rudd, C. E., Penninger, J. M. (2001) Positive regulation of T cell activation and integrin adhesion by the adapter Fyb/Slap Science 293,2260-2263
  18. Peterson, E. J., Woods, M. L., Dmowski, S. A., Derimanov, G., Jordan, M. S., Wu, J. N., Myung, P. S., Liu, Q. H., Pribila, J. T., Freedman, B. D., Shimizu, Y., Koretzky, G. A. (2001) Coupling of the TCR to integrin activation by Slap-130/Fyb Science 293,2263-2265
  19. Lin, H., Grosschedl, R. (1995) Failure of B-cell differentiation in mice lacking the transcription factor EBF Nature 376,263-267
  20. Shirasawa, T., Ohnishi, K., Hagiwara, S., Shigemoto, K., Takebe, Y., Rajewsky, K., Takemori, T. (1993) A novel gene product associated with mu chains in immature B cells EMBO J 12,1827-1834
  21. Feng, D. X., Liu, D. P., Huang, Y., Wu, L., Li, T. C., Wu, M., Tang, X. B., Liang, C. C. (2001) The expression of human alpha-like globin genes in transgenic mice mediated by bacterial artificial chromosome Proc. Natl. Acad. Sci. USA 98,15073-15077
  22. Mechetina, L. V., Najakshin, A. M., Volkova, O. Y., Guselnikov, S. V., Faizulin, R. Z., Alabyev, B. Y., Chikaev, N. A., Vinogradova, M. S., Taranin, A. V. (2002) FCRL, a novel member of the leukocyte Fc receptor family possesses unique structural features Eur. J. Immunol. 32,87-96
  23. Margolis, B., Silvennoinen, O., Comoglio, F., Roonprapunt, C., Skolnik, E., Ullrich, A., Schlessinger, J. (1992) High-efficiency expression/cloning of epidermal growth factor-receptor-binding proteins with Src homology 2 domains Proc. Natl. Acad. Sci. USA 89,8894-8898
  24. Arthur, H. M., Ure, J., Smith, A. J., Renforth, G., Wilson, D. I., Torsney, E., Charlton, R., Parums, D. V., Jowett, T., Marchuk, D. A., Burn, J., Diamond, A. G. (2000) Endoglin, an ancillary TGFbeta receptor, is required for extraembryonic angiogenesis and plays a key role in heart development Dev. Biol. 217,42-53
  25. Seki, T., Chang, H. C., Moriuchi, T., Denome, R., Ploegh, H., Silver, J. (1985) A hydrophobic transmembrane segment at the carboxyl terminus of thy-1 Science 227,649-651
  26. Madaule, P., Furuyashiki, T., Reid, T., Ishizaki, T., Watanabe, G., Morii, N., Narumiya, S. (1995) A novel partner for the GTP-bound forms of rho and rac FEBS Lett 377,243-248
  27. Ausio, J. (1999) Histone H1 and evolution of sperm nuclear basic proteins J. Biol. Chem. 274,31115-31118
  28. Rolink, A., Haasner, D., Melchers, F., Andersson, J. (1996) The surrogate light chain in mouse B-cell development Int. Rev. Immunol. 13,341-356
  29. Sun, H. Q., Yamamoto, M., Mejillano, M., Yin, H. L. (1999) Gelsolin, a multifunctional actin regulatory protein J. Biol. Chem. 274,33179-33182
  30. Goundis, D., Reid, K. B. (1988) Properdin, the terminal complement components, thrombospondin and the circumsporozoite protein of malaria parasites contain similar sequence motifs Nature 335,82-85
  31. Maric, M., Arunachalam, B., Phan, U. T., Dong, C., Garrett, W. S., Cannon, K. S., Alfonso, C., Karlsson, L., Flavell, R. A., Cresswell, P. (2001) Defective antigen processing in GILT-free mice Science 294,1361-1365
  32. Hamaguchi, I., Yamaguchi, N., Suda, J., Iwama, A., Hirao, A., Hashiyama, M., Aizawa, S., Suda, T. (1996) Analysis of CSK homologous kinase (CHK/HYL) in hematopoiesis by utilizing gene knockout mice Biochem. Biophys. Res. Commun. 224,172-179
  33. Clayton, L. K., Sayre, P. H., Novotny, J., Reinherz, E. L. (1987) Murine and human T11 (CD2) cDNA sequences suggest a common signal transduction mechanism Eur. J. Immunol. 17,1367-1370
  34. Wiedmer, T., Zhou, Q., Kwoh, D. Y., Sims, P. J. (2000) Identification of three new members of the phospholipid scramblase gene family Biochim. Biophys. Acta 1467,244-253
  35. Rodriguez Del Castillo, A., Lemaire, S., Tchakarov, L., Jeyapragasan, M., Doucet, J. P., Vitale, M. L., Trifaro, J. M. (1990) Chromaffin cell scinderin, a novel calcium-dependent actin filament-severing protein EMBO J 9,43-52
  36. Mahley, R. W., Rall, S. C., Jr (2000) Apolipoprotein E: far more than a lipid transport protein Annu. Rev. Genomics Hum. Genet. 1,507-537
  37. Ardavin, C., Wu, L., Li, C. L., Shortman, K. (1993) Thymic dendritic cells and T cells develop simultaneously in the thymus from a common precursor population Nature 362,761-763
  38. Lee, C. K., Kim, J. K., Kim, Y., Lee, M. K., Kim, K., Kang, J. K., Hofmeister, R., Durum, S. K., Han, S. S. (2001) Generation of macrophages from early T progenitors in vitro J. Immunol. 166,5964-5969




