a Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada;
b Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada;
c Institute of Biomaterials and Biomedical Engineering, and
d Department of Anatomy and Cell Biology, University of Toronto, Toronto, Ontario, Canada
Key Words. Stochastic model • Lineage analysis • Osteoprogenitor • Global amplification PCR • Stem cells
Jane E. Aubin, Ph.D., Department of Anatomy and Cell Biology, Faculty of Medicine, University of Toronto, Room 6255 Medical Sciences Building, One King's College Circle, Toronto, Ontario M5S 1A8, Canada. Telephone: 416-978-4220; Fax: 416-978-3954; e-mail: jane.aubin{at}utoronto.ca
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Morphologically and molecularly recognizable osteoblasts and bone nodules form in such cultures at predictable and reproducible times after plating [5, 13]. The process is often subdivided into three stages: A) proliferation; B) extracellular matrix development and maturation, and C) mineralization, with characteristic changes in gene expression at each stage [14]. Expression of osteoblast-associated genes (e.g., type I collagen [COLL I], alkaline phosphatase [ALP], osteopontin [OPN], bone sialoprotein [BSP], osteocalcin [OCN], parathyroid hormone/parathyroid hormone-related protein [PTH/PTHrP] receptor [PTH1R] and others) is asynchronously acquired and/or lost as the progenitor cells differentiate and the matrix matures and mineralizes. In general, ALP is thought to increase and then decrease when mineralization is well progressed; OPN is high during both the proliferation and differentiation stages and is upregulated prior to certain other matrix proteins, including BSP and OCN; and OCN first appears in postproliferative osteoblasts approximately concomitant with mineralization [9, 15] (Fig. 1A). However, many of these conclusions are based, in large part, on models in which differentiation and bone nodule formation are not synchronous throughout the population, and many of the available osteoblast markers are not restricted to osteoblasts but may be expressed at substantial levels by other cells (e.g., fibroblasts) in the population, leading to discrepancies in the reported osteoblast developmental sequence in the published literature [5, 16]. In addition, although limiting dilution analysis of bone-derived cell populations, such as those used here (i.e., the RC cell model), has suggested that osteoprogenitor cell differentiation/bone colony formation is a single-hit phenomenon with only one cell type (likely the osteoprogenitor cell) limiting [7], we have also reported evidence that osteoblast differentiation in this model appears to have a stochastic component [9, 15, 17].
|
| MATERIALS AND METHODS |
|---|
|
|
|---|
α-MEM) containing 15% fetal bovine serum. Total RNA was extracted using a miniguanidine thiocyanate method, and cDNA was synthesized by oligo(dT) priming, poly(A)-tailed, and amplified by PCR with oligo(dT) primer as described [8, 21]. The colonies analyzed were collected from 12 independent cell isolations (each resulting from a pool of at least 25 calvariae) and replica plating experiments.
Southern Blots and Hybridization
Amplified cDNA (5 µl) was run on 1.5% agarose gels, transferred onto 0.2 µm-pore-size nylon membrane (ICN; Costa Mesa, CA), and immobilized by baking at 80°C for 2 hours. Prehybridization, hybridization with labeled probes for mRNAs of interest (Table 1), and probe signal quantification were performed as described [8]. The signal intensity for each probe was standardized against total cDNA [8, 20]. For the analysis reported here, each colony is reported as expressing (+) or not expressing (−) a particular gene of interest relative to background; the actual level of expression is not considered (Fig. 2).
| RESULTS |
|---|
|
|
|---|
We define the variable $T_k^{(i)}$ to be the time at which gene $k$ is first expressed in the $i$-th colony ($i = 1,\ldots,99$; $k = 1,\ldots,9$). These times, $T_k^{(i)}$, are not observed directly; we can only observe whether or not they occur before the end of the experiment (time $\tau$). That is, if there were no errors in detecting gene expression, then for each $i$ and $k$ we could say that $T_k^{(i)} \le \tau$ if and only if $B_k^{(i)} = 1$, where $B_k^{(i)}$ indicates whether expression of gene $k$ was detected in colony $i$. To allow for detection errors, we define $q_k$ to be the "false negative" probability that we do not detect the expression of gene $k$ in a particular colony when in fact that gene is expressed there. In terms of conditional probabilities, this says that $q_k = \Pr\{B_k^{(i)} = 0 \mid T_k^{(i)} \le \tau\} = 1 - \Pr\{B_k^{(i)} = 1 \mid T_k^{(i)} \le \tau\}$. We assume that there are no "false positive" errors, i.e., $\Pr\{B_k^{(i)} = 0 \mid T_k^{(i)} > \tau\} = 1$. The overriding tenet of our model is that the 99 colonies are stochastically independent and governed by the same probability law. Formally, for each $i = 1,\ldots,99$, we define the vector $V^{(i)} = (B_1^{(i)},\ldots,B_9^{(i)},T_1^{(i)},\ldots,T_9^{(i)})$ consisting of the $i$-th colony's 18 associated variables. Then $V^{(1)},\ldots,V^{(99)}$ is a sequence of 99 independent random vectors drawn from some 18-dimensional multivariate probability distribution. Our goal is to describe the salient features of this multivariate distribution. When discussing the distribution (as opposed to the 99 observations drawn from it) we omit the superscripts and use the random vector $V = (B_1,\ldots,B_9,T_1,\ldots,T_9)$.
A key aspect of our data is displayed in the 9 × 9 matrix $E$ shown in Figure 3. For each pair of genes $k$ and $l$, let $E_{kl}$ be the number of colonies in which we detect the expression of gene $k$ but do not detect the expression of gene $l$. Formally, $E_{kl}$ is the number of $i$'s such that $B_k^{(i)} = 1$ and $B_l^{(i)} = 0$. Observe that $E_{k1}$ is 0 and $E_{1k}$ is large for every $k = 2,3,\ldots,9$. This strongly suggests that gene 1 (OPN) is expressed before every other gene, i.e., $\Pr\{T_1 < T_k\} = 1$ for every $k = 2,\ldots,9$. In contrast, $E_{32} = 19$ and $E_{23} = 20$, so it seems unlikely that one of genes 2 or 3 must always precede the other. This paper shows how to analyze such claims statistically. For the most part, our methods are novel adaptations of standard techniques in statistics and probability, which may be found in [22] and many other general statistics references.
$(1 - q_1)^{99} \ge 0.05$, which is equivalent to $q_1 \le 1 - (0.05)^{1/99} = 0.030$ (with 95% confidence).
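The arithmetic behind this bound can be checked with a few lines of Python. This is only an illustrative sketch: the function name `false_negative_upper_bound` is ours, not part of the original analysis, and it assumes the setting described above, in which a gene is detected in every one of the colonies examined.

```python
def false_negative_upper_bound(n_colonies: int, confidence: float = 0.95) -> float:
    """Largest q such that zero misses in n_colonies colonies still has
    probability of at least 1 - confidence, i.e. (1 - q)**n >= 1 - confidence."""
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_colonies)


# Gene 1 (OPN) was detected in all 99 colonies.
bound_q1 = false_negative_upper_bound(99)
print(round(bound_q1, 3))  # 0.03
```

The same function gives the corresponding bound for any other number of colonies in which no miss was observed.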
It is reasonable to assume that $q_k$ increases as the average expression level decreases, and that genes with similar expression levels should have similar $q_k$'s. Therefore, from Table 1, we obtain
![]() | (Equation 1) |
In Section A.1 of the Appendix, we generalize the method used above for bounding q1 to obtain the bounds
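$$q_1 \le 0.030, \qquad q_6,\, q_8 \le 0.064, \qquad q_k \le 0.106 \ \text{ for } k = 2,3,4,5,7,9 \qquad \text{(Equation 2)}$$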
Can Bone Development Be Explained by a Linear/ Deterministic Expression of Genes?
Although some aspects of the order of expression events are clear from the data (e.g., OPN is expressed early and OCN late), many others are not (e.g., does platelet-derived growth factor receptor alpha [PDGFRα] precede fibroblast growth factor receptor 1 [FGFR1] or vice versa?). We first asked whether the data are consistent with a deterministic linear order of the expression events, with the apparent inconsistency being due to observational errors.
To answer this, we suppose that the events $T_1,\ldots,T_9$ always occur in the same order. For each $k = 1,\ldots,9$, write $\pi(k)$ for the label of the gene (as listed in Table 1) that is always $k$-th in this order. For example, if COLL always occurs second, then $\pi(2)$ would be 6. Then, we would have

$$\Pr\{T_{\pi(1)} < T_{\pi(2)} < \cdots < T_{\pi(9)}\} = 1 \qquad \text{(Equation 3)}$$

The sequence $\pi(1),\ldots,\pi(9)$ is a permutation of 1,2,...,9. We ask if there is any permutation such that the data are statistically consistent with Equation 3.
Each permutation would imply certain false negatives in the data. For example, suppose that Equation 3 holds, and that in colony 14 we observe $B_{\pi(3)}^{(14)} = 0$ and $B_{\pi(7)}^{(14)} = 1$. Since gene $\pi(3)$ is always expressed before gene $\pi(7)$, and since there are no false positives, the observation $B_{\pi(3)}^{(14)} = 0$ must be a false negative error. For each permutation, we computed the minimum number of false negatives that would be necessary in order to explain the data if Equation 3 were true (this is the number of values of $i$ and $k$ such that $B_{\pi(k)}^{(i)} = 0$ and there exists a value $l > k$ such that $B_{\pi(l)}^{(i)} = 1$). The fewest such errors is 85, attained by two different permutations: 1,2,5,4,3,6,8,7,9 (i.e., OPN, PDGFRα, PTHrP, PTH-R, FGFR1, COLL, BSP, ALP, OCN) and 1,5,2,4,3,6,8,7,9. In Appendix A.2, we show that this many errors is very unlikely, which in turn provides strong evidence against Equation 3 and leads us to reject the possibility that there is a linear sequence of genes that accounts for the development of a bone nodule colony.
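The counting rule described above (an undetected gene counts as a false negative whenever a gene later in the assumed order is detected in the same colony) can be sketched in Python as follows. The toy data set of four colonies and three genes is purely hypothetical and stands in for the 99 × 9 observations; the function and variable names are ours.

```python
from itertools import permutations
from typing import Sequence


def min_false_negatives(colonies: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Count the false negatives forced by a fixed expression order.

    colonies[i][g - 1] is 1 if gene g was detected in colony i, else 0.
    order lists the gene labels (1-based) from earliest to latest.
    """
    total = 0
    for row in colonies:
        later_gene_detected = False
        # Walk from the latest gene back to the earliest one.
        for gene in reversed(order):
            if row[gene - 1] == 1:
                later_gene_detected = True
            elif later_gene_detected:
                total += 1  # undetected, yet a later gene was detected
    return total


# Hypothetical toy data: 4 colonies, 3 genes (not the data of this study).
toy = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 0, 1]]

scores = {order: min_false_negatives(toy, order) for order in permutations((1, 2, 3))}
best = min(scores.values())
best_orders = sorted(order for order, score in scores.items() if score == best)
print(best, best_orders)  # 1 [(1, 2, 3), (1, 3, 2)]
```

With nine genes there are 9! = 362,880 permutations, so the same exhaustive search remains feasible for a data set of the size analyzed here.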
Is the Expression of One Gene Independent of the Expression of Other Genes?
We next determined, for each pair of genes $k$ and $m$, whether their expressions $B_k$ and $B_m$ were independent. (The error probabilities $q_k$ and $q_m$ do not play a role here, since all errors are assumed to be independent.) We excluded gene 1 from this analysis, because $B_1^{(i)} = 1$ for every colony $i$, which prevents any meaningful testing of independence. To illustrate the method, consider the data for PDGFRα and FGFR1 ($k = 2$ and $m = 3$) in Table 2. Testing these data for independence using Fisher's exact test (two-tailed) [23] gives a p value of 0.788. Table 3 lists the results for all pairs. Any p value below 0.05 is sufficient to reject the independence hypothesis on any single test. However, since we are testing 28 different pairs, we can reasonably expect to see one or two p values below 0.05 purely by chance, even when the corresponding pair of variables is independent. Thus, we need to be more cautious unless the p value is very small indeed. See Section A.3 of the Appendix for further discussion of this matter. In particular, that section shows that any p value below 0.0018 can be understood as a clear case for rejecting independence, while values between 0.0018 and 0.05 should be regarded as inconclusive indications of dependence. However, of the five "inconclusive" values that appear in Table 3, we expect all but at most one or two to be truly caused by dependence.
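For readers who wish to reproduce this kind of test, the following is a minimal pure-Python sketch of a two-tailed Fisher's exact test for a 2 × 2 table. The table used in the example is hypothetical (it is not Table 2), and the two-tailed convention assumed here, summing the probabilities of all tables that are no more likely than the observed one, is one common definition; statistical packages such as SciPy (`scipy.stats.fisher_exact`) provide equivalent routines.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical table: rows = gene k detected / not, columns = gene m detected / not.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```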
The following model illustrates how positive correlations could arise. Consider three events (C, D, and E) which occur at the random times $T_C$, $T_D$, and $T_E$. Event C appears first. Once C occurs, then D and E each occur after an additional random (and independent) amount of time. Letting $X_{CD}$ be the duration of time between events C and D, and $X_{CE}$ the time between C and E, then $T_D = T_C + X_{CD}$ and $T_E = T_C + X_{CE}$. Assume that $X_{CD}$, $X_{CE}$, and $T_C$ are independent. Then $T_C$, $T_D$, and $T_E$ are mutually positively correlated (in fact, $\mathrm{Cov}(T_i, T_j) = \mathrm{Var}(T_C) > 0$ when $i \ne j$). Moreover, consider the binary variables $B_C$, $B_D$, and $B_E$ (where $B_k$ is 1 if $T_k \le \tau$ and is 0 otherwise). Then each of $\mathrm{Cov}(B_C, B_D)$, $\mathrm{Cov}(B_C, B_E)$, and $\mathrm{Cov}(B_D, B_E)$ is nonnegative, by the FKG Inequality [24].
Our analysis rules out situations in which negative correlations could arise. One such scenario has two alternative developmental pathways, with gene D occurring only on one pathway and gene E occurring only on the other. Then no colony would have $B_D = 1 = B_E$, and hence $\mathrm{Cov}(B_D, B_E) < 0$ (since $E(B_D B_E) = 0$).
Can Genes Replace Each Other in the Same Developmental Program (Exchangeability)?
We showed above that genes 3, 4, and 6 are mutually dependent, in fact, positively correlated. We can determine if there is a preferred order in which these genes need to be expressed to result in bone nodule development. In Figure 3, observe that the values of $E_{ij}$ for $i, j \in \{3,4,6\}$ are all approximately equal (all between 11 and 15). This suggests no preferred order among these three genes. We formalize this as follows.
Hypothesis A. The genes 3, 4, and 6 occur in random order. More precisely, as soon as some (early) event occurs (say at a random time T*), each of genes 3, 4, and 6 waits for a random amount of time before it is expressed. Moreover, these three waiting times are i.i.d. (independent and identically distributed).
That is, there exist i.i.d. random variables $X_3$, $X_4$, and $X_6$, all independent of $T^*$, such that $T_k = T^* + X_k$ for $k = 3,4,6$. The event at time $T^*$ could be the expression of gene 1 (i.e., $T^* = T_1$), the expression of some other gene not examined in this experiment, or some other event.
Under Hypothesis A, $T_3$, $T_4$, and $T_6$ are not independent, but they are identically distributed. In fact, $T_3$, $T_4$, and $T_6$ are exchangeable random variables under Hypothesis A. This means that the probability law of the vector $(T_3,T_4,T_6)$ is the same as that of $(T_4,T_6,T_3)$ and of its four other permutations. For example, exchangeability implies that $\Pr\{B_3 = 1, B_4 = 0, B_6 = 1\} = \Pr\{B_4 = 1, B_6 = 0, B_3 = 1\}$ and similarly for every other permutation.
In Section A.4 of the Appendix, we use this property to devise a statistical test of Hypothesis A. As described there, the tests show that the data are clearly consistent with the hypothesis of exchangeability implied by Hypothesis A. Due to the nature of statistical hypothesis testing, we cannot say that Hypothesis A has been accepted, but at most we can say that our data does not refute it. (Anything more would require information about the power of the test, which is beyond the scope of this paper; indeed, there is no obvious choice of alternative distributions for which one should do power computations.) That is, during osteoblast development, we hypothesize that there is a set of genes necessary for phenotypic maturation, but within this set there is flexibility in the order in which the genes are expressed.
In Section A.4, we also test exchangeability of the random variables $\{B_2, B_3, B_4, B_5, B_6\}$ corresponding to the five "early" genes. That is, we tested whether we could extend Hypothesis A to the group of five genes 2,3,4,5,6. As explained in Section A.4, this was rejected.
Other Precedence Constraints
In this section, we consider the three "late" genes: ALP, BSP, and OCN. First, for gene 7 (ALP), observe from Figure 3 that $E_{7k}$ is much smaller than $E_{k7}$ for each of $k = 2,3,4,5,6$. Therefore, ALP typically is expressed after genes 2 through 6. In particular, $E_{76} = 0$, which suggests
Hypothesis B. Gene 6 (COLL) is always expressed before gene 7 (ALP). Moreover, none of genes 2, 3, 4, or 5 are a prerequisite for the expression of gene 7.
By our tests of independence, genes 2 and 5 are clearly not prerequisites for gene 7. Recall that we did not reject Hypothesis A, under which $B_3$, $B_4$, and $B_6$ are exchangeable among themselves. If gene 6 has a special relationship with gene 7 that genes 3 and 4 do not (in particular, if Hypothesis B is true), then we would expect the following to be false:
Hypothesis B*. The variables $B_3$, $B_4$, and $B_6$ are exchangeable with respect to the joint distribution of genes 3, 4, 6, and 7.
As described in Section A.5 of the Appendix, our tests indicate that Hypothesis B* should be rejected, but we do not reject the exchangeability of $B_3$ and $B_6$ with respect to the joint distribution of genes 3, 4, and 7. This is consistent with Hypothesis B. Further data are needed before one can conclusively accept Hypothesis B, however.
Finally, we considered genes 8 (BSP) and 9 (OCN). In particular, we looked at the possibilities that $\Pr\{T_8 < T_7\} = 1$ and that $\Pr\{T_k < T_9\} = 1$ for every $k \le 8$, as suggested by the matrix $E$ of Figure 3. We found that the statistical evidence was not strong enough to either confirm or reject these possibilities. See Section A.5 of the Appendix for details.
Summary of Expression Analysis
Combining the results of the statistical analyses, we have constructed a new model of osteoblast differentiation that is consistent with the observed relationships of expression among particular genes or subsets of genes (Fig. 1B). Our analysis indicates that the expression of genes 2 (PDGFRα) and 5 (PTHrP) may occur independently of other aspects of osteoblast development. Conversely, gene 1 (OPN), which precedes the expression of all of the other bone-related genes, may initiate the expression of genes 3 (FGFR1), 4 (PTH-R), or 6 (COLL). Once all the genes in this group are expressed, further maturation depends upon the expression of gene 7 (ALP), which may be contingent upon gene 8 (BSP) being independently activated. Finally, when genes 6, 7, and 8 are expressed, gene 9 (OCN) is expressed and the gene cascade associated with the clonal development of osteoblast phenotype is accomplished.
| DISCUSSION |
|---|
|
|
|---|
The calculated gene expression profiles of individual clones were compared with those based on established osteoprogenitor developmental paradigms (Figs. 1A and 1B). We first asked whether there was a predetermined order to the expression of a set of nine genes associated with osteoblast development. The analysis, which depended on a reduction of sequential expression errors to levels below those due to observational error, did not support a linear or deterministic sequence. Instead, although all the tested colonies eventually developed into fully mineralized bone nodules, the range of expression profiles at different stages of development suggested that it may be possible for osteoprogenitors to differentiate into mature osteoblasts using different developmental routes (Fig. 1B). To confirm and extend this possibility, we examined the correlation between the expression of a particular gene and any other individual gene. In some cases, strong positive correlations were found between particular pairs of genes, e.g., expression of gene 7 (ALP) required expression of gene 6 (COLL). In other cases, such as genes 2 (PDGFRα) and 5 (PTHrP), expression appeared independent of any other gene tested. One hypothesis is that acquisition of expression of these latter genes occurs at very early times in osteoblast development, i.e., at a stage prior to the earliest replica plating data points we have available. Alternatively, the expression profiles may be indicative of, or a prefacing of, potential development along pathways other than osteoblastic (e.g., chondrocytic or adipocytic) as may occur from multipotent progenitors (i.e., bipotential cells or mesenchymal stem cells) present in the population [1, 5], but these may not progress under the culture conditions in which osteoblast development predominates (and was thus selected for analysis).
Our revised hierarchy of clonogenic osteoprogenitor differentiation (Fig. 1B) is consistent with two hypotheses of developmental history underlying osteoblast formation. One is that there is more than one pattern of sequential gene expression profiles that results in the same type of mature cell (developmental flexibility hypothesis). The other is that, as each colony matures from an undifferentiated progenitor cell, some divisions result in pathways that are either not supported by the environment (leading to lineage extinction) or give rise to other cell types that may exist in the final colony (i.e., cells expressing gene 2 [PDGFRα] or gene 5 [PTHrP]) but do not notably contribute to its overall phenotype under conditions where osteoblast development predominates (also, lineage extinction). At present, our data do not allow us to select between these two possibilities and it is possible that both occur. There is, for example, some evidence for mixed lineages in a small proportion of osteoblast colonies (e.g., adipocyte-osteoblast colonies [25, 26]). It is also interesting to note that colony size appears to have no (detectable) impact on the expression profiles obtained, i.e., as raised above, there is a stochastic component underlying CFU-O differentiation, such that a small bone colony can be as "mature" as a big colony [9, 15]. In addition, the variability seen in the rate of colony development is such that the gene expression "snapshot" taken during replica plating should give us an adequate description of all potential cell fate decisions during colony development (with the possible exception of very early genes such as gene 1, OPN). Taken together, our results argue against the existence of "rogue" differentiation routes, i.e., the expression of genes 2 and 5, or expression of genes 3 and 4, as dead ends in the differentiation process, and for the possibility that their expression is required earlier during osteoblast development than was measured using our replica plating strategy. However, our results also suggest that some gene combinations are exchangeable and that higher order interactions among genes exist during osteoblast development. They support the hypothesis that great complexity underlies the interactions among groups of genes, and that further analysis will require larger gene (e.g., microarray technologies) and cellular (e.g., substantially more progenitor colonies) data sets. Interestingly, in contrast to the gene expression programs that appear to underlie early events in osteoblast development, global late gene expression profiles appear to be more fixed (Fig. 1B). This is consistent with other well-defined gene expression sequences during the maturation of committed progenitor cells (including, e.g., erythrocytes) and may be usual for cells that have committed to a particular mature state and lost their developmental flexibility. Our results also support the view that different developmental routes may contribute to and/or account for the substantial molecular heterogeneity seen amongst mature osteoblasts in vitro [20] and in vivo [27] (as predicted in [12]).
Although a stochastic component defines their development, osteoblast differentiation appears to be a default pathway for progenitor cells in RC populations cultured under the conditions we describe here [5]. Whether survival of particular cell types results from a microenvironment effect, or commitment to particular lineages is initiated by the exposure of an uncommitted cell to a particular signal that, upon exit of the cell from the stem cell state, stabilizes the subsequent identity of the progeny, is not yet known. This so-called directive mechanism clearly depends on the interactions between the cellular sensing systems (typically cell surface receptors) and the cellular microenvironment. Substantial evidence exists that both these mechanisms can occur and that the particular mechanism used may be dependent on the tissue system/cellular microenvironment. For example, in hematopoiesis, several studies suggest that exposure to growth factors may not be obligatory for the differentiation of primitive cells and that, at least under certain conditions, the identity of the differentiated cell population may be intrinsically determined [28]. Particularly interesting, in this regard, is the recent demonstration that coexpression of multiple lineage-restricted genes precedes commitment in multipotent progenitors [29-31]. This multilineage "priming" process [30] is consistent with the flexibility in the gene expression profiles we have seen during osteoprogenitor development and implies that the commitment of a multipotent cell to a particular pathway may reflect the stabilization of a particular subset of a group of expressed molecules. The stabilization process may occur in a stochastic manner in the absence of a particular instructive signal. Conversely, upon commitment of an undifferentiated cell, an instructive signal may stabilize a particular set or subset of expressed transcription factors, resulting in the production of specific cell types.
| CONCLUSIONS |
|---|
|
|
|---|
| APPENDIX |
|---|
|
|
|---|
30. The probability of detecting 9 and missing 4 in one of these $A_{94}$ colonies is $q_4(1 - q_9)$. But there were no misclassifications of this kind (since $E_{94} = 0$), and the probability of this (given $A_{94}$) is $(1 - q_4(1 - q_9))^{A_{94}}$. As above, we are 95% confident that

$$(1 - q_4(1 - q_9))^{A_{94}} \ge 0.05,$$

$$q_4(1 - q_9) \le 1 - (0.05)^{1/30} = 0.095 \qquad \text{(Equation A1)}$$

We believe $q_9 \le q_4$, so we deduce that $q_9(1 - q_9) \le 0.095$, and thus $q_9 \le 0.106$ with 95% confidence. Inserting this bound into Equation A1 gives $q_4 \le 0.106$.
We repeat these calculations for $E_{96}$, $E_{97}$, and $E_{98}$, which are all 0. Each of these cases has 30 colonies in which both genes are detected, implying that $q_k \le 0.106$ with high confidence for $k = 6,7,8$. Alternatively, we could apply $q_4 \le 0.106$ to Equation 1 and deduce $q_k \le 0.106$ for every $k$, with high confidence. An analogous calculation for $E_{76}$ yields $q_6 \le 0.064$. By Equation 1, the same bound should hold for $q_8$. In summary, we have shown
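$$q_1 \le 0.030, \qquad q_6,\, q_8 \le 0.064, \qquad q_k \le 0.106 \ \text{ for } k = 2,3,4,5,7,9 \qquad \text{(Equation A2)}$$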
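The numerical steps for genes 4 and 9 can be retraced with the short Python sketch below. It assumes, as in the text, 30 colonies with no misclassification of the relevant kind and the ordering of $q_9$ relative to $q_4$ stated above; the variable names are ours.

```python
from math import sqrt

# No misclassification among 30 colonies: (1 - q4 * (1 - q9))**30 >= 0.05.
product_bound = 1 - 0.05 ** (1 / 30)  # upper bound on q4 * (1 - q9)

# With q9 <= q4 we get q9 * (1 - q9) <= product_bound; the admissible q9
# lie below the smaller root of q**2 - q + product_bound = 0.
q9_bound = (1 - sqrt(1 - 4 * product_bound)) / 2

# Inserting the q9 bound back into q4 * (1 - q9) <= product_bound.
q4_bound = product_bound / (1 - q9_bound)

print(round(product_bound, 3), round(q9_bound, 3), round(q4_bound, 3))
# 0.095 0.106 0.106
```

The two bounds coincide because $q_9$ at its bound satisfies $q_9(1 - q_9) = 0.095$ exactly.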
A.2 Linear/Deterministic Expression of Genes
As described in part 3 of the Results section, we ask if there can be a permutation $\pi$ such that the data are statistically consistent with Equation 3, i.e.,

$$\Pr\{T_{\pi(1)} < T_{\pi(2)} < \cdots < T_{\pi(9)}\} = 1$$
Recall that each permutation implies certain false negatives in the data. The fewest such errors is 85, attained by two different permutations: 1,2,5,4,3,6,8,7,9 and 1,5,2,4,3,6,8,7,9. We shall show that this many errors is very unlikely for any permutation, which in turn provides strong evidence against Equation 3. We split the 99 × 9 observations $B_k^{(i)}$ into three groups: Group A contains $B_1^{(i)}$, $i = 1,\ldots,99$ (i.e., all OPN data); Group B contains $B_6^{(i)}$, $B_8^{(i)}$, $i = 1,\ldots,99$ (all COLL and BSP data); and Group C contains $B_k^{(i)}$, $k = 2,3,4,5,7,9$, $i = 1,\ldots,99$ (all other data). Recall that our worst-case bounds on the error probabilities for groups A, B, and C, respectively, are $q_A = 0.030$, $q_B = 0.064$, and $q_C = 0.106$. Let $M_A$, $M_B$, and $M_C$ denote the true number of expression events $\{T_k^{(i)} \le \tau\}$ that occurred (whether observed or not) in Groups A, B, and C, respectively. We observed 99 of these events in Group A, 139 in Group B, and 386 in Group C. Hence, $M_A = 99$, $M_B \ge 139$, and $M_C \ge 386$. Now let $F_A$, $F_B$, and $F_C$ be the number of false negatives in the three groups, i.e., $F_A = M_A - 99$, $F_B = M_B - 139$, $F_C = M_C - 386$. If Equation 3 holds, then $85 \le F_A + F_B + F_C = M_A + M_B + M_C - 624$. For each $y$ = A, B, C, let $G_y$ be a binomially distributed random variable with parameters $n = M_y$ and $p = q_y$. Then $G_y$ is stochastically greater than $F_y$. It turns out that $\Pr\{G_A + G_B + G_C \ge M_A + M_B + M_C - 624\}$ is about $10^{-3}$ or smaller for all values of $M_A$, $M_B$, and $M_C$ such that $M_A = 99$, $M_B \ge 139$, $M_C \ge 386$, and $M_A + M_B + M_C \ge 624 + 85$. (This can be calculated using the normal approximation for $(G_A + G_B + G_C - \mu)/\sigma$, where $\mu = q_A M_A + q_B M_B + q_C M_C$ and $\sigma^2 = q_A(1 - q_A)M_A + q_B(1 - q_B)M_B + q_C(1 - q_C)M_C$.) That is, the probability of seeing enough errors is negligible. Hence we reject the possibility that there is a linear sequence of genes that accounts for the development of a bone nodule colony.
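The normal approximation described above can be evaluated over all admissible values of $M_B$ and $M_C$ with a short Python sketch. The upper limits of the search (198 and 594) are our assumption, taken from the group sizes (two and six genes, respectively, in 99 colonies); no continuity correction is applied, and the names are ours.

```python
from math import erfc, sqrt

Q = {"A": 0.030, "B": 0.064, "C": 0.106}   # worst-case false negative bounds
OBSERVED = {"A": 99, "B": 139, "C": 386}   # detected expression events
MIN_ERRORS = 85                            # fewest false negatives required


def upper_tail(m_a: int, m_b: int, m_c: int) -> float:
    """Normal approximation to Pr{G_A + G_B + G_C >= M_A + M_B + M_C - 624}."""
    m = {"A": m_a, "B": m_b, "C": m_c}
    mu = sum(Q[g] * m[g] for g in m)
    var = sum(Q[g] * (1 - Q[g]) * m[g] for g in m)
    threshold = sum(m.values()) - sum(OBSERVED.values())
    z = (threshold - mu) / sqrt(var)
    return 0.5 * erfc(z / sqrt(2))  # standard normal upper tail


# Search every admissible (M_B, M_C) with at least 85 false negatives in total.
worst = max(
    (upper_tail(99, m_b, m_c), m_b, m_c)
    for m_b in range(139, 2 * 99 + 1)
    for m_c in range(386, 6 * 99 + 1)
    if 99 + m_b + m_c >= 624 + MIN_ERRORS
)
print(worst)  # (largest tail probability, M_B, M_C)
```

Under these assumptions the largest tail probability occurs when all 85 false negatives fall in Group C, which has the largest error bound, and it is on the order of $10^{-3}$.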
A.3 Testing Independence
In part 4 of the Results section, we performed Fisher's exact test on each pair $B_k$ and $B_m$ (for $2 \le k < m \le 9$), which is 28 tests. For a single pair, one would normally reject independence when the p value is less than 0.05. However, in 28 tests, there is a reasonable chance of seeing one or more p values below 0.05 by chance alone, even when independence holds. To be cautious, we can choose a threshold, $z$, such that $\Pr\{\min\{p_1, \ldots, p_{28}\} < z\}$ is at most 0.05, where $p_i$ is the p value from the $i$-th test under the assumption that the null hypothesis is true. Since $\Pr\{p_i < z\} = z$ (up to discretization error), we can choose $z = 0.05/28 = 0.0018$ (by Bonferroni's inequality). Therefore, any p value between 0.0018 and 0.05 should be regarded with some caution as being inconclusive. Any p value below 0.0018 may be safely used to reject independence. There are nine such values in Table 3. There are five values between 0.0018 and 0.05 in Table 3 that must be regarded with caution. However, it seems highly unlikely that all five of these values are due to chance resulting from five null hypotheses being true. Indeed, there are at most 28 − 9 = 19 cases where the null hypothesis is not conclusively rejected. If we assume that the null hypothesis holds in all 19 of these cases, and if we further assume that the p values are approximately independent, then we can say that the number of p values below 0.05 in these 19 cases has an approximate Poisson distribution with mean 19 × 0.05 = 0.95. This implies that the approximate probability of seeing $j$ or more p values below 0.05 is 0.246 for $j = 2$, 0.071 for $j = 3$, and 0.016 for $j = 4$. We conclude that all but at most two of the inconclusive p values are likely due to the falsity of the independence null hypothesis.
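The Bonferroni threshold and the Poisson tail probabilities used in this argument are easy to recompute. The sketch below uses only the Python standard library; the helper name `poisson_tail` is ours.

```python
from math import exp, factorial


def poisson_tail(j: int, mean: float) -> float:
    """Pr{X >= j} for a Poisson random variable X with the given mean."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(j))


threshold = 0.05 / 28        # Bonferroni threshold for 28 tests
mean = (28 - 9) * 0.05       # expected chance "hits" among the 19 remaining tests

tails = {j: round(poisson_tail(j, mean), 3) for j in range(1, 5)}
print(round(threshold, 4))   # 0.0018
print(tails)                 # {1: 0.613, 2: 0.246, 3: 0.071, 4: 0.016}
```

Seeing three or more p values below 0.05 purely by chance therefore has a probability of about 0.07, and four or more about 0.016.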
A.4 Exchangeability
We now develop the statistical test of Hypothesis A based on exchangeability, as described in part 5 of the Results section. First, for each colony $i = 1,\ldots,99$, let $S_{346}^{(i)} = \{r \in \{3,4,6\} : B_r^{(i)} = 1\}$, which is the random subset of {3,4,6} corresponding to the genes that have been detected in the $i$-th colony by time $\tau$. Next, let $N_{346}^{(i)} = B_3^{(i)} + B_4^{(i)} + B_6^{(i)}$, which is the size of $S_{346}^{(i)}$. The number of colonies in which $N_{346}^{(i)}$ was 0 (respectively, 1, 2, 3) was 8 (respectively, 10, 28, 53). For each subset $C$ of {3,4,6}, let $n[C]$ be the number of colonies $i$ such that $S_{346}^{(i)} = C$.
For each $k$, exchangeability implies that, conditioned on the event that $N_{346} = k$, the random set $S_{346}$ is uniformly distributed among all $k$-element subsets of {3,4,6}. We test this uniformity using the classical goodness-of-fit statistic $GF = \sum_{r=1}^{3} (n[C_r] - E(n[C_r]))^2 / E(n[C_r])$, where the $C_r$'s are the $k$-element subsets of {3,4,6} and $E(\cdot)$ denotes expected value. For $k = 2$, the observed values ($n[\{3,4\}] = 8$, $n[\{3,6\}] = 10$, and $n[\{4,6\}] = 10$) give $E(n[C_r]) = 28/3$ and $GF = 0.29$. Under Hypothesis A, the approximate distribution of $GF$ is $\chi^2_2$ (chi-squared with 2 degrees of freedom). For 2 degrees of freedom the upper-tail probability is $e^{-GF/2}$, so 0.29 has a p value near 0.87. For $k = 1$, the data $n[\{3\}] = 3$, $n[\{4\}] = 5$, and $n[\{6\}] = 2$ give $E(n[C_r]) = 10/3$ and $GF = 1.40$. In this case, there are too few observations for the chi-square approximation to be valid, so we used a simple Monte Carlo simulation to compute $\Pr\{GF \ge 1.40\} = 0.63$ under Hypothesis A. That is, we simulated a million realizations of the test statistic $GF$ under Hypothesis A and found that 63% of simulated values were at least 1.40.
The $k = 1$ and $k = 2$ tests show that the data are clearly consistent with the hypothesis of exchangeability implied by Hypothesis A. We also used this method to test exchangeability of the random variables $\{B_2, B_3, B_4, B_5, B_6\}$, corresponding to the five "early" genes. We rejected uniformity of $S_{23456}$ for $k = 3$ (with p value 0.007), marginally accepted it for $k = 1$, and accepted it for $k = 2$ and $k = 4$. Overall, then, the data are not consistent with exchangeability of all five of $\{B_2, B_3, B_4, B_5, B_6\}$.
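The goodness-of-fit computations for genes 3, 4, and 6 can be retraced with the Python sketch below. Note that it swaps in an exact enumeration of the three-category multinomial distribution in place of the Monte Carlo simulation described above; the function names are ours, and the enumeration is written for exactly three equally likely categories.

```python
from math import comb, exp
from typing import Sequence


def gf_statistic(counts: Sequence[int]) -> float:
    """Goodness-of-fit statistic against a uniform distribution over the categories."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def exact_p_value_three_categories(counts: Sequence[int]) -> float:
    """Pr{GF >= observed GF} when n items fall uniformly into 3 categories."""
    n = sum(counts)
    # For fixed n, GF is an increasing function of the sum of squared counts,
    # so comparing integer sums of squares avoids floating-point ties.
    observed_ss = sum(c * c for c in counts)
    favourable = 0
    for a in range(n + 1):
        for b in range(n - a + 1):
            c = n - a - b
            if a * a + b * b + c * c >= observed_ss:
                favourable += comb(n, a) * comb(n - a, b)  # multinomial count
    return favourable / 3**n


gf_k2 = gf_statistic([8, 10, 10])     # k = 2 subsets {3,4}, {3,6}, {4,6}
gf_k1 = gf_statistic([3, 5, 2])       # k = 1 subsets {3}, {4}, {6}
p_k2_chi2 = exp(-gf_k2 / 2)           # chi-squared upper tail, 2 degrees of freedom
p_k1_exact = exact_p_value_three_categories([3, 5, 2])

print(round(gf_k2, 2), round(p_k2_chi2, 2))   # 0.29 0.87
print(round(gf_k1, 2), round(p_k1_exact, 2))  # 1.4 0.63
```

The exact value for $k = 1$ agrees with the simulated probability of 0.63 to two decimal places.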
A.5 Precedence Constraints for Later Genes
Here we describe the methods referred to in part 6 of the Results section. Hypothesis B* from that section implies that (conditioned on the event $\{N_{346} = 2\}$) the expression of gene 7 is independent of the set $S_{346}$. We therefore test independence of $B_7$ and $S_{346}$ conditioned on $\{N_{346} = 2\}$, using the data in Table 4. Fisher's exact test gives the p value 0.045, which marginally rejects Hypothesis B*. The analogous test for independence of $B_7$ and $S_{346}$ conditioned on $\{N_{346} = 1\}$ gives little information since the row sum of the $B_7 = 1$ row is 1.
We now turn to gene 8 (BSP). For each of $k = 2,3,4,5,6$, $E_{8k}$ is considerably smaller than $E_{k8}$, but the reverse is true for $k = 7$. This suggests that BSP usually gets expressed after genes 2 through 6 and before gene 7. We observed 47 colonies in which both 7 and 8 were expressed, and four colonies in which 7 was expressed but 8 was not. Could the latter four all be errors? The number of errors of this type would have a binomial distribution that is approximately Poisson with mean $50\,q_8(1 - q_7)$. The probability that such a distribution has a value of four (or more) is less than 0.05 only if $q_8 < 0.03$. Our bound, $q_8 \le 0.064$, is too weak for this, so we conclude that there is no strong evidence that $\Pr\{T_8 < T_7\} = 1$.
Finally, we consider gene 9 (OCN). For every $k \le 8$, $E_{k9}$ is large and $E_{9k}$ is small. Apparently OCN is usually expressed after the other genes. In fact, $E_{9k} = 0$ for $k = 1,4,6,7,8$, but it is not clear whether these zeroes are due to strict precedence constraints or simply to late expression times of OCN. Nor can we reject the possibility that $\Pr\{T_k < T_9\} = 1$ for every $k \le 8$, with the nonzero values $E_{92}$, $E_{93}$, and $E_{95}$ due to observational errors. Although the case for strict precedence of gene 9 by the two later genes (7 and 8) is stronger than for the earlier genes, we cannot confirm this statistically without more data or smaller bounds on the $q_k$'s.
| ACKNOWLEDGMENT |
|---|
|
|
|---|
| REFERENCES |
|---|
|
|
|---|
and
, but not PPAR
alone, converts osteoprogenitor cells to mature adipocytes in primary rat calvaria cell cultures. J Bone Miner Res
2000;15:S553.