a Genome Institute of Singapore, Singapore;
b National Institute of Ageing; Stem Cell, Laboratory of Neuroscience, Baltimore, Maryland, USA;
c Division of Cancer Biology, Beth Israel Deaconess Medical Center, Harvard Institutes of Medicine, Boston, Massachusetts, USA;
d Department of Biological Sciences, National University of Singapore, Singapore;
e Lynx Therapeutics Inc., Hayward, California, USA;
f Embryo Stem Cell International, Singapore;
g Geron Corporation, Menlo Park, California, USA
Key Words. Embryonic stem cells, murine and human • Transcriptome • Massively parallel signature sequencing (MPSS)
Correspondence: Bing Lim, M.D., Ph.D., Genome Institute of Singapore, 60 Biopolis Street, Genome#02-01, Singapore 138672. Telephone: 65-6478-8000; Fax: 65-6478-9005; e-mail: limb1{at}gis.a-star.edu.sg; and Mahendra Rao, M.D., Ph.D., National Institute of Ageing: Stem Cell, LNS, GRC, 333 Cassell Drive, Baltimore, MD 21224. Telephone: 410-558-8204; Fax: 410-558-8249; e-mail: raomah{at}grc.nia.nih.gov
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
ESCs can be propagated as undifferentiated cells in large numbers more easily than can adult stem cells. ESCs are excellent tools for studying early events in development, as the generation of ESC-derived embryoid bodies (EBs) recapitulates early embryo development. ESCs have been isolated from multiple species, including murine, swine, simian, and human blastocysts. Mouse and human ESCs are similar in that they grow as colonies of tightly packed cells on inactivated murine embryonic fibroblast (MEF) feeders or in conditioned medium (CM) derived from such MEFs [9]. Both stem cell populations have the potential to form teratomas and to differentiate in vitro into all three germ layers, namely ectoderm, endoderm, and mesoderm. Many markers characteristic of undifferentiated cells, including oct-4, nanog, sox-2, and utf-1, are expressed by both populations of cells. The expression of these markers, together with the absence of differentiation markers, constitutes a signature profile of undifferentiated ESC cultures irrespective of their species origin [2,8].
Nevertheless, important differences exist in the growth rates, culture requirements, and marker expression of human and murine ESCs. This divergence has generally been ascribed to fundamental differences in the pathways that regulate self-renewal, apoptosis, and proliferation [4,7]. Some examples include SSEA1, SSEA3, and SSEA4 expression [10]; the ability to differentiate into trophoblasts [11]; and the dependency on leukemia inhibitory factor (LIF) [12,13]. These multiple reported differences raise the possibility that additional differences exist and provide a compelling rationale to comprehensively map the transcriptome of ESCs.
Human and mouse ESC transcriptomes have been individually mapped to varying depths and breadths with regard to their respective genomes [2, 4, 5, 8, 14–16], though no detailed pairwise comparisons have been performed. Sato et al. [2] compared human and murine ESCs using microarrays. More recently, Ginis and colleagues [7] compared the expression of about 400 genes in human and murine ESCs and showed that at least a quarter of the genes tested have significant differences in their expression. These results suggested that the differences represent, at best, a small fraction of the variations that exist and that a large-scale analysis would identify more important differences.
Comparisons reported so far have been limited by the availability of cross-species and homologous arrays. Other large-scale techniques, such as SAGE and the generation of ESTs, have not been used, perhaps because of the cost and the limitations in gene annotation. More recently, better annotation of genomic data and improvements in technology, together with development of alternative techniques such as MPSS [17], have permitted a deeper and more complete mapping of transcriptomes at a significantly lower cost than with conventional SAGE and EST generation. The underlying principle of MPSS, like SAGE, is that a signature sequence of 20 bases starting from the 3'-most DpnII (GATC) site is generated for each transcript. In MPSS, at least 1 million 20-base tags are identified. Considering that an estimated 200,000–300,000 transcripts exist in a single cell, this method theoretically allows for all transcripts in a cell to be measured without the constraints of probe availability. It provides an unprecedented coverage in depth and breadth of any transcriptome at a greatly enhanced sensitivity compared with the average SAGE library tag generation, which is typically tens of thousands of tags. MPSS analysis measures transcript levels using a standard unit of measurement, transcripts per million (tpm), rather than a relative unit in reference to a biological RNA standard, thus allowing one to estimate the copy numbers of different transcripts per cell and compare expression patterns across homologous cells of different species. Furthermore, MPSS has the ability to detect novel transcripts.
We have chosen MPSS to examine gene expression in ESCs from the murine E14 line and in independently derived hESC lines: the HES-2 line is from ES Cell International (Singapore, http://www.escellinternational.com), and the pooled cell lines H1, H7, and H9 are from WiCell Research Institute, Inc. (Madison, WI, http://www.wicell.org). MPSS has allowed us to assess the complexity of ESCs and EBs and to generate a far more exhaustive list of differences and similarities between mouse and human ESCs than has previously been reported. By comparing gene expression between hESC lines, we also identified culture, developmental, and allelic differences. The usefulness and power of developing such a comprehensive expression database is highlighted by our ability not only to identify putative signaling or biochemical pathways active in ESCs but also to assess the integrity of these pathways from receptors to signaling intermediates and then target substrates at the transcript level.
| MATERIALS AND METHODS |
|---|
|
|
|---|
The human HES-2 line [18], provided by and cultured at ES Cell International, was used for extracting RNA by the Genome Institute of Singapore (GIS; http://www.gis.a-star.edu.sg) and subsequently generating the GIS hESC MPSS dataset. These cells (referred to as hESCsESI) were grown on inactivated MEFs in DMEM (Gibco) containing 20% defined FBS (HyClone), 8 ng/ml of basic fibroblast growth factor (bFGF; Gibco), 0.1 mM β-mercaptoethanol, 2 mM L-glutamine, 1% nonessential amino acids, 1% insulin-transferrin-selenium supplement (Gibco), and 0.5% penicillin/streptomycin (Gibco). The hESCsESI were passaged every 7 days by mechanical splitting under a dissecting microscope and dividing the individual colonies into approximately eight pieces. Cells were collected over a period of time, spanning passage numbers 105–115. For RNA isolation, colonies were cut just outside of the inner button and just inside the outer edge of each colony, and the intervening cells were harvested.
The MPSS datasets from NIA (National Institute of Ageing) hESCs and EBs were generated from RNAs of the hESC lines H1, H7, and H9 [10] and are referred to as hESCsWi and hEBsWi. These cells were maintained and passaged under feeder-free conditions as described [9]. Briefly, CM was generated from primary MEFs cultured in hESCWi medium composed of 80% knockout DMEM (KO-DMEM; Gibco), 20% knockout serum replacement (Gibco), 0.1 mM β-mercaptoethanol, 1 mM L-glutamine, and 1% nonessential amino acids, supplemented with 4 ng/ml human bFGF (Gibco). This medium was collected daily and used immediately for feeding hESCWi cultures. MEFs for generating CM were re-fed daily and used for a maximum of 7 days. Before it was added to the hESCWi cultures, the MEF CM was supplemented with an additional 4 ng/ml of human bFGF (Gibco). The hESCWi cultures were maintained on Matrigel in this CM and were passaged by incubation in 200 U/ml collagenase IV (Gibco) for 5–10 minutes at 37°C and then gently dissociated into small clusters in CM. Cells were passaged once every week. The hEBWi cultures were formed as described previously [19]. Briefly, undifferentiated hESCWi cultures were harvested by incubation with 200 U/ml collagenase at 37°C for 5–10 minutes. The cells were gently scraped from the dish and resuspended in ultra low attachment polystyrene plates (Corning, Acton, MA, http://www.corning.com) in medium composed of KO-DMEM, 20% FBS, 1% nonessential amino acids, 1 mM glutamate, and 0.1 mM β-mercaptoethanol. RNA was isolated on day 12 of differentiation.
RNA Isolation
All RNA purifications were done using Trizol, following the manufacturer's protocol. For the pooled samples of hESCsWi and hEBsWi, equal quantities of RNA from the three cell lines were combined. All RNA samples were initially evaluated for the presence or absence of ESC differentiation markers by reverse transcription polymerase chain reaction (RT-PCR). The quality of all RNA preparations was confirmed prior to MPSS analysis by Agilent Bioanalyzer (Palo Alto, CA, http://www.chem.agilent.com).
MPSS Analysis
Poly(A)+ RNA was isolated from total RNA samples and used to generate cDNA, which was subsequently digested with the restriction enzyme DpnII. Poly(A)-containing DpnII restriction fragments were purified by hybridization to oligo(dT). An adapter was added to the 5' end of the fragment that directed a type II restriction enzyme, MmeI, to digest within the cDNA fragment 20 base pairs (bp) from the DpnII site (starting at G of GATC). At this point, all cDNA species (signatures) were a uniform length of 20 bp. A second adapter was added to the 3' end of each signature. These uniform-length cDNA signatures were subsequently placed into the Mega-clone vector, and their sequence was determined following the previously published protocol [17,20] of Lynx Therapeutics (Hayward, CA, http://www.lynxgen.com).
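As an in silico illustration of the signature definition above, the following minimal Python sketch returns the 20-base signature that starts at the G of the 3'-most GATC in a transcript sequence. It is not the Lynx protocol or software; the function name `mpss_signature` and the toy sequence are hypothetical, and the sketch simply returns nothing when no site exists or when fewer than 20 bases remain after the last site.

```python
from typing import Optional

SITE = "GATC"        # DpnII recognition site
SIGNATURE_LEN = 20   # signature length given in the text, counted from the G of GATC


def mpss_signature(cdna: str) -> Optional[str]:
    """Return the 20-base signature starting at the 3'-most GATC, or None.

    Simplifying assumption: the input is the forward-strand transcript
    sequence without its poly(A) tail.
    """
    seq = cdna.upper()
    pos = seq.rfind(SITE)  # index of the 3'-most DpnII site, -1 if absent
    if pos == -1 or pos + SIGNATURE_LEN > len(seq):
        return None
    return seq[pos:pos + SIGNATURE_LEN]


# Toy transcript (hypothetical) with two DpnII sites; only the 3'-most one is used.
toy = "AAGATCTTTTCCGATCACGTACGTACGTACGTAAAA"
print(mpss_signature(toy))
```

A transcript that returns `None` here corresponds to the case, discussed in the Results, in which a cDNA lacking a DpnII site cannot be detected by MPSS.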
The abundance for each signature was converted to tpm for the purpose of comparison between samples [21]. Only reliable and significant signatures were considered, defined as signatures present in at least two of the MPSS runs and detected at 4 tpm or more in at least one sample. The tpm shown in all datasets is the average of four runs.
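The reliability criterion can be sketched in Python as follows. This is one possible reading of the rule, not the authors' code: a signature is kept if, in at least one sample, it is detected (tpm above zero) in at least two runs, and if its run-averaged tpm reaches 4 in at least one sample. The function name, the data layout, and the example values are hypothetical.

```python
from statistics import mean

MIN_RUNS = 2    # signature must be seen in at least two runs
MIN_TPM = 4.0   # and reach at least 4 tpm in at least one sample


def is_reliable(runs_by_sample):
    """runs_by_sample maps a sample name to the per-run tpm values of one signature.

    Assumes every sample has a non-empty list of runs.
    """
    seen_in_enough_runs = any(
        sum(1 for tpm in runs if tpm > 0) >= MIN_RUNS
        for runs in runs_by_sample.values()
    )
    reaches_threshold = any(
        mean(runs) >= MIN_TPM for runs in runs_by_sample.values()
    )
    return seen_in_enough_runs and reaches_threshold


# Hypothetical signature measured in four runs for each of two samples.
example = {"mESC": [6, 0, 5, 7], "hESC": [0, 0, 0, 0]}
print(is_reliable(example))
```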
MPSS classifications are as follows:
Class 1: forward strand, polyA signal, polyA tail, 3'-most
Class 2: forward strand, polyA signal, 3'-most
Class 3: forward strand, polyA tail, 3'-most
Class 4: forward strand, no polyA info, 3'-most
Class 5: forward strand, no polyA info, not 3'-most
Class 22: unknown orientation, polyA signal, last before signal
Class 23: unknown orientation, polyA tail, last before tail
To simplify the MPSS data analysis, multiple signatures mapping to the same Unigene ID (Mm build 130 and Hs build 163) [22] for each dataset were combined into one tpm count as follows: the sum of tpm for signatures of class 1, 2, 3, 22, and 23, if any are found; if those are not found, the sum of class 4; and if still nothing is found, the sum of class 5 signatures. The HomoloGene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene) was used to map human and murine orthologues.
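The class-priority rule for collapsing signatures into one value per Unigene ID can be sketched as below. This is a minimal illustration under the rule as stated in the text, not the authors' pipeline; the function name, the tuple layout, and the identifiers `U1` to `U3` are hypothetical placeholders rather than real Unigene IDs.

```python
from collections import defaultdict

# Tiers in decreasing priority, as described in the text.
PRIORITY_TIERS = [{1, 2, 3, 22, 23}, {4}, {5}]


def combine_by_unigene(signatures):
    """Collapse (unigene_id, mpss_class, tpm) records into one tpm per Unigene ID.

    For each ID, sum the tpm of the highest-priority tier that has any signature.
    IDs whose signatures fall outside all tiers are omitted.
    """
    grouped = defaultdict(list)
    for unigene_id, mpss_class, tpm in signatures:
        grouped[unigene_id].append((mpss_class, tpm))

    combined = {}
    for unigene_id, entries in grouped.items():
        for tier in PRIORITY_TIERS:
            hits = [tpm for mpss_class, tpm in entries if mpss_class in tier]
            if hits:
                combined[unigene_id] = sum(hits)
                break
    return combined


rows = [
    ("U1", 1, 10), ("U1", 22, 5), ("U1", 4, 100),  # class 4 ignored: tier 1 present
    ("U2", 4, 7), ("U2", 5, 50),                   # class 5 ignored: class 4 present
    ("U3", 5, 3),                                  # only class 5 available
]
print(combine_by_unigene(rows))
```

Under this rule, lower-confidence signatures contribute only when no better-supported signature exists for the same cluster.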
While preparing this manuscript, we established a stem cell database to host all the MPSS results we discuss here. Readers can reach this site at the following GIS link: http://www.gis.a-star.edu.sg/homepage/gistools.jsp. To organize the data, each of the unique tags was assigned a unique ID and was then mapped to the current genome assembly (mm.5 for mouse; hg.17 for human). Those tags that have genomic coordinates were annotated based on the UCSC (University of California, Santa Cruz) genome browser annotation database (http://genome.ucsc.edu/). Readers are able to navigate and search for genes of interest to obtain information about expression levels, to compare cell types, and to link to other databases. For detailed information, please refer to the GIS Website.
Focused-Chip Assay and RT-PCR
Focused microarray chips (SuperArray, Frederick, MD, http://www.superarray.com) were prepared and hybridized with labeled total RNA following the manufacturer's protocol. RT-PCR was performed following standard established protocols. The RT-PCR primers used are shown in supplementary online data 1.
| RESULTS |
|---|
|
|
|---|
RNA samples that passed quality-control checks were subjected to MPSS analysis (see Materials and Methods). Signature sequence tags of 20 bp in length were generated to a depth of greater than 2.2 million tags for each sample. Tag counts of each unique signature were expressed as tpm. From the MPSS libraries, the total number of tags successfully sequenced from four different runs was 2,660,962; 2,367,247; 2,295,140; 2,403,315; and 2,591,008 for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively. Distinct signatures present in at least two MPSS runs and presented as at least 4 tpm per run totaled 13,824; 9,845; 20,027; 23,500; and 17,278 for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively.
We first evaluated the total complexity of the signatures generated for each sample, expressed as total significant signatures with the cumulative tpm (with cutoff at >10 tpm) of signature distributions for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, as shown in Table 1 (for complete data, see supplementary online data 2 for hESCs and hEBs and supplementary online data 3 for murine equivalents). The distributions of the number of unique signature tags and their percentages are compiled cumulatively from the most abundant signatures (0.04%–0.20%) to the least abundant signatures (total 55%–70%).
The unique signature sequences were then mapped to Unigene clusters (Mm.130 and Hs.163), resulting in 6,712; 5,779; 9,093; 9,953; and 8,950 unique Unigene IDs identified in mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively (Table 1; see supplementary online Table 1 and accompanying text for explanation). The reduced number of Unigene signatures resulted from multiple signature sequences mapping to the same Unigene cluster. Thus, the complexity of ESCs is comparable to that seen in somatic cell populations examined by MPSS [25]. Interestingly, for both the mouse and human samples, the undifferentiated cells had a slightly higher level of complexity than the corresponding EBs had. Examination of the most abundant genes showed that, to a large extent, the top 200 or so genes consisted of ribosomal, mitochondrial, and housekeeping genes, while growth factors, transcription factors, and regulators of gene expression were expressed in the low tpm range. The low abundance of regulatory and biologically relevant genes highlighted the importance of analyzing expression at high resolution by methods such as MPSS.
MPSS Provides a Robust Assessment of the ESC State
Theoretically, MPSS at greater than 2.2 million signatures generated per sample should provide relatively comprehensive coverage of gene expression in a given cell type. To directly ascertain the quality of data, we first examined a list of ESC-specific genes that are known to be expressed at moderate abundance in murine and human ESCs. As shown in Table 2, genes such as oct-4/pou5f1, sox-2, utf-1, and tdgf-1 were well represented in MPSS-derived transcriptome maps of both human and murine ES lines. Markers of differentiation known to be upregulated in differentiating EBs (e.g., COL4A2 [collagen type IV] and AFP [α-fetoprotein]) showed the expected increase in transcript frequency as ESCs differentiated.
The second observation was that some genes detected by microarrays were not detected by MPSS. This arose from technical limitations of the MPSS technology, which include failure to identify cDNAs lacking a DpnII site (e.g., murine trap1a, Table 3), cDNAs containing a double palindrome within the tag (preventing sequencing by MPSS), and cDNAs with the respective tag falling in a repeat region (e.g., human nanog and rex-1). For these reasons, it is important to know the tag status of the genes of interest before concluding, based on the MPSS data alone, that they are not expressed. From EST data, trap1a is known to be expressed in mESCs, and likewise with nanog and rex-1. For all subsequent analysis of genes presented in the remaining figures, we took into consideration the technical limitations of MPSS data before calling a tag count as zero.
A third observation was that mouse and human ESCs appeared to differ in fundamental ways based on differing expression levels between homologous genes identified from our analysis. Many of the differences could be verified by RT-PCR (see next section), suggesting that MPSS can be used for cross-species comparisons. Thus, while MPSS was unable to detect a small fraction of genes, this methodology appeared sensitive and reliable. Additional expression data, not described here, further confirmed that MPSS analysis accurately and robustly described the transcriptome of human and murine ESCs and revealed true differences between the species.
Global Comparison of Mouse and Human ESC Transcriptome
To capture an overall impression of the similarities and differences between the ESCs, we compared the transcriptome of human and murine ESCs on a global scale in which homologues that could be reliably identified were compared in a pairwise manner and displayed in dot plots (Fig. 1). Based on 5,921 identified homologous genes, transcriptomes of mouse cells (mESCs) and human cells (hESCsESI) were significantly different, with a poor correlation coefficient of .41 (Fig. 1B). This degree of correlation is less than the correlation typically observed between different lineages of the same species compared using microarray profiling. The coefficient was also lower than that between ESCs and their differentiated derivative EBs: .82 for mouse (Fig. 1C) and .49 for human (Fig. 1D). The discrepancy between the ESC/EB correlation coefficients was most likely a result of the difference in the length of time allowed for EB formation: 4 days for murine and 12 days for human cells. Hence, murine day-4 EBs were less differentiated than human day-12 EBs.
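A pairwise comparison of this kind can be sketched in Python as below. The text does not state which correlation statistic was used or whether tpm values were transformed first, so the use of the Pearson coefficient on raw tpm is an assumption here; the function names, homologue keys, and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_homologues(mouse_tpm, human_tpm):
    """Correlate tpm values over homologue groups present in both species.

    Both arguments map a homologue-group key (hypothetical) to a tpm value.
    """
    shared = sorted(mouse_tpm.keys() & human_tpm.keys())
    return pearson([mouse_tpm[k] for k in shared],
                   [human_tpm[k] for k in shared])


# Toy data: gX has no human homologue in this example and is ignored.
mouse = {"g1": 1, "g2": 2, "g3": 3, "gX": 9}
human = {"g1": 2, "g2": 4, "g3": 6}
print(correlate_homologues(mouse, human))
```

Restricting the comparison to shared keys mirrors the restriction in the text to homologues that could be reliably identified.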
The differences between murine and human transcript levels ranged from one- or twofold to over 50-fold, and even genes known to be important for ESC self-renewal varied by as much as six- to sevenfold. For example, the oct-4 level was 2,173 tpm (hESCsESI) or 658 tpm (hESCsWi) in hESCs and 388 tpm in mESCs. Therefore, by a global pairwise comparison, we developed sublists of genes that varied between human and murine ESCs by 5-fold, 10-fold, or 50-fold and were expressed at 50 tpm or higher (Table 4 and supplementary online data 5, 6, and 7). At the lowest stringency (>5-fold difference, or tpm >50 if the other species' corresponding tpm is zero), we found 1,153 genes higher in human than murine ESCs (supplementary online data 5A) and 427 genes higher in murine than human ESCs (supplementary online data 5B). At the highest stringency (>50-fold difference, or tpm >250 if the other species' tpm is zero), 101 genes were found to be higher in human ESCs (supplementary online data 7A) and 64 in murine ESCs (supplementary online data 7B).
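The selection rule for these sublists can be sketched in Python as follows. This is a reading of the criteria in the text, not the authors' code: whether the thresholds are strict or inclusive is not fully specified, and the zero-tpm floor for the 10-fold list is not given, so the floor is passed in explicitly. The function name is hypothetical; the oct-4 values are those quoted in the text.

```python
def higher_in_first(tpm_a, tpm_b, fold, zero_floor, min_tpm=50.0):
    """True if a gene counts as higher in species A than in species B.

    tpm_a must be at least min_tpm. If tpm_b is zero, no ratio exists, so
    tpm_a must instead exceed zero_floor; otherwise tpm_a / tpm_b must
    exceed fold.
    """
    if tpm_a < min_tpm:
        return False
    if tpm_b == 0:
        return tpm_a > zero_floor
    return tpm_a / tpm_b > fold


# oct-4: 2,173 tpm in hESCsESI versus 388 tpm in mESCs (ratio about 5.6).
print(higher_in_first(2173, 388, fold=5, zero_floor=50))    # lowest stringency
print(higher_in_first(2173, 388, fold=50, zero_floor=250))  # highest stringency
```

With the hESCsWi value of 658 tpm, the ratio to the mESC value is about 1.7, so the same gene would not pass the 5-fold cutoff in that comparison.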
Genes differentially expressed were functionally categorized (by the gene ontology classification) to determine if differences were restricted to particular classes.
Data depicting the global comparison of the genes showing greater than fivefold difference between species were plotted as a pie chart (Fig. 2). As can be seen, human and murine ESCs differ from each other in a wide spectrum of genes, with the largest difference being due to "unknown" genes (22%).
To examine further the ability of MPSS to derive biologically relevant insights from transcriptomes, we selected four specific signaling pathways that have prominent roles in the growth and development of ESCs: the LIF/gp130, FGF, Wnt, and transforming growth factor-beta (TGF-β) pathways (Tables 6 and 7, Fig. 4).
FGF and FGF Receptors
Basic FGF (FGF2) is currently used for the propagation of hESCs [12], suggesting a requirement for FGF signaling in the maintenance of pluripotency in these cells. Culture medium for mESCs is not supplemented with any of the 22 known FGFs. Furthermore, mESCs (and the inner cell mass) are known to synthesize FGF4, which is required for paracrine signaling to the trophectoderm and the primitive endoderm for normal development to continue beyond the peri-implantation stage of development [31,32]. For these reasons, we compared the expression of molecules involved in the FGF signaling pathway. Clearly, hESCs are poised to respond to FGF signals, with three of the four FGF receptors (FGFR-1, -3, and -4) having substantial levels of expression (Table 6). In addition, frs2, one of the major downstream effectors of FGF receptor signaling, was detected by MPSS in hESCs. In contrast, mESCs contain a minimal level of fgfr1 (14 tpm) and zero tag counts for the other three FGF receptors and frs2. Curiously, hESCs express significant levels of FGF2 transcripts, whereas FGF2 was undetectable in mESCs.
As expected, FGF4 was found at significant levels in mESCs but was apparently absent in hESCs. Both the FGF2 and FGF4 MPSS data were confirmed by RT-PCR (Fig. 4B).
Wnt/β-Catenin Network
The Wnt-signaling pathways mediate important decisions between proliferative self-renewal and differentiation [33–36]. Recently, Sato et al. [37] have suggested that the canonical GSK3/β-catenin pathway may be active in undifferentiated cells and that inhibition of glycogen synthase kinase-3 (gsk-3) was sufficient to maintain the undifferentiated phenotype in both murine and human ESCs. Comparison of pathway gene expression confirmed that most of the components in the canonical Wnt/β-catenin signaling pathway were present in both cell types (Table 6). In hESCs, RT-PCR (Fig. 4C) generally confirmed the presence of the key components, as predicted by MPSS (Table 6). However, in mESCs, the low level of transcripts for most of the components of the canonical Wnt/β-catenin signaling pathway, including the absence of some key molecules, suggested that this pathway may not be active. MPSS readings, confirmed by RT-PCR, showed that APC [38,39], while low in hESCs, was not detected in mESCs. EST data also supported this finding. The low tpm readings for lrp5 and lrp6 [36] (Table 6) were confirmed by our negative RT-PCR result (Fig. 4C). In contrast, these components were present in human cells both by RT-PCR/MPSS and EST data (data not shown). The presence of an intact or complete Wnt-signaling pathway with all the attendant positive and negative regulators was further evidenced by the high expression of frizzled-related proteins (FRPs) in hESCs but not in mESCs (Table 6). FRPs are known to antagonize Wnt signaling [40–43]. Therefore, hESCs appeared better poised than mESCs to engage the Wnt-signaling pathway.
TGF-β Superfamily
The TGF-β/bone morphogenic protein (BMP) family has been shown to play important and pleiotropic parts in early development and in regulating self-renewal of somatic stem cells [44,45]. It has also been shown that a combination of BMP4 and LIF can support propagation of mESCs in a serum-free condition [46]. We examined the expression of both the TGF-β/activin/nodal subfamily and the BMPs, along with their receptors and modulators, and the downstream Smads that they activate [47]. As shown in Table 7, the absence of all the receptor-associated Smads (Smad-1, -3, -5, and -8) in mESCs suggests that any TGF-β/BMP signaling in mESCs would likely be through a Smad-independent route. The presence of these receptor Smads in hESCs, as detected by MPSS, suggests that the Smad-mediated TGF-β pathway is functionally important to hESCs. Furthermore, there are distinctive differences in the ID, BMP, and activin receptor genes between the species. A TGF-β-focused chip containing probes for a spectrum of the TGF-β/BMP superfamily was compared with the MPSS data. The presence (+) or absence (−) of hybridization signals in the chip, as shown in Table 8, indicated a good concordance between the MPSS and the array results; however, the MPSS approach was more quantitative and sensitive. Overall, the results showed that significant differences exist between the species and that Smad-dependent TGF-β/BMP signaling appeared to be much more actively recruited in hESCs than in mESCs.
Comparison of hESC Lines
The high similarity between hESCWi lines grown feeder-free and the hESCsESI grown on MEFs suggested that, overall, different hESC lines are similar and the differences observed between murine and human cells must represent fundamental species-specific differences. However, differences between human lines likely exist. We and others have noted some differences between ESC lines [4, 7, 23, 53], although no comprehensive comparison has been performed. We therefore examined the MPSS dataset to identify genes that showed a 10-fold or higher difference between human samples. A complete list is provided in supplementary online data 11A for hESCWi > hESCESI and 11B for hESCESI > hESCWi. Overall, even at a stringent criterion of 10-fold and higher, over 1,000 genes were highly expressed in hESCsESI and absent in hESCsWi. Several of these genes were shared by murine and human cells but not by the two ESC populations tested (supplementary online data 11). Figure 5 and Table 9 show a selected list of genes and a confirmation of the differential expression of a subset of these genes by semiquantitative RT-PCR. For example, differences in expression of collagen and BMP-related genes were seen. These were likely due to the difference in culture conditions, while other differences, such as those in FoxD3, represent allelic differences. Other differences noted include matrix proteins, junction proteins such as claudin, insulin-like growth factor binding proteins, and several novel genes of unknown function.
Overall, our results showed that MPSS is sensitive and versatile in successfully identifying multiple differences and similarities between and within species; it can be used to obtain a unique profile of each individual cell line. The results also indicate that murine and human ESCs differ fundamentally in the network of genes that are conscripted to confer their apparently similar cellular properties of totipotency and high self-renewal capacity.
| DISCUSSION |
|---|
|
|
|---|
MPSS data detected expression on the order of 6,000 to 10,000 mapped genes (or at least unique Unigene IDs) in each of the samples assessed, and this is consistent with previous reports of MPSS analysis. The distribution of gene frequencies was similar to most other cell types, with the most abundant genes being housekeeping genes that were common to most cell types. Only a few ESC-specific genes (e.g., Esg-1 and Utf-1 in murine cells) were expressed in the top 200 transcripts. Most cell-specific genes, including genes coding for transcriptional factors, cytokine receptors, and growth regulators, are actually present at low to very low levels that are likely to be missed by less in-depth analysis. For instance, a SAGE study [4] of two other hESC lines, derived by the same group that derived the hESCESI line used here, indicated Stat3 levels at 0 tpm in the HES3 line and 13 tpm in the HES4 line. The actual tag counts for these were 0 of 67,807 total tags and 1 of 77,208 total tags, respectively. Compare this with our MPSS data, in which Stat3 levels were determined to be 4 tpm in the hESCESI line and 22 tpm in the hESCWi line, calculated from actual tag counts of 9 of 2,295,140 total tags and 53 of 2,403,315 total tags, respectively.
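The arithmetic behind these Stat3 figures can be reproduced with a short Python sketch. The function name `tags_per_million` is hypothetical; the tag counts and library sizes are those quoted above.

```python
def tags_per_million(tag_count, total_tags):
    """Scale a raw tag count to transcripts per million (tpm)."""
    return tag_count / total_tags * 1_000_000


# (label, Stat3 tag count, total tags in library), as quoted in the text.
libraries = [
    ("SAGE HES3", 0, 67_807),
    ("SAGE HES4", 1, 77_208),
    ("MPSS hESCESI", 9, 2_295_140),
    ("MPSS hESCWi", 53, 2_403_315),
]

for label, count, total in libraries:
    print(f"{label}: {tags_per_million(count, total):.2f} tpm")
```

Because a single tag in a library of 77,208 tags already corresponds to about 13 tpm, a library of that size cannot resolve transcripts expressed below that level, whereas the MPSS libraries resolve steps of roughly 0.4 tpm per tag.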
Perhaps the most important general observation was the remarkably low correlation coefficient between human and murine ESCs. One reason for the difference between mouse and human arises from the incomplete annotation of the human and mouse genomes, in particular the incomplete annotation of full-length 3' UTRs, in which the 3'-most DpnII site frequently resides. Examples of this are the mouse LIFR and human Fgf4, each having more than one Unigene cluster mapping to the true full-length mRNA. The class 1 tag for each is found in the furthest 3' Unigene cluster (Mm.24003 and Hs.362432, respectively), neither of which is named appropriately, as sequences within each do not overlap those from the clusters (Mm.149720 and Hs.1755, respectively) spanning the coding sequence. The low correlation coefficient of .42 between murine and human ESCs could not be attributed to differences in sensitivity, labeling efficiency, or other technical limitations, as the overall complexity as assessed by MPSS was similar and we restricted our analysis to genes for which homologues were reliably identified. Additionally, variation could not be attributed to major differences in culture conditions, as both the mESCs and hESCsESI were grown on feeders. Furthermore, the correlation coefficient of the two human populations was remarkably high (.90) despite the fact that they were grown in different laboratories under different culture conditions. The differences may represent species-specific gene expression. Another possibility is that some of the genes identified are dispensable for the stem cell state. The significant variation observed between murine and human ESCs also raises the possibility that human and murine ESCs may represent slightly different stages of early development or that they may use independent pathways to maintain self-renewal.
While a detailed discussion of all the observed differences is impossible, we have highlighted a few differences that were independently verified. Our comparative analysis provided a basis for the differing growth requirements of hESCs and mESCs. The much higher proliferation rate of mESCs, compared with hESCs, presumably had to be sustained by a higher metabolic rate. Consistent with a higher metabolic rate, mESCs expressed higher levels of the transcript for GLUT1, a major glucose transporter, while insulin-dependent GLUT8 was higher in hESCs.
Our results confirmed the lack of LIF and gp130 signaling in hESCs and the activity of this pathway in mESCs. Interestingly, levels of gp130 were low in mESCs, and examination of the corresponding Unigene cluster revealed no ESTs from ESCs. The low level of the gp130 transcript raises the possibility that this is a critical, tightly regulated step in LIF-mediated signaling.
The MPSS profile of FGFRs clearly indicates that hESCs are molecularly positioned to respond to extracellular FGF signals, whereas these same molecules are virtually absent in mESCs. As MPSS was not able to identify the known spliced isoforms of the FGF receptors and multiple FGFs work through the same receptors, further study is required to identify the FGF molecules interacting with the FGF receptors [1, 3, 4] expressed in hESCs. Considering that FGF2 (bFGF) is a common supplement to hESC media, it is surprising that both hESC lines synthesize their own FGF2. As FGFs work predominantly through a paracrine action, this could suggest that the apparent benefit of FGF2 supplementation works indirectly through the MEF feeder layer. Also of note is the differing expression of FGF4, a known transcriptional target of the synergistic action between oct-4 and sox-2 in mESCs [54] and an essential molecule in peri-implantation during embryo development. Unlike in the mouse, the MPSS and RT-PCR data indicate that hESCs do not synthesize FGF4. This absence or very minimal expression of FGF4 in the human may be indicative of a developmental difference between mouse and human ESCs.
It was clear that almost all the genes in the Wnt/β-catenin pathway were expressed at significantly higher levels in human cells than in murine cells, as shown in Table 6. This suggests that the canonical Wnt/β-catenin signaling pathway was very likely not active in mESCs but was functional in hESCs [37]. Given the reported effects of a GSK-3β inhibitor on mESCs and the evidence for the activation of the PI3-kinase/Akt pathway, we would suggest that the reported nuclear accumulation of β-catenin in undifferentiated ESCs may be due to endogenously active PI3K/Akt signaling rather than active Wnt signaling.
Likewise, as shown in Table 8, the absence of the receptor-associated Smads-1, -3, -5, and -8 in mESCs but their presence in hESCs indicated that the canonical TGF-β pathway may be operatively important to hESCs. Indeed, recent reports have suggested that TGF-β in combination with FGF may be sufficient to maintain hESCs (but not mESCs) in an undifferentiated state [55]. Furthermore, the MPSS data suggest that the known effects of BMP4 on mESCs are mediated through a Smad-independent pathway.
Multiple differences were also identified in every category of genes examined. These included structural genes, metabolic pathways, and housekeeping genes as well. Nevertheless, similarities also existed. We reasoned that if such similarities exist despite widespread differences, then these may represent critical core pathways that are important for the undifferentiated state. Indeed, examining the lists generated (see Results), we identified multiple ESC-specific genes, including oct-4, tdgf1, sox2, utf-1, dnmtl, and leftB. The relatively large list of genes shown in supplementary online data 5, 6, and 7 suggests that additional common pathways required for stem cell self-renewal remain to be identified; the possible candidates include heterochronic genes, methylation agents, ubiquitin/SUMO genes, and components of the DNA repair machinery.
Our comparison of hESC lines maintained in separate laboratories revealed a high degree of similarity. We reasoned that most human lines were isolated from the same stage of development. Differential expression of some of the genes between the human lines may reflect allelic expression that is unique to each cell line. Some of the results may reflect the different methods used to culture and propagate the cells in the individual laboratories. The high overall similarity between hESC lines suggests that a core set of stem cell markers for all hESC lines can be generated and that cell populations can be identified by allelic differences as well. The variable expression of genes such as Rex, FoxD3, and LIFR seen in this comparison and in other experiments suggests that these molecules and pathways are not critical for maintaining hESC lines in culture. The expression level of certain ESC-specific genes may help predict the growth rate and stability of different human lines, but this will require additional detailed comparisons. Overall, it is clear that such a detailed analysis provides important insights into the biology of ESCs.
While we have highlighted the power of a large-scale analysis such as MPSS, it is important to remember that, as with any other methodology, there are potential problems of which investigators need to be cognizant. For example, in the 20-nucleotide sequence run, which was ultimately used in this study, the β-catenin signature was excluded because of a sequencing error in one of the four routine sequence runs. In reviewing the MPSS raw data, we found that in the 17-nucleotide sequence run, a unique signature for β-catenin was indeed present at 92 tpm. While an extremely rare occurrence, this discrepancy highlights the importance of verification. Furthermore, since the signature depends on DpnII sites and not all cDNAs contain a DpnII site, some genes will not be detected. A palindromic sequence within the signature after the DpnII site will result in a hairpin loop that prevents its sequencing, giving a false negative reading. Repetitive sequences in the signature make the signature nonspecific, so that it cannot be annotated. However, these errors and limitations occur very infrequently. In some instances in which the MPSS result showed no tags (zero tpm), expression of the gene could be detected by other methods such as SAGE, an EST search of the database, or RT-PCR. For most genes, however, the absence of expression by MPSS could be demonstrated by RT-PCR as well, confirming the validity of the MPSS assay. While it is important to be alert to possible sources of error in MPSS readings, the use of other methods does not undermine the overall reliability of MPSS in transcriptome analysis. It is important to emphasize, as well, that such errors are common to most large-scale analytical processes and suggest that comparisons across techniques and across species may be useful.
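The limitations described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; it assumes that a signature begins at the 3'-most DpnII recognition site (GATC) of a cDNA and spans 17 or 20 nucleotides including the site. The function names and the minimum palindrome length (`min_len`) are hypothetical choices made only for illustration.

```python
from typing import Optional

DPNII_SITE = "GATC"  # DpnII recognition sequence
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_signature(cdna: str, length: int = 17) -> Optional[str]:
    """Return the signature starting at the 3'-most DpnII site.

    Returns None when the cDNA has no DpnII site, or when too few bases
    follow the site; such a transcript would not be detected.
    """
    cdna = cdna.upper()
    start = cdna.rfind(DPNII_SITE)
    if start == -1:
        return None
    signature = cdna[start:start + length]
    return signature if len(signature) == length else None


def has_palindrome(signature: str, min_len: int = 8) -> bool:
    """Check the bases after the DpnII site for a reverse-complement
    palindrome of at least min_len bases (hypothetical threshold), which
    could fold into a hairpin and block sequencing."""
    tail = signature[len(DPNII_SITE):]
    for size in range(min_len, len(tail) + 1):
        for i in range(len(tail) - size + 1):
            window = tail[i:i + size]
            if window == reverse_complement(window):
                return True
    return False


def tags_per_million(count: int, total: int) -> float:
    """Normalize a raw signature count to transcripts per million (tpm)."""
    return count * 1_000_000 / total
```

A transcript for which `extract_signature` returns `None`, or whose signature is flagged by `has_palindrome`, would appear at zero tpm even if it is expressed, which is why verification by an independent method such as RT-PCR remains useful.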
In summary, our analysis provides, for the first time, a direct in-depth comparison between murine and human ESCs. Our data provide unambiguous evidence for the presence of both convergent and divergent pathways critical for self-renewal. Our results highlight the similarities and differences between murine and human ESCs. Although there appears to be a core set of ESC-specific pathways that are conserved across species, other divergent pathways are equally critical and extreme care must be used in extrapolating mESC work to hESC analysis. Our results suggest that human cells isolated by different groups and maintained under different culture conditions are overall highly similar and that the few differences observed likely represent allelic differences or variation in the propagation of cells. The comprehensive database we have developed and deposited for public use may be explored for more detailed information and identification of other novel genes. This database will provide a unique resource for additional comparative genomic analysis and for identifying novel candidates critical for regulating ESC growth and survival, self-renewal, and differentiation.
| ACKNOWLEDGMENTS |
|---|
|
|
|---|
| REFERENCES |
|---|
|
|
|---|