MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs 18 to 22 nucleotides long that regulate gene expression. Expression of miRNAs is altered in tumor compared with normal tissue, and there is some evidence that these changes may be reflected in the serum of cancer cases compared with healthy individuals. This has yet to be examined in a prospective study in which samples are collected before diagnosis.
We used Affymetrix arrays to examine serum miRNA expression profiles in 410 participants in the Sister Study, a prospective cohort study of 50,884 women. All women in the cohort had never been diagnosed with breast cancer at the time of enrollment. We compared global miRNA expression patterns in 205 women who subsequently developed breast cancer and 205 women who remained breast cancer-free. In addition, within the case group, we examined the association of miRNA expression in serum with different tumor characteristics, including hormone receptor status (ER, PR, and HER-2) and lymph node status.
Overall, 414 of the 1,105 human miRNAs on the chip were expressed above background levels in 50 or more women. When expression in cases was compared with that in controls using conditional logistic regression, 21 miRNAs were found to be differentially expressed (P ≤0.05). Using qRT-PCR on a small, independent sample of 5 cases and 5 controls, we verified overexpression of the 3 highest expressing miRNAs among cases, miR-18a, miR-181a, and miR-222; the differences were not statistically significant in this small set. The 21 differentially expressed miRNAs are known to target at least 82 genes; using the gene list for pathway analysis, we found enrichment of genes involved in cancer-related processes. In separate case-case analyses restricted to the 21 miRNAs, we found 7 miRNAs with differential expression for women whose breast tumors differed by HER-2 expression, and 10 miRNAs with differential expression by nodal status.
miRNA levels in serum show a number of small differences between women who later develop cancer versus those who remain cancer-free.
MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs ranging in size between 18 and 22 nucleotides; they are typically excised from longer, 60- to 110-nucleotide stem-loop precursors [1,2]. miRNAs are involved in fundamental biological processes, including development, differentiation, apoptosis, and proliferation, and are believed to act predominantly as post-transcriptional regulators that can either degrade their mRNA targets or repress their translation. A single miRNA may have multiple mRNA targets, and up to 30% of human genes may be regulated by miRNAs [4,5].
Aberrant expression of miRNAs in cancer was initially identified in B-cell chronic lymphocytic leukemia, and miRNA dysregulation has subsequently been reported for many tumor types in which, depending on the specific target mRNA(s), miRNAs may act either as tumor suppressor genes or as oncogenes [7,8]. In breast cancer, post-diagnosis miRNA levels have been shown to correlate with a number of tumor characteristics, including stage, vascular invasion, proliferative index, and estrogen receptor/progesterone receptor (ER/PR) status [9,10], and may have prognostic value.
miRNAs have recently been found in human serum and plasma, where they appear to be resistant to RNase degradation and thus relatively stable, even in stored samples. This stability has made miRNAs appealing candidates for epidemiologic studies of stored samples, particularly since miRNA profiling requires only small amounts of serum or plasma. The use of circulating miRNA profiles as potential early-detection cancer markers has generated considerable interest [13-16], although data addressing such applications remain sparse. Initial studies have suggested that serum levels of miRNAs may differ between diagnosed cancer cases and controls, and several recent case control studies of breast cancer have reported evidence of differential miRNA expression levels in serum [18-21]. These studies have shown little agreement, perhaps because some have measured only a few miRNAs whereas others have used more comprehensive miRNA screens but with a small number of subjects. None has used samples obtained prior to diagnosis. Use of such prospective samples avoids a number of important potential biases (for example, differential selection and processing of cases and controls or the possibility that the differences observed in case samples are the result of biopsy, cancer treatments, behavioral changes, stress, or other factors experienced by cases but not controls).
Here, we report on a study that prospectively collected serum samples from 205 women who subsequently developed breast cancer and 205 women who remained cancer-free and that used microarrays to comprehensively assess known miRNAs.
Materials and methods
The Sister Study is a prospective cohort study of 50,884 women and was designed to examine the environmental and genetic determinants of breast cancer. The cohort has been previously described; briefly, women from the US or Puerto Rico were eligible to enroll if they themselves had never had breast cancer but had a full or half-sister who had breast cancer. At baseline interview, all participants provided extensive information, including family history, reproductive history, and information about potential risk factors. Informed consent and blood samples were obtained during a home visit. For women who subsequently developed breast cancer, detailed information on diagnosis was collected from medical records and self-report. Pathology reports were abstracted for tumor grade, stage, and other information, including status for ER, PR, and HER-2 (human epidermal growth factor receptor 2) expression. The study was approved by the Institutional Review Board of the National Institute of Environmental Health Sciences, National Institutes of Health, and the Copernicus Group Institutional Review Board.
Selection of cases and controls
We designed a matched-pair nested case control study. We selected patients who had confirmed invasive breast cancer, who completed enrollment by August 2008, and whose diagnosis occurred within 18 months following blood draw (n = 242). We excluded 29 cases who lacked a serum sample or whose sample had integrity issues during collection and shipping and eight cases whose sample had limited volume, leaving 205 cases that are the focus of our study. For each case, a matched control was selected from the 50,884 participants on the basis of the following criteria: no history of cancer (other than non-melanoma skin cancer), having completed enrollment by August 2008, an available blood sample, same race (non-Hispanic white, black, Hispanic, or other), similar age at enrollment (within 5 years), and similar date of blood draw (within 2 months). Three replicate serum samples from three women (nine samples in total) who were not participants in the study but who provided blood samples that were collected and processed in the same manner as Sister Study participants were used to provide technical replicates.
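The matching criteria above can be illustrated with a short sketch. It is not the study's selection procedure: the greedy first-eligible rule, the field names, the toy records, and the reading of "within 2 months" as 61 days are all assumptions made for illustration.

```python
from datetime import date

def is_eligible_control(case, candidate, max_age_gap=5, max_draw_gap_days=61):
    """Check the matching criteria described in the text for one candidate.

    max_draw_gap_days=61 is an assumed reading of "within 2 months".
    """
    return (
        not candidate["cancer_history"]          # no history of cancer
        and candidate["has_serum"]               # blood sample available
        and candidate["race"] == case["race"]    # same race category
        and abs(candidate["age"] - case["age"]) <= max_age_gap
        and abs((candidate["draw_date"] - case["draw_date"]).days) <= max_draw_gap_days
    )

def match_controls(cases, pool):
    """Greedy 1:1 matching without replacement (an assumed strategy).

    Returns a dict mapping each case id to the id of its matched control.
    """
    used, pairs = set(), {}
    for case in cases:
        for cand in pool:
            if cand["id"] not in used and is_eligible_control(case, cand):
                pairs[case["id"]] = cand["id"]
                used.add(cand["id"])
                break
    return pairs

# Hypothetical records, for illustration only.
cases = [{"id": "C1", "race": "black", "age": 52, "draw_date": date(2007, 3, 1)}]
pool = [
    {"id": "P1", "race": "black", "age": 60, "draw_date": date(2007, 3, 10),
     "cancer_history": False, "has_serum": True},   # age gap too large
    {"id": "P2", "race": "black", "age": 55, "draw_date": date(2007, 4, 15),
     "cancer_history": False, "has_serum": True},   # satisfies all criteria
]
pairs = match_controls(cases, pool)
```

In this toy example the first candidate is rejected on age and the second is accepted, so each case ends up with exactly one control, as in the matched-pair design.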
Assignment to extraction batches and array chip lot
To minimize possible processing and chip lot effects, samples were assigned to processing batches of seven to nine pairs, and batches had similar distributions of age, race, and date of enrollment. For array hybridization, each batch was assigned to one of two different chip lots ('A' and 'B') in a manner designed to ensure a balance of these same characteristics. The nine replicates (described above) were assigned to the same batch and chip lot. Laboratory personnel were blind to case control status and other phenotype information.
RNA extraction, labeling, and hybridization
Total RNA was extracted in batches by using a Total RNA purification kit (cat. no. 17200; Norgen Biotek Corp., Thorold, ON, Canada). In accordance with the manufacturer's recommendation not to exceed 200 µL per column, 400 µL of total serum from each individual was split into two equal 200-µL aliquots and then processed separately following the manufacturer's recommended protocol for total RNA purification from serum. An on-column DNase digestion was added before sample elution by using an RNase-Free DNase I Kit (cat. no. 25710; Norgen Biotek Corp.), and the two aliquots were subsequently pooled. Fixed volumes rather than fixed amounts of RNA were used in accordance with other studies.
Total RNA (8 µL) was directly labeled by using Flash Tag Biotin HSR Labeling kits (cat. no. HSR30FTA; Genisphere, LLC, Hatfield, PA, USA) in accordance with the instructions of the manufacturer. RNA was heated to 80°C for 10 minutes before labeling to inactivate any residual DNase activity. RNA was hybridized for 42 hours to the GeneChip miRNA 2.0 array (cat. no. 901755; Affymetrix Inc., Santa Clara, CA, USA). The GeneChip miRNA 2.0 arrays contain 100% miRBase version 15 coverage of 131 organisms and contain probes for 3,439 human non-coding RNAs (ncRNAs), including 1,105 miRNAs and 2,334 other ncRNAs (including scaRNAs and snoRNAs). The arrays were washed and stained by using standard Affymetrix protocols and scanned by using an Affymetrix GCS 3000 7G Scanner. Feature intensities were extracted by using miRNA 2.0 array library files. Array hybridization and scanning were completed by Precision Biomarker Resources, Inc. (Evanston, IL, USA). The average Spearman correlation coefficient values for three sets of three technical replicates were all above 0.8 (Additional file 1). Array data were deposited into the NCBI Gene Expression Omnibus (GSE44281).
Additional file 1. Spearman correlation coefficient values for technical replicates of arrays. Three replicate serum samples from three women (nine samples in total) were processed and hybridized to arrays as described for samples in the main study. Spearman correlation coefficients were calculated for the three pairings of replicate samples for each woman and averaged. One array from Individual 1 appeared to be an outlier but was included in the results shown above. Exclusion of this array resulted in correlation coefficients of greater than 0.97 in all three categories of probes for Individual 1.
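The replicate comparison described here (Spearman correlation for each of the three pairings of one woman's replicate arrays, then averaged) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code, and the intensity vectors are invented.

```python
from itertools import combinations
from statistics import mean

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_replicate_correlation(replicates):
    """Average Spearman rho over all pairings of one woman's replicate arrays."""
    return mean(spearman(a, b) for a, b in combinations(replicates, 2))

# Hypothetical probe intensities for three replicate arrays of one individual.
replicates = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 4.0, 3.0],
]
avg_rho = mean_replicate_correlation(replicates)
```

With three replicates there are three pairings, so the average here combines one perfect correlation with two correlations of 0.8.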
Replication samples and qRT-PCR
An independent set of 10 women was used to validate selected miRNAs via quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Five women who provided consent and blood samples but who developed breast cancer prior to completing enrollment were selected as cases, along with five controls who also provided consent and blood samples and who were cancer-free but did not complete enrollment. Total RNA was extracted from serum samples of these women as described above with the addition of Synthetic C. elegans miScript miRNA Mimic (cat. no. MSY0000010; Qiagen, Valencia, CA, USA). Synthetic cel-39 was spiked in at a final concentration of 0.25 fmol/µL prior to extraction and used as a PCR normalization control. The RNA concentration, reverse transcription, and pre-PCR steps were carried out in accordance with a previously published protocol. ExoSAP-IT (cat. no. 78250; Affymetrix Inc.) treatment followed by column purification (cat. no. 28004; Qiagen) in accordance with the protocol of the manufacturer was used to purify the pre-PCR product. Individual PCR was run in triplicate by using 1 µL of purified pre-PCR product. The reaction contained the following components: 2x Taqman universal master mix (cat. no. 4324018; ABI, Carlsbad, CA, USA), 1 µM forward primer, 1 µM universal reverse primer, and 0.2 µM probe. The reaction was run on a Bio-Rad CFX 384 Real-Time System (Bio-Rad Laboratories, Inc., Hercules, CA, USA) by using the following parameters: 55°C for 2 minutes, 95°C for 10 minutes, followed by 40 cycles of 95°C for 15 seconds and 55°C for 1 minute. PCR cycle threshold (Ct) values were recorded for each target gene and for normalization controls and were averaged across three independent runs. Primers for miR-222, miR-181a, miR-1825, and miR-18a were custom-ordered from IDT (San Diego, CA, USA) by using previously published sequences. Primers for cel-39 were designed in the same fashion as above and custom-ordered from IDT.
To determine the best candidate miRNA for PCR normalization in our data set, we ran the array expression data from the 47 miRNAs expressed in almost all individuals through the NormFinder software. NormFinder uses a model-based variance estimation approach. Using these results, we selected as a qRT-PCR normalization control miR-1825, which showed one of the highest stability values across the 410 cases and controls and had blood levels that were similar to those of the three target miRNAs. We used the average of miR-1825 and an external spike-in cel-39 control, a strategy shown to be effective for controlling both technical and biologic variability in qRT-PCR assays from serum [17,24]. The efficiencies of the four PCR assays (for miR-181a, miR-18a, miR-222, and miR-1825) were similar (Additional file 2). Normalized relative expression was based on Ct values and calculated as 1/(Ct_gene − Ct_norm), where Ct_gene is the Ct of the target miRNA and Ct_norm is the Ct of the normalization control.
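As a worked illustration of the normalization formula, the sketch below averages Ct values across runs, takes the mean of the two normalization controls (miR-1825 and the cel-39 spike-in), and applies 1/(Ct of target minus Ct of normalizer). The function name and all Ct values are hypothetical, and the order of averaging is an assumption; the text does not specify it.

```python
from statistics import mean

def normalized_relative_expression(ct_gene_runs, ct_mir1825_runs, ct_cel39_runs):
    """Return 1 / (Ct_gene - Ct_norm) from per-run Ct values.

    Ct_norm is taken as the mean of the run-averaged miR-1825 and cel-39 Ct
    values (assumed ordering of the averaging steps).
    """
    ct_gene = mean(ct_gene_runs)
    ct_norm = mean([mean(ct_mir1825_runs), mean(ct_cel39_runs)])
    delta = ct_gene - ct_norm
    if delta == 0:
        # The ratio is undefined when target and normalizer Ct values coincide.
        raise ValueError("Ct_gene equals Ct_norm; ratio is undefined")
    return 1.0 / delta

# Hypothetical Ct values from three independent runs.
value = normalized_relative_expression(
    ct_gene_runs=[30.0, 30.5, 29.5],      # target miRNA, mean 30.0
    ct_mir1825_runs=[26.0, 26.0, 26.0],   # endogenous control, mean 26.0
    ct_cel39_runs=[24.0, 24.0, 24.0],     # spike-in control, mean 24.0
)
```

Here Ct_norm is 25.0 and the difference is 5.0, giving a normalized value of 0.2. A lower target Ct (more template) shrinks the difference and so raises the value, which is why higher values correspond to higher expression as long as the target Ct exceeds the normalizer Ct.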
Additional file 2. Efficiency of the four polymerase chain reaction (PCR) assays. The efficiency of PCR amplifications for the normalization control and three target microRNAs (miRNAs) was calculated by using DART-PCR version 1.0. The average efficiency of each of three independent PCRs was similar across all four miRNAs.
Data processing and statistical analysis
miRNA expression intensity values were background-corrected and normalized across arrays by using the robust multichip average method. The intensity data used in all analyses were log2-transformed.
For each array, the miRNA probe set signals were compared with the distribution of signals for anti-genomic probes that had matching GC content (miRNA QC Tool, version 22.214.171.124), and in accordance with the recommendation of the manufacturer, a Wilcoxon rank-sum test P value of less than 0.06 was used to identify miRNAs above background. Subsequent analysis was restricted to 414 miRNAs that exceeded background levels in at least 50 women. Conditional logistic regression was used to identify differentially expressed miRNA probes between cases and controls for those 414 probes. Because analysis of circulating miRNAs in prospectively collected samples is still exploratory, we, like some other investigators of circulating miRNAs [30,31], regard these results as descriptive and not as tests of hypotheses and so provide P values that are unadjusted for multiple comparisons.
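For 1:1 matched pairs, conditional logistic regression with a single covariate reduces to a no-intercept logistic model on the within-pair difference in expression. The sketch below fits that simplified model by Newton-Raphson and reports a Wald P value. It is an illustration of the technique under those assumptions, not the model actually fitted in R for this study, and the input values are invented.

```python
import math

def fit_matched_pairs(case_values, control_values, max_iter=50, tol=1e-10):
    """Fit beta for 1:1 conditional logistic regression with one covariate.

    With d = x_case - x_control for each pair, the conditional likelihood is
    the product over pairs of 1 / (1 + exp(-beta * d)).
    Returns (beta, standard_error, two_sided_wald_p).
    Not robust to perfect separation (all d of one sign) or all-zero d.
    """
    diffs = [c - k for c, k in zip(case_values, control_values)]

    def score_and_information(beta):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta)
        step = score / info          # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break

    _, info = score_and_information(beta)
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, p_value

# Hypothetical log2 expression values for three matched pairs.
beta, se, p_value = fit_matched_pairs([1.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

In this toy example two pairs have the case higher by one unit and one pair has the control higher, so the estimate is the log of the 2:1 odds, ln 2. Applying such a fit separately to each of the 414 probes would mirror the per-miRNA, unadjusted P values described above.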
The association between miRNAs and the tumor characteristics of hormone receptor status (ER, PR, and HER-2) and lymph node status was tested in a case-only logistic analysis adjusted for race. Chip lot and batch were specified as random effect variables. All statistical analyses were performed by using R 2.15.
Pathway analysis with ingenuity pathway analysis
miRNAs found to be significantly associated with case control status were further analyzed with ingenuity pathway analysis (IPA). Using IPA's microRNA target filter, we generated a list of predicted mRNA targets for each of the 21 significant miRNAs. The list was then restricted to the mRNAs listed in the IPA database as experimentally verified targets of any of the 21 miRNAs. This mRNA target list was then used to run a canonical pathway analysis.
A large number of miRNAs are detected in serum
In total, 410 serum samples from breast cancer cases (n = 205) and controls (n = 205) were analyzed in this study; baseline characteristics of the cases and controls are summarized in Table 1. Of the 1,105 human miRNAs, 414 miRNAs were detected above background threshold levels in at least 50 women. Forty-seven miRNAs were detected above background in 400 or more women (Table 2), and miR-16 showed the highest average expression. Even though expression of miRNAs showed considerable inter-individual variation, several miRNAs, including miR-1825 and miR-1228, were relatively constant among women (Figure 1).
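The detection rule behind these counts (a miRNA is called above background in a sample when its detection P value is below 0.06, and is retained when that happens in at least a minimum number of women) can be sketched as a simple filter. The miRNA names and P values below are invented, and the threshold of 2 women stands in for the study's 50 only so the toy data stay small.

```python
def count_detected(p_values, alpha=0.06):
    """Number of samples in which one miRNA is called above background."""
    return sum(p < alpha for p in p_values)

def filter_mirnas(detection_p, min_women=50, alpha=0.06):
    """Keep miRNAs detected (P < alpha) in at least min_women samples."""
    return [
        name for name, p_values in detection_p.items()
        if count_detected(p_values, alpha) >= min_women
    ]

# Hypothetical per-sample detection P values for three miRNAs in three women.
detection_p = {
    "miR-A": [0.01, 0.02, 0.50],    # detected in 2 samples
    "miR-B": [0.20, 0.30, 0.01],    # detected in 1 sample
    "miR-C": [0.059, 0.06, 0.001],  # 0.06 itself is not below the cutoff
}
kept = filter_mirnas(detection_p, min_women=2)
```

Raising `min_women` toward the number of samples gives the stricter "expressed in almost all individuals" subset, analogous to the 47 miRNAs detected in 400 or more women.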
Table 1. Demographic characteristics of study population
Table 2. Number of microRNAs detected above background
Figure 1. A large number of microRNAs (miRNAs) are detected in serum. Box-and-whisker plots showing the log2-normalized expression for the 47 miRNAs that are expressed above background in 400 or more individuals. Expression levels were adjusted for batch and chip lot across all samples. The black line represents the median, and the top and bottom of the box represent the upper and lower quartiles, respectively. Dots represent the outliers.
Discovery of differentially expressed miRNAs in serum
When paired case control analysis of the 414 miRNAs expressed above background was used, 21 miRNAs showed significantly different levels in cases and controls (P ≤0.05) (Table 3). The differences were small, ranging from 4% to 19%. Higher miRNA expression in women destined to become cases was significantly more common (16 of 21 miRNAs) than would be expected by chance alone (binomial test, two-tailed P <0.05). Differential miRNA expression was not stronger in women close to their time of diagnosis, but sample size was small and all cases were diagnosed within 18 months of blood draw (data not shown). Using qRT-PCR on a small independent replication set of five cases and five controls, we further examined the three miRNAs (miR-18a, miR-181a, and miR-222) with the highest expression in cases. As predicted, all three miRNAs showed higher levels in cases, although none was statistically significant in this small set of women (Additional file 3).
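The binomial comparison of 16 higher-in-case miRNAs out of 21 can be reproduced with an exact two-sided test against a null probability of 0.5. The sketch treats the 21 miRNAs as independent, which is an assumption of the illustration; the function name is hypothetical.

```python
from math import comb

def binomial_two_sided(k, n):
    """Exact two-sided binomial P value for k successes in n trials at p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided P value is
    twice the smaller tail probability (capped at 1).
    """
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

# 16 of the 21 differentially expressed miRNAs were higher in cases.
p_value = binomial_two_sided(16, 21)
```

The result is about 0.027, consistent with the reported two-tailed P <0.05.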
Table 3. Twenty-one differentially expressed microRNAs with a P value of not more than 0.05
Additional file 3. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) validation in 10 cases and controls. Serum levels of five cases and five controls were examined by using qRT-PCR for miR-181a, miR-18a, and miR-222. Box plots show normalized relative expression. Normalization was carried out by using the mean of miR-1825 and spiked-in cel-39. The horizontal line represents the mean for each sample.
The impact of miRNA alterations on regulatory pathways
To explore potential biological associations, we ran IPA on the 82 experimentally verified mRNA targets of the 21 differentially expressed miRNAs. Sixteen IPA canonical pathways were enriched, including molecular mechanisms of cancer and other cancer-related pathways such as p53 signaling, cyclins and cell cycle regulation, and Myc-mediated apoptosis signaling (Additional file 4).
Additional file 4. Experimentally observed targets enriched for cancer and signaling pathways. Identified ingenuity pathway analysis (IPA) canonical pathways enriched by the experimentally observed targets of the 21 miRNAs differentially expressed between the cases and non-cases. The negative log10 false discovery rate-corrected P values are shown. Note that this test has not corrected for possible dependencies across the mRNAs considered and that statistical significance may be overstated. HER-2, human epidermal growth factor receptor 2; ILK, integrin-linked kinase; PTEN, phosphatase and tensin homolog.
miRNA expression association with tumor characteristics
To investigate the potential association of serum miRNA expression with tumor characteristics in the 205 women who later developed breast cancer, we subclassified them into groups based on tumor characteristics (Table 4) and performed a case-case comparison. There was no evidence of significant differences in serum miRNA levels based on tumor ER or PR staining characteristics. In comparisons of serum samples from the 25 women who developed HER-2-positive tumors with 147 samples from women who developed HER-2-negative tumors, there were seven miRNAs with significantly differential expression (P ≤0.05); one miRNA was overexpressed and six miRNAs were underexpressed in the HER-2-positive tumors (Figure 2A and Additional file 5). Case-case comparison of serum from women who subsequently developed lymph node-negative tumors (pN0, n = 153) with that of women who developed lymph node-positive tumors (pN1, pN2, or pN3, n = 52) revealed 10 differentially expressed miRNAs (P ≤0.05); five were overexpressed and five were underexpressed in node-positive tumors (Figure 2B and Additional file 6).
Table 4. Patient tumor characteristics
Figure 2. Serum microRNA (miRNA) expression is associated with tumor subtype. (A) Serum miRNAs significantly associated with HER-2 expression (negative differences correspond to lower levels in women developing tumors with overexpression) (P ≤0.05). (B) Serum miRNAs significantly associated with nodal status (pN1 or higher versus pN0) (P ≤0.05). P values and percentage change were determined by using a linear mixed model.