Evaluating urinary estrogen and progesterone metabolites using dried filter paper samples and gas chromatography with tandem mass spectrometry (GC–MS/MS)

Background
Measuring concentrations of metabolites of estradiol and progesterone in urine, instead of measuring serum concentrations, is common in research and is also used in patient care. The primary aim of this study was to demonstrate that analysis of urine samples dried on filter paper by gas chromatography with tandem mass spectrometry (GC–MS/MS) provides results similar to serum analyzed by radioimmunoassay (RIA). Secondary aims were to show that collection of four samples during the day (4-spot method) can be substituted for a 24-h collection, and that analysis of urine from dried samples is equivalent to analysis of liquid urine samples.

Methods
This prospective observational study compared results of urine and serum analyses. Urine samples from women throughout the menstrual cycle and single samples from postmenopausal women were evaluated. Urine was collected onto filter paper and dried. Dried urine was extracted, hydrolyzed, and derivatized prior to analysis by GC–MS/MS. Hormone concentrations were normalized to creatinine. Single samples from a separate population of women and men were used to compare results of 24-h urine collection to the 4-spot method. A subset of these samples was used to compare results from dried urine to liquid urine.

Results
The primary study showed good reliability in the comparisons between the dried urine and serum assays. During the menstrual cycles of a subset of four women, urine metabolite concentrations followed the same pattern as serum concentrations. Comparison of 4-spot to 24-h urine collections and of dried to liquid urine measurements had intraclass correlation coefficients (ICC) greater than 0.95, indicating excellent agreement.

Conclusions
For estradiol and progesterone, the dried urine assay is a good surrogate for serum testing. The 4-spot method can be used instead of 24-h urine collections, and dried urine results are comparable to liquid urine. The dried urine assay is useful for some clinical assessments of hormone disorders and may be useful in large epidemiologic studies due to ease of sample handling.

Electronic supplementary material
The online version of this article (10.1186/s13065-019-0539-1) contains supplementary material, which is available to authorized users.


Background
Analysis of reproductive hormones is commonly performed in epidemiological studies, clinical research, and patient care. Estrogen and progesterone measured in serum and plasma or their metabolites measured in urine provide similar information about ovarian steroid production [1,2]. To evaluate diurnal, circadian, or monthly variations in hormone levels, frequent sampling may be needed and patients can easily collect the urine samples. For studies analyzing the monthly pattern of reproductive hormones (menstrual cycle mapping), evaluation of daily hormone levels from the first urine of the day is usually sufficient [3]. However, to evaluate hormones that may have a circadian rhythm that is either biological or due to hormone replacement therapy, a representation of the entire day may be beneficial. To provide an alternative and possibly more convenient collection strategy, we compare the results of urine collected at 4 times during the day (4-spot method) with a complete 24-h collection.
We also compare results of dried urine samples to liquid urine, as an additional potential convenience for patients. The basic method for collection of urine samples onto filter paper was developed decades ago to facilitate sample acquisition in screening newborn/metabolic disorders and has been adapted [4]. Dried filter paper samples provide convenience of collection and ease of transport. It has also been suggested that using dried urine samples could increase participation rates in some studies, particularly in regions that do not have reliable access to refrigeration [5].
Recent advances in mass spectrometry technology have enabled its use for routine analysis of steroid hormones in clinical and research laboratories [6][7][8]. Our diagnostic laboratory has developed methods for analyzing steroid hormone metabolites from small volumes of dried urine using gas chromatography with tandem mass spectrometry (GC-MS/MS). The methodology provides accurate and precise quantification at the lower concentrations of urinary estrogen and progesterone metabolites found in postmenopausal women, children, and men. GC-MS/MS provides high resolution of very closely related structures (isomers), which allows a complete evaluation of all of the urinary metabolites [6]. All conjugated forms of estradiol and progesterone metabolites are cleaved back to the parent hormone and measured directly.
The primary aim of the study is to demonstrate that measuring urinary metabolites of estradiol and progesterone from dried filter paper samples analyzed by GC-MS/MS provides results comparable to measuring serum estradiol and progesterone with standard RIA methodology in premenopausal and postmenopausal women. A secondary aim is to show that collection of dried urine samples at 4 time points over a 10 to 14 h period (4-spot method) provides results comparable to a complete 24-h urine collection. Additionally, we show that results obtained using dried urine extracted from filter paper are equivalent to those obtained for liquid urine.

Materials and methods
A prospective observational study was carried out between February and November of 2015. Previously collected and stored blood and urine samples from healthy adults were analyzed. Informed consent was obtained prior to the study and all individual data was de-identified, in compliance with the Helsinki Declaration. Serum analyses for estradiol and progesterone were performed by Dr. Stanczyk's laboratory at the University of Southern California, Los Angeles, CA, USA. Urine analyses for estradiol (E2), estrone (E1), 5α-pregnane-3α,20α-diol (5α-pregnanediol, αPg) and 5β-pregnane-3α,20α-diol (5β-pregnanediol, βPg) were conducted at Precision Analytical, Inc., McMinnville, OR, USA.
The study used previously collected, de-identified samples and it was determined by the National University of Natural Medicine Institutional Review Board to meet the criteria for exemption status (IRB number: MN020918).

Study populations and sample collection
For the primary study, comparing results of serum to urine analyses, samples from four premenopausal and eight postmenopausal women were used. Multiple samples of blood and urine from the premenopausal women had been collected on various days throughout their menstrual cycle (total n = 44; range: 8 to 13 samples per individual). Single samples of blood and urine from the postmenopausal women were analyzed, as there was no cyclical pattern to monitor. Inclusion of postmenopausal women provided data for the lower end of the measured range of the hormones. Women were excluded from the study if they had used hormonal contraception or hormone replacement therapy within 1 year of testing or were pregnant. Blood was collected (2 mL) by capillary finger stick; serum (1 mL) was separated, then frozen at −80 °C, and shipped overnight on ice. Urine (approximately 2 mL) from the first urine of the day was collected onto filter paper, dried, frozen, and later transported at room temperature to the lab where the samples were stored at −80 °C until analysis.
For the secondary studies, a group of 26 individuals was used to compare samples from the 4-spot method to 24-h urine collection. A subset of the group (18 individuals) was used for the dry to liquid urine comparison. Individuals were not excluded based on current or recent hormone therapies. These study populations included both males and non-pregnant females to provide for a larger sample size and a range of expected values for hormones not included in this report (such as testosterone), as the goal was to compare measurement values for differing methodologies. Urine from 24-h collections was delivered to the laboratory, the total volume was measured, and an aliquot was frozen and stored at −80 °C until tested. Dried samples were also stored at −80 °C until analyzed.
Urine samples were collected by saturating 2 × 3 inches of filter paper (Whatman Body Fluid Collection Paper or equivalent) with urine. Instructions were given to completely saturate the paper, which was usually easily accomplished. The paper was left exposed at room temperature for 24 h to dry. The 4-spot method used samples collected at four times during the day that spanned 10-14 h of the 24-h period. Urine collections were taken from the first urine of the day, 2 h later, at dinnertime, and before bed. The first morning collection captures the overnight period of 6-8 h with each of the other three collections accounting for approximately 2 h of the day (up to 6 h total). During the same day, liquid urine samples for the 24-h collection were added to a container with approximately 1 g of boric acid. The four dried urine samples removed a total of about 8 mL of urine from the 24-h collection. This was considered negligible and was not accounted for.

Serum hormone analysis
Estradiol was quantified in serum by a previously described RIA method [9,10]. Prior to the RIA, steroids were extracted with hexane:ethyl acetate (3:2) and estradiol was separated by Celite column partition chromatography, using 40% ethyl acetate in isooctane. High specific activity tritiated estradiol (500 dpm) was added to each serum sample before the extraction step in order to follow and correct for procedural losses. Inter-assay coefficients of variation for quality control samples in the estradiol RIA ranged from 9.0% to 12.0%. The sensitivity of the estradiol RIA is 2 pg/mL.
A commercial RIA kit (Cisbio Bioassays, Codolet, France), using a highly specific antibody, was used to measure progesterone. The assay was carried out in progesterone antibody-coated tubes in conjunction with an iodinated progesterone derivative. After a 2-h incubation at 37 °C, the contents of the tubes were aspirated and the tubes were washed. The interassay coefficients of variation were < 10% for the range of progesterone from 0.12 to 36 ng/mL. The sensitivity of the progesterone RIA is 0.12 ng/mL.

Urine hormone metabolite analysis
The urinary metabolites were analyzed using proprietary in-house assays referred to as Dried Urine Testing for Comprehensive Hormones (DUTCH) on the Agilent 7890/7000B GC-MS/MS (Agilent Technologies, Santa Clara, CA, USA). The equivalent of approximately 600 μL of urine was extracted from the filter paper using 2 mL of 100 mM ammonium acetate adjusted to a pH of 5.9. Aliquots of the conjugated hormones were transferred to a C18 solid phase extraction (SPE) column (UCT LLC, Bristol, PA, USA), eluted using methanol, and the eluate was dried under nitrogen at 40 °C. The conjugated hormones were then hydrolyzed from their glucuronide and sulfate forms to free forms using enzymes from Helix pomatia (Sigma-Aldrich, St. Louis, MO, USA) in acetate buffer (55 °C, 90 min). The enzymatic reaction was quenched with sodium hydroxide and the hormones were extracted with ethyl acetate. The ethyl acetate extracts were dried under nitrogen at 40 °C. The analytes were derivatized using a mixture of 100 μL acetonitrile and 50 μL bis(trimethylsilyl)trifluoroacetamide (Sigma-Aldrich, St. Louis, MO, USA) for 30 min at 70 °C. Internal standards (estradiol-D5, Steraloids, Newport, RI, USA) were added prior to ethyl acetate extraction, and the percentage recovery after all assays was > 90%. Derivatized extract (1.6 μL) was injected into the GC-MS/MS. Samples were analyzed together with a series of controls and a standard curve spanning the expected range of concentrations. Instrument conditions for the oven were an initial temperature of 130 °C increasing to 200 °C at 25 °C/min, then to 230 °C at 4.3 °C/min, and finally to 290 °C at 25 °C/min. Multiple reaction monitoring transitions were 416 > 285 for E2, 342 > 257 for E1, 269 > 187 for αPg, and 449 > 103 for βPg. Creatinine was measured using a conventional colorimetric (Jaffe) method, after initial extraction from the filter paper.
The average interassay coefficients of variation were 8% for E2, 10% for E1, 12% for αPg, and 13% for βPg. Sensitivities of the assays were as follows: E2, E1, and αPg, 0.2 ng/mL; βPg, 10 ng/mL.
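As a rough illustration of the oven program described above, the following sketch computes the time spent in each temperature ramp. The source does not report hold times, so they are assumed to be zero here; the function and variable names are illustrative only, and the result is the ramp time alone, not the full run time of the assay.

```python
# Oven ramps taken from the text: (start °C, end °C, rate °C/min).
# Assumption: no isothermal hold times (the source does not report any).
RAMPS = [
    (130.0, 200.0, 25.0),
    (200.0, 230.0, 4.3),
    (230.0, 290.0, 25.0),
]


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Minutes needed to heat from start_c to end_c at a constant rate."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return (end_c - start_c) / rate_c_per_min


# Time per segment and total ramp time for the whole program.
segment_times = [ramp_minutes(*ramp) for ramp in RAMPS]
total_time = sum(segment_times)

print([round(t, 2) for t in segment_times], round(total_time, 2))
```

Under these assumptions the three ramps take about 2.8, 7.0, and 2.4 min, for roughly 12.2 min of ramp time per injection.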
Urine reference ranges were determined by testing a prior cohort of 600 premenopausal women. Urine hormone metabolite concentrations were normalized to creatinine to account for variations in urine concentration and to compensate for any variations in filter paper saturation during collection. The luteal reference range used is the 20th to the 80th percentile for this population.
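The creatinine normalization and percentile-based reference range can be sketched as follows. This is a minimal illustration, not the laboratory's procedure: the creatinine unit (mg/dL), the percentile interpolation method, and all function names and numbers are assumptions made for the example.

```python
from statistics import quantiles


def normalize_to_creatinine(analyte_ng_per_ml, creatinine_mg_per_dl):
    """Express an analyte as ng per mg creatinine.

    Assumes the analyte is in ng/mL and creatinine in mg/dL.
    """
    if creatinine_mg_per_dl <= 0:
        raise ValueError("creatinine must be positive")
    creatinine_mg_per_ml = creatinine_mg_per_dl / 100.0
    return analyte_ng_per_ml / creatinine_mg_per_ml


def reference_range(values):
    """Return the (20th, 80th) percentile of a cohort's values.

    quantiles(n=5) yields cut points at the 20th, 40th, 60th, and 80th
    percentiles; the 'inclusive' method is an assumption of this sketch.
    """
    cuts = quantiles(values, n=5, method="inclusive")
    return cuts[0], cuts[-1]


# A fully saturated paper and a half-saturated paper from the same urine:
# both analyte and creatinine halve, so the normalized value is unchanged.
full = normalize_to_creatinine(2.0, 100.0)
half = normalize_to_creatinine(1.0, 50.0)
print(full, half)
```

Because the analyte and creatinine are extracted from the same piece of paper, incomplete saturation scales both by the same factor and cancels in the ratio, which is the compensation the text describes.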

Dried urine sample stability
A stability study was conducted on urine from four individuals. Aliquots were applied to filter paper and dried. These were stored at room temperature and then frozen at −80 °C at 15 different time points ranging from day 0 to day 84 (n = 60). Finally, all samples were tested in a single batch to assess the reproducibility of the measurements, and thus the stability of the four analytes at room temperature for up to 84 days.
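One common way to summarize such a stability series is the coefficient of variation (CV) of an analyte across the storage time points for each individual. The source does not state which statistic was used, so the sketch below is an assumption, and the values are hypothetical placeholders rather than study data.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical results (ng/mg creatinine) for one analyte in one individual
# at three of the storage time points; not data from the study.
example_series = [9.0, 10.0, 11.0]
print(round(percent_cv(example_series), 1))
```

A CV across time points that is no larger than the interassay CV of the method would be consistent with stability over the storage period.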

Statistical methods
The post hoc power calculation based on the primary study with a sample size of 12 subjects provided more than 85% power to detect an interclass correlation coefficient of at least 0.75 between serum and urine values, before accounting for repeated measures. The use of repeated measures in the four premenopausal individuals increased the power to greater than 90% to detect a Spearman correlation of at least 0.6. Alpha was set at 0.05, and all analyses generated 2-sided p-values. The statistical analyses were performed using SAS/STAT ® software, Version 9.3 (SAS Institute Inc., Cary, NC, USA).
Hormonal measures are expressed as median (interquartile range (IQR)), as they were not normally distributed. For the same reason, only non-parametric tests were used in the analyses. Spearman correlation coefficients were used to determine interclass associations between variables. Wilcoxon and Mann-Whitney tests were used to assess differences between men and women. Reference ranges were employed to standardize the serum and urine values for direct comparison, using min-max scaling, i.e. (observed value − minimum reference)/(maximum reference − minimum reference). Consistency and agreement between the standardized serum and urine measures were assessed using intra-class correlation coefficients (ICC). ICCs differ from the more familiar interclass correlations (e.g., Pearson and Spearman correlation coefficients) by assessing the agreement of a measure between groups [11]. The ICC ranges from 0 to 1; the higher the value, the closer the two measures are to perfect agreement. Mixed models that account for repeated measures, with a variance components covariance structure, were used to assess whether observed differences between serum and urine standardized values were significant, and to determine differences between premenopausal and postmenopausal women. Consistency of 4-spot versus 24-h urine collections and of dried versus liquid urine samples was evaluated with ICCs, while differences between measures within an individual were assessed using signed rank tests. Because the hypotheses of this paper were intrinsically correlated and each question was of independent interest, no adjustments were made for multiple comparisons.
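The min-max scaling and an ICC for paired measurements can be sketched as below. The source does not specify which ICC form was computed in SAS; this example assumes the two-way, consistency, single-measurement form, often written ICC(3,1), and all names and numbers are hypothetical.

```python
def standardize(observed, ref_min, ref_max):
    """Min-max scale a value against a reference range."""
    if ref_max <= ref_min:
        raise ValueError("ref_max must exceed ref_min")
    return (observed - ref_min) / (ref_max - ref_min)


def icc_consistency(pairs):
    """ICC(3,1) for paired data: list of (method_a, method_b) per subject.

    Assumption: two-way model, consistency definition, single measurement.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two subjects")
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    # Partition the total sum of squares into subjects, methods, and error.
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err
    if denom == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (ms_rows - ms_err) / denom


# Hypothetical standardized serum/urine pairs with a constant offset.
scaled = standardize(150.0, 100.0, 200.0)
offset_pairs = [(0.1, 0.2), (0.5, 0.6), (0.9, 1.0)]
icc_value = icc_consistency(offset_pairs)
print(scaled, round(icc_value, 6))
```

Note that a consistency-type ICC ignores a constant offset between methods, which is why a systematic difference is tested separately with mixed models or signed rank tests, as described above. An absolute-agreement ICC would penalize such an offset.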

Study populations
The primary study evaluated 52 samples: 44 from premenopausal women, collected from day four to the end of the menstrual cycle, and 8 from postmenopausal women. The 4-spot to 24-h comparison used one sample from each of 26 individuals, and the liquid to dry comparison used one sample from each of a subset of 18 of these individuals. Characteristics of the populations and descriptive statistics are provided in Table 1.

Primary study: Dried urine compared to serum
The primary study evaluated whether reproductive hormone metabolites, as measured in a dried urine sample by GC-MS/MS, provide information comparable to serum hormone concentrations measured by a conventional RIA method. Assessment of consistency between standardized measurements from serum to dried urine assays using ICCs revealed substantial reliability between the assays in all comparisons (Table 2) [12]. There were no significant differences between the urine and serum assays for E2, E1, and αPg. However, standardized βPg was consistently greater than standardized serum progesterone within individuals, indicating a small but systematic difference (p = 0.03). All four urinary hormone metabolites followed the pattern of results obtained from serum with similar timing of peaks and troughs. The cycle maps for urinary βPg and estradiol are shown in Figs. 1 and 2, and those for urinary αPg and estrone in Additional file 1. Subjects 2, 3, and 4 showed the usual luteal surge of progesterone, whereas subject 1 is presumed to be anovulatory with no surge in either of the progesterone metabolites. The individual data for the 4 premenopausal women are found in Table 3.

Secondary studies

4-spot versus 24-h collection
Measurements of hormone metabolites in urine samples collected four times throughout the day (4-spot), covering an average of 12 h (range 10-14 h), are comparable to those from the gold standard of a 24-h urine collection (Table 4).
The ICC was greater than 0.95 for all comparisons, indicating almost perfect agreement between the 4-spot collection technique and the gold standard 24-h collection. Despite the excellent agreement, the 4-spot results were slightly but consistently higher than the 24-h urine results for urinary E1 and E2 concentrations in the paired comparisons (p < 0.05 for both). Spearman correlation analysis with 24-h urine data expressed as μg/day versus 4-spot assay results showed excellent correlation (Fig. 3 and Additional file 2).

Liquid versus dry urine samples
There was no difference in analyzing liquid urine or dried urine samples extracted from filter paper. Results from a dried versus a liquid urine sample showed excellent consistency, as assessed by ICCs (Table 5). Despite the excellent agreement, the dried urine assay measurement was consistently lower than the liquid assay for urinary E1 and βPg concentrations in the paired comparisons (p = 0.04 for both). Spearman correlation for the 4 urinary metabolites was excellent (Fig. 4 and Additional file 3).

Table 1 Study population characteristics and hormonal concentrations
Hormone data presented as median (IQR). Urine values are from the dried urine assay. Cr, creatinine; αPg, 5α-pregnanediol; βPg, 5β-pregnanediol
a p < 0.01, b p < 0.05 for differences between groups (pre- versus post-menopausal women or males versus females), as assessed with mixed models for the primary study and Wilcoxon and Mann-Whitney tests for the secondary studies
c All calculations of hormonal measures account for repeated measures within the premenopausal individuals using mixed models
d Of the 26 individuals (15 females, 11 males) who participated in the comparison of the 4-spot versus 24-h urine assays, 18 individuals (10 females, 8 males) were used to assess comparability of dried versus liquid urine samples

Discussion
We have evaluated the performance of a validated proprietary assay for estradiol and progesterone metabolites using dried urine samples and GC-MS/MS analysis in comparison with results of serum analysis using conventional RIAs.
Our primary study showed that the measurement of urinary hormone metabolites in the dried urine assay provides menstrual cycle maps comparable to those derived from serum hormone levels in individual premenopausal women (Figs. 1, 2 and Additional file 1). The urine results showed substantial agreement with serum results based on intraclass correlations for the group of premenopausal and postmenopausal women for all metabolites, supporting use of the urinary assay as a substitute for serum analysis. The urinary results were not significantly different from serum results when analyzed using standardized reference ranges, except for βPg. Despite the statistical difference for βPg, it is unlikely to be clinically significant. The βPg levels may be consistently higher than those of serum progesterone due to a number of factors that include variations in protein binding in serum, cross-reactivity in the RIA, and differences in the populations of premenopausal women used to develop the reference ranges. Menstrual cycle maps of four premenopausal women were compared for serum and urine measures. Although the sample size was small, because there were multiple samples from each individual over the course of their menstrual cycle, the repeated measures analysis provided good statistical power and confidence in the comparisons. We also showed that using a 4-spot collection technique is equivalent to doing a complete 24-h urine collection as evaluated by intraclass correlations. Although the 4-spot method consistently overestimated the estrogen metabolites compared to the 24-h urine collection, both differences were less than 10% of the total and not likely to be clinically relevant.
Factors that may contribute to these small differences include individual variation, phase of the menstrual cycle, and timing of the 4-spot collections. With the excellent agreement demonstrated by the intraclass correlation coefficients, the 4-spot collection method may be useful for evaluating some postmenopausal women considering HRT. For monitoring the effectiveness of some HRT, such as transdermal estrogen that avoids first-pass metabolism, evaluation of urinary hormone metabolites may be useful. Hormones given by routes such as oral or sublingual administration undergo first-pass hepatic metabolism, and urine evaluation may not be appropriate.

Fig. 1 Hormone profiles of serum progesterone versus urinary 5β-pregnanediol (a) and serum estradiol and urinary estradiol (b) in four premenopausal women. Cr, creatinine; βPg, β-pregnanediol

Fig. 2 Hormone profiles of serum progesterone versus urinary β-pregnanediol (a) and serum versus urinary estradiol (b) in one premenopausal woman's cycle. Metabolites of subject 2. Cr, creatinine; βPg, β-pregnanediol
The final part of our study established that using urine dried onto filter paper provides almost identical results to testing liquid urine. This result was expected, but we do not believe this information has been published previously. While the collection method of urine onto filter paper is easy, it is not infallible, and occasionally the paper may not become completely saturated. For this reason, the urinary hormone metabolites are indexed to creatinine concentration in our studies to account for possible variation in saturation of the filter paper.
Use of a dried filter paper sample provides a small volume of urine that is sufficient for GC-MS/MS analysis and limited repeat testing. If more volume is needed, additional samples can be collected, either at the time of initial collection or at a later date. Dried samples require extraction from the paper prior to analysis, an additional step compared to using liquid urine, but in the commercial laboratory setting, workflows are optimized and throughput is efficient. Using filter paper samples provides for ease of storage and transportation and may facilitate studies in a broader range of world settings, as neither refrigeration nor freezing is required. Sample stability will vary depending on the compounds of interest, but our samples had good stability at ambient temperatures for nearly 3 months.
Assessments of urinary hormone metabolites can be used in place of serum hormone analysis in some situations, but there are important variables to understand. Timing of sample acquisition is important, as serum testing provides information about status at the moment of collection, but urinary testing provides information from a span of time and with a lag due to metabolic processing. For hormones with pulsatile secretion such as progesterone [13] (serum concentrations may vary fivefold or more from minute to minute during luteal phase), the use of urine samples mitigates the variability of serum testing and provides a result that represents 6-8 h of time (for a waking sample).
Metabolism of steroid hormones produces a number of conjugates in urine and the proportion of each can vary between individuals. For instance, metabolites of E1 and E2 include glucuronide and sulfate forms of the hormones. Pregnanediol may exist as a glucuronide at  providing for a potentially more complete representation of the hormone profile. In addition to individual variation, testing urine metabolites assumes that the individuals have normally functioning phase II metabolism of estradiol and estrone as well as the phase I metabolites of progesterone. Although we are not aware of any defects in estrogen or progesterone metabolism, there are known defects of phase II metabolism of testosterone that may lead to falsely low results, and the possibility cannot be ruled out [14]. The use of GC-MS/MS confers some advantages over other analytical tools. Laboratories often choose between GC-MS and liquid chromatography with tandem mass spectrometry (LC-MS/MS) assays for urinary evaluations [6][7][8]. GC-MS offers better chromatographic separation of similar compounds (such as isomers), but assays may suffer from a lack of sensitivity and selectivity. LC-MS/MS systems offer the high degree of accuracy and sensitivity needed for small sample volumes and, for many polar compounds, allow testing without the need for derivatization. Throughput is often increased, but the chromatographic separation is less than GC can provide. Increased chromatographic separation is helpful in cases where the signal from the mass spectrometer is not unique for two compounds. For example, isobaric estrogen metabolites may not give a unique signal on the GC-MS/MS and thus would have to be separated chromatographically.
In this study, we evaluated only urinary metabolites of estradiol and progesterone, although the DUTCH method is designed to also measure androgen metabolites, cortisol metabolites, free cortisol, and melatonin. The 4-spot collections are helpful for evaluating hormones with diurnal cycles such as estradiol, cortisol and melatonin. Further validation studies of the dried urine method should include 24-h correlation and