Systematic review of the diagnostic performance of serum markers of liver fibrosis in alcoholic liver disease

Background Alcoholic liver disease (ALD) is a significant cause of death and morbidity. Detection of liver fibrosis at an early stage could provide opportunities for more optimal management. Serum markers of liver fibrosis offer an alternative to biopsy. Evidence of the performance of biomarkers in ALD is needed, and a systematic review was conducted to evaluate the available studies. Methods Electronic databases were searched. Studies were included if they evaluated paired samples of biopsy and serum, and presented data as sensitivity, specificity, or ROC curves. Results Fifteen studies were included; the median number of participants was 146 (range 44–1034). Studies differed with respect to patient populations. Six single markers (most commonly hyaluronic acid, HA) and ten combined panels were evaluated. Biomarkers could identify people with severe fibrosis/cirrhosis with high diagnostic accuracy: HA (median AUROC 0.79, range 0.69-0.93), panels (median AUROC 0.83, range 0.38-0.95). Significant heterogeneity precluded pooling. Performance was poorer for detecting less severe fibrosis. Conclusions There are limited numbers of small studies evaluating the accuracy of biomarkers in identifying fibrosis on biopsy in ALD. Some biomarkers (both HA alone and some panels) showed promise in the identification of cirrhosis/severe fibrosis and could be used to rule it out in heavy drinkers. Biomarkers were less accurate for less severe fibrosis.


Introduction
Alcohol related deaths are an important health concern worldwide. In the UK 85% of such deaths are due to cirrhosis, and recent epidemiological studies have shown that although mortality rates from cirrhosis are falling in most countries, absolute rates remain high. In the UK and Eastern Europe the trend is upwards, with an 18% rise in deaths from alcohol related causes between 2000 and 2004 [1][2][3][4][5]. In these countries alcohol consumption is high and increasing, and patterns of drinking have changed over the past three decades, with more binge drinking and a rise in hazardous drinking in younger women. Alcoholic Liver Disease (ALD) therefore represents a serious public health problem that is likely to get worse in the UK in the coming decades.
Clinicians and patients require accurate information about the degree of liver fibrosis in ALD to assess disease severity in order to predict outcome, guide management decisions and monitor disease. Detection of fibrosis in people drinking hazardously at an early stage, or before clinical symptoms of hepatic decompensation, could provide opportunities for more optimal management. This is a challenge in a disease process with few characteristic symptoms or signs. The current reference standard to ascertain the stage of fibrosis is histology obtained through liver biopsy. This is an invasive test subject to limitations in its acquisition (sampling error, length of biopsy, morbidity and mortality), in its subsequent analysis (intra- and inter-observer variability), and as a reference standard (an ordinal categorical variable representing a continuous biological process) [6][7][8]. In the past decade efforts have been made to find other tests to accurately evaluate fibrosis. Serum markers of liver fibrosis offer an attractive alternative to liver biopsy, as they are less invasive, may allow dynamic calibration of fibrosis, and are potentially more cost effective. Evidence of the diagnostic performance of such serum markers in chronic liver disease is needed to assess the clinical utility and effectiveness of such tests in the diagnosis, prognosis and management of liver disease. Systematic reviews of the diagnostic performance of serum markers in chronic hepatitis C (CHC) and non alcoholic fatty liver disease (NAFLD) have been published, but none so far on the evaluation of markers in ALD [9][10][11][12][13].
In order to provide such evidence, a systematic review was conducted to locate, collate, appraise and analyse studies that evaluated the performance of serum markers in the diagnosis of liver fibrosis in ALD.

Methods
A systematic literature review was conducted following accepted published principles to ascertain the diagnostic performance of serum markers of liver fibrosis [14].
Sources searched included electronic databases (1980 to April 2009), the Cochrane Library (2009), and reference lists from relevant articles. MEDLINE and EMBASE were searched using a search strategy derived from the literature (search strategy available from authors). Search terms were added following initial searches as appropriate.
No authors were contacted for further information.

Inclusion/exclusion criteria
A serum marker was defined as any measure that could be derived from a blood sample. Studies were included if they: were systematic reviews, meta-analyses or primary studies of diagnostic tests; were written in English; used liver biopsy as a reference standard; presented data as sensitivity or specificity or diagnostic accuracy or receiver operating characteristic (ROC) curve analyses; and included >30 participants (as smaller studies will be underpowered to produce precise estimates of test performance and would be more likely to produce zero denominator effects in a 2 × 2 table). Studies were excluded if data were presented only in abstract form. Studies identified by the search strategy were assessed for inclusion by two reviewers (JP and ING).

Data extraction strategy
Data extraction was undertaken by one reviewer (JP) and checked by a second reviewer (ING), with any disagreements being resolved through discussion. A third reviewer (PR) was consulted to resolve persisting issues. Information collected included patient demographics, test assay details, background prevalence of fibrosis severity, risk factors, histological parameters, statistical methods used, and test performance characteristics, consistent with the columns in Tables 1 and 2.

Data analysis/synthesis
Data are presented with full tabulation of results of included studies.
Where data were available, 2 × 2 tables were constructed to derive sensitivity, specificity, predictive values, likelihood ratios (LR) and diagnostic odds ratios (DOR) at each threshold value. (Accepted levels for robust tests are -LR <0.1 and +LR >10; +LR >5 and -LR <0.2 give strong diagnostic evidence. For DOR, reasonable test performance would be >30.) Severity of fibrosis was as defined by the authors for locally derived classifications; for studies using METAVIR/Scheuer classifications it was defined as mild = stages 0-1, moderate/severe = stages 2-4, severe fibrosis = stages 3-4, and cirrhosis = stage 4.
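The derivation from a 2 × 2 table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the review: the class name `TwoByTwo`, the function name `diagnostic_metrics` and the example counts are all hypothetical, and the counts are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating a marker threshold against biopsy."""
    tp: int  # marker positive, biopsy positive
    fp: int  # marker positive, biopsy negative
    fn: int  # marker negative, biopsy positive
    tn: int  # marker negative, biopsy negative


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Derive accuracy measures from one 2 x 2 table.

    A zero cell can make a denominator zero, in which case Python raises
    ZeroDivisionError; no continuity correction is applied in this sketch.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical counts for illustration only; not from any included study.
example = TwoByTwo(tp=40, fp=15, fn=10, tn=85)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the positive likelihood ratio is above 5 but below 10, and the DOR is below 30, so by the levels quoted above the hypothetical test would give strong but not robust evidence for ruling in disease.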

Results
The electronic search yielded 463 abstracts, which were read in full. 41 full papers were retrieved, of which 26 were excluded, leaving 15 studies in separate populations to be included in the review (see Table 2). Reasons for exclusion (may be >1 per study) were: not a primary study (editorial/non systematic review), n = 3; outcome was not fibrosis (usually alcoholic hepatitis), n = 6; participants <30, n = 1; no results separable for ALD alone, n = 6; no results reported as sensitivity, specificity, ROC curves or diagnostic accuracy, n = 11 (most of these studies reported correlation coefficients or differences in means of serum markers between groups with fibrosis and those with less fibrosis); no results for fibrosis alone separable from data that combined steatosis with fibrosis, or fibrosis/cirrhosis with acute alcoholic hepatitis (AH), n = 4. No systematic reviews or meta-analyses were identified. Studies were conducted between 1989 and 2009. Study characteristics are shown in Table 2. The median age of participants in included studies was 50 years (range 44-65 years), 77% were male (range 63-100%) and the median number of study participants was 146 (range 44-1034). The median background prevalence of serious fibrosis/cirrhosis was 41% (range 14-59%). All of the studies were conducted in secondary/tertiary settings.
There were marked differences between the studies. Different scoring systems were used: METAVIR (or modified METAVIR) n = 6; Scheuer n = 1; Ishak n = 2; Knodell n = 1; Worner/Lieber n = 1; and locally generated n = 5 (mostly dividing fibrosis into mild, moderate or severe). 13/15 studies presented data on the performance of the markers in identifying cirrhosis/severe fibrosis (METAVIR stage 4 or stages 3-4), 5/15 reported on significant fibrosis (METAVIR stages 2-4), and 3/15 reported information on identifying any fibrosis. All of the studies evaluated performance of markers using cross sectional data for paired samples of histology and serum. 14/15 studies recruited prospectively, and half recruited consecutive patients. There was heterogeneity of patient selection. Although all participants were recruited in a hospital setting, some were hospitalized and some were out-patients. There were also differences in both the inclusion criteria and daily alcohol consumption. Inclusion criteria reported were patients with previously diagnosed ALD, and/or "alcoholism" or heavy alcohol consumption, or patients admitted for rehabilitation/detoxification/alcohol withdrawal symptoms. The daily consumption of alcohol (where reported) varied, with 1 study recruiting patients drinking >100 g of alcohol/day, 4 studies >80 g, and 6 studies >50 g. Inclusion criteria specified a varying number of years of drinking at these levels (range 5-10 years, where reported), with one study having a mean alcohol consumption of 225 g/day for a mean of 19 years [29] (see Table 2). Some studies used the same population of patients with ALD to report the performance of different serum markers (single and panel tests) in two publications [25,30].
Another research group reported two studies which also used the same patient population, with the earlier study reporting results from 109 patients with compensated ALD recruited in 1994-95 and the later study adding further patients from 1997-98 and reporting from the whole cohort (n = 240) [2,18]. Both studies were included as the data reported were different, with the earlier study reporting the performance of two serum markers and the later study having more participating patients but reporting results for one marker. This may reflect the difficulty in recruiting and retaining patients with this liver disease. The significant heterogeneity precluded pooling of results. Results are presented separately for single markers (Table 2) and for marker panels (Table 3) in the identification of cirrhosis (F4 METAVIR), cirrhosis/severe fibrosis (F3/F4 METAVIR) and 'significant' fibrosis (F2-4 METAVIR). There were 13 separate markers evaluated: 6 as single markers, and the remainder as components of 10 panels. 5/6 of those reported as single markers were also used in the panels. Three studies reported sensitivity and specificity at more than one threshold [25][26][27].

Single markers
All single marker studies were heterogeneous with respect to the grade of fibrosis identified by the test and the thresholds reported (Table 2).

i) Hyaluronic Acid (HA)
The most commonly measured single marker was HA (7 studies, total n = 1360). The studies were all small (n ≈ 200) and, where reported, different thresholds of HA concentration for positive test results were used (range 55 mcg/l to 250 mcg/l). Not all studies gave sufficient detail of the analytical methods used to determine HA, but the methods differed among those that did report the assay: a radiometric binding protein assay (used by three included studies), an enzyme linked binding protein assay, and an immunoassay using a magnetic particle separation technique (2 studies). The inclusion criteria with respect to alcohol consumption were different for each study (>100 g alcohol daily; >80 g for >5 years; >50 g daily alcohol for >5 years; >50 g alcohol daily for 1 year), as was the size of the studies (range n = 70-247). The severity of serious fibrosis varied between studies, with the prevalence of cirrhosis in one study [22] being less than half that in the other studies. Seven studies evaluated the performance of HA in the identification of cirrhosis or cirrhosis/severe fibrosis, although only 4 of these reported AUROC values. One study reported results for the identification of patients with no or mild fibrosis. The AUROCs for the 3 studies identifying cirrhosis were discrepant: 0.78, 0.80 and 0.93. The median AUC for predicting severe fibrosis/cirrhosis was 0.79 (range 0.69-0.93). Overall the LRs and predictive values showed that HA was better at excluding cirrhosis/severe fibrosis than detecting it, with NPVs consistently high (90%) for cirrhosis. There are two direct comparisons of a panel and HA. These showed differing results. In the larger study [25] there was no significant difference between the panel (Fibrotest) and HA in identifying either cirrhosis or moderate/severe fibrosis.
In the other study [28] most of the panel tests had greater AUC values than HA alone in predicting cirrhosis (although the 95% CIs overlapped), but at lower levels of fibrosis the performance of HA and the panels was more similar. Overall HA was better at identifying cirrhosis alone than moderate/severe fibrosis (AUROC 0.80) or milder fibrosis.

ii) Other single markers
There were more limited data on five other single markers, with only three studies presenting AUROC analyses. Prothrombin index had high +LR and predictive values in the identification of cirrhosis in two studies. One study reported the performance of TIMP1 and PIIINP in the same population of patients, both as single markers and as part of a panel. The study found that the AUROC values were lower than in other studies of the same markers [29]. However, this study population differed from the other studies in having a very high alcohol consumption over a long period of time.

Marker panels
(i) Cirrhosis/severe fibrosis (Figure 1, Table 3) Eight studies assessed the performance of panels in detecting cirrhosis/severe fibrosis, five of which reported AUROCs. Four studies were external validations of previously derived panels [25][27][28][29][30]. Several panels (Fibrotest, Fibrometer, Hepascore, ELF) showed promise in the detection of cirrhosis with AUROCs >0.9, although one study was small (ELF, n = 64), and one panel showed no statistically significant difference from HA in direct comparison (Fibrotest). Common components of these panels are HA (in 3 panels), alpha macroglobulin (in 2 panels) and GGT (in 2 panels). One panel (Tran index) reported a very high specificity and PPV compared to other panels. Simpler panels with ≤3 components (for example PGA: Prothrombin Index, GGT and Apolipoprotein A1) performed as well as more complex panels. In a direct comparison, AUROCs for cirrhosis were PGA 0.89 vs Fibrotest 0.84 vs Hepascore 0.76, and for severe fibrosis/cirrhosis PGA 0.84 vs Fibrotest 0.80 vs Hepascore 0.83, although this was only in one small study [25].
(ii) Moderate/severe fibrosis (biopsy stages 2-4) The performance of eight panels was reported, of which three had AUROCs >0.8 in the detection of moderate/severe fibrosis. Three studies reported results for Fibrometer, with a varying range of AUROCs (0.96, 0.83, 0.82; total patients n = 416). Fibrotest AUROCs were 0.84, 0.83 and 0.79 (total n = 324), and it was not significantly more accurate than HA alone in direct comparison. Two studies reported results for Hepascore (AUCs 0.76, 0.83; total n = 321). Other panels had poorer performance in detecting moderately severe fibrosis.
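Because heterogeneity precluded pooling, the per-panel figures for moderate/severe fibrosis can only be summarised descriptively. The sketch below tabulates the AUROCs quoted in this section as a median and range per panel. The variable and function names (`reported_aurocs`, `summarise`) are illustrative assumptions, and the output is a descriptive summary rather than a pooled or meta-analytic estimate.

```python
from statistics import median

# AUROCs for moderate/severe fibrosis as quoted in the text above.
reported_aurocs = {
    "Fibrometer": [0.96, 0.83, 0.82],
    "Fibrotest": [0.84, 0.83, 0.79],
    "Hepascore": [0.76, 0.83],
}


def summarise(values):
    """Return a descriptive (not pooled) summary of reported AUROCs."""
    return {
        "n_studies": len(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


summary = {panel: summarise(v) for panel, v in reported_aurocs.items()}
for panel, s in summary.items():
    print(
        f"{panel}: median {s['median']:.3f} "
        f"(range {s['min']:.2f}-{s['max']:.2f}, {s['n_studies']} studies)"
    )
```

A median and range was chosen here because it matches how the review itself reports results across studies; it gives no weight to study size and carries no confidence interval.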

Discussion
A systematic review, using standard methodology, of the diagnostic performance of serum markers in identifying liver fibrosis on biopsy in patients with ALD found 15 primary studies. The evaluations used 13 different markers, as single markers (most commonly HA, n = 7) and in 10 marker panels. Serum markers were able to identify those people with severe fibrosis/cirrhosis with reasonable diagnostic accuracy (based on AUROCs). HA as a single marker performed well in identifying cirrhosis, as did some panels of markers. The performance of the serum markers was poorer at identifying lower grades of fibrosis, although few studies evaluated this. The paucity of the literature precluded further conclusions, and summative analysis was not possible due to study heterogeneity.
The evidence base for serum markers in ALD lags behind that for Hepatitis C and non alcoholic fatty liver disease. The studies are fewer in number, have fewer participants, vary considerably in inclusion criteria, and have a higher prevalence of cirrhosis/severe fibrosis than similar studies in Hepatitis C and NAFLD. They also tend to be older than studies in other liver disease aetiologies, and so are less informed by recent advances in the rigour and standardisation required in the design and reporting of diagnostic studies [31]. More recent studies have evaluated panels (two of which were external validation studies). Panels varied in their individual constituents, and in the number of components. Generally the AUROCs of panel tests in predicting cirrhosis/severe fibrosis in patients with ALD are comparable with those in NAFLD or Hepatitis C. For example, in a meta-analysis of Fibrotest in Hepatitis C the mean AUROC for predicting significant fibrosis was reported as 0.77 (95% CI 0.75, 0.79) and in NAFLD 0.81 (95% CI 0.74, 0.86) [2], with a summary AUROC for cirrhosis of 0.82 [32]. Certain panels such as APRI seem to perform less well in ALD than in Hepatitis C. The summary AUROC for significant fibrosis was reported as 0.76 (95% CI 0.74, 0.79) and for cirrhosis 0.82 (95% CI 0.79, 0.86) [33,34].
There have been reports in the literature of an effect of current heavy alcohol consumption on circulating serum markers, which may limit their performance in identifying the chronic effect of alcohol on fibrosis in patients who may be current drinkers. The mode of action of alcohol on the markers is unclear. Animal models have shown that alcohol may affect serum markers such as HA in several ways: by altering communication between liver cells, thereby affecting HA clearance, and by directly inducing hepatic sinusoidal endothelial cell dysfunction [35,36]. Studies have shown that some markers are more susceptible to the influence of acute consumption, but results are not consistent. One study reported that some markers are affected (tenascin, laminin), some are unaffected (PIIINP, TIMP1), and some are very variable (HA) [37]. One small study reported that mean levels of PIIINP but not TIMP1 rise with abstinence [38]. This confirmed the results from an earlier study which showed a similar effect of alcohol on PIIINP [38]. Direct studies of the effects of alcohol on serum markers in clinical studies involve very small numbers, and few studies have reported in the last 5 years. Most alcohol status (where reported) is self reported, with some studies using collateral evidence when available. The included studies in this review did not all report current drinking status in detail. In 4 included studies patients were in-patients for alcohol withdrawal/rehabilitation; in 2 studies the patients were not abstinent. More data from large robust studies are needed to properly evaluate the influence of current alcohol intake (ideally quantified with objective measures/triangulated evidence) on markers, reporting results in terms of level of alcohol consumption and time of abstinence. A major concern in drawing overall conclusions from this review is the considerable heterogeneity of the study populations.
Whilst all included studies recruited patients from specialist clinics in secondary or tertiary settings (there were no studies set in primary care), there was variation in population characteristics, such as level of alcohol consumption, and in the prevalence of severe fibrosis. This may lead to spectrum bias influencing diagnostic performance and, additionally, affect generalisability. The design of the studies differed, with variation in recruitment methods and inclusion criteria. All patients had to have had a biopsy (from the inclusion criteria), which could introduce verification bias if patients with excess alcohol consumption who were not selected for biopsy had a different disease severity from those who were selected. Only four studies reported any parameters by which biopsy quality could be judged, and half of these reported findings stratified by biopsy quality. Even when the tests were similar between studies, the thresholds used were different or not reported. Direct comparison between studies was made more difficult by the use of a range of fibrosis staging systems, largely locally generated. There was heterogeneity and a lack of standardization in the analytical methods used for marker measurement, and as these different assays may not be well correlated, external validity may be reduced and the determination of a single generalisable threshold remains problematic for those markers assayed locally. Access to and availability of serum markers on commercial automated platforms may address this issue. There was incomplete reporting of co-morbidities and diagnostic test results, making appraisal and summative assessment difficult. The paucity of studies making direct comparisons between panels, and between single markers and panels, makes it difficult to say that one panel is more accurate than another.
It is clear from this systematic review that the current serum markers are promising, improving and may provide additional diagnostic information in the identification and management of people with ALD.
The limitations of this review include a lack of data to perform summative analyses and a focus on the ability of diagnostic tests to identify fibrosis alone. Detection of inflammation has not been addressed. Spectrum bias may have an impact on the performance characteristics of the tests, making direct comparisons between studies problematic, and this has not been directly addressed in this review. This is due to several problems in accounting for such bias. The first is the lack of a universally accepted system for dealing with this issue, especially in this group of patients with ALD. Some methodological suggestions have been published by one group in chronic Hepatitis C [39], who have used this method in a study in ALD patients [30]. The authors used a standard population with the same prevalence for all fibrosis stages, and currently it is unclear whether this has external validity or international acceptance by professionals working in this field. In addition, the studies included in this review are older, use different classification systems for histology and have inconsistent and incomplete reporting of the individual stages of study participants. All of this makes accounting for spectrum bias problematic, complex and of questionable validity in this review. However, it is an important issue and should be borne in mind when comparing results between studies.
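One simple reason to keep prevalence in mind when comparing studies is that predictive values change with prevalence even when sensitivity and specificity are held fixed. This is a simpler effect than spectrum bias proper, which concerns changes in sensitivity and specificity themselves. The sketch below applies Bayes' theorem across the range of background prevalence reported for the included studies (14%, 41% and 59%). The function name `predictive_values` and the sensitivity and specificity of 0.80 are illustrative assumptions, not results from any included study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics; prevalence values are the minimum,
# median and maximum background prevalence reported in the Results.
SENSITIVITY = 0.80
SPECIFICITY = 0.80
for prevalence in (0.14, 0.41, 0.59):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions the NPV falls and the PPV rises as prevalence increases, so a rule-out performance seen in a low-prevalence cohort would not necessarily carry over to a high-prevalence one.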

Clinical implications
For preventing and managing ALD it is important to identify those patients who are drinking hazardously and have clinically silent severe fibrosis/cirrhosis in order to focus interventions, to begin to screen for varices and hepatocellular carcinoma, or to prepare for possible liver transplant. Data presented in this review suggest that marker panels could be used effectively in this situation. It would be clinically useful to patients and clinicians to identify the proportion of hazardous drinkers who have developed liver disease, to monitor disease progress more closely and to offer an opportunity for strategies aimed at reduction/abstention. Repeated serum marker measurement showing a rise or decline in results may have an impact on lifestyle choices, again allowing scope for reduction in alcohol consumption. These are speculative ideas and require further research. This group of patients often has erratic attendance at outpatient and biopsy appointments and may present in settings where invasive tests are inappropriate/difficult (e.g. prison). Access to non-invasive tests of liver fibrosis would be useful in the management of such patients.

Future research
Large studies of patients with ALD need to be designed which can directly compare, and validate in external populations, the performance of existing markers, and which can support the identification of new markers or the enhancement of existing tests to identify any, mild or moderate fibrosis. For example, methods such as proteomics and metabonomics may identify markers that can be incorporated into existing or new panels of markers, either in isolation or in combination with quantitative imaging techniques (such as elastography). This process might be facilitated by establishing an international reference library and quality assurance scheme. The evaluation of diagnostic performance should be accompanied by parallel evaluation of test performance for properties such as reproducibility, stability and linearity. Further work is needed to ascertain the diagnostic performance of markers in the primary care setting. The limitations of liver biopsy may create a glass ceiling for potential noninvasive tests, and future studies should consider the use of clinical outcomes as the reference standard. The few studies reported in the literature on the performance of serum markers in ALD in predicting clinical outcomes rather than fibrosis have shown good performance for some panels of serum markers [27]. Fibrotest, Hepascore and Fibrometer A have been shown to be able to predict liver related mortality at 5 years and 10 years (AUC 0.79 (95% CI 0.68, 0.86), 0.77 (95% CI 0.69, 0.85) and 0.80 (95% CI 0.71, 0.87) respectively), at least as well as biopsy (AUC 0.77 (95% CI 0.70, 0.83)). Forns index, APRI and FIB 4 had lower performance in predicting liver related mortality, with AUCs of 0.40 (95% CI 0.30, 0.49), 0.60 (95% CI 0.50, 0.69) and 0.65 (95% CI 0.54, 0.74) respectively. In a smaller population of patients with ALD the ELF test has also shown an AUC of 0.80 (95% CI 0.70, 0.89) for liver related morbidity/mortality at 7 years (personal communication with authors).
Additional larger studies that can evaluate and compare performance of non invasive methods in predicting clinical outcomes in patients with ALD are needed.
In summary, none of the serum markers reported so far in the literature appears to perform very well for fibrosis less severe than moderate/severe fibrosis or cirrhosis. In general, performance decreases as the severity of fibrosis being identified or ruled out decreases. HA shows some promise as a single marker in ruling out cirrhosis and, to an extent, severe fibrosis, but it is hard to know what threshold to use. Other single markers perform less well when used alone. Some panels (Fibrometer, Fibrotest, Hepascore and ELF) show promise in diagnosing cirrhosis/severe fibrosis, but studies in ALD have small numbers.

Conclusion
A systematic evaluation of the evidence on the diagnostic performance of serum markers of fibrosis in ALD found only a few small published studies. These show that serum markers are able to identify cirrhosis/severe fibrosis with good diagnostic accuracy, although study heterogeneity in design and outcome precludes pooling. In clinical practice, this may allow earlier exclusion of liver damage in hazardous drinkers, permitting earlier and targeted interventions. The limitations of liver biopsy may create a glass ceiling for potential non-invasive tests, and in this regard more studies using clinical outcomes should be evaluated.