Water Resources esearch, Vol. 31, No. 12, pages 3229-3240, December 1995

Copyright 1995 by the American Geophysical Union
(Received December 7, 1994; revised August 11, 1995; accepted August 24, 1995.)
Paper number 95WRO2573. - 0043-1397/95/95WR-02573$05.00

FIELD EXAMPLE OF DATA FUSION IN SITE CHARACTERIZATION

by Sean A. McKenna and Eileen P. Poeter
Department of Geology and Geological Engineering, Colorado School of Mines, Golden

S. A. McKenna, Sandia National Laboratories, Geohydrology Department, P. 0. Box 5800, Mail Stop 1324, Albuquerque, NM 811851324. (e-mail:samcken@nwer.sandia.gov)

E. P. Poeter, Department of Geology and Geological Engineering, Colorado School of Mines, Golden, CO 80401. epoeter@mines.edu

Abstract

Application of data fusion to characterization of the Fountain and Lyons Formations at a field site incorporates geologic knowledge, geophysical log data, cross-hole seismic tomography, hydraulic test data, and observations of head to reduce uncertainty associated with subsurface interpretation. These formations consist of channel and overbank deposits that have undergone variable diagenesis, resulting in more hydrofacies than would have been encountered in the original, unaltered deposits. The disparate types of available data are integrated to yield a coherent hydrofacies classification through use of discriminant analysis and soft data techniques. This data fusion improves definition of the complex hydrofacies and increases knowledge of their spatial correlation. Two hundred multiple-indicator, conditional, stochastic simulations of the site are generated, 100 with only hard data and 100 with both hard and soft data. Forward groundwater flow modeling using estimates of hydraulic conductivity from field testing yields smaller head residuals for realizations which include soft data. Inverse modeling is used to eliminate hydrofacies realizations that do not honor hydraulic data and to estimate hydrofacies hydraulic conductivity ranges for the hard and hard/soft data ensembles. Inverse parameter estimation substantially decreases head residuals for both ensembles. Standard deviations of hydraulic conductivities estimated through inverse modeling are smaller when both hard and soft data are used to generate the simulations, even though head residuals are similar within the two ensembles when these estimated hydraulic conductivities are used.

Introduction

In contrast to describing heterogeneity with a lognormal distribution of hydraulic conductivity, the hydrofacies conceptualization allows more of the available data to be integrated into subsurface interpretation, and many characteristics of a deterministic model can be preserved in the stochastic representations. A hydrofacies is a homogeneous, possibly anisotropic geologic unit that is hydrogeologically meaningful for purposes of field experiments and modeling [Anderson, 1989]. Relationships between geologic fabric of a lithologic facies and hydrogeologic parameters of that facies are documented for a number of depositional environments [e.g., Davis et al., 1993]. The hydrofacies treatment of heterogeneities allows for the construction of subsurface models which contain sharp contrasts in hydraulic conductivity. It is these abrupt changes which control the migration of contaminants in groundwater. Recent interest in incorporating geologic knowledge into hydrogeologic studies suggests an increased use of the hydrofacies concept.

The hydrofacies treatment of hydrogeologic heterogeneity has several advantages over Gaussian and binary treatments. First of all, recognition that hydrogeologic properties are related to geologic properties allows use of often large, albeit empirical, amounts of geologic information. Second, in most cases, it is not yet practical to directly determine hydrologic properties from field geophysical techniques, but determination of geologic heterogeneity from geophysical surveys in groundwater studies is becoming commonplace [Olsen et al., 1993; McKenna and Poeter, 1995]. Recognizing the influence of geologic heterogeneity on hydrogeologic heterogeneity facilitates conceptualization of deterministic heterogeneities (e.g., channels of well-connected, coarse-grained, high-hydraulic conductivity pathways in alluvial deposits). This type of heterogeneity is difficult to model through techniques based on a Gaussian distribution of log hydraulic conductivities.

Desbarats and Srivastava [1991] indicate that the perceived expense and effort required to obtain data that adequately represent spatial structure of subsurface heterogeneity may preclude field application of stochastic methods, and Gelhar [1986] calls for better methods to determine the spatial correlation structure of hydraulic properties. The concept of data fusion has been used informally in both the groundwater and oil industries for subsurface characterization. Previous work describing methods of combining multiple types of information for subsurface characterization in groundwater studies include Brannan and Haselow [1993], Hoeksema and Kitanidis [19841, Hyndman et al. [1994], and James and Freeze [1993]. The approach for an objective combination of geophysical, geological, and hydrological data through geostatistical simulation and inverse parameter estimation in subsurface hydrological applications was outlined by Fogg [1989]. The field application presented here follows that approach through demonstrating application of such a method, based on the hydrofacies conceptualization, to a fluvial sandstone.

Geologic, geophysical, and hydrologic data are used to identify hydrofacies and define their spatial distribution. Multiple-indicator, stochastic simulation conditioned on hard and soft information produces realizations of hydrofacies geometries, each of which serves as a zonation pattern for inverse flow modeling. Results of inverse flow modeling are used to discard implausible realizations and estimate values of hydraulic conductivity within each hydrofacies. Fusion of disparate types of data can have a greater impact on uncertainty reduction than refinement of any single type of data.

Site Description

The field site is located in Golden, Colorado, and overlies two basal sedimentary formations in the Denver Basin: the Fountain Formation and the Lyons Formation. The units have a N-S strike direction and dip to the east. The contact between the two units lies in the eastern half of the site. The depositional environment of the Fountain Formation is interpreted as a braided stream system including three depositional facies: stream channel arkoses, floodplain claystones, and deltaiclacustrine feldspathic sandstones [Howard, 1966; Hubert, 1960; Weimer and Land, 1972]. The distribution of hydrofacies within the Fountain Formation is a complex function of the depositional processes and postdeposition diagenesis. The Lyons Formation is a fluvial deposit in this locale and consists of fine to medium-grained sandstone containing large amounts of arkosic conglomerate and minor lenticular beds of shale and mudstone [Weimer and Land, 1972].

Soft Data

A comprehensive overview of soft data is given by Alabert [1987], and further discussion is provided by Journel [1989]. The codification of soft data has been recently updated [Zhu and Journel, 1993]. These qualitative data are written as z*(x): an estimate of attribute z at location x. Three categories of soft data are recognized: type A soft data are values based on an imprecise measurement (e.g., lithologic facies based on a measurement of seismic velocity), type B soft data consist of recognized bounds on a value without information on the distribution of values between the bounds (e.g., from geophysical logs it is determined a unit is coarser grained than a silt and finer grained than a gravel) type C soft data consist of a prior probability distribution on the variable (e.g., it is known from previous studies that the distribution of hydraulic conductivity values for a sandstone aquifer is lognormal with a given mean and variance). Type B soft data were not used in this study, and they are not discussed further.

Indicator Coding of Data

Indicator coding of information does not call for knowledge of the character of prior and/or posterior distributions. For hard data each discrete class will contain a 0 or 1. Type A soft data are also encoded with a 0 or 1 at each class, but each "1" has a nonnegligible probability that the true corresponding indicator is actually an "0" as determined by misclassification probabilities p₁ and p₂ [Alabert, 1987]:

where z(x) is a binary function defined for threshold z_c and i(x, z_c) is the indicator value at each location and threshold For type A soft data both hard and soft data can be sampled in the same area, and the quality of the experiment can be assessed by determining probabilities of misclassifying the attribute using soft data via calibration samples (collocated data). For a given threshold, if z*(x) is a good measure of z(x), p₁ approaches 1 and p₂ approaches 0. Type C soft data can be thought of as a posterior distribution, and in this case, coded vectors are complete and contain indicator values between 0 and 1 [Journel, 1986].

Improving Covariance Estimates With Soft Data

In many subsurface studies all data come from wells; thus data are plentiful in the vertical direction, but sparse in the horizontal, rendering it difficult to determine spatial covariance in horizontal directions. This problem can be alleviated by using soft data which can be plentiful in the horizontal dimension (e.g., geophysical data collected in surface or cross-hole surveys).

There are alternative approaches for using type A soft data in estimation of the covariance. In this case we estimate the true hard data covariance (C_I) as a linear combination of the sample hard data autocovariance, the soft-hard crosscovariance, and the soft data autocovariance. The hard-soft cross-covariance is scaled by the quantity p₁ - p₂, and the soft data autocovariance is scaled by (p₁ - p₂)². Covariance estimation through this technique follows the theory put forth by Alabert [1987] and employed by James and Freeze [1993]. The use of soft data to estimate the covariance allows the maximum number of pairs available for estimation to be N_h² + N_hN_s + N_s², where N_h and N_s refer are the number of hard and soft data respectively, rather than just N_h² if only hard data are available.

Type C data are incorporated into covariance estimates by weighting each sample location by its probability of occurrence for the given cutoff in a manner similar to that presented by Kulkarni [1984] for fuzzy bits, using the standard variogram (or covariance) equation but weighting the calculation for each pair of locations. Each indicator is coded as 1.0, and a corresponding vector of probabilities is defined from the prior cumulative density function values. The coefficient weight k(h), is given by

If the probabilities are determined via expert opinion, then it is assumed that the expert's degree of confidence is incorporated into the estimate of k(h). The indicator covariance for a given threshold is

Where m_-h and m_+h are the tail and head means, respectively, following the naming convention of Deutsch and Journel [1992] with the modification that the probability of the indicator value being

z_c is used in the calculations rather than the indicator value.

Estimating the Z cpdf With Soft Data

The estimates of p₁ and p₂ can also be used in determining the shape of the cumulative posterior distribution function (cpdf) at each location within the geostatistical simulation. The value of the cpdf at each threshold is estimated by a linear combination of hard data and soft data kriging system solutions. Each soft data point is scaled by p₁ and p₂ [Alabert, 1987]:

where both

and

are kriging weights derived from solution of a single kriging matrix, such that

Equations (4) and (5) describe cokriging of hard and soft data with a single unbiasedness constraint. This cokriging through a single matrix is possible by invoking the "Markov approximation" which states that when hard and soft data are collocated, the hard data will prevail over the soft data [Zhu and Journel, 1993]. The hard and soft indicator values are i and i* respectively. for each location and cutoff. The cpdf may not always satisfy constraints on cdf's (i.e., may not always yield values less than 1). especially when the soft information is poor and/or the number of calibration samples is small In these cases, the cdf is corrected [Deutsch and Journel, 1992].

For type C data the prior cdf is reproduced exactly as the posterior cdf independent of the proximity of any conditioning data. The posterior cdf is used in a Monte Carlo process to determine the simulated value at that location. The simulated data are then used as conditioning data for the simulation of other points.

Data Collection and Analysis

Sixteen wells (Figure 1), ranging from 30.5 to 48.S m deep (and including 122 m of core), exhibit water levels 10-12 feet (3.0-3.7 m) below ground elevation and reveal bedding planes dipping an average of 33° to the east [McKenna, 1994]. Geophysical logs from the wells include neutron-epithermal neutron, natural gamma, electrical conductivity, and sonic functions. Available data include small-scale air permeability measurements from core samples, larger-scale hydraulic conductivity measurements from downhole packer tests, and cross-borehole seismic tomograms from eight well pairs. Hydraulic heads are measured in 28 locations: Eight of these are from open wells and the remaining 20 are from 1.5- to 3.0-m-long screens in hydraulically sealed zones.

Figure 1. Site map including well locations, tomograph lines. and groundwater table elevation.

Given the extensive spatial coverage of the seismic tomography on site, it is desirable to define hydrologically significant units in terms of seismic velocity. Seven seismic threshold levels, separating eight sonic facies, are defined by comparing the sonic logs to the recovered core and also examining the polymodality of the sonic log measurements. Examination of the other available logs in the context of these eight sonic facies reveals that three of the sonic facies contain bimodal distributions of one of the other geophysical logs. Dividing these three sonic facies into two separate facies each results in 11 electrofacies. An electrofacies is defined as a composite Geophysical log response which defines a portion of the subsurface [Moline et al., 1992]. The electrofacies approach to defining units allows the use of the Geophysical logs to define facies at locations where core has not been collected. Relative to examination by a geologist, electrofacies classification can provide a more objective discrimination of units in locations where core samples exist, especially when the geophysical log responses are indicative of rock properties important to fluid flow (e.g., clay content or porosity).

Multivariate discriminant analysis is employed to examine the similarities and differences between the 11 electrofacies. Discriminant analysis defines the centroid of each electrofacies in four-dimensional space where each type of geophysical log is a dimension. For each data point the probability of membership within each electrofacies is calculated as a distance between the data point and the centroid of each facies. The electrofacies groupings are then examined in terms of similarity. Similarity is defined as a small distance between two electrofacies centroids. Electrofacies can be combined if they are statistically and geologically similar and/or if they are statistically indistinguishable in terms of small-scale air permeability measurements.

A portable air permeameter based on the design described by Davis et al. [1994] was constructed for this project. The air permeameter was used to obtain 237 small-scale permeability measurements on core samples in the laboratory. Hypothesis testing on the means and variances of permeability was used to determined whether or not electrofacies could be combined into hydrofacies. Combining electrofacies in this manner adds geologic and hydrologic information and reduces the number of facies to seven.

The hydrogeologic character of these seven hydrofacies is described in Table 1. The hydrofacies can be distinguished by using thresholds of seismic velocity (Table 2). Figure 2 shows the distribution of seismic velocity within each facies. The probabilities of class membership as determined by discriminant analysis for each datum are updated by summing the probabilities of membership for electrofacies which were combined into hydrofacies. If the final probability of class membership is greater than or equal to 95% at any location, that uncertainty is deemed negligible, and the datum from the location is determined to be a hard datum. The remaining locations are used as type C data with the probabilities of class membership used to build the prior cdf at each location. Of a total of 1355 locations, 983 were determined to be hard data, and the remaining 372 locations are classified as type C data.

Figure 2. Box plots showing the distribution of seismic velocity within each facies. The dashed line is the median of each distribution and the boxes show the 10th, 25th, 75th, and 90th percentiles of the distribution.

**Table 1.** Hydrofacies Classifications
Hydrofacies	Description	Estimated Mean K, m/d
1	conglomeratic member of the Lyons Formation	3.1 X 10^{-4 a}
2	well-cemented, poorly sorted, sandstone with gravel conglomerate lenses	3.1 X 10^{-4 b}
3	sandstone member of the Lyons Formation	9.0 X 10^{-4 c}
4	well-cemented, poorly sorted, sandstone	1.2 x 10^{-3 d}
5	moderately consolidated, porous material with low to moderate clay content	1.2 x 10^{-2 e}
6	combination of well-sorted, silty sandstone; moderately to poorly consolidated, poorly sorted sandstone; fine gravel conglomerate lenses; and siltstone lenses	1.2 x 10^{-3 f}
7	poorly consolidated, moderately porous, clay poor material, weathered or fractured	1.2 x 10^{-1 e}
8	a well-fractured area of any previous hydrofacies	2.6^g

^a Based on similarity to hydrofacies 2.

^b Based on two packer tests.

^c Based on geologic inference suggesting the value falls between hydrofacies 1 and 4.

^d Based on one packer test.

^e Based on the assumption that the lower the sonic velocity, the less the amount of consolidation and the greater the K value; however, it is noted that other factors including grain size distribution and clay content may control the K value.

^f Based on seven packer tests.

^g Based on one packer test.

**Table 2.** Seismic Thresholds Dividing Hydrofacies and the Median Seismic Velocity Within Each Hydrofacies
Hydrofacies	Threshold Identifier	Seismic Threshold, m/s	Median Hydrofacies Seismic Velocity, m/s

1			3610
	1.5	3313
2			3060
	2.5	3050
3			3020
	3.5	2820
4			2690
	4.5	2610
5			2520
	5.5	2460
6			2310
	6.5	2210
7			2030
	7.5	1850
8			1835

The threshold identifiers are arbitrary values indicating which hydrofacies are divided by the threshold (e.g., 3.5 is the facies threshold between hydrofacies 3 and 4).

Cross-Borehole Seismic Tomography

Cross-borehole seismic tomography can provide high-resolution images of aquifer heterogeneities without the limitations of traditional surface seismic surveys. Cross-borehole tomography techniques have been anticipated as a means of heterogeneity characterization [Phillips et al., 1989], but the field application of seismic tomography to heterogeneity mapping in groundwater studies is in its infancy. Eight cross borehole seismic transmission tomography surveys were conducted to obtain high-resolution images (maximum resolution, 1.3

0.4 m) of seismic velocity between boreholes (Plates 1 and 2). Inversion of cross-borehole seismic travel times is a nonlinear problem; thus an iterative, conjugate gradient, inversion algorithm [Schneider, 1990] is used to invert arrival times and solve for the velocity image in the interwell domain. In this project. tomography is used to derive soft data estimates of hydrofacies between wells, so a deterministic interpretation of the tomographic images is not warranted. However, the general character of the hydrofacies with extreme values of hydraulic conductivity is assessed.

Plate 1. Tomogram along strike direction between borings T-2 and T-1 with velocity range from 2360 to 4290 m/s. Values below 2500 m/s are dark blue and above 4000 m/s are red.

Plate 2. Composite of four tomograms along the dip direction with velocity range from 1855 to 4820 m/s. Velocities below 2000 m/s are dark blue. Velocities above 3700 m/s are red. 'The well labels refer to the well locations in Figure 1.

On the basis of one packer test, the hydraulic conductivity of facies 8 appears to be at least an order of magnitude higher than the other facies. This estimate of hydraulic conductivity is reasonable given that facies 8 is fractured material. Facies 8 was not identified by the geophysical log responses incorporated in the discriminant analysis owing to their limited volume of investigation, but is identified by full waveform acoustic logs and by the cross-borehole tomography. Limited lateral extent of this fracture zone in the strike distance is delineated by the low-velocity zone (blue) at the 1793-m elevation between borings T-1 and T-2 (Plate 1). The lowest values of hydraulic conductivity (highest-velocity zones shown as red/orange) occur within strongly cemented, coarse-grained channel conglomerates, such as the Lyons conglomerate (hydrofacies 1), which is continuous across the interwell domain in the strike direction above the fractured zone and discontinuous below the fractured zone. Plate 1 reveals that lower-velocity hydrofacies are continuous across the upper portion of the interwell domain. A mosaic of four tomograms completed in adjacent wells situated approximately along dip direction (Plate 2) reveals locations of coarse-grained conglomerates in the east, as defined by high velocity (red) zones. The Lyons/Fountain contact occurs in the vicinity of well T-10. The high velocity (red) conglomerate units (hydrofacies 1) do not form a continuous zone of low K. Locations of high-velocity zones are interpreted as the expression of a channel belt meandering in and out of the plane of the tomogram cross section.

Seismic velocities derived from the tomography are used as imprecise estimates of hydrofacies (type A soft data) based on their calibration at the wells with collocated hard data (velocities measured with sonic logs) (Figure 3). High-quality soft data are obtained at the previously defined thresholds (Table 2), with calculated values for p₁ ranging from 0.83 to 0.96 and for p₂ from 0.02 to 0.07. A calibration index can be defined as the quantity [p₁ - p₂]. If this index equals 1.0, there is a perfect calibration between the hard and soft data and the soft information can be considered hard data. If the calibration index equals 0.0, the soft data provide no information on the hard data. The quality of soft data at this site is good as indicated by the [p₁ - p₂] indices of 0.81, 0.80, 0.80, 0.84, 0.89, 0.94, and 0.92 for thresholds 1.5-7.5, respectively. The calibration indices were calculated at the only places possible (the wells). Owing to the geometry of the inversion process, the quality of the indices will be nonstationary. Incorporation of nonstationary indices into the geostatistical simulation routine is beyond the scope of this study. The values of the indices at the wells were used because those are the only data and any increase in the uncertainty of these indices would be arbitrary.

Figure 3. Calibration scattergram for tomography data (45' line is shown for reference). The seismic velocity values for the thresholds are shown in Table 2. A total of 324 collocated data pairs are shown.

Variography

A total of 983 hard data locations defining seven separate hydrofacies are identified through the combination of geologic, hydrologic, and geophysical information as described above (Plate 3a). Using these hard data, three-directional variograms are calculated and modeled for each of the six hard data, indicator thresholds. Cross-borehole tomographic inversion produces a total of 3530 imprecise estimates of hydrofacies (type A data), including an eighth hydrofacies. These are combined with the 372 prior probability distributions (type C data) on hydrofacies membership obtained by discriminant analysis and the hard data (Plate 3b) to calculate and model three-directional hard/soft data semivariograms at the seven hard/ soft data indicator thresholds. Example variograms at the median threshold (4.5) and the highest threshold (6.5 for the hard data, 7.5 for the hard/soft data) are shown in Figure 4.

Plate 3. Three-dimensional distribution of (a) hard data (hydrofacies 1 through 7 are dark blue through orange) and (b) hard data with soft data included (hydrofacies 1 through 8 are dark blue through red). The domain is 123.4 by 93.0 by 43.9 m (east-west, north-south, vertical). North is to the upper left.

Figure 4. Sample experimental and modeled variograms in each direction at the median threshold (4.5) and the highest threshold (6.5 for hard data, 7.5 for hard/soft data).

All measures of spatial correlation were calculated as covariances. These covariances were then transformed to variograms prior to modeling using a nonergodic transform [Isaaks and Srivastava, 1988]. All variograms were modeled with two nested structures to permit flexibility in fitting the experimental data. In general, the hard data variograms exhibit trends and were fitted with a spherical model for the first structure and a Gaussian model with a long range for the second structure (see hard data vertical variograms in Figure 4). The Gaussian model with a long range permits the model variogram to follow the trend. The hard/soft data variograms were fit with two spherical models. The overall effect of adding the soft information to the variogram calculations is to better define the structure in the horizontal directions (compare the hard data and hard/soft data N-S and E-W variograms at threshold 4.5 in Figure 4) and to remove trends seen in the hard data variograms. To honor the zonal anisotropy with a constant sill in all directions, it was sometimes necessary to have the model variogram deviate from the experimental variogram. In these cases, search radii within the simulation program were set to be no greater than the distance at which the experimental and modeled variograms begin to deviate. For example, the hard/ soft data vertical variogram model at the 4.5 threshold begins to deviate from the experimental data at approximately 8 m (Figure 4). In this case the search radius was set to 7.5 m in the simulation program.

Most of the horizontal semivariograms are modeled with isotropic ranges. A principal direction may exist, but it cannot be identified owing to an inadequate amount of data. Hard data experimental semivariograms calculated at different dip angles along the dip direction do not yield different ranges; thus a dip of zero degrees is used. There may be insufficient data for defining a dip angle, or the dip angle may be masked by irregular diagenesis of the original sedimentary facies. In contrast, hard/soft data experimental variograms calculated in the dip direction exhibit a maximum spatial correlation between 30° and 36° of dip. The additional soft information is adequate to define the angle of geologic dip.

Geostatistical Simulation

Hard and hard/soft data and their associated global cumulative distribution functions (cdf's) and three-directional semivariograms at each threshold are used to generate two ensembles of realizations (one based solely on hard data and another on hard/soft data). The site domain is 123.4 by 93.0 by 43.9 m (x, y, z) and is discretized into an 81 x 61 x 72 grid (x, y, z) of blocks that are 1.52 X 1.52 X 0.6 m, yielding 355,752 elements. There is a large discrepancy between cdf's estimated from the hard and hard/soft data (Table 3), primarily with regard to hydrofacies 7 which comprises approximately 50% of the hard data samples, but only 13% of hard/soft data samples. This change in cdf between data sets is attributed to a less biased data sample owing to greater spatial coverage provided by type A soft data.

**Table 3.** Global cdf's for Hydrofacies
Hard Data		Hard and Soft Data
Threshold	cdf Value	Threshold	cdf Value
1.5	0.164	1.5	0.217
2.5	0.237	2.5	0.295
3.5	0.271	3.5	0.452
4.5	0.369	4.5	0.652
5.5	0.389	5.5	0.749
6.5	0.507	6.5	0.848
		7.5	0.981

The seven facies determined with the hard data are simulated using ordinary kriging for the entire simulation. At least three points must be within the search neighborhood to solve the kriging system (if only one or two points are found, the prior estimate of the global cdf is reproduced as the posterior local cdf). The maximum number of points retained from the search neighborhood is 24 (three per octant). Search radii are chosen to both retain the directions of anisotropy revealed in the variogram modeling and be large enough to provide an adequate search neighborhood without using excessive amounts of computer memory.

Most of the simulation parameters for hard/soft data simulations have the same values as the hard data simulations with the exception that the global estimate of the prior cdf is used only if no conditioning points are found in the search neighborhood. This produces simulations with a less random appearance when using the hard/soft conditioning data. Eight hydrofacies as identified by the hard and soft data are simulated.

Simulation Results

A typical example of a hard and hard/soft data simulation is presented in Plate 4, which shows the middle layer of the three-dimensional domain. Realizations generated with hard and hard/soft data are strikingly different, as would be expected from the large differences in their prior cdf s. Hard data images all show clusters of hydrofacies 1 and 2 (blues) in the northeast corner (Plate 4a). Other than this feature, the realizations vary between images but always have a large amount of hydrofacies 7 (orange). Hard/soft data images show less variation between simulations relative to hard data realizations, have much less hydrofacies 7 (orange), and include zones of hydrofacies 8 (red) (Plate 4b), which are never present in hard data simulations (Plate 4a). The apparent shorter E-W correlation in Plate 4b relative to Plate 4a is due to the Plate 4b image's being created with a dipping search ellipse and variograms.

Plate 4. (a) A typical realization of the middle layer from a hard data simulation. (b) A tyical realization of the middle layer from a hard/soft data simulation. The images are 123.4 by 93.0 m, and north is at the top of the figure.

In lieu of physically examining the reality which the simulations attempt to describe, other measures of simulation quality are used. For example, the closeness of fit between hydrologic observations and flow modeling results when the simulations are used for zonations in the flow model is discussed later. Another approach to evaluating quality of the simulations is the similarity of the prior and posterior global cdf's and spatial covariances.

In the simplest sense, geostatistical simulation attempts to reproduce the global prior cdf and spatial covariance that are used as input to the simulator. However, these input values are only estimates of the true values derived from a limited sample data set. Inference of the global statistics from a limited sample set raises the question of how well the global cdf and covariances should be reproduced [Deutsch and Journel, 1992]. Certainly, it is not desirable to exactly reproduce the input parameters in every realization because they are derived from a limited sample. Ideally, the estimates of the parameters derived from the resulting simulations will show "ergodic fluctuations" around the input parameter values due to the stochastic nature of the simulation process.

Each of the 200 resultant simulations are resampled by obtaining data from 50 randomly placed "wells" and recalculating the cdf and semivariograms. Inclusion of the soft data resulted in improved reproduction of the cdf as compared to using the hard data. In all cases (directions and thresholds), the soft data realizations yield better reproductions of the input semivariogram model than do the hard data realizations. Improvements resulting from soft data incorporation are attributed to the soft data's being less biased than the limited hard data.

Flow Modeling

Forward flow modeling is undertaken to examine reduction of uncertainty in head prediction due solely to incorporation of soft information into the subsurface characterization. Inverse flow modeling is conducted to determine the influence of additional soft data on estimation of hydraulic conductivities within the geostatistically simulated hydrofacies.

Model Geometry and Boundary Conditions

The groundwater gradient is essentially west to east across the site (Figure 1); thus fixed head boundaries are imposed at the west and east boundaries with zero-flux boundaries along the north and south boundaries to simulate easterly flow. The type of conditions that can be applied on the upper domain boundary are limited by the inverse groundwater flow model (MODFLOWP) which currently cannot accommodate convertible layers (i.e., layers that are both confined and unconfined in space and/or time). Recharge in the vicinity of the site [Kiusalaas, 1992] is essentially zero. For these reasons the top boundary of the aquifer is modeled as a confined system with zero recharge. Rubin and Dagan [1988] show that fixed head boundary conditions can have an adverse effect on predicted heads in heterogeneous media. To surmount this problem, "buffer zones" were added to the west and east ends of the model.

The full-scale geostatistical realizations of 355,752 elements are too large to solve for flow in many realizations with the available facilities in a reasonable time frame; thus the geostatistical realizations are upscaled to cell dimensions of 6.1 x 6.1 x 0.6 m (x, y, z). Incorporation of 16 of the fine-scale cells into one coarse cell was accomplished by assigning to the larger cell the hydrofacies which was most common within its boundaries. Upscaling, reduces the total number of cells by more than 90%.

Forward Modeling

The field estimates of hydraulic conductivity and measurements of head are used to model groundwater flow and determine head residuals. The only difference between the hard and hard/soft data forward models is zonation of the hydrofacies. This difference in zonation due to incorporation of soft information reduces the mean average absolute error on the predicted heads by 23% (from 3.8 to 2.9 m) (Figures 5a and 5b).

Figure 5. Distributions of average absolute head deviations for (a) the hard data and (b) the hard/soft data realizations resulting from forward Groundwater flow modeling using field estimates of K and distributions of average absolute head deviations after inverse parameter estimation for (c) the hard data and (d) the hard/soft data realizations. Note the change in the dimensions of the X axis from Figures 5a and 5b to Figures 5c and 5d. Dashed lines indicate the mean value.

Parameter Estimation

In this project, parameter estimation is accomplished with a nonlinear regression algorithm implemented in MODFLOWP [Hill, 1992]. In addition to providing estimates of hydraulic conductivities for each hydrofacies, inverse modeling provides a means of eliminating geostatistical realizations which are inconsistent with hydrogeologic data.

Average absolute head deviation obtained from forward flow modeling is large, even after incorporation of soft information into the zonation process (Figures 5a and 5b); however, head deviations are minimized through calibration. The estimates of the hydraulic conductivities made with available field data and seismic velocity information (Table 1) do not provide the optimum fit to the measured heads. Trial and error calibration of a few realizations are used to obtain updated hydraulic conductivity estimates for inverse modeling. The field estimates and initial estimates of hydraulic conductivities for both hard and hard/soft data realizations are given in Table 4. Hydraulic conductivities of hydrofacies 6 and 8 are thought to be known with greater confidence (Table 1) relative to other hydrofacies; therefore they are not estimated.

**Table 4.** Field and Initial Estimates of Hydraulic Conductivity for Inverse Modeling
Data Set	1	2	3	4	5	6	7	8
	Hydrofacies
Field	-3.5	-3.5	-3.0	-2.9	-1.9	-2.9	-0.9	0.4
Hard	-1.8	-1.9	-1.1	-1.9	-1.9	-2.9	-0.5	. . .
Hard/soft	-2.8	-2.9	-2.1	-4.9	-5.9	-2.9	-3.1	0.4

Rules are developed for acceptable realizations: (1) hydrofacies I (well-cemented conglomerates) must have lower hydraulic conductivity than hydrofacies 3 (less consolidated sandstones); (2) hydrofacies 8 (fracture zones) must have the highest estimated hydraulic conductivity of any hydrofacies; and (3) after applying criteria 1 and 2, the half of the remaining solutions that has the smallest head residuals is retained.

After applying these elimination criteria, the hard data ensemble is reduced from 100 to 21 plausible realizations, and the hard/soft data ensemble from 100 to 19. After parameter estimation, residual head distributions are much smaller (Figures 5c and 5d) than those produced by forward modeling (Figures 5a and 5b). The average absolute head deviations are similar for both the hard (mean 0.73 m, standard deviation 0.05 m) and hard/soft (mean 0.71 m, standard deviation 0.05 m) simulations. Because of the similar head residuals after parameter estimation, it can be argued that the hard data alone produced the same quality realizations. However, estimates of hydraulic conductivity calculated from hard data realizations are different than estimates obtained from hard/soft data realizations and have greater standard deviations (with the exception of hydrofacies 4 where the standard deviations are roughly equal), suggesting less certainty in the result when only hard data are used (Figure 6). The decrease in variance of estimated hydraulic conductivity associated with the hard/soft data simulations is attributed to reduction in (geologic uncertainty resulting from use of soft data. The median value of hydraulic conductivity for every hydrofacies is higher for the hard versus the hard/soft data realizations. One factor that contributes to this difference is the absence of the highest hydraulic conductivity hydrofacies (hydrofacies 8) in the hard data realizations.

Figure 6. Box and whisker plots of hydraulic conductivity estimates for (a) the 21 hard data and (b) the 19 hard/soft data realizations. The dashed lines indicate the median, the ends of the box are the 25th and 75th quartiles, and the ends of the whiskers are the minimum and maximum of the distribution. Values next to the whiskers are the standard deviations. The hydraulic conductivity value of facies 8, which was not estimated, is shown as a solid circle in Figure 6b.

There is no overlap in distributions of hydraulic conductivity estimates in hydrofacies 4 and 5 from the hard and hard/soft realization ensembles (Figure 6), indicating that one of the approaches (hard data versus hard/soft) is inaccurate for these two hydrofacies. Hydraulic conductivity field data are insufficient to determine which assessment is inaccurate. One explanation for the lower hydraulic conductivities of facies 4 and 5 in the hard/soft data simulations is that they are more continuous across the site than was predicted by the hard data simulations. A model with well-connected bodies of facies 4 and 5 requires lower hydraulic conductivities in order to match the observed heads than if the zonation of these facies was very disconnected.

An even distribution of hydraulic conductivity tests between facies cannot be obtained during the first phase of a field investigation because the first phase data are needed to define the facies. Once the facies are delineated, subsequent field investigations can be designed to include tests in all of the facies. Eighteen field scale hydraulic conductivity tests were completed at this site. One of these is in facies 4, and none are available for facies 5. If this study were extended, further testing would be conducted in each facies in order to develop stricter rules for selection of plausible realizations. If such information were available, it might be determined that all of the realizations in one of the ensembles are unacceptable, as was determined by Poeter and McKenna [1995] for a synthetic data set.

To obtain unique estimates of hydraulic conductivities, head observations and either flow observations or prior estimates of parameters are required. In this study, 28 head observations are available, and hydraulic conductivity is known for facies 6 and 8. No significant correlation of hydraulic conductivity existed between any of the facies, indicating that seismic velocities are valuable for delineating hydrofacies at this site.

Conclusion

This field study demonstrates how techniques for incorporating soft data can objectively combine geologic and geophysical information into a description of subsurface heterogeneity. The relationship between hydrofacies and seismic velocity is described with indicator thresholds. This information is used with seismic tomography measurements to further condition multiple-indicator, geostatistical realizations. Hydrologic information is utilized in hydrofacies definition through both air permeability measurements and determination of the plausibility of each stochastic, hydrofacies zonation via parameter estimation using nonlinear regression and application of hydrogeologic rules. The change between zonation patterns of the hydrofacies from the hard data to the hard/soft data realizations causes large changes in the estimated hydraulic conductivities in facies 4 and 5, indicating that hydraulic conductivities estimated through inverse modeling are very sensitive to fine-scale, geologically based zonation patterns. This is fortunate because it allows use of hydraulic data to delineate the small-scale heterogeneity that is critical to the migration of contaminants. The character of this detailed heterogeneity is not considered when either Gaussian hydraulic conductivity distributions or the broad zones typically defined by deterministic conceptual model development are utilized.

The combination of geophysical and geostatistical techniques allows for the definition of facies with distinct hydrological properties. This definition of distinct parameters is corroborated by the inverse parameter estimation because the hydraulic conductivity of each facies is not correlated with the hydraulic conductivity of other facies. Although the combination of geophysical techniques determines hydrologically distinct facies, the geophysical techniques currently do not provide values of hydraulic conductivity within the facies. Inverse modeling uses the observations of the hydraulic head across the site to estimate values of hydraulic conductivity for each facies given the plausible zonation patterns based on the geophysical, geological, and hydrological information. This data fusion constrains the solution (i.e., reduces uncertainty associated with the response function) to a greater degree than can be achieved through refined analysis and interpretation of any single data source.

Acknowledgments

This work was partially supported by the U.S. Bureau of Reclamation under agreement 1-FC-81-17500 and the U.S. Army Waterways Experiment Station. This work benefitted from reviews by G. Fogg and two anonymous reviewers.

References

Alabert, F., Stochastic imaging of spatial distributions using hard and soft information, 197 pp., M.S. thesis, Stanford Univ., Stanford, Calif., 1987.

Anderson, M. P., Hydrogeologic facies models to delineate large-scale spatial trends in glacial and glaciofluvial sediments, Geol. Soc. Am. Bull., 101, 501-511, 1989.

Brannan, J. R., and J. S. Haselow, Compound random field models of multiple scale hydraulic conductivity, Water Resour. Res., 29(2), 365-372, 1993.

Davis, J. M., R. C. Lohmann, F. M. Phillips, J. L. Wilson, and D. W. Love, Architecture of the Sierra Ladrones Formation, central New Mexico: Depositional controls on the permeability correlation structure, Geol. Soc. Am. Bull., 105, 998-1007, 1993.

Davis, J. M., J. L. Wilson, and F. M. Phillips, A portable airminipermeameter for rapid in-situ field measurements, Ground Water, 32(2), 258-266, 1994.

Desbarats, A. J., and R. M. Srivastava, Geostatistical characterization of groundwater flow parameters in a simulated aquifer, Water Resour. Res., 27(5), 687-698, 1991.

Deutsch, C. V., and A. G. Journel, GSLIB: Geostatistical Software Library and User's Guide, 340 pp., Oxford Univ. Press, New York, 1992.

Fogg, G. E., Emergence of geologic and stochastic approaches for characterization of heterogeneous aquifers, paper presented at the Conference on New Field Techniques for Quantifying the Physical and Chemical Properties of Heterogeneous Aquifers, Natl. Water Well Assoc., Dublin, Ohio, 1989.

Gelhar, L. W., Stochastic subsurface hydrology from theory to applications, Water Resour. Res., 22(9), 135S-145S, 1986.

Hill, M. C., A computer program (MODFLOWP) for estimating parameters of a transient, three-dimensional, groundwater flow model using nonlinear regression, U.S. Geol. Surv. Open File Rep., 91-484, 358 pp., 1992.

Hoeksema, R. J., and P. K. Kitanidis, An application of the geostatistical approach to the inverse problem in two-dimensional groundwater modeling, Water Resour. Res., i(7), 1003-1020, 1984.

Howard, J. D., Patterns of sediment dispersal in the Fountain Formation of Colorado, Mt. Geol., 3(4), 147-153, 1966.

Hubert, J. F., Petrology of the Fountain and Lyons Formations, Front Range, Colorado, Q. Colo. Sch. Mines, 55(1), 1-238, 1960.

Hyndman, D. W., J. M. Harris, and S. M. Gorelick, Coupled seismic and tracer test inversion for aquifer property characterization, Water Resour. Res., 30(7), 1965-1977, 1994.

Isaaks, E. H., and R. M. Srivastava, Spatial continuity measures for probabilistic and deterministic geostatistics, Math. Geol., 20(4), 313-341, 1988.

James, B. R., and R. A. Freeze, The worth of data in predicting aquitard continuity in hydrogeological design, Water Resour. Res., 29(7), 2049-2065, 1993.

Journel, A. G., Constrained interpolation and qualitative information, Math. Geol., 18(3), 269-286, 1986.

Journel, A. G., Fundamentals of geostatistics: In five lessons, Short Course Geol., vol. 8, 40 pp., AGU, Washington, D. C., 1989.

Kiusalaas, N. J., Estimation of groundwater recharge using neutron probe moisture readings near Golden, CO, M.E. thesis ER-4191, 186 pp., Colo. Sch. of Mines, Golden, 1992.

Kulkarni, R., Bayesian kriging in geotechnical problems, in Geostat-Tahoe, part 2, pp. 261-270, D. Reidel, Norwell, Mass., 1984.

McKenna, S. A., Utilization of soft data for uncertainty reduction in groundwater flow and transport modeling, Ph.D. dissertation T-4291, 325 pp., Colo. Sch. of Mines, Golden, 1994.

McKenna, S. A., and E. P. Poeter, Subsurface characterization through cross-borehole seismic tomography, Ground Water, in press, 1995.

Moline, G. R., J. M. Bahr, P. A. Drzewiecki, and L. D. Shepherd, Identification and characterization of pressure seals through the use of wireline logs: A multivariate statistical approach, Log Anal., 33(4), 362-372, 1992.

Olsen, H., C. Plough, U. Nielsen, and K. Sorensen, Reservoir characterization applying high resolution seismic profiling, Rabis Creek, Denmark, Ground Water, 31(1), 84-90, 1993.

Phillips, F. M., J. L. Wilson, and J. M. Davis, Statistical analysis of hydraulic conductivity distributions: A quantitative geological approach, paper presented at the Conference on New Field Techniques for Quantifying the Physical and Chemical Properties of Heterogeneous Aquifers, Natl. Water Well Assoc., Dublin, Ohio, 1989.

Poeter, E. P., and S. A. McKenna, Reducing uncertainty associated with groundwater flow and transport predictions, Ground Water, 33(6), 899-905, 1995.

Rubin, Y., and G. Dagan, Stochastic analysis of boundaries effects on head spatial variability in heterogeneous aquifers, 1, Constant head boundary Water Resour. Res., 24(10), 1689-1697, 1988.

Schneider, W. A., Jr., A three dimensional physical modeling study applying tomographic inversion and seismic migration to the tunnel detection problem, Ph.D. dissertation T-3866, 281 pp., Colo. Sch. of Mines, Golden, 1990.

Weimer, R. J., and C. B. Land, Lyons Formation (Permian), Jefferson County, Colorado: A fluvial deposit, Mt. Geol., 9(2-3), 289-297, 1972.

Zhu, H., and A. G. Journel, Formatting and integrating soft data: Stochastic imaging via the Markov-Bayes algorithm, in Geostatistics Troia '92, vol. 1, Quantitative Geology and Geostatistics, edited by A. Soares, pp. 1-12, Kluwer Acad., Norwell, Mass., 1993.