NAME

Demeter::Data - Process and analyze EXAFS data with Ifeffit or Larch

VERSION

This documentation refers to Demeter version 0.9.26.

SYNOPSIS

  use Demeter;
  my $data = Demeter::Data -> new;
  $data -> set(file      => "example/cu/cu10k.chi",
               name      => 'My copper data',
               fft_kmin  => 3,         fft_kmax  => 14,
               bft_rmin  => 1,         bft_rmax  => 4.3,
               fit_k1    => 1,         fit_k3    => 1,
              );
  $data -> plot("r");

DESCRIPTION

This subclass of the Demeter class is intended to hold information pertaining to data for use in data processing and analysis.

ATTRIBUTES

The following are the attributes of the Data object. Attempting to access an attribute not on this list will throw an exception.

The type of argument expected is given in parentheses, e.g. number, integer, string, and so on. The default value, if one exists, is given in square brackets.

For a Data object to be included in a fit, it is necessary that it be gathered into a Fit object. See Demeter::Fit for details.

General Attributes

group (string) [random 5-letter string]

This is the name associated with the data. Its primary use is as the group name for the arrays associated with the data in Ifeffit or Larch. That is, its arrays will be called group.k, group.chi, and so on. It is best if this is a reasonably short word and it must follow the conventions of a valid group name in Ifeffit or Larch. The default group is a random five-letter string generated automatically when the object is created.

tag (string) [same random 5-letter string as group]

This is used to disambiguate guess parameter names when doing variable name substitution for local parameters.

file (filename)

This is the file containing the chi(k) associated with this Data object. It will be an empty string if the data comes from an Athena project file or is generated by Demeter.

unreadable (boolean)

This is set to a true value if any error is seen while trying to import the data. This allows a GUI to bail out of processing the data file before running into a problem that might cause a crash.

prjrecord (project filename and record number)

This identifies the Athena project file and record from which the data associated with this Data object were imported. It will be an empty string if the data come from a normal file or are generated by Demeter. The format of this string is the project file name and the record number separated by a comma:

  print $data->prjrecord, $/;
   ==prints==>
     whatever.prj, 5

This will be set automatically by the Data::Prj record method. Setting it by hand does not trigger a reading of the record. So, you should pretend that this is a read-only attribute.
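Because the record is encoded as a plain comma-separated string, splitting it back into its two parts is straightforward. The sketch below assumes exactly the format shown above; split_prjrecord is a hypothetical helper for illustration, not a Demeter method.

```perl
use strict;
use warnings;

# Split a prjrecord-style string ("project file, record number") into its
# two parts. Hypothetical helper for illustration; not part of Demeter.
# The file name may itself contain commas: only the last ", N" is the record.
sub split_prjrecord {
    my ($prjrecord) = @_;
    my ($file, $record) = $prjrecord =~ m{\A\s*(.+?)\s*,\s*(\d+)\s*\z}
        or return;    # empty list if the string is not in the expected form
    return ($file, $record);
}

my ($file, $record) = split_prjrecord('whatever.prj, 5');
print "$file -> record $record\n";
```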

provenance (string)

This is a short string explaining where the data object came from, e.g. from a column data file or an Athena project file.

name (string)

This is a text string used to describe this object in a plot or a user interface. Like the group attribute, this should be short, but it can be a bit more verbose. It should be a single line, unlike the title attribute.

plotkey (string)

This is a text string used as a temporary override to the name of the Data object for use in a plot. It should be reset to an empty string as soon as the plot requiring the name override is finished.

forcekey (string)

When this is true, it forces the plotting system to use the name of the Data group as the legend key regardless of the type of plot. This was needed to force an R123 plot to use the scaling information as the key rather than having the key be the part plotted, which is what normally happens for a single group R or q space plot.

cv (number)

The characteristic value is a number associated with the data object. The cv is used to generate guess parameters from lguess parameters and is used as the substitution value in math expressions containing the string [cv]. The default value is a number that is incremented as Data objects are created.
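The [cv] substitution amounts to a literal text replacement of the token by the object's characteristic value. A minimal sketch, assuming the token is always written exactly as [cv]; substitute_cv is a hypothetical name, and Demeter's own substitution may handle more cases.

```perl
use strict;
use warnings;

# Replace every literal "[cv]" token in a math expression with a Data
# object's characteristic value. Hypothetical helper, for illustration only.
sub substitute_cv {
    my ($expression, $cv) = @_;
    (my $result = $expression) =~ s/\[cv\]/$cv/g;
    return $result;
}

print substitute_cv('amp * [cv] + [cv]**2', 3), "\n";
```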

datatype (string)

This identifies the record type. It is one of

  xmu chi xmudat xanes

Earlier versions of Demeter had attributes like is_xmu and is_chi. Those are now obsolete and their use will return an error about unknown attributes.

is_col (boolean)

This is true if the file indicated by the file attribute contains column data that needs to be converted into mu(E) data. See the description of the read_data method for how this gets set automatically and when you may need to set it by hand.

is_merge (string)

This is an empty string by default. When a merge group is made, its value is set to the manner in which the merge was made, which is one of e for mu(E), n for norm(E), or k for chi(k).

columns (string)

This string contains Ifeffit's $column_label or Larch's ... string from importing the data file.

energy (string)

This string uses gnuplot-like notation to indicate which column in the data file contains the energy axis. As an example, the default is that the first column contains the energy and this string is $1.

numerator (string)

This string uses gnuplot-like notation to indicate how to convert columns from the data file into the numerator of the expression for computing mu(E). For example, the default is that these are transmission data and I0 is in the 2nd column and the default for this string is $2.

If these are fluorescence data measured with a multichannel analyzer and the MCA channels are in columns 7 - 10, then this string would be $7+$8+$9+$10.

denominator (string)

This string uses gnuplot-like notation to indicate how to convert columns from the data file into the denominator of the expression for computing mu(E). For example, the default is that these are transmission data and the transmitted intensity is in the 3rd column, so the default for this string is $3.

ln (boolean)

This is true for transmission data, i.e. if conversion from columns to mu(E) requires that the natural log be taken. In fact, the natural log of the absolute value of the ratio of the numerator and denominator is computed to avoid numerical error in certain situations.
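Taken together, the numerator, denominator, and ln attributes describe how one row of column data becomes one point of mu(E). The sketch below illustrates that conversion under a stated simplification: it only understands sums of column references such as $2 or $7+$8+$9+$10, whereas Demeter may accept more general gnuplot-like expressions. The function names are hypothetical.

```perl
use strict;
use warnings;

# Evaluate a gnuplot-like column expression for one row of data. This sketch
# only understands sums of column references such as '$2' or '$7+$8+$9+$10'.
sub eval_columns {
    my ($expr, @row) = @_;            # @row holds column 1 at index 0
    my $sum = 0;
    foreach my $term (split /\+/, $expr) {
        $term =~ /\A\s*\$(\d+)\s*\z/ or die "unsupported term '$term'\n";
        $sum += $row[$1 - 1];
    }
    return $sum;
}

# Convert one row to mu(E): ln|num/den| when ln is true (transmission),
# the plain ratio otherwise (e.g. fluorescence).
sub mu_from_row {
    my ($numerator, $denominator, $ln, @row) = @_;
    my $ratio = eval_columns($numerator, @row) / eval_columns($denominator, @row);
    return $ln ? log(abs $ratio) : $ratio;
}

# Transmission: energy in $1, I0 in $2, It in $3.
my @row = (8979.0, 1.0e5, 2.5e4);
printf "%.4f\n", mu_from_row('$2', '$3', 1, @row);
```

For fluorescence data the same call would be made with a summed numerator such as '$7+$8+$9+$10', the I0 column as the denominator, and ln set to 0.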

ln (boolean)

This flag severely restricts data processing to just what is necessary to make a plot of unnormalized mu(E). It is for use in a GUI's column selection dialog so that swift plot updates can be made while columns are being selected.

display (boolean)

This is a flag used by a GUI while selecting columns. It severely limits the amount of data processing done so that the display updates quickly in the GUI. This should be set back to 0 as soon as possible so that subsequent data processing proceeds properly.

Background Removal Attributes

bkg_e0 (number)

The E0 value of mu(E) data. This is determined from the data when the read_data method is called.

bkg_e0_fraction (number) [0.5]

This is a number between 0 and 1 used for the e0 algorithm which sets e0 to a fraction of the edge step. See Demeter::Data::E0.
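The idea behind bkg_e0_fraction can be illustrated on data that are already normalized: e0 is placed where the normalized spectrum first rises through the requested fraction of the edge step. This is a minimal sketch under that assumption, using linear interpolation between the two bracketing points; Demeter::Data::E0 may locate the crossing differently, and e0_at_fraction is a hypothetical name.

```perl
use strict;
use warnings;

# Find the energy at which normalized mu(E) first rises through a given
# fraction of the edge step, interpolating linearly between the bracketing
# points. A sketch of the idea only, not Demeter's implementation.
sub e0_at_fraction {
    my ($energy, $norm, $fraction) = @_;   # two array refs and a number in (0,1)
    for my $i (1 .. $#$norm) {
        my ($y0, $y1) = @{$norm}[$i - 1, $i];
        next unless $y0 < $fraction && $y1 >= $fraction;
        my ($x0, $x1) = @{$energy}[$i - 1, $i];
        return $x0 + ($fraction - $y0) * ($x1 - $x0) / ($y1 - $y0);
    }
    return;   # the fraction is never crossed
}

my @e    = (8970, 8975, 8980, 8985);
my @norm = (0.0,  0.25, 0.75, 1.0);
print e0_at_fraction(\@e, \@norm, 0.5), "\n";
```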

bkg_eshift (number) [0]

An energy shift to apply to the data before doing any further processing.

bkg_kw (number) [1]

The k-weight to use during the background removal using the Autobk algorithm.

bkg_rbkg (number) [1]

The Rbkg value in the Autobk algorithm.

bkg_dk (number) [0]

The dk value to be used in the Fourier transform as part of the Autobk algorithm.

bkg_pre1 (number) [-150]

The lower end of the range of the pre-edge regression, relative to E0.

bkg_pre2 (number) [-30]

The upper end of the range of the pre-edge regression, relative to E0.

bkg_nor1 (number) [100]

The lower end of the range of the post-edge regression, relative to E0.

bkg_nor2 (number) [600]

The upper end of the range of the post-edge regression, relative to E0.

bkg_spl1 (number) [0]

The lower end in k of the spline range in the Autobk algorithm. The value of bkg_spl1e is updated whenever this is updated.

bkg_spl2 (number) [15]

The upper end in k of the spline range in the Autobk algorithm. The value of bkg_spl2e is updated whenever this is updated.

bkg_spl1e (number) [0]

The lower end in energy of the spline range in the Autobk algorithm, relative to E0. The value of bkg_spl1 is updated whenever this is updated.

bkg_spl2e (number) [857]

The upper end in energy of the spline range in the Autobk algorithm, relative to E0. The value of bkg_spl2 is updated whenever this is updated.
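The paired k and energy values follow from the free-electron relation E - E0 = (hbar^2/2m) k^2, which is why the default bkg_spl2 of 15 corresponds to a bkg_spl2e of about 857. A sketch of the conversion in both directions; the function names are hypothetical, and mapping energies at or below E0 to k = 0 is an assumption of this sketch.

```perl
use strict;
use warnings;

# hbar^2/(2 m_e) in eV * Angstrom^2: E - E0 = HBAR2_OVER_2M * k^2, k in 1/Angstrom.
use constant HBAR2_OVER_2M => 3.80998;

# Energy above E0 (eV) for a given photoelectron wavenumber k (1/Angstrom).
sub k_to_e { my ($k) = @_; return HBAR2_OVER_2M * $k**2 }

# Wavenumber for a given energy above E0; energies at or below E0 map to k = 0.
sub e_to_k { my ($e) = @_; return $e > 0 ? sqrt($e / HBAR2_OVER_2M) : 0 }

printf "bkg_spl2  = 15  -> %.1f eV\n", k_to_e(15);    # about 857.2 eV
printf "bkg_spl2e = 857 -> %.2f 1/A\n", e_to_k(857);  # about 15.00 1/A
```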

bkg_kwindow (list) [Hanning]

This is the functional form of the Fourier transform window used in the Autobk algorithm. It is one of

   Kaiser-Bessel Hanning Welch Parzen Sine Gaussian

bkg_slope (number)

The slope of the pre-edge line. This is set as part of the normalize method.

bkg_int (number)

The intercept of the pre-edge line. This is set as part of the normalize method.

bkg_step (number)

The edge step found by the normalize method. This attribute will be overwritten the next time the normalize method is called unless the bkg_fixstep attribute is set true.

bkg_fitted_step (number)

The value of edge step found by the normalize method, regardless of the setting of bkg_fixstep. This is needed to correctly flatten data.

bkg_fixstep (boolean) [0]

When true, the value of the bkg_step will not be overwritten by the normalize method.

bkg_nc0 (number)

The constant parameter in the post-edge regression. This is set as part of the normalize method.

bkg_nc1 (number)

The linear parameter in the post-edge regression. This is set as part of the normalize method.

bkg_nc2 (number)

The quadratic parameter in the post-edge regression. This is set as part of the normalize method.

bkg_nc3 (number)

The cubic parameter in the post-edge regression. This is set as part of the normalize method. (Larch only)

bkg_flatten (boolean) [1]

When true, a plot of normalized mu(E) data will be flattened.

bkg_funnorm (boolean) [0]

When true, a functional normalization is performed to approximately remove the effect of quickly varying I0 signal on fluorescence EXAFS.

Note that this has a bad impact on the display of mu(E). The energy-dependent normalization is made by dividing mu(E) by (post(E)-pre(E)) right before doing the background removal. Thus the background function is computed from something different from mu(E). This turns out to be hard to keep track of when using Demeter's plotting system. Athena "solves" this by keeping track of what is being plotted. A plot in k, R, or q uses the corrected mu(E). A plot in E is made by temporarily turning off the energy-dependent normalization flag, re-evaluating the background function, then turning the flag back on. This is a bit of an efficiency hit, and the plot in E does not quite correspond to the plot in k. Awkward!

bkg_nnorm (integer) [3]

This can be either 2 or 3 and specifies the order of the post-edge regression. When this is 2, bkg_nc2 will be forced to 0 in the regression. not yet implemented

bkg_stan (Data or Path object)

The background removal standard. This can be either a Data object or a Path object. not yet implemented

bkg_clamp1 (integer) [0]

The value of the low-end spline clamp.

bkg_clamp2 (integer) [24]

The value of the high-end spline clamp.

bkg_nclamp (integer) [5]

The number of data points to use in evaluating the clamp.

bkg_cl (boolean) [0]

When true, use Cromer-Liberman normalization rather than the Autobk algorithm. not yet implemented

bkg_z (number)

The Z number of the absorber for these data. This is determined as part of the normalize method but can also be set by hand. To deal with edge energy confusions, certain K and L3 edges are preferred over nearby L1 and L2 edges when this attribute is set automatically.

       prefer     over
      ----------------
       Fe K       Nd L1
       Mn K       Ce L1
       Bi K       Ir L1
       Se K       Tl L2
       Pt L3      W  L2
       Se K       Pb L2
       Np L3      At L1
       Cr K       Ba L1

There is a configuration parameter to turn this behavior on and off.

Forward Transform Attributes

Note that there is not an fft_kw attribute. For all plotting and data processing purposes, the Plot object's kweight attribute is used, while in fits the Fit object's k-weighting attributes are used.

fft_edge (edge symbol) [K]

The absorption edge measured by the input data. This is used when doing a central-atom-only phase correction to the Fourier transform.

fft_kmin (number) [2]

The lower end of the k-range for the forward transform. fft_kmin and fft_kmax will be sorted by Demeter.

fft_kmax (number) [12]

The upper end of the k-range for the forward transform. fft_kmin and fft_kmax will be sorted by Demeter.

fft_dk (number) [2]

The width of the window sill used for the forward transform. The meaning of this parameter depends on the functional form of the window. See the Ifeffit/Larch documentation for a full discussion of the functional forms.

fft_kwindow (list) [Hanning]

This is the functional form of the Fourier transform window used in the forward transform. It is one of

   Kaiser-Bessel Hanning Welch Parzen Sine Gaussian

fft_pc (Path object) [0]

This is set to the Path object to be used for a full phase correction to the Fourier transform.

rmax_out (number) [10]

This tells Ifeffit/Larch how to size output arrays after doing a Fourier transform.

Back Transform Attributes

bft_rwindow (list) [Hanning]

This is the functional form of the Fourier transform window used in the backward transform. It is one of

   Kaiser-Bessel Hanning Welch Parzen Sine Gaussian

bft_rmin (number) [1]

The lower end of the R-range for the backward transform or the fitting range. bft_rmin and bft_rmax will be sorted by Demeter.

bft_rmax (number) [3]

The upper end of the R-range for the backward transform or the fitting range. bft_rmin and bft_rmax will be sorted by Demeter.

bft_dr (number) [0.2]

The width of the window sill used for the backward transform. The meaning of this parameter depends on the functional form of the window. See the Ifeffit/Larch documentation for a full discussion of the functional forms.

Fitting Attributes

Note that the fit_ parameters that have fft_ and bft_ analogs (the analog of fft_kmin, for example) have been deprecated along with the process mode.

fitting (boolean) [0]: This is set to true when a Data object is used in a fit. It is used by plotting methods to determine whether data parts (fit, background, residual) should be considered for plotting.
fit_k1 (boolean) [1]: If true, then k-weight of 1 will be used in the fit. Setting more than one k-weighting parameter to true will result in a multiple k-weight fit. By default, fits are done with kweight of 1, 2, and 3.
fit_k2 (boolean) [1]: If true, then k-weight of 2 will be used in the fit. Setting more than one k-weighting parameter to true will result in a multiple k-weight fit. By default, fits are done with kweight of 1, 2, and 3.
fit_k3 (boolean) [1]: If true, then k-weight of 3 will be used in the fit. Setting more than one k-weighting parameter to true will result in a multiple k-weight fit. By default, fits are done with kweight of 1, 2, and 3.
fit_karb (boolean) [0]: If true, then the user-supplied, arbitrary k-weight will be used in the fit. Setting more than one k-weighting parameter to true will result in a multiple k-weight fit. By default, fits are done with kweight of 1, 2, and 3.
fit_karb_value (number) [0]: This is the value of the arbitrary k-weight which will be used in the fit if fit_karb is true.
fit_space (list) [R]: This is the space in which the fit will be evaluated. It is one of k, r, or q.
fit_epsilon (number) [0]: If this number is non-zero, it will be used as the measurement uncertainty in k-space when the fit is evaluated. If it is zero, then the default in Ifeffit or Larch will be used.
fit_cormin (number) [0.4]: This is the minimum value of correlation to be reported in the log file after a fit.
fit_pcpath (Path object): This is the Path object to use for phase correction when these data and their paths are plotted. It takes the reference to the Path object as its value.
fit_include (boolean) [1]: When this is true, the data will be included in the next fit.
fit_plot_after_fit (boolean) [0]: This is a flag for use by a user interface to indicate that after a fit is finished, this data set should be plotted.
fit_do_bkg (boolean) [0]: When true, the background function will be co-refined for this data set and the "bkg" part of the data will be created.
fit_titles (multiline string): These are title lines associated with this Data object. These lines will be written to log files, output data files, etc.
fitsum (list): This attribute indicates whether the Fit object's fit or ff2chi method was most recently called. It is one of fit or sum. Its purpose is to allow the fit part of the data object to be labeled correctly in a plot.
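The k-weighting flags above combine as a simple selection. This sketch, which uses a plain hash in place of a Data object and a hypothetical function name, shows which k-weights would be used for a given set of flags; a result with more than one entry corresponds to a multiple k-weight fit.

```perl
use strict;
use warnings;

# Collect the k-weights selected by fit_k1/fit_k2/fit_k3/fit_karb-style flags.
# A plain hash stands in for a Data object; this is only a sketch.
sub fit_kweights {
    my (%flag) = @_;
    my @kw = grep { $flag{"fit_k$_"} } 1 .. 3;
    push @kw, $flag{fit_karb_value} if $flag{fit_karb};
    return @kw;
}

my @kw = fit_kweights(fit_k1 => 1, fit_k2 => 0, fit_k3 => 1, fit_karb => 0);
printf "k-weights: %s (%s k-weight fit)\n", "@kw", @kw > 1 ? 'multiple' : 'single';
```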

Plotting Attributes

Most aspects of how plots are made are handled by the attributes of the Plot object. These Data attributes are specific to a particular Data object and influence how that object is plotted.

y_offset (number) [0]: The vertical displacement given to this data when plotted. This is useful for making stacked plots.
plot_multiplier (number) [1]: An over-all scaling factor for this data when plotted. It is probably a bad idea for this to be 0.

METHODS

This subclass inherits from Demeter, so all of the methods of the parent class are available.

See "Object_handling_methods" in Demeter for a discussion of accessor methods.

I/O methods

These methods handle the details of file I/O. See also the save method of the parent class.

save

This method returns the Ifeffit/Larch commands necessary to write column data files based on the data object. See Demeter::Data::IO for details.

data_parameter_report

This method returns a simple, textual summary of the attributes of the data object related to background removal and data processing. It is used in log files, output data files, and elsewhere. It may also be useful as a way of interactively describing the data.

fit_parameter_report

This method returns a simple, textual summary of the attributes of the data object related to the fit. It is used in log files, output data files, and elsewhere. It may also be useful as a way of interactively describing the data. The two optional arguments control whether the r-factor is computed as part of the report.

r_factor

This returns an evaluation of the R-factor from a fit for a single data set. This is different from the R-factor for a multiple data set fit as reported by Ifeffit or Larch in that this number includes only the misfit of the single data set.

  $r = $data_object -> r_factor;
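For reference, the usual definition of the R-factor is the fractional misfit, sum((data - fit)^2) / sum(data^2), evaluated over the fitting range in the fitting space. The sketch below assumes the two arrays have already been restricted to that range (for an R-space fit, both the real and imaginary parts would be included); it illustrates the formula and is not the code behind r_factor. The name r_factor_sketch is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Fractional misfit of a single data set:
#   R = sum (data - fit)^2 / sum data^2
# The arrays are assumed to already hold fitting-space values restricted
# to the fitting range.
sub r_factor_sketch {
    my ($data, $fit) = @_;   # array refs of equal length
    die "length mismatch\n" unless @$data == @$fit;
    my $num = sum0 map { ($data->[$_] - $fit->[$_])**2 } 0 .. $#$data;
    my $den = sum0 map { $_**2 } @$data;
    return $num / $den;
}

printf "%.4f\n", r_factor_sketch([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]);
```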

nidp

This returns the number of independent points associated with this data set, as determined from the values of the k- and R-range parameters.

   $n = $data_object -> nidp;

The Athena-like fft and bft ranges are used if the processing mode is set to "fft". The Artemis-like fit ranges are used if the processing mode is set to "fit".
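The number of independent points is conventionally estimated from the Nyquist criterion, 2 (kmax - kmin) (rmax - rmin) / pi. A sketch of that estimate follows; whether Demeter's nidp uses exactly this expression (some codes add a small constant) is an assumption here, and nidp_sketch is a hypothetical name. The ranges are sorted first, mirroring how the min/max attributes are sorted by Demeter.

```perl
use strict;
use warnings;
use constant PI => 4 * atan2(1, 1);

# Nyquist estimate of the number of independent points:
#   N_idp = 2 * (kmax - kmin) * (rmax - rmin) / pi
sub nidp_sketch {
    my ($kmin, $kmax, $rmin, $rmax) = @_;
    ($kmin, $kmax) = sort { $a <=> $b } ($kmin, $kmax);
    ($rmin, $rmax) = sort { $a <=> $b } ($rmin, $rmax);
    return 2 * ($kmax - $kmin) * ($rmax - $rmin) / PI;
}

# Ranges from the SYNOPSIS: k from 3 to 14, R from 1 to 4.3
printf "%.2f\n", nidp_sketch(3, 14, 1, 4.3);
```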

plot

Use this method to make plots of data. Demeter keeps track of changes to parameters and which data processing steps need to be taken in order to correctly make the plot. Consequently, it should never be necessary to import data or perform a Fourier transform by hand. The practice with Demeter is to create a Data group, set some of its attributes, and then make a plot.

  $data_object -> plot($space)

The argument is one of E, k, R, q, or kq. The details of the plot are determined by the current state of the Demeter::Plot object.

Convenience methods

set_windows

This is a shortcut to setting the functional form of all Fourier transform windows used by the Data object. In one swoop, this method sets bkg_kwindow, fit_kwindow, fit_rwindow, fft_kwindow, and bft_rwindow to the specified window type.

  $data_object -> set_windows("Hanning");

The window type must be one of Kaiser-Bessel, Hanning, Parzen, Welch, Sine, or Gaussian.

nsuff

Returns either "norm" or "flat" as the proper suffix for the normalized mu(E) array depending on the value of bkg_flatten.
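The choice nsuff makes is a simple switch on bkg_flatten. A minimal sketch, with a plain hash standing in for the Data object and a hypothetical group name; it assumes the array is then addressed as group.suffix, following the group.k and group.chi naming described for the group attribute.

```perl
use strict;
use warnings;

# Sketch of the nsuff logic: pick the suffix of the normalized mu(E) array
# from the bkg_flatten flag. A plain hash stands in for the Data object.
sub nsuff_sketch {
    my ($data) = @_;
    return $data->{bkg_flatten} ? 'flat' : 'norm';
}

my $group = 'abcde';    # hypothetical group name
print "$group.", nsuff_sketch({ bkg_flatten => 1 }), "\n";
```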

data

This method returns the reference to the data object itself. This is less silly than it seems. Having a data method defined for both Data and Path objects allows a loop over both kinds of objects and provides a simple way to identify the correct Data object.

  foreach my $obj (@data_objects, @paths_objects) {
     my $d = $obj->data;
     ## do something with $d
  };

DATA FILES AND DATA PARTS

When data are imported, Demeter tries to figure out whether the data are raw data, mu(E) or chi(k) data, if that has not been specified. The heuristics are as follows:

If the numerator or denominator attributes are set, the data are assumed to be column data that will be interpreted as raw data and converted to mu(E) data.
The data will be read by Ifeffit or Larch. If the first data point is greater than 100, it will be assumed that these data are mu(E) and that the energy axis is absolute energy.
If the last data point is greater than 35, it will be assumed that these data are mu(E) and that the energy axis is relative energy.
If none of the above are true, then the data must be chi(k) data.

If your data will be misinterpreted by these heuristics, then you must set the datatype attribute by hand.
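The import heuristics listed above can be written as a short decision chain. This sketch assumes the "data point" being tested is a value of the first (abscissa) column, and it returns descriptive labels rather than actual datatype values; guess_datatype is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch of the import heuristics: decide what kind of data a file holds from
# whether numerator/denominator were set and from its first column.
sub guess_datatype {
    my ($has_columns, $x) = @_;    # $x: array ref holding the first column
    return 'column data'            if $has_columns;
    return 'mu(E), absolute energy' if $x->[0]  > 100;
    return 'mu(E), relative energy' if $x->[-1] > 35;
    return 'chi(k)';
}

print guess_datatype(0, [0.05, 0.10, 15.00]), "\n";
```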

The Data object has several parts associated with it. Before a fit (or summation) is done, there are two parts: the data itself and the window. After the fit, there is a residual, a fit, and (if the fit_do_bkg attribute is true) a background. Attributes of the Demeter::Plot object are used to specify which Data parts are shown in a plot.

DIAGNOSTICS

These messages are classified as follows (listed in increasing order of desperation):

    (W) A warning (optional).
    (F) A fatal error (trappable).

"$key" is not a valid Demeter::Data parameter: (W) You have attempted to access an attribute that does not exist. Check the spelling of the attribute at the point where you called the accessor.
Demeter::Data: "group" is not a valid group name: (F) You have used a group name that does not follow Ifeffit's rules for group names. The group name must start with a letter. After that only letters, numbers, &, ?, _, and : are acceptable characters. The group name must be no longer than 64 characters.
Demeter::Data: "$key" takes a number as an argument: (F) You have attempted to set a numerical attribute with something that cannot be interpreted as a number.
Demeter::Data: $k must be a number: (F) You have attempted to set an attribute that requires a numerical value to something that cannot be interpreted as a number.
Demeter::Data: $k must be a positive integer: (F) You have attempted to set an attribute that requires a positive integer value to something that cannot be interpreted as such.
Demeter::Data: $k must be a window function: (F) You have set a Fourier transform window attribute to something not recognized as a window function.
Demeter::Data: $r_hash->{$k} is not a readable data file: (F) You have set the file attribute to something that cannot be found on disk.
Demeter::Data: $k must be one of (k R q): (F) You have set a fitting space that is not one of k, R, or q.
Demeter::Data: $k must be one of (K L1 L2 L3): (F) You have attempted to set the fft_edge attribute to something that is not recognized as an edge symbol.
No filename specified for save: (F) You have called the save method without supplying a filename for the output file.
Valid save types are: xmu norm chi r q fit bkgsub: (F) You have called the save method with an unknown output file type.
cannot save mu(E) file from chi(k) data
cannot save norm(E) file from chi(k) data: (F) You have attempted to write out data in energy for data that were imported as chi(k).
No filename specified for serialize: (F) You have not supplied a filename for your data serialization.
No filename or YAML stream specified for deserialize: (F) You have not supplied a filename from which to deserialize data.
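The group-name rule quoted in the diagnostics above can be expressed as a single regular expression. This is a sketch based only on that description of Ifeffit's rules (a leading letter; letters, digits, &, ?, _ and : thereafter; at most 64 characters); is_valid_group_name is a hypothetical helper and Demeter's own check may differ, particularly for Larch.

```perl
use strict;
use warnings;

# Check a candidate group name against the rules quoted in the diagnostic:
# a leading letter, then only letters, digits, &, ?, _ or :, with at most
# 64 characters in total.
sub is_valid_group_name {
    my ($name) = @_;
    return defined($name) && $name =~ /\A[a-zA-Z][a-zA-Z0-9&?_:]{0,63}\z/;
}

for my $name (qw(cu10k 3fe my_data:1), 'x' x 65) {
    printf "%-10.10s %s\n", $name, is_valid_group_name($name) ? 'ok' : 'invalid';
}
```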

SERIALIZATION AND DESERIALIZATION

The serialization format of a data object is a YAML file. The YAML serialization begins with a mapping of the Data object attributes. This is followed by sequences representing the data. For mu(E) data, the sequences are energy and xmu. If the data were column data, the xmu sequence is followed by a sequence representing the i0 array. For chi(k) data, the mapping is followed by sequences representing k and chi(k).

To serialize a Data object to a file:

  $data -> serialize($filename);

To import the serialized data and create a Data object to hold it:

  $data -> deserialize($filename);

As a convenience freeze is a synonym for serialize and thaw is a synonym for deserialize.

Among the many attractive features of YAML as a serialization format is that YAML is supported by lots of programming languages. So Demeter serialization can be imported easily into other analysis software.

In principle, the Athena project file is also a serialization format. The YAML serialization is intended for use as part of a fitting project. An Athena project file is probably a more useful user interaction format for data processing.

CONFIGURATION

See Demeter::Config for a description of the configuration system. Many attributes of a Data object can be configured via the configuration system. See, among others, the bkg, fft, bft, and fit configuration groups.

DEPENDENCIES

Demeter's dependencies are in the Build.PL file.

BUGS AND LIMITATIONS

Several features have not yet been implemented.

Should there only be two items in a collection of tied references? Consider importing MED channels -- it would be reasonable for each channel and the reference to be a reference collection.
Only some of the Athena-like data processing methods (alignment, merging, and so on) have been implemented at this time. None of the analysis features of Athena (LCF, LR/PD, peak fitting) have been implemented.
Various background and normalization options: functional normalization, normalization order, background removal standard, Cromer-Liberman.
Tied reference channel
Standard deviation for merged data not written to serialization.

Please report problems to the Ifeffit Mailing List (http://cars9.uchicago.edu/mailman/listinfo/ifeffit/)

Patches are welcome.

AUTHOR

Bruce Ravel, http://bruceravel.github.io/home

http://bruceravel.github.io/demeter/

LICENCE AND COPYRIGHT

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlgpl.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.