Demeter

Description Perl tools for X-ray Absorption Spectroscopy

NAME

Demeter::LCF - Linear combination fitting

VERSION

This documentation refers to Demeter version 0.9.26.

SYNOPSIS

   #!/usr/bin/perl
   use Demeter;

   my $prj  = Demeter::Data::Prj -> new(file=>'examples/cyanobacteria.prj');
   my $lcf  = Demeter::LCF -> new;
   my $data = $prj->record(4);
   my ($metal, $chloride, $sulfide) = $prj->records(9, 11, 15);

   $lcf -> data($data);
   $lcf -> add($metal);
   $lcf -> add($chloride);
   $lcf -> add($sulfide);

   $lcf -> xmin($data->bkg_e0-20);
   $lcf -> xmax($data->bkg_e0+60);
   $lcf -> po -> set(emin=>-30, emax=>80);
   $lcf -> fit;
   $lcf -> plot_fit;
   $lcf -> save('lcf_fit_result.dat');

DESCRIPTION

Linear combination fitting (LCF) is an analysis method for interpreting XANES or EXAFS data using standards. The assumption is that the data from an unknown sample can be understood as a linear superposition of the data from two or more known, well-understood standards. The LCF analysis, therefore, tells us what fraction of the unknown sample is explained by each of the known standards.

For example, imagine mixing together quantities of iron oxide and iron sulfide such that there are equal numbers of iron atoms surrounded by oxygen and by sulfur. You would then expect to be able to describe the data from the mixture by adding together equal parts of the data from the two pure materials.

This object provides a framework for performing this sort of analysis. In the example shown above, data and standards are imported from an Athena project file. The data are fit as a linear combination of three standards. The result of the fit is the fraction of each standard present in the data as well as uncertainties in those fractions.

This object also provides methods for "combinatorial fitting". In this approach an ensemble of standards is compared to the data in all possible combinations (with certain constraints). The results are sorted by increasing R-factor of the fit. The first result, then, is the combination of standards giving the closest fit to the data.

ATTRIBUTES

Parameters of the fit

xmin

The lower bound of the fit. For a fit to the normalized or derivative mu(E), this is an absolute energy value and not relative to the edge energy.

xmax

The upper bound of the fit. For a fit to the normalized or derivative mu(E), this is an absolute energy value and not relative to the edge energy.

space

The fitting space. This can be one of nor, der, or chi. When fitting in chi, e0 cannot be varied.

max_standards

The maximum number of standards to use in each fit of a combinatorial sequence.

include_caller

A boolean. Used in sequence fitting to determine whether the data currently associated with the LCF object is included in the fit sequence. The default is true. This is a convenience introduced for the sake of the LCF tool in Athena.

linear

A boolean. When true, add a linear term to the fit. (Not implemented yet.)

inclusive

A boolean. When true, all weights are forced to be between 0 and 1 inclusive.

unity

A boolean. When true, the weights are forced to sum to 1.

one_e0

A boolean. When true, one over-all e0 parameter is used in the fit rather than one for each standard. In practice, the standards are shifted by the same floated e0 value. That is, one parameter is floated and an e0 for each standard is def-ed to that value.

noise

If non-zero, add artificial noise to the data. The value is used as the sigma of the normally distributed artificial noise. You may need to experiment to find an appropriate value for your data. Note that for a fit in chi(k), the noise is added to the un-k-weighted chi(k) data.

kweight

The k-weighting used in a fit to chi(k) data.

plot_components

A boolean. When true, the scaled components of the fit will be included in a plot.

plot_difference

A boolean. When true, the residual of the fit will be included in a plot.

plot_indicators

A boolean. When true, plot indicators will mark the boundaries of the fit in a plot.

Standards

standards

This attribute contains the list of standards added to this LCF problem. The accessor returns a list reference:

  $ref_to_standards = $lcf->standards;

It is strongly recommended that you do not assign standards directly to this. Instead use the add or add_many methods. Those methods take care of some other chores required to keep the LCF organized.

A number of methods are provided by Moose for interacting with the list stored in this attribute:

push_standards

Push a value to the list.

pop_standards

Pop a value from the list.

shift_standards

Shift a value from the list.

unshift_standards

Unshift a value to the list.

clear_standards

Assign an empty list.

Statistics

Once the fit finishes, each of the following attributes is filled with a value appropriate to the most recently completed fit.

nstan

The number of standards used in the fit.

npoints

The number of data points included in the fit.

nvarys

The number of variable parameters used in the fit.

rfactor

An R-factor for the fit. For fits to chi(k) or the derivative spectrum, this is a normal R-factor in Ifeffit or Larch:

   sum( [data_i - fit_i]^2 )
  --------------------------
       sum( data_i^2 )

For a fit to normalized mu(E), that formulation for an R-factor always results in a really tiny number. Demeter thus scales the R-factor to make it somewhat closer to 10^-2.

    npoints * sum( [data_i - fit_i]^2 )
  ---------------------------------------
        sum( [data_i - <data>]^2 )

where <data> is the geometric mean of the data in the fitting range.
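As a rough illustration of the two formulas above, here is a minimal pure-Perl sketch. It is not Demeter's implementation: the function names rfactor_plain and rfactor_scaled are hypothetical, the arrays are assumed to hold only the points inside the fitting range, and the mean value <data> is passed in by the caller rather than computed here.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Plain R-factor: sum of squared residuals over sum of squared data.
# (hypothetical helper, not part of Demeter)
sub rfactor_plain {
    my ($data, $fit) = @_;
    my $num = sum(map { ($data->[$_] - $fit->[$_])**2 } 0 .. $#{$data});
    my $den = sum(map { $_**2 } @{$data});
    return $num / $den;
}

# Scaled R-factor for normalized mu(E): npoints times the sum of squared
# residuals, over the sum of squared deviations of the data from $mean.
# (hypothetical helper; $mean is the <data> value, supplied by the caller)
sub rfactor_scaled {
    my ($data, $fit, $mean) = @_;
    my $npoints = scalar @{$data};
    my $num = sum(map { ($data->[$_] - $fit->[$_])**2 } 0 .. $#{$data});
    my $den = sum(map { ($_ - $mean)**2 } @{$data});
    return $npoints * $num / $den;
}

# Made-up numbers, for illustration only.
my @data = (1.0, 2.0, 3.0);
my @fit  = (1.1, 1.9, 3.0);
printf "plain:  %.6f\n", rfactor_plain(\@data, \@fit);
printf "scaled: %.6f\n", rfactor_scaled(\@data, \@fit, 2.0);
```

The only differences between the two forms are the factor of npoints in the numerator and the subtraction of the mean in the denominator, which is why the scaled form gives a larger number for the same residual.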

chisqr

This is the chi-square for the fit.

chinu

This is the reduced chi-square for the fit.

METHODS

add

Add a Data object to the LCF object for use as one of the fitting standards. In its simplest form, the sole argument is a Data object:

  $lcf -> add($data_object);

You can also set certain parameters of the standard by supplying an optional anonymous hash:

  $lcf -> add($data_object, required => 0,
                            float_e0 => 0,
                            weight   => 1/3,
                            e0       => 1/3,);

The required parameter flags this standard as one that is required to be in a combinatorial fit. float_e0 is true when you wish to float an energy shift for this standard. The other two are used to specify the weight and e0.

There are methods (described below) for setting each of these parameters.

add_many

This method provides a way of setting a group of Data objects as standards in one shot. It is equivalent to repeated calls to the add method without the anonymous hash.

  $lcf -> add_many(@data);

float_e0

This method is used to turn a floating e0 value on or off for a given standard.

  $lcf->float_e0($standard, $onoff);

The first argument is the standard in question, the second is a boolean toggling the floating e0 on or off.

These are the same:

  $lcf->add($data);
  $lcf->float_e0($data, 1);

and

  $lcf->add($data, float_e0=>1);

required

This method is used to require a given standard in a combinatorial fit.

  $lcf->required($standard, $onoff);

The first argument is the standard in question, the second is a boolean toggling the requirement on or off.

These are the same:

  $lcf->add($data);
  $lcf->required($data, 1);

and

  $lcf->add($data, required=>1);

weight

This method is both a setter and getter of the weight for a given standard. As a getter:

  my ($weight, $dweight) = $lcf->weight($standard);

The weight and the uncertainty in the weight are returned as an array.

The weight can be set to an explicit value:

  my ($weight, $dweight) = $lcf->weight($standard, $value);

Again weight and the uncertainty in the weight are returned as an array. The uncertainty is zeroed when the weight is explicitly set.

In scalar context, this just returns the weight.

e0

This method is both a setter and getter of the e0 shift for a given standard. As a getter:

  my ($e0, $de0) = $lcf->e0($standard);

The e0 and the uncertainty in the e0 are returned as an array.

The e0 can be set to an explicit value:

  my ($e0, $de0) = $lcf->e0($standard, $value);

Again e0 and the uncertainty in the e0 are returned as an array. The uncertainty is zeroed when the e0 is explicitly set.

In scalar context, this just returns the e0.

fit

Perform the fit.

  $lcf->fit;

This will perform some sanity checks, including verifying that the data has been set and that at least two standards have been defined. It will also make sure xmin and xmax are in the correct order.

An optional boolean argument turns the spinner off when in screen UI mode. This allows use of a counter for combinatorial fits.

  $lcf->fit(1);  # true value means to turn spinner off

report

This returns a summary of the fitting results.

  print $lcf->report;

save

This method saves the results of a fit to a column data file containing columns for the x-axis (energy or wavenumber), the data, the fit, and each of the weighted components.

  $lcf -> save("file.dat");

plot_fit

This method will generate a plot showing the data and the fit.

  $lcf -> plot_fit;

The plot_difference, plot_components, and plot_indicators attributes determine whether the residual, the weighted components, and indicators marking the fitting range are included in the plot.

By default, the chi(k) and derivative components are stacked automatically.

plot

This is the generic plotting method for use when overplotting a large number of objects. In this example, the data, the standards, and the result of the LCF fit are plotted together with the standards plotted normally rather than as the weighted components of the fit.

   $lcf->po->start_plot;
   $lcf->po->set(e_norm=>1, e_der=>1, emin=>-30, emax=>70);
   $_->plot('e') foreach ($data, @standards, $lcf);

This method does nothing (i.e. it does not attempt to plot the LCF fit at all) if the plot conditions do not match the fitting space of the fit. E.g., an LCF fit to normalized data will only be plotted if the fit is in energy and the e_norm Plot attribute is true.

clean

This method clears all scalars and arrays out of the memory of the data processing backend (Ifeffit/Larch).

  $lcf->clean;

Note that there is no remove method to do the opposite of add. Such a method seems to me unnecessarily difficult to use. Instead, I suggest explicitly clearing the standards list and then adding a new set of standards. This is how the combinatorial fitting loop works.

  $lcf->add_many(@data);
  # ... do stuff, then
  $lcf->clear_standards;
  $lcf->add_many(@new_data);

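The list of standards can also be manipulated directly with the push_standards, pop_standards, shift_standards, unshift_standards, and clear_standards methods described under the standards attribute, although add and add_many remain the recommended way to add standards.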

COMBINATORIAL FITTING

These attributes and methods are specifically related to combinatorial fitting. A combinatorial fit is one in which all possible combinations (within certain constraints) are compared to the data.

Attributes for combinatorial fitting

max_standards

The maximum number of standards to use in each fit of a combinatorial sequence. If, for example, this is set to 4, then in a combinatorial fit, all possible combinations of 2, 3, or 4 standards will be fit to the data.

Note that the size of the combinatorial problem grows very quickly with both the value of this parameter and the number of possible standards.
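To get a feel for how quickly the problem grows, the following pure-Perl sketch counts the fits in a combinatorial sequence. It assumes the simplest case, in which every combination of 2 up to max_standards standards is tried and no standard is flagged as required; the helper names choose and count_fits are hypothetical and are not part of Demeter.

```perl
use strict;
use warnings;

# Binomial coefficient C(n, k), computed iteratively so that every
# intermediate value is an exact integer.
sub choose {
    my ($n, $k) = @_;
    return 0 if $k < 0 || $k > $n;
    my $c = 1;
    $c = $c * ($n - $k + $_) / $_ for 1 .. $k;
    return $c;
}

# Number of fits when every combination of 2 .. $max standards is tried,
# drawn from $nstan candidate standards (hypothetical helper).
sub count_fits {
    my ($nstan, $max) = @_;
    my $total = 0;
    $total += choose($nstan, $_) for 2 .. $max;
    return $total;
}

# Tabulate the count for a few ensemble sizes with max_standards = 4.
printf "%2d standards, max_standards=4: %d fits\n", $_, count_fits($_, 4)
    for (5, 10, 15);
```

Under these assumptions, 10 candidate standards with max_standards set to 4 already require 375 fits. Flagging standards as required reduces this number.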

combi_results

This is an array of hashes, sorted by rfactor, containing all the results of the fitting sequence.

Each hash in the array looks like this:

  {
   Rfactor => Num,
   Chinu   => Num,
   Chisqr  => Num,
   Nvarys  => Num,
   Scaleby => Num,
   aaaaa   => [weight, dweight, e0, de0],
   bbbbb   => [weight, dweight, e0, de0],
   ....
  }

A key like "aaaaa" is the group attribute of a Data object used in the fit. From this, the final state of each fit can be recovered using the restore method.
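The sketch below shows one way to walk a structure of this shape. It is pure Perl and does not call Demeter: the @results array holds made-up numbers standing in for the contents of combi_results, and the describe helper is hypothetical. Keys that are not one of the five statistics are treated as group names.

```perl
use strict;
use warnings;

# Keys that hold fit statistics; every other key is a group name.
my %is_stat = map { $_ => 1 } qw(Rfactor Chinu Chisqr Nvarys Scaleby);

# Made-up results, already sorted by increasing R-factor.
my @results = (
    { Rfactor => 0.0021, Chinu => 0.0004, Chisqr => 0.031, Nvarys => 2,
      Scaleby => 1, aaaaa => [0.62, 0.03, 0, 0], bbbbb => [0.38, 0.03, 0, 0] },
    { Rfactor => 0.0154, Chinu => 0.0029, Chisqr => 0.226, Nvarys => 2,
      Scaleby => 1, aaaaa => [0.71, 0.05, 0, 0], ccccc => [0.29, 0.05, 0, 0] },
);

# Return one "group: weight +/- dweight" line per standard in a result hash.
# (hypothetical helper, not part of Demeter)
sub describe {
    my ($result) = @_;
    my @groups = grep { !$is_stat{$_} } sort keys %{$result};
    return map {
        my ($weight, $dweight) = @{ $result->{$_} };
        sprintf("%s: %.2f +/- %.2f", $_, $weight, $dweight);
    } @groups;
}

# Print each fit, best first, with the weight of every standard it used.
foreach my $result (@results) {
    printf "R-factor %.4f\n", $result->{Rfactor};
    print "  $_\n" for describe($result);
}
```

Because the array is sorted by R-factor, the first element describes the best fit of the sequence.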

Methods of combinatorial fitting

combi

Perform a combinatorial sequence of fits, that is, perform all fits using all combinations of standards up to the number in max_standards. If max_standards is 4, then all combinations of 2, 3, or 4 of all the standards added to the object will be considered.

  $lcf->combi;

At the end of the fit, the combi_results attribute is filled with an array of hashes containing the sorted results of the fit. The first item in the array contains the results from the fit with the lowest R-factor (that is, the combination of standards that most closely describes the data).

One or more standards can be flagged as being required in a fit. That is, each fit will include the flagged standards. This will significantly reduce the size of the combinatorial problem. See the discussion of the add and required methods above.

At the end of the combinatorial sequence of fits, the fit with the lowest R-factor will be restored. Calling plot_fit, report, or save will act on that fit. To examine other fits from the sequence, the restore method must be called using one of the results from the combi_results attribute.

combi_report

Write an Excel or CSV (comma separated value) file that can be imported into a spreadsheet with the results of the combinatorial fit.

  $lcf->combi_report("results.xls");

The argument is the name of the output file, which you probably want to give a ".csv" or ".xls" extension so your spreadsheet will know to import it as such. The choice of file type is controlled by the value of the lcf -> output configuration parameter.

Note that care is taken to strip any commas from the names of the standards before writing the CSV file. Also note that this does not make the most elegant spreadsheet, but it is certainly functional and it certainly allows you to examine all of your results.

SEQUENCE FITTING

These attributes and methods are specifically related to sequence fitting. A sequence fit is one in which the same set of standards is fit to each data set in a group of data.

Attributes for sequence fitting

seq_results

This is an array of hashes, in the same order as the data passed to the sequence method, containing the results of each fit in the sequence.

Each hash in the array looks like this:

  {
   Rfactor => Num,
   Chinu   => Num,
   Chisqr  => Num,
   Nvarys  => Num,
   Scaleby => Num,
   aaaaa   => [weight, dweight, e0, de0],
   bbbbb   => [weight, dweight, e0, de0],
   ....
  }

A key like "aaaaa" is the group attribute of a Data object used in the fit. From this, the final state of each fit can be recovered using the restore method.

Methods of sequence fitting

sequence

Perform a sequence of fits to a group of data sets.

  $lcf->sequence(@data);

The data in $lcf is prepended to @data unless the include_caller attribute is false. Care is taken to remove repeats from @data.

At the end of the fit, the seq_results attribute is filled with an array of hashes containing the results of each fit. This array is in the same order as @data.

At the end of the sequence of fits, the fit to the data originally in $lcf will be restored. Calling plot_fit, report, or save will act on that fit. To examine other fits from the sequence, the restore method must be called using one of the results from the seq_results attribute.

sequence_report

Write an Excel or CSV (comma separated value) file that can be imported into a spreadsheet with the results of the sequence of fits.

  $lcf->sequence_report("results.csv");

The argument is the name of the output file, which you probably want to give a ".csv" or ".xls" extension so your spreadsheet will know to import it as such. The choice of file type is controlled by the value of the lcf -> output configuration parameter.

Note that care is taken to strip any commas from the names of the standards before writing the CSV file. Also note that this does not make the most elegant spreadsheet, but it is certainly functional and it certainly allows you to examine all of your results.

sequence_plot

Make a plot of the component weights as a function of data set.

SERIALIZATION AND DESERIALIZATION

Good question ...

CONFIGURATION AND ENVIRONMENT

See Demeter::Config for a description of the configuration system. See the lcf configuration group for the relevant parameters.

DEPENDENCIES

Demeter's dependencies are in the Build.PL file.

BUGS AND LIMITATIONS

Please report problems to the Ifeffit Mailing List (http://cars9.uchicago.edu/mailman/listinfo/ifeffit/)

Patches are welcome.

AUTHOR

Bruce Ravel, http://bruceravel.github.io/home

http://bruceravel.github.io/demeter/

LICENCE AND COPYRIGHT

Copyright (c) 2006-2018 Bruce Ravel (http://bruceravel.github.io/home). All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlgpl.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.