Perl tools for X-ray Absorption Spectroscopy
Demeter::LCF - Linear combination fitting
This documentation refers to Demeter version 0.9.26.
#!/usr/bin/perl
use Demeter;
my $prj = Demeter::Data::Prj -> new(file=>'examples/cyanobacteria.prj');
my $lcf = Demeter::LCF -> new;
my $data = $prj->record(4);
my ($metal, $chloride, $sulfide) = $prj->records(9, 11, 15);
$lcf -> data($data);
$lcf -> add($metal);
$lcf -> add($chloride);
$lcf -> add($sulfide);
$lcf -> xmin($data->bkg_e0-20);
$lcf -> xmax($data->bkg_e0+60);
$lcf -> po -> set(emin=>-30, emax=>80);
$lcf -> fit;
$lcf -> plot_fit;
$lcf -> save('lcf_fit_result.dat');
Linear combination fitting (LCF) is an analysis method for interpreting XANES or EXAFS data using standards. The assumption is that the data from an unknown sample can be understood as a linear superposition of the data from two or more known, well-understood standards. The LCF analysis, therefore, tells us what fraction of the unknown sample is explained by each of the known standards.
For example, imagine mixing together quantities of iron oxide and iron sulfide such that there are equal numbers of iron atoms surrounded by oxygen and by sulfur. You would then expect to be able to describe the data from the mixture by adding together equal parts of the data from the two pure materials.
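The mixing picture can be sketched in a few lines of plain Perl. This is only an illustration and not part of Demeter: the mix helper is hypothetical, and the two short arrays standing in for the oxide and sulfide spectra hold made-up values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Demeter): form the weighted sum of
# several spectra that share the same x-axis grid.
sub mix {
    my ($weights, $spectra) = @_;
    my $npts = scalar @{ $spectra->[0] };
    my @sum  = (0) x $npts;
    for my $s (0 .. $#{$spectra}) {
        for my $i (0 .. $npts - 1) {
            $sum[$i] += $weights->[$s] * $spectra->[$s][$i];
        }
    }
    return @sum;
}

# Made-up numbers standing in for normalized mu(E) of two pure materials.
my @oxide   = (0.0, 0.4, 1.2, 1.0);
my @sulfide = (0.0, 0.8, 1.0, 1.0);

# Equal numbers of iron atoms in each environment: equal weights.
my @mixture = mix([0.5, 0.5], [\@oxide, \@sulfide]);
print join(' ', map { sprintf '%.2f', $_ } @mixture), "\n";
```

An LCF fit runs this in reverse: given the mixture and the pure spectra, it finds the weights.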
This object provides a framework for performing this sort of analysis. In the example shown above, data and standards are imported from an Athena project file. The data are fit as a linear combination of three standards. The result of the fit is the fraction of each standard present in the data as well as uncertainties in those fractions.
This object also provides methods for "combinatorial fitting". In this approach an ensemble of standards is compared to the data in all possible combinations (with certain constraints). The results are sorted by increasing R-factor of the fit. The first result, then, is the combination of standards giving the closest fit to the data.
xmin
The lower bound of the fit. For a fit to the normalized or derivative mu(E), this is an absolute energy value and not relative to the edge energy.
xmax
The upper bound of the fit. For a fit to the normalized or derivative mu(E), this is an absolute energy value and not relative to the edge energy.
space
The fitting space. This can be one of nor, der, or chi. When fitting in chi, e0 cannot be varied.
max_standards
The maximum number of standards to use in each fit of a combinatorial sequence.
include_caller
A boolean. Used in sequence fitting to determine whether the data currently associated with the LCF object is included in the fit sequence. The default is true. This is a convenience introduced for the sake of the LCF tool in Athena.
linear
A boolean. When true, add a linear term to the fit. (Not implemented yet.)
inclusive
A boolean. When true, all weights are forced to be between 0 and 1 inclusive.
unity
A boolean. When true, the weights are forced to sum to 1.
one_e0
A boolean. When true, one over-all e0 parameter is used in the fit rather than one for each standard. In practice, the standards are shifted by the same floated e0 value. That is, one parameter is floated and an e0 for each standard is def-ed to that value.
noise
If non-zero, add artificial noise to the data. The value is used as the sigma of the normally distributed artificial noise. You may need to play around to find an appropriate value for your data. Note that for a fit in chi(k), the noise is added to the un-k-weighted chi(k) data.
kweight
The kweighting used in a fit using chi(k) data.
plot_components
A boolean. When true, the scaled components of the fit will be included in a plot.
plot_difference
A boolean. When true, the residual of the fit will be included in a plot.
plot_indicators
A boolean. When true, plot indicators will mark the boundaries of the fit in a plot.
standards
This attribute contains the list of standards added to this LCF problem. The accessor returns a list reference:
$ref_to_standards = $lcf->standards;
It is strongly recommended that you do not assign standards directly to this. Instead use the add or add_many methods. Those methods take care of some other chores required to keep the LCF organized.
A number of methods are provided by Moose for interacting with the list stored in this attribute:
push_standards
Push a value to the list.
pop_standards
Pop a value from the list.
shift_standards
Shift a value from the list.
unshift_standards
Unshift a value to the list.
clear_standards
Assign an empty list.
Once the fit finishes, each of the following attributes is filled with a value appropriate to the most recently completed fit.
nstan
The number of standards used in the fit.
npoints
The number of data points included in the fit.
nvarys
The number of variable parameters used in the fit.
rfactor
An R-factor for the fit. For fits to chi(k) or the derivative spectrum, this is the normal R-factor, as in Ifeffit or Larch:
    sum( [data_i - fit_i]^2 )
    -------------------------
         sum( data_i^2 )
For a fit to normalized mu(E), that formulation always results in a very small number. Demeter thus scales the R-factor to bring it somewhat closer to 10^-2:
    npoints * sum( [data_i - fit_i]^2 )
    -----------------------------------
        sum( [data_i - <data>]^2 )
where <data> is the geometric mean of the data in the fitting range.
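The two R-factor definitions can be written out in plain Perl as a rough sketch. This is not Demeter's implementation. The function names are hypothetical, the sample numbers are made up, and the geometric mean is computed as exp(mean(log(data_i))), taking the description at face value, which assumes every data point is positive.

```perl
use strict;
use warnings;

# Plain R-factor: sum of squared residuals over sum of squared data.
sub rfactor {
    my ($data, $fit) = @_;
    my ($num, $den) = (0, 0);
    for my $i (0 .. $#{$data}) {
        $num += ($data->[$i] - $fit->[$i])**2;
        $den += $data->[$i]**2;
    }
    return $num / $den;
}

# Scaled form for normalized mu(E). ASSUMPTION: <data> is computed as a
# geometric mean, exp(mean(log(data_i))), which needs positive data.
sub rfactor_scaled {
    my ($data, $fit) = @_;
    my $npoints = scalar @$data;
    my $logsum  = 0;
    $logsum += log($_) for @$data;
    my $mean = exp($logsum / $npoints);
    my ($num, $den) = (0, 0);
    for my $i (0 .. $#{$data}) {
        $num += ($data->[$i] - $fit->[$i])**2;
        $den += ($data->[$i] - $mean)**2;
    }
    return $npoints * $num / $den;
}

my @data = (1, 2, 4);   # made-up values
my @fit  = (1, 2, 3);
printf "plain: %.4f  scaled: %.4f\n",
    rfactor(\@data, \@fit), rfactor_scaled(\@data, \@fit);
```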
chisqr
This is the chi-square for the fit.
chinu
This is the reduced chi-square for the fit.
add
Add a Data object to the LCF object for use as one of the fitting standards. In its simplest form, the sole argument is a Data object:
$lcf -> add($data_object);
You can also set certain parameters of the standard by supplying an optional anonymous hash:
$lcf -> add($data_object, required => 0,
                          float_e0 => 0,
                          weight   => 1/3,
                          e0       => 1/3,);
The required parameter flags this standard as one that is required to be in a combinatorial fit. float_e0 is true when you wish to float an energy shift for this standard. The other two are used to specify the weight and e0.
There are methods (described below) for setting each of these parameters.
add_many
This method provides a way of setting a group of Data objects as standards in one shot. It is equivalent to repeated calls to the add method without the anonymous hash.
$lcf -> add_many(@data);
float_e0
This method is used to turn a floating e0 value on or off for a given standard.
$lcf->float_e0($standard, $onoff);
The first argument is the standard in question, the second is a boolean toggling the floating e0 on or off.
These are the same:
$lcf->add($data);
$lcf->float_e0($data, 1);
and
$lcf->add($data, float_e0=>1);
required
This method is used to require a given standard in a combinatorial fit.
$lcf->required($standard, $onoff);
The first argument is the standard in question, the second is a boolean toggling the requirement on or off.
These are the same:
$lcf->add($data);
$lcf->required($data, 1);
and
$lcf->add($data, required=>1);
weight
This method is both a setter and getter of the weight for a given standard. As a getter:
my ($weight, $dweight) = $lcf->weight($standard);
The weight and the uncertainty in the weight are returned as an array.
The weight can be set to an explicit value:
my ($weight, $dweight) = $lcf->weight($standard, $value);
Again weight and the uncertainty in the weight are returned as an array. The uncertainty is zeroed when the weight is explicitly set.
In scalar context, this just returns the weight.
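The list-versus-scalar behavior is the usual Perl wantarray idiom. The following is a minimal sketch of that idiom only. The function name and the numbers are hypothetical, and this is not claimed to be how Demeter implements the accessor.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an accessor such as weight or e0.
sub value_with_uncertainty {
    my ($value, $uncertainty) = (0.42, 0.03);   # made-up numbers
    # List context: (value, uncertainty). Scalar context: value only.
    return wantarray ? ($value, $uncertainty) : $value;
}

my ($w, $dw) = value_with_uncertainty();   # list context
my $only_w   = value_with_uncertainty();   # scalar context
printf "%.2f +/- %.2f, scalar: %.2f\n", $w, $dw, $only_w;
```

Note that a call inside another list, such as the argument list of print, is in list context and so yields both numbers.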
e0
This method is both a setter and getter of the e0 shift for a given standard. As a getter:
my ($e0, $de0) = $lcf->e0($standard);
The e0 and the uncertainty in the e0 are returned as an array.
The e0 can be set to an explicit value:
my ($e0, $de0) = $lcf->e0($standard, $value);
Again e0 and the uncertainty in the e0 are returned as an array. The uncertainty is zeroed when the e0 is explicitly set.
In scalar context, this just returns the e0.
fit
Perform the fit.
$lcf->fit;
This will perform some sanity checks, including verifying that the data has been set and that at least two standards have been defined. It will also make sure xmin and xmax are in the correct order.
An optional boolean argument turns the spinner off when in screen UI mode. This allows use of a counter for combinatorial fits.
$lcf->fit(1); # true value means to turn spinner off
report
This returns a summary of the fitting results.
print $lcf->report;
save
This method saves the results of a fit to a column data file containing columns for the x-axis (energy or wavenumber), the data, the fit, and each of the weighted components.
$lcf -> save("file.dat");
plot_fit
This method will generate a plot showing the data and the fit.
$lcf -> plot_fit;
The plot_difference, plot_components, and plot_indicators attributes determine whether the residual, the weighted components, and indicators marking the fitting range are included in the plot.
By default, the chi(k) and derivative components are stacked automatically.
plot
This is the generic plotting method for use when overplotting a large number of objects. In this example, the data, the standards, and the result of the LCF fit are plotted together with the standards plotted normally rather than as the weighted components of the fit.
$lcf->po->start_plot;
$lcf->po->set(e_norm=>1, e_der=>1, emin=>-30, emax=>70);
$_->plot('e') foreach ($data, @standards, $lcf);
This method does nothing (i.e. it does not attempt to plot the LCF fit at all) if the plot conditions do not match the fitting space of the fit. E.g., an LCF fit to normalized data will only be plotted if the fit is in energy and the e_norm Plot attribute is true.
clean
This method clears all scalars and arrays out of the memory of the data processing backend (Ifeffit/Larch).
$lcf->clean;
Note that there is no remove method to do the opposite of add. Such a method seems to me unnecessarily difficult to use. I suggest explicitly clearing the standards list and then adding a new set of standards. This is how the combinatorial fitting loop works.
$lcf->add_many(@data);
# ... do stuff, then
$lcf->clear_standards;
$lcf->add_many(@new_data);
Explain these:
'push' => 'push_standards',
'pop' => 'pop_standards',
'shift' => 'shift_standards',
'unshift' => 'unshift_standards',
'clear' => 'clear_standards',
These attributes and methods are specifically related to combinatorial fitting. A combinatorial fit is one in which all possible combinations (within certain constraints) are compared to the data.
max_standards
The maximum number of standards to use in each fit of a combinatorial sequence. Note that the size of the combinatorial problem grows geometrically in the value of this parameter and in the number of possible standards.
If, for example, this is set to 4, then in a combinatorial fit, all possible combinations of 2, 3, or 4 standards will be fit to the data.
Note that the size of the combinatorial problem gets very large as this number grows.
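To get a feel for that growth, the number of fits can be counted directly. This sketch uses hypothetical helper names and assumes that no standard is flagged as required and that every fit uses at least two standards, as described above.

```perl
use strict;
use warnings;

# Binomial coefficient C(n, k), built up so every intermediate is an integer.
sub choose {
    my ($n, $k) = @_;
    return 0 if $k < 0 || $k > $n;
    my $c = 1;
    $c = $c * ($n - $k + $_) / $_ for 1 .. $k;
    return $c;
}

# Number of fits using between 2 and $max standards out of $n candidates,
# ASSUMING no standard is flagged as required.
sub count_fits {
    my ($n, $max) = @_;
    my $total = 0;
    $total += choose($n, $_) for 2 .. $max;
    return $total;
}

# Tabulate the count for 10 candidate standards and several limits.
printf "%2d standards, max %d: %d fits\n", 10, $_, count_fits(10, $_)
    for 2 .. 6;
```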
combi_results
This is an array of hashes, sorted by rfactor, containing all the results of the fitting sequence.
Each hash in the array looks like this:
{
Rfactor => Num,
Chinu => Num,
Chisqr => Num,
Nvarys => Num,
Scaleby => Num,
aaaaa => [weight, dweight, e0, de0],
bbbbb => [weight, dweight, e0, de0],
....
}
A key like "aaaaa" is the group attribute of a Data object used in the fit. From this, the final state of each fit can be recovered using the restore method.
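A result list of this shape can be walked as in the following sketch. The @results array here is mock data with made-up numbers, standing in for whatever combi_results holds after a real fit, and groups_in is a hypothetical helper that separates the group keys from the five statistics keys listed above.

```perl
use strict;
use warnings;

# Return the group names of the standards used in one result hash,
# i.e. every key that is not one of the fit statistics.
sub groups_in {
    my ($result) = @_;
    my %is_stat = map { $_ => 1 } qw(Rfactor Chinu Chisqr Nvarys Scaleby);
    my @groups  = grep { !$is_stat{$_} } keys %$result;
    @groups = sort @groups;
    return @groups;
}

# Mock data in the documented shape; all numbers are made up.
my @results = (
    { Rfactor => 0.0021, Chinu => 0.0009, Chisqr => 0.11, Nvarys => 2,
      Scaleby => 1, aaaaa => [0.62, 0.03, 0, 0], bbbbb => [0.38, 0.03, 0, 0] },
    { Rfactor => 0.0057, Chinu => 0.0024, Chisqr => 0.29, Nvarys => 2,
      Scaleby => 1, aaaaa => [0.71, 0.05, 0, 0], ccccc => [0.29, 0.05, 0, 0] },
);

for my $r (@results) {
    printf "R=%.4f", $r->{Rfactor};
    for my $g (groups_in($r)) {
        my ($weight, $dweight, $e0, $de0) = @{ $r->{$g} };
        printf "  %s: %.2f +/- %.2f", $g, $weight, $dweight;
    }
    print "\n";
}
```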
combi
Perform a combinatorial sequence of fits, that is, perform all fits using all combinations of standards up to the number in max_standards. If max_standards is 4, then all combinations of 2, 3, or 4 of all the standards added to the object will be considered.
$lcf->combi;
At the end of the fit, the combi_results attribute is filled with an array of hashes containing the sorted results of the fit. The first item in the array contains the results from the fit with the lowest R-factor (that is, the combination of standards that most closely describes the data).
One or more standards can be flagged as being required in a fit. That is, each fit will include the flagged standards. This will significantly reduce the size of the combinatorial problem. See the discussion of the add, required, and is_required methods above.
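The effect of required standards on the size of the problem can be illustrated by enumerating the candidate fits. This is a sketch of the idea, not Demeter's fitting loop: the helper names are hypothetical and plain strings stand in for Data objects.

```perl
use strict;
use warnings;

# All combinations of $k items drawn from @items, as array references.
sub combinations {
    my ($k, @items) = @_;
    return ([]) if $k == 0;
    return ()   if @items < $k;
    my ($first, @rest) = @items;
    return ( (map { [$first, @$_] } combinations($k - 1, @rest)),
             combinations($k, @rest) );
}

# Every candidate fit of 2 to $max standards that contains all of the
# required standards (ASSUMED behavior, following the description).
sub candidate_fits {
    my ($max, $required, $optional) = @_;
    my @fits;
    for my $size (2 .. $max) {
        my $k = $size - @$required;   # optional standards still to choose
        next if $k < 0;
        push @fits, map { [@$required, @$_] } combinations($k, @$optional);
    }
    return @fits;
}

my @free   = candidate_fits(3, [],    [qw(A B C D)]);
my @pinned = candidate_fits(3, ['A'], [qw(B C D)]);
printf "no required standard: %d fits; A required: %d fits\n",
    scalar @free, scalar @pinned;
print join('+', @$_), "\n" for @pinned;
```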
At the end of the combinatorial sequence of fits, the fit with the lowest R-factor will be restored. Calling plot_fit, report, or save will act on that fit. To examine other fits from the sequence, the restore method must be called using one of the results from the combi_results attribute.
combi_report
Write an Excel or CSV (comma separated value) file that can be imported into a spreadsheet with the results of the combinatorial fit.
$lcf->combi_report("results.xls");
The argument is the name of the output file, which you probably want to give a ".csv" or ".xls" extension so your spreadsheet will know to import it as such. The choice of file type is controlled by the value of the lcf -> output configuration parameter.
Note that care is taken to strip any commas from the names of the standards before writing the CSV file. Also note that this does not make the most elegant spreadsheet, but it is certainly functional and it certainly allows you to examine all of your results.
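Stripping commas matters because an unquoted comma inside a name would split one CSV field into two. A minimal sketch of the idea follows. The csv_row helper is hypothetical and is not the code Demeter uses to write its report.

```perl
use strict;
use warnings;

# Hypothetical helper: build one CSV line, removing any commas found
# inside the fields so that the column count stays correct.
sub csv_row {
    my @fields = @_;
    s/,//g for @fields;       # edits the local copies only
    return join(',', @fields) . "\n";
}

# Made-up standard names and weights.
print csv_row('Fe2O3, bulk', 0.62, 'FeS, powder', 0.38);
```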
These attributes and methods are specifically related to sequence fitting. A sequence fit is one in which the same set of standards is fit to each of a group of data sets.
seq_results
This is an array of hashes, in the same order as the data, containing all the results of the fitting sequence.
Each hash in the array looks like this:
{
Rfactor => Num,
Chinu => Num,
Chisqr => Num,
Nvarys => Num,
Scaleby => Num,
aaaaa => [weight, dweight, e0, de0],
bbbbb => [weight, dweight, e0, de0],
....
}
A key like "aaaaa" is the group attribute of a Data object used in the fit. From this, the final state of each fit can be recovered using the restore method.
sequence
Perform a sequence of fits to a group of data.
$lcf->sequence(@data);
The data in $lcf is prepended to @data unless the include_caller attribute is false. Care is taken to remove repeats from @data.
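The bookkeeping described here (put the caller's data first, then drop repeats while keeping the order) can be sketched as follows. The sequence_list helper is hypothetical, and plain strings stand in for Data objects.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ordered list of data sets to fit.
sub sequence_list {
    my ($caller, $include_caller, @data) = @_;
    unshift @data, $caller if $include_caller;
    my %seen;
    # Keep only the first occurrence of each entry, preserving order.
    return grep { !$seen{$_}++ } @data;
}

my @order = sequence_list('d4', 1, qw(d5 d4 d6 d5));
print "@order\n";
```

The same filter works for object references, since Perl stringifies each reference to a distinct hash key.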
At the end of the fit, the seq_results attribute is filled with an array of hashes containing the results of each fit. This array is in the same order as @data.
At the end of the sequence of fits, the fit to the data originally in $lcf will be restored. Calling plot_fit, report, or save will act on that fit. To examine other fits from the sequence, the restore method must be called using one of the results from the seq_results attribute.
sequence_report
Write an Excel or CSV (comma separated value) file that can be imported into a spreadsheet with the results of the sequence of fits.
$lcf->sequence_report("results.csv");
The argument is the name of the output file, which you probably want to give a ".csv" or ".xls" extension so your spreadsheet will know to import it as such. The choice of file type is controlled by the value of the lcf -> output configuration parameter.
Note that care is taken to strip any commas from the names of the standards before writing the CSV file. Also note that this does not make the most elegant spreadsheet, but it is certainly functional and it certainly allows you to examine all of your results.
sequence_plot
Make a plot of the component weights as a function of data set.
Good question ...
See Demeter::Config for a description of the configuration system. See the lcf configuration group for the relevant parameters.
Demeter's dependencies are in the Build.PL file.
noise
Serialization and deserialization
linear term
better sanity method that provides usable feedback for a GUI
singlefile plot
further processing of LCF result (i.e. bkg removal, FTs). This seems better than converting the fit into a normal Data object
Please report problems to the Ifeffit Mailing List (http://cars9.uchicago.edu/mailman/listinfo/ifeffit/)
Patches are welcome.
Bruce Ravel, http://bruceravel.github.io/home
http://bruceravel.github.io/demeter/
Copyright (c) 2006-2018 Bruce Ravel (http://bruceravel.github.io/home). All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlgpl.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.