11.3. File type plugins

11.3.1. Extending ATHENA to read new file types

ATHENA uses IFEFFIT's read_data() function or LARCH's read_ascii() function to import data. This means that ATHENA's notion of what is an acceptable data format is identical to IFEFFIT's (or LARCH's) notion. The converse is also true: if IFEFFIT (or LARCH) can read a data file, so can ATHENA.

In practice, this works great. IFEFFIT is able to read the data files generated by many of the world's XAS beamlines. And so, consequently, is ATHENA. Sadly, there are many beamlines that use a format that confounds IFEFFIT and ATHENA. LARCH is rather more intelligent, but still unable to read some of the wackier file types. There are two obvious ways that I could deal with data from those beamlines:

  1. Refuse to deal with them and require the user to transform the data into a form that IFEFFIT can handle.
  2. Hard-wire code into ATHENA to deal with each new data format as I become aware of it.

Neither of those is particularly user-friendly. Instead, ATHENA relies on a plugin architecture that allows it to be extended on the fly to handle new data formats without changing the underlying code.

This page documents the plugin architecture so that ATHENA's users can write their own file type plugins.

11.3.2. Overview of how plugins work

In simple language, a plugin is a short file containing special perl code placed in a special location. ATHENA uses the code contained in that file to recognize and pre-process data files so that they can be imported properly using IFEFFIT or LARCH.

In somewhat more technical language, a plugin is just a perl module placed on your computer in a place where it can be found. This file is loaded when ATHENA starts and its methods are available when data are imported.

When a plugin is available for use, it is invoked every time a file is imported into ATHENA using the Open file function. The new file is checked using one of the plugin's methods to ascertain if the file is of the sort serviced by the plugin. If the file is recognized, another method in the plugin transforms the original data file into a form that is readable by IFEFFIT or LARCH. This transformation is done in a way that leaves the original data file unchanged.

If the transformation is successful, the user is presented with ATHENA's column selection dialog and can import data in the normal manner. Ideally, a plugin is written in a way that makes the import of the data into ATHENA a completely transparent process for the user.
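
The recognize-then-transform flow described above can be sketched in a few lines of plain perl. This is only an illustration and not ATHENA's actual import code: the plugins here are hypothetical hashes holding two code references rather than real Moose classes, and the function name import_with_plugins is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for plugins: each has a recognizer and a transformer.
# Real plugins are Moose classes; plain hashes keep this sketch self-contained.
my @plugins = (
  { name => 'X10C',
    is   => sub { my ($text) = @_; return $text =~ /^EXAFS/i ? 1 : 0 },
    fix  => sub { my ($text) = @_; (my $copy = $text) =~ s/^/# /; return $copy } },
  { name => 'Other',
    is   => sub { my ($text) = @_; return $text =~ /^OTHER/i ? 1 : 0 },
    fix  => sub { my ($text) = @_; return $text } },
);

# Return the name of the first plugin that recognizes the data along with
# the transformed copy. Unrecognized data is passed through untouched.
sub import_with_plugins {
  my ($text) = @_;
  foreach my $p (@plugins) {
    next unless $p->{is}->($text);
    return ($p->{name}, $p->{fix}->($text));
  }
  return (undef, $text);
}

my ($used, $result) = import_with_plugins("EXAFS scan\n");
```

Note that the transformer works on a copy, so the caller's original data is never modified.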

11.3.3. Example plugin

Here is a complete example of a functional plugin taken from the DEMETER distribution. This plugin allows ATHENA to import files from NSLS beamline X10C. As you can see, the plugin is quite short. The following sections of this page will explain this example in detail.

package Demeter::Plugins::X10C;

use Moose;
extends 'Demeter::Plugins::FileType';

has '+is_binary'    => (default => 0);
has '+description'  => (default => "NSLS beamline X10C");
has '+version'      => (default => 0.1);
has '+metadata_ini' => (default =>
                        File::Spec->catfile(File::Basename::dirname($INC{'Demeter.pm'}),
                                            'Demeter', 'share', 'xdi', 'x10c.ini'));

sub is {
  my ($self) = @_;
  open D, $self->file or $self->Croak("could not open " . $self->file . " as data (X10C)\n");
  my $first = <D>;
  close D, return 0 unless (uc($first) =~ /^EXAFS/);
  my $lines = 0;
  while (<D>) {
    close D, return 1 if (uc($_) =~ /^\s+DATA START/);
    ++$lines;
  };
  close D, return 0;
};


sub fix {
  my ($self) = @_;
  my $new = File::Spec->catfile($self->stash_folder, $self->filename);
  ($new = File::Spec->catfile($self->stash_folder, "toss")) if (length($new) > 127);
  open D, $self->file or die "could not open " , $self->file . " as data (fix in X10C)\n";
  open N, ">".$new or die "could not write to $new (fix in X10C)\n";
  my $header = 1;
  my $null = chr(0).'+';
  while (<D>) {
    $_ =~ s/$null//g;             # clean up nulls
    print N "# " . $_ if $header; # comment headers
    ($header = 0), next if (uc($_) =~ /^\s+DATA START/);
    next if ($header);
    $_ =~ s/([eE][-+]\d{1,2})-/$1 -/g; # clean up 5th column
    print N $_;
  };
  close N;
  close D;
  $self->fixed($new);
  return $new;
}

sub suggest {
  my ($self, $which) = @_;
  $which ||= 'transmission';
  if ($which eq 'transmission') {
    return (energy      => '$1',
            numerator   => '$4',
            denominator => '$6',
            ln          =>  1,);
  } else {
    return ();
  };
};


__PACKAGE__->meta->make_immutable;
1;

11.3.4. Namespace

The module must be in a particular namespace. The namespace is defined by the package statement on line 1 of the example. The package must be below the Demeter::Plugins namespace and should have a name that is descriptive of the format it is made for. In the case of the example, the plugin is intended to transform files from NSLS beamline X10C, so the full namespace of the module is Demeter::Plugins::X10C. Lines 3, 4, 63, and 64 are requisite boilerplate that allows this module to work properly with DEMETER and ATHENA.

11.3.5. Required methods and variables

The plugin must supply three methods and must set several attributes of the Plugin object.

11.3.5.1. required attributes

Lines 6-11 set the required attributes in a way that allows them to be accessed outside the scope of this module.

is_binary
(Line 6) A boolean that tells ATHENA whether the input file is in a text or binary format. ATHENA handles binary files slightly differently in the column selection dialog.
description
(Line 7) A short text string describing the purpose of this plugin. This string will be displayed in the plugin registry. This description should be no more than a few dozen characters.
version
(Line 8) This is a numeric version of the plugin.
metadata_ini
(Lines 9-11) The file in the share/xdi/ folder that contains metadata common to the beamline and facility.
headers
A reference to a hash containing additional metadata related to the work done by the plugin.

11.3.5.2. the is method

Lines 13-24 show the is method. This method is called by ATHENA to try to recognize an input data file as being of a particular format. In the case of this example, the X10C file is recognized by some of the text in the first few lines of the file. When the file is recognized, this method returns a true value. If the test fails, it returns 0. When ATHENA sees the true return value, it applies the fix method to transform the data file into an IFEFFIT- or LARCH-friendly format.

It is quite important that the is method be fast. It is possible that a data file will have to be tested against a large number of plugins. If the is method is slow, file import will be slow.
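
One way to keep a recognition test fast is to stop reading after a fixed number of lines. The following is a minimal sketch of that idea as a stand-alone function, not a method of a real plugin. The name looks_like_x10c and the default cutoff of 40 lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DEMETER: report whether a file looks like
# the X10C format while reading at most $max_lines lines of it.
sub looks_like_x10c {
  my ($file, $max_lines) = @_;
  $max_lines ||= 40;                      # assumed cutoff for this sketch
  open(my $fh, '<', $file) or return 0;   # unreadable files are not recognized
  my $first = <$fh>;
  unless (defined $first and $first =~ /^EXAFS/i) {
    close $fh;
    return 0;
  }
  my $count = 1;                          # lines read so far
  my $found = 0;
  while (my $line = <$fh>) {
    ++$count;
    if ($line =~ /^\s+DATA START/i) { $found = 1; last }
    last if $count >= $max_lines;         # give up early on long files
  }
  close $fh;                              # always release the file handle
  return $found;
}
```

With a cutoff in place, a very long file that is not of this type costs only a few dozen line reads before the next plugin gets its turn.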

11.3.5.3. the fix method

Lines 27-47 show the fix method. This method is called when the is method returns true. In some manner it makes a copy of the original data file and transforms that copy into a form that can be read by IFEFFIT or LARCH. This method needs to follow a number of strict rules; however, within those rules there is a lot of flexibility about how the transformation is accomplished and the scope of what that transformation does to the data.

First and most important, never alter the original data! Either work on the contents of the original file in memory or make a copy of the data, preferably in the stash folder (a folder known to DEMETER as a place for writing scratch files). At line 32, we see that a file is opened in the stash folder to hold the transformed data. As the data are processed, the output is written to that file (see lines 37 and 41).

Do whatever chore needs doing to transform the portion of the original data file that needs attention. Afterwards, close both the input and output files. It is essential that the files be closed, particularly on Windows, which locks opened files from other uses.

Finally, set the fixed attribute of the object to the path and name of the transformed file and return that same string.

In the example given on this page, the first thing the fix method does is to create a file name in the stash directory for the transformed file. Line 29 tells ATHENA to give the stash file the same name as the original file (before calling this method, ATHENA sets the filename attribute appropriately) but in the stash directory (the catfile method builds a fully resolved filename in a platform-transparent manner). Line 30 checks the length of the fully resolved filename to avoid running into one of IFEFFIT's internal limitations.
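
The naming logic can be tried out on its own. Here is a small sketch using only the core File::Spec module. The function name stash_path is hypothetical, and the folder is passed in as an argument rather than taken from the plugin object.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the path of the transformed copy inside a
# scratch folder, falling back to a short fixed name when the path is long.
# The 127-character limit mirrors the check in the example plugin.
sub stash_path {
  my ($stash_folder, $filename) = @_;
  my $new = File::Spec->catfile($stash_folder, $filename);
  $new = File::Spec->catfile($stash_folder, 'toss') if length($new) > 127;
  return $new;
}

my $short = stash_path('stash', 'data.dat');   # keeps the original file name
my $long  = stash_path('stash', 'x' x 200);    # falls back to "toss"
```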

Three things are done to transform an X10C file. The header is stripped of null characters, the header is commented out by putting # characters in the first column, and a formatting problem in some files involving a lack of white space between columns is resolved. Each line of the original file is read, operated on, and written to the transformed file in the stash directory. The while loop starting at line 35 reads through the file line-by-line and performs the operations.
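
To see these three operations in isolation, here is a sketch that applies them to a list of lines held in memory instead of reading and writing files. The function name transform_lines and the sample lines are made up for illustration. The regular expressions are the ones used in the example plugin.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three operations on a list of lines.
sub transform_lines {
  my @lines = @_;                             # work on copies, not the originals
  my @out;
  my $header = 1;
  foreach my $line (@lines) {
    $line =~ s/\0+//g;                        # strip null characters
    if ($header) {
      push @out, "# " . $line;                # comment out header lines
      $header = 0 if $line =~ /^\s+DATA START/i;
      next;
    }
    $line =~ s/([eE][-+]\d{1,2})-/$1 -/g;     # separate fused columns
    push @out, $line;
  }
  return @out;
}

my @fixed = transform_lines(
  "EXAFS\0\0 scan\n",
  "   DATA START\n",
  "8000.0 1.5E+04-2.5E+03\n",
);
```

In this made-up input, the last line has a negative number running directly into the exponent of the previous column. The substitution inserts a space before the minus sign so that the two columns can be told apart.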

Lines 43 and 44 close the original and new file handles. The filter should always close the file handles. This is not such a huge issue under unix, but Windows places a lock on any open file handle. If you fail to close one, for as long as ATHENA is running no other process will be able to do anything with that file.

At line 46, the method returns with the fully resolved name of the transformed file. At no point was the original file altered. When ATHENA exits, it will clean up the stash directory, thus avoiding a pile up of unnecessary data files.

DEMETER ships with a number of different kinds of plugins. Some of them perform simple, linear transformations (like this one). Others interpret binary data. A couple export project files rather than data files. One even performs an on-the-fly deadtime correction for data from an energy dispersive detector. Examine them for hints about how to create your own plugins.

11.3.5.4. the suggest method

Lines 49-60 show the suggest method. This provides feedback for use by the column selection dialog in selecting initial guesses for the columns containing the numerator and denominator of the data. In this case, the method suggests columns for transmission data but makes no suggestions for fluorescence data.
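
To make the meaning of the returned hash concrete, here is a sketch of how a suggestion of this form could be applied to one line of data. This is an assumption about how such a hash might be consumed, not the code of the column selection dialog. The function apply_suggestion and the sample numbers are hypothetical, and column specifications such as '$4' are read as 1-based column numbers.

```perl
use strict;
use warnings;

# Hypothetical consumer of a suggestion: column specs such as '$4' are
# interpreted as 1-based column numbers of a whitespace-separated data line.
sub apply_suggestion {
  my ($line, %s) = @_;
  my @col = split ' ', $line;
  my $pick = sub {
    my ($spec) = @_;
    (my $n = $spec) =~ s/^\$//;          # '$4' becomes 4
    return $col[$n - 1];
  };
  my $ratio = $pick->($s{numerator}) / $pick->($s{denominator});
  # Take the natural log of the ratio when the ln flag is set.
  return ($pick->($s{energy}), $s{ln} ? log($ratio) : $ratio);
}

my %suggestion = (energy => '$1', numerator => '$4', denominator => '$6', ln => 1);
my ($e, $mu) = apply_suggestion("8000 0 0 200 0 100", %suggestion);
```

For this sample line the energy is taken from column 1 and the signal is the natural log of column 4 divided by column 6.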

11.3.6. ATHENA's plugin registry

Because there might be a large number of file type plugins, it is possible for the user to turn the checks for the file types on and off. In the main menu, you will find the Plugin Registry. This is a simple list of all plugins found in the system and user directories. The check buttons enable and disable the plugins. The value of the description attribute is displayed in the list (so be sure to choose a suitable and suitably short value for that variable).

Fig. 11.4 The plugin registry.

Note that the order in which the plugins are displayed above is the same order in which files are checked against the plugins. User plugins are checked before system plugins. After that the plugins are ordered alphabetically. If you want your plugin to be checked against the data first, choose a name that comes early in the alphabetical sense.
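
The ordering rule can be expressed as a two-level sort. The sketch below is an illustration only: the records and their origin field are hypothetical and do not reflect how ATHENA stores its list of plugins, and the user plugin names are invented.

```perl
use strict;
use warnings;

# Hypothetical records; the 'origin' field is an assumption of this sketch.
my @found = (
  { name => 'X15B',   origin => 'system' },
  { name => 'MyLine', origin => 'user'   },
  { name => 'X10C',   origin => 'system' },
  { name => 'Alpha',  origin => 'user'   },
);

# User plugins sort ahead of system plugins; ties are broken alphabetically.
my @ordered = sort {
  ($a->{origin} eq 'user' ? 0 : 1) <=> ($b->{origin} eq 'user' ? 0 : 1)
    or $a->{name} cmp $b->{name}
} @found;

my @names = map { $_->{name} } @ordered;
```

Here the two user plugins come first, in alphabetical order, followed by the two system plugins.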

Right-clicking on an item in the registry posts the context menu shown in the figure above. All such context menus have at least one item for reading the documentation contained in the plugin source code file. Some plugins, such as the one shown, also provide a way of configuring the behavior of the plugin.

11.3.7. System plugins and user plugins

ATHENA looks in two different places for these plugins. One place is in ATHENA's installation location, where it finds the plugins that come with the horae distribution. The other is in the user's space (on Windows, plugins are located in C:\Program File\Ifeffit\horae\Ifeffit\Plugins\Filetype\Athena\; on unix, in $HOME/.horae/Ifeffit/Plugins/Filetype/Athena/). In both places, it reads the contents of the plugin directory and attempts to import the files which end in .pm.
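
Finding the candidate files in a plugin directory takes only a few lines of perl. This is a sketch of the general technique and not ATHENA's own code. The function name plugin_files is hypothetical and the directory is supplied by the caller.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: list the plugin module files in one directory,
# sorted by name. A missing or unreadable directory yields an empty list.
sub plugin_files {
  my ($dir) = @_;
  opendir(my $dh, $dir) or return ();
  my @pm = grep {
    my $path = File::Spec->catfile($dir, $_);
    /\.pm\z/ && -f $path;                 # plain files ending in .pm only
  } readdir $dh;
  closedir $dh;
  my @sorted = sort @pm;
  return @sorted;
}
```

Calling this once for the system directory and once for the user directory would give the two lists of files to load.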

11.3.8. Miscellaneous advice on plugins

  1. Cut-n-paste is an excellent way to get started on a new plugin. Make a copy of a plugin for a file that is similar to your own file and use that as the basis for your new plugin.
  2. X15B.pm is an example of a plugin for a binary format.
  3. You can use any module that you need, thus you have all of CPAN available to you when designing your plugin. If you need to do any seriously heavy lifting, check out the Math::Pari module or the Perl Data Language.
  4. Although a well-tested, robust plugin should be your goal, one of the nice features of the plugin architecture is that a “good-enough” plugin is easy to write and can quickly get you over a hurdle.



DEMETER is copyright © 2009-2016 Bruce Ravel – This document is copyright © 2016 Bruce Ravel

This document is licensed under The Creative Commons Attribution-ShareAlike License.

If DEMETER and this document are useful to you, please consider supporting The Creative Commons.