RECSPE package for:
   RECovery of legacy paper SPEctra

        The RECSPE package described here is the result of a project to digitise extensive mm-wave rotational spectra of the H2O...HF hydrogen bonded complex recorded in the Nizhnii Novgorod laboratory in Russia.  Partial analysis of those spectra was published (Belov et al. J.Mol.Spectrosc. 241 (2007) 124), but the majority of the lines remained unassigned and only the paper version of those spectra survived.

        In fact the situation when a spectrum exists only in the form of a paper record and contains valuable unprocessed information is not that rare.  Such spectra are also often in the form of  chart recorder rolls.  It is very desirable to convert such spectra into a digital form that will be amenable for use with contemporary packages for graphical assignment, such as AABS.

        RECSPE is a package of programs for conversion into a usable digital form of such legacy paper spectra.  Several graphics programs (such as Inkscape) can trace a bitmap image into a vector, which is useful, but the result is still far from what we would regard as a digital spectrum.  The present package offers a complete route from legacy paper spectra to calibrated digital spectra in the form of point intensities at a uniform frequency spacing.

        Recovery of paper spectra poses some specific issues that the RECSPE programs are designed to deal with:

  • Frequency calibration: This is key to the usability of recovered spectra.  Many old spectra are inherently nonlinear in frequency.  Even if the spectrum was linear it is possible that nonlinearities may have crept in from uneven operation of the original recorder or distortions in the paper through folding or crumpling.
  • Multipage spectra: If the spectrum is in the form of a strip chart record then it has to be scanned into multiple images that need to be spliced together.


       The steps in the RECSPE procedure:
  1. Scan the spectrum into a reference bitmap image (300 dpi colour TIFF with LZW compression is recommended)
  2. Convert the bitmap image to an indexed 256 colour (8-bit) 300 dpi BMP, which is the form that will be used for further analysis.  You may also need to modify the scanned image of the spectrum for optimum tracing; the freely available bitmap graphics programs IrfanView and GIMP are recommended for this purpose.

  3. Use program TRACE to trace the spectrum from bitmaps to vector representation.  The success of the tracing can be previewed by means of automatically generated diagrams for the gle package.

  4. Use program SPLICE to splice together traces from adjacent pages of multipage spectra (you need to ensure that there is sufficient overlap between their bitmaps).

  5. Use program FZERO to assign a zero order linear frequency scale to the horizontal axis based on specification of two characteristic points.

  6. Use program MERGE to combine all spectra into a single record.

  7. Use the AABS package to determine the frequency calibration of the spectrum and then program FRECAL to convert the frequency scale to that resulting from the calibration.

        Some of these steps are only needed for more complex situations.  For a single page spectrum that was plotted linear in frequency you might only need to use TRACE and FZERO.  For more complex spectra and if you want to achieve  maximum accuracy then you may need to go through the whole procedure, iterating some steps several times.

        Examples of paper spectra and of their conversion:

Stark spectrum of methanol taken in the 1970s with the Hewlett-Packard 8460A rotational spectrometer at University College London:
meoh_04.jpg = fourth segment of scanned chart strip output (reduced from original 11 Mb size).  This strip chart spectrum covers 26.5-40 GHz.
meoh_04_uncal.pdf = result of conversion to pixel coordinates

meoh_04_cal.pdf = frequency axis added by using FZERO and pixel coordinates for two widely separated markers, scanned into a separate marker channel.  Note that frequency now increases from left to right.

Source modulation spectrum of acrylonitrile at 295 GHz taken in 1986 with the IFPAN spectrometer by free scanning the BWO source:

vincn295GHz_a.jpg = first part of a spectrum glued from several A3 size XY plotter sheets (this has been reduced from 31 Mb original scan size)
vincn295_complete.pdf = result of conversion using the RECSPE procedure.  The spectrum was self-calibrated since frequencies of most of the lines are currently well known.

vincn295_zoom.pdf   = zoomed view onto the group of lines preceding the ground state

spline.pdf =  the frequency correction function established for this spectrum

RAD spectrum of H2O..HF at 319 GHz recorded in 1987 in Nizhnii Novgorod:

38a_reduced.jpg = reduced version of the first scanned sheet of this three sheet long spectrum.  Top trace is H2O...HF, bottom is SO2 reference spectrum.
38a_sm.pdf = result of tracing this spectrum with smoothing

38a_dif.pdf = result of additional differentiation of the spectrum at the end of tracing






TRACing of spectra

        This is the key program in the RECSPE  package and it converts a bitmap image of a spectrum into a string of points.  If the spectrum contains a second channel with  markers or a reference spectrum then that channel can also be analysed synchronously with the main channel.  The points are assigned x,y values in pixel units.

       The steps in using TRACE:
  1. Scan the spectrum to a lossless bitmap: it is recommended to use 300 dpi LZW compressed TIFF
  2. Convert the bitmap to 8-bit (i.e. indexed 256 colour) BMP standard.  Convenient conversion is possible with the batch convert mode of IrfanView.

  3. Establish the RGB colours and their range for the traces of interest.  One or two channel spectra can be traced, provided that the two channels (say spectrum and markers, or sample and calibration spectra) have been drawn in different colours.  A useful tool for colour identification is GIMP.

  4. GIMP, or a similar program, may also be used for cleaning up the spectrum.  It is very important that the intensity axis is truly vertical, so a slanted image should be rotated.  Areas of the image that might confuse the program can be deleted, for example edge perforations if their colour is close to that of the trace.
  5. Write the colour values and their tolerances to file TRACE.INP.  If you do not need the second trace then enter zero values for its colours.  You can also declare whether the traces are to be smoothed and then optionally differentiated.  NOTE: make sure that the frequency scale, if present in the spectrum image, is in a very different colour to that of the spectral trace.  If you do not need to convert the frequency scale then just erase it from the bitmap, otherwise you may obtain confusing results.

  6. Run TRACE.  You can view the results directly with gle by clicking on one of the automatically generated .gle scripts.  If conversion problems are spotted then you might need to retouch the original bitmap or tune up the TRACE.INP file and redo the tracing.  The gle display will be updated automatically.

        The operation of TRACE is based on the concept that spectra are single valued functions so that for a given frequency there should be just one data point.  The bitmap is scanned one column at a time and all pixels in the specified colour range are identified.  The outliers are then established and rejected, and the y-value of the remaining points averaged.  Interpolation is used for empty columns within the x-axis range of the spectrum.
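The column-by-column idea described above can be sketched in a few lines of Python.  This is only an illustration and not the TRACE source: the image layout (`image[row][col]` holding an RGB tuple), the single tolerance applied to all three colour components, the median-based outlier rule with its `max_dev` threshold, and the use of linear interpolation for empty columns are all assumptions made for the example.

```python
from statistics import median_low


def colour_matches(pixel, target, tol):
    """True if every RGB component of pixel is within tol of target."""
    return all(abs(p - t) <= tol for p, t in zip(pixel, target))


def trace_columns(image, target, tol, max_dev=3.0):
    """Trace a single-valued curve from image[row][col] = (r, g, b).

    Returns (x, y) points in pixel units (top-left corner is 0,0), one
    point per column between the first and last column that contains
    pixels in the trace colour range.
    """
    n_rows, n_cols = len(image), len(image[0])
    ys = [None] * n_cols
    for col in range(n_cols):
        rows = [r for r in range(n_rows)
                if colour_matches(image[r][col], target, tol)]
        if not rows:
            continue
        # Assumed outlier rule: reject pixels far from the column median.
        # median_low returns an actual data value, so kept is never empty.
        mid = median_low(rows)
        kept = [r for r in rows if abs(r - mid) <= max_dev]
        ys[col] = sum(kept) / len(kept)

    filled = [c for c in range(n_cols) if ys[c] is not None]
    if not filled:
        return []
    # Linear interpolation across empty columns inside the traced range.
    for left, right in zip(filled, filled[1:]):
        for c in range(left + 1, right):
            t = (c - left) / (right - left)
            ys[c] = ys[left] + t * (ys[right] - ys[left])
    return [(c, ys[c]) for c in range(filled[0], filled[-1] + 1)]
```

Because each column yields at most one averaged y-value, the output is single valued by construction, which is the property that the later splicing and frequency assignment steps rely on.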

TRACE.FOR Source listing.
TRACE.EXE Windows executable.  The program runs as specified in the trace.inp file. Launch from the command line in the directory containing the bitmaps for tracing. Two modes are possible:
  • Manual mode: program will trace only the specified bitmap
  • Auto mode: program will attempt to trace all .BMP files in the current directory
TRACE.INP The control file for TRACE with entries for tracing the sample bitmap below.  This can be reedited as necessary.
  • Colour values are to be established from the bitmap to be scanned by using the colour picker of any bitmap graphics program
  • If you only want to trace one channel then specify 0 values for RGB colours of trace B
  • Traces can be smoothed (recommended) using standard Savitzky-Golay least-squares polynomial smoothing
  • Traces can also be differentiated for use when you might want to convert from first to second derivative lineshape.  The phase factors ensure upward central peaks.
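As an illustration of the smoothing and differentiation options, the sketch below applies the standard tabulated 5-point quadratic Savitzky-Golay weights in Python.  The window length and polynomial order actually used by TRACE are not stated here, so the 5-point window, the function names, and the `phase` argument standing in for the phase factor are assumptions.

```python
# 5-point quadratic Savitzky-Golay weights (standard tabulated values).
# The window length and order used by TRACE are not given in this
# document, so these are assumptions for illustration only.
SMOOTH_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)
DERIV_5 = (-0.2, -0.1, 0.0, 0.1, 0.2)


def apply_window(y, weights):
    """Weighted moving sum over interior points; the ends are dropped."""
    half = len(weights) // 2
    return [
        sum(w * y[i + k - half] for k, w in enumerate(weights))
        for i in range(half, len(y) - half)
    ]


def smooth(y):
    """Smoothed intensities for points 2 .. len(y) - 3."""
    return apply_window(y, SMOOTH_5)


def differentiate(y, phase=1.0):
    """First derivative per point spacing; phase=-1.0 flips the sign."""
    return [phase * d for d in apply_window(y, DERIV_5)]
```

A quadratic test trace is reproduced exactly by `smooth`, which is a quick check that the weights are correct.  Setting `phase=-1.0` inverts the differentiated trace, which is one way to obtain upward central peaks when converting a first derivative lineshape to a second derivative one.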

38A.ZIP This is the full bitmap of the image shown in 38a_reduced.jpg for the H2O...HF example above.  It is quite large (>8Mb) so it has been zipped but it can be unpacked and used for testing   TRACE.
One of several sets of  files for gle that will be produced by TRACE for the bitmap above.  The .XY files are the resulting traces while various additional files allow convenient viewing of the results of the tracing.  The files are produced in sets for the raw traces, smoothed traces, and differentiated traces (if specified).
These three files correspond to the gle diagram shown in 38a_sm.pdf.

The .XY traces are ASCII files containing in the first two columns the x,y values that will be used for further processing.  The last two columns list actual pixel coordinates of the points (top-left corner of bitmap is 0,0) for direct comparison with coordinates displayed by most graphics programs.
The .XY files can be read and displayed with the SVIEW_L program of the AABS package.
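For processing outside the AABS package, a reader for the four-column layout described above might look like the following Python sketch.  It assumes whitespace-separated numeric columns, that the pixel coordinates are listed as column then row, and that any non-numeric line can be skipped; none of these details are confirmed here, and the sample values are made up.

```python
def parse_xy(text):
    """Parse .XY-style text into (x, y, pixel_col, pixel_row) tuples.

    Assumes four whitespace-separated numeric columns per data line;
    blank, short, or non-numeric lines are skipped.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            x, y, px, py = (float(f) for f in fields[:4])
        except ValueError:
            continue
        points.append((x, y, px, py))
    return points


# Made-up sample content, for illustration only.
sample = """
  0.0  812.5   35  1187.5
  1.0  813.0   36  1187.0
"""
for x, y, px, py in parse_xy(sample):
    print(x, y, px, py)
```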






SPLICing of traces for multipage spectra

       This program splices traces for adjacent scanned pages of multipage spectra by aligning the overlap regions.  So it is necessary to exercise some foresight during the scanning process to ensure that there is sufficient overlap between adjacent pages.

       The use of the QGLE previewer from the gle package is mandatory in this case.  Once the package is installed and SPLICE is launched, all you need to do is to click on the automatically generated file SPLICE.GLE to view the splicing for the current parameters.

SPLICE.FOR Source listing.
SPLICE.EXE Windows executable. The program is to be launched from the command line in the directory containing the traces. For the input file as below you will see the following screen.  At the same time a file SPLICE.GLE is generated and you need to click on that in order to preview the splicing with QGLE.

After these preliminaries you need to manually hunt around for the best splicing parameters, by typing in the option number and its value.
The control file.  This can be reedited as necessary and the entries shown are for the sample case below.

If you specify only one channel conversion and generic file names MOLNAM and MOLNAM1 then
SPLICE expects to find files MOLNAM.XY and MOLNAM1.XY
If two channel conversion is specified then

The first block of the splicing options controls the QGLE display, while the last three parameters control the splicing.  The crucial aligning parameter is the x-axis overlap width, but you may also need to change the other two parameters.  Once you are satisfied that optimum splicing has been reached, exit SPLICE by pressing ENTER, at which point the parameters in SPLICE.INP will be updated.  The contents of this file underneath the top block will be copied over, so that comments and previous versions of parameters can be kept there.
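The role of the x-axis overlap width can be pictured with the Python sketch below.  It is not the SPLICE algorithm: the `y_shift` argument is a hypothetical stand-in for a vertical alignment parameter, and switching from one trace to the other at the middle of the overlap is an assumption made for the example.

```python
def splice(trace_a, trace_b, overlap, y_shift=0.0):
    """Join two (x, y) traces whose ends overlap by `overlap` x units.

    trace_b is shifted so that its first point lies `overlap` units
    before the last point of trace_a; y_shift is a hypothetical
    vertical alignment term added to trace_b.
    """
    a_end = trace_a[-1][0]
    x_shift = a_end - overlap - trace_b[0][0]
    shifted_b = [(x + x_shift, y + y_shift) for x, y in trace_b]
    join = a_end - overlap / 2.0  # assumed: switch traces mid-overlap
    return ([p for p in trace_a if p[0] < join]
            + [p for p in shifted_b if p[0] >= join])
```

With a wrong overlap width the features in the overlap region of the two pages appear doubled or displaced, which is the misalignment you look for in the QGLE preview while adjusting the parameter.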


The traces for spectrum 38a (channel a and b) and for spectrum 38b (channel a and b) to be spliced using the input file above
SPLICE.PDF Illustration of the display that you will see in the QGLE viewer of gle on launching SPLICE with the data above.   You can see that there is some x-axis misalignment that can be corrected by changing the value of parameter number 6.
The traces resulting from optimum splicing of the data above, channel A is SO2, channel B is H2O...HF.






Assignment of zero order frequency axis

        This program assigns the frequency axis to a trace, which can be either directly from  TRACE, or result from splicing with SPLICE.  Frequency is recalculated in a straightforward linear conversion based on coordinates of two points.  For a spectrum that is known to be nonlinear this is really a zero order operation to make subsequent handling easier.  If the spectrum is linear then this may be all that you need to do.

        You need to load the traced spectrum into SVIEW_L and measure two lines (or features) to determine their X-coordinates for use in calibration.  These X-coordinates and the known true frequencies of these two points are then to be written to the file MOLNAM.FPT, where MOLNAM is the generic name used for files associated with this spectrum.
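The two-point linear conversion amounts to f = f1 + (x - x1)(f2 - f1)/(x2 - x1).  A minimal Python sketch follows; the function name is hypothetical, the numbers in the usage lines are made up, and reading of the MOLNAM.FPT file is not shown because its layout is not described here.

```python
def linear_frequency_scale(x1, f1, x2, f2):
    """Return a function mapping an x-coordinate to frequency.

    (x1, f1) and (x2, f2) are the two calibration points: measured
    x-coordinates paired with the known true frequencies.
    """
    if x1 == x2:
        raise ValueError("calibration points must have different x")
    slope = (f2 - f1) / (x2 - x1)
    return lambda x: f1 + slope * (x - x1)


# Made-up calibration points, for illustration only.
to_freq = linear_frequency_scale(100, 40000.0, 1100, 26500.0)
print(to_freq(600))  # frequency at x = 600
```

The two points should be as widely separated as possible, since any error in their measured x-coordinates is scaled up in proportion to the distance from them.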

FZERO.FOR Source listing.
FZERO.EXE Windows executable, to be used from the command line.  The program will:
  1. first try to convert file MOLNAM.XY (single channel mode).
  2. if there is no MOLNAM.XY  then the program will try to convert files  MOLNAM_A.XY and MOLNAM_B.XY (two channel mode)

Uncalibrated trace for the example methanol spectrum as shown in meoh_04_uncal.pdf
MEOH04_SM.FPT The file with the two calibration points for the above.
MEOH04_SM.SPE The resulting file corresponding to meoh_04_cal.pdf

38ab.FPT The file with the two calibration points for the H2O...HF+SO2 example discussed in the description of SPLICE.  
The files resulting from addition of the zero order frequency axis to files 38ab_a.xy and 38ab_b.xy from the SPLICE example







MERGing of spectra

        This program merges all traces with assigned frequency scale into a single spectrum. The operation is as follows:

  1. a frequency-sorted list of the basic properties of the spectra in the current directory is produced
  2. the spectra are unified to a common frequency grid (defined by the internal parameter FSTEP) and each spectrum SPECNAM.SPE is converted to U_SPECNAM.SPE
  3. all U_ spectra are then merged into two files, U_A.SPE containing all A channel spectra, and U_B.SPE containing all B channel spectra.
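Step 2 above, unification to a common grid, can be sketched in Python as follows.  The use of linear interpolation and the placement of grid points at integer multiples of the step are assumptions; only the existence of the FSTEP parameter comes from the description above.

```python
import math


def resample(points, fstep):
    """Interpolate (freq, intensity) points onto multiples of fstep.

    points must hold at least two entries in strictly ascending
    frequency.  Linear interpolation is an assumption of this sketch.
    """
    first = math.ceil(points[0][0] / fstep)
    last = math.floor(points[-1][0] / fstep)
    out = []
    j = 0
    for k in range(first, last + 1):
        f = k * fstep
        # Advance to the input interval that contains f.
        while j + 2 < len(points) and points[j + 1][0] < f:
            j += 1
        (f0, y0), (f1, y1) = points[j], points[j + 1]
        t = (f - f0) / (f1 - f0)
        out.append((f, y0 + t * (y1 - y0)))
    return out
```

Placing every spectrum on multiples of the same step means that points from different spectra coincide exactly in frequency, so the later merge only has to choose between values and never needs to interpolate again.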

MERGE.FOR The source listing.
MERGE.EXE Windows executable to be launched from the command line in the directory containing the spectra. Note that:
  1. spectral files are to have  extension .SPE and are to be in the two column ASCII standard as produced by FZERO 
  2. no spaces are allowed in file names
  3. data points have to be equidistant in frequency
  4. missing parts are filled with zeroes, overlapping parts are connected at the middle of the overlap region
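The zero filling and mid-overlap joining in point 4 can be illustrated with the Python sketch below.  Each spectrum is represented as a start index on the common grid plus a list of intensities, which is a representation chosen for the example and not the MERGE file format; skipping a spectrum that lies entirely inside the range already merged is also an assumption.

```python
def merge(spectra):
    """Merge spectra given as (start_index, intensities) on one grid.

    Gaps are filled with zeroes and overlapping spectra are connected
    at the middle of the overlap.  Returns (start_index, intensities).
    """
    spectra = sorted(spectra, key=lambda s: s[0])
    start, merged = spectra[0][0], list(spectra[0][1])
    for s_start, values in spectra[1:]:
        end = start + len(merged)  # one past the last merged index
        if s_start >= end:
            merged.extend([0.0] * (s_start - end))  # gap: zero fill
            merged.extend(values)
            continue
        if s_start + len(values) <= end:
            continue  # assumed: fully contained spectrum is skipped
        cut = (s_start + end) // 2  # middle of the overlap region
        merged[cut - start:] = values[cut - s_start:]
    return start, merged
```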
LIST Listing of the spectra found and processed by MERGE.  This file is identical in format to the LIST file required by the AABS package for displaying the ranges of spectra available for analysis.
This listing summarises all constituent spectra from the H2O...HF project that were combined into one single spectrum.
The result of operation of MERGE on files 38ab_a.SPE and 38ab_b.SPE obtained above with FZERO.  The files were converted to the 0.5 MHz frequency grid and if more spectra were available then those would have been merged into these two output files. 






FREquency CALibration of a spectrum

        This program calibrates the frequency axis of the spectrum by applying a correction based on a cubic spline function fit to a predefined set of calibration points.  Alternatively, a previously determined spline function can be used, providing that it was determined for the same frequency axis (for cases when a separate reference channel was recorded).
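The spline-based correction can be sketched in Python as follows.  This is an illustration and not the FRECAL source: a natural cubic spline (zero second derivative at both ends) is assumed, the calibration points are passed as (apparent frequency, true frequency) pairs instead of being read from a file, and the correction is defined as true minus apparent frequency.

```python
from bisect import bisect_right


def spline_second_derivs(x, y):
    """Second derivatives of the natural cubic spline through (x, y)."""
    n = len(x)
    y2 = [0.0] * n
    u = [0.0] * n
    for i in range(1, n - 1):
        sig = (x[i] - x[i - 1]) / (x[i + 1] - x[i - 1])
        p = sig * y2[i - 1] + 2.0
        y2[i] = (sig - 1.0) / p
        d = ((y[i + 1] - y[i]) / (x[i + 1] - x[i])
             - (y[i] - y[i - 1]) / (x[i] - x[i - 1]))
        u[i] = (6.0 * d / (x[i + 1] - x[i - 1]) - sig * u[i - 1]) / p
    for k in range(n - 2, -1, -1):  # back substitution
        y2[k] = y2[k] * y2[k + 1] + u[k]
    return y2


def spline_eval(x, y, y2, xq):
    """Evaluate the spline at xq (end segments are used outside range)."""
    hi = min(max(bisect_right(x, xq), 1), len(x) - 1)
    lo = hi - 1
    h = x[hi] - x[lo]
    a = (x[hi] - xq) / h
    b = (xq - x[lo]) / h
    return (a * y[lo] + b * y[hi]
            + ((a ** 3 - a) * y2[lo] + (b ** 3 - b) * y2[hi]) * h * h / 6.0)


def calibrate(freqs, cal_points):
    """Apply a spline correction to freqs.

    cal_points holds (apparent_freq, true_freq) pairs with distinct
    apparent frequencies; they are sorted here before fitting.
    """
    cal = sorted(cal_points)
    xs = [a for a, _ in cal]
    corr = [t - a for a, t in cal]
    y2 = spline_second_derivs(xs, corr)
    return [f + spline_eval(xs, corr, y2, f) for f in freqs]
```

An interpolating spline passes exactly through every calibration point, so a single badly measured line, or two points very close together with slightly inconsistent corrections, can produce large excursions of the correction function between points.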

        A prerequisite to running this program is to produce a file of frequency calibration points.  For this you need to use the AABS package.  The spectrum is to be displayed in SVIEW_L and the predictions with true frequencies of lines should be displayed in ASCP_L.  The two programs should be in linked mode, which is ensured by the presence of a suitable SVIEW_L.INP file in the working directory.  You need to declare MOLNAM.FRE as the name of the fitting data file, where MOLNAM is the generic name for the project.  Calibration measurements should then be written to that file with the F8 option of ASCP_L.

FRECAL.FOR The source listing.
FRECAL.EXE Windows executable to be run from the command line.  The only parameter that you specify is the generic name, MOLNAM, for the files in question.  The program then expects that you have the file FRECAL.INP (as below) and have prepared:
  1. MOLNAM.SPE = the file containing the spectrum to be calibrated (in the IFPAN binary format, as written with the m option of SVIEW_L)  
  2. MOLNAM.FRE = the file with the calibration points produced with the F8 option of ASCP_L operating in linked mode with SVIEW_L.

Alternatively if a run such as that described above has taken place on a reference spectrum and you have an identically recorded sample spectrum to calibrate then you can reuse the spline function MOLNAM_spline.FNC generated in the previous run by copying it to a file where the MOLNAM part of the name corresponds to that used for the sample spectrum.

The primary output file will be MOLNAM_frecal.SPE, containing a two column ASCII version of the spectrum for the same points as in the input spectrum, but with the frequency of each point recalculated according to the calibration function.  The point spacing in this spectrum will NOT be equidistant in frequency, but you can convert to equidistant frequency spacing with SVIEW_L.

FRECAL.INP The control file for FRECAL.  In the presence of noise affecting the calibration points a simple cubic spline function fit may not be the  optimum solution.  You therefore have the option of interpolating additional points that will reduce spline function excursions, and also of smoothing the correction function.  The best solution is to use a mixture of these techniques.

A.SPE Spectrum for the SO2 channel in H2O...HF spectra used as a worked example for the complete RECSPE procedure.  This file is a direct conversion to binary format made with SVIEW_L of spectrum u_A.spe obtained above with MERGE.
A.FRE The calibration points for this spectrum determined by using the AABS package with spectrum A.SPE, as above, and linelists for SO2 from the  CDMS database.  Linelists for the ground states of the parent and isotopic species, and for the bending satellite in the parent were loaded.
The calibration points do not have to be in any particular order, but  FRECAL will sort them in frequency.
The main result of operation of FRECAL on the two files above (without the use of interpolation and smoothing).  This is a frequency calibrated spectrum in ASCII standard.  The file also contains an additional third column listing  the original frequencies.  Note that the points in this  spectrum are NOT equidistant in frequency but this spectrum can be read and converted to equal frequency increments with SVIEW_L
Additional files produced by FRECAL that allow viewing of the spline function used for the calibration.  Spline functions are powerful tools but are susceptible to experimental errors in declared points.  The sensitivity is particularly high for points very close together and it is recommended that a check for unexpected spline function excursions is made.
A_spline.pdf The spline function diagram produced with the 'export' option of  QGLE from the three files above.  The lowest and highest frequency  points have zero correction because they were already calibrated, in the process of defining the zero order frequency scale in FZERO.
The two files necessary for calibration of the spectrum in the H2O...HF channel:
  • the file B.SPE is a binary version of u_B.spe obtained above with MERGE.  It is necessary to ensure that the first point in this spectrum is at the same frequency as in the reference spectrum.
  • the file B_spline.fnc is a binary file containing the spline function that was generated during calibration of the SO2 channel.  It is just a copy of the file A_spline.fnc generated during that operation.
B_FRECAL.SPE The frequency calibrated H2OHF spectrum at 321GHz.
