Notes on the GUPPI Raw Data Format
S. Ellingson
Nov 1, 2013
This format consists of blocks, with each block consisting of a text header and
a raw binary data segment.
An example of a header is shown in the file "header_example.txt". The header
ends with the word "END". Some important fields are:
OBSFREQ: [MHz] center of the RF passband
OBSBW: [MHz] width of passband; negative sign indicates spectral flip
OBSNCHAN: Number of channels (subbands)
NPOL: Number of polarizations times 2. For example, NPOL=4 means 2
polarizations.
NBITS: Number of bits per I or Q value. So, one complex-valued sample has
2*NBITS bits.
TBIN: [s] sample period within a channel
CHAN_BW: [MSPS] sample rate for a channel. Negative sign indicates spectral
flip.
OVERLAP: This many samples per subband from the previous data block are
repeated at the beginning of this data block.
BLOCSIZE: The size of the raw data segment in bytes.
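A minimal sketch of reading these fields in Python follows. It assumes the
header is a sequence of fixed-width 80-character "KEY = value" records (FITS
style) with no line terminators, which is my understanding of the format but is
not stated above; the function name parse_guppi_header and the card_len
parameter are illustrative only.

```python
def parse_guppi_header(buf, card_len=80):
    """Parse header records from the start of buf (a bytes object).

    Returns (fields, header_size), where header_size is the number of
    bytes consumed up to and including the END record.
    ASSUMPTION: fixed-width records of card_len characters.
    """
    fields = {}
    offset = 0
    while offset + card_len <= len(buf):
        card = buf[offset:offset + card_len].decode("ascii")
        offset += card_len
        if card.strip() == "END":
            return fields, offset
        key, sep, value = card.partition("=")
        if not sep:
            continue  # not a KEY = value record; skip it
        value = value.strip()
        if value.startswith("'"):
            # Quoted string value: drop the quotes and padding.
            value = value.strip("'").strip()
        else:
            # Try integer first, then float; otherwise keep the text.
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass
        fields[key.strip()] = value
    raise ValueError("no END record found")
```

If that assumption holds, the raw data segment of BLOCSIZE bytes begins at
byte offset header_size from the start of the block.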
The center frequency of channel i (where i is in [1..OBSNCHAN]) is OBSFREQ -
OBSBW/2 + (i-0.5)*CHAN_BW [MHz].
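The expression above can be checked with a few lines of Python. This is only a
direct transcription of the formula; the function name channel_center_mhz is
illustrative, and the numbers are the header values of the dataset described
later in these notes.

```python
def channel_center_mhz(i, obsfreq, obsbw, chan_bw):
    """Center frequency [MHz] of channel i, with i in 1..OBSNCHAN."""
    return obsfreq - obsbw / 2 + (i - 0.5) * chan_bw

# Header values from the dataset described later in these notes.
first = channel_center_mhz(1, 1378.125, -200, -6.25)
last = channel_center_mhz(32, 1378.125, -200, -6.25)
print(first, last)  # 1475.0 1281.25
```

Because OBSBW and CHAN_BW are negative here, channel 1 is the highest-frequency
channel and channel 32 the lowest.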
Pseudocode describing the structure of the raw data block is as follows:
--- begin ---
for channel=1..OBSNCHAN,
  for nsamples=1..NDIM,
    for polarization=1..(NPOL/2)
      write I, Q
--- end ---
Above, NDIM is the number of samples per channel in the block; i.e.,
BLOCSIZE/(OBSNCHAN*NPOL*(NBITS/8)). This *includes* the OVERLAP samples. For
NBITS=8, each I or Q value is a "signed char".
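The loop ordering above implies that all samples for one channel are contiguous
in the raw data segment. A minimal sketch of pulling one channel and one
polarization out of a block, for NBITS=8 only, might look like this. The
function name and its arguments are illustrative and are not taken from
src/rg.c.

```python
from array import array

def extract_channel(data, obsnchan, npol, channel, pol=0):
    """Return the complex samples of one channel from a raw data segment.

    data     : bytes holding the raw data segment (BLOCSIZE bytes)
    channel  : 1-based channel number, 1..OBSNCHAN
    pol      : 0-based polarization index, 0..(NPOL/2 - 1)
    ASSUMPTION: NBITS=8, so each I or Q value is one signed byte.
    """
    values = array("b", data)             # "b" is signed char
    ndim = len(values) // (obsnchan * npol)
    start = (channel - 1) * ndim * npol   # first value of this channel
    out = []
    for n in range(ndim):
        k = start + n * npol + 2 * pol    # I is at k, Q is at k + 1
        out.append(complex(values[k], values[k + 1]))
    return out
```

For a full-size block this pure-Python loop is slow and copies the data; it is
meant to show the indexing, not to be an efficient reader.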
For the one and only dataset I've worked with so far (identified below):
----
OBSFREQ = 1378.125
OBSBW = -200
OBSNCHAN = 32
NPOL = 4
NBITS = 8
TBIN = 1.6E-07
CHAN_BW = -6.25
OVERLAP = 512
BLOCSIZE = 1073545216
----
In this case, NDIM = 8387072 and the time span covered by a raw data block is
NDIM*TBIN = 1.3419 s. Keep in mind, however, that this 1.3419 s span overlaps
with the next block by 512 samples (OVERLAP*TBIN = 81.92 microseconds).
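The arithmetic for this dataset can be reproduced as follows. The variable
names simply mirror the header fields; new_span is my own label for the part of
a block that is not repeated from the previous block.

```python
# Header values for the dataset described above.
OBSNCHAN = 32
NPOL = 4
NBITS = 8
TBIN = 1.6e-07        # [s]
OVERLAP = 512         # [samples per channel]
BLOCSIZE = 1073545216 # [bytes]

# Samples per channel in one block, including the overlap samples.
ndim = BLOCSIZE // (OBSNCHAN * NPOL * (NBITS // 8))

block_span = ndim * TBIN             # total time covered by one block [s]
overlap_span = OVERLAP * TBIN        # time repeated between blocks [s]
new_span = (ndim - OVERLAP) * TBIN   # time not shared with the previous block [s]

print(ndim)                   # 8387072
print(round(block_span, 4))   # 1.3419
```

So consecutive blocks advance by about 1.34185 s, slightly less than the
1.3419 s each block spans.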
As an example, src/rg.c is C source code which reads a single header + raw data
block from a GUPPI raw data file, extracts one channel, and writes it back out
as time-domain and spectral data. (See the source code for compiling
instructions and usage.) The script src/a.sh runs "rg" repeatedly to obtain the
output for all channels. src/a.gp is a Gnuplot script which reads these files
and plots the entire bandpass, including all channels.
"guppi.png" is the output when the above code is applied to the file
"guppi_56465_J1713+0747_0006.0000.raw" (NRAO folks:
/lustre/pulsar/scratch/1713+0747_global/raw). For this particular dataset
there is a large DC offset in each channel, which accounts for the spike in the
center of each channel bandpass. In this output, channel 1 is on the right and
channel 32 is on the left.
Thanks to Paul Demorest for helping me figure this out.