From atae@spva.ph.ic.ac.uk Thu Jun 17 11:19:06 1993
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
From: atae@spva.ph.ic.ac.uk (Ata Etemadi)
Subject: RFD: Hitch Hiker's Guide
Keywords: expert systems, information systems, data servers
Nntp-Posting-Host: crab.sp.ph
Reply-To: atae@spva.ph.ic.ac.uk
Organization: Imperial College of Science, Technology, and Medicine, London, England
Date: Thu, 17 Jun 93 12:51:18 BST

G'Day

Is anyone working on such a thing ? I'd like to start a dialogue with any 
interested parties. Let me explain. My field of research is space physics. 
Currently there are of the order of a Tera-byte or so of data from various 
sources accessible through the internet. This will grow by more than an 
order of magnitude in the next 3-4 years. I am hoping to also make our data 
available via gopher in the near future, and will be putting it all in 
HDF-netCDF format with software for its manipulation and conversion to CDF 
(it's not necessary to convert to "original" netCDF since there is a simple 
way around this). I will also be making my s/w freely available. The problem 
is how does a researcher like myself access all this data in an efficient 
way without knowing where the data is, what format it's in, what is 
kept on a particular site, etc.?

Now, what I would like to provide the users is an intelligent front-end to 
the data. Something that will accept queries in a standard language (let's 
say SQL) and send back the results to the user. My plan in a year or so is 
to try for an ESPRIT grant for developing a multi-agent system for the
dissemination of large distributed scientific databases. Of course this 
will not be limited to scientific use but that's my main interest. The 
end-user would compose his/her query in terms independent of where the actual 
data is, what format it is in, how it is to be accessed, and not even 
necessarily in SQL. Maybe a hierarchical set of type-in boxes in a GUI or 
something like:

Context: Space Physics

Data Type  = Magnetic field
Start Time = 19:00 15 Jan 1993 
End Time   = 20:00 16 Jan 1993
Lat.       = 10.0
Long.      = 20.0
etc..

You get the idea, although I admit this is going to require a lot more
thought. There are 2 ways to take it from there:

Keep an index and a short description (e.g. start/end times, parameters) of 
all the data on a single site which may be queried by the user. A process on
this site then sends back the appropriate info for the local process to go off, 
get the data, convert it to HDF-netCDF (if necessary) and present it to the 
user. Simple, but not elegant and probably will require a dedicated group 
devoted to its maintenance and update.
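
A minimal sketch of this first option in Python, under loud assumptions: the
Query and IndexEntry fields, the site names, and the paths are all hypothetical
placeholders, not a description of any existing service. It only shows the
central index answering "where is it and what format is it in", leaving the
fetch and the conversion to the local process.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Query:
    """Location-independent request, mirroring the form above (hypothetical)."""
    data_type: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class IndexEntry:
    """Short description of one data set held at some site (hypothetical)."""
    site: str
    path: str
    data_format: str
    data_type: str
    start: datetime
    end: datetime


def lookup(index, query):
    """Return entries of the right type whose time span overlaps the query."""
    return [
        e for e in index
        if e.data_type == query.data_type
        and e.start <= query.end
        and e.end >= query.start
    ]


# Placeholder index contents, invented purely for illustration.
INDEX = [
    IndexEntry("site-a.example", "/data/mag/1993_01.hdf", "HDF-netCDF",
               "Magnetic field", datetime(1993, 1, 1), datetime(1993, 1, 31)),
    IndexEntry("site-b.example", "/data/mag/1992_12.cdf", "CDF",
               "Magnetic field", datetime(1992, 12, 1), datetime(1992, 12, 31)),
    IndexEntry("site-a.example", "/data/plasma/1993_01.hdf", "HDF-netCDF",
               "Plasma density", datetime(1993, 1, 1), datetime(1993, 1, 31)),
]

query = Query("Magnetic field",
              datetime(1993, 1, 15, 19, 0), datetime(1993, 1, 16, 20, 0))
hits = lookup(INDEX, query)
for e in hits:
    # The local process would fetch e.path from e.site, converting if needed.
    needs_conversion = e.data_format != "HDF-netCDF"
    print(e.site, e.path, "convert" if needs_conversion else "as-is")
```

The maintenance burden mentioned above shows up here as INDEX itself: someone
has to keep those entries in step with what each site actually holds.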

Or, on each data site have an agent expert in the particular data kept there.
One would supply a template agent to the institution and they would configure 
it. Now the original query would spawn a set of agents which communicate with 
these local experts using a simple protocol (e.g. SQL). If there is any success 
the local expert will do the data extraction and forwarding, or it would send
back a query to the original user asking for more parameters.
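
A rough sketch of this second option, again in Python and again hypothetical
throughout: LocalExpert, dispatch, the reply dictionaries, and the holdings are
invented for illustration, and a plain function call stands in for whatever
network protocol the agents would really use. Each expert either forwards
matching data, stays silent, or asks for more parameters.

```python
from datetime import datetime


class LocalExpert:
    """Stand-in for the agent a site would configure from a template."""

    def __init__(self, site, holdings, required=("data_type", "start", "end")):
        self.site = site
        self.holdings = holdings    # data_type -> list of (start, end, path)
        self.required = required    # parameters this site insists on

    def handle(self, query):
        # Ask the user for more parameters if the query is underspecified.
        missing = [k for k in self.required if k not in query]
        if missing:
            return {"site": self.site, "status": "need_more", "missing": missing}
        # Otherwise select the holdings that overlap the requested interval.
        matches = [
            path
            for (start, end, path) in self.holdings.get(query["data_type"], [])
            if start <= query["end"] and end >= query["start"]
        ]
        if matches:
            return {"site": self.site, "status": "data", "paths": matches}
        return {"site": self.site, "status": "no_match"}


def dispatch(query, experts):
    """Send the query to every expert and drop the ones with nothing to offer."""
    replies = [expert.handle(query) for expert in experts]
    return [r for r in replies if r["status"] != "no_match"]


# Placeholder sites and holdings, invented purely for illustration.
experts = [
    LocalExpert("site-a.example", {
        "Magnetic field": [
            (datetime(1993, 1, 1), datetime(1993, 1, 31), "mag_1993_01.hdf"),
        ],
    }),
    LocalExpert("site-b.example", {
        "Plasma density": [
            (datetime(1993, 1, 1), datetime(1993, 1, 31), "plasma_1993_01.cdf"),
        ],
    }),
]

replies = dispatch({"data_type": "Magnetic field",
                    "start": datetime(1993, 1, 15, 19, 0),
                    "end": datetime(1993, 1, 16, 20, 0)}, experts)
incomplete = dispatch({"data_type": "Magnetic field"}, experts)
print(replies)
print(incomplete)
```

Unlike the central index, nothing here needs to know in advance what a site
holds; the cost is that every query touches every expert.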

My question may also be phrased as: Is anyone out there working on a gopher 
or WAIS interface (in the form I described) for data access ? Any help,
suggestions, pointers to further reading, etc. would be greatly appreciated.
I will certainly acknowledge all help, and if you are interested in participating
in such a project I will keep your name/email on file and contact you if/when 
the project gets rolling.

	best regards
		Ata <(|)>.
-- 
| Mail          Dr Ata Etemadi, Blackett Laboratory,                          |
|               Space and Atmospheric Physics Group,                          |
|               Imperial College of Science, Technology, and Medicine,        |
|               Prince Consort Road, London SW7 2BZ, ENGLAND                  |
| Internet/Arpanet/Earn/Bitnet atae@spva.ph.ic.ac.uk or ata@c.mssl.ucl.ac.uk  |
| Span                              SPVA::atae       or     MSSLC:atae        |
| UUCP/Usenet                       atae%spva.ph.ic@nsfnet-relay.ac.uk        |

From jimf@neptune.gsfc.nasa.gov Thu Jun 17 17:22:38 1993
From: jimf@neptune.gsfc.nasa.gov (Jim Firestone)
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
Subject: Re: RFD: Hitch Hiker's Guide
Date: 17 Jun 1993 19:39:18 GMT
Organization: NASA Code 971, Oceans & Ice Branch
Distribution: world
NNTP-Posting-Host: neptune.gsfc.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Keywords: expert systems, information systems, data servers

In article <1vq2ukINNh7m@srvr1.engin.umich.edu>, jxm@engin.umich.edu (John Murray) writes:
> In article <1993Jun17.125118.346@cc.ic.ac.uk> atae@spva.ph.ic.ac.uk writes:
> >
> >Is anyone working on such a thing ? I'd like to start a dialogue with any 
> >interested parties. Let me explain. My field of research is space physics. 
> 
> Presumably the writer means a Hitchhiker's Guide to space-related databases. 
> 
> >Currently there are of the order of a Tera-byte or so of data from various 
> >sources accessible through the internet. This will grow by more than an 
> >order of magnitude in the next 3-4 years. I am hoping to also make our data 
> >available via gopher in the near furture, and will be putting it all in 
> >HDF-netCDF format with software for its manipulation and conversion to CDF 
> >(its not necessary to convert to "original" netCDF since there is a simple 
> >way around this). I will also be making my s/w freely available. The problem 
> >is how does a researcher like myself access all this data in an efficient 
> >way without knowing about where the data is, what format its in, what is 
> >kept on a particular site etc.. ?
>  
> The NASA Earth Observation System (EOS) program is building a coordinated
> Data & Information System (EOSDIS), intended for this type of purpose. It
> would seem wise to at least ensure some form of compatibility with that 
> system. You might want to contact the EOS Program Office for information.
> They're at NASA Headquarters (Code EE), Washington DC 20546. Or ask about
> EOSDIS on sci.space - I'm sure someone there knows what's happening with it.
>  
> John Murray
> Univ of Mich

I happen to work here in the building at NASA/Goddard Space Flight
Center (Greenbelt, MD) where the EOSDIS and their data distribution
group (the Distributed Active Archive Center or DAAC) are located.
My understanding of what they are building is a software system
for querying, browsing and ordering earth and space science
data sets which are stored locally here at GSFC on massive jukeboxes.
This is different from going out and querying databases located in numerous
locations. There will be other DAACs located around the country, at places such
as NASA/Langley (Virginia) and the National Snow and Ice Data Center in Colorado,
each of which will build their own archive/browse systems.

For more info. on a distributed data system I have used to access oceanographic
data sets, please see my previous posting dated today in this group.

Jim Firestone
SeaWiFS ocean color project
NASA/GSFC

From jimf@neptune.gsfc.nasa.gov Thu Jun 17 17:23:46 1993
From: jimf@neptune.gsfc.nasa.gov (Jim Firestone)
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
Subject: Re: RFD: Hitch Hiker's Guide
Date: 17 Jun 1993 19:26:56 GMT
Organization: NASA Code 971, Oceans & Ice Branch
Distribution: world
NNTP-Posting-Host: neptune.gsfc.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Keywords: expert systems, information systems, data servers

In article <1993Jun17.125118.346@cc.ic.ac.uk>, atae@spva.ph.ic.ac.uk (Ata Etemadi) writes:
> G'Day
> 
> Is anyone working on such a thing ? I'd like to start a dialogue with any 
> interested parties. Let me explain. My field of research is space physics. 
> Currently there are of the order of a Tera-byte or so of data from various 
> sources accessible through the internet. This will grow by more than an 
> order of magnitude in the next 3-4 years. I am hoping to also make our data 
> available via gopher in the near furture, and will be putting it all in 
> HDF-netCDF format with software for its manipulation and conversion to CDF 
> (its not necessary to convert to "original" netCDF since there is a simple 
> way around this). I will also be making my s/w freely available. The problem 
> is how does a researcher like myself access all this data in an efficient 
> way without knowing about where the data is, what format its in, what is 
> kept on a particular site etc.. ?
> 
> Now, what I would like to provide the users is an intelligent front-end to 
> the data. Something that will accept queries in a standard language (lets 
> say SQL) and send back the results to the user. My plan in a year or so is 
> to try for an ESPRIT grant for developing a multi-agent system for the
> dissemination of large distributed scientific databases. Ofcourse this 
> will not be limited to scientific use but that's my main interest. The 
> end-user would compose his/her query in terms independant of where the actual 
> data is, what format it is in, how it is to be accessed, and not even 
> necessarily in SQL. Maybe a hierarchical set of type-in boxes in a GUI or 
> something like:
> 
> Context: Space Physics
> 
> Data Type  = Magnetic field
> Start Time = 19:00 15 Jan 1993 
> End Time   = 20:00 16 Jan 1993
> Lat.       = 10.0
> Long.      = 20.0
> etc..
> 
> You get the idea, although I admit this is going to require a lot more
> thought. There are 2 ways to take it from there:
> 
> Keep an index and a short description (eg start/end times, parameters..) of 
> all the data on single site which may be queried by the user. A process on
> this site then sends back the appropriate info for the local process to go off, 
> get the data, convert it to HDF-netCDF (if necessary) and present it to the 
> user. Simple, but not elegant and probably will require a dedicated group 
> devoted to its maintenance and update.
> 
> Or, on each data site have an agent expert in the particular data kept there.
> One would supply a template agent to the institution and they would configure 
> it. Now the original query would spawn a set of agents which communicate with 
> these local experts using a simple protocol (eg SQL). If there is any succeess 
> the local expert will do the data extraction and forwarding, or it would send
> back a query to the original user asking for more parameters.
> 
> My question may also be phrased as: Is anyone out there working on a gopher 
> or WAIS interface (in the form I described) for data access ? Any help,
> suggestions, pointer to further reading, etc.. would be greatly appreciated.
> I will certainly acknowledge all help, and if you are interested in participating
> in such a project I will keep your name/email on file and contact you if/when 
> the project gets rolling.
> 
> 	best regards
> 		Ata <(|)>.
> -- 
> | Mail          Dr Ata Etemadi, Blackett Laboratory,                          |
> |               Space and Atmospheric Physics Group,                          |
> |               Imperial College of Science, Technology, and Medicine,        |
> |               Prince Consort Road, London SW7 2BZ, ENGLAND                  |
> | Internet/Arpanet/Earn/Bitnet atae@spva.ph.ic.ac.uk or ata@c.mssl.ucl.ac.uk  |
> | Span                              SPVA::atae       or     MSSLC:atae        |
> | UUCP/Usenet                       atae%spva.ph.ic@nsfnet-relay.ac.uk        |

I have used a system which sounds something like what you are
seeking. It's called the JGOFS (Joint Global Ocean Flux Study) distributed data
system, and is used to store, access, and display oceanographic data sets located
all over the world.  It uses an SQL-like query to retrieve data from objects (be
they local to the user's system or on a remote system) consisting of data and
an associated "method" (e.g. program) to read the data. This is the result of a joint
effort between groups at Massachusetts Institute of Technology (MIT) and Woods Hole
Oceanographic Institution (WHOI), also in Massachusetts. They have built a menu system
which has some basic display tools for the data once queried, and routines which can
be called from C or Fortran if you just want to retrieve the data for use in your own
software. The software, when run, looks both at the user's local catalog of objects, and
a central one maintained at MIT. This approach allows the individual scientists providing
data (my group here at NASA/Goddard Space Flight Center is one - we've contributed 
in situ pigment data of a historical nature) to keep the data in their own preferred
format, on their own system, and yet the world can access it as long as the proper method
is provided to read it.  From our point of view, this is nice because we don't have to
worry about people logging into our system (what a security nightmare) and slowing things
down. 
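
To make the "data plus method" idea concrete, here is a small Python sketch. It
is not the JGOFS software or its interface: DataObject, find, the two reader
functions, and the sample values are hypothetical, and checking the local
catalog before the central one is my assumption about the lookup order. It
only illustrates how one query call can serve data kept in different formats,
as long as each object carries a method that knows how to read it.

```python
import csv
import io


def read_csv(raw):
    """Method for comma-separated text with a header row."""
    return list(csv.DictReader(io.StringIO(raw)))


def read_whitespace_table(raw):
    """Method for whitespace-separated columns with a header row."""
    lines = raw.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


class DataObject:
    """Data in the provider's own format, paired with a method that reads it."""

    def __init__(self, raw, method):
        self.raw = raw
        self.method = method

    def select(self, fields, where=lambda row: True):
        """Return the requested fields from every row accepted by `where`."""
        return [
            {f: row[f] for f in fields}
            for row in self.method(self.raw)
            if where(row)
        ]


def find(name, local_catalog, central_catalog):
    """Look in the user's own catalog first, then in the central one."""
    if name in local_catalog:
        return local_catalog[name]
    return central_catalog[name]


# Placeholder catalogs and values, invented purely for illustration.
LOCAL = {"pigments": DataObject("depth,chl\n5,0.21\n50,0.35\n", read_csv)}
CENTRAL = {"ctd": DataObject("depth temp\n5 18.2\n50 12.9\n",
                             read_whitespace_table)}

shallow = find("ctd", LOCAL, CENTRAL).select(
    ["temp"], where=lambda row: float(row["depth"]) < 10)
pigments = find("pigments", LOCAL, CENTRAL).select(["chl"])
print(shallow)
print(pigments)
```

The caller never sees which format sits behind a name, which is the property
that lets each provider keep data in its own preferred format.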

Although the concept is nice, the system is still fairly early in its development. The 
graphics are still fairly primitive and access across the network to remote objects
(or the MIT catalog) can often be very slow. But I expect upcoming releases to work out a
lot of the kinks in the software.

If you're interested in downloading the software, free of charge, you can get it as follows:

1. telnet to pimms.mit.edu (18.83.0.104)
2. login as user "jgofs", with password "object$data". The system requests your machine
name, your internet address and your user name. 
3. Select menu item 1 to transfer the software automatically to your machine (it is suggested
that you set up a "jgofs" account first on your machine and specify this as the place to
copy the software to).
4. After entering the account to copy the software to, type "put jgofs.tar" then "quit",
then select option 0 in the menu to quit.

I believe the documentation is included with the jgofs.tar file under the "doc" subdirectory.

I would be interested in keeping in touch regarding your project, as we use HDF here as well
for storage of our satellite and ancillary (e.g. gridded meteorological and ozone) data.

Good luck!

Jim Firestone
SeaWiFS ocean color project
NASA/GSFC, Greenbelt, MD, USA