From atae@spva.ph.ic.ac.uk Thu Jun 17 11:19:06 1993
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
From: atae@spva.ph.ic.ac.uk (Ata Etemadi)
Subject: RFD: Hitch Hiker's Guide
Keywords: expert systems, information systems, data servers
Nntp-Posting-Host: crab.sp.ph
Reply-To: atae@spva.ph.ic.ac.uk
Organization: Imperial College of Science, Technology, and Medicine, London, England
Date: Thu, 17 Jun 93 12:51:18 BST

G'Day

Is anyone working on such a thing? I'd like to start a dialogue with any
interested parties. Let me explain. My field of research is space physics.
Currently there is of the order of a terabyte or so of data from various
sources accessible through the internet. This will grow by more than an
order of magnitude in the next 3-4 years. I am hoping to also make our data
available via gopher in the near future, and will be putting it all in
HDF-netCDF format with software for its manipulation and conversion to CDF
(it's not necessary to convert to "original" netCDF since there is a simple
way around this). I will also be making my s/w freely available. The problem
is: how does a researcher like myself access all this data efficiently
without knowing where the data is, what format it's in, what is kept on a
particular site, etc.?

Now, what I would like to provide the users is an intelligent front-end to
the data: something that will accept queries in a standard language (let's
say SQL) and send the results back to the user. My plan in a year or so is
to try for an ESPRIT grant for developing a multi-agent system for the
dissemination of large distributed scientific databases. Of course this
will not be limited to scientific use, but that's my main interest. The
end-user would compose his/her query in terms independent of where the
actual data is, what format it is in, or how it is to be accessed, and not
even necessarily in SQL.
Maybe a hierarchical set of type-in boxes in a GUI or something like:

    Context: Space Physics

    Data Type  = Magnetic field
    Start Time = 19:00 15 Jan 1993
    End Time   = 20:00 16 Jan 1993
    Lat.       = 10.0
    Long.      = 20.0
    etc.

You get the idea, although I admit this is going to require a lot more
thought. There are 2 ways to take it from there:

Keep an index and a short description (e.g. start/end times, parameters) of
all the data on a single site, which may be queried by the user. A process
on this site then sends back the appropriate info for the local process to
go off, get the data, convert it to HDF-netCDF (if necessary) and present it
to the user. Simple, but not elegant, and it will probably require a
dedicated group devoted to its maintenance and updating.

Or, on each data site, have an agent that is expert in the particular data
kept there. One would supply a template agent to the institution and they
would configure it. The original query would then spawn a set of agents
which communicate with these local experts using a simple protocol (e.g.
SQL). If it can satisfy the query, the local expert will do the data
extraction and forwarding; otherwise it will send back a query to the
original user asking for more parameters.

My question may also be phrased as: is anyone out there working on a gopher
or WAIS interface (in the form I described) for data access? Any help,
suggestions, pointers to further reading, etc. would be greatly appreciated.
I will certainly acknowledge all help, and if you are interested in
participating in such a project I will keep your name/email on file and
contact you if/when the project gets rolling.

best regards
Ata <(|)>.
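The first option above (one site holding an index and a short description of every data set) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the proposal itself: the field names, the "datasets" table, the host names and the sample entries are all hypothetical, and Lat./Long. are left out for brevity. The form is rendered as a parameterized SQL query, and an in-memory equivalent returns the entries whose time span overlaps the requested interval, plus a flag saying whether the user's local process would have to convert the data to HDF-netCDF.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple

# Sketch of the "central index" option. All names, fields, host names and
# the "datasets" table are hypothetical; Lat./Long. omitted for brevity.

@dataclass
class Query:
    """The user's form: says what is wanted, not where or in what format."""
    context: str
    data_type: str
    start: datetime
    end: datetime

@dataclass
class IndexEntry:
    """One short description kept on the central index site."""
    site: str   # hypothetical host holding the data
    path: str   # hypothetical location on that host
    fmt: str    # format the data is stored in at the source
    context: str
    data_type: str
    start: datetime
    end: datetime

def to_sql(q: Query) -> Tuple[str, tuple]:
    """Render the form as a parameterized SQL query on a hypothetical table.
    Two intervals overlap when each starts no later than the other ends."""
    sql = ("SELECT site, path, fmt FROM datasets "
           "WHERE context = ? AND data_type = ? "
           "AND start_time <= ? AND end_time >= ?")
    return sql, (q.context, q.data_type, q.end, q.start)

def find_matches(index: List[IndexEntry], q: Query) -> List[IndexEntry]:
    """In-memory equivalent of the SQL above, as run on the index site."""
    return [e for e in index
            if e.context == q.context
            and e.data_type == q.data_type
            and e.start <= q.end and e.end >= q.start]

def needs_conversion(e: IndexEntry, target: str = "HDF-netCDF") -> bool:
    """The user's local process converts anything not already in the target format."""
    return e.fmt != target

SAMPLE_INDEX = [
    IndexEntry("site-a.example", "/mag/jan93.cdf", "CDF", "Space Physics",
               "Magnetic field", datetime(1993, 1, 15), datetime(1993, 1, 16)),
    IndexEntry("site-b.example", "/plasma/jan93.dat", "binary", "Space Physics",
               "Plasma density", datetime(1993, 1, 15), datetime(1993, 1, 17)),
    IndexEntry("site-c.example", "/mag/feb93.hdf", "HDF-netCDF", "Space Physics",
               "Magnetic field", datetime(1993, 2, 1), datetime(1993, 2, 28)),
]

if __name__ == "__main__":
    q = Query("Space Physics", "Magnetic field",
              datetime(1993, 1, 15, 19, 0), datetime(1993, 1, 16, 20, 0))
    for e in find_matches(SAMPLE_INDEX, q):
        print(e.site, e.path, "convert" if needs_conversion(e) else "as-is")
```

Run as a script, this prints "site-a.example /mag/jan93.cdf convert": only the first entry has the requested data type and an overlapping time span, and it is stored as CDF rather than HDF-netCDF. The user only ever fills in the form; where the data lives and what format it is in come from the index, which is the part the post expects to need a dedicated group to maintain.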
--
| Mail Dr Ata Etemadi, Blackett Laboratory, |
| Space and Atmospheric Physics Group, |
| Imperial College of Science, Technology, and Medicine, |
| Prince Consort Road, London SW7 2BZ, ENGLAND |
| Internet/Arpanet/Earn/Bitnet atae@spva.ph.ic.ac.uk or ata@c.mssl.ucl.ac.uk |
| Span SPVA::atae or MSSLC:atae |
| UUCP/Usenet atae%spva.ph.ic@nsfnet-relay.ac.uk |

From jimf@neptune.gsfc.nasa.gov Thu Jun 17 17:22:38 1993
From: jimf@neptune.gsfc.nasa.gov (Jim Firestone)
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
Subject: Re: RFD: Hitch Hiker's Guide
Date: 17 Jun 1993 19:39:18 GMT
Organization: NASA Code 971, Oceans & Ice Branch
Distribution: world
NNTP-Posting-Host: neptune.gsfc.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Keywords: expert systems, information systems, data servers

In article <1vq2ukINNh7m@srvr1.engin.umich.edu>, jxm@engin.umich.edu (John Murray) writes:
> In article <1993Jun17.125118.346@cc.ic.ac.uk> atae@spva.ph.ic.ac.uk writes:
> >
> >Is anyone working on such a thing? I'd like to start a dialogue with any
> >interested parties. Let me explain. My field of research is space physics.
>
> Presumably the writer means a Hitchhiker's Guide to space-related databases.
>
> >Currently there is of the order of a terabyte or so of data from various
> >sources accessible through the internet. This will grow by more than an
> >order of magnitude in the next 3-4 years. I am hoping to also make our data
> >available via gopher in the near future, and will be putting it all in
> >HDF-netCDF format with software for its manipulation and conversion to CDF
> >(it's not necessary to convert to "original" netCDF since there is a simple
> >way around this). I will also be making my s/w freely available.
> >The problem is: how does a researcher like myself access all this data
> >efficiently without knowing where the data is, what format it's in, what
> >is kept on a particular site, etc.?
>
> The NASA Earth Observation System (EOS) program is building a coordinated
> Data & Information System (EOSDIS), intended for this type of purpose. It
> would seem wise to at least ensure some form of compatibility with that
> system. You might want to contact the EOS Program Office for information.
> They're at NASA Headquarters (Code EE), Washington DC 20546. Or ask about
> EOSDIS on sci.space - I'm sure someone there knows what's happening with it.
>
> John Murray
> Univ of Mich

I happen to work here at NASA/Goddard Space Flight Center (Greenbelt, MD),
in the building where the EOSDIS project and its data distribution group (the
Distributed Active Archive Center, or DAAC) are located. My understanding is
that they are building a software system for querying, browsing, and
ordering Earth and space science data sets that are stored locally here at
GSFC on massive jukeboxes. This is different from going out and querying
databases held at numerous locations. There will be other DAACs around the
country, at places such as NASA/Langley (Virginia) and the National Snow and
Ice Data Center in Colorado, each of which will build its own archive/browse
system.

For more information on a distributed data system I have used to access
oceanographic data sets, please see my previous posting dated today in this
group.
Jim Firestone
SeaWiFS ocean color project
NASA/GSFC

From jimf@neptune.gsfc.nasa.gov Thu Jun 17 17:23:46 1993
From: jimf@neptune.gsfc.nasa.gov (Jim Firestone)
Newsgroups: comp.infosystems,comp.infosystems.wais,comp.infosystems.gopher,alt.cyberspace
Subject: Re: RFD: Hitch Hiker's Guide
Date: 17 Jun 1993 19:26:56 GMT
Organization: NASA Code 971, Oceans & Ice Branch
Distribution: world
NNTP-Posting-Host: neptune.gsfc.nasa.gov
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Keywords: expert systems, information systems, data servers

In article <1993Jun17.125118.346@cc.ic.ac.uk>, atae@spva.ph.ic.ac.uk (Ata Etemadi) writes:
> G'Day
>
> Is anyone working on such a thing? I'd like to start a dialogue with any
> interested parties. Let me explain. My field of research is space physics.
> Currently there is of the order of a terabyte or so of data from various
> sources accessible through the internet. This will grow by more than an
> order of magnitude in the next 3-4 years. I am hoping to also make our data
> available via gopher in the near future, and will be putting it all in
> HDF-netCDF format with software for its manipulation and conversion to CDF
> (it's not necessary to convert to "original" netCDF since there is a simple
> way around this). I will also be making my s/w freely available. The problem
> is: how does a researcher like myself access all this data efficiently
> without knowing where the data is, what format it's in, what is kept on a
> particular site, etc.?
>
> Now, what I would like to provide the users is an intelligent front-end to
> the data: something that will accept queries in a standard language (let's
> say SQL) and send the results back to the user. My plan in a year or so is
> to try for an ESPRIT grant for developing a multi-agent system for the
> dissemination of large distributed scientific databases.
> Of course this will not be limited to scientific use, but that's my main
> interest. The end-user would compose his/her query in terms independent of
> where the actual data is, what format it is in, or how it is to be
> accessed, and not even necessarily in SQL. Maybe a hierarchical set of
> type-in boxes in a GUI or something like:
>
>     Context: Space Physics
>
>     Data Type  = Magnetic field
>     Start Time = 19:00 15 Jan 1993
>     End Time   = 20:00 16 Jan 1993
>     Lat.       = 10.0
>     Long.      = 20.0
>     etc.
>
> You get the idea, although I admit this is going to require a lot more
> thought. There are 2 ways to take it from there:
>
> Keep an index and a short description (e.g. start/end times, parameters) of
> all the data on a single site, which may be queried by the user. A process
> on this site then sends back the appropriate info for the local process to
> go off, get the data, convert it to HDF-netCDF (if necessary) and present it
> to the user. Simple, but not elegant, and it will probably require a
> dedicated group devoted to its maintenance and updating.
>
> Or, on each data site, have an agent that is expert in the particular data
> kept there. One would supply a template agent to the institution and they
> would configure it. The original query would then spawn a set of agents
> which communicate with these local experts using a simple protocol (e.g.
> SQL). If it can satisfy the query, the local expert will do the data
> extraction and forwarding; otherwise it will send back a query to the
> original user asking for more parameters.
>
> My question may also be phrased as: is anyone out there working on a gopher
> or WAIS interface (in the form I described) for data access? Any help,
> suggestions, pointers to further reading, etc. would be greatly appreciated.
> I will certainly acknowledge all help, and if you are interested in
> participating in such a project I will keep your name/email on file and
> contact you if/when the project gets rolling.
>
> best regards
> Ata <(|)>.
> --
> | Mail Dr Ata Etemadi, Blackett Laboratory, |
> | Space and Atmospheric Physics Group, |
> | Imperial College of Science, Technology, and Medicine, |
> | Prince Consort Road, London SW7 2BZ, ENGLAND |
> | Internet/Arpanet/Earn/Bitnet atae@spva.ph.ic.ac.uk or ata@c.mssl.ucl.ac.uk |
> | Span SPVA::atae or MSSLC:atae |
> | UUCP/Usenet atae%spva.ph.ic@nsfnet-relay.ac.uk |

I have used a system that sounds something like what you are seeking. It's
called the JGOFS (Joint Global Ocean Flux Study) distributed data system, and
it is used to store, access, and display oceanographic data sets located all
over the world. It uses an SQL-like query to retrieve data from objects
(whether local to the user's system or on a remote system), each consisting
of data plus an associated "method" (e.g. a program) that reads the data.
This is the result of a joint effort between groups at the Massachusetts
Institute of Technology (MIT) and the Woods Hole Oceanographic Institution
(WHOI), also in Massachusetts. They have built a menu system with some basic
tools for displaying the data once queried, plus routines that can be called
from C or Fortran if you just want to retrieve the data for use in your own
software.

When run, the software looks at both the user's local catalog of objects and
a central catalog maintained at MIT. This approach lets the individual
scientists who provide data (my group here at NASA/Goddard Space Flight Center
is one; we've contributed historical in situ pigment data) keep the data in
their own preferred format, on their own system, while the world can still
access it as long as the proper method is provided to read it. From our point
of view, this is nice because we don't have to worry about people logging
into our system (what a security nightmare) and slowing things down.

Although the concept is nice, the system is still fairly early in its
development.
The graphics are still fairly primitive, and network access to remote
objects (or to the MIT catalog) can often be very slow. But I expect upcoming
releases to work out a lot of the kinks in the software.

If you're interested in downloading the software, free of charge, you can
get it as follows:

1. Telnet to pimms.mit.edu (18.83.0.104)
2. Log in as user "jgofs", with password "object$data". The system requests
   your machine name, your internet address and your user name.
3. Select menu item 1 to transfer the software automatically to your machine
   (it is suggested that you first set up a "jgofs" account on your machine
   and specify it as the place to copy the software to).
4. After entering the account to copy the software to, type "put jgofs.tar",
   then "quit", then select option 0 in the menu to quit.

I believe the documentation is included in the jgofs.tar file, under the
"doc" subdirectory.

I would be interested in keeping in touch regarding your project, as we also
use HDF here to store our satellite and ancillary (e.g. gridded
meteorological and ozone) data.

Good luck!

Jim Firestone
SeaWiFS ocean color project
NASA/GSFC, Greenbelt, MD, USA
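The JGOFS arrangement described in the last message (objects that pair a data set with a "method" that reads it, found by checking a local catalog and then a central one at MIT) can be illustrated with a small Python sketch. This is not the JGOFS software or its C/Fortran interface: the class and function names, the in-memory "storage", the site names and the made-up values are all hypothetical, and everything runs in one process rather than across the network. What it tries to show is that each provider supplies its own reader, so the data can stay in whatever format the provider prefers.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Illustrative sketch only, not the JGOFS API. All names, "sites" and
# values below are hypothetical.

Row = Dict[str, float]
Method = Callable[[str], Iterable[Row]]  # reads one data set, yields rows

# Stand-in for data kept on each provider's own system, in its own layout.
FAKE_STORAGE: Dict[str, str] = {
    "site-a:pigments.txt": "depth chl\n10 0.31\n50 0.12\n",  # space-separated
    "site-b:ctd.csv": "depth,temp\n10,18.5\n50,14.2\n",      # comma-separated
}

def read_space_separated(location: str) -> Iterable[Row]:
    """Provider A's method: understands only its own whitespace format."""
    header, *body = FAKE_STORAGE[location].splitlines()
    names = header.split()
    for line in body:
        yield dict(zip(names, map(float, line.split())))

def read_csv(location: str) -> Iterable[Row]:
    """Provider B's method: understands only its own comma-separated format."""
    header, *body = FAKE_STORAGE[location].splitlines()
    names = header.split(",")
    for line in body:
        yield dict(zip(names, map(float, line.split(","))))

class DataObject:
    """A data set plus the method that knows how to read it."""
    def __init__(self, location: str, method: Method):
        self.location = location
        self.method = method

    def select(self, columns: List[str], where: Callable[[Row], bool]) -> List[Row]:
        """Crude stand-in for an SQL-like query: filter rows, keep some columns."""
        return [{c: row[c] for c in columns}
                for row in self.method(self.location) if where(row)]

def resolve(name: str, local: Dict[str, DataObject],
            central: Dict[str, DataObject]) -> Optional[DataObject]:
    """Check the user's local catalog first, then the central one."""
    if name in local:
        return local[name]
    return central.get(name)

CENTRAL: Dict[str, DataObject] = {
    "pigments": DataObject("site-a:pigments.txt", read_space_separated),
    "ctd": DataObject("site-b:ctd.csv", read_csv),
}
LOCAL: Dict[str, DataObject] = {}

if __name__ == "__main__":
    obj = resolve("pigments", LOCAL, CENTRAL)
    if obj is not None:
        print(obj.select(["chl"], lambda r: r["depth"] < 20))
```

Here resolve("pigments", LOCAL, CENTRAL).select(["chl"], lambda r: r["depth"] < 20) returns [{'chl': 0.31}]. An entry with the same name in the local catalog would take precedence over the central one, and the querying side never needs to know that the two providers store their data differently.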