From hsuc@rpi.edu Thu Mar 3 09:50:08 1994 X-VM-Summary-Format: "%n %*%a %-17.17F %-3.3m %2d %4l/%-5c %I\"%s\"\n" X-VM-Labels: nil X-VM-VHeader: ("Resent-" "From:" "Sender:" "To:" "Apparently-To:" "Cc:" "Subject:" "Date:") nil X-VM-Bookmark: 31 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1636" "Thu" " 3" "March" "1994" "09:33:23" "EST" "hsuc@rpi.edu" "hsuc@rpi.edu" nil "27" "Re: IRDS and PCTE" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA03719; Thu, 3 Mar 94 09:50:06 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA01520; Thu, 3 Mar 94 09:50:04 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA02565; Thu, 3 Mar 94 06:33:35 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA17190; Thu, 3 Mar 94 06:34:53 PST Return-Path: Received: from rpi.edu by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA17173; Thu, 3 Mar 94 06:34:47 PST Received: from client.its.rpi.edu (goya.its.rpi.edu) by rpi.edu (4.1/SMHUB41); id AA12901; Thu, 3 Mar 94 09:33:24 EST for metadata@llnl.gov Received: by client.its.rpi.edu (4.1/SUB16); id AA26042; Thu, 3 Mar 94 09:33:23 EST for metadata@llnl.gov Message-Id: <9403031433.AA26042@client.its.rpi.edu> From: hsuc@rpi.edu To: 71162.3600@compuserve.com, metadata@llnl.gov Subject: Re: IRDS and PCTE Date: Thu, 3 Mar 94 09:33:23 EST I've been reading articles on this mailing list with great interest since I signed on several days ago. In a way, my recent research is all built around metadata, from model to technology. At Rensselaer Polytechnic Institute, my former and current students and I have developed something we call the Metadatabase for information resources management (comparable to IRDS) and the management of multiple databases, with an immediate application domain in manufacturing. The results have been published in a number of journals since 1987. 
A core paper can be found in the June issue of IEEE Transactions on Software Engineering by C. Hsu, M. Bouziane, L. Rattner, and L. Yee, 1991. At present, we are working on visualization using metadata as well as working with IBM and others to commercialize the metadatabase technology. A while back, some colleagues at Rensselaer and I proposed to NSF to extend this approach for large-scale scientific databases, but didn't get through the final selection. The interest on my part, however, remains. I guess the reason for all of the above background is that I wonder if this group has some interest in metadata problems in traditional databases, and how it sees the role of database technology in large-scale files and (scientific) data management. For instance, how does the metadata reflector see the prior initiatives by NSF on scientific databases? I would appreciate any response that anyone on this list cares to give. Cheng Hsu Decision Sciences and Engineering Systems Rensselaer Polytechnic Institute (518) 276-6847 fax (518) 276-8227 hsuc@rpi.edu From tking@igpp.ucla.edu Fri Mar 4 23:29:49 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2166" "Fri" " 4" "March" "1994" "20:17:39" "PST" "Todd King" "tking@igpp.ucla.edu" nil "42" "Re: IRDS and PCTE" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA03887; Fri, 4 Mar 94 23:29:48 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA14646; Fri, 4 Mar 94 23:29:45 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA04945; Fri, 4 Mar 94 20:14:47 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29196; Fri, 4 Mar 94 20:16:05 PST Return-Path: Received: from igpp.ucla.edu by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29187; Fri, 4 Mar 94 20:16:02 PST Received: from galsun.igpp.ucla.edu (uclasu.igpp.ucla.edu) by igpp.ucla.edu (4.1/SMI-4.1.1) id AA24552; 
Fri, 4 Mar 94 20:15:33 PST Received: from kingsun.igpp.ucla.edu by galsun.igpp.ucla.edu (4.1/SMI-4.0) id AA02035; Fri, 4 Mar 94 20:17:39 PST Message-Id: <9403050417.AA02035@galsun.igpp.ucla.edu> From: tking@igpp.ucla.edu (Todd King) To: 71162.3600@compuserve.com, metadata@llnl.gov Subject: Re: IRDS and PCTE Date: Fri, 4 Mar 94 20:17:39 PST To all: There seems to be great interest in existing systems for managing data and metadata in an effective way. It also seems that there are many groups out there who must deal with this issue, as well as groups who have already developed solutions. Perhaps it would be a good idea to ask everyone to present a short description (a paragraph) of their systems. It would also be useful if everyone would include information about where to obtain a copy of the system, if available. I'm willing to volunteer as a moderator for such a list and then would post it to the mailing list when assembled. As an example of what I think is appropriate, here's a description of our metadata system: DITDOS (Distributed Inventory Tracking and Data Ordering Specifications) DITDOS is a set of specifications for building format-independent information systems to manage and distribute data. DITDOS-compliant systems maintain inventories of data holdings and provide a means for accessing the actual data. DITDOS contains specifications for metadata content, a protocol for client/server access, a portable inventory description language and interface specifications to format-specific readers, writers and viewers. Inventories can be placed on any media including portable media like CD-ROM. Inventories can easily be merged together to build large-scale information systems. An inventory reader (whether standalone or a client) can provide inventory browsing capabilities and data ordering and viewing capabilities. We have developed an initial DITDOS-compliant system that runs on Sun SPARCstations under SunOS. 
It is available via anonymous FTP at "ftp.igpp.ucla.edu" in the compressed tar file "igpp/ditdos.tar.Z". --------------------------------------------------+---------------------------- Todd King | email: tking@igpp.ucla.edu 5881 Slichter Hall | Office: 1-310-206-7201 UCLA/IGPP | FAX: 1-310-206-8042 Los Angeles, CA 90024-1567 | --------------------------------------------------+---------------------------- From gab@mitchell.hitc.com Wed Mar 9 09:27:29 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1494" "" " 9" "March" "1994" "09:09:13" "-0400" "Greg A.'Tony' Baraghimian" "gab@mitchell.hitc.com" nil "27" "More on preparing for the next meeting" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA12458; Wed, 9 Mar 94 09:27:27 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA19092; Wed, 9 Mar 94 09:27:25 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA25302; Wed, 9 Mar 94 06:09:38 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA15953; Wed, 9 Mar 94 06:10:57 PST Return-Path: Received: from hac2arpa.hac.com by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA15944; Wed, 9 Mar 94 06:10:52 PST Received: from EDEN1.HAC.COM by hac2arpa.hac.com (4.1/SMI-DDN) id AA04397; Wed, 9 Mar 94 06:09:21 PST Received: from hacgate.SCG.HAC.COM by EDEN1.HAC.COM (PMDF #2669 ) id <01H9RATGM7WW00DRID@EDEN1.HAC.COM>; Wed, 9 Mar 1994 06:09:24 PST Received: from whitney.HITC.COM by hacgate.SCG.HAC.COM with SMTP id AA29392 (5.65c/IDA-1.4.4 for ); Wed, 9 Mar 1994 06:09:11 -0800 Received: from mitchell.HITC.COM by whitney.hitc.com (4.1/SMI-4.1) id AA11085; Wed, 9 Mar 94 05:58:50 PST Received: by mitchell.HITC.COM (4.1/SMI-4.0) id AA10807; Wed, 9 Mar 94 09:09:14 EST Message-Id: <9403091409.AA10807@mitchell.HITC.COM> Content-Transfer-Encoding: 7BIT X-Mailer: ELM [version 2.2 PL0] From: gab@mitchell.hitc.com (Greg A.'Tony' 
Baraghimian) To: metadata@llnl.gov Subject: More on preparing for the next meeting Date: 09 Mar 1994 09:09:13 -0400 (EDT) ------------------------------------------------------------------------- Charles Dollar recently wrote: >I have one other thought. Do we have a list of metadata projects with >named individuals and a written description of each one? Also, is it >useful for someone to be compiling an annotated bibliography of metadata >projects? Preparation of such a bibliography, it seems to me, would be a >prerequisite for reaching out to other communities interested in metadata >issues in a variety of application contexts. Circulation of either a list >or an annotated bibliography (both of which would be dynamic and grow over >time) to conference participants prior to the meeting itself might further >serve to focus discussion. -------------------------------------------------------------------------- My interest in metadata is as a means to an end: my program requirements are to quickly (on the order of minutes) retrieve relevant image data (e.g., fewer than 50 images) from an archive of 100 terabytes or more. Thus far, in our discussions, both electronically and at Austin, there has been very little mention of requirements, especially timing, throughput, efficiency, etc. Which brings me to Charles Dollar's comments. I strongly agree that we should compile a metadata project bibliography including information such as project requirements and other issues that may serve to taxonomize all of the various efforts going on out there. Any thoughts? 
- Tony Baraghimian From dale@convex1.convex.com Mon Mar 7 14:33:35 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["3343" "Mon" " 7" "March" "1994" "13:14:24" "-0600" "Dale Lancaster" "dale@convex1.convex.com" nil "64" "IEEE Sponsorship of Metadata Conference" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA07974; Mon, 7 Mar 94 14:33:33 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03643; Mon, 7 Mar 94 14:33:31 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA28939; Mon, 7 Mar 94 11:14:15 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA24017; Mon, 7 Mar 94 11:15:32 PST Return-Path: Received: from convex.convex.com (convex-inet.convex.com) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA23994; Mon, 7 Mar 94 11:15:24 PST Received: from convex1.convex.com by convex.convex.com (5.64/1.35) id AA01138; Mon, 7 Mar 94 13:11:31 -0600 Received: by convex1.convex.com (5.64/1.28) id AA07907; Mon, 7 Mar 94 13:14:24 -0600 Message-Id: <9403071914.AA07907@convex1.convex.com> In-Reply-To: <940303044138_71162.3600_DHQ23-3@CompuServe.COM> From: dale@convex1.convex.com (Dale Lancaster) To: Charles Dollar <71162.3600@compuserve.com> Cc: Metadata Reflector , coyne@vnet.ibm.com Subject: IEEE Sponsorship of Metadata Conference Date: Mon, 7 Mar 94 13:14:24 -0600 Charles Dollar writes: > > Did anyone find Bob Coyne's explanation of what the IEEE is doing > about regional metadata conferences enlightening or helpful? > I did, but I have been involved in this process from before we had the first workshop. Bob explained exactly where we fit in, which was needed so we can make ourselves more "legit" as you implied later on in this mail. > It is clear that at Austin we were all working under a different set > of assumptions. It would have been useful to have had the full > picture laid out for us there. 
Doubtless, this would have occurred > had Bob been present on Friday morning when we discussed this topic. > I think the key thing is that the group affirmed/decided that this type of effort needs to be under the IEEE and under the MSS&TC and is appropriate, especially since it was responsible for the original idea of having such workshops on data management and the MSS&TC is working on the problem of standardizing efforts on data and data management. > The impression I have from Bob's statement about the intent of the > IEEE is that for some unspecified time (perhaps six months or a year) > the IEEE through its sponsorship will "test" the metadata water, so to > speak. This suggests that the focus is more on enlisting support and > building a network rather than on sorting out fairly quickly the > intellectual issues (assuming this is possible) and then moving on to > doing real things, like prototypes. It is clear to me now that the > Austin group/meeting was an ad hoc activity that from an IEEE > perspective was useful but lacks legitimacy in terms of having a > strong influence on what happens next. > I think your description is not too far off the mark. Bob initiated these workshops to basically get some activity started on data management. At the original meeting that described these ideas, one particular workshop was highlighted, which was the one that was to take place at Oak Ridge National Laboratory and was to deal with something like Management of large scientific databases. It did not appear at the time to Jim Almond and me that this necessarily had anything to do with metadata. So Jim and I asked Bob about having this workshop under the IEEE MSS&TC hat. We all agreed and Jim and I hit the ground running. Since then this other workshop basically discovered it was directly interested in metadata. So now we have a point of convergence on this issue. Bob, Jim, Otis and I have discussed this a bit to determine what is the best next step. 
To be honest, until this last workshop it was not apparent to me or a couple others involved in both metadata workshops where this effort would go. It now appears that it has formalized itself a bit more and that we now also need to formalize the leadership of this metadata group to manage this effort from an IEEE perspective. I believe Bob is more than willing to entertain ideas on who/what should be leading this effort and where it should be going. We obviously have Otis, Jim and myself doing some of this; we now need to see what needs to change or be added to make this more official. The key thing is to get movement and momentum. In a democratic/volunteer effort, it's hard to get enough real workers to work on things and make things happen. I hope this helps :-)) dml From dale@convex1.convex.com Mon Mar 7 14:34:39 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2509" "Mon" " 7" "March" "1994" "13:19:44" "-0600" "Dale Lancaster" "dale@convex1.convex.com" nil "40" "Re: IRDS and PCTE" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA07980; Mon, 7 Mar 94 14:34:38 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03648; Mon, 7 Mar 94 14:34:30 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA29117; Mon, 7 Mar 94 11:19:45 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA24471; Mon, 7 Mar 94 11:20:57 PST Return-Path: Received: from convex.convex.com (convex-inet.convex.com) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA24444; Mon, 7 Mar 94 11:20:51 PST Received: from convex1.convex.com by convex.convex.com (5.64/1.35) id AA01377; Mon, 7 Mar 94 13:16:55 -0600 Received: by convex1.convex.com (5.64/1.28) id AA08050; Mon, 7 Mar 94 13:19:44 -0600 Message-Id: <9403071919.AA08050@convex1.convex.com> In-Reply-To: <9403031433.AA26042@client.its.rpi.edu> From: dale@convex1.convex.com (Dale Lancaster) To: 
hsuc@rpi.edu Cc: 71162.3600@compuserve.com, metadata@llnl.gov Subject: Re: IRDS and PCTE Date: Mon, 7 Mar 94 13:19:44 -0600 hsuc@rpi.edu writes: > I've been reading articles on this mailing list with great interest since > I signed on several days ago. In a way, my recent research is all built > around metadata, from model to technology. At Rensselaer Polytechnic > Institute, my former and current students and I have developed something > we call the Metadatabase for information resources management (comparable > to IRDS) and the management of multiple databases, with an immediate application > domain in manufacturing. The results have been published in a number of > journals since 1987. A core paper can be found in the June issue of IEEE > Transactions on Software Engineering by C. Hsu, M. Bouziane, L. Rattner, and > L. Yee, 1991. At present, we are working on visualization using metadata > as well as working with IBM and others to commercialize the metadatabase > technology. A while back, some colleagues at Rensselaer and I proposed > to NSF to extend this approach for large-scale scientific databases, but > didn't get through the final selection. The interest on my part, however, remains. > > I guess the reason for all of the above background is that I wonder if this > group has some interest in metadata problems in traditional databases, and > how it sees the role of database technology in large-scale files and > (scientific) data management. For instance, how does the metadata reflector > see the prior initiatives by NSF on scientific databases? > I would appreciate any response that anyone on this list cares to give. > Part of the original impetus to start these workshops was how to manage very large scientific databases (not necessarily RDBMS type of stuff, but large unstructured blobs and lots of them). So there is a lot of interest. 
There is also apparent interest in metadata from a traditional sense as well, because many people feel that traditional RDBMS are not adequate for maintaining relationships between data elements in a robust and highly extensible way (yes, it can be brute forced into it, but most RDBMS managers hate it). So, in an indirect way, this interest is also being explored in this group. As to the NSF stuff, I don't personally know much about the efforts. In fact that is one of our priorities, to discover about all the other groups and technologies working on this problem. We hope to have a WWW/Mosaic site established soon that will begin to collect this information. thanks for your input :-)) dml From dale@convex1.convex.com Mon Mar 7 15:02:04 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["6275" "Mon" " 7" "March" "1994" "13:42:38" "-0600" "Dale Lancaster" "dale@convex1.convex.com" nil "130" "Metadata comments" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA08013; Mon, 7 Mar 94 15:02:03 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03792; Mon, 7 Mar 94 15:01:54 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA29905; Mon, 7 Mar 94 11:42:43 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27164; Mon, 7 Mar 94 11:43:45 PST Return-Path: Received: from convex.convex.com (convex-inet.convex.com) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27150; Mon, 7 Mar 94 11:43:39 PST Received: from convex1.convex.com by convex.convex.com (5.64/1.35) id AA02858; Mon, 7 Mar 94 13:39:46 -0600 Received: by convex1.convex.com (5.64/1.28) id AA08520; Mon, 7 Mar 94 13:42:38 -0600 Message-Id: <9403071942.AA08520@convex1.convex.com> In-Reply-To: <9403040205.AA11034@pierce.llnl.gov> From: dale@convex1.convex.com (Dale Lancaster) To: Bruce Gritton Cc: metadata@llnl.gov Subject: Metadata comments Date: Mon, 7 Mar 94 
13:42:38 -0600 Hi Bruce, I think you have done an excellent job summarizing some of the key aspects and problems for data access and management, or what we have been calling the metadata problem. It's clear you have spent a lot of time thinking about this. > Our primary technology goal is the design, implementation, > maintenance, and evolution of a scientific information system which > provides integrated access, processing and display of scientific data. > The foundation of the system is a data management subsystem which > integrates data from MBARI acquisition systems and other external data > sets. The underlying database is designed to support long term, > interdisciplinary research. > This is a clear goal of many of the attendees at the workshop. > Further, we adopted the philosophy that there is no data management > partition between data and metadata; data entities and their inherent > relationships are modeled as a logically contiguous realm. By > modeling at a high enough level of abstraction, we feel that we can > represent changes in technology, practices, procedures, etc as changes > in database content and NOT in database structure. This provides the I believe we all came to this conclusion as well. Data is metadata and vice versa. > We have implemented this model within our MBARI Observations Data Base > and will implement a variation under a collaborative project with the > University of California at Santa Cruz (Real-Time Environmental > Information Network & Analysis System - REINAS). This approach has > provided us an opportunity to learn about the data/metadata problem. > Do you have a whitepaper/email that you can make available to us describing this system? > 1.) Metadata is a heavily overloaded term. Computer scientists and > domain scientists think of these along two overlapping dimensions. > Another term would be better, but since we seem stuck with this one we We had the same problem when trying to nail down the definition. 
> decided to make all components of the definition visible. METADATA > REPRESENTS INFORMATION WHICH SUPPORTS THE EFFECTIVE USE OF DATA FROM > CREATION THROUGH LONG TERM USE. IT SPANS FOUR ANCILLARY REALMS: > CONTENT, STRUCTURE, REPRESENTATION AND CONTEXT. The CONTENT REALM I like your definition; it's more from the point of view of how it's used, rather than what it is. > 2.) The modeling of the domain of discourse (data & metadata) > indicates that most of the data management complexity is driven by > metadata relationships (this may be why this has been traditionally > handled by unstructured text). However, discussions with other > scientific information management efforts, from other disciplines, > indicates broad topological agreement in the respective information > models. (e.g. botanical museum collections and oceanography) > We all agreed that metadata relationships are the most important thing, not so much that metadata exists. I guess metadata without relationships is the same as collecting small bits of paper (and shiny objects? :-)) > 3.) The nature of the metadata problem may be summarized as follows: > > Metadata must support the sharing of data and information across > groups which: operate under different paradigms; have different > terminology for similar concepts; must collaborate across time... > compare data collected in different contexts with different > technology. > > There are different metadata needs for current and future users of > primary data. > > There are different metadata needs for expert/familiar users and other > users. > > There is a COST/BENEFIT imbalance for the implementation of an > effective metadata mechanism.... Creators must supply the metadata and > bear most of the burden; where distant users enjoy the benefits. > > Agreed 100% to all the above. > 4.) 
Therefore, any metadata mechanism must fit seamlessly into an > integrated data management environment to support the following: > The key is to integrate into existing environments; this is sometimes not trivial. This problem was highlighted as something to be considered. > The scientific community cannot expect technology to solve this > problem... Instead they must do a better job in understanding and > specifying metadata needs AND establish policies, practices, and > technical infrastructure to implement the needed solution. > In particular, how do you require someone to give you metadata about an object? If my archiving system takes your file, I want you to fill out a form stating as much information as possible. Some at the workshop felt their users (scientists) would generally not fill out the form adequately. However, I agree with your assertion that if you reform the policies and practices, it will happen. My idea was that if you archive a file into my system and it doesn't have adequate metadata (not gibberish, blanks or whatever), then I delete the file or I refuse to take it. Kinda harsh, but it works. > I will send a follow-up message which contains the postscript file of > a graphical depiction of our information model.. You will note that > most of the complex structure represented comes from what is typically > defined as metadata. I hope this will help communicate our work in > this area. > I printed it out, but I wasn't able to follow it too well :-)) > A few other activities that I am involved in are also looking at this > problem: I was the originator of the NEONs system at NRL in Monterey: > this system does well with technology independent access to point, > grid and image data but is weak in the CONTEXT REALM of the metadata. > I am also involved in an NRC study for the National Archives for long > term retention of scientific data, and an advisory panel to the EOSDIS > project. 
Needless to say, these efforts are struggling with the same > problems. > We heard a bit about this at the workshop. > I hope this contributes something to this very important discussion > and look forward to hard hitting feedback. > It contributes a boatload in my opinion. I will keep this mail around and when we get the metadata WWW home page started, would like to make a specific link to it. thanks again! :-) dml From dale@convex1.convex.com Mon Mar 7 15:07:08 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1170" "Mon" " 7" "March" "1994" "13:50:10" "-0600" "Dale Lancaster" "dale@convex1.convex.com" nil "26" "Re: IRDS and PCTE" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA08019; Mon, 7 Mar 94 15:07:06 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03837; Mon, 7 Mar 94 15:07:04 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA00166; Mon, 7 Mar 94 11:50:11 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27647; Mon, 7 Mar 94 11:51:29 PST Return-Path: Received: from convex.convex.com (convex-inet.convex.com) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27630; Mon, 7 Mar 94 11:51:25 PST Received: from convex1.convex.com by convex.convex.com (5.64/1.35) id AA03193; Mon, 7 Mar 94 13:47:17 -0600 Received: by convex1.convex.com (5.64/1.28) id AA08698; Mon, 7 Mar 94 13:50:10 -0600 Message-Id: <9403071950.AA08698@convex1.convex.com> In-Reply-To: <9403050417.AA02035@galsun.igpp.ucla.edu> From: dale@convex1.convex.com (Dale Lancaster) To: tking@igpp.ucla.edu (Todd King) Cc: 71162.3600@compuserve.com, metadata@llnl.gov Subject: Re: IRDS and PCTE Date: Mon, 7 Mar 94 13:50:10 -0600 Todd King writes: > To all: > > There seems to be great interest in existing systems for managing > data and metadata in an effective way. 
It also seems that there are > many groups out there who must deal with this issue, as well as > groups who have already developed solutions. Perhaps it would be a good > idea to ask everyone to present a short description (a paragraph) > of their systems. It would also be useful if everyone would include information > about where to obtain a copy of the system, if available. > > I'm willing to volunteer as a moderator for such a list and then would > post it to the mailing list when assembled. > At the workshop, we felt this kind of thing was much needed and very important. We know there are lots of things going on out there and we need to be aware of them (no sense in re-inventing the wheel). It would be great if you could collect this and then email a copy to the reflector. At some point, Robyn Sumpter will take that and hopefully create some pointers to papers/archives in the WWW home page that she is setting up that will help us all track all the different activities. thanks :-) dml From coyne@vnet.ibm.com Mon Mar 7 16:31:25 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["3163" "Mon" " 7" "March" "1994" "15:16:48" "CST" "coyne@vnet.ibm.com" "coyne@vnet.ibm.com" nil "55" "re: IEEE MSS&TC Sponsorship of Metadata Conference" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA08086; Mon, 7 Mar 94 16:31:24 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA04841; Mon, 7 Mar 94 16:31:20 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA01319; Mon, 7 Mar 94 13:16:14 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA03738; Mon, 7 Mar 94 13:17:32 PST Return-Path: Received: from vnet.IBM.COM by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA03722; Mon, 7 Mar 94 13:17:27 PST Message-Id: <9403072117.AA03722@pierce.llnl.gov> Received: from HOUVMSCC by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP 
id 4319; Mon, 07 Mar 94 16:13:02 EST From: coyne@vnet.ibm.com To: metadata@llnl.gov, olear@ncar.ucar.edu Subject: re: IEEE MSS&TC Sponsorship of Metadata Conference Date: Mon, 7 Mar 94 15:16:48 CST Charles Dollar writes ......... Dale comments ........... I think that Dale gave a fair assessment of the situation in responding to Charles's observations about the IEEE MSS&TC. So I only have some minor comments. Dale mentioned that the first workshop was to be focused on Managing Large Scientific Databases, and that we (IEEE MSS&TC) were trying to get some activity started in the area of data management. This is true. Most of the participants in the organizing meetings, before the first Austin workshop, did not treat metadata definition and metadata management as a separate topic or workshop. Metadata was viewed as an integral part of the data management problem, especially in large systems where storage and data management were viewed as open systems to be integrated to exploit the data (i.e., accomplish the mission). It was the notion that large, growing systems including numeric, text, audio, image, and video data, as well as paper and digitized paper images, would require solutions where the data management and storage system technology could be integrated in an open systems environment that drove the IEEE MSS&TC to sponsor activity in the data management arena. We believe that metadata at various levels of an information system (e.g., storage and data management) would require a vertical view to maximize the ability of the data management to exploit storage "things". We did not assume that a storage-centric view was required or desired. We (IEEE MSS&TC) decided to take the initiative to sponsor data management activities through specialist workshops to promote open systems technology that would allow multi-vendor storage and data management technology to be integrated into information systems. 
It seemed appropriate for the MSS&TC to be a sponsor since some groups tend to forget about open storage systems and work the problem from a data management and/or user i/f centric view. Using the MSS&TC as a sponsor would keep the "plumbing" on the table while the "representation of the information content" was being discussed. We do appreciate the need for metadata as a representation of the "information content" of stored things, just as we appreciate the role of metadata in support of the intelligent access to and organization of stored things on the storage media, devices and libraries. We are trying to support many organizations that wish to host IEEE Specialist workshops to gain the broadest exposure. We are listening to various views on the data management issue. Data life cycle management is an example of a critical data management issue requiring attention. If a metadata reference model is to be drafted by IEEE MSS&TC then we would form a standing group to address the development. If so, we would look for a small group of knowledgeable and available individuals to lead that effort. I do not plan to lead such an effort. It does not appear, however, that it is time to launch an IEEE Metadata Reference Model activity. It seems to us that broader discussion is required. This does not mean that the work and discussions of this group are not providing leadership; I think they are. 
Regards, Bob From lynn@bli.com Mon Mar 7 18:36:52 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["4270" "Mon" " 7" "March" "1994" "13:07:41" "-0800" "lynn@bli.com" "lynn@bli.com" nil "110" "Metadata & Data Engineering" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA08208; Mon, 7 Mar 94 18:36:51 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA05681; Mon, 7 Mar 94 18:36:39 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA03747; Mon, 7 Mar 94 15:18:13 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA15256; Mon, 7 Mar 94 15:19:31 PST Return-Path: Received: from netcomsv.netcom.com (uucp6.netcom.com) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA15176; Mon, 7 Mar 94 15:19:10 PST Received: from localhost by netcomsv.netcom.com with UUCP (8.6.4/SMI-4.1) id NAA03337; Mon, 7 Mar 1994 13:54:12 -0800 Received: by bli.com (AIX 3.2/UCB 5.64/4.03) id AA22502; Mon, 7 Mar 1994 13:07:41 -0800 Message-Id: <9403072107.AA22502@bli.com> Reply-To: Lynn Wheeler From: lynn@bli.com To: Subject: Metadata & Data Engineering Date: Mon, 7 Mar 1994 13:07:41 -0800 it is possible that some of the metadata issues might fit better in data engineering ... see attached posting. +++++ Lynn Wheeler | internet: lynn@bli.com Britton Lee | voice: 408-370-1400 PO Box 8 | fax: 408-370-1598 Los Gatos, Ca. 95031 | Words of wisdom from Zippy: I want a WESSON OIL lease!! 
From: barker@barium.cs.umanitoba.ca (Ken Barker) Newsgroups: comp.os.research Subject: RIDE-DOM`95 Announcement Date: 4 Mar 1994 03:17:00 GMT Organization: Computer Science, University of Manitoba, Winnipeg, Canada Lines: 85 Approved: comp-os-research@ftp.cse.ucsc.edu NNTP-Posting-Host: ftp.cse.ucsc.edu Originator: osr@ftp Fifth International Workshop on Research Issues on Data Engineering: DISTRIBUTED OBJECT MANAGEMENT Taipei, Taiwan, March 6-7, 1995 Sponsored by the IEEE Computer Society (pending) RIDE-DOM'95 is the fifth of a series of annual workshops on Research Issues in Data Engineering (RIDE). RIDE workshops are held in conjunction with the IEEE CS International Conferences on Data Engineering. Past successful RIDE conferences include RIDE-IMS'91 (Kyoto, Japan), RIDE-TQP'92 (Phoenix, USA), RIDE-IMS'93 (Vienna), and RIDE-ADB'94 (Houston, USA). The next RIDE workshop will also focus on distributed object management systems. The objective of the workshop is to provide a forum for the discussion and dissemination of original and fundamental advances in all aspects of distributed object management. Original research papers are sought in all areas related to this objective. The following is a partial list of research areas of interest: Distributed Objectbase Design Distributed object system architecture Object Migration Transactions in Object-Oriented Systems Managing large distributed object stores Distributed garbage collection Language Support for Persistent Objects Queries and Optimization Operating System Support Applications (e.g., GIS, CSCW) Object Views Supporting Interoperability The workshop encourages papers from industrial and user communities that will promote debate among researchers and practitioners. The workshop will include panel sessions that will investigate the emerging products, standards and application platforms. The proceedings consisting of the accepted papers will be published by IEEE Computer Society and will be widely available. 
INSTRUCTIONS : Authors are invited to submit six copies of manuscripts (up to 25 pages, double-spaced) by July 22, 1994 (hard deadline!), to the RIDE'95 Secretariat: RIDE'95 Secretariat Department of Computing Science University of Alberta Alberta, Canada, T6G 2H1 CONFERENCE ORGANIZATION: HONORARY CHAIRMAN: GENERAL CHAIRMAN: Yun Kuo Ahmed Elmagarmid Inst. Inf. Industry Purdue University PROGRAM COMMITTEE CO-CHAIRS: M. Tamer Ozsu Ming-Chien Shan U. of Alberta HP Laboratories ozsu@cs.ualberta.ca shan@hplmcs.hpl.hp.com PROGRAM COMMITTEE : G. Agha (USA) P. Apers (Netherlands) M. Atkinson (Scotland) F. Bancilhon (France) E. Bertino (Italy) J. Blakeley (USA) A. Buchmann (Germany) M. Franklin (USA) O. Gruber (France) M. Hsu (USA) Y. Kambayashi (Japan) W. Klas (Germany) P. Leach (USA) C. Lee (Taiwan) F. Manola (USA) E. Moss (USA) J. Orenstein (USA) H-J. Schek (Switzerland) M. Shapiro (France) H. Tirri (Finland) G. Weikum (Germany) A. Yonezawa (Japan) S. Zdonik (USA) Steering Committee : Ahmed Elmagarmid (Chair) Joseph Urban (Chair) Yahiko Kambayashi Marek Rusinkiewicz Local arrangements : C.J. 
Cherng (Taiwan); cjcherng@iiidns.iii.org.tw Publicity co-chairs : Ken Barker (Canada); barker@cs.umanitoba.ca Gary Gong (Taiwan); gary@iiidns.iii.org.tw Proceedings chair : Omran Bukhres (USA); bukhres@cs.purdue.edu IMPORTANT DATES : Deadline for submission: July 22, 1994 Notification of acceptance: October 30, 1994 Final camera-ready due: December 05, 1994 From 71162.3600@compuserve.com Mon Mar 7 22:39:23 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["827" "" " 7" "March" "1994" "22:19:04" "EST" "Charles Dollar" "71162.3600@compuserve.com" nil "17" "Metadata Musings" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA08312; Mon, 7 Mar 94 22:39:22 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA06925; Mon, 7 Mar 94 22:39:19 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA06776; Mon, 7 Mar 94 19:24:56 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29280; Mon, 7 Mar 94 19:26:14 PST Return-Path: <71162.3600@CompuServe.COM> Received: from dub-img-1.compuserve.com by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29268; Mon, 7 Mar 94 19:26:09 PST Received: from localhost by dub-img-1.compuserve.com (8.6.4/5.930129sam) id WAA16540; Mon, 7 Mar 1994 22:24:49 -0500 Message-Id: <940308031903_71162.3600_DHQ88-3@CompuServe.COM> From: Charles Dollar <71162.3600@compuserve.com> To: Metadata Reflector Subject: Metadata Musings Date: 07 Mar 94 22:19:04 EST I appreciate the thoughtful responses that both Bob and Dale made regarding my general observation about the Austin meeting and next steps. At the meeting I sensed a concern that the next meeting might replow the ground that was plowed at Austin. This concern has now been overtaken in part by Bruce Gritton's extremely useful description of his work, especially his definition of metadata, and Francis Bretherton's "Metadata Strawman." 
It might be useful to expand the focus of the May meeting to include Bruce's work. Incidentally, I found very useful (not surprising because I am an archivist) Bruce's definition of metadata because it encompasses the life cycle or continuum of data/records. In other words, it takes into account long-term use and usability. I think we can profit from Bruce's work in this area. From 71162.3600@compuserve.com Thu Mar 10 10:11:40 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["471" "" "10" "March" "1994" "09:33:16" "EST" "Charles Dollar" "71162.3600@compuserve.com" nil "13" "Bibliograpjy" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA15836; Thu, 10 Mar 94 10:11:39 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA29411; Thu, 10 Mar 94 10:11:31 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA08632; Thu, 10 Mar 94 06:41:58 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA25292; Thu, 10 Mar 94 06:43:14 PST Return-Path: <71162.3600@CompuServe.COM> Received: from arl-img-2.compuserve.com by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA25283; Thu, 10 Mar 94 06:43:10 PST Received: from localhost by arl-img-2.compuserve.com (8.6.4/5.930129sam) id JAA13387; Thu, 10 Mar 1994 09:41:39 -0500 Message-Id: <940310143315_71162.3600_DHQ44-1@CompuServe.COM> From: Charles Dollar <71162.3600@compuserve.com> To: Reflector Subject: Bibliograpjy Date: 10 Mar 94 09:33:16 EST On the topic of bibliography, I can take on temporary responsibility for it until a decision is made about what role it has in this endeavor. Please send full citations to me and I will obtain a copy through the NARA Library and attempt to provide some annotations. Specifically, I believe that Tony Baraghimian and Robert Demolombe alluded to work they had done. If there is a publication, white paper, or the like, please advise me. Thanks. 
C$ National Archives From xxja001@chpc.utexas.edu Thu Mar 10 17:17:14 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2099" "Thu" "10" "March" "1994" "15:56:35" "-0600" "xxja001@chpc.utexas.edu" "xxja001@chpc.utexas.edu" "<9403102156.AA09360@cowchip.chpc.utexas.edu>" "46" "some comments on metadata meetings" "^From:" nil nil "3" "1994031021:56:35" "some comments on metadata meetings" nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA18085; Thu, 10 Mar 94 17:17:13 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03007; Thu, 10 Mar 94 17:17:10 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA15004; Thu, 10 Mar 94 13:56:39 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29915; Thu, 10 Mar 94 13:57:57 PST Return-Path: Received: from cowchip.chpc.utexas.edu by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA29905; Thu, 10 Mar 94 13:57:53 PST Received: by cowchip.chpc.utexas.edu (4.1/SMI-4.1) id AA09360; Thu, 10 Mar 94 15:56:36 CST Message-Id: <9403102156.AA09360@cowchip.chpc.utexas.edu> X-Mailer: ELM [version 2.4 PL23] Content-Type: text Content-Length: 2099 From: xxja001@chpc.utexas.edu To: metadata@llnl.gov Subject: some comments on metadata meetings Date: Thu, 10 Mar 1994 15:56:35 -0600 (CST) Fellow metadata freaks, I am very gratified at the constructive discussion which has emerged on the reflector during the last few weeks. There is indeed a lot of thinking going on in metadata-related areas, and also quite a bit of insight. All this makes me very optimistic as to possibilities for the collaboration which the DC meeting should bring - and no, I don't feel I or Dale or Otis or any of us have in some way been "removed" from the metadata initiative. We certainly cannot claim to own it in any way - not yet anyway! 
We should simply regard ourselves as an informal bunch of people with some common problems and perceptions, trying to get some action started, and to some degree succeeding! I agree with the comments that Dale and C$ have been making as to the agenda. I hope we can get the likes of Bruce Gritton and Cheng Hsu to present (and I assume they are listening on this reflector). I have also indicated my willingness to present some of the insights which have resulted from our Austin meetings so far. I have also submitted a paper for the IEEE MSS&TC meeting in Annecy in June, at which I propose to relate the state of our deliberations on the subject, as well as some examples and applications of the concept of metadata, and a brief description of our MetaStore prototype. One of the issues which is still unresolved is whether there is actually any distinction between data and metadata. Some say there is not, and certainly there are applications which will use descriptive attributes as "data" (=> metadata is data!). I believe, however, that there will be applications which do make an absolute distinction (e.g. an application may use descriptive attributes to browse/select information to be processed, and then may access the described information from entirely different sources and media in order to perform very different operations), so that perhaps METADATA MAY BE REGARDED AS DATA, BUT DATA IS NOT IN GENERAL METADATA. But then again, in general, one should not generalize(!) 
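The browse-then-access example above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions and is not the MetaStore prototype: the names MetadataRecord, select and fetch, the "store:key" locator format, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Descriptive attributes plus a pointer to where the described data lives."""
    dataset_id: str
    attributes: dict
    locator: str  # hypothetical "store:key" format


def select(catalog, **criteria):
    """Browse step: match on descriptive attributes only; no data is read."""
    return [
        rec for rec in catalog
        if all(rec.attributes.get(k) == v for k, v in criteria.items())
    ]


def fetch(record, stores):
    """Access step: resolve the locator against a separate source or medium."""
    store_name, _, key = record.locator.partition(":")
    return stores[store_name][key]


# The catalog (metadata) and the stores (data) are kept apart on purpose.
catalog = [
    MetadataRecord("sst-1993", {"variable": "sea_surface_temperature", "year": 1993}, "tape:vol7/sst93"),
    MetadataRecord("sst-1994", {"variable": "sea_surface_temperature", "year": 1994}, "disk:sst94"),
    MetadataRecord("wind-1994", {"variable": "wind_speed", "year": 1994}, "disk:wind94"),
]
stores = {
    "tape": {"vol7/sst93": [13.0, 14.0]},
    "disk": {"sst94": [14.0, 15.0], "wind94": [5.0, 7.5]},
}

hits = select(catalog, variable="sea_surface_temperature", year=1994)
values = fetch(hits[0], stores)
mean = sum(values) / len(values)
```

Here the selection never touches the stored values, and the computation never touches the descriptive attributes, which is the kind of application that treats the two as distinct.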
- Jim Almond From BARGMEYER.BRUCE@epamail.epa.gov Thu Mar 10 21:38:47 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["7397" "Thu" "10" "March" "1994" "21:16:00" "-0500" "BRUCE BARGMEYER 202-260-5306" "BARGMEYER.BRUCE@epamail.epa.gov" nil "253" "EPA Workshop on Metadata" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA18230; Thu, 10 Mar 94 21:38:45 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA04537; Thu, 10 Mar 94 21:38:43 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA18245; Thu, 10 Mar 94 18:24:05 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA17114; Thu, 10 Mar 94 18:25:23 PST Return-Path: Received: from VAXTM1.RTPNC.EPA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA17104; Thu, 10 Mar 94 18:25:12 PST Received: from pyxis.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01H9TL111UQ890O8G4@epavax.rtpnc.epa.gov>; Thu, 10 Mar 1994 21:23:28 EST Received: from mr.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01H9TL0KSIY88WZVK6@mail.rtpnc.epa.gov>; Thu, 10 Mar 1994 21:23:05 EST Received: with PMDF-MR; Thu, 10 Mar 1994 21:22:13 EST Mr-Received: by mta CARINA; Relayed; Thu, 10 Mar 1994 21:22:13 -0500 Alternate-Recipient: prohibited Disclose-Recipients: prohibited Message-Id: <01H9TL0M2LWM8WZVK6@mr.rtpnc.epa.gov> X-Envelope-To: metadata@llnl.gov Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="Boundary (ID NmzQY6atPIuXznfiHzX9Bg)" Content-Transfer-Encoding: 7BIT Posting-Date: Thu, 10 Mar 1994 21:17:00 -0500 (EST) Importance: normal Priority: normal X400-Mts-Identifier: [;31221201304991/908520@MAIL] A1-Type: MAIL Hop-Count: 0 From: BRUCE BARGMEYER 202-260-5306 To: metadata@llnl.gov Subject: EPA Workshop on Metadata Date: Thu, 10 Mar 1994 21:16:00 -0500 (EST) --Boundary (ID NmzQY6atPIuXznfiHzX9Bg) Content-type: TEXT/PLAIN; CHARSET=US-ASCII 
--Boundary (ID NmzQY6atPIuXznfiHzX9Bg) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Thu, 3 Mar 1994 10:57:00 EST Subject: Status - Data Management Workshop Mar. 15, 1994 Sender: "William T. Smith" To: "ravi@avl.umd.edu" , "mike@xidak.com" , "trhyne@vislab.epa.gov" , "cjk@hemlock.cray.com" , "r.p.weston@larc.nasa.gov" , "walther@caip.rutgers.edu" , "peskin@caip.rutgers.edu" , "jambrosiano@llnl.gov" , "vouk@adm.csc.ncsu.edu" , "neacsu@mcnc.org" , "bilicki@mcnc.org" , "galluppi@mcnc.org" , "bargmeyer.bruce" , "benjey.william" , "jzn@epavax.rtpnc.epa.gov" Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Thu, 10 Mar 1994 00:00:00 EST Importance: normal A1-type: MAIL I am forwarding this for the information of anyone in the area (it is late notice). This meeting relates to the Environmental Protection Agency HPCC initiative. It is an open meeting, but there is limited space available. --Bruce Bargmeyer ----------------------- Enclosed you will find a status report for the data management workshop. It includes a list of talks, a very tentative agenda, a list of participants, and a list of convenient hotels. I expect the number of participants to continue to grow. I will provide updates as more information becomes available. When and Where Tuesday, March 15, 1994 from 9:00 AM to 5:00 PM Seminar Rooms A&B North Carolina Supercomputing Center 3021 Cornwallis Road Research Triangle Park, N.C. 27707 The following talks are scheduled and will be limited to 20 minutes plus 10 minutes for questions and answers. Talks will emphasize the problems and issues of data management. An overhead projector will be available. If anyone needs a slide projector or other equipment, let me know. "Macro-programming: An Approach to Data Management and Control in an Earth System Modeling Framework" John Ambrosiano - Lawrence Livermore National Laboratory "Intelligent Software Access to Data" Bruce Bargmeyer - EPA/OARM Washington, D.C. 
"Advanced Visualization of Data in Earth Sciences and Space" (to be confirmed) Ravi Kulkarni - University of Maryland "Data Mining Technologies" (to be confirmed) Michael Neacsu - Independent Consultant "SCENE: An Object-Oriented Scientific User Interface and Data Management System" Richard Peskin and Sandra Walther, Dept. of Mechanical & Aerospace Engineering, Rutgers University "Framework for Interdisciplinary Design Optimization" Robert Weston - NASA Proposed Agenda 8:30 - 9:00 AM - Welcome Coffee 9:00 - 9:20 AM - Introduction and Overview Ken Galluppi - MCNC 9:20 - 9:50 AM - Earth System Modeling John Ambrosiano - LLNL 9:50 - 10:20 AM - Software Access to Data Bruce Bargmeyer - EPA/OARM 10:20 - 10:30 AM break 10:30 - 11:00 AM - SCENE Richard Peskin and Sandra Walther - Rutgers University 11:00 - 11:30 AM - Interdisciplinary Design Optimization Robert Weston - NASA 11:30 - 12:00 Advanced Data Visualization (to be confirmed) Ravi Kulkarni - University of Maryland 12:00 - 1:00 lunch 1:00 - 1:30 - Data Mining Technologies (to be confirmed) Michael Neacsu - Independent Consultant 1:30 - 3:00 Break out sessions to discuss major issues and problems which may include: distributed versus centralized data management limits imposed by system software and hardware requirements for metadata networking file services data archiving 3:00 - 3:15 break 3:15 - 4:45 continue break out sessions 4:45 - 5:00 Closing Remarks Ken Galluppi - MCNC The following organizations will be represented at the workshop: U.S. Environmental Protection Agency Joan Novak William Benjey Bruce Bargmeyer MCNC Ken Galluppi Ted Smith Ed Bilicki Independent Consultant Mike Neacsu N.C. State University (to be confirmed) Mladen Vouk Lawrence Livermore National Lab John Ambrosiano Rutgers University Richard Peskin Sandra Walther NASA Robert Weston Corps of Engineers Lisa Roig Cray Research Inc. 
Carla Kennedy Martin Marietta Theresa Rhyne (EPA Visualization Lab) Xidak (to be confirmed) Mike Achenbach University of Maryland (to be confirmed) Ravi Kulkarni The following hotels are convenient to Research Triangle Park. Holiday Inn RDU Airport 4810 New Page Road Durham, NC 27709 919-941-6000 Radisson Governor's Inn Highway 54 RTP, NC 27709 919-549-8631 Residence Inn (Marriott) 1919 Highway 54 East Durham, NC 27709 919-361-1266 Crown Park Best Western 4627 S. Miami Blvd. Durham, NC 27709 919-941-6066 Red Roof Inn 4405 Highway 55 East Durham, NC 27709 919-361-1950 The first four range from $70 - $100 per night (single room). The Red Roof Inn is in the $30 - $50 range. I also have a simple map of RTP which I can fax to you if you send me a request. See you at the workshop. Ted -- Ted Smith MCNC smith_w@mcnc.org Information Technologies Division Voice: 919-248-9232 P.O. Box 12889, 3021 Cornwallis Rd. Fax: 919-248-9245 Research Triangle Park, NC 27709-2889 --Boundary (ID NmzQY6atPIuXznfiHzX9Bg) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Thu, 3 Mar 1994 11:13:00 EST From: SYSTEM@CARINA.RTPNC.EPA.GOV Subject: Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Thu, 3 Mar 1994 11:13:00 EST Importance: normal A1-type: DOCUMENT RFC-822-headers: Received: from vaxtm1.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01H9J7C7M3O08WY8W1@mail.rtpnc.epa.gov>; Thu, 3 Mar 1994 11:03:39 EST Received: from merlin.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01H9J7C5FW408WXHCN@epavax.rtpnc.epa.gov>; Thu, 3 Mar 1994 11:03:37 EST Received: from cardinal.ncsc.org by merlin.rtpnc.epa.gov (8.6.6.Beta9/1.34) id LAA03040; Thu, 3 Mar 1994 11:03:17 -0500 Received: by cardinal.ncsc.org (5.64/MCNC/6-25-91) id AA01789; Thu, 3 Mar 94 10:57:23 -0500 for bargmeyer.bruce@epamail.epa.gov Date: Thu, 03 Mar 1994 10:57:23 +0000 From: "William T. Smith" Subject: Status - Data Management Workshop Mar. 
15, 1994 To: jzn@epavax.rtpnc.epa.gov (Joan Novak), benjey.william@epamail.epa.gov (Bill Benjey), bargmeyer.bruce@epamail.epa.gov (Bruce Bargmeyer), galluppi@mcnc.org (Ken Galluppi), bilicki@mcnc.org (Ed Bilicki), neacsu@mcnc.org (Mike Neacsu), vouk@adm.csc.ncsu.edu (Mladen Vouk), jambrosiano@llnl.gov (John Ambrosiano), peskin@caip.rutgers.edu (Richard L. Peskin), walther@caip.rutgers.edu (Sandra Walther), r.p.weston@larc.nasa.gov (Robert Weston), cjk@hemlock.cray.com (Carla Kennedy), trhyne@vislab.epa.gov (Theresa Rhyne), mike@xidak.com (Mike Achenbach), ravi@avl.umd.edu (Ravi Kulkarni) Message-id: <9403031557.AA01789@cardinal.ncsc.org> X-Envelope-to: bargmeyer.bruce@mr.rtpnc.epa.gov, benjey.william@mr.rtpnc.epa.gov X-Mailer: ELM [version 2.3 PL0] Content-transfer-encoding: 7BIT --Boundary (ID NmzQY6atPIuXznfiHzX9Bg)-- From jkineman@luna.ngdc.noaa.gov Fri Mar 11 13:44:08 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2566" "Fri" "11" "March" "1994" "11:30:58" "" "John J. 
Kineman" "jkineman@luna.ngdc.noaa.gov" "<9403111830.AA17892@ngdc.noaa.gov>" "46" "meta-data" "^From:" nil nil "3" "1994031118:30:58" "meta-data" nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA00957; Fri, 11 Mar 94 13:44:06 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA10578; Fri, 11 Mar 94 13:44:04 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA24557; Fri, 11 Mar 94 10:27:58 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14068; Fri, 11 Mar 94 10:29:16 PST Return-Path: Received: from ngdc.noaa.gov (luna.ngdc.noaa.gov) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14058; Fri, 11 Mar 94 10:29:12 PST Received: from 192.149.148.73 ([192.149.148.73]) by ngdc.noaa.gov (4.1/SMI-4.1) id AA17892; Fri, 11 Mar 94 11:30:35 MST Message-Id: <9403111830.AA17892@ngdc.noaa.gov> X-Sender: jjk@luna.ngdc.noaa.gov X-Mailer: From: jkineman@luna.ngdc.noaa.gov (John J. Kineman) To: Reflector Subject: meta-data Date: Fri, 11 Mar 1994 11:30:58 -700 I'm new to this forum and so don't have much background on your discussions. Is there a summary or something that I can get to catch up? My own work is with a national data center and I have been conducting a project in data integration for global change. I have thus thought a great deal about data, meta-data, etc. and have developed some protocols for my project that I hope will help influence the general field. In my own view, data are indeed different from meta-data, which is indeed different from Documentation. "Data" in my view, is defined by authorship. That is, a data set is defined by the principal investigator. The PI may include meta-data headers and descriptors, but these are not part of the primary intellectual product that would be published in the scientific literature (ideally). 
I am strongly promoting the concept of data "publications" with conventions and protocols analogous to those for reviewed scientific literature. If we take that as a model, the formats, which often include changes in meta-data, would be more within the scope of the data publisher than the PI, although, as with literature, there must be dialog to ensure that meta-data and format changes do not unduly alter the intellectual content of the PI's work, in this case as reflected in the dataset. Going one step further, I make the distinction between meta-data and documentation somewhat more arbitrarily, considering numerical information such as in header files (i.e., locational reference systems, resolution, coverage, projection, temporal character, etc.) meta-data, and written descriptions or analyses (e.g., of quality, application, descriptions of the original study, etc.) that help understand the dataset and its limits and applications as "documentation." In my view, meta-data are digital information intended for use by a computer program, while documentation is information intended for use by a person. Naturally, one could argue that future programs could access and use digitized documentation, but again the criterion is more based on original design and intent than subsequent fact. I would like to know what people think about the concept of data publishing. In my mind, without such a concept, we cannot have truly scientific data. ----------------------------------------------- John J. Kineman, Physical Scientist/Ecologist Ecosystems and Global Change National Geophysical Data Center 325 Broadway E/GC1 (3100 Marine St. 
Rm: A-152) Boulder, Colorado 80303 USA (303) 497-6900 (phone) (303) 497-6513 (fax) From sumpter@llnl.gov Fri Mar 11 16:05:10 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2850" "Fri" "11" "March" "1994" "12:47:18" "PST" "Robyne Sumpter" "sumpter@llnl.gov" "<9403112047.AA27028@ocfmail.ocf.llnl.gov>" "70" "Re: Recommended agenda for May workshop" "^From:" nil nil "3" "1994031120:47:18" "Recommended agenda for May workshop" nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA01891; Fri, 11 Mar 94 16:05:09 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA11980; Fri, 11 Mar 94 16:05:07 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA27031; Fri, 11 Mar 94 12:47:25 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27702; Fri, 11 Mar 94 12:48:44 PST Return-Path: Received: from ocfmail.ocf.llnl.gov by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA27693; Fri, 11 Mar 94 12:48:41 PST Received: from [134.9.50.11] (sumpter-mac.ocf.llnl.gov) by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA27028; Fri, 11 Mar 94 12:47:18 PST Message-Id: <9403112047.AA27028@ocfmail.ocf.llnl.gov> X-Sender: sumpter@ocfmail.ocf.llnl.gov From: sumpter@llnl.gov (Robyne Sumpter) To: metadata@llnl.gov Subject: Re: Recommended agenda for May workshop Date: Fri, 11 Mar 94 12:47:18 PST I agree with Dale that we need to quickly provide input on this agenda if we want to influence the next meeting. Dale has made a good first cut. I would like to suggest that we plan some time to breakout and discuss various issues in smaller working groups. With 50 or more people in a meeting it will be difficult to focus the discussion. Some suggested topics for breakout groups might be: 1. 
application/user metadata requirements - Structure of metadata - How metadata is created - Indexing (how to provide multiple views, where do databases fall short) - Interfaces between applications, databases, file systems - Managing "off-line" data (paper, books, reference material) 2. metadata issues for systems - storage/retrieval performance for large datasets - data longevity (preserving info on formats & relationships) These are just a few ideas that need more fleshing out and I'm sure there are more. I like the first day of Dale's suggested agenda. I think if we break into smaller groups at the end of the first day or beginning of the second day with the goal of fleshing out issues and developing a high level architectural or functional picture, we have a better chance of producing something by the end of the second day. (I also printed Bruce Gritton's diagram but had difficulty following it.) I have modified Dale's agenda below. This is just a suggestion. I'm open to other agendas. Dale writes.... > >Suggested Agenda > >Day 1 (Assuming there may be more than one day) > >* Intros all around (who are you and why are you here) > >* Review of the work done at the last two Metadata workshops > - the whitepaper > - our thrashing out of the scope and functionality requirements > - miscellaneous things I'm sure will need to be mentioned > >* Presentations by invited speakers on proposed metadata > reference models: > - Francis Bretherton > - Bruce Gritton from NEONS ? > - others ??? >(Hopefully these folks will propose not just models, but especially >the rationale for having a model in the first place) > Break into working groups for initial discussion on agendas > >Day 2 > Continue working group discussion for first half of day Working groups present results to whole group Discuss plans for next meeting or goals for working groups =========================================================== Robyne M. 
Sumpter sumpter@llnl.gov Lawrence Livermore Laboratory Phone: (510) 423-5054 P.O. Box 808 L-60 Fax: (510) 423-8715 Livermore, CA 94550 ===========================================================
From STREBEL@boreas.gsfc.nasa.gov Thu Mar 17 20:04:46 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1326" "Thu" "17" "March" "1994" "19:45:11" "EST" "STREBEL@boreas.gsfc.nasa.gov" "STREBEL@boreas.gsfc.nasa.gov" nil "26" "Documentation" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA18127; Thu, 17 Mar 94 20:04:45 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA03250; Thu, 17 Mar 94 20:04:43 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA21092; Thu, 17 Mar 94 16:45:29 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA08764; Thu, 17 Mar 94 16:46:47 PST Return-Path: Received: from BOREAS.GSFC.NASA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA08718; Thu, 17 Mar 94 16:46:41 PST Message-Id: <940317194511.26400270@BOREAS.GSFC.NASA.GOV> X-Vmsmail-To: SMTP%"metadata@llnl.gov" From: STREBEL@boreas.gsfc.nasa.gov To: metadata@llnl.gov Subject: Documentation Date: Thu, 17 Mar 1994 19:45:11 EST Let me add an echo to what John Kineman and James Brunt have said. Most of my work has been at the interface between the scientific community and the information management community. You can't explain easily what (the awful, horrible, pretentious) term "meta-data" means to the former, and the latter has a hard time incorporating all of the aspects of "documentation" of a data set into a traditional data base driven concept of meta-data. We have opted in our projects supporting large scientific field experiments to go with the "data publication" analogy. It gains automatic recognition from the scientific community, and it provides a paradigm for what constitutes required descriptive information about the data ("meta-data", if you insist). 
It also provides a good implementation mechanism, in that you need to write a human-readable "technical paper" about the data, which will accompany it wherever it goes. The challenge to information managers is how to capture this meta-data in electronic form, store and index it in standard ways, and associate it with individual data "items" in such a way that an appropriately descriptive document can be automatically constructed to accompany the data returned by an arbitrary query. Donald E. Strebel Code 923 NASA/Goddard Space Flight Center Greenbelt, MD 20771 From SAWYER@nssdca.gsfc.nasa.gov Fri Mar 18 11:20:04 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1098" "Fri" "18" "March" "1994" "10:50:40" "-0500" "SAWYER@nssdca.gsfc.nasa.gov" "SAWYER@nssdca.gsfc.nasa.gov" nil "25" "RE: Documentation" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA19716; Fri, 18 Mar 94 11:20:03 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA08732; Fri, 18 Mar 94 11:20:01 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA27621; Fri, 18 Mar 94 07:50:17 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14181; Fri, 18 Mar 94 07:51:36 PST Return-Path: Received: from NSSDCA.GSFC.NASA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14172; Fri, 18 Mar 94 07:51:32 PST Message-Id: <940318105040.27002b43@NSSDCA.GSFC.NASA.GOV> From: SAWYER@nssdca.gsfc.nasa.gov To: STREBEL@boreas.gsfc.nasa.gov Cc: metadata@llnl.gov Subject: RE: Documentation Date: Fri, 18 Mar 1994 10:50:40 -0500 (EST) Let me echo what Donald Strebel has said, and add a bit more. I have been assisting in the development of standards by Panel 2 of the Consultative Committee for Space Data Systems (CCSDS). 
These standards (the basic one is also ISO 12175 now) address the linking of data and metadata/documentation/publication in a simple but effective way. They support packaging of data and metadata, and provide for an infrastructure to maintain the metadata/documentation. We do not try to make a clear distinction between what is intended only for machine use and what is only for human use. Some of the metadata/ documentation is useful for both beyond just presentation. We are also working on data description languages with different levels of semantic functionality. Instances of these description languages provide various metadata/documentation description objects that are linked to the data (objects). I was unaware of Donald Strebel's work, even though we are both at GSFC. I am sure we will compare notes! Donald M. Sawyer Code 633 NASA Goddard Space Flight Center Greenbelt, MD 20771 From system@cuhhca.hhmi.columbia.edu Sat Mar 19 11:34:43 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1714" "Sat" "19" "March" "1994" "11:17:24" "-0500" "Phil Bourne" "system@cuhhca.hhmi.columbia.edu" nil "35" "Pointers to the Content/Context Issue" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA21726; Sat, 19 Mar 94 11:34:41 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA17080; Sat, 19 Mar 94 11:34:40 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA14468; Sat, 19 Mar 94 08:23:49 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA19876; Sat, 19 Mar 94 08:25:04 PST Return-Path: Received: from mailhub.cc.columbia.edu by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA19867; Sat, 19 Mar 94 08:25:00 PST Received: from cuhhca.hhmi.columbia.edu by mailhub.cc.columbia.edu with SMTP id AA02979 (5.65c+CU/IDA-1.4.4/HLK for metadata@llnl.gov); Sat, 19 Mar 1994 11:23:39 -0500 Received: by cuhhca.hhmi.columbia.EDU 
(5.64/8.0) id AA00627; Sat, 19 Mar 94 11:17:24 -0500 Message-Id: <9403191617.AA00627@cuhhca.hhmi.columbia.EDU> From: system@cuhhca.hhmi.columbia.edu (Phil Bourne) To: metadata@llnl.gov Cc: system@cuhhca.hhmi.columbia.edu Subject: Pointers to the Content/Context Issue Date: Sat, 19 Mar 94 11:17:24 -0500 Hi: I am new to this group so forgive me if I am dealing in FAQs. For the past 2 years we have been developing a dictionary based on STAR (Self Defining Text Archival and Retrieval) for describing the structure of biological macromolecules. This began as a project for archiving purposes, since this had been successful in the related discipline dealing with data from small molecules -- by successful I mean it was adopted by the community. As the dictionary has developed, the idea arose that STAR can be used to define the context of the data, by which I mean, among other things, explicit relationships between data items. I see this as being of enormous potential in software development in our field. My questions are as follows: (i) From this brief blurb can you point me to other domains from which we might gain some insight? (ii) Related to (i) is what I refer to as the content versus context battle, whereby the domain scientists want to maintain a user readable data dictionary whereas we software types want to turn it into a tool we can really use. Is there documented evidence of this type of battle elsewhere, and more importantly some kind of outcome? (iii) Is there a concise review of the evolving field of metadata? Thanks.. /p =========================================================================== Philip E. Bourne Howard Hughes Medical Institute Department of Biochemistry and Molecular Biophysics Columbia University 630 W. 
168th Street New York NY 10032 USA Telephone: (212) 305-3657 FAX: (212) 305-7379 Internet: system@cuhhca.hhmi.columbia.edu (128.59.98.1) =========================================================================== From BARGMEYER.BRUCE@epamail.epa.gov Sun Mar 20 09:13:28 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["26269" "Thu" "17" "March" "1994" "12:39:00" "-0500" "BRUCE BARGMEYER 202-260-5306" "BARGMEYER.BRUCE@epamail.epa.gov" nil "522" "Position Statement from Treinish" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA24171; Sun, 20 Mar 94 09:13:25 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA22845; Sun, 20 Mar 94 09:13:23 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA19880; Sun, 20 Mar 94 05:56:36 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA01796; Sun, 20 Mar 94 05:57:55 PST Return-Path: Received: from VAXTM1.RTPNC.EPA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA01787; Sun, 20 Mar 94 05:57:50 PST Received: from pyxis.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01HA6TU8XY8W90R7J1@epavax.rtpnc.epa.gov>; Sun, 20 Mar 1994 08:56:21 EST Received: from mr.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01HA6TTJR25C8X1QDN@mail.rtpnc.epa.gov>; Sun, 20 Mar 1994 08:55:50 EST Received: with PMDF-MR; Sun, 20 Mar 1994 08:51:34 EST Mr-Received: by mta CARINA; Relayed; Sun, 20 Mar 1994 08:51:34 -0500 Alternate-Recipient: prohibited Disclose-Recipients: prohibited Message-Id: <01HA6TTM8JLI8X1QDN@mr.rtpnc.epa.gov> X-Envelope-To: metadata@llnl.gov Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="Boundary (ID jda+PRN2HgXwZK3kdufM2g)" Content-Transfer-Encoding: 7BIT Posting-Date: Sun, 20 Mar 1994 08:44:00 -0500 (EST) Importance: normal Priority: normal X400-Mts-Identifier: [;43158002304991/941549@MAIL] A1-Type: MAIL Hop-Count: 0 From: BRUCE 
BARGMEYER 202-260-5306 To: metadata@llnl.gov Subject: Position Statement from Treinish Date: Thu, 17 Mar 1994 12:39:00 -0500 (EST) --Boundary (ID jda+PRN2HgXwZK3kdufM2g) Content-type: TEXT/PLAIN; CHARSET=US-ASCII The following was distributed to participants of an EPA/MCNC workshop on metadata. I think it may be of interest to recipients of this metadata reflector. --Bruce Bargmeyer --Boundary (ID jda+PRN2HgXwZK3kdufM2g) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Wed, 16 Mar 1994 11:37:00 EST Subject: Position Statement from Treinish Sender: "William T. Smith" To: "r.p.weston@larc.nasa.gov" , "rjw@hpcc.epa.gov" , "walther@caip.rutgers.edu" , "vouk@adm.csc.ncsu.edu" , "smith_w@mcnc.org" , "diane@xidak.com" , "roig@hl.wes.army.mil" , "trhyne@vislab.epa.gov" , "reagan.james" , "peskin@caip.rutgers.edu" , "jzn@epavax.rtpnc.epa.gov" , "neacsu@mcnc.org" , "namzer.john" , "dluecken@climate.rtpnc.gov" , "ravi@avl.umd.edu" , "cjk@hemlock.cray.com" , "jou@mcnc.org" , "galluppi@mcnc.org" , "frame@ida.org" , "fine@mcnc.org" , "eyth@mcnc.org" , "davis.steve" , "coats@mcnc.org" , "cxh@epavax.rtpnc.epa.gov" , "bdx@flyer.ncsc.org" , "bilicki@mcnc.org" , "benjey.william" , "bargmeyer.bruce" , "gab@mitchell.hitc.com" , "rajini@galileo.csc.ncsu.edu" , "jambrosiano@llnl.gov" Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Sun, 20 Mar 1994 00:00:00 EST Importance: normal A1-type: MAIL Workshop Participants: I thank you for attending our data management workshop. As promised, I am sending you a position statement I received from Lloyd Treinish. 
Other papers available in hardcopy include: "Unifying Principles of Data Management for Scientific Visualization" "Interactive Archives of Scientific Data" "An Architecture for Rule-Based Visualization" "Visualization of Stratospheric Ozone Depletion and Polar Vortex" "Human Vision, Visual Processing, and Digital Display IV" Ted -------------------- Message from Treinish ----------------- >From lloydt@watson.ibm.com Tue Mar 8 10:47:56 1994 Date: Tue, 8 Mar 94 10:51:11 EST From: "Lloyd A. Treinish" To: smith_w@mcnc.org Status: OR Position Statement on Scientific Data Models, Structures and Access Software Lloyd A. Treinish Visualization Systems IBM Thomas J. Watson Research Center Post Office Box 704 Yorktown Heights, NY 10598 USA 914-784-5038 (voice) 914-784-5130 (facsimile) lloydt@watson.ibm.com Background There are a tremendous number of sources of scientific data, be they computed or measured. Even from a single source there can be a wide variety of data sets. Each such data set typically contains several independent variables such as time, one or more spatial, spectral, etc. variables, and of course, many dependent variables, where the interesting science is stored. There can be a bewildering range of underlying formats, structures, arrangements and access methods for these data. Of course, the size and complexity of these data are growing significantly as data generators rapidly improve. Unfortunately, an appropriate data handling infrastructure is still required, for which there is no universally agreed upon solution. Independently, there is a need to develop a taxonomy of such data by both structural characteristics and application areas. Rudimentary efforts along these lines were begun at a SIGGRAPH '90 workshop and additional efforts have been continued since but much work is still required, especially in the collection of discipline-specific information. 
Data Structures

To attempt to bring some simplifying order to this chaos, consider six key attributes of data: dimensionality, parameters, data type, rank, mesh structure, and aggregation. Any data set may be considered as a function(s) of independent variable(s). These independent variables may be called dimensions. The number of independent variables may be called the dimensionality of the data. It is the fundamental characteristic of the data. Such dimensions may be space (length, width, height), time, energy, etc. For example, zero-dimensional data are just numbers such as sales, while two-dimensional data could depend on an area such as barometric pressure over a state. Some complex data may have five or more dimensions. The function(s) composing a data set really are dependent variable(s) -- the data themselves, which may be called parameters. They are dependent on the dimensions, such as sales or temperature. Thus, data implies a parameter or field of one or more (dependent) values that is a function of one or more (independent) variables, e.g.,

\[
[y_1, y_2, \ldots, y_m] = [\, f_1(x_1, x_2, \ldots, x_n),\; f_2(x_1, x_2, \ldots, x_n),\; \ldots,\; f_m(x_1, x_2, \ldots, x_n) \,] \tag{1}
\]

The data type includes the physical primitive, which describes how data values are stored on some medium (e.g., byte, int, float, etc.). It can include machine representations (e.g., little endian vs. big endian, IEEE vs. VAX, etc.). In addition, there is a category of such types, i.e., real, complex or quaternion. A parameter may have more than one value, which is characterized by tensor rank. Rank 0 is a scalar (one value), such as temperature (a magnitude -- a single-valued function). Rank 1 is a vector such as wind velocity (a magnitude and a direction: two values in two dimensions, three values in three dimensions). Vectors of size, n, are n-valued functions. Rank 2 is a tensor such as stress on an airframe (four values in two dimensions, nine values in three dimensions). 
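As a quick check on the value counts quoted above, here is a minimal Python sketch (not part of the original statement). It assumes only that each of a parameter's r tensor indices ranges over the d dimensions, so that a parameter carries d to the power r values per point. The function name element_count is hypothetical and chosen for illustration.

```python
def element_count(dimensionality: int, rank: int) -> int:
    """Values carried by one parameter at a single point.

    Assumption for this sketch: each of the `rank` tensor indices
    ranges over `dimensionality` values, giving dimensionality ** rank.
    """
    if dimensionality < 0 or rank < 0:
        raise ValueError("dimensionality and rank must be non-negative")
    return dimensionality ** rank


if __name__ == "__main__":
    # Tabulate the cases named in the text: scalar, vector, rank 2 tensor.
    for name, rank in (("scalar", 0), ("vector", 1), ("rank 2 tensor", 2)):
        counts = {d: element_count(d, rank) for d in (2, 3)}
        print(name, counts)
```

The rank 1 and rank 2 cases reproduce the counts given in the text: two and three values for a vector, four and nine for a rank 2 tensor, in two and three dimensions respectively.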
A rank 2 tensor in n-dimensional space is an n x n matrix of functions (e.g., stress). Dimensionality and rank are thus related. The number of elements in a particular parameter is d**r, where d is the dimensionality and r is the rank. As with dimensionality, rank may be large for very complex data. There is often an association between the dimensionality of the data and its geometry, which can be called a mesh or grid -- the size, shape and organization of how the data relates to a physical domain. This is perhaps the class of characteristics for which there is the most confusion because of the typical use of domain-specific terminology. In general, a mesh describes the base geometry for the mapping of the functions or dependent variables to some (physical) coordinate system. For example, there can be:

o Regular grid with regular positions and regular connectivity such as a temperature map
o Deformed regular or curvilinear or structured grid with irregular positions and regular connectivity such as the pressure on an airframe
o Irregular "regular" or structured grid with irregular positions and regular connectivity such as several satellite images with gaps in coverage
o Unstructured or irregular grid with regular or irregular connectivity such as a finite element mesh in a structural analysis problem
o No grid with irregular positions and no connectivity -- scattered points such as sales figures or rainfall in specific towns

Such a mesh may be explicitly or implicitly positioned. In the latter case, it is preferred to store the positioning information implicitly. Some implementations may store it explicitly. In most cases there is a topological relationship or cell primitive connecting these positions. For example,

Dimensionality   Cell Primitive
--------------   --------------
0                not applicable
1                line
2                triangle, quadrilateral
3                tetrahedra, parallelepiped, hexahedra, prism, pyramid

Often there is a need to form aggregates of data. 
In dependent-variable space, one could have a collection of parameters or fields over the same or different grids that could be treated as a single entity. In that sense, an aggregate or group could be composed of members that are either a single field or another group. This mechanism can be used to define simple tree structures. In independent-variable space, one could have a collection of meshes. For example, in aerospace fluid dynamics simulations, a computation is often performed over several intersecting grids. Such a multigrid solution permits the definition of variable grid resolution and regularity around airframe structures such as an engine nacelle or a wing. In this case, one could treat the collection of meshes as a single entity, although a mechanism must be defined for accommodating regions of invalidity where grids may intersect. A similar problem occurs with observational data, where there may be no data for some grid nodes. Another type of mesh aggregate can result from a hybrid collection of meshes of different cell primitives, where within each mesh, the cell is the same. A special case of aggregates can be called series, where the tree structure has only one level of children. The classic example is a time series, where there are multiple instances of some field or aggregate over a constant or changing mesh. Generically, such a series does not have to depend on time, but could apply to any sequencing of events.

Implementations

Traditional methods of handling scientific data such as flat sequential files are generally inefficient in storage, access or ease-of-use for large complex data sets, particularly for input/output and floating-point-intensive applications like signal processing (e.g., inverse problems) and visualization. Modern, commercial relational data management systems do not offer an effective solution because they are more oriented to business applications. 
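To make concrete the kind of hierarchical structure at issue here, the aggregates and series defined under Data Structures can be sketched as a small tree type in Python. This is an illustrative sketch only, not any package's actual data model: the class names Field and Group and the helpers depth and is_series are hypothetical, and a series is simplified to a non-empty group whose members are all single fields.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    """A single parameter (dependent variable); mesh details are omitted."""
    name: str
    values: List[float]


@dataclass
class Group:
    """An aggregate: each member is either a single field or another group."""
    name: str
    members: List[Union[Field, "Group"]] = field(default_factory=list)


def depth(node: Union[Field, Group]) -> int:
    """Levels of children below a node; a lone field has depth 0."""
    if isinstance(node, Field):
        return 0
    return 1 + max((depth(member) for member in node.members), default=0)


def is_series(node: Union[Field, Group]) -> bool:
    """Simplified test: a non-empty group with exactly one level of children."""
    return (
        isinstance(node, Group)
        and bool(node.members)
        and all(isinstance(member, Field) for member in node.members)
    )


# A time series of one field, then a group that nests it beside another field.
temperature_series = Group(
    "temperature_series",
    [Field("t0", [280.1, 281.4]), Field("t1", [280.6, 281.9])],
)
simulation_run = Group(
    "simulation_run", [temperature_series, Field("pressure", [1013.2])]
)
```

Here temperature_series is a one-level tree, so it counts as a series in this sketch, while simulation_run is a two-level tree and does not.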
The relational model does not accommodate multidimensional, irregular or hierarchical structures often found in scientific data sets. In addition, relational systems do not provide sufficient performance for the size, complexity and type of access dictated by current and future data sets and their potential usage. Hence, current data base systems are not yet up to the challenge of supporting very large data sets. Therefore, there is a need for some type of data (base) model(s) that possesses elements of a modern data base management system but is oriented toward scientific data sets and applications. This intermediate approach should be easy to use, support large disk-based (perhaps other media as well) data sets and accommodate multiple scientific data structures in a uniform fashion. In the process of providing simple access to self-describing data, such a mechanism should match applications requirements for visualization as well as data analysis and management, and be independent of any specific discipline or source or visualization technique. Hence, data management as embodied in a data model(s) is as important to scientific applications as the underlying computation. Its implementation, the management of and access to the data, should be decoupled from the actual computational software. These classic limitations have been recognized by a few groups in the support of a number of scientific applications. As a result, several data models have been defined, some within very domain-specific contexts while others are more general. Recently, a few of these models have been associated with software, including well-defined language bindings that represent the implementation of one or more abstract data types. Common Data Format (CDF), developed at NASA/Goddard Space Flight Center, was one of the first implementations of a scientific data model. 
It is based upon the concept of providing abstract support for a class of scientific data that can be described by a multidimensional block structure. Although not all data fit within this framework, a large variety of scientific data do. The CDF effort spawned the Unidata Program Center's netCDF. These systems are extensible by the user, and conventions have been established in some organizations to ensure proper interpretation when data are exchanged. Although the data models for both CDF and netCDF are essentially the same, the interfaces and physical storage are quite different. Among many other differences, CDF transparently supports multiple physical formats on all platforms, which can be chosen for convenience or to improve performance. In contrast, netCDF supports a single physical representation. Another important effort has been the Hierarchical Data Format (HDF) developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. This activity evolved from the need to move files of scientific data among heterogeneous machines, which grew out of the requirement to look at images and other data on personal computers, workstations, etc. that were generated on a supercomputer. HDF, which is also self-describing, uses an extensible tagged file organization to provide access to basic data types like a raster image (and an associated palette), a multidimensional block, etc. In this sense, HDF provides access (i.e., via its C and FORTRAN bindings) to a number of different flat file organizations. Unlike CDF and netCDF, registration authority for the definition of data structures, formats and interfaces is with the implementor. The Visualization Systems Group at IBM Thomas J. Watson Research Center has developed a more comprehensive data model that includes curvilinear and irregular meshes and hierarchies (e.g., trees, series, composites), vector and tensor data, etc. 
in addition to the class of scalar, multidimensional blocks supported by the aforementioned implementations. The IBM Visualization Data Explorer software, developed by this same group, is built upon this data model. It is a client-server data-flow system for general visualization applications. This software provides polymorphic operations on data, independent of a role in generating pictures, by working with shared data structures in memory via a uniform interface. There is a physical disk-based format, as one import/export mechanism, but only simple sequential access is provided.

Issues

A major concern for extant systems is in their practical application to a number of large data sets. None of the implementations in their current state can address the diversity and size of extant and future data sets in an application-independent fashion. These issues often relate to scaling to even modest data sets by today's standards, independent of addressing raw bandwidth. The data effectiveness of a system can be measured by its ability to handle multiple data sets simultaneously of various sizes, types, structures, etc. without forcing artificial constraints that disrupt the fidelity of the original data. Systems that support different classes of data separately will have difficulty scaling to support disparate data properly at the same time. Systems that support different classes of data uniformly do not, because they effectively decouple the management of and access to the data from the actual applications software. Some of the specific issues, for which improvements and standards are potentially required, include but are not limited to:

o Data structure residency
Are the data structures in an implementation on secondary or persistent storage (i.e., disk) and/or in primary storage (i.e., memory)? 
Efficient disk-based data structures, access methods and interfaces are critical for utilization of large data sets, where the aggregate size may be one or more orders of magnitude greater than available memory.

o Transaction-like processing
What types of access methods for data are supported? The ability to do DBMS-like operations on data in place on secondary storage is important for data sets that are too expensive to even partially reproduce. The design of efficient caching schemes for large data blocks is required. Support of different physical organizations transparently may be necessary as well.

o Physical format, file structure, protocols, etc.
How does an implementation compensate for typical limitations in conventional file systems (e.g., blocking), operating systems (e.g., paging) and networks (e.g., NFS, RPC, TCP/IP) for bulk data access? An underlying mechanism geared for bulk access is required. Ideally, this should be available independently of a data model implementation. In practice, it is not. This can be viewed as an interface between storage systems and data management systems, since these scientific data models are in effect a class of data management systems. This notion is also important for effective use of high-performance and expensive storage, communications and computation technology (e.g., parallel disk arrays, HiPPI networks, supercomputers). For example, the IBM Power Visualization System (a medium-grain, shared-memory, parallel supercomputer) utilizes a custom RPC between computational and input/output (HiPPI) processors to achieve sustained high-speed throughput (e.g., 95 MB/sec to a HiPPI frame buffer, 95 MB/sec to a RAID-3 HiPPI disk array or 76 MB/sec to a 4-bank SCSI-2 fast and wide RAID-3 disk array with a simple block 64KY + extent file system).

o Applications enabling
How should programmers view data when they develop applications? Similarly, how should applications view data for users? 
Data model(s) and implementations should provide uniform "object" access independent of underlying structure, which may include representation and access to complex meshes and hierarchies, multiresolution abstraction, compression as well as simple rectilinear arrays and collections of point data. Interfaces to data (for programmers) and operations (for users) should be polymorphic and interoperable. In addition, there is a need to develop data-format, structure and access software benchmarks to investigate performance and compare implementations. At a minimum, such benchmarks could be used to compare strengths and weaknesses of extant systems, and to help focus the direction of future implementations and standards. In this context, performance is defined in terms of ease of specification and extensibility, level of complexity that can be represented, storage efficiency and actual access (read/write) times of different classes of data for specific computer systems in common use, and software and data portability.

Selected References

1. Brown, S. A., M. Folk, G. Goucher and R. Rew. "Software for Portable Scientific Data Management". Computers in Physics, 7, n.3, pp.304-308, May/June 1993.
2. Butler, D. M. and M. H. Pendley. "The Visualization Management System Approach to Visualization in Scientific Computing" and "A Visualization Model Based on the Mathematics of Fiber Bundles", Computers in Physics, 3, n.5 Sept./Oct. 1989.
3. Butler, D. M. and C. Hansen (ed.). "Scientific Visualization Environments: A Report on a Workshop at Visualization '91". Computer Graphics, 26, n. 3, pp. 213-216, February 1992.
4. Butler, D. M. and S. Bryson. "Vector-Bundle Classes Form Powerful Tool for Scientific Visualization". Computers in Physics, 6, n.6, pp. 213-216, November/December 1992.
5. Campbell, W. J., R. F. Cromp, G. Fekete, R. Wall and M. Goldberg. Panel on "Techniques for Managing Very Large Scientific Data Bases". Proceedings IEEE Visualization '92, pp. 
362-365, October 1992.
6. Faust, J. T. and D. S. Dyer. "An Effective Data Format for Scientific Visualization". Proceedings of the SPIE/SPSE Symposium on Electronic Imaging, February 1990.
7. Fekete, G. "Rendering and Managing Spherical Data with Sphere Quadtrees". Proceedings IEEE Visualization '90, pp. 176-186, October 1990.
8. French, J. C., A. K. Jones, J. L. Pfaltz. "A Summary of the NSF Scientific Database Workshop". Quarterly Bulletin of IEEE Computer Society Technical Committee on Data Engineering, 13, n. 3, September 1990.
9. Haber, R., B. Lucas and N. Collins. "A Data Model for Scientific Visualization with Provisions for Regular and Irregular Grids". Proceedings IEEE Visualization '91, pp. 298-305, October 1991.
10. Hibbard, W., C. R. Dyer and B. Paul. "Display of Scientific Data Structures for Algorithm Visualization". Proceedings IEEE Visualization '92, pp. 139-146, October 1992.
11. "Definition of the Flexible Image Transport System". NASA/OSSA Office of Standards and Technology, June, 1993.
12. Kochevar, P., Z. Ahmed, J. Shade and C. Sharp. "Bridging the Gap Between Visualization and Data Management: A Simple Visualization Management System". Proceedings IEEE Visualization '93, pp. 94-101, October 1993.
13. Lang, U., R. Lang, R. Rühle. "Integration of Visualization and Scientific Calculation in a Software System". Proceedings IEEE Visualization '91 Conference, October 1991.
14. Li, Y. P., T. H. Handley, Jr., E. R. Dobinson. "Data Hub: A Framework for Science Data Management". Submitted to 18th International Conference on Very Large Data Bases, August 1992.
15. NASA/OSSA Office of Standards and Technology. "Definition of the Flexible Image Transport System", June 1993.
16. Pfau, L. M. "The GridFile Tool Structure of System Files". Eidgenössische Technische Hochschule, Zurich, 1990.
17. Rew, R. K. and G. P. Davis. "NetCDF: An Interface for Scientific Data Access". IEEE Computer Graphics and Applications, 10, n.4, July 1990, pp. 76-82.
18. Salem, K. 
"MR-CDF: Managing Multi-Resolution Scientific Data". Cent of Excellence in Space Data and Information Systems Technical Report 928, NASA/Goddard Space Flight Center, March 1992. 19. Stonebraker, M., J. Chen, N. Nathan, C. Paxson, A. Su and J. Wu. "Tioga: A Database-Oriented Visualization Tool". Proceedings IEEE Visualization '93, pp. 86-93, October 1993. 20. Treinish, L. A. and M. L. Gough. "A Software Package for the Data- Independent Storage of Multi-Dimensional Data". Eos Transactions American Geophysical Union, 68, pp. 633-635, 1987. 21. Treinish, L. A. (ed). "Data Structures and Access Software for Scientific Visualization", A Report on a Workshop at SIGGRAPH '90. Computer Graphics, 25, n. 2, April 1991. 22. Treinish, L. A. "Unifying Principles of Data Management for Scientific Visualization". Proceedings of the British Computer Society Conference on Animation and Scientific Visualization, December 1992 and Animation and Scientific Visualization Tools and Applications (R. Earnshaw and D. Watson, editors), Academic Press, pp. 141-169, 1993. 23. Treinish, L. A., M. Folk, G. Goucher, R. Kulkarni and R. Rew. "Data Models, Structures and Access Software for Scientific Visualization". Proceedings IEEE Visualization '93, pp. 355-360, October 1993. 24. Wells, D. C., E. W. Greisen and R. H. Harten. "FITS: A Flexible Image Transport System". Astronomy and Astrophysics Supplement Series, 44, pp. 363- 370, 1981. -- Ted Smith MCNC smith_w@mcnc.org Information Technologies Division Voice: 919-248-9232 P.O. Box 12889, 3021 Cornwallis Rd. 
Fax: 919-248-9245 Research Triangle Park, NC 27709-2889 --Boundary (ID jda+PRN2HgXwZK3kdufM2g) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Wed, 16 Mar 1994 12:21:00 EST From: SYSTEM@CARINA.RTPNC.EPA.GOV Subject: Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Wed, 16 Mar 1994 12:21:00 EST Importance: normal A1-type: DOCUMENT RFC-822-headers: Received: from vaxtm1.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01HA1F9HIFXS8X0V6W@mail.rtpnc.epa.gov>; Wed, 16 Mar 1994 12:05:04 EST Received: from merlin.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01HA1EN7JHEO90Q0QY@epavax.rtpnc.epa.gov>; Wed, 16 Mar 1994 11:46:52 EST Received: from cardinal.ncsc.org by merlin.rtpnc.epa.gov (8.6.6.Beta9/1.34) id LAA26165; Wed, 16 Mar 1994 11:45:51 -0500 Received: by cardinal.ncsc.org (5.64/MCNC/6-25-91) id AA29721; Wed, 16 Mar 94 11:37:13 -0500 for reagan.james@epamail.epa.gov Date: Wed, 16 Mar 1994 11:37:12 +0000 From: "William T. Smith" Subject: Position Statement from Treinish To: jambrosiano@llnl.gov (John Ambrosiano), rajini@galileo.csc.ncsu.edu (Rajini Balay), gab@mitchell.hitc.com (Tony Baraghimian), bargmeyer.bruce@epamail.epa.gov (Bruce Bargmeyer), benjey.william@epamail.epa.gov (Bill Benjey), bilicki@mcnc.org (Ed Bilicki), bdx@flyer.ncsc.org (Daewon Byun), cxh@epavax.rtpnc.epa.gov (Jason Ching), coats@mcnc.org (Carlie Coats), davis.steve@epamail.epa.gov (Steve Davis), eyth@mcnc.org (Alison Eyth), fine@mcnc.org (Steve Fine), frame@ida.org (Mike Frame), galluppi@mcnc.org (Ken Galluppi), jou@mcnc.org (Frank Jou), cjk@hemlock.cray.com (Carla Kennedy), ravi@avl.umd.edu (Ravi Kulkarni), dluecken@climate.rtpnc.gov (Deborah Leucken), namzer.john@epamail.epa.gov (John Manzer), neacsu@mcnc.org (Mike Neacsu), jzn@epavax.rtpnc.epa.gov (Joan Novak), peskin@caip.rutgers.edu (Richard L. 
Peskin), reagan.james@epamail.epa.gov (Jim Reagan), trhyne@vislab.epa.gov (Theresa Rhyne), roig@hl.wes.army.mil (Lisa Roig), diane@xidak.com (Diane Sandoval), smith_w@mcnc.org (Ted Smith), vouk@adm.csc.ncsu.edu (Mladen Vouk), walther@caip.rutgers.edu (Sandra Walther), rjw@hpcc.epa.gov (Bob Wayland), r.p.weston@larc.nasa.gov (Robert Weston) Message-id: <9403161637.AA29721@cardinal.ncsc.org> X-Envelope-to: bargmeyer.bruce@mr.rtpnc.epa.gov, benjey.william@mr.rtpnc.epa.gov, davis.steve@mr.rtpnc.epa.gov, reagan.james@mr.rtpnc.epa.gov X-Mailer: ELM [version 2.3 PL0] Content-transfer-encoding: 7BIT --Boundary (ID jda+PRN2HgXwZK3kdufM2g)-- From SAWYER@nssdca.gsfc.nasa.gov Mon Mar 21 13:30:19 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["2601" "Mon" "21" "March" "1994" "13:01:49" "-0500" "SAWYER@nssdca.gsfc.nasa.gov" "SAWYER@nssdca.gsfc.nasa.gov" nil "56" "RE: Pointers to the Content/Context Issue" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA26232; Mon, 21 Mar 94 13:30:18 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA01722; Mon, 21 Mar 94 13:30:15 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA01086; Mon, 21 Mar 94 10:13:31 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA03123; Mon, 21 Mar 94 10:04:12 PST Return-Path: Received: from NSSDCA.GSFC.NASA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA03068; Mon, 21 Mar 94 10:04:03 PST Message-Id: <940321130149.2700551c@NSSDCA.GSFC.NASA.GOV> From: SAWYER@nssdca.gsfc.nasa.gov To: system@cuhhca.hhmi.columbia.edu Cc: metadata@llnl.gov Subject: RE: Pointers to the Content/Context Issue Date: Mon, 21 Mar 1994 13:01:49 -0500 (EST) system@cuhhca.hhmi.columbia.edu (Phil Bourne) writes: > Hi: I am new to this group so forgive me if I am dealing in FAQ's. 
> For the past 2 years we have been developing a dictionary based > on the STAR (Self Defining Text Archival and Retrieval) for > describing the structure of biological macromolecules. This began as > a project for archiving purposes, since this had been successful in > the related discipline dealing with data from small molecules -- > by successful I mean it was adopted by the community. As the dictionary > has developed the idea that STAR can be used to define the context of > the data, by that I mean explicit relationships between data items > among other things arose. I see this as being of enormous potential > in software development in our field. My questions are as follows: > (i) From this brief blurb can you point me to other domains from > which we might gain some insight? > (ii) Related to (i) is what I refer to as the content versus context > battle whereby the domain scientists want to maintain a > user readable data dictionary whereas us software types want > to turn it into a tool we can really use. Is there documented > evidence of this type of battle elsewhere, and more importantly > some kind of outcome? > (iii) Is there a concise review of the evolving field of metadata? > > Thanks.. /p In response to your question 'i', we are participating in the Consultative Committee for Space Data Systems (CCSDS) Panel 2 program, and included here is what we call our "Data Entity Dictionary" draft standard. We are applying it to data taken from, or derived from, space borne sensors. We are evolving it as we encounter new issues. We are about to generate a new version of the document, and will be glad to put you on a list to receive it. We will need your hard copy mail address. In response to your question 'ii', this is certainly a concern we have. Our Dictionary is intended to be human readable, but it is growing in its support for software services.
However we have separated issues like the representation of base data types at the bit level to another language, which we refer to as a Data Description Language (DDL). Both of these are really description languages. Sorry I can't help you with your question 'iii'. We would be very interested to get some documentation on your Dictionary work. If you need to send hard copy, please send it to my address below. Thanks Don ======================================= Don Sawyer Code 633 NASA/Goddard Space Flight Center Greenbelt, MD 20771 From bburns@milisant.CV.NRAO.EDU Wed Jan 26 15:39:02 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["5518" "Wed" "26" "January" "1994" "15:39:00" "EST" "Bob Burns" "bburns@milisant.CV.NRAO.EDU" "<9401262039.AA13950@milisant.cv.nrao.edu>" "132" "forwarded message from Robyne Sumpter" "^From:" nil nil "1" "1994012620:39:00" "forwarded message from Robyne Sumpter" nil nil] nil) Return-Path: Received: from milisant.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA01131; Wed, 26 Jan 94 15:39:01 EST Received: by milisant.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA13950; Wed, 26 Jan 94 15:39:00 EST Message-Id: <9401262039.AA13950@milisant.cv.nrao.edu> From: bburns@milisant.CV.NRAO.EDU (Bob Burns) To: fschwab@milisant.CV.NRAO.EDU, dwells@milisant.CV.NRAO.EDU Subject: forwarded message from Robyne Sumpter Date: Wed, 26 Jan 94 15:39:00 EST ------- Start of forwarded message ------- Received: from ocfmail.ocf.llnl.gov ([134.9.48.4]) by milisant.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA13932; Wed, 26 Jan 94 14:59:31 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA29891; Wed, 26 Jan 94 11:45:04 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA06810; Wed, 26 Jan 94 11:46:19 PST Return-Path: Received: from ocfmail.ocf.llnl.gov by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA06801; Wed, 26 Jan 94 11:46:16 PST Received: from [134.9.50.11] (sumpter-mac.ocf.llnl.gov) by 
ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA29887; Wed, 26 Jan 94 11:44:56 PST Message-Id: <9401261944.AA29887@ocfmail.ocf.llnl.gov> X-Sender: sumpter@ocfmail.ocf.llnl.gov From: sumpter@llnl.gov (Robyne Sumpter) To: metadata@llnl.gov Subject: Forwarding a call for papers Date: Wed, 26 Jan 94 11:44:56 PST Thought the group might find this interesting. I apologize if this is a repeated posting. Robyne > >I would also appreciate it if you would post the following >CFP to your metadata mail list. It should be of interest to >the metadata group. > >Regards, >Jim French > >----------- > > **** C A L L F O R P A P E R S **** > > Seventh International Working Conference on > Scientific and Statistical Database Management > ---------- --- ----------- -------- ---------- > Charlottesville, Virginia U.S.A. September 28-30, 1994 > >The Conference >--- ---------- > This international conference provides a forum for the presentation and > exchange of current work in the field of scientific and statistical data- > base management. The workshop character of the conference provides oppor- > tunities for interaction among the attendees. The Seventh SSDBM continues > the tradition of providing a stimulating environment to encourage discus- > sion and the exchange of ideas in a quiet setting. > > We are particularly soliciting papers on new concepts, novel ideas, and > state-of-the-art research results relevant to database and knowledge base > design from a theoretical as well as applicative point of view. To > encourage the dialog between practitioners and researchers we invite con- > tributions also from domain-scientists, reporting experiences in data > management from their field. 
Topics of interest include but are not limited > to: modeling and semantics, query languages and user interfaces, physical > organization, security, scientific databases, data analysis and visualiza- > tion, management of temporal and spatial data, knowledge discovery, uncer- > tainty, evaluation of scientific, engineering, or statistical applications. > >Submission of Papers >---------- -- ------ > Authors are requested to submit five copies of the complete paper, not > exceeding 20 pages, as follows. > > Contributions from the European and Asian continents to: > > Hans Hinterberger, European Co-Chairman. > Institute for Scientific Computing > ETH Zentrum > CH-8092 Zurich, Switzerland > > All other contributions to: > > James C. French, General Chairman. > c/o Sandra Sullivan > Thornton Hall, University of Virginia > Charlottesville, VA 22901 USA > > >Important Dates >--------- ----- > > March 1, 1994 Deadline for submission of papers. > May 24, 1994 Notification of acceptance. > July 12, 1994 Camera-ready copies of papers due. > > >Program Committee >------- --------- > R.A. Becker (USA), R. Cubitt (Luxembourg), D.M.Y. Defays (Luxembourg), K.R. > Dittrich (Switzerland), J.C. French (USA), H. Gilgen (Switzerland), P. > Golder (UK), D.J. Hand (UK), H. Hinterberger (Switzerland), J. Klensin > (USA), K. Kuespert (Germany), F. M. Malvestuto (Italy), M. McLeish > (Canada), Z. Michalewicz (USA), G. Ozsoyoglu (USA), J.L. Pfaltz (USA), M. > Rafanelli (Italy), S. Ram (USA), D. Rotem (USA), A. Shoshani (USA), B. > Sundgren (Sweden), P. Svensson (Sweden), J.L.A. Van Rijckevorsel (Nether- > lands), A. Westlake (UK), M. Zemankova (USA). > >Organizing Committee >---------- --------- > J.C. French, H. Hinterberger, J.L. Pfaltz, A. Shoshani. > >General Chair >------- ----- > James C. 
French, Department of Computer Science, School of Engineering and > Applied Science, Thornton Hall, University of Virginia, Charlottesville, VA > 22901, USA; e-mail: french@virginia.edu > >Co-Chair >-- ----- > Hans Hinterberger, Institute for Scientific Computing, ETH Zentrum, CH-8092 > Zurich, Switzerland; e-mail: hinterberger@inf.ethz.ch > > > Sponsored in part by the Center of Excellence in Space Data and Information > Sciences and the National Aeronautics and Space Administration. In > cooperation with the IEEE Computer Society and the International Associa- > tion for Statistical Computing. > > =========================================================== Robyne M. Sumpter sumpter@llnl.gov Lawrence Livermore Laboratory Phone: (510) 423-5054 P.O. Box 808 L-60 Fax: (510) 423-8715 Livermore, CA 94550 =========================================================== ------- End of forwarded message ------- From LIBRARY@stsci.edu Tue Feb 1 17:20:57 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil] ["763" "Tue" " 1" "February" "1994" "17:20:11" "-0500" "Sarah Stevens-Rayburn" "LIBRARY@stsci.edu" "<01H8DNI3VSYQKRWNG0@avion.stsci.edu>" "16" "A&A preprints and abstracts" "^From:" nil nil "2" "1994020122:20:11" "A&A preprints and abstracts" nil nil] nil) Return-Path: Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA15467; Tue, 1 Feb 94 17:20:55 EST Received: from stsci.edu (eotvos.stsci.edu) by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA28257; Tue, 1 Feb 94 17:20:52 EST Received: from avion.stsci.edu by avion.stsci.edu (PMDF V4.2-13 #4188) id <01H8DNI3J7N4KRWNG0@avion.stsci.edu>; Tue, 1 Feb 1994 17:20:12 EST Message-Id: <01H8DNI3VSYQKRWNG0@avion.stsci.edu> X-Vms-To: IN%"dwells@nrao.edu" Mime-Version: 1.0 Content-Type: TEXT/PLAIN Content-Transfer-Encoding: 7BIT From: Sarah Stevens-Rayburn To: dwells@NRAO.EDU Subject: A&A preprints and abstracts Date: Tue, 01 Feb 1994 17:20:11 -0500 (EST) Hi Don-- Following a `conversation' with Ellen about 
a brochure she and Caroline are working on to help folk find net resources, I looked again at NRAO's homepage. Having wended my way down to your stuff about preprints and abstracts, I wanted to let you know that we've added a separate file of the preprint database for 1982-1991 (which will really be 1982 through two years ago, with the regular STEPsheet always being current and previous year for speed and accuracy). It's stsci-old-preprint-db if you're interested. Also, I'm curious to know why your pointer to the STEPsheets is via CERN. Isn't that a rather round-about way to go? Or is there some Mosaic subtlety I'm missing? I note you go to the RAPsheet via the same route.... Cheers, Sarah From dwells Tue Feb 1 20:00:17 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["1469" "Tue" " 1" "February" "1994" "20:00:08" "EST" "Don Wells" "dwells" "<9402020100.AA15747@fits.cv.nrao.edu>" "36" "Re: A&A preprints and abstracts" "^From:" nil nil "2" "1994020201:00:08" "A&A preprints and abstracts" nil "<01H8DNI3VSYQKRWNG0@avion.stsci.edu>"] nil) Return-Path: Received: by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA15747; Tue, 1 Feb 94 20:00:08 EST Message-Id: <9402020100.AA15747@fits.cv.nrao.edu> In-Reply-To: <01H8DNI3VSYQKRWNG0@avion.stsci.edu> References: <01H8DNI3VSYQKRWNG0@avion.stsci.edu> From: dwells (Don Wells) To: Sarah Stevens-Rayburn Cc: library Subject: Re: A&A preprints and abstracts Date: Tue, 1 Feb 94 20:00:08 EST Sarah, Sarah Stevens-Rayburn writes: > .. we've added.. stsci-old-preprint-db .. OK. I will add that to my resource list soon. > .. curious.. why.. pointer.. via CERN.. rather round-about.. The problem is that Mosaic was not able to talk directly to WAIS servers as recently as November. This capability was added to Mosaic with the 2.0 release. Those URLs were put in the page back when all we had was Mosaic 1.x.
I (and many other people maintaining pages) was reluctant to switch the URLs to direct-access at first, because we knew that many people were still running Mosaic 1.x. But many weeks have passed, and by now I don't have much sympathy for people who are running obsolete Mosaic versions. Those pointers need to be updated. I will do that soon. -*- I am right now in the process of re-designing my resource list. I am talking to several other people about the problem of maintaining these lists. I am particularly interested in seeing improvements in the representation of library, preprint and other scholarly-support resources for astronomy. I will be glad to talk to -- and cooperate with -- my favorite librarians about these matters! -*- I gather that the thesaurus project has finally been concluded. I saw some announcement about this, and it used the verb "published" about the thesaurus. What about a machine-readable version? It is obvious that one must exist; will it become available on the Net?
-Don From @server.cs.virginia.edu:7ssdbm@acacia.cs.virginia.edu Wed Feb 16 10:38:45 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["4212" "Wed" "16" "February" "1994" "10:31:08" "EST" "7ssdbm@acacia.cs.virginia.edu" "7ssdbm@acacia.cs.virginia.edu" "<9402161531.AA27888@acacia.cs.Virginia.EDU>" "94" "7SSDBM Papers due March 1st" "^From:" nil nil "2" "1994021615:31:08" "7SSDBM Papers due March 1st" nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA05173; Wed, 16 Feb 94 10:38:44 EST Received: from virginia.edu (uvaarpa.Virginia.EDU) by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA29757; Wed, 16 Feb 94 10:38:42 EST Received: from server.cs.virginia.edu by uvaarpa.virginia.edu id aa07938; 16 Feb 94 10:31 EST Received: from acacia.cs.Virginia.EDU by uvacs.cs.virginia.edu (4.1/5.1.UVA) id AA21715; Wed, 16 Feb 94 10:31:15 EST Posted-Date: Wed, 16 Feb 94 10:31:08 EST Return-Path: <7ssdbm@acacia.cs.Virginia.EDU> Received: by acacia.cs.Virginia.EDU (4.1/SMI-2.0) id AA27888; Wed, 16 Feb 94 10:31:08 EST Message-Id: <9402161531.AA27888@acacia.cs.Virginia.EDU> From: 7ssdbm@acacia.cs.virginia.edu To: 7ssdbm-announcements@uvacs.cs.virginia.edu Subject: 7SSDBM Papers due March 1st Date: Wed, 16 Feb 94 10:31:08 EST Please accept our apology if you receive multiple copies of this message. We are using many mailing lists and even though we have removed many duplicates, some still manage to slip through. **** C A L L F O R P A P E R S **** **** **** **** R E M I N D E R **** **** **** **** P A P E R S D U E: M A R C H 1st **** Seventh International Working Conference on Scientific and Statistical Database Management ---------- --- ----------- -------- ---------- Charlottesville, Virginia U.S.A. September 28-30, 1994 The Conference --- ---------- This international conference provides a forum for the presentation and exchange of current work in the field of scientific and statistical data- base management. 
The workshop character of the conference provides oppor- tunities for interaction among the attendees. The Seventh SSDBM continues the tradition of providing a stimulating environment to encourage discus- sion and the exchange of ideas in a quiet setting. We are particularly soliciting papers on new concepts, novel ideas, and state-of-the-art research results relevant to database and knowledge base design from a theoretical as well as applicative point of view. To encourage the dialog between practitioners and researchers we invite con- tributions also from domain-scientists, reporting experiences in data management from their field. Topics of interest include but are not limited to: modeling and semantics, query languages and user interfaces, physical organization, security, scientific databases, data analysis and visualiza- tion, management of temporal and spatial data, knowledge discovery, uncer- tainty, evaluation of scientific, engineering, or statistical applications. Submission of Papers ---------- -- ------ Authors are requested to submit five copies of the complete paper, not exceeding 20 pages, as follows. Contributions from the European and Asian continents to: Hans Hinterberger, European Co-Chairman. Institute for Scientific Computing ETH Zentrum CH-8092 Zurich, Switzerland All other contributions to: James C. French, General Chairman. c/o Sandra Sullivan Thornton Hall, University of Virginia Charlottesville, VA 22901 USA Important Dates --------- ----- March 1, 1994 Deadline for submission of papers. May 24, 1994 Notification of acceptance. July 12, 1994 Camera-ready copies of papers due. Program Committee ------- --------- R.A. Becker (USA), R. Cubitt (Luxembourg), D.M.Y. Defays (Luxembourg), K.R. Dittrich (Switzerland), J.C. French (USA), H. Gilgen (Switzerland), P. Golder (UK), D.J. Hand (UK), H. Hinterberger (Switzerland), J. Klensin (USA), K. Kuespert (Germany), F. M. Malvestuto (Italy), M. McLeish (Canada), Z. Michalewicz (USA), G. 
Ozsoyoglu (USA), J.L. Pfaltz (USA), M. Rafanelli (Italy), S. Ram (USA), D. Rotem (USA), A. Shoshani (USA), B. Sundgren (Sweden), P. Svensson (Sweden), J.L.A. Van Rijckevorsel (Nether- lands), A. Westlake (UK), M. Zemankova (USA). Organizing Committee ---------- --------- J.C. French, H. Hinterberger, J.L. Pfaltz, A. Shoshani. General Chair ------- ----- James C. French, Department of Computer Science, School of Engineering and Applied Science, Thornton Hall, University of Virginia, Charlottesville, VA 22901, USA; e-mail: french@virginia.edu Co-Chair -- ----- Hans Hinterberger, Institute for Scientific Computing, ETH Zentrum, CH-8092 Zurich, Switzerland; e-mail: hinterberger@inf.ethz.ch Sponsored in part by the Center of Excellence in Space Data and Information Sciences and the National Aeronautics and Space Administration. In cooperation with the IEEE Computer Society and the International Associa- tion for Statistical Computing. From rosen@speckle.ncsl.nist.gov Tue Mar 8 10:33:44 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["994" "Tue" " 8" "March" "1994" "10:13:37" "EST" "Bruce K. 
Rosen" "rosen@speckle.ncsl.nist.gov" nil "20" "IEEE Metadata -- sources" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA09622; Tue, 8 Mar 94 10:33:42 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA10730; Tue, 8 Mar 94 10:33:41 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA10370; Tue, 8 Mar 94 07:14:47 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09846; Tue, 8 Mar 94 07:15:56 PST Return-Path: Received: from speckle.ncsl.nist.gov by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09837; Tue, 8 Mar 94 07:15:50 PST Received: from [129.6.59.25] (BEIGE.NCSL.NIST.GOV) by speckle.ncsl.nist.gov (4.1/SMI-3.2-del/cas.6) id AA19738; Tue, 8 Mar 94 10:13:39 EST Message-Id: <36818.rosen@speckle.ncsl.nist.gov> X-Popmail-Charset: English From: "Bruce K. Rosen" To: BARGMEYER.BRUCE@epamail.epa.gov, metadata@llnl.gov Subject: IEEE Metadata -- sources Date: Tue, 8 Mar 94 10:13:37 EST An additional NIST source on metadata related issues is NIST Special Publication 500-173, "Guide to Data Administration." This guide provides a reference model for the various activities performed by Information Resource Management, Data Administration, Data Modeling Tools Administration, and Database Administration. The functions of Data Administration are discussed in detail. Data Administration is responsible for defining an information architecture, and for establishing policies for naming conventions, information modeling techniques and methodologies, data element specification, system information integration, and data protection (i.e. metadata issues). ********************************** * Bruce K. Rosen * * NIST * * Technology Bldg.
Room A-266 * * Gaithersburg, MD 20899, USA * * Phone: (301) 975-3246 * * FAX Number: (301) 948-6213 * * email: brosen@nist.gov * ********************************** From coyne@vnet.ibm.com Thu Mar 24 18:29:55 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["7511" "Thu" "24" "March" "1994" "17:10:00" "CST" "coyne@vnet.ibm.com" "coyne@vnet.ibm.com" nil "201" "IEEE Workshop on Metadata for Scientific and Technical Data Management" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA09757; Thu, 24 Mar 94 18:29:53 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA28503; Thu, 24 Mar 94 18:29:50 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA29923; Thu, 24 Mar 94 15:12:47 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14706; Thu, 24 Mar 94 15:14:06 PST Return-Path: Received: from vnet.IBM.COM by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AB14694; Thu, 24 Mar 94 15:14:01 PST Message-Id: <9403242314.AB14694@pierce.llnl.gov> Received: from HOUVMSCC by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP id 6342; Thu, 24 Mar 94 18:09:31 EST From: coyne@vnet.ibm.com To: metadata@llnl.gov Subject: IEEE Workshop on Metadata for Scientific and Technical Data Management Date: Thu, 24 Mar 94 17:10:00 CST *********************** CALL FOR PARTICIPATION ********************** Workshop on Metadata for Scientific and Technical Data Management ================================================================= IEEE Computer Society Technical Committee on Mass Storage Systems Specialist Workshop Chair: Francis Bretherton, Vice-Chair: Otis Graf May 16-18, 1994 National Archives, Archives II Washington, D.C. Increasingly large amounts of scientific and technical data are being created and saved in digital data storage systems. 
There is a need to expedite the access and use of this data, and to promote interdisciplinary sharing of the data. A variety of different data types and formats need to be addressed, such as: images, audio, video, tables, arrays, graphics, algorithms and procedures, and documents. The purpose of this workshop is to bring together individuals with a common interest in managing and using large stores of scientific and technical data. The focus will be on defining a framework for metadata (data describing the stored data). It is a particular goal of the specialist workshop series to produce a metadata reference model for scientific and technical data management. It is expected that the workshop will be of interest to the following groups: o Users and managers of scientific & technical databases - Astrophysics and astronomy - Biochemistry - Biological - Earth Science - Environmental - Geographical Information Systems - Geophysics and Exploration - High Energy Physics - Intelligence and surveillance - Process Control Systems - Visualization o Computer scientists o Hardware and software vendors o Data system integrators WORKSHOP SPONSOR: IEEE Mass Storage Systems & Technology Technical Committee Bob Coyne, coyne@vnet.ibm.com, (713) 282-7274 WORKSHOP OBJECTIVES (1) Create an intellectual framework and guide for metadata. This can be called a "Metadata Reference Model" for scientific and technical data management. (2) Begin a development effort for the Reference Model. Identify the need for such a model and necessary participants in its development. Begin to define some of the elements of the Reference Model. (3) Identify data providers and data consumers. Determine how they should influence the Reference Model. (4) Determine the technology required to implement an application of the Reference Model. (5) Identify work in progress and describe how that work relates to the draft Reference Model.
WORKSHOP STRUCTURE The workshop will begin with a series of keynote speakers to set the direction for the following days' work. Then the workshop chair and organizing committee will present the scope and objectives, as well as draft versions of the reference model and definitions of terms. The heart of the workshop will be four work groups meeting separately. Prior to breaking into separate groups, the chair will give specific topics and directions to each group. The workshop will conclude with a panel discussion session and a summary session. PARTICIPATION Workshop participation will be limited to 50 persons. It is expected that all participants are active researchers or developers, and have an interest in the management of large stores of scientific or technical data. Participants should submit position papers through electronic mail (preferred), regular mail or FAX by no later than April 20, 1994. Position papers can be reviews, original research contributions, extensions of previous work, or application briefs. The position papers will be made available to all workshop attendees. Participants should also read the three reference papers listed below. Copies of the first two items can be downloaded from the anonymous FTP server ftp.clearlake.ibm.com in the directory pub/IEEE_Metadata. The third item can be requested in printed form from the Workshop Coordinator. Participants will be selected by the workshop organizers based on the position papers. Please e-mail or send position papers to the workshop coordinator (Otis Graf) at the address below. The registration fee for the 2 1/2 day workshop is $300.00, which includes breaks and continental breakfasts. Checks or money orders should be made payable to Jorge Scientific Corporation. Registration fees will be collected at the workshop. WORKSHOP CHAIR Francis Bretherton Univ. of Wisconsin, Space Science and Engineering Center E-mail: fbretherton@ssec.wisc.edu VICE-CHAIR & WORKSHOP COORDINATOR Otis Graf IBM U.S.
- Federal 3700 Bay Area Blvd. Houston, Texas 77058 FAX: (713) 282-7439 Phone: (713) 282-8216 E-mail: ofgraf@clearlake.ibm.com ORGANIZING COMMITTEE Francis Bretherton, Univ. of Wisconsin SSEC Charles Dollar, National Archives Dave Fulker, Unidata Program Center Otis Graf, IBM U.S. - Federal Paul Kanciruk, Oak Ridge National Lab Ben Kobler, NASA Goddard Space Flight Center Dave Sebring, IBM U.S. - Federal Paul Singley, Oak Ridge National Lab Steve Worley, National Center for Atmospheric Research ADVISORY COMMITTEE Jim Almond, Univ. of Texas CHPC John Davis, DOD Ft. George Meade (IEEE MSS&TC EC) Tom Karl, NOAA NCDC Bernard T. O'Lear, NCAR (IEEE MSS&TC EC) Ron Pfaff, LANL Tom Pyke, NOAA HPCC Dick Watson, LLNL (IEEE MSS&TC EC) Greg Withee, NOAA ESDIM WORKSHOP AGENDA Monday, May 16, 1994 8:00 Registration 8:00 Continental Breakfast 8:45 Welcome and Introductions 9:00 Keynote Speaker #1: Greg Withee, NOAA 9:30 Keynote Speaker #2 10:00 Break 10:30 Discussion of Workshop and Objectives 11:00 Metadata Definition and System Assumptions 12:00 Lunch 1:00 Discussion on Reaction to Draft Metadata Reference Model 2:00 Specific Guidance to Work Groups 3:00 Breakout into four Work Groups 5:30 Adjourn for the day Tuesday, May 17, 1994 8:30 Continental Breakfast 9:00 Breakout into four Work Groups 10:30 Break 10:50 Breakout into four Work Groups 12:00 Lunch 1:00 Breakout into four Work Groups 3:00 Break 3:20 Breakout into four Work Groups 5:30 Adjourn for the day Wednesday, May 18, 1994 8:30 Continental Breakfast 9:00 Reports from the four Work Groups 10:00 Break 10:30 Panel session with audience discussion 12:00 Workshop summary, recommendations and forward plans 1:00 Adjourn the workshop REFERENCE MATERIAL (1) F. Bretherton, Draft of the Metadata Reference Model (2) B. Gritton, "Metadata Definitions" (3) C. Hsu, M. Bouziane and L. 
Yee, "Information Resources Management in Heterogeneous, Distributed Environments: A Metadatabase Approach", IEEE Transactions on Software Engineering, Vol. 17, No. 6, June 1991. From BARGMEYER.BRUCE@epamail.epa.gov Sun Mar 27 11:45:56 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["10751" "Sun" "27" "March" "1994" "11:20:00" "-0500" "BRUCE BARGMEYER 202-260-5306" "BARGMEYER.BRUCE@epamail.epa.gov" nil "208" "Related Standards Efforts" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA17933; Sun, 27 Mar 94 11:45:54 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA19020; Sun, 27 Mar 94 11:45:51 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA01526; Sun, 27 Mar 94 08:36:29 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09733; Sun, 27 Mar 94 08:37:49 PST Return-Path: Received: from VAXTM1.RTPNC.EPA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09724; Sun, 27 Mar 94 08:37:44 PST Received: from pyxis.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01HAGRGWVGBK90TZXJ@epavax.rtpnc.epa.gov>; Sun, 27 Mar 1994 11:36:15 EST Received: from mr.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01HAGRG5YDLC8X367P@mail.rtpnc.epa.gov>; Sun, 27 Mar 1994 11:35:39 EST Received: with PMDF-MR; Sun, 27 Mar 1994 11:32:00 EST Mr-Received: by mta CARINA; Relayed; Sun, 27 Mar 1994 11:32:00 -0500 Alternate-Recipient: prohibited Disclose-Recipients: prohibited Message-Id: <01HAGRG6YTBQ8X367P@mr.rtpnc.epa.gov> X-Envelope-To: metadata@llnl.gov Mime-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-Transfer-Encoding: 7BIT Posting-Date: Sun, 27 Mar 1994 11:24:00 -0500 (EST) Importance: normal Priority: normal X400-Mts-Identifier: [;00231172304991/967460@MAIL] A1-Type: MAIL Hop-Count: 0 From: BRUCE BARGMEYER 202-260-5306 To: metadata@llnl.gov Subject: Related Standards
Efforts Date: Sun, 27 Mar 1994 11:20:00 -0500 (EST) There are several ongoing standards efforts that are relevant to the topics discussed via this metadata reflector. I think that all the efforts can be complementary and produce a whole that is greater than any of the efforts separately. The X3L8 and SC14 work tends to focus more on scalar data than on some of the more complex scientific datasets. However, much of the work seems relevant. In the Environmental Protection Agency, we must cover the whole spectrum from data in traditional databases to the terabyte/day of EOS data from NASA. I will send a few messages to the metadata reflector describing some of the work of ANSI X3L8, Data Element Representation and ISO SC 14, Data Element Principles. As an overview, X3L8 together with ISO SC 14 is developing a six part standard for data elements. The six parts are described below. Some of the parts are considerably advanced beyond the others. Part 3, on basic attributes of data elements, and part 4, on data element definitions, are proposed for Draft International Standard status. Part 5, on naming, is out for a second Committee Draft ballot. Part 1, on framework, and Part 6, on registration, are early drafts. X3L8 is also developing a metamodel for data representation. I am forwarding a draft of that metamodel as a Postscript file, and a second Postscript file describing the modeling notation. I have a reasonable ASCII version of the naming paper, but the rest have a lot of format that is lost in translation, along with the clarity that is conveyed by the format. I will try to send clear text versions where possible, and Postscript versions for better readability. I will be out for the next week, so it may take a while to get all the material forwarded to the reflector. The following introduces each individual part of the multi-part data specification standard (ISO/IEC 11179). It summarizes main points and discusses the importance of each.
It is copied out of Part 1, and will lose its format, fonts, etc. in this transmission. The Framework for the Generation and Standardization of Data Elements, Part 1 of ISO/IEC 11179, introduces and discusses fundamental concepts of data elements essential to the understanding of this set of standards; provides the context for associating the individual parts of the standard; and provides the overall glossary. Classification of Concepts for the Identification of Domains, Part 2 of ISO/IEC 11179, provides a taxonomic profile of a special class of concepts called property terms. This class of concepts is sometimes called "class words" because they identify the data class, or category, to which a data element belongs. Sometimes these terms are referred to as "headwords" because they are the word of syntactic dominance in a data element name. Examples include terms such as "color", "count", "code", "rate". It is important to identify and classify property terms because each of these property terms carries an inherent and often extensive domain associated with it. A subset of the property term's inherent domain becomes part of a particular data element domain. To illustrate, consider the data element "color of eyes." The domain of this data element contains a subset of "color's" domain (i.e., the part of the domain that pertains to eye colors). The classification of this special class of concepts assists in delineating and standardizing relevant property terms with their valid value sets for use in data elements. In addition to standardizing property terms for consistency of use, ISO/IEC 11179, Part 2, introduces relevant information concerning taxonomies and ontologies of context objects. Basic Attributes of Data Elements: The increased use of data processing and electronic data interchange relies heavily on accurate, reliable, controllable and verifiable data recorded in databases.
One of the prerequisites for a correct and proper use and interpretation of data is that both users and owners of data have a common understanding of the meaning and representation of the data elements. To guarantee a shared view of data elements, a number of attributes have to be defined. This Part 3 of ISO/IEC 11179 specifies attributes of data elements. It is limited to a set of basic attributes independent of their usage in application systems, databases, data interchange messages etc. The basic attributes specified are applicable for the following main activities: a) definition and specification of the contents of data element dictionaries; b) design and specification of application-oriented data models, databases and messages for data interchange; c) actual use of data in communications and information processing systems; d) interchanging or referencing among various collections of data elements. Basic means that they are essential in specifying a data element completely enough to ensure that it will be applicable for a variety of functions such as: design of information processing systems; retrieval of data from databases; design of EDI-messages for data interchange; maintenance of data element dictionaries; data administration; dictionary design; dictionary control; use of information processing systems; Basic also implies that the attributes are independent of any: application environment; function of a data element (e.g. qualifier, indicator); level of abstraction of the meaning (e.g. a representation of a generic concept like 'name of a person' or a representation of a specific concept like 'name of the driver of a truck'); grouping of data elements; method for designing information processing systems or data interchange messages; data element dictionary system. Basic does not imply that all attributes specified in this Part of the International Standard are required in all cases. 
Distinction is made between those basic attributes that are: mandatory: always required; conditional: required to be present under certain specified conditions; optional: allowed but not required. The set of basic attributes can be extended with additional attributes, e.g. to enable the performance of a comprehensive data management function in an information management domain. Examples thereof are given in an informative annex to Part 3. Rules and Guidelines for the Formulation of Data Definitions, Part 4 of ISO/IEC 11179, provides guidance on how to develop good data element definitions. A number of specific rules and guidelines are presented in this document that specify exactly how a data element definition should be formed. A precise, well-formed definition is one of the most critical requirements for shared understanding of a data element; well-formed definitions are imperative for the exchange of information. Only if every user has a common and exact understanding of the data element can it be exchanged trouble-free. Naming Principles for Data Elements, Part 5 of ISO/IEC 11179, provides guidance for the identification of data elements. Identification is a broad term for designating, or identifying, a particular data element. Identification can be accomplished in various ways, depending upon the use of the identifier. Identification includes the assignment of numerical identifiers, or Data Element Identifiers (DEID), that have no inherent meaning to humans; icons (graphic symbols to which meaning has been assigned); and names with embedded meaning, usually for human understanding, that are associated with the data element's definition and domain. Names are semantic, natural language labels given to data elements, and variations of these labels serve different functions. Some names are for human usage and comprehension; some names are for use in a particular physical system environment. Names are often user-established and vary from one user to the other.
The principles in this document describe the various functions of names and how names are used. Names must be clear, brief, rule-conformant, and free of physical context. They are formulated according to rules which are not dependent on any specific natural language syntax. They contain no concepts which are not represented in the definition; they ideally contain all concepts which are represented in the definition. Names contain certain components in their construction. These components are property terms, context terms, and modifiers. In some naming convention paradigms, property terms are known as "class words" and context terms are known as "prime words". Property terms are the common, unit, abstract nouns that identify the category of data usage of the data element (e.g., "code", "name"). Context terms are the concrete terms derived from the application's context (e.g. "product", "aircraft"), and modifiers are additional words used to specify or restrict the property terms and context terms (e.g., "current", "military" ). These three components of a name are consequential because they indicate the data element's domain and context. There is a very close relationship between the name and the domain of a data element. Registration and Maintenance of Generic Data Elements, Part 6 of ISO/IEC 11179, provides instruction on how to register a data element with a central registration authority and the allocation of unique identifiers for each data element. Maintenance of data elements already registered is also specified in this document. 
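
The Part 5 summary above says that a name is built from property terms, context terms, and modifiers, and that a name should contain no concepts that are missing from the definition. A minimal Python sketch of that check follows. The class `NameComponents`, the function `terms_missing_from_definition`, and the sample definition text are all hypothetical and do not come from the standard; the check is a crude word match that assumes single-word terms, not a real test of shared concepts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NameComponents:
    """Hypothetical container for the three kinds of name component."""
    property_term: str                 # category of data usage, e.g. "code", "name"
    context_term: str                  # taken from the application, e.g. "aircraft"
    modifiers: Tuple[str, ...] = ()    # restrict the other terms, e.g. ("military",)

    def terms(self) -> List[str]:
        """All component terms, in an arbitrary (assumed) order."""
        return [self.context_term, *self.modifiers, self.property_term]


def terms_missing_from_definition(name: NameComponents, definition: str) -> List[str]:
    """Return the component terms that never occur as a word in the definition."""
    words = {w.strip('.,;:()"').lower() for w in definition.split()}
    return [t for t in name.terms() if t.lower() not in words]


example = NameComponents("code", "aircraft", ("military",))
ok = terms_missing_from_definition(example, "A code that identifies a military aircraft.")
bad = terms_missing_from_definition(example, "A code that identifies an aircraft.")
```

Here `ok` is empty because every term appears in the sample definition, while `bad` flags the modifier "military" as a concept the definition does not mention.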
From BARGMEYER.BRUCE@epamail.epa.gov Sun Mar 27 11:56:29 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["26882" "Sun" "27" "March" "1994" "11:33:00" "-0500" "BRUCE BARGMEYER 202-260-5306" "BARGMEYER.BRUCE@epamail.epa.gov" nil "745" "Draft ISO 11179, Part 5, Data element Naming" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA17940; Sun, 27 Mar 94 11:56:26 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA19039; Sun, 27 Mar 94 11:56:22 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA01581; Sun, 27 Mar 94 08:46:36 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09823; Sun, 27 Mar 94 08:47:55 PST Return-Path: Received: from VAXTM1.RTPNC.EPA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA09814; Sun, 27 Mar 94 08:47:50 PST Received: from pyxis.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01HAGRTEUOKW90TOA7@epavax.rtpnc.epa.gov>; Sun, 27 Mar 1994 11:46:20 EST Received: from mr.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01HAGRSMBNMO8X2ZPL@mail.rtpnc.epa.gov>; Sun, 27 Mar 1994 11:45:46 EST Received: with PMDF-MR; Sun, 27 Mar 1994 11:42:39 EST Mr-Received: by mta CARINA; Relayed; Sun, 27 Mar 1994 11:42:39 -0500 Alternate-Recipient: prohibited Disclose-Recipients: prohibited Message-Id: <01HAGRSPBCXY8X2ZPL@mr.rtpnc.epa.gov> X-Envelope-To: metadata@llnl.gov Mime-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="Boundary (ID UeddU+sHeUzSrP8ZDu1acg)" Content-Transfer-Encoding: 7BIT Posting-Date: Sun, 27 Mar 1994 11:34:00 -0500 (EST) Importance: normal Priority: normal X400-Mts-Identifier: [;93241172304991/967477@MAIL] A1-Type: MAIL Hop-Count: 0 From: BRUCE BARGMEYER 202-260-5306 To: metadata@llnl.gov Subject: Draft ISO 11179, Part 5, Data element Naming Date: Sun, 27 Mar 1994 11:33:00 -0500 (EST) --Boundary (ID UeddU+sHeUzSrP8ZDu1acg) Content-type: 
TEXT/PLAIN; CHARSET=US-ASCII --Boundary (ID UeddU+sHeUzSrP8ZDu1acg) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Thu, 3 Mar 1994 10:15:00 EST Subject: New version of 11179-5 Sender: Judith Newton To: "goldfine@speckle.ncsl.nist.gov" , "kirkbrik@hoffman-emhl.army.mil" , "whkjr@digex.net" , "mrood@mitre.org" , "ngo.phongx" , "mph@rruxls.bellcore.com" , "bargmeyer.bruce" , "appel@srdslc1.fb4.noaa.gov" Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Sun, 27 Mar 1994 00:00:00 EST Importance: normal A1-type: MAIL Editor's notes: 24 February 1994 This version of CD 11179-5, submitted for a second CD ballot, has been revised taking into consideration comments received from the French and Netherlands members. Some of the changes include: - The scope section clarifies the narrow definition of "identification" which this standard addresses. In addition, the term "identifier" has been replaced by "registration identifier" to emphasize this narrow scope. - The normative references now only list documents at the DIS level and above. - Each term has only a single definition, and one in agreement with 11179-3. - A definition for "structure set" has been added. - Some revisions to the text of Clauses 5, 6, and 7 have been made for clarification (I hope!). - References to predicates and versions have been deleted. - Annexes A and B have been deleted. The editor wishes to thank those members who submitted comments and hopes this version meets with approval.
Thank You, Judith Newton voice +1 301 975 3256 fax +1 301 948 6213 email jnewton@nist.gov ISO/IEC CD 11179-5 Specification and Standardization Of Data Elements PART 5 Naming and Identification Principles For Data Elements FEBRUARY 1994 Naming and Identification Principles For Data Elements Contents ---------------------------------------------------- Foreword 1 Introduction 2 Scope 3 Normative References 4 Definitions 5 Principles for the Identification Structure of Data 6 Rules for Registration Identification of Data 7 Guidelines for Structured Naming Conventions 8 Thesaurus Application Guidelines Informative Annexes A Example Naming Convention B Registration Identification Example C Thesaurus Example Foreword This International Standard consists of a set of six interrelated parts, with each part focussing on one aspect of data representation principles. The six parts are titled as follows: 11179-1 - Framework for the Specification and Standardization of Data Elements 11179-2 - Classification of Concepts for the Identification of Domains 11179-3 - Basic Attributes of Data Elements 11179-4 - Rules and Guidelines for the Formulation of Data Definitions 11179-5 - Naming and Identification Principles for Data Elements 11179-6 - Registration of Data Elements This part of ISO/IEC 11179 contains rules and guidance for naming and identifying data elements. 1 Introduction This standard contains principles, rules and guidelines. Principles establish the premises on which the rules are based. Rules are mandatory and testable for compliance. Guidelines are applications of the rules recommended for good practice. 2 Scope This part of the International Standard describes the components and structure of data element identification. Identification is narrowly defined to encompass only the means to establish unique identification of data elements within a registry. 
It defines the identifying attributes; describes the relationship of the attributes to each other; includes principles by which naming conventions can be developed; and describes an example naming convention. The naming guidelines described herein can be applied to all names. This part of the International Standard is meant to be used in conjunction with those which establish rules and procedures for attributing, classifying, defining, and registering data elements. 3 Normative References The following standards contain provisions which, through reference in the text, constitute provisions for this International Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent editions of standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards. ISO/IEC DIS 11179-3:1993, Basic Attributes of Data Elements 4 Definitions For the purposes of this part of this International Standard, the following definitions apply. 4.1 attribute: A characteristic of an object or entity. 4.2 class (of entities): In a conceptual schema language, all possible entities in a universe of discourse for which a given proposition holds. 4.3 context: An environment, such as an organization, discipline or industry, for which specific naming conventions, including rules for uniqueness, are enforced. 4.4 data element: a unit of data for which the identification, meaning, representations and permissible values are specified by means of a set of attributes. 4.5 data element concept: A concept which can be represented in the form of a data element, described independently of any particular representation. 4.6 definition: A word or phrase expressing the essential nature of a person or thing or class of persons or things: an answer to the question "what is x?" 
or "what is an x?". 4.7 domain: The set of permissible data values from which actual values are taken for a particular attribute or specific data element. 4.8 headword: A common, unit, abstract noun which is syntactically dominant. 4.9 identifier: A character string or other graphic symbol used to identify a data element. 4.10 lexical: Of or relating to words or the vocabulary of a language as distinguished from its grammar and construction. 4.11 modifier: A word or words which help define and differentiate a name within the database. 4.12 name: The primary means of identification of objects and concepts for humans. 4.13 property: A component of the name of a data element which expresses the category to which the data element belongs, e.g., "file," "date," "name." 4.14 referent: A component of the name of a data element which represents the logical data grouping (in a logical data model) to which it belongs; e.g., "employee." 4.15 registration authority: Any organization authorized to register a data element; the attribute which stores this. 4.16 semantics: The relationships of characters, groups of characters, words, or terms to their meanings. 4.17 separator: A symbol or space enclosing or separating a component within a name; a delimiter. 4.18 structure set: a method of placing objects in context, revealing relationships to other objects. Examples include Entity-Relationship Models, taxonomies, and ontologies. 4.19 syntax: The relationships among characters or groups of characters, independent of their meanings or the manner of their interpretation and use. The structure of expressions in a language, and the rules governing the structure of a language. 4.20 thesaurus: A controlled vocabulary arranged in a known order in which relationships among terms are displayed and identified.
5 Principles for the Identification Structure of Data 5.1 Identifying Attributes A set of four related attributes serves to name and identify each data element for the purpose of differentiating data elements within a registry. These attributes are: name context registration identifier registration authority The fundamental principles for these attributes are stated below. 5.2 Name and Context Each data element is assigned one or more names. Each name has special utility within a particular context. Rigorously structured names may be created for data administration; a preferred name may be specified by users; and shortened names may be generated for particular software environments such as a particular programming language or database management system. Names for many data elements may be developed within each context. A naming convention (usually a set of rules) is established for each context to specify how names are formulated within that context. The naming convention covers all pertinent aspects of the context. This includes, as applicable: a. the purpose of the context, e.g., industry preferred name; b. the authority which establishes the name; c. semantic rules governing the source and content of the words used in the name, e.g., words derived from data models, words commonly used in the discipline, etc.; d. syntactic rules covering required word order; e. lexical rules covering controlled word lists, name length, character set, language; and f. a rule establishing whether or not names within this context must be unique. These aspects of a naming convention are detailed in Clause 7, which provides guidelines for developing a naming convention for a rigorously structured context. 5.3 Registration Identifier and Registration Authority At least one unique registration identifier is required for a data element. The registration identifier does not change as long as the domain and definition of the data element do not change.
Each registration identifier is assigned by a registration authority; therefore, registration identifiers are unique within a registration authority. As a data element may be assigned registration identifiers by multiple registration authorities, the registration identifier and the registration authority are both necessary for identification of a data element. If particular aspects of a data element change, then a new version of the data element is registered. The registration identifier is the universal means for identifying a data element and can serve as the basis for exchanging data among information systems, organizations, or other parties who wish to share a specific data element, but may not utilize the same names or contexts. The unique registration identifier is also useful for language translation when the registration identifier is associated with contexts established for more than one natural language. This standard does not specify the format or content of the unique registration identifier to be assigned by any registration authority. Requirements for a Registration Authority are specified in Part 6 of the International Standard. 6 Rules for Registration Identification of Data 1. Each data element shall have a unique registration identifier within a registration authority. 2. The combination of registration identifier and registration authority shall constitute a unique identification for a data element. 3. To be assigned a registration identifier, a data element shall have been: derived according to Part 2, attributed according to Part 3, defined according to Part 4, named according to Part 5, and registered according to Part 6. 4. A data element shall have at least one name within a context. 7 Guidelines for Structured Naming Conventions The following are guidelines that could be used to develop a naming convention to produce rigorously structured names for a particular context.
Annex A is an example of a specific naming convention that is consistent with these principles. The guidelines are described in general terms with examples furnished. Rules are derived from the principles by which names are developed; these rules form a naming convention. Names formed according to these rules can be easily translated into languages other than the original because of the simplified syntax. Syntactic, semantic and lexical rules vary by organization, such as corporations or standards-setting bodies for business sectors; each can establish rules for name formation within its context. As discussed in subclause 7.1.1.1, each data element is formed from a set of components selected from the structure sets within its context. Data element names should be formed from the names of components, each assigned meaning (semantics) and relative or absolute position (syntax) within a name. They are subject to lexical rules. They may, but need not, be delimited by a separator symbol. The domain, the set or range of values of each component, should be rigorously controlled by an authority, e.g., a data administrator within a corporation or an approving committee for an international business sector naming standard. Semantic rules convey meaning through logical reference. Syntactic rules relate components in a consistent, specified order. Lexical (word form and vocabulary) rules reduce redundancy and increase precision. 7.1 Principles Governing Semantic Content of Names Semantics concerns the meanings of components and the separators which enclose or delimit them. 7.1.1 Semantics of Components Components consist of discrete terms to which meaning is ascribed through logical reference. The components described in this International Standard are: referents, properties, and modifiers.
7.1.1.1 Referent Using a modeling methodology, as for instance an Entity Relationship Diagram (ERD) or object model, is a way to locate and discretely place all data elements in relation to their higher-level model entities. The attributes of entity-relationship model entities equate to data elements which are related to each other through further application of the methodology. In the object model, data elements are expressed as object attributes. Many of these model attributes remain constant among model entities; for instance, the location of a real-world object such as a building or piece of equipment, address for a person or business, and many codes widely distributed throughout the enterprise, such as social security number used as the identifier of an employee. Models provide one classification scheme for data elements. Data elements may be identified with their originating modeling entities by mapping the referent to the model entity name. Examples of these names are: Employee, Cost, Training, and Member. The Framework provides examples of the mapping between referents and ERD and object model entities. 7.1.1.2 Property A set of properties is developed from the set of components in the taxonomy. This set must consist of terms which are discrete (the definition of each does not overlap the definition of any other), and complete (taken together, the set represents all information concepts required for the specification of data elements). A property may contain a headword (a word of syntactic dominance). Examples of properties include Name, Count, Code, Rate, Velocity, Length Measure, Height Measure, and Number. The property will occur naturally in the definition of a data element. Using components from two structure sets provides a complementary way of categorization. 
Both referent and property components of data elements are utilized to form a name which contains vital information about the data element, and also excludes extraneous or irrational elements which may be introduced when no conventions are employed. 7.1.1.3 Modifier Modifiers may be attached to properties and referents if necessary to uniquely identify a data element. These modifiers may be derived from structure sets specific to a context. In the rules for a naming convention, a restriction in the number of modifiers is recommended. A modifier serves to constrain the meaning of another component. Semantic limitations are delineated for modifiers to reduce redundancy and increase incidence of data reuse through recognition of synonymous elements. A mechanism such as a thesaurus of terms facilitates this effort (see Clause 8; Annex C). 7.1.2 Semantics of Separators Components are delimited by separators. These may or may not have semantic meaning. A simple rule stating that separators will consist of one blank space or exactly one special character (for example a hyphen or underscore) regardless of semantic relationships of components simplifies name formation. Alternatively, semantic meaning can be conveyed by separators by, for example, assigning a different separator between the modifier and head word in the property than that which separates the other modifier(s). In this way, the separator serves as a signal to the user and identifies the property clearly as different from the rest of the name. Example: in Employee-Birth-State_Name, the property separator is an underscore; other components are separated by hyphens. Some languages, such as German and Dutch, commonly join grammatical constructs together in a single word (resulting in one word which in English might be a phrase consisting of nouns and adjectives). 
These languages may use a separator which is not a break between words, such as a hyphen, space or underscore, but instead capitalize the first letter of each grammatical construct within a single word. 7.2 Principles Governing Format of Names 7.2.1 Syntactic Guidelines A naming convention specifies the arrangement of components within the name. This arrangement may be specified as relative or absolute, or some combination of the two. 1. Relative arrangement specifies components in terms of other components, e.g., a rule within a convention might require that a modifier must always appear before the component being modified appears. 2. Absolute arrangement specifies a fixed occurrence of the component, e.g., a rule might require that the property is always the last component in the name. 7.2.2 Lexical Guidelines These rules concern allowed and disallowed words in components, synonyms, abbreviations, component length, etc. 8 Thesaurus Application Guidelines A thesaurus in which the user can find a variety of synonyms, near-synonyms and homographs for name components is a valuable tool. It can provide semantic linking between preferred name terms and other terms. In addition to guidance for use of homographs (words with the same spelling representing different concepts), a thesaurus can direct the user through choices involving: equivalence - in which one word or term is preferred over others for expression of a concept; hierarchy - in which a relationship between broader and narrower terms is expressed by levels of superordination or subordination; and association - in which two or more terms are semantically or conceptually associated, whether they belong to the same hierarchy or different hierarchies. A thesaurus for components used in the name may be developed and distributed to interested parties by the registrar; in addition, development of subject area thesauri should be encouraged. 
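
The three thesaurus relationships named in Clause 8 (equivalence, hierarchy, association) can be pictured with a small Python sketch. The `Entry` class and `preferred_term` function are hypothetical names introduced here; the sample entries are borrowed from the Annex C excerpt of this draft, and lower-casing the terms for lookup is an assumption of the sketch, not something the draft requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical thesaurus entry covering the three kinds of relationship."""
    use: Optional[str] = None                            # equivalence (USE)
    broader: List[str] = field(default_factory=list)     # hierarchy (BT)
    narrower: List[str] = field(default_factory=list)    # hierarchy (NT)
    related: List[str] = field(default_factory=list)     # association (RT)


# Entries taken from the Annex C excerpt; keys are lower-cased by assumption.
THESAURUS: Dict[str, Entry] = {
    "cost": Entry(broader=["contract"], narrower=["petty cash"],
                  related=["budget", "amount"]),
    "expense": Entry(use="cost"),
}


def preferred_term(term: str, thesaurus: Dict[str, Entry]) -> str:
    """Follow USE links until a preferred term (or an unknown term) is reached."""
    seen = set()
    key = term.lower()
    while key in thesaurus and thesaurus[key].use is not None:
        if key in seen:
            raise ValueError(f"circular USE reference at {key!r}")
        seen.add(key)
        key = thesaurus[key].use.lower()
    return key


resolved = preferred_term("Expense", THESAURUS)
unknown = preferred_term("Budget", THESAURUS)
```

In this sketch a deprecated synonym such as "Expense" resolves to the preferred term "cost", while a term with no entry is simply returned unchanged in lower case.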
A thesaurus may be used to describe structures in verbal representation as a supplement to graphic depiction. Preferred terms become component names; relationships between preferred terms express the position of components in the structure. A controlled vocabulary is an advantage for thesaurus use. Control can be built into a thesaurus through scope of descriptors, linking of synonyms and near-synonyms through equivalence, and the resolution of homographs. These are all functions which users of classification structures need to navigate through the system. ANNEX A (INFORMATIVE) EXAMPLE NAMING CONVENTION These rules are derived from the guidelines described in Clause 7. Examples are included. They may be applied to the development of context names at the discretion of the subject area authority. A.1 Semantic Rules a. Referents represent things of interest to the enterprise which may, for instance, be found in an enterprise model structure set. Example: Cost b. One and only one referent shall be present. c. Properties shall be derived from the classification system structure set and represent the class, or category, of the data. Example: Total Amount d. One and only one property shall be present. e. Modifiers may be derived as determined by the subject area authority and will be added as needed to describe the data element and make it unique within a specified context. One modifier may be part of a property; the order of other modifiers is not significant. Modifiers are optional. Example: Budget Period Total Amount A.2 Syntactic Rules a. The referent shall occupy the first (leftmost) position in the name. b. Modifiers shall precede the component modified. The order of modifiers shall not be used to differentiate data element names. c. The property shall occupy the last (rightmost) position. Example: Cost : Budget Period : [Total Amount] referent : modifiers : [property] A.3 Lexical Rules a. Nouns are used in singular form only. 
Verbs (if any) are in the present tense. b. Name components and words in multi-word terms are separated by spaces. No special characters are allowed. c. All words in the name are in mixed case. d. Abbreviations, acronyms, and initialisms are allowed. Example: Cost Budget Period Total Amount

ANNEX B (INFORMATIVE) IDENTIFICATION EXAMPLE (ENGLISH LANGUAGE)

The identification structure of an example data element is as follows:

DATA ELEMENT
  | 1:1
  | contains
------------------------------------------------------------------
| REG AUTH  REG ID  NAME                             CONTEXT     |
|                                                                |
| ISO       848575  ACCOUNT_AMOUNT                   USA_GICS    |
|                                                                |
|                   Cost Budget Period Total Amount  FIN-EDI     |
|                                                                |
| IEEE      193847  Transfer-Cost-Amount             Engineering |
|                                                                |
|                   our_cost_$                       Contracts   |
------------------------------------------------------------------

The component structure for this element is as follows:

REFERENT:   Cost
PROPERTY:   Total Amount
MODIFIERS:  Budget Period
NAME:       Cost Budget Period Total Amount

ANNEX C (INFORMATIVE) THESAURUS EXAMPLE

An excerpt from a possible thesaurus of structure terms, including structural information as well as synonym and homonym resolution.

key:
BT  - Broader Term
NT  - Narrower Term
UF  - Use For
USE - Use the following term instead
RT  - Related Term
SN  - Scope Note

THESAURUS EXCERPT

COST
  SN  Amount the organization spends to procure goods or services.
  BT  Contract
  NT  Petty Cash
  UF  Expense
  RT  Budget
  RT  Amount

Expense
  USE Cost

Note that although the thesaurus entry can show that COST is related to both Budget and Amount, the exact nature of the relationship is not explicit. Italics are used to denote deprecated terms. This is the method used to distinguish preferred terms among synonyms. One or more levels of hierarchy can be shown by listing several broader and narrower terms. This is a thesaurus design decision. The scope note reflects the definition as described in Part 4 of this Standard.
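
As a rough illustration of how the Annex A naming rules and the Clause 6 identification rules fit together, the following Python sketch builds a name in the Annex A order (referent, then modifiers, then property) and registers one data element under two registration authorities, using the values shown in Annex B. The classes `StructuredName`, `DataElement`, and `Registry` are hypothetical names chosen for this sketch; the draft explicitly leaves the format of registration identifiers open, so plain strings are an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class StructuredName:
    referent: str                      # A.1 b: one and only one referent
    prop: str                          # A.1 d: one and only one property
    modifiers: Tuple[str, ...] = ()    # A.1 e: modifiers are optional

    def render(self) -> str:
        # A.2: referent leftmost, property rightmost, modifiers in between.
        parts = [self.referent, *self.modifiers, self.prop]
        for part in parts:
            # A.3 b: components are separated by spaces, no special characters.
            if not part or not all(ch.isalnum() or ch == " " for ch in part):
                raise ValueError(f"invalid name component: {part!r}")
        return " ".join(parts)


@dataclass
class DataElement:
    names: Dict[str, str] = field(default_factory=dict)   # context -> name


class Registry:
    """Maps (registration authority, registration identifier) to a data element."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], DataElement] = {}

    def register(self, element: DataElement, authority: str, reg_id: str) -> None:
        if not element.names:
            # Clause 6, rule 4: at least one name within a context.
            raise ValueError("a data element needs at least one name")
        key = (authority, reg_id)
        if key in self._by_key:
            # Clause 6, rules 1 and 2: the pair must be unique.
            raise ValueError(f"{key} is already assigned")
        self._by_key[key] = element

    def lookup(self, authority: str, reg_id: str) -> DataElement:
        return self._by_key[(authority, reg_id)]


structured = StructuredName("Cost", "Total Amount", ("Budget Period",)).render()
element = DataElement(names={
    "USA_GICS": "ACCOUNT_AMOUNT",
    "FIN-EDI": structured,
    "Engineering": "Transfer-Cost-Amount",
    "Contracts": "our_cost_$",
})
registry = Registry()
registry.register(element, "ISO", "848575")
registry.register(element, "IEEE", "193847")
```

Both identifier pairs resolve to the same element, which mirrors the point in 5.3 that parties using different names or contexts can still exchange a data element through its registration identifier. Only the FIN-EDI name is run through the Annex A rules; the other three names belong to contexts with their own conventions and would fail this particular check.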
--Boundary (ID UeddU+sHeUzSrP8ZDu1acg) MIME-version: 1.0 Content-type: MESSAGE/RFC822 Date: Thu, 3 Mar 1994 10:28:00 EST From: SYSTEM@CARINA.RTPNC.EPA.GOV Subject: Content-type: TEXT/PLAIN; CHARSET=US-ASCII Posting-date: Thu, 3 Mar 1994 10:28:00 EST Importance: normal A1-type: DOCUMENT RFC-822-headers: Received: from vaxtm1.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01H9J5PGHCC08WYA8P@mail.rtpnc.epa.gov>; Thu, 3 Mar 1994 10:17:07 EST Received: from merlin.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01H9J5P0QH5S8WXCTM@epavax.rtpnc.epa.gov>; Thu, 3 Mar 1994 10:16:50 EST Received: from speckle.ncsl.nist.gov by merlin.rtpnc.epa.gov (8.6.6.Beta9/1.34) id KAA02494; Thu, 3 Mar 1994 10:16:37 -0500 Received: from [129.6.59.190] (WHITE.NCSL.NIST.GOV) by speckle.ncsl.nist.gov (4.1/SMI-3.2-del/cas.6) id AA28139; Thu, 3 Mar 94 10:15:27 EST Date: Thu, 03 Mar 1994 10:15:25 +0000 From: Judith Newton Subject: New version of 11179-5 To: appel@srdslc1.fb4.noaa.gov, bargmeyer.bruce@epamail.epa.gov, mph@rruxls.bellcore.com, ngo.phongx@epamail.epa.gov, mrood@mitre.org, whkjr@digex.net, kirkbrik@hoffman-emhl.army.mil, goldfine@speckle.ncsl.nist.gov Message-id: <184.newton@speckle.ncsl.nist.gov> X-Envelope-to: bargmeyer.bruce@mr.rtpnc.epa.gov, ngo.phongx@mr.rtpnc.epa.gov Content-transfer-encoding: 7BIT X-Popmail-Version: POPMail/PC3.4_Alpha_3 X-Popmail-Charset: English --Boundary (ID UeddU+sHeUzSrP8ZDu1acg)-- From b_theodoulidis@mac.co.umist.ac.uk Mon Mar 28 06:14:41 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["8529" "" "28" "March" "1994" "10:22:36" "+0000" "Babis Theodoulidis" "b_theodoulidis@mac.co.umist.ac.uk" nil "250" "Call for Papers ER94" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA19374; Mon, 28 Mar 94 06:14:39 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA24114; Mon, 28 Mar 94 06:14:35 EST Received: from 
pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA06859; Mon, 28 Mar 94 02:44:44 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA19900; Mon, 28 Mar 94 02:46:02 PST Return-Path: Received: from mac.co.umist.ac.uk ([192.84.84.248]) by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA19878; Mon, 28 Mar 94 02:43:39 PST Message-Id: <9403281043.AA19878@pierce.llnl.gov> From: "Babis Theodoulidis" To: "Aziz Ait-Braham" , "Georgios Andrianopoulos" , "Jayant Chaudhary" , "Compunode ECRC" , "dbworld" , "Ingres Users" , "Georgios Karvelis" , "CKBS KEELE" , "Nikos Lorentzos" , "Metadata" , "Adolfo Munoz" , "Terttu Orci stockholm" , Ores.RA's.COMPUTATION@mac.co.umist.ac.uk, "IKBS Rutherford" , "Carlos Hernandez Salvador" , "SICSTUS Users" , "Simon Sou" , b_theodoulidis@mac.co.umist.ac.uk, "D Tselios" Subject: Call for Papers ER94 Date: 28 Mar 1994 10:22:36 +0000 REGARDING Call for Papers ER94 CALL FOR PAPERS The Thirteenth International Conference on THE ENTITY-RELATIONSHIP APPROACH "Business Modelling and Re-engineering" December 13-16 1994 Manchester, UK Sponsored by: British Computer Society SERC UMIST ER Institute The Conference The ER conference is the primary forum for researchers and practitioners in the field of conceptual data modelling. Since its inception, the ER conference has proved to be one of the major vehicles for exchange of research results and practical experiences using many different modelling approaches including variants of the ER model, Object-Oriented models, Object-Role models, Rule-based models, Temporal models etc. as well as related technology aspects such as databases and knowledge bases. The ER '94 conference will offer a programme of state of the art papers, combined with panel sessions, invited talks and tutorials. The theme of the conference in 1994 will be Business Modelling and Re-engineering. 
This is a key and challenging area, as organisations increasingly strive to improve the co-ordination between systems and ultimately individuals. Improving the performance of large business processes, some of which may take place across different organisations, requires appropriate modelling techniques and infrastructure technology to assist in the management of the interaction between different agents participating in these processes. The ER '94 conference will represent a balance between the interrelated areas of modelling and infrastructure. Topics of Interest Papers are sought in, but not limited to, the topics below. Authors should state clearly whether their contribution is in the area of modelling or infrastructure technology. Enterprise integration, Enterprise Modelling, Process Modelling, Enterprise Engineering and Re-engineering, Software Re-engineering, CASE Environments, Repository Technology, Federated Systems, Prototyping, Verification and Validation, Temporal Information Systems, Multimedia Modelling, Quality Aspects of Conceptual Modelling, Virtual Reality and Systems Development, Distributed Knowledgebases, Hypermedia Cooperation in Heterogeneous systems, Strategic Information Systems. Information for Authors Authors must clearly state the contribution of their work to the theme of the conference. The edited proceedings of ER '94 will appear as a book from a major international publisher. Five copies of original unpublished papers up to 5000 words should be sent to: P Loucopoulos Department of Computation UMIST P.O. Box 88, Sackville Street Manchester M60 1QD UK Important Dates 30 April 1994 - Papers submission due 30 June 1994 - Notification of acceptance 31 August 1994 - Camera-ready copy due 31 October 1994 - Early registration deadline Conference Location Manchester is a city of surprises, a city of variety but most of all it is a city of colour and vitality. It is the North West of England's premier city.
Manchester boasts Europe's largest municipal park, the hottest night life in Britain, two first rank orchestras, the best U.K. theatreland outside of London and a wealth of shopping opportunities. Manchester is easy to reach by road, rail or air. Manchester airport is one of the largest international airports in Europe with connections to all parts of the world. Manchester is just over one hour's travel from other major European cities such as Amsterdam, Paris, Brussels and Frankfurt. The airport has a direct rail link with the city centre and the visitor can reach the city from the airport in 15 minutes. The city is serviced by an excellent road network which links it to other parts of the country. Manchester has more miles of motorway than any other U.K. city. Manchester Piccadilly station is a key link in the inter-city network with services to all main-line stations in Britain. Average travel time by rail to London is two and a half hours. GENERAL CONFERENCE CHAIR John Mylopoulos, University of Toronto, Canada EUROPEAN CONFERENCE CHAIR Stefano Spaccapietra, Ecole Polytechnic Federale Lausanne, Switzerland NORTH AMERICAN CONFERENCE CHAIR Sham Navathe, Georgia Institute of Technology, U.S.A. ORGANISING CHAIR Keith Jeffery, RAL-SERC, U.K. PROGRAMME CHAIRS Pericles Loucopoulos, UMIST, U.K. Ramez Elmasri, University of Texas, U.S.A. PANEL ORGANISING CHAIR Colette Rolland, Universite Paris 1 - Pantheon Sorbonne TUTORIALS ORGANISING CHAIR Carole Goble, University of Manchester, U.K. TREASURER Babis Theodoulidis, UMIST, U.K. PUBLICITY CHAIR Mike Jackson, University of Wolverhampton, U.K. PROGRAMME COMMITTEE David Avison U.K. Jorge Bocca U.K. Omar Boucelma France Sjaak Brinkkemper Netherlands Janis Bubenko Sweden John Carlis U.S.A. Sharma Chakravarthy U.S.A. Valeria De Antonellis Italy Anthony Finkelstein U.K. Guy Fitzgerald U.K. Andre Flory France Donald Flynn U.K. Michael Freeston Germany Carole Goble U.K. Ted Goranson U.S.A.
Georges Grosz France Terry Halpin Australia Michael Huhns U.S.A. Manfred Jeusfeld Germany Vram Kouramajian U.S.A. Mike Mannino U.S.A. Salvatore March U.S.A. Leora Morgenstern U.S.A. Renate Motschnig Austria Shamkant Navathe U.S.A. Eric Neuhold Germany Antoni Olive Spain Maria Orlowska Australia Mike Papazoglou Australia Barbara Pernici Italy Naveen Prakash India Sudha Ram U.S.A. Colette Rolland France Thomas Rose Germany Kevin Ryan Ireland Arie Segev U.S.A. Amilcar Sernadas Portugal Madan Singh U.K. Arne Solvberg Norway Il-Yeol Song U.S.A. Stefano Spaccapietra Switzerland Peter Stocker U.K. Toby Teorey U.S.A. Constantino Thanos Italy Babis Theodoulidis U.K. Aphrodite Tsalgatidou Greece Yannis Vassiliou Greece Benkt Wangler Sweden Marianne Winslett U.S.A. Bob Wood U.K. Trevor Wood-Harper U.K. Carlo Zaniolo U.S.A. REGIONAL COORDINATORS R Andersen Norwegian Institute of Technology, Norway R Carapuca INESC, Portugal J Fong City Polytechnic of Hong Kong, Hong Kong J B Grimson Trinity College, University of Dublin, Ireland M Kersten CWI, Netherlands K-C Lee Hua Hsing Information Corp, Taiwan M Leonard Universite de Geneve, Switzerland B G Lundberg University of Stockholm, Sweden S Nishio Osaka University, Japan M E Orlowska University of Queensland, Australia A Pirotte Universite de Louvain, Belgium F Plasil Czech University of Technology, Czech Republic S Sa The People's University of China, China F Saltor Technical University of Barcelona, Spain G Schlageter Fern University of Hagen, Germany D Shasha New York University, USA C K Tan National University of Singapore, Singapore L Tucherman IBM Brazil, Brazil Y Vassiliou Research Centre of Crete, Greece Further Information For details of the conference and the exhibition, please contact: Mrs. Janet Houshmand ER94 Conference Department of Computation UMIST P.O. Box 88 Sackville Street Manchester M60 1QD U.K. 
Tel: +44-61-200-3302 Fax: +44-61-200-3324 e-mail: er94@sna.co.umist.ac.uk To ensure that you receive the Advance Programme and that you are able to take advantage of early registration, please send your name and address to the secretariat. You may alternatively contact the Organising Committee by e-mail if you wish. Organisations interested in taking part in the exhibition or an industrial session, or in possible sponsorship of the conference or social events are also invited to contact the organisers. From BARGMEYER.BRUCE@epamail.epa.gov Sun Mar 27 20:21:24 1994 Status: RO X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil] ["29370" "Sun" "27" "March" "1994" "19:49:00" "-0500" "BRUCE BARGMEYER 202-260-5306" "BARGMEYER.BRUCE@epamail.epa.gov" nil "773" "ISO 11179, Part 4, Definitions - ASCII" "^From:" nil nil "3" nil nil nil nil] nil) Received: from cv3.cv.nrao.edu by fits.cv.nrao.edu (4.1/DDN-DLB/1.5) id AA18121; Sun, 27 Mar 94 20:21:21 EST Received: from ocfmail.ocf.llnl.gov by cv3.cv.nrao.edu (4.1/DDN-DLB/1.13) id AA21353; Sun, 27 Mar 94 20:21:18 EST Received: from pierce.llnl.gov by ocfmail.ocf.llnl.gov (4.1/SMI-4.0) id AA04000; Sun, 27 Mar 94 17:09:50 PST Received: by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14352; Sun, 27 Mar 94 17:11:10 PST Return-Path: Received: from VAXTM1.RTPNC.EPA.GOV by pierce.llnl.gov (4.1/LLNL-1.18/llnl.gov-05.92) id AA14343; Sun, 27 Mar 94 17:11:04 PST Received: from pyxis.rtpnc.epa.gov by epavax.rtpnc.epa.gov (PMDF V4.2-13 #5309) id <01HAH9EAAKKW90UAT1@epavax.rtpnc.epa.gov>; Sun, 27 Mar 1994 20:09:34 EST Received: from mr.rtpnc.epa.gov by mail.rtpnc.epa.gov (PMDF V4.2-15 #5309) id <01HAH9DMS9808X286P@mail.rtpnc.epa.gov>; Sun, 27 Mar 1994 20:09:03 EST Received: with PMDF-MR; Sun, 27 Mar 1994 20:07:35 EST Mr-Received: by mta CARINA; Relayed; Sun, 27 Mar 1994 20:07:35 -0500 Alternate-Recipient: prohibited Disclose-Recipients: prohibited Message-Id: <01HAH9DP1PBQ8X286P@mr.rtpnc.epa.gov> X-Envelope-To: metadata@llnl.gov 
Mime-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-Transfer-Encoding: 7BIT Posting-Date: Sun, 27 Mar 1994 20:00:00 -0500 (EST) Importance: normal Priority: normal X400-Mts-Identifier: [;53700272304991/967921@MAIL] A1-Type: MAIL Hop-Count: 0 From: BRUCE BARGMEYER 202-260-5306 To: metadata@llnl.gov Subject: ISO 11179, Part 4, Definitions - ASCII Date: Sun, 27 Mar 1994 19:49:00 -0500 (EST) This is a draft of ISO 11179, Part 4, Rules and Guidelines for the Formulation of Data Definitions. It is going out for balloting as a Draft International Standard. If you have comments, please send them to bargmeyer.bruce@epamail.epa.gov The following is an ASCII version which may have gotten mangled a bit as all the cosmetics got stripped. The next message has the same document as a Postscript file. Note: This proposal presumes that other useful information, such as the purpose for which the data was collected, the statistical procedures involved in data collection, and examples are separate items of metadata that should be stored separately from the definition. --Bruce Bargmeyer ********************************** ISO/IEC STANDARD 11179 Specification and Standardization of Data Elements PART 4 Rules and Guidelines For The Formulation of Data Definitions January 1994 Rules and Guidelines For The Formulation of Data Definitions Contents _________________________________________________________________ Foreword 1 Scope 2 Normative References 3 Definitions 4 Summary of Data Definition Rules and Guidelines 5 Requirements Foreword This document is Part 4 of ISO/IEC 11179, an International Standard concerning data element representation. ISO/IEC 11179 includes six interrelated parts, with each part focusing on one aspect of data element development. All parts shall be used in conjunction with each other. 
The six parts are titled as follows: - Framework for the Generation and Standardization of Data Elements - Classification of Concepts for the Identification of Domains - Basic Attributes of Data Elements - Rules and Guidelines for the Formulation of Data Definitions - Naming and Identification Principles for Data Elements - Registration of Data Elements This document provides guidance on the construction of well-formed data element definitions. See Framework for the Generation and Standardization of Data Elements, Part 1 of this International Standard, for discussions of all parts. ISO 704:1987 (E) states that "a definition is a comprehensive description of a concept by means of known concepts expressed mainly by verbal means". The purpose of a data element definition is to define the meaning of a data element. Precise and unambiguous data element definitions are one of the most critical aspects of ensuring data shareability. When two or more parties exchange data, it is essential that all are in explicit agreement on the meaning of that data. One of the primary vehicles for carrying the data's meaning is the data element definition. Therefore, it is mandatory that every data element have a well-formed definition; one that is clearly understood by every user. Poorly formulated data element definitions foster misunderstandings and ambiguities and often inhibit successful communication. To ensure consistency and quality in constructing data element definitions, this Part of the International Standard, which specifies the characteristics of well-formed data definitions, has been developed. The rules and guidelines of this standard apply to the formulation of definitions of data elements used in information processing systems and information interchange. This Part of the International Standard contains both rules and guidelines. Rules are mandatory and testable for compliance. Guidelines are principles that shall be followed.
Objective test criteria can be established for rules; conformance with guidelines is assessed by judgment of reasonableness. The data element names in this Part of the International Standard do not follow a particular syntax. In this standard, the term data element refers to data element type; the shorter term is used for convenience. This document contains one Annex, Informative References. 1 Scope This Part of the International Standard specifies rules and guidelines for constructing definitions for data elements. Only semantic structures of data element definitions are addressed; specifications for formatting the definitions are deemed unnecessary for the purposes of this standard. Although these definitional rules and guidelines pertain to data elements, they can also be applied in formulating definitions for other types of data constructs such as entity types, entities, relationships, attributes, object types (or classes), objects, segments, composites, code entries, and messages. The definitional rules and guidelines in this Part of the International Standard do not always apply to terminological definitions found in glossaries and language dictionaries. Differences exist between the rules that apply in a language dictionary and the rules that apply in a data dictionary. For example, terms in a language dictionary may have multiple definitions in the dictionary, whereas data dictionary definitions must be unique within a dictionary and have a single meaning. Many data element definitions include terms that themselves need to be defined (e.g., "charges", "allowances", "delivery"). Some of these terms may have different definitions in different industrial sectors. Therefore, there is a need for most data dictionaries to establish an associated glossary of terms used in the definitions. The area(s) of use for each term is identified in the glossary.
2 Normative References The following standards contain provisions which, through reference in this text, constitute provisions of this Part of the International Standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Part of the International Standard are encouraged to apply the most recent editions of the standard indicated below. Members of IEC and ISO maintain registers of currently valid International Standards. ISO 704:1987, Principles and methods of terminology ISO 2382-4:1987, Information processing systems - Vocabulary; parts 1-25 ISO Standards Handbook 10, Data Processing - Vocabulary, 1982 ISO 1087:1990, Terminology - Vocabulary ISO 10241:1992, International terminology standards - preparation and layout 3 Definitions For the purposes of this Part of the International Standard, the following definitions apply. attribute: A property or characteristic of one or more entities; for example, color, weight, sex (ISO 2382). A description of a characteristic of an object including rules and/or constraints [ISO/CD 11179]; A property inherent in an entity or associated with that entity for database purposes. See data attribute. code: (1) A set of rules that maps the elements of one set, the coded set, onto the elements of another set, the code element set. Synonymous with coding scheme (ISO 2382). (2) A set of items, such as abbreviations, that represent the members of another set (ISO 2382). (3) An assemblage of symbols used to briefly represent unique and specifically assigned words. Each code has a corresponding word or phrase that it represents. concept: A unit of thought constituted through abstraction on the basis of characteristics common to a set of objects. data: (1) A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means (ISO 2382).
(2) Any representation such as characters or analog quantities to which meaning is or might be assigned (ISO 2382). (3) A representation of facts, concepts, or instructions that are collected, organized, recorded, processed, and stored in a retrievable form suitable for communication, interpretation, or processing by human or automated means. database: (1) A collection of interrelated data, often with controlled redundancy, organized according to a schema to serve one or more applications; the data are stored so that they can be used by different programs without concern for the data structure or organization. A common approach is used to add new data and to modify and retrieve existing data (ANSI X3.172-1990). (2) A collection of interrelated data objects stored together with controlled redundancy, according to one schema to serve one or more applications. data dictionary: (1) A database used for data that refers to the use and structure of other data; that is, a database for the storage of metadata (ANSI X3.172-1990). (2) An inventory that describes, defines, and lists all of the data elements that are stored in a database (ANSI X3.172-1990). (3) A subset of a data dictionary/directory that provides definitions for each data element. See data element dictionary. NOTE: Data element dictionaries may exist at various levels, e.g., ISO/IEC Committees, international associations, industry sectors, companies, application systems. data element: (1) A named unit of data that, in some contexts, is considered indivisible and in other contexts may consist of data items (ISO 2382). (2) A unit of data that in a certain context is considered indivisible [ISO 2382/4]. (3) A named identifier of each of the entities and their attributes that are represented in a database (ISO 2382). (4) A unit of data for which the identification, description and value representation have been specified [ISO 9735]. 
(5) A category of data which represents a data element concept and properties are expressed as a set of data element attributes, which permit it to support information interchange in automatic or manual data processing systems [JTC 1/SC 1 N1238]. (6) An item of data representing a single fact about a business object in the real world. It cannot be decomposed into more fundamental segments of information that have useful meanings within the scope of its application. Data elements are electronic or written representations of the attributes of real-world object types. (7) A unit of data for which the identification, meaning, representations and permissible values are specified by means of a set of attributes. data element dictionary: An information resource that lists and defines all relevant data elements. Synonymous with data dictionary. definition: A word or phrase expressing the essential nature of a person or thing or class of persons or things: an answer to the question "what is x?" or "what is an x?"; a statement of the meaning of a word or word group [Webster's Third New International Dictionary of the English Language Unabridged, 1986]. domain: (1) The set of possible data values of an attribute (ANSI X3.172-1990). (2) The set of permissible data values from which actual values are taken for a particular attribute or specific data element (ANSI X3.172-1990). (3) In a relational database, all of the permissible tuples for a given relation (ANSI X3.172-1990). See data value domain. name: (1) An identifier of an entity (ANSI X3.172-1990). (2) In conceptual schema language, a simple linguistic object that is used to identify an entity (ANSI X3.172-1990). (3) A descriptive designation of an entity or object, usually in a natural language, by which the object is known in real life.
relationship: A special type of entity that is used to indicate a dependency, an association, or a link that may be inherent between two entities or among attributes of the same entity, and that is represented or recorded in a database. Synonymous with association (ANSI X3.172-1990). See data relationship. syntax: The relationships among characters or groups of characters, independent of their meanings or the manner of their interpretation and use. The structure of expressions in a language, and the rules governing the structure of a language. 4 Summary of Data Definition Rules and Guidelines A listing of the rules and guidelines without explanations is provided in this clause for convenience of the user. The intent is to facilitate ease of use of this document once an understanding of the rules and guidelines is achieved. Clause 5 describes each rule and guideline with an explanation and examples to ensure their exact meaning is understood. 4.1 Rules A data definition shall: a) be unique (within any data dictionary in which it appears) b) be stated in the singular c) state what the concept is, rather than what it is not d) be stated as a descriptive phrase or sentence(s) e) contain only commonly understood abbreviations f) be expressed without embedding definitions of other data elements or underlying concepts 4.2 Guidelines A data definition should: a) state the essential meaning of the concept b) be precise and unambiguous c) be concise d) be able to stand alone e) be expressed without embedding rationale, functional usage, domain information, or procedural information f) avoid circular reasoning g) use the same terminology and consistent logical structure for related definitions 5 Requirements 5.1 Premise Data elements exist and are used for specific purposes. Differences in use will require different operational manifestations of some rules and guidelines.
For example, different levels of specificity for data element definitions are generally required in different contexts. Guideline 5.3.a) below provides an example of this need for varying levels of specificity for different definitions. The implementation of Guideline a), "state the essential meaning of the concept", is highly context dependent. The primary characteristics deemed necessary to convey the essential meaning of a particular definition will vary according to the use of the data element in a particular environment. Primary and essential characteristics for defining concepts such as "airport" in the commercial air transportation industry might be specific, where a more general definition may be adequate in a different context. For a discussion of relationships between concepts in different contexts and how characteristics are used to differentiate concepts, see ISO 704, Clause 3. 5.2 Rules To facilitate understanding of the rules for construction of well-formed data element definitions, explanations and examples are provided below. Each rule is followed by a short explanation of its meaning. Examples are given to support the explanations. In all cases, a good example is provided to exemplify the explanation. When deemed beneficial, a poor, but commonly used example is given to show how a definition should NOT be constructed. To further explain the differences between the good and poor examples, most examples are followed by a statement of rationale behind them. A data definition shall: a) be unique (within any data dictionary in which it appears) EXPLANATION - Each definition shall be distinguishable from every other definition (within the dictionary) or the specificity of the concept is lost. One or several characteristics expressed in the definition must differentiate the concept to be defined from other concepts. EXAMPLE - 1) good definitions: "Goods Receipt Date" - Date on which goods are received by a given party.
"Goods Dispatch Date" - Date on which goods are dispatched by a given party. 2) poor definitions: "Goods Receipt Date" - Date on which goods are delivered. "Goods Dispatch Date" - Date on which goods are delivered. REASON - The definition "Date on which the goods are delivered" cannot be used for both data elements "goods receipt date" and "goods dispatch date." Instead, each definition must be different. b) be stated in the singular EXPLANATION - The concept expressed by the data definition shall be expressed in the singular. (Exceptions are made if the concept itself is plural.) EXAMPLE - "Article Number" 1) good definition: Reference number identifying an article. 2) poor definition: Reference number identifying articles. c) state what the concept is, rather than what it is not EXPLANATION - When constructing definitions, the concept cannot be defined exclusively by stating what the concept is not. EXAMPLE - "Freight Cost" 1) good definition: Costs incurred by a shipper in moving goods, by whatever means, from one place to another under the terms of a contract of carriage. 2) poor definition: Costs which are not related to packing, documentation, loading, unloading, and insurance. d) be stated as a descriptive phrase or sentence(s) (in most languages) EXPLANATION - A phrase is necessary (in most languages) to form a precise definition that includes the essential characteristics of the concept. Simply stating one or more synonym(s) is insufficient. Simply restating the words of the name in a different order is insufficient. If more than a descriptive phrase is needed, use complete, grammatically correct sentences. EXAMPLE 1 - "Agent Name" 1) good definition: Name of party authorized to act on behalf of another party. 2) poor definition: Representative. EXAMPLE 2 - "Nature of Transaction" 1) good definition: Indication of the type of contract under which goods are supplied. 2) poor definition: Transaction type. 
REASON - 'Representative' and 'Transaction type' are near-synonyms of the names, which is not adequate for a definition. e) contain only commonly understood abbreviations EXPLANATION - Understanding the meaning of an abbreviation, including acronyms and initialisms, is usually confined to a certain environment. In other environments the same abbreviation can cause misinterpretation or confusion. Therefore, to avoid ambiguity, full words, not abbreviations, shall be used in the definition. Exceptions to this rule may be made if an abbreviation is commonly understood such as 'i.e.' and 'e.g.' or if an abbreviation is more readily understood than the full form of a complex term and has been adopted as a term in its own right such as 'radar' standing for 'radio detection and ranging'. All acronyms must be expanded on the first occurrence. EXAMPLE 1 - "Tide Height" 1) good definition: The vertical distance from mean sea level (MSL) to a specific tide level. 2) poor definition: The vertical distance from MSL to a specific tide level. REASON - The poor definition is unclear because the acronym, MSL, is not commonly understood and some users may need to refer to other sources to determine what it represents. Without the full word, finding the term in a glossary may be difficult or impossible. EXAMPLE 2 - "Unit of Density Measurement" 1) good definition: The unit employed in measuring the concentration of matter in terms of mass per unit (m.p.u.) volume (e.g., pound per cubic foot; kilogram per cubic meter). 2) poor definition: The unit employed in measuring the concentration of matter in terms of m.p.u. volume (e.g., pound per cubic foot; kilogram per cubic meter). REASON - M.p.u. is not a common abbreviation and its meaning may not be understood by some users. The abbreviation should be expanded to full words.
f) be expressed without embedding definitions of other data elements or underlying concepts EXPLANATION - The definition of a second data element or related concept should not appear in the definition proper of the primary data element. If the second definition is necessary, it may be attached by a note at the end of the primary definition's main text or as a separate entry in the dictionary. Related definitions can be accessed through relational attributes (e.g., cross-reference). EXAMPLE 1 - "Sample Type Code" 1) good definition: A code identifying the kind of sample collected. 2) poor definition: A code identifying the kind of sample collected. A sample is a small specimen taken for testing. It can be either an actual sample for testing, or a quality control surrogate sample. A quality control sample is a surrogate sample taken to verify results of actual samples. REASON - The poor definition contains two extraneous definitions embedded in it. They are definitions of 'sample' and of 'quality control'. EXAMPLE 2 - "Documentary Credit Number" 1) good definition: Reference number assigned by issuing bank to a documentary credit. 2) poor definition: Reference number assigned by issuing bank to a documentary credit. A documentary credit is a document in which a bank states that it has issued a documentary credit under which the beneficiary is to obtain payment, acceptance, or negotiation on compliance with certain terms and conditions and against presentation of stipulated documents and such drafts as may be specified. 5.3 Guidelines (Guiding Principles) A data definition should: a) state the essential meaning of the concept EXPLANATION - All primary characteristics of the concept represented should appear in the definition at the relevant level of specificity for the context. The inclusion of non-essential characteristics should be avoided. The level of detail necessary is dependent upon the needs of the system user and environment.
EXAMPLE 1 - "Name of Celestial Body" 1) good definition (for the intended context, 'space exploration'): Name of any planet, satellite, asteroid, captured comet, meteor swarm, or other natural physical body including the sun, held by the gravitational system of the sun and revolving around it. 2) poor definition (for the intended context): Name of any planet or physical body in the universe. REASON - In this context (space exploration), it is necessary to specify what may be considered a 'celestial body'. The poor definition is too vague. EXAMPLE 2 - "Invoice Amount" 1) good definition: Total sum charged in respect to an invoice. 2) poor definition: The total sum of all chargeable items mentioned on an invoice, taking into account deductions on one hand, such as allowances and discounts, and additions on the other hand, such as charges for insurance, transport, handling, etc. b) be precise and unambiguous EXPLANATION - The exact meaning and interpretation of the defined concept should be apparent from the definition. A definition should be clear enough to allow only one possible interpretation. EXAMPLE - "Shipment Receipt Date" 1) good definition: Date on which a shipment is registered as having been received by the receiving party. 2) poor definition: Date on which a specific shipment is delivered. REASON - The poor definition does not specify what determines a 'delivery'. 'Delivery' could be understood as either the act of unloading a product at the intended destination or the point at which the intended customer actually obtains the product. It is possible that the intended customer never receives the product that has been unloaded at his site or the customer may receive the product days after it was unloaded at the site. c) be concise EXPLANATION - The definition should be brief and comprehensive. Extraneous qualifying phrases such as "for the purpose of this data dictionary," "terms to be described," shall be avoided. 
EXAMPLE - "Character Set Name" 1) good definition: The name given to the set of phonetic or ideographic symbols in which data is encoded. 2) poor definition: The name given to the set of phonetic or ideographic symbols in which data is encoded, for the purpose of this data dictionary, or, as used elsewhere, the capability of systems hardware and software to process data encoded in one or more scripts. d) be able to stand alone EXPLANATION - The meaning of the concept should be apparent from the definition. Additional explanations or references should not be necessary for understanding the meaning of the definition. EXAMPLE - "School Location City Name" 1) good definition: Name of the city where a school is situated. 2) poor definition: See "school site". REASON - The poor definition does not stand alone; it requires the aid of a second definition (school site) to understand the meaning of the first. e) be expressed without embedding rationale, functional usage, domain information, or procedural information EXPLANATION - Although they are often necessary, such statements do not belong in the definition proper because they contain information extraneous to the purpose of the definition. If deemed useful, such expressions may be placed in other data element attributes (ISO/IEC 11179, Part 3). 1) The rationale for a given definition should not be included as part of the definition (e.g. if a data element uses miles instead of kilometers, the reason should not be indicated in the definition). 2) Functional usage such as: "this data element should not be used for ..." should not be included in the definition proper. 3) Remarks about procedural aspects (e.g., this data element is used in conjunction with data element "xxx") should not appear in the definition. EXAMPLE - "Field Label" 1) good definition: Identification of a field in an index, thesaurus, query, database, etc.
2) poor definition: Identification of a field in an index, thesaurus, query, database, etc., which is provided for units of information such as abstracts, columns within tables. REASON - The poor definition contains remarks about functional usage. This information starting with "which is provided for..." must be excluded from the definition and placed in another attribute, if it is necessary information. f) avoid circular reasoning EXPLANATION - Two definitions shall not be defined in terms of each other. A definition should not use another concept's definition as its definition. This results in a situation where a concept is defined with the aid of another concept that is, in turn, defined with the aid of the given concept. EXAMPLE - "Employee ID Number" - Number assigned by an employer to identify an individual employee. "Employee" - Person corresponding to the employee ID number. REASON - Each definition refers to the other for its meaning. The meaning is not given in either definition. g) use the same terminology and consistent logical structure for related definitions EXPLANATION - A common terminology and syntax should be used for similar or associated definitions. EXAMPLE - The example for rule 5.2.a) above also illustrates this idea. Both definitions pertain to related concepts and therefore have the same logical structure and similar terminology. 1) "Goods Receipt Date" - Date on which goods were received by a given party. 2) "Goods Dispatch Date" - Date on which goods were dispatched by a given party. ________________________________________________________ Annex (informative) ________________________________________________________ Bibliographic references TRADE/WP.4/R.765/Add. 1, United Nations Economic and Social Council, 30 July 1991.
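
Illustrative note (not part of the draft): the draft says rules are testable for compliance, so two of the checks can be sketched mechanically. The Python below is only a rough sketch under stated assumptions. The function names (normalize, find_duplicate_definitions, find_circular_pairs) are hypothetical, and the data dictionary is assumed to be a plain mapping from element name to definition text. The circularity check assumes the second entry in the guideline f) example is an element named "Employee", and it uses crude substring matching that a real dictionary tool would need to refine.

```python
from collections import defaultdict


def normalize(text):
    """Lower-case and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def find_duplicate_definitions(dictionary):
    """Rule 4.1 a): return sorted groups of element names sharing one definition."""
    names_by_definition = defaultdict(list)
    for name, definition in dictionary.items():
        names_by_definition[normalize(definition)].append(name)
    return [sorted(names) for names in names_by_definition.values() if len(names) > 1]


def find_circular_pairs(dictionary):
    """Guideline 4.2 f): return pairs of elements whose definitions name each other."""
    names = list(dictionary)
    pairs = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            # Assumption: a mention is a plain substring match on the element name.
            first_mentions_second = normalize(second) in normalize(dictionary[first])
            second_mentions_first = normalize(first) in normalize(dictionary[second])
            if first_mentions_second and second_mentions_first:
                pairs.append((first, second))
    return pairs


# Example entries taken from clause 5.2 a) of the draft.
poor_entries = {
    "Goods Receipt Date": "Date on which goods are delivered.",
    "Goods Dispatch Date": "Date on which goods are delivered.",
}
good_entries = {
    "Goods Receipt Date": "Date on which goods are received by a given party.",
    "Goods Dispatch Date": "Date on which goods are dispatched by a given party.",
}
# Example entries from guideline f); the name "Employee" is an assumption.
circular_entries = {
    "Employee ID Number": "Number assigned by an employer to identify an individual employee.",
    "Employee": "Person corresponding to the employee ID number.",
}

print(find_duplicate_definitions(poor_entries))
print(find_duplicate_definitions(good_entries))
print(find_circular_pairs(circular_entries))
```

The uniqueness check only catches definitions that are textually identical after normalization. Two definitions that differ in wording but not in meaning would still need the judgment of reasonableness that the draft reserves for guidelines.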