CIMT User Stories

Last edited by $Author: graybeal $ $Date: 2004/05/22 18:43:37 $

Identified/Prioritized User Stories:

Detailed descriptions of these tasks are provided at the end of this file.

Items 1-7 are infrastructure services. Most will serve any ingested and described data. Items 8 and beyond are specific CIMT implementations/utilizations of the infrastructure services.

Rough scope is indicated in brackets: A=5 work days, B=5-10 work days, C=10-20 work days.

** Done
i. Ingest raw SIAM data packets.
ii. Manage metadata for data packets (including significant mods needed for CIMT).
iii. User accesses raw data packets via web (including tuning needed for CIMT).
iv. Prototype tool available to confirm consistency of data.

** Infrastructure
1. DONE Parse ASCII streams [A]
2. DONE Define metadata describing data [B]
3. DONE Create general web plotting capability (to support QC-style, parameter vs time plots only) [C]
4. DONE Simplify SIAM data ingest (internal, no visible behavior change) [B]
5. DROP Add data notification mechanism or mechanisms (to maximize # of external processes served data) [B]
6. 90% {need standard way to submit descriptive XML} Add data submission mechanism or mechanisms (to maximize # of external processes submitting data) [B]
7. 0% Create a general query interface (expect only one or two basic queries supported for CIMT; see #14 below) [B]

** CIMT-Specific
8. DONE Provide data to specific external processes [B]
9. 85% {SOON done; ADCP close; ISUS planned} Accept data from specific external processes (TBD which ones) [A]
10. DONE Ingest SOON data as a data file [B]
11. 90% {main web pages} Provide basic plots of raw data [A]
12. 0% Merge items according to timestamp, optionally at a given time interval. (Focused on data.) [B]

The following 4 items are 'on the bubble'; ideally all 4 can be accomplished, but possibly none of them can, depending on speed of the first 12 items. (I expect 2 of these can be done.)

13. 95% Make plots available outside the firewall (i.e., publicly available) [B]
14. 10% Provide point-and-click query for data sets according to instrument providing the data. [A]
15. 50% {only for SIAM data} Obtain a subset of the data within a particular time interval (ideally including non-SIAM data). [B]
16. 10% Add customized raw plots (e.g., to QC more sophisticated systems). [B]

----- Tasks Under Consideration -----

These tasks would have to take the place of tasks in the list 8 through 16, unless we find we can do them trivially.

17. More advanced/custom plots/web page presentation.
18. Handling multi-record data packets (e.g., ISUS).
19. Attempt interoperability with UCSC CIMT effort.

Corrections and modifications are encouraged, now or at any User Story meeting.

----- Detailed Task Descriptions -----

1. Parse ASCII Streams.
Be able to separate simple ASCII streams into individual variables (for plotting or other access) using XML descriptions. This will produce "parsed data items", which can be used by other tasks.

2. Define metadata describing data
Create all the XML files which describe the SIAM data being delivered to the system.

3. Create general web plotting capability
Create a service which makes it easy to produce simple parameter-vs-time plots for any of the instruments or parsed data items.

4. Simplify SIAM data ingest (internal, no visible behavior change)
This infrastructure task makes it possible to change SIAM software without forcing matching changes to the SSDS software.

5. Add data notification mechanism or mechanisms
This service sends a notification to an external process when new data arrives, allowing external processes to operate in near real time (e.g., for event detection and error alerts). The method by which alerts would be sent (sockets, subscription services, email, other?) is not determined.

6. Add data submission mechanism or mechanisms
The data submission mechanism must be sufficient for all the CIMT users to provide processed data that can be processed by SSDS. The goal is to maximize the number of external processes submitting data. Implementation options include submitting updated files, socket-based streams, or SIAM class-based streams. The solution should include a mechanism for users to submit descriptive metadata.

7. Create a general query interface
Provide a simple query service which can be used by web interfaces or other user tools. Only a basic set of queries must be implemented at first, including lookups of:
- data set and deployment by instrument
- instrument by platform
- variables by data set and the reverse.

8. Provide data to specific external processes
For a set of external processes, make needed data available to those processes. Confirm that the data interface is sufficient for post-processing. The tentative process list is:
- SOON
- ADCP
- ISUS
- Radiometer

9. Accept data from specific external processes
For a set of external processes, accept post-processed data and data products (e.g., plots) from those processes. Confirm that the data interface is sufficient for post-processing. The tentative process list includes data from:
- SOON
- ADCP
- ISUS
- Radiometer
- CTD

10. Ingest SOON data as a data file
Addressed by #9.

11. Provide basic plots of raw data
For data received by SSDS, provide simple parameter-vs-time plots, confirming that the metadata is present and accurate. This is a set of generic plots without custom organization. For example, providing plots organized by instrument satisfies this user story.

12. Merge items according to timestamp, optionally at a given time interval.
Provide a service which allows users to request a set of data items (possibly from different instruments), integrated by their time of data collection.

13. Make plots available outside the firewall (i.e., publicly available)
Provide access to the CIMT data sets and services to users who are outside the firewall. Access should be provided to the:
- instrument list
- QC plots on the web
- data access API
- raw data web interface

14. Provide point-and-click query for data sets according to instrument providing the data.
Implement a web interface which uses the query interface (in number 7) to give users a simple way to get data from a particular instrument. (Note: Originally this was intended as a means of navigating the metadata to find data of interest. The Access Raw Data interface meets the letter of this requirement, but a solution which is expandable to other queries is the intention.)

15. Obtain a subset of the data within a particular time interval
Provide a way to ask for data between start and end times. This is particularly valuable if it can be used on data which does not have SIAM timestamps (e.g., the SOON data).

16. Add customized raw plots (e.g., to QC more sophisticated systems).
Some raw data must be expressed in more sophisticated ways than a simple parameter-vs-time plot in order to be viewed or validated. Examples of plots which represent raw data in more complicated ways are:
- latitude vs longitude plot of GPS data (a GPS 'watch circle' plot)
- wind sticks plot (plot the wind direction and speed as vectors along a time axis)
- ADCP contour plot (plotting ADCP values spatially with contour lines)

17. More advanced/custom plots/web page presentation.
This calls for development of custom web pages which could integrate a wider variety of SSDS and externally produced data products. Potential pages are:
- pages organized by concept (per Reiko/Tim's Word doc)
- advanced ADCP results
- advanced radiometer results
- advanced CTD and metsys data
- combined data sets on single plot

18. Handling multi-record data packets (e.g., ISUS).
Several instruments produce multiple records within a single SIAM-timestamped data packet. SSDS quick-look plots currently assume each data record is contained in a separately timestamped packet. This task would allow individual data records in a packet to be parsed into separate records in SSDS. Additional capabilities include verifying the format of a record tag (the first item in the record), and/or sorting the data into buckets according to record tags. (Note: any of these changes will probably require significant modifications to the SSDS architecture.)

19. Attempt interoperability with UCSC CIMT data management effort.
UCSC is maintaining a separate data system with other CIMT data (not from the mooring). It would be most beneficial if they could access SSDS data from their data system, and we could access UCSC data from our data system.
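
As a rough illustration of task 18 (splitting a multi-record packet, verifying record tags, and sorting records into buckets), the Python sketch below may help frame the discussion. It is not SSDS code. The record layout (newline-separated records, comma-separated fields, tag as the first field), the tag names REC_A and REC_B, and the function name split_packet are all assumptions made for the example; the actual ISUS record format is not described in this document.

```python
from collections import defaultdict

# Hypothetical record tags, invented for this example only.
KNOWN_TAGS = {"REC_A", "REC_B"}


def split_packet(packet_timestamp, payload, known_tags=KNOWN_TAGS):
    """Split one timestamped packet into per-tag buckets of records.

    Assumes (for illustration only) newline-separated records whose
    first comma-separated field is the record tag.
    Returns (buckets, rejected).
    """
    buckets = defaultdict(list)
    rejected = []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        fields = line.split(",")
        tag = fields[0]
        if tag not in known_tags:
            rejected.append(line)  # record tag failed verification
            continue
        # Every record inherits the single timestamp of its packet.
        buckets[tag].append({"timestamp": packet_timestamp, "values": fields[1:]})
    return dict(buckets), rejected


# Example packet holding three valid records and one with an unknown tag.
payload = "REC_A,1.0,2.0\nREC_B,3.5\nBOGUS,9\nREC_A,1.1,2.1\n"
buckets, rejected = split_packet(1085251417, payload)
```

In this sketch every record carries the packet timestamp, since the packet has only one. Whether records should instead receive their own times is one of the architectural questions the task would need to settle.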