The following sections describe the technology and architecture behind the West Coast Ocean Data Portal (WCODP). It is not necessary to understand this information in order to contribute to the portal, but it may be of interest to developers, technologists, and those who update, maintain, or troubleshoot the WCODP.
The West Coast Ocean Data Portal is built from a number of technologies and software components. The following diagram shows a generalized view of the software architecture.
The WCODP relies upon the following technologies:
- HTTP server and reverse proxy
- Apache Tomcat 6
- Java web server. Tomcat is particularly good for Java-based apps like Geoportal, but is flexible enough to handle most of the other apps as well.
- WCGA Data Portal UI
- ESRI Geoportal Server
- An open source geo-data catalog. It allows for the entry or collection (‘harvesting’) of metadata files. Using these metadata files, Geoportal Server can organize, search, and display all of the information in meaningful, human-readable ways. We are also using the “collections” feature of Geoportal Server to define custom categories and issues, and assign those as attributes of the records we have harvested. This way we can organize the data according to categories that we have determined and applied ourselves, categories that are not defined in the metadata.
- Solr is a tool that speeds up searching large amounts of data. It indexes specific attributes of the data and keeps track of which records share each attribute, so that when an attribute is requested, it doesn’t have to search every record: it already knows. Technically, that indexing is done by a library called Lucene, which is installed with Geoportal Server. Solr is built on top of Lucene and adds faceted search. The most visible result is the numeric indicator of how many records will match your query if you add specific filters (like the categories or issues).
- Geoportal Facet Customizations. This custom tool was built by ESRI specifically for us, to handle needs that were not met by the basic Geoportal Server software. Namely, this is the piece that gets our custom-defined categories and issues indexed by Solr for faceted searches, and exposes the harvest source of each record to Solr as well. As of this writing, it does not yet support searching for the records harvested from a given source, but it may soon.
- PostgreSQL is a fast, powerful database that is well suited to moderate to heavy traffic websites. PostGIS is built on top of PostgreSQL to handle geographic queries, which Geoportal Server relies on to search by location, bounding box, etc.
- Monitoring software. This tool visualizes the status of the server hardware.
- Background on the selection of technologies for the West Coast Ocean Data Portal
- WCGA RDF Data Registry Design Assessment
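The faceted counts described for Solr above can be illustrated with a small sketch. This is not how Solr or Lucene is implemented; it is a minimal in-memory inverted index, and the record ids, category names, and issue names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: ids, categories, and issues are made up for this sketch.
RECORDS = [
    {"id": "rec-1", "category": "Biological", "issue": "Marine Debris"},
    {"id": "rec-2", "category": "Biological", "issue": "Ocean Acidification"},
    {"id": "rec-3", "category": "Physical", "issue": "Marine Debris"},
]


def build_index(records, fields):
    """Map (field, value) -> set of record ids. Built once, at index time."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            index[(field, record[field])].add(record["id"])
    return index


def facet_counts(index, field, matching_ids):
    """For each value of `field`, count the currently matching records that have it."""
    counts = {}
    for (indexed_field, value), ids in index.items():
        if indexed_field != field:
            continue
        hits = len(ids & matching_ids)  # set intersection, no scan of the records
        if hits:
            counts[value] = hits
    return counts


index = build_index(RECORDS, ["category", "issue"])
all_ids = {record["id"] for record in RECORDS}

# Counts shown next to each category before any filter is applied.
print(facet_counts(index, "category", all_ids))

# Counts after the user filters on one issue.
debris_ids = index.get(("issue", "Marine Debris"), set())
print(facet_counts(index, "category", debris_ids))
```

The point of the sketch is that the counts come from set operations on the prebuilt index, so the records themselves are never rescanned when a filter is added.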
Registration / Harvest Process
Registration and metadata harvest into the WCODP is done through the Administrative interface of ESRI Geoportal Server. This is generally done by the WCODP Administrator, but can be done by others with administrative access.
- If the WAF or Catalog is not yet registered in the WCODP, it must be added via the Register Resource page.
Log into ESRI Geoportal (http://portal.westcoastoceans.org/geoportal) as an administrator
On the Administration page, click Add, and select Register resource on the network
Add the Host URL for the WAF or Catalog (CSW). It’s good practice to use the Test button to confirm that the URL works
Add a Title – This is what will show up in the Sources list of the WCODP, so make sure it is clear and user-friendly.
- If harvesting from a Catalog, select a Profile based on the publisher’s CSW. WCODP has three custom profiles:
- ESRI Geoportal (WCGA RDF ESRI Geoportal (GPT))
- GeoNetwork APISO (WCGA RDF GeoNetwork APISO)
- pycsw Custom (pycsw - harvest by dc:identifiers)
You can leave all the other information as-is. Create and Close.
- After you register the resource for the first time, you must approve and synchronize the resource.
- Return to the Manage page of the Administration interface.
- Select the checkbox next to the resource you just registered
- Select “Set as Approved” in the pull-down menu, and click Execute Action
- Click on the blue arrow (Synchronize content) icon next to the resource of interest. This may take a minute or two and you may need to manually refresh the page
- Review the synchronization results.
- Click on the clock icon (History) next to the resource of interest.
- This will show the date of the most recent synchronization, the number of records obtained, the number of records validated, and the number of records published.
- If the number of records validated or published does not match the number of records obtained:
- Click on the empty box icon (View report)
- Click the plus sign next to Details to see the validation results, which show the metadata records harvested and the validation errors.
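The review rule in the steps above can be written down as a small check. This is only a sketch: the `SyncSummary` class and its field names are hypothetical and simply mirror the three numbers shown on the History page; the values still have to be read from the Geoportal interface by hand.

```python
from dataclasses import dataclass


@dataclass
class SyncSummary:
    """Hypothetical container for the three numbers shown on the History page."""
    obtained: int
    validated: int
    published: int


def needs_review(summary):
    """True when some obtained records were not validated or not published."""
    return (summary.validated != summary.obtained
            or summary.published != summary.obtained)


def describe(summary):
    """Return a one-line, human-readable status for a synchronization."""
    if not needs_review(summary):
        return "All %d records were validated and published." % summary.obtained
    return ("Open View report: %d obtained, %d validated, %d published."
            % (summary.obtained, summary.validated, summary.published))


print(describe(SyncSummary(obtained=10, validated=10, published=10)))
print(describe(SyncSummary(obtained=10, validated=8, published=8)))
```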
If the harvested resource is a WAF, all valid metadata records in that folder will be harvested.
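To preview which files a WAF harvest might pick up, one can list the XML links in the folder's directory listing. The sketch below is an assumption about what such a preview could look like, not the Geoportal harvester's own logic: it treats every link ending in `.xml` as a candidate and does not check whether the metadata is valid. The `example.org` URL and file names are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class XmlLinkParser(HTMLParser):
    """Collect absolute URLs of links that end in .xml."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".xml"):
            self.links.append(urljoin(self.base_url, href))


def list_waf_records(listing_html, base_url):
    """Return candidate metadata URLs found in a directory-listing page."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    return parser.links


# Placeholder listing; a real WAF page would be fetched from the Host URL.
SAMPLE_LISTING = """<html><body>
<a href="../">Parent Directory</a>
<a href="kelp_canopy.xml">kelp_canopy.xml</a>
<a href="readme.txt">readme.txt</a>
<a href="shoreline.XML">shoreline.XML</a>
</body></html>"""

for url in list_waf_records(SAMPLE_LISTING, "http://example.org/waf/"):
    print(url)
```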
If the harvested resource is a CSW, it is possible to selectively harvest relevant records through the use of profiles in Geoportal. The UUID of each metadata record of interest must be added to a specific XSLT file on the server. The XSLT file to update is the one corresponding to the profile that you selected when registering the resource.
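Before adding a UUID to the profile file, it can help to confirm that the catalog responds and that the record exists. The sketch below builds standard CSW 2.0.2 key-value requests (`GetCapabilities` and `GetRecordById`); the endpoint and UUID are placeholders, and individual catalogs may require additional parameters such as `outputSchema`.

```python
from urllib.parse import urlencode


def csw_url(endpoint, request, **params):
    """Build a CSW 2.0.2 key-value-pair request URL for the given operation."""
    query = {"service": "CSW", "version": "2.0.2", "request": request}
    query.update(params)
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + urlencode(query)


def get_capabilities_url(endpoint):
    """URL that checks whether the catalog endpoint answers at all."""
    return csw_url(endpoint, "GetCapabilities")


def get_record_by_id_url(endpoint, uuid):
    """URL that fetches the full metadata record for one identifier."""
    return csw_url(endpoint, "GetRecordById", id=uuid, elementSetName="full")


# Placeholder endpoint and identifier, for illustration only.
ENDPOINT = "http://example.org/csw"
RECORD_UUID = "123e4567-e89b-12d3-a456-426614174000"

print(get_capabilities_url(ENDPOINT))
print(get_record_by_id_url(ENDPOINT, RECORD_UUID))
```

Opening the second URL in a browser should return the record if the UUID is correct; an empty response suggests the identifier does not match anything in that catalog.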
When there are updates or additions to metadata in a WAF or Catalog that is already registered in the WCODP, simply synchronize the resource. For a CSW, you must also add the relevant UUIDs to the profile file prior to synchronization.
After registration and harvest, the portal admin assigns additional attributes to the records using the WCGA-specific controlled vocabulary/taxonomy. This is done either by assigning records to Collections through the Geoportal admin interface, or by adding records directly to the Postgres database. These attributes are used in the Categories tab in the WCODP.