diff --git a/README.md b/README.md index 7d92a6b..ea2ea5e 100644 --- a/README.md +++ b/README.md @@ -5,21 +5,27 @@ Also supports the inport of spreadsheets that will allow for the creation of Dig ## Current Version - For versions of ArchivesSpace **before** v2.2.2: [v1.7.8](https://github.com/harvard-library/aspace-import-excel/releases/tag/v1.7.8) + For versions of ArchivesSpace **before** v2.2.2: [v1.7.8](https://github.com/tufts-digital-collections-archives/aspace-import-excel/releases/tag/v1.7.8) - NOTE: This version does *not* support the creation of Digital Objects to be associated with already-created Archival Objects. + **NOTE**: v1.7.8 does *not* support the creation of Digital Objects to be associated with already-created Archival Objects. - For ArchivesSpace **v2.2.2 and higher**: [v2.1.19](https://github.com/harvard-library/aspace-import-excel/releases/tag/v2.1.19) + For ArchivesSpace **v2.2.2 and higher**: [v3.0.2](https://github.com/tufts-digital-collections-archives/aspace-import-excel/releases/tag/v3.0.2) + ## Development -The initial version supports interactive selection of an archival object (or resource) as the starting point of the bulk upload. +This plugin supports interactive selection of an archival object (or resource) as the starting point of the bulk upload. + +Version 3.0 incorporates new functionality for uploading archival objects (described in the [user documentation](user_documentation/archival_objects_instructions.md)), which supports the use of an [expansion](templates/extended_aspace_import_excel_template.xlsx) to the [original](templates/aspace_import_excel_template.xlsx) Excel template. Version 3.0 is, however, backward compatible, so that users whose workflow is satisfied with the original template can continue to use it. ### Bulk upload/creation of Archival Objects -The Excel template will be found in the templates/ folder as [**aspace_import_excel_template.xlsx**](/templates/aspace_import_excel_template.xlsx). 
+The Excel templates can be found in the templates/ folder: + * *New in v3.0*: [**extended_aspace_import_excel_template.xlsx**](templates/extended_aspace_import_excel_template.xlsx) + + * [**aspace_import_excel_template.xlsx**](templates/aspace_import_excel_template.xlsx). -The intention is not to completely reproduce a Finding Aid as presented in an EAD XML, or to allow for every permutation of Archival Object creation within ArchivesSpace. We are aiming for the "80% rule"; that is, at least 80% of the work that would be done interactively can be replaced by an excel spreadsheet; additional refinements to individual archival objects (such as addition of agents-as-subjects, assignment of locations to top-level containers, etc.) would take place interactively. +The intention is not to completely reproduce a Finding Aid as presented in an EAD XML, or to allow for every permutation of Archival Object creation within ArchivesSpace. We are aiming for the "80% rule"; that is, at least 80% of the work that would be done interactively can be replaced by an Excel spreadsheet; additional refinements to individual archival objects (such as assignment of locations to top-level containers) would take place interactively. See the [user documentation](user_documentation/USER_DOCUMENTATION.md) for more information. @@ -27,7 +33,7 @@ See the [user documentation](user_documentation/USER_DOCUMENTATION.md) for more **This functionality is turned on by default** See the Installation instructions for turning it off. -The Excel template will be found in the templates/ folder as [**aspace_import_excel_DO_template.xlsx**](/templates/aspace_import_excel_DO_template.xlsx). +The Excel template can be found in the templates/ folder as [**aspace_import_excel_DO_template.xlsx**](templates/aspace_import_excel_DO_template.xlsx).
As with the original development, we are not completely reproducing all the functionality of ArchivesSpace: only one Digital Object, which can have either or both of one: + File with an *Xlink Actuate Attribute* of **onLoad** and an *Xlink Show Attribute* of **embed** @@ -54,9 +60,11 @@ to ```bash AppConfig[:hide_do_load] = true ``` -3. **IF** you are running ArchivesSpace on Windows: +3. **IF** you are running a version of ArchivesSpace *lower* than **2.6.0** on Windows: - There currently is a problem with Bundler versioning. Until a new version of ArchivesSpace is released that contains a fix to the *initialize-plugin.bat* script, copy + These versions have a problem with Bundler versioning in the *initialize-plugin.bat* script. + +Copy ``` archivesspace\aspace-import-excel\extras\modified_initialize-plugin.bat ``` @@ -65,15 +73,23 @@ to archivesspace\scripts ``` + **UPDATE**: You no longer need to use this modified .bat script **if** you are running ArchivesSpace 2.6.0 or higher. + + 4. Run the initializer script: * for Linux, that's ```bash scripts/initialize-plugin.sh aspace-import-excel ``` - * for Windows, that's + * for Windows, running an ArchivesSpace version **lower than 2.6.0**, that's ``` scripts\modified_initialize-plugin.bat aspace-import-excel ``` + Otherwise, for Windows running ArchivesSpace version **2.6.0** or higher: + ``` + scripts\initialize-plugin.bat aspace-import-excel + ``` + 5. In the **common/config/config.rb** file, add 'aspace-import-excel' to the `AppConfig[:plugins]` array. 6.
Stop and restart ArchivesSpace @@ -96,8 +112,10 @@ User documentation is [available](user_documentation/USER_DOCUMENTATION.md) ## Contributors -* Bobbi Fox: https://github.com/bobbi-SMR (maintainer) -* Robin Wendler: https://github.com/rwendler -* Julie Wetherill: https://github.com/juliewetherill -* h/t to Chintan Desai: https://github.com/cdesai-qi for catching inconsistencies +* Bobbi Fox: [@bobbi-SMR](https://github.com/bobbi-SMR) (maintainer) +* Robin Wendler: [@rwendler](https://github.com/rwendler) +* Julie Wetherill: [@juliewetherill](https://github.com/juliewetherill) +* Adrienne Pruitt: [@adriennepruitt2](https://github.com/adriennepruitt2) +* Dave Mayo: [@pobocks](https://github.com/pobocks) +* h/t to Chintan Desai: [@cdesai-qi](https://github.com/cdesai-qi) for catching inconsistencies diff --git a/VERSION b/VERSION index d6a686e..96506fd 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -v2.1.17 \ No newline at end of file +v3.0.2 diff --git a/frontend/controllers/concerns/linked_objects.rb b/frontend/controllers/concerns/linked_objects.rb index d0e0162..1fa0a27 100644 --- a/frontend/controllers/concerns/linked_objects.rb +++ b/frontend/controllers/concerns/linked_objects.rb @@ -2,295 +2,14 @@ module LinkedObjects extend ActiveSupport::Concern -# This module incorporates all the classes needed to handle objects that must be linked to -# Archival Objects, such as Subjects, Top Containers, etc. - +# This module originally incorporated all the classes needed to handle objects that must be linked to +# Archival Objects, such as Subjects, Top Containers, etc. 
These classes have been refactored out, and +# can be found in aspace-import-excel/frontend/models # a lot of this is adapted from Hudson Mlonglo's Arrearage plugin: -#https://github.com/hudmol/nla_staff_spreadsheet_importer/blob/master/backend/converters/arrearage_converter.rb - - - class AgentHandler < Handler - @@agents = {} - @@agent_relators ||= EnumList.new('linked_agent_archival_record_relators') - AGENT_TYPES = { 'families' => 'family', 'corporate_entities' => 'corporate_entity', 'people' => 'person'} - def self.renew - clear(@@agent_relators) - end - def self.key_for(agent) - key = "#{agent[:type]} #{agent[:name]}" - key - end - - def self.build(row, type, num) - id = row.fetch("#{type}_agent_record_id_#{num}", nil) - input_name = row.fetch("#{type}_agent_header_#{num}",nil) - { - :type => AGENT_TYPES[type], - :id => id, - :name => input_name || (id ? I18n.t('plugins.aspace-import-excel.unfound_id', :id => id, :type => 'Agent') : nil), - :relator => row.fetch("#{type}_agent_relator_#{num}", nil), - :id_but_no_name => id && !input_name - } - end - - def self.get_or_create(row, type, num, resource_uri, report) - agent = build(row, type, num) - agent_key = key_for(agent) - if !(agent_obj = stored(@@agents, agent[:id], agent_key)) - unless agent[:id].blank?
- begin - agent_obj = JSONModel("agent_#{agent[:type]}".to_sym).find(agent[:id]) - rescue Exception => e - if e.message != 'RecordNotFound' -# Pry::ColorPrinter.pp e - raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) - end - end - end - begin - unless agent_obj || (agent_obj = get_db_agent(agent, resource_uri, num)) - agent_obj = create_agent(agent, num) - report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.agent')}[#{agent[:name]}]", :id => agent_obj.uri)) - end - rescue Exception => e -# Pry::ColorPrinter.pp e.message -# Pry::ColorPrinter.pp e.backtrace - raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) - end - end - agent_link = nil - if agent_obj - if agent[:id_but_no_name] - @@agents[agent[:id].to_s] = agent_obj - else - @@agents[agent_obj.id.to_s] = agent_obj - end - @@agents[agent_key] = agent_obj - agent_link = {"ref" => agent_obj.uri, "role" => 'creator'} - begin - agent_link["relator"] = @@agent_relators.value(agent[:relator]) if !agent[:relator].blank? - rescue Exception => e - if e.message.start_with?("NOT FOUND") - raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.bad_relator', :label => agent[:relator])) - else - raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.relator_invalid', :label => agent[:relator], :why => e.message)) - end - end - end - agent_link - end - - def self.create_agent(agent, num) - begin - ret_agent = JSONModel("agent_#{agent[:type]}".to_sym).new._always_valid! 
- ret_agent.names = [name_obj(agent)] - ret_agent.publish = !agent[:id_but_no_name] - ret_agent.save - rescue Exception => e - raise Exception.new(I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) - end - ret_agent - end - - def self.get_db_agent(agent, resource_uri, num) - ret_ag = nil - if agent[:id] - begin - ret_ag = JSONModel("agent_#{agent[:type]}".to_sym).find(agent[:id]) - rescue Exception => e - if e.message != 'RecordNotFound' -# Pry::ColorPrinter.pp e.message -# Pry::ColorPrinter.pp e.backtrace - raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) - end - end - end - if !ret_ag - a_params = {"q" => "title:\"#{agent[:name]}\" AND primary_type:agent_#{agent[:type]}"} - repo = resource_uri.split('/')[2] - ret_ag = search(repo, a_params, "agent_#{agent[:type]}".to_sym) - end - ret_ag - end - - def self.name_obj(agent) - obj = JSONModel("name_#{agent[:type]}".to_sym).new._always_valid! - obj.source = 'ingest' - obj.authorized = true - obj.is_display_name = true - if agent[:type] == 'family' - obj.family_name = agent[:name] - else - obj.primary_name = agent[:name] - obj.name_order = 'direct' if agent[:type] == 'person' - end - obj - end - end # agent - - class DigitalObjectHandler < Handler - @@digital_object_types ||= EnumList.new('digital_object_digital_object_type') - - def self.create(row, archival_object, report) - dig_o = nil - dig_instance = nil - thumb = row['thumbnail'] || row['Thumbnail'] - unless !thumb && !row['digital_object_link'] - files = [] - if !row['digital_object_link'].blank? && row['digital_object_link'].start_with?('http') - fv = JSONModel(:file_version).new._always_valid! - fv.file_uri = row['digital_object_link'] - fv.publish = row['publish'] - fv.xlink_actuate_attribute = 'onRequest' - fv.xlink_show_attribute = 'new' - files.push fv - end - if !thumb.blank? 
&& thumb.start_with?('http') - fv = JSONModel(:file_version).new._always_valid! - fv.file_uri = thumb - fv.publish = row['publish'] - fv.xlink_actuate_attribute = 'onLoad' - fv.xlink_show_attribute = 'embed' - fv.is_representative = true - files.push fv - end - osn = row['digital_object_id'].blank? ? (archival_object.ref_id + 'd') : row['digital_object_id'] - dig_o = JSONModel(:digital_object).new._always_valid! - dig_o.title = row['digital_object_title'].blank? ? archival_object.display_string : row['digital_object_title'] - dig_o.digital_object_id = osn - dig_o.file_versions = files - dig_o.publish = row['publish'] - begin - dig_o.save - rescue ValidationException => ve - report.add_errors(I18n.t('plugins.aspace-import-excel.error.dig_validation', :err => ve.errors)) - return nil - rescue Exception => e - raise e - end - report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>I18n.t('plugins.aspace-import-excel.dig'), :id => "'#{dig_o.title}' #{dig_o.uri} [#{dig_o.digital_object_id}]")) - dig_instance = JSONModel(:instance).new._always_valid! 
- dig_instance.instance_type = 'digital_object' - dig_instance.digital_object = {"ref" => dig_o.uri} - end - dig_instance - end - - def self.renew - clear(@@digital_object_types) - end - end # DigitalObjectHandler - -# one of the differences is that we don't care about location, and we do lookup against the database - - class ContainerInstanceHandler < Handler - - @@top_containers = {} - @@container_types ||= EnumList.new('container_type') - @@instance_types ||= EnumList.new('instance_instance_type') # for when we move instances over here - - - def self.renew - clear( @@container_types) - clear(@@instance_types) - end - - def self.key_for(top_container, resource) - key = "'#{resource}' #{top_container[:type]}: #{top_container[:indicator]}" - key += " #{top_container[:barcode]}" if top_container[:barcode] - key - end +# https://github.com/hudmol/nla_staff_spreadsheet_importer/blob/master/backend/converters/arrearage_converter.rb - - def self.build(row) - { - :type => @@container_types.value(row.fetch('type_1', 'Box') || 'Box'), - :indicator => row.fetch('indicator_1', 'Unknown') || 'Unknown', - :barcode => row.fetch('barcode',nil) - } - end - - # returns a top container JSONModel - def self.get_or_create(row, resource, report) - begin - top_container = build(row) - tc_key = key_for(top_container, resource) - # check to see if we already have fetched one from the db, or created one. - existing_tc = @@top_containers.fetch(tc_key, false) || get_db_tc(top_container, resource) - if !existing_tc - tc = JSONModel(:top_container).new._always_valid! 
- tc.type = top_container[:type] - tc.indicator = top_container[:indicator] - tc.barcode = top_container[:barcode] if top_container[:barcode] - tc.repository = {'ref' => resource.split('/')[0..2].join('/')} - # UpdateUtils.test_exceptions(tc,'top_container') - tc.save - report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.tc')} [#{tc.type} #{tc.indicator}]", :id=> tc.uri)) - existing_tc = tc - end - rescue Exception => e - report.add_errors(I18n.t('plugins.aspace-import-excel.error.no_tc', :why => e.message + " in linked_objects")) - existing_tc = nil - end - @@top_containers[tc_key] = existing_tc if existing_tc - existing_tc - end - - def self.get_db_tc(top_container, resource_uri) - repo_id = resource_uri.split('/')[2] - if !(ret_tc = get_db_tc_by_barcode(top_container[:barcode], repo_id)) - tc_str = "#{top_container[:type]} #{top_container[:indicator]}" - tc_str += ": [#{top_container[:barcode]}]" if top_container[:barcode] - tc_params = {} - tc_params["type[]"] = 'top_container' - tc_params["q"] = "display_string:\"#{tc_str}\" AND collection_uri_u_sstr:\"#{resource_uri}\"" - ret_tc = search(repo_id,tc_params, :top_container) - end - ret_tc - end - - def self.get_db_tc_by_barcode(barcode, repo_id) - ret_tc = nil - if barcode - tc_params = {} - tc_params["type[]"] = 'top_container' - tc_params["q"] = "barcode_u_sstr:#{barcode}" - ret_tc = search(repo_id,tc_params, :top_container) - end - ret_tc - end - - - def self.create_container_instance(row, resource_uri,report) - instance = nil - raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.missing_instance_type')) if row['cont_instance_type'].blank? 
- if row['type_1'] - begin - tc = get_or_create(row, resource_uri, report) - sc = {'top_container' => {'ref' => tc.uri}, - 'jsonmodeltype' => 'sub_container'} - %w(2 3).each do |num| - if row["type_#{num}"] - sc["type_#{num}"] = @@container_types.value(row["type_#{num}"]) - sc["indicator_#{num}"] = row["indicator_#{num}"] || 'Unknown' - end - end - instance = JSONModel(:instance).new._always_valid! - instance.instance_type = @@instance_types.value(row['cont_instance_type']) - instance.sub_container = JSONModel(:sub_container).from_hash(sc) - rescue ExcelImportException => ee - instance = nil - raise ee - rescue Exception => e - msg = e.message #+ "\n" + e.backtrace()[0] - instance = nil - raise ExcelImportException.new(msg) - end - end - instance - end +# ParentTracker, used to keep track of hierarchy, remains in this module - end # of container handler #shamelessly stolen (and adapted from HM's nla_staff_spreadsheet plugin :-) class ParentTracker @@ -317,87 +36,4 @@ def parent_for(hier) end end #of ParentTracker - class SubjectHandler < Handler - @@subjects = {} # will track both confirmed ids, and newly created ones. - @@subject_term_types ||= EnumList.new('subject_term_type') - @@subject_sources ||= EnumList.new('subject_source') - - def self.renew - clear(@@subject_term_types) - clear(@@subject_sources) - end - - def self.key_for(subject) - key = "#{subject[:term]} #{subject[:source]}: #{subject[:type]}" - key - end - def self.build(row, num) - id = row.fetch("subject_#{num}_record_id", nil) - input_term = row.fetch("subject_#{num}_term", nil) - { - :id => id, - :term => input_term || (id ? 
I18n.t('plugins.aspace-import-excel.unfound_id', :id => id, :type => 'subject') : nil), - :type => @@subject_term_types.value(row.fetch("subject_#{num}_type") || 'topical'), - :source => @@subject_sources.value( row.fetch("subject_#{num}_source") || 'ingest'), - :id_but_no_term => id && !input_term - } - end - - def self.get_or_create(row, num, repo_id, report) - subject = build(row, num) - subject_key = key_for(subject) - if !(subj = stored(@@subjects, subject[:id], subject_key)) - unless subject[:id].blank? - begin - subj = JSONModel(:subject).find( subject[:id]) - rescue Exception => e - if e.message != 'RecordNotFound' - raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) - end - end - end - begin - unless subj || (subj = get_db_subj(subject)) - subj = create_subj(subject, num) - report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.subj')}[#{subject[:term]}]", :id => subj.uri)) - end - rescue Exception => e - raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) - end - if subj - if subj[:id_but_no_term] - @@subjects[subject[:id].to_s] = subj - else - @@subjects[subj.id.to_s] = subj - end - @@subjects[subject_key] = subj - end - end - subj - end - - def self.create_subj(subject, num) - begin - term = JSONModel(:term).new._always_valid! - term.term = subject[:term] - term.term_type = subject[:type] - term.vocabulary = '/vocabularies/1' # we're making a gross assumption here - subj = JSONModel(:subject).new._always_valid! 
- subj.terms.push term - subj.source = subject[:source] - subj.vocabulary = '/vocabularies/1' # we're making a gross assumption here - subj.save - rescue Exception => e - raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) - end - subj - end - - def self.get_db_subj(subject) - s_params = {} - s_params["q"] = "title:\"#{subject[:term]}\" AND first_term_type:#{subject[:type]}" - - ret_subj = search(nil, s_params, :subject, 'subjects') - end - end end diff --git a/frontend/controllers/resources_updates_controller.rb b/frontend/controllers/resources_updates_controller.rb index 4cba82a..56dfa20 100644 --- a/frontend/controllers/resources_updates_controller.rb +++ b/frontend/controllers/resources_updates_controller.rb @@ -84,6 +84,11 @@ def load_ss if (row[0] && (row[0].value.to_s =~ @start_marker) || row[2] && row[2].value == 'ead') #FIXME: TEMP FIX @headers = row_values(row) + begin + check_for_code_dups + rescue Exception => e + raise StopExcelImportException.new(e.message) + end # Skip the human readable header too rows.next @counter += 1 # for the skipping @@ -208,7 +213,7 @@ def check_row begin label = @date_labels.value((@row_hash['dates_label'] || 'creation')) rescue Exception => e - err_arr.push I18n.t('plugins.aspace-import-excel.error.invalid_date', :what => e.message) + err_arr.push I18n.t('plugins.aspace-import-excel.error.invalid_date_label', :what => e.message) if missing_title missing_date = true end end @@ -235,17 +240,27 @@ def check_row err_arr.join('; ') end + def check_for_code_dups + test = {} + dups = "" + @headers.each do |head| + if test[head] + dups = "#{dups} #{head}," + else + test[head] = true + end + end + if !dups.blank? + raise Exception.new( I18n.t('plugins.aspace-import-excel.error.duplicates', :codes => dups)) + end + return (dups.blank?) + end + # create an archival_object def create_archival_object(parent_uri) ao = JSONModel(:archival_object).new._always_valid! 
ao.title = @row_hash['title'] if @row_hash['title'] - unless [@row_hash['begin'],@row_hash['end'],@row_hash['expression']].reject(&:blank?).empty? - begin - ao.dates = create_date - rescue Exception => e - @report.add_errors(I18n.t('plugins.aspace-import-excel.error.invalid_date', :what => e.message)) - end - end + ao.dates = create_dates #because the date may have been invalid, we should check if there's a title, otherwise bail if ao.title.blank? && ao.dates.blank? raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.title_and_date')) @@ -259,7 +274,7 @@ def create_archival_object(parent_uri) ao.restrictions_apply = @row_hash['restrictions_flag'] ao.parent = {'ref' => parent_uri} unless parent_uri.blank? begin - ao.extents = create_extent unless @row_hash['number'].blank? && @row_hash['extent_type'].blank? && @row_hash['portion'].blank? + ao.extents = create_extents rescue Exception => e @report.add_errors(e.message) end @@ -267,13 +282,13 @@ def create_archival_object(parent_uri) @report.add_errors(errs) if !errs.blank? # we have to save the ao for the display_string begin + #Rails.logger.debug(ao.pretty_inspect) ao = ao_save(ao) rescue Exception => e msg = I18n.t('plugins.aspace-import-excel.error.initial_save_error', :title =>ao.title, :msg => e.message) raise ExcelImportException.new(msg) end - instance = create_top_container_instance - ao.instances = [instance] if instance + ao.instances = create_top_container_instances if (dig_instance = DigitalObjectHandler.create(@row_hash, ao, @report)) ao.instances ||= [] ao.instances << dig_instance @@ -284,66 +299,109 @@ def create_archival_object(parent_uri) ao.linked_agents = links ao end + + def create_dates + dates = [] + cntr = 1 + substr = '' + until [@row_hash["begin#{substr}"],@row_hash["end#{substr}"],@row_hash["expression#{substr}"]].reject(&:blank?).empty? 
+ date = create_date(substr) + dates << date if date + cntr +=1 + substr = "_#{cntr}" + end + return dates + end - def create_date + def create_date(substr) + date_str = "(Date: type:#{@row_hash["date_type#{substr}"]}, label: #{@row_hash["dates_label#{substr}"]}, begin: #{@row_hash["begin#{substr}"]}, end: #{@row_hash["end#{substr}"]}, expression: #{@row_hash["expression#{substr}"]})" date_type = 'inclusive' begin - date_type = @date_types.value(@row_hash['date_type'] || 'inclusive') + date_type = @date_types.value(@row_hash["date_type#{substr}"] || 'inclusive') rescue Exception => e - @report.add_errors(I18n.t('plugins.aspace-import-excel.error.date_type', :what => @row_hash['date_type'])) + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.date_type', :what => @row_hash["date_type#{substr}"],:date_str => date_str )) end - date = { 'date_type' => date_type, - 'label' => @date_labels.value((@row_hash['dates_label'] || 'creation')) } - if @row_hash['date_certainty'] + begin + date = { 'date_type' => date_type, + 'label' => @date_labels.value((@row_hash["dates_label#{substr}"] || 'creation')) } + rescue Exception => e + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.date_label', + :what => @row_hash["dates_label#{substr}"],:date_str => date_str)) + # don't bother processing if the label doesn't match + return nil + end + + if @row_hash["date_certainty#{substr}"] begin - date['certainty'] = @date_certainty.value(@row_hash['date_certainty']) + date['certainty'] = @date_certainty.value(@row_hash["date_certainty#{substr}"]) rescue Exception => e - @report.add_errors(I18n.t('plugins.aspace-import-excel.error.certainty', :what => e.message)) + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.certainty', :what => e.message,:date_str => date_str)) end end %w(begin end expression).each do |w| - date[w] = @row_hash[w] if @row_hash[w] + date[w] = @row_hash["#{w}#{substr}"] if @row_hash["#{w}#{substr}"] end invalids =
JSONModel::Validations.check_date(date) unless invalids.blank? - err_msg = '' + err_msg = "" invalids.each do |inv| err_msg << " #{inv[0]}: #{inv[1]}" end - raise Exception.new(err_msg) + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.invalid_date', :what => err_msg,:date_str => date_str)) + return nil + end + if date_type == "single" && !date["end"].blank? + @report.add_errors(I18n.t('plugins.aspace-import-excel.warn.single_date_end', :date_str => date_str)) end d = JSONModel(:date).new(date) - [d] + #[d] end - def create_extent + def create_extent(substr) + ext_str = "Extent: #{@row_hash["portion#{substr}"] || 'whole'} #{@row_hash["number#{substr}"]} #{@row_hash["extent_type#{substr}"]} #{@row_hash["container_summary#{substr}"]} #{@row_hash["physical_details#{substr}"]} #{@row_hash["dimensions#{substr}"]}" begin - extent = {'portion' => @extent_portions.value(@row_hash['portion'] || 'whole'), - 'extent_type' => @extent_types.value((@row_hash['extent_type']))} + extent = {'portion' => @extent_portions.value(@row_hash["portion#{substr}"] || 'whole'), + 'extent_type' => @extent_types.value((@row_hash["extent_type#{substr}"]))} %w(number container_summary physical_details dimensions).each do |w| - extent[w] = @row_hash[w] || nil + extent[w] = @row_hash["#{w}#{substr}"] || nil end ex = JSONModel(:extent).new(extent) if UpdatesUtils.test_exceptions(ex, "Extent") - return [ex] + return ex end rescue Exception => e - raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.extent_validation', :msg => e.message)) + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.extent_validation', :msg => e.message, :ext => ext_str)) + return nil end end - - def create_top_container_instance - instance = nil - unless @row_hash['cont_instance_type'].blank? && @row_hash['type_1'].blank? + def create_extents + extents = [] + cntr = 1 + substr = '' + until @row_hash["number#{substr}"].blank? && @row_hash["extent_type#{substr}"].blank? 
+ extent = create_extent(substr) + extents << extent if extent + cntr +=1 + substr = "_#{cntr}" + end + return extents + end + def create_top_container_instances + instances = [] + cntr = 1 + substr = '' + until @row_hash["cont_instance_type#{substr}"].blank? && @row_hash["type_1#{substr}"].blank? && @row_hash["barcode#{substr}"].blank? begin - instance = ContainerInstanceHandler.create_container_instance(@row_hash, @resource['uri'], @report) - rescue ExcelImportException => ee - @report.add_errors(I18n.t('plugins.aspace-import-excel.error.no_container_instance', :why =>ee.message)) + instance = ContainerInstanceHandler.create_container_instance(@row_hash, substr, @resource['uri'], @report) rescue Exception => e - @report.add_errors(I18n.t('plugins.aspace-import-excel.error.no_tc', :why => e.message)) + @report.add_errors(I18n.t('plugins.aspace-import-excel.error.no_tc', :num=> cntr,:why=>e.message)) + instance = nil end + cntr +=1 + substr = "_#{cntr}" + instances << instance if instance end - instance + return instances end def fetch_archival_object(ref_id) @@ -362,8 +420,8 @@ def fetch_archival_object(ref_id) parsed = JSONModel.parse_reference(aos[0]) begin ao = JSONModel(:archival_object).find(parsed[:id], :repo_id => @repo_id) -Rails.logger.info "ao JSONMODEL" -Rails.logger.info {ao.pretty_inspect} +# Rails.logger.info "ao JSONMODEL" +# Rails.logger.info {ao.pretty_inspect} rescue Exception => e Rails.logger.info {e.pretty_inspect} end @@ -382,7 +440,13 @@ def handle_notes(ao) type = key.match(/n_(.+)$/)[1] note_type = @note_types[type] note = JSONModel(note_type[:target]).new - note.publish = publish + pubnote = @row_hash["p_#{type}"] + if pubnote.blank? 
+ pubnote = publish + else + pubnote = (pubnote == '1') + end + note.publish = pubnote note.type = note_type[:value] begin wellformed(content) @@ -390,7 +454,7 @@ def handle_notes(ao) if note_type[:target] == :note_multipart inner_note = JSONModel(:note_text).new inner_note.content = content - inner_note.publish = publish + inner_note.publish = pubnote note.subnotes.push inner_note else note.content.push content @@ -467,18 +531,19 @@ def move_archival_objects def process_agents agent_links = [] %w(people corporate_entities families).each do |type| - (1..10).each do |num| + num = 1 + while true id_key = "#{type}_agent_record_id_#{num}" header_key = "#{type}_agent_header_#{num}" - unless @row_hash[id_key].blank? && @row_hash[header_key].blank? - link = nil - begin - link = AgentHandler.get_or_create(@row_hash, type, num.to_s, @resource['uri'], @report) - agent_links.push link if link - rescue ExcelImportException => e - @report.add_errors(e.message) - end + break if @row_hash[id_key].blank? && @row_hash[header_key].blank? + link = nil + begin + link = AgentHandler.get_or_create(@row_hash, type, num.to_s, @resource['uri'], @report) + agent_links.push link if link + rescue ExcelImportException => e + @report.add_errors(e.message) end + num += 1 end end agent_links @@ -590,7 +655,7 @@ def find_agent(primary_name, rest_name, type, source, ext_id) # use nokogiri if there seems to be an XML element (or element closure); allow exceptions to bubble up def wellformed(note) if note.match("") - frag = Nokogiri::XML("#{note}") {|config| config.strict} + frag = Nokogiri::XML("#{note}") {|config| config.strict} end end diff --git a/frontend/locales/en.yml b/frontend/locales/en.yml index 82ed8c4..3800b54 100644 --- a/frontend/locales/en.yml +++ b/frontend/locales/en.yml @@ -28,18 +28,23 @@ en: ref_id_notfound: "Ref Id %{refid} not found" warn: dup: "Managed Controlled Value List %{which} has multiple instances for the Translation '%{trans}'. '%{used}' will be used as the value." 
+ disam: "Multiple match(es) found. Creating %{name} for disambiguation." + single_date_end: "Single date %{date_str} has an end date that will be ignored." error: - date_type: "Date type [%{what}] invalid. Defaulting to 'inclusive'" - certainty: "Invalid 'date certainty' ignored: (%{what})" + date_type: "Date type [%{what}] invalid for %{date_str}. Defaulting to 'inclusive'" + date_label: "Date label [%{what}] invalid for %{date_str}. The date will not be processed." + certainty: "Invalid 'date certainty' ignored for %{date_str}: (%{what})" below_bad_ao: Cannot process because it's a child of the bad archival object enum: "NOT FOUND: '%{label}' not found in list %{which}" - invalid_date: "Invalid date definition (%{what})" + invalid_date: "Invalid date definition (%{what}) for %{date_str}. The date will not be processed." + invalid_date_label: "Invalid date label definition in first date (%{what})" too_many: More than one match found in the database type_undef: Unable to determine type file_name: File name cannot be determined system: "Some system error has occurred [%{msg}]." initialize: "Processing is terminated [%{msg}]" stopped: "Processing stopped at row %{row} [%{msg}]" + duplicates: "This spreadsheet has duplicate ArchivesSpace field codes: %{codes}" res_ead: This form's Resource is missing an EAD ID row_ead: This row is missing an EAD ID ead_mismatch: "Form's EAD ID [%{res_ead}] does not match row's EAD ID [%{row_ead}]" @@ -54,19 +59,22 @@ en: date: "Date must have at least one of: Date begin; Date end; or Date expression" number: Missing Extent number extent_type: Missing Extent type - extent_validation: "Unable to validate extent: %{msg}" + extent_validation: "Unable to validate extent (%{ext}): %{msg}" no_header: No header (field codes) row found; are you using the correct template? no_data: No processible data rows found!
excel: "Error(s) parsing Excel File %{errs}" no_agent: "Unable to create Agent %{num}: [%{why}]" - no_tc: "Unable to create Top Container: [%{why}]" - missing_instance_type: Missing instance type + no_tc: "Unable to create Top Container %{num}: [%{why}]" + missing_instance_type: Missing container instance type no_container_instance: "Unable to create Container Instance: [%{why}]" no_subject: "Unable to create Subject %{num}: [%{why}]" no_move: "Unable to move the archival objects from the end of the list (response code %{code})" bad_note: "%{type} note is not wellformed: %{msg}" bad_relator: "Unable to create agent link: '%{label}' is not a valid relator" relator_invalid: "Unable to create agent link due to problem with relator '%{label}': %{why}" + bad_role: "Unable to create agent link: '%{label}' is not a valid role" + role_invalid: "Unable to create agent link due to problem with role '%{label}': %{why}" + has_dig_obj: "Archival object already has an associated digital object" dig_unassoc: "Unable to save archival object with associated digital object: %{msg}" ref_id_miss: No Ref Id specified diff --git a/frontend/models/agent_handler.rb b/frontend/models/agent_handler.rb new file mode 100644 index 0000000..0edea64 --- /dev/null +++ b/frontend/models/agent_handler.rb @@ -0,0 +1,141 @@ + class AgentHandler < Handler + @@agents = {} + @@agent_role ||= EnumList.new('linked_agent_role') + @@agent_relators ||= EnumList.new('linked_agent_archival_record_relators') + AGENT_TYPES = { 'families' => 'family', 'corporate_entities' => 'corporate_entity', 'people' => 'person'} + def self.renew + clear(@@agent_relators) + clear(@@agent_role) + @@agents = {} + end + def self.key_for(agent) + key = "#{agent[:type]} #{agent[:name]}" + key + end + + def self.build(row, type, num) + id = row.fetch("#{type}_agent_record_id_#{num}", nil) + input_name = row.fetch("#{type}_agent_header_#{num}",nil) + role = row.fetch("#{type}_agent_role_#{num}", nil) + role ='creator' if role.blank? 
+ { + :type => AGENT_TYPES[type], + :id => id, + :name => input_name || (id ? I18n.t('plugins.aspace-import-excel.unfound_id', :id => id, :type => 'Agent') : nil), + :role => role, + :relator => row.fetch("#{type}_agent_relator_#{num}", nil) , + :id_but_no_name => id && !input_name + } + end + + def self.get_or_create(row, type, num, resource_uri, report) + agent = build(row, type, num) + agent_key = key_for(agent) + if !(agent_obj = stored(@@agents, agent[:id], agent_key)) + unless agent[:id].blank? + begin + agent_obj = JSONModel("agent_#{agent[:type]}".to_sym).find(agent[:id]) + rescue Exception => e + if e.message != 'RecordNotFound' + raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) + end + end + end + begin + if !agent_obj + begin + agent_obj = get_db_agent(agent, resource_uri, num) + rescue Exception => e + if e.message == 'More than one match found in the database' + agent[:name] = agent[:name] + DISAMB_STR + report.add_info(I18n.t('plugins.aspace-import-excel.warn.disam', :name => agent[:name])) + else + raise e + end + end + end + if !agent_obj + agent_obj = create_agent(agent, num) + report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.agent')}[#{agent[:name]}]", :id => agent_obj.uri)) + end + rescue Exception => e + raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) + end + end + agent_link = nil + if agent_obj + if agent[:id_but_no_name] + @@agents[agent[:id].to_s] = agent_obj + else + @@agents[agent_obj.id.to_s] = agent_obj + end + @@agents[agent_key] = agent_obj + agent_link = {"ref" => agent_obj.uri} + begin + agent_link["role"] = @@agent_role.value(agent[:role]) + rescue Exception => e + if e.message.start_with?("NOT FOUND") + raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.bad_role', :label => agent[:role])) + else + raise 
ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.role_invalid', :label => agent[:role], :why => e.message)) + end + end + begin + agent_link["relator"] = @@agent_relators.value(agent[:relator]) if !agent[:relator].blank? + rescue Exception => e + if e.message.start_with?("NOT FOUND") + raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.bad_relator', :label => agent[:relator])) + else + raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.relator_invalid', :label => agent[:relator], :why => e.message)) + end + end + end + agent_link + end + + def self.create_agent(agent, num) + begin + ret_agent = JSONModel("agent_#{agent[:type]}".to_sym).new._always_valid! + ret_agent.names = [name_obj(agent)] + ret_agent.publish = !(agent[:id_but_no_name] || agent[:name].ends_with?(DISAMB_STR)) + ret_agent.save + rescue Exception => e + raise Exception.new(I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) + end + ret_agent + end + + def self.get_db_agent(agent, resource_uri, num) + ret_ag = nil + if agent[:id] + begin + ret_ag = JSONModel("agent_#{agent[:type]}".to_sym).find(agent[:id]) + rescue Exception => e + if e.message != 'RecordNotFound' + raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_agent', :num => num, :why => e.message)) + end + end + end + if !ret_ag + a_params = {"q" => "title:\"#{agent[:name]}\" AND primary_type:agent_#{agent[:type]}"} + repo = resource_uri.split('/')[2] + ret_ag = search(repo, a_params, "agent_#{agent[:type]}".to_sym,'', "title:#{agent[:name]}") + end + ret_ag + end + + def self.name_obj(agent) + obj = JSONModel("name_#{agent[:type]}".to_sym).new._always_valid! 
+ obj.source = 'ingest' + obj.authorized = true + obj.is_display_name = true + if agent[:type] == 'family' + obj.family_name = agent[:name] + else + obj.primary_name = agent[:name] + obj.name_order = 'direct' if agent[:type] == 'person' + end + obj + end + end # agent + diff --git a/frontend/models/container_instance_handler.rb b/frontend/models/container_instance_handler.rb new file mode 100644 index 0000000..44aa0f4 --- /dev/null +++ b/frontend/models/container_instance_handler.rb @@ -0,0 +1,102 @@ +# Supporting multiple containers in the row + +class ContainerInstanceHandler < Handler + @@top_containers = {} + @@container_types ||= EnumList.new('container_type') + @@instance_types ||= EnumList.new('instance_instance_type') # for when we move instances over here + + def self.renew + clear( @@container_types) + clear(@@instance_types) + end + + def self.key_for(top_container, resource) + key = "'#{resource}' #{top_container[:type]}: #{top_container[:indicator]}" + key += " #{top_container[:barcode]}" if top_container[:barcode] + key + end + + def self.build(row,substr) + { + :type => @@container_types.value(row.fetch("type_1#{substr}", 'Box') || 'Box'), + :indicator => row.fetch("indicator_1#{substr}", 'Unknown') || 'Unknown', + :barcode => row.fetch("barcode#{substr}",nil) + } + end + + # returns a top container JSONModel + def self.get_or_create(row, substr, resource, report) + begin + top_container = build(row, substr) + tc_key = key_for(top_container, resource) + # check to see if we already have fetched one from the db, or created one. + existing_tc = @@top_containers.fetch(tc_key, false) || get_db_tc(top_container, resource) + if !existing_tc + tc = JSONModel(:top_container).new._always_valid! 
+ tc.type = top_container[:type] + tc.indicator = top_container[:indicator] + tc.barcode = top_container[:barcode] if top_container[:barcode] + tc.repository = {'ref' => resource.split('/')[0..2].join('/')} + # UpdateUtils.test_exceptions(tc,'top_container') + tc.save + report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.tc')} [#{tc.type} #{tc.indicator}]", :id=> tc.uri)) + existing_tc = tc + end + rescue Exception => e + report.add_errors(I18n.t('plugins.aspace-import-excel.error.no_tc', :why => e.message + " in linked_objects")) + existing_tc = nil + end + @@top_containers[tc_key] = existing_tc if existing_tc + existing_tc + end + + def self.get_db_tc(top_container, resource_uri) + repo_id = resource_uri.split('/')[2] + if !(ret_tc = get_db_tc_by_barcode(top_container[:barcode], repo_id)) + tc_str = "#{top_container[:type]} #{top_container[:indicator]}" + tc_str += ": [#{top_container[:barcode]}]" if top_container[:barcode] + tc_params = {} + tc_params["type[]"] = 'top_container' + tc_params["q"] = "display_string:\"#{tc_str}\" AND collection_uri_u_sstr:\"#{resource_uri}\"" + ret_tc = search(repo_id,tc_params, :top_container,'', "display_string:#{tc_str}") + end + ret_tc + end + + def self.get_db_tc_by_barcode(barcode, repo_id) + ret_tc = nil + if barcode + tc_params = {} + tc_params["type[]"] = 'top_container' + tc_params["q"] = "barcode_u_sstr:\"#{barcode}\"" + ret_tc = search(repo_id,tc_params, :top_container) + end + ret_tc + end + + def self.create_container_instance(row, substr, resource_uri,report) + instance = nil + raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.missing_instance_type')) if row["cont_instance_type#{substr}"].blank? 
+ begin + tc = get_or_create(row, substr, resource_uri, report) + sc = {'top_container' => {'ref' => tc.uri}, + 'jsonmodeltype' => 'sub_container'} + %w(2 3).each do |num| + if row["type_#{num}#{substr}"] + sc["type_#{num}"] = @@container_types.value(row["type_#{num}#{substr}"]) + sc["indicator_#{num}"] = row["indicator_#{num}#{substr}"] || 'Unknown' + end + end + instance = JSONModel(:instance).new._always_valid! + instance.instance_type = @@instance_types.value(row["cont_instance_type#{substr}"]) + instance.sub_container = JSONModel(:sub_container).from_hash(sc) + rescue ExcelImportException => ee + raise ee + rescue Exception => e + msg = e.message #+ "\n" + e.backtrace()[0] + raise ExcelImportException.new(msg) + end + instance + end + +end # of container handler diff --git a/frontend/models/digital_object_handler.rb b/frontend/models/digital_object_handler.rb new file mode 100644 index 0000000..32bb94b --- /dev/null +++ b/frontend/models/digital_object_handler.rb @@ -0,0 +1,53 @@ + class DigitalObjectHandler < Handler + @@digital_object_types ||= EnumList.new('digital_object_digital_object_type') + + def self.create(row, archival_object, report) + dig_o = nil + dig_instance = nil + thumb = row['thumbnail'] || row['Thumbnail'] + unless !thumb && !row['digital_object_link'] + files = [] + if !row['digital_object_link'].blank? && row['digital_object_link'].start_with?('http') + fv = JSONModel(:file_version).new._always_valid! + fv.file_uri = row['digital_object_link'] + fv.publish = row['publish'] + fv.xlink_actuate_attribute = 'onRequest' + fv.xlink_show_attribute = 'new' + files.push fv + end + if !thumb.blank? && thumb.start_with?('http') + fv = JSONModel(:file_version).new._always_valid! + fv.file_uri = thumb + fv.publish = row['publish'] + fv.xlink_actuate_attribute = 'onLoad' + fv.xlink_show_attribute = 'embed' + fv.is_representative = true + files.push fv + end + osn = row['digital_object_id'].blank? ? 
(archival_object.ref_id + 'd') : row['digital_object_id'] + dig_o = JSONModel(:digital_object).new._always_valid! + dig_o.title = row['digital_object_title'].blank? ? archival_object.display_string : row['digital_object_title'] + dig_o.digital_object_id = osn + dig_o.file_versions = files + dig_o.publish = row['publish'] + begin + dig_o.save + rescue ValidationException => ve + report.add_errors(I18n.t('plugins.aspace-import-excel.error.dig_validation', :err => ve.errors)) + return nil + rescue Exception => e + raise e + end + report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>I18n.t('plugins.aspace-import-excel.dig'), :id => "'#{dig_o.title}' #{dig_o.uri} [#{dig_o.digital_object_id}]")) + dig_instance = JSONModel(:instance).new._always_valid! + dig_instance.instance_type = 'digital_object' + dig_instance.digital_object = {"ref" => dig_o.uri} + end + dig_instance + end + + def self.renew + clear(@@digital_object_types) + end + end # DigitalObjectHandler + diff --git a/frontend/models/handler.rb b/frontend/models/handler.rb index 0e3d770..bb0be81 100644 --- a/frontend/models/handler.rb +++ b/frontend/models/handler.rb @@ -11,37 +11,59 @@ class Handler require 'enum_list' require 'pp' + DISAMB_STR = ' DISAMBIGUATE ME!' 
+ # centralize the checking for an already-found object def self.stored(hash, id, key) ret_obj = hash.fetch(id, nil) || hash.fetch(key, nil) end - # returns nil, a hash of a jason model (if 1 found), or throws a multiples found error + # returns nil, a hash of a jason model (if 1 found), or throws a multiples found error # if repo_id is nil, do a global search (subject and agent) # this is using archivesspace/frontend/app/models/search.rb - def self.search(repo_id,params,jmsym, *type) + def self.search(repo_id,params,jmsym, type = '', match = '') obj = nil search = nil + matches = match.split(':') if repo_id search = Search.all(repo_id, params) else begin - search = Search.global(params,type[0]) + search = Search.global(params,type) rescue Exception => e - s = JSONModel::HTTP::get_json("/search/#{type[0]}", params) + s = JSONModel::HTTP::get_json("/search/#{type}", params) raise e if !e.message.match('<h1>Not Found</h1>') # global search doesn't handle this gracefully :-( search = {'total_hits' => 0} end end total_hits = search['total_hits'] || 0 -# Pry::ColorPrinter.pp "Total hits: #{total_hits}" if total_hits == 1 && !search['results'].blank? # for some reason, you get a hit of '1' but still have empty results?? obj = JSONModel(jmsym).find_by_uri(search['results'][0]['id']) elsif total_hits > 1 - raise Exception.new(I18n.t('plugins.aspace-import-excel.error.too_many')) + if matches.length == 2 + match_ct = 0 + disam = matches[1] + DISAMB_STR + disam_obj = nil + search['results'].each do |result| + # if we have a disambiguate result get it + if result[matches[0]] == disam + disam_obj = JSONModel(jmsym).find_by_uri(result['id']) + elsif result[matches[0]] == matches[1] + match_ct += 1 + obj = JSONModel(jmsym).find_by_uri(result['id']) + end + end + # if we have more than one exact match, then return disam_obj if we have one, or bail! 
+ if match_ct > 1 + return disam_obj if disam_obj + raise Exception.new(I18n.t('plugins.aspace-import-excel.error.too_many')) + end + else + raise Exception.new(I18n.t('plugins.aspace-import-excel.error.too_many')) + end elsif total_hits == 0 -# Pry::ColorPrinter.pp search +# Rails.logger.info("No hits found") end obj end diff --git a/frontend/models/subject_handler.rb b/frontend/models/subject_handler.rb new file mode 100644 index 0000000..49802db --- /dev/null +++ b/frontend/models/subject_handler.rb @@ -0,0 +1,96 @@ + class SubjectHandler < Handler + @@subjects = {} # will track both confirmed ids, and newly created ones.
+ @@subject_term_types ||= EnumList.new('subject_term_type') + @@subject_sources ||= EnumList.new('subject_source') + + def self.renew + clear(@@subject_term_types) + clear(@@subject_sources) + @@subjects = {} + end + + def self.key_for(subject) + key = "#{subject[:term]} #{subject[:source]}: #{subject[:type]}" + key + end + def self.build(row, num) + id = row.fetch("subject_#{num}_record_id", nil) + input_term = row.fetch("subject_#{num}_term", nil) + { + :id => id, + :term => input_term || (id ? I18n.t('plugins.aspace-import-excel.unfound_id', :id => id, :type => 'subject') : nil), + :type => @@subject_term_types.value(row.fetch("subject_#{num}_type") || 'topical'), + :source => @@subject_sources.value( row.fetch("subject_#{num}_source") || 'ingest'), + :id_but_no_term => id && !input_term + } + end + + def self.get_or_create(row, num, repo_id, report) + subject = build(row, num) + subject_key = key_for(subject) + if !(subj = stored(@@subjects, subject[:id], subject_key)) + unless subject[:id].blank? 
+ begin + subj = JSONModel(:subject).find( subject[:id]) + rescue Exception => e + if e.message != 'RecordNotFound' + raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) + end + end + end + begin + if !subj + begin + subj = get_db_subj(subject) + rescue Exception => e + if e.message == 'More than one match found in the database' + subject[:term] = subject[:term] + DISAMB_STR + report.add_info(I18n.t('plugins.aspace-import-excel.warn.disam', :name => subject[:term])) + else + raise e + end + end + end + if !subj + subj = create_subj(subject, num) + report.add_info(I18n.t('plugins.aspace-import-excel.created', :what =>"#{I18n.t('plugins.aspace-import-excel.subj')}[#{subject[:term]}]", :id => subj.uri)) + end + rescue Exception => e + Rails.logger.error(e.backtrace) + raise ExcelImportException.new( I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) + end + if subj + if subject[:id_but_no_term] + @@subjects[subject[:id].to_s] = subj + else + @@subjects[subj.id.to_s] = subj + end + @@subjects[subject_key] = subj + end + end + subj + end + + def self.create_subj(subject, num) + begin + term = JSONModel(:term).new._always_valid! + term.term = subject[:term] + term.term_type = subject[:type] + term.vocabulary = '/vocabularies/1' # we're making a gross assumption here + subj = JSONModel(:subject).new._always_valid!
+ subj.terms.push term + subj.source = subject[:source] + subj.vocabulary = '/vocabularies/1' # we're making a gross assumption here + subj.save + rescue Exception => e + raise ExcelImportException.new(I18n.t('plugins.aspace-import-excel.error.no_subject',:num => num, :why => e.message)) + end + subj + end + + def self.get_db_subj(subject) + s_params = {} + s_params["q"] = "title:\"#{subject[:term]}\" AND first_term_type:#{subject[:type]}" + ret_subj = search(nil, s_params, :subject, 'subjects',"title:#{subject[:term]}" ) + end + end diff --git a/templates/extended_aspace_import_excel_template.xlsx b/templates/extended_aspace_import_excel_template.xlsx new file mode 100644 index 0000000..d18ffeb Binary files /dev/null and b/templates/extended_aspace_import_excel_template.xlsx differ diff --git a/user_documentation/USER_DOCUMENTATION.md b/user_documentation/USER_DOCUMENTATION.md index bf395d5..a6cf5eb 100644 --- a/user_documentation/USER_DOCUMENTATION.md +++ b/user_documentation/USER_DOCUMENTATION.md @@ -6,7 +6,7 @@ As of version V2.1.0, the *aspace-import-excel* plugin supports both the **Impor 1. Make sure the plug-in has been installed! See the [Installation instructions](../README.md#installation) in the main README document. 1. Download the appropriate Excel Spreadsheet template. 
- + For **Import Archival Objects**, use [aspace_import_excel_template.xlsx](https://github.com/harvard-library/aspace-import-excel/blob/master/templates/aspace_import_excel_template.xlsx) + + For **Import Archival Objects**, use [aspace_import_excel_template.xlsx](../templates/aspace_import_excel_template.xlsx) or _(new for v3.0)_ [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx) + For **Add Digital Objects to Archival Objects**, use [aspace_import_excel_DO_template.xlsx](https://github.com/harvard-library/aspace-import-excel/blob/master/templates/aspace_import_excel_DO_template.xlsx) It's recommended that you make a copy of this template, renaming it to something identifiable, e.g.: ead_foo234.xslx) diff --git a/user_documentation/archival_objects_instructions.md b/user_documentation/archival_objects_instructions.md index 48c959e..d3cb522 100644 --- a/user_documentation/archival_objects_instructions.md +++ b/user_documentation/archival_objects_instructions.md @@ -2,14 +2,25 @@ ## Using the Template to Create a Spreadsheet -The Excel Spreadsheet template for importing Archival Objects is at https://github.com/harvard-library/aspace-import-excel/blob/master/templates/aspace_import_excel_template.xlsx . +**aspace-import-excel v3.0** introduces an [expanded Excel Spreadsheet template](../templates/extended_aspace_import_excel_template.xlsx) with new functionality for importing Archival Objects. -Use **Save as** *(your new filename}*.xlsx to begin creating your spreadsheet. +The new functionality consists of support for: +* Individually setting the publish/unpublish flags for Notes. +* Ability to add Agents as Source and Subject, not just Creator. +* An expanded number of Agents for each type, including directions for adding even more agents. -The template is designed to be flexible enough to accommodate different workflows.
The first row is the place where you can put identifying information, such as "Foo Collection". +* Support for more than one Extent, with the ability to add more extents. +* Support for more than one Container Instance, with the ability to add more container instances. -As long as you **don't edit** the **row** marked *"ArchivesSpace field code"*, you may hide, delete, or rearrange **columns** to suit your workflow. Indeed, you will see that there are a few already-hidden columns; these are not currently used, but may be used in future enhancements. +The code is backward-compatible with the original [Excel Spreadsheet template](../templates/aspace_import_excel_template.xlsx), so you may continue using the original if it meets your needs. + +Once you've opened your chosen template, use **Save as** *(your new filename)*.xlsx to begin filling in your spreadsheet. + + +The template is designed to be flexible enough to accommodate different workflows. The *first row* is the place where you can put identifying information, such as "Foo Collection". + +As long as you **don't edit** the **row** marked *"ArchivesSpace field code"*, you may hide, delete, or rearrange **columns** to suit your workflow. Indeed, you will see that there are a few already-hidden columns; these are not currently used, but may be used in future enhancements. **_DO NOT_** hide required columns. **Note** that some columns already have in-column drop down data validation defined. You may of course add more of these, or edit the ones that are already defined. See [The Excel help page](https://support.office.com/en-us/article/Apply-data-validation-to-cells-29FECBCC-D1B9-42C1-9D76-EFF3CE5F7249) to learn how to create these. @@ -22,7 +33,7 @@ There are very few columns that _must_ be filled in: * **EAD ID** - of the resource to which you're adding Archival Objects. This will be used to confirm that you are trying to add your spreadsheet information to the correct resource.
* The **Hierarchical Relationship** of the new Archival Object to the selected resource or selected Archival Object: If you've selected a Resource, **1** indicates that this is the first level of Archival Objects. If you have selected an Archival Object, use **1** if you're adding a sibling to a selected Archival Object, **2** if a child, etc. You can therefore describe several levels of Archival Objects in a single spreadsheet. * **The Description Level** This is an in-column drop-down. The Description Level in-column drop down -* EITHER the **Title** OR a **Creation Date** that must have at least a begin date or a date expression. +* EITHER the **Title** OR a **valid Date** having at least a begin date or a date expression. ## Column Definitions @@ -51,23 +62,36 @@ Processing Note | String | | No markup allowed ### Dates -A Date must have **at least** either a *begin date* or a *date expression.* +New in version 3.0: Support for more than one Date. The spreadsheet provides for two dates; you can add more by following the instructions for adding additional dates. + +A Date must have **a valid label** and **at least** either a *begin date* or a *date expression.* **NOTE:** The cell format for cells containing values for *Date Begin* and *Date End* **MUST** be **Text**, not some date format like `yyyy-mm-dd`, if you don't want the hours, minutes, seconds appended (e.g.: *1969-17-17T00:00:00+00.00*). Some versions of Excel will "helpfully" convert the cell to a date format if you are not watching. Column | Value | Default | Comment -------|-------|---------|--------- -Dates Label | String | creation| from the *Date Label* controlled value list +Dates Label | String | *creation* | from the *Date Label* controlled value list. **Note**: If the value given is *not* on the controlled value list, this date will not be processed. 
Date Begin | a Date string || in one of the following: **YYYY, YYYY-MM, or YYYY-MM-DD** -Date End | a Date string || in one of the following: **YYYY, YYYY-MM, or YYYY-MM-DD** -Date Type | String| *inclusive*| from the *Date Type* controlled value list +Date End | a Date string || in one of the following: **YYYY, YYYY-MM, or YYYY-MM-DD** **Note**: If you choose a Date Type of *'single'*, any value in this column will be ignored. +Date Type | String| *inclusive*| from the *Date Type* controlled value list. **Note**: If the given value is *not* on the controlled value list, it will be overridden with the value 'inclusive'. Date Expression |String|| Date Certainty |String | | from the *Date Certainty* controlled value list +### Adding more dates to the spreadsheet + +New in version 3.0: +The plugin supports your adding more than the two dates supplied on the spreadsheet. To do this, you may edit, locally, the [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx) by copying the set of columns for the second date, inserting them into the template, and editing the labels in Rows 4 and 5 to reflect the next integer number: + * insert 6 columns to the RIGHT of second date block + * copy the six columns of the second date, then paste them into the blank columns + * edit the labels in Row 4 to increment the number. For example, for the first added date, you'd edit **dates_label_2** to **dates_label_3** . **NOTE**: it is *extremely important* that you ensure that the labels in Row 4 are edited; otherwise, you may not get the results you're expecting. + * While not necessary for proper processing, it's recommended that you also update the numbers in the copied columns in Row 5 to avoid confusion. For example, edit **Date (2) Label** to **Date (3) Label**. 
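The date rules in the Dates table above can be summarized as a small Ruby sketch. It illustrates the documented behavior and is not the plugin's implementation: the method names `normalize_date` and `blank?` are hypothetical, and the short lists of valid labels, types, and certainties are sample values standing in for your repository's *Date Label*, *Date Type*, and *Date Certainty* controlled value lists.

```ruby
# Sample values only (an assumption); the plugin checks your repository's controlled value lists.
VALID_LABELS = %w[creation publication digitized].freeze
VALID_TYPES = %w[single inclusive bulk].freeze
VALID_CERTAINTIES = %w[approximate inferred questionable].freeze

def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

# Hypothetical helper: applies the documented rules to one numbered date block.
# Returns [date_hash_or_nil, warnings]; nil means the date is not processed.
def normalize_date(label:, begin_date:, end_date:, type:, expression:, certainty:)
  warnings = []
  label = 'creation' if blank?(label) # documented default label
  unless VALID_LABELS.include?(label)
    warnings << "Date label [#{label}] invalid; the date will not be processed"
    return [nil, warnings]
  end
  if blank?(begin_date) && blank?(expression)
    warnings << 'A date needs a begin date or an expression; the date will not be processed'
    return [nil, warnings]
  end
  type = 'inclusive' if blank?(type) # documented default type
  unless VALID_TYPES.include?(type)
    warnings << "Date type [#{type}] invalid; defaulting to 'inclusive'"
    type = 'inclusive'
  end
  if type == 'single' && !blank?(end_date)
    warnings << 'Single date has an end date that will be ignored'
    end_date = nil
  end
  if !blank?(certainty) && !VALID_CERTAINTIES.include?(certainty)
    warnings << "Invalid date certainty ignored: (#{certainty})"
    certainty = nil
  end
  date = { label: label, begin_date: begin_date, end_date: end_date,
           date_type: type, expression: expression, certainty: certainty }
  [date.reject { |_key, v| blank?(v) }, warnings] # drop empty fields
end
```

Note the asymmetry these rules create: an invalid Date Label drops the whole date, whereas an invalid Date Type or Date Certainty only produces a warning and a fallback.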
+ Column Definitions \| Dates \| Extent \| Container \| Digital Objects \| Agents \| Subjects \| Notes ### Extent Information +New in version 3.0: Support for more than one extent. The spreadsheet provides for two extents; you can add more by following the instructions for adding additional extents. + Extent information is not required, but if you are defining an extent, please note the required fields. Column | Value | Default | Comment @@ -79,15 +103,26 @@ Container Summary|String|| Physical details |String|| Dimensions| String || +### Adding more extents to the spreadsheet + +New in version 3.0: +The plugin supports your adding more than the two extents supplied on the spreadsheet. To do this, you may edit, locally, the [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx) by copying the set of columns for the second extent, inserting them into the template, and editing the labels in Rows 4 and 5 to reflect the next integer number: + * insert 6 columns to the RIGHT of second extent block + * copy the six columns of the second extent, then paste them into the blank columns + * edit the labels in Row 4 to increment the number. For example, for the first added extent, you'd edit **portion_2** to **portion_3** . **NOTE**: it is *extremely important* that you ensure that the labels in Row 4 are edited; otherwise, you may not get the results you're expecting. + * While not necessary for proper processing, it's recommended that you also update the numbers in the copied columns in Row 5 to avoid confusion. For example, edit **Extent Portion(2)** to **Extent Portion(3)**. + Column Definitions \| Dates \| Extent \| Container \| Digital Objects \| Agents \| Subjects \| Notes ### Container Information - Creating a Container Instance +New in version 3.0: Support for more than one container instance.
The spreadsheet provides for two container instances; you can add more by following the instructions for adding additional instances. + A Container instance associates the Archival Object with a Top Container, with additional information on Child and Grandchild sub-containers if present. The ingester will try to find an already-created Top Container in the database. + If you have defined a barcode: - + If there's a match for that repository, that Top Container will be used without further checking. + + If there's a match for the barcode in that repository, that Top Container will be used without further checking. + Otherwise, a new Top Container will be created. + If you have not defined a barcode: + The type and indicator will be used to search the database for a Top Container that is already associated with the resource; @@ -102,13 +137,24 @@ If you are specifying container information, note that both **type** and **indic Column | Value | Default | Comment -------|-------|---------|--------- Container Instance type| String | | **REQUIRED** if you are defining a Container Instance. Value from the *Instance Instance Type* controlled value list -Top Container type | String | Box| from the *Container Type* controlled value list Top Container indicator|String | Unknown || **REQUIRED** +Barcode|String||| Child type | String||from the *Container Type* controlled value list Child indicator|String |Unknown || *only used if a Child type is specified* Grandchild type | String||from the *Container Type* controlled value list Grandchild indicator|String | Unknown || *only used if a Grandchild type is specified* +### Adding more container instances to the spreadsheet + +New in version 3.0: +The plugin supports your adding more than the two container instances supplied on the spreadsheet.
To do this, you may edit, locally, the [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx) by copying the set of columns for the second container instance, inserting them into the template, and editing the labels in Rows 4 and 5 to reflect the next integer number: + * insert 8 columns to the LEFT of second container block + * copy the 8 columns of the second container block, then paste them into the blank columns + * edit the labels in Row 4 to increment the number. For example, for the first added container instance, you'd edit **cont_instance_type_2** to **cont_instance_type_3** . + For container instances, there are some Row 4 values with double numbers, such as **type_2_2**, which would be edited to **type_2_3**. Sorry for the confusion! + **NOTE**: it is *extremely important* that you ensure that the labels in Row 4 are edited; otherwise, you may not get the results you're expecting. + * While not necessary for proper processing, it's recommended that you also edit the numbers in the copied columns in Row 5 to avoid confusion. For example, edit **Container Instance Type(2)** to **Container Instance Type(3)**. + Column Definitions \| Dates \| Extent \| Container \| Digital Objects \| Agents \| Subjects \| Notes ### Digital Objects @@ -126,45 +172,79 @@ URL of thumbnail| URL String || if defined, this becomes the File version with ### Agent Objects -The ingester allows you to link Agents (*CREATOR role only!*) to Archival objects. You can specify up to 3 Person Agents, up to 2 Corporate Agents, and one Family Agent per Archival object. +The ingester allows you to link Agents to Archival objects. The [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx), as provided, allows for up to **5** Person Agents, up to **2** Family Agents, and up to **3** Corporate Agents per Archival object. If you need more of any of these types, you can follow the directions for adding more agents.
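The numbered agent column groups are read in sequence. The sketch below is modeled on the `process_agents` loop in this change and shows what that means for added columns: for each agent type, numbering starts at 1, and scanning stops at the first number whose record ID and header cells are both empty. A gap (for example, an empty Person Agent 2) therefore hides every later Person Agent in that row. The method name `numbered_agents` and the returned hash shape are hypothetical; `row_hash` stands for one spreadsheet row keyed by its Row 4 field codes.

```ruby
AGENT_TYPES = %w[people corporate_entities families].freeze

def blank?(value)
  value.nil? || value.to_s.strip.empty?
end

# Hypothetical helper modeled on process_agents: collects the numbered
# agent column groups present in one row.
def numbered_agents(row_hash)
  found = []
  AGENT_TYPES.each do |type|
    num = 1
    loop do
      id = row_hash["#{type}_agent_record_id_#{num}"]
      header = row_hash["#{type}_agent_header_#{num}"]
      # Stop at the first group whose record ID and header are both empty.
      break if blank?(id) && blank?(header)

      role = row_hash["#{type}_agent_role_#{num}"]
      found << { type: type, num: num, id: id, header: header,
                 role: blank?(role) ? 'creator' : role } # blank role defaults to creator
      num += 1
    end
  end
  found
end
```

When you add columns, fill each agent group in order (1, 2, 3, ...) rather than leaving an earlier group empty.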
+ +If you have previously defined the Agent(s) you are using, you may use the Record ID number (e.g.: for the Agent URI /agents */agent_person/1249*, you would use **1249**) OR the full header string, with all capitalization and punctuation. + +Either the Record ID *or* the header string is **required**. + +If you include both, or only the header, and the record isn't found, a new Agent record will be created. The header string will be used as the **family_name** if it's a Family Agent, and the **primary_name** +otherwise. + +If you enter the header string *without* the ID, the ingester will try to do an **exact match** against the header; if it finds more than one match (for example, if the database contains two agents with identical headers, but different sources): + + * The ingester will create a **new** agent (with publish=false) containing the header with ' DISAMBIGUATE ME!' appended to it. For example, given a person agent with a header of 'George Washington', a new person agent would be created with a primary name of 'George Washington DISAMBIGUATE ME!'. + * After ingest, you can use the *merge* functionality to resolve the ambiguities. + +If you enter a Record ID and **not** the header string, and that ID is not found, a new Agent record will be created with the name "PLACEHOLDER FOR *{agent type}* ID *{ id number}* NOT FOUND", so that you may easily find that record later and edit/merge it. In this case, the new Agent would be marked publish=false. When you correct the record, change publish to true if appropriate. + + + +If you **only** enter the header string, and a record isn't found in the database, a new Agent will be created, with its Linked Agent Role of **Creator**. -If you have previously defined the Agent(s) you are using, you may use the Record ID number (e.g.: for the Agent URI /agents */agent_person/1249*, you would use **1249**) OR the full header header string, with all capitalization and punctuation. 
-Either the Record ID *or* the header string is **required**; if you include both, and the record isn't found, a new Agent record will be created. The header string will be used as the **family_name** if it's a Family Agent, and the **primary_name** otherwise. -If for some reason you enter a Record ID and **not** the header string, and that ID is not found, a new Agent record will be created with the name "PLACEHOLDER FOR *{agent type}* ID *{ id number}* NOT FOUND", so that you may easily find that record later and edit/merge it. In this case, the new Agent would be marked publish=false. When you correct the record, change publish to true if appropriate. #### Person agents: Column | Value | Default | Comment -------|-------|---------|--------- -Agent/Creator (1) Record ID | Number|| -Agent/Creator (1) header string |String|| must be the entire header, including punctuation & capitalization -Agent/Creator (1) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. -Agent/Creator (2) Record ID | Number|| -Agent/Creator (2) header string |String|| must be the entire header, including punctuation & capitalization -Agent/Creator (2) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. -Agent/Creator (3) Record ID | Number|| -Agent/Creator (3) header string |String|| must be the entire header, including punctuation & capitalization -Agent/Creator (3) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. - -#### Family Agent: +Agent (1) Record ID | Number|| +Agent (1) header string |String|| must be the entire header, including punctuation & capitalization +Agent Role(1)|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. 
+Agent (1) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. The default list provided by ArchivesSpace maps to the [MARC Relator Code and Term List](http://www.loc.gov/marc/relators/relaterm.html). +Agent (2) Record ID | Number|| +Agent (2) header string |String|| must be the entire header, including punctuation & capitalization +Agent Role(2)|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. +Agent (2) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. +Agent (3) Record ID | Number|| +Agent (3) header string |String|| must be the entire header, including punctuation & capitalization +Agent Role(3)|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. +Agent (3) Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. + +#### Family Agents: Column | Value | Default | Comment -------|-------|---------|--------- -Family Agent/Creator Record ID | Number|| -Family Agent/Creator header string |String|| must be the entire header, including punctuation & capitalization -Family Agent/Creator Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. +Family Agent Record ID | Number|| +Family Agent header string |String|| must be the entire header, including punctuation & capitalization +Family Agent Role|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. +Family Agent Relator|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. 
#### Corporate Agents: Column | Value | Default | Comment -------|-------|---------|--------- -Corporate Agent/Creator Record ID | Number|| -Corporate Agent/Creator header string |String|| must be the entire header, including punctuation & capitalization -Corporate Agent/Creator Relator|string|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. -Corporate Agent/Creator Record ID (2) | Number|| -Corporate Agent/Creator header string (2) |String|| must be the entire header, including punctuation & capitalization -Corporate Agent/Creator Relator (2)|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. +Corporate Agent Record ID | Number|| +Corporate Agent header string |String|| must be the entire header, including punctuation & capitalization +Corporate Agent Role|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. +Corporate Agent Relator|string|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. +Corporate Agent Record ID (2) | Number|| +Corporate Agent header string (2) |String|| must be the entire header, including punctuation & capitalization +Corporate Agent Role(2)|String|Creator|New in v3.0: from the *Linked Agent Role* controlled value list. +Corporate Agent Relator (2)|String|| If supplying relator, term must be from the *Linked Agent Archival Record Relators* controlled value list. + +### Adding more agents to the spreadsheet +The plugin supports your associating with an Archival Object even more agents of each type. To do this, you may edit, locally, the [extended_aspace_import_excel_template.xlsx](../templates/extended_aspace_import_excel_template.xlsx) by copying the last set of columns of the particular type, inserting them into the template, and editing the labels in Rows 4 and 5 to reflect the next integer number. 
+ +For example, if you were to want *3* Family Agents, you would: + * insert four blank columns next to the second Family Agent columns + * copy the four columns of the second Family Agent, and paste them into the blank columns + * edit the labels in Row 4, incrementing the number. For example, you would edit the label **families_agent_record_id_2** in the _copied_ column to **families_agent_record_id_3**. **NOTE**: it is *extremely important* that you ensure that the labels in Row 4 are edited; otherwise, you may not get the results you're expecting. + * While not necessary for proper processing, it's recommended that you also update the numbers in Row 5, to avoid confusion. For example, you would edit the label **Family Agent(2) header string** to **Family Agent(3) header string** + + + **Note:** The plugin stops at the first set of columns that are blank. This means that, if you've filled in the columns for Person Agent 1, and Person Agent 3, leaving Person Agent 2 blank, the plugin *will not* + process Person Agent 3. Column Definitions \| Dates \| Extent \| Container \| Digital Objects \| Agents \| Subjects \| Notes @@ -172,6 +252,11 @@ Corporate Agent/Creator Relator (2)|String|| If supplying relator, term must be As with Agents, you may associate Subjects with the Archival Object. You may associate up to two Subject records. If you know the Record ID, you may use that instead of the **term**, **type**, and **source** in a manner similar to the way that Agent specifications are made, with the same database lookup and handling done there. Again, if you want the ingest to look up the **term** in the database, you must use the entire Subject header, including any punctuation or capitalization. 
+If you enter the subject header string *without* the ID, the ingester will try to do an **exact match** against the header; if it finds more than one match (for example, if the database contains two subjects with identical headers, but different sources): + + * The ingester will create a **new** subject (with publish=false) containing the header with ' DISAMBIGUATE ME!' appended to it. For example, given a subject with a header of 'Black Lives Matter', a new subject would be created with the header 'Black Lives Matter DISAMBIGUATE ME!'. + * After ingest, you can use the *merge* functionality to resolve the ambiguities. + Column | Value | Default | Comment -------|-------|---------|--------- Subject (1) Record ID|Number|| @@ -192,6 +277,12 @@ You may specify a variety of notes fields. If the note type allows for subfields, what you specify will be put in the first subfield. +New in version 3.0: +Each Note column is accompanied by a "Publish" column, which has in-column drop down data validation (TRUE/FALSE). The publish flag will be set for that note (and any associated subnote) as follows: +* if the field is left blank, use the value of the Publish field for that Archival Object +* Otherwise, set to True or False as specified. + + As does ArchivesSpace, you may used Mixed Content (EAD/XML markup). The Ingester will check to make sure that the entry is "well formed" -- that is, that the opening and closing elements match -- but will **not** validate the text to make sure you're using the proper markup. The following Notes fields are supported:
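The per-note "Publish" column described in the notes hunk amounts to a simple fallback: a blank cell inherits the Archival Object's Publish value, and a TRUE/FALSE cell is used as-is (for the note and any associated subnote). The following is a minimal Python sketch of that rule, not the plugin's code; `note_publish_flag` and its parameters are hypothetical names, and raising an error for values other than TRUE/FALSE is an assumption (the drop-down validation should normally prevent such values).

```python
from typing import Optional, Union


def note_publish_flag(cell: Optional[Union[str, bool]], ao_publish: bool) -> bool:
    """Hypothetical model of how a note's Publish column is resolved.

    A blank cell inherits the Archival Object's own Publish value;
    a TRUE/FALSE cell (from the drop-down) is used as-is.
    """
    if isinstance(cell, bool):  # a spreadsheet reader may hand back real booleans
        return cell
    if cell is None or cell.strip() == "":
        return ao_publish  # blank: fall back to the Archival Object's Publish field
    value = cell.strip().upper()
    if value not in ("TRUE", "FALSE"):
        # Assumption: anything outside the drop-down values is treated as an error.
        raise ValueError(f"expected TRUE or FALSE, got {cell!r}")
    return value == "TRUE"


print(note_publish_flag("", True))       # blank: inherits True from the Archival Object
print(note_publish_flag("FALSE", True))  # explicit value wins: False
```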
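The Agent lookup rules in the Agent Objects section (which the Subjects section says are applied to Subjects as well) can be read as a small decision procedure. Below is a minimal Python model of those rules, assuming the database is represented as a plain `dict` mapping Record ID to header string. `resolve_agent`, `Resolution`, and the exact spelling of the agent-type string in the placeholder name are hypothetical. When both an ID and a header are supplied and the ID is not found, the sketch creates a record from the header without looking the header up; this is one reading of the documentation, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resolution:
    action: str                      # "link" to an existing record or "create" a new one
    record_id: Optional[int] = None  # set when linking to an existing record
    name: Optional[str] = None       # new record's name (family_name for families, primary_name otherwise)
    publish: Optional[bool] = None   # False for placeholder/disambiguation records; None = left to the default


def resolve_agent(agent_type: str, record_id: Optional[int], header: Optional[str],
                  existing: Dict[int, str]) -> Resolution:
    """Hypothetical model of the agent lookup rules; `existing` maps Record ID -> header."""
    if record_id is None and not header:
        raise ValueError("either the Record ID or the header string is required")

    if record_id is not None:
        if record_id in existing:
            return Resolution("link", record_id=record_id)
        if not header:
            # ID only, not found: an unpublished placeholder that is easy to find later.
            return Resolution("create",
                              name=f"PLACEHOLDER FOR {agent_type} ID {record_id} NOT FOUND",
                              publish=False)
        # Both given, ID not found: a new record named from the header
        # (assumption: the header itself is not looked up in this case).
        return Resolution("create", name=header)

    # Header only: exact match, sensitive to capitalization and punctuation.
    matches = [rid for rid, h in existing.items() if h == header]
    if len(matches) == 1:
        return Resolution("link", record_id=matches[0])
    if len(matches) > 1:
        # Ambiguous: an unpublished record to be resolved later with merge.
        return Resolution("create", name=f"{header} DISAMBIGUATE ME!", publish=False)
    return Resolution("create", name=header)


existing = {1249: "Washington, George", 7: "Smith, John", 8: "Smith, John"}
print(resolve_agent("agent_person", None, "Smith, John", existing))
```

Because matching is exact string equality, "washington, george" would not match record 1249 in this model, which is why the documentation insists on the full header with its original capitalization and punctuation.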
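The column-numbering convention (Row 4 labels such as **families_agent_record_id_2** and **families_agent_record_id_3**) and the note that the plugin stops at the first set of blank columns can be modeled as a scan over numbered labels. This Python sketch assumes a spreadsheet row arrives as a `dict` keyed by Row 4 label and that every set uses a `_{n}` suffix starting at 1. `numbered_sets` and the `families_agent_header` stem are hypothetical; only the `families_agent_record_id_N` labels appear in the text.

```python
from typing import Dict, Iterator, List, Optional


def numbered_sets(row: Dict[str, Optional[str]],
                  stems: List[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield column sets stem_1, stem_2, ... from one row until a set is entirely blank."""
    n = 1
    while True:
        group = {stem: row.get(f"{stem}_{n}") for stem in stems}
        if all(v is None or str(v).strip() == "" for v in group.values()):
            return  # first blank (or missing) set: nothing numbered after it is read
        yield group
        n += 1


row = {
    "families_agent_record_id_1": "12", "families_agent_header_1": "",
    "families_agent_record_id_2": "",   "families_agent_header_2": "",
    "families_agent_record_id_3": "",   "families_agent_header_3": "Adams family",
}
stems = ["families_agent_record_id", "families_agent_header"]
print(len(list(numbered_sets(row, stems))))  # 1: set 3 is skipped because set 2 is blank
```

The same model suggests why the Row 4 labels must be edited: a pasted column whose label still ends in `_2` is never looked up as set 3.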