    Dynamics Advanced Search - Search Data Update Process

    Process overview

    With the 10027.51.210 release of Product search, the search data update batch job has been replaced. The previous version divided the data into sets of at most 1,000 records (depending on the max. upload documents parameter) and created a batch task for each set. With large data sets (100,000+ records) this resulted in many batch tasks. On environments with many parallel threads these tasks were picked up in parallel, overloading the upload to the Azure Search index: Azure Search would return a busy error code and deny the upload for that task.

    We want to be able to throttle both the generation of search data and the upload to Azure. To achieve this, we created a new process for updating search data in the Azure Search index.

    Dispatch job

    The dispatch job divides the data into sets of records. The size of a set is determined by the ‘Max. documents’ parameter. The maximum (and default) value for this parameter is 1000; this is a limitation of Azure Search.

    The dispatch job executes a query based on the category hierarchies in the Search configuration. If categories have been added to the Excluded categories section, these are excluded from the query. The result of the query is divided into the sets, and a Generator task is created for each set.
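    To illustrate just the chunking step, the sketch below (Python, illustrative only; helper names such as create_generator_task are assumptions, not the actual implementation) shows how a query result could be divided into sets of at most ‘Max. documents’ records:

        def create_generator_task(record_ids):
            # Placeholder: in the real process a Generator task also stores the
            # Include attribute/field/custom search options and the target indexes.
            return {"records": list(record_ids), "status": "Waiting"}

        def dispatch(record_ids, max_documents=1000):
            # Azure Search accepts at most 1000 documents per upload batch.
            if max_documents > 1000:
                raise ValueError("Max. documents cannot exceed 1000 (Azure Search limit)")
            tasks = []
            for start in range(0, len(record_ids), max_documents):
                subset = record_ids[start:start + max_documents]
                tasks.append(create_generator_task(subset))  # one Generator task per set
            return tasks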

    Generator tasks

    A generator task contains a subset of the data that needs to be updated. It also stores information needed for generating the data like Include attribute/field/custom search and which indexes to update.

    Generator job

    The generator job picks up Generator tasks. For each record in a generator task, it fetches the search data in the required language(s) based on the selected index(es). The data for all records in the set is converted to JSON and stored in an upload task. The generator creates an upload task for each selected index; therefore, one generator task results in one or more upload tasks.
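    As a rough illustration of this flow (a Python sketch under assumed names such as fetch_search_data; it is not the product's actual code), one Generator task produces one JSON payload per selected index:

        import json

        def fetch_search_data(record_id, index_name, languages):
            # Placeholder for the real retrieval of attribute/field/custom search
            # data in the required languages.
            return {"id": str(record_id), "index": index_name, "languages": languages}

        def run_generator_task(task, index_names, languages):
            # One Generator task results in one Upload task per selected index.
            upload_tasks = []
            for index_name in index_names:
                documents = [fetch_search_data(r, index_name, languages) for r in task["records"]]
                upload_tasks.append({
                    "index": index_name,
                    "payload": json.dumps({"value": documents}),  # JSON stored on the Upload task
                })
            return upload_tasks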

    It is possible (and advisable) to create multiple generator jobs that run in parallel. Adding more jobs makes data generation faster. The number of parallel jobs is limited by the batch settings of your environment.

    Upload tasks

    The upload task contains the search data in a JSON file for a set of records and additional information like the target index.

    Upload job

    The upload job picks up Upload tasks, connects to the Azure Search index and uploads the data.
    It is possible (and advisable) to create multiple upload jobs that run in parallel. Adding more jobs makes the data upload faster. The number of parallel jobs is limited by the batch settings of your environment. However, be careful with the number of upload jobs: too many jobs perform too many simultaneous uploads and cause the Azure Search service to deny uploads.
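    The connection and upload are handled by the product itself; purely as an illustration of what uploading one task's data to an Azure Search index can look like, here is a sketch using the Python azure-search-documents SDK (the endpoint, API key and task structure are placeholders):

        import json
        from azure.core.credentials import AzureKeyCredential
        from azure.search.documents import SearchClient

        def run_upload_task(upload_task, endpoint, api_key):
            # Connect to the target index stored on the Upload task and push its documents.
            client = SearchClient(
                endpoint=endpoint,  # e.g. https://<service>.search.windows.net
                index_name=upload_task["index"],
                credential=AzureKeyCredential(api_key),
            )
            documents = json.loads(upload_task["payload"])["value"]
            results = client.upload_documents(documents=documents)
            # Each result reports per-document success; a busy (throttled) service
            # surfaces here as failed documents or an HTTP error.
            return all(r.succeeded for r in results)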

    Update search data dialog

    The Update search data dialog contains 4 sections. Each section is described below.

    Note: If too many historic generator and/or upload task records from previous update runs exist in the related tables, opening the data update form will first give a warning to clean up tasks, eventually even blocking the update batch from running if the tasks aren't deleted.

    Parameters

    Field Description
    REFERENCE
    Company Shows the (default) Company of the selected Search configuration. This field is for information only and is not editable.
    Search configuration Shows the selected Search configuration. This field is for information only and is not editable.
    INCLUDE
    Attribute search If set, the generated search data will include all the Attributes linked to the record. Records get their attributes from the categories they are linked to. Categories can have attributes linked either directly or via Attribute groups and Category Attributes.
    Custom search If set, the generated search data will include data provided by Custom search data providers. This framework makes it possible to develop a source (provider) for data that is related to the record. Data can come from anywhere in the system or even from external sources (note that this does require development).
    Field search If set, the generated search data will include data from tables (fields or data methods) related to the record. It is possible to define a query in the system and set up which fields from which tables need to be included in the search data. One-to-one and one-to-many relations are supported by the generator.
    SETUP
    Clean up tasks If set, the generator and upload tasks will be removed from the system after they have been processed by the generator and upload jobs. Tasks that were successfully generated or uploaded are removed; tasks that failed and have the status Error remain in the system for analysis and further processing. It is advisable to set this parameter to Yes, otherwise the amount of data in these tables will grow with each data update. Setting it to No temporarily is useful in troubleshooting situations.

    Recreate Azure index If set, the Azure index will be deleted and a new one will be created. This will delete all existing data. Fields in the Azure index that are no longer used and no longer set up in the system (via Attribute/Custom/Field search) will be removed.
    Please note that the index is deleted and recreated at the start of the update process, so if for some reason the update is not successful, the data is still lost. There is no abort or backup process; the data update will have to be started again.
    Show debug info If set, the Dispatch, Generator and Upload jobs will write information to the Infolog about the flow of the logic, the time spent waiting on other jobs, the number of loops made while waiting, and other details. This information can help to determine the right number of Generator and Upload jobs and to troubleshoot issues; otherwise we recommend keeping this setting disabled.
    EXECUTION Note that this field group is only visible if activated via the Search configuration parameters
    Max. wait time Defines the maximum time (in seconds) a job (a single Generator or Upload job) will wait for preceding jobs to finish once it is started.
    Generator jobs Specifies the number of Generator jobs the batch will create to process the Generator tasks in parallel.
    Upload jobs Specifies the number of Upload jobs the batch will create to process the Upload tasks in parallel.
    Wait time The Generator and Upload jobs both process tasks that are generated by other (preceding) jobs in the batch. They pick up the (new) tasks and process them. If the preceding job is still running, it might generate more tasks. If a job does not find a new task to process, it waits before trying again. The wait time parameter specifies how long (in seconds) the job waits before trying to fetch a new task (see the sketch after this table).
    UPLOAD Note that this field group is only visible if activated via the Search configuration parameters
    Max. documents Specifies the number of records in a set of data to be processed (generated and uploaded). The Dispatch job divides the total number of records into these sets.
    Note that the value cannot be larger than 1000. This is a limitation of Azure Search.
    Max. number of retries Specifies the number of times the system will retry sending the data to Azure Search if the previous attempt failed.
    Retry wait time Specifies the time (in seconds) the system will wait before retrying to send the data to Azure Search.
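    To make the interplay of these parameters concrete, the following minimal Python sketch (assumed logic, not the product's code) shows a job polling for new tasks using ‘Wait time’ and ‘Max. wait time’, and retrying a failed upload using ‘Max. number of retries’ and ‘Retry wait time’:

        import time

        def poll_for_task(fetch_next_task, wait_time, max_wait_time):
            # Keep asking for a new task, sleeping 'Wait time' seconds between
            # attempts, until a task is found or 'Max. wait time' seconds have passed.
            waited = 0
            while True:
                task = fetch_next_task()
                if task is not None:
                    return task
                if waited >= max_wait_time:
                    return None  # preceding jobs produced no new work in time
                time.sleep(wait_time)
                waited += wait_time

        def upload_with_retries(send, payload, max_retries, retry_wait_time):
            # Try the upload, waiting 'Retry wait time' seconds between failed attempts.
            for attempt in range(max_retries + 1):
                if send(payload):
                    return True
                if attempt < max_retries:
                    time.sleep(retry_wait_time)
            return False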

     

    Search indexes

    In the Indexes section it is possible to select which of the indexes in the Search configuration should be updated.

    Excluded categories

    In the Excluded categories section it is possible to select categories that should be excluded from the update. Selecting a category means that the records linked to that category and all underlying categories will not be updated.

    It is possible to select multiple categories from multiple hierarchies. If an entire hierarchy should be excluded from the update the root category of that hierarchy can be added to the excluded categories list.

    Please note that in the ‘old’ non-search configuration setup of Advanced Search, the categories that needed to be updated had to be selected in this section. With the 200 release, however, we changed this behaviour: now all categories are included by default and have to be excluded on purpose.

    Run in the background

    The Run in the background section is the standard Finance & Operations batch tab, but note that with the introduction of this Update search data process, the Batch processing checkbox is always set to 'Yes' and is not editable. The data update batch can only be run ‘in batch’. 

    Configuration parameters

    On the parameters tab of the Search configuration form it is possible to set up default values for the Execution and Upload parameters. These values will be copied to the Update search data dialog when it is run.

    On the dialog it is possible to use different values for that one run of the update batch. However, by default the ‘Execution’ and ‘Upload’ groups are not shown on the Search data update dialog. Typically the default values are filled in the parameters and there is no need to change them on the dialog, which keeps the dialog simple. If they do need to be changed, the ‘Show ‘Execution’ and ‘Upload’’ setting needs to be set to 'Yes' on the Search configuration parameters, after which the groups will show on the Search data update dialog.

    Reruns

    When a Generator task fails, it is updated with the status Error. After the update batch has completed, it is possible to rerun the failed tasks. The selected tasks are copied to a new data update batch.

    It is not possible to rerun Upload tasks. Upload tasks contain the generated data, which might already be outdated by the time the data update batch is monitored and ready to be rerun. Therefore only reruns of Generator tasks are supported.

     

    Data update batches form

    The Data update batches form can be opened from the Search configuration form. It shows all the Data update batches related to the selected Search configuration.

    The following information is shown in the header:

    Field Description
    Batch task details
    IDENTIFICATION
    Batch job ID The identification of the data update batch.
    Previous batch job The identification of the original data update batch. If this update is a rerun of a previous update batch, that identification is shown here.
    BATCH JOB
    Job description Standard Finance & Operations job description.
    Status Standard Finance & Operations batch status information.
    Actual start date/time Standard Finance & Operations batch timing information.
    End date/time Standard Finance & Operations batch timing information.
    Scheduled start date/time Standard Finance & Operations batch timing information.
    Created by Standard Finance & Operations batch information.
    GENERATOR TASKS
    Count The total number of Generator tasks in this data update batch.
    Ended The number of Generator tasks that successfully ended in this data update batch.
    Error The number of Generator tasks that resulted in an error in this data update batch.
    Completed (%) The percentage of Generator tasks that completed (both ended and in error) in this data update batch.
    UPLOAD TASKS
    Count The total number of Upload tasks in this data update batch.
    Ended The number of Upload tasks that successfully ended in this data update batch.
    Error The number of Upload tasks that resulted in an error in this data update batch.
    Completed (%) The percentage of Upload tasks that completed (both ended and in error) in this data update batch.
    JOBS
    Generator jobs Number of Generator jobs that were created in this data update batch.
    Upload jobs Number of Upload jobs that were created in this data update batch.
       

    Periodic clean up job

    When running the Search Data Update Process it is strongly recommended to keep the 'Clean up tasks' parameter activated (as it is by default) and to deactivate it only temporarily, for example for troubleshooting. If the parameter is deactivated for a longer period of time, or if errors keep adding up, the number of data update batch records can build up significantly in the related tables, even causing issues where updating search data is no longer possible (due to time-outs).

    To prevent this from happening, a periodic clean up job is available under:

    Product information management > Periodic tasks > Advanced search > Clean up data update tasks

    When selecting this job, it is possible to include (or exclude) the generator tasks, the upload tasks and/or empty parent records. There are also options for filtering by date and/or Search configuration and by task status.

    Regular use of this function, in combination with the recommended use of the 'Clean up tasks' parameter, ensures there is no unneeded build-up of records in the related tables (DYSASBCloudDataGeneratorTaskV2, DYSASBCloudDataUploadTaskV2), so they do not consume unnecessary storage and the update process can run smoothly.