The ingestion workflows (upload and back4 queues) parse the input file by passing it to our parser modules:
ParseAPDFL; ParseArchive; ParseImg; ParseMultimedia; ParseOffice; ParseText; ParseCalendar
The parsers are called in the order specified in the appsettings.xml file.
Each parser tries to recognize the file and, if it determines that it is the right parser for it, generates an XML file containing the information that can be extracted from the file.
By default, a parser extracts all the available metadata. In the workflow call we can also request extra operations, such as generating a thumbnail, a preview or video tiles, or extracting the text.
The resulting XML file is then used in the workflow to fill the attributes of the gn4 object we want to create or manipulate.
The parsers are independent of the core software. This means that if a file format is not recognized, only the specific parser needs to be fixed, and only that module needs to be updated in existing installations, without touching the other gn4 dlls.
About parser order
If some Microsoft Office files are wrongly recognized, e.g. as audio files, change the order of the parsers in the appsettings.xml file so that ParseOffice comes before ParseMultimedia.
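
The effect of the parser order can be illustrated with a minimal Python sketch. It assumes that the first parser in the configured order that recognizes a file is the one that handles it. The recognizer functions, the `RECOGNIZERS` table and `first_match` are hypothetical stand-ins, not the real parser modules or the appsettings.xml format; the multimedia check is deliberately over-permissive to imitate a false positive.

```python
from typing import Optional, Sequence

# Signature found at the start of legacy Office (OLE2 compound) files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def office_recognizes(data: bytes) -> bool:
    """Hypothetical check: claim files that start with the OLE2 signature."""
    return data.startswith(OLE_MAGIC)


def multimedia_recognizes(data: bytes) -> bool:
    """Hypothetical, deliberately over-permissive check (claims any non-empty file)."""
    return len(data) > 0


# Hypothetical lookup table: parser name -> recognizer function.
RECOGNIZERS = {
    "ParseOffice": office_recognizes,
    "ParseMultimedia": multimedia_recognizes,
}


def first_match(order: Sequence[str], data: bytes) -> Optional[str]:
    """Return the name of the first parser in `order` that claims `data`."""
    for name in order:
        if RECOGNIZERS[name](data):
            return name
    return None


legacy_doc = OLE_MAGIC + b"rest of a .doc file"

# Multimedia first: the over-permissive parser claims the Office file.
print(first_match(["ParseMultimedia", "ParseOffice"], legacy_doc))
# Office first: the Office file is claimed by the right parser.
print(first_match(["ParseOffice", "ParseMultimedia"], legacy_doc))
```

With the multimedia recognizer first, the sample file is claimed by `ParseMultimedia`; with the order swapped, it is claimed by `ParseOffice`, which is the situation the reordering above is meant to achieve.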