After running my model on the validation data, there is a high number of false positives (FPs) in group-gold-vs-extracted.csv. Looking at the data, it appears that the gold data is not being identified correctly.
Here is an example of a product field extracted from the validation data:
HP DL380 GEN9 24SFF CUSTOM SRV
The products that are not identified as gold data contain extended ASCII characters. These strings are extracted correctly, but they are counted as FPs because the values do not appear in the gold data columns.
Should these products be filtered out so that they do not appear as FPs when evaluating against the test data set?
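To make the question concrete, here is a minimal sketch of what such a filter could look like. It is only an illustration: the helper names `has_non_ascii` and `split_by_ascii` are hypothetical, the sample values are made up for the example, and "extended ASCII" is assumed to mean any character with a code point above 127.

```python
def has_non_ascii(text: str) -> bool:
    """Return True if the string contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)


def split_by_ascii(products):
    """Split product strings into (ascii_only, contains_non_ascii) lists."""
    clean, flagged = [], []
    for product in products:
        if has_non_ascii(product):
            flagged.append(product)
        else:
            clean.append(product)
    return clean, flagged


# Hypothetical sample values; the second one uses non-breaking spaces (U+00A0)
# in place of ordinary spaces, as an assumed example of an extended character.
sample = ["HP DL380 GEN9 24SFF CUSTOM SRV", "HP\u00a0DL380\u00a0GEN9"]
clean, flagged = split_by_ascii(sample)
print("kept:", clean)
print("flagged:", flagged)
```

Flagging the affected strings first, rather than dropping them silently, would also show how many FPs are actually caused by these characters.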