Commit f0b74436 authored by Moser, Maximilian

Add new section in the README about the "known results" mechanism

parent 59a48a75
1 merge request: !13 Allow specification of known results for files
@@ -23,6 +23,10 @@ This means that the knowledge base about file formats gets extended over time.
By default, formatscaper will create a summary of endangered files it encountered and print it to standard out.
A more comprehensive summary for all encountered formats will be stored in a [results file](#results).
Since file format detection is effectively still based on heuristics, no identification procedure is infallible - sometimes, even the best guess is wrong.
For such cases, we added a mechanism to override the result detected by siegfried on a per-file basis.
More information on this can be found [further down](#known-results).
Example call, with a custom path for the `sf` binary:
```sh
...
```
@@ -181,6 +185,41 @@ The results file (e.g. `results.yml`) contains information about each investigat
Note that the contents of the ZIP archive are inspected as well, with `#` as the delimiter between the archive's filename and the contained file's name.
### Known results
The "known results" file can be used to override the detected file format information from siegfried per file (by filename).
The structure of this file is very similar to that of the usual results file described above, with a few minor differences.
Each entry can specify whether or not the format is `safe`, which will be taken into consideration when reporting "endangered" files.
Further, it can optionally provide information about the actual file `format`.
If present, this information will override the format information as reported by siegfried.
Example:
```yml
- filename: /mnt/data/de/ad/be/ef/data#something.idk
  format:
    puid: fmt/729
    name: SQLite Database File Format
    mime: application/x-sqlite3
    endangered: false
- filename: /mnt/data/de/ad/be/ef/data#anotherthing.bin
  format:
    puid: fmt/899
    name: Windows Portable Executable
    mime: application/vnd.microsoft.portable-executable
    endangered: true
  safe: true
- filename: /mnt/data/de/ad/be/ef/data#program.exe
  safe: true
```
This example will override the file format detected by siegfried with the supplied values for the first two files.
For the second and third files, it will also mark the files as explicitly "safe", which will prevent formatscaper from reporting them as "endangered", even if their format is otherwise known to be endangered.
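The override semantics described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not formatscaper's actual implementation: the function names `apply_known_results` and `should_report` are hypothetical, entries are modelled as plain dictionaries mirroring the YAML structure, the detected values are made up, and the path `legacy.doc` is invented for the example.

```python
from typing import List


def apply_known_results(detected: List[dict], known: List[dict]) -> List[dict]:
    """Merge "known results" into detected results, matched by filename."""
    known_by_name = {entry["filename"]: entry for entry in known}
    merged = []
    for result in detected:
        override = known_by_name.get(result["filename"], {})
        entry = dict(result)
        # A supplied `format` replaces the detected format information
        if "format" in override:
            entry["format"] = dict(override["format"])
        # `safe` defaults to False when no known result says otherwise
        entry["safe"] = override.get("safe", False)
        merged.append(entry)
    return merged


def should_report(entry: dict) -> bool:
    """Report a file only if its format is endangered and it is not marked safe."""
    return bool(entry["format"].get("endangered", False)) and not entry["safe"]


# Made-up detection results (assumption: everything was detected as endangered)
unknown = {"puid": "UNKNOWN", "name": "", "mime": "", "endangered": True}
prefix = "/mnt/data/de/ad/be/ef/"
detected = [
    {"filename": prefix + "data#something.idk", "format": dict(unknown)},
    {"filename": prefix + "data#anotherthing.bin", "format": dict(unknown)},
    {"filename": prefix + "data#program.exe", "format": dict(unknown)},
    {"filename": prefix + "legacy.doc", "format": dict(unknown)},  # no known result
]

# Known results, mirroring the YAML example above
known = [
    {"filename": prefix + "data#something.idk",
     "format": {"puid": "fmt/729", "name": "SQLite Database File Format",
                "mime": "application/x-sqlite3", "endangered": False}},
    {"filename": prefix + "data#anotherthing.bin",
     "format": {"puid": "fmt/899", "name": "Windows Portable Executable",
                "mime": "application/vnd.microsoft.portable-executable",
                "endangered": True},
     "safe": True},
    {"filename": prefix + "data#program.exe", "safe": True},
]

merged = apply_known_results(detected, known)
reported = [entry["filename"] for entry in merged if should_report(entry)]
```

In this sketch, only the file without a known result ends up in `reported`: the first file is no longer endangered after the format override, and the second and third are suppressed by `safe: true`.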
## Generating an input file from Invenio
The required information is relatively straightforward to generate using `invenio shell`:
......