
# Black Star Information Extraction

The Black Star Information Extraction (BSIE) package provides a pipeline
that extracts metadata and content-derived features from files and stores
that information in a BSFS storage.

## Installation

You can install BSIE via pip. BSIE supports various file formats, which
requires many external packages. You can control which of these packages
get installed. If you choose not to install support for some file types,
BSIE shows a warning and skips files of those types.
All other formats are processed normally.

To install only the minimally required software, use:

    $ pip install --extra-index-url https://pip.bsfs.io bsie

To install all dependencies, use the following shortcut:

    $ pip install --extra-index-url https://pip.bsfs.io bsie[all]

To install a subset of all dependencies, modify the extras part (``[image,preview]``)
of the following command to your liking:

    $ pip install --extra-index-url https://pip.bsfs.io bsie[image,preview]

If your shell treats square brackets as glob patterns (e.g. zsh), quote the
requirement: ``'bsie[image,preview]'``.

Currently, BSIE provides the following extra flags:

* image: Read data from image files.
  Note that you may also have to install ``exiftool`` through your system's
  package manager (e.g. ``sudo apt install exiftool``).
* preview: Create previews from a variety of files.
  Note that support for various file formats also depends on what
  system packages you've installed. You should at least install ``imagemagick``
  through your system's package manager (e.g. ``sudo apt install imagemagick``).
  See [Preview Generator](https://github.com/algoo/preview-generator) for
  more detailed instructions.
* features: Extract feature vectors from images.
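
The system tools mentioned in the list above can be checked before running BSIE.
The following is a minimal sketch and not part of BSIE: ``check_tool`` is a
hypothetical helper, and the script assumes that ImageMagick is available under
the command name ``convert`` (some ImageMagick releases may provide ``magick`` instead).

```shell
#!/bin/sh
# Hypothetical helper (not part of BSIE): report whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# exiftool may be needed by the image extra, ImageMagick by the preview extra.
# Assumption: ImageMagick is installed under the command name "convert".
missing=0
for tool in exiftool convert; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -ne 0 ]; then
    echo "Install the missing tools through your system's package manager."
fi
```

A tool reported as missing can then be installed as described in the list above
(e.g. ``sudo apt install exiftool``).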


## Development

Set up a virtual environment:

    $ virtualenv env
    $ source env/bin/activate

Install bsie as editable from the git repository:

    $ git clone https://git.bsfs.io/bsie.git
    $ cd bsie
    $ pip install -e .[all]

If you want to develop (*dev*), run the tests (*test*), edit the
documentation (*doc*), or build a distributable (*build*),
install BSIE with the respective extras (in addition to file format extras):

    $ pip install -e .[dev,doc,build,test]

Alternatively, you can manually install the following packages in addition to BSIE:

    $ pip install coverage mypy pylint
    $ pip install rdflib requests types-PyYAML
    $ pip install sphinx sphinx-copybutton furo
    $ pip install build

To check test coverage, code style, and typing, run the following commands:

    $ coverage run ; coverage html ; xdg-open .htmlcov/index.html
    $ pylint bsie
    $ mypy

To build the package, do:

    $ python -m build

To run only the tests (without coverage), run the following command from the **test folder**:

    $ python -m unittest

To build the documentation, run the following commands from the **doc folder**:

    $ sphinx-apidoc -f -o source/api ../bsie/ --module-first -d 1 --separate
    $ make html
    $ xdg-open build/html/index.html

