Skip to content

FlySearch: Exploring how vision-language models explore

FlySearch.py

FlySearch: Exploring how vision-language models explore

Benchmark
Getting started
Getting started
User guide
User guide
- FlySearch.py FlySearch.py
  Table of contents
- Conversation
- Result analysis
- Adding new models
Internals
Internals
Examples

`flyserach.py` parameters

Base options

 Usage: flysearch.py [OPTIONS] COMMAND [ARGS]...                                                                         

 FlySearch benchmark                                                                                                     

╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --model-backend            [vllm|openai|anthropic|gemini]              The backend of the model to use [required]  │
│ *  --model-name               TEXT                                        The name of the model to use (passed to the │
│                                                                           model backend)                              │
│                                                                           [required]                                  │
│    --run-name                 TEXT                                        The name of the benchmark run (default to   │
│                                                                           date and time)                              │
│    --results-directory        PATH                                        The directory to store the experiment       │
│                                                                           results                                     │
│                                                                           [default: all_logs]                         │
│    --agent                    [simple_llm|description_llm|generalist_one  The type of agent to use (use default for   │
│                               |detection_driven_description_llm|detectio  oryginal FlySearch)                         │
│                               n_cheater_factory|parsing_error]            [default: simple_llm]                       │
│    --skip-sanity-check                                                    Whether to skip running a sanity check      │
│                                                                           before the benchmark (not recommended)      │
│    --number-of-runs           INTEGER                                     The number of runs to perform               │
│                                                                           [default: 300]                              │
│    --continue-from-idx        INTEGER                                     The index of the scenario to continue       │
│                                                                           running from (e.g. if execution was         │
│                                                                           interrupted)                                │
│                                                                           [default: 0]                                │
│    --log-level                [CRITICAL|ERROR|WARNING|INFO|DEBUG]         The level of logging to use [default: INFO] │
│    --help                                                                 Show this message and exit.                 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ benchmark          Run a predefined benchmark set.                                                                    │
│ random-scenarios   Run FlySearch with random scenario generation.                                                     │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

benchmark command

Runs the predefined benchmark set, e.g. FS-1 or FS-2. Default scenario sets are located in the run_templates directory. You can also create your own scenario sets from logs of any experiment, just copy the scenario_params.json files and directory structure.

 Usage: flysearch.py benchmark [OPTIONS] SCENARIO_DIRECTORY                                                              

 Run a predefined benchmark set.                                                                                         

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    scenario_directory      PATH  The directory containing the scenarios to run the benchmark on [required]          │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

random-scenarios command

Runs FlySearch with random scenario generation. You can specify the number of scenario profile and difficulty levels to use. You can create new difficulty levels in rl/evaluation/configs/difficulty_levels.py.

 Usage: flysearch.py random-scenarios [OPTIONS] SCENARIO_TYPE:{city|forest|city_anomaly|forest_anomaly}
                                      DIFFICULTY:{fs1|fs2}                                                               

 Run FlySearch with random scenario generation.                                                                          

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    scenario_type      SCENARIO_TYPE:{city|forest|city_anomaly|fores  The type of scenario to generate [required]    │
│                         t_anomaly}                                                                                    │
│ *    difficulty         DIFFICULTY:{fs1|fs2}                           The difficulty of the scenario [required]      │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                           │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯