Commit 6a2c03c
penguine-ip committed Sep 24, 2024
1 parent 7dfb593 commit 6a2c03c
Showing 3 changed files with 16 additions and 4 deletions.
13 changes: 11 additions & 2 deletions docs/docs/evaluation-introduction.mdx
@@ -186,6 +186,14 @@ deepeval test run test_example.py -v
When a metric's `verbose_mode` is `True`, it prints the intermediate steps used to calculate said metric to the console during evaluation.
:::
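
A minimal sketch (not part of this commit), assuming a metric such as `AnswerRelevancyMetric`, which is used later on this page:

```python
from deepeval.metrics import AnswerRelevancyMetric

# Assumed sketch: with verbose_mode=True, this metric prints its
# intermediate calculation steps to the console during evaluation.
answer_relevancy_metric = AnswerRelevancyMetric(verbose_mode=True)
```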

### Skip Test Cases With Missing Parameters

The `-s` flag (with no arguments) allows you to skip metric executions where the test case has missing/insufficient parameters (such as `retrieval_context`) that are required for evaluation. This is helpful if, for example, you're using a metric such as the `ContextualPrecisionMetric` but don't want to apply it when the `retrieval_context` is `None`.

```
deepeval test run test_example.py -s
```

### Repeats

Provide a number to the `-r` flag to specify how many times to rerun each test case.
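
For instance, mirroring the command above, the following sketch would presumably rerun each test case three times (the count is illustrative):

```
deepeval test run test_example.py -r 3
```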
@@ -221,14 +229,15 @@ answer_relevancy_metric = AnswerRelevancyMetric()
evaluate(dataset, [answer_relevancy_metric])
```

There are two mandatory and nine optional arguments when calling the `evaluate()` function:
There are two mandatory and ten optional arguments when calling the `evaluate()` function:

- `test_cases`: a list of `LLMTestCase`s **OR** `ConversationalTestCase`s, or an `EvaluationDataset`. You cannot evaluate `LLMTestCase`/`MLLMTestCase`s and `ConversationalTestCase`s in the same test run.
- `metrics`: a list of metrics of type `BaseMetric`.
- [Optional] `hyperparameters`: a dict of type `dict[str, Union[str, int, float]]`. You can log any arbitrary hyperparameter associated with this test run to pick the best hyperparameters for your LLM application on Confident AI.
- [Optional] `run_async`: a boolean which when set to `True`, enables concurrent evaluation of test cases **AND** metrics. Defaulted to `True`.
- [Optional] `throttle_value`: an integer that determines how long (in seconds) to throttle the evaluation of each test case. You can increase this value if your evaluation model is running into rate limit errors. Defaulted to 0.
- [Optional] `ignore_errors`: a boolean which when set to `True`, ignores all exceptions raised during metrics execution for eac test case. Defaulted to `False`.
- [Optional] `skip_on_missing_params`: a boolean which when set to `True`, skips all metric executions for test cases with missing parameters. Defaulted to `False`. A usage sketch follows this list.
- [Optional] `ignore_errors`: a boolean which when set to `True`, ignores all exceptions raised during metrics execution for each test case. Defaulted to `False`.
- [Optional] `verbose_mode`: an optional boolean which, when **NOT** `None`, overrides each [metric's `verbose_mode` value](metrics-introduction#debugging-a-metric). Defaulted to `None`.
- [Optional] `write_cache`: a boolean which when set to `True`, writes test run results to **DISK**. Defaulted to `True`.
- [Optional] `use_cache`: a boolean which when set to `True`, uses cached test run results instead of re-running evaluations. Defaulted to `False`.
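
As a rough, hypothetical sketch (not part of this diff; the test case values are made up for illustration), the new `skip_on_missing_params` argument pairs naturally with a metric that requires `retrieval_context`:

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import ContextualPrecisionMetric
from deepeval.test_case import LLMTestCase

# A test case with no `retrieval_context`, which the
# ContextualPrecisionMetric normally requires.
test_case = LLMTestCase(
    input="What is DeepEval?",
    actual_output="An open-source LLM evaluation framework.",
    expected_output="An LLM evaluation framework.",
)
dataset = EvaluationDataset(test_cases=[test_case])

# With skip_on_missing_params=True, the metric execution for this
# test case is skipped rather than evaluated.
evaluate(dataset, [ContextualPrecisionMetric()], skip_on_missing_params=True)
```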
5 changes: 3 additions & 2 deletions docs/docs/evaluation-test-cases.mdx
@@ -402,14 +402,15 @@ metric = HallucinationMetric(threshold=0.7)
evaluate([test_case], [metric])
```

There are two mandatory and eight optional arguments when calling the `evaluate()` function:
There are two mandatory and ten optional arguments when calling the `evaluate()` function:

- `test_cases`: a list of `LLMTestCase`s **OR** `ConversationalTestCase`s, or an `EvaluationDataset`. You cannot evaluate `LLMTestCase`/`MLLMTestCase`s and `ConversationalTestCase`s in the same test run.
- `metrics`: a list of metrics of type `BaseMetric`.
- [Optional] `hyperparameters`: a dict of type `dict[str, Union[str, int, float]]`. You can log any arbitrary hyperparameter associated with this test run to pick the best hyperparameters for your LLM application on Confident AI.
- [Optional] `run_async`: a boolean which when set to `True`, enables concurrent evaluation of test cases **AND** metrics. Defaulted to `True`.
- [Optional] `throttle_value`: an integer that determines how long (in seconds) to throttle the evaluation of each test case. You can increase this value if your evaluation model is running into rate limit errors. Defaulted to 0.
- [Optional] `ignore_errors`: a boolean which when set to `True`, ignores all exceptions raised during metrics execution for eac test case. Defaulted to `False`.
- [Optional] `skip_on_missing_params`: a boolean which when set to `True`, skips all metric executions for test cases with missing parameters. Defaulted to `False`. A usage sketch follows this list.
- [Optional] `ignore_errors`: a boolean which when set to `True`, ignores all exceptions raised during metrics execution for each test case. Defaulted to `False`.
- [Optional] `verbose_mode`: an optional boolean which, when **NOT** `None`, overrides each [metric's `verbose_mode` value](metrics-introduction#debugging-a-metric). Defaulted to `None`.
- [Optional] `write_cache`: a boolean which when set to `True`, writes test run results to **DISK**. Defaulted to `True`.
- [Optional] `use_cache`: a boolean which when set to `True`, uses cached test run results instead of re-running evaluations. Defaulted to `False`.
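
And a hypothetical sketch of the same argument in this page's `HallucinationMetric` style (the test case values are illustrative):

```python
from deepeval import evaluate
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase

# A test case with no `context`, which HallucinationMetric requires.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
)
metric = HallucinationMetric(threshold=0.7)

# Skip the metric for this test case instead of failing on the
# missing parameter.
evaluate([test_case], [metric], skip_on_missing_params=True)
```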
2 changes: 2 additions & 0 deletions docs/docs/metrics-contextual-precision.mdx
@@ -17,6 +17,8 @@ To use the `ContextualPrecisionMetric`, you'll have to provide the following arg
- `expected_output`
- `retrieval_context`

:::

## Example

```python
