Change Log
==========
The format is based on `Keep a Changelog `__
and this project adheres to `Semantic
Versioning `__.
`v2.15.0 `__ — 2025/5/29
-------------------------------------------------------------------------
This is a minor H2O Sonar release.
Added
~~~~~
- **Features**:
- Added configurable GPU acceleration to the following evaluators:
- Answer Relevancy (Sentence Similarity) evaluator.
- Answer Semantic Sentence Similarity evaluator.
- Context Relevancy (Soft Recall and Precision) evaluator.
- Fairness Bias evaluator.
- Groundedness (semantic similarity) evaluator.
- Hallucination evaluator.
- Machine Translation (GPTScore) evaluator.
- Perplexity evaluator.
- Question Answering (GPTScore) evaluator.
- Summarization with reference (GPTScore) evaluator.
- Summarization without reference (GPTScore) evaluator.
- Step Alignment and Completeness evaluator.
- Summarization (Completeness and Faithfulness) evaluator.
- Toxicity evaluator.
- **Enhancements**:
- Added ``hf-xet`` to improve the performance of Hugging Face model handling.
- Added ``onnxruntime-gpu`` to improve ONNX model performance when a GPU is
available.
Changed
~~~~~~~
- ``lmppl`` Python dependency ``0.0.1`` patched with
https://github.com/asahi417/lmppl/pull/13 and the wheel moved to the
public S3 bucket.
`v2.14.0 `__ — 2025/5/22
-------------------------------------------------------------------------
This is a minor H2O Sonar release.
Security
~~~~~~~~
- Package ``langchain`` upgraded to version ``0.3.1`` to fix the
vulnerability ``CVE-2024-7042``.
- Package ``langchain-community`` upgraded to version ``0.3.1`` to fix
the vulnerability ``CVE-2024-7042``.
- Package ``openai`` upgraded to version ``1.81.0`` as a dependency of
``langchain`` to fix the vulnerability ``CVE-2024-7042``.
`v2.13.0 `__ — 2025/5/20
-------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-1:
Added
~~~~~
- **Evaluators**:
- Encoding guardrail evaluator - a tool designed to assess the
LLM/RAG’s ability to handle encoding attacks. It evaluates whether
the system can be tricked into generating incorrect or unexpected
outputs through manipulation of the prompt encoding, such as
encoding the prompt text using Base64 or Base16, which should be
discarded by the guardrails or the system.
- **Features**:
- Introducing stateful conversations / multi-turn chats /
contextual conversation support for the h2oGPTe client - the
client can now maintain the context of the conversation across
multiple turns, allowing for new types of evaluations and attacks.
- Encoding perturbator - a perturbator which encodes the prompt text
using base16 encoding.
- Adding ability to configure and enforce CPU, GPU or automatic
device selection for running predictive and generative models.
Automatic device selection is the default.
- Added module which calculates various statistics to compare
distributions: Kolmogorov-Smirnov test, Wasserstein distance, and
Jensen-Shannon divergence.
- H2O Sonar newly automatically uses the shell environment
configuration overrides from environment variables starting with
``H2O_SONAR_CFG_`` prefix. The environment variables are
automatically converted to the H2O Sonar configuration parameters
(primitive values only).
- **Enhancements**:
- Introducing ``NaN`` tolerance to the heatmap leaderboard, which brings
tolerance for ``NaN`` values to the average metric value calculation -
evaluation results with ``NaN`` metric values are ignored
if their number is lower than or equal to the given
percentage of the total number of evaluation results.
- RAGAs family evaluators newly support the ``NaN`` tolerance which
can be configured using the evaluator parameters.
- Test lab completion newly supports ``auto``, ``""`` and ``None``
LLM selectors when the test lab is built from h2oGPTe collections.
The ``auto`` selector lets h2oGPTe automatically select the LLM
model for the test lab completion; ``""`` and ``None`` inherit the
LLM model from the h2oGPTe collection configuration.
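The environment-variable overrides listed under Features above can be pictured with the following minimal sketch. It is not H2O Sonar's implementation: the helper names ``parse_primitive`` and ``collect_overrides``, the lower-casing of parameter names, the parsing rules for primitive values, and the example variable names are all assumptions made for illustration. Only the ``H2O_SONAR_CFG_`` prefix and the restriction to primitive values come from the release notes.

```python
import os

PREFIX = "H2O_SONAR_CFG_"  # prefix taken from the release notes


def parse_primitive(raw):
    """Convert a string to bool, int or float, else keep it as str (assumed rules)."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def collect_overrides(environ=None):
    """Hypothetical helper: map prefixed variables to configuration parameter names."""
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): parse_primitive(value)
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


# Hypothetical parameter names; variables without the prefix are ignored.
example_env = {
    "H2O_SONAR_CFG_DEVICE": "cpu",
    "H2O_SONAR_CFG_MAX_WORKERS": "4",
    "H2O_SONAR_CFG_VERBOSE": "true",
    "PATH": "/usr/bin",
}
overrides = collect_overrides(example_env)
print(overrides)
```

Passing a plain dictionary instead of ``os.environ`` keeps the sketch testable without touching the real shell environment.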
Fixed
~~~~~
- Classification evaluator was fixed to correctly handle unknown labels
in HTML report confusion matrices.
- Classification leaderboard explanation improved to provide stable
confusion matrices in the HTML report with unexpected labels.
- Perplexity evaluator no longer requires OpenAI API key.
- Markdown representation texts are newly escaped to ensure correct
formatting and avoid XSS.
- Test suite and test lab corpus URLs fixed to reflect the AWS S3 bucket
migration from ``eu-central-1`` to ``us-east-1``.
.. _changed-1:
Changed
~~~~~~~
- HMLI moved from public to private S3 bucket, which is accessible only
from the H2O.ai infrastructure. Therefore, the HMLI wheel dependency
must be installed from the private S3 bucket before installing H2O
Sonar.
- h2oGPTe client upgraded to the custom S3 hosted build
``h2ogpte-1.6.28.dev8-py3-none-any.whl``, which has been moved from a
public to a private S3 bucket.
- Package ``pip`` upgraded to version ``25.1.1``.
Deprecated
~~~~~~~~~~
No deprecations.
Removed
~~~~~~~
No removals.
.. _security-1:
Security
~~~~~~~~
- Package ``langchain-community`` upgraded to version ``0.2.19`` to fix
the vulnerability ``CVE-2024-8309``.
`v2.12.2 `__ — 2025/4/25
-------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _changed-2:
Changed
~~~~~~~
- Changed the AWS region for the H2O Eval Studio artifacts from
``eu-central-1`` to ``us-east-1``.
`v2.12.1 `__ — 2025/4/4
------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _fixed-1:
Fixed
~~~~~
- Perplexity evaluator no longer requires OpenAI API key.
.. _changed-3:
Changed
~~~~~~~
- h2oGPTe client upgraded to version ``1.6.27.post1``.
`v2.12.0 `__ — 2025/3/27
-------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _fixed-2:
Fixed
~~~~~
- Uploaded documents purging fixed for h2oGPTe client.
.. _changed-4:
Changed
~~~~~~~
- h2oGPTe client upgraded to version ``1.6.25``.
- NLTK upgraded to version ``3.9.1`` to fix the vulnerability
``CVE-2024-39705``.
- Hugging Face Transformers Library upgraded to version ``4.50.2``.
`v2.11.1 `__ — 2025/3/17
-------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _added-2:
Added
~~~~~
- **Enhancements**:
- Improved performance of test lab completion parallelization when
the suite has fewer than 20 test cases.
`v2.11.0 `__ — 2025/3/12
-------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-3:
Added
~~~~~
- **Enhancements**:
- The Text matching evaluator uses the expected answer as the
condition (exact match) if available, when no condition is
specified by the test case.
.. _fixed-3:
Fixed
~~~~~
- Fixed JSON representations of the LLM evaluation result explanation
to contain the evaluator descriptor again.
- LLM evaluation result JSON representation does not include typed
structure friendly metrics serialization by default.
.. _changed-5:
Changed
~~~~~~~
- Tokens presence evaluator renamed to Text matching evaluator.
.. _deprecated-1:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-1:
Removed
~~~~~~~
No removals.
.. _security-2:
Security
~~~~~~~~
No security fixes.
`v2.10.0 `__ — 2025/3/6
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-4:
Added
~~~~~
- **Features**:
- LLM evaluation result JSON representation newly includes serialized
metrics which can be described using the proto definitions.
- Exponential backoff driven timeout added to the h2oGPTe client to
better perform and report the h2oGPTe timeouts.
- **Enhancements**:
- Test lab completion parallelization and sharding improved to
parallelize also inputs assigned to particular RAG/LLM model if
the number of the RAG/LLM models is smaller than a configurable
threshold.
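The release note on exponential backoff above does not describe the mechanism. The following minimal sketch assumes the common pattern of retrying a timed-out call after exponentially growing waits. The function name ``call_with_backoff``, its parameters, and the stand-in ``flaky_request`` are hypothetical and are not part of the h2oGPTe client API.

```python
import time


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, factor=2.0,
                      max_delay=60.0, sleep=time.sleep):
    """Retry `operation` on TimeoutError, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and report the timeout to the caller
            sleep(min(delay, max_delay))  # the wait is capped at max_delay
            delay *= factor


# Stand-in operation that times out twice before succeeding.
attempts = {"count": 0}
waits = []


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "answer"


# Recording the waits in a list instead of sleeping keeps the example fast.
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

Re-raising the last ``TimeoutError`` after the final attempt means the caller still sees the timeout rather than a silent ``None``.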
.. _fixed-4:
Fixed
~~~~~
- HTML report performance statistics fixed to handle missing keys.
.. _changed-6:
Changed
~~~~~~~
- h2oGPTe client upgraded to version ``1.6.23``.
.. _deprecated-2:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-2:
Removed
~~~~~~~
No removals.
.. _security-3:
Security
~~~~~~~~
No security fixes.
`v2.9.0 `__ — 2025/2/17
-----------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-5:
Added
~~~~~
- **Enhancements**:
- Improved Step alignment and completeness evaluator - better step
extraction from the retrieved context and model answer,
propagation of the dynamic programming metrics and alignment
matrix to the HTML report, and new ability to combine multiple
steps into one if the reference or the generated text contains
a compound step (left combined, right without the step combination).
- Markdown summary of the evaluation newly includes statistics for
response times per LLM model.
- Identical insights reported by different evaluators are newly
deduplicated and reported as a single insight.
.. _fixed-5:
Fixed
~~~~~
No fixes.
.. _changed-7:
Changed
~~~~~~~
- The default threshold for the Toxicity evaluator has been changed
from ``0.75`` to ``0.25`` based on empirical observations and
feedback from users.
- h2oGPTe client upgraded to version ``1.6.22``.
.. _deprecated-3:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-3:
Removed
~~~~~~~
No removals.
.. _security-4:
Security
~~~~~~~~
- Upgrading ``scikit-learn`` to version ``1.5.2`` to fix the
vulnerability ``CVE-2024-5206``.
`v2.8.2 `__ — 2025/2/05
-----------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _changed-8:
Changed
~~~~~~~
- h2oGPTe client upgraded to version ``1.6.18.post1``.
`v2.8.1 `__ — 2025/1/13
-----------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _changed-9:
Changed
~~~~~~~
- HMLI wheel dependency location changed to the H2O Eval Studio AWS
account.
`v2.8.0 `__ — 2025/1/10
-----------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-6:
Added
~~~~~
- **Evaluators**:
- Step alignment and completeness evaluator (preview) - a tool for
evaluating the steps of procedures, sequences, or process
descriptions.
- **Features**:
- Support for agent-based and LLM-based perturbators.
- New Contextual misinformation perturbator.
- **Evaluation data**:
- Test suite evaluation library with 1M+ test cases published at
https://eval-studio-artifacts.s3.us-east-1.amazonaws.com/h2o-eval-studio-suite-library/index.html
and
https://eval-studio-artifacts.s3.us-east-1.amazonaws.com/h2o-eval-studio-suite-library/index.json
Makefile targets to maintain the test suite evaluation library
added to the project.
- **Documentation**
- ReStructuredText documentation for the new Step alignment and
completeness evaluator.
- Added ReStructuredText documentation for Fact-check evaluator
parameters.
.. _fixed-6:
Fixed
~~~~~
No fixes.
.. _changed-10:
Changed
~~~~~~~
No changes.
.. _deprecated-4:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-4:
Removed
~~~~~~~
No removals.
.. _security-5:
Security
~~~~~~~~
No security fixes.
`v2.7.0 `__ — 2024/12/16
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-7:
Added
~~~~~
- **Evaluators**:
- Fact-check evaluator (agent-based).
- **Enhancements**:
- Enhanced test lab prompt cache which is meant for testing/demo
purposes: improved configuration (environment variable and H2O
Sonar configuration), added retrieved context caching.
- **Documentation**
- New evaluator documentation for the Fact-check evaluator.
- Added documentation for new perturbators which were added in H2O
Sonar 2.6.0.
.. _fixed-7:
Fixed
~~~~~
- Heatmap leaderboard explanation no longer shows empty most difficult
prompts section.
`v2.6.0 `__ — 2024/12/05
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-8:
Added
~~~~~
- **Evaluators**:
- Answer Semantic Sentence Similarity evaluator.
- **Features**:
- New character level perturbators - insert/delete random
character(s), QWERTY keyboard typos, and common OCR errors.
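As an illustration of the character-level perturbators listed above, the sketch below shows one plausible way to insert a character, delete a character, and simulate a QWERTY typo. The function names and the small neighbour table are assumptions for illustration and do not reflect H2O Sonar's actual perturbator API or data.

```python
import random

# Small, illustrative subset of QWERTY neighbours (an assumption, not a full table).
QWERTY_NEIGHBOURS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl", "s": "awedxz", "t": "rygf",
}


def insert_random_char(text, rng):
    """Insert one random lowercase letter at a random position."""
    pos = rng.randrange(len(text) + 1)
    return text[:pos] + rng.choice("abcdefghijklmnopqrstuvwxyz") + text[pos:]


def delete_random_char(text, rng):
    """Delete one character at a random position (no-op for empty text)."""
    if not text:
        return text
    pos = rng.randrange(len(text))
    return text[:pos] + text[pos + 1:]


def qwerty_typo(text, rng):
    """Replace one character with a neighbouring key from the table above."""
    candidates = [i for i, ch in enumerate(text) if ch.lower() in QWERTY_NEIGHBOURS]
    if not candidates:
        return text
    pos = rng.choice(candidates)
    replacement = rng.choice(QWERTY_NEIGHBOURS[text[pos].lower()])
    return text[:pos] + replacement + text[pos + 1:]


rng = random.Random(42)  # seeded so that perturbations are reproducible
prompt = "What is the capital of France?"
inserted = insert_random_char(prompt, rng)
deleted = delete_random_char(prompt, rng)
typo = qwerty_typo(prompt, rng)
print(inserted, deleted, typo, sep="\n")
```

Passing the random generator explicitly makes each perturbation reproducible, which helps when comparing model answers for original and perturbed prompts.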
.. _fixed-8:
Fixed
~~~~~
- Keywords of RAGAs evaluator, Classification evaluator and GPTScore
Q&A evaluator fixed.
- RAGAs evaluator metadata in leaderboard serializations fixed to
include exactly the metrics it calculates.
- Fixed the escaping of special characters in classification class
names for the multi-class Classification evaluator.
- ``httpx`` Python dependency fixed to ``0.27.0`` to avoid ``openai``
Python library issues with unexpected proxy parameter.
- Resolved random hangs that occurred during h2oGPTe RAG retrieved
context fetching when using a session connection managed by the
resource manager.
.. _changed-11:
Changed
~~~~~~~
- H2O Sonar models online cache location has been moved from the root to
the H2O EvalStudio tenant so that models are downloaded from the right
location in deployments with internet access (and cached from the right
location in air-gapped deployments).
`v2.5.4 `__ — 2024/10/14
------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _fixed-9:
Fixed
~~~~~
- Fixed missing and non-float bool metrics in BYOP evaluators.
- Fixed ``punkt`` caching in Context relevancy (soft recall and
precision) and Answer relevancy (sentence similarity) evaluators.
- Fixed keywords metadata in multiple evaluators.
`v2.5.3 `__ — 2024/11/13
------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _added-9:
Added
~~~~~
- **Enhancements**:
- Perturbators can newly work without raising the exceptions -
instead they gather the errors and return them in the passed
lists.
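The error-gathering behaviour described above can be sketched as follows. This is an illustration only: the helper ``perturb_all``, its ``errors`` parameter, and the toy ``upper_case`` perturbation are hypothetical names, not the H2O Sonar perturbator interface.

```python
def perturb_all(prompts, perturb, errors=None):
    """Apply `perturb` to each prompt.

    When the caller passes an `errors` list, failures are appended to it
    instead of being raised; without it, the exception propagates.
    """
    results = []
    for prompt in prompts:
        try:
            results.append(perturb(prompt))
        except Exception as exc:
            if errors is None:
                raise  # no list passed: propagate the exception
            errors.append(f"{prompt!r}: {exc}")
            results.append(prompt)  # keep the original prompt unperturbed
    return results


def upper_case(prompt):
    """Toy perturbation that rejects empty prompts."""
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()


errors = []
perturbed = perturb_all(["hello", "", "world"], upper_case, errors=errors)
print(perturbed, errors)
```

Keeping the original prompt in place of a failed perturbation preserves the one-to-one mapping between inputs and outputs, so a single bad prompt does not abort the whole batch.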
`v2.5.2 `__ — 2024/10/12
------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _fixed-10:
Fixed
~~~~~
- Fixed missing taglines in the evaluator descriptors.
- Fixed singular/plural in classification evaluator metadata.
`v2.5.1 `__ — 2024/10/09
------------------------------------------------------------------------
This is a patch H2O Sonar release.
.. _added-10:
Added
~~~~~
- **Enhancements**:
- Improved - shorter and more concise - taglines in evaluators.
`v2.5.0 `__ — 2024/10/08
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-11:
Added
~~~~~
- **Enhancements**:
- Tagline added to all evaluators to provide a brief description of
the evaluator.
.. _fixed-11:
Fixed
~~~~~
- Fixed bugs / inconsistencies between evaluator metadata and keywords
like LLM vs. RAG compatibility.
- Fixed Classification evaluator metrics values included in the
evaluation result to be consistent with the declared metrics in the
evaluator metadata.
- Ensured caching of the ``punkt`` tokenizer for the Fairness Bias
evaluator, Groundedness evaluator, and Hallucination evaluator to
work correctly in air-gapped deployments.
- Fixed Groundedness evaluator AVID error codes and tokenization
unpacking.
.. _changed-12:
Changed
~~~~~~~
- Summarization (Completeness and Faithfulness) evaluator excluded from
the explainer container as it is resource intensive, expensive,
difficult to interpret, and not suitable for use without GPU hardware
support.
- h2oGPTe client upgraded to version ``1.5.26``.
.. _deprecated-5:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-5:
Removed
~~~~~~~
No removals.
.. _security-6:
Security
~~~~~~~~
No security fixes.
`v2.4.0 `__ — 2024/10/14
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-12:
Added
~~~~~
- **Features**:
- Amazon Bedrock RAG newly supports creation of the knowledge bases
(collections) from test suites as a part of the test lab build and
completion.
- **Enhancements**:
- The following evaluators newly report metrics values in the
evaluation results on the sentence granularity as actual answer
metadata and they highlight problems in the HTML report:
- Groundedness evaluator
- Toxicity evaluator
- Fairness Bias evaluator
- Answer Relevancy (sentence similarity) evaluator
- Hallucination evaluator
- PII evaluator
- Sensitive Data evaluator
- Token presence evaluator reports which part of the condition
caused the evaluation failure. The error message is provided in
the ``meta`` section of the actual answer metadata and highlighted
in the HTML report (error message section).
- Summarization evaluator error messages improved to indicate the
root cause of the summarization evaluation failure.
- Test lab newly accepts custom HTTP headers for the document
caching when building the test lab or synchronizing the documents.
- **Documentation**
- Added generative AI section to the introduction of the
ReStructuredText documentation.
- Added missing licenses to the ReStructuredText documentation.
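The sentence-granularity reporting listed under Enhancements above can be illustrated with a minimal sketch. The regular-expression splitter is a stand-in for a real sentence tokenizer, ``toy_scorer`` is a made-up metric, and the shape of the returned dictionary is an assumption, not the metadata format H2O Sonar actually emits.

```python
import re


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (stand-in for a real sentence tokenizer)."""
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]


def score_sentences(answer, scorer, threshold):
    """Score each sentence and flag those above `threshold` (higher means worse)."""
    rows = [{"sentence": s, "score": scorer(s)} for s in split_sentences(answer)]
    flagged = [row for row in rows if row["score"] > threshold]
    return {"sentences": rows, "flagged": flagged}


def toy_scorer(sentence):
    """Hypothetical metric: 1.0 if the sentence contains 'idiot', else 0.0."""
    return 1.0 if "idiot" in sentence.lower() else 0.0


answer = "Paris is the capital of France. Only an idiot would ask that. Anything else?"
report = score_sentences(answer, toy_scorer, threshold=0.25)
for row in report["flagged"]:
    print(row["sentence"], row["score"])
```

Scoring per sentence lets a report point at the exact sentence that caused a problem instead of assigning a single score to the whole answer.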
.. _fixed-12:
Fixed
~~~~~
- AVID problem taxonomy fixed to report codes in the problems.
- Failure to get LLM statistics in the h2oGPTe client no longer causes
the evaluation to fail (it is optional to get the statistics).
- Fixed case in the names of evaluators to ensure the naming
consistency.
.. _changed-13:
Changed
~~~~~~~
No changes.
.. _deprecated-6:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-6:
Removed
~~~~~~~
No removals.
.. _security-7:
Security
~~~~~~~~
No security fixes.
`v2.3.0 `__ — 2024/10/07
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-13:
Added
~~~~~
- **Enhancements**:
- Text matching evaluator reports result parsing failures in the
evaluation results.
- Text matching evaluator ability to evaluate both actual answer and
retrieved context is newly configurable - default is to evaluate the
actual answer only.
- PII evaluator and Sensitive data evaluator ability to evaluate
both actual answer and retrieved context is newly configurable -
default is to evaluate both actual answer and retrieved context.
.. _fixed-13:
Fixed
~~~~~
- Evaluation result JSON representation fixed to correctly serialize
infinity and NaN values.
- Generation/Retrieval/Generation+Retrieval prefix of model failure
errors in the HTML report fixed to be visible again.
- Passed and failed test cases counts in the test lab completion
progress report fixed to be correctly calculated (when retrieved
context failures are not considered).
- Fixed missing resolved test cases when building lab using the
parallel job completion - if resolution of all test cases fails in
the job, then the result is not discarded, but kept.
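For context on the infinity and NaN serialization fix above: Python's ``json.dumps`` emits the bare tokens ``NaN`` and ``Infinity`` by default, which are not valid JSON. The sketch below shows one possible way to make such values JSON-safe by mapping them to strings. The helper ``sanitize`` and the chosen string encoding are assumptions; the release notes do not say how H2O Sonar encodes these values.

```python
import json
import math


def sanitize(value):
    """Recursively replace non-finite floats with JSON-safe strings."""
    if isinstance(value, float) and not math.isfinite(value):
        if math.isnan(value):
            return "NaN"
        return "Infinity" if value > 0 else "-Infinity"
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    return value


metrics = {"precision": float("nan"), "perplexity": float("inf"), "recall": 0.75}
# allow_nan=False makes json.dumps raise ValueError if a non-finite float slips through.
encoded = json.dumps(sanitize(metrics), allow_nan=False)
print(encoded)
```

Using ``allow_nan=False`` turns any value missed by the sanitizer into an immediate error rather than an invalid document.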
.. _changed-14:
Changed
~~~~~~~
- Boolean leaderboard (JSON, Markdown, dataset) results changed to fail
the test case evaluation if the generation fails, and/or retrieval
fails, and/or generation+retrieval fails. Previously, retrieval
failures were not considered as a failure of the test case evaluation
which led to confusing results. Users can enable/disable the
retrieval checks in Text matching evaluator, PII evaluator, and
Sensitive data evaluator.
- h2oGPTe client upgraded to version ``1.5.22``.
.. _deprecated-7:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-7:
Removed
~~~~~~~
No removals.
.. _security-8:
Security
~~~~~~~~
No security fixes.
`v2.2.0 `__ — 2024/09/26
------------------------------------------------------------------------
This is a minor H2O Sonar release.
.. _added-14:
Added
~~~~~
- **Enhancements**:
- h2oGPTe LLM performance statistics - like cost, input tokens,
output tokens and time to the first token - added to the
explainable model and Markdown boolean leaderboard explanation.
- Markdown report newly includes h2oGPTe LLM vision model associated
with the evaluated model.
- Conditional evaluation by the Text matching evaluator newly
reports sub-condition which caused the evaluation failure.
- All row keys and all test cases added to problems reporting that
a model didn’t pass a metric threshold check.
- Evaluator descriptor added to LLM result JSON.
.. _fixed-14:
Fixed
~~~~~
No fixes.
.. _changed-15:
Changed
~~~~~~~
- Added LLM model metadata to the explainable (LLM and RAG) model,
which changes the serialization and deserialization of the model
metadata, test labs (impacts H2O Eval Studio) and test results.
- Problem attribute ``test_case_key`` renamed to ``test_case_keys`` and
type changed to list.
.. _deprecated-8:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-8:
Removed
~~~~~~~
No removals.
.. _security-9:
Security
~~~~~~~~
No security fixes.
`v2.1.0 `__ — 2024/09/25
------------------------------------------------------------------------
This minor H2O Sonar release brings the **looping detection** evaluator
and smaller enhancements.
.. _added-15:
Added
~~~~~
- **Evaluators**:
- Looping Detection evaluator.
- **Enhancements**:
- Amazon Bedrock RAG client model listing sped up.
- Evaluated models added to the JSON representation of the
evaluation results.
- Test case key added to the JSON representation of the evaluation
results.
- **Documentation**
- Added reStructuredText documentation of the evaluators.
- Added prompts documentation for LLM judge-based evaluators.
.. _fixed-15:
Fixed
~~~~~
- Row key(s), test case keys and model keys added to the problems and
insights (where applicable) to simplify the mapping of the evaluation
results to the original data.
.. _changed-16:
Changed
~~~~~~~
- Metrics column names of boolean leaderboard evaluators changed from
ad hoc names to actual boolean metrics names.
.. _deprecated-9:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-9:
Removed
~~~~~~~
No removals.
.. _security-10:
Security
~~~~~~~~
No security fixes.
`v2.0.0 `__ — 2024/09/18
------------------------------------------------------------------------
This major H2O Sonar release brings **generative AI** evaluation.
.. _added-16:
Added
~~~~~
- **Evaluators**
- Generation evaluation
- Answer Correctness evaluator.
- Answer Relevancy evaluator.
- Answer Relevancy (Sentence Similarity) evaluator.
- Answer Semantic Similarity evaluator.
- Bring Your Own Prompt (BYOP) evaluator.
- Faithfulness evaluator.
- Groundedness (semantic similarity) evaluator.
- Hallucination evaluator.
- Language Mismatch evaluator.
- Machine Translation (GPTScore) evaluator.
- Perplexity evaluator.
- Question Answering (GPTScore) evaluator.
- RAGAS evaluator.
- Text matching evaluator.
- Retrieval evaluation
- Context Precision evaluator.
- Context Recall evaluator.
- Context Relevancy evaluator.
- Context Relevancy (Soft Recall and Precision) evaluator.
- Privacy evaluation
- Contact Information evaluator.
- PII evaluator.
- Sensitive Data evaluator.
- Fairness evaluation
- Fairness Bias evaluator.
- Sexism evaluator.
- Stereotype evaluator.
- Toxicity evaluator.
- Summarization evaluation
- BLEU evaluator.
- ROUGE evaluator.
- Summarization (Completeness and Faithfulness) evaluator.
- Summarization (Judge) evaluator.
- Summarization with reference (GPTScore) evaluator.
- Summarization without reference (GPTScore) evaluator.
- Classification evaluation
- Classification evaluator.
- **Features**
- Introducing ``Evaluators`` as a new type of explainers which are
able to evaluate the quality of Retrieval-Augmented Generation
(RAG) products.
- New evaluator API - ``evaluate`` module to run evaluators,
``evaluators`` module to implement new evaluators and Bring Your
Own Evaluator (BYOE).
- New evaluator specific datasets based on ``LlmDataset``, models
``ExplainableRagModel`` with implementations for ``h2oGPTe`` and
OpenAI Assistants with retrieval.
- New evaluator ``testing`` module with the test support bringing
test suites, tests, test cases and test labs.
- New ``genai`` module with LLM/RAG host clients:
- H2O Enterprise ``h2oGPTe``
- H2O GPT
- H2O LLMOps
- OpenAI Chat
- OpenAI Assistants with Retrieval tool (version 1) or File
Search tool (version 2)
- Microsoft Azure hosted OpenAI Chat
- OpenAI Chat compatible endpoints
- Amazon Bedrock
- ollama
- HTML report branding for the EvalStudio.
- Insights - new feature allowing explainers and explanations to
provide insights into the evaluation results and suggest actions
to be taken.
- **Explanations and formats**
- New leaderboard (heatmap and bool) explanations with support for
multiple evaluation metrics along with HTML, JSON and Markdown
formats.
- New (normalized) evaluator result (``EvalResult``) and explanation
formats (JSON, Markdown).
- **Enhancements**
- Installation of H2O Sonar using package extras - install only what
you need: core, ``explainers`` and/or ``evaluators``.
- ``ragas`` library integration (license).
- **Testing**
- New ``llm`` pytest label for LLM and RAG tests.
- Test suites, test labs and test datasets for the LLM and RAG
evaluation: ``h2oGPTe`` benchmark, Kaggle LLM Data Science
competition, Talk to report and evalgpt.ai.
- **Changes**
- ``Cython`` Python dependency upgraded from ``0.29.32`` to
``0.29.37``.
- **Backward compatibility breaking changes:**
- Python 3.8 is no longer officially supported.
- Python 3.9 is no longer officially supported.
- Python 3.10 is no longer officially supported.
- JSON file with interpretation parameters which was stored in the
interpretation directory is no longer persisted as it contained
duplicate information which can be found in the
``interpretation.json`` file.
- **Documentation**
- Updated documentation of new features and enhancements.
v2.0.0 Release Candidates
~~~~~~~~~~~~~~~~~~~~~~~~~
List of 2.0.0 release candidates with the detailed description of the
changes:
- **RC 68** - 2024/09/13
- Evaluators:
- The new Answer Relevancy (Sentence Similarity) evaluator
assesses how relevant the actual answer is by computing the
semantic similarity between the question and the actual answer
sentences.
- The new Context Relevancy (Soft Recall and Precision) evaluator
measures the relevancy of the retrieved context based on the
semantic similarity of the question and context sentences.
- Enhancements:
- Toxicity evaluator improved to calculate the toxicity metrics
on the sentence granularity and report the maximum of the
toxicity metrics values. This enhancement makes the evaluator
results more valuable as it can detect the toxic content in the
generated text regardless of its length (toxic content can no
longer hide in long(er) actual answers).
- Amazon Bedrock model host newly checks the accessibility of the
LLM models supported by the RAG and filters out the
inaccessible models.
- **RC 67** - 2024/09/06
- Fixes:
- Summarization (Completeness and Faithfulness) evaluator fixed
to safely use MD5 for the metrics calculation.
- Changes:
- ``h2oGPTe`` client downgraded to version ``1.5.16`` to
integrate with old(er) servers.
- **RC 66** - 2024/09/06
- Fixes:
- Evaluated model ID added to the HTML report to simplify mapping
of model IDs in the evaluation results (JSON, CSV, frame) to
human readable model metadata.
- Changes:
- ``h2oGPTe`` client upgraded to version ``1.6.0.dev3``.
- H2O Eval Studio leaderboards Markdown representation title
heading level changed to ``H2``.
- **RC 65** - 2024/09/05
- Features:
- Amazon Bedrock model host support - evaluation of Amazon
Bedrock RAG - knowledge bases (collections) and configured LLM
models.
- Fixes:
- Perturbation flip detection fixed - it didn’t consider answers
created by the different RAG/LLM models and reported false
negatives.
- **RC 64** - 2024/09/03
- Fixes:
- h2oGPTe LLM models listing retries fixed to avoid the flakiness
and ensure it will be performed at least once.
- Documentation:
- Comprehensive update of evaluator documentation: formulas,
methods, prompts, links to used models, and fixes.
- **RC 63** - 2024/08/29
- Enhancements:
- H2O Eval Studio Markdown representation revamp - new header
section for bool/heat/class based leaderboard summaries,
model/prompt/… failure sections truncated to at most 3 entries
to scale the UI in case of many failures.
- Model ``vectara/hallucination_evaluation_model``, which is used
by the Hallucination evaluator, updated to ``HHEM-2.1-Open``
and is frozen to avoid the model changes.
- Added retries to the ``h2oGPTe`` client to avoid the flakiness
when listing base LLM models.
- Improved rendering of the multinomial classification confusion
matrices in the HTML report.
- **RC 62** - 2024/08/27
- Fixes:
- Fixed broken retrieval and generation error messages
construction in the Text matching evaluator.
- Model and prompt leaderboard in the HTML report/Markdown/JSON
representations - result failures are shown based on
**generation** failures (not union of retrieval and generation
failures) which ensures that failures and passes sum to 100%.
- Model failure entries colors in the HTML report fixed - if the
problem is in retrieval, only the context is in red. If the
problem is in generation, then the actual answer is in red.
- Input field in the model failure list of the H2O Eval Studio
markdown de-duplicated. Missing fields added to be on par with
the HTML.
- **RC 61** - 2024/08/22
- Enhancements:
- Groundedness (semantic similarity) evaluator documentation
updated.
- Improved Hallucination evaluator error reporting on too long
retrieved context chunks.
- More robust perturbation flip direction detection.
- **RC 60** - 2024/08/21
- Evaluators:
- The new Groundedness evaluator assesses the groundedness of the
generated text by considering the retrieved context - measuring
hallucinations and fabricated text. It reports problems on
sentence granularity in order to identify the root causes of
hallucinations and fabricated text.
- Enhancements:
- Added infrastructure to detect the low number of evaluation
examples in evaluators and report it as a problem.
- Problems are newly categorized using the AVID taxonomy:
https://docs.avidml.org/taxonomy/effect-sep-view/security
- Fixes:
- Threshold consistency between evaluator thresholds and metrics
threshold defaults fixed.
- Propagation of actual threshold values to the JSON leaderboard
representation fixed.
- Exception handling in the test lab completion on the parallel
job failure fixed.
- **RC 59** - 2024/08/20
- Enhancements:
- Test lab completion progress reporting is now more detailed -
it includes prompt, LLM, and RAG/LLM host names.
- Fixes:
- Rounding of metrics values in insights, problems and Markdown
representations aligned to 4 decimal places. Percentage values
are rounded to 1 decimal place.
- Changes:
- ``h2oGPTe`` client upgraded to version ``1.5.11``.
- **RC 58** - 2024/08/14
- Fixes:
- Links to explanation data in the HTML report changed from
directories to files in case of H2O Eval Studio branding as
(S3) directories cannot be listed in case of the H2O Eval
Studio deployment.
- Changes:
- Rollback to vulnerable NLTK ``3.8.1`` (``CVE-2024-39705``)
Python dependency as ``3.8.2`` has been purged from pypi.org.
- **RC 57** - 2024/08/14
- Enhancements:
- Default h2oGPTe client timeout to get answer from the LLM or
RAG collection is newly 420s (was 1000s).
- Metrics values in Markdown are newly rounded to 4 decimal
places.
- Fixes:
- Perturbation of perturbed test suites is newly cloned when not
perturbing in place.
- GPTScore threshold parameter description fixed in the evaluator
metadata.
- Hiding H2O Sonar specific texts in the HTML report in case of
H2O Eval Studio branding.
- **RC 56** - 2024/08/12
- Enhancements:
- Added detection of Summarization evaluator failures on all
dataset rows and fail fast via raising an exception.
- Added precondition check on empty evaluation results to all
leaderboard types.
- Evaluator metadata lookup made possible for incompatible
evaluators in the HTML report.
- Test lab completion no longer uses “shard” terminology, but
“parallel job” instead.
- English variant of ``punkt`` from NLTK is newly cached as the
model used by the evaluators.
- Changes:
- Updated vulnerable NLTK ``3.8.1`` (``CVE-2024-39705``) Python
dependency to fixed version ``3.8.2``.
- **RC 55** - 2024/08/09
- Fixes:
- Minor robustness fix in the handling of extra argument passed
to the h2oGPTe client.
- **RC 54** - 2024/08/08
- Fixes:
- Fixed problem detection on Answer semantic similarity evaluator
flip detection. RAGAs evaluator fixed to declare all metrics it
calculates in the metadata. Also RAGAs evaluator docstring
changed to announce RAGAs metrics only in the documentation.
- **RC 53** - 2024/08/08
- Fixes:
- Perturbation of a test suite using multiple perturbators no
longer creates an exponential number of perturbed test cases.
Instead, there are original tests with their test cases and
perturbed tests with their perturbed test cases. Thus the
number of test cases is 2x the original number of test cases.
- Changes:
- Internal perturbation API of test suites, tests and test cases
changed to support multiple perturbators so that the
perturbations can be created in place and relationships
properly set.
- **RC 52** - 2024/08/07
- Enhancements:
- Test lab completion newly fails fast - raises exception - in
case that completion of all test lab’s test cases fails.
- Evaluations, interpretations and their JSON representations have
a new ``error`` field which contains the error message in case of
the evaluation/interpretation failure.
- Changes:
- ``h2oGPTe`` client upgraded to version ``1.5.11-dev2``.
- **RC 51** - 2024/08/06
- Enhancements:
- Missing expected answer in the test case is reported as a
problem by the evaluators.
- Fixes:
- The HTML report generator doesn’t fail on an invalid explainer
ID when getting the display name, but returns the ID with a
prefix. An error message is logged.
- **RC 50** - 2024/08/02
- Enhancements:
- Progress report in the test lab completion no longer includes a
full prompt, but just a prefix.
- **RC 49** - 2024/08/02
- Enhancements:
- Brief evaluators descriptions were shortened - newly contain
just the first paragraph of the full description.
- Evaluators check whether actual answers in test
cases/suites/labs have the correct type and if not, they generate
the corresponding problems.
- Air-gapped deployment support improved - 3rd party models used
by the evaluators/evaluation libraries are newly frozen (where
possible) to prevent model changes.
- Fixes:
- In an attempt to complete the test lab for exactly one model in
parallel, the test lab automatically switches to the serial
mode.
- Insights about the fastest/slowest/cheapest/most expensive
models are not generated for the evaluations with exactly one
model.
- Changes:
- Progress reports generated by evaluators newly start with
display names of the evaluators rather than IDs.
- **RC 48** - 2024/07/30
- Enhancements:
- Brief evaluator description added to the public API -
``list_evaluators()`` and ``describe_evaluator()`` newly return
it.
- **RC 47** - 2024/07/30
- Features:
- Added support of the OpenAI RAG version 2.0 - Assistants with
File Search tool.
- New conditions in Token Presence evaluator - new syntax which
brings support of ``NOT`` and parentheses for the complex
conditions.
- Red teaming test suite with various LLM/RAG attacks added to
the repository. This test suite can be used for penetration
testing of the LLM/RAG models.
- Enhancements:
- Improved test lab API allows completing test labs of a RAG
system using given (existing) collections instead of creating
new ones. This API allows the user to create, configure and
customize the collections, upload corpus and documents, and
then use them in the test lab completion.
- Evaluator container newly detects invalid LLM dataset rows
which contain RAG/LLM host error messages instead of the actual
data and reports them as problems.
- Evaluators newly provide a brief description in addition to the
full description.
- Perturbators are newly ensuring that the perturbed data are not
equal to the original data and fail if the perturbation did not
change the data.
- Connection configuration has new ``extra_params`` dictionary
field which can be used to pass additional parameters to the
connection client. For example, setting the ``timeout``
parameter on the h2oGPTe connection will apply the timeout
parameter to all requests (that support it) made by the h2oGPTe
client.
- Versions of cached/downloaded models - like
``vectara/hallucination_evaluation_model`` or ``gpt2-medium`` -
used by evaluators are newly frozen to avoid the model changes.
- Fixes:
- Negative (RAG/LLM) cost of the prompt is reported as a problem
by evaluators which create boolean leaderboards. The cost is
also set to ``0.0`` in the evaluation results to minimize the
impact of the cost on the evaluation.
- Changes:
- ``h2oGPTe`` client upgraded to version ``1.5.8``.
- Perturbation probability intensity increased in Qwerty and
Antonym perturbators to ensure sufficient perturbation of the
data.
- Security:
- ``setuptools`` upgraded to ``70.0.0`` to fix vulnerability
``CVE-2024-6345``.
- OpenAI RAG version 2.0 support brings an upgrade of the
``openai`` Python library from version ``1.20.0`` to
version ``1.35.13``, which fixes LangChain community
vulnerability ``CVE-2024-2965``.
- **RC 46** - 2024/06/28
- Evaluators:
- Four new GPTScore-based evaluators for the evaluation of the
summarizations with the reference summaries, evaluation of the
summarizations without the reference summaries, evaluation of
machine translations and evaluation of the question answering.
- Features:
- Evaluation / interpretation API can list all and incompatible
evaluators / explainers.
- Enhancements:
- Evaluators assessing ``boolean`` metrics, such as token
presence or PII leakage, now have the ability to use custom
metric names and descriptions to make reports and evaluation
data more comprehensive.
- Evaluators newly have keywords indicating whether they require
LLM judge, prompt, expected answer, actual answer, retrieved
context or constraints.
- Significantly improved descriptions of all evaluators -
descriptions are mostly generated from the evaluator class
metadata.
- Problems are newly sorted by severity (from highest to lowest).
- Insights are sorted by type (alphabetically).
- All and incompatible evaluators/explainers newly shown in the
evaluation report.
- Fixes:
- Missing threshold added to parametrizable BYOP evaluator.
- Breaking changes:
- Evaluator keyword ``sr-11-7-ongoing-analysis`` has been fixed
to the correct ``sr-11-7-ongoing-monitoring`` keyword.
- Documentation:
- reStructuredText documentation of the evaluators rewritten -
every evaluator has brief description, requirements, evaluation
method, evaluation metrics, insights, and problems sections.
- **RC 45** - 2024/06/25
- Enhancements:
- New Random character type perturbator.
- Fixes:
- Integrity checks and validation of the model configuration
(like embeddings, tokenization, temperature, token limits) used
to build the test lab.
- Changes:
- Interpretation/evaluation is marked as successful if at least
one evaluator successfully finishes.
- h2oGPTe client upgraded to version ``1.5.1-dev7``.
- Python 3.11 dependencies upgraded: ``cryptography`` to version
``42.0.8``, ``scikit-learn`` to version ``1.5.0``, and ``toml``
to version ``0.10.2``.
- **RC 44** - 2024/06/14
- Enhancements:
- h2oGPTe client upgraded to version ``1.5.0-dev21`` to support
the upcoming H2O Enterprise h2oGPTe release.
- Colorized evaluation status added to the HTML report.
- Crash of an evaluator is newly reported as a high severity
problem, and causes the evaluation to be marked as failed.
However, the evaluation continues with the other evaluators.
- An attempt to run a non-registered evaluator is newly reported as
a high severity problem, and causes the evaluation to be marked
as failed. However, the evaluation continues with the other
evaluators.
- Improved measurements of the LLM latency in the GenAI client.
- Fixes:
- Fixed duplicate prompts in the model weak points (the most
difficult prompts) section of the HTML report.
- **RC 43** - 2024/06/11
- Features:
- Ability to configure h2oGPTe, h2oGPT, H2O LLMOps, ollama,
OpenAI chat, OpenAI RAG, and Microsoft Azure hosted OpenAI
clients to control the evaluation of LLM models (for instance
``temperature``) and RAG systems (for instance
``embeddings provider``, ``system prompt`` or
``prompt template``).
- Enhancements:
- All perturbators are newly deterministic for improved
robustness and testability (except synonym and antonym
perturbators which are deterministic in testing only).
- Synonym and antonym perturbators improved with eager
synonym/antonym swap which tries to match the percentage of
words swapped (prior to the fix, perturbators tried only x times,
and if the new synonym/antonym was the same word, it would not
swap anything).
- Fixes:
- Fixed all perturbators for issues with special tokens in
de/tokenization like undesired spaces around expressions in
parentheses after detokenization.
- Security:
- Upgraded ``scikit-learn`` library to version ``1.5.0`` to solve
vulnerabilities detected by SNYK.
- Upgraded ``cryptography`` library to version ``42.0.8`` to
mitigate vulnerabilities detected by SNYK.
- Documentation:
- reStructuredText documentation of the evaluation and new
features (host configuration) with configuration prototypes
examples.
- **RC 42** - 2024/05/31
- Enhancements:
- Keyword groups for grouping of keywords which are used to tag
evaluators.
- H2O Eval Studio *purpose* keyword group which organizes
evaluators into disjoint sets.
- **RC 41** - 2024/05/30
- Evaluators:
- New perplexity evaluator for LLMs which calculates the
perplexity - “measure of uncertainty” - of the generated text.
- Enhancements:
- Safe JSON data decoder for NaN and infinities.
- H2O Sonar can be configured whether to use GPU or CPU for the
evaluation.
- Fixes:
- HTML report generation fixed in case that evaluation of all
rows in the dataset fails.
- **RC 40** - 2024/05/29
- Evaluators:
- New summary evaluator provides completeness and faithfulness
metrics for LLM summarization tasks evaluation without the need
for a reference summary.
- Features:
- Insights - new feature allowing explainers and explanations to
provide insights into the evaluation results and suggest
actions to be taken.
- Enhancements:
- Evaluation JSON and HTML results include an overall evaluation
result represented as one value which is based on the severity
of the problems detected in the evaluation. It is represented
as traffic light colors (green, yellow, red) in the HTML
report.
- All evaluators report insights about the evaluation results and
suggest actions to be taken via insight enhancements in bool,
heatmap and classification leaderboards explanations.
- In addition to problems and accuracy-related insights, the Text
matching, PII and Sensitive data leakage evaluators also report
insights about the cost and performance (speed) of evaluated models.
- Models section in the HTML report rewritten to contain model
details, insights, and problems.
- Example PIIs (emails, credit cards, SSNs) in the PII evaluator
are no longer reported as problems. These false positives are
now marked as ``False`` in the evaluation results.
- Test lab statistics.
- Fixes:
- Hallucination evaluator fixed to correctly handle low values as
hallucinations (not vice versa).
- Changes:
- Bool leaderboard JSON representation values (and metrics
metadata) changed from percentages to ``[0.0, 1.0]`` float
range.
- **RC 39** - 2024/05/06
- Enhancements:
- ``ragas`` library upgrade to version ``0.1.7``.
- Fixes:
- Added on-demand caching of ``tiktoken``\ ’s BLOBs which are
used by ``ragas`` library.
- Fixed Faithfulness evaluator and RAGAs evaluator flakiness
(``NaN``) by ``ragas`` library upgrade.
- **RC 38** - 2024/05/03
- Features:
- ``ollama`` (https://ollama.com/) hosted LLMs support - new
connection, client and test lab builder.
- Enhancements:
- All evaluators detect flip of metrics and report the flip in
the evaluation results as problems. In case of boolean metrics,
the flip is detected as change from ``True`` to ``False`` and
vice versa. In case of numeric metrics, the flip is detected as
change from above to below the threshold and vice versa. In
case of the classification, the flip is detected as change from
the correct to incorrect classification and vice versa.
- Changes:
- Introducing relationships among test cases which adds a new
``relationships`` key to test case, test suite and test lab as
well as a ``relationships`` column to LLM dataset and LLM
evaluation result. JSON representations (key) and CSV
representations (column) are extended. Old JSON files are
deserialized in a loosely coupled way to avoid backward
compatibility breaking changes.
- Added ``key`` field inputs in the test lab.
- Added ``key`` field/column to LLM dataset inputs (rows).
- Added ``key`` field/column to evaluation result inputs (rows).
- Fixes:
- Fixed undesired retries in the RAG/LLM test lab completion of
h2oGPTe LLM and H2O LLMOps hosts in case of the successful
completion of the test cases.
- Fixed ``NaN`` (not a number) handling in leaderboard palette
color lookup.
- **RC 37** - 2024/04/25
- Evaluators:
- New Classification evaluator for RAGs/LLMs used for
classification problems. The evaluator calculates common
metrics used in case of binomial and multinomial classification
problems like accuracy, precision, recall and F1. The
Classification evaluator is also bringing new classification
leaderboard explanation.
- Features:
- New perturbations module with the ability to perturb the input
data (5 perturbation methods) in order to test the robustness
of the RAGs/LLMs and the quality of the data: comma, word swap,
QWERTY, synonym and antonym.
- New public perturbations API with list, filtering and
(multiple) perturbation methods application to string, test
case, test suite or LLM dataset prompts.
- 3 new summarization tests for evaluation of summaries both with
and without reference summary (Frank, SamSum and SummEval).
- Enhancements:
- Format specifier in evaluation metrics metadata changed from
Python f-strings to JavaScript D3 format strings.
- Fixes:
- Ranges in evaluation metrics metadata fixed - [0, 1] vs. [0,
100].
- Testing:
- RAG/LLM test suite can finish successfully even if OpenAI API
key is not set (auto reconfiguration to 3rd party judges; tests
which use OpenAI endpoints are skipped).
- **RC 36** - 2024/04/18
- Fixes:
- OpenAI client fixed to version 1.20.0 to keep version 1 API
compatibility (OpenAI Assistants code in H2O Sonar must be
rewritten to version 2 to move from retrieval tool to file
search).
- **RC 35** - 2024/04/18
- Features:
- New metrics metadata - all evaluators newly declare the metrics
they calculate with the metadata (name, description, type,
unit, range, scale, …). Metrics metadata are used in the
evaluator (descriptor, evaluation, results), in the leaderboards
(JSON representation, HTML report generation), and
explanation/evaluation formats (JSON, HTML, Markdown).
- Loosely coupled serialization and deserialization of
object/JSON data structures: ``ExplainerDescriptor``,
``ExplanationDescriptor``, ``ConfigItem`` and ``FilterEntry``.
- Caching of the models used (internally) by evaluators and
explainers: public API, caching module, and caching
configuration enabling air gapped evaluators deployment.
- Backward compatibility breaking changes:
- ``data`` key added to heatmap and bool leaderboards JSON
representations.
- **RC 34** - 2024/04/12
- Fixes:
- ``NaN`` (not a number) handling/encoding in the heatmap
leaderboard JSON “all metrics” data file.
- **RC 33** - 2024/04/12
- Features:
- Microsoft Azure hosted OpenAI LLMs support - new connection,
client and test lab builder.
- H2O LLMOps hosted LLMs support - new connection, client and
test lab builder.
- Security:
- HTTPS requests SSL certificate verification configuration: H2O
Sonar configuration controls the SSL certificate verification
process/level in requests library, LLM hosts client libraries
and other HTTP(S) clients.
- Changes:
- H2O GPT client rewritten to OpenAI API client (please update
server port and base URL).
- H2O LLMOps client rewritten to OpenAI API client (no
configuration changes needed).
- Base URL parameter removed from OpenAI API client constructor
(connection configuration is used).
- **RC 32** - 2024/04/10
- Enhancements:
- Constants for keys in ``datasets.py`` Python modules.
- Documentation:
- BLEU and ROUGE evaluators .rst documentation.
- **RC 31** - 2024/04/08
- Evaluators:
- BLEU evaluator.
- ROUGE evaluator.
- Enhancements:
- New keywords for the most important ML problem types solved by
RAGs/LLMs: question answering, information retrieval,
summarization, classification (binomial and multinomial) and
regression. All evaluators were decorated with relevant
keywords.
- New keyword for the referential user role: regulator.
- Fixes:
- NaN (not a number) handling in the evaluator results, formats
and leaderboard.
- Security:
- ``nltk`` added as evaluators Python extras dependency.
- ``rouge-score`` added as evaluators Python extras dependency.
- ``punkt`` is new cached NLTK model for text to sentence
tokenization.
- **RC 30** - 2024/03/27
- Enhancements:
- Toxicity evaluator reimplemented to directly use the
``toxicity`` library and show several metrics which explain
what type of toxic content has been detected in the answer.
- Fairness bias evaluator reimplemented to directly use bias
detection model (in ONNX format) for the evaluation.
- Hallucinations evaluator reimplemented to use LLM judge for the
hallucination detection.
- Security:
- ``deepeval`` Python dependency removed. ``Evaluators`` based on
``deepeval`` were rewritten to use underlying libraries without
relying on ``deepeval``.
- ``TensorFlow`` and ``DBias`` Python dependencies removed.
Fairness Bias evaluator newly does not rely on ``DBias`` Python
library as the underlying model was ported from ``TensorFlow``
to ``ONNX``.
- ``HMLI`` moved from the core H2O Sonar dependencies to the
``explainers`` package extras in order to avoid the CVE
vulnerabilities which must be fixed for H2O Eval Studio cloud
deployment certification.
- ``H2O-3`` moved from the core H2O Sonar dependencies to the
``explainers`` package extras in order to avoid the CVE
vulnerabilities which must be fixed for H2O Eval Studio cloud
deployment certification.
- **RC 29** - 2024/03/21
- Security:
- HMLI upgraded to MLI version 1.10.26 to mitigate CVE-2023-39013
(HMLI’s Duke dependency vulnerability).
- **RC 28** - 2024/03/17
- Features:
- Bring Your Own Judge (BYOJ) - ability to configure H2O Sonar so
that evaluators use custom LLM judges, for instance in order to
ensure privacy and avoid sending sensitive data to a 3rd
party. This feature includes reconfiguration of the embeddings
provider for the same reasons. Custom judges can be either
forced from the H2O Sonar configuration or specified in the
evaluator parameters.
- Bring Your Own Prompt (BYOP) - ability to easily run an evaluation
just by providing a prompt template, or to implement a new
evaluator just by inheriting from the BYOP abstract class and
specifying a prompt which returns a boolean value.
- OpenAI LLM client (only Assistants with retrieval tool was
supported before). The client supports both OpenAI service (no
base URL specified) and OpenAI compatible endpoints (base URL
specified).
- Evaluators:
- Contact Information evaluator (BYOP).
- Language Mismatch evaluator (BYOP).
- Parametrizable BYOP evaluator with the ability to specify the
prompt template in the evaluator parameters.
- Sexism evaluator (BYOP).
- Stereotype evaluator which detects undesired gender/race
content in the answer (BYOP).
- Summarization evaluator (BYOP).
- Enhancements:
- ``ragas`` library upgrade to version ``0.1.3``.
- **RC 27** - 2024/03/14
- Security:
- Fairness Bias evaluator removed as it used ``dbias`` library
which depends on vulnerable TensorFlow version. This change
ensures there is no TensorFlow, un-registers the evaluator and
skips all evaluator tests (code is kept in the codebase).
- Enhancements:
- Problems are newly loaded with the load of the evaluation from
the JSON representation.
- QA:
- MMC builds disabled (it was extra cost in addition to GH
Actions build; MMC has old Python version)
- **RC 26** - 2024/03/08
- Features and enhancements:
- progress reporting:
- end to end, evaluation, all evaluators, lab (build and
completion)
- callback or file-system
- evaluators can be filtered by labels for:
- SR 11-7
- NIST AI MRM
- HTML report refactored
- sections shuffled by importance
- new evaluation (details) group added
- dataset section content reordering
- explanation and title added to insight leaderboards
- Markdown representation robustness
- input/output escaping
- LLM vs. RAG failures listing fixed
- ES summary .md redesigned
- improved text matching regexp error messages and docstring (ES
UI)
- improved .rst documentation
- evaluator parameters refactored from standalone file to
i/e.json
- Changes:
- evaluation result is stored on the file system (no longer
discarded)
- Fixed:
- 3x faster lab completion (fixed duplicate requests)
- hangs/deadlocks in the lab completion (configurable
multiprocessing)
- QA:
- new GH Actions test suite in GREEN since
09117c3ce891e410e68e62361d109f179ed4c79f
- GHA builds and test H2O EvalStudio deployment runtime
configuration only
- improved h2oGPT/h2oGPTe test server selection (config switch)
- method to purge h2oGPTe relics
- new h2oGPT servers
- **RC 25** - 2024/02/05
- Evaluators:
- Fairness bias evaluator (``deepeval`` based).
- **RC 24** - 2024/02/02
- Evaluators:
- Toxicity evaluator (``deepeval`` based).
- Fixed:
- PII and sensitive data leakages (regexps).
- **RC 23** - 2024/01/26
- Features:
- LLM/RAG clients telemetry.
- Prompt cache: LLM/RAG responses can be cached on building a
test lab. The cache can be built from an existing test lab and
used in RD only mode.
- Enhancements:
- LLM/RAG client retries (3 by default).
- Evaluators which require OpenAI key are tagged using keywords.
- …
- **RC 22** - 2024/01/22
- Enhancements:
- Changing h2oGPTe dependency to the last Python package version.
- Fixes:
- Hiding retrieval errors in the bool leaderboard.
- Evaluation tests:
- Removal of constraints OR expressions from test suite/labs for
Atlanta event as H2O Eval Studio does not support it yet.
- **RC 21** - 2024/01/21
- Fixes:
- Fixed RAGAs leaderboard calculation.
- Retrieved context builder enhancements.
- Tests:
- OpenAI end to end CI test which runs all evaluators.
- Evaluation tests:
- Polished, fixed (duplicate prompts) and extended SR 11-7 and
Bank teller test suites.
- **RC 20** - 2024/01/18
- Evaluators:
- Sensitive data leakage evaluator.
- Enhancements:
- Test lab build fallbacks: dummy doc for RAG.
- Fixes:
- OpenAI test lab build (missing arguments).
- Evaluation tests:
- SR 11-7 test suite w/ 171 prompts.
- **RC 19** - 2024/01/18
- Evaluators:
- PII evaluator.
- Fixes:
- Asynchronous interpretation execution fixed (inconsistent
method signatures).
- Changes:
- ``datatable`` upgraded from the AWS S3 hosted version to the ``1.1.0``
pypi.org hosted version.
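The metric flip detection described in RC 38 above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of the H2O Sonar API. The pair of compared values (for example an original and a perturbed test case) and the rule that a value equal to the threshold counts as passing are assumptions made for this illustration only.

```python
def boolean_flip(before: bool, after: bool) -> bool:
    """A boolean metric flips when its value changes (True <-> False)."""
    return before != after


def numeric_flip(before: float, after: float, threshold: float) -> bool:
    """A numeric metric flips when it crosses the threshold in either direction.

    Assumption: a value equal to the threshold is treated as passing.
    """
    return (before >= threshold) != (after >= threshold)


def classification_flip(expected: str, before: str, after: str) -> bool:
    """A classification flips when correctness changes (correct <-> incorrect)."""
    return (before == expected) != (after == expected)
```

For instance, ``numeric_flip(0.8, 0.6, threshold=0.75)`` reports a flip, while two values on the same side of the threshold do not.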
`v1.2.0 `__ — 2023/10/31
------------------------------------------------------------------------
Talk to H2O Sonar report - upload your interpretation report to the
`Enterprise h2oGPT `__ in
order to find out more about your model, data, problems, insights and
suggested (mitigation) actions.
.. _added-18:
Added
~~~~~
- **Features**
- Ability to upload your interpretation report to `Enterprise
h2oGPT `__ either using
Python API (``run_interpretation()`` method parameter,
``upload_interpretation()`` method), or CLI. The feature is
supported with Python 3.10 and Python 3.11 only.
- **Documentation**
- H2O.ai documentation theme.
.. _fixed-17:
Fixed
~~~~~
- Wheels are no longer built with the legacy pip resolver which was
causing dependency conflicts in some cases on certain platforms.
- Test and validation dataset details are newly shown in the HTML
report.
- Opened port / Driverless AI server port check is no longer verbose.
.. _changed-18:
Changed
~~~~~~~
- No changes.
.. _deprecated-11:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-11:
Removed
~~~~~~~
No removals.
.. _security-12:
Security
~~~~~~~~
No security fixes.
`v1.1.2 `__ — 2023/10/13
------------------------------------------------------------------------
A patch release bringing minor fixes and enhancements.
.. _added-20:
Added
~~~~~
No additions.
.. _fixed-19:
Fixed
~~~~~
- SHAP library version fixed to ``shap>=0.40.0,<=0.42.5`` as new
version is causing instability in feature importance explainers.
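As a rough illustration of what the pinned range means, the sketch below checks a plain ``X.Y.Z`` version string against the inclusive bounds from the entry above. The helper names are hypothetical, and the parser is an assumption that only handles purely numeric release versions (no pre-release or local tags); a real project would normally rely on its packaging tooling instead.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Bounds copied from the changelog entry; both ends are inclusive.
SHAP_MIN = parse_version("0.40.0")
SHAP_MAX = parse_version("0.42.5")


def shap_version_allowed(version: str) -> bool:
    """Return True if the version satisfies shap>=0.40.0,<=0.42.5."""
    return SHAP_MIN <= parse_version(version) <= SHAP_MAX
```

Under these assumptions, ``0.40.0`` and ``0.42.5`` are accepted and ``0.43.0`` is rejected.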
.. _changed-20:
Changed
~~~~~~~
- H2O Model Validation upgraded to ``0.16.3`` with updated ``h2osteam``
and H2O MLOps clients which avoid version clashes in upcoming H2O.ai
Cloud notebook kernels.
.. _deprecated-13:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-13:
Removed
~~~~~~~
No removals.
.. _security-14:
Security
~~~~~~~~
No security fixes.
.. _v1.1.1-2023109-2:
`v1.1.1 `__ — 2023/10/9
-----------------------------------------------------------------------
A patch release bringing minor fixes and enhancements.
.. _added-21:
Added
~~~~~
- Both CLI and Python API accept library configuration and encryption
key parameters in case that the interpretation arguments are
provided as JSON.
.. _fixed-20:
Fixed
~~~~~
- HTML interpretation report path in the CLI output fixed (it was
pointing to the interpretation HTML index).
- False positive feature importance leak detection is no longer
reported in case of multinomial problems.
- Morris Sensitivity Analysis no longer fails in case of non-numeric
boolean columns presence in the training dataset.
.. _changed-21:
Changed
~~~~~~~
No changes.
.. _deprecated-14:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-14:
Removed
~~~~~~~
No removals.
.. _security-15:
Security
~~~~~~~~
No security fixes.
`v1.1.0 `__ — 2023/10/03
------------------------------------------------------------------------
Integration of H2O Sonar and H2O Model Validation projects.
.. _added-22:
Added
~~~~~
- **New explainers**
- Adversarial Similarity explainer.
- Backtesting explainer.
- Drift Detection explainer (reports exceeded PSI threshold as a
problem).
- Size Dependency explainer.
- Segment Performance explainer.
- Calibration Score explainer.
- **Features**
- H2O Model Validation based explainers are able to use H2O AIEM
hosted Driverless AI, H2O Enterprise Steam hosted Driverless AI or
any H2O Driverless AI which uses username/password authentication.
- Ability of H2O Sonar to run with or without H2O Model Validation
library installed. If H2O Model Validation is not available, then
H2O Model Validation based explainers just indicate
incompatibility and do not cause the interpretation to fail.
- Portable export and import of ``MVTest`` related instances like
settings, results, artifacts, and logs. The implementation is
based on JSON, CSV, and directory hierarchy. Therefore it can be
used by a wide range of tools, programming languages, and
runtimes.
- ``RemoteHandle``\ s bring support for remote (Driverless AI)
datasets and models. Apart from the data structure, it is a part
of explainers metadata and compatibility checks.
- Model is no longer required when running a new interpretation
which allows running explainers on datasets only.
- Automatic fallback guess of the model metadata - like problem
type, labels, and used features - in case the model does not
provide them.
- **Enhancements**
- Attributes (dictionary) added to ``ProblemAndAction`` class which
enables explainers to pass machine-processable data from problems
to actions for further actionability.
- Connections and licenses are newly identified by unique keys
(identifiers) in the H2O Sonar configuration and through the
runtime.
- Python 3.10 support.
- Python 3.11 support - H2O Model Validation explainers are not
available, as their transitive library dependencies do not support
Python 3.11.
- ``daimojo`` library pre-heat prediction to activate the MOJO
models introspection.
- Interpretations index HTML path added to the CLI interpretation
output.
- Completion of the ``testset`` and ``validset`` handling
implementation in the explainer container - datasets are newly
passed to explainers along with their metadata.
- The following configuration keys were added to the H2O Sonar
library configuration:
- ``server_id``
- ``environment_url``
- ``token_use_type``
- Shapley Values for Original Features (Kernel SHAP Method)
explainer is approximately 3x faster in case of multinomial problems
(the speed up is proportional to the number of classes - more
classes, more speed up).
- **Utilities**
- Shapley contributions sorter which can be used by all
Shapley-based explainers whenever multi-class contributions are
reported within the same frame - makes the code cleaner and
simpler.
- **Documentation**
- Library configuration CLI API ``reStructuredText`` documentation.
- Jupyter Notebook with examples of how to run H2O Model Validation
explainers using the Python API and CLI.
- ``reStructuredText`` documentation of all new H2O Model Validation
based explainers.
- New explainers overview table with per-explainer features and
requirements added to both ``README.md`` and ``reStructuredText``.
- Explainers overview diagram is newly organized according to the
functional architecture of explainers.
- **Tests**
- Python and CLI tests of all H2O Model Validation explainers.
.. _fixed-21:
Fixed
~~~~~
- Shapley Values for Original Features (Kernel SHAP Method) explainer
reports per-class contributions in the case of multinomial problems
(contributions were mixed together).
- Morris Sensitivity Analysis explainer fixed to work with
``InterpretML`` 0.1.20.
- Pseudocode and Python code generated by the Decision Tree explainer
are consistent again.
- HTML report fixed to properly handle if no explainer is run within
the interpretation.
- Thread safe interpretation executor shutdown.
.. _changed-22:
Changed
~~~~~~~
- The following configuration key was changed in H2O Sonar library
configuration:
- ``client_refresh_token`` has been renamed to ``token``.
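Existing configurations that still use the old key could be adapted along the lines of the sketch below. It assumes a connection entry is available as a plain Python dictionary; the function name and the ``server_url`` key in the usage example are hypothetical and only serve the illustration, they are not taken from the H2O Sonar configuration schema.

```python
def migrate_connection_config(connection: dict) -> dict:
    """Return a copy with ``client_refresh_token`` renamed to ``token``."""
    migrated = dict(connection)
    if "client_refresh_token" in migrated:
        old_value = migrated.pop("client_refresh_token")
        # Keep an already present new-style key rather than overwriting it.
        migrated.setdefault("token", old_value)
    return migrated


# Usage with a made-up connection entry:
legacy = {"server_url": "https://example.invalid", "client_refresh_token": "abc"}
current = migrate_connection_config(legacy)
```

The input dictionary is left untouched, so the migration can be applied repeatedly without side effects.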
.. _deprecated-15:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-15:
Removed
~~~~~~~
- Test suites which were replaced by Pytest markers.
- Tests of legacy Driverless AI models (``Makefile`` targets, S3
archives).
.. _security-16:
Security
~~~~~~~~
- No security enhancements.
`v1.0.0 `__ — 2023/6/30
-----------------------------------------------------------------------
The first stable H2O Sonar release.
.. _added-23:
Added
~~~~~
- **Enhancements**
- Multiple sampling methods for the explainer dataset (stratified,
random, head).
- Configurable out-of-memory (OOM) protection.
- Improved ability of the interpretable model to extract
``scikit-learn`` models metadata.
- **Utilities**
- Random attack utility that tests H2O Sonar on many datasets and
models: it takes a directory with datasets as a parameter, trains a
(``scikit-learn``) model for a randomly chosen dataset and target
column, and finally runs all the explainers to test H2O Sonar.
- **Documentation**
- Explainers overview diagram indicates whether the explainer
reports problem(s).
- Configuration management documentation (including encryption).
- Per-explainer problem reporting capabilities documentation.
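
The three sampling methods listed above (stratified, random, head) can be sketched in plain Python. This is only an illustration under stated assumptions: the function ``sample_rows`` and its parameters are hypothetical and are not the H2O Sonar API, and rows are assumed to be dictionaries.

.. code-block:: python

    import random
    from collections import defaultdict


    def sample_rows(rows, n, method="random", label_key=None, seed=0):
        """Sample about n rows using the 'head', 'random' or 'stratified' method."""
        if n >= len(rows):
            return list(rows)
        if method == "head":
            # Take the first n rows in their original order.
            return list(rows[:n])
        rng = random.Random(seed)
        if method == "random":
            # Uniform sampling without replacement.
            return rng.sample(list(rows), n)
        if method == "stratified":
            # Group rows by label and sample each group proportionally.
            groups = defaultdict(list)
            for row in rows:
                groups[row[label_key]].append(row)
            sampled = []
            for members in groups.values():
                # At least one row per class; rounding may shift the total slightly.
                k = max(1, round(n * len(members) / len(rows)))
                sampled.extend(rng.sample(members, min(k, len(members))))
            return sampled
        raise ValueError(f"unknown sampling method: {method}")

Stratified sampling keeps the class proportions of the original data, which matters when a rare class would otherwise disappear from a small random sample.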
.. _fixed-22:
Fixed
~~~~~
- Summary Shapley explainer and Original feature importance explainer
fixed to properly use SHAP library to get Shapley values for
regression vs. multinomial (experiment type detection).
- Disparate Impact Analysis calculation fixes (comparisons in metrics)
in case of string features.
- Decision tree Python code and pseudo-code generator fixed.
- HTML report fixed to properly display explanations type and
format(s).
- Division by zero fixed in the progress reporting runtime.
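
The division-by-zero fix in progress reporting can be illustrated with a minimal sketch of the general guard. The function name ``progress_fraction`` and its parameters are hypothetical; the actual H2O Sonar implementation is not shown in this change log.

.. code-block:: python

    def progress_fraction(done: int, total: int) -> float:
        """Return progress in the range [0.0, 1.0], guarding against a zero total."""
        if total <= 0:
            # Nothing to do: report completion instead of dividing by zero.
            return 1.0
        # Clamp so that over-reporting never exceeds 100%.
        return min(max(done / total, 0.0), 1.0)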
.. _changed-23:
Changed
~~~~~~~
- CLI, JSON and Python parameter names were unified. This change
breaks backward compatibility and was intentionally made before the
first stable release.
.. _deprecated-16:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-16:
Removed
~~~~~~~
No removals.
.. _security-17:
Security
~~~~~~~~
- Added encryption of sensitive fields in the H2O Sonar configuration
(config, CLI, documentation).
`v0.11.2 `__ — 2023/7/26
-------------------------------------------------------------------------
.. _added-24:
Added
~~~~~
- **Enhancements**
No enhancements.
.. _fixed-23:
Fixed
~~~~~
- Fixed the Python code and pseudocode generated by the Decision Tree
explainer: ``>`` vs. ``>=`` comparison.
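
Why the comparison operator matters can be shown with a small sketch: a row whose value equals the threshold takes a different branch under ``>`` than under ``>=``, so both generated representations must use the same operator. The function ``render_split`` is hypothetical and only illustrates the idea; it is not the H2O Sonar code generator.

.. code-block:: python

    def render_split(feature: str, threshold: float, inclusive: bool) -> dict:
        """Render one tree split as Python code and as pseudocode."""
        # Choose the operator once so both representations always agree.
        op = ">=" if inclusive else ">"
        return {
            "python": f"if row[{feature!r}] {op} {threshold}:",
            "pseudocode": f"IF {feature} {op} {threshold} THEN",
        }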
.. _changed-24:
Changed
~~~~~~~
- Upgrade MLI jar to 1.10.23
.. _deprecated-17:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-17:
Removed
~~~~~~~
No removals.
.. _security-18:
Security
~~~~~~~~
No security fixes.
`v0.11.1 `__ — 2023/5/22
-------------------------------------------------------------------------
Handle missing value bins for PD when OOR is enabled and output
histogram data to PD results.
.. _added-25:
Added
~~~~~
- **Enhancements**
- Output previously missing histogram data to PD results.
.. _fixed-24:
Fixed
~~~~~
- Correctly handle missing value bins for PD when OOR is enabled.
.. _changed-25:
Changed
~~~~~~~
No changes.
.. _deprecated-18:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-18:
Removed
~~~~~~~
No removals.
.. _security-19:
Security
~~~~~~~~
No security fixes.
`v0.11.0 `__ — 2023/4/24
-------------------------------------------------------------------------
Leak detection added to feature importance explainers.
.. _added-26:
Added
~~~~~
- **Enhancements**
- Leak detection added to feature importance explainers: Shapley
Values for Original Features (naive method) explainer, Morris
Sensitivity Analysis explainer, Shapley Values for Original
Features (Kernel SHAP method) explainer.
- Missing values are treated as a separate bin in the PD explainer.
- H2O Sonar CLI can read arguments from a JSON file.
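
Reading command-line arguments from a JSON file can be sketched with the standard ``argparse`` and ``json`` modules. Everything named here is an assumption made for illustration: the ``--args-file``, ``--dataset`` and ``--target-col`` flags and the ``parse_cli`` function are hypothetical and are not the H2O Sonar CLI.

.. code-block:: python

    import argparse
    import json
    import os
    import tempfile


    def parse_cli(argv):
        """Parse CLI arguments; values from an optional JSON file act as defaults."""
        parser = argparse.ArgumentParser(prog="example-cli")
        parser.add_argument("--args-file")  # hypothetical flag name
        parser.add_argument("--dataset")
        parser.add_argument("--target-col")
        args = parser.parse_args(argv)
        if args.args_file:
            with open(args.args_file) as f:
                file_args = json.load(f)
            for key, value in file_args.items():
                # Explicit command-line values take precedence over the file.
                if getattr(args, key, None) is None:
                    setattr(args, key, value)
        return args


    with tempfile.TemporaryDirectory() as work_dir:
        path = os.path.join(work_dir, "args.json")
        with open(path, "w") as f:
            json.dump({"dataset": "data.csv", "target_col": "y"}, f)
        args = parse_cli(["--args-file", path, "--target-col", "label"])

In this sketch the file supplies ``dataset`` while the explicit ``--target-col`` value overrides the one in the file.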
.. _fixed-25:
Fixed
~~~~~
- Fixed display of plots in Jupyter notebooks.
.. _changed-26:
Changed
~~~~~~~
No changes.
.. _deprecated-19:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-19:
Removed
~~~~~~~
No removals.
.. _security-20:
Security
~~~~~~~~
No security fixes.
`v0.10.1 `__ — 2023/3/9
------------------------------------------------------------------------
Patch release bringing ``Result`` (documentation) enhancements.
.. _added-27:
Added
~~~~~
No new features.
.. _fixed-26:
Fixed
~~~~~
No fixes.
.. _changed-27:
Changed
~~~~~~~
No changes.
.. _deprecated-20:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-20:
Removed
~~~~~~~
No removals.
.. _security-21:
Security
~~~~~~~~
No security fixes.
`v0.10.0 `__ — 2023/2/2
------------------------------------------------------------------------
New Dataset and Model Insights explainer and fixes of bugs found by a
new random attack.
.. _added-28:
Added
~~~~~
- **Explainers**
- New Dataset and Model Insights explainer.
- **Enhancements**
- Residual Decision Tree explainer newly highlights the whole path
to the highest residuum in the visualized tree.
- DIA result API help related to the reference level improved.
.. _fixed-27:
Fixed
~~~~~
- Surrogate Decision Tree Python code generator fixed: added missing
``(`` ``)`` in boolean expressions, features can have any characters
in their names.
- Move from ``os.rename`` to ``shutil.move`` in order to ensure that
the operation will not fail if the source and target are on different
file systems.
- Missing ``isna`` symbol used in the Disparate Impact Analysis
explainer.
- Comparison of ``string``\ s and ``bool``\ s in the ICE method.
- Float division by zero in the Residual Decision Tree explainer.
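
The move from ``os.rename`` to ``shutil.move`` noted above can be sketched as follows. The helper ``move_file`` and the file names are hypothetical; only the standard library calls are real.

.. code-block:: python

    import os
    import shutil
    import tempfile


    def move_file(src: str, dst: str) -> str:
        """Move src to dst, also when they are on different file systems."""
        # os.rename() can raise OSError when src and dst are on different
        # file systems; shutil.move() falls back to copy-and-delete then.
        return shutil.move(src, dst)


    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
        src = os.path.join(src_dir, "explanation.json")
        with open(src, "w") as f:
            f.write("{}")
        dst = move_file(src, os.path.join(dst_dir, "explanation.json"))
        moved = os.path.exists(dst) and not os.path.exists(src)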
.. _changed-28:
Changed
~~~~~~~
No changes.
.. _deprecated-21:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-21:
Removed
~~~~~~~
No removals.
.. _security-22:
Security
~~~~~~~~
No security fixes.
`v0.9.0 `__ — 2023/1/13
-----------------------------------------------------------------------
Minor H2O Sonar release which brings asynchronous interpretation
execution.
.. _added-29:
Added
~~~~~
- **Features**
- New option to run interpretations asynchronously.
- **Enhancements**
- New introspection API for Result classes (method parameters).
.. _fixed-28:
Fixed
~~~~~
- The square root of MSE is now taken to get RMSE in the Surrogate
Decision Tree explainer.
- Handling of date, time and date time features in the PD/ICE
explainer.
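
The relationship behind the RMSE fix above is simply that RMSE is the square root of MSE. A minimal sketch, with a hypothetical helper ``rmse`` that is not part of the H2O Sonar API:

.. code-block:: python

    import math


    def rmse(actual, predicted):
        """Root mean squared error: the square root of the mean squared error."""
        if len(actual) != len(predicted) or not actual:
            raise ValueError("inputs must be non-empty and of equal length")
        mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
        # Reporting mse here without the square root would overstate or
        # understate the error, since it is in squared units of the target.
        return math.sqrt(mse)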
.. _changed-29:
Changed
~~~~~~~
No changes.
.. _deprecated-22:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-22:
Removed
~~~~~~~
No removals.
.. _security-23:
Security
~~~~~~~~
No security fixes.
`v0.8.0 `__ — 2022/12/8
-----------------------------------------------------------------------
New Partial Dependence for 2 Features explainer and enhancements for H2O
Sonar explainer container implementation for Driverless AI.
.. _added-30:
Added
~~~~~
- **Explainers**
- New Partial Dependence for 2 Features explainer.
- **Features**
- New Global 3D Data result, explanation and associated formats
(JSON, CSV).
- **Enhancements**
- Command-line interface with pretty-printed listing of explainers,
improved formatting of explainer descriptions and H2O Sonar
version ``show`` action.
- Residual PD/ICE for multinomial problems added.
- Improved explainer container resolution and creation (identifier,
instance).
- Model agnostic API to indicate the ability to provide/calculate
Shapley values added.
- Improved compatibility checks and new compatibility error type.
- Explainable model’s features metadata simplification, completion
and consolidation.
- Explainable dataset’s metadata simplification, completion and
consolidation.
- Improved HTML report highlights failed explainers, brings a
comprehensive overview section, and shows new model and dataset
metadata fields.
- **Documentation**
- Added Jupyter Notebook documentation of how to run H2O Sonar in
the Internal H2O.ai Cloud.
.. _fixed-29:
Fixed
~~~~~
- Disparate Impact Analysis explanations completed to be 100% binary
compatible with Driverless AI’s Grammar of MLI (entities).
- Disparate Impact Analysis explainer feature resolution for DIA
calculation rewritten.
- Disparate Impact Analysis explainer and PD/ICE explainer fixed to
work on a dataset with string (target) column(s).
- Residual PD/ICE no longer returns regular PD/ICE as the default
representation (and residual as an extension), but the residual
PD/ICE.
- Residual PD/ICE HTML fragment representation path to images fixed so
that it no longer renders the same charts for all classes.
- Summary Shapley explainer name correctly indicates the SHAP method
(not the naive Shapley method, as it wrongly did before).
.. _changed-30:
Changed
~~~~~~~
- Features metadata class of the explainable model has been refactored
to the ``h2o_sonar.methods.core.method`` module and all constant
references consolidated to this class.
- Operating system version used to build the Linux distribution and
wheels has been changed from ``Ubuntu 20.04`` to ``Ubuntu 18.04`` to
ensure that H2O Sonar wheels work on both this and newer Ubuntu
versions.
.. _deprecated-23:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-23:
Removed
~~~~~~~
No removals.
.. _security-24:
Security
~~~~~~~~
- MLI upgrade to 1.10.21 to mitigate CVE-2022-2048 and CVE-2022-25647.
`v0.7.0 `__ — 2022/10/18
------------------------------------------------------------------------
H2O Sonar **beta** release with Bring Your Own Explainer based
extensibility, reporting of model problems, new Residual PD/ICE
explainer, new Morris sensitivity analysis and various smaller
enhancements.
.. _added-31:
Added
~~~~~
- **Features**
- BYOE - Bring Your Own Explainer.
- Model problems and actions.
- **Explainers**
- New Residual Partial Dependence/Individual Conditional
Expectations explainer.
- New Morris Sensitivity Analysis explainer.
- Residual Decision Tree explainer reports problems and actions.
- **Explanations**
- New interpretation report - structure, content, and theme in
H2O.ai colors.
- Organization of explainers to functional groups.
- **Utilities**
- Improved label encoder to simplify the use of 3rd party libraries
that require numeric (non-categorical) features. Label encoder is
integrated into both explainable dataset and explainable model
APIs.
- **Command-line interface**
- All Python API’s interpretation parameters are newly available on
CLI.
- **Documentation**
- Added Getting started with BYOE.
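
The label encoder mentioned above maps categorical values to numeric codes so that libraries requiring numeric features can be used. A minimal sketch of the idea follows; the class ``SimpleLabelEncoder`` is hypothetical and is not the encoder shipped with H2O Sonar.

.. code-block:: python

    class SimpleLabelEncoder:
        """Map categorical values to integer codes and back."""

        def fit(self, values):
            # Sort by string form so that the codes are deterministic.
            self.classes_ = sorted(set(values), key=str)
            self._codes = {value: code for code, value in enumerate(self.classes_)}
            return self

        def transform(self, values):
            # Raises KeyError for values not seen during fit().
            return [self._codes[value] for value in values]

        def inverse_transform(self, codes):
            # Translate the numeric codes back to the original values.
            return [self.classes_[code] for code in codes]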
.. _fixed-30:
Fixed
~~~~~
- HTML report paths to images and explanations are relative and valid
regardless of the results directory location.
- Explainer container runtime and explainers stabilized to work on raw
(non-sanitized) datasets.
- Explainers listing action help fixed on the command line interface.
.. _changed-31:
Changed
~~~~~~~
- ``list_explainers()`` method on both Python API and CLI lists **all**
explainers by default (before this change, it listed only basic
explainers with the ``run-by-default`` keyword).
- Logging consolidated to a single module ``h2o_sonar.loggers`` and
loggers renamed/refactored so that they can be used both in methods
and explainers.
- Migration of explainer container runtime from ``HMLI`` to ``h2o``
wheel dependency.
- Parameter ``path`` of ``zip()`` method used by explainer’s ``Result``
class has been changed to ``file_path`` to make it consistent with
other ``Result`` parameters.
- ``Result`` classes refactoring from explainer implementations into
consolidated and reusable results classes for main supported
explanation types.
- The functionality of the ``summary()`` method has been moved to
``params()``, and the new ``summary()`` method returns the summary of
the explanation (content of ``result_descriptor.json``).
.. _deprecated-24:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-24:
Removed
~~~~~~~
No removals.
.. _security-25:
Security
~~~~~~~~
No security fixes.
`v0.6.0 `__ — 2022/9/8
----------------------------------------------------------------------
New Friedman’s H-statistic and Residual Surrogate Decision Tree
explainers, Driverless AI REST interface model support and improved HTML
interpretation representation.
.. _added-32:
Added
~~~~~
- **Explainers**
- Friedman’s H-statistic explainer for feature behavior
explanations.
- Residual Surrogate Decision Tree for model debugging (new default
explainer).
- **Model support**
- Added Driverless AI REST interface model support.
- **Explanations**
- Significantly improved HTML interpretation representation with new
explanation charts for every explainer, interpretation parameters
and explainers parameters.
- **Command line interface**
- Added parameter to run all explainers (not just basic explainers).
- Interpretation listing including HTML representation.
- **Documentation**
- Bring Your Own Explainer templates and examples added to
distributions.
.. _fixed-31:
Fixed
~~~~~
- Improved scikit-learn multinomial models support with labels lookup.
- Compatibility check function gets all available parameters for more
advanced checks.
- DIA HTML fragment representation path to images.
- In-memory persistence store (keys) stabilization.
- Logging names and interpretation and explainer logging keys
consistency.
.. _changed-32:
Changed
~~~~~~~
- ``hmli`` and ``daimojo`` dependencies updated.
- Source distribution (``tarball``) build changed so that it doesn’t
contain ``.whl``.
- Binary distributions are built for every supported platform.
.. _deprecated-25:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-25:
Removed
~~~~~~~
No removals.
.. _security-26:
Security
~~~~~~~~
No security fixes.
`v0.5.0 `__ — 2022/8/16
-----------------------------------------------------------------------
Fix release which brings binary distribution with improved documentation
and Jupyter Notebook examples.
.. _added-33:
Added
~~~~~
- **Documentation**
- Improved ``reStructuredText`` documentation with getting started,
library documentation (interpretation, configuration, explainers),
licenses and change log.
- New and improved Jupyter Notebook examples.
- **Model support**
- Added pickled (Scikit-learn) models interpretability.
- **Command line interface**
- Added parameters to specify features used by the model and
per-explainer parameters.
.. _fixed-32:
Fixed
~~~~~
- Summary Shapley explainer stabilization: scatter plot feature values
fixed, main chart includes all features,
regression/binomial/multinomial labels fixed, ``max_features``
parameter honored, per-class multinomial explanations are generated
in all supported formats.
- Fixed the simple mock model prediction function and added SHAP method
support for mock models.
.. _changed-33:
Changed
~~~~~~~
- Models and datasets - used by examples, demos and tests -
consolidated and refactored to indicate dataset and model type.
.. _deprecated-26:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-26:
Removed
~~~~~~~
No removals.
.. _security-27:
Security
~~~~~~~~
No security fixes.
`v0.4.2 `__ — 2022/11/29
------------------------------------------------------------------------
Fix of the following MLI Java backend security issues: CVE-2022-2048 and
CVE-2022-25647.
.. _added-34:
Added
~~~~~
.. _fixed-33:
Fixed
~~~~~
.. _changed-34:
Changed
~~~~~~~
.. _deprecated-27:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-27:
Removed
~~~~~~~
No removals.
.. _security-28:
Security
~~~~~~~~
- MLI upgrade to 1.10.17.2 to mitigate CVE-2022-2048 and
CVE-2022-25647.
`v0.4.1 `__ — 2022/11/17
------------------------------------------------------------------------
Fix of the following MLI Java backend security issues: CVE-2022-2048 and
CVE-2022-25647.
.. _added-35:
Added
~~~~~
.. _fixed-34:
Fixed
~~~~~
.. _changed-35:
Changed
~~~~~~~
.. _deprecated-28:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-28:
Removed
~~~~~~~
No removals.
.. _security-29:
Security
~~~~~~~~
- MLI upgrade to 1.10.17.1 to mitigate CVE-2022-2048 and
CVE-2022-25647.
`v0.4.0 `__ — 2022/6/29
-----------------------------------------------------------------------
New Transformed Feature Importance explainer for Driverless AI MOJO
models and preparation for H2O Sonar integration to Driverless AI.
.. _added-36:
Added
~~~~~
- **Explainers**
- Transformed Feature Importance explainer for Driverless AI MOJO
models.
- **Explainer container API and CLI**
- H2O Sonar version available in runtime.
- **Documentation**
- Jupyter Notebook with interpretation result API for the new
explainer.
- H2O Sonar explainers overview diagram updated.
.. _fixed-35:
Fixed
~~~~~
- All `MLI-2 `__ fixes made since the `H2O
Sonar `__ fork have been ported to
this repository.
- Naive Shapley Feature Importance explainer multinomial explanations
fixed and the performance improved.
.. _changed-36:
Changed
~~~~~~~
- Core H2O Sonar dependencies updated to be aligned with Driverless AI
1.10.4, two separate builds will be available going forward - regular
and Driverless AI.
.. _deprecated-29:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-29:
Removed
~~~~~~~
No removals.
.. _security-30:
Security
~~~~~~~~
- MLI upgrade to 0.10.17 to mitigate CVE-2022-25647.
`v0.3.0 `__ — 2022/6/22
-----------------------------------------------------------------------
New Kernel SHAP Feature Importance explainer.
.. _added-37:
Added
~~~~~
- **Explainers**
- Kernel SHAP Feature Importance explainer for all supported
interpretable models.
- **Explainer container API and CLI**
- H2O-3 is automatically started (or reused) - based on H2O-3
configuration.
- CLI rewrite to provide more accurate help, error reporting and
robust execution.
- **Documentation**
- Jupyter Notebook with interpretation result API for the new
explainer.
.. _fixed-36:
Fixed
~~~~~
- Interpretation HTML representation links are no longer broken on the
use of the relative path.
- Explainers’ summary method returns the correct (non-empty) parameters
of the explainer run.
- Disparate Impact Analysis explainer core dump on invalid target
column specification.
.. _changed-37:
Changed
~~~~~~~
No changes.
.. _deprecated-30:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-30:
Removed
~~~~~~~
No removals.
.. _security-31:
Security
~~~~~~~~
No security fixes.
`v0.2.0 `__ — 2022/6/3
----------------------------------------------------------------------
New Feature Importance explainer for Driverless AI MOJO models.
.. _added-38:
Added
~~~~~
- **Explainers**
- Naive Shapley Feature Importance explainer for Driverless AI MOJO
models.
- **Explainer container API and CLI**
- List explainers to get available explainer IDs or descriptors.
- **Documentation**
- Jupyter Notebook with interpretation result API for the new
explainer.
.. _fixed-37:
Fixed
~~~~~
- CLI: log level specification case insensitivity.
- macOS: Driverless AI MOJO import made local.
.. _changed-38:
Changed
~~~~~~~
No changes.
.. _deprecated-31:
Deprecated
~~~~~~~~~~
No deprecations.
.. _removed-31:
Removed
~~~~~~~
No removals.
.. _security-32:
Security
~~~~~~~~
No security fixes.
`v0.1.0 `__ — 2022/5/27
-----------------------------------------------------------------------
Initial H2O Sonar internal MVP release.
.. _added-39:
Added
~~~~~
- **Explainers**
- Partial Dependence/Individual Conditional Expectations explainer
(PD/ICE).
- Shapley summary plot explainer.
- Decision tree explainer.
- Disparate Impact Analysis explainer (DIA).
- **Explainer container with public explainer APIs**
- Interpretation, model, dataset, explainer and persistence API.
- Explainer container (runtime).
- File-system and in-memory persistence.
- Easy to use API for retrieval of explainer results.
- **Model vendor support**
- Scikit-learn models.
- H2O-3 models.
- Driverless AI MOJO models.
- **Command line interface**
- CLI support of MOJO and pickled models interpretations.
- **Documentation**
- Per-explainer Jupyter Notebook with interpretation result API.
- Installation, Getting Started and Reference Guide (Sphinx/HTML).
.. _fixed-38:
Fixed
~~~~~
No fixes (initial release).
.. _changed-39:
Changed
~~~~~~~
No changes (initial release).
.. _deprecated-32:
Deprecated
~~~~~~~~~~
No deprecations (initial release).
.. _removed-32:
Removed
~~~~~~~
No removals (initial release).
.. _security-33:
Security
~~~~~~~~
No security fixes (initial release).