Lint: detect duplicated observable IDs #446

lcontento · 2020-06-18T17:56:09Z

The linter function check_observable_df does not detect duplicated observable IDs. MWE is

import petab
import pandas as pd

observable_df = pd.DataFrame(data={
    petab.OBSERVABLE_ID: ['obs1', 'obs1'],
    petab.OBSERVABLE_FORMULA: ['x1', 'x2'],
    petab.NOISE_FORMULA: ['sigma1', 'sigma2']
}).set_index(petab.OBSERVABLE_ID)

petab.lint.check_observable_df(observable_df)

Incidentally, this causes a problem in check_measurement_df too.

codecov · 2020-06-18T17:57:59Z

Codecov Report

Merging #446 into develop will increase coverage by 0.10%.
The diff coverage is 92.85%.

@@             Coverage Diff             @@
##           develop     #446      +/-   ##
===========================================
+ Coverage    77.88%   77.99%   +0.10%     
===========================================
  Files           22       22              
  Lines         2211     2222      +11     
  Branches       529      531       +2     
===========================================
+ Hits          1722     1733      +11     
  Misses         363      363              
  Partials       126      126

Impacted Files	Coverage Δ
petab/lint.py	`77.77% <92.85%> (+0.82%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 843edd3...05dd5bf. Read the comment docs.

yannikschaelte · 2020-06-18T18:06:59Z

petab/lint.py

@@ -621,6 +625,21 @@ def assert_noise_distributions_valid(observable_df: pd.DataFrame) -> None:
                    f"table: {distr}.")


+def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:


Suggested change

def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:

def assert_unique_observable_ids(observable_df: pd.DataFrame) -> None:

Name is indeed better. But since I essentially copied assert_parameter_id_is_unique, maybe if I change the name for the new function I should change it for the old too.

right. yes, makes sense to change the old one too then.

yannikschaelte · 2020-06-18T18:09:24Z

petab/lint.py

+    if len(observable_df.index) != len(set(observable_df.index)):
+        raise AssertionError(
+            f"{OBSERVABLE_ID} column in observable table is not unique.")
+


may be good to also return the duplicate ids.

You mean to put the duplicated list inside the AssertionError message? It is certainly more informative. Again, if I do it here I should do it for assert_parameter_id_is_unique too I guess.

not crucial, but I guess that would be helpful.

Release 0.1.9 Library: * Allow URL as filenames for YAML files and SBML models (Closes #187) (#459) * Allow model time in observable formulas (#445) * Make float parsing from CSV round-trip (#444) * Validator: Error message for missing IDs, with line numbers. (#467) * Validator: Detect duplicated observable IDs (#446) * Some documentation and CI fixes / updates * Visualization: Add option to save visualization specification (#457) * Visualization: Column XValue not mandatory anymore (#429) * Visualization: Add sorting of indices of dataframes for the correct sorting of x-values (#430) * Visualization: Default value for the column x_label in vis_spec (#431)

Lint: detect duplicated observable IDs

128e4bc

yannikschaelte approved these changes Jun 18, 2020

View reviewed changes

Lint: duplicated IDs: better naming and messages

05dd5bf

lcontento merged commit 9cbf54e into develop Jul 1, 2020

lcontento deleted the fix_lint_duplicated_obs branch July 1, 2020 08:37

yannikschaelte mentioned this pull request Jul 23, 2020

Release 0.1.8 #454

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lint: detect duplicated observable IDs #446

Lint: detect duplicated observable IDs #446

		@@ -621,6 +625,21 @@ def assert_noise_distributions_valid(observable_df: pd.DataFrame) -> None:
		f"table: {distr}.")


		def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:

	def assert_observable_id_is_unique(observable_df: pd.DataFrame) -> None:
	def assert_unique_observable_ids(observable_df: pd.DataFrame) -> None:

Lint: detect duplicated observable IDs #446

Lint: detect duplicated observable IDs #446

Conversation

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment