Wikibase support#4
Conversation
exowanderer
left a comment
There was a problem hiding this comment.
I added a lot of questions and suggestions. Nothing looks problematic so far, but I have not completed my review -- only 4 out of 13 files
| "text": self.text, | ||
| "lang": self.lang | ||
| } | ||
| return {"text": self.text, "lang": self.lang} |
There was a problem hiding this comment.
Code reviewability and maintainability standards prefer the previous version:
return {
"text": self.text,
"lang": self.lang
}
There was a problem hiding this comment.
This was changed by Ruff Lint's formatting
| self.latitude is not None | ||
| and self.longitude is not None | ||
| ) | ||
| return self.latitude is not None and self.longitude is not None |
There was a problem hiding this comment.
This could go either way, but I prefer the maintainabilitiy of the previous version
return (
self.latitude is not None
and self.longitude is not None
)
There was a problem hiding this comment.
This was changed by Ruff Lint's formatting
| and self.label is not None | ||
| and str(self.label) != "" | ||
| ) | ||
| return bool(self.id) and self.label is not None and str(self.label) != "" |
There was a problem hiding this comment.
I think we should use a temp variable here for clarity
id_and_label_check = bool(self.id) and self.label is not None and str(self.label)
return id_and_label_check != ""
| bool(self.property) | ||
| and any(bool(v) for v in self.values) | ||
| ) | ||
| return bool(self.property) and any(bool(v) for v in self.values) |
There was a problem hiding this comment.
For maintainability and readability, I prefer the previous version
return (
bool(self.property) and any(bool(v)
for v in self.values)
)
There was a problem hiding this comment.
This was changed by Ruff Lint's formatting
| @@ -302,8 +289,8 @@ class WikidataClaimValue: | |||
| value: Optional[ | |||
There was a problem hiding this comment.
This is a little difficult to read. Maybe this could be better:
Optional[
Union[
WikidataEntity,
WikidataQuantity,
WikidataTime,
WikidataCoordinates,
WikidataText,
WikidataMonolingualText
]
] = None
There was a problem hiding this comment.
This was also changed by Ruff Lint's formatting
| @@ -36,10 +37,11 @@ | |||
|
|
|||
|
|
|||
| class WikidataLabel(Base): | |||
There was a problem hiding this comment.
Should this be generalised to WikibaseLabel(Base)?
There was a problem hiding this comment.
I made this change and renamed everything with "Wikidata" in it to "Wikibase".
I additionally added aliases to keep support for Wikidata classes and files, otherwise external repositories importing Wikidata would break (such as the vector database pipeline)
| """Create tables if they do not already exist.""" | ||
| try: | ||
| Base.metadata.create_all(engine) | ||
| WikidataLabel._migrate_labels_table_for_wikibase() |
There was a problem hiding this comment.
If we want to generalise the class name, then we would need to change this to WikibaseLabel as well
There was a problem hiding this comment.
I made this change and renamed everything with "Wikidata" in it to "Wikibase".
I additionally added aliases to keep support for Wikidata classes and files, otherwise external repositories importing Wikidata would break (such as the vector database pipeline)
|
|
||
| @staticmethod | ||
| def add_bulk_labels(data): | ||
| def _migrate_labels_table_for_wikibase(): |
There was a problem hiding this comment.
Is this completely new?
If so, what problem does it solve or feature does it enable?
There was a problem hiding this comment.
The database structure has been changed to include the wikibase action api as a new column and primary key. This function runs at start and migrates the old database to the new one by filling this column with Wikidata's action API.
| """ | ||
| SELECT COUNT(*) | ||
| FROM information_schema.COLUMNS | ||
| WHERE TABLE_SCHEMA = :schema_name |
There was a problem hiding this comment.
Can this be an F-String?
f"""
SELECT COUNT(*)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = {DB_NAME} # <---- HERE
AND TABLE_NAME = 'labels'
AND COLUMN_NAME = 'wikibase_url'
"""
There was a problem hiding this comment.
Bound parameters are safer than f-string because they handle SQL injection and syntax breakage.
| SELECT COLUMN_NAME | ||
| FROM information_schema.KEY_COLUMN_USAGE | ||
| WHERE TABLE_SCHEMA = :schema_name | ||
| AND TABLE_NAME = 'labels' |
There was a problem hiding this comment.
f"""
SELECT COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = {DB_NAME} <--- HERE
AND TABLE_NAME = 'labels'
AND CONSTRAINT_NAME = 'PRIMARY'
ORDER BY ORDINAL_POSITION
"""
There was a problem hiding this comment.
Bound parameters are safer than f-string because they handle SQL injection and syntax breakage.
exowanderer
left a comment
There was a problem hiding this comment.
Completed review of another 5 files -- the smaller ones
| "-k", "uvicorn.workers.UvicornWorker", \ | ||
| "-w", "1", \ | ||
| "-b", "0.0.0.0:5000"] No newline at end of file | ||
| CMD ["uv", "run", "gunicorn", "main:app", "-k", "uvicorn.workers.UvicornWorker", "-w", "2", "--bind", "0.0.0.0:5000", "--timeout", "30", "--graceful-timeout", "15", "--max-requests", "1000", "--max-requests-jitter", "100", "--keep-alive", "5", "--access-logfile", "-", "--error-logfile", "-"] No newline at end of file |
There was a problem hiding this comment.
nit: Please add a trailing space after the las line
|
|
||
| ## Future Plan | ||
|
|
||
| - Replace Action API with GraphQL once Wikibase GraphQL is available for Wikibases: |
There was a problem hiding this comment.
We may never "replace" the ActionAPI, but we will supplement some functionality with GraphQL in addition.
…with additional aliases for Wikidata
| text( | ||
| """ | ||
| SELECT COLUMN_NAME | ||
| FROM information_schema.KEY_COLUMN_USAGE |
There was a problem hiding this comment.
Is the DB schema stored in this repo as well?
There was a problem hiding this comment.
The DB schema can be found under the WikidataLabel class
| ] | ||
|
|
||
| if primary_key_cols == ["id"]: | ||
| connection.execute( |
There was a problem hiding this comment.
I assume this only happens when a new DB is first instantiated.
I think we should log that on the server and possibly flag it during deployment
| missing_ids = set(ids) - set(labels.keys()) | ||
| if missing_ids: | ||
| missing_labels = WikidataLabel._get_labels_wdapi(missing_ids) | ||
| missing_labels = WikidataLabel._get_labels_wdapi(missing_ids, wb_url=normalized_wb_url) |
There was a problem hiding this comment.
| missing_labels = WikidataLabel._get_labels_wdapi(missing_ids, wb_url=normalized_wb_url) | |
| missing_labels = WikidataLabel._get_labels_wdapi( | |
| missing_ids, | |
| wb_url=normalized_wb_url | |
| ) |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
| # Cache labels | ||
| WikidataLabel.add_bulk_labels( | ||
| [{"id": entity_id, "labels": entity_labels} for entity_id, entity_labels in missing_labels.items()] | ||
| [{"id": entity_id, "labels": entity_labels} for entity_id, entity_labels in missing_labels.items()], |
There was a problem hiding this comment.
| [{"id": entity_id, "labels": entity_labels} for entity_id, entity_labels in missing_labels.items()], | |
| [ | |
| {"id": entity_id, "labels": entity_labels} | |
| for entity_id, entity_labels in missing_labels.items() | |
| ], |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
| dict[str, dict]: Mapping of each ID to compressed labels. | ||
| """ | ||
| entities_data = get_wikidata_json_by_ids(ids, props="labels") | ||
| entities_data = get_wikidata_json_by_ids(ids, action_api_url=wb_url, props="labels") |
There was a problem hiding this comment.
| entities_data = get_wikidata_json_by_ids(ids, action_api_url=wb_url, props="labels") | |
| entities_data = get_wikidata_json_by_ids( | |
| ids, | |
| action_api_url=wb_url, | |
| props="labels" | |
| ) |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
| calls.append((list(ids), wb_url)) | ||
| return {"Q42": {"en": "Douglas Adams"}} | ||
|
|
||
| monkeypatch.setattr(WikidataLabel, "get_bulk_labels", staticmethod(fake_get_bulk_labels)) |
There was a problem hiding this comment.
| monkeypatch.setattr(WikidataLabel, "get_bulk_labels", staticmethod(fake_get_bulk_labels)) | |
| monkeypatch.setattr( | |
| WikidataLabel, | |
| "get_bulk_labels", | |
| staticmethod(fake_get_bulk_labels) | |
| ) |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
|
|
||
| monkeypatch.setattr(WikidataLabel, "get_bulk_labels", staticmethod(fake_get_bulk_labels)) | ||
|
|
||
| factory = LazyLabelFactory(lang="en", fallback_lang="en", wb_url="https://example.wikibase.local/") |
There was a problem hiding this comment.
| factory = LazyLabelFactory(lang="en", fallback_lang="en", wb_url="https://example.wikibase.local/") | |
| factory = LazyLabelFactory( | |
| lang="en", | |
| fallback_lang="en", | |
| wb_url="https://example.wikibase.local/" | |
| ) |
|
|
||
| qids = [q.strip() for q in id.split(",")] | ||
| label_factory = LazyLabelFactory(lang=lang, fallback_lang=fallback_lang) | ||
| label_factory = LazyLabelFactory(lang=lang, fallback_lang=fallback_lang, wb_url=action_api_url) |
There was a problem hiding this comment.
| label_factory = LazyLabelFactory(lang=lang, fallback_lang=fallback_lang, wb_url=action_api_url) | |
| label_factory = LazyLabelFactory( | |
| lang=lang, | |
| fallback_lang=fallback_lang, | |
| wb_url=action_api_url | |
| ) |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
| entity_id=qids[0], | ||
| ttl_text=entity_data, | ||
| try: | ||
| entity_data = utils.get_wikidata_json_by_ids(qids, action_api_url=action_api_url) |
There was a problem hiding this comment.
| entity_data = utils.get_wikidata_json_by_ids(qids, action_api_url=action_api_url) | |
| entity_data = utils.get_wikidata_json_by_ids( | |
| qids, | |
| action_api_url=action_api_url | |
| ) |
There was a problem hiding this comment.
Ruff Lint's formatter will revert this back to one line
There was a problem hiding this comment.
There are a lot of changes that should have been single commits.
Could you please comment on each deletion section and addition section, "what changed and why"?
| from fastapi.middleware.cors import CORSMiddleware | ||
|
|
||
| from src import utils | ||
| from src.Normalizer import JSONNormalizer, TTLNormalizer |
There was a problem hiding this comment.
TTLNormalizer has been removed to support Wikibases.
TTLNormalizer is only capable of parsing Wikidata entities while JSONNormalizer supports all Wikibases with an Action API endpoint.
|
|
||
| entity_data = TTLNormalizer( | ||
| entity_id=qids[0], | ||
| ttl_text=entity_data, |
There was a problem hiding this comment.
In the previous implementation, TTLNormalizer was used to parse requents of one Wikidata entity. This has been removed and replaced by the JSONNormalizer to support Wikibase entities as well.
| if entity | ||
| else None | ||
| for qid, entity in entity_data.items() | ||
| } |
There was a problem hiding this comment.
All the removed lines here are also based on the decision to remove TTLNormalizer and only use the JSONNormalizer
Including support for textifying Wikibase entities: