[Epic] Better understanding of WDQS users and use cases
Closed, Resolved (Public)
Assigned To

Authored By

	Gehel
	Jul 3 2020, 2:21 PM

Description

To better understand how WDQS can be improved (particularly in terms of scaling) we need a better understanding of our users and their use cases.

Questions consolidated from the Search Platform virtual offsite:

2% of queries take 95% of the server time. What are those 2% of queries doing? Can or should we restrict them? Are they broken bot queries, or are they actually valuable?
What are the most expensive User Agents? Can we identify heavy users and work with them to reduce that load?
What percentage of queries, and which kinds of queries, care about the freshness of the data?
How important is it to have the full graph to answer the questions people are asking? Can we infer that from the queries and the data?
Do we have strongly connected components in the Wikidata graph? Can they be used to split the graph into subgraphs?
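
The first two questions could be approached with a simple aggregation over the query log. The following is a minimal sketch, not the actual WDQS logging schema: it assumes each log record has been reduced to a hypothetical `(user_agent, query_time_ms)` pair, and the sample values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical, made-up log records: (user_agent, query_time_ms).
# The real WDQS event log has a different and richer schema.
SAMPLE_LOG = [
    ("bot-a/1.0", 9000),
    ("bot-a/1.0", 100),
    ("browser-ui", 100),
    ("browser-ui", 100),
    ("browser-ui", 100),
    ("browser-ui", 100),
    ("script-b", 100),
    ("script-b", 100),
    ("script-b", 100),
    ("script-b", 100),
]


def time_share_of_slowest(records, fraction):
    """Share of total server time used by the slowest `fraction` of queries."""
    times = sorted((t for _, t in records), reverse=True)
    total = sum(times)
    if total == 0:
        return 0.0
    # Always look at one query at least, even for tiny fractions.
    k = max(1, round(len(times) * fraction))
    return sum(times[:k]) / total


def time_by_user_agent(records):
    """Total server time per user agent, most expensive first."""
    totals = defaultdict(int)
    for user_agent, query_time_ms in records:
        totals[user_agent] += query_time_ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    share = time_share_of_slowest(SAMPLE_LOG, 0.10)
    print(f"slowest 10% of queries use {share:.1%} of server time")
    for user_agent, total_ms in time_by_user_agent(SAMPLE_LOG):
        print(f"{user_agent}: {total_ms} ms")
```

On real data the same two aggregations would more likely run as a query against the log store than in memory, but the shape of the question is the same.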

Random notes from the Search Platform virtual offsite:

We need to pair someone who knows what to look for with someone who knows how to look for it (@Addshore and @JAllemandou?).
Currently, we only log the queries. For search, we also log the results; something similar might help answer our questions, though it may be a lot of data. Even just the size of the response might be interesting.
Can we find the entities used in queries and group them? If queries mention person A, person B, …, we should be able to tell that those queries are about people.
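
The last note could be prototyped roughly as follows. This is a sketch under stated assumptions: entities are found with a regular expression that only matches the `wd:Q…` prefixed form (full IRIs are ignored), and the `INSTANCE_OF` table is a hypothetical, hard-coded stand-in for a real lookup of each entity's class.

```python
import re
from collections import Counter

# Matches prefixed item references such as wd:Q42.
# Full IRIs (<http://www.wikidata.org/entity/Q42>) are not handled here.
ENTITY_RE = re.compile(r"\bwd:(Q[1-9]\d*)\b")

# Hypothetical, hard-coded class lookup for illustration only.
# A real analysis would resolve each entity's class from Wikidata itself.
INSTANCE_OF = {
    "Q42": "human",
    "Q937": "human",
    "Q64": "city",
}


def entities_in_query(sparql):
    """Distinct item IDs referenced in a SPARQL query, in numeric order."""
    found = set(ENTITY_RE.findall(sparql))
    return sorted(found, key=lambda qid: int(qid[1:]))


def dominant_class(sparql, instance_of):
    """Most common class among the query's entities, or None if it has none."""
    counts = Counter(
        instance_of.get(qid, "unknown") for qid in entities_in_query(sparql)
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    query = "SELECT ?p WHERE { VALUES ?p { wd:Q42 wd:Q937 } ?p wdt:P19 wd:Q64 . }"
    print(entities_in_query(query))
    print(dominant_class(query, INSTANCE_OF))
```

Queries that reference no items directly (for example, pure property-path queries) return `None` and would need a different grouping signal.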

Status	Assigned	Task
Resolved	dcausse	T221938 [Epic] Scaling strategy for Wikidata Query Service
Resolved	Gehel	T257045 [Epic] Better understanding of WDQS users and use cases
Declined	None	T258269 Add query result to the current WDQS event logging
Resolved	Zbyszko	T261841 Tag WDQS query log with the source of the query (UI vs direct access)
Resolved	Zbyszko	T261937 Add CPU load and query concurrency as context to event logging from WDQS
Resolved	Lydia_Pintscher	T264194 Examine WDQS queries for intent
Resolved	Zbyszko	T264447 Examine cases where Blazegraph generates results that timeout and don’t make it back to the user
Resolved	JAllemandou	T266022 Programmatically categorize WDQS queries by potential alternative solution
Resolved	None	T280075 Create WDQS UI user survey to collect data on user priorities
Resolved	MPhamWMF	T280077 Announce WDQS UI user survey