Jump to content

Wikipedia:Request a query

From Wikipedia, the free encyclopedia
(Redirected from Wikipedia:SQL requests)

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of article titles that meet certain criteria, consider using PetScan (user manual) or the default search. Petscan can generate list of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

Serial commas in page titles

[edit]

I posted the following request at WP:BOTR and was advised to come here.

Extended content

Hello, I'm not sure that this request can be completed automatically; please accept my apology if it can't. I just want some lists, without edits to anything except the page where you put the lists, so it's not a CONTEXTBOT issue: just a "good use of time" issue. Could you compile some lists of pages in which serial commas are present or are omitted? I just discovered List of cities, towns and villages in Cyprus and created List of cities, towns, and villages in Cyprus as a redirect to support serial commas. Ideally, whenever a page could have a serial comma in the title, we'd have a redirect for the form not used by the current title, but I assume this isn't always the case.

First off, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that use a serial comma. I think the criteria might be:

  • [one or more words]
  • comma
  • [one or more words]
  • comma
  • ["and" or "or"]
  • [one or more words]

I'm unsure whether they're rigid enough, or whether they might return a lot of false positives.

Secondly, I'd like a list of all pages whose titles are identical to the first list, except lacking a serial comma. Redirects would be acceptable here, since if I'm creating serial-comma redirects, it helps to know if it already exists.

Thirdly, I'd like a list of all mainspace pages (whether articles, lists, disambiguation pages, anything else except redirects) that could use a serial comma but don't. I think the criteria would be:

  • [Page is not on first or second list]
  • [one or more words]
  • comma
  • one or more words]
  • ["and" or "or", but no comma immediately beforehand]
  • [one or more words]

Once the list is complete, the bot checks each page with the following process: "if I inserted a comma immediately before 'and' or 'or', would it appear on the first list?" If the answer is "no", the bot removes it from the list.

Fourthly, I'd like a list of all pages whose titles are identical to the third list, except they have a serial comma. Again, redirects are acceptable.


Is this a reasonable request? Please let me know if it's not, so I don't waste your time.

After my request, User:Qwerfjkl suggested that I come here and offered some comments: intitle:/[A-Za-z ]+, [A-Za-z ]+, (and|or) [A-Za-z ]+/ would work for the first request and intitle:/[A-Za-z ]+, [A-Za-z ]+ (and|or) [A-Za-z ]+/ would work for the second.
The latter two lists are trickier.
To this, I replied Is there a way to download a list of results from a particular search? As far as I know, the only way to get a list of results is to copy/paste the whole thing somewhere and delete everything that's not a page title. (With 11,544 results for the first search, this isn't something I want to do manually.) Also, the first search includes redirects, e.g. Orders, decorations, and medals of the United Nations is result #1.

Please ping me if you reply; I've never heard of this page before, and I might not remember to come back if you don't ping me. Thanks for your help. Nyttend (talk) 03:22, 20 January 2025 (UTC)[reply]

@Nyttend: Doing the whole thing would be very awkward; we don't have access to create temporary tables on the replicas, so it would have to all be in one query, and I haven't been able to cajole the query pessimizer into doing it efficiently. What I can do quite easily is give you lists of all mainspace titles that match each of those regexes, along with whether each is a redirect, and what the titles would be if the comma is added/removed (as appropriate). quarry:query/90019 and 90020. Those should be sufficient to construct your four lists, though perhaps not trivially. I can do that too if you're unable, though not tonight and maybe not tomorrow either; let me know. —Cryptic 04:32, 20 January 2025 (UTC)[reply]
Could you give me two tables with four columns of data? First column is the link, second column is existing status (red, blue redirect, blue non-redirect), third column is alternate-comma variant (with-comma for the without-comma table; without-comma for the with-comma table), and fourth column is status of variant. If you do this in a simple table, I should be able to copy/paste it into Excel, sort so I can easily remove everything I don't want, and move the remaining content back to MediaWiki. Nyttend (talk) 04:44, 20 January 2025 (UTC)[reply]
Ah, never mind, I've explored the links and just realised that there's a way to download a CSV with almost all this information. This should suffice. Thanks! Nyttend (talk) 04:47, 20 January 2025 (UTC)[reply]
@Nyttend: I've updated both queries to include the status of the variant titles. —Cryptic 05:00, 20 January 2025 (UTC)[reply]

Ref count

[edit]

Can anyone create/link me to a modified version of User:Bunnypranav/Reports#Probable "draftify because of no sources" candidates which has 1-5 refs (inline citations) only. I would appreciate if you could just add another section below the above link and create/test the query. Thanks in advance! ~/Bunnypranav:<ping> 13:50, 24 January 2025 (UTC)[reply]

References aren't stored in the database at all except in the page text, which isn't copied to the toolforge replicas. —Cryptic 14:37, 24 January 2025 (UTC)[reply]
Hmm. How is the no refs query made? Is there any way to change it from the existing no refs exists to x number of refs exist. From my very novice understanding of the existing query, it checks no of ext links, can we modify it to find a specific number of them for the requested query? Thanks for your help. ~/Bunnypranav:<ping> 16:32, 24 January 2025 (UTC)[reply]
It's looking at whether there are any external links, yes, and I'll go ahead and change the query. But it's going to get hardly any results either way - the only pages it finds even without limiting external links at all are Battle of Nalos (3 links) and Operation Qazanchi (6), I suspect by its requirement of a no-citations tag, and one specifically added through page-triage at that. —Cryptic 16:50, 24 January 2025 (UTC)[reply]
Thanks Cryptic for that change, that will help catch some more possibilities eventually. @DreamRimmer, since I see you have expertise in python scripts, could a py script help find the no. of citations tags for pages younger than 90 days and list them somewhere? ~/Bunnypranav:<ping> 06:28, 25 January 2025 (UTC)[reply]
I would say this is technically doable, but it's not advisable due to the large number of pages involved, roughly 50,000 to 60,000 pages created in the last 90 days. Checking the number of references for these pages would require making the same number of API requests, which would significantly waste resources. A more efficient approach would be to download the relevant database dumps and generate your list from that data. – DreamRimmer (talk) 07:40, 25 January 2025 (UTC)[reply]
Is there any way to limit the number of pages queried from the beginning? In any case, I am ready to atleast try to process the data dumps from my end, could you possibly help me with a py script since I do not know much to start from scratch. Up to you though and thanks for the reply! ~/Bunnypranav:<ping> 07:43, 25 January 2025 (UTC)[reply]
There's a tool in WP:AWB called the database scanner that may be helpful. –Novem Linguae (talk) 13:35, 25 January 2025 (UTC)[reply]
Noted, thanks! ~/Bunnypranav:<ping> 13:36, 25 January 2025 (UTC)[reply]

Website Infobox parameter

[edit]

Hi Cryptic, could you query me a list of articles that: Are part of {{WikiProject Academic Journals}}, have a {{Infobox journal}}, don't have a |website parameter? Thanks Nobody (talk) 18:00, 24 January 2025 (UTC)[reply]

There's no way to see whether they have a parameter. The usual solution is to add a hidden tracking category in the template. —Cryptic 18:51, 24 January 2025 (UTC)[reply]
Thanks for the reply, I've asked for it at Template talk:Infobox journal#Template-protected edit request on 24 January 2025, which was the reason for this request. Nobody (talk) 19:45, 24 January 2025 (UTC)[reply]

Twenty Year Society of Wikipedia editors

[edit]

I'm curious as to how accurate a reflection Category:Members of the Twenty Year Society of Wikipedia editors is, of the number of editors still active who have been here for 20 years or more?

Can Quarry be used to measure editors who either:

  1. Made edits at least 20 years apart
  2. First edited over 20 years ago and are still (for some value, say: edited in the last three months) active?

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:27, 4 February 2025 (UTC)[reply]

The second is at quarry:query/90085. —Cryptic 19:21, 4 February 2025 (UTC)[reply]

What groups create what percentages of new pages?

[edit]

Could I get some help generating the following queries? I plan to make some pie charts with the data.

  • Data for pie chart 1 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users
    • by non autopatrolled, non admin users
    • by non autopatrolled, admin users
  • Data for pie chart 2 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users
    • by non autopatrolled editors with less than 10,000 edits
    • by non autopatrolled editors with greater than or equal to 10,000 edits
  • Data for pie chart 3 - 2024 - how many new articles (mainspace, non-redirects) created (and what percent overall are created)...
    • by autopatrolled users?
    • by non autopatrolled editors who are not new page patrollers
    • by non autopatrolled editors who are new page patrollers

Thanks. –Novem Linguae (talk) 21:53, 5 February 2025 (UTC)[reply]

@Novem Linguae I had to use my staff account to run queries against MediaWiki history, so I can't link the queries directly, and these should be taken with "Sam isn't an expert and might have missed something" caveats, but I found the following:
  • In 2024, 678,142 pages were created in the article namespace by users (not including bots). Of those, 170,684 are not currently redirects (they may have changed after initial creation, so this isn't precisely a reflection of how many non-redirects were created). The rest of the numbers will be non-redirects and also compared to this number. These numbers all exclude bots, too.
  • 101,987 (60%) pages were created by non-autopatrolled, non-admin users.
  • 756 (0.4%) pages were created by non-autopatrolled, admin users.
  • 67,941 (40%) pages were created by autopatrolled users.
  • 77,412 (45%) pages were created by non-autopatrolled editors with less than 10,000 edits.
  • 25,331 (15%) pages were created by non-autopatrolled editors with greater than or equal to 10,000 edits.
  • 100,929 (59%) pages were created by non-autopatrolled editors who do not have the patroller user group.
  • 1,814 (1%) pages were created by non-autopatrolled editors who do have the patroller user group.
Sam Walton (talk) 23:25, 5 February 2025 (UTC)[reply]
Public version at quarry:query/90509 quarry:history/90509/976754/947548. Can't figure out why my numbers disagree - I'm seeing about twice as many pages total - even considering I included pages created by non-users (either ips, or where the first revision had the user revdelled/suppressed, or conceivably bad imports). Percentages excluding those are broadly similar though. —Cryptic 00:11, 6 February 2025 (UTC)[reply]
Oh, it's because I didn't see the "non-redirects" added later. —Cryptic 00:13, 6 February 2025 (UTC)[reply]
Thank you both for your hard work. This data makes it clear that advocating for something like autopatrolling admins would have a negligible effect on reducing the WP:NPP queue (0.4% reduction). This is good data that will help inform some NPP-related decisions. –Novem Linguae (talk) 00:22, 6 February 2025 (UTC)[reply]
Frown. Now I'm seeing fewer pages than you - 195800 total currently-non-redirects, 107941 currently-non-redirects by users. And that's without excluding bots. Are you including currently-deleted pages? —Cryptic 00:25, 6 February 2025 (UTC) Nmind, I see what I did, duh. quarry:query/90509 again, now with much-less-disagreeable numbers - all somewhat higher than Sam's, since it still includes bots. Interesting that the percentages turned out close to right even though I was looking at the wrong (essentially random) users' groups. —Cryptic 00:46, 6 February 2025 (UTC)[reply]
I am actually kind of curious as to why an admin would ever not be autopatrolled. Are there admins who make poor quality new articles? BD2412 T 02:46, 6 February 2025 (UTC)[reply]
Something needed to be done to get more administrators, removing autopatrol was something, so removing autopatrol needed to be done. —Cryptic 03:00, 6 February 2025 (UTC)[reply]
I agree and I did not support the rfc, but I was in the minority. –Novem Linguae (talk) 06:28, 6 February 2025 (UTC)[reply]
@Cryptic Good to hear we're on the same track :) MediaWiki history includes since-deleted pages, so that will be another minor source of disagreement. Sam Walton (talk) 09:03, 6 February 2025 (UTC)[reply]
Then my numbers should be lower than yours, instead of ranging from 10-30% higher across the board. Including bots isn't it; there were only four bot "creations" in my dataset, Kansas City shooting, Pedro Campos (disambiguation), 10 Lives (disambiguation), and Chris Sheppard (disambiguation), all by User:RussBot; and two are bad data anyway, showing up only because they were history merged. (History merges, despite being half of that very small sample, aren't going to come anywhere close to accounting for the difference. Even considering that the other two of these pages were cut-and-paste moves and could stand to have their histories fiddled with too.) Is your data source looking at where pages were initially created, or where they are now? —Cryptic 11:25, 6 February 2025 (UTC)[reply]