360Studies

Your Destination for Career Excellence in Bioscience, Statistics, and Data Science

Extract Wikipedia Data using Python


Basics of Extracting Wikipedia Data using Python

  1. import requests: This line imports the requests library, commonly used for making HTTP requests in Python.
  2. endpoint = "https://en.wikipedia.org/w/api.php": This sets the endpoint URL for Wikipedia’s API.
  3. params = {...}: This dictionary holds the parameters sent with the API request: the action (“query”), the response format (JSON), and what to retrieve, for example a page property through prop or a list such as search results through list.
  4. response = requests.get(endpoint, params=params): This line sends a GET request to the Wikipedia API with the specified parameters and stores the response in the response variable.
  5. data = response.json(): Parses the JSON response from the API into a Python dictionary.
  6. page_data = data["query"]["pages"]: Extracts the page information from the API response. It is a dictionary keyed by page ID, so the examples below take its first (and only) key to reach the page itself.
  7. From here, other kinds of data can be requested by changing the parameters, as the following examples show.
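Steps 5 and 6 assume that the request succeeded and that the page exists. The sketch below wraps step 6 in a small helper; the name single_page is hypothetical and not part of any library. It relies on two behaviours of the MediaWiki API in its default JSON format: problems such as an unrecognised parameter are reported under an "error" key with "code" and "info" fields, and a title that does not exist comes back under a negative page ID carrying a "missing" key, rather than as an HTTP error.

```python
def single_page(data: dict) -> dict:
    """Return the only page object in an action=query response (hypothetical helper)."""
    # Problems such as an unrecognised parameter are reported under "error"
    if "error" in data:
        error = data["error"]
        raise RuntimeError(f"API error {error.get('code')}: {error.get('info')}")
    pages = data["query"]["pages"]  # dict keyed by page ID
    if len(pages) != 1:
        raise ValueError(f"expected exactly one page, got {len(pages)}")
    page = next(iter(pages.values()))
    # A non-existent or malformed title comes back with a "missing" or "invalid" key
    if "missing" in page or "invalid" in page:
        raise LookupError(f"no such page: {page.get('title')}")
    return page
```

With this helper, page_info = single_page(data) could stand in for the lines that pick the page ID by hand in the scripts below, so a misspelled title fails with a clear message instead of a KeyError further down.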

Fetching the Contents of a Wikipedia Page using the Wikipedia API

This Python script uses the requests library to retrieve the introductory text of the “2023 Cricket World Cup” article through Wikipedia’s API.

Code

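import requests

endpoint = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients to send a descriptive User-Agent; this one is a placeholder
headers = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}
params = {
    "action": "query",
    "format": "json",
    "prop": "extracts",    # page text via the TextExtracts extension
    "exintro": True,       # only the introduction (text before the first heading)
    "explaintext": True,   # plain text instead of HTML
    "titles": "2023_Cricket_World_Cup",
}
response = requests.get(endpoint, params=params, headers=headers, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
data = response.json()

# "pages" is a dictionary keyed by page ID; one title gives one entry
page_data = data["query"]["pages"]
page_id = list(page_data.keys())[0]
page_info = page_data[page_id]
page_title = page_info["title"]
page_extract = page_info["extract"]

print(f"Title: {page_title}")
print(f"Extract:\n{page_extract}")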

Output

Title: 2023 Cricket World Cup
Extract:
The 2023 Cricket World Cup, officially known as the 2023 ICC Men's Cricket World Cup, was the 13th edition of the Cricket World Cup. It started on 5 October and concluded on 19 November 2023, with Australia winning the tournament. A quadrennial One Day International (ODI) cricket tournament contested by national teams, it was organised by the International Cricket Council (ICC). Ten national teams participated in the tournament. It was the first men's Cricket World Cup which India hosted solely. The tournament took place in ten different stadiums, in ten cities across the country. In the first semi-final India beat New Zealand, and in the second semi-final Australia beat South Africa. The final took place between India and Australia at Narendra Modi Stadium on 19 November with Australia winning the title for the sixth time.
The top eight placed teams in the tournament's final points table qualified for the 2025 ICC Champions Trophy, the next ICC ODI tournament. Virat Kohli was the player of the tournament and also scored the most runs; Mohammed Shami was the leading wicket-taker. A total of 1,250,307 spectators attended matches, the highest number in any cricket World Cup to date.
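The parameters in this script can be adjusted to control how much text comes back. The following sketch builds the parameter dictionary in one place. The helper names build_extract_params and fetch_extract, the placeholder User-Agent string, and the use of exsentences (a TextExtracts parameter that caps the extract at a number of sentences) are assumptions of this sketch, not part of the original script. The network call sits in its own function, which imports requests only when it runs.

```python
from typing import Optional

ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent: Wikimedia asks API clients to identify themselves
HEADERS = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}


def build_extract_params(
    title: str, sentences: Optional[int] = None, intro_only: bool = True
) -> dict:
    """Build parameters for a prop=extracts request (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": 1,  # plain text rather than HTML
        "titles": title,
    }
    # Boolean API parameters count as true whenever present, so add them only when wanted
    if intro_only:
        params["exintro"] = 1  # only the text before the first section heading
    if sentences is not None:
        params["exsentences"] = sentences  # cap the extract at this many sentences
    return params


def fetch_extract(title: str, sentences: Optional[int] = None) -> str:
    """Download the extract for one title (requires network access)."""
    import requests  # imported here so build_extract_params works without requests installed

    response = requests.get(
        ENDPOINT,
        params=build_extract_params(title, sentences),
        headers=HEADERS,
        timeout=10,
    )
    response.raise_for_status()
    pages = response.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")  # missing pages have no "extract" key
```

For instance, fetch_extract("2023 Cricket World Cup", sentences=2) should return only the opening two sentences. The optional keys are added only when wanted because the API treats a boolean parameter as true whenever it is present: sending "exintro": False would still limit the result to the introduction.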

Fetching Categories of a Wikipedia Page using the Wikipedia API

This Python script uses the Wikipedia API to fetch the categories assigned to the “2023 Cricket World Cup” page.

Code

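import requests

endpoint = "https://en.wikipedia.org/w/api.php"
headers = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}  # placeholder; use your own details
params = {
    "action": "query",
    "format": "json",
    "prop": "categories",
    "titles": "2023_Cricket_World_Cup",
    # Without "cllimit" (default 10), at most 10 categories are returned per request
}
response = requests.get(endpoint, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()

page_data = data["query"]["pages"]
page_id = list(page_data.keys())[0]
page_info = page_data[page_id]
categories = page_info.get("categories", [])  # key is absent if the page has no categories

# Each category is a dict such as {"ns": 14, "title": "Category:..."}
category_names = [category["title"] for category in categories]

print(f"Categories: {', '.join(category_names)}")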

Output

Categories: Category:2023 Cricket World Cup, Category:2023 in Indian sport, Category:2023 in cricket, Category:Articles with excerpts, Category:Articles with short description, Category:Cricket World Cup tournaments, Category:Cricket events postponed due to the COVID-19 pandemic, Category:International cricket competitions in India, Category:November 2023 sports events in India, Category:October 2023 sports events in India
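By default this query returns at most 10 categories per request, so the list above may be incomplete (it contains exactly ten entries). Raising cllimit and following the "continue" values in each response, as described in the MediaWiki documentation on continuing queries, retrieves the rest. The sketch below assumes a fetch callable that takes a parameter dictionary and returns the decoded JSON; passing it in keeps the paging logic separate from the HTTP code and easy to test. The helper names and the clshow="!hidden" filter, which asks the API to skip hidden maintenance categories (likely including entries such as “Articles with short description”), are choices of this sketch.

```python
from typing import Callable, Iterator, List

API_URL = "https://en.wikipedia.org/w/api.php"


def query_all(fetch: Callable[[dict], dict], params: dict) -> Iterator[dict]:
    """Yield the "query" part of every batch, following "continue" values."""
    last_continue: dict = {}
    while True:
        # Each request repeats the original parameters plus the latest continue values
        batch = fetch({**params, **last_continue})
        if "error" in batch:
            raise RuntimeError(batch["error"].get("info", "API error"))
        if "query" in batch:
            yield batch["query"]
        if "continue" not in batch:
            break  # no more batches
        last_continue = batch["continue"]


def all_categories(fetch: Callable[[dict], dict], title: str) -> List[str]:
    """Collect every category title for one page (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",     # as many categories per batch as the API allows
        "clshow": "!hidden",  # skip hidden maintenance categories
    }
    names: List[str] = []
    for query in query_all(fetch, params):
        for page in query["pages"].values():
            names.extend(cat["title"] for cat in page.get("categories", []))
    return names


def requests_fetch(params: dict) -> dict:
    """A real fetch function (requires network access and the requests library)."""
    import requests

    headers = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}  # placeholder
    response = requests.get(API_URL, params=params, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()
```

Calling all_categories(requests_fetch, "2023 Cricket World Cup") would then gather the visible categories across as many batches as the API needs.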

Fetching Pageviews of a Wikipedia Page using the Wikipedia API

This Python script uses the Wikipedia API to fetch daily pageview counts for the “Lakshadweep” page over the last 10 days.

Code

import requests

endpoint = "https://en.wikipedia.org/w/api.php"
headers = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}  # placeholder; use your own details
params = {
    "action": "query",
    "format": "json",
    "prop": "pageviews",
    "titles": "Lakshadweep",
    "pvipdays": 10,  # number of days of pageview data to return
}
response = requests.get(endpoint, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()

page_data = data["query"]["pages"]
page_id = list(page_data.keys())[0]
page_info = page_data[page_id]
page_title = page_info["title"]
# Dict mapping "YYYY-MM-DD" to a view count (None for days without data)
page_views = page_info.get("pageviews", {})

print(f"Title: {page_title}")
print("Pageviews:")

for date, views in page_views.items():
    print(f"{date}: {views}")

Output

Title: Lakshadweep
Pageviews:
2024-01-13: 34439
2024-01-14: 29359
2024-01-15: 22904
2024-01-16: 17460
2024-01-17: 12851
2024-01-18: 12598
2024-01-19: 9729
2024-01-20: 8068
2024-01-21: 8234
2024-01-22: 6569
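The pageviews dictionary maps each date to a view count, and a day with no data is returned as null in the JSON, which becomes None in Python. A minimal sketch for summarising the series, assuming that shape (the function name summarize_pageviews is hypothetical):

```python
from typing import Dict, Optional, Tuple


def summarize_pageviews(
    pageviews: Dict[str, Optional[int]],
) -> Tuple[int, Optional[str], int]:
    """Return (total views, busiest date, number of days without data)."""
    # Ignore days the API reported as null (None in Python)
    known = {day: views for day, views in pageviews.items() if views is not None}
    total = sum(known.values())
    # Date with the highest count; None if there is no data at all
    busiest = max(known, key=lambda day: known[day]) if known else None
    missing_days = len(pageviews) - len(known)
    return total, busiest, missing_days
```

Applied to the ten days listed above, summarize_pageviews(page_views) returns (162211, "2024-01-13", 0): 162,211 views in total, peaking on 13 January 2024, with no missing days.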

Searching for Related Wikipedia Pages using the Wikipedia API

This Python script searches Wikipedia for pages related to “domestic violence”.

Code

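import requests

endpoint = "https://en.wikipedia.org/w/api.php"
headers = {"User-Agent": "MyWikiScript/0.1 (you@example.com)"}  # placeholder; use your own details
search_term = "domestic violence"

params = {
    "action": "query",
    "format": "json",
    "list": "search",  # full-text search instead of a page property
    "srsearch": search_term,
    # Without "srlimit" (default 10), at most 10 results are returned
}
response = requests.get(endpoint, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()
search_results = data["query"]["search"]  # list of dicts, one per matching page

print(f"Wikipedia pages related to '{search_term}':")
for result in search_results:
    page_title = result["title"]
    print(page_title)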

Output 

Wikipedia pages related to 'domestic violence': 
Domestic violence 
Islam and domestic violence 
Domestic violence in lesbian relationships 
Domestic violence against men 
Domestic violence in India 
Christianity and domestic violence 
Domestic violence in Russia 
Domestic violence in same-sex relationships 
Domestic violence in China 
Epidemiology of domestic violence
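Each search result carries more than the title: by default it also includes fields such as pageid, wordcount, timestamp and an HTML snippet in which matching words are wrapped in <span class="searchmatch"> tags. The sketch below, with hypothetical helper names, turns the results into (title, plain-text snippet) pairs and builds parameters for paging past the default 10 results with srlimit and sroffset. Stripping tags with a regular expression is adequate for these short snippets but is not a general-purpose HTML parser.

```python
import html
import re
from typing import List, Tuple

# Matches any HTML tag, e.g. <span class="searchmatch"> and </span>
TAG_RE = re.compile(r"<[^>]+>")


def search_params(term: str, limit: int = 10, offset: int = 0) -> dict:
    """Parameters for one page of list=search results (hypothetical helper)."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,    # results per request (the API default is 10)
        "sroffset": offset,  # number of results to skip, for paging
    }


def clean_snippet(snippet: str) -> str:
    """Remove highlight tags and decode entities such as &quot;."""
    return html.unescape(TAG_RE.sub("", snippet))


def titles_and_snippets(data: dict) -> List[Tuple[str, str]]:
    """Turn a decoded search response into (title, plain-text snippet) pairs."""
    return [
        (result["title"], clean_snippet(result.get("snippet", "")))
        for result in data["query"]["search"]
    ]
```

To fetch the next page of results, the request can be repeated with offset set to the previous offset plus limit; the response's "continue" block should also carry the next sroffset value.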

 
