This web-scraping tool extracts activity data from Strava Clubs to fill gaps left by the standard Strava API. The main features are:
- Strava Club Activities scraper: imports "Recent Activity" entries (public activities, or activities that the user has access to) to a dataset (requires a Strava account).
- Strava Club Leaderboard scraper: imports current and previous week leaderboard information (including athletes' id) to a dataset (requires a Strava account).
- Strava Club Members scraper: imports all members that joined a Strava Club (including athletes' id) to a dataset (requires a Strava account).
- Strava Club to Google Sheets importer: automatically retrieves data and updates Strava Club Activities, Leaderboard and/or Members dataset(s) in a Google Sheets file (requires a Google API key).
This tool does not rely on the Strava API. Strava's API has become very limited in recent years. For the List Club Activities endpoint, it returns only the following variables:
- athlete variables: resource_state, firstname and lastname (first letter only);
- activity variables: name, distance, moving_time, elapsed_time, total_elevation_gain, type and workout_type.
Given that Strava's API does not return an athlete id variable, athletes with the same first name and the same first letter of the last name cannot be distinguished from each other.
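The ambiguity can be illustrated with a minimal sketch. The records below are made up for illustration and only mimic the fields listed above; they are not real API output.

```python
from collections import defaultdict

# Hypothetical records shaped like the variables listed above (values are invented).
api_activities = [
    {"athlete": {"resource_state": 2, "firstname": "Alex", "lastname": "M."},
     "name": "Morning Run", "distance": 5000.0},
    {"athlete": {"resource_state": 2, "firstname": "Alex", "lastname": "M."},
     "name": "Evening Ride", "distance": 20000.0},
    {"athlete": {"resource_state": 2, "firstname": "Sam", "lastname": "K."},
     "name": "Lunch Walk", "distance": 2000.0},
]


def group_by_visible_name(activities):
    """Group activity names by the only athlete identity available: first name + last initial."""
    groups = defaultdict(list)
    for activity in activities:
        athlete = activity["athlete"]
        groups[(athlete["firstname"], athlete["lastname"])].append(activity["name"])
    return dict(groups)


groups = group_by_visible_name(api_activities)
# Both "Alex M." activities land in the same group, whether they belong
# to one athlete or to two different athletes.
print(groups)
```

Without an athlete id there is no way to tell whether the two "Alex M." entries come from the same person.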
- Strava Club Activities scraper: the main limitation of this tool is that Strava's dashboard activity feed shows only a limited number of activities. Scrolling to the bottom of the page is not endless; after some scrolls, the warning "No more recent activity available. To see your full activity history, visit your Profile or Training Calendar." is shown. Strava has the num_entries URL query string (e.g. https://www.strava.com/dashboard?club_id=319098&feed_type=club&num_entries=1000), but this parameter does not necessarily load the requested number of activity entries into the feed. This tool also requires that the activities to be scraped are either public or accessible to the account that is scraping the club activities data (by either following the athlete or owning the activity).
- Strava Club Leaderboard scraper: the club leaderboards include only data for the current and previous week; no historical data is provided by Strava. Additionally, club leaderboards display only the weekly top 100 members (Source).
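As a rough sketch, the club feed URL with the num_entries query string could be assembled as follows. The helper name club_feed_url is hypothetical and not part of this tool; the parameter names are taken from the example URL above.

```python
from urllib.parse import urlencode


def club_feed_url(club_id, num_entries=100):
    """Build the dashboard feed URL for a club (hypothetical helper, for illustration only)."""
    query = urlencode({
        "club_id": club_id,
        "feed_type": "club",
        "num_entries": num_entries,
    })
    return f"https://www.strava.com/dashboard?{query}"


url = club_feed_url("319098", num_entries=1000)
print(url)
```

As noted above, requesting a large num_entries does not guarantee that Strava will actually load that many entries.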
To work around these limitations, this tool offers an integration with Google Sheets: each run updates/increments the stored activities/leaderboard/members data of the specified Strava Club(s), keeping previously scraped data that can no longer be accessed in the Strava Club.
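The update/increment idea can be sketched in plain Python. This is only an illustration of the concept, not the tool's actual implementation; the activity_id key and the sample rows are assumptions.

```python
def merge_rows(stored_rows, scraped_rows, key="activity_id"):
    """Keep every stored row, overwrite rows that were scraped again, append new ones.

    The key column name is an assumption made for this sketch.
    """
    merged = {row[key]: row for row in stored_rows}
    for row in scraped_rows:
        merged[row[key]] = row  # a freshly scraped row replaces the stored version
    return list(merged.values())


# Invented sample data: activity 1 is no longer visible in the club feed.
stored = [
    {"activity_id": 1, "name": "Old Run"},
    {"activity_id": 2, "name": "Ride"},
]
scraped = [
    {"activity_id": 2, "name": "Ride (edited)"},
    {"activity_id": 3, "name": "New Hike"},
]

merged = merge_rows(stored, scraped)
print(merged)
```

Activity 1 survives the merge even though it was not scraped in the latest run, which is the point of keeping the history outside of Strava.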
Strava allows users to create a Group Challenge, which is limited to 25 participants. To circumvent this limitation, one possible use case is to create one or multiple Strava Clubs (e.g. Cycling, Multisport, Run/Walk/Hike) and adapt this script to update/increment an existing Google Sheets sheet with the clubs' activities, leaderboard and members data. The script can be set up to run automatically on a schedule on cloud platform services such as GitHub Actions (see GitHub Actions Workflow .yaml template) and Railway (see Dockerfile template). To connect the script to a Google Sheets file, a Google Sheets API .json key is required and the file needs to be shared with a Service Account email address. The Google Sheets file can then be connected to a dashboard tool (e.g. Google Data Studio, Microsoft PowerBI).
This tool assumes that Strava's Display Preferences are set to:
- Units & Measurements = "Kilometers and Kilograms"
- Temperature = "Celsius"
- Feed Ordering = "Latest Activities" (chronological feed)
And that your Strava display language is English (US). To change the language, log in to Strava and, in the bottom right-hand corner of any page, select English (US) from the drop-down menu (more on this here).
python -m pip install python-dateutil geopy google-api-python-client google-auth lxml pandas selenium webdriver-manager
strava_club_activities(club_ids, filter_activities_type, filter_date_min, filter_date_max, timezone='UTC')
- Scrapes and imports activities belonging to one or multiple Strava Club(s) (public activities or activities that the account that is scraping the data has access to) to a dataset.
- club_ids: str list. List of Strava Club ids from which the tool should scrape data (e.g. club_ids=['445017', '1045852']).
- filter_activities_type: str list, default: None. List of activity types to keep (e.g. filter_activities_type=['E-Bike Ride', 'Hike', 'Ride', 'Run', 'Walk']).
- filter_date_min: str. Start date filter (e.g. filter_date_min='2023-06-05').
- filter_date_max: str. End date filter (e.g. filter_date_max='2023-07-30').
- timezone: str or timezone object, default: 'UTC'.
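To make the filter parameters concrete, here is a minimal stand-alone sketch of type and date filtering. It does not call the scraper; the row field names (activity_type, activity_date) and the inclusive date bounds are assumptions, not documented behavior of strava_club_activities.

```python
from datetime import date


def filter_activities(activities, filter_activities_type=None,
                      filter_date_min=None, filter_date_max=None):
    """Filter rows by activity type and by an inclusive date range (assumed semantics)."""
    date_min = date.fromisoformat(filter_date_min) if filter_date_min else None
    date_max = date.fromisoformat(filter_date_max) if filter_date_max else None
    kept = []
    for activity in activities:
        activity_date = date.fromisoformat(activity["activity_date"])
        if filter_activities_type is not None and activity["activity_type"] not in filter_activities_type:
            continue
        if date_min is not None and activity_date < date_min:
            continue
        if date_max is not None and activity_date > date_max:
            continue
        kept.append(activity)
    return kept


# Invented rows with hypothetical field names.
rows = [
    {"activity_type": "Run", "activity_date": "2023-06-05"},
    {"activity_type": "Swim", "activity_date": "2023-06-10"},
    {"activity_type": "Ride", "activity_date": "2023-08-01"},
]

kept = filter_activities(
    rows,
    filter_activities_type=["E-Bike Ride", "Hike", "Ride", "Run", "Walk"],
    filter_date_min="2023-06-05",
    filter_date_max="2023-07-30",
)
print(kept)
```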
strava_club_members(club_ids, club_members_teams=None, timezone='UTC')
- Scrapes and imports members of one or multiple Strava Club(s) to a dataset.
- club_ids: str list. List of Strava Club ids from which the tool should scrape data (e.g. club_ids=['445017', '1045852']).
- club_members_teams: dict, default: None. Option to add athlete_id to one or multiple teams (stored in the athlete_team column). An athlete_id assigned to multiple teams will have its unique team assignments comma-separated.
- timezone: str or timezone object, default: 'UTC'.
Example of club_members_teams:
club_members_teams={
'Team A': ['1234, 5678'],
'Team B': ['1234, 12345'],
}
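The following sketch shows how such a mapping could be turned into one comma-separated team string per athlete. It is an illustration only: the helper name athlete_teams is hypothetical, and splitting each list entry on commas and joining teams with ", " are assumptions about how the example above is interpreted.

```python
def athlete_teams(club_members_teams):
    """Invert a team -> athlete ids mapping into athlete id -> comma-separated teams."""
    teams_by_athlete = {}
    for team, entries in club_members_teams.items():
        for entry in entries:
            # Each entry may hold several ids separated by commas (assumption).
            for athlete_id in str(entry).split(","):
                athlete_id = athlete_id.strip()
                if not athlete_id:
                    continue
                teams = teams_by_athlete.setdefault(athlete_id, [])
                if team not in teams:  # keep team assignments unique
                    teams.append(team)
    return {athlete_id: ", ".join(teams) for athlete_id, teams in teams_by_athlete.items()}


result = athlete_teams({
    'Team A': ['1234, 5678'],
    'Team B': ['1234, 12345'],
})
print(result)
```

Here athlete 1234 belongs to both teams, so its team string lists both.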
strava_club_leaderboard(club_ids, filter_date_min, filter_date_max, timezone='UTC')
- Scrapes and imports the leaderboard of one or multiple Strava Club(s) to a dataset.
- club_ids: str list. List of Strava Club ids from which the tool should scrape data (e.g. club_ids=['445017', '1045852']).
- filter_date_min: str. Start date filter (e.g. filter_date_min='2023-06-05').
- filter_date_max: str. End date filter (e.g. filter_date_max='2023-07-30').
- timezone: str or timezone object, default: 'UTC'.
strava_club_to_google_sheets(df, sheet_id, sheet_name)
- Updates/increments a Google Sheets sheet with an input dataset.
- df: DataFrame. Input dataset to be updated/incremented in a specified Google Sheets sheet.
- sheet_id: str. Google Sheets file id.
- sheet_name: str. Google Sheets sheet/tab where the data should be updated/incremented.
execution_time_to_google_sheets(sheet_id, sheet_name, timezone='UTC')
- Updates a Google Sheets sheet with the time at which the code was executed.
- sheet_id: str. Google Sheets file id.
- sheet_name: str. Google Sheets sheet/tab where the data should be updated/incremented.
- timezone: str or timezone object, default: 'UTC'.
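A minimal sketch of producing a timezone-aware execution timestamp from either a string or a timezone object is shown below. The helper name execution_timestamp, the now parameter and the output format are assumptions for illustration; the tool's actual format is not documented here.

```python
from datetime import datetime, timedelta, timezone as dt_timezone, tzinfo
from zoneinfo import ZoneInfo


def execution_timestamp(timezone="UTC", now=None):
    """Return the execution time as text in the given timezone (hypothetical helper)."""
    # Accept either a timezone object or an IANA timezone name such as 'Europe/Berlin'.
    tz = timezone if isinstance(timezone, tzinfo) else ZoneInfo(timezone)
    current = now if now is not None else datetime.now(tz=dt_timezone.utc)
    return current.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


# Fixed input so the result is reproducible: 12:00 UTC shown in a UTC+2 offset.
stamp = execution_timestamp(
    dt_timezone(timedelta(hours=2)),
    now=datetime(2023, 6, 5, 12, 0, tzinfo=dt_timezone.utc),
)
print(stamp)
```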
strava_export_activities(activities_id, file_type)
- Exports a list of activity_id to a GPS file.
- activities_id: int list or str list. List of activity_id to be exported (e.g. activities_id=[696657036, 696657037]).
- file_type: str, default: '.gpx'. Activity export format. Note that the '.gpx' format uses Strava's built-in feature to export the activities, and '.tcx' uses the Sauce for Strava Chrome Extension (which needs to be installed on Selenium's WebDriver to work). Strava's built-in .gpx export includes only trackpoints (with latitude and longitude); it is possible to manipulate those .gpx exports by converting them to other GPS file types (e.g. .tcx) and adding fake timestamps (faketime) using GPSBabel (see gps_tools.sh).
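To show what a trackpoint-only export looks like in practice, the sketch below reads latitude/longitude pairs from a GPX document and checks whether each point carries a timestamp. The sample GPX is hand-written for illustration, not an actual Strava export, and the GPX 1.1 namespace is an assumption.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace (assumed for this sketch).
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Hand-written sample with trackpoints that carry only latitude and longitude.
sample_gpx = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="48.1372" lon="11.5756"></trkpt>
    <trkpt lat="48.1380" lon="11.5790"></trkpt>
  </trkseg></trk>
</gpx>"""


def read_trackpoints(gpx_text):
    """Return (lat, lon, has_time) for every trackpoint in a GPX document."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iterfind(".//gpx:trkpt", GPX_NS):
        has_time = trkpt.find("gpx:time", GPX_NS) is not None
        points.append((float(trkpt.get("lat")), float(trkpt.get("lon")), has_time))
    return points


points = read_trackpoints(sample_gpx)
print(points)
```

Points without a time element are the reason a tool such as GPSBabel is needed when a downstream format expects timestamps.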
selenium_webdriver_quit()
- Terminates the WebDriver session.
- Parameters: None.
Please note that the use of this code/tool may not comply with Strava's Terms of Service (especially the "Distributing, or disclosing any part of the Services in any medium, including without limitation by any automated or non-automated “scraping”" term) and Strava's API Agreement (especially the "You may not use web scraping, web harvesting, or web data extraction methods to extract data from the Strava Platform" term). Use this tool at your own risk.
Strava Club Tracker: Tool that generates a progress tracker/dashboard for Club activities (relies on Strava's API) (HTML, PHP).
StravaClubActivities: Tool that downloads Club activities and generates a .csv for processing virtual race events (relies on Strava's API) (Ruby).