ref https://linear.app/ghost/issue/ANAL-120/bounce-rate-data-seems-to-mix-units
closes https://linear.app/ghost/issue/ANAL-119/visit-duration-metric-inaccurate
closes https://linear.app/ghost/issue/ANAL-118/charts-are-empty-with-only-1-data-point
- The original [web analytics starter kit KPI endpoint](ad1efb766e/tinybird/pipes/kpis.pipe (L122)) had this simpler endpoint, but as I've messed around adding features, I've unintentionally overcomplicated it and introduced a tonne of bugs.
- This reverts the KPI endpoint back towards the original structure, and moves all the calculations and where clauses up to the `data` node
- This means that the left join at the end works and pulls in all the dates from the timeseries node correctly, without needing `WITH FILL STEP 1`, which generated a result for every second when looking at a single day's data
- Moving the where clause handling up to the `data` node, rather than leaving it on the endpoint, still works as expected. This confused me when I first started working with tinybird
- This should resolve several bugs we've experienced with the visit duration, with missing data points and empty charts, and perhaps even the bounce rate (but need to look at that more closely)
- This case works significantly differently to the normal KPIs and was
untested, so we didn't spot that we broke it.
- This adds a test, and brings in the fix by @FGonzalezLopez from
41029a4476
- This fixes the charts insofar as they no longer error
- However, the test result indicates another bug: we're getting one row per second, instead of the one row per hour that we actually expect
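
A minimal sketch of the idea behind the timeseries left join described above: build every expected bucket first, then join the data onto it so that missing buckets come back as zero rather than being dropped. This is plain Python rather than the actual pipe SQL. The function names are hypothetical, and the choice of daily buckets for multi-day ranges is an assumption (only the hourly buckets for a single day are described above).

```python
from datetime import date, datetime, timedelta


def build_timeseries(date_from: date, date_to: date) -> list[datetime]:
    """Return every bucket we expect a row for.

    Assumption: a single-day range is bucketed per hour (24 rows),
    anything longer is bucketed per day.
    """
    start = datetime.combine(date_from, datetime.min.time())
    if date_from == date_to:
        return [start + timedelta(hours=h) for h in range(24)]
    days = (date_to - date_from).days
    return [start + timedelta(days=d) for d in range(days + 1)]


def left_join_visits(
    timeseries: list[datetime], visits_by_bucket: dict[datetime, int]
) -> list[tuple[datetime, int]]:
    """Keep every bucket from the timeseries; buckets with no data get 0."""
    return [(bucket, visits_by_bucket.get(bucket, 0)) for bucket in timeseries]


if __name__ == "__main__":
    buckets = build_timeseries(date(2024, 1, 1), date(2024, 1, 1))
    rows = left_join_visits(buckets, {datetime(2024, 1, 1, 9): 5})
    print(len(rows))
```

Because the bucket list drives the output, the row count depends only on the date range and not on how sparse the data is.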
---------
Co-authored-by: Paco Gonzalez <paco@tinybird.co>
ref https://github.com/TryGhost/Ghost/pull/21794
- This was missed in the PR to add versioning to all the resources - the
endpoints are now different, and the tests don't run
- I've been struggling to deploy my changes, and part of the reason is
that this is a wholesale change to having versions, where previously
we didn't
- This change brings the tests into line, so we can be certain that the
new endpoints with the versions work the same as the old
- TODO: really must get CI working for tinybird!
- Previously the script would error out if a resource was missing e.g.
if a deploy had gone wrong
- That meant I frequently had to make further, manual changes
- These updates mean the script only attempts to delete a resource if it
is present
- Each type of resource is listed in an array and iterated over
- note there is no real difference between data and endpoint pipes, but
we need to manage them in order
- This should make the script much much more robust!
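
A rough sketch of the "only delete what is present, in order" logic described above. This is Python rather than the real shell script. The resource names, the group labels and the `delete` callback are all placeholders for illustration, and no Tinybird CLI calls are shown.

```python
from typing import Callable, Iterable


def delete_present(
    ordered_groups: Iterable[tuple[str, list[str]]],
    existing: set[str],
    delete: Callable[[str], None],
) -> list[str]:
    """Walk the resource groups in order and delete only what exists.

    ordered_groups: (group label, resource names), in the order they must
                    be removed. Labels are purely descriptive.
    existing:       names of resources currently present in the workspace.
    delete:         hypothetical callback that removes one resource.
    """
    deleted = []
    for _group, names in ordered_groups:
        for name in names:
            if name not in existing:
                # Missing resource (e.g. a deploy went wrong): skip, don't error.
                continue
            delete(name)
            deleted.append(name)
    return deleted


if __name__ == "__main__":
    # Illustrative names only; the real lists live in the script.
    groups = [
        ("endpoint pipes", ["kpis"]),
        ("data pipes", ["analytics_hits"]),
        ("datasources", ["example_mv"]),
    ]
    print(delete_present(groups, {"kpis", "example_mv"}, lambda name: None))
```

Keeping the presence check inside the loop means a half-finished deploy no longer needs manual cleanup before the script can run.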
ref https://github.com/TryGhost/Ghost/pull/21765
- This change was split out of
https://github.com/TryGhost/Ghost/pull/21765
- We're adding versioning to all the resources in tinybird so that we
can iterate on them in future in a way that tinybird understands
- The next step is to build an example of what making a change will look
like in a versioned world
---------
Co-authored-by: Paco Gonzalez <paco@tinybird.co>
- This changes the Tinybird Materialized Views to circumvent an existing
Tinybird bug that prevents iterating on the code in its current state.
- The idea is that this will allow us to be more flexible in making changes, as it works
around some restrictions where Tinybird won't let us change the MV because other
parts of the pipe depend on it
- The idea is to remove the dependency on `analytics_hits.pipe` in the
materialized views.
- This does create code duplication, but we can clean that up later using includes,
or refactor the pipe again later if Tinybird fixes the issues
---------
Co-authored-by: Hannah Wolfe <github.erisds@gmail.com>
ref https://linear.app/ghost/issue/ANAL-115/data-retention
- The bad news here is I didn't notice that the tinybird web analytics
starter kit included a TTL of 60 days on the analytics_events datasource
- This means any data older than 60 days was automatically dropped from
the table
- I updated this in the UI when I noticed it a few days ago; this change
makes sure it can't come back
- The good news is that we don't have to implement anything to make this
work when we do get to the point where we want a TTL!
ref https://linear.app/tryghost/issue/ANAL-96/data-discrepancy-between-charts-when-filtering
- This adds a set of tests to describe what the data should look like when we filter on various values
- We have tests for source and browser which are pulled from different MVs
- The result files are generated using ./scripts/gen_test_results.sh, and then manually verified
- We know they are not yet fully correct
- Added yarn command to update TB CLI, as that needs doing frequently and I can never remember the command
- Improved safety & usability of tinybird test script by ensuring branches are correctly created before running & adding optional delete
- Updated tinybird test to warn only for sanity check as that's not always a valid check (Will prob remove soon)
- Improved output of tinybird test script on failure, so that the diff is readable and closer to what git shows you
- Added tool to convert tinybird ndjson to csv to make it easier to bring the data into google sheets for verifying numbers
- TODO: make these run in CI
- Right now you run them by running `yarn tb` and then `./script/branch_and_test.sh`
- These are snapshot tests that check we get the desired result
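
A small sketch of what an ndjson-to-csv conversion like the one mentioned above could look like. This is not the tool added in this change; the function name is hypothetical, and it assumes each line is a flat JSON object.

```python
import csv
import io
import json


def ndjson_to_csv(ndjson_text: str) -> str:
    """Convert newline-delimited JSON objects into CSV text.

    Columns are the union of keys across all rows, in first-seen order.
    Keys missing from a row are written as empty cells.
    """
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    if not rows:
        return ""

    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = '{"date": "2024-01-01", "visits": 3}\n{"date": "2024-01-02", "visits": 0}\n'
    print(ndjson_to_csv(sample), end="")
```

The resulting CSV can be pasted or imported into a spreadsheet for checking numbers by hand.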
Co-authored-by: alejandromav <hi@alejandromav.com>
ref https://linear.app/tryghost/issue/ANAL-60/click-through-filtering-for-sources
- In our stats page, the source we output is the referrer without a
protocol or www, i.e. the pure domain
- Meanwhile all the data pipelines had the full url as the referrer
passed through
- When we come to add clickthroughs/filtering, we'll need to use this
value to filter the data. If we have a different value locally in the UI
to what is in the DB, we won't be able to make the filters match
- Also, we pay for everything we store, and this removes all the
https:// and www. data
- Whilst we are in development, we can safely make changes to all
aspects of our pipeline without worrying
- This is because currently, it's safe to delete all data and start over
- This script removes everything except the analytics_events
datasource, and then recreates everything fresh, repopulating from the
datasource where possible
- This shouldn't be used after tinybird is in production; we need a
better change process
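
For the source change described above (using the referrer without a protocol or www), here is a minimal Python sketch of that kind of normalisation. It is not the SQL used in the pipes; `referrer_to_source` is a hypothetical name, and treating the "pure domain" as the hostname with a leading `www.` stripped is an assumption.

```python
from urllib.parse import urlsplit


def referrer_to_source(referrer: str) -> str:
    """Reduce a referrer URL to a bare domain, e.g. for display and filtering."""
    if not referrer:
        return ""
    # urlsplit only recognises a hostname when the value contains "//",
    # so add it for bare values like "www.example.com/path".
    candidate = referrer if "//" in referrer else "//" + referrer
    host = urlsplit(candidate).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


if __name__ == "__main__":
    print(referrer_to_source("https://www.google.com/search?q=ghost"))
```

Storing the same normalised value that the UI displays is what lets a click on a source become a filter that actually matches rows.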
closes https://linear.app/tryghost/issue/ANAL-77/na-data-should-be-zero
ref https://www.tinybird.co/blog-posts/tips-9-filling-gaps-in-time-series-on-clickhouse
- Sometimes we have no matching data for a particular date/date range, which makes our charts look super janky
- ClickHouse has a feature to fill these in called WITH FILL, which makes it really easy to fix this!
- WITH FILL works for everything except bounce rate. That seems to be because the column is marked as nullable, so WITH FILL fills missing data with NULL instead of 0
- To fix that, I've updated the code that generates the bounce rate so that it doesn't generate nulls, and that seems to result in a not-nullable column, which then works with WITH FILL
- When I went off, I quickly recreated all our endpoints with some new functionality
- However, I forgot that I was manually managing tokens, which meant the UI for stats broke with a token error
- Adding the tokens to the endpoint definitions should prevent this happening again, by automating the management of the token scopes
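
Going back to the bounce rate fix for WITH FILL described earlier: the point is that the calculation should always produce a number, never a null, so that gaps can be filled with 0. A minimal Python illustration of that shape follows. It is not the pipe SQL, and defining a bounce as a session with exactly one hit is an assumption made for the example.

```python
def bounce_rate(session_hit_counts: list[int]) -> float:
    """Fraction of sessions that bounced, always a float and never None.

    Assumption: a session "bounces" when it has exactly one hit.
    With no sessions at all we return 0.0 instead of a null/NaN result.
    """
    sessions = len(session_hit_counts)
    if sessions == 0:
        return 0.0
    bounced = sum(1 for hits in session_hit_counts if hits == 1)
    return bounced / sessions


if __name__ == "__main__":
    print(bounce_rate([1, 1, 3, 2]))
    print(bounce_rate([]))
```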
closes https://linear.app/tryghost/issue/ANAL-23/filtering-by-logged-out-logged-in-traffic
- Updated all of our tinybird datasources and pipes to handle member status
- Added member_status as an array query param to the API endpoints
- Added a really dodgy power select multiple to the stats page to demonstrate it works (needs styling)
- Added all of the wiring so each chart updates
- This was done pretty fast, and may not be 100% right yet
ref https://linear.app/tryghost/issue/ANAL-27/setup-tinybird-project-and-cicd
- Tinybird has a system for managing its configuration as code, with full CI/CD support
- The tinybird CLI tool uses python, so we'll run that using docker, via `yarn tb`
- Some of the files tinybird adds should not be in source control, so we've added those to git ignore
- Everything in /ghost/tinybird is tinybird's init config