Blog / May 2022
-
24 May 2022Database Maintenance Completed
The website was unavailable today (around noon CEST) for ~1h20m for database maintenance as described in the previous post. I'm sorry I didn't give advance notice today, just had to make do with the time available - I wasn't able to do this last Friday afternoon.
The goal of this maintenance was to prevent the database from slowing down due to large indexes, as well as keep the full backups to a manageable size. The full backup is now 215 MB smaller (the gzipped file size) - pretty good!
-
19 May 2022Downtime Friday 20th: Spring cleaning time for the database
The website will be unavailable tomorrow (Friday May 20th, 2022) for up to an hour. The downtime will be sometime between 3-6 pm CEST.
This is an annual "spring cleaning time" where I will run some manual queries on the database to trim stale data from very old, inactive accounts. The goal is to prevent the database from slowing down due to large indexes, as well as keep the full backups to a manageable size.
As I have done in the past I will use the following criteria :
- Delete flashcards from accounts inactive for 2 YEARS.
- Delete private stories where the user's account was inactive for 5 YEARS.
An inactive account is an account for which the user did not sign in within the period. For example if you sign in just once within the last 5 years, the account data is persisted. It is not necessary to do any activity - just sign in. In other words what I'm looking at is "the last time the user logged in".
For flashcards : I have found in the past from emails asking "how do I reset my account?" that users ; when they come back after a long break ; usually want to delete all their cards and start over. It makes sense for how the SRS works after all. So the 2 YEAR interval seems sensible to me. Keep in mind on Koohii a "flashcard" refers to the SRS status for each kanji - not stories.
For stories : we're looking at deleting private stories only, from old inactive accounts. This ensures the shared stories area is unaffected and as you'll see below, the shared stories only account for 7% of 7 million+ stories! Still, it's not unusual for users to come back after several years break, which is why I used a more sensible 5 year interval.
How to Backup Your Data
If you think you may take a long break, or have finished with Kanji Koohii for the foreseeable future - keep in mind your data is not "locked in". You can export all your stories from the My Stories page. This will download a CSV format file which you can load up in a typical spreadsheet app like Google Docs or Libre Office.
The stories export could be useful also for eg. printing and browsing offline.
You can also export your kanji flashcards from the Manage Flashcards section of the site. This is a CSV export and how you can use that data elsewhere is up to you. To clarify, this is the SRS status data.
On Koohii, stories are independent of kanji cards. Many users are only interested in the stories section of the site, other users are only interested in the SRS. Some use both. Just remember : stories are stored independently of flashcards. You can remove a card and add it back later and the corresponding story for that kanji will still be there.
More database nitty gritty and thoughts
I have found over the years that as long as the full gzipped database backup is under 1 GB, it is manageable for me. It doesn't take too long to download, and if I need to debug something, it doesn't take forever either, to "copy" that database in my development environment.
By running some statistics I have found in the past that the stories table in the database is, unsurprisingly, the most space consuming.
As of a recent May backup, there are over 7,294,500 stories! (yes, that is seven millions). Of those, "only" 513,800 are shared - that is just 7% !
So what this means, is when users go through RTK and input or copy shared stories for 2000 kanji, and then leave and never come back,... this just adds up to a lot of data over the years.
Thankfully due to the nature of this service, I can afford to trim some of that excess data: it turns out that those users who do in fact return (after several years break) often want to "reset" their account and start over. I know that because I'd get occasional emails like "hey, I haven't used koohii in a while, how can I reset my account ?".
More recently, the crude admin panel that I have lets me see when users delete accounts and create accounts - useful to pinpoint issues with registration, or abuse by spam bots. And I could see a user initiated an account deletion, and minutes later created a new account under the same name. I had a hunch that not all accounts deleted are necessarily users leaving and this confirmed it - many users want to "reset" their account after a long break.
While we're here let's have some numbers - as of a recent May backup:
Table Rows data_length index_length ---------------------------------------------------- reviews 8,513,399 194 MB 273 MB stories 7,294,513 1103 MB 191 MB custkeywords 1,834,099 58 MB 23 MB
The reviews (ie. flashcards) table may have an issue with how I designed it. I'm not sure. It's pretty silly you can see that the index itself is larger than the data... although I think it is not unusual when the average row length is small (24 bytes).
Based on some test runs tomorrow's "spring cleaning" will shave off 2,057,521 private stories (from accounts not used in 5+ years), of a total of 7,294,513. That is 28% of the stories table, and not negligible 300 MB trim. It will also shave 40% of the flashcards table (flashcards from accounts not used in 2+ years), another significant ~180 MB trim.
Hope you enjoyed this little foray into the database!
I will post an update when the site is back up tomorrow after the downtime.
-
9 May 2022Minor update to the dashboard progress bar
A minor update today adds a decimal to the percentage displayed on the main progress bar of the dashboard. Thanks UncomprehendingGale for the suggestion! (issue 261 on Github)
-
6 May 2022New : Create a Review Deck from Japanese Text
The Custom Review page now features a new mode called Create a Review Deck from Japanese Text.
With this feature you may post raw text from Japanese video subtitles (or even Japanese stories or novels) and review all of the RTK kanji within that text using a one-off study deck.
You can also use the "Shuffle cards" and "Kanji to Keyword" options in this mode. Please note the "Shuffle cards" option is now checked by default (this is meant to be convenient, assuming most people want to shuffle the kanji cards).
I made sure the Repeat Review button works (from the Review Summary screen), so you don't have to paste the text again should you want to repeat the review. Keep in mind if you checked Shuffle cards, the kanji cards in that one-off study deck will be shuffled again when you repeat the review.
When you paste text, you'll see the count of unique RTK kanji below the text area. For now, this feature filters out any non-RTK kanji, since those do not have keywords - and we need some kind of prompt on the flashcards. So keep in mind the count shown is the count of kanji present in the raw text, which are included in the RTK Volume 1 & 3 books (this is also affected by your selection of Old or New Edition in Account Settings). Many kanji used in placenames and family names for example are not part of RTK.
If you have any related ideas or suggestions you can leave a comment on issue #259 on Github.
You can also simply send feedback via the contact form.
Thanks to long time user chamcham for this idea!
By Month
- Oct 2024 (1)
- Sep 2024 (1)
- Jun 2024 (2)
- May 2024 (4)
- Apr 2024 (3)
- Mar 2024 (1)
- Feb 2024 (1)
- Dec 2023 (1)
- Nov 2023 (2)
- Oct 2023 (2)
- Apr 2023 (2)
- Mar 2023 (2)
- Feb 2023 (1)
- Jan 2023 (2)
- Dec 2022 (1)
- Nov 2022 (2)
- Oct 2022 (3)
- Sep 2022 (1)
- May 2022 (4)
- Apr 2022 (1)
- Feb 2022 (2)
- Jan 2022 (2)
- Dec 2021 (4)
- Nov 2021 (2)
- Oct 2021 (2)
- Sep 2021 (2)
- Aug 2021 (1)
- Apr 2021 (2)
- Feb 2021 (3)
- Jan 2021 (3)
- Dec 2020 (1)
- Nov 2020 (1)
- May 2020 (1)
- Apr 2020 (1)
- Jan 2020 (1)
- Oct 2019 (1)
- Sep 2019 (1)
- Aug 2019 (4)
- Jul 2019 (3)
- Jun 2019 (1)
- May 2019 (1)
- Mar 2019 (2)
- Jan 2019 (1)
- Nov 2018 (3)
- Oct 2018 (8)
- Sep 2018 (4)
- Aug 2018 (3)
- Jul 2018 (1)
- Jun 2018 (4)
- May 2018 (1)
- Apr 2018 (1)
- Mar 2018 (1)
- Jan 2018 (1)
- Dec 2017 (6)
- Nov 2017 (4)
- Oct 2017 (4)
- Sep 2017 (5)
- Aug 2017 (5)
- Jun 2017 (3)
- May 2017 (2)
- Apr 2017 (3)
- Mar 2017 (7)
- Feb 2017 (10)
- Jan 2017 (11)
- Dec 2016 (6)
- Nov 2016 (5)
- Oct 2016 (6)
- Sep 2016 (7)
- Aug 2016 (3)
- May 2016 (1)
- Mar 2016 (2)
- Jan 2016 (1)
- Dec 2015 (3)
- Nov 2015 (1)
- Oct 2015 (1)
- Sep 2015 (7)
- Jul 2015 (2)
- Jun 2015 (1)
- May 2015 (5)
- Apr 2015 (4)
- Mar 2015 (5)
- Feb 2015 (4)
- Jan 2015 (5)
- Dec 2014 (4)
- Nov 2014 (3)
- Oct 2014 (2)
- Jun 2014 (1)
- Apr 2014 (2)
- Mar 2014 (4)
- Feb 2014 (3)
- Jan 2014 (4)
- Dec 2013 (2)
- Oct 2013 (1)
- Sep 2013 (1)
- Jun 2013 (4)
- May 2013 (1)
- Mar 2013 (1)
- Jan 2013 (2)
- Oct 2012 (2)
- Aug 2012 (1)
- Jul 2012 (2)
- Jun 2012 (2)
- May 2012 (1)
- Mar 2012 (2)
- May 2011 (1)
- Apr 2011 (4)
- Mar 2011 (3)
- Feb 2011 (2)
- Jan 2011 (2)
- Dec 2010 (8)
- Nov 2010 (8)
- Oct 2010 (3)
- Sep 2010 (3)
- Aug 2010 (1)
- Jul 2010 (2)
- Jun 2010 (5)
- May 2010 (1)
- Apr 2010 (3)
- Mar 2010 (4)
- Feb 2010 (2)
- Jan 2010 (1)
- Dec 2009 (5)
- Nov 2009 (5)
- Oct 2009 (1)
- Aug 2009 (1)
- May 2009 (5)
- Apr 2009 (2)
- Mar 2009 (1)
- Feb 2009 (2)
- Jan 2009 (2)
- Nov 2008 (1)
- Oct 2008 (1)
- Sep 2008 (1)
- May 2008 (2)
- Apr 2008 (1)
- Feb 2008 (6)
- Jan 2008 (5)
- Dec 2007 (6)
- Oct 2007 (1)
- Sep 2007 (2)
- Aug 2007 (3)
- Jun 2007 (1)
- May 2007 (5)
- Apr 2007 (1)
- Mar 2007 (2)
- Feb 2007 (1)
- Jan 2007 (4)
- Dec 2006 (3)
- Aug 2006 (1)
- Jun 2006 (3)
- Apr 2006 (6)
- Mar 2006 (8)
- Feb 2006 (1)
- Jan 2006 (4)
- Nov 2005 (1)
- Oct 2005 (4)
- Sep 2005 (1)
- Aug 2005 (11)