---
title: Touhou music by the numbers
description: Collect music metadata and look for patterns
tags: statistics, sociology, Haskell
created: 28 Feb 2013
status: in progress
belief: possible
...
<!-- TODO: from Culture is not about esthetics: mv here?
[^touhou]: This is a general assertion that is fairly hard to prove, but an example may be suggestive. The [Touhou](!Wikipedia "Touhou Project") doujinshi-game phenomenon has a [fair amount of music](http://touhou.wikia.com/wiki/Category:Music), but to get a sense of the true scale, we can look at some numbers. From the talk ["Riding on Fans' Energy: Touhou, Fan Culture, and Grassroot Entertainment"](http://cardcaptor.moekaku.com/?p=112) ([Barcamp](!Wikipedia) Bangkok 2 on August 31, 2008):
> Touhou is [ZUN](!Wikipedia "Team Shanghai Alice#Member")'s work as much as it is a gigantic repertoire of fan-made manga, games, music, and video clips. I estimate that there are roughly at least three thousands short manga, five hundred music rearrangement albums, and one hundred derivative games created since 2003. These works are traded mainly in conventions dedicated to them, and some commercial firms are starting to capitalize on their popularity. Doujinshi shops like [Tora no Ana](!Wikipedia "Comic Toranoana") and [Mandarake](!Wikipedia) have shelves dedicate to Touhou comics. And Amazon.co.jp are carrying CDs of arranged/sampled Touhou music (but not ZUN's originals). More and more people are attracted to the franchise because its diverse derivative works provide a variety of entry points for potential fans. In fact, Touhou's popularity skyrocketed when it became one of the killer content of [Nico Nico Douga](!Wikipedia), a Japanese equivalent of YouTube launched one year and a half earlier. There, Touhou content spread like wild fire and gave rise to many recurring memes and tens of thousands of mashup videos. To give a sense of how popular Touhou is in Nico Nico Douga, 18 of 100 most viewed videos are Touhou-related, and the best Touhou video ranks the 6th. [[5]](http://cardcaptor.moekaku.com/?p=112#touhou-popularity-in-nico-nico-douga)
For a recent estimate, we can turn to [TV Tropes](!Wikipedia)'s [article](http://tvtropes.org/pmwiki/pmwiki.php/CrowningMusic/TouhouProject) on Touhou music:
> The Touhou Project really gets a lot of great pieces of music for [the music] being [originally] made up by a single guy with a synthesizer. To put the sheer number of remix CDs in perspective, there is a torrent with over 870.4 gigabytes of over 3000 Touhou remixes, and that only includes the ones that the (English-speaking) maintainers of the torrent have added.
(This is outdated; the October 2011 lossless torrent is [1,020 gigabytes](http://www.nyaa.eu/?page=torrentinfo&tid=255825). Personally, [I](http://www.reddit.com/r/TOUHOUMUSIC/search?q=author%3A%27gwern%27&restrict_sr=on) enjoy the orchestral pieces like the [WAVE](http://www.circle-wave.net/) group's [Luna Forest (第七楽章)](http://www.youtube.com/watch?v=xhDNC12hWKc).)
-->
<!--
16:19:10 <@gwern> Speed: it is difficult to explain. also, it's much more than 44k
16:19:23 < klfwip> speed, touhou is a video game franchise which has one of the best fan bases in the world.
16:19:24 <@gwern> that's just that one highly incomplete torrent
16:19:28 < Speed> I've found the Touhou Project on wikipedia
16:19:40 < Speed> but that one seems to list maybe 10 CD's
16:19:48 < klfwip> they have multiple festivals per year in japan almost completely devoted to selling touhou music and other collectables
16:21:13 <@gwern> klfwip: actually, I'm not entirely sure about that... I think zun may've put out his stuff under a custom license
16:21:58 <@gwern> http://cardcaptor.moekaku.com/?p=125 eg
16:22:08 <@gwern> http://en.touhouwiki.net/wiki/Touhou_Wiki:Copyrights#Copyright_status.2FTerms_of_Use_of_the_Touhou_Project
16:22:40 <@gwern> this seems pretty unusual for doujin works, but both touhou and vocaloid are big enough that they've'd to've make things more explicit
16:24:23 <@gwern> Speed: nope. I think it's heavily biased toward recent music, the growth seems too rapid
16:24:44 < Speed> how much Touhou music does then exist by your estimations?
16:25:56 <@gwern> Speed: I'm not sure yt. I haven't cleaned the data enough to get a good idea of the counts, and I can't run a capture-recapture estimate of the total universe until the data has improved
16:26:09 <@gwern> Speed: I wouldn't be too surprised if it were >100k though
16:26:30 < klfwip> for every big artist who releases dozens of albums, I suspect there are thousands who post one vocaloid mix of a touhou song to niconico
video.
16:26:39 < Siod> how is most touhou music made?
16:27:23 < Speed> I'm still not entirely able to believe that many derivatives could have been made from just a few soundtracks
16:27:39 < Siod> what? it's all derived from soundtracks?
16:27:41 < klfwip> speed, there are hundreds of original touhou songs.
16:27:47 < Speed> klfwip: since you have a collection, can you give me a few samples of the music?
16:27:58 < klfwip> electro, classical, metal, what do you want?
16:28:04 <@gwern> Speed: oh come on, that's like saying 'I don't know how there can be thousands of apocrypha when there's onyl like a hundred canonical books
in the bible'
16:28:25 <@gwern> or, 'I don't know how there can be 300m+ books in total'
16:28:26 < klfwip> and some of them honestly have very little in comon with the source except a melody.
16:28:38 * gwern links Speed to https://xkcd.com/915/
16:31:55 <@gwern> Burninate: remixes, distant inspirations, borrowing a hook or melody... they can be remixes in the same way Book of Mormon is a remix of
Zoroastrianism
16:32:34 < quanticle> gwern: More apropos, it's like saying, "I don't know how a single 5-second drum and bass solo can give rise to an entire genre of music."
http://en.wikipedia.org/wiki/Amen_break
16:33:46 < quanticle> gwern: Yeah. Pretty much the entire DNB scene is made up of derivatives and riffs on Amen Break.
16:35:59 < Speed> gwern: so, how does one then classify something as Touhou music or not?
16:36:02 <@gwern> but it's only 'indie' because zun can make a living off it
16:37:04 <@gwern> Speed: well, obviously if something is a rearrangement of an existing Zun track like 'Shanghai Teahouse', it's touhou, but there are a lot of
marginal cases where it's simply social - does the creator consider it touhou? did they use touhou-related artowkr or characters? do they
thank zun?
16:37:38 < klfwip> some really big names in the touhou scene also have succesful careers are pro musicians outside of it.
16:37:49 <@gwern> nothing stops people from writing their own music without reference to the canonical Zun melodies or themes and simply branding it as Touhou
16:37:57 < klfwip> if they don't call a particular album a touhou remix album, I would just assume it not to be.
16:38:14 < Burninate> So Zun made 15 games. Himself. Without involving a studio (dojin).
16:38:28 < Burninate> and then fans made a bunch of derivatives of those 15 games
16:38:42 <@gwern> Burninate: hm, wasn't the last one or two a collaboration with another group? is that 15 counting the collaborations?
16:38:44 < Burninate> songs, other games, anime, comics
16:38:46 <@gwern> novels
16:38:48 < klfwip> correct
16:39:01 < Burninate> the descriptions are confusing as fuck
16:39:05 * Tuxedage would like to add that Zun is insanely talented
16:39:07 < Tuxedage> Except for the art
16:39:10 < klfwip> gwern, also the earliest games were not under his control, remember.
-->
<!-- "Videos containing Touhou tag Total Hits: 153,553" http://www.nicovideo.jp/tag/Touhou TODO: scrape niconico -->
Idea: correlate Touhou music production against Japanese youth unemployment: does the total production of music as measured in seconds increase with unemployment?
The opposite view is that recessions dent production (perhaps because people are working harder and so have less free time, even if other people are unemployed?): http://www.gamesetwatch.com/2009/12/sound_current_yokohamas_mediam.php
> While the turnout at M3 remains strong, at the same time an economic recession cannot help but touch a community whose activities rely on having free time. Furthermore while previously many hobbyists dreamed of someday breaking into the industry, more recently many also fear that game companies will begin cracking down on unlicensed tributes.
# Data
## Unemployment data source
Used FRED [Adjusted Unemployment Rate for Youth [15-24yo] in Japan (JPNURYNAA)](http://research.stlouisfed.org/fred2/series/JPNURYNAA); downloaded as CSV, annual percentage 2000-2011
~~~{.R}
jpn <- read.table(stdin(),header=TRUE)
DATE VALUE
2000-01-01 8.9
2001-01-01 9.1
2002-01-01 9.5
2003-01-01 9.6
2004-01-01 9.0
2005-01-01 8.1
2006-01-01 7.5
2007-01-01 7.5
2008-01-01 7.0
2009-01-01 8.9
2010-01-01 9.0
2011-01-01 8.1
~~~
## Touhou data sources
TODO Suggestions: http://www.reddit.com/r/TOUHOUMUSIC/comments/19hh2m/touhou_music_databases_comprehensive_easily/ http://boards.4chan.org/jp/res/10559057
Alternative sources:
- [VGMdb Touhou entries](http://vgmdb.net/product/9) but it is substantially smaller with metadata on <1389 albums
- The open source alternative is [MusicBrainz](!Wikipedia); looks like [it has 190 albums](http://musicbrainz.org/tag/touhou/release), but like 90%+ link to VGMdb, so I'm not sure I want to include them (waste of effort, and if someone just copied over all of VGMdb a few years ago, it'll be badly misleading to any capture-recapture analysis of population size).
- [Touhouwiki.net](http://en.touhouwiki.net/wiki/Doujin_circles) is another source, with <1182 albums
- "東方音団録 ~ Arrange Circle Database ver.3.0"; [homepage](http://www16.atwiki.jp/toho) & [release info](http://www16.atwiki.jp/toho/pages/727.html) ([debate](http://www16.atwiki.jp/toho/pages/13.html), [chat](http://www16.atwiki.jp/toho/pages/948.html)), [product page](http://www.toranoana.jp/bl/article/04/0030/04/86/040030048682.html) with j-subculture.com quoting a partial total of $15, Paypal and shipping outside Japan boosting to ~$30! (Alternative proxies included [Yokatta](http://yokattaweb.jp/index.html).)
Strategy: first I'll ask for an Arrange CD download in the wiki's chatroom; then I'll look for an email by someone running it and email them directly; then I'll see if there's anyone in Japan who either owes me a favor or is willing to do me a favor; then I'll ask around for a cheaper reshipper; and if all that fails, then I'll pay j-subculture's extortionate $25 probable total cost.
An email sent on 1 March 2013 had failed to elicit any reply by 27 May, so I then [requested a purchase](http://www.reddit.com/r/TOUHOUMUSIC/comments/1f61p6/request_anyone_ordering_from_tora_no_ana_soon/ "[request] Anyone ordering from Tora no Ana soon? (self.TOUHOUMUSIC)") on `/r/TOUHOUMUSIC`.
Need more databases for capture-recapture analysis!
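For reference, the simplest two-source capture-recapture estimate (Lincoln-Petersen, here with Chapman's small-sample correction) needs only three counts: the albums in each database and the albums found in both. The sketch below is illustrative only. The two album counts are the upper bounds quoted above, the overlap is a made-up placeholder (the databases have not been matched against each other yet), and the estimator assumes the sources list albums independently, which wholesale copying between databases would violate.

~~~{.R}
# Chapman's bias-corrected Lincoln-Petersen estimator of total population size
# n1, n2: number of albums listed in source 1 and in source 2
# m:      number of albums listed in both (requires matching records across sources)
chapman <- function(n1, n2, m) {
    stopifnot(m <= min(n1, n2))
    (n1 + 1) * (n2 + 1) / (m + 1) - 1
}

# n1 and n2 are the upper bounds quoted above (VGMdb <1389, Touhouwiki <1182);
# the overlap of 800 is HYPOTHETICAL, chosen purely for illustration
estimate <- chapman(n1 = 1389, n2 = 1182, m = 800)
round(estimate)
~~~

The smaller the overlap between two sources, the larger the implied total, which is why a database that merely mirrors another one would drag the estimate down misleadingly.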
<!--
The ISO file seems to be broken: `file` just calls it 'data', and when I mount it as a loopback iso9660 file, `mount` throws an error. I redownloaded it and compared it, but the copies were identical.
The good news is that the zip file seems to work fine. The data is in a .accdb file in a subfolder, which turns out to be the latest Microsoft Access database format. Unfortunately, this turns out to be almost entirely unsupported by anything on Linux (except for a Java library), but fortunately, an acquaintance had an Office 365 subscription and re-exported the .accdb file as a .mdb file (the older Access format) which was successfully read and converted to CSV by `mdb-tools`. The entries look like this:
$ mdb-tables Toho_arrange_circle_database-gwern.mdb
アレンジサークルリスト
$ mdb-export Toho_arrange_circle_database-gwern.mdb `mdb-tables Toho_arrange_circle_database-gwern.mdb` > Toho_arrange_circle_database-gwern.csv
$ head Toho_arrange_circle_database-gwern.csv
ID,pin,サークル名,ふりがな,URL,ジャンル,Vocal,主な頒布CD,原曲アレンジの程度,一言,memo
1,,"っ´Д`)っゼロ式の処刑場(っ´Д`)っゼロ式の家・DESTRUCTIVE ANGEL)","ぜろしきのしょけいじょう","http://shinzanzeroshiki.fc2web.com/","メタル","男","Crazy Trancy Ecstasy","原曲維持","重厚感あふれる荘厳なメタル。(Black)",
2,,"α music","あるふぁみゅーじっく","http://www19.atpages.jp/tatu4/","オーケストラ、ピアノ",,"東方風水華月","原曲維持",,
3,,"凸凹えんたーていめんとすたじお","でこぼこえんたーていめんとすたじお","http://3rd.geocities.jp/deko_boko_es/index/","リコーダー、ゲームミュージック",,"Electronic Magus","原曲維持","8ビットあり、リコーダー生演奏ありと多彩。(Black)",
4,,"[4989]","しくはっく","http://4989mm.littlestar.jp/","ロック","男女","musick for me","原曲維持","スローなロック風アレンジにハイトーン男声ボーカル、エレクトロも混ざったインストも。ドラマCDあり。(Black)",
5,,"[ kapparecords])","かっぱれこーず","http://www5f.biglobe.ne.jp/~kapparecords/","ハードロック","男","SCARLET FANTASIA","原曲維持","生演奏ギタードラムベースと男声ボーカルがんばれ。(Black)",
6,,"<echo>PROJECT","えこぷろじぇくと","http://echoproject.3rin.net/","ダンス、ポップス、ロック","女","eclat:","原曲重視~維持","女性ボーカルを軸にしてジャンルは何でもアレンジ。(Black)",
7,,"#039","しゃーぷさんきゅー","http://sharp039.web.fc2.com/",,,"EMERGENCE",,,
8,,"#ゆうかりんちゃんねる","ゆうかりんちゃんねる","http://yuukach.web.fc2.com/","オーケストラ、クラシック、ピアノ、エレクトロ、ロック",,"ゴリラ人間のための華麗なる大幻想曲集","原曲維持","クラシカルな構成と管弦を散りばめたオーケストラ、クラシック風インスト。バイオリンの倍音の響きが印象的。(Black)",
9,,"10-GALLON(Digit Smith)","てんがろん","http://10-gallon.net/","ロック、エレクトロ、ハードロック","女","悪魔城レミリア","原曲維持","原曲メロをミドルテンポなロックとエレクトロに乗せて。(Black)",
$ tail -1 Toho_arrange_circle_database-gwern.csv
1316,,"侘助","わびすけ","http://ameblo.jp/wa-bi-su-ke/","エレクトロ、ロック","女","東方乙女椿","原曲維持","速めのエレクトロ、ロックアレンジが多め。女声ボーカル。(Black)",
I am a little surprised that there are only 1316 entries. Either I've overestimated their thoroughness or this is limited to a specific convention or something like that... Need to look into this more.
-->
### Torrent
Music source: [Touhou lossy music collection v.15.2](http://www.nyaa.eu/?page=torrentinfo&tid=387790) (derived from the [Touhou lossless music collection](http://www.tlmc.eu/)), 265.2GB of 44421 tracks from 4952 albums produced by <1,264 groups or "circles".
~~~{.Bash}
$ find ~/torrent/Touhou\ lossy\ music\ collection/ -type f -name "*.mp3" | wc -l
44421
$ ls torrent/Touhou\ lossy\ music\ collection/ | wc -l
1264
$ ls torrent/Touhou\ lossy\ music\ collection/*/ | wc -l
7477
~~~
File name, music length, and metadata year (if any) are extracted using `exiftool`.
Events appearing in the directory names:
- 例大祭["","SP","SP2",2-9]: annual Reitaisai
- M3*: annual Media Mix Market eg. http://polymetrica.wordpress.com/2009/10/09/things-i-am-excited-about-04-m324/
- C[63-82]: semi-annual Comiket
- サンクリ[28-50]: annual? Sunshine Creation http://ja.wikipedia.org/wiki/%E3%82%AF%E3%83%AA%E3%82%A8%E3%82%A4%E3%82%B7%E3%83%A7%E3%83%B3_%28%E5%90%8C%E4%BA%BA%E5%8D%B3%E5%A3%B2%E4%BC%9A%29
- 東方紅楼夢[?2-8]: annual Koromu
- 月の宴?2-5: annual? Feast of the Month
- 紅のひろば?2-6: semiannual Red Square
- 東方不敗小町?2-6, SP, ぷちこまち: Komachi
- 杜の奇跡[15-16]
- 東方杜郷想[2-3]
- 幺樂団カァニバル!?2-3
- 東方幻楽祭[2]: semiannual
- コミコミ[12-14]
- FF[9-17] ?
- こみトレ[12-17]
- COMIC1☆2-6
- COMIC CITY大阪[63,73]
- 恋魔理?2-3
- 東方椰麟祭?2-3
- 東方名華祭2
exiftool; json
Is `exiftool`'s length approximation trustworthy? Yes, it seems to always be within a second or so of the full `mp3info` answer:
~~~{.Bash}
$ find "/home/gwern/torrent/Touhou lossy music collection/" -type f -name "*.mp3" \
-exec mp3info -F -p "0:%02m:%02s " {} \; -exec exiftool -Duration {} \;
0:00:23 Duration : 23.12 s (approx)
0:04:28 Duration : 0:04:28 (approx)
0:02:44 Duration : 0:02:44 (approx)
0:04:52 Duration : 0:04:52 (approx)
0:03:56 Duration : 0:03:56 (approx)
0:01:44 Duration : 0:01:44 (approx)
0:04:34 Duration : 0:04:34 (approx)
0:02:30 Duration : 0:02:30 (approx)
0:03:02 Duration : 0:03:02 (approx)
0:03:43 Duration : 0:03:42 (approx)
0:03:23 Duration : 0:03:23 (approx)
0:03:11 Duration : 0:03:11 (approx)
0:04:22 Duration : 0:04:22 (approx)
0:03:13 Duration : 0:03:13 (approx)
0:04:04 Duration : 0:04:04 (approx)
0:03:58 Duration : 0:03:57 (approx)
0:05:24 Duration : 0:05:24 (approx)
0:04:17 Duration : 0:04:17 (approx)
0:03:14 Duration : 0:03:14 (approx)
0:01:59 Duration : 0:01:58 (approx)
0:03:21 Duration : 0:03:21 (approx)
0:02:35 Duration : 0:02:35 (approx)
0:04:20 Duration : 0:04:20 (approx)
...
~~~
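To put a number on that agreement, both outputs can be converted to seconds and subtracted. This is a minimal sketch using five pairs copied from the listing above; `to_seconds` is a helper introduced here for illustration, not part of the original pipeline.

~~~{.R}
# convert "H:MM:SS" (mp3info, exiftool) or "23.12 s" (exiftool, short tracks) to seconds
to_seconds <- function(x) {
    x <- trimws(sub(" \\(approx\\)", "", x))
    if (grepl(" s$", x)) return(as.numeric(sub(" s$", "", x)))
    parts <- as.numeric(strsplit(x, ":")[[1]])
    sum(parts * c(3600, 60, 1))
}

# a few pairs copied from the output above: mp3info first, exiftool second
mp3info  <- c("0:00:23", "0:04:28", "0:03:43", "0:03:58", "0:01:59")
exiftool <- c("23.12 s (approx)", "0:04:28 (approx)", "0:03:42 (approx)",
              "0:03:57 (approx)", "0:01:58 (approx)")

differences <- sapply(mp3info, to_seconds) - sapply(exiftool, to_seconds)
max(abs(differences))
~~~

On these pairs the largest disagreement is one second, which is negligible next to track lengths of several minutes.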
~~~{.R}
# generate and parse and cleanup data
#
# takes ~30m:
# R> system("exiftool -extension mp3 -json -forcePrint
# -Title -Year -Album -Artist -Duration -Genre -Track -Directory -FileName -FileSize -AudioBitrate
# ~/torrent/Touhou\\ lossy\\ music\\ collection/*/* > ~/touhou.json")
library(rjson)
# download from http://www.gwern.net/docs/touhou/2013-torrent.json.xz and decompress with xz
json_data <- fromJSON(paste(readLines("2013-gwern-touhoutorrent.json"), collapse=""))
touhou <- data.frame(matrix(unlist(json_data), ncol=12, byrow=TRUE))
colnames(touhou) <- c("SourceFile", "Title", "Year", "Album", "Artist", "Length", "Genre",
"Track", "Directory", "FileName", "FileSize", "AudioBitrate")
# Delete SourceFile column; redundant with Directory/FileName
touhou <- touhou[,-1]
touhou$Directory <- sub("/home/gwern/torrent/Touhou lossy music collection/", "",
as.character(touhou$Directory))
touhou[touhou==""] <- NA
touhou[touhou=="-"] <- NA
touhou$Year <- as.integer(as.character(touhou$Year))
# torrent doesn't cover 2013 music, and music predating the PC-98 games doesn't exist...
touhou$Year[touhou$Year<1990] <- NA
touhou$Year[touhou$Year>2012] <- NA
# Genre is "None" or " "? both useless and false (thanks, tagger); so it goes too:
touhou$Genre[touhou$Genre=="None"] <- NA
touhou$Genre[touhou$Genre==" "] <- NA
# turn the track lengths, bitrates, and file sizes into usable numbers on a common scale (seconds, kbps, and MBs, respectively)
touhou$Length <- gsub(" \\(approx\\)","",as.character(touhou$Length))
touhou$AudioBitrate <- as.integer(sub(" kbps","",as.character(touhou$AudioBitrate)))
# exiftool leaves us "16 s"; if so, strip the " s" and turn it into an integer
# else, eg. "0:04:37"; split on colon,
# multiply hour by 3600 seconds, minutes=60 each, seconds=seconds; and sum it
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
else { y <- unlist(strsplit(x, ":"));
as.integer(y[[1]])*3600 + as.integer(y[[2]])*60 + as.integer(y[[3]]); }
}
else NA
}
touhou$Length <- sapply(touhou$Length,interval)
filesize <- function(x) { if (grepl(" kB",x)) (as.integer(sub(" kB","",x))/1000) else as.integer(sub(" MB","",x))}
touhou$FileSize <- sapply(touhou$FileSize, filesize)
# Serious work: turn the encoded information in Directory into usable columns. Not for the faint of heart.
#
# The Directory column looks like "[twith1450]/2009.03.08 TOHOMOHO [例大祭6]"
# The schema here is "[circle]/eventDate album [event]"
#
# "[Angelic Quasar]/2006.01.29 [AQSH-0003] Racial Ethnic Nation"
# "[Alstroemeria Records]/[ARCD0001] The regret of stars, but stars shine bright (C65) (mp3)"
# "[Aqua Style/ひえろぐらふ]/2010.05.24 [AQUA-0031] 春宵一刻値千金 -シュンショウイッコク アタイセンキン-"
brackets <- function(b) sub("\\]","", sub("\\[","",b))
# easy first step: parse out the leading group/circle (always there, terminated by forward-slash) as new column
touhou$Circle <- sapply(touhou$Directory, function(x) brackets(unlist(strsplit(as.character(x), "/"))[1]))
# destructively update by removing the group/circle, to make the next step easier
# this makes Directory looks like "2009.06.07 [PAER-0007] #01 -LILITH- [東方幻楽祭2]"
# or "[ARCD0001] The regret of stars, but stars shine bright (C65) (mp3)"
touhou$Directory <- sapply(touhou$Directory, function(x) unlist(strsplit(as.character(x), "/"))[2])
touhou$Date <- as.Date(sapply(touhou$Directory, function(x) substring(x, 1, 10)), format="%Y.%m.%d")
# and like before, we strip the event date that we've parsed out, leaving eg. "ピアノのための東方小品集 Op.1-1 [御射宮司祭]"
# or "月遊 [例大祭8]" or "[AQUA-0031] 春宵一刻値千金 -シュンショウイッコク アタイセンキン-"
touhou$Directory <- sapply(touhou$Directory, function(x) substring(x, 12))
# extracting the next parameter, the event the album was released at, is harder still
library(stringr)
# if the directory does not end in a right-bracket, there's no event info and we should bomb out
# else, grab w/regexp last pair of brackets with a space before (excludes any album numbering schemes) & trim
# that didn't work? then it must be one of the directories where there's no space before the bracketed event, retry without leading space
touhou$Event <- sapply(touhou$Directory, function(x) { if (str_sub(x,start=-1) == "]") { res <- brackets(unlist(str_split(x, " \\["))[2]); if (!is.na(res)) res else brackets(unlist(str_split(x, "\\["))[2]) } else x})
# if you examine the Event column, it's full of wrong entries. I have made a list of 20 event-prefixes (I hope), which
# we'll use as a whitelist by erasing anything which lacks all of the 20 prefixes.
isPrefix <- function(x,y) grepl(paste0("^",x), y)
events <- c("例大祭","M3","C","サンクリ","東方紅楼夢","月の宴","紅のひろば","東方不敗小町","杜の奇跡","東方杜郷想",
"幺樂団カァニバル!","東方幻楽祭","コミコミ","FF","こみトレ","COMIC1","COMIC CITY大阪","恋魔理","東方椰麟祭","東方名華祭")
touhou$Event <- sapply(as.character(touhou$Event),
function(target) if (sum(sapply(events, function(e) isPrefix(e,target))) != 0) target else NA)
# this whitelist covers almost the entire sample, so I think it works well:
## R> sum(!is.na(touhou$Event))
## [1] 39190
## R> length(touhou$Event)
## [1] 41866
#
# one final thing: (almost) all directories had Dates while not all files had Years, so take Year from the Date we
# just extracted (note: this replaces every Year, not only the missing ones)
touhou$Year <- as.integer(format(touhou$Date, "%Y"))
touhou$Directory <- NULL # clean up
# escape with the loot:
write.csv(touhou, file="2013-gwern-touhoumusic-torrent.csv", row.names=FALSE)
~~~
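The directory schema can also be parsed in one pass per path with regular expressions. This is a sketch under the "[circle]/eventDate album [event]" schema described in the comments above, not a replacement for the pipeline. `parse_directory` is a name introduced here, the third example path is hypothetical, and the event whitelist step is omitted. Unlike splitting on the first `/`, it keeps circle names that themselves contain a slash (such as `[Aqua Style/ひえろぐらふ]`) in one piece.

~~~{.R}
parse_directory <- function(path) {
    # circle: everything inside the leading "[...]" that is followed by "/"
    circle <- sub("^\\[(.*?)\\]/.*$", "\\1", path, perl = TRUE)
    rest   <- sub("^\\[.*?\\]/", "", path, perl = TRUE)
    # date: the first 10 characters, if they parse as YYYY.MM.DD (else NA)
    date <- as.Date(substring(rest, 1, 10), format = "%Y.%m.%d")
    # event: a bracketed group at the very end of the name (else NA)
    has_event <- grepl("\\[[^\\[\\]]*\\]$", rest, perl = TRUE)
    event <- if (has_event) sub("^.*\\[([^\\[\\]]*)\\]$", "\\1", rest, perl = TRUE) else NA_character_
    data.frame(Circle = circle, Date = date, Event = event, stringsAsFactors = FALSE)
}

paths <- c("[Angelic Quasar]/2006.01.29 [AQSH-0003] Racial Ethnic Nation",
           "[Alstroemeria Records]/[ARCD0001] The regret of stars, but stars shine bright (C65) (mp3)",
           "[Example Circle/Subgroup]/2009.03.08 Example Album [C76]")  # hypothetical path
parsed <- do.call(rbind, lapply(paths, parse_directory))
parsed
~~~

A trailing catalog number in brackets would still be misread as an event here, so a whitelist of known event prefixes remains necessary afterwards.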
## VGMdb
The [Touhou project page](http://vgmdb.net/product/9) turns out to be incomplete: each entry had to be manually annotated as related to Touhou. I was pointed to [a search query](http://vgmdb.net/search?do=results&id=161863) which turned up many more results by looking for any page with the string "Touhou" in the "games" field.
The VGMdb administrators kindly gave me read-only access to their MySQL databases. I grabbed the entirety of the tables `vgmdb_albums` and `vgmdb_tracks` from the main VGMdb database; I exported them as 2 CSV files with comma separators, renamed `2013-vgmdb-albums.csv` and `2013-vgmdb-tracks.csv`. Before loading the exports, I had to delete all escaped quotes; the default R CSV parsing doesn't handle them. Each row of the track table is one track plus an album ID, so to make each track row equivalent to a torrent row, I fill in the album-level fields by merging against the album table.
~~~{.R}
albums <- read.csv("http://www.gwern.net/docs/touhou/2013-vgmdb.csv")
albums <- with(albums,
data.frame(albumid,reldate,publisher,game,albumtitles))
albums <- albums[grepl("ouhou", albums$game),]; albums$game <- NULL
tracks <- read.csv("2013-vgmdb-tracks.csv")
tracks$tracklistid <- NULL; tracks$trackid <- NULL; tracks$disctitle <- NULL; tracks$disc <- NULL
tracks$length[tracks$length==0] <- NA # 0 is the default in the VGMdb schema
tracks <- tracks[tracks$albumid %in% albums$albumid,]
touhou <- merge(tracks, albums)
touhou$albumid <- NULL; albums <- NULL; tracks <- NULL
# deal with the 41 dates with the format "2005-09-00" (the 0th months or days are not real...);
# gsub rather than sub, so that a date with both a zero month and a zero day is fixed too
touhou$date <- as.Date(gsub("-00","-01",as.character(touhou$reldate)))
touhou$reldate <- NULL
touhou$year <- as.integer(format(touhou$date, "%Y"))
# upcase, rearrange to torrent's order
colnames(touhou) <- c("Track","Title","Length","Circle","Album","Date","Year")
touhou <- touhou[c(2,7,5,3,1,4,6)]
write.csv(touhou, file="2013-vgmdb-touhou.csv", row.names=FALSE)
~~~
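The fill-in step is an ordinary join on `albumid`. The toy example below uses invented rows, and only a subset of the real columns, to show what `merge` does to the two tables and how the zero-day dates are repaired; it is a sketch of the idea rather than the exact export.

~~~{.R}
# toy stand-ins for the two VGMdb tables; all values are invented
albums <- data.frame(albumid   = c(1, 2),
                     reldate   = c("2005-09-00", "2010-08-14"),
                     publisher = c("Circle A", "Circle B"),
                     stringsAsFactors = FALSE)
tracks <- data.frame(albumid = c(1, 1, 2),
                     track   = c(1, 2, 1),
                     length  = c(180, 0, 240))
tracks$length[tracks$length == 0] <- NA   # 0 stands for "unknown"

# merge() joins on the shared column name (albumid), copying each album's
# fields onto every one of its tracks
merged <- merge(tracks, albums)

# replace zero months/days with 01 so that as.Date can parse the string
merged$date <- as.Date(gsub("-00", "-01", merged$reldate))
merged$year <- as.integer(format(merged$date, "%Y"))
merged
~~~

Each of the three track rows now carries its album's publisher and release date, which is the shape the torrent data already has.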
## `touhouwiki.net`
## Personal downloads
### 4chan /jp/ C83 threads
A loose group of [4chan](!Wikipedia) users on the [/jp/](https://boards.4chan.org/jp/) subforum collaborate each Comiket to upload and distribute doujin manga, games, and music released at that Comiket; some are uploaded by Comiket attendees, some are bought from resellers like [Comic Toranoana](!Wikipedia), and many files are harvested from Japanese P2P filesharing networks like [Winny](!Wikipedia)/[Share](!Wikipedia "Share (P2P)")/[Perfect Dark](!Wikipedia "Perfect Dark (P2P)"). I compiled a list of ~400 files from the [/r/TouhouMusic](http://www.reddit.com/r/TOUHOUMUSIC/comments/15pp33/c83_resource_thread/) C83 thread (principally from the 4chan links) & the blog [All Doujin Music](http://alldoujinmusic.wordpress.com) and gradually downloaded them from January to March 2013. After dead links, I was left with 400-500 files. Many are not music, or even Touhou-related, so I hand-filtered albums, looking for signs of being Touhou doujin works (credits to ZUN, Touhou characters in the artwork, themes I recognized as Touhou, etc); when I was not sure, I erred on the side of exclusion. The [final compilation](/docs/touhou/2013-c83-downloads.txt) yielded 3503 files (evenly split: 1776 Touhou vs 1728 "other") with 953 Touhou music files.
~~~{.R}
# exiftool -extension ogg -json -forcePrint -Title -Year -Album -Artist -Duration -Genre -TrackNumber -Directory
# -FileName -FileSize -NominalBitrate -Date -recurse
# ~/c83/touhou/ ~/c83/touhou/*/** ~/c83/touhou/*/*/** ~/c83/touhou/*/*/*/** > ~/2013-c83-downloads.json
library(rjson)
# download from http://www.gwern.net/docs/touhou/2013-torrent.json.xz and decompress with xz
json_data <- fromJSON(paste(readLines("2013-c83-downloads.json"), collapse=""))
touhou <- data.frame(matrix(unlist(json_data), ncol=13, byrow=TRUE))
colnames(touhou) <- c("SourceFile", "Title", "Year", "Album", "Artist", "Length", "Genre",
"Track", "Directory", "FileName", "FileSize", "AudioBitrate", "Date")
# Delete SourceFile column; redundant with Directory/FileName
touhou <- touhou[,-1]
for (filter in c("/home/gwern/c83/touhou/", "\\[touhou.vnsharing.net\\]", " \\(320K\\+BK\\)", "/mp3",
" MP3v0", " v0", " \\(flac\\+scans\\)", " \\(128K\\)", " \\(V0\\)", " \\(320\\)",
" \\(mp3 320\\)", " \\(v0\\+jpg\\)"))
{ touhou$Directory <- sub(filter, "", as.character(touhou$Directory)) }
touhou[touhou==""] <- NA
touhou[touhou=="-"] <- NA
~~~
Playback length:
~~~{.Bash}
find c83/touhou/ -name "*.ogg" -exec ogginfo {} \;|fgrep "Playback length"
~~~
### 4chan /jp/ C84 threads
591 Touhou music files:
/docs/touhou/2013-c84-downloads.json
### 4chan /jp/ C85 threads
449 Touhou music files:
/docs/touhou/2013-c85-download.json
### 4chan /jp/ Reitaisai 10 threads
As above, I drew on the [/r/TouhouMusic](http://www.reddit.com/r/TOUHOUMUSIC/comments/1f3ikk/the_reitaisai_10_resource_thread/) discussion and manually pruned duplicates & non-Touhou files, leaving 491 music files.
~~~{.Bash}
exiftool -extension ogg -json -forcePrint -Title -Year -Album -Artist -Duration -Genre -TrackNumber -Directory -FileName -FileSize -NominalBitrate -Date */*.ogg > ~/2013-reitaisai-downloads.json
~~~
### Reitaisai 10 torrent
In May & June 2013, an anonymous person compiled 67 albums and released them as two torrents of Reitaisai 10 albums ([vol. 1](http://www.nyaa.se/?page=view&tid=438733), [vol. 2](http://www.nyaa.se/?page=view&tid=440787)).
~~~{.Bash}
exiftool -extension ogg -json -forcePrint -Title -Year -Album -Artist -Duration -Genre -TrackNumber -Directory -FileName -FileSize -NominalBitrate -Date */*.ogg > ~/2013-reitaisai-downloads-torrent.json
~~~
# Analysis
~~~{.R}
touhou <- read.csv("http://www.gwern.net/docs/touhou/2013-torrent.csv",
colClasses=c("character", "integer", "factor", "character", "integer", "factor",
"character", "character", "numeric", "integer", "factor", "Date"))
# do stuff with the data
# general correlations
t <- data.frame(touhou$Year, touhou$Length, touhou$FileSize, touhou$AudioBitrate)
cor(t,use="pairwise.complete.obs")
# touhou.Year touhou.Length touhou.FileSize touhou.AudioBitrate
# touhou.Year
# touhou.Length -0.01091
# touhou.FileSize 0.04484 0.93915
# touhou.AudioBitrate 0.19188 0.11091 0.35499
# test the correlation between higher bitrate and larger files:
cor.test(touhou$FileSize, touhou$AudioBitrate)
# the genre metadata is useless!
sort(table(touhou$Genre), decreasing=TRUE)
# boxplot avg length per year
plot(touhou$Length ~ factor(touhou$Year))
~~~
Economics modeling:
~~~{.R}
jpn <- read.csv(stdin(),header=TRUE)
DATE,VALUE
2000-01-01,8.9
2001-01-01,9.1
2002-01-01,9.5
2003-01-01,9.6
2004-01-01,9.0
2005-01-01,8.1
2006-01-01,7.5
2007-01-01,7.5
2008-01-01,7.0
2009-01-01,8.9
2010-01-01,9.0
2011-01-01,8.1
# number of works per year does not correlate:
cor.test(jpn$VALUE[3:12], table(touhou$Year)[1:10])
Pearson's product-moment correlation
data: jpn$VALUE[3:12] and table(touhou$Year)[1:10]
t = -0.3053, df = 8, p-value = 0.768
alternative hypothesis: true correlation is not equal to 0
95% confidence interval:
-0.6903 0.5602
sample estimates:
cor
-0.1073
model <- lm(table(touhou$Year)[1:10] ~ c(2002:2011) + jpn$VALUE[3:12]); summary(model)
Call:
lm(formula = table(touhou$Year)[1:10] ~ c(2002:2011) + jpn$VALUE[3:12])
Residuals:
Min 1Q Median 3Q Max
-2128.6 -716.4 52.4 632.6 2253.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2775278 342171 -8.11 8.3e-05
c(2002:2011) 1379 170 8.13 8.2e-05
jpn$VALUE[3:12] 1450 567 2.56 0.038
Residual standard error: 1400 on 7 degrees of freedom
Multiple R-squared: 0.905, Adjusted R-squared: 0.878
F-statistic: 33.5 on 2 and 7 DF, p-value: 0.00026
logModel <- lm(log(table(touhou$Year)[1:10]) ~ c(2002:2011) + jpn$VALUE[3:12])
summary(logModel)
# Call:
# lm(formula = log(table(touhou$Year)[1:10]) ~ c(2002:2011) + jpn$VALUE[3:12])
#
# Residuals:
#    Min     1Q Median     3Q    Max
# -1.218 -0.632  0.108  0.551  0.982
#
# Coefficients:
#                  Estimate Std. Error t value Pr(>|t|)
# (Intercept)     -1295.060    207.535   -6.24  0.00043
# c(2002:2011)        0.652      0.103    6.34  0.00039
# jpn$VALUE[3:12]    -0.725      0.344   -2.11  0.07301
#
# Residual standard error: 0.849 on 7 degrees of freedom
# Multiple R-squared: 0.906, Adjusted R-squared: 0.879
# F-statistic: 33.8 on 2 and 7 DF, p-value: 0.000253
plot(c(2002:2011),table(touhou$Year)[1:10])
points(c(2002:2011),exp(predict(logModel)),type='l',col='blue')
~~~
## Growth over time
How fast is the corpus of Touhou music growing?
Constant growth model: the first game was released in 1996, no? So that gives 17 years to accumulate 1.26TB (1,260GB), or 74.1GB per year. The screenshot shows a download speed of 0kb/s, which is not useful, but it also says 2640 days left, so we can estimate that he's downloading at roughly 0.48GB per day (`1260/2640`); over a year that comes to 174GB, which is 2.35x faster than the 74GB per year. So at that annual increase, OP is *not* doomed and can in fact catch up.
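The constant-growth arithmetic can be checked in a few lines of R. This is only a sketch using the figures quoted in the paragraph above (1,260GB total, 17 years, 2640 days remaining); the variable names are mine, not from the original analysis:

~~~{.R}
totalGB  <- 1260   # 1.26TB collection size quoted above
years    <- 17     # assumption: music has accumulated since the 1996 release
daysLeft <- 2640   # the torrent client's "days left" estimate

growthPerYear   <- totalGB / years        # ~74.1 GB of new music per year
downloadPerDay  <- totalGB / daysLeft     # ~0.477 GB downloaded per day
downloadPerYear <- downloadPerDay * 365   # ~174 GB downloaded per year

# how many times faster the download is than the (assumed constant) growth
speedup <- downloadPerYear / growthPerYear
round(speedup, 2)
~~~

Any `speedup` above 1 means the downloader eventually catches up under this model; the total size cancels out, so the ratio depends only on the `years` and `daysLeft` assumptions.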
Exponential growth model: a little trickier, since we can't derive a formula from just the cumulative total and elapsed time. I need more data. So using my 2012 Touhou Lossy Torrent data, I can try to regress an exponential against the annual count... but wait! The amount of music does not seem to be increasing exponentially!
R> touhou <- read.csv("http://www.gwern.net/docs/touhou/2013-torrent.csv",
+ colClasses=c("character", "integer", "factor", "character",
+ "integer", "factor", "character",
+ "character", "numeric", "integer",
+ "factor", "Date"))
R> summary(touhou$Year)
R> perYear <- table(touhou$Year); perYear; plot(perYear)
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
13 23 255 1241 2070 2599 5073 10278 9765 7494 2999
Graph: http://i.imgur.com/23fMA5c.png
Looks like Touhou music's growth peaked in 2009; this might reflect the torrent's incompleteness, except the torrent is from 2012, and you'd expect coverage of 2010 or 2011 to be pretty good by that point. So the growth of the torrent *overall* looks more like a sigmoid or log:
R> runningTotal <- cumsum(table(touhou$Year)); runningTotal; plot(runningTotal)
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
13 36 291 1532 3602 6201 11274 21552 31317 38811 41810
http://i.imgur.com/Lv9vHrZ.png
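One quick way to see that the series is not exponential is to look at year-over-year ratios, which would stay roughly constant under exponential growth. A minimal sketch, with the per-year counts copied by hand from the table above (so it runs without the CSV); note that 2012 is probably a partial year, given when the torrent was made:

~~~{.R}
# per-year track counts, transcribed from the perYear table above
perYear <- c("2002"=13, "2003"=23, "2004"=255, "2005"=1241, "2006"=2070, "2007"=2599,
             "2008"=5073, "2009"=10278, "2010"=9765, "2011"=7494, "2012"=2999)

# ratio of each year's count to the previous year's count
ratio <- perYear[-1] / perYear[-length(perYear)]
round(ratio, 2)

# the running total should reproduce the cumulative table
cumsum(perYear)
~~~

The ratios bounce around above 1 through 2009 and then drop below 1 from 2010 onward, which is what a leveling-off curve looks like rather than a steady exponential.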
So probably it'd be better to ask, 'if 2012-rate growth continues, what is the ratio to his download speed?'
2012 added 19.5GB to the torrent:
R> # FileSize is in megabytes, so divide by 1000 to get gigabytes
R> sum(touhou[touhou$Year==2012,]$FileSize, na.rm=TRUE) / 1000
[1] 19.5
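Putting the two figures together gives a rough answer. This is a sketch that assumes the ~174GB/year download rate estimated from the screenshot, and it treats the 19.5GB that this (lossy) torrent gained in 2012 as a stand-in for all new Touhou music, which may well understate the growth of the collection OP is actually downloading:

~~~{.R}
downloadPerYear <- 1260 / 2640 * 365   # GB per year, from the screenshot's ETA
added2012       <- 19.5                # GB added to the lossy torrent in 2012

# ratio of download speed to 2012-rate growth
downloadPerYear / added2012
~~~

The ratio comes out at about 9, so under these assumptions the download outpaces 2012-rate growth by a wide margin.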
# Appendix
## `touhouwiki.net` scraping code
Uses the [Tagsoup](http://hackage.haskell.org/package/tagsoup) and [split](http://hackage.haskell.org/package/split) packages; emits CSV to standard out. It is a pile of kludges which I am a little ashamed to post publicly, and may not work for you.
~~~{.Haskell}
import Text.HTML.TagSoup (fromAttrib, fromTagText, isTagOpen, isTagText,
                          parseTags, (~/=), Tag(TagComment, TagOpen,TagText))
import Network.HTTP (getResponseBody, getRequest, simpleHTTP)
import Data.List (isInfixOf, isPrefixOf, nub, sort, stripPrefix)
import Data.Char (isSpace)
import Codec.Binary.UTF8.String (decodeString)
import Data.Maybe (fromMaybe, isJust, listToMaybe)
import Data.List.Split (keepDelimsL, split, whenElt)
import Control.Monad (join, unless)
main :: IO ()
main = do albums1 <- getAlbums "http://en.touhouwiki.net/wiki/List_by_Groups"
          albums2 <- getAlbums "http://en.touhouwiki.net/wiki/List_of_Old_Touhou_Arrangement_Groups"
          let albumURLs = map ("http://en.touhouwiki.net"++) $ nub $ sort $ albums1 ++ albums2
          putStrLn header
          mapM_ getAlbum albumURLs
getAlbums :: String -> IO [String]
getAlbums index = do touhou <- openURL index
                     return $ drop 22 $ reverse [link | (TagOpen "a" ((ref, link):_)) <- parseTags touhou,
                                                        ref=="href",
                                                        "/wiki/" `isPrefixOf` link,
                                                        let fltr x = not (x `isInfixOf` link),
                                                        fltr "_Groups", fltr "Touhou_Wiki:", fltr "Special:",
                                                        fltr "Template:", fltr "Category:", fltr "Talk:"]
type Album = [Track]
data Track = Track { title :: String, year :: Int, date :: String,artist :: String,
                     album :: String, event :: String, circle :: String, duration :: Maybe Int,
                     track :: Int } deriving Show
empty :: Track
empty = Track {title="",year=0,date="",album="",event="",circle="",artist="",duration=Nothing,track=0}
header :: String
header = "Title,Year,Date,Album,Event,Circle,Duration,Track"
convert :: Track -> String
convert t = "\"" ++ dequote (title t) ++ "\"," ++ show (year t) ++ "," ++ show (date t) ++ ",\"" ++ dequote (album t) ++ "\",\"" ++
            event t ++ "\",\"" ++ circle t ++ "\"," ++ maybe "" show (duration t) ++ "," ++ show (track t)
-- You are not expected to understand this.
getAlbum :: String -> IO ()
getAlbum a = do t <- fmap parseTags $ openURL a
                --TagText "Released",TagClose "th",TagOpen "td" [],TagText "\n2007-03-23"
                let dt = let target = dropWhile (TagText "Released" ~/=) t
                         in if null target then ""
                            else deleteParens $ fromTagText $ head $ tail $ filter isTagText target
                -- TagText "Released",TagClose "th",TagOpen "td" [],TagText "\n2009/02/08 (",
                -- TagOpen "a" [("href","/wiki/Category:Sunshine_Creation_42"),("title","Category:Sunshine Creation 42")],
                -- TagText "Sunshine Creation 42",TagClose "a",TagText ")",TagClose "td"
                let evnt = let stream = filter isTagText $ dropWhile (TagText "Released" ~/=) t
                           in if '(' == last (fromTagText $ head $ take 5 $ drop 1 stream) -- )
                              then fromTagText (filter isTagText (dropWhile (TagText "Released" ~/=) t) !! 2)
                              else ""
                -- "2007-03-23"
                let yr = if null dt then 0 else read (take 4 dt)::Int
                -- TagText "Album by CODE ZTS LABEL"
                let hasCrcl = [cl | TagText cl <- t, "Album by " `isPrefixOf` cl]
                unless (null hasCrcl || (yr==0 && null dt)) $ do
                    let crcl = lookForCircle t
                    -- TagText "Selfregards2 - Touhou Wiki - Characters, games, locations, and more"
                    let albm = reverse $ drop 55 $ reverse $ head [al | TagText al <- t,
                                 " - Touhou Wiki - Characters, games, locations, and more" `isInfixOf` al]
                    let dflt = empty { date = dt, year = yr, circle = crcl, album = albm, event = evnt }
                    let table = filter (\x -> not ("Disc" `isInfixOf` x || " CD" `isInfixOf` x)) $
                                filter (not . all isSpace) $
                                map (trim . fromTagText) $ filter isTagText $
                                drop 5 $ takeWhile (TagOpen "table"
                                                     [("class","navbox"),("cellspacing","0"),
                                                      ("style","background:#FFFBEE;border-color:#A8A077;")] ~/=) $
                                takeWhile (TagComment "" ~/=) $
                                dropWhile (TagOpen "span"
                                            [("class","mw-headline"),("id","Tracks")] ~/=) t
                    let tracks = filter (not . null) $
                                 split (keepDelimsL $ whenElt
                                         (\x -> length x==3 && "." `isInfixOf` x && isJust(maybeRead x :: Maybe Int))) table
                    mapM_ (putStrLn . convert . trackToTrack dflt) tracks
-- TagText "Album by ",TagOpen "a"
-- [("href","/wiki/ALiCE%27S_EMOTiON"),("title","ALiCE'S EMOTiON")],TagText
-- "ALiCE'S EMOTiON",TagClose "a"
lookForCircle :: [Tag String] -> String
lookForCircle t = let c = head [cl | TagText cl <- t, "Album by " `isPrefixOf` cl]
                      res = if c == "Album by " then (let tg = (dropWhile (TagText "Album by " ~/=) t !! 2)
                                                      in if isTagText tg then fromTagText tg else
                                                           (if isTagOpen tg then fromAttrib "title" tg else "") ) else drop 9 c
                  in if "(page does not exist)" `isInfixOf` res then takeWhile (/='(') res else res -- )
-- ["01.","The mom","(04:07)","arrangement: ZTS","composition: ZTS","original title: The mom","source: Parhelia"]
trackToTrack :: Track -> [String] -> Track
trackToTrack tr t = tr { track = fromMaybe 0 (maybeRead (head t) :: Maybe Int),
                         title = t !! 1,
                         duration = if length t >=3 then Just (timeConverter $ deleteParens (t !! 2)) else Nothing,
                         artist = lookForAnArtist t }
lookForAnArtist :: [String] -> String
lookForAnArtist t = let targets = dropWhile (\x -> not ("arrangement:" `isPrefixOf` x || "composition:" `isPrefixOf` x)) t
                        target
                          | null targets = ""
                          | last (head targets) == ':' = head targets ++ (targets !! 1)
                          | otherwise = head targets
                        -- strip whichever of the two prefixes is present
                        stripped = listToMaybe [r | Just r <- map (`stripPrefix` target) ["arrangement:", "composition:"]]
                    in trim $ fromMaybe "" stripped
-- utility functions
openURL :: String -> IO String
openURL url = fmap decodeString (simpleHTTP (getRequest url) >>= getResponseBody)
deleteParens, trim, dequote :: String -> String
deleteParens = trim . filter (\x -> x /= '(' && x /= ')')
trim = reverse . dropWhile isSpace . reverse . dropWhile isSpace
dequote = map (\x -> if x=='"' then '\'' else x) -- "')
timeConverter :: String -> Int
timeConverter n = let (m,s) = break (==':') n
                      m' = maybeRead m :: Maybe Int
                      s' = maybeRead (drop 1 s) :: Maybe Int
                  in (fromMaybe 0 m' * 60) + fromMaybe 0 s'
maybeRead :: Read a => String -> Maybe a
maybeRead = fmap fst . listToMaybe . reads
~~~
## VGMdb scraping code
The following is a buggy program for scraping Touhou albums from VGMdb; it works on a limited subset of album pages, but has an unknown number of fatal bugs. I abandoned it once I was offered read-only database access, and that was what I actually used to get my VGMdb data. This is in case I ever need to go back.
~~~{.haskell}
import Text.HTML.TagSoup (fromTagText, isTagOpenName, isTagText, Tag(TagOpen,TagText), parseTags)
import Network.HTTP (getResponseBody, getRequest, simpleHTTP)
import Data.List (isPrefixOf, sort)
import Data.Char (isSpace)
import Codec.Binary.UTF8.String (decodeString)
main :: IO ()
main = do albumsURLs <- getAlbums
          albums <- mapM openURL (sort albumsURLs)
          let metadata = map toAlbum albums
          writeFile "vgmdb.csv" $ unlines (header : concatMap (map convert) metadata)
type Album = [Track]
data Track = Track { title :: String,
                     year :: Int,
                     date :: String,
                     album :: String,
                     circle :: String,
                     duration :: Maybe Int,
                     track :: Int } deriving Show
empty :: Track
empty = Track {title="",year=0,date="",album="",circle="",duration=Nothing,track=0}
header :: String
header = "Title,Year,Date,Album,Circle,Duration,Track"
convert :: Track -> String
convert t = "\"" ++ title t ++ "\"," ++ show(year t) ++ "," ++ show (date t) ++ ",\"" ++
            album t ++ "\",\"" ++ circle t ++ "\"," ++ maybe "" show(duration t) ++ "," ++ show (track t)
-- example album link: 'TagOpen "a" [("class","albumtitle album-doujin"),
-- ("href","http://vgmdb.net/album/36901"),
-- ("title","Majo to Ringo to Samayou Kimi to")]'
getAlbums :: IO [String]
getAlbums = do touhou <- openURL "http://vgmdb.net/product/9"
               return [snd(atts !! 1) | TagOpen "a" atts <- parseTags touhou, snd(head atts)=="albumtitle album-doujin"]
-- need 'decodeString' to deal with Japanese glyphs; see http://stackoverflow.com/questions/10558003/how-to-get-utf8-rss-feed
openURL :: String -> IO String
openURL url = fmap decodeString (simpleHTTP (getRequest url) >>= getResponseBody)
toAlbum :: String -> Album
toAlbum page = let tags = parseTags page
                   (yr,dt) = extractDate tags
                   albm = extractAlbum tags
                   crcl = extractCircle tags
                   files = extractMusic tags
               in map (\t -> Track {title = title t, year = yr, date = dt,
                                    album = albm, circle = crcl, duration = duration t,
                                    track = track t}) files
-- TagOpen "a" [("title","View albums released on Dec 30, 2011"),("href","/db/calendar.php?year=2011&month=12#20111230")]
extractDate :: [Tag String] -> (Int,String)
extractDate t = let (a:b:_) = map snd $ head [atts | TagOpen "a" atts <- t,
                                                      let ttle = snd(head atts),
                                                      "View albums released on " `isPrefixOf` ttle]
                in (read(reverse $ take 4 $ reverse a)::Int,
                    tail$ snd $ break (=='#') b)
-- TagOpen "title" [],TagText "Gensou Rashinban - VGMdb",TagClose "title",
extractAlbum :: [Tag String] -> String
extractAlbum t = (\(TagText x) -> reverse $ drop 8 $ reverse x) (dropWhile (not . isTagOpenName "title") t !! 1)
-- [TagText "Published by",TagClose "b",TagClose "span",TagClose "td",TagText "\r\n",
-- TagOpen "td" [],TagOpen "a" [("href","/org/217")],TagOpen "span"
-- [("class","productname"),("lang","en"),("style","display:inline")],TagText "PopKorn"]
extractCircle :: [Tag String] -> String
extractCircle t = fromTagText(head (drop 8 (dropWhile (\x -> not(isTagText x && (fromTagText x)=="Published by")) t)))
extractMusic :: [Tag String] -> [Track]
extractMusic t = let tracks = filter (not . all isSpace) $
                              tail $ dropWhile (\z -> z /= "Disc 1") $
                              map fromTagText $ filter isTagText $
                              takeWhile (\y -> not(isTagText y && (fromTagText y)=="Disc length")) $
                              dropWhile (\x -> not(isTagText x && (fromTagText x)=="Tracklist")) t
                 in if length tracks `rem` 3 == 0 then threezip tracks else twozip tracks
  where
    twozip,threezip :: [String] -> [Track]
    threezip [] = []
    threezip (a:b:c:d) = empty {title=b,duration=Just (timeConverter c),track=read a} : threezip d
    threezip _ = []
    twozip [] = []
    twozip (a:b:d) = empty {title=b,duration=Nothing,track=read a} : twozip d
    twozip _ = []
timeConverter :: String -> Int
timeConverter n = let (m,s) = break (==':') n in ((read m :: Int) * 60) + (read (tail s) :: Int)
~~~