{ "cells": [
 { "cell_type": "markdown", "metadata": {}, "source": [
  "# FDD Project - Arthur Brandao & Maxence Bacquet\n",
  "\n",
  "The goal of this project is to retrieve the list of anime a user has watched on Anilist.co and use it to recommend new ones.\n",
  "\n",
  "## Import\n",
  "\n",
  "Import of the classes needed for the recommendation. The AnilistApi and AnilistQuery classes were written for this project to make querying the API easy. Since the Anilist API uses GraphQL, a GraphQLClient class was also written for the project; it is used by AnilistApi."
 ] },
 { "cell_type": "code", "execution_count": 1, "metadata": { "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "numpy: 1.16.4\n", "pandas: 0.25.2\n" ] } ], "source": [
  "import numpy as np\n",
  "import pandas as pd\n",
  "from anilist_api import AnilistApi\n",
  "from anilist_api import AnilistQuery\n",
  "from sklearn.model_selection import train_test_split\n",
  "from sklearn.ensemble import RandomForestClassifier\n",
  "# Print the versions\n",
  "print('numpy: {}'.format(np.__version__))\n",
  "print('pandas: {}'.format(pd.__version__))"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "Definition of the variables used throughout the project (global variables and object instances)"
 ] },
 { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [
  "# Name of the user the recommendations are made for\n",
  "USER = 'Loquicom'\n",
  "# Minimum accuracy of the model during the test phase (in %). Warning: a value that is too high may be impossible to reach\n",
  "ACCURACYMIN = 80\n",
  "# Maximum number of iterations before concluding that the model cannot reach the requested accuracy (avoids an infinite loop)\n",
  "ITERATIONMAX = 25 # -1 <=> no limit\n",
  "# Number of requests made for the search\n",
  "NBITERATION = 4\n",
  "# Number of anime fetched per request (max 50)\n",
  "NBANIME = 50\n",
  "# The Anilist API\n",
  "anilist = AnilistApi()\n",
  "# The model used to learn which features matter for the recommendation\n",
  "model = RandomForestClassifier(n_estimators = 1000, random_state = 42, max_features=10)"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Definition of the utility functions\n",
  "\n",
  " - **valid_data**: Checks and measures the accuracy of the values predicted by the model\n",
  " - **identical_features**: Makes sure a DataFrame has the same features as the source DataFrame\n",
  " - **make_query**: Builds a basic query for the Anilist API"
 ] },
 { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [
  "def valid_data(predict, value):\n",
  "    # Count the predictions that differ from the expected values\n",
  "    result = pd.Series(value - predict)\n",
  "    error = result[result != 0].count()\n",
  "    accuracy = 100 - ((error / result.size) * 100)\n",
  "    return result.size, error, accuracy\n",
  "\n",
  "def identical_features(source, target, keep = []):\n",
  "    # Accept a single column name as well as a list\n",
  "    if isinstance(keep, str):\n",
  "        keep = [keep]\n",
  "    # Add the missing features\n",
  "    for feature in source.columns:\n",
  "        if feature not in target.columns:\n",
  "            target[feature] = 0\n",
  "    # Drop the extra features\n",
  "    drop = []\n",
  "    for feature in target.columns:\n",
  "        if feature not in source.columns and feature not in keep:\n",
  "            drop.append(feature)\n",
  "    return target.drop(drop, axis = 1)\n",
  "\n",
  "def make_query(score = -1, popularity = -1, epMin = -1, epMax = -1, durationMin = -1, durationMax = -1, source = [], formatType = []):\n",
  "    # A value of -1 (or an empty list) means the filter is not applied\n",
  "    query = AnilistQuery()\n",
  "    if score != -1:\n",
  "        query.scoreGreaterThan(score)\n",
  "    if popularity != -1:\n",
  "        query.popularityGreaterThan(popularity)\n",
  "    if epMin != -1:\n",
  "        query.episodeBetween(epMin, epMax)\n",
  "    if durationMin != -1:\n",
  "        query.durationBetween(durationMin, durationMax)\n",
  "    if len(formatType) > 0:\n",
  "        query.formatIn(formatType)\n",
  "    if len(source) > 0:\n",
  "        query.sourceIn(source)\n",
  "    return query"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Data retrieval\n",
  "\n",
  "The data comes from the Anilist API and corresponds to the list of anime the user has completed. It includes the score given by the user, the average score on the site, the number of episodes and their duration, the popularity, tags, genres, the format and the source."
 ] },
 { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id title score \\\n", "0 97636 Akiba's Trip: The Animation 62 \n", "1 20602 Amagi Brilliant Park 74 \n", "2 21077 Amagi Brilliant Park: Nonbirishiteiru Hima ga ... 70 \n", "3 6547 Angel Beats! 79 \n", "4 20755 Ansatsu Kyoushitsu 79 \n", ".. ... ... ... \n", "97 104252 Maou-sama, Retry! 59 \n", "98 107704 Kawaki wo Ameku 79 \n", "99 99425 Promare 83 \n", "100 107226 Dumbbell Nan Kilo Moteru? 74 \n", "101 112381 raison d'etre 64 \n", "\n", " popularity format episode duration source userScore \\\n", "0 12475 TV 13 24 VIDEO_GAME 80.0 \n", "1 42303 TV 13 24 LIGHT_NOVEL 70.0 \n", "2 8614 OVA 1 24 LIGHT_NOVEL 70.0 \n", "3 88258 TV 13 24 ORIGINAL 75.0 \n", "4 72532 TV 22 23 MANGA 90.0 \n", ".. ... ... ... ... ... ... \n", "97 11116 TV 12 24 LIGHT_NOVEL 70.0 \n", "98 1676 MUSIC 1 4 OTHER 0.0 \n", "99 9545 MOVIE 1 115 ORIGINAL 95.0 \n", "100 20176 TV 12 24 MANGA 80.0 \n", "101 208 MUSIC 1 4 ORIGINAL 0.0 \n", "\n", " tag1 tag2 tag3 genre1 \\\n", "0 Super Power Otaku Culture Demons Action \n", "1 Ensemble Cast Magic Male Protagonist Comedy \n", "2 Ensemble Cast Male Protagonist Work Comedy \n", "3 Afterlife Tragedy School Action \n", "4 Assassins School Shounen Action \n", ".. ... ... ... ... \n", "97 Isekai Anti-Hero Male Protagonist Action \n", "98 Musical Female Protagonist Primarily Female Cast Music \n", "99 Firefighters Robots CGI Action \n", "100 Fitness Educational Athletics Comedy \n", "101 None None None Music \n", "\n", " genre2 genre3 genre4 genre5 \n", "0 Ecchi Supernatural Comedy Fantasy \n", "1 Romance Fantasy None None \n", "2 Fantasy None None None \n", "3 Comedy Drama Supernatural None \n", "4 Comedy Supernatural None None \n", ".. ... ... ... ... 
\n", "97 Adventure Fantasy None None \n", "98 None None None None \n", "99 Mecha Comedy Sci-Fi None \n", "100 Ecchi Sports Slice of Life None \n", "101 Action None None None \n", "\n", "[102 rows x 17 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "animelist = anilist.findUser(USER).getUserAnimeList()\n",
  "completedList = animelist.toDataFrame()\n",
  "completedList"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Preprocessing\n",
  "\n",
  "The data is put into a form better suited to the model: string columns are replaced by 0/1 dummy columns, and a like column is added to split the anime into 2 classes, those the user liked (like = 1) and the others (like = 0). Any anime with a user score of 80 or more is considered liked. The title, id and userScore columns are then dropped because they are of no use for training the model."
 ] },
 { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [
  " score popularity episode duration tag_Super_Power tag_Ensemble_Cast \\\n",
  "0 62 12475 13 24 1 0 \n",
  "1 74 42303 13 24 0 1 \n",
  "2 70 8614 1 24 0 1 \n",
  "3 79 88258 13 24 0 0 \n",
  "4 79 72532 22 23 0 0 \n",
  ".. ... ... ... ... ... ... \n",
  "97 59 11116 12 24 0 0 \n",
  "98 79 1676 1 4 0 0 \n",
  "99 83 9545 1 115 0 0 \n",
  "100 74 20176 12 24 0 0 \n",
  "101 64 208 1 4 0 0 \n",
  "\n",
  " tag_Afterlife tag_Assassins tag_Cute_Girls_Doing_Cute_Things \\\n",
  "0 0 0 0 \n",
  "1 0 0 0 \n",
  "2 0 0 0 \n",
  "3 1 0 0 \n",
  "4 0 1 0 \n",
  ".. ... ... ... \n",
  "97 0 0 0 \n",
  "98 0 0 0 \n",
  "99 0 0 0 \n",
  "100 0 0 0 \n",
  "101 0 0 0 \n",
  "\n",
  " tag_School ... format_OVA format_SPECIAL format_TV \\\n",
  "0 0 ... 0 0 1 \n",
  "1 0 ... 0 0 1 \n",
  "2 0 ... 1 0 0 \n",
  "3 1 ... 0 0 1 \n",
  "4 1 ... 0 0 1 \n",
  ".. ... ... ... ... ... \n",
  "97 0 ... 0 0 1 \n",
  "98 0 ... 0 0 0 \n",
  "99 0 ... 0 0 0 \n",
  "100 0 ... 0 0 1 \n",
  "101 0 ... 0 0 0 \n",
  "\n",
  " source_LIGHT_NOVEL source_MANGA source_ORIGINAL source_OTHER \\\n",
  "0 0 0 0 0 \n",
  "1 1 0 0 0 \n",
  "2 1 0 0 0 \n",
  "3 0 0 1 0 \n",
  "4 0 1 0 0 \n",
  ".. ... ... ... ... \n",
  "97 1 0 0 0 \n",
  "98 0 0 0 1 \n",
  "99 0 0 1 0 \n",
  "100 0 1 0 0 \n",
  "101 0 0 1 0 \n",
  "\n",
  " source_VIDEO_GAME source_VISUAL_NOVEL like \n",
  "0 1 0 1 \n",
  "1 0 0 0 \n",
  "2 0 0 0 \n",
  "3 0 0 0 \n",
  "4 0 0 1 \n",
  ".. ... ... ... \n",
  "97 0 0 0 \n",
  "98 0 0 0 \n",
  "99 0 0 1 \n",
  "100 0 0 1 \n",
  "101 0 0 0 \n",
  "\n",
  "[102 rows x 124 columns]"
 ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "df = animelist.toDataFrame(dummies = True)\n",
  "df['like'] = 0\n",
  "df.loc[df.userScore >= 80, 'like'] = 1\n",
  "df = df.drop(['id', 'title', 'userScore'], axis=1)\n",
  "df"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Training the model\n",
  "\n",
  "The data is split into a training set (80% of the data) and a test set (20% of the data) used to check that the model has learned properly. The model is trained and then tested, on a new random split each time, until its accuracy on the test set reaches at least ACCURACYMIN (80% here)."
 ] },
 { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
  "Iteration 1 : 21 elements, 8 error(s), 61.9 % accuracy\n",
  "Iteration 2 : 21 elements, 4 error(s), 80.95 % accuracy\n"
 ] } ], "source": [
  "# Split x and y\n",
  "y = df.like\n",
  "x = df.drop('like', axis=1)\n",
  "# Loop until the accuracy is high enough\n",
  "accuracy = 0\n",
  "ite = 1\n",
  "while accuracy < ACCURACYMIN:\n",
  "    # Build the training and test sets\n",
  "    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n",
  "    # Train the model\n",
  "    model.fit(x_train, y_train)\n",
  "    # Test the model\n",
  "    y_predict = model.predict(x_test)\n",
  "    elt, error, accuracy = valid_data(y_predict, y_test)\n",
  "    # Print the result of this iteration\n",
  "    print('Iteration', ite, ':', elt, 'elements,', error, 'error(s),', round(accuracy, 2), '% accuracy')\n",
  "    ite += 1\n",
  "    # Give up if the iteration limit is exceeded without reaching the required accuracy\n",
  "    if accuracy < ACCURACYMIN and ITERATIONMAX != -1 and ite > ITERATIONMAX:\n",
  "        raise Exception('Impossible to reach ' + str(ACCURACYMIN) + '% of accuracy')"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Extracting the important features\n",
  "\n",
  "We retrieve the features that, according to the model, matter most for the user to like an anime"
 ] },
 { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [
  " feature score\n",
  "0 score 0.12\n",
  "1 popularity 0.11\n",
  "2 episode 0.05\n",
  "3 duration 0.04\n",
  "18 tag_Magic 0.02\n",
  "22 tag_Video_Games 0.02\n",
  "36 tag_Female_Protagonist 0.02\n",
  "49 tag_Male_Protagonist 0.02\n",
  "92 genre_Action 0.02\n",
  "93 genre_Comedy 0.02\n",
  "94 genre_Slice_of_Life 0.02\n",
  "96 genre_Supernatural 0.02\n",
  "99 genre_Adventure 0.02\n",
  "107 genre_Psychological 0.02\n",
  "117 source_LIGHT_NOVEL 0.02\n",
  "118 source_MANGA 0.02"
 ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "# Get the list of features and the importance the model assigns to each\n",
  "feature_list = list(x.columns)\n",
  "importances = list(model.feature_importances_)\n",
  "feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(feature_list, importances)]\n",
  "# Convert to a DataFrame and keep only the features with a score of at least 0.02\n",
  "fi = pd.DataFrame(feature_importances, columns=['feature', 'score'])\n",
  "fi = fi[fi.score >= 0.02]\n",
  "fi"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "We interpret this list of features and extract the useful information so that we can then query the API and retrieve anime the user is likely to enjoy"
 ] },
 { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [
  "numerical = ['score', 'popularity', 'episode', 'duration']\n",
  "\n",
  "# Extract the numerical features\n",
  "scoreMin = -1\n",
  "popularityMin = -1\n",
  "nbEpisodeMin = -1\n",
  "nbEpisodeMax = -1\n",
  "durationMin = -1\n",
  "durationMax = -1\n",
  "if 'score' in fi.feature.values:\n",
  "    scoreMin = x.score.mean()\n",
  "if 'popularity' in fi.feature.values:\n",
  "    popularityMin = x.popularity.mean()\n",
  "if 'episode' in fi.feature.values:\n",
  "    mean = x.episode.mean()\n",
  "    nbEpisodeMin = mean - 5\n",
  "    nbEpisodeMax = mean + 5\n",
  "if 'duration' in fi.feature.values:\n",
  "    mean = x.duration.mean()\n",
  "    durationMin = mean - 10\n",
  "    durationMax = mean + 10\n",
  "\n",
  "# Extract the string features\n",
  "formatIn = []\n",
  "sourceIn = []\n",
  "genreIn = []\n",
  "tagIn = []\n",
  "for tpl in fi.itertuples():\n",
  "    # Get the feature and skip the numerical ones, which are handled above\n",
  "    feature = tpl[1]\n",
  "    if feature in numerical:\n",
  "        continue\n",
  "    # Split on the first underscore only: the prefix is the feature type, the rest is its value\n",
  "    # (keeps values with several underscores whole, e.g. genre_Slice_of_Life)\n",
  "    name, feature = feature.split('_', 1)\n",
  "    # Store the value in the list matching its type\n",
  "    if name == 'format':\n",
  "        formatIn.append(feature)\n",
  "    elif name == 'source':\n",
  "        sourceIn.append(feature)\n",
  "    elif name == 'genre':\n",
  "        genreIn.append(feature)\n",
  "    elif name == 'tag':\n",
  "        tagIn.append(feature)\n",
  "    else:\n",
  "        raise ValueError(\"Unknown feature: \" + name)"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Retrieving anime based on the important features\n",
  "\n",
  "We query the API to retrieve anime that match our features, then check the features that cannot be included in the API query"
 ] },
 { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id title score \\\n", "0 245 Great Teacher Onizuka 84 \n", "1 97922 Inuyashiki 73 \n", "2 10162 Usagi Drop 83 \n", "3 17549 Non Non Biyori 78 \n", "4 14967 Boku wa Tomodachi ga Sukunai Next 72 \n", ".. ... ... ... \n", "45 14741 Chuunibyou demo Koi ga Shitai! 76 \n", "46 20920 Dungeon ni Deai wo Motomeru no wa Machigatteir... 74 \n", "47 6746 Durarara!! 80 \n", "48 10087 Fate/Zero 83 \n", "49 20623 Kiseijuu: Sei no Kakuritsu 82 \n", "\n", " popularity episode duration tag_Super_Power tag_Ensemble_Cast \\\n", "0 31604 43.0 25 0 0 \n", "1 31631 11.0 23 0 0 \n", "2 31883 11.0 22 0 0 \n", "3 31976 12.0 24 0 0 \n", "4 31992 12.0 24 0 0 \n", ".. ... ... ... ... ... \n", "45 67649 12.0 24 0 0 \n", "46 68372 13.0 24 0 0 \n", "47 68550 24.0 24 0 1 \n", "48 70510 13.0 26 0 1 \n", "49 71072 24.0 24 0 0 \n", "\n", " tag_Afterlife tag_Assassins ... format_ONA format_OVA format_SPECIAL \\\n", "0 0 0 ... 0 0 0 \n", "1 0 0 ... 0 0 0 \n", "2 0 0 ... 0 0 0 \n", "3 0 0 ... 0 0 0 \n", "4 0 0 ... 0 0 0 \n", ".. ... ... ... ... ... ... \n", "45 0 0 ... 0 0 0 \n", "46 0 0 ... 0 0 0 \n", "47 0 0 ... 0 0 0 \n", "48 0 0 ... 0 0 0 \n", "49 0 0 ... 0 0 0 \n", "\n", " format_TV source_LIGHT_NOVEL source_MANGA source_ORIGINAL \\\n", "0 1 0 1 0 \n", "1 1 0 1 0 \n", "2 1 0 1 0 \n", "3 1 0 1 0 \n", "4 1 1 0 0 \n", ".. ... ... ... ... \n", "45 1 1 0 0 \n", "46 1 1 0 0 \n", "47 1 1 0 0 \n", "48 1 1 0 0 \n", "49 1 0 1 0 \n", "\n", " source_OTHER source_VIDEO_GAME source_VISUAL_NOVEL \n", "0 0 0 0 \n", "1 0 0 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", ".. ... ... ... 
\n", "45 0 0 0 \n", "46 0 0 0 \n", "47 0 0 0 \n", "48 0 0 0 \n", "49 0 0 0 \n", "\n", "[200 rows x 125 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "# Build the query\n",
  "query = make_query(score=scoreMin, popularity=popularityMin, source=sourceIn, formatType=formatIn)\n",
  "# Create the DataFrame that will receive the recommendations\n",
  "col = x.columns.values\n",
  "col = np.append(['id', 'title'], col)\n",
  "finalList = pd.DataFrame(columns = col)\n",
  "# Go through several pages of results\n",
  "for searchList in anilist.iterateAnimeListFetch(query, NBITERATION, NBANIME):\n",
  "    # Stop if there is no more data\n",
  "    if len(searchList.data) < 1:\n",
  "        break\n",
  "    # Convert to a usable DataFrame\n",
  "    searchList = identical_features(x, searchList.toDataFrame(dummies = True), ['id', 'title'])\n",
  "    # Check the features that are not part of the query\n",
  "    valid = []\n",
  "    for row in searchList.iterrows():\n",
  "        isValid = False\n",
  "        row = pd.Series(row[1])\n",
  "        for genre in genreIn:\n",
  "            if ('genre_' + genre) in row.index:\n",
  "                isValid = True\n",
  "                break\n",
  "        if not isValid:\n",
  "            for tag in tagIn:\n",
  "                if ('tag_' + tag) in row.index:\n",
  "                    isValid = True\n",
  "                    break\n",
  "        if isValid:\n",
  "            valid.append(row)\n",
  "    # Append the rows if everything is OK\n",
  "    if len(valid) > 0:\n",
  "        finalList = finalList.append(pd.DataFrame(valid), sort=False)\n",
  "\n",
  "# Format the final list and display it\n",
  "finalList = identical_features(x, finalList.fillna(0), ['id', 'title'])\n",
  "finalList"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Querying the model\n",
  "\n",
  "We ask the model whether or not the user will like the anime that were retrieved"
 ] },
 { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [
  " id title\n",
  "0 20652 Durarara!!x2 Shou\n",
  "1 8425 Gosick\n",
  "2 20593 Hanamonogatari\n",
  "3 7674 Bakuman.\n",
  "4 99726 Net-juu no Susume\n",
  ".. ... ...\n",
  "97 20832 Overlord\n",
  "98 14813 Yahari Ore no Seishun Love Comedy wa Machigatt...\n",
  "99 20920 Dungeon ni Deai wo Motomeru no wa Machigatteir...\n",
  "100 6746 Durarara!!\n",
  "101 10087 Fate/Zero\n",
  "\n",
  "[82 rows x 2 columns]"
 ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [
  "# List of recommendations\n",
  "recommendation = []\n",
  "# Go through each entry of the final list\n",
  "for row in finalList.iterrows():\n",
  "    row = pd.Series(row[1])\n",
  "    # Save the id and title, then drop them (the model does not use them)\n",
  "    info = {'id': row.id, 'title': row.title}\n",
  "    row = row.drop(['id', 'title'])\n",
  "    # Prediction\n",
  "    if model.predict([row])[0] == 1:\n",
  "        recommendation.append(info)\n",
  "\n",
  "# Convert to a DataFrame and drop duplicates\n",
  "recommendation = pd.DataFrame(recommendation)\n",
  "recommendation = recommendation.drop_duplicates()\n",
  "recommendation"
 ] },
 { "cell_type": "markdown", "metadata": {}, "source": [
  "## Displaying the result\n",
  "\n",
  "We remove every anime the user has already seen and display the list"
 ] },
 { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [
  "69 recommendation(s)\n",
  " - Akatsuki no Yona\n",
  " - Akira\n",
  " - Baccano!\n",
  " - Bakuman.\n",
  " - Black Lagoon\n",
  " - Bleach\n",
  " - Dororo\n",
  " - Dr. STONE\n",
  " - Dragon Ball\n",
  " - Dragon Ball Z\n",
  " - Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka\n",
  " - Dungeon ni Deai wo Motomeru no wa Machigatteiru Darou ka II\n",
  " - Durarara!!\n",
  " - Durarara!!x2 Shou\n",
  " - Enen no Shouboutai\n",
  " - Fairy Tail\n",
  " - Gintama\n",
  " - Gosick\n",
  " - Hagane no Renkinjutsushi\n",
  " - Hai to Gensou no Grimgar\n",
  " - Haikyuu!!\n",
  " - Haikyuu!! 2\n",
  " - Haikyuu!!: Karasuno Koukou VS Shiratorizawa Gakuen Koukou\n",
  " - Hanamonogatari\n",
  " - Hataraku Saibou\n",
  " - Hotarubi no Mori e\n",
  " - JoJo no Kimyou na Bouken\n",
  " - JoJo no Kimyou na Bouken: Diamond wa Kudakenai\n",
  " - JoJo no Kimyou na Bouken: Stardust Crusaders\n",
  " - JoJo no Kimyou na Bouken: Stardust Crusaders - Egypt-hen\n",
  " - Kaichou wa Maid-sama!\n",
  " - Katanagatari\n",
  " - Kekkai Sensen\n",
  " - Kimetsu no Yaiba\n",
  " - Kokoro Connect\n",
  " - Kono Subarashii Sekai ni Shukufuku wo! 2\n",
  " - Koukaku Kidoutai\n",
  " - Kuroko no Basket\n",
  " - Kyoukai no Kanata\n",
  " - Log Horizon\n",
  " - Log Horizon 2\n",
  " - Made in Abyss\n",
  " - Magi: The Kingdom of Magic\n",
  " - Magi: The Labyrinth of Magic\n",
  " - Mahouka Koukou no Rettousei\n",
  " - Nanatsu no Taizai: Imashime no Fukkatsu\n",
  " - Naruto: Shippuuden\n",
  " - Nekomonogatari (Kuro)\n",
  " - Net-juu no Susume\n",
  " - Nisekoi\n",
  " - Nisemonogatari\n",
  " - No Game No Life -Zero-\n",
  " - Noragami Aragoto\n",
  " - One Piece\n",
  " - Ookami to Koushinryou\n",
  " - Seishun Buta Yarou wa Bunny Girl-senpai no Yume wo Minai\n",
  " - Shingeki no Kyojin 3\n",
  " - Shingeki no Kyojin 3 Part 2\n",
  " - Soul Eater\n",
  " - Suzumiya Haruhi no Shoushitsu\n",
  " - Suzumiya Haruhi no Yuuutsu\n",
  " - Toaru Majutsu no Index\n",
  " - Toki wo Kakeru Shoujo\n",
  " - Vinland Saga\n",
  " - Violet Evergarden\n",
  " - Wotaku ni Koi wa Muzukashii\n",
  " - Yahari Ore no Seishun Love Comedy wa Machigatteiru.\n",
  " - Youjo Senki\n",
  " - orange\n"
 ] } ], "source": [
  "# Keep only the anime that are not already in the user's list\n",
  "result = []\n",
  "for row in recommendation.iterrows():\n",
  "    row = pd.Series(row[1])\n",
  "    if completedList[completedList.id == row.id].empty:\n",
  "        result.append(row.title)\n",
  "\n",
  "# Display\n",
  "result.sort()\n",
  "print(len(result), 'recommendation(s)')\n",
  "for anime in result:\n",
  "    print(' -', anime)"
 ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name":
"python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 1 }