In more than 80% of YouTube videos, the potential population interested in the video increases over time. Two of the six models, the modified negative exponential and the modified Gompertz, cover nearly the entire dataset. Both capture an immigration process in which the potential population, or ceiling value, becomes dynamic. The modified negative exponential characterises the dynamics of non-viral content: it predicts that the accumulated number of views does not contribute to the propagation of the content. This model corresponds to the scenario in which the content has been broadcast to a pool of users. The Gompertz model, on the other hand, captures viral videos, for which part of the dynamics is propagated by word-of-mouth. Finally, we use the above classification together with the automated parameter extraction to predict the evolution of a video's view-count. We consider two scenarios: in the first, we use half of the view-count curve as a training sequence; in the second, we take a fixed training sequence corresponding to the first 50 days of the video's lifetime.
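The two training scenarios can be sketched as a simple split of the view-count series. This is a minimal illustration and not the authors' code: the function name `training_split`, the scenario labels and the assumption of one sample per day are all hypothetical.

```python
def training_split(view_counts, scenario, fixed_days=50):
    """Split a daily view-count series into (training, held_out).

    scenario "half":  train on the first half of the curve.
    scenario "fixed": train on the first `fixed_days` days (50 in the text).
    Both labels are illustrative names, not taken from the paper.
    """
    if scenario == "half":
        cut = len(view_counts) // 2
    elif scenario == "fixed":
        cut = min(fixed_days, len(view_counts))
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return view_counts[:cut], view_counts[cut:]
```

For a 200-day-old video, the first scenario trains on 100 days and the second on 50, so the second scenario predicts further ahead for older videos.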

A video is non-viral if its propagation relies essentially on broadcast from the source (it is then said to have the broadcast property). In that case, a large fraction of the potential target population can obtain the information directly from the source. We propose six models inspired by mathematical biology and show that at least 90% of YouTube videos are associated with one of these six models with a Mean Error Rate below 5%. We further show how to extract the model parameters for each video. We study the robustness of these models to the different thematic categories of YouTube videos and to different values of the peak popularity of the video. We show that the fraction of videos within a given model is quite robust and depends little on the thematic category of the video, except for the Education category, which behaves differently: for this category, word-of-mouth appears to be the dominant mechanism through which contents are disseminated.
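The Mean Error Rate is not defined in this passage. As a hedged sketch, the function below assumes it is the mean absolute difference between the observed and estimated curves, expressed as a fraction of the observed peak; the function name and this definition are assumptions and may differ from the one the authors use.

```python
def mean_error_rate(observed, estimated):
    """Mean absolute error relative to the observed peak (assumed definition)."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("series must be non-empty and of equal length")
    peak = max(observed)
    if peak <= 0:
        raise ValueError("observed series needs a positive peak")
    total_abs_error = sum(abs(o - e) for o, e in zip(observed, estimated))
    return total_abs_error / (peak * len(observed))
```

Under this assumed definition, a video would count as associated with a model when the returned value is below 0.05.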

Figure 5 is an example in which we fit these models to one YouTube content (Figure 5a). We observe that the S-shape of the logistic model curve is symmetric, owing to the symmetry of the sigmoid function (Figure 5b). However, the convex phase and the concave phase of the data are not symmetric, as can be seen in Figure 5a. Hence the logistic model does not fit well. The Gompertz model and the modified Gompertz model are then fitted to the same YouTube content. One difficulty with the models we use is that the dynamics of the curve can change at the horizon.

Figure 5: From a YouTube video with an S-shaped view-count curve (5a), we first fit the logistic model in 5b. The estimated curve (dashed) is compared with the actual normalised view-count curve (plain). The Gompertz model (Figure 5c) fits better than the logistic model, and the modified Gompertz model (Figure 5d) better describes the behaviour of the data at the horizon (immigration phenomena).
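The symmetry argument can be checked numerically. The sketch below uses the textbook forms of the logistic and Gompertz curves, which are an assumption here since the paper's own parametrisation (Section III) is not reproduced in this passage; the parameter names `K`, `r` and `t0` and the helper `asymmetry` are illustrative.

```python
import math

def logistic(t, K=1.0, r=0.1, t0=50.0):
    """Textbook logistic curve with ceiling K and inflection at t0."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def gompertz(t, K=1.0, r=0.1, t0=50.0):
    """Textbook Gompertz curve with ceiling K and inflection at t0."""
    return K * math.exp(-math.exp(-r * (t - t0)))

def asymmetry(curve, t0=50.0, d=20.0):
    """curve(t0 + d) + curve(t0 - d) - 2 * curve(t0).

    This is zero when the curve is point-symmetric about its inflection
    point, i.e. when the convex and concave phases mirror each other.
    """
    return curve(t0 + d) + curve(t0 - d) - 2.0 * curve(t0)
```

With these forms, the logistic curve reaches its inflection at half the ceiling and has zero asymmetry, whereas the Gompertz curve reaches its inflection at K/e (about 37% of the ceiling), so its concave phase is longer than its convex phase.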

This dataset contains static information for each video, such as the YouTube id, the title of the video, the name of the author, the duration and the list of related videos. It also provides the evolution of several metrics (shares, subscribers, watch time and views) in daily form and in cumulative form, from the upload day until the date of crawling. Previous analyses of YouTube showed a strong correlation between view-count and other metrics such as the number of comments, favourites and ratings. We therefore focus the analysis on view-count as the main popularity metric of a video, and we model the dynamic evolution of view-count with mathematical models from biology. Size of the target population: the target population size is the maximum number of people who can potentially be interested in the content. A target population belongs to one of two types: (i) a fixed, finite target population or (ii) a potential target population that grows in time, which we call the immigration process.
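The relation between the daily and cumulative forms of a metric can be sketched as follows. The function name is hypothetical, and normalising by the final cumulative value is an assumption about how a normalised view-count curve might be obtained, not a statement of the authors' procedure.

```python
from itertools import accumulate

def cumulative_normalised(daily_views):
    """Return (cumulative, normalised) series from daily view counts.

    The normalised series divides by the last cumulative value, so it ends
    at 1.0 (assumed normalisation).
    """
    cumulative = list(accumulate(daily_views))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("need at least one view to normalise")
    total = cumulative[-1]
    return cumulative, [c / total for c in cumulative]
```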

We present the distribution of models in Figure 7b. The same two models, the modified negative exponential and the modified Gompertz, cover almost the entire dataset, with about the same number of videos each. Given the reliability of these two models, this supports the effectiveness of the classification. The former is a non-viral model whereas the latter is viral, meaning that there is roughly a balance between viral and non-viral contents. Both highlight an immigration process, leading to the conclusion that many YouTube contents still attract viewers even after a long period. In general, the distribution of models within each category is not far from the distribution over all categories combined. However, the Education category is quite different: the modified Gompertz model dominates, with 75% of the videos. One might infer that Education is a word-of-mouth category, in which the dissemination of videos results mainly from viewers' influence, and that few of its videos benefit from advertising or from internal YouTube mechanisms such as the recommendation system. We next analyse the distribution of models with respect to the popularity of videos.

This dynamic appears to be relevant according to some examples in the dataset. This section describes how we use the models introduced in Section III to classify the YouTube contents in our dataset. In addition to the dynamics of view-count used for modelling, the features we consider for each video are the age (in number of days), the YouTube category and the popularity (i.e. the total number of views on the day of crawling). Table I lists the YouTube categories contained in the dataset, and Figure 3a shows their distribution. Table II summarises the values of the age and popularity metrics.

Figure 3: Distributions of some features from the YouTube dataset.

For the data fitting, we use only the cumulative evolution of view-count. We estimate the parameters of the models described in Section III using regression algorithms, both based on minimisation of the mean squares criterion. Let S be the expression for one model.
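As a hedged illustration of fitting by mean-squares minimisation, the sketch below fits a plain negative exponential S(t) = K(1 - exp(-λt)) to a cumulative view-count series. Both the model form and the search procedure are assumptions: a simple grid search over λ, with the ceiling K solved in closed form for each candidate, stands in for the regression algorithms, which this passage does not name. The data are synthetic.

```python
import math

def fit_negative_exponential(times, views, rates):
    """Least-squares fit of S(t) = K * (1 - exp(-lam * t)) (assumed form).

    For each candidate rate `lam`, the ceiling K minimising the squared
    error is K = sum(v * b) / sum(b * b), with b = 1 - exp(-lam * t).
    Returns the (lam, K, mse) triple with the smallest mean squared error.
    """
    best = None
    for lam in rates:
        basis = [1.0 - math.exp(-lam * t) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue  # degenerate candidate, e.g. lam == 0
        K = sum(v * b for v, b in zip(views, basis)) / denom
        mse = sum((v - K * b) ** 2 for v, b in zip(views, basis)) / len(times)
        if best is None or mse < best[2]:
            best = (lam, K, mse)
    if best is None:
        raise ValueError("no usable candidate rate")
    return best

# Synthetic, noise-free series generated from known parameters
# (illustrative only, not real YouTube data).
times = list(range(1, 101))
views = [1000.0 * (1.0 - math.exp(-0.05 * t)) for t in times]
rates = [i / 1000.0 for i in range(1, 201)]  # candidates 0.001 .. 0.200
lam, K, mse = fit_negative_exponential(times, views, rates)
```

Solving for K in closed form keeps the search one-dimensional. A general-purpose non-linear least-squares routine would be the more usual choice for models with several non-linear parameters.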