
Table 2 Research using ML to detect eating disorder risk via social media and internet data

From: Potential benefits and limitations of machine learning in the field of eating disorders: current research and future directions



| Study | Sample | Predictor variables | Outcome variables | Best performing ML approach | Explanatory power of best performing model | Limitations |
| --- | --- | --- | --- | --- | --- | --- |
| Benítez-Andrades et al. [47] | 494,025 posts containing ED-related words on Twitter | Posts with ED-related words | (1) Posts written by users who identify online as suffering from EDs; (2) posts that promoted having an ED; (3) informative posts; (4) scientific posts | Bidirectional encoder representations from transformers (BERT)-based models (RoBERTa) | (1) 83%; (2) 89%; (3) 84%; (4) 94% | Outcome based on content of posts, not validated measures of EDs; only identifies content of people who publicly acknowledge their ED on Twitter; specific predictors unknown |
| Chancellor et al. [40] | 62,000 Instagram posts, either removed pro-ED content or ED content that remained publicly available | Combinations and frequencies of different ED hashtags and captions | Whether the post was removed or still publicly available | Logistic regression | | Only identifies content of people who publicly acknowledge their ED on Instagram |
| Chancellor et al. [41] | 26 million posts from 100,000 users who post pro-ED content on Instagram | Mental illness severity (MIS; low, medium, high) in a user's previous posts based on the content of hashtags | MIS (low, medium, high) based on the content of hashtags | Multinomial logistic regression | | MIS inferred from posts, not validated measures |
| Chancellor et al. [39] | 877,000 pro-ED photo posts shared on Tumblr, 569 of which were removed by Tumblr | Text, hashtag, and photo content from Tumblr posts | Whether the post would be/was removed by Tumblr for violating community guidelines | Deep neural network | | Specific predictors unknown |
| De Choudhury [37] | 55,334 posts collected from 18,923 Tumblr blogs that mentioned common ED and anorexia symptomatology tags | Social, affective, cognitive, and linguistic style expression in posts | (1) Whether a post shares any kind of anorexia-related content; (2) whether a post relates to the proana or the pro-recovery community | Support vector machine classifier | (1) 83%; (2) 81% | Outcome based on content of posts, not validated measures of EDs; only identifies content of people who publicly acknowledge their ED on Tumblr |
| Hwang et al. [45] | 185,950 posts and 3,528,107 comments from a weight management subcommunity on Reddit | 4 types of emotional eating behaviours and 5 types of feedback based on the Latent Dirichlet Allocation topic modelling method | Emotional eating diagnosis based on authors' expertise | Stochastic gradient descent | | Outcome based on content of posts, not validated measures of EDs |
| Sadeh-Sharvit et al. [48] | 231 adult women on Prolific who contributed their internet browsing history over the past 6 months | Keywords related to EDs, daily visits to social media, fraction of searches on Google or Bing, activity rates, participant age | ED status (clinical/subclinical ED, high risk for an ED, or no ED) based on responses to validated surveys | | | Small sample size |
| Wang et al. [44] | 119,825,361 posts on Twitter from 72,047 users, of which 1,797,239 posts were from 3,380 users who self-identify with an ED on Twitter | User engagement and activity, posting preference, interaction diversity, psychometric properties of posts | ED status (ED or non-ED user) | Support vector machine | | ED status determined by self-identification of an ED on Twitter; only identifies content of people who publicly acknowledge their ED on Twitter |
| Yan et al. [46] | 4,759 posts from 6 ED-related subcommunities on Reddit | Relationships between keywords within the text of each post | Whether users need immediate mental health support for an ED based on expertise of two clinical psychologists | Logistic regression | | Required human coders with extensive expertise |
| Zhou et al. [43] | 18,288 posts on Twitter with ED-related words | ED-related words in posts | ED-related topic clusters/themes | Correlation Explanation (CorEx) topic model | | Outcome based on content of posts, not validated measures of EDs; only identifies content of people who publicly acknowledge their ED on Twitter |
| Zhou et al. [42] | 123,977 posts on Twitter with ED-related words | Posts with ED-related words | (1) ED-relevant and ED-irrelevant posts; (2) ED-promotional and education and ED-laypeople posts | Convolutional neural network (CNN) and long short-term memory (LSTM) | (1) 89%; (2) 90% | Outcome based on content of posts, not validated measures of EDs; only identifies content of people who publicly acknowledge their ED on Twitter |

ED, eating disorder; ML, machine learning