The Truth Initiative Foundation GIS Support
Problem
The impact of tobacco availability on behavior is not well understood for all populations.
Researchers have hypothesized that sociodemographic disparities in tobacco-related disease exist in part because of differing levels of exposure to tobacco retail outlets and in-store advertising.
Higher tobacco outlet density has been found to be associated with greater intentions to smoke among youth, increased tobacco use among youth aged 13-16, initiation of cigarette use among young adults, heavier smoking patterns among adults, and reduced quit attempts. In 2018, point-of-sale advertising represented the largest category of advertising expenditure for cigarette manufacturers in the U.S., and increased exposure to tobacco marketing has been found to be associated with smoking and with more positive community norms around tobacco use.
However, studies assessing sociodemographic disparities in the tobacco retail environment use non-spatial analytic techniques that fail to adequately capture the true availability of retail tobacco because of aggregation bias. Meanwhile, studies of in-store advertising rely on manual audits of individual stores, which require substantial resources, training, and time, and thus can be difficult to conduct in under-resourced communities.
Solution
Understanding geospatial aspects of tobacco availability across communities.
Using nationwide tobacco retailer location data provided by the Truth Foundation, NORC produced tobacco retailer density images using adaptive-bandwidth kernel density estimation (KDE), a non-parametric method that smooths spatially distributed point locations into a continuous surface by calculating the density of the points within a specified bandwidth. With adaptive-bandwidth KDE, the influence of each tobacco retail outlet is limited to the surrounding population of 1,000 people, so the bandwidth is narrower in densely populated areas and wider in sparsely populated ones. Thus, the resulting smooth, continuous tobacco retailer density surface accounted for the underlying population density.
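The adaptive-bandwidth idea can be illustrated with a minimal pure-Python sketch. This is not NORC's implementation. It assumes a Gaussian kernel and assumes that each outlet's bandwidth is the smallest radius that encloses 1,000 residents, with population supplied as hypothetical points such as block centroids. The function names, coordinates, and population counts are all made up for illustration.

```python
import math

def adaptive_bandwidth(outlet, pop_points, target_pop=1000, min_bandwidth=1e-6):
    """Smallest radius around `outlet` whose enclosed population reaches target_pop.

    pop_points is a list of (x, y, population) tuples (hypothetical centroids).
    """
    ox, oy = outlet
    by_distance = sorted(
        (math.hypot(px - ox, py - oy), count) for px, py, count in pop_points
    )
    running = 0
    for dist, count in by_distance:
        running += count
        if running >= target_pop:
            return max(dist, min_bandwidth)
    # Fewer than target_pop people in the whole study area: use the farthest point.
    return max(by_distance[-1][0], min_bandwidth)

def gaussian_kernel(dist, bandwidth):
    """2-D Gaussian kernel that integrates to 1 over the plane."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2) / (2 * math.pi * bandwidth ** 2)

def density_surface(outlets, pop_points, grid_x, grid_y, target_pop=1000):
    """Evaluate the adaptive KDE on a grid; returns (rows of densities, bandwidths)."""
    bandwidths = [adaptive_bandwidth(o, pop_points, target_pop) for o in outlets]
    surface = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            row.append(sum(
                gaussian_kernel(math.hypot(gx - ox, gy - oy), h)
                for (ox, oy), h in zip(outlets, bandwidths)
            ))
        surface.append(row)
    return surface, bandwidths

# Hypothetical data: a dense cluster near x=0 and a sparse area near x=10.
pop_points = [(0, 0, 600), (1, 0, 600), (10, 0, 100), (14, 0, 900)]
outlets = [(0, 0), (10, 0)]
surface, bandwidths = density_surface(outlets, pop_points, grid_x=[0, 5, 10], grid_y=[0])
print(bandwidths)  # [1.0, 4.0]
```

In this toy example the outlet in the dense cluster gets a bandwidth of 1.0 and the outlet in the sparse area gets 4.0, so the same single outlet produces a sharper, higher peak where people live closer together.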
To improve point-of-sale advertising data collection, NORC employed machine learning algorithms to collect data on the presence and location of tobacco point-of-sale advertising. The Truth Foundation collected images of the interiors of tobacco retailers. The clearest 694 photos were selected and used to create training and test data sets. NORC then used a pre-trained image classification network model, InceptionV3, to detect the presence of tobacco logos, as well as a unified object detection system, You Only Look Once (YOLO), to identify logo locations. NORC then employed the Python TensorFlow and Keras libraries to build a deep learning neural network to classify whether the images contained specific tobacco brands.
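The text does not say how the 694 photos were divided into training and test sets. The sketch below shows one common approach, a stratified split that keeps the share of each label similar in both sets. The 80/20 ratio, the binary "logo present" label, the file names, and the function name are assumptions for illustration only.

```python
import random
from collections import defaultdict

def stratified_split(labeled_items, test_fraction=0.2, seed=42):
    """Split (item, label) pairs so each label keeps a similar share in both sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for item, label in labeled_items:
        by_label[label].append(item)

    train, test = [], []
    for label, items in sorted(by_label.items()):
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test += [(item, label) for item in items[:n_test]]
        train += [(item, label) for item in items[n_test:]]
    return train, test

# Hypothetical stand-in for the 694 selected photos: every third one "has a logo".
photos = [(f"photo_{i:03d}.jpg", i % 3 == 0) for i in range(694)]
train_set, test_set = stratified_split(photos)
print(len(train_set), len(test_set))
```

Stratifying matters most with a small data set like this one, because a purely random split can leave a rare brand or label almost absent from the test set.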
Result
Gaining new insights into the relationships between tobacco availability and key outcomes.
Using tobacco retailer adaptive kernel density surfaces allowed NORC to create retailer density measures by demographic subgroup that could be extracted into Census geographies while mitigating any aggregation bias or edge effect concerns. These measures allowed NORC and the Truth Foundation to measure associations between neighborhood sociodemographics and retail density while accounting for spatial autocorrelation.
NORC’s point-of-sale advertising detection model was able to detect advertisements using familiar logos and colors from a small training dataset, and showed how machine learning techniques for efficient image classification and object detection could assist with many of the logistical challenges of collecting point-of-sale data.
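One way to picture extracting a density surface into Census geographies is a zonal summary over grid cells. The sketch below is an illustration under stated assumptions, not the method NORC used. It assumes the surface is stored as a grid, that each cell is tagged with the ID of the geography it falls in, and that a subgroup population count per cell serves as the weight. The function name, zone IDs, and all values are hypothetical.

```python
from collections import defaultdict

def zonal_weighted_mean(density, zone_ids, weights):
    """Population-weighted mean density for each zone.

    density, zone_ids, and weights are equally shaped 2-D lists (rows of cells).
    Cells whose zone ID is None lie outside every geography and are skipped.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for d_row, z_row, w_row in zip(density, zone_ids, weights):
        for d, zone, w in zip(d_row, z_row, w_row):
            if zone is None:
                continue
            numerator[zone] += d * w
            denominator[zone] += w
    # Zones with zero subgroup population have no defined weighted mean.
    return {z: numerator[z] / denominator[z] for z in denominator if denominator[z] > 0}

# Hypothetical 2 x 3 grid: retailer density, geography ID, subgroup residents per cell.
density = [[2.0, 4.0, 1.0],
           [6.0, 8.0, 3.0]]
zone_ids = [["A", "A", "B"],
            ["A", "B", None]]
subgroup_pop = [[10, 30, 5],
                [0, 15, 7]]

exposure = zonal_weighted_mean(density, zone_ids, subgroup_pop)
print(exposure)  # {'A': 3.5, 'B': 6.25}
```

Weighting by where a subgroup actually lives, rather than averaging all cells equally, means an empty corner of a geography does not dilute the density that residents experience.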
Project Leads
- Ned English, Associate Director
- Leah Christian, Senior Vice President
- Susan Paddock, Executive Vice President & Chief Scientist
Data & Findings
- "Image Processing for Public Health Surveillance of Tobacco Point-of-Sale Advertising: Machine Learning-Based Methodology." Journal Article | November 22, 2021
- "Sociodemographic Disparities in the Tobacco Retail Environment in Washington, DC: A Spatial Perspective." Journal Article | August 19, 2020