Lin, 2001).
4 TRAINING AND TEST DATA
The training and test data consisted of images found
on the WWW. A set of 17,300 examples was hand labeled
for training as either adult-content or not. The
images in this set were selected from websites known
to be family-safe and those known to contain adult
content; images from the latter set were then manually
labeled by the authors as to whether each was adult-content
or not. Of these 17,300 samples, 812 were
adult-content and 16,488 were not.
The test set consists of 51,960 sample images uniformly
collected from the Internet and manually labeled
by the authors. Of these images, 1,331 were
labeled as adult-content, and the remaining 50,629 as
not adult-content. The next section shows the
classification results for this data.
5 EXPERIMENTAL RESULTS
In this section, we describe the training of the SVM,
evaluate the contributions of the features,
and present comparisons with related work.
5.1 Training
Because the negative (non-adult-content) training
examples greatly outnumber the positive samples,
LIBSVM's weight setting for the positive examples
was set to 18 during training. While, for historical
reasons, this does not exactly match the ratio of
negative to positive examples, the results
of the training do not seem very sensitive to this parameter,
as long as the positive examples are given a
fairly large weight. An RBF kernel with a gamma parameter
of 0.05 was used, and all other parameters to
LIBSVM were left at their defaults. LIBSVM used a
total of 3,102 support vectors for the final classifier.
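The effect of the positive-class weight can be illustrated with a minimal numpy sketch of class-weighted hinge-loss training. Note this is only an illustration: the paper uses LIBSVM with an RBF kernel, while this sketch trains a linear SVM by subgradient descent, and all data, learning-rate, and regularization values below are placeholders.

```python
import numpy as np

def train_weighted_svm(X, y, pos_weight=18.0, lr=0.01, reg=0.01, epochs=200):
    """Linear SVM via subgradient descent on a weighted hinge loss.
    Labels y are in {-1, +1}; positive examples contribute
    `pos_weight` times as much loss as negative ones."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        weights = np.where(y > 0, pos_weight, 1.0)
        active = margins < 1                     # margin violators
        grad_w = reg * w - (weights[active] * y[active]) @ X[active] / len(y)
        grad_b = -np.sum(weights[active] * y[active]) / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Imbalanced toy data: ~5% positives, shifted so they are learnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.where(np.arange(1000) < 50, 1, -1)
X[y == 1] += 1.5
w, b = train_weighted_svm(X, y)
recall = np.mean((X[y == 1] @ w + b) > 0)  # fraction of positives caught
```

Without the large positive weight, the hinge loss is dominated by the abundant negatives and the classifier tends to sacrifice recall on the rare positive class.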
5.2 Testing
To see the impact of each set of features on the performance
of the system, a number of tests were conducted.
We began by training an SVM with only the
basic skin color features (the mean and standard deviation
of the skin map in the ROI), and then used the
SVM to produce scores for each of the test images.
From these scores, an ROC curve plotting the error
rate on adult-content images vs. the error rate on
family-safe images could be produced. This curve is
the upper curve in the graph of Figure 6a. For this
type of graph, a better result means the curve comes
closer to the origin.
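The error-tradeoff curve described above can be traced by sweeping a threshold over the classifier's scores. A small sketch, using synthetic scores and labels as stand-ins for the SVM outputs and ground truth:

```python
import numpy as np

def error_tradeoff(scores, labels):
    """For each threshold, return (error rate on positives,
    error rate on negatives): the two axes of the ROC plots."""
    points = []
    for t in np.sort(np.unique(scores)):
        pred = scores >= t
        pos_err = np.mean(~pred[labels == 1])   # missed adult-content
        neg_err = np.mean(pred[labels == 0])    # false alarms on safe images
        points.append((pos_err, neg_err))
    return points

rng = np.random.default_rng(1)
labels = np.array([1] * 50 + [0] * 950)
scores = rng.normal(size=1000) + labels         # positives score higher
curve = error_tradeoff(scores, labels)
```

A better feature set shifts every point of this curve toward the origin, which is exactly the comparison made between the curves of Figure 6.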
We then added batches of features in steps, retraining
the SVM and computing the resulting ROC
curve each time. The results are shown together in Figure 6a
(full range) and 6b (zoomed). Each curve represents
adding one batch of features (described in a paragraph
in Section 3.1) to the set of features used by
the previous curve, with the exception of the last two
curves. The “Face Detection” curve adds features for
the number of faces and the largest face, modifies the skin
color model using the face pixel samples, and also
modifies the skin map by removing the face rectangles.
The “All Features” curve uses both versions of
the features, with and without the faces removed from
the skin map, which improves accuracy slightly.
As can be seen from Figure 6, the accuracy improves
as more features are added. However, there
appear to be diminishing returns from the types of
straightforward features we are using.
We evaluated the speed of the algorithm on a corpus
of around 1.5 billion thumbnail images
of less than 12 pixels. Processing the entire
corpus took less than 8 hours using 2,500 computers,
for an overall throughput of around 20 images per second
per computer.
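The quoted throughput is consistent with the corpus size and wall-clock time; a quick check of the arithmetic:

```python
# Per-machine throughput implied by the reported numbers.
images = 1.5e9
machines = 2500
hours = 8
per_machine_per_second = images / (machines * hours * 3600)
print(round(per_machine_per_second, 1))  # ≈ 20.8 images/sec/machine
```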
[Figure 6 plots, “Impact of Adding Features”: axes are Error Rate on Adult-Content vs. Error Rate on Non-Adult-Content. Curves, in order of feature addition: Skin Color Detection, Connected Component Analysis, Skin Texture Features, Lines, Image Features, Entropy Features, Clutter Features, Face Detection, All Features. Panel (a) covers the full 0–1 range; panel (b) is zoomed to the 0.15–0.3 region.]
Figure 6: This figure shows the ROC curves as more features
are added to the overall feature set. (a) shows the
curves at their original scale, while (b) is zoomed in to show
the differences between the feature sets more clearly.
5.3 Comparison with Related Work
Although we have not been able to locate any two papers
which use the same test sets, it may be useful
to examine the reported error rates which have been
published. A numerical summary of related results is
shown in Table 1. Each of these numbers is drawn
from the paper, or estimated from an ROC curve presented
in the paper. The same results are plotted on
a graph in Figure 7, along with the ROC curve from
our system. In addition, the ROC curve for “Zheng
(our data)” evaluates the work of (Zheng et al., 2004)
using its open source implementation in (di Linguistica
Computazionale, 2004) applied to our test set.
The results presented by (Arentz and Olstad, 2004)
are interesting because in addition to showing the accuracy
for a general set of images, results are also
reported for a test set containing portrait pictures of
people. As might be expected, the false alarm rate on
these pictures is higher, as they contain a significant
amount of skin color while not being adult-content.
It is expected that our system's use of a face detector
may eliminate some of these false alarms.

Table 1: A summary of accuracy results (on different test
sets) from related work.

Citation                                  Positive error rate   Negative error rate
(Ioffe and Forsyth, 1998)                 49%                   10%
(Fleck et al., 1996)                      56%                   2%
(Wang et al., 1998)                       4%                    9%
(Arentz and Olstad, 2004) (all images)    4.6%                  12%
(Arentz and Olstad, 2004) (portraits)     4.6%                  26.5%
(Duan et al., 2002)                       19.3%                 10%
(Zeng et al., 2004)                       23.5%                 5%
(Jones and Rehg, 1998b)                   14.2%                 7.5%
(Zheng et al., 2004)                      9%                    20%
[Figure 7 plot, “Comparison with Related Work”: axes are Error Rate on Adult-Content vs. Error Rate on Non-Adult-Content. Entries: Ioffe, Fleck, Wang, Arentz (general), Arentz (portraits), Duan, Zeng, Jones, Zheng (their data), Zheng (our data), Our System.]
Figure 7: Graphical summary of accuracy results (on different
test sets) from related work, as well as the ROC curve
from our system for comparison purposes. The only direct
comparison which can be made is between Our System and
Zheng (our data), and it can be seen that our system has
significantly better accuracy.
As can be seen, there is wide variation in the reported
accuracies of the systems that have been published.
Given that many of these systems are heavily
dependent on skin color models and features that
are very similar across systems, it is quite likely that
the wide range of differences in accuracy can be attributed
to differences in the training and test set distributions.
This underscores the importance of creating
a consistent test set with which to measure
progress in this field. Using the WWW as the source
of test images, as we did, creates a very difficult, real-world
test case. Not only is the content extremely diverse,
but the quality, resolution, color balance, and
brightness vary greatly per image, increasing the difficulty
of this problem dramatically. It is worth noting
that many of the false negatives in our results are due
to grayscale or cartoon adult-content images, which
cannot be detected by the algorithms described here.
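Since the skin-color features carry no signal on grayscale images, such images could at least be flagged up front and routed to a different classifier. A minimal sketch of such a check; the channel-equality test below is an assumption for illustration, not part of the paper's method:

```python
import numpy as np

def is_grayscale(rgb):
    """True if all three channels are (nearly) identical,
    i.e. the image carries no color information for a skin model."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.allclose(r, g) and np.allclose(g, b)

# Tiny synthetic examples: a flat gray image and a reddish one.
gray = np.stack([np.full((4, 4), 128)] * 3, axis=-1)
color = gray.copy()
color[..., 0] = 200
print(is_grayscale(gray), is_grayscale(color))  # True False
```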
6 SUMMARY AND FUTURE
WORK
The results above indicate that the system is able to
detect roughly 50% of the adult-content images in a
small test set, with roughly 10% of the safe images
being incorrectly marked as adult-content; or, at a different
threshold, detecting 90% of adult-content images
with a false alarm rate of 35%. However, since
safe images significantly outnumber those with adult-content,
this leads to a large number of false alarms,
indicating there is much work remaining to be done.
The system described above has been incorporated
into Google's adult-content filtering infrastructure,
and is now in active use for image safe-search.
There are a number of potential ways to improve
this work. Face image processing can determine the
age (Lanitis et al., 2004) and gender (Moghaddam and
Yang, 2000; Baluja and Rowley, 2005) of people in
the image, which may turn out to be useful features,
perhaps simply as priors. More elaborate representations
of the shape of the skin color blobs, such as
those used in (Wang et al., 1998), may be helpful without
significantly increasing the computational cost.
Better measures of texture beyond simple edge detection
may improve the recognition of skin regions.
Finally, image similarity matching against a database
of adult-content image signatures (using ideas similar
to (Tieu and Viola, 2000; Lowe, 2004)) may be useful.
Any image-based system can be used to complement
the results of a keyword based lter, since each
approach has its own strengths and weaknesses.
To end on a practical note, because of the ubiquity
of the Internet, search engines, and the widespread
proliferation of electronic images, adult-content detection
is an extremely important problem to address.
To improve the rate of progress in this field, it would
be useful to establish a large fixed test set which can
be used by both researchers and commercial ventures.
Another avenue to pursue is the creation of an annual
competition for filtering systems. Unfortunately,
the images used in this paper were obtained from
the WWW, and likely contain copyrighted content,
so cannot be redistributed. Nonetheless, the authors
look forward to collaborating with other researchers,
both in terms of the algorithms and approaches taken,
as well as ideas to promote more interest and faster
progress in this field.
REFERENCES
Arentz, W. A. and Olstad, B. (2004). Classifying offensive
sites based on image content. In Computer Vision and
Image Understanding.
Baluja, S. and Rowley, H. A. (2005). Boosting sex identification
performance. In Innovative Applications of
Artificial Intelligence, Pittsburgh, PA, USA.
Bradski, G. (2000). Programmer's Toolchest: The OpenCV
Library. Software available at
http://www.intel.com/research/mrl/research/opencv/index.htm.
Canny, J. (1986). A computational approach to edge detection.
IEEE Transactions on Pattern Analysis and
Machine Intelligence, 8(6).
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library
for support vector machines. Software available at
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
di Linguistica Computazionale, I. (2004). POESIA: Public
Open-source Environment for a Safer Internet Access.
Software available at http://www.poesia-filter.org,
alpha 2 release, poesieasoft_snapshot_25july2003.tgz.
Duan, L., Cui, G., Gao, W., and Zhang, H. (2002). Adult
image detection method base-on skin color model and
support vector machine. In Asian Conference on Computer
Vision, pages 797–800, Melbourne, Australia.
Fleck, M. M., Forsyth, D. A., and Bregler, C. (1996). Finding
naked people. In European Conference on Computer
Vision.
Hunke, H. M. (1994). Locating and tracking of human faces
with neural networks. Master's thesis, University of
Karlsruhe.
Ioffe, S. and Forsyth, D. (1998). Learning to find pictures
of people. In Neural Information Processing Systems.
Ioffe, S. and Forsyth, D. (1999). Finding people by sampling.
In International Conference on Computer Vision.
Jones, M. J. and Rehg, J. M. (1998a). Statistical color models
with applications to skin detection. International
Journal of Computer Vision, 46(1):81–96.
Jones, M. J. and Rehg, J. M. (1998b). Statistical color models
with applications to skin detection. Technical report,
Compaq Cambridge Research Laboratory.
Kiryati, N., Eldar, Y., and Bruckstein, A. M. (1991). A
probabilistic Hough transform. Pattern Recognition,
2(4):303–316.
Kruppa, H., Bauer, M. A., and Schiele, B. (2002). Skin
patch detection in real-world images. In Annual Pattern
Recognition Symposium DAGM.
Lanitis, A., Draganova, C., and Christodoulou, C. (2004).
Comparing different classifiers for automatic age
estimation. IEEE Transactions on Systems, Man, and
Cybernetics, 34(1).
Lienhart, R. and Maydt, J. (2003). Empirical analysis of detection
cascades of boosted classifiers for rapid object
detection. In Annual Pattern Recognition Symposium
DAGM.
Lowe, D. G. (2004). Distinctive image features from scale-invariant
keypoints. International Journal of Computer
Vision, 60(2):91–110.
Moghaddam, B. and Yang, M.-H. (2000). Sex with support
vector machines. In Neural Information Processing
Systems.
Rosenberg, C., Minka, T., and Ladsariya, A. (2003).
Bayesian color constancy with non-gaussian models.
In Neural Information Processing Systems.
Sprague, N. and Luo, J. (2002). Clothed people detection in
still images. In International Conference on Pattern
Recognition.
Tieu, K. and Viola, P. (2000). Boosting image retrieval. In
Computer Vision and Pattern Recognition.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In Computer
Vision and Pattern Recognition.
Wang, J. Z., Li, J., Wiederhold, G., and Firschein, O.
(1998). System for screening objectionable images.
In Computer Communications Journal.
Zeng, W., Gao, W., Zhang, T., and Liu, Y. (2004). Image
guarder: An intelligent detector for adult images. In
Asian Conference on Computer Vision, pages 1080–
1084, Jeju Island, Korea.
Zheng, H., Daoudi, M., and Jedynak, B. (2004). Blocking
adult images based on statistical skin detection. Electronic
Letters on Computer Vision and Image Analysis,
4(2):1–14.