Recognizing emotions via an automated system plays a vital role in a variety of applications such as treating behavioral disorders, video surveillance, mood tracking, and human-computer interaction. Identifying expressions in adults is already challenging, especially during the transition between two emotions, and children express emotions in more complex ways, which makes the recognition task even more difficult. In this research, we develop a deep learning approach to detect emotions using visual datasets. We construct a Convolutional Neural Network (CNN) to predict happy and neutral emotions, the two most common expressions among children, after detecting children’s faces in a video. The CNN is a custom VGG13 network consisting of 10 convolution layers interleaved with max pooling and dropout layers. This network is trained on the Facial Expression Recognition (FER+) dataset, an enhanced version of a well-known face dataset with eight annotated emotions covering people of different demographics and ages. We explicitly test the proposed approach on a smaller dataset of children’s faces, captured during a study of the interaction between children and robots; this test dataset is manually labeled with neutral and happy emotions. The proposed approach achieves an accuracy of 88.79% in predicting the two emotions on children’s faces.
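To make the "10 convolution layers interleaved with max pooling" description concrete, the following is a minimal shape-tracing sketch of a VGG13-style layout. The block layout (2+2+3+3 convolutions), the channel counts, and the 64×64 grayscale input size are assumptions for illustration, not the authors' exact configuration:

```python
def conv2d_out(size, kernel=3, stride=1, pad=1):
    # Spatial output size of a 3x3 "same"-padded convolution.
    return (size + 2 * pad - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    # Spatial output size of a 2x2 max pooling layer.
    return (size - kernel) // stride + 1

# Hypothetical VGG13-style layout: (number of convs, channels) per block,
# each block followed by max pooling (and, in training, dropout).
blocks = [(2, 64), (2, 128), (3, 256), (3, 256)]  # 2+2+3+3 = 10 conv layers

size = 64  # assumed 64x64 grayscale face crop
for n_convs, channels in blocks:
    for _ in range(n_convs):
        size = conv2d_out(size)   # 3x3 conv with padding keeps spatial size
    size = maxpool_out(size)      # pooling halves each spatial dimension

# Feature map entering the fully connected layers under these assumptions:
print(size, channels)  # → 4 256
```

Each pooling step halves the spatial resolution (64 → 32 → 16 → 8 → 4), so the dense classification head at the end sees a compact 4×4×256 feature map from which the happy/neutral prediction is made.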