CHAPTER 1
INTRODUCTION

1.1 MACHINE LEARNING

Machine learning is an application of artificial intelligence that gives systems the ability to learn and improve from experience automatically, without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves. Learning can be supervised, semi-supervised or unsupervised.

Supervised learning trains the machine on data that is well labelled, meaning each example is already tagged with the correct answer. The machine is then given a new set of examples, so that the supervised learning algorithm, having analysed the training data, produces a correct outcome from the labelled data.

Unsupervised learning trains the machine on information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences, without any prior training on the data.

Semi-supervised learning covers problems where there is a large amount of input data but only some of it is labelled. These problems sit between supervised and unsupervised learning. An example is a photo archive where only some of the images are labelled and the majority are unlabelled.

1.2 OPENCV

OpenCV is an open-source computer vision and machine learning software library. OpenCV provides a common infrastructure for computer vision applications and accelerates the use of machine perception in commercial products. The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken with flash, follow eye movements, and recognize scenery and establish markers to overlay it with augmented reality.

OpenCV-Python makes use of NumPy, a highly optimized library for numerical operations with a MATLAB-style syntax. All OpenCV array structures convert to and from NumPy arrays, as the short sketch below illustrates.
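To make this interoperability concrete, here is a minimal sketch (the file name image.jpg is only a placeholder): an image loaded with OpenCV is an ordinary NumPy array and can be processed with NumPy operations directly.

import cv2
import numpy as np

# cv2.imread returns a NumPy ndarray (BGR channel order), or None on failure
img = cv2.imread('image.jpg')
if img is not None:
    print(type(img), img.shape, img.dtype)  # <class 'numpy.ndarray'> (h, w, 3) uint8
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Plain NumPy operations work on the same data
    inverted = 255 - gray
    print('mean intensity:', np.mean(gray))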
1.3 FACE RECOGNITION SYSTEM

A facial recognition system is a technology capable of identifying or verifying a person from a digital image or a video frame from a video source. There are multiple methods by which facial recognition systems work, but in general they compare selected facial features from a given image with faces within a database. The values measured for the variables associated with points of a person's face help in uniquely identifying or verifying the person. With this technique, applications can use data captured from faces to identify target individuals accurately and quickly.

Facial recognition techniques are evolving quickly, with new approaches such as 3-D modelling helping to overcome issues with existing techniques. There are many advantages associated with facial recognition. Compared to other biometric techniques, facial recognition is non-contact in nature. Face images can be captured from a distance and analysed without ever requiring any interaction with the user. As a result, no user can successfully imitate another person. Facial recognition can serve as an excellent security measure for time tracking and attendance. Facial recognition is also a cheap technology, as it involves less processing than other biometric techniques.

CHAPTER 2
LITERATURE REVIEW

Matthew Turk and Alex Pentland (1991) proposed an approach that treats face recognition as an intrinsically two-dimensional (2-D) recognition problem rather than one requiring recovery of three-dimensional geometry, taking advantage of the fact that faces are normally upright and may therefore be described by a small set of 2-D characteristic views. The system functions by projecting face images onto a feature space that spans the significant variations among known face images. The significant features are known as "eigenfaces", because they are the eigenvectors (principal components) of the set of faces; they do not necessarily correspond to features such as eyes, ears and noses. The projection operation characterizes an individual face by a weighted sum of the eigenface features, so to recognize a particular face it is necessary only to compare these weights to those of known individuals. Particular advantages of the approach are that it can learn and later recognize new faces in an unsupervised manner, and that it is easy to implement using a neural network architecture.

Paul Viola and Michael Jones (2001) describe a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image", which allows the features used by the detector to be computed very quickly. The second is a simple and efficient classifier built using the AdaBoost learning algorithm. The third is a method for combining increasingly complex classifiers in a cascade, which allows background regions of the image to be discarded quickly so that more computation can be spent on promising, face-like regions.

CHAPTER 3
SYSTEM SPECIFICATION

3.1 HARDWARE REQUIREMENT

Processor        : Intel
Processor Speed  : 1.80 GHz
Hard Disk        : 1 TB
RAM              : 4 GB

3.2 SOFTWARE REQUIREMENT

Language         : Python
Software         : Visual Studio Code
Operating System : Windows 10

3.3 SOFTWARE DESCRIPTION

3.3.1 Python

Python is a general-purpose interpreted, interactive, object-oriented, high-level programming language. Python source code is available under the GNU General Public License (GPL). It provides constructs that enable clear programming on both small and large scales. Python features a dynamic type system and automatic memory management. It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library. Python is open-source software and has a community-based development model. Python and CPython are managed by the non-profit Python Software Foundation. Its features include:

● Easy-to-learn − Python has few keywords, a simple structure, and a clearly defined syntax. This allows the student to pick up the language quickly.
● Easy-to-read − Python code is clearly defined and visible to the eyes.
● Easy-to-maintain − Python's source code is fairly easy to maintain.
● A broad standard library − the bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows and Macintosh.
● Interactive mode − Python has support for an interactive mode which allows interactive testing and debugging of snippets of code.
● Portable − Python can run on a wide variety of hardware platforms and has the same interface on all platforms.
● Databases − Python provides interfaces to all major commercial databases.
● GUI programming − Python supports GUI applications that can be created and ported to many system calls, libraries and window systems, such as Windows MFC, Macintosh, and the X Window System of Unix.
● Scalable − Python provides better structure and support for large programs than shell scripting.

3.3.2 OPENCV

Python is a general-purpose programming language started by Guido van Rossum which became very popular in a short time, mainly because of its simplicity and code readability. It enables the programmer to express ideas in fewer lines of code without reducing readability. The support of NumPy makes the task easier. NumPy is a highly optimized library for numerical operations with a MATLAB-style syntax. All OpenCV array structures are converted to and from NumPy arrays, so any operation that can be done in NumPy can be combined with OpenCV. Several other libraries that support NumPy, such as SciPy and Matplotlib, can also be used with OpenCV.

3.3.3 XLSXWRITER

XlsxWriter is a Python module for writing files in the XLSX file format. It can be used to write text, numbers and formulas to multiple worksheets, and it supports features such as formatting, images, charts, page setup, autofilters, conditional formatting and many others.

Advantages
• It supports more Excel features than any of the alternative modules.
• It has a high degree of fidelity with files produced by Excel. In most cases the files produced are 100% equivalent to files produced by Excel.
• It has extensive documentation, example files and tests.
• It is fast and can be configured to use very little memory, even for very large output files.

Disadvantages
• It cannot read or modify existing Excel XLSX files.

A minimal usage sketch is shown below.
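The following sketch shows the basic XlsxWriter workflow used later in this project; the file name report.xlsx and the cell contents are placeholders, not the project's actual attendance data. A workbook is created, a worksheet is added, cells are written, and the workbook is closed to save the file.

import xlsxwriter

# Create a workbook and add a worksheet
workbook = xlsxwriter.Workbook('report.xlsx')
worksheet = workbook.add_worksheet()

# Optional cell formatting
bold = workbook.add_format({'bold': True})

# Write a header row and a few data cells: write(row, col, value[, format])
worksheet.write(0, 0, 'NAME', bold)
worksheet.write(0, 1, 'STATUS', bold)
worksheet.write(1, 0, 'student1')
worksheet.write(1, 1, 'PRESENT')

# close() writes the file to disk; nothing is saved until it is called
workbook.close()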
CHAPTER 4
PROPOSED SYSTEM

4.1 INTRODUCTION

Taking attendance of students is an important task in class. When done manually, it generally wastes a lot of productive class time. The proposed solution to this problem is to automate the attendance system using face recognition. The face is the primary means of identification for any human. A face database is collected in order to recognize the faces of the students: the system is initially trained with the students' faces, which are collectively known as the student database. The project describes a method for detecting and recognizing faces in real time, using an efficient algorithm built on the open-source image processing framework OpenCV. The project can be reused for many other applications where face recognition is needed for authentication.

4.2 MODULE DESCRIPTION

The project has four modules: Face Training, Face Detection, Face Recognition, and Attendance Import to Excel Sheet.

Fig 4.1: Module Diagram

4.2.1 FACE TRAINING

Many images are used for training a face recognizer so that it can learn different looks of the same person, for example with glasses, without glasses, laughing, sad, happy, crying, with a beard, without a beard, and so on. The training data consists of 2 persons with 12 images of each person. All training data is inside the training-data folder, which contains one folder for each person. Each folder is named with the format sLabel (e.g. s1, s2), where Label is the integer label assigned to that person. For example, a folder named s1 contains the images of person 1. The directory structure tree for the training data is as follows:

training-data
|-- Prem
|   |-- 1.jpg
|   |-- ...
|   |-- 12.jpg
|-- Sanjeev
|   |-- 1.jpg
|   |-- ...
|   |-- 12.jpg

The test-data folder contains images that will be used to test the face recognizer after it has been successfully trained. Since the OpenCV face recognizer accepts labels as integers, a mapping between the integer labels and the persons' actual names has to be defined.

Preparation of training data

For example, if the training data has 2 persons with 2 images for each person:

PERSON-1    PERSON-2
img1        img1
img2        img2

then the preparation step will produce the following face and label vectors:

FACES                 LABELS
person1_img1_face     1
person1_img2_face     1
person2_img1_face     2
person2_img2_face     2

The data preparation step can be divided into the following sub-steps (a sketch follows the list):

1. Read all the folder names of the subjects/persons in the training data folder (here: s1, s2) and extract the label number of each subject. Folder names follow the format sLabel, where Label is the integer label assigned to that subject; for example, the folder name s1 means that the subject has label 1, s2 means the subject has label 2, and so on. The label extracted in this step is assigned to every face detected in the next step.
2. Read all the images of the subject and detect the face in each image.
3. Add each face to the faces vector and the corresponding subject label to the labels vector.
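The following is a minimal sketch of these sub-steps, assuming the sLabel folder layout described above and the frontal-face Haar cascade bundled with the opencv-python package. detect_face is a hypothetical helper, not part of the project code listed in Appendix 1.

import os
import cv2

# Frontal-face Haar cascade bundled with the opencv-python package
CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detect_face(img):
    # Hypothetical helper: return the first detected face, cropped, in grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = CASCADE.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return gray[y:y + h, x:x + w]

def prepare_training_data(data_folder):
    # Step 1: walk the sLabel folders and extract the integer label
    faces, labels = [], []
    for dir_name in os.listdir(data_folder):
        if not dir_name.startswith('s'):
            continue                              # ignore anything not named sLabel
        label = int(dir_name.replace('s', ''))    # 's1' -> 1, 's2' -> 2, ...
        subject_path = os.path.join(data_folder, dir_name)
        # Step 2: read each image of the subject and detect the face in it
        for image_name in os.listdir(subject_path):
            img = cv2.imread(os.path.join(subject_path, image_name))
            if img is None:
                continue                          # skip non-image files
            face = detect_face(img)
            # Step 3: add the face and its label to the two vectors
            if face is not None:
                faces.append(face)
                labels.append(label)
    return faces, labels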
4.2.2 FACE DETECTION

A computer program that decides whether an image is a positive image (a face image) or a negative image (a non-face image) is called a classifier. OpenCV provides two pre-trained, ready-to-use face detection classifiers:

1. Haar classifier
2. LBP classifier

4.2.2.1 Haar Classifier

The Haar classifier is a machine learning based approach, an algorithm created by Paul Viola and Michael Jones, which is trained from many positive images (with faces) and negative images (without faces). It starts by extracting Haar features from each image, as shown by the windows below (see https://en.wikipedia.org/wiki/Haar-like_features):

Fig 4.2: Haar Classifier Features

For example, the image above illustrates two features. The first focuses on the property that the region of the eyes is often darker than the area of the nose and cheeks. The second relies on the property that the eyes are darker than the bridge of the nose.

4.2.2.2 LBP Cascade Classifier

Like any other classifier, the LBP (Local Binary Patterns) cascade classifier also needs to be trained on hundreds of images. LBP is composed of micro visual patterns, and LBP features are extracted to form a feature vector that classifies a face from a non-face. Each training image is divided into blocks, as shown in the picture below.

Fig 4.3: LBP Classifier Conversion

Fig 4.4: Difference between Haar and LBP Classifiers

Both classifiers are loaded and applied through the same OpenCV interface, as the sketch below shows.
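A minimal detection sketch, assuming the pre-trained frontal-face Haar cascade bundled with the opencv-python package and a placeholder input file name; the LBP cascade (lbpcascade_frontalface.xml) would be used the same way once its XML file is available.

import cv2

# Load a pre-trained frontal-face cascade bundled with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('group_photo.jpg')           # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # detection runs on grayscale

# Returns a list of (x, y, w, h) rectangles, one per detected face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Detected faces', img)
cv2.waitKey(0)
cv2.destroyAllWindows()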
4.2.3 FACE RECOGNITION

A face recognition system can operate in two basic modes:

● Authentication of a facial image: the input facial image is compared with the facial image of the user requesting authentication.
● Identification (facial recognition): the input facial image is compared with all facial images in a dataset, with the aim of finding the user whose face matches.

There are different types of face recognition algorithms, for example:

● Eigenfaces (1991)
● Local Binary Patterns Histograms (LBPH) (1996)
● Fisherfaces (1997)

Each method takes a different approach to extracting the image information and performing the matching with the input image.

4.2.3.1 Local Binary Patterns Histograms (LBPH)

Local Binary Pattern (LBP) is a simple yet very efficient texture operator which labels the pixels of an image by thresholding the neighbourhood of each pixel and treating the result as a binary number. LBP is a powerful feature for texture classification. It has further been determined that when LBP is combined with the Histograms of Oriented Gradients (HOG) descriptor, it improves detection performance considerably on some datasets. Using LBP combined with histograms, a face image can be represented with a simple data vector. Since LBP is a visual descriptor, it can also be used for face recognition tasks, as can be seen in the following step-by-step explanation.

The LBPH algorithm uses 4 parameters (a sketch of how they are passed to OpenCV follows the list):

● Radius: used to build the circular local binary pattern; it represents the radius around the central pixel. It is usually set to 1.
● Neighbors: the number of sample points used to build the circular local binary pattern. The more sample points, the higher the computational cost. It is usually set to 8.
● Grid X: the number of cells in the horizontal direction. The more cells, the finer the grid and the higher the dimensionality of the resulting feature vector. It is usually set to 8.
● Grid Y: the number of cells in the vertical direction. The more cells, the finer the grid and the higher the dimensionality of the resulting feature vector. It is usually set to 8.
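A minimal sketch of how these parameters map onto OpenCV's LBPH recognizer; this requires the contrib face module (available via the opencv-contrib-python package), and the training data below is placeholder-shaped, only to show the expected types.

import cv2
import numpy as np

# The four LBPH parameters map directly onto the constructor arguments;
# the values below are the common defaults described above
recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=1, neighbors=8, grid_x=8, grid_y=8)

# faces: list of grayscale face images; labels: matching integer IDs
# (placeholder data here; real faces/labels would come from a data
# preparation step like the one sketched earlier)
faces = [np.zeros((112, 92), dtype=np.uint8)]
labels = np.array([1], dtype=np.int32)  # OpenCV expects 32-bit integer labels
recognizer.train(faces, labels)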
Training the algorithm: First, the algorithm has to be trained, using a dataset with the facial images of the people to be recognized. An ID (a number or the name of the person) has to be set for each image, so that the algorithm can use this information to recognize an input image and produce an output. Images of the same person must have the same ID.

Applying the LBP operation: The first computational step of LBPH is to create an intermediate image that describes the original image in a better way, by highlighting its facial characteristics. To do so, the algorithm uses the concept of a sliding window, based on the radius and neighbors parameters. The image below shows this procedure:

Fig 4.5: The LBP Operation

Based on the image above, the process can be broken into several small steps (a NumPy sketch of the thresholding step follows the list):

● A part of the facial image, in grayscale, is taken as a window of 3x3 pixels, represented as a 3x3 matrix containing the intensity of each pixel (0-255).
● The central value of the matrix is taken as the threshold used to define the new values of the 8 neighbors.
● For each neighbor of the central value (the threshold), a new binary value is set: 1 for values equal to or higher than the threshold, and 0 for values lower than the threshold.
● The matrix now contains only binary values (ignoring the central value). These binary values are concatenated, line by line, into a new binary number (e.g. 10001101). Note: some authors concatenate the binary values in other orders (e.g. clockwise), but the final result is equivalent.
● This binary number is then converted to a decimal value, which is set as the new central value of the matrix, i.e. as a pixel of the intermediate image.
● At the end of this procedure (the LBP procedure), a new image that better represents the characteristics of the original image is obtained. For a circular neighbourhood, when a sample point falls between pixels, bilinear interpolation over the 4 nearest pixels (2x2) is used to estimate its value.
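A minimal NumPy sketch of this thresholding step for a single 3x3 window, reading the neighbours clockwise from the top-left corner; the window values are made-up illustration data.

import numpy as np

# A made-up 3x3 grayscale window; the centre pixel (90) is the threshold
window = np.array([[ 12, 200,  55],
                   [ 68,  90, 111],
                   [221,  43,  88]])
center = window[1, 1]

# Read the 8 neighbours clockwise starting from the top-left corner
neighbours = [window[0, 0], window[0, 1], window[0, 2],
              window[1, 2], window[2, 2], window[2, 1],
              window[2, 0], window[1, 0]]

# Threshold: 1 if neighbour >= centre, else 0; concatenate into a binary number
bits = ''.join('1' if v >= center else '0' for v in neighbours)
lbp_value = int(bits, 2)        # the new value of the centre pixel
print(bits, '->', lbp_value)    # '01010010' -> 82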
Extracting the histograms: Using the image generated in the last step, the Grid X and Grid Y parameters are used to divide the image into multiple grids, as can be seen in the following image:

Fig 4.6: Histograms Extraction

Based on the image above, the histogram of each region is extracted as follows:

● Since the image is in grayscale, each histogram (one per grid cell) contains only 256 positions (0-255), representing the occurrences of each pixel intensity.
● The histograms are then concatenated to create a new, bigger histogram.
● The final histogram represents the characteristics of the original image.

Performing the face recognition: At this point the algorithm is already trained, and each histogram created is used to represent one image from the training dataset. Given an input image, the same steps are performed for this new image, producing a histogram that represents it.

● To find the image that matches the input image, it is only necessary to compare the two histograms and return the image with the closest histogram.
● Various approaches can be used to compare histograms (i.e. to calculate the distance between two histograms), for example Euclidean distance, chi-square or absolute value. In this project the Euclidean distance is used. For two histograms hist1 and hist2 with n positions each, it is given by:

  D = sqrt( sum_{i=1..n} (hist1_i - hist2_i)^2 )

● The algorithm output is the ID of the image with the closest histogram. The algorithm also returns the calculated distance, which can be used as a 'confidence' measurement; note that a lower value means a closer match.
● A threshold on the 'confidence' can be used to automatically estimate whether the algorithm has recognized the image correctly: recognition can be assumed successful if the confidence is lower than the defined threshold. A minimal prediction sketch is shown below.
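A minimal sketch of this prediction step with OpenCV's LBPH recognizer; trained_model.yml and test_face.pgm are hypothetical file names, and the threshold of 80 is an arbitrary illustrative value, not one tuned for this project.

import cv2

# Assumes a recognizer trained as in the earlier sketch and saved with write()
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trained_model.yml')        # hypothetical saved model file

# A grayscale face crop, resized to the training image size beforehand
test_face = cv2.imread('test_face.pgm', 0)  # placeholder test image

# predict() returns the closest label and the distance ('confidence')
label, confidence = recognizer.predict(test_face)

THRESHOLD = 80  # arbitrary illustrative value; tune on real data
if confidence < THRESHOLD:
    print('Recognized person with ID %d (distance %.1f)' % (label, confidence))
else:
    print('Face not recognized confidently')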
4.2.4 ATTENDANCE IMPORT TO EXCEL SHEET

The training data contains images of the students, so the attendance sheet contains their names. A list is created containing the names of the students. When a face from the test data is recognized, attendance is marked for that student for that particular day. If a student is absent, no attendance is marked for them.

Fig 4.7: Attendance Excel Sheet

4.3 RESULTS AND DISCUSSION

The result of the project is that attendance is given to a student whenever his or her face is presented to the camera. After the camera captures the image, the face is detected, face alignment is performed to align the facial features, and the features are matched against the database. If the captured image matches the database, attendance is given to that particular student, and the attendance sheet is then generated for the list of students.

The results of the face recognition system vary: some cases work perfectly, while others do not meet the specification exactly. Sometimes the face is not detected because of strong lighting, and the image may be blurred in low light. The image is captured correctly when the person faces the camera in the correct position. If a student changes the direction of his face while it is being captured, he may be misidentified as another person, and the exact face is not determined if the student moves his face. Images may also fail to be stored in the database due to specification errors.

CHAPTER 6
CONCLUSION AND FUTURE WORKS

It can be concluded that the automated student attendance system using human face recognition in the classroom works quite well. It can certainly be improved to yield better results, particularly by paying attention to the feature extraction and recognition processes; such improvements would make the recognition process more robust. The success rate of the proposed system in recognizing facial images of students in front of the webcam is about 72%. In the near future, steps will be taken to improve the accuracy and to evaluate the model with other metrics such as sensitivity and specificity, and the system can be scaled to recognize a larger number of people in an input image.

APPENDIX 1

facerecognition.py

import cv2, sys, numpy, os
import xlsxwriter

size = 2
candidate = ["prem", "sanjeev", "sampath", "praveen", "U.S.Sanjeev", "priya"]
present = []
absent = []

workbook = xlsxwriter.Workbook('attendence.xlsx')
worksheet = workbook.add_worksheet()

fn_haar = 'haarcascade_frontalface_default.xml'
fn_dir = 'att_faces'

# Train the LBPH recognizer on every image found under att_faces
print('Training...')
(images, lables, names, id) = ([], [], {}, 0)
for (subdirs, dirs, files) in os.walk(fn_dir):
    for subdir in dirs:
        names[id] = subdir
        subjectpath = os.path.join(fn_dir, subdir)
        for filename in os.listdir(subjectpath):
            f_name, f_extension = os.path.splitext(filename)
            if f_extension.lower() not in ['.png', '.jpg', '.jpeg', '.gif', '.pgm']:
                print("Skipping " + filename + ", wrong file type")
                continue
            path = subjectpath + '/' + filename
            lable = id
            images.append(cv2.imread(path, 0))
            lables.append(int(lable))
        id += 1

(im_width, im_height) = (112, 92)
(images, lables) = [numpy.array(lis) for lis in [images, lables]]

model = cv2.face.LBPHFaceRecognizer_create()
model.train(images, lables)

haar_cascade = cv2.CascadeClassifier(fn_haar)
webcam = cv2.VideoCapture(0)
while True:
    rval = False
    while not rval:
        (rval, frame) = webcam.read()
        if not rval:
            print("Failed to open webcam. Trying again...")
    frame = cv2.flip(frame, 1, 0)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect on a downscaled image, then scale coordinates back up
    mini = cv2.resize(gray, (int(gray.shape[1] / size), int(gray.shape[0] / size)))
    faces = haar_cascade.detectMultiScale(mini, scaleFactor=1.1, minNeighbors=3)
    for i in range(len(faces)):
        face_i = faces[i]
        (x, y, w, h) = [v * size for v in face_i]
        face = gray[y:y + h, x:x + w]
        face_resize = cv2.resize(face, (im_width, im_height))
        prediction = model.predict(face_resize)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 3)
        cv2.putText(frame, '%s - %.0f' % (names[prediction[0]], prediction[1]),
                    (x - 10, y - 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0))
        present.append(names[prediction[0]])
    cv2.imshow('OpenCV', frame)
    key = cv2.waitKey(10)
    if key == ord('s'):
        print("presentee names are:", set(present))
        absent = set(candidate) - set(present)
        print("absentee names are:", set(absent))

        # Import the attendance to the Excel file
        cell_format = workbook.add_format()
        cell_format.set_bold()
        cell_format.set_font_color('red')
        bold = workbook.add_format({'bold': True})
        worksheet.write('A1', 'CANDIDATE NAME', bold)
        worksheet.write('B1', 'PRESENT/ABSENT', bold)
        present = set(present)
        row = 1
        col = 0
        worksheet.set_row(0, 18, cell_format)
        worksheet.set_column('A:D', 20, cell_format)
        for i in present:
            worksheet.write(row, col, i)
            worksheet.write(row, col + 1, "PRESENT")
            row += 1
        for i in absent:
            worksheet.write(row, col, i)
            worksheet.write(row, col + 1, "ABSENT")
            row += 1
        workbook.close()
        cv2.destroyAllWindows()
        break

train.py

import cv2, sys, numpy, os

size = 4
fn_haar = 'haarcascade_frontalface_default.xml'
fn_dir = 'att_faces'

try:
    fn_name = sys.argv[1]
except:
    print("You must provide a name")
    sys.exit(0)

path = os.path.join(fn_dir, fn_name)
if not os.path.isdir(path):
    os.mkdir(path)

(im_width, im_height) = (112, 92)
haar_cascade = cv2.CascadeClassifier(fn_haar)
webcam = cv2.VideoCapture(0)

# Next sample number: one higher than the highest existing numbered file
pin = sorted([int(n[:n.find('.')]) for n in os.listdir(path)
              if n[0] != '.'] + [0])[-1] + 1

print("\n\033[94mThe program will save 20 samples. "
      "Move your head around while it runs to increase variation.\033[0m\n")

count = 0
pause = 0
count_max = 20
while count < count_max:
    rval = False
    while not rval:
        (rval, frame) = webcam.read()
        if not rval:
            print("Failed to open webcam. Trying again...")
    height, width, channels = frame.shape
    frame = cv2.flip(frame, 1, 0)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mini = cv2.resize(gray, (int(gray.shape[1] / size), int(gray.shape[0] / size)))
    faces = haar_cascade.detectMultiScale(mini)
    faces = sorted(faces, key=lambda x: x[3])
    if faces:
        face_i = faces[0]
        (x, y, w, h) = [v * size for v in face_i]
        face = gray[y:y + h, x:x + w]
        face_resize = cv2.resize(face, (im_width, im_height))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 3)
        cv2.putText(frame, fn_name, (x - 10, y - 10),
                    cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 0))
        if (w * 6 < width or h * 6 < height):
            print("Face too small")
        else:
            if pause == 0:
                print("Saving training sample " + str(count + 1) + "/" + str(count_max))
                cv2.imwrite('%s/%s.png' % (path, pin), face_resize)
                pin += 1
                count += 1
                pause = 1
    if pause > 0:
        pause = (pause + 1) % 5
    cv2.imshow('OpenCV', frame)
    key = cv2.waitKey(10)
    if key == 27:
        break

APPENDIX 2

Fig A2.1: Face Training Output
Fig A2.2: Dataset Folder
Fig A2.3: Face Recognition Output
Fig A2.4: Attendance Excel Sheet

REFERENCES

[1] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face recognition with local binary patterns," Computer Vision - ECCV 2004, pp. 469-481, 2004.
[2] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: Application to face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 2037-2041, 2006.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[4] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," IEEE Conf. on Computer Vision and Pattern Recognition, MIT Media Laboratory Tech. Report No. 245, 1994.
[5] B. J. Oh, "Face recognition using radial basis function network based on LDA," International Journal of Computer, Information Science and Engineering, vol. 1, pp. 401-405, 2007.
[6] K. I. Diamantaras and S. Y. Kung, Principal Component Neural Networks: Theory and Applications, John Wiley & Sons, Inc., 1996.
[7] M. Alwakeel and Z. Shaaban, "Face recognition based on Haar wavelet transform and principal component analysis via Levenberg-Marquardt backpropagation neural network," European Journal of Scientific Research, vol. 42, no. 1, pp. 25-31, 2010.
[8] M. H. Muchri, S. Lukas, and D. H. Hareva, "Implementation discrete cosine transform and radial basis function neural network in facial image recognition," in Proc. International Conference on Soft Computing, Intelligent Systems, and Information Technology, 2015.
[9] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[10] S. A. Khayam, The Discrete Cosine Transform (DCT): Theory and Application, Michigan State University, 2003.
[11] S. E. Handy, S. Lukas, and H. Margaretha, "Further tests for face recognition using discrete cosine transform and hidden Markov model," in Proc. International Conference on Electrical Engineering and Informatics (MICEEI), Makassar, 2012.
[12] V. V. Kohir and U. B. Desai, Face Recognition Using a DCT-HMM Approach, Indian Institute of Technology, Mumbai, India, 1998.