We recently reviewed our fake detection models and found some problems, so in this blog post we revise them.
The first problem concerns data preprocessing. Our original thinking was that normalizing numeric data to mean 0 and standard deviation 1 would make training easier for an SVM with an RBF kernel. However, we mistakenly preprocessed the fake and true data separately, which made the two classes look far more different from each other than they really are and inflated the classification results. This time, we first preprocessed them together, and then trained models without any preprocessing. After comparing the results, we dropped preprocessing of the numeric data. We also changed the judge score, which is now computed as follows:
judge_score = (cross_valid_score * 0.7 + recall * 0.3)
The relationship between class weight and judge score is shown in figure 1. We picked the best weight for each model based on its judge score, as shown in the following table.
Features | Model | Best Weight | judge_score |
numeric | SVM (RBF) | 12 | 0.849 |
description | Logistic Regression | 10 | 0.796 |
description | Linear SVM | 10 | 0.802 |
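The weight selection described above can be sketched in a few lines. This is only an illustration, not our actual training code: the helper `best_class_weight` and the numbers in `candidates` are made up for the example, and in practice `cross_valid_score` and `recall` would be measured by evaluating a model at each class weight.

```python
def judge_score(cross_valid_score, recall):
    """Combine cross-validation score and recall with the 0.7 / 0.3 weighting."""
    return cross_valid_score * 0.7 + recall * 0.3


def best_class_weight(candidates):
    """Return the (class_weight, judge_score) pair with the highest judge score.

    `candidates` maps a class weight to a (cross_valid_score, recall) pair.
    """
    scored = {w: judge_score(cv, rec) for w, (cv, rec) in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical measurements, for illustration only (not our real results).
candidates = {
    1: (0.90, 0.40),
    5: (0.86, 0.70),
    10: (0.82, 0.80),
}
weight, score = best_class_weight(candidates)
print(weight, round(score, 3))
```

Because recall carries 30% of the score, a weight that gives up a little cross-validation score in exchange for much better recall can still come out ahead.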
UFOs may appear for different lengths of time: some are transient while others last for hours, if not days. Can we tell their duration from the weather conditions? To answer this question, we applied machine-learning algorithms to look for a relationship. Using each column in our weather data as an independent feature, we used a decision tree and Naive Bayes to predict the outcome label, the duration. The results, however, show little connection between these features and the duration of a UFO appearance. In the following figure, we provide a table of the accuracy of each feature against the label (duration), and it shows that the accuracy is mostly unsatisfying. Based on our weather and UFO data (>90000), we cannot effectively predict the duration from weather features.
We did, nevertheless, find that most UFO appearances last 5 minutes or less, with almost 40% ending within 60 seconds. This gives us a sense of how long a UFO typically stays visible. A log-transformed histogram is also presented to show how appearances are distributed over duration.
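As a rough sketch of how such a log-transformed histogram and the share of short sightings could be computed, assuming durations are available in seconds: the function names and the `sample` values below are invented for the example and are not taken from our dataset.

```python
import math
from collections import Counter


def log10_histogram(durations_seconds):
    """Count durations per power-of-ten bin.

    Bin k holds durations in [10**k, 10**(k+1)) seconds.
    """
    bins = Counter()
    for d in durations_seconds:
        if d > 0:  # the logarithm is undefined for zero or negative durations
            bins[math.floor(math.log10(d))] += 1
    return dict(sorted(bins.items()))


def share_within(durations_seconds, limit_seconds):
    """Fraction of positive durations that are at most `limit_seconds`."""
    positive = [d for d in durations_seconds if d > 0]
    if not positive:
        return 0.0
    return sum(d <= limit_seconds for d in positive) / len(positive)


# Made-up durations in seconds, for illustration only.
sample = [5, 30, 60, 120, 300, 300, 900, 3600, 86400]
print(log10_histogram(sample))
print(share_within(sample, 60))
```

Binning on a log scale keeps second-long and day-long sightings readable on the same axis, which a linear histogram would not.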
In this blog post, we start to design and implement our web application for UFO reports. To begin with, our website has two functions:
Currently, we let each model give a probability for a new input report, and the models vote to determine whether the new UFO report is fake, using judge_score as the weight of each classifier. The structure of our web application is shown in figure 2. We use Node.js as the server to connect the front end and the back end. We have already implemented the fake detection part, as shown in figure 3.
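One way to read the voting step is as a weighted average of the per-model probabilities. The sketch below makes several assumptions that are not confirmed details of our implementation: each model outputs the probability that a report is fake, the cut-off is 0.5, and the dictionary keys and example probabilities are hypothetical. Only the weights are real, taken from the judge_score column of the table above.

```python
def weighted_vote(probabilities, weights):
    """Weighted average of per-model probabilities that a report is fake."""
    total = sum(weights[name] for name in probabilities)
    weighted_sum = sum(probabilities[name] * weights[name] for name in probabilities)
    return weighted_sum / total


# judge_score values from the table above, used as voting weights.
weights = {"svm_rbf": 0.849, "logistic_regression": 0.796, "linear_svm": 0.802}

# Hypothetical model outputs for one new report.
probabilities = {"svm_rbf": 0.30, "logistic_regression": 0.70, "linear_svm": 0.60}

p_fake = weighted_vote(probabilities, weights)
is_fake = p_fake >= 0.5  # assumed threshold, not stated in the post
print(round(p_fake, 3), is_fake)
```

Under these assumptions, the probability shown to the user that a report is true would be `1 - p_fake`.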
On the home page, users may choose to view our statistical results or report a UFO sighting:
If the user chooses 'report', we present a form to fill in:
After the user submits the form, we show the probability that the report is true: