However, from 200 tokens onward, the scores for AI-written code are generally lower than those for human-written code, with the gap widening as token length grows, meaning that at these longer token lengths Binoculars would be better at classifying code as either human or AI-written. Our results showed that for Python code, all of the models generally produced higher Binoculars scores for human-written code than for AI-written code. Due to the poor performance at longer token lengths, here we produced a new version of the dataset for each token length, in which we only kept the functions with a token length of at least half the target number of tokens. The above ROC curve shows the same findings, with a clear split in classification accuracy when we compare token lengths above and below 300 tokens. Here, we investigated the effect that the model used to calculate the Binoculars score has on classification accuracy and on the time taken to calculate the scores. However, with our new dataset, the classification accuracy of Binoculars decreased significantly. Because it showed better performance in our initial research work, we began using DeepSeek as our Binoculars model. Reliably detecting AI-written code has proven to be an intrinsically hard problem, and one that remains an open but exciting research area.
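The per-length filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, and a whitespace split stands in for whichever tokenizer was actually used. Any truncation of functions to the target length is not covered here, since the text does not describe it.

```python
from typing import Callable, Dict, List


def whitespace_token_count(code: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated chunks."""
    return len(code.split())


def build_length_buckets(
    functions: List[str],
    targets: List[int],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Dict[int, List[str]]:
    """For each target token length, keep only the functions whose
    token count is at least half of that target."""
    buckets: Dict[int, List[str]] = {}
    for target in targets:
        min_tokens = target / 2  # "at least half of the target number of tokens"
        buckets[target] = [f for f in functions if count_tokens(f) >= min_tokens]
    return buckets
```

With a real model tokenizer, `count_tokens` would be swapped for a function that returns the length of the tokenized input.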
The AUC values have improved compared to our first attempt, indicating that only a limited amount of surrounding code needs to be added, but more research is needed to identify this threshold. Looking at the AUC values, we see that for all token lengths, the Binoculars scores are almost on par with random chance in terms of being able to distinguish between human and AI-written code. The AUC (Area Under the Curve) value is then calculated, which is a single value representing the performance across all thresholds. To get an indication of classification, we also plotted our results on a ROC curve, which shows the classification performance across all thresholds. Despite our promising earlier findings, our final results have led us to the conclusion that Binoculars isn't a viable method for this task.
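To make the AUC figure concrete, here is a small sketch of how it could be computed from two lists of Binoculars scores. It relies on the standard identity that the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. Treating human-written code as the positive class is an assumption, as is the function name; the scores in the test are made up for illustration.

```python
from typing import Sequence


def roc_auc(human_scores: Sequence[float], ai_scores: Sequence[float]) -> float:
    """AUC for separating human (positive) from AI (negative) samples,
    assuming higher scores indicate human-written code.

    Computed as the fraction of (human, AI) pairs in which the human
    sample has the higher score; ties count as half a win.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for h in human_scores:
        for a in ai_scores:
            if h > a:
                wins += 1.0
            elif h == a:
                wins += 0.5
    return wins / (len(human_scores) * len(ai_scores))
```

A value of 1.0 means the two groups are perfectly separated by some threshold, while 0.5 corresponds to the "random chance" level mentioned above. The pairwise loop is quadratic in the number of samples, so a rank-based implementation would be preferable for large datasets.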