

In a new paper, OpenAI researchers have revealed details about Codex, a deep learning model that generates software source code. Codex powers Copilot, an “AI pair programmer” tool developed jointly by OpenAI and GitHub. Copilot is currently available in beta to a limited number of users.

The paper is an interesting read that illustrates how OpenAI’s scientists repurposed their flagship language model, GPT-3, to create Codex. More importantly, it sheds much-needed light on how far you can trust deep learning in programming.

The ‘no free lunch’ theorem

Codex is a descendant of GPT-3, a massive deep learning language model released last year. The complexity of deep learning models is often measured by the number of parameters they have. In general, a model’s learning capacity increases with the number of parameters. GPT-3 came with 175 billion parameters, more than two orders of magnitude larger than its predecessor, GPT-2 (1.5 billion parameters). GPT-3 was trained on more than 600 gigabytes of text, more than 50 times larger than GPT-2’s training dataset.

Aside from the huge increase in size, GPT-3’s main innovation was “few-shot learning,” the ability to perform tasks it was not trained for. The paper that introduced GPT-3 was titled “Language Models Are Few-Shot Learners” and stated: “Here we show that scaling up language models greatly improves task-agnostic, few-shot performance [emphasis mine], sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”

Basically, GPT-3 was a single model trained on a large corpus of text that could match or outperform several models specialized for specific tasks.

But according to the new OpenAI paper, none of the various versions of GPT-3 was able to solve any of the coding problems used to evaluate Codex. To be fair, there were no coding samples in GPT-3’s training dataset, so we can’t expect it to be able to code. But the OpenAI scientists also tested GPT-J, a 6-billion-parameter model trained on The Pile, an 800-gigabyte dataset that includes 95 gigabytes of GitHub data and 32 gigabytes of Stack Exchange data. GPT-J solved 11.4 percent of the coding problems. Codex, a 12-billion-parameter version of GPT-3 fine-tuned on 159 gigabytes of code examples from GitHub, solved 28.8 percent of the problems. A separate version of Codex, called Codex-S, which was fine-tuned through supervised learning, boosted performance to 37.7 percent (the other GPT and Codex models were trained through unsupervised learning).
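The “solved X percent of problems” figures come from the paper’s pass@k evaluation, in which the model generates several candidate solutions per problem and a problem counts as solved if at least one candidate passes the unit tests. A minimal sketch of the unbiased pass@k estimator described in the Codex paper (the function name and example numbers here are mine, for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: given n generated samples for a
    problem, of which c pass the unit tests, return the probability
    that at least one of k randomly drawn samples is correct."""
    if n - c < k:
        # Fewer incorrect samples than k: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 200 samples generated, 30 pass the tests.
print(pass_at_k(200, 30, 1))  # 0.15 -- for k=1 this is just c/n
```

For k=1 the estimator reduces to the plain fraction of correct samples, which is why single-sample accuracy and pass@1 coincide.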

Codex proves that machine learning is still governed by the “no free lunch” theorem (NFL), which means that generalization comes at the cost of performance. In other words, machine learning models are more accurate when they are designed to solve one specific problem. On the other hand, their performance degrades when their problem domain is broadened.

Codex can perform one specialized task (converting function descriptions and signatures into source code) with high accuracy at the cost of poor natural language processing capabilities. GPT-3, on the other hand, is a general language model that can generate decent text about a lot of topics (including complicated programming concepts) but can’t write a single line of code.
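To make the task concrete, here is a hypothetical prompt in the style of the paper’s evaluation problems: the model receives a function signature plus a docstring and must produce the body (the function name and completion shown are my own illustration, not taken from the paper):

```python
# The prompt the model sees: signature and docstring only.
def incr_list(numbers):
    """Return a new list with each element of `numbers`
    incremented by 1.
    >>> incr_list([1, 2, 3])
    [2, 3, 4]
    """
    # A correct completion the model would be expected to generate:
    return [n + 1 for n in numbers]

print(incr_list([1, 2, 3]))  # [2, 3, 4]
```

The generated body is then checked by running it against hidden unit tests, not by comparing the text of the code.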

Size versus cost

The experiments of OpenAI’s researchers show that Codex’s performance improved as they increased the size of the machine learning model. At 300 million parameters, Codex solved 13.2 percent of the evaluation problems, against the 28.8 percent performance of the 12-billion-parameter model.

But the full version of GPT-3 has 175 billion parameters, a full order of magnitude larger than the one used to create Codex. Wouldn’t training the larger model on Codex’s training data yield better results?

One probable reason for stopping at 12 billion could be the size of the dataset. A larger Codex model would need a larger dataset. Training it on the 159-gigabyte corpus would probably cause overfitting, where the model becomes very good at memorizing and rehearsing its training examples and very bad at dealing with novel situations. And gathering and maintaining larger datasets is an expensive and time-consuming process.

An equally vexing problem would be the cost of Codex. Aside from being a scientific experiment, Codex was supposed to become the backbone of a future product that can turn a profit for a research lab that is quasi-owned by a commercial entity. As I’ve already discussed, the costs of training and running the 175-billion-parameter GPT-3 model would make it very hard to build a profitable business model around it.

However, a smaller but fine-tuned version of GPT-3 would be much more manageable in terms of profits and losses.

Finally, as OpenAI’s experiments show, Codex’s size/performance ratio follows a logarithmic scale. This means that performance gains gradually diminish as you increase the size of the model. Therefore, the added costs of gathering data and of training and running the larger model might not be worth the small performance boost.

And note that code generation is a very lucrative market. Given the high hourly wages of programmers, even saving a few hours’ worth of coding time every month would be enough to cover the subscription fees of Codex. In other domains where labor is less expensive, automating tasks with large language models will be more challenging from a profit-and-loss perspective.

Generating versus understanding code

One thing that must be kept in mind is that, no matter how fascinating Codex’s output is, the deep learning model does not understand programming. Like all other deep learning–based language models, Codex is capturing statistical correlations between fragments of code.

In their paper, the OpenAI scientists acknowledge that Codex “is not sample-efficient to train” and that “even seasoned developers do not encounter anywhere near this amount of code over their careers.”

They further add that “a strong student who completes an introductory computer science course is expected to be able to solve a larger fraction of problems than Codex-12B.”

Here’s an interesting excerpt from the paper: “We sample tokens from Codex until we encounter one of the following stop sequences: ‘\nclass’, ‘\ndef’, ‘\n#’, ‘\nif’, or ‘\nprint’, since the model will continue generating additional functions or statements otherwise.”
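In practice this kind of truncation is straightforward string processing. A minimal sketch of the truncation rule the quote describes (the helper name and sample text are mine):

```python
# Stop sequences quoted from the Codex paper.
STOP_SEQUENCES = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]

def truncate_at_stop(completion: str) -> str:
    """Cut a sampled completion at the earliest stop sequence, so the
    model's output ends with the current function body instead of
    rambling on into new definitions or statements."""
    cut = len(completion)
    for stop in STOP_SEQUENCES:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

# Hypothetical raw sample that starts a second, unwanted function:
sample = "    return x + 1\ndef another_function():\n    pass"
print(truncate_at_stop(sample))  # keeps only "    return x + 1"
```

The model itself never decides to stop; the wrapper code decides for it, which is exactly the point of the next paragraph.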

This means that Codex will mindlessly continue to generate code even after it has finished the block that addresses the problem stated in the prompt.

This is a scheme that works well when you want to solve simple problems that recur over and over. But when you zoom out and try to write a large program that tackles a problem that must be solved in multiple steps, the limits of Codex become evident.

OpenAI’s scientists found that as the number of components in the function description increased, the model’s performance dropped sharply.

“This behavior is uncharacteristic of a human programmer, who should be able to correctly implement a program for a chain of arbitrary length if they can do so for a chain of length two,” the researchers write in their paper.
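To see what a “chain of components” means here, consider a hypothetical multi-step docstring of the kind the paper tests, where each clause adds one more operation (the example and function name below are my own illustration):

```python
def transform(s: str) -> str:
    """Convert the string to lowercase, then remove all vowels,
    then reverse the result."""
    s = s.lower()                                  # component 1
    s = "".join(ch for ch in s if ch not in "aeiou")  # component 2
    return s[::-1]                                 # component 3

print(transform("Hello World"))  # "dlrw llh"
```

A human who can chain two such steps can chain ten; Codex’s accuracy falls off as the chain grows, which is the asymmetry the researchers are pointing at.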

Further exposing Codex’s lack of understanding of program structure and code is the fact that it “can recommend syntactically incorrect or undefined code, and can invoke functions, variables, and attributes that are undefined or outside the scope of the codebase.” Practically, this means that in some cases the machine learning model will stitch together different pieces of code it has previously seen, even if they don’t fit together.

In their paper, the researchers also discuss “misalignment” problems in Codex, where the model can solve a specific problem but doesn’t do so due to various mistakes. Codex uses the contents of the file you’re working on as context to generate its output. If your code contains subtle bugs (which is quite normal if you’re a human programmer), Codex may “deliberately” suggest code that superficially looks good but is incorrect, the researchers warn.

Misalignment is an interesting phenomenon that needs further study. But OpenAI’s experiments further show that “misalignment would likely persist and even get worse if data, parameters, and training time were scaled up,” which could be another reason to keep the model’s size balanced at 12 billion parameters.

The paper also discusses at length the possibility of Codex generating deprecated and vulnerable code (which is worthy of a separate article, so I didn’t discuss it here).

Responsible use and reporting of AI

As I said after the release of Copilot, “AI pair programmer,” the term used for Copilot on its GitHub webpage, is inaccurate.

Codex is not a programmer. And it’s not going to take your job either (if you’re a programmer). Coding is just part of what programmers do. OpenAI’s scientists observe that in its current state Codex “may somewhat reduce the cost of producing software by increasing programmer productivity,” but it won’t replace the other tasks that software developers regularly do, such as “conferring with colleagues, writing design specifications, and upgrading existing software stacks.”

Mistaking Codex for a programmer can also lead to “over-reliance,” where a programmer blindly approves any code generated by the model without revising it. Given the obvious and subtle mistakes Codex can make, overlooking this threat can yield quality and security risks. “Human oversight and vigilance is required for safe use of code generation systems like Codex,” the OpenAI researchers warn in their paper.
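One concrete form such oversight can take is simply refusing to merge generated code until it passes your own tests. A minimal sketch of that practice (the function name and the convention that candidates define a `solution` function are my assumptions; real use would also need sandboxing, since `exec` runs untrusted code):

```python
def vet_generated_code(candidate_src: str, tests) -> bool:
    """Run a model-generated snippet and accept it only if it passes
    every (input, expected_output) pair. The snippet is expected to
    define a function named `solution`."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # caution: untrusted code
        fn = namespace["solution"]
        return all(fn(inp) == expected for inp, expected in tests)
    except Exception:
        return False

good = "def solution(x):\n    return x * 2"
bad = "def solution(x):\n    return x + 2"  # superficially similar
print(vet_generated_code(good, [(3, 6), (0, 0)]))  # True
print(vet_generated_code(bad, [(3, 6), (0, 0)]))   # False
```

The “bad” candidate is exactly the kind of superficially plausible output the paper warns about: it looks close to right and even passes a lucky input, but fails a second test case.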

Overall, the reaction of the programmer community shows that Codex is a very useful tool, with a potentially huge impact on the future of the software industry. At the same time, given the hype surrounding Copilot’s release, it’s important to understand its unwanted implications. In this regard, the people at OpenAI deserve praise for responsibly studying, documenting, and reporting the limits and threats of Codex.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021.
