Maeliza 2020-07-07 02:27:31

Hey folks, with Shubhadeep Roychowdhury we just released the latest version of our Future of Coding project codeBERT, a package to automatically review Python code documentation (docstrings). We'd love to have your feedback on it: codeBERT github package

This is all part of a longer-term ambition: automatically document any source code. This is our plan:
✔ automate source code mining with tree-hugger
✔ masked language model trained on source code (Python) with codeBERT MLM
▶ fine-tuned language model to analyse docstring vs function definition (Python) - and improve it
⏺ extend model to different languages

Ivan Reese 2020-07-07 05:53:21

This seems so cool. I wish I had a meaningful way to evaluate it, but I never learned Python. Congrats on the release!

Maeliza 2020-07-07 11:50:14

Thanks Ivan Reese! We want to extend to other languages, I'll keep you posted

Jack Rusher 2020-07-07 13:05:20

This is an interesting project! I've been hoping to see more useful DL model-based programming assistance. How's the accuracy at this stage?

Shubhadeep Roychowdhury 2020-07-07 13:14:21

Hi, thanks for the feedback. We believe the same, which is why we are building it. On a synthetic dataset (eval data) the accuracy is close to 92%. However, we are preparing a better dataset at the moment and plan to re-train on it. I am pretty certain that will lower the accuracy, but it will increase the generalisation power.

Peter van Hardenberg 2020-07-07 23:24:33

Distributed systems struggle with the evolution of their data types over time. The classical solution is carefully orchestrated database migrations, API deprecations, and the like. In decentralised software many of these techniques do not apply. How then, should we manage change over time in evolving systems? We're exploring this question at Ink & Switch with the Cambria project. You might enjoy following along with our weekly updates posted here: https://inkandswitch.github.io/cambria/

Garth Goldwater 2020-07-07 23:29:42

Chet Corcos this reminds me of what you were saying about apps with data you can get at for 30 years

Chet Corcos 2020-07-07 23:32:18

Totally. Data modeling is an essential complexity -- there's no way around this. For that reason, I find that schemaless data models are preferable because then you don't need to worry about atomic changes to the data model. In fact, this is what Uber and Facebook have done to address the scale that they have to operate on.

Garth Goldwater 2020-07-07 23:32:41

"We’d optimized for the experience of our demos (which we took to calling the Researcher Experience, or RX) instead of the developer experience (DX) of building real software." this quote has successfully sued for lifetime land rights inside my head

Garth Goldwater 2020-07-07 23:37:06

also: there are a few references to demo videos that aren’t linked to in these blog posts—where should I look for them?

Peter van Hardenberg 2020-07-07 23:46:46

I've linked the Week 3 demo in as a Loom embed now. (I'll do the other two when I have a minute.)

Garth Goldwater 2020-07-08 01:07:29

ohh man I’m really glad you added that video—the diffs on the schema definitions + the git analogy = really great

Garth Goldwater 2020-07-08 01:07:42

are you using a particular library for the JSON outline UI in the demo?

yoshiki 2020-07-08 04:26:10

Garth Goldwater it looks like react-json-tree to me: https://github.com/alexkuz/react-json-tree

Garth Goldwater 2020-07-08 13:01:57

thanks yoshiki!

Peter van Hardenberg 2020-07-08 16:08:27

That's correct. Geoffrey Litt picked it so you can get color commentary from him

Felix Kohlgrüber 2020-07-08 15:16:27

Hi, I've just published another experiment and would like to hear what you think. It's a simple text editor that uses different fonts depending on the type of syntactic element (e.g. comments use sans-serif, punctuation uses monospace). The experiment is online so that you can play around with it, and I've even managed to write some text about the experiment this time. Twitter: https://twitter.com/FKohlgrueber/status/1280881676500054016 Github: https://github.com/fkohlgrueber/edix-1 Live Demo: https://fkohlgrueber.github.io/edix-1/

🐦 Felix Kohlgrüber: Monospace or proportional fonts, which are better for coding? I'd say, why not use both? This is my latest experiment and I'd be interested to hear your thoughts. https://github.com/fkohlgrueber/edix-1
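
A minimal sketch of the idea in the message above, not how edix-1 itself is implemented. It assumes Python's standard `tokenize` module as the lexer, and the `FONT_FOR_CATEGORY` table and `font_runs` helper are hypothetical names made up for illustration. Only the comment and punctuation fonts come from the message; the font for everything else is an assumption.

```python
import io
import tokenize

# Hypothetical mapping from token category to font family.
# "comment" and "punctuation" follow the message above; "other" is assumed.
FONT_FOR_CATEGORY = {
    "comment": "sans-serif",
    "punctuation": "monospace",
    "other": "proportional",
}

# Token types that carry layout only and produce no visible text run.
LAYOUT_TOKENS = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def font_runs(source):
    """Return a list of (text, font) pairs, one per visible token."""
    runs = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type in LAYOUT_TOKENS:
            continue
        if tok.type == tokenize.COMMENT:
            category = "comment"
        elif tok.type == tokenize.OP:
            # Operators and delimiters such as = ( ) , :
            category = "punctuation"
        else:
            category = "other"
        runs.append((tok.string, FONT_FOR_CATEGORY[category]))
    return runs


if __name__ == "__main__":
    for text, font in font_runs("x = 1  # set x\n"):
        print(f"{font:>12}  {text}")
```

A real editor would also need to keep whitespace and column positions, which this sketch drops.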

nicolas decoster 2020-07-08 16:09:40

Very nice! 🙂 I constantly try different fonts in my code editor. I have tried some proportional ones because I find monospace not very readable for "meaningful text", i.e. comments, string literals containing phrases, and also symbols (especially long ones). But when I plug a proportional font into an editor I get an overall negative feeling without knowing precisely why. Until now. You are absolutely right: it depends on the part of the code. Punctuation is very important, and a proportional font makes it too dense. Maybe another way to achieve the same expressiveness is to use a proportional font everywhere and change the weight and the margin around punctuation characters? Or maybe design a font that provides some specific characters for code punctuation?

Garth Goldwater 2020-07-08 16:23:03

damn... this is one of those ideas that’s so good it feels obvious in retrospect and generates a lot of new ideas

Felix Kohlgrüber 2020-07-08 16:23:18

nicolas decoster Thanks for your feedback. Weights / Margins are worth trying, but I could imagine that some characters would still be too thin. Others like @ are too large in most fonts, which you couldn't fix using weight / margins. Designing a font for specific characters would be possible, but one could also just use custom glyphs for certain syntactic elements in the editor. When using a single proportional font, one of the best options is https://input.fontbureau.com/, which also contains a proportional font specifically tuned for writing code (which includes large punctuation characters). You'd still have the problem that these large punctuation characters wouldn't really fit for comment text.

Garth Goldwater 2020-07-08 16:24:05

eg: this immediately made me think of rich text for comments, and of browsing just the comments of well-documented code and then expanding them for the actual code. kind of a play on literate programming

Felix Kohlgrüber 2020-07-08 16:32:37

Garth Goldwater yeah, that's why I've named the project edix-1 , hoping to follow up with other experiments. There are many ways in which the rendering could be improved. I'll probably try to maintain the usual text editing experience as much as possible to avoid the "magic" that makes using WYSIWYG editors unpredictable (e.g. having the cursor between bold and regular characters and not knowing what style inserted characters would get). This includes showing all characters making up the source code.

Garth Goldwater 2020-07-08 16:35:03

oh i just meant for comments—the classic cheat here would be to do markdown

Felix Kohlgrüber 2020-07-08 16:44:23

Garth Goldwater I'd like to implement something similar to https://stackedit.io/app (left panel) for comments. Markdown characters are visible, but the content is still styled (font-variants, sizes, ...). Compare this to slack's WYSIWYG editor where markdown is not shown. foo vs. foo.

Garth Goldwater 2020-07-08 16:52:13

wow—that looks really cool! yeah, i think it’s a good model

nicolas decoster 2020-07-08 17:27:49

Thanks for the pointer to stackedit, I was looking for something like this a few weeks ago for some Markdown editing. I no longer need it, but it is good to know it exists. Well, until you develop something similar! 🙂

Martin Sosic 2020-07-08 18:03:06

Cool work! I have to admit that at first I didn't see a big difference between this and all monospace / same font, but it is exciting to think about being able to have more expressivity in comments, for example Markdown or even more.

Felix Kohlgrüber 2020-07-08 18:11:07

Martin Sosic For me, comments are easier to visually scan when in sans-serif. They're also shorter. But you're right, the difference isn't huge.

Felix Kohlgrüber 2020-07-08 18:17:47

BTW, Intellij IDEA recently added support for rendering comments. Haven't tried it myself yet, but it looks cool: https://www.jetbrains.com/help/idea/working-with-code-documentation.html#toggle-rendered-view

Garth Goldwater 2020-07-08 18:42:37

if a lot of this is about readability, i’d love to see comments side-by-side with code, maybe with a border? even a slack-style > border

Garth Goldwater 2020-07-08 18:43:13

also if line-wrap would just work on the right—ensuring readability by not making the width huge and go off the screen

nicolas decoster 2020-07-08 19:37:13

All that makes me think about something I read some time ago (but can't find now... anyone?) and agree with 100%: most (all?) code highlighting schemes render comments in a "blurred" style (light gray etc.). That is not intuitive to me, because comments are at least as important as the code (even more important if well written) and they need more focus.

Konrad Hinsen 2020-07-09 07:13:18

De-emphasizing markup, punctuation, etc. by changing colors or fonts is quite commonly done in Emacs using its font-lock package. I use this for org-mode markup (much like Markdown), and for dimming parentheses in Lisp code. It makes a big difference in readability.

nicolas decoster 2020-07-09 07:37:14

Cool! I guess I need to give Emacs a new try (it was my editor of choice 13 years ago).

Garth Goldwater 2020-07-09 14:42:31

i think org-mode in particular has a setting for this

Sverrir Thorgeirsson 2020-07-10 08:57:30

my partner is planning to do a study this summer on how text representation and surrounding visual elements affect reading comprehension (for natural languages). have you considered doing some kind of A/B testing on code comprehension for your project? I think that would be very interesting to see

Felix Kohlgrüber 2020-07-10 12:46:39

Sverrir Thorgeirsson Since this is a personal project, I'm not planning to do any A/B testing. I'd suspect that the difference isn't large (it won't make you 2x more productive), but it'd be interesting to see anyways.