[Meta] Small forum change in auto-linking
Philipp Lenssen | Thursday, March 6, 2008 | 16 years ago • 5,194 views |
There was a problem in this forum with how URLs were auto-linked. As of now, you no longer always need to put a blank after the URL; e.g. you can put brackets immediately before and after it, writing (http://example.com). Sometimes the blank is still needed, e.g. when a dot follows, as otherwise the auto-linker thinks the dot is part of the URL... |
Roger Browne | 16 years ago # |
Good move. Brackets can of course be part of a valid URL, so let's see what happens with this:
(http://www.example.com/foo(bar)) |
Roger Browne | 16 years ago # |
Seems it didn't work. |
Philipp Lenssen | 16 years ago # |
Good point, Roger. Until I find something smarter, I just did a quick hack so that brackets will be handled old-style if there seem to be Wikipedia links anywhere within the comment... because isn't Wikipedia one of the biggest use cases for brackets in URLs?
http://en.wikipedia.org/foo(bar) |
Philipp Lenssen | 16 years ago # |
http://example.org/foo(bar) |
Philipp Lenssen | 16 years ago # |
PS: Does anyone know a good solid auto-linking regular expression dealing with all these cases? |
Tony Ruscoe | 16 years ago # |
Technically, the brackets should be URL encoded anyway, so I don't think this is a problem. (Same goes for spaces – they're valid characters in a URL but they usually get URL encoded to %20.)
e.g. http://en.wikipedia.org/wiki/Foo%28bar%29 |
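For reference, the encoded and plain forms discussed here can be converted into each other mechanically. A minimal sketch using Python's standard library; the helper name `encode_path` is made up for illustration and has nothing to do with the forum's actual code:

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving only '/' and unreserved characters as-is.

    Hypothetical helper for illustration; brackets and spaces get encoded.
    """
    return quote(path, safe="/")


# Brackets become %28 and %29, a space becomes %20.
encoded = encode_path("/wiki/Foo(bar)")
spaced = encode_path("/wiki/Foo bar")

# Decoding turns the pasted form back into the plain-bracket form.
decoded = unquote("/wiki/Hindenburg_%28Mangaka%29")
```

Both forms refer to the same page, which is why either one may end up pasted into a comment.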
Philipp Lenssen | 16 years ago # |
(Hmm, anyone got a link to an actual Wikipedia article using brackets?) |
David Mulder | 16 years ago # |
http://en.wikipedia.org/wiki/Hindenburg_%28Mangaka%29 |
Philipp Lenssen | 16 years ago # |
(Thanks David. Looks like while you pasted it with encoded brackets, it shows up with normal brackets in e.g. Google, so I better keep excluding Wikipedia from the new linking mechanism.) |
Roger Browne | 16 years ago # |
If you click on David's link with Firefox, the address bar shows the brackets URL-encoded. If you click on the same link with Konqueror, the address bar shows the brackets in plaintext.
So brackets are inevitably going to be cut-and-pasted into posts. |
Ionut Alex. Chitu | 16 years ago # |
I think a better idea is to keep the brackets in the URL. Most of the issues are related to brackets that are closed after a URL. So you should detect:
"(like this page: http://something.com/) "
) should not be a part of the URL.
Nobody will write: " http://something.com/(great site, actually)." |
Ramibotros | 16 years ago # |
I dunno, but this might help: http://www.truerwords.net/articles/ut/urlactivation.html |
Tony Ruscoe | 16 years ago # |
> So brackets are inevitably going to be cut-and-pasted into posts.
If we're using those rules, does Konqueror also maintain spaces in URLs? If so, you'd never be able to create an accurate parser for auto-linking those, unless you use some kind of markup.
You could try to implement this like Microsoft. e.g. autolink obvious URLs but allow people to write longer URLs containing spaces, brackets, etc. in angle brackets. For example:
http:// example.com/foo (bar) links only: http:// example.com/foo
but
<http:// example.com/foo (bar)> links: http:// example.com/foo (bar) |
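The angle-bracket idea could be sketched roughly like this in Python. The pattern and the function name `find_urls` are assumptions for illustration only, and the example URLs are written without the blank after "http://" that the post above uses to avoid being auto-linked:

```python
import re

# Either an explicitly delimited URL in angle brackets (may contain spaces
# and brackets), or a bare URL that simply ends at the first whitespace.
URL_PATTERN = re.compile(r"<(https?://[^<>]+)>|(https?://[^\s<>]+)")


def find_urls(text: str) -> list[str]:
    """Return the URLs that would be linked, in order of appearance."""
    return [m.group(1) or m.group(2) for m in URL_PATTERN.finditer(text)]


bare = find_urls("see http://example.com/foo (bar) for details")
delimited = find_urls("see <http://example.com/foo (bar)> for details")
```

With the bare form only the part up to the first blank is linked; with the delimited form everything between the angle brackets is taken as the URL.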
Philipp Lenssen | 16 years ago # |
Well, I think the algorithm should be something like "the URL ends if there's a ')' unless there was a '(' before within the URL". Would have to figure out how to do that with a regular expression, or just parse/convert it myself using non-regex code. (Currently the regular expression calls a function which also handles stuff like YouTube auto-embedding, picture embedding, adding the top-arrow for internal-thread references and so on.) But it seems the currently active solution might work in most cases and only perhaps require once-in-a-blue-moon editing by a moderator... |
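The rule described above ("the URL ends at a ')' unless a '(' came before it within the URL") is easy to express as non-regex code. A minimal Python sketch, assuming a URL candidate is any run of non-whitespace starting with http:// or https://; the names `url_end` and `extract_urls` are hypothetical and this is not the forum's actual linkifier:

```python
import re

# Assumption: a candidate is everything from the scheme up to the next whitespace.
URL_CANDIDATE = re.compile(r"https?://\S+")


def url_end(candidate: str) -> int:
    """Return the index at which the URL ends inside the candidate.

    A ')' ends the URL unless it closes a '(' seen earlier in the URL.
    """
    depth = 0
    for i, ch in enumerate(candidate):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return i  # unmatched ')': belongs to the surrounding text
            depth -= 1
    return len(candidate)


def extract_urls(text: str) -> list[str]:
    """Return every URL in the text, trimmed by the bracket rule."""
    urls = []
    for match in URL_CANDIDATE.finditer(text):
        candidate = match.group(0)
        urls.append(candidate[:url_end(candidate)])
    return urls
```

For the cases raised in this thread, `extract_urls("(http://www.example.com/foo(bar))")` keeps the inner brackets and drops the outer closing one, while `extract_urls("Hello (http://en.wikipedia.org/foo) world.")` drops the closing bracket. Trailing dots and commas are not handled by this rule and would need a separate step.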
Philipp Lenssen | 16 years ago # |
Update: The code no longer checks for Wikipedia links when deciding whether to disable the new auto-linking mechanism; instead it checks for an occurrence of "_(" anywhere in the comment. This should cause even less trouble (though it still requires moderation of some rarer cases, until a better solution is found...). |
Philipp Lenssen | 16 years ago # |
Test 1: Hello (http://en.wikipedia.org/foo) world. |
Philipp Lenssen | 16 years ago # |
Test 2: Hello http://en.wikipedia.org/foo_(bar) world. |
Ionut Alex. Chitu | 16 years ago # |
Very interesting (although I don't know who would post http://en.wikipedia.org/foo_(bar)) . |
Tony Ruscoe | 16 years ago # |
Heh. I think we should see how it goes now. It's probably going to catch 99% of links, so it should save us quite a bit of time.
My tests:
This is http://example.com This is http://example.com. (This is http://example.com) (This is http://example.com.) (This is http://example.com). |
Tony Ruscoe | 16 years ago # |
Hmm. I think it's just as important that any full-stop / period at the end of the URL with white-space following it shouldn't get included in the link. |
Roger Browne | 16 years ago # |
Are there any real URLs with the full-stop/period as the last character? |
Tony Ruscoe | 16 years ago # |
I don't think so. A domain / IP obviously can't end in a full-stop and you can't have a file name ending in a full-stop (under Windows, at least). |
Tony Ruscoe | 16 years ago # |
(Weird. It seems the regular expression just added a space before my closing bracket. And it will probably do the same here...) |
Haochi | 16 years ago # |
http://ihaochi.com/files/auto-link-url-temp.php |
Motti | 16 years ago # |
Tony: A domain can end in a full stop as technically all domains end with the zero-th-level domain "." (I forget what the technical term is) and is used (e.g.) in DNS records. If you type (say) microsoft.com. (with the final full-stop/period) in most browsers it will work.
Can anyone find a regular URL ending with a period/full-stop (the filetype: operator in google doesn't help here obviously) or, even better, figure out a general method to find URLs ending with a "."? |
Tony Ruscoe | 16 years ago # |
Motti, exactly. Although the full-stop is used in DNS records, they're not generally used in links. We're not talking about theory; we're talking about practice. You don't need it and it shouldn't really be linked, although it does work in most browsers, just like you say. |
David Mulder | 16 years ago # |
Just had to try the new system... ((test)) http://www.foo.com/te_(test) (((test)) http://www.foo.com/te_(test) (((test)) http://www.foo.com/te_(test))
|
Motti | 16 years ago # |
A comma just got included in a URL in my post here: http://blogoscoped.com/forum/125486.html#id125756 |
Philipp Lenssen | 16 years ago # |
Well, first of all – the current linkifier does not fully work with handling the brackets right if you include several tests in a single comment, because it does some comment-wide checks. Second, there is apparently some bug in it right now which puts a blank before brackets, which I need to fix ASAP :)
If we talk about solutions, along the lines of what Tony says, I'm mostly interested in a 99.9% working thing. If 1 in 100,000 URLs ends in a dot, that's not as important as when 1 out of 50 comments includes a URL followed by a dot of the sentence-ending kind. Similarly for brackets: it seems there's rarely any URL ending in a bracket which does not contain an opening bracket as well, so I think we can safely disregard this if we go for a 99.9% perfect solution. (Besides, there's still us moderators for those once-in-a-blue-moon URLs that turn out wrong if the algo fails.)
Will look into Haochi's regex, looks interesting. Ideally we need something that handles comma, question marks, exclamation marks etc. in the most "pragmatic" sense (not necessarily the most correct...). |
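A pragmatic treatment of trailing punctuation could look roughly like the following Python sketch. The set of characters and the function name `split_trailing_punctuation` are assumptions for illustration; it deliberately gives up on the rare URL that really does end in one of these characters:

```python
# Assumption: these characters, when they end a URL candidate, are far more
# likely to be sentence punctuation than part of the URL.
TRAILING_PUNCTUATION = ".,;:!?"


def split_trailing_punctuation(candidate: str) -> tuple[str, str]:
    """Split a URL candidate into (url, trailing punctuation).

    Only punctuation at the very end is removed; dots, commas and question
    marks inside the URL are left alone.
    """
    url = candidate.rstrip(TRAILING_PUNCTUATION)
    return url, candidate[len(url):]


url, tail = split_trailing_punctuation("http://example.com.")
```

The tail is returned rather than thrown away so that it can be written back after the link, keeping the sentence intact.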
George R | 16 years ago # |
If we could see a problem before it is actually posted, then we could adjust for it.
Could Philipp provide a preview button, where we could enter some text without actually posting it and get back a page showing how it would appear if it were actually posted? It would convert URLs to links and images, and show any other formatting or transformations.
Checking the validity of URLs and spell checking would be nice too, but that seems like unnecessary work.
|