Are you ready to switch to HTML parsing permanently?

83.3% (65 votes)
1.3% (1 vote)
15.4% (12 votes)

Total: 78 votes. Date: 2010-04-14 00:49

Poll: A global switch to HTML parsing


Are you ready?

Trustmaster
#1 2010-04-14 00:49
We started discussing this in the "HTML parsing vs. BBcode" topic, and now I would like to know whether our developers and community are ready for a global move from BBcode to HTML parsing.

Here are some positive and negative sides of such a decision:
Pros:
  1. Use of popular WYSIWYG editors
  2. Limitless layout tweaking abilities (for users with enough permissions)
  3. Better typography support
  4. No more need to invent bbcodes and edit nasty regexps
  5. No need to call parsers on output, just output the data as it is
  6. Smaller database size: only the HTML version of strings is stored
Cons:
  1. HTML Purifier is required to make sure HTML is neat and secure
  2. A migration script will be needed to accurately convert existing pages
  3. Some existing plugins depend on bbcode parsing
  4. We are so used to our lovely bbcodes :P

Now I'd like to know your opinion. Recommendations on the choice of HTML editor are also welcome, as that may be the next poll.
May the Source be with you!

This post was edited by Trustmaster (2010-04-14 23:36, 14 years ago)
GHengeveld
#2 2010-04-14 01:32
I would have voted for 'yes, whatever the cost' if that were an option.

One of the major downsides of Cotonti is the lack of easy support for HTML pages. Most of my clients don't want to work with bbcodes. That's something I can understand, since bbcodes are a pretty nerdy way of doing things, and certainly old-fashioned.

Migration should be painless, since we'd only have to run the existing pages in the database through the current bbcode parser. From then on we can continue in HTML.
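
A minimal sketch of that migration idea, written in Python rather than Cotonti's PHP. The tag table and the names `bbcode_to_html` and `migrate` are hypothetical and cover only a tiny BBCode subset. A real migration would run the site's own BBCode parser over each stored page instead.

```python
import html
import re

# Hypothetical, deliberately tiny rule set (an assumption for illustration).
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u"}
URL_RE = re.compile(r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]", re.I | re.S)


def bbcode_to_html(text: str) -> str:
    """Convert a small BBCode subset to HTML, escaping everything else."""
    # Escape first, so any raw HTML in the old text stays inert.
    out = html.escape(text, quote=True)
    for bb, tag in SIMPLE_TAGS.items():
        pattern = re.compile(rf"\[{bb}\](.*?)\[/{bb}\]", re.I | re.S)
        out = pattern.sub(rf"<{tag}>\1</{tag}>", out)
    out = URL_RE.sub(r'<a href="\1">\2</a>', out)
    return out.replace("\n", "<br />\n")


def migrate(rows):
    """Yield (page_id, html) for every (page_id, bbcode_text) row."""
    for page_id, text in rows:
        yield page_id, bbcode_to_html(text)


if __name__ == "__main__":
    for page_id, converted in migrate([(1, "[b]Hello[/b] <world>")]):
        print(page_id, converted)
```

The conversion runs once per stored row, after which only the HTML version needs to be kept.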

My editor of choice is TinyMCE since it has great features for editing tables (something clients often want to use). I've integrated it into Cotonti once but it sometimes failed to load properly.
Alex300
#3 2010-04-14 02:11
The webmasters should have a choice between parsers.

Installing and using a WYSIWYG editor for pages is already solved by the CKEditor plugin. It probably makes sense to make a plugin for TinyMCE as well.

To improve security, HTML Purifier needs to be included with them. That would make it possible to let a wide range of users create HTML pages without compromising security.

Comments and forums should use the BBCode parser. It is standard practice, and users who post on different forums take it for granted. It is also safer.

There are worlds out there where the sky is burning, and the sea's asleep, and the rivers dream; people made of smoke and cities made of song. Somewhere there's danger, somewhere there's injustice, and somewhere else the tea's getting cold. Come on, Ace, we've got work to do!...
...Sorry for my English...
Free extensions for Cotonti: https://lily-software.com/free-scripts/
Kort
#4 2010-04-14 02:22
The benefits are obvious. BBCodes should be discarded in favor of HTML, no matter how painful it is.
SED.by - website creation, plugin and theme development for Cotonti
pieter
#5 2010-04-14 02:37
I agree, but a painless transition is needed, because redoing all pages is not an option for a lot of sites.
... can we help you ...
GHengeveld
#6 2010-04-14 04:03
# Alex300 : The webmasters should have a choice between parsers.

Comments and forums should use the BBCode parser. It is standard practice, and users who post on different forums take it for granted. It is also safer.
Agreed, forums can stay BBcode-style in my opinion, so can comments (although disabling styling in comments completely is preferable). It's pages we're talking about.
It's probably best to use an editor with support for both HTML and BBcodes, to keep the same style between editors. Of course this would be up to the admin to choose between Markitup or another editor for forums.
Trustmaster
#7 2010-04-14 04:11
I'm not talking about pages alone, because HTML parser mode has existed there for ages and is still being improved. I also want you to consider the possibility of getting rid of the complexities of a multi-parser environment (such as having to parse bbcodes in page subtitles, forum subtitles, etc.).
GHengeveld
#8 2010-04-14 04:18
True, but then we'd need a way to limit one's options to using specific HTML tags. In pages one can use basically all HTML, but in forums I'd limit it to just font styling, lists and things like that. I wouldn't want my forum users posting elaborate tables, image maps and that sort of stuff, or even use css classes or IDs.
This isn't simply solved by changing the editor options, since that's a client-side solution and can be bypassed.
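A rough sketch of what such a server-side, per-area tag limit could look like, using only Python's standard `html.parser`. The `ALLOWED` table, the area names and the `sanitize` function are assumptions made for illustration. This is not HTML Purifier and does not balance unclosed tags, so treat it as an outline of the idea rather than a production filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical per-area allow-lists: tag -> set of permitted attributes.
BASIC = {"b": set(), "i": set(), "u": set(), "ul": set(), "ol": set(),
         "li": set(), "a": {"href"}}
ALLOWED = {
    "forums": BASIC,
    "pages": {**BASIC, "p": set(), "table": set(), "tr": set(),
              "td": set(), "th": set(), "img": {"src", "alt"}},
}
VOID_TAGS = {"img"}                 # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose text is discarded as well
SAFE_PREFIXES = ("http://", "https://", "/")


class AllowListFilter(HTMLParser):
    def __init__(self, area):
        super().__init__(convert_charrefs=True)
        self.allowed = ALLOWED[area]
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in self.allowed:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if name not in self.allowed[tag] or value is None:
                continue
            is_url = name in ("href", "src")
            if is_url and not value.strip().lower().startswith(SAFE_PREFIXES):
                continue  # rejects javascript: and similar schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in self.allowed and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(markup, area):
    """Return markup reduced to the tags and attributes allowed in `area`."""
    parser = AllowListFilter(area)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = '<b onclick="x()">hi</b><table><tr><td>1</td></tr></table>'
    print(sanitize(sample, "forums"))
    print(sanitize(sample, "pages"))
```

Because the filter runs on the server, it applies no matter which editor, or no editor at all, produced the submitted markup.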
donP
#9 2010-04-14 20:18
1. I voted for HTML. The transition IS painless: a simple script (using the current BBcode parser, as Koradhil said) would take care of that.

2. I vote for CKEditor (Koradhil: it also has good table support, like TinyMCE) because free plugins for managing files and images are available for it (whereas TinyMCE offers commercial plugins for this), and its localization is more complete (for example, TinyMCE has no finished Italian localization).

3. I'm also in favor of having a single parser everywhere: HTML, with no parsing at all for comments, just plain text.
We can carefully work out the integration and configuration of HTMLPurifier, with separate config files loaded for pages and for forum areas (I also think forums are the most likely place for hackers to try their luck).
The TinyMCE documentation wiki describes some methods of protecting against XSS attacks: for example, creating the textareas with JavaScript itself, so nobody can disable JavaScript in their browser to bypass the security restrictions in the editor configuration.
Read it here:
http://wiki.moxiecode.com/index.php/TinyMCE:Security#DOM-Compliant_Method

4. I would add one more pro for HTML parsing vs. BBcode: no more huge database (because we would no longer have to store page_text and page_html or fp_text and fp_html, only a single field containing the actual HTML content).
in BLUES I trust
GHengeveld
#10 2010-04-14 21:00
donP:
The TinyMCE documentation wiki describes some methods of protecting against XSS attacks: for example, creating the textareas with JavaScript itself, so nobody can disable JavaScript in their browser to bypass the security restrictions in the editor configuration.
You can still send a manual POST containing malicious HTML (through a custom HTTP request), so Javascript can still be bypassed.
TinyMCE Documentation:
You have a pretty secure installation of TinyMCE. Unfortunately, all of this can be bypassed. Therefore, you need to create a secure backend, in our case, we are using PHP. Your destination script should filter out all the same baddies that TinyMCE does. This is duplication of effort, but it is needed.

donP:
I vote for CKeditor (Koradhil: it also has good table support like tinyMCE)
As far as I know, there is no option to insert new rows or columns in the middle of an existing table (or even add rows at the end) in FCK / CKeditor.
donP
#11 2010-04-14 21:35
Koradhil:
You can still send a manual POST containing malicious HTML (through a custom HTTP request), so Javascript can still be bypassed.
But if anybody tries disabling JavaScript to edit the HTML code directly, NO TEXTAREA AT ALL would appear with that method, no? Or am I misunderstanding? :/
So how can anybody send HTML code to the database if there is no textarea?
Koradhil:
As far as I know, there is no option to insert new rows or columns in the middle of an existing table (or even add rows at the end) in FCK / CKeditor.
You're wrong, try here yourself:
http://ckeditor.com/demo
Every part of a table (rows, cells) is configurable by right-clicking it. You can add, delete, merge and split rows, columns and cells, change their colors, and so on.
Trustmaster
#12 2010-04-14 23:35
# donP : But if anybody tries disabling JavaScript to edit the HTML code directly, NO TEXTAREA AT ALL would appear with that method, no? Or am I misunderstanding? :/
So how can anybody send HTML code to the database if there is no textarea?
Koradhil means that an experienced hacker could craft a specially formed HTML page himself to submit unfiltered POST data, so server-side filtering with HTML Purifier is still required.
GHengeveld
#13 2010-04-15 07:10
Even better: write HTTP requests manually. Such a thing is possible with RESTTest. You don't need any HTML at all in order to send data to the server.
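
To illustrate the point, here is a small sketch that builds such a request by hand with Python's standard library. The URL and the field name are hypothetical, and the request is only constructed and printed, never sent. No form, textarea or browser is involved at any point, which is why editor-side restrictions cannot be relied on.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url: str, fields: dict) -> Request:
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint and field name, for illustration only.
request = build_post(
    "http://localhost/submit",
    {"text": '<img src="x" onerror="alert(1)">'},
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# Sending it would be urllib.request.urlopen(request); omitted on purpose.
```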

I wasn't aware of the right-click menu (didn't know it existed), thanks for that.
donP
#14 2010-04-15 16:54
# Trustmaster : Koradhil means that an experienced hacker could craft a specially formed HTML page himself to submit unfiltered POST data, so server-side filtering with HTML Purifier is still required.
So we have to filter ALL content? I was hoping we only had to filter page/forum fields when they are submitted, so that HTMLPurifier is called only at submission time, rather than filtering all HTML output at display time... :/
Why couldn't we make a security gate that prohibits the inclusion of HTML code except through Cotonti core files (from a regular logged-in user, passing through HTMLPurifier)?

Added 13 hours 7 minutes later:

I think we should make this topic sticky and send a mass PM or e-mail newsletter to reach all Cotonti users and ask them about this important matter...

This post was edited by donP (2010-04-16 06:05, 14 years ago)
tensh
#15 2010-04-16 16:53
donP, just consider that to a hacker your client-side filtering offers no protection. Only server-side filtering is reliable. A hacker can submit whatever he wants to a form.

As for bbcodes in forums, why not use HTML syntax for the FCKeditor buttons?
Now it's:
[b] [/b]
But it could just as well be:
<b> </b>

The rest of the HTML syntax would be forbidden and stripped according to some internal settings.

Hmm, but for things like "code" and "quote" display, would it be some kind of mix of HTML and the rest: parsing HTML normally, but also aware of some special non-HTML bbcodes?

I once saw a mix of bbcode and HTML, but have no idea how it was implemented.
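
One possible shape for such a mix, as a hedged Python sketch: escape everything, re-enable a short list of bare HTML tags, and translate the special `[quote]` and `[code]` bbcodes. The tag list and the function name `render_post` are assumptions for illustration, not how any existing forum engine is known to do it.

```python
import html
import re

CODE_RE = re.compile(r"\[code\](.*?)\[/code\]", re.I | re.S)
QUOTE_RE = re.compile(r"\[quote\](.*?)\[/quote\]", re.I | re.S)
# Only bare <b>, <i>, <u> (no attributes) are re-enabled after escaping.
INLINE_HTML_RE = re.compile(r"&lt;(/?)(b|i|u)&gt;", re.I)


def render_post(text: str) -> str:
    """Render a post that mixes a few HTML tags with [quote]/[code]."""
    parts = CODE_RE.split(text)  # odd indexes hold the [code] contents
    out = []
    for index, part in enumerate(parts):
        escaped = html.escape(part, quote=True)
        if index % 2:
            # Code blocks stay fully escaped, so markup is shown literally.
            out.append(f"<pre><code>{escaped}</code></pre>")
            continue
        escaped = INLINE_HTML_RE.sub(
            lambda m: f"<{m.group(1)}{m.group(2).lower()}>", escaped
        )
        escaped = QUOTE_RE.sub(r"<blockquote>\1</blockquote>", escaped)
        out.append(escaped)
    return "".join(out)


if __name__ == "__main__":
    print(render_post("<b>hi</b> [quote]old[/quote] [code]<b>x</b>[/code]"))
```

Any tag that carries attributes, such as `<b onclick="...">`, does not match the pattern and therefore stays escaped.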
