I was under the impression that Sublime provided this ability as well. When I found out that it didn't, I had the idea of using regular expressions. Even though regexes are usually considered inappropriate for parsing XML/HTML, I found this approach acceptable in this case. Sublime is also said to be highly customizable through plugins, so I thought this would be a way to go.
Sublime Plugins
To be honest, I could have thought of tidy, or at least suspected that there must be plugins out there dealing with your issue. Instead, I ended up writing my first Sublime plugin. I have only tested it with your input and expected output, which it satisfied, but it is most certainly far from working reliably. Still, I'm posting it here to share what I've learned, and it's still an answer to the problem.
Opening a new buffer (Ctrl+n) and choosing the 'New Plugin...' entry in the 'Tools' menu generates a little 'Hello World!' example plugin (as a Python module), which is a great template for implementing a sublime_plugin.TextCommand subclass. A TextCommand provides access to the active buffer, i.e. the currently open file. Like its relatives WindowCommand and ApplicationCommand, it must override a run method.
The official API Reference suggests learning by reading the example sources that ship with the Sublime builds, located in Packages/Default relative to the Sublime config path. Further examples can be found on the website, and there's more on the internet.
Processing selected text
To get to a solution for your issue, we primarily need access to a View object, which represents an active text buffer. Fortunately, the TextCommand subclass we are about to implement has one (self.view), so we can conveniently ask it for the currently selected regions and their contents, process the selected text according to our needs, and afterwards replace the selection with the result.
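A minimal sketch of that round trip (read the selections, transform their text, write it back). The command name StripSelectionCommand and the transformation strip_trailing_whitespace are hypothetical stand-ins; only view.sel(), region.empty(), view.substr() and view.replace() are Sublime API calls, the same ones the plugin below relies on. The text processing is kept in a plain function so it can be tried outside Sublime, and the command class is only defined when sublime_plugin is importable, i.e. inside Sublime.

```python
try:
    import sublime_plugin  # only importable inside Sublime Text
except ImportError:
    sublime_plugin = None


def strip_trailing_whitespace(text):
    """Hypothetical stand-in for 'process the selected text according to our needs'."""
    return '\n'.join(line.rstrip() for line in text.split('\n'))


if sublime_plugin is not None:
    class StripSelectionCommand(sublime_plugin.TextCommand):
        """Hypothetical command, callable as view.run_command('strip_selection')."""

        def run(self, edit):
            # Snapshot the selection and go back to front, so replacing a region
            # never moves the offsets of the regions still waiting to be processed.
            for region in reversed(list(self.view.sel())):
                if region.empty():
                    continue  # a bare cursor, nothing selected
                text = self.view.substr(region)
                self.view.replace(edit, region, strip_trailing_whitespace(text))
```

Keeping the transformation free of Sublime objects is a design choice, not a requirement; it simply makes the string logic testable from a normal Python interpreter.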
To sum up the string operations: there are four regular expressions, each of which matches one of the element classes <start-tag>, <empty-tag/>, </close-tag> and text-node. Assuming that all of our markup text is covered by these, we split each line of the selection into matching substrings. These are then realigned one per line. Having done this, we apply simple indentation by remembering to indent every line whose predecessor contains a start tag. Lines containing end tags are unindented immediately.
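The split-then-indent pipeline just described can be tried as plain Python, without Sublime. This is a sketch under assumptions: the helper names split_nodes and indent_nodes are hypothetical, attribute values must not contain '/' (the same limitation as the plugin's patterns), and the text-node pattern is written so that single-character text between tags is kept.

```python
import re

# The four element classes; the alternation tries them in this order.
START_TAG = r'<\w+(?:\s+[^<>/\s]+)*\s*>'       # <start-tag attr="x">
EMPTY_TAG = r'<\w+(?:\s+[^<>/\s]+)*\s*/\s*>'   # <empty-tag/>
CLOSE_TAG = r'</\s*\w+\s*>'                    # </close-tag>
TEXT_NODE = r'[^<>\s](?:[^<>]*[^<>\s])?'       # text-node, trimmed of outer spaces

NODE = re.compile('|'.join([START_TAG, EMPTY_TAG, CLOSE_TAG, TEXT_NODE]))
IS_START = re.compile(START_TAG)
IS_CLOSE = re.compile(CLOSE_TAG)


def split_nodes(text):
    """Break every line into its matching substrings; keep lines with no match as-is."""
    nodes = []
    for line in text.split('\n'):
        found = NODE.findall(line)
        nodes.extend(found if found else [line])
    return nodes


def indent_nodes(nodes, unit='\t'):
    """Close tags are unindented immediately; start tags indent the following nodes."""
    depth = 0
    out = []
    for node in nodes:
        if IS_CLOSE.match(node):
            depth = max(0, depth - 1)
            out.append(unit * depth + node)
            continue
        out.append(unit * depth + node)
        if IS_START.match(node):
            depth += 1
    return out
```

For example, indent_nodes(split_nodes('<a><b>x</b></a>')) yields ['<a>', '\t<b>', '\t\tx', '\t</b>', '</a>']. Empty tags and text nodes leave the depth unchanged, which is what "indent every line whose predecessor contains a start tag" amounts to.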
Using the group addressing features of Python regex, we can determine the indentation of every line and align the next one accordingly. Without further ado, this results in internally consistent indented markup, but it does not take the lines outside the selection into account. Extending the selection to an enclosing element, or at least complying with the indentation levels of the adjacent lines, could easily improve the results. It's also always possible to make use of the default commands.
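A minimal sketch of the "comply with the adjacent lines" improvement, using group addressing: group 1 of a simple pattern captures a line's indentation, and that prefix is then prepended to every line produced by the indentation pass. Which reference line to use (the first selected line, or the line above the selection) is left open here, and the helper names leading_whitespace and align_to are hypothetical.

```python
import re

# group(1): the leading indentation, group(2): the rest of the line
LEADING = re.compile(r'([ \t]*)(.*)')


def leading_whitespace(line):
    """Return the indentation prefix of a single line (possibly empty)."""
    return LEADING.match(line).group(1)


def align_to(reference_line, indented_lines):
    """Shift a block that was indented from column 0 to the reference line's level."""
    base = leading_whitespace(reference_line)
    return [base + line for line in indented_lines]
```

The pattern always matches, because both groups may be empty, so group(1) is safe to read without checking for None.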
Another thing to take care of is binding keys to the plugin command and contributing menu entries. Key bindings go into .sublime-keymap files, and the default .sublime-menu and .sublime-commands files in Packages/Default at least give an idea of how menu and command palette entries are declared. Anyway, here's some code. Save it as a .py file under Packages/User (e.g. Packages/User/whatever.py); it can then be called from the Sublime Python Console (Ctrl+`) like this: view.run_command('guess_indentation').
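The name passed to run_command is derived from the class name: the Command suffix is dropped and the CamelCase remainder becomes snake_case, which is why GuessIndentationCommand in the code that follows is called as 'guess_indentation'. The function below only approximates that convention for simple CamelCase names; command_name is a hypothetical helper, not Sublime's own implementation, and may differ for names with runs of consecutive capitals.

```python
import re


def command_name(class_name):
    """Approximate Sublime's class-name to command-name convention (simple CamelCase only)."""
    if class_name.endswith('Command'):
        class_name = class_name[:-len('Command')]
    # insert '_' before each capital that follows a lowercase letter or digit
    snake = re.sub(r'(?<=[a-z0-9])([A-Z])', r'_\1', class_name)
    return snake.lower()
```

So a class named ExampleCommand, like the one in the 'New Plugin...' template, would be run as view.run_command('example').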
Code
import re

import sublime_plugin


class GuessIndentationCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        view = self.view
        # patterns (raw strings, so '\w' and '\s' reach the regex engine unchanged;
        # attribute chunks exclude whitespace to avoid heavy backtracking)
        start_tag = r'<\w+(?:\s+[^<>/\s]+)*\s*>'           # tag_start
        node_patterns = [start_tag,
                         start_tag[:-1] + r'/\s*>',        # tag_empty
                         r'</\s*\w+\s*>',                  # tag_close
                         r'[^<>\s](?:[^<>]*[^<>\s])?']     # text_node, incl. single characters
        patterns = '(?:{0})'.format('|'.join(node_patterns))
        indentors = re.compile(r'[ \t]*({0})'.format(node_patterns[0]))
        unindentors = re.compile(r'[ \t]*({0})'.format(node_patterns[2]))
        # process selected text; walk the selections back to front so that
        # replacing one region does not shift the regions still to be processed
        for region in reversed(list(view.sel())):
            # skip empty selections (plain cursors)
            if region.empty():
                continue
            selection = view.substr(region)
            expanded = []
            # divide selected lines into XML elements, if a line contains more than one
            for line in selection.split('\n'):
                elements = re.findall(patterns, line)
                if elements:
                    expanded += elements
                else:
                    expanded.append(line)
            # indent output
            indent = 0
            indented = []
            for line in expanded:
                if unindentors.match(line):
                    # closing tags are unindented immediately
                    indent = max(0, indent - 1)
                    indented.append('\t' * indent + line)
                    continue
                indented.append('\t' * indent + line)
                # a start tag indents the NEXT line
                if indentors.match(line):
                    indent += 1
            # replace selection with aligned output
            view.replace(edit, region, '\n'.join(indented))