Translated English article to Portuguese, following the English version

fsdude · fsdude · commit a906139f8f01 · 2020-05-25T19:18:51.000-03:00
diff --git a/9-regular-expressions/03-regexp-character-classes/article.md b/9-regular-expressions/03-regexp-character-classes/article.md
@@ -1,265 +1,203 @@
-# Character classes
+# Classes de caracteres
 
-Consider a practical task -- we have a phone number `"+7(903)-123-45-67"`, and we need to turn it into pure numbers: `79035419441`.
+Considere uma tarefa prática - temos um número de telefone como `"+7(903)-123-45-67"` e precisamos transformá-lo em números puros: `79031234567`.
 
-To do so, we can find and remove anything that's not a number. Character classes can help with that.
+Para fazer isso, podemos encontrar e remover qualquer coisa que não seja um número. Classes de caracteres podem ajudar com isso.
 
-A character class is a special notation that matches any symbol from a certain set.
+Uma *classe de caracteres* é uma notação especial que corresponde a qualquer símbolo de um determinado conjunto.
 
-For the start, let's explore a "digit" class. It's written as `\d`. We put it in the pattern, that means "any single digit".
+Para começar, vamos explorar a classe "dígito". Ela é escrita como `pattern:\d` e corresponde a "qualquer dígito único".
 
-For instance, the let's find the first digit in the phone number:
+Por exemplo, vamos encontrar o primeiro dígito no número de telefone:
 
 ```js run
 let str = "+7(903)-123-45-67";
 
-let reg = /\d/;
+let regexp = /\d/;
 
-alert( str.match(reg) ); // 7
+alert( str.match(regexp) ); // 7
 ```
 
-Without the flag `g`, the regular expression only looks for the first match, that is the first digit `\d`.
+Sem a flag `pattern:g`, a expressão regular procura apenas a primeira correspondência, que é o primeiro dígito `pattern:\d`.
 
-Let's add the `g` flag to find all digits:
+Vamos adicionar a flag `pattern:g` para encontrar todos os dígitos:
 
 ```js run
 let str = "+7(903)-123-45-67";
 
-let reg = /\d/g;
+let regexp = /\d/g;
 
-alert( str.match(reg) ); // array of matches: 7,9,0,3,1,2,3,4,5,6,7
+alert( str.match(regexp) ); // array de correspondências: 7,9,0,3,1,2,3,4,5,6,7
 
-alert( str.match(reg).join('') ); // 79035419441
+// vamos criar o número de telefone apenas com dígitos:
+alert( str.match(regexp).join('') ); // 79031234567
 ```
 
-That was a character class for digits. There are other character classes as well.
+Essa era uma classe de caracteres para dígitos. Existem outras classes de caracteres também.
 
-Most used are:
+As mais usadas são:
 
-`\d` ("d" is from "digit")
-: A digit: a character from `0` to `9`.
+`pattern:\d` ("d" é de "digit")
+: Um dígito: um caractere de `0` a `9`.
 
-`\s` ("s" is from "space")
-: A space symbol: that includes spaces, tabs, newlines.
+`pattern:\s` ("s" é de "space")
+: Um símbolo de espaço: inclui espaços, tabulações `\t`, novas linhas `\n` e alguns outros caracteres raros, como `\v`, `\f` e `\r`.
 
-`\w` ("w" is from "word")
-: A "wordly" character: either a letter of English alphabet or a digit or an underscore. Non-english letters (like cyrillic or hindi) do not belong to `\w`.
+`pattern:\w` ("w" é de "word")
+: Um caractere de texto: uma letra do alfabeto latino, um dígito ou um sublinhado `_`. Letras não latinas (como cirílico ou hindi) não pertencem a `pattern:\w`.
 
-For instance, `pattern:\d\s\w` means a "digit" followed by a "space character" followed by a "wordly character", like `"1 a"`.
+Por exemplo, `pattern:\d\s\w` significa um "dígito" seguido de um "caractere de espaço" seguido de um "caractere de texto", como `match:1 a`.
 
-**A regexp may contain both regular symbols and character classes.**
+**Uma regexp pode conter símbolos regulares e classes de caracteres.**
 
-For instance, `pattern:CSS\d` matches a string `match:CSS` with a digit after it:
+Por exemplo, `pattern:CSS\d` corresponde à string `match:CSS` seguida de um dígito:
 
 ```js run
-let str = "CSS4 is cool";
-let reg = /CSS\d/
+let str = "Existe CSS4?";
+let regexp = /CSS\d/;
 
-alert( str.match(reg) ); // CSS4
+alert( str.match(regexp) ); // CSS4
 ```
 
-Also we can use many character classes:
+Também podemos usar muitas classes de caracteres:
 
 ```js run
-alert( "I love HTML5!".match(/\s\w\w\w\w\d/) ); // 'HTML5'
+alert( "Eu amo HTML5!".match(/\s\w\w\w\w\d/) ); // ' HTML5'
 ```
 
-The match (each character class corresponds to one result character):
+A correspondência (cada classe de caracteres da regexp corresponde a um caractere do resultado):
 
 ![](love-html5-classes.svg)
 
-## Word boundary: \b
+## Classes inversas
 
-A word boundary `pattern:\b` -- is a special character class.
+Para cada classe de caracteres existe uma "classe inversa", denotada com a mesma letra, mas em maiúscula.
 
-It does not denote a character, but rather a boundary between characters.
+O "inverso" significa que ele corresponde a todos os outros caracteres, por exemplo:
 
-For instance, `pattern:\bJava\b` matches `match:Java` in the string `subject:Hello, Java!`, but not in the script `subject:Hello, JavaScript!`.
+`pattern:\D`
+: Não dígito: qualquer caractere, exceto `pattern:\d`, por exemplo, uma letra.
 
-```js run
-alert( "Hello, Java!".match(/\bJava\b/) ); // Java
-alert( "Hello, JavaScript!".match(/\bJava\b/) ); // null
-```
-
-The boundary has "zero width" in a sense that usually a character class means a character in the result (like a wordly character or a digit), but not in this case.
-
-The boundary is a test.
-
-When regular expression engine is doing the search, it's moving along the string in an attempt to find the match. At each string position it tries to find the pattern.
-
-When the pattern contains `pattern:\b`, it tests that the position in string is a word boundary, that is one of three variants:
-
-- Immediately before is `\w`, and immediately after -- not `\w`, or vise versa.
-- At string start, and the first string character is `\w`.
-- At string end, and the last string character is `\w`.
-
-For instance, in the string `subject:Hello, Java!` the following positions match `\b`:
-
-![](hello-java-boundaries.svg)
-
-So it matches `pattern:\bHello\b`, because:
-
-1. At the beginning of the string the first `\b` test matches.
-2. Then the word `Hello` matches.
-3. Then `\b` matches, as we're between `o` and a space.
-
-Pattern `pattern:\bJava\b` also matches. But not `pattern:\bHell\b` (because there's no word boundary after `l`) and not `Java!\b` (because the exclamation sign is not a wordly character, so there's no word boundary after it).
+`pattern:\S`
+: Não espaço: qualquer caractere, exceto `pattern:\s`, por exemplo, uma letra.
 
+`pattern:\W`
+: Caractere que não é de texto: qualquer coisa, exceto `pattern:\w`, por exemplo, uma letra não latina ou um espaço.
 
-```js run
-alert( "Hello, Java!".match(/\bHello\b/) ); // Hello
-alert( "Hello, Java!".match(/\bJava\b/) );  // Java
-alert( "Hello, Java!".match(/\bHell\b/) );  // null (no match)
-alert( "Hello, Java!".match(/\bJava!\b/) ); // null (no match)
-```
-
-Once again let's note that `pattern:\b` makes the searching engine to test for the boundary, so that `pattern:Java\b` finds `match:Java` only when followed by a word boundary, but it does not add a letter to the result.
-
-Usually we use `\b` to find standalone English words. So that if we want `"Java"` language then `pattern:\bJava\b` finds exactly a standalone word and ignores it when it's a part of `"JavaScript"`.
-
-Another example: a regexp `pattern:\b\d\d\b` looks for standalone two-digit numbers. In other words, it requires that before and after `pattern:\d\d` must be a symbol different from `\w` (or beginning/end of the string).
-
-```js run
-alert( "1 23 456 78".match(/\b\d\d\b/g) ); // 23,78
-```
-
-```warn header="Word boundary doesn't work for non-English alphabets"
-The word boundary check `\b` tests for a boundary between `\w` and something else. But `\w` means an English letter (or a digit or an underscore), so the test won't work for other characters (like cyrillic or hieroglyphs).
-```
-
-
-## Inverse classes
-
-For every character class there exists an "inverse class", denoted with the same letter, but uppercased.
-
-The "reverse" means that it matches all other characters, for instance:
-
-`\D`
-: Non-digit: any character except `\d`, for instance a letter.
-
-`\S`
-: Non-space: any character except `\s`, for instance a letter.
-
-`\W`
-: Non-wordly character: anything but `\w`.
-
-`\B`
-: Non-boundary: a test reverse to `\b`.
-
-In the beginning of the chapter we saw how to get all digits from the phone `subject:+7(903)-123-45-67`.
-
-One way was to match all digits and join them:
+No início do capítulo, vimos como obter um número de telefone somente com dígitos a partir de uma string como `subject:+7(903)-123-45-67`: encontrar todos os dígitos e juntá-los.
 
 ```js run
 let str = "+7(903)-123-45-67";
 
 alert( str.match(/\d/g).join('') ); // 79031234567
 ```
 
-An alternative, shorter way is to find non-digits `\D` and remove them from the string:
-
+Uma maneira alternativa e mais curta é encontrar os não dígitos `pattern:\D` e removê-los da string:
 
 ```js run
 let str = "+7(903)-123-45-67";
 
 alert( str.replace(/\D/g, "") ); // 79031234567
 ```
 
-## Spaces are regular characters
+## Um ponto é "qualquer caractere"
 
-Usually we pay little attention to spaces. For us strings `subject:1-5` and `subject:1 - 5` are nearly identical.
+Um ponto `pattern:.` é uma classe de caracteres especial que corresponde a "qualquer caractere, exceto uma nova linha".
 
-But if a regexp doesn't take spaces into account, it may fail to work.
-
-Let's try to find digits separated by a dash:
+Por exemplo:
 
 ```js run
-alert( "1 - 5".match(/\d-\d/) ); // null, no match!
+alert( "Z".match(/./) ); // Z
 ```
 
-Here we fix it by adding spaces into the regexp `pattern:\d - \d`:
+Ou no meio de uma regexp:
 
 ```js run
-alert( "1 - 5".match(/\d - \d/) ); // 1 - 5, now it works
-```
+let regexp = /CS.4/;
 
-**A space is a character. Equal in importance with any other character.**
+alert( "CSS4".match(regexp) ); // CSS4
+alert( "CS-4".match(regexp) ); // CS-4
+alert( "CS 4".match(regexp) ); // CS 4 (o espaço é também um caractere)
+```
 
-Of course, spaces in a regexp are needed only if we look for them. Extra spaces (just like any other extra characters) may prevent a match:
+Observe que um ponto significa "qualquer caractere", mas não a "ausência de um caractere". Deve haver um caractere para corresponder a ele:
 
 ```js run
-alert( "1-5".match(/\d - \d/) ); // null, because the string 1-5 has no spaces
+alert( "CS4".match(/CS.4/) ); // null, sem correspondência porque não há caractere para o ponto
 ```
 
-In other words, in a regular expression all characters matter, spaces too.
-
-## A dot is any character
+### Ponto como literalmente qualquer caractere com a flag "s"
 
-The dot `"."` is a special character class that matches "any character except a newline".
+Por padrão, um ponto não corresponde ao caractere de nova linha `\n`.
 
-For instance:
+Por exemplo, a regexp `pattern:A.B` corresponde a `match:A` e, em seguida, `match:B`, com qualquer caractere entre eles, exceto uma nova linha `\n`:
 
 ```js run
-alert( "Z".match(/./) ); // Z
+alert( "A\nB".match(/A.B/) ); // null (sem correspondência)
 ```
 
-Or in the middle of a regexp:
+Há muitas situações em que gostaríamos que um ponto significasse literalmente "qualquer caractere", incluindo a nova linha.
 
-```js run
-let reg = /CS.4/;
+É o que a flag `pattern:s` faz. Se uma regexp a possui, então um ponto `pattern:.` corresponde literalmente a qualquer caractere:
 
-alert( "CSS4".match(reg) ); // CSS4
-alert( "CS-4".match(reg) ); // CS-4
-alert( "CS 4".match(reg) ); // CS 4 (space is also a character)
+```js run
+alert( "A\nB".match(/A.B/s) ); // A\nB (correspondência!)
 ```
 
-Please note that the dot means "any character", but not the "absense of a character". There must be a character to match it:
+````warn header="Não suportado no Firefox, IE, Edge"
+Verifique <https://caniuse.com/#search=dotall> para obter o estado de suporte mais recente. No momento da redação deste documento, não inclui o Firefox, IE, Edge.
+
+Felizmente, há uma alternativa que funciona em qualquer lugar. Podemos usar uma regexp como `pattern:[\s\S]` para corresponder a "qualquer caractere".
 
 ```js run
-alert( "CS4".match(/CS.4/) ); // null, no match because there's no character for the dot
+alert( "A\nB".match(/A[\s\S]B/) ); // A\nB (correspondência!)
 ```
 
-### The dotall "s" flag
+O padrão `pattern:[\s\S]` diz literalmente: "um caractere de espaço OU um caractere que não é de espaço". Em outras palavras, "qualquer coisa". Poderíamos usar outro par de classes complementares, como `pattern:[\d\D]`; não importa qual. Ou mesmo o padrão `pattern:[^]`, pois significa corresponder a qualquer caractere, exceto nada.
 
-Usually a dot doesn't match a newline character.
+Também podemos usar esse truque se quisermos os dois tipos de "pontos" no mesmo padrão: o ponto real `pattern:.`, comportando-se da maneira regular ("não incluindo uma nova linha"), e também uma maneira de corresponder a "qualquer caractere" com `pattern:[\s\S]` ou similar.
+````
 
-For instance, this doesn't match:
+````warn header="Preste atenção nos espaços"
+Geralmente prestamos pouca atenção aos espaços. Para nós, as strings `subject:1-5` e `subject:1 - 5` são quase idênticas.
 
-```js run
-alert( "A\nB".match(/A.B/) ); // null (no match)
+Mas se uma regexp não leva em consideração os espaços, ela pode falhar.
 
-// a space character would match
-// or a letter, but not \n
-```
+Vamos tentar encontrar dígitos separados por um hífen:
 
-Sometimes it's inconvenient, we really want "any character", newline included.
+```js run
+alert( "1 - 5".match(/\d-\d/) ); // null, sem correspondência!
+```
 
-That's what `s` flag does. If a regexp has it, then the dot `"."` match literally any character:
+Vamos corrigi-lo adicionando espaços à regexp `pattern:\d - \d`:
 
 ```js run
-alert( "A\nB".match(/A.B/s) ); // A\nB (match!)
+alert( "1 - 5".match(/\d - \d/) ); // 1 - 5, agora funciona
+// ou podemos usar a classe \s:
+alert( "1 - 5".match(/\d\s-\s\d/) ); // 1 - 5, também funciona
 ```
 
+**Um espaço é um caractere. Tão importante quanto qualquer outro caractere.**
+
+Não podemos adicionar ou remover espaços de uma expressão regular e esperar que ela funcione da mesma maneira.
 
-## Summary
+Em outras palavras, em uma expressão regular, todos os caracteres são importantes, espaços também.
+````
 
-There exist following character classes:
+## Resumo
 
-- `pattern:\d` -- digits.
-- `pattern:\D` -- non-digits.
-- `pattern:\s` -- space symbols, tabs, newlines.
-- `pattern:\S` -- all but `pattern:\s`.
-- `pattern:\w` -- English letters, digits, underscore `'_'`.
-- `pattern:\W` -- all but `pattern:\w`.
-- `pattern:.` -- any character if with the regexp `'s'` flag, otherwise any except a newline.
+Existem as seguintes classes de caracteres:
 
-...But that's not all!
+- `pattern:\d` - dígitos.
+- `pattern:\D` - não dígitos.
+- `pattern:\s` - símbolos de espaço, tabulações, novas linhas.
+- `pattern:\S` - todos, exceto `pattern:\s`.
+- `pattern:\w` - letras latinas, dígitos, sublinhado `'_'`.
+- `pattern:\W` - todos, exceto `pattern:\w`.
+- `pattern:.` - qualquer caractere se a regexp tiver a flag `'s'`; caso contrário, qualquer um, exceto uma nova linha `\n`.
 
-Modern JavaScript also allows to look for characters by their Unicode properties, for instance:
+...Mas isso não é tudo!
 
-- A cyrillic letter is: `pattern:\p{Script=Cyrillic}` or `pattern:\p{sc=Cyrillic}`.
-- A dash (be it a small hyphen `-` or a long dash `—`): `pattern:\p{Dash_Punctuation}` or `pattern:\p{pd}`.
-- A currency symbol: `pattern:\p{Currency_Symbol}` or `pattern:\p{sc}`.
-- ...And much more. Unicode has a lot of character categories that we can select from.
+A codificação Unicode, usada pelo JavaScript para strings, fornece muitas propriedades para caracteres, como: a qual idioma a letra pertence (se for uma letra), se é um sinal de pontuação etc.
 
-These patterns require `'u'` regexp flag to work. More about that in the chapter [](info:regexp-unicode).
+Também podemos pesquisar por essas propriedades. Isso requer a flag `pattern:u`, abordada no próximo artigo.